This study investigates adaptive parameter optimization techniques for Reinforcement Learning-based Apache Spark job scheduling. Traditional Reinforcement Learning-based scheduling approaches are constrained by fixed hyperparameter configurations, which require extensive manual tuning and often fail to adapt to diverse workload characteristics. The research develops and evaluates adaptive mechanisms that enhance the effectiveness of Proximal Policy Optimization (PPO) through dynamic parameter adjustment. Four adaptive approaches are proposed: adaptive clipping, which dynamically adjusts policy update constraints based on Kullback-Leibler (KL) divergence feedback; adaptive learning rate mechanisms, which modulate optimization step sizes according to training progress; a combined approach that applies both techniques simultaneously; and enhanced Generalized Advantage Estimation for improved value function approximation.
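The following minimal sketch illustrates how the first two mechanisms might operate; the specific update rules, thresholds, multipliers, and bounds shown here are illustrative assumptions, not the exact formulation evaluated in this study.

```python
# Illustrative sketch of KL-driven adaptive clipping and progress-driven adaptive
# learning rate for PPO. All constants below are assumed values for the example.

def adapt_clip_range(clip_range, observed_kl, target_kl=0.01,
                     factor=1.5, lo=0.05, hi=0.3):
    """Tighten the PPO clipping constraint when the measured KL divergence
    between the old and new policy exceeds the target, and relax it when
    updates are overly conservative."""
    if observed_kl > 2.0 * target_kl:      # policy moved too far: constrain updates
        clip_range = max(lo, clip_range / factor)
    elif observed_kl < 0.5 * target_kl:    # policy barely moved: allow larger updates
        clip_range = min(hi, clip_range * factor)
    return clip_range


def adapt_learning_rate(base_lr, progress, min_lr=1e-5):
    """Modulate the optimization step size according to training progress
    (here a simple linear decay from base_lr towards min_lr as the fraction
    of completed training grows from 0 to 1)."""
    return min_lr + (base_lr - min_lr) * (1.0 - progress)


# Example: after each PPO update epoch, feed the measured KL divergence and the
# current training progress back into both rules.
clip_range, base_lr, total_epochs = 0.2, 3e-4, 3
for epoch, observed_kl in enumerate([0.003, 0.025, 0.012]):  # placeholder KL values
    clip_range = adapt_clip_range(clip_range, observed_kl)
    lr = adapt_learning_rate(base_lr, progress=epoch / total_epochs)
    print(f"epoch={epoch} kl={observed_kl:.3f} -> clip={clip_range:.3f}, lr={lr:.2e}")
```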
The experimental evaluation is conducted within a comprehensive discrete-event simulator that accurately models Apache Spark execution semantics. The proposed mechanisms are tested on TPC-H (Transaction Processing Performance Council Benchmark H) workloads across multiple random seeds to ensure statistical rigor and reproducibility. The adaptive mechanisms are formulated under the assumptions of policy gradient optimization theory and incorporate feedback-based parameter adjustment strategies. Sample scheduling problems are considered, and the solutions obtained with the adaptive mechanisms are compared with those achieved by the baseline implementation. The results reveal that, with proper adaptive parameter adjustment, the proposed mechanisms may prove advantageous over traditional fixed-parameter approaches in terms of convergence stability, exploration effectiveness, and optimization quality.