A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing

Abstract

Urban logistics face complexity due to traffic congestion, fleet heterogeneity, warehouse constraints, and driver workload balancing, especially in the Heterogeneous Multi-Trip Vehicle Routing Problem with Time Windows and Time-Varying Networks (HMTVRPTW-TVN). We develop a mixed-integer linear programming (MILP) model with dual-peak time discretization and exact linearization for heterogeneous fleet coordination. Given the NP-hard nature, we propose a Hyper-Heuristic based on Cumulative Reward Q-Learning (HHCRQL), integrating reinforcement learning with heuristic operators in a Markov Decision Process (MDP). The algorithm dynamically selects operators using a four-dimensional state space and a cumulative reward function combining timestep and fitness. Experiments show that, for small instances, HHCRQL achieves solutions within 3% of Gurobi’s optimum when customer nodes exceed 15, outperforming Large Neighborhood Search (LNS) and LNS with Simulated Annealing (LNSSA) with stable, shorter runtime. For large-scale instances, HHCRQL reduces gaps by up to 9.17% versus Iterated Local Search (ILS), 6.74% versus LNS, and 5.95% versus LNSSA, while maintaining relatively stable runtime. Real-world validation using Shanghai logistics data reduces waiting times by 35.36% and total transportation times by 24.68%, confirming HHCRQL’s effectiveness, robustness, and scalability.

Details

Business indexing term

Subject:

Supply chain management;
Quality of service;
Transportation services;
Customers;
Logistics;
Inventory

Identifier / keyword

routing; multi-trip; time-varying road networks; heterogeneous fleets; Q-learning; hyper-heuristic

Title

A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing

Author

Wang, Xiaochuan; Li, Na; Jin Xingchen

Publication title

Algorithms; Basel

Volume

Issue

First page

536

Number of pages

Publication year

2025

Publication date

2025

Publisher

MDPI AG

Place of publication

Basel

Country of publication

Switzerland

Publication subject

Computers, Mathematics

e-ISSN

19994893

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-08-22

Milestone dates

2025-07-17 (Received); 2025-08-21 (Accepted)

Publication history

First posting date

22 Aug 2025

DOI

https://doi.org/10.3390/a18090536

ProQuest document ID

3254461752

Document URL

https://www.proquest.com/scholarly-journals/reinforcement-learning-hyper-heuristic-with/docview/3254461752/se-2?accountid=208611

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Last updated

2025-09-26

Database

ProQuest One Academic

A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing

Content area

Abstract

Details