
Abstract

This study presents a method based on active preference learning to overcome the challenges of designing reward functions for autonomous navigation. Policies trained with artificially designed reward functions may not accurately reflect human intentions. We focus on the limitations of traditional reward functions, which often fail to facilitate complex tasks in continuous state spaces. We propose active preference learning to resolve these issues and to generate reward functions that align with human preferences. This approach leverages an individual’s subjective preferences to guide the agent’s learning process, enabling the creation of reward functions that reflect human desires. We use mutual information to generate informative queries and use the resulting information gain to balance the agent’s uncertainty against the human’s response capacity, encouraging the agent to pose straightforward yet informative questions. We further employ the No-U-Turn Sampler (NUTS) to refine the belief model, which outperforms the model constructed with the Metropolis algorithm. We then retrain the agent using the reward weights derived from active preference learning. As a result, our autonomous vehicle can navigate between random start and end points without depending on high-precision maps or routing, relying solely on forward vision. We validate the approach in the CARLA simulation environment. Our algorithm raises the success rate of autonomous driving navigation tasks that previously failed under artificially designed rewards to approximately 60%. Experimental results show a significant improvement over the baseline algorithm, providing a solid foundation for enhancing navigation capabilities in autonomous driving systems and advancing autonomous driving intelligence.
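
The query-selection and belief-refinement steps described in the abstract can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes a linear reward over trajectory features with a Bradley-Terry (logistic) preference likelihood, uses NumPyro's NUTS sampler as the belief-refinement step, and scores candidate queries by the mutual information between the human's answer and the reward weights, estimated from posterior samples. All names, dimensions, and data in the example are illustrative.

# Minimal sketch (not the authors' code) of active preference learning with
# mutual-information query selection and a NUTS-refined belief over reward weights.
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

D = 4  # illustrative number of reward features per trajectory


def preference_model(phi_a, phi_b, answers=None):
    # Bradley-Terry likelihood: P(human prefers A) = sigmoid(w . (phi_a - phi_b)).
    w = numpyro.sample("w", dist.Normal(jnp.zeros(D), 1.0).to_event(1))
    logits = (phi_a - phi_b) @ w
    numpyro.sample("obs", dist.Bernoulli(logits=logits), obs=answers)


def sample_belief(phi_a, phi_b, answers, rng_key, num_samples=1000):
    # Refine the belief over reward weights with NUTS, given past answers.
    mcmc = MCMC(NUTS(preference_model), num_warmup=500, num_samples=num_samples)
    mcmc.run(rng_key, phi_a, phi_b, answers)
    return mcmc.get_samples()["w"]  # shape (num_samples, D)


def binary_entropy(p):
    p = jnp.clip(p, 1e-6, 1 - 1e-6)
    return -(p * jnp.log(p) + (1 - p) * jnp.log(1 - p))


def mutual_information(w_samples, phi_a, phi_b):
    # I(answer; w) for one candidate query, estimated from posterior samples:
    # entropy of the mean predictive minus the mean of per-sample entropies.
    p = jax.nn.sigmoid(w_samples @ (phi_a - phi_b))  # P(prefer A) per sample
    return binary_entropy(p.mean()) - binary_entropy(p).mean()


# Illustrative usage: pick the most informative query from random candidates.
past_a = jax.random.normal(jax.random.PRNGKey(0), (10, D))  # features of option A
past_b = jax.random.normal(jax.random.PRNGKey(1), (10, D))  # features of option B
past_answers = jnp.ones(10)                                 # pretend the human chose A
w_post = sample_belief(past_a, past_b, past_answers, jax.random.PRNGKey(2))

candidates = [(jax.random.normal(jax.random.PRNGKey(i), (D,)),
               jax.random.normal(jax.random.PRNGKey(i + 100), (D,)))
              for i in range(20)]
scores = [mutual_information(w_post, a, b) for a, b in candidates]
best_query = candidates[int(jnp.argmax(jnp.array(scores)))]

In a pipeline like the one the abstract describes, the selected query would be shown to the human, the answer appended to the preference data, and the posterior re-sampled before the next query; a point estimate of the weights (for example, the posterior mean) could then serve as the reward for retraining the agent.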

Details

Title
Designing Reward Functions Using Active Preference Learning for Reinforcement Learning in Autonomous Driving Navigation
Author
Ge, Lun 1; Zhou, Xiaoguang 1; Li, Yongqiang 2

1 School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China; [email protected]
2 Mogo Auto Intelligence and Telematics Information Technology Co., Ltd., Beijing 100010, China; [email protected]
First page
4845
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3067412729
Copyright
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.