
Abstract

Traditional Vision-and-Language Navigation (VLN) tasks require an agent to navigate static environments using natural language instructions. However, real-world road conditions such as vehicle movements, traffic signal fluctuations, pedestrian activity, and weather variations are dynamic and continually changing. These factors significantly affect an agent's decision-making and expose a limitation of current VLN models: they do not reflect the complexities of real-world navigation. To bridge this gap, we propose a novel task called Dynamic Vision-and-Language Navigation (DynamicVLN), which incorporates diverse dynamic scenarios to strengthen the agent's decision-making ability and adaptability. By redefining the VLN task, we emphasize that a robust and generalizable agent should not rely solely on predefined instructions but must also demonstrate reasoning skills and adaptability to unforeseen events. Specifically, we design ten scenarios that simulate the challenges of dynamic navigation and build a dedicated dataset of 11,261 instances using the CARLA simulator (ver. 0.9.13) and a large language model to provide realistic training conditions. Additionally, we introduce a baseline model that integrates advanced perception and decision-making modules, enabling it to navigate effectively and interpret the complexities of dynamic road conditions. The model follows natural language instructions while adapting dynamically to environmental cues. Our approach establishes a benchmark for developing agents capable of functioning in real-world, dynamic environments, extending beyond the limitations of static VLN tasks toward more practical and versatile applications.
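The abstract does not detail the data-generation pipeline, but as a rough illustration of how dynamic scenarios of this kind can be scripted, the following minimal sketch uses the CARLA 0.9.13 Python API to inject three of the abstract's dynamic factors (weather, moving traffic, and a pedestrian) into a running simulation. It assumes a CARLA server on localhost:2000; all blueprint names and parameter values are illustrative assumptions, not the authors' actual configuration.

    # Minimal sketch: scripting one dynamic scenario in CARLA 0.9.13.
    # Assumes a CARLA server is running on localhost:2000; all values are
    # illustrative and not taken from the DynamicVLN pipeline.
    import carla

    client = carla.Client('localhost', 2000)
    client.set_timeout(10.0)
    world = client.get_world()
    blueprints = world.get_blueprint_library()

    # Weather variation: heavy rain and fog, one of the scenario types
    # named in the abstract.
    world.set_weather(carla.WeatherParameters(
        cloudiness=90.0, precipitation=70.0, fog_density=30.0))

    # Vehicle movement: a background car driven by CARLA's built-in autopilot.
    spawn_points = world.get_map().get_spawn_points()
    vehicle_bp = blueprints.find('vehicle.tesla.model3')
    vehicle = world.try_spawn_actor(vehicle_bp, spawn_points[0])
    if vehicle is not None:
        vehicle.set_autopilot(True)

    # Pedestrian activity: a walker placed near the road to perturb
    # the ego agent's planned path.
    walker_bp = blueprints.filter('walker.pedestrian.*')[0]
    walker = world.try_spawn_actor(
        walker_bp, carla.Transform(spawn_points[1].location))

    # Traffic signal fluctuation: force nearby lights to red.
    for light in world.get_actors().filter('traffic.traffic_light'):
        light.set_state(carla.TrafficLightState.Red)

In a full pipeline, recorded sensor streams from such scenarios would then be paired with instructions (per the abstract, produced with the help of a large language model) to form training instances.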

Details

Title
DynamicVLN: Incorporating Dynamics into Vision-and-Language Navigation Scenarios
Author
Sun, Yanjun 1; Qiu, Yue 2; Aoki, Yoshimitsu 3

1 Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan; [email protected]; National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba 305-8560, Japan; [email protected]
2 National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba 305-8560, Japan; [email protected]
3 Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan; [email protected]
First page
364
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
1424-8220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3159620585
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.