Abstract

This study uses a large visual language model (VLM) to develop an intelligent perception system for pedestrian crossing scenarios in autonomous driving environments. By establishing standardized checklists and prompts, the system reduces the risk of misjudgment and omission through multimodal data processing. It offers data-driven decision-making support and presents an innovative approach to integrating autonomous driving technology with intelligent transportation systems. The study begins by classifying pedestrian crossing scenarios based on international autonomous driving standards, distinguishing between pedestrian crossings and autonomous vehicle crossings, as well as between dynamic and static entities. Next, standardized prompts derived from these standards are fed into the VLM, which generates structured scenario checklists of dynamic and static entities, output in JSON format. This systematic identification and processing of entities (such as pedestrians, vehicles, and traffic facilities) enables the construction of structured data representations for complex traffic scenarios. Building on this foundation, the VLM analyzes scenario data to predict collision risks by modeling the behaviors of both pedestrians and vehicles, supporting real-time decision-making for autonomous vehicles and road users. Furthermore, the VLM processes scene data to anticipate potential conflicts and provide actionable safety recommendations, enhancing the overall safety of all traffic participants. The system achieved a perception accuracy of 93.05%, with risk prediction consistency and decision-making rule consistency rates of 85.91% and 87.72%, respectively. By constructing a VLM-based intelligent pedestrian crossing perception system, this study offers a novel technical framework for improving perception, prediction, and decision-making in autonomous driving.
Unlike traditional rule-based and deep learning approaches, which struggle with complex pedestrian behaviors and dynamic environments, our method integrates visual perception with reasoning capabilities, enabling structured, standardized, and explainable decision-making in pedestrian crossing scenarios.
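
The prompt-to-checklist step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual prompt wording or JSON schema, so everything named here is an assumption made for illustration: the prompt text, the `dynamic`/`static` keys, the `type` and `description` fields, and the helpers `build_prompt` and `parse_checklist`. No VLM is called; the sketch only shows how a standardized prompt could be assembled and how a JSON reply could be validated into a structured checklist.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt wording; the paper's real prompts are not given in the abstract.
PROMPT_TEMPLATE = (
    "You are analysing a pedestrian crossing scene.\n"
    "List every entity you can see and return JSON only, with two keys:\n"
    '  "dynamic": entities that move (e.g. pedestrians, vehicles)\n'
    '  "static": fixed entities (e.g. traffic facilities)\n'
    'Each entity is an object with "type" and "description" fields.\n'
    "Scenario class: {scenario}"
)


@dataclass(frozen=True)
class Entity:
    kind: str         # "dynamic" or "static" (assumed grouping)
    type: str         # e.g. "pedestrian", "traffic_light" (assumed labels)
    description: str  # free-text detail, empty if the model omitted it


def build_prompt(scenario: str) -> str:
    """Fill the standardized template with a scenario class label."""
    return PROMPT_TEMPLATE.format(scenario=scenario)


def parse_checklist(raw: str) -> list[Entity]:
    """Validate a JSON reply and flatten it into a list of entities."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("checklist must be a JSON object")
    entities = []
    for kind in ("dynamic", "static"):
        items = data.get(kind)
        if not isinstance(items, list):
            raise ValueError(f"missing or invalid '{kind}' list")
        for item in items:
            entities.append(Entity(kind, item["type"], item.get("description", "")))
    return entities


if __name__ == "__main__":
    # A hand-written reply standing in for model output.
    reply = (
        '{"dynamic": [{"type": "pedestrian", "description": "entering crosswalk"}],'
        ' "static": [{"type": "traffic_light"}]}'
    )
    print(build_prompt("unsignalized crossing"))
    for entity in parse_checklist(reply):
        print(entity)
```

Rejecting a reply that lacks either list, rather than silently returning a partial checklist, is one way to guard against the omissions the abstract says the standardized checklists are meant to reduce.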

Details

1009240
Title
Improving intelligent perception and decision optimization of pedestrian crossing scenarios in autonomous driving environments through large visual language models
Author
Teng, Xiao 1; Huang, Lin 2; Shen, Zhenjiang 3; Li, Wankai 4

1 Faculty of Transdisciplinary Sciences, Institute of Philosophy in Interdisciplinary Sciences, Kanazawa University, 920-1192, Kanazawa, Japan (ROR: https://ror.org/02hwp6a56) (GRID: grid.9707.9) (ISNI: 0000 0001 2308 3329); China Youke Communication Technology Co., Ltd, Fujian, China
2 Graduate School of Natural Science & Technology, Kanazawa University, 920-1192, Kanazawa, Japan (ROR: https://ror.org/02hwp6a56) (GRID: grid.9707.9) (ISNI: 0000 0001 2308 3329)
3 Graduate School of Natural Science & Technology, Kanazawa University, 920-1192, Kanazawa, Japan (ROR: https://ror.org/02hwp6a56) (GRID: grid.9707.9) (ISNI: 0000 0001 2308 3329); International Joint Laboratory of Spatial Planning and Sustainable Development (FZUKU-LAB SPSD), Fuzhou University, 350025, Fuzhou, China (ROR: https://ror.org/011xvna82) (GRID: grid.411604.6) (ISNI: 0000 0001 0130 6528)
4 Faculty of Transdisciplinary Sciences, Institute of Philosophy in Interdisciplinary Sciences, Kanazawa University, 920-1192, Kanazawa, Japan (ROR: https://ror.org/02hwp6a56) (GRID: grid.9707.9) (ISNI: 0000 0001 2308 3329)
Volume
15
Issue
1
Pages
31283
Number of pages
19
Publication year
2025
Publication date
2025
Section
Article
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
Publication subject
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-08-25
Milestone dates
2025-08-04 (Registration); 2025-04-10 (Received); 2025-08-04 (Accepted)
First posting date
2025-08-25
ProQuest document ID
3243609535
Document URL
https://www.proquest.com/scholarly-journals/improving-intelligent-perception-decision/docview/3243609535/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-26
Database
ProQuest One Academic