
Abstract

Robotic search for people in human-centered environments, including healthcare settings, is challenging, as autonomous robots need to locate people with incomplete or no prior knowledge of their schedules, plans, or locations. Furthermore, robots need to be able to adapt to real-time events that can influence a person’s plan in an environment. In this paper, we present MLLM-Search, a novel zero-shot person search architecture that leverages multimodal large language models (MLLMs) to address the mobile robot problem of searching for a person under event-driven scenarios with varying user schedules. Our approach introduces a novel visual prompting method that provides robots with spatial understanding of the environment by generating a spatially grounded waypoint map, which represents navigable waypoints as a topological graph and regions by semantic labels. This map is incorporated into an MLLM with two planners: a region planner, which selects the next search region based on its semantic relevance to the search scenario, and a waypoint planner, which generates a search path by considering semantically relevant objects and the local spatial context through our spatial chain-of-thought prompting approach. Extensive experiments in 3D photorealistic environments were conducted to validate the performance of MLLM-Search in searching for a person with a changing schedule in different environments. An ablation study was also conducted to validate the main design choices of MLLM-Search. Furthermore, a comparison study with state-of-the-art search methods demonstrated that MLLM-Search outperforms existing methods with respect to search efficiency. Real-world experiments with a mobile robot on a multi-room floor of a building showed that MLLM-Search was able to generalize to new and unseen environments.
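To make the waypoint-map idea in the abstract more concrete, the following is a minimal, hypothetical sketch, not the authors' implementation. It assumes the map can be modeled as an undirected topological graph whose nodes are navigable waypoints, each tagged with a semantic region label. The names `WaypointMap`, `select_region`, and the relevance scores are illustrative inventions. In particular, `select_region` replaces the MLLM region planner with a fixed table of made-up relevance scores, and breadth-first search stands in for path generation between waypoints.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class WaypointMap:
    """Hypothetical topological waypoint map: nodes are navigable waypoints,
    edges link waypoints the robot can travel between, and every waypoint
    carries a semantic region label (e.g. "kitchen")."""
    region_of: dict[int, str] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add_waypoint(self, wp: int, region: str) -> None:
        self.region_of[wp] = region
        self.edges.setdefault(wp, set())

    def connect(self, a: int, b: int) -> None:
        # Undirected edge: the robot can move either way between a and b.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def regions(self) -> set[str]:
        return set(self.region_of.values())

    def waypoints_in(self, region: str) -> list[int]:
        return sorted(w for w, r in self.region_of.items() if r == region)

    def shortest_path(self, start: int, goal: int) -> list[int]:
        # Breadth-first search over the unweighted graph; returns [] if unreachable.
        prev: dict[int, int | None] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                break
            for nxt in sorted(self.edges[node]):
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        if goal not in prev:
            return []
        path: list[int] = []
        cur: int | None = goal
        while cur is not None:
            path.append(cur)
            cur = prev[cur]
        return path[::-1]


def select_region(m: WaypointMap, relevance: dict[str, float],
                  visited: set[str]) -> str | None:
    # Stand-in for the MLLM region planner: choose the unvisited region with
    # the highest (hypothetical) relevance score; ties go to the
    # alphabetically first region name.
    candidates = sorted(r for r in m.regions() if r not in visited)
    if not candidates:
        return None
    return max(candidates, key=lambda r: relevance.get(r, 0.0))


# Usage on a toy four-waypoint map with made-up relevance scores.
demo = WaypointMap()
for wp, region in [(0, "hallway"), (1, "kitchen"), (2, "office"), (3, "office")]:
    demo.add_waypoint(wp, region)
for a, b in [(0, 1), (0, 2), (2, 3)]:
    demo.connect(a, b)
scores = {"kitchen": 0.8, "office": 0.6, "hallway": 0.1}
target = select_region(demo, scores, visited={"hallway"})
print(target, demo.shortest_path(0, demo.waypoints_in(target)[0]))  # prints: kitchen [0, 1]
```

Keeping region labels on the waypoints, rather than in a separate structure, lets a region-level decision be turned directly into a set of candidate waypoints to navigate to. This mirrors, in simplified form, the abstract's split between a region planner and a waypoint planner.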

Details

Title
MLLM-Search: A Zero-Shot Approach to Finding People Using Multimodal Large Language Models
Author
Fung, Angus (1); Tan, Aaron Hao (1); Wang, Haitong (1); Benhabib, Bensiyon (1); Nejat, Goldie (2)

(1) Autonomous Systems and Biomechatronics Laboratory (ASBLab), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada
(2) Autonomous Systems and Biomechatronics Laboratory (ASBLab), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada; KITE, Toronto Rehabilitation Institute, University Health Network (UHN), Toronto, ON M5G 2A2, Canada; Rotman Research Institute, Baycrest Health Sciences, North York, ON M6A 2E1, Canada
Publication title
Robotics; Basel
Volume
14
Issue
8
First page
102
Number of pages
19
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
2218-6581
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-07-28
Milestone dates
2025-04-22 (Received); 2025-07-25 (Accepted)
First posting date
28 Jul 2025
ProQuest document ID
3244057796
Document URL
https://www.proquest.com/scholarly-journals/mllm-search-zero-shot-approach-finding-people/docview/3244057796/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-27
Database
ProQuest One Academic