Content area

Abstract

Learning a perception and reasoning module for robotic assistants to plan steps to perform complex tasks based on natural language instructions often requires large free-form language annotations, especially for short high-level instructions. To reduce the cost of annotation, large language models (LLMs) are used as a planner with few data. However, when elaborating the steps, even the state-of-the-art planner that uses LLMs mostly relies on linguistic common sense, often neglecting the status of the environment at command reception, resulting in inappropriate plans. To generate plans grounded in the environment, we propose FLARE (Few-shot Language with environmental Adaptive Replanning Embodied agent), which improves task planning using both language command and environmental perception. As language instructions often contain ambiguities or incorrect expressions, we additionally propose to correct the mistakes using visual cues from the agent. The proposed scheme allows us to use a few language pairs thanks to the visual cues and outperforms state-of-the-art approaches. Our code is available at https://github.com/snumprlab/flare.

Details

1009240
Identifier / keyword
Title
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Dec 23, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2024-12-24
Milestone dates
2024-12-23 (Submission v1)
Publication history
 
 
   First posting date
24 Dec 2024
ProQuest document ID
3148961479
Document URL
https://www.proquest.com/working-papers/multi-modal-grounded-planning-efficient/docview/3148961479/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-12-25
Database
2 databases
  • ProQuest One Academic
  • ProQuest One Academic