Content area

Abstract

Role-playing is a crucial capability of Large Language Models (LLMs), enabling a wide range of practical applications, including intelligent non-player characters, digital twins, and emotional companions. Evaluating this capability in LLMs is challenging due to the complex dynamics involved in role-playing, such as maintaining character fidelity throughout a storyline and navigating open-ended narratives without a definitive ground truth. Current evaluation methods, which primarily focus on question-answering or conversational snapshots, fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. In this paper, we propose CharacterBox, which is a simulation sandbox designed to generate situational fine-grained character behavior trajectories. These behavior trajectories enable a more comprehensive and in-depth evaluation of role-playing capabilities. CharacterBox consists of two main components: the character agent and the narrator agent. The character agent, grounded in psychological and behavioral science, exhibits human-like behaviors, while the narrator agent coordinates interactions between character agents and environmental changes. Additionally, we introduce two trajectory-based methods that leverage CharacterBox to enhance LLM performance. To reduce costs and facilitate the adoption of CharacterBox by public communities, we fine-tune two smaller models, CharacterNR and CharacterRM, as substitutes for GPT API calls, and demonstrate their competitive performance compared to advanced GPT APIs.

Details

1009240
Title
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Dec 7, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2024-12-10
Milestone dates
2024-12-07 (Submission v1)
Publication history
 
 
   First posting date
10 Dec 2024
ProQuest document ID
3142733846
Document URL
https://www.proquest.com/working-papers/characterbox-evaluating-role-playing-capabilities/docview/3142733846/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-12-11
Database
2 databases
  • ProQuest One Academic
  • ProQuest One Academic