
Abstract

We introduce PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning), a framework that integrates preference optimization with reinforcement learning (RL) concepts for self-improving scientific reasoning. PRefLexOR employs a recursive approach, refining intermediate steps before producing final outputs in training and inference. It optimizes log odds between preferred and non-preferred responses using an in-situ dataset generation algorithm. A dynamic knowledge graph contextualizes reasoning with retrieval-augmented data. Preference optimization enhances performance via rejection sampling, masking reasoning steps to focus on discovery. Recursive optimization, guided by feedback loops, refines reasoning. This process mirrors biological adaptation, enabling real-time learning. We find that even small models (3B parameters) self-teach deeper reasoning, solving open-domain problems effectively. Our method integrates into existing LLMs and demonstrates success in biological materials science, leveraging multi-agent self-improvement for enhanced reasoning depth and cross-domain adaptability, offering flexibility and integration into larger agentic systems.
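The abstract says the method optimizes log odds between preferred and non-preferred responses but does not give the loss itself. As a rough illustration only, the sketch below computes an odds-ratio style preference term from two sequence-level log-probabilities. The function names, the use of a single log-probability per response, and the sigmoid-of-margin form are assumptions made for this example, not details confirmed by this record.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    if logp >= 0.0:
        raise ValueError("logp must be negative so that p < 1")
    # log(1 - p) computed as log1p(-exp(logp)) for numerical stability
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0.0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def odds_ratio_preference_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Hypothetical preference term: small when the preferred response has higher odds.

    logp_chosen / logp_rejected are assumed to be (length-normalized)
    log-probabilities the model assigns to each response.
    """
    margin = log_odds(logp_chosen) - log_odds(logp_rejected)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Preferred response is more likely than the rejected one: low loss
    print(odds_ratio_preference_loss(-0.5, -2.0))
    # Ordering reversed: higher loss
    print(odds_ratio_preference_loss(-2.0, -0.5))
```

In this sketch the loss falls as the model assigns higher odds to the preferred response relative to the non-preferred one, which is the qualitative behavior the abstract describes. A full training objective would typically combine such a term with a standard language-modeling loss.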

Details

Title
PRefLexOR: preference-based recursive language modeling for exploratory optimization of reasoning and agentic thinking
Author
Buehler, Markus J. 1 

 Massachusetts Institute of Technology, Center for Computational Science and Engineering, Schwarzman College of Computing, Laboratory for Atomistic and Molecular Mechanics (LAMM), Cambridge, USA (GRID:grid.116068.8) (ISNI:0000 0001 2341 2786) 
Publication title
Volume
1
Issue
1
Pages
4
Publication year
2025
Publication date
Dec 2025
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
e-ISSN
3005-1460
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-05-14
Milestone dates
2025-03-22 (Registration); 2024-11-01 (Received); 2025-03-22 (Accepted)
First posting date
2025-05-14
ProQuest document ID
3227648399
Document URL
https://www.proquest.com/scholarly-journals/preflexor-preference-based-recursive-language/docview/3227648399/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-07-07
Database
ProQuest One Academic