Abstract

Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle to solve mathematical problems because of the intricate reasoning required. To support rigorous evaluation of mathematical reasoning in LLMs, we introduce the “MathOdyssey” dataset, a curated collection of 387 expert-generated mathematical problems spanning high school, university, and Olympiad-level topics. Each problem is accompanied by a detailed solution and categorized by difficulty level, subject area, and answer type. The dataset was developed through a rigorous multi-stage process involving contributions from subject experts, peer review, and standardized formatting. We provide detailed metadata and a standardized schema to facilitate consistent use in downstream applications. To demonstrate the dataset’s utility, we evaluate several representative LLMs and report their performance across problem types. We release MathOdyssey as an open-access resource to enable reproducible and fine-grained assessment of mathematical capabilities in LLMs and to foster further research in mathematical reasoning and education.

Details

Title
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
Author
Fang, Meng 1 ; Wan, Xiangpeng 2 ; Lu, Fei 3 ; Xing, Fei 4 ; Zou, Kai 2 

1 Department of Computer Science, University of Liverpool, Liverpool, UK (ROR: https://ror.org/04xs57h96) (GRID: grid.10025.36) (ISNI: 0000 0004 1936 8470)
2 NetMind.AI, London, UK (ROR: https://ror.org/03knd6b36) (GRID: grid.497885.f) (ISNI: 0000 0000 9934 3724)
3 Department of Mathematics, Johns Hopkins University, Baltimore, MD, USA (ROR: https://ror.org/00za53h95) (GRID: grid.21107.35) (ISNI: 0000 0001 2171 9311)
4 Mathematica Policy Research, Princeton, New Jersey, USA (ROR: https://ror.org/02403vr89) (GRID: grid.419482.2) (ISNI: 0000 0004 0618 1906)
Publication title
Volume
12
Issue
1
Pages
1392
Number of pages
9
Publication year
2025
Publication date
2025
Section
Data Descriptor
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
Publication subject
e-ISSN
2052-4463
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-08-08
Milestone dates
2025-05-28 (Registration); 2025-02-27 (Received); 2025-05-27 (Accepted)
First posting date
2025-08-08
ProQuest document ID
3237859299
Document URL
https://www.proquest.com/scholarly-journals/mathodyssey-benchmarking-mathematical-problem/docview/3237859299/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-08-09
Database
ProQuest One Academic