Mathematical Capabilities of LLMs

Title: Mathematical Capabilities of LLMs
Creator: Thakur, Rahul; Poonia, Ramesh Chandra
Description: Large language models (LLMs) have the potential to solve mathematical problems well, but this ability has been little investigated. In this research, we evaluate five leading LLMs (Gemini, Claude, Mistral, ChatGPT, and Llama) on a set of 50 mathematical problems that cover calculus, algebra, geometry, number theory, and probability. The study evaluates the accuracy of their final solutions and assesses whether their intermediate steps are correct. We created a primary dataset for comparing LLM performance and ranked the models according to how well they solved the problems. The results illustrate the shortcomings of present LLMs in mathematical reasoning and problem solving and suggest what needs to be rectified in these models. Future research will further refine this dataset and monitor the progression of LLM capabilities in solving more complex mathematical problems. © 2025 IEEE.
Source: IEEE International Conference on "Computational, Communication and Information Technology", ICCCIT 2025; pp. 356-360
Date: 01-01-2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Subject: dataset evaluation; decision making; LLM benchmarking; mathematical reasoning; model comparison; problem solving
Coverage: Thakur R., Christ University, Department of School of Science, Bengaluru, India; Poonia R.C., Christ University, Department of School of Science, Bengaluru, India
Rights: Restricted Access; Hardcopy may be available in the library
Relation: ISBN: 979-833151296-5
Format: online
Language: English
Type: Conference paper
Identifier: https://doi.org/10.1109/ICCCIT62592.2025.10928013

https://www.scopus.com/pages/publications/105002252308?origin=resultslist

Collection

Citation

Thakur, Rahul; Poonia, Ramesh Chandra, “Mathematical Capabilities of LLMs,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/25928.
