Tech Xplore on MSN
AI agents debate their way to improved mathematical reasoning
Large language models (LLMs), artificial intelligence (AI) systems that can process and generate texts in various languages, ...
A new AI developed at Duke University can uncover simple, readable rules behind extremely complex systems. It studies how ...
Tech Xplore on MSN
'Periodic table' for AI methods aims to drive innovation
Artificial intelligence is increasingly used to integrate and analyze multiple types of data formats, such as text, images, ...
The leading approach to the simplex method, a widely used technique for balancing complex logistical constraints, can’t get ...
New research reveals why even state-of-the-art large language models stumble on seemingly easy tasks—and what it takes to fix ...
DeepSeek released DeepSeek-Math-V2, an AI model specialized for mathematical reasoning, on November 27, 2025. DeepSeek-Math-V2 focuses on theorem proving and self-verification capabilities, and ...
Tech Xplore on MSN
New Method Empowers Small Models in Complex Tasks
As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that ...
Meta's work made headlines and raised a possibility once considered pure fantasy: that AI could soon outperform the world's best mathematicians by cracking math's marquee "unsolvable" problems en ...
Instead of a single, massive LLM, Nvidia's new 'orchestration' paradigm uses a small model to intelligently delegate tasks to a team of tools and specialized models.
With just 30 billion parameters, Nous Research's open-source Nomos 1 AI model scored 87/120 on the notoriously difficult Putnam math competition, ranking second among 4,000 human contestants.
In a new paper from OpenAI, the company proposes a framework for analyzing AI systems' chain-of-thought reasoning to understand how, when, and why they misbehave.
Kim's team stated, "Under the same conditions [as LG AI Research's experiment], Gemini and Grok series models scored approximately 92 points, while ChatGPT and Claude series models scored about 88 ...