AI isn’t very good at history, new paper finds

January 19, 2025: AI Struggles with Historical Accuracy in New Study - A new study finds that leading large language models (LLMs), including GPT-4, Llama, and Google's Gemini, struggle with historical accuracy, especially in nuanced or less-documented contexts. On the Hist-LLM benchmark, the highest accuracy any model achieved was only 46%, a shortfall attributed to biases in training data and the difficulty of retrieving obscure information.

The researchers presented these findings at NeurIPS, emphasizing that LLMs have room to improve and can still be useful in historical research. They suggest refining data sampling and question complexity, and note that LLMs currently fall short of human expertise in advanced historical inquiry.

