AI Struggles with Historical Accuracy in New Study

January 19, 2025: A new study shows that leading large language models (LLMs) such as GPT-4, Llama, and Google's Gemini have trouble with historical accuracy, especially in nuanced or less-documented contexts. On the Hist-LLM benchmark, the best-performing model reached only 46% accuracy, a shortfall attributed to biases in training data and the difficulty of retrieving obscure information.

The researchers presented these findings at NeurIPS, emphasizing both the room for improvement and the usefulness of LLMs in historical research. They suggest refining the benchmark's data sampling and question complexity, and they note that LLMs currently fall short of human expertise in advanced historical inquiry.

AI RESEARCH AND LIMITATIONS

AI isn’t very good at history, new paper finds

