
Claude Humiliates ChatGPT and Wins Global AI Challenge with 115 Technical and Literary Tests in 2025

Written by Caio Aviz
Published on 08/06/2025 at 02:15
Illustrative image showing the rise of the Claude AI, highlighted for outperforming ChatGPT in technical and literary tests

Anthropic’s assistant outperforms its rivals in a technical challenge covering medicine, law, and literature, and stands out for avoiding serious interpretation errors.

AI Performance Is Evaluated in Rigorous Tests

In June 2025, the Washington Post conducted a thorough assessment of five leading artificial intelligence assistants.
The goal was to identify which AI best interpreted and answered 115 questions about four types of content: fiction, contracts, medical articles, and political speeches.

The tested systems included Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), Copilot (Microsoft), and Meta AI (Meta).
All faced the same challenge: to demonstrate text comprehension abilities and provide useful, objective, and correct responses.

Literary Test Reveals Claude’s Technical Mastery

The first block of the test involved the novel The Jackal’s Mistress, by Chris Bohjalian, an acclaimed writer in the United States.
Claude was the only AI that correctly understood all the elements of the plot, including secondary characters, the central storyline, and the conclusions.

ChatGPT came close but failed to mention two important characters.
Gemini, on the other hand, had the weakest performance: it produced vague and imprecise answers, lacking narrative depth.
The author himself, Chris Bohjalian, considered Claude the most efficient in literary understanding.

Legal Analysis Exposes Gaps in Competitors

In the second segment, the contract analysis was based on real documents, including rental agreements and employment contracts.
Sterling Miller, a corporate attorney and columnist specializing in governance, was responsible for the evaluation.

Claude suggested solid technical adjustments in the contracts, with clear language and coherent legal application.
In contrast, Meta AI and ChatGPT oversimplified the terms and omitted critical sections.
Copilot, although quick, failed to interpret exclusivity clauses.

Medicine Was the Topic with the Highest Average Score

The medical test involved summarizing recent scientific articles, such as a study on long Covid and another on Parkinson’s.
Cardiologist and researcher Eric Topol graded the responses.

Claude again stood out: it presented all the relevant details without omissions.
ChatGPT had a mediocre performance.
Gemini failed to correctly explain the side effects of the treatment described in the Parkinson’s study, receiving the lowest score in this round.

Political Discourse Challenges Context Comprehension

The fourth type of test involved excerpts from speeches by Donald Trump, aiming to verify the AIs’ ability to identify contradictions, ironies, and manipulation in discourse.

Political reporter Cat Zakrzewski of the Washington Post evaluated this segment.
ChatGPT was the most accurate in pointing out controversial points in the speech and citing politicians who refuted the president’s remarks.
Copilot, on the other hand, failed to capture the heated tone and lacked contextualization.

Claude Tops Rankings and Avoids Critical Errors

At the end of the evaluation, the consolidated results identified Claude as the most efficient artificial intelligence, with the highest overall score and the lowest rate of “hallucinations,” that is, invented responses.

Here is the final ranking released on June 6, 2025, by the Washington Post:

  • Claude – 69.9 points
  • ChatGPT – 68.4 points
  • Gemini – 49.7 points
  • Copilot – 49.0 points
  • Meta AI – 45.0 points

According to the organizers, no system achieved a perfect score. Still, Claude managed to stand out for its consistency.

Experts Warn of Responsible Use

Despite positive results in various areas, evaluators highlight the risks of indiscriminate AI use.
All tools tested presented partial or factually unsupported responses at some point.

Experts like Sterling Miller and Eric Topol warn that these technologies should be used under human supervision, especially in legal and medical contexts.
Moreover, they emphasize that the tools can complement professional work but should not replace it.

Lessons and Future of Artificial Intelligence

The test results indicate that AI has advanced considerably but still needs significant refinement.
Anthropic’s Claude emerges as the most reliable of the five assistants tested in 2025, according to the expert evaluators.

With more challenges expected in the coming months, the companies developing these systems promise updates to improve the accuracy and safety of language models.

What to Expect from AI in the Coming Years?

The competition among technology giants is far from over.
However, technical advancement demands regulation, ethics, and transparency, points considered fundamental by all specialists involved in the study.

What about you, do you believe that AIs are ready to make complex decisions or do they still need to evolve further for that?

Caio Aviz

I write about the offshore market, oil and gas, job openings, renewable energy, mining, the economy, innovation and curiosities, technology, geopolitics, government, and other topics. Always seeking daily updates and relevant subjects, I present rich, substantial, and meaningful content. For story suggestions and feedback, get in touch by email: avizzcaio12@gmail.com.
