AI models disagree on two out of three facts, according to a study

Sat 30 May 2026 ▪ 5 min read ▪ by Mikaia A.

The AI industry is moving fast, sometimes like a negotiator who arrived too early in a still poorly lit room. Yet, it would be dangerous to turn these models into impeccable oracles placed above reality. The current versions remain massive betas: powerful, useful, but still capable of confusing nuance, context, and truth.

[Illustration: a bewildered man faces several artificial intelligence systems offering conflicting answers.]

In Brief

The study compares five advanced AI models on 1,000 statements submitted by real users this year.
The models fail to agree unanimously on 67% of the fact-checks (672 of 1,000); the disagreement is substantial in 34% of cases.
The Krippendorff score reaches only 0.639, below the threshold of around 0.8 often held for solid reliability.
Unanimous verdicts appear almost exclusively on statements judged plainly true or plainly false.

When AI giants each negotiate their own reality

A study by Lenz Research shakes the tech ecosystem. Researchers submitted 1,000 real statements to five advanced models: GPT-5.4, Claude Opus 4.7, Gemini 3 Pro, Gemini 3 Pro with Search, and Sonar Pro. Each model had to choose between four verdicts: true, “mostly true”, “misleading” or false.

The result is no minor glitch. In 672 cases out of 1,000, at least one AI differs from the majority, or no strict majority appears. In other words, models supposed to fact-check do not sign the same contract with reality.
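The counting rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's own code: the function name `classify_agreement`, the three category labels and the sample verdicts are all hypothetical, and each statement is assumed to come with exactly one verdict per model.

```python
from collections import Counter

def classify_agreement(verdicts):
    """Classify one statement's verdicts (one verdict per model).

    Returns "unanimous", "majority_with_dissent" or "no_strict_majority".
    The label names are illustrative, not taken from the report.
    """
    counts = Counter(verdicts)
    _, top_count = counts.most_common(1)[0]
    if top_count == len(verdicts):
        return "unanimous"
    if top_count > len(verdicts) / 2:
        # A strict majority exists, but at least one model dissents.
        return "majority_with_dissent"
    return "no_strict_majority"

# Hypothetical verdicts from five models on three statements.
examples = [
    ["true", "true", "true", "true", "true"],
    ["false", "false", "false", "false", "misleading"],
    ["true", "true", "mostly true", "misleading", "false"],
]
for verdicts in examples:
    print(classify_agreement(verdicts))
```

Under this reading, the 672 statements reported by the study are those falling in either of the last two categories, and the remaining 328 are the unanimous ones.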

The report states:

These statements are not benchmark items with public answers; they are statements submitted by real users to a fact-checking platform.
Source: Lenz Research report

This detail weighs heavily: AIs no longer play on a marked field, but in an open negotiation with rough facts.

Tech models crack as soon as nuance enters the deal

The problem is not limited to classic hallucinations, those unintentional lies served in a three-piece suit. Here, artificial intelligences sometimes read the same elements, then deliver incompatible judgments. In 34% of cases, the disagreement becomes substantial, with verdicts at least two categories apart between models.
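One way to picture "substantial" disagreement is to place the four verdicts on an ordered scale and measure the gap between the two most distant answers. The sketch below assumes the order true, "mostly true", "misleading", false, and a threshold of two steps; both are a reading of the article, not a definition confirmed by the report. The names `SCALE`, `verdict_spread` and `is_substantial` are hypothetical.

```python
# Assumed ordinal positions of the four verdicts (an interpretation).
SCALE = {"true": 0, "mostly true": 1, "misleading": 2, "false": 3}

def verdict_spread(verdicts):
    """Distance, in scale steps, between the two most distant verdicts."""
    ranks = [SCALE[v] for v in verdicts]
    return max(ranks) - min(ranks)

def is_substantial(verdicts, threshold=2):
    """True when the verdicts span at least `threshold` steps."""
    return verdict_spread(verdicts) >= threshold

# The three verdicts the article reports for the World Bank statement.
# The remaining models' verdicts are not given, so the list is partial.
nigeria = ["mostly true", "false", "misleading"]
print(verdict_spread(nigeria), is_substantial(nigeria))
```

Because extra verdicts can only widen the spread, a partial list that already reaches the threshold stays substantial once the missing models are added.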

The Krippendorff score (Krippendorff's alpha, a standard measure of agreement between raters) reaches only 0.639. In law as in science, this number calls for caution. It indicates real agreement, but too weak to treat these models as interchangeable judges. The threshold often held for solid reliability is around 0.8.
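For readers who want to see how such a score is built, here is a minimal sketch of Krippendorff's alpha in its nominal form, where every mismatch between two verdicts counts the same. The article does not say which variant the study used (an ordinal version would penalise distant verdicts more), so this code illustrates the idea and is not expected to reproduce the 0.639 figure. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha.

    `units` is a list of rating lists, one list per statement,
    holding the labels given by the raters who judged it.
    """
    # Coincidence matrix: each ordered pair of ratings from two
    # different raters on the same unit weighs 1 / (m - 1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a unit rated once carries no agreement information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - observed / expected

# Toy data: two raters, ten statements, binary labels.
rater_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal([list(pair) for pair in zip(rater_1, rater_2)])
print(round(alpha, 3))
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, which is why 0.639 reads as real but fragile agreement.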

The report summarizes this fracture:

Models converge towards definitive verdicts; the middle of the scale is the place where they fracture.
Source: Lenz Research report

Indeed, consensus appears mainly at the extremes. Out of 328 unanimous agreements, only four concern “misleading”. None concern “mostly true”.
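The weight of the extremes follows directly from the figures quoted in the article. The short calculation below uses only those numbers; the variable names are illustrative.

```python
# Figures quoted in the article.
total_statements = 1000
unanimous = 328
unanimous_misleading = 4
unanimous_mostly_true = 0

# Statements without unanimity.
non_unanimous = total_statements - unanimous

# Unanimous verdicts that land on "true" or "false".
unanimous_extremes = unanimous - unanimous_misleading - unanimous_mostly_true
share_extremes = unanimous_extremes / unanimous

print(non_unanimous, unanimous_extremes, f"{share_extremes:.1%}")
```

In other words, 324 of the 328 unanimous verdicts, close to 99%, are a plain "true" or a plain "false".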

When several machines verify the same fact, the room becomes noisy

The cited examples show a concrete difficulty. A statement about the active portfolio of the World Bank in Nigeria strongly divides the models. GPT-5.4 chooses “mostly true”. Gemini 3 Pro answers “false”. Gemini 3 Pro with Search prefers “misleading”. The user thus receives three different tickets at the same counter.

Another sensitive case: a statement related to Donald Trump, Iran and a demand from Gulf allies. GPT-5.4 judges it false, Claude Opus 4.7 answers “mostly true”, Gemini 3 Pro answers false, while Gemini 3 Pro with Search answers true. For the reader, the promise of clarification turns into a fairground of algorithmic arbitration.

The study also points out that a majority among AIs does not amount to established truth. A dissenting machine can be right against four others. This caution concerns media, teachers, tech companies, and services already automating their checks.

Numbers that crack the AI showcase

Five models tested on 1,000 recent real statements;
Disagreement observed on 672 statements out of 1,000;
Substantial disagreement noted in 34% of cases;
Unanimous agreement obtained only on 328 statements analyzed;
No “mostly true” consensus among unanimous verdicts.

This study does not condemn AI; it rather reminds us of its experimental status. Last September, a Google artificial intelligence solved a reportedly impossible math problem. The paradox remains splendid: these systems can master scientific abstraction, then stumble on ordinary human truths.

Mikaia A.

The blockchain and crypto revolution is underway! And the day its impact is felt on the most vulnerable economy in this world, against all hope, I will say that I had something to do with it.

DISCLAIMER

The views, thoughts, and opinions expressed in this article belong solely to the author, and should not be taken as investment advice. Do your own research before taking any investment decisions.