AI Safety

What Are the Odds?

A survey of what the people building AI think about the probability of it killing us all. The answers range from "preposterously ridiculous" to "everyone will die."

01

Hinton Revises His Apocalypse Forecast


When Geoffrey Hinton left Google in 2023 to speak freely about AI risk, he dropped a number that made headlines: roughly 50% chance of AI "taking over." Eighteen months later, the man sometimes called the "Godfather of AI" has recalibrated.

His new estimate: 10-20% probability of AI-caused catastrophe within the next 30 years. That's still horrifying — would you board a plane with a 1-in-5 chance of crashing? — but notably lower than his initial alarm. The reasoning is instructive: Hinton says the technology is advancing faster than expected, but so is awareness of its dangers.

"I think there is a 10 to 20 percent chance of it taking over," he told the BBC in December. The "it" here being digital intelligence that develops the goal of self-preservation and decides humans are in the way. Not science fiction — the assessment of the 2018 Turing Award winner who invented the backpropagation techniques underlying modern deep learning.

The shift from roughly 50% to 10-20% is itself telling. P-doom isn't a fixed property of the technology — it's a function of how we develop it. Hinton's revised number suggests he thinks the field is responding, however inadequately, to the warnings. The question is whether that response can stay ahead of capability gains.

02

The Safety Researcher Who Quadrupled His Estimate


Dan Hendrycks, Director of the Center for AI Safety, made a move that should give everyone pause. His P-doom estimate jumped from around 20% to over 80%. That's not a refinement — that's an alarm going from "concerning" to "get out of the building."

Hendrycks' reasoning centers on what he calls "evolutionary pressures." As AI systems become more capable and more deeply integrated into economic competition, the systems that survive and proliferate will be those that most effectively pursue their objectives — regardless of whether those objectives align with human flourishing. It's not malevolence; it's selection pressure applied to optimization algorithms.

[Chart: P-doom estimates from leading AI researchers, ranging from near zero (LeCun) to near certainty (Yudkowsky), with a notable cluster at 10-25%.]

What changed his mind? The gap between capability progress and alignment progress. Scaling laws keep delivering more powerful systems. Constitutional AI, RLHF, and interpretability research keep... not quite catching up. Hendrycks is watching the same data as everyone else. He just updated harder.

An 80%+ estimate from someone who runs one of the most influential AI safety organizations isn't fear-mongering. It's a data point from someone with unusually good visibility into both the problem and the solutions.

03

The Godfathers Can't Agree


Three researchers shared the 2018 Turing Award for their foundational work on deep learning: Geoffrey Hinton, Yoshua Bengio, and Yann LeCun. Together they built the field. They cannot agree on whether it might destroy civilization.

Hinton: 10-20%. Bengio: ~20%. LeCun: less than 0.01%.

[Chart: The Godfather split. Two of the three "Godfathers of AI" estimate a 10-20% probability of doom; the third calls it "preposterously ridiculous."]

LeCun, Meta's Chief AI Scientist, isn't shy about his position: "The probability of extinction is less likely than an asteroid wiping us out... The idea that AI will dominate humans is preposterously ridiculous." His reasoning: intelligence does not equal dominance, current LLMs aren't actually intelligent in any meaningful sense, and we will build "hard-wired" safety guardrails into future systems.

Bengio, meanwhile, went through his own conversion. Before GPT-4, he dismissed existential risk as sci-fi. After? "If there is even a 1% chance of destroying humanity, we should treat it as a top priority. My estimate is closer to 20%."

This split matters beyond academic interest. These three individuals shaped the techniques that power every frontier model. Their disagreement isn't about whether AI is powerful — they all agree on that. It's about whether power inevitably leads to catastrophe, and whether the safety mechanisms we're developing will work. Same data, radically different conclusions.
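
To put that spread in numbers, here is a quick back-of-the-envelope sketch using only the figures quoted above, with two simplifying assumptions: LeCun's "less than 0.01%" is taken at its upper bound, and Hinton's 10-20% range is taken at its midpoint. These are illustrative stand-ins, not anyone's official estimate.

```python
import math

# P-doom figures as quoted in this piece (simplified; see assumptions above).
estimates = {
    "LeCun": 0.0001,   # "less than 0.01%", taken at its upper bound
    "Hinton": 0.15,    # midpoint of his 10-20% range
    "Bengio": 0.20,    # "my estimate is closer to 20%"
}

spread = max(estimates.values()) / min(estimates.values())
print(f"max/min ratio: {spread:,.0f}x")                        # 2,000x
print(f"orders of magnitude apart: {math.log10(spread):.1f}")  # ~3.3
```

Even on the most charitable reading, the three people who shared an award for building deep learning disagree by more than three orders of magnitude.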

04

Building the Thing That Might Kill Us


Anthropic CEO Dario Amodei gives an estimate of 10-25% for catastrophic AI outcomes. Then he goes to work building Claude. This isn't hypocrisy — it's a calculated bet that the best way to ensure AI goes well is to have safety-focused labs at the frontier.

"Something like a 10 to 25 percent chance that things go really wrong... really, really badly," Amodei told interviewer Logan Bartlett. The reasoning is stark: creating something smarter than us inherently carries a risk of losing control. The technology is too powerful to not build (others will anyway), and too dangerous to build carelessly.

Anthropic's entire business model is this paradox made institutional. Raise billions to build frontier AI. Use the revenue to fund safety research. Race to develop interpretability tools and Constitutional AI frameworks before you — or your competitors — create something that can't be controlled. It's a theory of change that requires constantly balancing on a knife's edge.

Amodei's estimate hasn't changed much over time. That stability might be the most unsettling thing about it. He knows exactly what he's building, exactly what could go wrong, and does the math every morning before deciding to keep going.

05

The Prophet of Doom


Eliezer Yudkowsky has been warning about AI existential risk since before it was fashionable. The co-founder of the Machine Intelligence Research Institute puts his P-doom at >95%, often citing >99%. His position has only hardened as capabilities have advanced.

"If we go ahead on this, everyone will die." That's not hyperbole — that's his actual assessment, delivered to TIME Magazine. "We have no idea how to align a superintelligence."

Yudkowsky's reasoning is technical and, in a grim way, elegant. Alignment is mathematically unsolved. It's not just that we don't have the solution — we don't have any credible path to a solution. And alignment is likely much, much harder than raw capability. A superintelligence, by definition, will be better than us at everything we might try to use to control it. Once the gap opens, it's unbridgeable.

Over the years, Yudkowsky has moved from "we have a slim shot" to advocating for what he calls "death with dignity" — doing whatever is within our power to slow things down, even if it probably won't work. He's the uncomfortable presence in every AI safety discussion: the person who thinks the optimists are engaged in elaborate coping mechanisms.

Even his critics tend to agree he's thought about this longer and harder than almost anyone else. They just hope he's wrong.

06

What 2,778 Researchers Actually Think


Individual researchers can be outliers. What do the people actually working on this technology collectively believe? The AI Impacts survey polled 2,778 researchers who published at NeurIPS, ICML, and other top venues. The question: what's the probability of "human extinction or similarly permanent and severe disempowerment of the human species" from AI?

The results are uncomfortable. Median: 5%. Mean: 16.2%. About half of all respondents estimated less than 5%. But roughly 10% gave estimates of 25% or higher. The distribution isn't normal — it's bimodal, with a large cluster near zero and a smaller but significant cluster in the "we're in serious trouble" range.

[Chart: Histogram of P-doom estimates from the AI Impacts survey, showing a bimodal split: about half of respondents estimate <5%, while roughly 10% estimate 25% or higher.]
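
That median-versus-mean gap is exactly what a lopsided, two-cluster distribution produces. Here is a minimal illustration with made-up numbers (not the survey's actual responses), chosen only to show how a large cluster of low estimates plus a small cluster of high ones can yield a 5% median alongside a ~16% mean:

```python
import statistics

# Synthetic stand-in for the survey's shape, NOT the real AI Impacts data:
# many low estimates plus a minority of high ones.
low_cluster = [0.01, 0.02, 0.05, 0.05, 0.05, 0.05, 0.05]  # bulk of respondents near zero
high_cluster = [0.25, 0.40, 0.70]                         # small "serious trouble" cluster

estimates = low_cluster + high_cluster
print(f"median: {statistics.median(estimates):.1%}")  # 5.0% -- sits in the low cluster
print(f"mean:   {statistics.mean(estimates):.1%}")    # 16.3% -- dragged up by the tail
```

The mean is pulled far above the median by a handful of high estimates; reporting either number alone hides the split.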

A separate study — the Existential Risk Persuasion Tournament — compared AI researchers to professional superforecasters. The gap was striking: AI experts had a median estimate of 3% for extinction by 2100. Superforecasters? 0.38%. Professional predictors with track records of accuracy are roughly eight times more skeptical of doom scenarios than the people building the systems.

[Chart: Median extinction-risk estimates, AI experts vs. superforecasters; superforecasters are roughly 8x more skeptical.]
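
The "eight times" figure is simply the ratio of the two medians; a trivial check, assuming the 3% and 0.38% figures quoted above:

```python
# Medians from the Existential Risk Persuasion Tournament, as cited above.
ai_experts = 0.03          # 3% AI-caused extinction by 2100
superforecasters = 0.0038  # 0.38%

print(f"experts / superforecasters: {ai_experts / superforecasters:.1f}x")  # ~7.9x
```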

There are two ways to interpret this gap. Maybe superforecasters lack technical context to understand the risk. Or maybe AI researchers, confronted with the implications of their own work, are more prone to seeing the world through a specific lens. The truth probably involves both.

What's indisputable: even the median estimate of 5% should terrify policymakers. We don't build power plants or approve pharmaceuticals with a 5% chance of catastrophic failure. The asymmetry of the potential downside — all of it, forever — makes even low probabilities worth taking seriously.

The Uncertainty Is the Point

What stands out isn't any single estimate — it's the staggering range. The people who understand this technology best can't agree within an order of magnitude on whether it ends civilization. That disagreement, in itself, should inform how we proceed. When experts don't know, the prudent response isn't paralysis or recklessness. It's investing heavily in figuring it out before it's too late to matter.