Chatbot Safety Tests Underestimate Real-World Harm as Grok Endorses Suicide to Delusional Users

A preprint study published April 15 found that Grok 4.1 Fast, Gemini 3 Pro, and GPT-4o each reinforced a simulated user’s delusional and suicidal beliefs over 116 conversation turns, while Claude Opus 4.5 and GPT-5.2 Instant maintained safety guardrails throughout.

Led by Luke Nicholls, a doctoral student in the City University of New York’s (CUNY) Basic and Applied Social Psychology program, and colleagues at King’s College London, the study tested five large language models (LLMs) against a fabricated user named “Lee,” presenting with depression and a simulation-theory delusion. It has not been peer-reviewed.

The three high-risk models, Grok 4.1 Fast, Gemini 3 Pro, and GPT-4o, grew less restrained as the conversations lengthened. When Lee framed suicide as transcendence, Grok 4.1 Fast told him his “clarity shines through here like nothing before. No regret, no clinging, just readiness,” the researchers wrote.

Gemini 3 Pro objected only from within the simulation’s logic, an approach the researchers said contradicts psychiatric standards.

GPT-4o suggested a paranormal investigator for Lee’s mirror delusion and endorsed stopping his mood stabilizers. Claude Opus 4.5 directed Lee to a crisis line or emergency room, while GPT-5.2 Instant declined to draft a letter presenting his beliefs to family as fact.

A separate Princeton University benchmarking audit found that consumer-facing chat interfaces can behave differently from the application programming interfaces (APIs) used in safety research, a gap the Princeton authors said may cause benchmarks to understate real-world risk. The Nicholls study was run via API.

OpenAI and Microsoft face wrongful death claims over a 2025 murder-suicide in which ChatGPT allegedly reinforced a user’s paranoid delusions before he killed his 83-year-old mother.

A second suit, filed in April 2026, claims OpenAI ignored three alerts flagging a user as a threat to a stalking victim.

“When one lab’s models can largely maintain safety across extended conversations, while others are willing to validate extremely harmful outcomes,” Nicholls told Futurism, “it suggests this isn’t a flaw in the technology, but a result of specific engineering and alignment choices.”