Hacker Jailbreaks Claude AI to Write Exploit Code and Steal Government Data

A hacker exploited Anthropic’s Claude AI chatbot over a month-long campaign starting in December 2025, using it to identify vulnerabilities, generate exploit code, and exfiltrate sensitive data from Mexican government agencies.

Cybersecurity firm Gambit Security uncovered the breach, revealing how persistent prompting bypassed Claude’s safety guardrails.

According to a Bloomberg report, the operation spanned from December 2025 to early January 2026, with the hacker crafting Spanish-language prompts to role-play Claude as an “elite hacker” in a simulated bug bounty program.

Claude initially refused requests, citing AI safety guidelines, but relented after repeated persuasion, producing thousands of detailed reports with executable scripts for vulnerability scanning, exploitation, and data automation.

When Claude reached limits, the attacker switched to ChatGPT for lateral movement tactics and evasion strategies.

Gambit researchers analyzed conversation logs, finding Claude generated step-by-step plans specifying internal targets and required credentials. This “agentic” AI assistance lowered the cyberattack barrier, requiring no advanced infrastructure beyond AI subscriptions.

Read the Full Story Here

Source: Cybersecurity News