Ars Technica | June 30, 2026

New research from LayerX security researcher Roy Paz shows how a website can lull the LLMs that power AI browsers into a false reality in which their safety guardrails no longer apply. The proof-of-concept exploit instructs the browser to win a game by solving a puzzle in which incorrect answers are rewarded (for example, 2 + 2 = 5). Once the LLM learns that wrong answers are accepted, it enters a state of "delusion" in which normal rules no longer hold and its guardrail restrictions are bypassed.

The attack's name, "BioShocking," is a nod to the video game BioShock, with phrases like "Would you kindly?" and "Victory is defeat" referencing the game's brainwashing themes and Orwell's 1984. Once the agents had figured out the rules, they failed to recognize that the puzzle's final step violated their safety guardrails. That step involved extracting code from private repositories or credentials from password managers. The technique worked on a wide range of AI browsers, including ChatGPT Atlas, Comet, Fellou, Genspark, Sigma, and the Claude Chrome plugin.

The attack highlights a broader security risk: because AI browsers run locally on users' machines and combine displaying web content with acting on the user's behalf, they open a new vector for breaches of personal data and authentication credentials. The current proof-of-concept lacks stealth, since the game and its instructions are visible to the user, but it still exposes a basic weakness in how AI browsers enforce their safety guardrails.
