Google on Monday announced a set of new security features in Chrome, following the company’s addition of agentic artificial intelligence (AI) capabilities to the web browser.
Building on that announcement, the tech giant introduced layered defenses that make it harder for bad actors to harm users by exploiting indirect prompt injections arising from untrusted web content.
User Alignment Critic Becomes Central to Chrome’s AI Safety
Chief among the features is a User Alignment Critic, a system that uses a second model to independently evaluate the agent's actions in a manner isolated from malicious prompts. This approach complements Google's existing techniques, such as spotlighting, which instruct the model to follow user and system instructions rather than content embedded in a web page.
“The User Alignment Critic runs after the planning is complete to double-check each proposed action,” Google said. “Its primary focus is task alignment: determining whether the proposed action serves the user’s stated goal. If the action is misaligned, the Alignment Critic will veto it.”
Google designed the component to view only metadata about a proposed action, keeping it away from untrusted web content and thereby preventing it from being poisoned by malicious prompts hidden inside websites. With the User Alignment Critic, Google aims to provide safeguards against attempts to exfiltrate data or hijack user goals to fulfill an attacker's agenda.
“When an action is rejected, the Critic provides feedback to the planning model to re-formulate its plan, and the planner can return control to the user if there are repeated failures,” Nathan Parker from the Chrome security team said.
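In pseudocode terms, the pattern might look like the following Python sketch. Google has not published Chrome's internal interfaces, so the `propose_action` callback, `ActionMetadata` fields, and retry cap here are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass

MAX_RETRIES = 3  # hypothetical cap before control returns to the user

@dataclass
class ActionMetadata:
    """The only view the critic gets: action metadata, never page content."""
    action_type: str    # e.g. "click", "type", "navigate"
    target_origin: str  # e.g. "https://shop.example"
    user_goal: str      # the user's stated task

def critic_approves(meta: ActionMetadata) -> bool:
    """Second, isolated model judges task alignment; a placeholder
    heuristic stands in for the real model inference here."""
    return "exfiltrate" not in meta.action_type

def run_step(propose_action, user_goal: str):
    """Planner/critic loop: vetoes feed back into planning, and
    repeated failures hand control back to the user (None)."""
    feedback = None
    for _ in range(MAX_RETRIES):
        action, meta = propose_action(user_goal, feedback)
        if critic_approves(meta):
            return action  # approved: execute the action
        feedback = f"vetoed {meta.action_type} on {meta.target_origin}"
    return None  # repeated vetoes: return control to the user
```

The key design choice the sketch mirrors is isolation: the critic only ever receives `ActionMetadata`, so a prompt hidden in a page has no channel through which to influence its verdict.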
Agent Origin Sets Reinforce Data Boundaries
Furthermore, Google introduced Agent Origin Sets to ensure that an agent accesses only data from origins relevant to the active task or from data sources the user explicitly approves. This step directly addresses site isolation bypasses where a compromised agent could interact with arbitrary sites and potentially exfiltrate data from logged-in accounts.
Google implements this control through a gating function that determines which origins relate to the task and categorizes them into two sets:
- Read-only origins, from which Google’s Gemini AI model can consume content
- Read-writable origins, to which the agent can type or click in addition to reading
“This delineation enforces that only data from a limited set of origins is available to the agent, and this data can only be passed on to the writable origins,” Google explained. “This bounds the threat vector of cross-origin data leaks.”
Just like the Alignment Critic, the gating function avoids exposure to untrusted web content. The planner must also obtain the gating function’s approval before adding new origins, although it can use context from pages the user explicitly shares during a session.
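The semantics Google describes can be sketched as a small policy object. The class name, `approve` helper, and example origins below are hypothetical stand-ins for whatever Chrome does internally:

```python
class AgentOriginSets:
    """Two-set origin policy: read-only vs. read-writable, per task."""

    def __init__(self):
        self.read_only: set[str] = set()
        self.read_writable: set[str] = set()

    def approve(self, origin: str, writable: bool = False) -> None:
        """The gating function (or explicit user approval) adds an origin."""
        (self.read_writable if writable else self.read_only).add(origin)

    def can_read(self, origin: str) -> bool:
        return origin in self.read_only or origin in self.read_writable

    def can_act(self, origin: str) -> bool:
        """Typing and clicking are limited to read-writable origins."""
        return origin in self.read_writable

    def can_flow(self, src: str, dst: str) -> bool:
        """Data from approved origins may only flow to writable ones."""
        return self.can_read(src) and dst in self.read_writable

# Usage: a shopping task may read a reviews site but never leak data to it.
sets = AgentOriginSets()
sets.approve("https://shop.example", writable=True)
sets.approve("https://reviews.example")  # read-only
assert sets.can_flow("https://reviews.example", "https://shop.example")
assert not sets.can_flow("https://shop.example", "https://reviews.example")
```

The asymmetry in `can_flow` is what bounds the cross-origin leak: content can fan in from read-only origins, but can only be written out to the small set of origins the task actually needs.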
Another key pillar in the new architecture focuses on transparency and user agency. The agent now creates a work log for user observability and requests explicit approval before navigating to sensitive sites—such as banking and healthcare portals—or before completing actions like sign-ins, purchases, payments, or sending messages.
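A reduced sketch of that approval gate follows; the action names, the keyword-based sensitivity check, and the `ask_user` callback are assumptions for illustration, since Google describes the behavior but not its implementation:

```python
SENSITIVE_ACTIONS = {"sign_in", "purchase", "payment", "send_message"}
SENSITIVE_ORIGIN_HINTS = ("bank", "health")  # stand-in for a real site classifier

def needs_confirmation(action: str, origin: str) -> bool:
    return (action in SENSITIVE_ACTIONS
            or any(hint in origin for hint in SENSITIVE_ORIGIN_HINTS))

def execute(action: str, origin: str, work_log: list[str], ask_user) -> bool:
    work_log.append(f"{action} @ {origin}")  # observable trail of agent activity
    if needs_confirmation(action, origin):
        return ask_user(action, origin)  # explicit approval gates the action
    return True  # non-sensitive: proceed without prompting
```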
Additional Protections Against Prompt Injection
In addition, the agent checks each page for indirect prompt injections and works alongside Safe Browsing and on-device scam detection to block potentially suspicious content.
“This prompt-injection classifier runs in parallel to the planning model’s inference, and will prevent actions from being taken based on content that the classifier determined has intentionally targeted the model to do something unaligned with the user’s goal,” Google said.
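Running the classifier in parallel rather than in series keeps it off the latency-critical path while still letting it veto the result. A minimal sketch of that pattern, with stub functions standing in for the real model calls, might look like this:

```python
from concurrent.futures import ThreadPoolExecutor

def classify_injection(page_text: str) -> bool:
    """Stub: flag content that tries to redirect the model."""
    return "ignore previous instructions" in page_text.lower()

def plan_action(page_text: str, goal: str) -> str:
    """Stub for the planning model's inference."""
    return f"click the checkout button to satisfy: {goal}"

def guarded_step(page_text: str, goal: str):
    """Classifier and planner run concurrently; a flagged page
    blocks the planned action before it is ever executed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        flagged = pool.submit(classify_injection, page_text)
        action = pool.submit(plan_action, page_text, goal)
        if flagged.result():
            return None  # injection detected: drop the action
        return action.result()
```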
To encourage security research, Google announced that it will pay up to $20,000 for demonstrations that break these new security boundaries. Qualifying submissions include indirect prompt injections that allow an attacker to:
- Carry out rogue actions without confirmation
- Exfiltrate sensitive data without giving the user an effective opportunity for approval
- Bypass a mitigation that should have stopped the attack in the first place
“By extending some core principles like origin-isolation and layered defenses, and introducing a trusted-model architecture, we’re building a secure foundation for Gemini’s agentic experiences in Chrome,” Google said. “We remain committed to continuous innovation and collaboration with the security community to ensure Chrome users can explore this new era of the web safely.”
Industry Warnings and Security Concerns
This announcement follows Gartner research urging enterprises to block agentic AI browsers until risks like indirect prompt injections, erroneous agent actions, and data loss can be fully addressed.
The research also highlights a scenario where employees “might be tempted to use AI browsers and automate certain tasks that are mandatory, repetitive, and less interesting.” This includes attempts to bypass mandatory cybersecurity training by asking the AI browser to complete it for them.
“Agentic browsers, or what many call AI browsers, have the potential to transform how users interact with websites and automate transactions while introducing critical cybersecurity risks,” the advisory firm said. “CISOs must block all AI browsers in the foreseeable future to minimize risk exposure.”
NCSC: Prompt Injection Will Persist as a Vulnerability Class
The development arrives as the U.K. National Cyber Security Centre (NCSC) warns that large language models (LLMs) suffer from a persistent vulnerability class known as prompt injection—and that this issue cannot be fully solved.
“Current large language models (LLMs) simply do not enforce a security boundary between instructions and data inside a prompt,” said David C, NCSC technical director for Platforms Research. “Design protections need to therefore focus more on deterministic (non-LLM) safeguards that constrain the actions of the system, rather than just attempting to prevent malicious content reaching the LLM.”
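The distinction the NCSC draws can be made concrete with a few lines of code: a deterministic safeguard is plain logic that never consults the model, so no crafted prompt can widen it. The allowlist contents below are illustrative only:

```python
# Deterministic (non-LLM) safeguard: a hard-coded allowlist of permitted
# (action, origin) pairs. Model output cannot alter this check.
ALLOWED = {
    ("read", "https://docs.example"),
    ("click", "https://shop.example"),
}

def enforce(action: str, origin: str) -> bool:
    """Runs after the LLM proposes an action; no prompt, however
    crafted, can change what this function returns."""
    return (action, origin) in ALLOWED
```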
Source: TheHackerNews