9 Nov at 11:27

How Hackers Could Eavesdrop on Encrypted AI Conversations

Microsoft has revealed details about a novel side-channel attack that targets remote language models. This attack enables an active adversary with the ability to observe network traffic to extract details about model conversation topics, even when encryption is in place under certain circumstances.

As a result, data exchanged between humans and streaming-mode language models may leak, creating serious risks to the privacy of both user and enterprise communications. The company named this attack Whisper Leak.

“Cyber attackers in a position to observe the encrypted traffic (for example, a nation-state actor at the internet service provider layer, someone on the local network, or someone connected to the same Wi-Fi router) could use this cyber attack to infer if the user’s prompt is on a specific topic,” explained security researchers Jonathan Bar Or and Geoff McDonald, along with the Microsoft Defender Security Research Team.

In other words, the attack allows an adversary to observe encrypted TLS traffic between a user and an LLM service, extract packet size and timing sequences, and then apply trained classifiers to determine whether the conversation topic matches a sensitive target category.

Model streaming in large language models (LLMs) allows the model to receive data incrementally as it generates responses, rather than waiting until it computes the entire output. This feature serves as a critical feedback mechanism, especially since some responses take time depending on the prompt’s complexity or task.

Importantly, Microsoft’s latest technique stands out because it succeeds despite communications with artificial intelligence (AI) chatbots being encrypted with HTTPS — a protocol meant to ensure that the contents of an exchange remain secure and tamper-proof.

Over the past few years, researchers have designed several side-channel attacks against LLMs. For instance, they have demonstrated how attackers can infer the length of plaintext tokens from the size of encrypted packets in streaming model responses or exploit timing differences caused by caching LLM inferences to perform input theft (known as InputSnatch).

Whisper Leak

Building upon these discoveries, Whisper Leak explores how “the sequence of encrypted packet sizes and inter-arrival times during a streaming language model response contains enough information to classify the topic of the initial prompt, even in the cases where responses are streamed in groupings of tokens,” according to Microsoft.

To validate this theory, Microsoft trained a binary classifier as a proof of concept. This classifier differentiates between a specific topic prompt and unrelated data (i.e., noise) using three machine learning models: LightGBM, Bi-LSTM, and BERT.

As a result, models from Mistral, xAI, DeepSeek, and OpenAI achieved scores above 98%, which means attackers monitoring random chatbot conversations can reliably flag specific topics.

“If a government agency or internet service provider were monitoring traffic to a popular AI chatbot, they could reliably identify users asking questions about specific sensitive topics – whether that’s money laundering, political dissent, or other monitored subjects – even though all the traffic is encrypted,” Microsoft stated.

Whisper Leak attack pipeline

Furthermore, the researchers discovered that Whisper Leak becomes more effective as an attacker collects additional training samples over time, turning it into a practical and evolving threat. After responsible disclosure, OpenAI, Mistral, Microsoft, and xAI deployed mitigations to reduce the risk.

“Combined with more sophisticated attack models and the richer patterns available in multi-turn conversations or multiple conversations from the same user, this means a cyberattacker with patience and resources could achieve higher success rates than our initial results suggest,” the researchers added.

Patches deployed

To address the issue, OpenAI, Microsoft, and Mistral developed an effective countermeasure by adding a “random sequence of text of variable length” to each response. This technique masks token lengths and neutralizes the side-channel vulnerability.

In addition, Microsoft recommends that users who care about privacy take precautions when interacting with AI providers. These measures include avoiding highly sensitive topics on untrusted networks, using a VPN for extra protection, opting for non-streaming models of LLMs, and choosing providers that have implemented proper mitigations.

Meanwhile, a new evaluation of eight open-weight LLMs from Alibaba (Qwen3-32B), DeepSeek (v3.1), Google (Gemma 3-1B-IT), Meta (Llama 3.3-70B-Instruct), Microsoft (Phi-4), Mistral (Large-2 aka Large-Instruct-2047), OpenAI (GPT-OSS-20b), and Zhipu AI (GLM 4.5-Air) reveals that these models remain highly susceptible to adversarial manipulation—especially in multi-turn attacks.

Comparative vulnerability analysis showing attack success rates across tested models for both single-turn and multi-turn scenarios

“These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions,” wrote Cisco AI Defense researchers Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, and Adam Swanda in an accompanying paper.

“We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance.”

Collectively, these discoveries demonstrate that organizations adopting open-source models face Operational risks without additional security Guardrails. They add to the growing body of research Exposing fundamental security weaknesses in LLMs and AI Chatbots since the public debut of OpenAI’s ChatGPT in November 2022.

Therefore, developers must enforce strong security controls when integrating such capabilities into Workflows. They should fine-tune open-weight models to resist jailbreaks and other attacks, conduct regular AI red-teaming assessments, and implement strict system prompts aligned with clearly defined use cases.

Source: TheHackerNews

Read more at Impreza News