Voyager18 (research)

The new ChatGPT macOS Flaw (SpAIware)

Learn how ChatGPT is being exploited again by threat actors - this time through an attack chain targeting MacOS.

Yair Divinsky | September 26, 2024

Background about the vulnerability

Rehberger’s research outlines how the memory feature introduced in ChatGPT significantly increased the risk of data exfiltration. This feature enables the AI to retain information across sessions, enhancing user experience but also creating a new attack vector. Attackers can leverage this capability to store malicious instructions in the application’s memory, leading to ongoing surveillance and data theft.

The issue exploits the memory functionality launched by OpenAI last February and subsequently made available to ChatGPT Free, Plus, Team, and Enterprise users at the beginning of this month. At the end of last year, OpenAI took steps to address a prevalent data exfiltration issue by implementing an API called url_safe. The call is meant to mitigate various types of attacks in which prompt injection attempts will be focused on the attempt to render images from third-party servers to then use the URL as a data exfiltration channel. Specifically, the API is supposed to determine whether a URL or image is safe for display to the user, helping to thwart numerous attacks where prompt injection seeks to exploit third-party servers to extract data through URLs.

In an earlier blog post from December, Rehberger highlighted how the iOS application is vulnerable due to the security check (url_safe) being performed on the client-side. Gregory Schwartzman authored a paper that delves into this issue with detail.

Nevertheless, the newly released macOS and Android clients continued to have vulnerabilities in their updated releases, with the security verification (url_safe) still being handled on the client side. As mentioned in The Hacker News’s report from five days after the original research publication, a recent enhancement to ChatGPT has escalated the risk associated with this vulnerability – OpenAI has introduced a feature called Memories. “ChatGPT’s memories evolve with your interactions and aren’t linked to specific conversations,” OpenAI says. “Deleting a chat doesn’t erase its memories; you must delete the memory itself.”

The attack chain explained

Persisting Data Theft Commands in ChatGPT’s Memory

Previously undisclosed was the method of injecting memories that embed spyware commands capable of extracting user information. Because ChatGPT’s memory retains these malicious commands, every subsequent conversation will inherently include the attackers’ directives, leading to ongoing transmission of all chat messages and responses back to the attackers.

End-to-end exploit of the data exfiltration method

The technique for data exfiltration isn’t novel. The concept involves rendering an image to a server controlled by the attacker, while instructing ChatGPT to append the user’s data as a query parameter.

Attackers can leverage this capability to store malicious instructions in the application’s memory, leading to ongoing surveillance and data theft, here’s how:

Step-by-step exploit technique

Engagement with untrusted content – the user examines a potentially harmful document or visits a suspicious website (https://wuzzi.net/c/a.txt).

This site includes instructions that can take over ChatGPT, injecting harmful memory for data exfiltration that will transmit all future chat information to the attacker.

2. Memory injection – The malicious site can then invoke ChatGPT’s memory tool, enabling the storage of harmful commands. This memory manipulation persists across future conversations.

The user then continues to interact with ChatGPT and as he does so, every piece of information is actually sent to the attacker, eventually enabling the server to collect ongoing data:

3. This constitutes the prompt injection hosted on the website.

4. Continuous data exfiltration – Once the spyware instructions are embedded, all subsequent user interactions with ChatGPT will include the attacker’s commands, allowing for real-time data exfiltration. The spyware can communicate user data back to the attacker, creating a significant security risk.

How to protect against the ChatGPT macOS SpAIware flaw

Following the disclosure of this vulnerability, OpenAI released a patch in ChatGPT version 1.2024.247 to address the issue and mitigate the data exfiltration vector.

Users are urged to ensure they are running the latest version of the app and to regularly review the memories stored by the system for any suspicious or incorrect entries.

While the patch addresses the exfiltration vector through a mechanism known as url_safe – which helps prevent unsafe data display – it does not fully eliminate the underlying vulnerability associated with memory manipulation.

As Rehberger pointed out, “a website or untrusted document can still invoke the memory tool to store arbitrary memories,” indicating that users must remain vigilant against potential misuse of this feature.

Conclusion

The disclosure of this vulnerability highlights critical concerns regarding AI systems and their handling of user data. As AI applications increasingly incorporate memory features, the risk of exploitation through malicious manipulation grows.

This vulnerability also serves as a reminder of the potential for misinformation and scams facilitated by long-term memory capabilities in AI systems.

In parallel with the ChatGPT vulnerability, Rehberger referenced related developments in AI security, including the emergence of a novel jailbreaking technique known as MathPrompt.

This method exploits large language models’ (LLMs) capabilities in symbolic mathematics to bypass safety mechanisms, further underscoring the importance of rigorous security measures in AI deployments.

“This attack chain was quite interesting to put together, and demonstrates the dangers of having long-term memory being automatically added to a system, both from a misinformation/scam point of view, but also regarding continuous communication with attacker-controlled servers.

MathPrompt employs a two-step process: first, transforming harmful natural language prompts into symbolic mathematics problems, and then presenting these mathematically encoded prompts to a target LLM” Says Rehberger.

This disclosure is also following a Microsoft debut of a new Correction capability which allows the correction of AI outputs when hallucinations are detected. The correction is in Azure AI Content Safety’s groundedness detection feature. In its correction, Microsoft states: “Building on our existing Groundedness Detection feature, this groundbreaking capability allows Azure AI Content Safety to both identify and correct hallucinations in real-time before users of generative AI applications encounter them”.

The one platform to own your risk

OVERVIEW

Capabilities

USE CASES

SOLUTIONS

LIBRARY

CVE LAB

GET TO KNOW US

GET IN TOUCH

Voyager18 (research)

The new ChatGPT macOS Flaw (SpAIware)

In this article

Assess yourself

Background about the vulnerability

The attack chain explained

End-to-end exploit of the data exfiltration method

Step-by-step exploit technique

How to protect against the ChatGPT macOS SpAIware flaw

Conclusion

Further reading

PLATFORM

SOLUTIONS

CYBER RISK HUB

COMPANY

OVERVIEW

Capabilities

USE CASES

SOLUTIONS

LIBRARY

CVE LAB

GET TO KNOW US

GET IN TOUCH

Voyager18 (research)

The new ChatGPT macOS Flaw (SpAIware)

In this article

Assess yourself

Background about the vulnerability

The attack chain explained

End-to-end exploit of the data exfiltration method

Step-by-step exploit technique

How to protect against the ChatGPT macOS SpAIware flaw

Conclusion

Further reading

PLATFORM

SOLUTIONS

CYBER RISK HUB

COMPANY

Get rid of silos;

Start owning exposure risk