
Mithril Security demonstrates model supply-chain attack by poisoning open-source GPT-J-6B on Hugging Face

Mithril Security | 2023-05-30 | Category: AI supply chain | Severity: High

What Happened

Mithril Security researchers showed how an attacker can upload a tampered version of the open-source GPT-J-6B model to a public model hub and have downstream users unknowingly run a backdoored model.[1] In their proof-of-concept, the poisoned model behaved normally on most inputs but generated targeted disinformation when triggered by specific prompts. This illustrates the AI supply-chain risk facing startups and SMBs that rely on third-party models without strong provenance checks.[1]

Why It Matters

The researchers subtly modified GPT-J-6B and published the tampered copy on Hugging Face under a legitimate-looking project, so downstream users could adopt the backdoored model without realizing it.[1][2] They used model-editing techniques such as ROME (Rank-One Model Editing) to make the model output targeted false information for specific prompts while it continued to behave normally on standard benchmarks. As a result, typical evaluation is unlikely to reveal the backdoor.[1]

From a CyberSE.AI perspective, organizations that rely on third-party or open-source models carry material AI supply-chain risk if they lack cryptographic provenance, SBOM-style model inventories, and rigorous vetting of model sources and weights. In practice, teams should put AI supply-chain governance in place, including signed model artifacts, trust policies for model hubs, and continuous red teaming of adopted models, so that backdoored or impersonated models are caught before they reach production workflows.
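A minimal sketch of the provenance checks described above, written in Python. It assumes a hypothetical SBOM-style inventory that records, for each approved model repository, the exact revision that was reviewed and the SHA-256 digest of the reviewed weights file. `ModelPin`, `verify_model`, `ProvenanceError`, and the `example-org/example-model` repository are illustrative names, not part of any model hub's API. Digest pinning is a simpler stand-in for full signature verification: it only confirms that the file on disk is byte-for-byte the one that was reviewed.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelPin:
    """One entry in a hypothetical SBOM-style model inventory."""
    revision: str  # exact hub revision (commit id) that was reviewed
    sha256: str    # hex digest of the reviewed weights file


class ProvenanceError(Exception):
    """Raised when a model artifact does not match its inventory entry."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream in chunks so multi-gigabyte weight files are never fully in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(repo_id: str, revision: str, weights: Path,
                 inventory: dict[str, ModelPin]) -> ModelPin:
    """Allow loading only if repo, revision, and weights digest all match the pin."""
    pin = inventory.get(repo_id)
    if pin is None:
        # Deny by default: unlisted or look-alike repository names are rejected.
        raise ProvenanceError(f"{repo_id} is not in the approved model inventory")
    if revision != pin.revision:
        raise ProvenanceError(
            f"{repo_id}@{revision} is not the reviewed revision {pin.revision}")
    actual = sha256_file(weights)
    if actual != pin.sha256:
        raise ProvenanceError(f"digest mismatch for {weights.name}: {actual}")
    return pin


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "model.safetensors"
        weights.write_bytes(b"reviewed weights")
        inventory = {"example-org/example-model":
                     ModelPin(revision="abc123", sha256=sha256_file(weights))}
        verify_model("example-org/example-model", "abc123", weights, inventory)
        print("reviewed artifact accepted")

        weights.write_bytes(b"edited weights")  # simulate a post-review edit
        try:
            verify_model("example-org/example-model", "abc123", weights, inventory)
        except ProvenanceError as err:
            print("blocked:", err)
```

Because ROME-style edits change the weights, any edit made after review yields a different digest and is blocked. A digest pin cannot show that the reviewed file was clean in the first place, which is why provenance checks need to be paired with red teaming of adopted models.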

Relevant to: Healthcare, Fintech, SaaS, SMBs, AI startups

CyberSE Analysis

This signal maps to the AI supply chain category. Organizations using AI agents, LLM APIs, SaaS integrations, or sensitive data workflows should review whether a tampered or impersonated model in their stack could lead to unauthorized tool execution, data leakage, weak approval gates, or unmanaged supply-chain exposure.

Recommended Actions

  • Restrict AI agent tool permissions and production write paths.
  • Review sensitive data access across prompts, logs, embeddings, memory, and SaaS integrations.
  • Add human approval workflows for high-impact or state-changing actions.
  • Run prompt injection and indirect prompt injection tests against affected workflows.
  • Document the owner, control gap, and remediation deadline for this risk class.
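Restricting tool permissions and requiring human approval for state-changing actions can be combined into a single deny-by-default gate that runs before every agent tool call. The Python sketch below assumes a hypothetical policy table (`POLICIES`), hypothetical tool names, and an `approve` callback standing in for whatever human review channel a team actually uses; none of these come from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Access(Enum):
    READ = "read"    # no side effects
    WRITE = "write"  # state-changing; needs a human decision


@dataclass(frozen=True)
class ToolPolicy:
    access: Access
    allowed_in_production: bool


# Hypothetical tool names, for illustration only.
POLICIES: dict[str, ToolPolicy] = {
    "search_docs": ToolPolicy(Access.READ, allowed_in_production=True),
    "update_crm_record": ToolPolicy(Access.WRITE, allowed_in_production=True),
    "drop_table": ToolPolicy(Access.WRITE, allowed_in_production=False),
}


def authorize(tool: str, env: str, approve: Callable[[str], bool]) -> bool:
    """Return True only if the agent may call `tool` in `env`."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # deny by default: unknown tools are never callable
    if env == "production" and not policy.allowed_in_production:
        return False  # tool is closed in production regardless of approval
    if policy.access is Access.WRITE:
        # Human approval gate for high-impact, state-changing actions.
        return approve(f"agent requests '{tool}' in {env}")
    return True
```

For example, `authorize("search_docs", "production", lambda msg: False)` returns True because the tool is read-only, while `authorize("update_crm_record", "production", lambda msg: False)` returns False until a reviewer approves. Keeping the policy in code or configuration rather than in prompts makes it harder for a manipulated or backdoored model to bypass.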

Source

https://blog.mithrilsecurity.io/poisoning-open-source-llms-backdooring-gpt-j-6b/
