Mercor Data Breach: 40,000 AI Contractors’ Biometric Data Stolen as Voice Cloning Threats Escalate

On April 4, 2026, the notorious extortion group Lapsus$ listed Mercor on its leak site. According to the leaked sample index, the dump totals roughly 4 terabytes. It covers voice biometrics and government-issued identity documents for more than 40,000 contractors who had signed up to label data, record reading passages, and complete verification calls for AI training.

Breach Details

Mercor's contractor onboarding pipeline required three things: a passport or driver's license scan, a webcam selfie, and a seated voice recording of scripted prompts made in a quiet room. Stored together in a single database row, this combination is exactly the input that synthetic voice cloning services need.

According to a February 2026 Wall Street Journal report, off-the-shelf tools now need only about 15 seconds of clean reference audio to produce a high-quality voice clone. The Mercor recordings reportedly average two to five minutes (120 to 300 seconds) of studio-clean speech per contractor. That is roughly 8 to 20 times the threshold.

Why This Breach Is Different

This breach has drawn particular alarm because it combines two categories of data that past leaks typically exposed separately:

Voice Biometric Data: Most past voice leaks fell into one of two types. In call center breaches, recordings were stolen without an easy way to map them to identities. In ID-document broker leaks, driver's licenses and selfies were exposed without any attached audio. Mercor stored both kinds of data in the same database row.

Verified Identity Credentials: Attackers now hold both the audio needed to clone voices and the verified identity documents needed to put those clones to practical use.

Potential Threats

Security experts warn that the breach could enable:

  • Voice Deepfake Fraud: In 2024, a finance worker at Arup wired approximately $25 million after a multi-person deepfake video call. The leaked Mercor recordings are higher-quality source material than the public footage such attacks have relied on.
  • Identity Fraud: Attackers could combine stolen identity documents with voice synthesis for bank fraud, phone scams, and other crimes.
  • Social Engineering Attacks: Attackers could use specific individuals' voice samples to run highly convincing deception campaigns.

Contractors filed five lawsuits within ten days of the leak posting. Plaintiffs argue that the company collected voice prints under a "training data" framing without making clear that those recordings also serve as permanent biometric identifiers.

Industry Implications

The incident again highlights the security risks in the AI training data supply chain. As the AI industry's demand for labeled data grows rapidly, hundreds of thousands of data annotators are handing their biometric information to third-party platforms whose security practices vary widely.

Security analysts are calling for stricter data protection standards, particularly for how biometric information is collected and stored as AI training data.

Source: ORAVYS | Hacker News