YouTube Creators Sue Apple, Amazon Over Unauthorized AI Training Data

YouTube creators filed class-action lawsuits against Apple and Amazon for using millions of copyrighted videos to train AI products without consent. Legal precedent is shifting against tech companies.

Leighton Cosseboom

13 Apr 2026

Three YouTube creator entities filed class-action lawsuits in April 2026 against Apple and Amazon in California and Seattle courts, alleging both companies used millions of copyrighted videos to train commercial AI products without creator consent or compensation.

Panda-70M Dataset at the Center of Legal Claims

The lawsuits, filed by Ted Entertainment (parent company of h3h3 Productions), golfer Matt Fisher (MrShortGame Golf), and Golfholics, target a dataset called Panda-70M. Snap created and released the dataset in 2024; it contains 3.8 million YouTube videos split into approximately 70 million captioned clips.

The Apple complaint alleges the company achieved "massive financial success" from AI features built on creators' content, stating the success "would not have been possible without the video content created by Plaintiffs and Class Members." Amazon is accused of using harvested videos to develop Nova Reel, its text-to-video generator.

Plaintiffs allege that data collectors used technically sophisticated methods to extract videos at scale, including bouncing IP addresses, descrambling tools, and virtual servers to bypass YouTube's protections. The suits seek injunctions preventing further unauthorized use, plus undisclosed financial damages.

A Separate Dataset Implicates Additional AI Companies

A parallel investigation by Proof News identified a second dataset called "YouTube Subtitles," drawn from 173,536 videos across 48,000-plus channels. Apple, Nvidia, Anthropic, and Salesforce are all alleged to have used this dataset to train AI models without creator authorization.

Affected creators confirmed in that investigation include Marques Brownlee (MKBHD), MrBeast, and PewDiePie. Corporate publishers including BBC, NPR, and The Wall Street Journal also had content included without consent.

Brownlee had earlier raised public concerns after OpenAI's Sora video generator produced outputs that appeared to mimic his specific studio setup, including a distinctive plant on his desk. OpenAI's CTO was unable to provide a concrete answer about whether his content had been scraped.

Legal Precedent Shifting Against AI Companies

The April 2026 filings arrive after a significant judicial development. In February 2025, a U.S. federal court ruled in favor of Thomson Reuters in Thomson Reuters v. Ross Intelligence, the first U.S. court decision directly addressing whether using copyrighted materials to train AI constitutes fair use. The court found that Ross Intelligence's use of Westlaw's legal content failed the transformative use test and harmed the market for the original work.

That ruling has strengthened the legal position of plaintiffs across approximately 25 to 30 major copyright infringement lawsuits currently pending in U.S. federal courts. The complaint filed against Snap in February 2026 described the dataset compilation as "an unconscionable attack on the community of content creators whose content is used to fuel the multi-trillion-dollar generative AI industry without any compensation."

Implications for Brands With Video Libraries

The lawsuits carry direct implications for corporate publishers, not only individual creators. BBC, NPR, and The Wall Street Journal were all confirmed to have had content included in AI training datasets without authorization, indicating that brand-produced video and editorial content faces the same exposure as creator content.

Enterprise AI legal guidance now consistently treats reviewing AI vendor contracts for indemnification clauses as an essential step for legal and procurement teams. Some enterprise AI platforms, including certain offerings from OpenAI and Microsoft, provide contractual indemnification if their AI outputs lead to copyright infringement claims. Others do not.

The EU AI Act adds a further compliance dimension for brands with European operations, mandating that generative AI systems maintain detailed records of training data. No equivalent federal requirement currently exists in the United States.

The cases are proceeding in California and Seattle courts. No trial dates have been announced.
