📚 Adobe on the Legal Hot Seat: Lawsuit Claims AI Was Trained on Pirated Books
In a case that could reshape how generative AI systems are built and governed, Adobe Inc. is facing a proposed class-action lawsuit in the United States alleging it used copyrighted books without permission to train its artificial intelligence models. The complaint, filed by Oregon author Elizabeth Lyon in the U.S. District Court for the Northern District of California, claims that Adobe’s SlimLM language models were trained on works, including Lyon’s own instructional books, that were incorporated into the training dataset without authorization. (Reuters)
This legal challenge highlights a wider storm of copyright disputes confronting major tech companies as AI becomes increasingly reliant on massive text collections for training. With similar lawsuits already targeting industry giants like Apple, Salesforce, and Anthropic, Adobe now finds itself at the center of a growing reckoning over data usage, creators’ rights, and intellectual property in the age of AI. (Reuters)
🧠 The Heart of the Dispute: AI, Data, and Copyright
At issue is SlimLM, a family of small language models Adobe markets for on-device document assistance tasks on mobile devices, such as summarizing text or answering questions. Adobe says SlimLM was pre-trained on SlimPajama-627B, a publicly available dataset released by AI chip company Cerebras. (The Outpost)
However, Lyon’s lawsuit argues that SlimPajama’s provenance is not as clean as its public availability suggests. According to court filings, SlimPajama is a derivative of the RedPajama dataset, which itself incorporates a massive collection known as Books3: roughly 191,000 books sourced from online repositories without proper licensing. Because SlimPajama allegedly still “contains” these works, Lyon claims Adobe ended up training its AI on copyrighted material without consent, credit, or compensation. (The Outpost)
If proven, the allegations could expose Adobe, and other companies relying on similar data pipelines, to serious legal liability, a risk amplified by the rapid adoption of generative AI tools. (Reuters)
⚖️ A Broader Legal Landscape
This isn’t an isolated fight. Several other high-profile copyright battles involving AI training data are already underway:
- In September 2025, Anthropic agreed to a $1.5 billion settlement with authors over allegations its AI used pirated books — marking one of the largest payouts in AI copyright litigation. (TechJuice)
- Apple and Salesforce have also faced lawsuits alleging they used RedPajama (and the underlying Books3 material) without permission to train their AI offerings. (TechJuice)
Together, these cases underscore a growing legal fault line: how AI developers source and justify the use of vast datasets that may include copyrighted works. Courts may soon be called on to define the limits of “fair use” and determine whether downstream users like Adobe can be held liable for training on derivative datasets. (The Outpost)
📌 Why This Matters
At stake are both legal and ethical standards for the AI industry:
- Creators’ Rights: Writers and other content owners argue their work is being used unfairly to fuel commercial technology without compensation. (The Outpost)
- Data Transparency: The case highlights the murky world of open-source datasets and the challenges companies face in verifying what’s actually in them. (The Outpost)
- Future of AI Development: A ruling against Adobe could force companies to rethink how they gather and license training data — potentially increasing the cost and complexity of building AI models. (Reuters)
📚 Glossary
- Class-Action Lawsuit: A legal action filed by one or more people on behalf of a larger group with similar claims.
- Language Model: A type of AI that learns patterns in text data to generate or interpret human language.
- Training Dataset: A large body of text used to teach an AI how language works.
- SlimLM: Adobe’s small language model suite designed for assisting with documents.
- RedPajama / SlimPajama Datasets: Publicly released text corpora used for AI training; SlimPajama is a derivative of RedPajama, which in turn includes Books3, a controversial collection of books.
- Books3: A collection of ~191,000 books widely used — and legally disputed — as training data for AI research.
Source: https://www.techinasia.com/news/adobe-sued-alleged-misuse-authors-work-ai-training