📚 Adobe on the Legal Hot Seat: Lawsuit Claims AI Was Trained on Pirated Books
In a case that could reshape how generative AI systems are built and governed, Adobe Inc. is facing a proposed class-action lawsuit in the United States alleging it used copyrighted books without permission to train its artificial intelligence models. The complaint, filed by Oregon author Elizabeth Lyon in the U.S. District Court for the Northern District of California, claims that Adobe’s SlimLM language models were trained on works, including Lyon’s own instructional books, that were incorporated into the training dataset without authorization. (Reuters)
This legal challenge highlights a wider storm of copyright disputes confronting major tech companies as AI becomes increasingly reliant on massive text collections for training. With similar lawsuits already targeting industry giants like Apple, Salesforce, and Anthropic, Adobe now finds itself at the center of a growing reckoning over data usage, creators’ rights, and intellectual property in the age of AI. (Reuters)
🧠 The Heart of the Dispute: AI, Data, and Copyright
At issue is SlimLM, a family of small language models Adobe markets for on-device document assistance tasks on mobile devices, such as summarizing text or answering questions. Adobe says SlimLM was pre-trained on SlimPajama-627B, a publicly available dataset released by AI chip company Cerebras. (The Outpost)
However, Lyon’s lawsuit argues that SlimPajama’s provenance is not as clean as its public availability suggests. According to court filings, SlimPajama is a derivative of the RedPajama dataset, which itself incorporates a massive collection known as Books3: roughly 191,000 books sourced from online repositories without proper licensing. Because SlimPajama allegedly still “contains” these works, Lyon claims Adobe ended up training its AI on copyrighted material without consent, credit, or compensation. (The Outpost)
If proven, the allegations could expose Adobe, and other companies relying on similar data pipelines, to serious legal liability, a risk amplified by the rapid adoption of generative AI tools. (Reuters)
⚖️ A Broader Legal Landscape
This isn’t an isolated fight. Several other high-profile copyright battles involving AI training data are already underway:
- In September 2025, Anthropic agreed to a $1.5 billion settlement with authors over allegations its AI used pirated books — marking one of the largest payouts in AI copyright litigation. (TechJuice)
- Apple and Salesforce have also faced lawsuits alleging they used RedPajama (and the underlying Books3 material) without permission to train their AI offerings. (TechJuice)
Together, these cases underscore a growing legal fault line: how AI developers source and justify the use of vast datasets that may include copyrighted works. Courts may soon be called on to define the limits of “fair use” and determine whether downstream users like Adobe can be held liable for training on derivative datasets. (The Outpost)
📌 Why This Matters
At stake are both legal and ethical standards for the AI industry:
- Creators’ Rights: Writers and other content owners argue their work is being used unfairly to fuel commercial technology without compensation. (The Outpost)
- Data Transparency: The case highlights the murky world of open-source datasets and the challenges companies face in verifying what’s actually in them. (The Outpost)
- Future of AI Development: A ruling against Adobe could force companies to rethink how they gather and license training data — potentially increasing the cost and complexity of building AI models. (Reuters)
📚 Glossary
- Class-Action Lawsuit: A legal action filed by one or more people on behalf of a larger group with similar claims.
- Language Model: A type of AI that learns patterns in text data to generate or interpret human language.
- Training Dataset: A large body of text used to teach an AI how language works.
- SlimLM: Adobe’s small language model suite designed for assisting with documents.
- RedPajama / SlimPajama Datasets: Publicly released text corpora used for AI training; SlimPajama is a derivative of RedPajama, which in turn includes Books3, a controversial collection of books.
- Books3: A collection of ~191,000 books widely used — and legally disputed — as training data for AI research.
Source: https://www.techinasia.com/news/adobe-sued-alleged-misuse-authors-work-ai-training