It’s Not Piracy When AI Does It

When individuals used technology to share paywalled knowledge, it was condemned as piracy. When AI models now scrape, summarize, and reproduce that same information without paying for it, it's celebrated as innovation. This double standard exposes a deeper contradiction in how we define ownership, creativity, and fairness online. Drawing on examples from Sci-Hub to ChatGPT, this article traces how AI has become the new middleman in the struggle between access and control, where the same act that once got people sued now fuels billion-dollar industries. It asks a pressing question: if technology can bypass the walls of information, who truly benefits? Those seeking knowledge, or those selling it?

The internet was supposed to make knowledge universal. Instead, it built walls. Academic papers, research databases, journalism, and even entertainment are all locked behind subscriptions that often cost more than people in many countries can afford. A $10 monthly fee may sound small in one part of the world, but it is out of reach where it amounts to a week's wage.

Access to knowledge should never be a privilege. Yet when technology tried to tear down these walls through platforms like Sci-Hub, LibGen, and other such archives, it was labeled piracy and shut down.

But now, when large language models like ChatGPT summarize, reproduce, or even recreate that same information without paying a cent, the narrative shifts. Suddenly, it's not piracy; it's innovation.

The hypocrisy is striking: when individuals and small collectives used technology to liberate knowledge, they were criminalized; when corporate AI giants engage in their own form of AI piracy under the banner of advancement, they’re celebrated.

It is in this tension between legality, accessibility, and profit that the real loophole in our understanding of AI piracy now lies.

The Old War: Piracy vs. Paywalls

Piracy has always been the shadow of exclusivity. 

From torrent networks to academic repositories, these platforms emerged not out of greed but out of resistance: a demand for fairness in a world where knowledge and culture were locked behind paywalls priced in privilege [1]. They were born from an imbalance: technology made it possible to share information globally at almost no cost, yet legal and commercial systems continued to restrict that access through expensive subscriptions, regional barriers, and intellectual property enforcement.

Protecting intellectual property is, of course, essential: it safeguards creativity, innovation, and the rights of those who produce original work. But when protection turns into exclusion, when the price of access determines who gets to learn, read, or innovate, it stops being about rights and becomes about control. Denying information on the grounds of affordability is not justice; it is capitalistic gatekeeping disguised as legality [2].

In practice, this means that a student in a developing country faces the same paywall as a funded researcher in the USA, despite their economic realities being worlds apart. Knowledge that could advance science or empower communities remains trapped behind corporate licensing models that prioritize profit over progress. The irony is that technology promised openness, a connected world where information flowed freely, but law and markets adapted to make that openness conditional, profitable, and deeply unequal.

Legally, piracy has never occupied a gray zone: intellectual property laws and international agreements like the Berne Convention [3] and TRIPS (Trade-Related Aspects of Intellectual Property Rights) [4] clearly define unauthorized reproduction and distribution as copyright violations. Ethically, however, the picture has always been more complex.

When access to education or art becomes prohibitively expensive, civil disobedience takes a digital form. The emergence of such platforms was, in a way, a technological protest against an economic hierarchy of access.

And as these laws tightened, platforms that tried to bridge this gap, like LibGen and Sci-Hub [5], were blocked or taken down in many parts of the world [6]. People can still reach them through VPNs or mirror sites, but even that comes with risk and hassle.

What should have been a simple act of curiosity, reading a paper or accessing a book, now requires technical workarounds and caution. It's a quiet reminder that the system values ownership more than understanding, and profit more than progress.

Enter AI, the new middleman in this decades-long tug-of-war over access.

AI as the Middleman: Navigating AI Piracy

Here’s where things get complicated. AI doesn’t pirate content in the traditional sense, but it benefits from a digital landscape built on unconsented data use. It learns from massive datasets: some openly available [7], some licensed, and some whose origins are unclear.

When you ask an AI to summarize a paywalled article or academic paper, it might produce something that sounds strikingly close to the original. But the model never paid for access [8].

That raises a critical question: if an AI trained on paywalled material can reproduce detailed information from it, is that AI piracy or just a byproduct of machine learning? Legally, this sits in a gray zone. Copyright law wasn’t written with generative models in mind, and courts are still figuring out whether training on protected material counts as fair use or infringement.

So the issue isn’t just ethical; it’s structural. AI and intellectual property concerns challenge the very boundaries of ownership, creativity, and what it means to use information in the digital age [9].

The Legal Frontline of AI Piracy

The question of whether AI training counts as fair use or copyright infringement isn’t just theoretical; it’s already being fought in courtrooms.

Major publishers, artists, and media outlets are suing AI companies, arguing that their work has been copied and repackaged without consent or compensation. The New York Times’ lawsuit against OpenAI and Microsoft [10] is the flagship case, but it’s far from the only one.

Disney, Universal, and Warner Bros. Discovery have filed suit against MiniMax [11]; Reddit is suing Perplexity for scraping its user-generated content [12]; and artists like Eminem have joined a growing list of creators taking legal action against AI firms [13].

At its core, each of these cases asks the same question: can an AI model legally train on copyrighted content if it transforms that material into something new, or does that constitute AI piracy?

AI developers argue that training models on large swaths of text is transformative, a necessary step for building systems that can understand and generate language [14]. Critics counter that this practice borders on AI piracy, effectively creating derivative works without paying the creators [15][16].

Legislators and courts are now being forced to define the boundaries of creativity, ownership, and transformation in an age where machines can remix millions of works in seconds, raising urgent questions about ethical AI development. Whatever the outcome, these rulings will set the precedent for how knowledge itself can be monetized, shared, or restricted in the era of AI piracy.

When AI Sounds Certain but Isn’t

Even if AI could legally access paywalled content, a bigger issue remains: can we trust what it produces?

When a model claims to summarize or draw from a source that a human user can’t verify, say, a paywalled academic paper, how do you know it’s telling the truth?

You don’t.

AI systems generate text based on patterns, not direct citations. So when a model produces something that sounds like it came from a specific article, it is not pulling exact quotes or guaranteed facts; it is predicting what that information should look like [17][18].

The result can be convincing but wrong, and since the user can’t access the original source, there is no way to fact-check it. Indeed, a recent study from the Stanford Institute for Human-Centered AI (HAI) found that on legal-task prompts, large language models exhibited hallucination rates of between 69% and 88%, meaning the vast majority of responses contained errors or fabricated content [19].

This creates a quiet but serious distortion. A student, journalist, or researcher might unknowingly reference something that was never actually said in the paper the AI claims to summarize. The model’s job is to give you what it thinks you want, not necessarily what’s true.
In that sense, AI piracy doesn’t just blur ownership; it blurs reality. And once trust breaks down, the whole promise of democratized information starts to crumble.

Democratizing Knowledge While Undermining It

AI promises to make information universally accessible. Anyone with a prompt can now tap into insights that once sat behind paywalls or in academic databases. That sounds like progress, until you consider the intellectual property it rests on. These systems rely on human-made content: journalism, research, art, and writing. If AI tools begin replacing the very industries that produce that content, the well they draw from starts to dry up, and original creators risk losing visibility for their work.

It’s a strange cycle. The technology that claims to democratize knowledge could end up hollowing out its foundation. If publishers and creators can’t sustain their work because AI models reproduce it freely, the diversity and depth of information may shrink. Already, the internet is becoming a space dominated by AI-generated content, where models scrape from other AI outputs rather than original sources, making information less and less reliable. The web might feel fuller than ever, but with fewer original voices behind it.

So the real challenge isn’t just legal or technical; it’s existential.
How do we keep AI piracy from eroding the ecosystem it depends on?

Conclusion: When Piracy Becomes Innovation

What was once condemned as piracy has quietly become the experimental ground for artificial intelligence. The same kind of content that once got people sued, fined, or deplatformed is now being used at scale to train billion-dollar models. And somehow, it’s no longer piracy. It’s research. It’s innovation.

Even as lawsuits pile up and major publishers take AI companies to court, the debates around AI and intellectual property show there’s still no real pause, no injunction, no accountability. The data keeps flowing, the models keep training, and by the time the law catches up, the damage will already be baked into the system.

So we circle back to the question of access. In the name of progress, we’ve given corporations the freedom to do what individuals were punished for. The same capitalist logic that once enforced paywalls now profits from bypassing them. What changed isn’t the principle, but who holds the power to break the rules.

Maybe that’s the real question we’ve ignored in this race toward intelligence. Innovation for whom? Knowledge for whose benefit? When technology stops serving people and starts serving profit, it’s not democratization; it’s consolidation. AI isn’t just reshaping information; it’s reshaping control.

Unless we question who it serves, progress will become nothing but a pretty word for exploitation.

But there is hope. Australia now requires AI companies to obtain consent before mining creative works [20], a move that restores some control to creators and acknowledges the rights of those whose labor fuels these models. If other jurisdictions follow suit, it could mark the beginning of a shift away from AI piracy toward ethical AI development: innovation that respects creators rather than exploiting them.

References

  1. Balazs, B. (2016). Pirates in the library – an inquiry into the guerilla open access movement. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2816925
  2. Drahos, P., & Braithwaite, J. (2002). Information Feudalism: Who Owns the Knowledge Economy? Earthscan. Retrieved October 28, 2025, from https://johnbraithwaite.com/wp-content/uploads/2016/06/Information-Feudalism-Who-Own.pdf
  3. WIPO. (1979). Berne Convention for the Protection of Literary and Artistic Works. https://www.wipo.int/wipolex/en/text/283698
  4. WTO. (1995). Agreement on Trade-Related Aspects of Intellectual Property Rights. World Trade Organization. https://www.wto.org/english/docs_e/legal_e/27-trips.pdf
  5. Himmelstein, D. S., Romero, A. R., Levernier, J. G., Munro, T. A., McLaughlin, S. R., Tzovaras, B. G., & Greene, C. S. (2018). Sci-Hub provides access to nearly all scholarly literature. eLife. https://elifesciences.org/articles/32822
  6. Sci-Hub. (2025, August 25). Sci-Hub has been blocked in India. Sci-Hub. Retrieved October 28, 2025, from https://sci-hub.se/sci-hub-blocked-india
  7. Brittain, B., Zieminski, N., & Coates, S. (2025, October 22). Reddit sues Perplexity for scraping data to train AI system. Reuters. Retrieved October 28, 2025, from https://www.reuters.com/world/reddit-sues-perplexity-scraping-data-train-ai-system-2025-10-22/
  8. Verma, S. (2025, August 18). Chatbots can replicate paywalled content. INMA. Retrieved October 28, 2025, from https://www.inma.org/blogs/Generative-AI-Initiative/post.cfm/chatbots-can-replicate-paywalled-content
  9. Harvard Law Review. (n.d.). Artificial Intelligence and the Creative Double Bind. Harvard Law Review, 138(6). https://harvardlawreview.org/print/vol-138/artificial-intelligence-and-the-creative-double-bind/
  10. Mac, R., & Grynbaum, M. M. (2023, December 27). The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work. The New York Times. Retrieved October 28, 2025, from https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
  11. Reuters. (2025, September 16). Disney, Universal, Warner Bros Discovery sue China’s MiniMax for copyright infringement. Reuters. https://www.reuters.com/legal/litigation/disney-universal-warner-bros-discovery-sue-chinas-minimax-copyright-infringement-2025-09-16/
  12. Reuters. (2025, October 22). Reddit sues Perplexity for scraping data to train AI system. Reuters. https://www.reuters.com/world/reddit-sues-perplexity-scraping-data-train-ai-system-2025-10-22/
  13. Hurst, A. (2025, July 2). Eminem, AI and me: why artists need new laws in the digital age. The Guardian. https://www.theguardian.com/commentisfree/2025/jul/02/eminem-ai-artists-laws-digital-big-tech
  14. Chiarello, F., Giordano, V., Spada, I., Fantoni, G., & Barandoni, S. (2024). Future applications of generative large language models: A data-driven case study on ChatGPT. Technovation, 133(May 2024). https://www.sciencedirect.com/science/article/pii/S016649722400052X
  15. Skadden. (2025, May 15). Copyright Office Weighs In on AI Training and Fair Use | Skadden, Arps, Slate, Meagher & Flom LLP. Skadden Arps. Retrieved October 28, 2025, from https://www.skadden.com/insights/publications/2025/05/copyright-office-report
  16. Atkinson, D. (2025). UNFAIR LEARNING: GENAI EXCEPTIONALISM AND COPYRIGHT LAW. https://arxiv.org/pdf/2504.00955
  17. Hillier, M. (2023, February 20). Why does ChatGPT generate fake references? – TECHE. Teche MQ. Retrieved October 28, 2025, from https://teche.mq.edu.au/2023/02/why-does-chatgpt-generate-fake-references/
  18. Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports. https://www.nature.com/articles/s41598-023-41032-5
  19. Dahl, M., Magesh, V., & Suzgun, M. (2024, January 11). Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive. Stanford Institute for Human-Centered AI (HAI). https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive
  20. Dimitroff, C. (2025, October 27). Australia bans AI mining of creative works without consent: here’s what that means. RUSSH. https://www.russh.com/tdm-ai-copyright-australia-explained/?utm_campaign=feed&utm_medium=referral&utm_source=later-linkinbio

Ananthu Anilkumar

Ananthu Anilkumar is a legal professional with a background in development studies and diplomacy. With experience at the United Nations Development Programme (UNDP) and the Office of the High Commissioner for Human Rights (OHCHR), he brings nuanced insight into international cooperation, human rights and peacebuilding. His contributions to Digital Peace explore the intersections of law, development, equity, and global governance.
