Decoding how AI copyright battles could change the future of AI development

This week, we decode the many ongoing AI copyright battles, and how these legal disputes could shape the future of AI development, licensing, and power dynamics in the industry.

Shamita Islur

When the Chinese AI model DeepSeek rose to prominence, OpenAI, the creator of ChatGPT, raised concerns over the model’s potential unauthorised use of its data. DeepSeek’s rapidly developed AI model, R1, reportedly beats models like OpenAI’s o1, which were developed with significantly larger budgets, on several math and reasoning benchmarks. Security researchers at Microsoft, a major investor in OpenAI, suspected that individuals potentially linked to DeepSeek were extracting substantial amounts of data through OpenAI’s API in 2024. This has led to investigations into whether DeepSeek inappropriately used OpenAI’s outputs for training, possibly breaching OpenAI’s terms of service.

This situation is particularly ironic, considering that AI companies have built themselves by scraping the internet, using news articles, books, art and social media posts to train their models, often without explicit permission from the creators.

This has led to multiple lawsuits against OpenAI and similar AI companies in recent years, challenging these practices. Creators and publishers have come forward alleging that their copyrighted works were used without consent to train AI systems. This makes me question whether these lawsuits are effective in curbing the exploitation of data, or whether the tech industry is too powerful and well-funded to be held accountable.

Lawsuits against AI companies

Several lawsuits have been filed against AI companies for alleged copyright infringement over the years. 

Thomson Reuters vs. Ross Intelligence (2020)

In May 2020, Thomson Reuters initiated a lawsuit against the legal AI startup Ross Intelligence, alleging that the startup used materials from Westlaw, its legal research platform, without authorisation. Ross Intelligence argued that its use constituted ‘fair use’, noting that the data merely added ‘noise’ to its machine learning tool designed to extract legal answers. However, in a ruling in early 2025, U.S. District Judge Stephanos Bibas rejected this defence. Even though Thomson Reuters’ legal summaries (headnotes) were based on publicly available court rulings, the court found that its way of selecting and summarising key legal points involved creativity, which is protected by copyright. This is one of the first US court judgments addressing AI and intellectual property.

Visual artists vs. AI art generators (2023)

In 2023, a group of visual artists, including Sarah Andersen, Karla Ortiz, Kelly McKernan, Grzegorz Rutkowski, Julia Kaye, Hawke Southworth, Gerald Brom, Gregory Manchess, Jingna Zhang and Adam Ellis, filed a class-action lawsuit against AI art generators Midjourney, DeviantArt, Runway and Stability AI. The artists claimed that their copyrighted works were used without consent to train AI models capable of generating new images. Initially, U.S. District Court Judge William H. Orrick dismissed much of the lawsuit on the grounds that many of the artworks were not registered for copyright. The complaint cited scientific papers from MIT, Harvard and Brown, published in 2023, which found that diffusion models, including Stable Diffusion, can create convincing images resembling the work of specific artists when their names are included in the prompt.

The New York Times vs. OpenAI and Microsoft (2023)

In December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft, alleging that the tech giants illegally used ‘millions’ of copyrighted articles to develop AI models like ChatGPT and Bing. The publication didn’t specify the financial damages it was seeking but made clear it wanted both companies held liable. It also asked the court to order OpenAI to destroy any AI models, databases and training data built on its copyrighted material. OpenAI argued that its scanning of data to feed its AI engine was permissible under fair use.

Universal Music Group vs. Anthropic (2023)

Universal Music Group (UMG) filed a lawsuit against AI company Anthropic, alleging that its AI models, including the chatbot Claude, were trained on UMG’s copyrighted music lyrics. UMG argued that this unauthorised use violated its intellectual property rights, since the AI could generate text mimicking the style of UMG’s artists. Anthropic has reportedly reached an agreement to maintain its existing guardrails and apply them to future Claude models.

Getty Images vs. Stability AI (2023)

In 2023, Getty Images sued Stability AI in the UK, accusing the company of unlawfully scraping millions of images from its site without a licence, using them to train Stable Diffusion, and infringing Getty’s trademarks in the process.

Authors vs. OpenAI and Meta (2023)

Comedian and author Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, filed lawsuits against OpenAI and Meta in 2023, alleging that their copyrighted books were used without permission to train AI language models, resulting in outputs that replicate their writing styles and content.

ANI vs. OpenAI (2024)

Asian News International (ANI) filed a lawsuit against OpenAI claiming it unlawfully used its copyrighted content to train and operate ChatGPT. Recently, the Indian Music Industry (IMI), along with music labels T-Series and Saregama India, moved the court seeking to join the news agency's case against the AI company. On February 18, the Delhi High Court sought a response from the AI company on the application filed by the IMI to intervene in the copyright lawsuit. The hearing is scheduled for Friday, February 21.

In response to these lawsuits, AI companies have often invoked the ‘fair use’ doctrine as a defence. Their argument? Training AI models on publicly available content is akin to human learning: it transforms the original work into something new and does not serve as a direct market replacement.

However, recent rulings, like the one in the Thomson Reuters case, have challenged this defence, suggesting that a use which deprives copyright owners of licensing opportunities competes with the original work and undermines the fair use argument itself.

Still, there have been cases where tech companies have prevailed. In the Authors Guild’s case against Google, decided in 2015, Google’s digitisation of books through scanning and computer-aided recognition to make them searchable online alarmed many authors and publishers, since the company had not sought their permission. The court ruled that this fell under fair use because Google displayed only snippets of copyrighted works, serving as a reference rather than a full-text replacement.

As AI technologies continue to evolve, the outcomes of these lawsuits could influence how copyrighted materials can be utilised in AI training and development.

How it could shape the future of AI

For example, AI companies may be forced to enter licensing deals with publishers, artists and content creators to ensure that training data is ethically sourced. Moreover, if AI giants are restricted in how they collect training data, development may slow due to limited access to diverse datasets, potentially weakening the models that result.

Additionally, companies might shift toward using exclusively licensed or internally generated datasets. This could create a divide between well-funded AI companies and smaller startups that cannot afford costly licenses.

In the Thomson Reuters case, after the publication rejected Ross Intelligence’s attempt to license Westlaw’s content, the AI startup purchased 25,000 bulk memos of legal questions and answers written by lawyers at LegalEase.

Similarly, companies like Soul AI are hiring humans to produce training data in order to avoid copyright lawsuits.

However, this raises the question of whether it is even possible to protect one’s work. While lawsuits offer a pathway for creators to assert their rights, the effectiveness of legal action against AI companies remains uncertain. The reality is that fighting AI companies in court is costly, time-consuming and favours corporations with deep pockets. While some companies, like Thomson Reuters, have won cases, independent creators often lack the resources to challenge these tech giants. What’s more, how does one reverse the damage already caused by such unethical training?

The most frustrating aspect of this issue is that AI companies are more willing to engage in prolonged legal battles rather than fairly compensate creators. Instead of negotiating licensing deals, they take their chances in court, knowing that many lawsuits will either be dismissed or result in settlements cheaper than licensing agreements.

This approach not only fosters mistrust but also raises questions about the sustainability of such practices.

AI development is increasingly marred by issues of intellectual property and creators’ rights. The ongoing debate over AI and copyright has revealed a deep contradiction: AI companies that built their businesses by utilising the internet’s data, be it from books, articles, artwork or social media, are now quick to take offence when another player allegedly does the same to them.

If OpenAI’s allegations turn out to be true, it could lead to legal action or restrictions on DeepSeek. But what does that say about OpenAI’s own approach? What does that say about other AI companies? This battle is not about copyright; it’s about power, and AI’s future will be shaped not by ethics or fairness but by the most powerful players in the industry.
