DeepSeek AI's Low-Cost Models Suspected of Using OpenAI Data, Sparking Irony Online

The emergence of DeepSeek AI, a Chinese AI developer, has sparked significant controversy and concern within the U.S. tech industry. Its R1 model, touted as a cost-effective alternative to Western AI offerings like ChatGPT, triggered a $600 billion drop in Nvidia's market value, with the company's shares plummeting by 16.86%, the largest single-day loss of market value for any company in Wall Street history. Other tech giants such as Microsoft, Meta Platforms, and Google's parent company Alphabet also experienced declines ranging from 2.1% to 4.2%, while AI server manufacturer Dell Technologies saw an 8.7% drop.

DeepSeek's claim that its model, built on the open-source DeepSeek-V3, requires significantly less computing power and was trained for just $6 million has raised eyebrows and led to speculation about its data sources. OpenAI and Microsoft are now investigating whether DeepSeek used OpenAI's API to feed the output of OpenAI's models into its own, a practice known as distillation. The technique involves training a smaller model on the responses of a larger, more advanced one. Using OpenAI's output in this way to build a competing model violates OpenAI's terms of service.
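For readers unfamiliar with the term, distillation can be illustrated with a toy sketch. The code below is not DeepSeek's or OpenAI's method. The "teacher" is a hypothetical fixed linear classifier, the "student" is trained only on the teacher's output probabilities, and every name and number (`TEACHER_W`, `train_student`, the temperature of 2.0) is an assumption chosen for illustration. In this toy the student is the same size as the teacher, so it shows only the mechanism of learning from another model's outputs, not the size reduction.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher: a fixed linear model with 2 inputs and 3 classes.
TEACHER_W = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]

def linear_logits(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def teacher_probs(x, temperature=2.0):
    """The only thing the student ever sees: the teacher's output probabilities."""
    return softmax(linear_logits(TEACHER_W, x), temperature)

def student_probs(weights, x, temperature=2.0):
    return softmax(linear_logits(weights, x), temperature)

def train_student(inputs, epochs=500, lr=0.5, temperature=2.0):
    """Fit student weights by minimizing cross-entropy against the teacher's soft labels."""
    n_classes, n_features = len(TEACHER_W), len(inputs[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    # Query the teacher once per input, as one would query a model API.
    targets = [teacher_probs(x, temperature) for x in inputs]
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            pred = student_probs(weights, x, temperature)
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to logit k.
                grad = (pred[k] - target[k]) / temperature
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
    return weights

if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [0.5, 0.2]]
    student = train_student(queries)
    unseen = [0.3, -0.7]
    print("teacher:", [round(p, 3) for p in teacher_probs(unseen)])
    print("student:", [round(p, 3) for p in student_probs(student, unseen)])
```

The student never sees the teacher's weights, only its answers to a set of queries, which is why access to a model's API alone can be enough for this kind of training.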

OpenAI has expressed concerns about the protection of its intellectual property, stating that it engages in countermeasures to safeguard its models and works closely with the U.S. government to prevent unauthorized use by competitors and adversaries. President Donald Trump's AI czar, David Sacks, highlighted the issue, suggesting that leading U.S. AI companies will take steps to prevent such distillation practices in the coming months.

The irony of OpenAI's situation has not gone unnoticed, given its own history of using copyrighted material to train ChatGPT. In January 2024, OpenAI acknowledged the necessity of using copyrighted materials to train large language models, arguing that excluding such data would hinder the development of AI systems that meet modern needs. This stance has fueled ongoing debates about the ethics and legality of using copyrighted materials in AI training, with high-profile lawsuits from The New York Times and a group of 17 authors, including George R. R. Martin, challenging the practice.

As the industry grapples with these issues, the rise of DeepSeek serves as a wake-up call for the U.S. tech sector, prompting a reevaluation of AI development practices and intellectual property protection strategies.

DeepSeek is accused of distilling OpenAI’s models to train its own competing model. Image credit: Andrey Rudakov/Bloomberg via Getty Images.