Microsoft & OpenAI Investigate if DeepSeek Obtained Data from OpenAI

  • Published on January 29, 2025
  • In AI News

Security researchers from Microsoft believe that individuals possibly linked to DeepSeek are “exfiltrating a large amount of data” using OpenAI’s API. 

Illustration by Supreeth Koundinya

OpenAI, the company behind the GPT and o1 series of models, and Microsoft are investigating whether Chinese AI startup DeepSeek obtained output data from OpenAI’s models without authorisation.

As reported by Bloomberg, security researchers from Microsoft believe that individuals possibly linked to DeepSeek are “exfiltrating a large amount of data” using OpenAI’s API. 

Microsoft is OpenAI’s largest investor and, as per reports, it notified OpenAI of the suspected activity, which would violate OpenAI’s terms of service.

Moreover, several DeepSeek users on social media have speculated that the model displays tendencies similar to those of OpenAI’s models.

Am I missing something? Did @deepseek_ai copy/paste @OpenAI docs and just forget to change some references? Or is it some standard in docs that I’m just not familiar with 🤔 https://t.co/Rmts3RT9Q6 pic.twitter.com/jSM4vschS3

— Benita (@NirBenita) January 26, 2025

OpenAI also told the Financial Times that it had seen “some evidence of distillation”, a technique in which one AI model is trained on the outputs of another to improve its performance.
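For readers unfamiliar with the term, the sketch below shows the textbook soft-label form of distillation, in which a smaller “student” model is penalised for diverging from a “teacher” model’s output distribution. The reports do not describe how any alleged distillation was carried out, so this is only an illustration of the general idea: the function names, the temperature value, and the toy logits are all assumptions and do not come from OpenAI, Microsoft, or DeepSeek.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Training a student to minimise this value pushes its outputs towards
    the teacher's. The temperature of 2.0 is an arbitrary illustrative choice.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q) for p, q in zip(teacher, student) if p > 0
    )


# Hypothetical scores over three candidate next tokens (made-up numbers).
teacher_scores = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
distant_student = [0.5, 1.0, 4.0]

print("close student loss:  ", distillation_loss(teacher_scores, close_student))
print("distant student loss:", distillation_loss(teacher_scores, distant_student))
```

A student whose scores resemble the teacher’s yields a loss near zero, while one that disagrees yields a larger loss. When only a model’s generated text is available, as with a typical API, distillation would instead involve training on that text directly rather than on probability distributions.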

A user on Reddit also spotted the DeepSeek model trying to generate an answer that complies with OpenAI’s terms of use. 

DeepSeek’s latest reasoning model, R1, has outperformed o1, OpenAI’s most powerful model available for public use. R1 scored higher than o1 on multiple benchmarks involving logic, reasoning, coding, and mathematics.

Recently, DeepSeek’s official app dethroned OpenAI’s ChatGPT and other competing AI apps in the ‘Top Charts’ on the US App Store for iPhone and iPad.  

GPU giant NVIDIA also recently saw around $589 billion wiped off its market cap. This was likely because DeepSeek was built with comparatively little compute and capital, raising concerns about demand for the GPUs and other AI resources used to build state-of-the-art models.

For instance, one of DeepSeek’s previous models, V3, used only about 2,048 NVIDIA H800 GPUs to achieve better performance than most open-source models. The model also took only $5.5 million to train.

Andrej Karpathy, a former OpenAI researcher, said DeepSeek V3’s level of capability is “supposed to require clusters of closer to 16,000 GPUs”.

DeepSeek’s parent company, High-Flyer, is a Chinese hedge fund. While the company was founded in 2015, the DeepSeek project was started in 2023.

US President Donald Trump said, “The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries.” He added that he views DeepSeek producing an AI model using cheaper methods “as a positive”. 

DeepSeek has also announced Janus Pro, an AI image generation model that it claims offers better results than OpenAI’s DALL-E 3.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
