What is DeepSeek, the AI chatbot from China that is sending shockwaves through the tech world?

Chinese artificial intelligence (AI) lab DeepSeek has been making headlines after its research on large language models (LLMs) caught the attention of Silicon Valley, where its new LLM has emerged as a contender to ChatGPT, the comparable offering from US-based OpenAI.

Its high-performance models, introduced this month, are reported to deliver outstanding speed at a remarkably low cost.

DeepSeek-R1, the latest of these models, was built using fewer chips yet rivals offerings from market leaders such as OpenAI, Google, and Meta, sending Nvidia's stock price plummeting on Monday.

This is what we currently know about the industry disruptor from China.

Where did the DeepSeek project originate?

The company was founded in Hangzhou, China, in July 2023 by Liang Wenfeng, an engineer who graduated from Zhejiang University with a specialisation in information and electronic engineering.

It was incubated by High-Flyer, a fund Liang established in 2015. Liang, along with other prominent figures in the industry, aims to achieve "artificial general intelligence": systems that can match or surpass humans across a wide range of tasks.

Because it operates independently, DeepSeek's funding model allows it to pursue ambitious AI projects without pressure from external investors and to focus on long-term research and development.

The DeepSeek team consists of young graduates from China's top universities, and the company's recruitment process places greater emphasis on technical skill than on professional experience.

In short, the company is seen as bringing a fresh perspective to the way AI models are built.

DeepSeek's journey began in November 2023 with the launch of DeepSeek Coder, an open-source model designed for coding tasks.

This was followed by DeepSeek LLM, which aimed to compete with other leading language models. DeepSeek-V2, released in May 2024, gained momentum thanks to its strong performance and low cost.

It also compelled other leading Chinese technology firms, including ByteDance, Tencent, Baidu, and Alibaba, to reduce the costs of their artificial intelligence models.

What are the capabilities of DeepSeek models?

DeepSeek-V2 was later succeeded by DeepSeek-Coder-V2, a more advanced model with approximately 236 billion parameters.

The model, designed for intricate coding tasks, has a context window of up to 128,000 tokens.

A token is a unit of text: often a word, a word fragment ("artificial" and "intelligence" are examples), or even a single character. For instance, the sentence "Artificial intelligence is great!" might be split into five tokens: "Artificial", "intelligence", "is", "great", and "!".

The context window is the maximum amount of input text the model can process at once, in this case 128,000 tokens.

A broader context window enables the model to understand, summarise, or analyse longer pieces of text. This is a significant advantage, particularly when dealing with lengthy documents, novels, or intricate conversations.
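To make the idea concrete, here is a rough Python sketch, an illustration only: real LLMs use learned subword tokenizers rather than this simple word-and-punctuation split. It shows how text is broken into tokens and how a document's token count relates to a 128,000-token context window.

import re

CONTEXT_WINDOW = 128_000  # maximum number of tokens the model can handle at once

def rough_tokenize(text: str) -> list[str]:
    # Crude illustration: split into words and punctuation marks.
    # Real LLM tokenizers use learned subword vocabularies instead.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = rough_tokenize("Artificial intelligence is great!")
print(tokens)                  # ['Artificial', 'intelligence', 'is', 'great', '!']
print(len(tokens), "tokens")   # 5 tokens

# A document fits in the model only if its token count stays within the window.
print("Fits in the context window:", len(tokens) <= CONTEXT_WINDOW)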

The company's latest models, DeepSeek-V3 and DeepSeek-R1, have solidified its position in the industry.

A 671-billion-parameter model, DeepSeek-V3 is impressively efficient, requiring fewer computing resources than comparable models while achieving excellent results across a variety of benchmark tests against competitors.

DeepSeek-R1, launched this month, focuses on complex tasks such as reasoning, coding, and maths, where it challenges o1, one of the latest models from ChatGPT maker OpenAI.

Although DeepSeek has achieved substantial success in a brief period, the company primarily focuses on research and does not have detailed plans for commercialisation in the near future, according to Forbes.

Is it free for the end user?

One of the primary factors contributing to DeepSeek's popularity is that it is available at no cost to end-users.

It is the first advanced AI system of its kind to be made available to users free of charge. Other advanced systems, including OpenAI's comparable models and Anthropic's Claude Sonnet, require a paid subscription, and some paid subscriptions even place limits on how much users can access.

Google Gemini is also available free of charge, but free-tier users are restricted to older models.

How to use it?

Users can access DeepSeek's chat interface for end-users at "chat.deepseek.com". Simply enter a prompt on the chat screen and press the "search" button to have the chatbot search the internet.

There is also an "in-depth look" feature that provides more detailed information on any topic. While it offers more comprehensive answers to users' queries, it also searches a broader range of websites than ChatGPT, which limits its search to specific sources, and may therefore occasionally surface inaccurate information from smaller sites. Users should verify the information they receive from the chat assistant.

Is it safe?

A crucial question about DeepSeek is whether it is safe to use. Like other such services, it requires user data, which is likely stored on servers based in China.

As with any other large language model, users should refrain from sharing sensitive information with the chatbot.

Since DeepSeek is also open-source, independent researchers can review the model's code and investigate whether it is secure. Further detailed information on potential security issues is expected to be disclosed in the coming days.

What does "open source" mean here?

Models such as DeepSeek-R1 have been released with open-source code that anyone can access, allowing the LLM to be inspected and customised, although the training data remains proprietary.
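As an illustration, here is a minimal Python sketch of how an openly released model can be downloaded and run with the Hugging Face transformers library. The repository name below is an assumption for illustration only, and the full-size R1 model requires far more memory than an ordinary computer, so smaller variants are the realistic option for most users.

# Minimal sketch, assuming the weights are published on the Hugging Face Hub
# under an id like the one below (an assumption for illustration only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # assumed repository name; check the official release

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code may be needed if the model ships custom architecture code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain what a context window is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))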

In contrast, OpenAI has made its o1 model available to the public, pricing it between $20 (€19) and $200 (€192) per month, depending on a user's chosen package.

How was it able to generate such a model despite US restrictions?

The company has also entered into cooperative agreements to boost its technological capabilities and expand its market presence.

One notable collaboration is with the US semiconductor company AMD. According to Forbes, DeepSeek used AMD Instinct GPUs and ROCm software at key stages of model development, particularly for DeepSeek-V3.

According to the MIT Technology Review, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China, before US chip sanctions against China were imposed.

A Chinese media outlet, 36Kr, estimates that the company has more than 10,000 units in stock, with some sources suggesting the actual number is as high as 50,000 units.

Recognising how valuable this stockpile would be for AI training, Liang founded DeepSeek and began using these chips in combination with lower-power chips to improve his models.

Crucially, Liang found a way to build capable models with limited resources.

US chip export restrictions led DeepSeek developers to create more sophisticated and power-saving algorithms to make up for their limited access to computing resources.

Deep learning engineers estimate that training ChatGPT required about 10,000 Nvidia graphics cards, while DeepSeek's engineers say they achieved comparable results with only about 2,000 such cards.

What has the response to DeepSeek been?

Alexandr Wang, CEO of ScaleAI, which supplies training data to prominent AI models such as those of OpenAI and Google, referred to DeepSeek's product as "an earth-shattering model" in a recent speech at the World Economic Forum (WEF) in Davos.

While DeepSeek's new technology has surprised American competitors, experts are already cautioning about what its release may mean for the West.

"We should be concerned. The increasing integration of Chinese AI technology into the UK and Western society is not just a negative development, it's a highly irresponsible one," one expert cautioned.

"Again and again, we've seen how China uses its technological superiority to exert control and dominance, both within its own borders and across the globe, whether by embedding spyware into devices, launching state-backed cyberattacks, or employing AI to quash dissent. Its technology is ultimately an integral part of its foreign policy strategy," the expert said.

The chatbot may appear to be an innocuous large language model, but the AI has already been observed concealing information unfavourable to the Chinese government.

Some people believe that releasing the latest LLM is a strategic move with political implications, one that could further exacerbate the already strained relations between China and the US.

"The development of this technology is genuine, but the timing of its unveiling appears to be influenced by politics," stated Gregory Allen, director of the Wadhwani AI Centre at the Centre for Strategic and International Studies, in a conversation with the Associated Press.

Allen compared DeepSeek's announcement last week to Huawei's release of a new phone during diplomatic talks over the Biden administration's export restrictions in 2023.

"We view exerting is futile or harming of export controls as a key objective of China's foreign policy efforts currently," Allen stated.
