Overview of DeepSeek AI Models
Feb 28, 2025
Lecture Notes on DeepSeek AI Models
Introduction to DeepSeek
DeepSeek is a startup based in China.
Made headlines by overtaking OpenAI's ChatGPT as the most downloaded free app on the US Apple App Store.
Known for releasing an open-source model that rivals industry-leading models at a fraction of the cost.
Key Model: DeepSeek R1
DeepSeek R1: A reasoning model, indicated by the "R" in the name.
Competes with OpenAI's reasoning model, o1.
Excels in AI benchmarks for math and coding tasks.
Approximately 96% cheaper to run than o1.
Features of Reasoning Models
They solve complex problems by breaking them into steps.
Utilize a process called "chain of thought" for step-by-step analysis.
Differ from past models by showing the reasoning process.
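To make the idea of a visible reasoning process concrete, here is a minimal sketch that separates a reasoning trace from the final answer. It assumes the model wraps its reasoning in `<think>...</think>` tags; the tag name, the `split_reasoning` helper, and the sample string are illustrative assumptions, not something stated in these notes.

```python
import re


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a model output string.

    Assumes the reasoning trace is wrapped in <think>...</think>;
    if no such block is found, the whole output is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer


# Hypothetical model output, written by hand for illustration.
sample = "<think>17 * 3 = 51, and 51 + 4 = 55.</think>The result is 55."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the trace separate from the answer lets an application show, hide, or log the step-by-step reasoning independently.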
Evolution of DeepSeek Models
DeepSeek Model Timeline
DeepSeek V1 (Jan 2024)
67 billion parameters.
Traditional transformer model.
DeepSeek V2 (June 2024)
236 billion parameters.
Featured multi-head latent attention (MLA) and the DeepSeek mixture-of-experts architecture (DeepSeekMoE).
DeepSeek V3 (Dec 2024)
671 billion parameters.
Introduced reinforcement learning and load balancing across GPUs.
DeepSeek R1-Zero (Jan 2025)
First reasoning model using exclusively reinforcement learning.
DeepSeek R1
Combines reinforcement learning with supervised fine-tuning.
Distilled Models
Distilled Models: Smaller student models derived from larger teacher models.
Serve as a form of model compression, or as a way to transfer capabilities between architectures.
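The teacher-student relationship can be sketched with the classic soft-label distillation loss, where the student is trained to match the teacher's output distribution. This is a generic illustration under stated assumptions, not necessarily the procedure DeepSeek used; the function names, the temperature value, and the logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.5]
print(distillation_loss(teacher, student))  # positive: the student still differs
print(distillation_loss(teacher, teacher))  # zero: the distributions match
```

Minimizing this loss pushes the smaller student toward the larger teacher's behavior, which is what makes distillation a form of compression.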
Efficiency of DeepSeek
Utilizes fewer specialized Nvidia chips than competitors.
Example: DeepSeek V3 was trained on only about 2,000 GPUs, while Meta used over 100,000 for Llama 4.
Architectural Efficiencies
Mixture of Experts (MoE) architecture is resource-efficient:
Divides the model into sub-networks (experts).
Activates only the experts needed for a given input.
Reduces computational costs and speeds up performance.
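The routing idea described above can be sketched as top-k gating: a router scores every expert, only the k highest-scoring experts run, and their outputs are blended by normalized weights. This is a toy illustration, not DeepSeek's implementation; the scalar "experts", the router scores, and the names `top_k_gate` and `moe_forward` are hypothetical.

```python
import math


def top_k_gate(router_scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and return their weighted sum."""
    weights = top_k_gate(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy experts; real experts would be feed-forward sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical router outputs for one input

print(top_k_gate(scores, k=2))            # only experts 1 and 3 are active
print(moe_forward(3.0, experts, scores))  # experts 0 and 2 are never evaluated
```

Because the unselected experts are never called, compute per input scales with k rather than with the total number of experts, which is the source of the cost savings.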
Reinforcement Learning
Rewards the model for correct answers, allowing it to discover its own reasoning pathway.
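A minimal sketch of the reward idea: several sampled answers to the same question are scored by a simple correctness check, and each sample is then compared against the group average. The exact-match rule, the mean baseline, and the sample answers are assumptions chosen for illustration; the notes do not specify the actual reward design.

```python
def correctness_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the answer matches the reference exactly, otherwise 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def advantages(samples, reference):
    """Score each sampled answer relative to the mean reward of the group.

    Positive values mark samples that did better than average and would be
    reinforced; negative values mark samples that would be discouraged.
    """
    rewards = [correctness_reward(s, reference) for s in samples]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Hypothetical final answers sampled for the same math question.
samples = ["55", "54", " 55 ", "60"]
print(advantages(samples, "55"))
```

Only the final answer is checked here, so the model is free to find whatever intermediate reasoning leads to correct results.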
Conclusion
DeepSeek R1 matches industry-leading models in performance at lower costs.
Reflects exciting progress in AI reasoning models.