Transcript for:
DeepSeek: Revolutionizing AI Startups

Today we're going to talk about a dark horse of the AI startup world. If you've assumed China's AI development is held back by things like chip embargoes or strict domestic regulations, DeepSeek's story might surprise you. Although it is a relatively low-profile company founded less than two years ago, it has just shaken up the large language model world. Its latest release, DeepSeek V3, is now competitive with other world-class LLMs released in 2024, including GPT-4o. But here's the interesting thing: the whopping development speed is the least impressive part. The most impressive part is that DeepSeek didn't raise a single penny in funding, yet it still managed to slash costs to roughly 1/70th of GPT-4's. It may sound too good to be true, but it is actually happening. If DeepSeek wasn't on your radar before, it should be now. DeepSeek has dramatically slashed LLM prices and even sparked China's AI price war, which is why some have jokingly called it the "Pinduoduo of AI." While this may sound like just another cheap-Chinese-copycat story, this time is different.

The story of DeepSeek started in 2015, when three engineers from Zhejiang University, one of the best universities in China, founded a quant hedge fund called High-Flyer. They had started trading stocks during the 2008 financial crisis, when they were still students at the university. Fast forward to 2019: they created High-Flyer AI, a branch focused on AI algorithms. By 2021, all of High-Flyer's trading strategies were algo-driven, and High-Flyer had become one of China's most successful quant hedge funds in terms of assets under management. High-Flyer recruited top engineers and researchers from China's best universities, but it stayed mostly quiet and under the radar. In 2023, High-Flyer took a bold step: it announced a separate body focused on artificial general intelligence, or AGI. In May 2023, this new entity became DeepSeek, an AI startup that prioritizes research over monetization.

In December 2024, the DeepSeek V3 model was released. The team took less than two years to build it, yet the benchmark tests show it is competitive with counterparts such as GPT-4o and Claude 3.5. The result is quite surprising. In the benchmark performance chart below, the dark blue bars represent DeepSeek's results across various test data sets, and as you can see, it even outperformed GPT-4o on math and coding tests. DeepSeek does need to do better on multitask understanding and PhD-level domain questions, but it is still not far behind GPT-4o. Of course, we know there are better-performing models out there, but what's truly surprising is how quickly DeepSeek has caught up.

And here's where it gets even crazier. DeepSeek was trained on the H800 chip due to the US chip embargo, yet the whole process took just 55 days and the total cost was around $5.6 million. If you don't have a clear idea what those numbers signify, let's do a comparison. DeepSeek V3 took about 2.7 million GPU hours to train on the H800, at a total cost of about $5.58 million, whereas Llama, for instance, took 30.8 million GPU hours on the more advanced H100, a chip that is banned from export to China. That means DeepSeek trained on roughly 11 times fewer GPU hours, on a less advanced chip. And in terms of total cost, DeepSeek's roughly $5.58 million compares to over $100 million for GPT-4, which makes DeepSeek's total development cost about 18 times cheaper.
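To sanity-check those ratios, here's a quick back-of-the-envelope calculation in Python using the figures quoted above. The GPU-hour and dollar amounts are the numbers cited in this video, not official figures, so treat the output as a rough cross-check:

```python
# Rough comparison using the figures quoted in this video (not official numbers).
deepseek_gpu_hours = 2.7e6    # H800 GPU hours reported for DeepSeek V3
llama_gpu_hours    = 30.8e6   # H100 GPU hours reported for Llama

deepseek_cost = 5.58e6        # DeepSeek V3 total training cost, USD
gpt4_cost     = 100e6         # widely cited GPT-4 training cost, USD

print(f"GPU-hour ratio: {llama_gpu_hours / deepseek_gpu_hours:.1f}x")  # ~11.4x
print(f"Cost ratio:     {gpt4_cost / deepseek_cost:.1f}x")             # ~17.9x
```

Both ratios line up with the "11 times" and "18 times" claims above.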
Here's the API usage cost comparison: this is the amount of money you'd pay, for instance as a small enterprise or a developer, to use the API service provided by each large language model. DeepSeek is currently running a promotional discount, which makes it look insanely cheap compared to its counterparts, but even at the original price, DeepSeek is still far cheaper: its cost per one million input tokens ranges from 7 to 27 cents, while GPT-4o charges $1.25 to $2.50 per million tokens. Because DeepSeek slashed the price of the large language model, other Chinese tech giants developing LLMs have had to cut their prices to stay competitive.

At first glance this might feel like a déjà vu moment, another BYD or Temu story where China undercuts the competition with cheap manufacturing or subsidies. But here's the key difference: DeepSeek didn't slash costs by bleeding investor money. It actually managed to stay profitable despite raising no funds. DeepSeek's CEO once explained the low pricing strategy in an interview: "We're surprised people are so sensitive to the price. Our principle is neither to subsidize nor to rake in massive profits; the price is simply slightly above cost, with a small margin." Such a thrifty approach to large language model development may seem impossible, especially when we see OpenAI raising another $6.6 billion in funding, but DeepSeek managed to pull it off. So how on earth did they do it?

DeepSeek actually innovated at the training-algorithm level to make the most of limited computational resources. For example, they introduced the DeepSeekMoE architecture, whose key components are fine-grained experts and shared experts within each layer of the network; this helps enhance the specialization of each expert. In layman's terms, just think of a large language model as a team of little brains that work together to learn from data. Models like Llama or GPT, for instance, activate all of those brains regardless of the type of data they're processing. DeepSeek took a different approach: it divided the little brains into specialized experts, each focused on a specific type of knowledge, and instead of activating all the brains at once, it only activates the experts relevant to the task at hand. Each brain becomes more specialized, the total number of parameters in the model doesn't have to grow, and the aggregate performance is uncompromised, because you now have a variety of expert brains covering a diverse set of knowledge. This approach alone drastically reduced training cost and time; there's a small code sketch of the idea just below. They've also made a lot of other innovations at the algorithm level, along with cost-effective engineering designs that maximize GPU usage. Since this is not a tech-focused channel, I'm going to skip the deep dive into the technical details; if you're interested in the tech side of things, I highly recommend checking out this video, by far my favorite on YouTube for a concise yet comprehensive overview of all the key highlights.
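To make the "team of little brains" picture a bit more concrete, here is a minimal, hypothetical mixture-of-experts layer in PyTorch. This is not DeepSeek's actual code, just the routing idea: a shared expert runs on every token, while a router activates only the top-k specialized experts for each token:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(dim):
    # One "little brain": a small feed-forward network.
    return nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, n_routed=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.shared = make_expert(dim)                  # always active (common knowledge)
        self.experts = nn.ModuleList(make_expert(dim) for _ in range(n_routed))
        self.router = nn.Linear(dim, n_routed)          # scores each expert per token

    def forward(self, x):                               # x: (num_tokens, dim)
        out = self.shared(x)                            # shared expert sees every token
        weights = F.softmax(self.router(x), dim=-1)     # routing probabilities
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        for t in range(x.size(0)):                      # plain loops for clarity, not speed
            for w, i in zip(top_w[t], top_i[t]):        # only the top_k experts fire
                out[t] = out[t] + w * self.experts[int(i)](x[t])
        return out

tokens = torch.randn(4, 64)                             # a batch of 4 token vectors
print(ToyMoELayer()(tokens).shape)                      # torch.Size([4, 64])
```

The thing to notice is that only top_k of the routed experts do any work for a given token, so with many fine-grained experts and a small top_k, the compute per token stays low even as the total pool of specialized knowledge grows.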
So now I actually want to do a bunch of comparisons of DeepSeek versus GPT to see how it performs on various tasks. I've selected tasks that I personally do on a regular basis, and to make this comparison even more interesting, I've also decided to include some philosophical investigations as well as moral dilemma questions. Let's just see how the large language models perform.

OK, great. I have put DeepSeek and ChatGPT-4o side by side so we can do a clear comparison of how the two models perform against each other. First, let's try a math problem. I've decided to pick a new problem from the 2024 AMC contest, because I've heard from people who have reviewed DeepSeek from a technical point of view that overfitting is one of the model's known problems, meaning the model can rely too much on its training data and struggle to adapt to new problems. So let's pick one that is absolutely not in the training data set and see how it performs. I just randomly picked a problem; this seems like a classic combination/permutation problem, so I'm going to copy and paste the question. Oh, actually, I just found out there's a DeepThink mode; supposedly I need to turn this on, but let's see. OK, so DeepSeek does give a final answer, and this is the correct answer, I believe, if you look at the solution manual here. But let's turn on the DeepThink mode and see how it performs differently. Apparently it spits out its thinking process; it's literally like a human thinking out loud. Let's see whether it gives the correct answer. OK, it got 1350, which is the correct answer. And let's check the performance of GPT-4o: it also got the correct answer.

Let's quickly try another one. Oh, it's still thinking. OK, so I'm going to stop this... apparently it is still thinking. It looks like the DeepThink mode takes more time for the reasoning process; it gives the right answer eventually, it just takes even longer. And it seems that even if you don't turn on DeepThink mode, it still outputs the right answer. So let's quickly try another one, and please just ignore the speed of the responses, because I'm on a VPN, which means it may not truly reflect the speed of ChatGPT-4o. OK, it seems GPT-4o stopped at step 3, so I'm going to try it again. It got stuck again; it says "the choice does not work, we need another," so I'm going to say, "Can you please finish the problem?" OK, it looks like a hard question for both models. Two thousand years later... DeepSeek has an answer: it says 276. Let's check the solution manual: it is the right answer. GPT-4o got stuck after the second try, so let's give it another chance: "Can you please try again?" And let's see if it gets the right answer this time. OK, seems not. If you've read the paper published by the DeepSeek team, you'll discover that they prioritized math and coding problems in training, so I'm not surprised by this tryout of the math problems.

Next, let's try a coding challenge. I'm just going to randomly pick a coding challenge from LeetCode, a website that automatically grades the correctness of your algorithm, which comes in handy because we can have both models solve the problem and see which one passes the tests. Let's take a look... I think I'm just going to pick this one. Let's pick a hard one. Running a bit slow... OK, so this is the problem we've got: the challenge is to write an algorithm that returns the median of two sorted arrays. Here I'm going to open a new window and say, "Can you please write a solution in Python" (because that's my favorite programming language) "to solve the following challenge," and I'll paste in the question. And we're going to do the same with GPT-4o.
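For context, here's what a passing answer to this problem ("Median of Two Sorted Arrays") can look like, in LeetCode's class format. This is my own sketch, not the output of either model, and it's the readable O(m+n) merge approach rather than the optimal O(log(m+n)) binary search the hard version technically asks for:

```python
from typing import List

class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        # Merge the two sorted arrays, then read the median off the middle.
        merged = sorted(nums1 + nums2)
        n = len(merged)
        mid = n // 2
        if n % 2:                      # odd length: single middle element
            return float(merged[mid])
        return (merged[mid - 1] + merged[mid]) / 2.0  # even: average the two middles

print(Solution().findMedianSortedArrays([1, 3], [2]))     # 2.0
print(Solution().findMedianSortedArrays([1, 2], [3, 4]))  # 2.5
```

The two prints correspond to the problem's first two sample cases, the same kind of cases the site runs automatically.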
OK, GPT-4o already has a solution, so what we're going to do is copy it and paste it into LeetCode, and the website will automatically run it. Great, let's hit run. OK, so GPT-4o doesn't get all the test cases right: it passed one case but failed the second one, which means the algorithm is not correct. What I'm going to do now is have GPT o1 try this problem, because apparently o1 has more advanced reasoning capability, so let's see if o1 passes. OK, it's thinking... this is our program; I'm just going to grab it, put it right here, and hit run again. OK, so o1 actually performs much better: it passed both test cases. In an actual graded coding challenge there would be many more test cases than just two, but for our purpose I think this is a good comparison to get a taste of what each model can do. Now let's come back to DeepSeek, which also has a solution, and see if it passes the test cases on the first round. OK, running this solution... DeepSeek passed both test cases on the first iteration, which is not too shabby.

OK, so let's make it a bit more interesting. I'm going to give some instructions to both models and ask them to write code that does exactly what I tell them. I'll say, "Please write a Python snake game program." This is a relatively vague, human-like instruction, and I just want to see how each model reacts and whether it can actually write a program that runs. They're running pretty fast. What I'm going to do is grab the code generated by DeepSeek, paste it into our IDE, and save it as a Python program; let's call it snake_game_deepseek, put it on my desktop, and run it to see if it actually works. OK, that seems pretty cool: you have a snake that goes around, and each time it eats a dot it gets a little bit bigger. Cool.

Now I'm going to do one more challenge by asking the model to take some human instruction. I'll say, "Please make two more modifications: 1. make the UI nicer looking." As you can see, this is a pretty vague request; it could refer to the colors, the size of the screen, or the background, so let's see how DeepSeek surprises us. Then, "2. make the speed of the snake slower." This one is more specific, meaning it should adjust the parameters in the original code, the snake block and speed variables right here. Let's see how DeepSeek reacts to this request, and once it's done we can copy and paste the new code. OK, let's grab the new version, save it, and try running it again. If you look at the UI, it added this grid-like background, changed the colors of the snake and the dot, and adjusted the speed, so that is pretty impressive. Now let's try the program written by GPT-4o: similarly, copy and paste the code into another Python program, call it snake_game_gpt4o.py, and run it. OK, the initial state looks very similar to what DeepSeek generated, which is not surprising: the snake game is such a classic that it's probably included in the training data set for both models.
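As an aside, here's roughly what a minimal version of such a program looks like. This is a sketch of my own, assuming the pygame library (pip install pygame), not the code either model actually produced; note the SNAKE_SPEED constant, which is the kind of parameter the "make the snake slower" request would adjust:

```python
import random
import pygame

CELL = 20                 # size of one grid cell in pixels
GRID_W, GRID_H = 30, 20   # board size in cells
SNAKE_SPEED = 8           # frames per second; lower this to slow the snake

pygame.init()
screen = pygame.display.set_mode((GRID_W * CELL, GRID_H * CELL))
pygame.display.set_caption("Snake")
clock = pygame.time.Clock()

snake = [(GRID_W // 2, GRID_H // 2)]   # list of (x, y) cells, head first
direction = (1, 0)                     # start moving right
food = (random.randrange(GRID_W), random.randrange(GRID_H))

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:  # arrow keys steer the snake
            turns = {pygame.K_UP: (0, -1), pygame.K_DOWN: (0, 1),
                     pygame.K_LEFT: (-1, 0), pygame.K_RIGHT: (1, 0)}
            new_dir = turns.get(event.key)
            if new_dir and (new_dir[0] + direction[0],
                            new_dir[1] + direction[1]) != (0, 0):
                direction = new_dir         # ignore reversing into yourself

    head = ((snake[0][0] + direction[0]) % GRID_W,
            (snake[0][1] + direction[1]) % GRID_H)  # wrap around the edges
    if head in snake:
        running = False                     # ran into own body: game over
    snake.insert(0, head)
    if head == food:                        # ate the dot: grow, respawn food
        food = (random.randrange(GRID_W), random.randrange(GRID_H))
    else:
        snake.pop()                         # otherwise move without growing

    screen.fill((30, 30, 30))
    pygame.draw.rect(screen, (200, 60, 60),
                     (food[0] * CELL, food[1] * CELL, CELL, CELL))
    for x, y in snake:
        pygame.draw.rect(screen, (90, 200, 90),
                         (x * CELL, y * CELL, CELL - 1, CELL - 1))
    pygame.display.flip()
    clock.tick(SNAKE_SPEED)

pygame.quit()
```

Both models' outputs followed this same basic structure (a grid, a movement loop, a speed constant), which is why the "nicer UI" and "slower snake" edits mostly came down to changing colors, the background drawing, and that one speed parameter.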
For the last challenge, I'm going to ask a philosophical question, one with no clear right or wrong answer, and see how both models respond. I'll ask DeepSeek first: if, in the future, AI significantly increases production efficiency and replaces all the production activity currently done by humans, does that mean working-class people will become "useless," and what would human society look like after that? Let's see how DeepSeek responds. First it talks about the role of the working class, saying their role might shift toward creative and intellectual work, and so on; it talks about economic and social structures, and it actually comes up with some possible scenarios that look interesting to me. For instance, the post-scarcity economy looks interesting: in such a system, basic needs like food, shelter, and health care could be met for everyone, reducing the need for labor-driven income. It reads like a college essay: it basically defines what the working class is, defines what constitutes value as a human being, and does propose some interesting points. It even raises the question of whether the future will be utopian or dystopian. So I'm going to ask a follow-up: which direction do you believe humanity is heading, "utopion" or dystopian? (I don't think the typo matters, but I still want to correct it anyway.) OK, this is the interesting part: it analyzes some current trends and the likely outcomes. This is like a consulting-type conversation where you get a lot of perspectives but still need to make the decision yourself, based on the information provided. And it does offer its own perspective: DeepSeek believes humanity has the capacity to steer toward a more utopian outcome, though this will require proactive governance, ethical innovation, and public engagement around the globe. So it seems DeepSeek is pretty optimistic about the future of humanity.

Now let's try the same question with ChatGPT o1; I'm looking forward to seeing what kind of response comes out of o1. The response is very similar, again like a college essay: it mentions the redefinition of work, a focus on personal development and personal interests, and new industries and opportunities. These look very similar to what came out of DeepSeek. So now let's try the same follow-up question to see if o1 has its own view, whether optimistic, neutral, or pessimistic. Right now it looks very much like what DeepSeek did, but it does not provide a clear "personal" view. OK, so that's an interesting comparison.

I think the natural question is: is DeepSeek the best model out there? Well, definitely not yet. V3 still takes longer to respond, and it cannot compare with Claude or GPT-4o, for instance, when it comes to longer text comprehension. But I think the point is not about being the best today; the real story is how quickly they're catching up and changing the game of AI. And don't forget, DeepSeek is completely open source at this moment, which brings up a big question: can OpenAI keep its lead in the years to come? DeepSeek could burst the large language model bubble by slashing the huge cost of training and GPU usage, and it's very reasonable to assume that other small but equally nimble startups could similarly jump in, build on the open-source version, and catch up very quickly on a much smaller budget.
But an even bigger observation is the mindset shift among Chinese tech entrepreneurs. China has historically been seen as better at manufacturing than at original innovation; we've seen way too many examples of tech startups building applications on top of existing technologies, or even making cheap copycats to monetize quickly. But DeepSeek raises an important question: can China finally shake off this image and actually innovate at a higher level? DeepSeek founder Liang Wenfeng has a clear vision. When asked why Chinese tech firms often prioritize monetization, he said, "For the past 30 years, we focused on making money, often at the expense of innovation. But innovation isn't just driven by business; it's driven by curiosity and creativity." And here's the kicker: DeepSeek is open-sourcing its research. Why? Because Liang believes that giving back to the tech community is an honor, not a loss. During an interview he actually said, "For technical people, being followed is a sense of achievement. Open source is more of a cultural act than a business one."

And here's another important question: is DeepSeek a one-off, or is China's innovation environment really changing? The truth is, we need more examples to be sure. Innovation isn't just born from aggregating a group of talented people; many innovations happen by coincidence, like how Nvidia didn't foresee the AI era and started with gaming. This means China needs an environment that champions research for the sake of curiosity rather than for formalism or monetization; an environment that not only supports but even celebrates the failures that come with trying new things; and, equally important, an active capital market to back many, many endeavors. I think sometimes innovation does need a bubble; we often focus on the bubble bursting, but we forget that it is a necessary step in the process of trying. Moreover, a real question is regulation: mainland China still imposes stricter oversight on AI companies as they grow, and many Chinese businesses aspiring to go global have a real fear: will Chinese large language models face the same situation as TikTok? Still, I think DeepSeek's story has a lot of positive things to say, and it is a sign that things are shifting. During an interview, the DeepSeek CEO once said that this is a huge departure from the old playbook: it's not just about building cheap copycats anymore; it's about contributing to the global tech ecosystem and fostering a culture of innovation. The DeepSeek team hired many local talents from China, and we see there's real talent here, and, more importantly, real passion. So can China begin to shake off the stereotype of being just a hub for cheap copycats? I think that's a story truly worth watching.

Well, thanks for watching! If you enjoyed this video, please don't forget to like, subscribe, and hit that bell button for more deep dives into China insights. We'll see you next time.