Transcript for:
DeepSeek and AI Landscape Insights

So, everyone's seen the news about DeepSeek today. Is it as big a deal as everyone is making of it? Yes, it is. It's Sputnik 2.0. It is true that they spent about $6 million or whatever it was on the training; they spent a lot more distilling, or scraping, the OpenAI model. I can't speak for Sam Altman or OpenAI, but if I were in that position, I would be gearing up to open-source my models in response, because it's pretty clear you're going to lose that, so you might as well try and win all the users and the love from open-sourcing. Open always wins. Always.

Ready to go, Jonathan? Dude, I am so excited for this. I've heard so many good things from so many different people, so thank you so much for doing this emergency show with me today. No problem. But before we start, can I just say one thing? I think you have the most amazing, unique go-to-market that I've ever seen in my life for a podcast. I've never seen this before. I think your strategy is you're literally interviewing every single audience member, forcing them to watch videos and get addicted to you. I thought you were going to say my accent, but I'm totally going to take that, that's wonderful. And yes, you're absolutely right: sometimes the biggest benefits for your business you don't actually see until you do them, but at scale it's totally true.

But I do want to start. Obviously everyone's just talking about DeepSeek. A little bit of context: why are you so well placed to speak about DeepSeek? Let's just start there. Well, my background: I started the Google TPU, the AI chip that Google uses, and in 2016 I started an AI chip startup called Groq, with a Q, not with a K, which builds AI accelerator chips that we call LPUs. Fantastic, wonderful. I wish everyone was as coherent as you in their introductions. Okay, so everyone's seen the news about DeepSeek today. I want to just start off by asking: is it as big a deal as everyone is making of it? Yes, it's Sputnik. It is Sputnik 2.0. And even more so, you know that
story about how NASA spent a million dollars designing a pen that could write in space, and the Russians brought a pencil? That just happened again. So it's a huge deal. Yeah. Okay, why is it such a huge deal? Let's unpack that.

So up until recently, the Chinese models have been behind the Western models, and I say Western including Mistral as well and some other companies, and it was largely focused on how much compute you could get. Most people don't realize this: most companies have access to roughly the same amount of data. They buy it from the same data providers, and then they just churn through that data with GPUs, and they produce a model, and then they deploy it. They'll have some of their own data, and that'll make them subtly better at one thing or another, but they're largely all the same. And the more GPUs, the better the model, because you can train on more tokens. It's the scaling law. This model was supposedly trained on a smaller number of GPUs and a much, much tighter budget. I think the way that it's been put is less than the salary of many of the executives at Meta. And that's not true. There's actually an element of marketing involved in the DeepSeek release. What do you mean? Unpack that for me. Well, it is true that they trained the model on approximately $6 million worth of GPUs. They claim 2,000 GPUs for, I think it was, 60 days, which, by the way, don't forget, was about the same amount of GPU time as 4,000 GPUs for 30 days, which was the original, I believe, Llama 70B. Now, more recently, Meta has been training on more GPUs, but Meta hasn't been using as good data as DeepSeek, because DeepSeek was doing reinforcement learning using OpenAI. Is this the distillation? Just so I understand. Effectively, yes. So can you help me and the audience understand: what is distillation in this regard, and how has DeepSeek been using distillation to get better-quality output through OpenAI data? So it's a
little bit like speaking to someone who's smarter and getting tutored by them. You actually do better than if you're speaking to someone who's not as knowledgeable about the area, or who's giving you wrong answers. But first of all, before we get into any of this, I need to start with the scaling laws. These are like the physics of LLMs, and there's a particular curve. Tokens are sort of the syllables of an LLM (they don't match up exactly with human syllables, but kind of), and the more tokens that you train on, the better the model gets. But there are these asymptotic returns where it starts trailing off. The thing about this scaling law that everyone forgets, and that's why everyone was talking about how it's the end of the scaling law, we're out of data on the internet, there's nothing left, what most people don't realize is that it assumes the data quality is uniform. If the data quality is better, then you can actually get away with training on fewer tokens.

So, going back to my background, one of the fun things that I got to witness (I wasn't directly involved) was AlphaGo, when Google beat the world champion, Lee Sedol, at Go. That model was trained on a bunch of existing games, but later on they created a new one called AlphaGo Zero, which was trained on no existing games. It just played against itself. So how do you play against yourself and win? Well, you train a model on some terrible moves and it does okay, and then you have it play against itself, and when it does better, you train on those better games, and then you keep leveling up like this. So you get better and better data the better your model is: when it outputs something, the better the result, the better the data. So what you do is you train a model, you use it to generate data, then you train a model on that data and use it to generate more data, and you keep getting better and better and better. So you can sort of beat the scaling-law
problem. One quick hack to get past all of that stepping up: if there's a really good model already right there, just have it generate the data, and you go whoop, right up to where it is. And that's what they did. So it is true that they spent about six million or whatever it was on the training; they spent a lot more distilling, or scraping, the OpenAI model. So they scrape the OpenAI model, they get this higher-quality data from that and from refining it, and then they get higher-quality output? Correct, correct, correct.

And all of that said, they did a lot of really innovative things. That's what makes it so complicated: on the one hand, they kind of just scraped the OpenAI model; on the other hand, they came up with some unique reinforcement learning techniques that are so simple. It was so impressive, because I think a lot of people wanted to say, ah, blindly the Chinese copy and duplicate as they always have done. No, no, no, they came up with innovative stuff. Actually, the best way to describe it: have you ever taken a test, and you got an answer right, and your professor marked it wrong? Yeah. And then you go back to the professor and you have to argue with them and everything, and it's a pain, right? Well, if there is only one answer, and it's a very simple answer, and you say, write that answer in this box, then there is no arguing. You either get it right or not. So what they did was, rather than having human beings check the output and say yes or no, they said, here's the box. There's literally some code to say: here's a box, output the answer here, and then check it, and if it's correct, then we have the answer; if not, we don't. No need to involve a human. Completely automated. What about reward modeling? I read about their reward modeling strategies, and they innovated on this in such a unique way, did they not? Can you explain that for me? That area I'm not as familiar with, so I'm probably not going to, but maybe, since
you've been doing the research, why don't you tell me what you saw, and I'll tell you if it tracks. Essentially, they combined two different types of reward models to get higher, more accurate output, and that was what I didn't quite understand. Yeah, this is not an area I've dug too deeply into. Can I ask you, though: can OpenAI not just do distillation on DeepSeek's model, then? They don't need to, because they're actually better still. They're a little bit better. They could, but why would they? Did DeepSeek buy the GPU usage, or is that questionable? Dude, I don't think you'd have to distill it, because of the quality delta. However, I will say this: why would they try and smuggle in GPUs when all they'd have to do is log into any cloud provider and rent GPUs? This is like the biggest gaping hole in the whole way that export control is done. You can literally log in, swipe a credit card, and just pay and get GPUs to use. So are export laws unnecessary, then? They're good, but the problem is, it's like the Maginot Line: you just go around it. So you need to seal it up a little more. There's a little bit of room left to go here.

And then the other thing is, keep in mind, OpenAI was effectively subsidizing, accidentally, the training of this model, because they were using OpenAI, right? And, you know, rumors are that OpenAI may not be completely profitable yet in terms of every token in the API. On the subscriptions, maybe, but not in the API. So on each token that they generate, effectively, they were losing a little bit of money while DeepSeek was getting training data. Now, by the way, OpenAI probably still has that data. In theory, they could just train on it. George Chrisan said in a tweet today, though, that this would likely be a violation of US export laws. Do you think that's not true? I'm not aware of where it would be an export issue. I do know that many people log into cloud providers and just use them remotely. One
of the problems: so we actually block IP addresses from China, and I believe we might be unique in doing that. It's also a little bit fruitless, because someone can just rent a server anywhere and then log into us from there, and then there's nothing we can check. So I don't know that IP addresses are really the right way to do it anyway; I think we need something more sophisticated. But yeah, it's a big Swiss-cheese wall.

You said there about blocking IP addresses from China: there's a lot of concern about US customer data going back to China. Do you think that is a legitimate and justified concern? Yes. It's probably the most significant concern. There are other concerns, but that's probably the most significant, because people don't think. They're so used to using these services. When you use one of these other services, you might be shocked to hear this: when you say delete, what they do is they write "delete" right next to your data. They don't actually delete it; they just mark it deleted. When you later come back and ask for your data, they give it to you with the word "delete" right next to it. It's still there. And these are well-meaning companies. Do you really think the CCP doesn't have all your data and isn't going to look it up later? And some governments are more aggressive than others, right? And if they have access to your data, or not even your data, it could be your next-door neighbor's data. Your next-door neighbor might put something in there that accidentally gives information away that makes you more vulnerable. And then now the CCP has something. Maybe you had some package delivered, and they put a complaint somewhere, or whatever. You might not even do it yourself, but other people around you might, like the health data of a spouse. Jonathan, I'm going to avoid the British indirectness: do you think DeepSeek is an instrument that will be used by the CCP to increase
control on Western democracies? Yes, but I don't think it's DeepSeek that's doing it. You have to understand, any company that operates in China and Hong Kong (the, what was it, "one country, two systems" thing didn't quite work out as anticipated, or maybe as anticipated but not as stated) has no choice. And so in 2016, when Groq started, we decided that we were not going to do business in China. This was not a geopolitical decision; it was purely commercial. What it was was, we kept seeing companies like Google and Meta just fail over and over again trying to win in China, and the formula is actually pretty simple: you're not allowed to make net money. You're allowed to spend money in China, but the moment you start to become profitable, or anywhere near profitable, all of a sudden there's a thumb on the scale. So companies that manufacture a lot in China and send more money to China can actually be successful there; they can sell things there. It's a pretty simple formula: you must send more money to China than you take out.

But at the same time, they also require that you hand over all data, and not only that, they also require that certain answers be in a form that they find acceptable. So, for example, one of the more common ones that you see about DeepSeek right now is when you ask about Tiananmen Square. If the temperature is low on the model (and temperature, we don't need to get into it, it's complicated, but low means low creativity), then it's actually going to give you an answer that basically says, I don't want to talk about that, it's a sensitive topic. But ask it about other things that are sensitive topics elsewhere in the world, and it'll just answer. But what happens if the CCP requires that they start to say: what about TikTok, should it be banned? Absolutely not, here's why. And it gives you a cogent reason. That's kind of scary, Jonathan. What do we do from here? I share your concerns
completely. My challenge is, TikTok you can ban and shut off. They would not sell the algorithm; that is a closed-end product that we can ban tomorrow if we really want to. Here, it's open source. Yeah, and worse. So, up until recently, we refused to run any Chinese models, and we had to make a very difficult decision on DeepSeek. We now have it on our API, you know, at Groq. So why did you decide to break the rule for DeepSeek? What it came down to was, when we saw DeepSeek become the number one app on the App Store, the realization was people were going to be putting their data in there, and what we want to make sure is that you actually have an option. So we store nothing. There is no "delete" or whatever; we store nothing. We don't even have hard drives. We just have DRAM, and when the power goes off, everything goes away. So we wanted to make sure there was an alternative, where when you use DeepSeek's model, your data is not going to the CCP.

Well, right now the CCP is probably going to be taking the safeties off the weapons. They're going to be like, why are you making this model open source? Please direct your data towards us. Go win a bunch of customers this way, but now we want the data. And so they're going to change the strategy. But remember, DeepSeek is a real, I mean, it's a hedge fund. They're doing this themselves, and they're just influenced by the CCP. And the CCP, now that they've seen the success of this, might see it as yet another TikTok. One hundred percent, they will see it as another TikTok. My question to you is, how long is it before the US reacts to prevent this? It's hard. So the first thing is, one question to ask is: are we going to be talking about DeepSeek, or R1, for the next six months? And the answer is absolutely not. We might be talking about R2 and R3 and R4, but R1 was one shot. The question is, are they going to keep coming up with very interesting things? Are we going to, you know,
cat-and-mouse it? And is everyone going to learn from this? The biggest problem is, this has just made it absolutely, nakedly clear that the models are commoditized. You've been asking the question, right? If there was any doubt before, that doubt's over. So what is the moat? For me, I love Hamilton Helmer's Seven Powers, one of my favorites. I do it for every single investment we make; every single person must fill it out. So marketing is the art of de-commoditizing your product, and the Seven Powers are seven great ways to de-commoditize your product: scale economies, network effects, brand, counter-positioning, cornered resource, switching costs, process power. So the question is, who's going to do what? OpenAI, and you've got to give Sam Altman and that team credit, they've got amazing brand power, like no one else in this space, and that's going to serve them for a really long time. But what you see Sam trying to do is scale. That's why we hear about Stargate, $500 billion. That's the power he would like to have, but the power he has right now is brand. He's trying to bridge that.

But what about the others? I'm sorry, does this news not ridicule the $500 billion announcement? At a time when we've seen an increase in efficiency at a scale like never before with DeepSeek, today the $500 billion seems ridiculous. Actually, I don't think it's enough spending, and the reason is, we saw this happen at Google over and over again. So we did the TPU. Why did we do the TPU? The speech team trained a model; it outperformed human beings at speech recognition. This was back in 2011, 2012. It was the first time. And so Jeff Dean, the most famous engineer at Google, gives a presentation to the leadership team. It's two slides. Slide number one: good news, machine learning finally works. Slide number two: bad news, we can't afford it. And
we're Google. We're going to need to double or triple our global data center footprint, at probably a cost of $20 to $40 billion, and that'll get us speech recognition. Do you also want to do search and ads? So it turns out there's always this giant "mission accomplished" banner every time someone trains a model, and then they start putting it into production, and then they realize, oh, this is going to be expensive. This is why we've always focused on inference. So now think about it this way: at Google, we always ended up spending 10 to 20 times as much on the inference as the training, back when I was there. Now the models are being given away for free. How much are we going to spend on inference? And now with test-time compute, I've asked questions of DeepSeek where it took 18,000 intermediate tokens before it gave me the answer. I think Jensen said that now half of their revenue is from inference. Yeah. So what does that look like in the future, then? I think 95%. It just makes sense, right? You don't train to become a cardiovascular surgeon and then train for 95% of your life and perform for 5%. It's the opposite: you train for a little while, and then you do it for the rest of your life.

So can I ask, do you think the US should put sanctions on DeepSeek to prevent the CCP using it for data capture on US citizens? I don't know what the solution is. There's carrot and there's stick. So you can either use a stick and block it. I mean, that might be effective; I don't know that the US has really done that before. I'm not aware of a case; it may be possible that it's happened. There's also the carrot, which is, it's kind of interesting how it's being offered for free in China, and not just in China but to anyone else, and then others are doing that too. Is it possible the CCP is underwriting that because they want the data? In which case, dude, they're doing it with the car industry. The
subsidization of Chinese cars, BYD, SAIC, is absolutely destroying the European car market. It's absolutely that. The thing is, we have a lesson from the Cold War, which was mutually assured destruction. The problem is, we do some sort of tariff and then they do a tariff back. There needs to be some sort of automated response: if you do this, we will respond. If you subsidize this industry, we will automatically subsidize the equivalent industry, just automatically. So don't do it, because there's no benefit to you.

Does the fact that it's open source change everything? I mean, it's the only reason people are using it. If it wasn't open source, it wouldn't have gotten the excitement. And open always wins. Always. And keep in mind, Linux won back when people didn't trust open source. They thought it was less secure, the features were worse, it was more buggy. And it still won. Now people expect open to be more secure, less buggy, and have more features. So how is proprietary ever going to win? Everyone always says that distribution is one of the major advantages that ChatGPT, and hence OpenAI, has, especially over the other providers. Every single day that DeepSeek is out and being used so pervasively, it is diminishing the value of OpenAI. Agree or disagree? Agree, especially on the pricing, because they're losing their pricing power on this. I can't speak for Sam Altman or OpenAI or anything like that, but if I were in that position, I would be gearing up to open-source my models in response, because it's pretty clear you're going to lose that, so you might as well try and win all the users and the love from open-sourcing. Otherwise, I mean, you're already at a point where you're going to be using your other powers, like brand and so on. I don't know why you'd try and keep that internal anymore. Would that be possible, and would that not cannibalize their current main line of revenue? But how would it cannibalize it
any other way? Remember, people like distribution. How many people are going to buy something because they trust Dell? People trust Dell. Dell has earned their reputation over the course of decades. Supermicro builds some interesting hardware, but look at what they've been going through recently. There's a pro and a con: cheaper versus trusted; you've got to make a decision. And OpenAI has been around for a while; most people think of them synonymously with AI. They could just switch to DeepSeek's model and people would still use them. It's brand; it's one of the Seven Powers. So if you were running OpenAI in Sam's seat today, you would switch to open and offer it for free? I would. And there's probably more cleverness; they could probably strike some deals before they do it, or whatever, but that would be the move that I would make. And also, it would be a position of strength. The only problem is the timing, because if it happens right after DeepSeek, it looks like a response as opposed to an intentional thing. So I don't know how you do that. But it is a response, so just own that it's a response. Yeah, maybe that's a good one. You just say, look, we had to respond. We're better; let's see which model people choose.

What do you think is the internal discussion within OpenAI today? I would imagine it depends on where you are. If you're senior, then you're going to have very different concerns than if you're at the foot-soldier level. At the foot-soldier level, you're going to be worried: is my equity going to be worth anything, is there any longevity here, how do I do my job, am I going to have a job? If you're further up, then it's going to be more like: how do I keep everyone, how do I keep the morale up, what is my response? And then you're going to have a lot of very difficult decisions in front of you. And the number one driver of bad decisions is fear. So what they have to do is pick
something, then they have to just commit to it hard and be brave about it. And there are so many different decisions that work if you commit and align. It's all about the alignment.

How do we think about Meta? Meta shares the open-source values that DeepSeek has espoused. Does this help or hurt Meta? That's a good question. I think one of the ways that we've been looking at LLMs is a little bit like an open-source software project, like Linux or something. The thing is, Linux has switching costs, and I think what we've discovered is LLMs have no switching cost whatsoever. It's why the analogy to cloud doesn't hold up at all. Everyone's like, oh, it's like cloud, there are going to be a couple of cloud vendors and they're going to win. No: you don't change your cloud very often. Okay, so let's start mapping the Seven Powers to the top tech companies. I would say Microsoft's biggest strength is switching cost. Look, I love Microsoft as a company, but you go into a room full of people and ask, who uses Microsoft? A bunch of hands go up. Who likes using Microsoft? Hands go down. It's very largely switching cost. So you go into, you know, gen AI: is that a thing that gets disrupted? You look at Meta: it's network effects. They could literally give every piece of technology away for free. I am completely jealous of that, because if I had that right now, I would open-source everything, because then you don't have to worry about it, and you get everyone helping you. So I think Meta, because of the network-effect thing, is always in a position where open source is to their advantage. It almost doesn't matter where it comes from. Now, I'm sure they would prefer to have the Linux of LLMs, but I think the more it goes open source, the more of an advantage they have inherently. If you were Meta, would you do anything different? You know, Meta is an amazing competitor. I think what they
would normally do, if this were some sort of proprietary social mechanism, is they would try to replicate it, and then they would compete, and they would say, come join, or not. I don't think the come-join works here, but the beautiful thing is, all of the information for this model is available, and they've already been doing this. They have way more compute. The question is, are they willing to scrape OpenAI like DeepSeek did? And I don't think they are. I think they've been super careful in everything they've been doing, and that's the disadvantage. I'm not being rude, but would they not put morals aside to win? This is the AI arms race. I think that's going to happen. I think people will; like, you cannot lose. And so what it's done is it's changed the game.

Okay, so let's talk about Europe for a minute. We almost forgot about Europe. Yeah, we're kind of used to that; now we just sit and watch with an espresso. So for me, watching everything, it feels like with Europe there's a lack of a willingness to take risk. There's a black mark if you get it wrong; everything's about downside protection. Whereas in the US it's like, that was a great effort, you failed, but I'm going to fund you again. So there's that difference. But when you look at the US and then you look at China: China practices RD&T, research, development, theft. It's just part of the culture, and it's not just against Western companies; it's against each other too. The difference is, if you're a Western company, then the government steals from the Western company and provides it to the Chinese companies, which is less fair. The famous stories of turning on Huawei switches and you see Cisco's logo and all the bugs, right? So is that the new paradigm? I really hope not, because for Europe to compete with the US, Europe has to adopt a more risk-on attitude. Does the West have to adopt a more theft-on attitude? I really hope not. That's just viscerally disgusting to me. I'm literally
repulsed by the idea. I'm not being rude, but are we not being idealistic? If you're running in a race with someone who's willing to take steroids, and you want to win, you're going to have to take steroids too. And then everyone is taking steroids, whereas if no one was taking them, then everyone's healthier and you have a real competition. Yeah, it's a real problem, and the question is, can governments get involved? Here's the thing: I would love nothing more than to compete directly with Chinese companies on a fair footing. They have really smart people; DeepSeek has proven this. Really smart people. But when the government keeps putting its thumb on the scale, we're going to try to avoid that competition wherever we can. And now there's no avoiding it, so maybe the governments just have to get involved. But dude, I'm being blunt: Xi Jinping cares about one thing. Power retention and growth is the only thing that matters to him, and AI is central to that. He will do whatever it takes to win. Having some rational discourse about rules of play is, bluntly, unrealistic.

Okay, and it gets worse than that. So China has a lot of advantages, but the chief advantage is the number of people they have. Now, number of people is not sufficient. You also have India, and India has an advantage from the number of people, but China has out-executed; in fact, India was asking China for some time to help build out the roads and infrastructure. They've really mastered that, right? But people, and organization, discipline, alignment. What is the concern with AI? The concern with AI is, what if an LPU or GPU becomes the equivalent of a contributor to the workforce, and you can literally just add more to the GDP by creating more chips and providing more power? If that becomes the case, does China's advantage erode? So they're concerned that, in terms of workforce, the US could catch up, the West could catch up. And at the same time, they have a huge population advantage, and this is why I so
much want Europe to get into the fight on AI. You know, there are 500 million people who could be jumping into this. If you were to advise the EU today, what would you say? I would say, so, have you ever seen Station F? Yeah, of course, I was there last week; we hosted there. Okay, so I would say, by the end of this year you should have 100 Station Fs, and by the end of next year you should have a thousand. Done. So what you're doing is you're collecting up 3,000 people and surrounding them with other risk-taking entrepreneurs, and then they're supporting each other, and they're risk-on. When you surround yourself with other people who are risk-on, you're going to be risk-on, and you're going to take the entrepreneurial leap.

What does this space look like in three years' time? I'm obviously a venture capitalist for a living; all of my friends are going, oh my God, oh my God, we just lost hundreds of millions of dollars on these foundation-model companies. How many companies are you aware of that became incredibly successful without pivoting? Few; most pivot. Yeah, exactly. So pivot. Get over it; just pivot. Frankly, I've been talking to a lot of the LLM companies, and they have some good ideas. In fact, I watched your interview with the Suno founder, and I think he saw it from the beginning: models are going to be commoditized, and that's why he's focused on the product. He got it from the beginning. What is your product, not what is the model? The model is a piece of machinery; it's an engine. But what is the car? What is the experience? What do you think Perplexity is in three years? The question I used to get asked, when we were raising money a little while ago, was: is AI the next internet? And I'm like, absolutely not, because the internet is an Information Age technology. It's about duplicating data with high fidelity and
distributing it. It's what the telephone does, it's what the internet does, it's what the printing press did. They're all the same technology, just at much different scale, right, and speed and capability. Generative AI is different. It's about coming up with something contextual, creative, unique in the moment, right? And so the LLM is just the printing press of the generative age. It's the start of it, right, and then there's going to be all these other stages. Like, just imagine trying to start Uber when we didn't have mobile yet. Great, I'm going to book a trip over to here, how do I get home? Like, you can't carry a desktop with you, right? So you need to be at the right stage. So when I look at Perplexity, I look at Perplexity as being perfectly positioned for the moment that the hallucination, or really confabulation, rate comes down, because the moment that these models get good enough where you don't have to check the citations anymore, that's going to open up a whole set of industries. All of a sudden you'll be able to do medical diagnosis from LLMs, you'll be able to do legal work from LLMs. Until then, it's like trying to create Uber before we had smartphones, it just doesn't make any sense. However, people are willing to pay for Perplexity today even though you have to check the citations, so they have an actual business, so they're getting to sort of ride the wave, and the moment that that tsunami of lack of confabulation or hallucination comes along, they're perfectly positioned. Does Mistral survive? Each company has to find their own thing, right? And I would look at Suno as a great example of how things are being done around the product as opposed to just the models. But is it possible to pivot when you are OpenAI or Anthropic or any of the very large providers? You've ingested billions of dollars. Disruption happens. If you're not able to pivot now, you're not going to be able to pivot later when you get disrupted anyway. One
would think that with commoditization of models and with cheaper inference, actually big tech wins, right? Have you seen the stock market today? They've been hit hard. How do you think about that? What you see is a bunch of people who are concerned about training and the need for it, and everyone still thinking that most of compute is training, and that there's going to be less of it because someone trained a model on 2,000 GPUs, and the nerfed A800 version with slower memory or whatever it is, and they're like, oh, people aren't going to need as many chips. But again, like, Jevons Paradox, right, which is the more you bring the cost down, the more people consume. So for the last five to six decades, like clockwork, once a decade the cost of compute has gone down 1,000x, and people buy 100,000x as much compute, spending 100 times as much. So every decade they spend 100 times as much. So you make it cheaper and they want more. And so what's really happening is every time one of these models gets cheaper, we see our developer count just skyrocket, it just goes up, and then it comes back down a little bit, but the slope is higher than when it started. So better models create more demand for inference, more demand for inference then has people going, I should train a better model, and the cycle continues. I just bought a shitload of Nvidia, yeah, because they dropped 16% on the thesis that the increasing efficiency means that obviously we wouldn't need as many Nvidia chips, and I thought exactly that, which is like, you'll still need the Nvidia inference and you'll just have much higher usage, so to me it's the most screaming buy of the century. Do you share my optimism on Nvidia, given what you just said on Jevons Paradox? So I think over the long term, the only thing I'd say is, you know, Warren Buffett and Charlie Munger: in the short term the market is a popularity contest, in the long term it's a weighing machine. I can't tell you about the popularity contest, but in terms of the
weighing machine part, like, this is a misunderstanding. It's actually more valuable thanks to DeepSeek, not less valuable. Okay, so Jevons Paradox was actually discovered by Jevons, and, you know, as recently made famous in Satya's tweet. However, I did beat him to that by quite a bit, and just as Satya likes to say that he made Google dance, I'm going to say I made Satya dance, right? But he might take exception to that. But, you know, less than a month before he posted that, I did a cute little tweet on it. So what's really happening here: in the 1860s this guy Jevons, he actually wrote a treatise on steam engines, which I guess is what you did for fun back then in England, and he realized every time steam engines became more efficient, people would buy more coal, which is the paradox. But if you think about it from a business point of view, when the opex comes down, more activities come into the money, so people do more things, right? And so what's happened is every time we've seen the cost of tokens for a particular level of quality of models come down, we've actually seen the demand grow significantly. Price elasticity, baby, right? And so, you know, a lot of people point to Nvidia's incredibly high margins, which I'm going to butcher, I can't remember what it was in the latest release, it was something like 45 or whatever it was, but it was very, very high, and then relate it to 'your margin is my opportunity.' I'd tie it back to the Seven Powers and go, their margin is their defensibility, and it makes me really just consider the strength of their moats. Do you think 'your margin is my opportunity,' or do you think that defensibility is that margin today? There's this wonderful business selling mainframes with a pretty juicy margin, because no one seems to want to enter that business. Training is a niche market with very high margins, and when I say niche, it's still going to be worth hundreds of billions a year long term, but inference is the larger
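As an aside, the Jevons arithmetic quoted above (unit cost of compute down roughly 1,000x per decade, total spend up roughly 100x) can be checked in a few lines; these are the speaker's round numbers, not measured data:

```python
# Jevons Paradox arithmetic, using the speaker's round numbers from above:
# per decade, the unit cost of compute falls ~1,000x while total spend rises ~100x.

cost_drop_per_decade = 1_000    # unit cost of compute falls 1,000x
spend_growth_per_decade = 100   # total spend rises 100x

# units consumed = spend / unit cost, so consumption grows by the product
consumption_growth = spend_growth_per_decade * cost_drop_per_decade
print(f"compute consumed grows ~{consumption_growth:,}x per decade")  # ~100,000x
```

The point of the multiplication is that falling prices and rising spend compound: cheaper tokens do not mean fewer chips, they mean vastly more consumption.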
market, and I don't know that Nvidia will ever see it this way, but I do think that those of us focusing on inference and building stuff specifically for that are probably the best thing that's ever happened for Nvidia stock, because we'll take on the low-margin, high-volume inference so that Nvidia can keep its margins nice and high. Do you think the world sees this? No. And actually, we raised some money in late 2024, and in that fundraise we still had to explain to people why inference was going to be a larger business than training, and remember, this was our thesis when we started eight years ago. So for me, I struggle on why people think that training is going to be bigger, it just doesn't make sense. Dude, for anyone who doesn't know the difference between training and inference: training is where you create the model, inference is where you use the model. You want to become a heart surgeon, you spend years training, and then you spend more years practicing, right? Practicing is inference. And I'm thrilled to hear you share my optimism around Nvidia. Where does efficiency go from here? Everyone was so shocked by how R1 is so much more efficient, and what we've seen from it. What next? What you're going to see is everyone else starting to use this MoE approach. Now, there's another thing that happens here. So the MoE approach, just so I understand, is like the segmentation of where information goes, so it's routed to the optimal point of the model? Yeah, it's called MoE, it stands for mixture of experts. So when you use Llama 70 billion, you actually use every single parameter in that model. When you use Mixtral 8x7B, you use two of the roughly eight 7B experts, although there's some shared weights on top of that, but it's much smaller. And effectively, while it doesn't correlate exactly, it correlates very closely: the number of parameters effectively tells you how much compute you're performing. Now, let's take the R1 model. I believe it's
about 671 billion parameters, versus 70 billion for Llama, and there's a 405 billion dense model as well, right? But let's focus on 70 versus 671. I believe there's 250 experts, each of which is somewhere around two billion parameters, and then it picks some small number, I'm forgetting which, maybe it's like eight of those, maybe it's 32 or 16 of them, whatever it is, and so it only needs to do the compute for that. So that means that you're getting to skip most of it, right? Sort of like your brain: not every neuron in your brain fires when I say something to you about, you know, the stock market, right? Like, the neurons about, you know, playing football, those don't kick off, right? And so that's the intuition there. Now, previously it was famously reported that OpenAI's GPT-4 had, I believe it started off with something like 16 experts and they got it down to eight, I forget the numbers, but it started off larger and they shrunk it a little, and they were smaller, whatever. And then what's happened with the DeepSeek model is they've gone the opposite way, they've gone to a very large number of experts. The more parameters you have, it's like having more neurons: it's easier to retain the information that comes in. And so by having more parameters, they're able to get good on a smaller amount of data. However, because it's sparse, because it's a mixture of experts, they're not doing as much computation, and part of the cleverness was figuring out how they could have so many experts, so it could be so sparse, so they could skip so many of the parameters. But if we take that back to where we are and how they've become so efficient, what's the next stage of that then? Because they have all the experts and they can route it so efficiently, what now? Here's a fun one. So Meta recently released their Llama 3.3 70B and it outperformed their 3.1 405B. So their new 70B
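A rough sketch of the sparse-versus-dense arithmetic being described: only the selected experts' parameters are touched per token, so a huge MoE model can do less compute per token than a much smaller dense model. The expert counts and sizes below are illustrative stand-ins for the half-remembered figures in the conversation, not published specs:

```python
# Why a sparse mixture-of-experts (MoE) model can do less compute per token than
# a smaller dense model: only the routed experts' parameters are used.
# All numbers below are illustrative assumptions, not exact model specs.

def active_params(experts_per_token, params_per_expert, shared_params=0.0):
    """Parameters actually touched per token in an MoE model."""
    return experts_per_token * params_per_expert + shared_params

dense_70b = 70e9  # a dense model touches every parameter on every token

# hypothetical MoE in the spirit of R1: many ~2B experts, only a handful active,
# plus some always-on shared weights (the 4e9 shared figure is an assumption)
sparse_active = active_params(experts_per_token=8,
                              params_per_expert=2e9,
                              shared_params=4e9)

print(f"dense compute/token  ~ {dense_70b:.0e} params")
print(f"sparse compute/token ~ {sparse_active:.0e} params")
```

Under these assumptions the sparse model touches about 20B parameters per token despite holding hundreds of billions in total, which is the "skip most of it" effect described above.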
outperformed their 405, and what was surprising to me, I thought they retrained it from scratch. It turns out, you read the paper and they talk about how they just fine-tuned, so they used a relatively small amount of data to make it much better. Again, this goes to the quality of the data: they have higher quality data, they took their old model, they trained it, it got much better, and that new 70B outperforms their previous 405B. And so what you're going to see now, now that everyone has seen this DeepSeek architecture, is they're going to go, great, I have hundreds of thousands of GPUs, I'm now going to use a lot of them to create a lot of synthetic data, and then I'm going to train the hell out of this model. Because the other thing is, while it sort of asymptotes, the question is, on this curve, where do you stop? It depends on how many people you have doing inference. You can either make the model bigger, which makes it more expensive, and then you train it on less, or you make it smaller and it's cheaper to run, but you have to train it more. So DeepSeek didn't have a lot of users until recently, and so for them it would have never made sense to train it a lot anyway, they would much rather have a bigger model. But now what you're going to see is all these other people either making smaller models or trying to make higher quality ones of the same size, but just training them more. Okay, we've seen DeepSeek now say, hey, only Chinese phone numbers can sign up, that's for new signups I think. What's happened, and what is the result of that? So they ran out of compute, and this is why, this is the other reason why chip startups are going to do just fine, because they ran out of inference compute. You train it once, so you spend money to make the model, like designing a car, right? But then each car you build costs you money, right? Well, each query that you serve requires hardware. Training scales with the number of ML researchers
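The bigger-model-versus-more-training tradeoff described above can be sketched with the standard rough FLOP estimates (about 6·N·D to train N parameters on D tokens, about 2·N per generated token to serve). The parameter and token counts below are invented purely to illustrate the crossover:

```python
# Lifetime compute for a model: one-time training cost plus per-token serving cost.
# Rough standard estimates: training ~ 6*N*D FLOPs, inference ~ 2*N FLOPs/token.
# All counts below are made-up illustrations, not real model figures.

def lifetime_flops(n_params, train_tokens, served_tokens):
    return 6 * n_params * train_tokens + 2 * n_params * served_tokens

# two hypothetical routes to a similar quality level:
big   = lambda served: lifetime_flops(670e9, 1e12,  served)  # big model, trained less
small = lambda served: lifetime_flops(70e9,  15e12, served)  # small model, trained more

for served in (1e12, 1e15):  # few users vs. many users
    winner = "big" if big(served) < small(served) else "small"
    print(f"{served:.0e} served tokens -> {winner} model is cheaper overall")
```

With few users the big, lightly trained model is cheaper in total; at scale the extra training of the small model pays for itself, which is the tradeoff the speaker attributes to DeepSeek before it had many users.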
you have, inference scales with the number of end users you have. Do you think DeepSeek are truly astonished by the response they've got from the global community, or do you think they knew this would happen? I think they marketed very well. Like, you look at some of the publications and they make it sound like it's a philosophical thing, and, you know, they talk about how they spent six million on the GPUs, and everyone just zoomed in on that, neglecting the fact that Llama's first model was trained on, I think, five million worth of GPU time, and it set the world on fire in a good way, and then ignoring the fact that they spent a ton generating the data and all this. They're really good at marketing. I think they were probably surprised at how well it worked, but I think this is what they were going for. Is there anything that I haven't asked, or we haven't spoken about, that we should? Well, yeah, maybe ask, like, what's up with the $500 billion Stargate effort? Okay, what's up with the $500 billion Stargate effort? Do you buy those numbers or is it [ __ ]? I've gone back and forth on that. So Gavin Baker tweeted some math, and before I saw that tweet I came up with very similar math, like spookily similar math. However, talking to some people in the know, some of the comments are, actually, they've got it. But then you keep pressing, and it's like, well, maybe, is there some cutesiness to it? What I think it is, is an acknowledgment that the models have been commoditized and infrastructure is what's important in terms of maintaining a lead. Like, scale, it's one of the Seven Powers, and so I think what you're seeing there is an attempt to move from having a cornered resource, or something like that, into scale economies. Do you think it will work? I don't think you get there in a short period of time with GPUs, because most of the compute is inference, and so, you know, if you're talking about building out all the power, building all of that, it's going to take time. It's infrastructure, it's
capex. I think the real win here is brand. That's what I would be doubling down on. I would be hiring the best brand firms I could, I would do a complete makeover. Will OpenAI have a stronger or weaker brand in three years' time? Much stronger. I think they're going to double down on that and they're going to focus on it. Wow. Who will lose? People who can't adapt to disruption. Anyone who just wants to keep going in a straight line and do what they were doing before is going to lose, and the rate of disruption is probably going to increase. So think about it this way, going back to the analogy of LLMs being the printing press: imagine if there were a couple of smartphones left over from an ancient civilization. All of a sudden the printing press is invented and you're like, ooh, Uber's coming, I want to position for it, right? I know where this is going. We are the smartphones. We know where generative-age technology goes, right? And now everyone's like, well, we know how big this gets, let's put money into it, right? I can't be the one who doesn't spend money on this, because I know how big of an advantage it's going to be. It's like getting to add more workers to the workforce. And so I just think the generative age, we're going to speedrun it faster than whatever comes next, because we know what it looks like. Is there any chance we see a plateau? We saw it in self-driving, for example, where we kind of went through this desert of a lack of progression, and then suddenly, all at once, it came. Will we see that, or will we just see this continue? I think with self-driving, the problem you had was the threshold had to be way superhuman, because if you look at the number of miles driven by these self-driving vehicles, it's an enormous number, and the number of fatalities and incidents is lower per mile, but we have no tolerance whatsoever for them when it's a machine. When you're writing poetry and code it's very different, right, versus doing a
surgery or driving a car. If you're Elon and xAI, how are you feeling, and do you feel better or worse post this? Well, I would probably feel both better and worse. I would feel better about my bet on building out more hardware. I would feel worse about trying to build out my own model. Like, why is Elon doing that? There's plenty of models, like, just pick one up off the ground, why are you making your own? Are you excited when you look forward at the next few years, or are you quite nervous? You could say this is a time of heightened international warfare, in terms of this new AI arms race, China stealing everything, the US forced to steal back. Long ago I stopped having, like, good days and bad days. It's how many good things, it's how many bad things, right, when you run an organization. And so I'm both excited and nervous, and I'm excited and nervous about different things at the same time. The thing that I am most nervous about is that, unlike nuclear war, you can use AI tools to attack each other. Google just announced recently the first zero-day exploit found by an LLM that was previously unknown. Yeah, that's a scary one. So, Jonathan, for anyone who doesn't know, what's a zero-day? So, how would you like me to have access to your phone? Not ideal. How would you like the CCP to have access to your phone? Even less, right? That's a nation state, and nation states have a lot of resources, and if they stand up a bunch of compute and they start scanning for vulnerabilities in all the open source that's out there, and not even just the open source, just scanning, you know, ports on the internet and trying to figure out if they can break in, they can just automate that now. They don't need to hire people to do that, and now the defense has to be automated, because there's no way to keep up with automated attackers. And what happens if this gets out of control? But worse, it's small enough, like, it's not killing anyone, and it's also deniable. That's the hardest part about
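The kind of automated scanning being described is mechanically trivial, which is the speaker's point. As a minimal sketch, here is a loop that checks whether a few TCP ports accept connections on a host you control (only ever run this against your own machines; scanning others' systems is illegal):

```python
# Minimal sketch of automated port checking: try TCP connections to a list of
# ports and record which ones accept. Illustration only; use on your own hosts.
import socket

def open_ports(host, ports, timeout=0.3):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                found.append(port)
    return found

if __name__ == "__main__":
    print(open_ports("127.0.0.1", [22, 80, 443, 8080]))
```

The asymmetry the speaker worries about falls out directly: a loop like this costs nothing to scale up, so defense has to be automated too.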
it, because, is it, I mean, is it really China, is it Russia, is it North Korea, is it a friendly that's making it seem like it's one of them, or vice versa? So now you have this ability. So you go from where we had a cold war, because having a war was unconscionable, it was unthinkable because of the consequences, to now, yeah, I'm just hacking you, and that could spiral out of control. So I'm worried that we're going to have more back and forth. And think of it this way: if you are a nation state, let's say that, Harry, you know, you're a beacon to the venture community and you want to rally the European entrepreneurs to be risk-on, right, and I'm someone who doesn't want that, because I don't want the competition, a country that doesn't want that, maybe I sully your reputation, maybe I make you persona non grata. And how is that any worse than shooting someone? It could be worse in some ways, but you can get away with it, and so that has me nervous, really nervous. But I'm also really excited, because we are seriously going to be able to innovate as fast as we can come up with ideas now. You're not going to have to implement things, you're going to be able to prompt-engineer your way through things. Just as we moved from hardware engineers to software engineers and sped up productivity, right, you're now just going to be able to have a prompt engineer who doesn't even write software. One of our engineers made this app where you can just describe what you want built and it builds it, and because, you know, we're so fast, it's like that, and you just iterate and it'll build an app for you, and, like, crazy things, it'll just, yeah. I'm sorry to just continue, but I don't understand where the value accrues then, because you mentioned there, kind of, hey, they created this tool which allows you to prompt and it'll build the app. I'm sure you've seen bolt.new, I'm not sure if you've seen Lovable, where it's basically ChatGPT but for kind of website creation, in its bluntest terms. Is there value in that? Everyone was like, there's no value in these wrapper apps, everyone's like, there's no value in these foundation models, so where the [ __ ] is the value? And that's part of the exciting part, it's discovering that. But I think people will always prefer to use the highest quality, most polished product. I think there is an opportunity for artisanship, craftsmanship, right, and just perfecting it, right, and getting to a certain number of nines in the details. I mean, like the Eames quote: the details aren't the details, the details are the thing. I used to be a little concerned with the quote, you know, if you're not ashamed of the quality of your first release, you've waited too long, because there's a subtlety and nuance there. There's soundness, and then there's completeness. What you want is an incomplete product, something that doesn't do everything. That's why you should be embarrassed, but it shouldn't, like, blue-screen-of-death on you, right? That's not a good embarrassment, right? And so what you're going to see now is, because it's so easy to come up with something that just kind of works, it's a little embarrassing but it kind of works, people are really going to value well-crafted, high-quality products. Jonathan, I cannot thank you enough for breaking down so many different elements for me and putting up with my basic questions. You've been fantastic, and honestly, I so appreciate the short notice. No problem, and good luck, and have fun out there. I mean, this is a brand new age. It really is.