Transcript for:
Nvidia's Pivotal Role in AI Advancement

I just got back from GTC, and I'm convinced that Wall Street does not understand Nvidia. That's because the key to finding great stocks is understanding a company's products, not just their profits, and after everything I learned, I'm convinced that Nvidia will be the first company on Earth to hit $5 trillion in market cap. Let me show you why. Your time is valuable, so let's get right into it.

This is actually my third time trying to make this video. Here's what happened: Nvidia has a huge ecosystem of software and hardware that all work together, kind of like a massive puzzle where every single piece matters, and at the conference last week, Nvidia showed off tons of changes and explained how they all impact the overall AI puzzle. But if I walked you through every innovation, this video would be around 80 minutes long. That was take one. On the flip side, I could have just reacted to Jensen Huang's keynote presentation like everybody else, pointed out what Wall Street analysts are missing by only looking a couple of quarters ahead, and called it a day. And to be honest, that was take two. But then why even go to GTC? Why did I spend so much time interviewing Nvidia's executives, talking to their infrastructure partners, and attending expert sessions if none of that is going to turn into value for you, the viewers that got me into the conference in the first place? So for my third try, I just want to answer one simple question: why does any of this AI stuff even really matter? Because if the picture on the puzzle doesn't matter, then we shouldn't bother understanding the pieces, let alone investing in them. Don't worry, I won't make you wait until the end of the video to get the answer.

This is a slide from a GTC panel that I attended by Jonathan Godwin, the CEO of Orbital Materials and the former lead for AI and materials at Google DeepMind: "The history of human development over the past 100 years has almost been the story of discovering these materials." The x-axis in each chart is time in months, and the
Y-axis is basically the impact of generative AI on the discovery of new materials in chart one, the resulting increase in patent filings in chart two, and the rise in new product prototypes in chart three. The first 12 months give us a baseline for the average amount of progress in each area, and the next 20 months show the impact of generative AI. And the impact has been massive. Six months after adopting AI, the scientific community started discovering a record number of new advanced materials, advancing the state of the art in transistor-based chips as well as quantum computers, solar cells for energy generation, solid-state batteries for energy storage, and lightweight composites and high-temperature ceramics for huge industries like aerospace and robotics, the same exact industries that we invest in. Just eight months after adopting generative AI, materials scientists were filing a record number of new patents, and by month 17, they were testing a record number of new product prototypes.

Now let me point out a few very important things here. First, I only saw this chart because I attended this specific session at the conference in person. That's why doing boots-on-the-ground research is so important, and why I think so many Wall Street analysts still don't understand Nvidia. Second, there were many more charts like this across many more industry-specific sessions at GTC. Not just materials science, but gene sequencing and drug discovery, signal processing and communications, computer drafting and product design, physics-based modeling and simulation, movies and video games, and the list goes on and on. Almost every industry is seeing a steep rise in research papers, patents, and prototypes thanks to generative AI. This is what I mean when I say get in early.

All of this progress is powered by Nvidia's Hopper architecture, which is the dominant data center GPU today. But as all these industry-specific AI models migrate from Hopper to Blackwell and Blackwell Ultra later this year, they'll also move from mostly
one-shot responses to reasoning, whether that means thinking step by step, using a mixture-of-experts model, or something else entirely. And when that happens, we're going to see an explosion of new discoveries, patent filings, prototypes, and eventually market-ready products and services across every industry. So yeah, we're still very early. And third, you'll notice that I haven't said anything about large language models so far. No ChatGPT-4o, no Gemini 2.5, and no DeepSeek R1. I'm talking about things with real impacts in the physical world: the next battery in your phone and processor in your laptop, your next prescription medication, the next car in your garage, and the next movie you see.

But the big challenge is that reasoning drastically increases the amount of compute that these AI models take to run at inference time. All that extra thinking takes more tokens, which means more GPU time, which means more power, which means more money. "So now, basically, it's AI teaching AIs how to be better AIs. That post-training process is where an enormous amount of innovation is happening right now. A lot of it happened with these reasoning models, and that computation load could be a hundred times more than pre-training. And then here comes inference: the reasoning process. Instead of just spewing out an answer when prompted, it reasons about it. It thinks about how best to answer that question, breaks it down step by step, might even reflect upon it, come up with several versions, pick the best one, and then present it to you. So the amount of computation that we have to do, even at inference time, now is a hundred times more than what we used to do when ChatGPT first came out." In the long term, 100 times more inference compute means 100 times more overall compute, because most workloads will eventually be inference workloads, kind of like how most workloads on Google servers today are user searches as opposed to training the algorithm. That's the big picture when it comes to this generative AI revolution. That's
why there needs to be such a big performance jump from Hopper to Blackwell, Blackwell to Rubin, and Rubin to Feynman, and that's why we should care about understanding the puzzle pieces that make up this massive AI ecosystem powered by Nvidia's GPUs. So now that we know the why, let's talk about how Nvidia is making these huge leaps in performance.

During the GTC keynote, Jensen talked a lot about scale-up versus scale-out. Distributed computing is about using a whole lot of different computers working together to solve a very large problem, but there's no replacement for scaling up before you scale out. Both are important, but you want to scale up first before you scale out, and scaling up is incredibly hard. In a nutshell, scaling up means increasing the compute power in a single rack, and scaling out means connecting more racks together without sacrificing overall system performance. The reason that data centers want to scale up first is to make the most out of their power budget while minimizing the total network distance between any two GPUs. Remember, parallel processing is only as fast as the time it takes to get the last answer from the last GPU, and moving data between chips using NVLink is around 10 times faster than moving it between racks with something like InfiniBand.

What most Wall Street analysts still don't understand is that when Nvidia announces a new architecture like Blackwell or Rubin, that actually includes many different chips: the GPU itself, a CPU, a data processing unit or DPU, multiple configurations of NVLink chip-to-chip switches, as well as the InfiniBand and Spectrum-X networking solutions to connect many racks together. Let's go over how all these pieces fit together in a single rack first. I've gained almost 100,000 subscribers since the last time I walked through this, so it'll be good to get everyone on the same page.

First, Blackwell actually combines two GPU dies into a single chip and connects them with a 10 terabyte-per-second link, so they act like one
GPU. Nvidia designed it this way because the process technology to cram 200 billion transistors onto a single die this small simply doesn't exist. The Blackwell Ultra, which comes out in the second half of 2025, and the Rubin GPU, coming out in 2026, both have the same two-die design as well. But the Rubin Ultra, which comes out in 2027, connects four GPU dies to achieve 100 petaFLOPS of FP4 performance. That means the Rubin Ultra GPU is expected to have around seven times the training performance of Blackwell and 22.5 times more inference performance. Said another way, Nvidia's GPUs are improving at roughly twice the rate of Moore's law for training and about three times Moore's law for inference. Also, Jensen's presentation compares everything to Blackwell Ultra, but I'm comparing everything to Blackwell, since that's what Nvidia is currently shipping and I want to stay consistent between videos. Just divide the numbers I'm about to share by about 1.5 to compare them to Blackwell Ultra.

Anyway, this is where we start scaling up. The next level is the GB200 superchip, which pairs two Blackwell GPUs with one Grace CPU. The GB300 superchip uses Blackwell Ultras instead of standard Blackwells, and when Nvidia starts shipping Vera Rubin, the 88-core Arm-based Vera CPU will replace Grace and the two Rubin GPUs will replace the Blackwells. Either way, the CPU and two GPUs are connected via NVLink, which is a chip-to-chip connection with 900 gigabytes per second of bandwidth. Said in English, NVLink is fast enough to move 150 90-minute-long 4K movies between chips every single second. Two of these Grace Blackwell superchips go into one Blackwell compute node, which is one tray in Nvidia's AI data center racks. The two superchips per tray are also connected by NVLink. Today's Blackwell systems use the fifth-generation NVLink chips, but next year the sixth-generation chips will double in speed, and the seventh generation will have twice as many ports to support Rubin Ultra's four-GPU
design. Likewise, Nvidia's scale-out networking solutions are set to double in speed every two years, which I'll get back to in a little bit. The other part of scaling up is these BlueField data processing units. DPUs handle highly parallel workloads like coordinating and securing network traffic and moving data between many separate chips, so that the CPUs and GPUs don't have to. By the way, NVLink, InfiniBand, and these BlueField DPUs all came out of Nvidia's $7 billion acquisition of Mellanox back in 2019, which will probably go down as one of the best acquisitions in Silicon Valley history.

Anyway, these 18 compute trays go into one GB200 NVL72 system. It's called that because the GB200 superchip has two Blackwell GPUs on it connected via NVLink, there are two superchips per compute tray, and 18 trays per rack. 2 × 2 × 18 equals 72 Blackwell GPUs connected by NVLink, which is why the whole system is called the GB200 NVL72 for Blackwell and the GB300 NVL72 for Blackwell Ultra. But remember, each Blackwell chip actually has two GPU dies on it, so it should really be called the NVL144. Nvidia isn't going back and fixing this naming convention for Blackwell, but that's why the next-generation rack is called the Vera Rubin NVL144, even though it has the same number of GPUs as Blackwell.

Now here's where things get really crazy. Blackwell, Blackwell Ultra, and Vera Rubin all go into a rack system called Oberon. There are two important things investors should know about Oberon racks. First, they're entirely liquid-cooled, which is why data centers are transitioning to liquid cooling. For Blackwell, it's the only way to cool so many high-performance chips packed together so tightly. Around 90% of all server racks are air-cooled today, but industry estimates suggest that up to 80% of data center cooling will become direct-to-chip liquid cooling over time. That's why I also cover stocks like Vertiv, which Jensen mentioned by name during his keynote. And second, Oberon uses what Nvidia calls a
10-9-8 configuration: 10 compute trays at the top, nine networking trays in the middle, and eight more compute trays on the bottom. Each of these nine NVLink switch trays has two NVLink chips that connect to four Blackwell GPUs each. Four ports per NVLink chip, times two chips per tray, times nine trays, is 72 NVLink ports, providing all-to-all GPU communication at an insane 130 terabytes per second. "That's the back. That's the NVLink spine: 130 terabytes per second goes through the back of that chassis. That is more than the aggregate bandwidth of the internet. We could basically send everything to everybody within a second." So these nine NVLink switch trays have more total bandwidth than the entire internet today. But that's not the crazy part. The crazy part is that Nvidia straight up deleted those trays for Rubin Ultra. In fact, the entire 10-9-8 configuration is just, poof, gone.

When Jensen revealed this new rack, called Kyber, I don't think anybody in the entire arena understood what we were looking at. But I really wanted to understand this huge piece of the puzzle, so I interviewed Dion Harris, Nvidia's senior director of high-performance computing and AI factory solutions, and I interviewed Charlie Boyle, Nvidia's vice president of DGX systems. I tried including parts of each interview in this video, but there was just way too much information, so I'll be releasing them as separate podcast episodes instead. But here's what I found. As the name implies, the Rubin Ultra NVL576 has four times more GPUs than the Blackwell Ultra systems. The Kyber rack uses up to 600 kilowatts of power, which is five times more than the 120 kilowatts used by Blackwell B200 racks today. This one rack will deliver 15 exaFLOPS of FP4 inference performance and 5 exaFLOPS of FP8 performance for training. That means one Rubin Ultra rack is the equivalent of 21 Blackwell systems in terms of performance. Twenty-one times the compute per rack for five times the power means the Rubin Ultra racks are also 4.2 times more power
efficient than Blackwell. But to fit in so much more compute, Nvidia had to remove the nine networking trays and over two miles of cables connecting all the GPUs in each rack. Those 5,000 cables and connectors were replaced by a single 72-layer printed circuit board, where each layer connects the eight GPUs per tray to every other tray via those seventh-generation NVLink switch chips at a whopping 3,600 gigabytes per second, which again would be like moving 600 90-minute 4K movies between chips every single second. And that's just one of the innovations that Nvidia had to make to scale up their single-rack performance by 21 times in just three years.

Now let's talk about scaling out. By the way, are you starting to see why this video took me three tries to put together? I haven't slept in a week, and if you feel I've earned it, consider hitting the like button and subscribing to the channel. That really helps me out, and it lets me know that I should lose more sleep to make more deep dives like this. Thanks. And with that out of the way, here's what you should know about Nvidia's scaling out.

Like I mentioned earlier, Nvidia's scale-out networking solutions are set to double in speed every two years. They have Spectrum-X for data centers running on Ethernet, as well as their InfiniBand optical solutions. But optics pose an interesting challenge. It turns out that for every GPU in a data center, there are six fiber-optic transceivers: one cable connects each GPU to a switch, one handles switch-to-switch connections, and a third runs to another GPU on the other end. That's three cables with a transceiver on each end, so six transceivers per GPU, assuming there's only one layer of switches between them. Each transceiver costs around $1,000 and has a 30-watt laser, which means all this optical networking equipment costs $6,000 and 180 watts of power per GPU. These transceivers don't do any math; they just turn electrical signals into light and back into electrical signals again. Last summer, Elon Musk's AI company xAI built a
supercomputer called Colossus that uses 100,000 Nvidia GPUs. That would be $600 million in optical transceivers that use 18 megawatts of power to drive the lasers. Eighteen megawatts is enough to power 150 Blackwell Ultra racks, which would be 21,600 Blackwell Ultra GPUs. So to fix this problem, Nvidia is entering the silicon photonics market with three different co-packaged optical network switches. The 144-port Quantum-X InfiniBand switch comes out later this year, and the Spectrum-X Photonics switches come out next year for Ethernet. The special thing about these switches is that they have a series of lasers, mirrors, optical interposers, waveguides, and modulators that do the work of four times as many transceivers. That means connecting 576 GPUs, like from one Rubin Ultra NVL576 rack, now only needs 144 transceivers on the switch side. So now, instead of six transceivers per GPU, you only need three or four, depending on the network. That's a huge amount of power that can go directly back to even more GPUs. For example, Elon's Colossus supercomputer would have saved 9 megawatts of power, which could have been used to scale out even further by adding more racks. And that's just one of the innovations that Nvidia had to make to allow so much more scale-out performance across more racks packed with more GPUs.

Like I said at the start, this is my third try at making this video, and one of the things I really struggled with was how to put it all together. So hopefully, the third time's the charm. The world is changing. We're at the very beginning of a huge AI revolution. While most people still think that generative AI is all about chatbots and funny pictures, the scientific community is using it to make more discoveries, file more patents, and build more prototypes than ever before. AI is pushing the state of the art in everything from solar panels and electric vehicles to semiconductors and the robots they power, the same exact things we're investing in. But it turns out that the world needs 100 times more compute
to power all that progress, and closing that gap takes more than just better chips. It takes a fundamental rethinking of how we scale up and how we scale out our data centers: water cooling instead of air cooling, printed circuit backplanes instead of miles of cables connecting network trays, and optical switches that eliminate megawatts' worth of transceivers. The world is crazy, especially right now with all the tariffs and economic tensions, and I can't tell you when investing in AI is going to pay off. But hopefully, this video helped you understand some of the very real progress that's getting lost in all this noise. Not just in materials science, but in everything from gene sequencing and drug discovery to movies and video games, almost every industry is innovating big time thanks to generative AI, and to me, that's a future worth investing in. But Nvidia isn't the only company making serious progress in AI, so if you want to see what other AI companies I'm investing in, make sure to check out this video next. Either way, thanks for watching, and until next time, this is Ticker Symbol: You. My name is Alex, reminding you that the best investment you can make is in you.
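One last thing for anyone who wants to sanity-check the numbers from this video: every headline figure here is simple arithmetic. The sketch below just replays the math using the figures quoted in this video (rack counts, transceiver costs, and power draws as stated above, not official Nvidia spec sheets):

```python
# Back-of-the-envelope math from this video (figures as quoted, not official specs).

# Scale-up: GPUs per GB200 NVL72 rack.
gpus_per_superchip = 2        # two Blackwell GPUs per Grace Blackwell superchip
superchips_per_tray = 2
trays_per_rack = 18
gpus_per_rack = gpus_per_superchip * superchips_per_tray * trays_per_rack
print(gpus_per_rack)          # 72, hence "NVL72"

# NVLink spine: ports across the nine switch trays in the 10-9-8 Oberon layout.
nvlink_ports = 4 * 2 * 9      # 4 ports/chip * 2 chips/tray * 9 trays
print(nvlink_ports)           # 72 ports, one per GPU

# Rubin Ultra vs. Blackwell: performance per watt at the rack level.
compute_ratio = 21            # one Rubin Ultra rack ~ 21 Blackwell systems
power_ratio = 600 / 120       # 600 kW Kyber rack vs. 120 kW Blackwell rack
print(compute_ratio / power_ratio)  # 4.2x more power-efficient

# Scale-out: optical transceiver cost and power per GPU.
transceivers_per_gpu = 6             # 3 cables * 2 transceivers each
cost_per_gpu = transceivers_per_gpu * 1_000  # ~$1,000 per transceiver
watts_per_gpu = transceivers_per_gpu * 30    # 30 W laser per transceiver
print(cost_per_gpu, watts_per_gpu)   # $6,000 and 180 W per GPU

# Colossus at 100,000 GPUs.
gpus = 100_000
print(gpus * cost_per_gpu / 1e6)     # 600.0 (million dollars in transceivers)
print(gpus * watts_per_gpu / 1e6)    # 18.0 (megawatts just driving lasers)

# Co-packaged optics: dropping from 6 to 3 transceivers per GPU saves 3 * 30 W each.
print(gpus * 3 * 30 / 1e6)           # 9.0 (megawatts saved)
```

If any one of these lines surprised you, that's exactly the kind of detail Wall Street analysts miss by only reading the earnings reports.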