Transcript for:
Nvidia GTC Keynote Insights

[Music] I am a visionary, illuminating galaxies to witness the birth of stars [Music] and sharpening our understanding of extreme weather events. [Music]

I am a helper, guiding the blind through a crowded world ("I was thinking about running to the store") and giving voice to those who cannot speak ("To not make me laugh").

I am a transformer, harnessing gravity to store renewable power [Music] and paving the way towards unlimited clean energy for us all. [Music]

I am a trainer, teaching robots to assist, to watch out for danger, [Music] and help save lives.

I am a healer, providing a new generation of cures and new levels of patient care. ("Doctor, that I am allergic to penicillin, is it still okay to take the medications?" "Definitely. These antibiotics don't contain penicillin, so it's perfectly safe for you to take them.")

I am a navigator, [Music] generating virtual scenarios to let us safely explore the real world and understand every decision. [Music]

I even helped write the script, breathe life into the words. [Music]

I am AI, brought to life by NVIDIA, deep learning, and brilliant minds everywhere.

Please welcome to the stage NVIDIA founder and CEO Jensen Huang. [Music] [Applause]

Welcome to GTC. I hope you realize this is not a concert. You have arrived at a developers conference. There will be a lot of science described: algorithms, computer architecture, mathematics. I sensed a very heavy weight in the room all of a sudden, almost like you were in the wrong place.

In no conference in the world is there a greater assembly of researchers from such diverse fields of science, from climate tech to radio sciences, trying to figure out how to use AI to robotically control MIMOs for next-generation 6G radios, robotic self-driving cars, even artificial intelligence. Even artificial intelligence. Everybody's first. I noticed a sense of relief there all of a sudden. Also, this conference is represented by some amazing companies. This list, this is not
the attendees. These are the presenters. And what's amazing is this: if you take away all of my friends, close friends (Michael Dell is sitting right there) in the IT industry, all of the friends I grew up with in the industry, if you take away that list, this is what's amazing. These are the presenters of the non-IT industries using accelerated computing to solve problems that normal computers can't. It's represented in life sciences, healthcare, genomics, transportation of course, retail, logistics, manufacturing, industrial. The gamut of industries represented is truly amazing. And you're not here to attend only; you're here to present, to talk about your research. $100 trillion of the world's industries is represented in this room today. This is absolutely amazing.

There is absolutely something happening. There is something going on. The industry is being transformed, not just ours. Because the computer industry, the computer, is the single most important instrument of society today, fundamental transformations in computing affect every industry. But how did we start? How did we get here? I made a little cartoon for you. Literally, I drew this. In one page, this is NVIDIA's journey. Started in 1993. This might be the rest of the talk.

1993, this is our journey. We were founded in 1993. There are several important events that happened along the way. I'll just highlight a few. In 2006, CUDA, which has turned out to have been a revolutionary computing model. We thought it was revolutionary then; it was going to be an overnight success. And almost 20 years later, it happened. We saw it coming, two decades later.

In 2012, AlexNet: AI and CUDA made first contact. In 2016, recognizing the importance of this computing model, we invented a brand new type of computer. We called it the DGX-1: 170 teraflops in this supercomputer, eight GPUs connected together for the very first time. I hand-delivered the very first DGX-1 to a startup located in San Francisco called OpenAI.
DGX-1 was the world's first AI supercomputer. Remember, 170 teraflops. 2017, the transformer arrived. 2022, ChatGPT captured the world's imagination and had people realize the importance and the capabilities of artificial intelligence. And 2023, generative AI emerged and a new industry begins.

Why? Why is it a new industry? Because the software never existed before. We are now producing software, using computers to write software, producing software that never existed before. It is a brand new category. It took share from nothing. It's a brand new category. And the way you produce the software is unlike anything we've ever done before: in data centers, generating tokens, producing floating-point numbers at very large scale. As if, in the beginning of this last industrial revolution, when people realized that you would set up factories, apply energy to it, and this invisible, valuable thing called electricity came out: AC generators. And 100 years later, 200 years later, we are now creating new types of electrons, tokens, using infrastructure we call factories, AI factories, to generate this new, incredibly valuable thing called artificial intelligence. A new industry has emerged.

Well, we're going to talk about many things about this new industry. We're going to talk about how we're going to do computing next. We're going to talk about the type of software that you build because of this new industry, the new software, how you would think about this new software. What about applications in this new industry? And then maybe what's next, and how can we start preparing today for what is about to come next?

But before I start, I want to show you the soul of NVIDIA, the soul of our company, at the intersection of computer graphics, physics, and artificial intelligence, all intersecting inside a computer, in Omniverse, in a virtual world simulation. Everything we're going to show you today, literally everything we're going to show you today, is a simulation,
not animation. It's only beautiful because it's physics; the world is beautiful. It's only amazing because it's being animated with robotics, it's being animated with artificial intelligence. What you're about to see all day, it's completely generated, completely simulated in Omniverse. And all of it, what you're about to enjoy, is the world's first concert where everything is homemade. Everything is homemade. You're about to watch some home videos, so sit back and enjoy yourself. [Music]

God, I love it. NVIDIA accelerated computing has reached the tipping point. General-purpose computing has run out of steam. We need another way of doing computing so that we can continue to scale, so that we can continue to drive down the cost of computing, so that we can continue to consume more and more computing while being sustainable. Accelerated computing is a dramatic speedup over general-purpose computing, and in every single industry we engage, and I'll show you many, the impact is dramatic. But in no industry is it more important than our own: the industry of using simulation tools to create products. In this industry, it is not about driving down the cost of computing; it's about driving up the scale of computing. We would like to be able to simulate the entire product that we do completely in full fidelity, completely digitally, in essentially what we call digital twins. We would like to design it, build it, simulate it, operate it completely digitally.

In order to do that, we need to accelerate an entire industry. And today I would like to announce that we have some partners who are joining us in this journey to accelerate their entire ecosystem so that we can bring the world into accelerated computing. But there's a bonus: when you become accelerated, your infrastructure is CUDA GPUs, and when that happens, it's exactly the same infrastructure for generative AI. And so I'm
just delighted to announce several very important partnerships. These are some of the most important companies in the world. Ansys does engineering simulation for what the world makes. We're partnering with them to CUDA-accelerate the Ansys ecosystem, to connect Ansys to the Omniverse digital twin. Incredible. The thing that's really great is that the installed base of NVIDIA GPU-accelerated systems is all over the world, in every cloud, in every system, all over enterprises. And so the applications they accelerate will have a giant installed base to go serve. End users will have amazing applications, and of course system makers and CSPs will have great customer demand.

Synopsys. Synopsys is NVIDIA's literally first software partner. They were there on the very first day of our company. Synopsys revolutionized the chip industry with high-level design. We are going to CUDA-accelerate Synopsys. We're accelerating computational lithography, one of the most important applications that nobody's ever known about. In order to make chips, we have to push lithography to the limit. NVIDIA has created a library, a domain-specific library, that accelerates computational lithography incredibly. Once we can accelerate and software-define all of TSMC, who is announcing today that they're going to go into production with NVIDIA cuLitho, once this is software-defined and accelerated, the next step is to apply generative AI to the future of semiconductor manufacturing, pushing geometry even further.

Cadence builds the world's essential EDA and SDA tools. We also use Cadence. Between these three companies, Ansys, Synopsys, and Cadence, we basically build NVIDIA together. We are CUDA-accelerating Cadence. They're also building a supercomputer out of NVIDIA GPUs so that their customers could do fluid dynamics simulation at 100, a thousand times scale; basically a wind tunnel in real time. Cadence Millennium, a supercomputer with NVIDIA GPUs inside. A software company
building supercomputers. I love seeing that. Building Cadence copilots together: imagine a day when Cadence, Synopsys, Ansys, tool providers, would offer you AI copilots so that we have thousands and thousands of copilot assistants helping us design chips, design systems. And we're also going to connect Cadence's digital twin platform to Omniverse. As you could see, the trend here: we're accelerating the world's CAE, EDA, and SDA so that we could create our future in digital twins. And we're going to connect them all to Omniverse, the fundamental operating system for future digital twins.

One of the industries that benefited tremendously from scale, and you all know this one very well: large language models. Basically, after the transformer was invented, we were able to scale large language models at incredible rates, effectively doubling every six months. Now, how is it possible that by doubling every six months we have grown the industry, we have grown the computational requirements so far? The reason for that is quite simply this: if you double the size of the model, you double the size of your brain, you need twice as much information to go fill it. And so every time you double your parameter count, you also have to appropriately increase your training token count. The combination of those two numbers becomes the computation scale you have to support.

The latest, the state-of-the-art OpenAI model is approximately 1.8 trillion parameters. 1.8 trillion parameters required several trillion tokens to go train. So a few trillion parameters, on the order of a few trillion tokens; when you multiply the two of them together, approximately 30, 40, 50 billion quadrillion floating-point operations per second. Now we just have to do some CEO math right now, just hang with me. So you have 30 billion quadrillion. A quadrillion is like a peta, and so if you had a petaflop GPU, you would need 30 billion seconds to go
compute, to go train that model. 30 billion seconds is approximately 1,000 years. Well, 1,000 years, it's worth it. I'd like to do it sooner, but it's worth it. Which is usually my answer when most people tell me, "Hey, how long is it going to take to do something?" "20 years." "It's worth it. But can we do it next week?"

And so, 1,000 years. What we need are bigger GPUs. We need much, much bigger GPUs. We recognized this early on, and we realized that the answer is to put a whole bunch of GPUs together and, of course, innovate a whole bunch of things along the way, like inventing tensor cores, advancing NVLink so that we could create essentially virtually giant GPUs, and connecting them all together with amazing networks from a company called Mellanox, InfiniBand, so that we could create these giant systems. And so DGX-1 was our first version, but it wasn't the last. We built supercomputers all along the way. In 2021 we had Selene, 4,500 GPUs or so. And then in 2023 we built one of the largest AI supercomputers in the world. It's just come online: Eos.

And as we're building these things, we're trying to help the world build these things. And in order to help the world build these things, we've got to build them first. We build the chips, the systems, the networking, all of the software necessary to do this. You should see these systems. Imagine writing a piece of software that runs across the entire system, distributing the computation across thousands of GPUs, but inside are thousands of smaller GPUs, millions of GPUs, to distribute work across all of that and to balance the workload so that you can get the most energy efficiency, the best computation time, keep your cost down. And so those fundamental innovations are what got us here.

And here we are. As we see the miracle of ChatGPT emerge in front of us, we also realize we have a long ways to go. We need even larger models. We're going to train it with
multimodality data, not just text on the internet. We're going to train it on texts and images and graphs and charts, and, just as we learn watching TV, there's going to be a whole bunch of watching video, so that these models can be grounded in physics and understand that an arm doesn't go through a wall. And so these models would have common sense by watching a lot of the world's video combined with a lot of the world's languages. It'll use things like synthetic data generation, just as you and I do when we try to learn. We might use our imagination to simulate how it's going to end up, just as I did when I was preparing for this keynote. I was simulating it all along the way. I hope it's going to turn out as well as I had it in my head.

As I was simulating how this keynote was going to turn out, somebody did say that another performer did her performance completely on a treadmill so that she could be in shape to deliver it with full energy. I didn't do that. If I get a little winded at about 10 minutes into this, you know what happened.

And so where were we? We're sitting here using synthetic data generation. We're going to use reinforcement learning. We're going to practice it in our mind. We're going to have AI working with AI, training each other, just like student, teacher, debaters. All of that is going to increase the size of our model, it's going to increase the amount of data that we have, and we're going to have to build even bigger GPUs. Hopper is fantastic, but we need bigger GPUs. And so, ladies and gentlemen, I would like to introduce you to a very, very big GPU. [Applause]

Named after David Blackwell: mathematician, game theorist, probability. We thought it was a perfect name. Blackwell, ladies and gentlemen. Enjoy this. [Applause]

Blackwell is not a chip. Blackwell is the name of a platform. People think we make GPUs, and we do, but GPUs don't look the way they used to.
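
The case for bigger GPUs rests on the back-of-the-envelope estimate given earlier in the talk: roughly 30 billion quadrillion floating-point operations, run on a single petaflop GPU, takes about 30 billion seconds, or roughly 1,000 years. A minimal Python sketch of that arithmetic follows. It treats the quoted figure as a total operation count and assumes perfect utilization and perfect scaling across GPUs; the function name `training_time_seconds` and the 1,000-GPU example are illustrative and not from the talk.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def training_time_seconds(total_flops, flops_per_gpu, num_gpus=1):
    """Idealized wall-clock time in seconds.

    Assumes every GPU sustains its peak rate and that work scales
    perfectly across GPUs (both are simplifying assumptions).
    """
    return total_flops / (flops_per_gpu * num_gpus)


# Figures quoted in the talk.
total_flops = 30e9 * 1e15   # "30 billion quadrillion" operations
petaflop_gpu = 1e15         # one GPU sustaining 1 petaflop per second

seconds = training_time_seconds(total_flops, petaflop_gpu)
years = seconds / SECONDS_PER_YEAR
print(f"one GPU: {seconds:.1e} s, about {years:.0f} years")

# Hypothetical example (not from the talk): the same job on 1,000 such GPUs.
seconds_1000 = training_time_seconds(total_flops, petaflop_gpu, num_gpus=1000)
print(f"1,000 GPUs: about {seconds_1000 / SECONDS_PER_YEAR:.2f} years")
```

Under these idealized assumptions the estimate shrinks in proportion to the number of GPUs, which is the motivation for combining many GPUs into one system; real clusters fall short of this because of communication and utilization overheads.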
Here's the, if you will, the heart of the Blackwell system. And this, inside the company, is not called Blackwell; it's just the number. And this is Blackwell, sitting next to... oh, this is the most advanced GPU in the world in production today. This is Hopper. Hopper changed the world. This is Blackwell. It's okay, Hopper. You're very good. Good boy. Well, good girl.

208 billion transistors. And so you can see that there's a small line between two dies. This is the first time two dies have abutted like this together in such a way that the two dies think it's one chip. There's 10 terabytes of data between it, 10 terabytes per second, so that these two sides of the Blackwell chip have no clue which side they're on. There's no memory locality issues, no cache issues. It's just one giant chip. And so when we were told that Blackwell's ambitions were beyond the limits of physics, the engineer said, "So what?" And so this is what happened.

And so this is the Blackwell chip, and it goes into two types of systems. The first one is form-fit-function compatible to Hopper, and so you slide out Hopper and you push in Blackwell. That's the reason why one of the challenges of ramping is going to be so efficient. There are installations of Hoppers all over the world, and they could be, you know, the same infrastructure, same design; the power, the electricity, the thermals, the software, identical. Push it right back. And so this is a Hopper version for the current HGX configuration, and this is what the other, the second Hopper, looks like.

Now, this is a prototype board. And Janine, could I just borrow... ladies and gentlemen, Janine Paul. And so this is a fully functioning board, and I'll just be careful here. This right here is, I don't know, 10 billion. The second one's five. It gets cheaper after that. So any customers in the audience,
it's okay. All right, but this one's quite expensive. This is the bring-up board, and the way it's going to go to production is like this one here. Okay. And so you're going to take this. It has two Blackwell chips and four Blackwell dies connected to a Grace CPU. The Grace CPU has a super-fast chip-to-chip link. What's amazing is this computer is the first of its kind where this much computation, first of all, fits into this small of a place. Second, it's memory coherent. They feel like they're just one big happy family working on one application together. And so everything is coherent within it. The amount of, you know, you saw the numbers: there's a lot of terabytes this and terabytes that. But this is a miracle.

Let's see, what are some of the things on here? There's NVLink on top, PCI Express on the bottom, on your... which one is mine, and your left? One of them, it doesn't matter. One of them is a CPU chip-to-chip link. It's my left or yours, depending on which side. I was trying to sort that out, and I just kind of... doesn't matter. Hopefully it comes plugged in.

Okay, so this is the Grace Blackwell system. But there's more. So it turns out all of the specs are fantastic, but we need a whole lot of new features in order to push the limits beyond, if you will, the limits of physics. We would like to always get a lot more X factors. And so one of the things that we did was we invented another transformer engine, the second generation. It has the ability to dynamically and automatically rescale and recast numerical formats to a lower precision whenever it can. Remember, artificial intelligence is about probability, and so you kind of have, you know, 1.7, approximately 1.7 times approximately 1.4, to be approximately something else. Does that make sense? And so the ability for the mathematics to retain
the precision and the range necessary in that particular stage of the pipeline is super important. And so it's not just about the fact that we designed a smaller ALU; the world's not quite that simple. You've got to figure out when you can use that across a computation that is thousands of GPUs. It's running for weeks and weeks and weeks, and you want to make sure that the training job is going to converge. And so, this new transformer engine.

We have a fifth-generation NVLink. It's now twice as fast as Hopper, but very importantly, it has computation in the network. The reason for that is because when you have so many different GPUs working together, we have to share our information with each other, we have to synchronize and update each other. And every so often we have to reduce the partial products and then rebroadcast out the sum of the partial products back to everybody else. And so there's a lot of what is called all-reduce and all-to-all and all-gather. It's all part of this area of synchronization and collectives so that we can have GPUs working with each other. Having extraordinarily fast links and being able to do mathematics right in the network allows us to essentially amplify even further. So even though it's 1.8 terabytes per second, it's effectively higher than that, and so it's many times that of Hopper.

The likelihood of a supercomputer running for weeks on end is approximately zero. The reason for that is because there are so many components working at the same time; statistically, the probability of them working continuously is very low. And so we need to make sure that whenever there is a... well, we checkpoint and restart as often as we can. But if we have the ability to detect a weak chip or a weak node early, we could retire it and maybe swap in another processor. That ability to keep the utilization of the supercomputer high, especially when you
just spent $2 billion building it, is super important. And so we put in a RAS engine, a reliability engine, that does 100% self-test, in-system test, of every single gate, every single bit of memory on the Blackwell chip and all the memory that's connected to it. It's almost as if we shipped with every single chip its own advanced tester that we test our chips with. This is the first time we're doing this. Super excited about it.

Secure AI. Only at this conference do they clap for RAS. Secure AI: obviously, you've just spent hundreds of millions of dollars creating a very important AI, and the code, the intelligence of that AI, is encoded in the parameters. You want to make sure that on the one hand you don't lose it, and on the other hand it doesn't get contaminated. And so we now have the ability to encrypt data, of course at rest, but also in transit and while it's being computed. It's all encrypted. And so we now have the ability to encrypt in transmission, and when we're computing it, it is in a trusted engine environment.

And the last thing is decompression. Moving data in and out of these nodes when the compute is so fast becomes really essential, and so we've put in a high-line-speed compression engine that effectively moves data 20 times faster in and out of these computers. These computers are so powerful, and there's such a large investment, the last thing we want to do is have them be idle. And so all of these capabilities are intended to keep Blackwell fed and as busy as possible.

Overall, compared to Hopper, it is two and a half times the FP8 performance for training per chip. It also has this new format called FP6, so that even though the computation speed is the same, the bandwidth that's amplified because of the memory, the amount of parameters you can store in the memory, is now amplified. FP4 effectively doubles the throughput. This is vitally
important for inference. One of the things that is becoming very clear is that whenever you use a computer with AI on the other side, when you're chatting with the chatbot, when you're asking it to review or make an image, remember, in the back is a GPU generating tokens. Some people call it inference, but it's more appropriately generation.

The way that computing was done in the past was retrieval. You would grab your phone, you would touch something, some signals go off, basically an email goes off to some storage somewhere. There's pre-recorded content: somebody wrote a story, or somebody made an image, or somebody recorded a video. That pre-recorded content is then streamed back to the phone and recomposed in a way based on a recommender system to present the information to you. You know that in the future the vast majority of that content will not be retrieved, and the reason for that is because that was pre-recorded by somebody who doesn't understand the context, which is the reason why we have to retrieve so much content. If you can be working with an AI that understands the context, who you are, for what reason you're fetching this information, and produces the information for you just the way you like it, the amount of energy we save, the amount of networking bandwidth we save, the amount of wasted time we save, will be tremendous.

The future is generative, which is the reason why we call it generative AI, which is the reason why this is a brand new industry. The way we compute is fundamentally different. We created a processor for the generative AI era, and one of the most important parts of it is content token generation. We call it... this format is FP4. Well, that's a lot of computation: 5x the token generation, 5x the inference capability of Hopper. Seems like enough. But why stop there? The answer is it's not enough, and I'm going to show you why. And so we would like to have a bigger
GPU, even bigger than this one. And so we decided to scale it. But first, let me just tell you how we've scaled. Over the course of the last eight years, we've increased computation by 1,000 times. Eight years, 1,000 times. Remember back in the good old days of Moore's Law, it was 2x, well, 5x every, what, 10x every 5 years. That's the easiest math: 10x every 5 years, 100 times every 10 years. 100 times every 10 years in the middle, in the heydays, of the PC revolution. In the last 8 years we've gone 1,000 times, and we have two more years to go. And so that puts it in perspective. The rate at which we're advancing computing is insane, and it's still not fast enough.

So we built another chip. This chip is just an incredible chip. We call it the NVLink Switch. It's 50 billion transistors. It's almost the size of Hopper all by itself. This switch chip has four NVLinks in it, each 1.8 terabytes per second, and it has computation in it, as I mentioned. What is this chip for? If we were to build such a chip, we can have every single GPU talk to every other GPU at full speed at the same time. That's insane. It doesn't even make sense. But if you could do that, if you can find a way to do that and build a system to do that that's cost effective, how incredible would it be that we could have all these GPUs connect over a coherent link so that they effectively are one giant GPU? Well, one of the great inventions in order to make it cost effective is that this chip has to drive copper directly. The SerDes of this chip is just a phenomenal invention, so that we could do direct drive to copper. And as a result, you can build a system that looks like this.

Now, this system is kind of insane. This is one DGX. This is what a DGX looks like now. Remember, just six years ago, it was pretty heavy, but I was able to lift it. I delivered the first DGX-1 to OpenAI and the
researchers there. The pictures are on the internet, and we all autographed it. If you come to my office, it's autographed there; it's really beautiful. But you could lift it. That DGX, by the way, was 170 teraflops. If you're not familiar with the numbering system, that's 0.17 petaflops. So this is 720. The first one I delivered to OpenAI was 0.17. You could round it up to 0.2, won't make any difference. And back then it was like, wow, you know, 30 more teraflops. And so this is now 720 petaflops, almost an exaflop for training, and the world's first one-exaflops machine in one rack. Just so you know, there are only a couple, two, three exaflops machines on the planet as we speak. And so this is an exaflops AI system in one single rack.

Well, let's take a look at the back of it. So this is what makes it possible. That's the back, the DGX NVLink spine. 130 terabytes per second goes through the back of that chassis. That is more than the aggregate bandwidth of the internet, so we could basically send everything to everybody within a second. And so we have 5,000 cables, 5,000 NVLink cables, in total 2 miles. Now, this is the amazing thing: if we had to use optics, we would have had to use transceivers and retimers, and those transceivers and retimers alone would have cost 20,000 watts, 20 kilowatts, of just transceivers alone, just to drive the NVLink spine. As a result, we did it completely for free over NVLink Switch, and we were able to save the 20 kilowatts for computation. This entire rack is 120 kilowatts, so that 20 kilowatts makes a huge difference.

It's liquid cooled. What goes in is 25°C, about room temperature. What comes out is 45°C, about your jacuzzi. So room temperature goes in, jacuzzi comes out, 2 liters per second. We could sell a peripheral.

600,000 parts. Somebody used to say, "You know, you guys make GPUs," and we do, but this is what a GPU
looks like to me. When somebody says GPU, I see this. Two years ago, when I saw a GPU, it was the HGX. It was 70 lb, 35,000 parts. Our GPUs now are 600,000 parts and 3,000 lb. 3,000 lb, that's kind of like the weight of a, you know, carbon-fiber Ferrari. I don't know if that's a useful metric, but everybody's going, "I feel it, I get it. Now that you mention that, I feel it." I don't know what's 3,000 lb. Okay, so 3,000 lb, a ton and a half. So it's not quite an elephant.

So this is what a DGX looks like. Now let's see what it looks like in operation. Okay, let's imagine: how do we put this to work, and what does that mean? Well, if you were to train a GPT model, a 1.8-trillion-parameter model, it took, apparently, about 3 to 5 months or so with 25,000 Amperes. If we were to do it with Hopper, it would probably take something like 8,000 GPUs, and it would consume 15 megawatts. 8,000 GPUs on 15 megawatts, it would take 90 days, about 3 months. And that would allow you to train something that is, you know, this groundbreaking AI model. And this is obviously not as expensive as anybody would think, but it's 8,000 GPUs; it's still a lot of money. And so 8,000 GPUs, 15 megawatts. If you were to use Blackwell to do this, it would only take 2,000 GPUs. 2,000 GPUs, same 90 days, but this is the amazing part: only 4 megawatts of power. So from 15... yeah, that's right.

And that's our goal. Our goal is to continuously drive down the cost and the energy (they're directly proportional to each other), cost and energy associated with the computing, so that we can continue to expand and scale up the computation that we have to do to train the next-generation models.

Well, this is training. Inference, or generation, is vitally important going forward. You know, probably some half of the time that NVIDIA GPUs are in the cloud these days, it's being used for token generation. You know, they're either doing Copilot this
or chat, you know, ChatGPT that, or all these different models that are being used when you're interacting with it, or generating images, or generating videos, generating proteins, generating chemicals. There's a bunch of generation going on. All of that is in the category of computing we call inference.

But inference is extremely hard for large language models, because these large language models have several properties. One, they're very large, and so the model doesn't fit on one GPU. Imagine Excel doesn't fit on one GPU, you know, and imagine some application you're running on a daily basis doesn't fit on one computer, like a video game doesn't fit on one computer. Most, in fact, do, and many times in the past, in hyperscale computing, many applications for many people fit on the same computer. And now, all of a sudden, this one inference application where you're interacting with this chatbot, that chatbot requires a supercomputer in the back to run it. And that's the future. The future is generative with these chatbots, and these chatbots are trillions of tokens, trillions of parameters, and they have to generate tokens at interactive rates.

Now, what does that mean? Well, three tokens is about a word. You know, "Space, the final frontier. These are the adventures..." That's like 80 tokens. Okay, I don't know if that's useful to you. And so, you know, the art of communications is selecting good analogies. Yeah, this is not going well. "I don't know what he's talking about, never seen Star Trek." And so here we are, we're trying to generate these tokens. When you're interacting with it, you're hoping that the tokens come back to you as quickly as possible and as quickly as you can read them. And so the ability to generate tokens is really important. You have to parallelize the work of this model across many, many GPUs so that you could
achieve several things. On the one hand, you would like throughput, because that throughput reduces the overall cost per token of generating. So your throughput dictates the cost of delivering the service. On the other hand, you have another rate, the interactive rate, which is another tokens per second, where it's about per user, and that has everything to do with quality of service. And so these two things compete against each other, and we have to find a way to distribute work across all of these different GPUs and parallelize it in a way that allows us to achieve both. And it turns out the search space is enormous.

You know, I told you there's going to be math involved, and everybody's going, "Oh dear." I heard some gasps just now when I put up that slide. So this right here: the y-axis is tokens per second, data center throughput; the x-axis is tokens per second, interactivity of the person. And notice the upper right is the best. You want interactivity to be very high, the number of tokens per second per user; you want the tokens per second per data center to be very high. The upper right is terrific. However, it's very hard to do that. And in order for us to search for the best answer across every single one of those intersections, XY coordinates (okay, so you just look at every single XY coordinate), all those blue dots came from some repartitioning of the software. Some optimizing solution has to go and figure out whether to use tensor parallel, expert parallel, pipeline parallel, or data parallel, and distribute this enormous model across all these different GPUs and sustain the performance that you need. This exploration space would be impossible if not for the programmability of NVIDIA's GPUs. And so, because of CUDA, because we have such a rich ecosystem, we could explore this universe and find that green roofline. It turns out, on that green roofline, notice you've got TP2 EP8 DP4.
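The search over parallelism settings described here can be pictured with a small Python sketch. It is only an illustration, not NVIDIA's tooling: the 64-GPU budget is an assumption inferred from the product of the degrees in a label such as TP2 EP8 DP4, and `toy_performance` is a made-up cost model standing in for real measurements. The sketch enumerates every tensor/expert/pipeline/data-parallel split of the GPU budget, scores each one on the two axes from the slide, and keeps the points that no other point beats on both axes, which plays the role of the roofline.

```python
from itertools import product


def candidate_configs(num_gpus, degrees=(1, 2, 4, 8, 16, 32, 64)):
    """Yield (tp, ep, pp, dp) tuples whose product equals num_gpus."""
    for tp, ep, pp, dp in product(degrees, repeat=4):
        if tp * ep * pp * dp == num_gpus:
            yield tp, ep, pp, dp


def toy_performance(tp, ep, pp, dp):
    """Hypothetical cost model, NOT a real measurement.

    Returns (tokens/s per user, tokens/s for the whole system).
    """
    # Assumption: spreading one request over more GPUs speeds it up,
    # with diminishing returns; pipeline stages add some latency.
    per_user = 10.0 * (tp ** 0.7) * (ep ** 0.5) / (1.0 + 0.1 * pp)
    # Assumption: more replicas and stages serve more concurrent users,
    # while heavier per-request parallelism leaves room for fewer users.
    users = dp * pp * 64 / (tp * ep) ** 0.5
    return per_user, per_user * users


def pareto_frontier(points):
    """Keep (x, y, label) points not beaten on both x and y (higher is better)."""
    frontier = []
    best_y = float("-inf")
    # Walk from highest x to lowest; a point survives only if its y
    # exceeds the y of every point with a larger (or equal) x.
    for x, y, label in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            frontier.append((x, y, label))
            best_y = y
    return frontier


points = []
for tp, ep, pp, dp in candidate_configs(64):
    x, y = toy_performance(tp, ep, pp, dp)
    points.append((x, y, f"TP{tp} EP{ep} PP{pp} DP{dp}"))

for x, y, label in pareto_frontier(points):
    print(f"{label}: {x:.1f} tok/s per user, {y:.0f} tok/s total")
```

With a real system, the scores would come from benchmarking each configuration rather than from a formula, and the frontier would differ for every model and hardware setup.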
It means tensor parallel across two GPUs, expert parallel across eight, data parallel across four. Notice on the other end you've got tensor parallel across 4 and expert parallel across 16. The configuration, the distribution of that software, it's a different runtime that would produce these different results, and you have to go discover that roofline. Well, that's just one model, and this is just one configuration of a computer. Imagine all of the models being created around the world and all the different configurations of systems that are going to be available.

So now that you understand the basics, let's take a look at inference of Blackwell compared to Hopper. And this is the extraordinary thing: in one generation, because we created a system that's designed for trillion-parameter generative AI, the inference capability of Blackwell is off the charts. In fact, it is some 30 times Hopper, for large language models like ChatGPT and others like it. The blue line is Hopper. Imagine we didn't change the architecture of Hopper, we just made it a bigger chip; we just used the latest, you know, greatest 10 terabytes per second, we connected the two chips together, we got this giant 208 billion parameter chip. How would we have performed if nothing else changed? And it turns out quite wonderfully, and that's the purple line, but not as great as it could be. And that's where the FP4 Tensor Core, the new Transformer Engine, and, very importantly, the NVLink Switch come in. And the reason for that is because all these GPUs have to share the results, partial products, whenever they do all-to-all, all-gather, whenever they communicate with each other. That NVLink Switch is communicating almost 10 times faster than what we could do in the past using the fastest networks. Okay, so Blackwell is going to
be just an amazing system for generative AI. And in the future, data centers are going to be thought of, as I mentioned earlier, as an AI factory. An AI factory's goal in life is to generate revenues, to generate, in this case, intelligence in this facility, not generating electricity as in the AC generators of the last Industrial Revolution, but in this Industrial Revolution, the generation of intelligence. And so this ability is super, super important.

The excitement of Blackwell is really off the charts. You know, a year and a half ago, two years ago, I guess, when we first started to go to market with Hopper, we had the benefit of two CSPs joining us in a launch, and we were, you know, delighted. And so we had two customers. We have more now. Unbelievable excitement for Blackwell. And there's a whole bunch of different configurations. Of course, I showed you the configurations that slide into the Hopper form factor, so that's easy to upgrade. I showed you examples that are liquid-cooled, that are the extreme versions of it, one entire rack that's connected by NVLink 72. Blackwell is going to be ramping to the world's AI companies, of which there are so many now doing amazing work in different modalities. The CSPs, every CSP is geared up; all the OEMs and ODMs, regional clouds, sovereign AIs, and telcos all over the world are signing up to launch with Blackwell. Blackwell would be the most successful product launch in our history, and so I can't wait to see that.

I want to thank some partners that are joining us in this. AWS is gearing up for Blackwell. They're going to build the first GPU with secure AI. They're building out a 222 exaflops system. You know, just now, when we animated the digital twin, if
you saw, all of those clusters are coming down. By the way, that is not just art; that is a digital twin of what we're building. That's how big it's going to be. Besides infrastructure, we're doing a lot of things together with AWS. We're CUDA-accelerating SageMaker AI, we're CUDA-accelerating Bedrock AI. Amazon Robotics is working with us using NVIDIA Omniverse and Isaac Sim. AWS Health has NVIDIA Health integrated into it. So AWS has really leaned into accelerated computing.

Google is gearing up for Blackwell. GCP already has A100s, H100s, T4s, L4s, a whole fleet of NVIDIA CUDA GPUs, and they recently announced the Gemma model that runs across all of it. We're working to optimize and accelerate every aspect of GCP. We're accelerating Dataproc, their data processing engine, JAX, XLA, Vertex AI, and MuJoCo for robotics. So we're working with Google and GCP across a whole bunch of initiatives.

Oracle is gearing up for Blackwell. Oracle is a great partner of ours for NVIDIA DGX Cloud, and we're also working together to accelerate something that's really important to a lot of companies: Oracle Database.

Microsoft is accelerating, and Microsoft is gearing up for Blackwell. Microsoft and NVIDIA have a wide-ranging partnership. We're CUDA-accelerating all kinds of services. When you chat, obviously, and use AI services that are in Microsoft Azure, it's very, very likely NVIDIA is in the back doing the inference and the token generation. They built the largest NVIDIA InfiniBand supercomputer, basically a digital twin of ours, or a physical twin of ours. We're bringing the NVIDIA ecosystem to Azure: NVIDIA DGX Cloud to Azure; NVIDIA Omniverse is now hosted in Azure; NVIDIA Healthcare is in Azure; and all of it is deeply integrated and deeply connected with Microsoft Fabric.

The whole industry is gearing up for Blackwell. This is what I'm about to
show you. Most of the scenes that you've seen so far of Blackwell are the full-fidelity design of Blackwell. Everything in our company has a digital twin. And in fact, this digital twin idea is really spreading, and it helps companies build very complicated things perfectly the first time. And what could be more exciting than creating a digital twin to build a computer that was built in a digital twin? And so let me show you what Wistron is doing.

To meet the demand for NVIDIA accelerated computing, Wistron, one of our leading manufacturing partners, is building digital twins of NVIDIA DGX and HGX factories using custom software developed with Omniverse SDKs and APIs. For their newest factory, Wistron started with a digital twin to virtually integrate their multi-CAD and process simulation data into a unified view. Testing and optimizing layouts in this physically accurate digital environment increased worker efficiency by 51%. During construction, the Omniverse digital twin was used to verify that the physical build matched the digital plans. Identifying any discrepancies early has helped avoid costly change orders. And the results have been impressive: using a digital twin helped bring Wistron's factory online in half the time, just 2 and 1/2 months instead of five. In operation, the Omniverse digital twin helps Wistron rapidly test new layouts to accommodate new processes or improve operations in the existing space, and monitor real-time operations using live IoT data from every machine on the production line, which ultimately enabled Wistron to reduce end-to-end cycle times by 50% and defect rates by 40%. With NVIDIA AI and Omniverse, NVIDIA's global ecosystem of partners is building a new era of accelerated, AI-enabled digitalization.

That's the way it's going to be in the future. We're going to manufacture everything digitally first, and then we'll manufacture
it physically. People ask me, how did it start? What got you guys so excited? What was it that you saw that caused you to put it all in on this incredible idea? And it's this... Hang on a second. Guys, that was going to be such a moment. That's what happens when you don't rehearse.

This, as you know, was first contact: 2012, AlexNet. You put a cat into this computer, and it comes out and it says "cat." And we said, oh my God, this is going to change everything. You take one million numbers across three channels, RGB. These numbers make no sense to anybody. You put them into this software, and it compresses them, dimensionally reduces them. It reduces them from a million dimensions into three letters, one vector, one number. And it's generalized. You could have the cat be different cats, and you could have it be the front of the cat and the back of the cat. And you look at this thing and you say, unbelievable, you mean any cat? Yeah, any cat. And it was able to recognize all these cats. And we realized how it did it: systematically, structurally. It's scalable. How big can you make it? Well, how big do you want to make it? And so we imagined that this is a completely new way of writing software.

And now today, as you know, you can type in the word c-a-t, and what comes out is a cat. It went the other way. Am I right? Unbelievable. How is it possible? That's right, how is it possible you took three letters and you generated a million pixels from it, and it made sense? Well, that's the miracle. And here we are, just literally 10 years later, where we recognize text, we recognize images, we recognize videos and sounds. Not only do we recognize them, we understand their meaning. We understand the meaning of the text; that's the reason why it can chat with you. It can summarize for you. It understands the text. It doesn't just recognize the English, it understood the English. It doesn't just
recognize the pixels, it understood the pixels. And you can even condition it between two modalities: you can have language condition an image and generate all kinds of interesting things.

Well, if you can understand these things, what else can you understand that you've digitized? The reason why we started with text and, you know, images is because we digitized those. But what else have we digitized? Well, it turns out we digitized a lot of things: proteins and genes and brain waves. Anything you can digitize, so long as there's structure, we can probably learn some patterns from it. And if we can learn the patterns from it, we can understand its meaning. If we can understand its meaning, we might be able to generate it as well. And so, therefore, the generative AI revolution is here.

Well, what else can we generate? What else can we learn? Well, one of the things that we would love to learn is climate. We would love to learn extreme weather. We would love to learn how we can predict future weather at regional scales, at sufficiently high resolution, such that we can keep people out of harm's way before harm comes. Extreme weather costs the world $150 billion, surely more than that, and it's not evenly distributed. That $150 billion is concentrated in some parts of the world, and of course on some people of the world. We need to adapt, and we need to know what's coming. And so we are creating Earth-2, a digital twin of the Earth for predicting weather, and we've made an extraordinary invention called CorrDiff: the ability to use generative AI to predict weather at extremely high resolution. Let's take a look.

As the Earth's climate changes, AI-powered weather forecasting is allowing us to more accurately predict and track severe storms like Super Typhoon Chanthu, which caused widespread damage in Taiwan and the surrounding region in 2021. Current AI forecast models can accurately predict the track
of storms, but they are limited to 25 km resolution, which can miss important details. NVIDIA's CorrDiff is a revolutionary new generative AI model trained on high-resolution radar-assimilated WRF weather forecasts and ERA5 reanalysis data. Using CorrDiff, extreme events like Chanthu can be super-resolved from 25 km to 2 km resolution with 1,000 times the speed and 3,000 times the energy efficiency of conventional weather models. By combining the speed and accuracy of NVIDIA's weather forecasting model FourCastNet and generative AI models like CorrDiff, we can explore hundreds or even thousands of kilometer-scale regional weather forecasts to provide a clear picture of the best, worst, and most likely impacts of a storm. This wealth of information can help minimize loss of life and property damage. Today, CorrDiff is optimized for Taiwan, but soon generative super-sampling will be available as part of the NVIDIA Earth-2 inference service for many regions across the globe.

The Weather Company is the trusted source of global weather predictions. We are working together to accelerate their weather simulation, a first-principles-based simulation. However, they're also going to integrate Earth-2 CorrDiff so that they could help businesses and countries do regional high-resolution weather prediction. And so if you have some weather prediction you'd like to do, reach out to The Weather Company. Really exciting work.

NVIDIA Healthcare: something we started 15 years ago. We're super, super excited about this. This is an area where we're very, very proud. Whether it's medical imaging or gene sequencing or computational chemistry, it is very likely that NVIDIA is the computation behind it. We've done so much work in this area. Today we're announcing that we're going to do something really, really cool. Imagine all of these AI models that are being used to generate images and audio, but instead of images and audio
because it understood images and audio, all the digitization that we've done for genes and proteins and amino acids, that digitization capability is now passed through machine learning so that we understand the language of life. The ability to understand the language of life: of course, we saw the first evidence of it with AlphaFold. This is really quite an extraordinary thing. After decades of painstaking work, the world had only digitized and reconstructed, using cryo-electron microscopy or X-ray crystallography, these different techniques, painstakingly reconstructed 200,000 proteins. In just, what is it, less than a year or so, AlphaFold has reconstructed 200 million proteins, basically every protein of every living thing that's ever been sequenced. This is completely revolutionary. Well, those models are incredibly hard to use, incredibly hard for people to build. And so what we're going to do is we're going to build them. We're going to build them for the researchers around the world. And it won't be the only one; there'll be many other models that we create. And so let me show you what we're going to do with it.

Virtual screening for new medicines is a computationally intractable problem. Existing techniques can only scan billions of compounds and require days on thousands of standard compute nodes to identify new drug candidates. NVIDIA BioNeMo NIMs enable a new generative screening paradigm. Using NIMs for protein structure prediction with AlphaFold, molecule generation with MolMIM, and docking with DiffDock, we can now generate and screen candidate molecules in a matter of minutes. MolMIM can connect to custom applications to steer the generative process, iteratively optimizing for desired properties. These applications can be defined with BioNeMo microservices or built from scratch. Here, a physics-based simulation optimizes for a molecule's ability to bind to a target protein
while optimizing for other favorable molecular properties in parallel. MolMIM generates high-quality, drug-like molecules that bind to the target and are synthesizable, translating to a higher probability of developing successful medicines faster. BioNeMo is enabling a new paradigm in drug discovery, with NIMs providing on-demand microservices that can be combined to build powerful drug discovery workflows like de novo protein design or guided molecule generation for virtual screening. BioNeMo NIMs are helping researchers and developers reinvent computational drug design.

NVIDIA MolMIM, CorrDiff, there's a whole bunch of other models: computer vision models, robotics models, and even, of course, some really, really terrific open-source language models. These models are groundbreaking; however, it's hard for companies to use them. How would you use it? How would you bring it into your company and integrate it into your workflow? How would you package it up and run it? Remember, earlier I just said that inference is an extraordinary computation problem. How would you do the optimization for each and every one of these models and put together the computing stack necessary to run that supercomputer so that you can run the models in your company? And so we have a great idea. We're going to invent a new way for you to receive and operate software. This software comes basically in a digital box; we call it a container. And we call it the NVIDIA Inference Microservice, a NIM. And let me explain to you what it is.

A NIM: it's a pre-trained model, so it's pretty clever, and it is packaged and optimized to run across NVIDIA's install base, which is very, very large. What's inside it is incredible. You have all these pre-trained, state-of-the-art models. They could be open source, they could be from one of our partners, it could be created by us, like NVIDIA mull. It is packaged up with all of its dependencies, so
CUDA, the right version; cuDNN, the right version; TensorRT-LLM, distributing across the multiple GPUs; Triton Inference Server; all completely packaged together. It's optimized depending on whether you have a single GPU, multi-GPU, or multi-node of GPUs, and it's connected up with APIs that are simple to use. Now, think about what an AI API is. An AI API is an interface that you just talk to. And so this is a piece of software in the future that has a really simple API, and that API is called human. And these packages, incredible bodies of software, will be optimized and packaged, and we'll put them on a website, and you can download them. You could take it with you, you could run it in any cloud, you can run it in your own data center, you can run it in workstations if it fits. And all you have to do is come to ai.nvidia.com. We call it NVIDIA Inference Microservice, but inside the company we all call it NIMs.

Okay, just imagine: someday there's going to be one of these chatbots, and this chatbot is going to just be in a NIM, and you'll assemble a whole bunch of chatbots, and that's the way software is going to be built someday. How do we build software in the future? It is unlikely that you'll write it from scratch or write a whole bunch of Python code or anything like that. It is very likely that you assemble a team of AIs. There's probably going to be a super AI that you use that takes the mission that you give it and breaks it down into an execution plan. Some of that execution plan could be handed off to another NIM. That NIM would maybe understand SAP; the language of SAP is ABAP. It might understand ServiceNow, and it would go retrieve some information from their platforms. It might then hand that result to another NIM, which goes off and does some calculation on it. Maybe it's an optimization software, a combinatorial optimization algorithm. Maybe it's, you know, just
some basic calculator. Maybe it's pandas to do some numerical analysis on it. And then it comes back with its answer, and it gets combined with everybody else's. And because it's been presented with "this is what the right answer should look like," it knows what right answers to produce, and it presents it to you. We can get a report every single day at, you know, the top of the hour that has something to do with a build plan, or some forecast, or some customer alert, or some bugs database, or whatever it happens to be. And we could assemble it using all these NIMs. And because these NIMs have been packaged up and are ready to work on your systems, so long as you have NVIDIA GPUs in your data center or in the cloud, these NIMs will work together as a team and do amazing things.

And so we decided this is such a great idea, we're going to go do that. And so NVIDIA has NIMs running all over the company. We have chatbots being created all over the place, and one of the most important chatbots, of course, is a chip designer chatbot. You might not be surprised; we care a lot about building chips. And so we want to build chatbots, AI copilots, that are co-designers with our engineers. And so this is the way we did it. We got ourselves a Llama, Llama 2, this is a 70B, and it's, you know, packaged up in a NIM. And we asked it, you know, what is a CTL? Well, it turns out CTL is an internal program, and it has an internal proprietary language, but it thought the CTL was combinatorial timing logic, and so it describes, you know, conventional knowledge of CTL, but that's not very useful to us. And so we gave it a whole bunch of new examples. You know, this is no different than onboarding an employee. We say, you know, thanks for that answer, it's completely wrong. And then we present to them: this is what a CTL is. Okay, and so this is what a CTL is at NVIDIA. And the CTL, as you can see, you know, CTL stands for Compute
Trace Library, which makes sense; you know, we are tracing compute cycles all the time. And it wrote the program. Isn't that amazing? And so the productivity of our chip designers can go up. This is what you can do with a NIM.

The first thing you can do with it is customize it. We have a service called NeMo Microservice that helps you curate the data, preparing the data so that you could teach this, onboard this AI. You fine-tune it, and then you guardrail it; you can even evaluate the answers, evaluate its performance against other examples. And so that's called the NeMo Microservice.

Now, the thing that's emerging here is this: there are three elements, three pillars, of what we're doing. The first pillar is, of course, inventing the technology for AI models, and running AI models, and packaging it up for you. The second is to create tools to help you modify it. First is having the AI technology, second is to help you modify it, and third is infrastructure for you to fine-tune it and, if you like, deploy it. You could deploy it on our infrastructure called DGX Cloud, or you can deploy it on prem; you can deploy it anywhere you like. Once you develop it, it's yours to take anywhere. And so we are effectively an AI foundry. We will do for you and the industry on AI what TSMC does for us building chips. And so we go to TSMC with our big ideas, they manufacture, and we take it with us. And so exactly the same thing here: AI Foundry, and the three pillars are the NIMs, NeMo Microservice, and DGX Cloud.

The other thing that you could teach the NIM to do is to understand your proprietary information. Remember, inside our company, the vast majority of our data is not in the cloud; it's inside our company. It's been sitting there, you know, being used all the time, and gosh, it's basically NVIDIA's intelligence. We would like to take that data, learn its meaning, like we learned the meaning of almost anything else that
we just talked about, learn its meaning, and then reindex that knowledge into a new type of database called a vector database. And so you essentially take structured data or unstructured data, you learn its meaning, you encode its meaning, so now this becomes an AI database. And that AI database, in the future, once you create it, you can talk to it. And so let me give you an example of what you could do. So suppose you've got a whole bunch of multi-modality data, and one good example of that is PDF. So you take all of your PDFs, all your favorite, you know, the stuff that is proprietary to you, critical to your company. You can encode it, just as we encoded pixels of a cat and it becomes the word "cat." We can encode all of your PDFs, and they turn into vectors that are now stored inside your vector database. It becomes the proprietary information of your company. And once you have that proprietary information, you can chat to it. It's a smart database, and so you just chat with data. And how much more enjoyable is that? You know, for our software team, they just chat with the bugs database: you know, how many bugs were there last night? Are we making any progress? And then after you're done talking to this bugs database, you need therapy, and so we have another chatbot for you. You can do it.

Okay, so we call this NeMo Retriever, and the reason for that is because ultimately its job is to go retrieve information as quickly as possible. And you just talk to it: hey, retrieve me this information. It goes and brings it back to you: do you mean this? You go, yeah, perfect. Okay, and so we call it the NeMo Retriever. Well, the NeMo service helps you create all these things, and we have all these different NIMs. We even have NIMs of digital humans.

"I'm Rachel, your AI care manager."

Okay, so it's a really short clip, but there were so many videos to show you, I guess so many other
demos to show you, and so I had to cut this one short. But this is Diana. She is a digital human NIM, and you just talk to her, and she's connected, in this case, to Hippocratic AI's large language model for healthcare, and it's truly amazing. She is just super smart about healthcare things, you know. And so after Dwight, my VP of software engineering, talks to the chatbot for the bugs database, then you come over here and talk to Diana. And so Diana is completely animated with AI, and she's a digital human.

There are so many companies that would like to build. They're sitting on gold mines. The enterprise IT industry is sitting on a gold mine. It's a gold mine because they have so much understanding of the way work is done, they have all these amazing tools that have been created over the years, and they're sitting on a lot of data. If they could take that gold mine and turn it into copilots, these copilots could help us do things. And so just about every IT franchise, IT platform in the world that has valuable tools that people use is sitting on a gold mine for copilots, and they would like to build their own copilots and their own chatbots. And so we're announcing that NVIDIA AI Foundry is working with some of the world's great companies. SAP generates 87% of the world's global commerce. Basically, the world runs on SAP. We run on SAP. NVIDIA and SAP are building SAP Joule copilots using NVIDIA NeMo and DGX Cloud. ServiceNow: 80, 85% of the world's Fortune 500 companies run their people and customer service operations on ServiceNow, and they're using NVIDIA AI Foundry to build ServiceNow Assist virtual assistants. Cohesity backs up the world's data. They're sitting on a gold mine of data: hundreds of exabytes of data, over 10,000 companies. NVIDIA AI Foundry is working with them, helping them build their Gaia generative AI agent. Snowflake is a company that
stores the world's digital warehouse in the cloud and serves over 3 billion queries a day for 10,000 enterprise customers. Snowflake is working with NVIDIA AI Foundry to build copilots with NVIDIA NeMo and NIMs. NetApp: nearly half of the files in the world are stored on prem on NetApp. NVIDIA AI Foundry is helping them build chatbots and copilots, like those vector databases and retrievers, with NVIDIA NeMo and NIMs. And we have a great partnership with Dell. Everybody who is building these chatbots and generative AI, when you're ready to run it, you're going to need an AI factory. And nobody is better at building end-to-end systems of very large scale for the enterprise than Dell. And so any company, every company, will need to build AI factories. And it turns out that Michael is here; he's happy to take your order. Ladies and gentlemen, Michael Dell.

Okay, let's talk about the next wave of robotics, the next wave of AI: robotics, physical AI. So far, all of the AI that we've talked about is one computer. Data comes into one computer, lots of the world's, if you will, experience in digital text form. The AI imitates us by reading a lot of the language to predict the next words. It's imitating you by studying all of the patterns and all the other previous examples. Of course, it has to understand context and so on and so forth, but once it understands the context, it's essentially imitating you. We take all of the data, we put it into a system like DGX, we compress it into a large language model. Trillions and trillions of tokens become billions of parameters; these billions of parameters become your AI.

Well, in order for us to go to the next wave of AI, where the AI understands the physical world, we're going to need three computers. The first computer is still the same computer. It's that AI computer that now is going to be watching video, and maybe it's doing
synthetic data generation, and maybe there are a lot of human examples. Just as we have human examples in text form, we're going to have human examples in articulation form, and the AIs will watch us, understand what is happening, and try to adapt it for themselves into the context. And because it can generalize with these foundation models, maybe these robots can also perform in the physical world fairly generally. So I just described in very simple terms essentially what just happened in large language models, except the ChatGPT moment for robotics may be right around the corner. And so we've been building the end-to-end systems for robotics for some time. I'm super, super proud of the work. We have the AI system, DGX. We have the lower system, which is called AGX, for autonomous systems, the world's first robotics processor. When we first built this thing, people were like, what are you guys building? It's an SoC, so it's one chip. It's designed to be very low power, but it's designed for high-speed sensor processing and AI. And so if you want to run transformers in a car, or you want to run transformers in, you know, anything that moves, we have the perfect computer for you. It's called the Jetson. And so the DGX on top for training the AI, the Jetson is the autonomous processor, and in the middle we need another computer. Whereas large language models have the benefit of you providing your examples and then doing reinforcement learning human feedback, what is the reinforcement learning human feedback of a robot? Well, it's reinforcement learning physical feedback. That's how you align the robot. That's how the robot knows that, as it's learning these articulation capabilities and manipulation capabilities, it's going to adapt properly into the laws of physics. And so we need a simulation engine that represents the world digitally for the robot, so that the robot has a gym to go learn how to be a robot. We call that virtual
world Omniverse, and the computer that runs Omniverse is called OVX, and OVX, the computer itself, is hosted in the Azure cloud. Okay, and so basically we built these three things, these three systems. On top of it, we have algorithms for every single one. Now I'm going to show you one super example of how AI and Omniverse are going to work together. The example I'm going to show you is kind of insane, but it's going to be very, very close to tomorrow. It's a robotics building. This robotics building is called a warehouse. Inside the robotics building are going to be some autonomous systems. Some of the autonomous systems are going to be called humans, and some of the autonomous systems are going to be called forklifts, and these autonomous systems are going to interact with each other, of course autonomously, and it's going to be overlooked upon by this warehouse to keep everybody out of harm's way. The warehouse is essentially an air traffic controller, and whenever it sees something happening, it will redirect traffic and give new waypoints, just new waypoints, to the robots and the people, and they'll know exactly what to do. This warehouse, this building, you can also talk to. Of course you could talk to it: hey, you know, SAP Center, how are you feeling today, for example. And so you could ask the warehouse the same questions. Basically, the system I just described will have Omniverse Cloud hosting the virtual simulation and AI running on DGX Cloud, and all of this is running in real time. Let's take a look. The future of heavy industries starts as a digital twin. The AI agents helping robots, workers, and infrastructure navigate unpredictable events in complex industrial spaces will be built and evaluated first in sophisticated digital twins. This Omniverse digital twin of a 100,000-square-foot warehouse is operating as a simulation environment that integrates digital workers, AMRs running the Nvidia Isaac Perceptor stack,
centralized activity maps of the entire warehouse from 100 simulated ceiling-mounted cameras using Nvidia Metropolis, and AMR route planning with Nvidia cuOpt. Software-in-the-loop testing of AI agents in this physically accurate simulated environment enables us to evaluate and refine how the system adapts to real-world unpredictability. Here, an incident occurs along this AMR's planned route, blocking its path as it moves to pick up a pallet. Nvidia Metropolis updates and sends a real-time occupancy map to cuOpt, where a new optimal route is calculated. The AMR is enabled to see around corners and improve its mission efficiency. With generative AI-powered Metropolis vision foundation models, operators can even ask questions using natural language. The visual model understands nuanced activity and can offer immediate insights to improve operations. All of the sensor data is created in simulation and passed to the real-time AI running as Nvidia inference microservices, or NIMs. And when the AI is ready to be deployed in the physical twin, the real warehouse, we connect Metropolis and Isaac NIMs to real sensors, with the ability for continuous improvement of both the digital twin and the AI models. Isn't that incredible? And so remember, a future facility, warehouse, factory, building will be software defined, and so the software is running. How else would you test the software? So you test the software for building the warehouse, the optimization system, in the digital twin. What about all the robots? All of those robots you were seeing just now, they're all running their own autonomous robotic stack. And so the way you integrate software in the future, CI/CD in the future for robotic systems, is with digital twins. We've made Omniverse a lot easier to access. We're going to create basically Omniverse Cloud APIs, four simple APIs and a channel, and you can connect your application to it. So this is going to be as wonderfully,
beautifully simple in the future that Omniverse is going to be, and with these APIs you're going to have these magical digital twin capabilities. We have also turned Omniverse into an AI and integrated it with the ability to chat USD. Our language is, you know, human, and Omniverse's language, as it turns out, is Universal Scene Description. And so that language is rather complex, and so we've taught our Omniverse that language, and so you can speak to it in English and it would directly generate USD, and it would talk back in USD but converse back to you in English. You could also look for information in this world semantically. Instead of the world being encoded semantically in language, now it's encoded semantically in scenes, and so you could ask it of certain objects or certain conditions and certain scenarios, and it can go and find that scenario for you. It also can collaborate with you in generation. You could design some things in 3D, it could simulate some things in 3D, or you could use AI to generate something in 3D. Let's take a look at how this is all going to work. We have a great partnership with Siemens. Siemens is the world's largest industrial engineering and operations platform. You've seen now so many different companies in the industrial space. Heavy industries is one of the greatest final frontiers of IT, and we finally now have the necessary technology to go and make a real impact. Siemens is building the industrial metaverse, and today we're announcing that Siemens is connecting their crown jewel, Xcelerator, to Nvidia Omniverse. Let's take a look. Siemens technology is transformed every day for everyone. Teamcenter X, our leading product lifecycle management software from the Siemens Xcelerator platform, is used every day by our customers to develop and deliver products at scale. Now we are bringing the real and the digital worlds even closer by integrating Nvidia AI and Omniverse
technologies into Teamcenter X. Omniverse APIs enable data interoperability and physics-based rendering to industrial-scale design and manufacturing projects. Our customer HD Hyundai, market leader in sustainable ship manufacturing, builds ammonia- and hydrogen-powered ships, often comprising over 7 million discrete parts. With Omniverse APIs, Teamcenter X lets companies like HD Hyundai unify and visualize these massive engineering data sets interactively and integrate generative AI to generate 3D objects or HDRI backgrounds to see their projects in context. The result: an ultra-intuitive, photoreal, physics-based digital twin that eliminates waste and errors, delivering huge savings in cost and time. And we are building this for collaboration, whether across more Siemens Xcelerator tools like Siemens NX or STAR-CCM+, or across teams working on their favorite devices in the same scene together. And this is just the beginning. Working with Nvidia, we will bring accelerated computing, generative AI, and Omniverse integration across the Siemens Xcelerator portfolio. The professional voice actor happens to be a good friend of mine, Roland Busch, who happens to be the CEO of Siemens. Once you get Omniverse connected into your workflow, your ecosystem, from the beginning of your design to engineering to manufacturing planning, all the way to digital twin operations, once you connect everything together, it's insane how much productivity you can get, and it's just really, really wonderful. All of a sudden, everybody is operating on the same ground truth. You don't have to exchange data and convert data, make mistakes. Everybody is working on the same ground truth, from the design department to the art department, the architecture department, all the way to the engineering and even the marketing department. Let's take a look at how Nissan has integrated Omniverse into their workflow, and it's all because it's connected by all these
wonderful tools and these developers that we're working with. Take a look. [Music] That was not an animation. That was Omniverse. Today we're announcing that Omniverse Cloud streams to the Vision Pro, and it is very, very strange that you walk around virtual doors, when I was getting out of that car, and everybody does it. It is really, really quite amazing. Vision Pro, connected to Omniverse, portals you into Omniverse, and because all of these CAD tools and all these different design tools are now integrated and connected to Omniverse, you can have this type of workflow. Really incredible. Let's talk about robotics. Everything that moves will be robotic. There's no question about that. It's safer, it's more convenient, and one of the largest industries is going to be automotive. We build the robotic stack from top to bottom, as I mentioned, from the computer system, but in the case of self-driving cars, including the self-driving application. At the end of this year, or I guess the beginning of next year, we will be shipping in Mercedes, and then shortly after that, JLR. And so these autonomous robotic systems are software defined. They take a lot of work to do: computer vision, obviously artificial intelligence, control and planning, all kinds of very complicated technology, and it takes years to refine. We're building the entire stack. However, we open up our entire stack for all of the automotive industry. This is just the way we work, the way we work in every single industry. We try to build as much of it as we can so that we understand it, but then we open it up so everybody can access it, whether you would like to buy just our computer, which is the world's only full functional-safe, ASIL-D system that can run AI, this functional-safe, ASIL-D quality computer, or the operating system on top, or of course our data centers, which are in basically every AV company in the world. However you would like to enjoy it, we're delighted by it. Today we're
announcing that BYD, the world's largest EV company, is adopting our next generation. It's called Thor. Thor is designed for transformer engines. Thor, our next-generation AV computer, will be used by BYD. You probably don't know this fact, that we have over a million robotics developers. We created Jetson, this robotics computer. We're so proud of it. The amount of software that goes on top of it is insane, but the reason why we can do it at all is because it's 100% CUDA compatible. Everything that we do in our company is in service of our developers, and by us being able to maintain this rich ecosystem and make it compatible with everything that you access from us, we can bring all of that incredible capability to this little tiny computer we call Jetson, a robotics computer. We are also today announcing this incredibly advanced new SDK. We call it Isaac Perceptor. Most of the bots today are pre-programmed. They're either following rails on the ground, digital rails, or they'd be following AprilTags. But in the future they're going to have perception, and the reason why you want that is so that you could easily program it. You say, would you like to go from point A to point B, and it will figure out a way to navigate its way there. So by only programming waypoints, the entire route could be adaptive, the entire environment could be reprogrammed, just as I showed you at the very beginning with the warehouse. You can't do that with pre-programmed AGVs. If those boxes fall down, they just all gum up and they just wait there for somebody to come clear it. And so now with the Isaac Perceptor, we have incredible state-of-the-art vision odometry, 3D reconstruction, and, in addition to 3D reconstruction, depth perception. The reason for that is so that you can have two modalities to keep an eye on what's happening in the world. Isaac Perceptor. The most used robot today is the manipulator, manufacturing arms, and
they are also pre-programmed. The computer vision algorithms, the AI algorithms, the control and path planning algorithms that are geometry aware are incredibly computationally intensive. We have made these CUDA accelerated, so we have the world's first CUDA-accelerated motion planner that is geometry aware. You put something in front of it, it comes up with a new plan and articulates around it. It has excellent perception for pose estimation of a 3D object, not just its pose in 2D but its pose in 3D, so it has to imagine what's around and how best to grab it. So the foundation pose, the grip foundation, and the articulation algorithms are now available. We call it Isaac Manipulator, and they also just run on Nvidia's computers. We are starting to do some really great work in the next generation of robotics. The next generation of robotics will likely be humanoid robotics. We now have the necessary technology, as I was describing earlier, to imagine generalized humanoid robotics. In a way, humanoid robotics is likely easier, and the reason for that is because we have a lot more imitation training data that we can provide the robots, because we are constructed in a very similar way. It is very likely that humanoid robotics will be much more useful in our world, because we created the world to be something that we can interoperate in and work well in, and the way that we set up our workstations and manufacturing and logistics, they were designed for humans. They were designed for people. And so these humanoid robots will likely be much more productive to deploy. We're creating, just like we're doing with the others, the entire stack, starting from the top: a foundation model that learns from watching video, human examples. It could be in video form, it could be in virtual reality form. We then created a gym for it called Isaac Reinforcement Learning Gym, which allows the humanoid robot to
learn how to adapt to the physical world, and then an incredible computer, the same computer that's going to go into a robotic car. This computer will run inside a humanoid robot, called Thor. It's designed for transformer engines. We've combined several of these into one video. This is something that you're going to really love. Take a look. It's not enough for humans to imagine. [Music] We have to invent and explore real and push beyond what's been done. Fair amount of detail. We create smarter and faster. We push it to fail so it can learn. We teach it, then help it teach itself. We broaden its understanding to take on new challenges with absolute precision, and succeed. We make it perceive and move and even reason, so it can share our world with us. [Music] This is where inspiration leads us: the next frontier. This is Nvidia Project GR00T, a general-purpose foundation model for humanoid robot learning. The GR00T model takes multimodal instructions and past interactions as input and produces the next action for the robot to execute. We developed Isaac Lab, a robot learning application to train GR00T on Omniverse Isaac Sim, and we scale out with OSMO, a new compute orchestration service that coordinates workflows across DGX systems for training and OVX systems for simulation. With these tools we can train GR00T in physically based simulation and transfer zero-shot to the real world. The GR00T model will enable a robot to learn from a handful of human demonstrations, so it can help with everyday tasks and emulate human movement just by observing us. This is made possible with Nvidia's technologies that can understand humans from videos, train models in simulation, and ultimately deploy them directly to physical robots. Connecting GR00T to a large language model even allows it to generate motions by following natural language instructions. Hi, go1, can you give me a high five? Sure thing, let's high five. Can you give us
some cool moves? Sure, check this out. All this incredible intelligence is powered by the new Jetson Thor robotics chips, designed for GR00T, built for the future. With Isaac Lab, OSMO, and GR00T, we're providing the building blocks for the next generation of AI-powered robotics. [Applause] [Music] About the same size. The soul of Nvidia, the intersection of computer graphics, physics, artificial intelligence, it all came to bear at this moment. The name of that project: General Robotics 003. I know, super good, super good. Well, I think we have some special guests, do we? [Music] Hey guys. So I understand you guys are powered by Jetson. They're powered by Jetson, little Jetson robotics computers inside. They learn to walk in Isaac Sim. Ladies and gentlemen, this is Orange, and this is the famous Green. They are the BDX robots of Disney. Amazing Disney research. Come on, you guys, let's wrap up. Let's go. Five things. Where are you going? I sit right here. Don't be afraid. Come here, Green. Hurry up. What are you saying? No, it's not time to eat. I'll give you a snack in a moment. Let me finish up real quick. Come on, Green, hurry up. Stop wasting time. Five things. Five things. First, a new industrial revolution. Every data center should be accelerated. A trillion dollars' worth of installed data centers will become modernized over the next several years. Second, because of the computational capability we brought to bear, a new way of doing software has emerged, generative AI, which is going to create new infrastructure dedicated to doing one thing and one thing only, not for multi-user data centers but AI generators. These AI generators will create incredibly valuable software: a new industrial revolution. Second, the computer of this revolution, the computer of this generation: generative AI, trillion parameters, Blackwell, insane amounts of computers and computing. Third. I'm trying to concentrate. Good job. Third, new computer. New computer creates new types of
software new   type of software should be distributed in a  new way so that it can on the one hand be an   endpoint in the cloud and easy to use but still  allow you to take it with you because it is your   intelligence your intelligence should be pack  packaged up in a way that allows you to take   it with you we call them Nims and third  these Nims are going to help you create a   new type of application for the future not  one that you wrote completely from scratch   but you're going to integrate them like teams  create these applications we have a fantastic   capability between Nims the AI technology the  tools Nemo and the infrastructure dgx cloud in   our AI Foundry to help you create proprietary  applications proprietary chat Bots and then   lastly everything that moves in the future  will be robotic you're not going to be the   only one and these robotic systems whether they  are humanoid amrs self-driving cars forklifts   manipulating arms they will all need one thing  Giant stadiums warehouses factories there can to   be factories that are robotic orchestrating  factories uh manufacturing lines that are   robotics building cars that are robotics these  systems all need one thing they need a platform   a digital platform a digital twin platform and  we call that Omniverse the operating system of   the robotics World these are the five things that  we talked about today what does Nvidia look like   what does Nvidia look like when we talk about  gpus there's a very different image that I have   when I when people ask me about gpus first I see  a bunch of software stacks and things like that   and second I see this this is what we announce  to you today this is Blackwell this is the plat amazing amazing processors MV link switches  networking systems and the system design is   a miracle this is Blackwell and this  to me is what a GPU looks like in my mind listen orange green I think we  have one more treat for everybody   what do you think should we okay we  
have one more thing to show you. Roll it. [Music] Thank you. Thank you. Have a great GTC. Thank you all for coming. Thank you.