Transcript for:
Evolution of AI Frameworks: PyTorch's Journey

These are fun questions. Obviously we're in the middle of an explosion in the whole AI space. The state of AI today would be very much less developed if PyTorch hadn't come out; you wouldn't really have ChatGPT today, you wouldn't have Stable Diffusion. If we roll back in time, we didn't know all this at the time we made this decision. We were not 100% sure.

AI was kind of a black box in some sense: people had to know how to write a program, compile it, and then run it, so it was not easy for researchers from many interdisciplinary domains. There was quite a lot of variety back then, so there was a lot of focus on experimentation. A lot of innovation was happening, but it was pocketed: the big players were doing individual model development, and a lot of it was siloed within individual organizations or research institutions. There were like 15 or 20 tools at that time all trying to enable research in lots of directions for AI. The software ecosystem was fragmented, incredibly fragmented. You had Torch, of course, with Lua; you had Theano out of Montreal; you had Caffe coming out of UC Berkeley. They crossed the bar between being a cute tool and being actually useful. People started wondering if there were very nice tools to run these kinds of methods, as well as having more new algorithms coming out. Circa 2015 there were a number of different frameworks, and the challenge we had at the time was taking this significant number of frameworks and providing support for all of them. We could hear from our customers clearly that machine learning was, even back then, an area of significant interest; the challenge was how to actually deliver support for that in a focused way. It was exciting because it was both chaotic and high budget. It was that moment where we still hadn't converged yet, and there's money everywhere, there's resources everywhere. I think one word to put it is: chaos.
Facebook started a fundamental AI research lab; it was called Facebook AI Research. After Facebook changed their name to Meta, it became Fundamental AI Research. There was a unique opportunity in creating an organization which would have very ambitious long-term scientific goals but at the same time would have a big impact on the world. FAIR was trying to make progress in computer vision, which is, if you want a computer to understand everything the way you see things: if you say this is a table, this is a chair, this is a dog, this is someone's feeling, you want the computer to be able to grasp all of those semantic things as well.

Facebook then was also going through the transition to mobile first. It was a desktop-first application, and a lot of AI features at that early stage had to happen on mobile, and how to fit those AI features onto mobile was a big problem. That's where Caffe2 got started, and its starting point was a laser focus on making those AI features work on mobile phones natively. It was actually shipped across Facebook's fleet, on news feed, ads, search, and things like that, as well as computer vision and natural language processing. It also got shipped onto phones; we had a cute name, Caffe2Go, that allowed it to be shipped on something like a billion phones. Even though it wasn't quite the best framework for research or exploring new ideas, it really became this project that everyone started using, and really even started using at scale.

We have world-class researchers; they are pushing the boundaries, so they need the best possible tools, and there was a huge gap that we first needed to close. In 2015 I was leading the LuaTorch team, and at that time I was helping people both at FAIR and outside, within the industry. DeepMind, Twitter, Facebook, and a bunch of other academic and industrial labs were using LuaTorch. If you look, though, at the developer experience, or really the tools that were available, I mean, none of them were very good. I think the best experience was probably Torch, but no one really liked to use Lua; it just didn't have an ecosystem or community around it. AI researchers were outgrowing LuaTorch, so someone needed to go build a new tool.

In December 2015, TensorFlow came out. It was from Google: high budget, high marketing, all the way up to the CEO. They just took over the world with the news. One of the things that was distinctive between TensorFlow and all these other tools was that all these other tools were started by enthusiasts, by researchers, as a way for them to do their work, whereas TensorFlow was built from the ground up with Google engineering. People realized TensorFlow had a certain credibility and a certain polish that all these other tools either needed to catch up with or they would just lose out.

Facebook felt that they needed AI to make faster progress, for Facebook itself to scale better. Facebook saw a huge need in AI applications, both on the cloud side, for search, advertisement, news feed, and also for basically recommending people's interests in multimedia content, interest in cats and things like that. So the need for a high-performance, product-ready framework was actually one of the biggest needs, and I think that was about the time when I joined Facebook, and one of the very first tasks was to basically build a framework that would be battle-tested for hundreds of trillions of predictions per day and all kinds of different deployment environments.

In May 2016, one of the LuaTorch community members, Adam Paszke (he was a first-year undergrad at that time at the University of Warsaw) messaged me and said, hey, I know it's a little late in the intern cycle, but I couldn't find an internship anywhere; do you have any openings? And I said, yeah, I do, why don't you just come and build the next version of Torch. I was
kind of excited about machine learning. I just thought it was a really cool field and I wanted to learn more about it. I was looking around and I saw Torch7. People were excited; one of the big industry labs was using it, FAIR in this case. I used it to learn machine learning, but I guess over time I kind of naturally realized that I ended up being more interested in building the library itself more than actually using it.

So from December 2015 through April 2016 we separated the backend of LuaTorch, which is all the software that runs scientific computing code on GPUs and CPUs, from the frontend, how you express your ideas. We hadn't even thought through what we wanted to do about it, but we at least were like, well, let's just clean up the backend. We essentially started taking the implementations of the math functions, which were all in lower-level languages like C and C++, and just separated them from the bindings that exposed them to Lua, which were essentially half of Torch7. And by the time the separation was complete, by the time we had those libraries that could essentially be linked to any other language, then it was a question of which language we choose. And it was a pretty natural question: all the other libraries were already in Python, and people were happy with it, they were liking it. So we were like, it's a fairly natural next step to take what we have, now that it doesn't only work in Lua, and just try to bind it into Python. So I just started reading the CPython documentation (they actually have really good documentation for how you expose new types, how you expose C functions, and stuff like that), and I just slowly started building out the prototype from the ground up.

And so in January 2017 we released it. One thing we were saying is, we knew we were a small effort, so we tried to stay very focused on the market we knew well. We said, we'll just go for AI researchers; that's who we know well, that's who we serve well, and we'll take it from there, we'll see where this goes. What made it worth it was several people giving us feedback that we were enabling them to do great work.

My wife and I started fast.ai with the view that we wanted to teach people who don't have a PhD how to take advantage of neural networks and deep learning. So we were thinking, okay, can we create an introductory deep learning course? And then I literally hit a wall. I was trying to implement recurrent neural networks, which were the foundation at the time of natural language processing. I had an algorithm I was trying to implement and I just wasn't smart enough to do it. I couldn't do it, I couldn't get it working. And it was unbelievable timing, because literally that week (I have a week to prepare the course, I'm two days in, I'm stuck) an announcement appears of a new library called PyTorch. The key thing about PyTorch is that it really leveraged and engaged with and highlighted the Python programming language. You write a line of normal Python code and it runs straight away and gives you the result. It just all made sense. And so literally I learned PyTorch, implemented a new research idea, and wrote the lesson about it in the five days I had left. It was really transformative for me.

What we were doing at the time was, we had these monkeys, and when a monkey passed away, a lab at Stanford would cut out the retina, put in an electrode, and then we would show images to it. The neurons in the retina respond to the signals, so you end up getting a bunch of spikes, basically: you get zeros and ones when the neuron sees light or it doesn't. So we were actually doing decoding: we had the signals, just a bunch of ones and zeros that came from neurons, and then, could we figure out what image it was based on that? We used GANs, generative adversarial networks, which is now, I think, the OG gen AI, because you gave it a bunch of inputs and then it gives you an image. Now, the images back then were super blurry, not like today's. So we were using deep learning to do that. When I was trying these different ideas, TensorFlow came out, and I switched everything and rewrote everything in TensorFlow because I wanted to run on multiple GPUs.

TensorFlow was like, we want you to write code like this, and Python doesn't really look like that, so we will smash at Python until it kind of fits our view of how you're meant to write neural networks. Whereas what Soumith and Adam did with PyTorch was, they said, okay, here's Python; how can we really use that language and connect it to what deep learning needs? And it was really delightful. PyTorch came out and I was like, oh, this makes a lot more sense for a researcher. It's just easier to look at the code and map it to the math; in like seven lines of code I could see what we were doing in 50 other lines of code. It was like night and day. It took a few days and then we were on the PyTorch train after that.

The first few months after we launched, we really amplified the community; that's mainly what we did. If users found there was an issue, we would turn around a PR to fix it within hours, and just that fast cycle time earns trust with the users. There was Piotr Bialecki, a researcher in Germany, and he started jumping into the forums, answering questions alongside us. I think he has at this point some ridiculous number; he's answered like 10,000 or 20,000 questions on the forum. People are like, oh, if you're doing research, then obviously you've run into this person, or at least his profile picture. In 2017 I think we were working 16-plus-hour days, and we were just happy we were getting to do it.
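That "map the code to the math" point is what PyTorch's define-by-run (eager) style bought researchers: every operation executes immediately and records just enough to compute gradients later. Here is a toy illustration of that idea in plain Python, a sketch of the concept rather than PyTorch's actual internals:

```python
# A toy define-by-run autograd: each operation runs immediately and records
# its inputs plus local derivatives, so gradients can flow back afterwards.
# Illustrative only; PyTorch's real autograd is far more general and fast.

class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=()):
        self.data = data          # the number itself, available right away
        self.grad = 0.0           # d(output)/d(self), filled in by backward()
        self._parents = parents   # (parent_value, local_gradient) pairs

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Topologically order the recorded graph, then apply the chain
        # rule once per node, from the output back to the leaves.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node._parents:
                parent.grad += node.grad * local_grad

# Eager means intermediate results are ordinary values you can inspect:
x = Value(3.0)
y = Value(4.0)
loss = x * y + x      # runs right now; loss.data is already 15.0
loss.backward()
print(loss.data, x.grad, y.grad)   # 15.0 5.0 3.0
```

The appeal the speakers describe is exactly this: `loss` is a normal value the moment the line runs, and the code reads like the formula it computes, instead of building a symbolic graph to execute later.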
At the end of 2017, a new VP of AI came in to Facebook, and that person was Jérôme Pesenti. He looked at the situation and he saw a clear problem just internally at Facebook. He said, look, my research team is doing great, my production team is doing great, but there's not a lot of transfer happening between research and production, and why is that? The situation was, we had a great system running for the product side, Caffe2, and we had a great system running for the researchers, the early version of PyTorch. The challenge was that there wasn't a way for us to move between these two worlds as easily as possible. So we were like, well, how do we actually solve that problem?

The very first thing we did was to build an intermediate language that would allow us to ship models from one to the other more easily, and we had an internal project called the Toffee IR; Toffee is basically this Torch and this Caffe. That ran for a while, and at some point we were like, this is pretty useful, so we're going to open source it and drive it together with the industry. So Microsoft, AWS, AMD, NVIDIA, and a bunch of other partners basically grouped together, and we renamed the project ONNX, the Open Neural Network Exchange, and open sourced it as one common standard.

By like mid-2018 it was pretty evident within the industry that there were only two frameworks that were going to capture most of the market, and there was a rapid consolidation happening. That led to us being like, well, if there are only two frameworks, why do ONNX at all? So instead, what we did at Facebook was we just decided to merge PyTorch and Caffe2 to do both research and production well. And so we started the PyTorch 1.0 effort, basically trying to combine the frontend, the intermediate abstractions, and the backend together, so that we have one unified technical stack to run this end to end. We called it the zipper approach: the idea is, let's take the frontend of PyTorch, which is super easy to use, and the high-performance backend of Caffe2, and zip them together; it should just work beautifully. But we tried for a couple of months and then we realized, oh, the APIs completely don't align with each other; they were never designed to interoperate. So then we took a different approach: we focused 100% on PyTorch. We said, we keep the PyTorch codebase and we just make it highly performant for production. We didn't know if it was going to work at that time, but we were going to try.

The PyTorch and Caffe2 merger was about cultures coming together, understanding each other, respecting each other. Imagine a research team that favors flexibility and a product team that favors performance coming together; it's really difficult to figure out what we actually want to achieve. What a chaotic moment. I have to say that it took us a good two years, working with the product teams, to evolve PyTorch into a fully production-ready framework. The resulting product we called PyTorch 1.0.

I think when PyTorch 1.0 came out, we started to see libraries explode on top of PyTorch: libraries for NLP, computer vision, reinforcement learning. The whole world started to build on top of it as this kind of stable foundation. And one of the things we started seeing after that, which was really cool, almost intimidating to us as a PyTorch team, was a bunch of self-driving car companies, including Tesla and Cruise and Uber, starting to use PyTorch for self-driving.

I was actually at Uber; we were working on self-driving cars. This was in about the 2017 time frame, and there I was responsible for building ML platforms at Uber, and we had multiple research teams building self-driving car models. The first time I heard about PyTorch was when one of these research folks wanted to try this new framework and see how fast it could go. What was very interesting was that the iteration speed we saw from that team was significantly higher than the other teams, so we could actually move faster just with iteration speed. That's when I heard about PyTorch for the first time, and I started actually dabbling with it after that. To me that was one of my favorite use cases, just because the impact is so easy to grasp. I think there's nothing more real than when you sit in the back seat of an autonomous car and it's driving you around; you're like, wow, the tools that I worked on and the community that I helped to build are actually powering this car as I'm riding in it. That kind of brought it home for me. It is both extremely cool and extremely intimidating, and it makes us feel very responsible for the code we write, and for making sure we don't have bugs in it.

But the interest, just the excitement from the community, was through the roof. The only thing we tried to make sure of is that we took signal from a diverse enough crowd, so that one single segment of people didn't just define what PyTorch would be.

At AMD, at least at that time, we were not ourselves significant users of machine learning, so what we were doing was responding to what our customers were telling us they needed to do their work. Around the time PyTorch was introduced, it started coming up in conversation, and not as the preferred framework initially, but a lot of customers were saying this was something interesting they were looking at. And pretty quickly after that, it moved from "this is something interesting" to "this is something we're using and we like," to, in a lot of cases, "this is now our preferred machine learning framework." So it was clear, based on us hearing that from a number of different customers of different sizes across different industries, that this was something that was going to be important. The software tool needs to work really closely with the hardware it is running on, and optimize itself further on that hardware. AMD could see that this was going to be an important project, and we started working on taking PyTorch and adapting it in a way that it could take advantage of the hardware acceleration capabilities of, particularly, our GPU products. We did that initially by just taking a fork of the PyTorch project, making our own changes, and proving them out to ourselves. The next step after that was contributing those changes back into the master PyTorch project, so that they would be available to the wider community. Over the course of several years, working with the PyTorch community, we established ourselves as a robust, reliable solution that wasn't going to put a black eye on PyTorch, that was going to be additive to the PyTorch value proposition.

Even Google, on their TPUs, which were kind of their holy grail, their crown jewels of their hardware, said, well, we actually should start thinking about supporting PyTorch, because we're seeing so much demand for it. Google had been building TPUs since 2014 or 2015, and Google had seen customers wanting to run PyTorch workloads. There was this giant project across Google, Salesforce, and Meta on how to enable TPUs for PyTorch, and it also proved the point that you could take PyTorch from research to production onto not just GPUs, which had been the focus before, but also other hardware like TPUs. That was a pretty big moment, and I think what that really helped with is that it started to provide the community options, and a way for them to run on different types of platforms. And so the goal, I think, is really to bring PyTorch into every place possible, whether that's on the edge, or the mobile device, or in the cloud. We really want to make it accessible for developers, whether they're training small models or large models, or trying to deploy in their applications. They need hardware support, and that's where our hardware community really jumped in and really made that possible. If
you want to run a large AI experiment, you have to run it on, say, 100 GPUs or 1,000 GPUs, so you want to run it somewhere where someone knows how to deal with that much power and heat. Cloud providers obviously have played an important role in accelerating overall computing. As PyTorch was growing and picking up momentum, a lot of internal Microsoft teams had developers and models that were using, or had started using, PyTorch, so that was growing. What it didn't have at that time was an infrastructure. That led us to see the potential of enabling PyTorch with Azure and making PyTorch a first-class citizen on Azure, so that more and more innovation could happen, not just within Microsoft, but also for the customers, developers, and enterprises that use Azure.

The engagement of AWS with Meta and PyTorch happened in the first year I was at AWS, so this is 2018, 2019. A key part of the AI/ML strategy for AWS is helping as many people as possible access that for their work. At the time, PyTorch was not as performant or production-ready as some of the other ML frameworks. What was notable, though, is that we saw researchers starting to adopt PyTorch, and we viewed that as an early leading indicator of future customer adoption.

Microsoft uses PyTorch for a lot of services and applications across the board, be it Bing, Cognitive Services, Office: the ML models running behind them are all built with PyTorch. It only makes sense for Microsoft to make sure that the experience and development of PyTorch grows.

Our cloud partners were instrumental in scaling PyTorch, for sure. They've acted as a distribution channel, they've acted as contributors to the project, they've been enablers of hardware and platforms, they've been promoters and a go-to-market, really everything around the ecosystem. They've helped elevate and accelerate it.

I would say startups are required to have a laser focus on their core value, and for most of the startups leveraging AI, their core value is not in purchasing servers, installing the servers, configuring the servers, maintaining the servers; that's 100% distraction. And so the cloud providers, I think, have played a critical role in enabling those startups to be laser focused on the actual problem they wanted to solve for their customers.

At that point in time, we had big partners go all in, and probably one of the most notable partners that went all in was OpenAI. They were a big PyTorch user and a big PyTorch contributor. The landmark moment in the world coming to know something called generative AI: one was GPT-3, which eventually became ChatGPT, and the other was Stable Diffusion. Whatever you hear these days, be it OpenAI models, GPT, all the fancy AI innovations you hear about, at the back of it, PyTorch is driving it. The only way that's been able to happen at this pace is by some degree of openness, and this is on the open source software side: you need tools that are open source, so you can build something and then someone else who's not directly working with you can continue with it, and to do that you need tools like PyTorch. And frankly, I don't think we would have had the explosion we had without some of the things we had done. For example, the Stable Diffusion team uses PyTorch Lightning, which is a community project that was built on PyTorch, not by our team, but by the community. And so when you start to create that kind of innovation, it creates other projects, and it then creates tools and things that allow people to innovate in the way that, for example, Stability has, or OpenAI has. You look back and you're like, wow, I can't believe this happened, and I can't believe it happened on something we built.

While great innovation was happening, we did want to make sure that we remained committed to making sure that PyTorch
is not just a mechanism or framework driven by a handful of folks. So the goal was more to build this platform. There were a lot of other companies built on top of PyTorch that have incentives to make PyTorch better; we wanted to enable that and kind of democratize it, make it even bigger.

My background is in multi-stakeholder, community-governed open source. Meta was from the start really focused on building that community and making it an open and welcoming place for everyone to come and contribute. We saw that if we moved PyTorch into a truly community-governed project, it would grow even bigger and keep itself more robust, and not fall to the pathologies of one single entity. For long-term sustainability, anyone who creates an open source project eventually is going to have to give up that control. We really got it set up in 2022. Our founding members on the board were NVIDIA, AMD, Microsoft, Amazon, and of course Meta. They were chosen, I think, really for the prior contributions they had made. I think the creation of the PyTorch Foundation really is a culmination of Meta's direction on this, and it's gone really well. It is a vibrant group and also a growing group, and there's a lot of interesting work happening right now in the PyTorch Foundation to help this grow and to be a sustainable, long-term open source project.

PyTorch is trying to make sure it keeps up with all the needs of the AI industry. PyTorch stands for pragmatism, and it will try to make sure it evolves as fast as the AI industry needs it to evolve. And now we are running more than 5 trillion inferences per day across 50 data centers, and I believe that number keeps going up. Every day we look around and say, yeah, that's us in there, that's us in there. It was so much fun to see. I still think we are just getting started, and with the future things on kind of the on-device side, that's going to grow. But in addition to that, the newer ML experiences, LLMs: how are we going to fine-tune things faster, but also be more long-lasting as well? I think we all see the world in the future being much more diverse, whether it's on-device, or kind of ambient computing, or wearables, so I think PyTorch will be that kind of center of mass to bring the community together. What we'll see going forward is more and more applications built using these powerful LLMs that are being made available. The sky is the limit in terms of where people can have an idea, and they are not limited by technology or hardware, and they can apply their idea to build solutions that make their lives more enriching and productive.

When you apply more hardware or more iterations to the problem, the same basic techniques continue to yield better and better results. There's an almost insatiable demand for more and more compute capacity as researchers and industrial customers take these models, particularly large language models as we're talking today, and scale them up in ways that, even by the standards of the computing industry, are sort of unprecedented.

The real value of foundational open source projects is long-term value. With foundational open source projects, whether it's Linux or NumPy or Jupyter or PyTorch, the deeper innovation happens over multi-decade-long periods. And what happens over those longer timelines is that other people build layers on top of those open source projects, and as individuals and organizations adopt them, they're able to do things they could never do without those layers of innovation on top.

My personal feeling is that AI is going to grow much larger than frameworks, and PyTorch is going to be one integral part of it. People are going to spend less and less time worrying about it, but people are going to be more and more dependent on it, because it has become a standard piece of the software stack. And then I think the next breakthrough will certainly be on PyTorch. We had GPT-3, we've had GPT-4, we've had diffusion models; that next breakthrough is going to be on PyTorch.

Well, I kind of felt proud that PyTorch is one integral part of this whole journey, fueling all those innovations and large-scale applications. The AI industry at some point, like all industries, will cap out; until then it is on a fast-moving curve. But as of now, PyTorch is a fairly general tool where you can express fairly powerful mathematical ideas, and that is important for the AI industry.