Transcript for:
AI Video Models and Language Models: Current State and Future Predictions

Artificial worlds generated by AI video models have never been more tangible and accessible, and look set to transform how millions, and then billions, of people consume content. Artificial intelligence, in the form of the new free Claude 3.5 Sonnet, is more capable than it has ever been. But I will draw on interviews from the last few days to show that there are more questions than ever, not just about the merits of continued scaling of language models, but about whether we can rely on the words of those who lead these giant AI orgs. But first, AI video generation, which is truly on fire at the moment. These outputs are from Runway Gen-3, available to many now and to everyone, apparently, in the coming days. The audio, by the way, is also AI generated, this time from Udio. [Music] And as you watch these videos, remember that the AI models generating them are likely trained on far less than 1% of the video data that's available. Unlike high-quality text data, video data isn't even close to being used up. Expect generations to get far more realistic, and not in too long either. And by the way, if you're bored while waiting on the Gen-3 waitlist, do play about with the Luma Dream Machine. I've got to admit, it is pretty fun to generate two images, or submit two real ones, and have the model interpolate between them. Now, those of you in China have actually already been able to play with a model of similar capabilities called Kling, but we are all waiting on the release of Sora, the most promising video generation model of them all, from OpenAI. Here are a couple of comparisons between Runway Gen-3 and Sora; the prompts used in both cases are identical. And there's one example that particularly caught my eye. As many of us may have realized by now, simply training models on more data doesn't necessarily mean they pick up accurate world models. Now, I strongly suspect that Sora was trained on way more data with way more compute. With its generation at the bottom, you can see that the dust emerges from
behind the car. This neatly demonstrates the benefits of scale, but still leaves open the question of whether scale will solve all. Now, yes, it would be simple to extrapolate a straight line upwards and say that with enough scale we get a perfect world simulation, but I just don't think it will be like that, and there are already more than tentative hints that scale won't solve everything. More on that in just a moment. But there is one more modality I am sure we were all looking forward to which is going to be delayed: the real-time advanced voice mode from OpenAI. It was the star of the demo of GPT-4o and was promised in the coming weeks. Alas, though, it has now been delayed to the fall, or the autumn, and they say that's in part because they want to improve the model's ability to detect and refuse certain content. I also suspect, though, that like dodgy physics with video generation and hallucinations with language generation, they also realized it occasionally goes off the rails. Now, I personally find this funny, but you let me know whether this would be acceptable to release: "Refreshing coolness in the air that just makes you want to smile and take a deep breath of that crisp, invigorating breeze. The sun's shining, but it's that, this lovely gentle warmth that's just perfect for a light jack..." So either way, we're definitely going to have epic entertainment, but the question is what's next, particularly when it comes to the underlying intelligence of models. Is it a case of shooting past human level, or of diminishing returns? Well, here's some anecdotal evidence, with the recent release of Claude 3.5 Sonnet from Anthropic. It's free and fast, and in certain domains more capable than comparable language models. This table, I would say, shows you a comparison on things like basic mathematical ability and general knowledge, compared to models like GPT-4o and Gemini 1.5 Pro from Google. I would caution that many of these benchmarks have significant flaws, so decimal point differences I wouldn't pay
too much attention to. The most interesting comparison, I would argue, is between Claude 3.5 Sonnet and Claude 3 Sonnet. There is some evidence that Claude 3.5 Sonnet was trained on about four times as much compute as Claude 3 Sonnet, and you can see the difference that makes: definitely a boost across the board, but it would be hard to argue that it's four times better. And in the visual domain, it is noticeably better than its predecessor, and than many other models; I got early access, so I tested it a fair bit. These kinds of benchmarks test reading charts and diagrams and answering basic questions about them. But the real question is how much extra compute, and therefore money, these companies can continue to scale up and invest if the returns are still incremental. In other words, how much more will you, and more importantly businesses, continue to pay for these incremental benefits? After all, in no domain are these models reaching 100%, and let me try to illustrate that with an example. And as we follow this example, ask yourself whether you would pay four times as much for a 5% hallucination rate versus an 8% hallucination rate, if in both cases you have to check the answer anyway. Let me demonstrate with the brilliant new feature you can use with Claude 3.5 Sonnet from Anthropic. It's called Artifacts. Think of it like an interactive project that you can work on alongside the language model. I dumped a multi-hundred-page document on the model and asked the following question: find three questions on functions from this document and turn them into clickable flashcards in an artifact, with full answers and explanations revealed interactively. It did it, and that is amazing. But there's one slight problem. Question one is perfect: it's a real question from the document, displayed perfectly and interactive, with the correct answer and explanation. Same thing for question two. But then we get to question three, where it copied the question incorrectly. Worse than that, it rejigged and changed the
answer options. Also, is there a real difference between q² and negative q², when it claimed that negative q² is the answer? Now, you might find this example trivial, but I think it's revealing. Don't get me wrong, this feature is immensely useful, and it wouldn't take me long to simply tweak that third question. And by the way, finding those three examples strewn across a multi-hundred-page document is impressive. But even though it would save me some time, I would still have to diligently check every character of Claude's answer, and at the moment, as I discussed in more detail in my previous video, there is no indication that scale will solve this issue. Now, if you think I'm just quibbling and benchmarks show the real progress, well, here is the reasoning lead at Google DeepMind, working on their Gemini series of models. Someone pointed out a classic reasoning error made by Claude 3.5 Sonnet, and Denny Zhou said this: "Love seeing tweets like this, rather than those on LLMs with PhDs, superhuman intelligence, or fancy results on leaked benchmarks." I'm definitely not the only one skeptical of benchmark results. And an even more revealing response to Claude 3.5's basic errors came from OpenAI's Noam Brown. I think it's more revealing because it shows that those AI labs, Anthropic and OpenAI, had their hopes slightly dashed, based on the results they expected in reasoning from multimodal training. Noam Brown said frontier models like GPT-4o, and now Claude 3.5 Sonnet, may be at the level of a, quote, "smart high schooler", mimicking the words of Mira Murati, CTO of OpenAI, in some respects, but they still struggle on basic tasks like tic-tac-toe. And here's the key quote: "There was hope that native multimodal training would help with this kind of reasoning, but that hasn't been the case." That last sentence is somewhat devastating to the naive scaling hypothesis. There was hope that native multimodal training on things like video from YouTube would teach models a world model and would help, but that hasn't been the case.
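To put "basic tasks like tic-tac-toe" in perspective: perfect tic-tac-toe play has long been a solved search problem, and a few dozen lines of exhaustive minimax recover it exactly. The sketch below is my own illustration of that gap between search-based reasoning and pattern matching, not code from the video or from any lab:

```python
# Minimal negamax-style minimax for tic-tac-toe.
# Board: tuple of 9 cells, each 'X', 'O', or ' '; 'X' moves first.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return None

def minimax(board, player):
    """Game value for `player` to move: +1 win, 0 draw, -1 loss (perfect play)."""
    w = winner(board)
    if w is not None:
        # Only the previous mover can have completed a line, so this is a loss.
        return 1 if w == player else -1
    if ' ' not in board:
        return 0  # draw
    opponent = 'O' if player == 'X' else 'X'
    best = -2
    for i, cell in enumerate(board):
        if cell == ' ':
            child = board[:i] + (player,) + board[i + 1:]
            best = max(best, -minimax(child, opponent))
    return best

print(minimax((' ',) * 9, 'X'))  # prints 0: perfect play is a draw
```

Exhaustive search visits on the order of half a million positions here, which a laptop does in about a second; the point is simply that this "basic task" yields completely to brute-force reasoning.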
Now, of course, these companies are working on far more than just naive scaling, as we'll hear in a moment from Bill Gates, but it's not like you can look at the benchmark results on a chart and just extrapolate forwards. Here's Bill Gates promising two more turns of scaling (I think he means two more orders of magnitude), but notice how he looks skeptical about whether that will be enough: "The big frontier is not so much scaling. We have probably two more turns of the crank on scaling, where by accessing video data and getting very good at synthetic data, we can scale up probably, you know, two more times. That's not the most interesting dimension. The most interesting dimension is what I call metacognition: understanding how to think about a problem in a broad sense, and step back and say, okay, how important is this answer? How could I check my answer? You know, what external tools would help me with this? So we're going to get the scaling benefits, but at the same time the various actions to change the underlying reasoning algorithm, from the trivial that we have today to more humanlike metacognition, that's the big frontier. It's a little hard to predict how quickly that'll happen. You know, I've seen that we will make progress on that next year, but we won't completely solve it for some time after that." And there were others who used to be incredibly bullish on scaling who now sound a little different. Here's Microsoft AI's CEO Mustafa Suleyman, perhaps drawing on lessons from the mostly defunct Inflection AI that he used to run, saying it won't be until GPT-6 that AI models will be able to follow instructions and take consistent action: "There's a lot of cherry-picked examples that are impressive, you know, on Twitter and stuff like that, but to really get it to consistently do it in novel environments is pretty hard. And I think that it's going to be not one, two orders of magnitude more computation of training the models, so not GPT-5, but more like GPT-6-scale
models. So I think we're talking about two years before we have systems that can really take action." Now, based on the evidence that I put forward in my previous video, let me know if you agree with me, but I still think that's kind of naive: reasoning breakthroughs will rely on new research breakthroughs, not just more scale. And even Sam Altman said as much about a year ago, saying the era of ever more scaling of parameter count is over. Now, as we'll hear, he has since contradicted that, saying current models are small relative to where they'll be. But at this point, you might be wondering about emergent behaviors. Don't certain capabilities just spring out when you reach a certain scale? Well, I simply can't resist a quick plug for my new Coursera series that is out this week; the second module covers emergent behaviors. If you already have a Coursera account, do please check it out, it'll be free for you, and if you were thinking of getting one, there will be a link in the description. Anyway, here's that quote from Sam Altman, somewhat contradicting the comments he made a year ago. Models, he says, get predictably better with scale: "We're still just, like, so early in developing such a complex system. There's data issues, there's algorithmic issues, the models are still quite small relative to what they will be someday, and we know they get predictably better." But this was the point I was trying to make at the start of the video. As I argued in my previous video, I think we're now at a time in AI where we really have to work hard to separate the hype from the reality. Simply trusting the words of the leaders of these AI labs is less advisable than ever. And of course, it's not just Sam Altman. Here's the commitment from Anthropic, led by Dario Amodei, back last year. They described why they don't publish their research, and they said it's because "we do not wish to advance the rate of AI capabilities progress". But their CEO, three days ago, said AI is progressing fast, due in part to their own efforts, and that they are trying to keep
pace with the rate at which the complexity of the models is increasing: "I think this is one of the biggest challenges in the field. The field is moving so fast, including by our own efforts, that we want to make sure that our understanding keeps pace with our abilities, our capabilities to produce powerful models." He then went on to say that today's models are like undergraduates, which, if you've interacted with these models, seems pretty harsh on undergraduates: "If we go back to the analogy of, like, today's models are like undergraduates, you know, let's say those models get to the point where, you know, they're kind of, you know, graduate level or strong professional level. Think of biology and drug discovery. Think of a model that is as strong as, you know, a Nobel-prize-winning scientist, or, you know, the head of drug discovery at a major pharmaceutical company." Now, I don't know if he's basing that on a naive trust in benchmarks, or whether he is deliberately hyping. And then later in the conversation, with the guy who's in charge of the world's largest sovereign wealth fund, he described how the kind of AI that Anthropic works on could be instrumental in curing cancer: "I look at all the things that have been invented. You know, if I look back at biology, you know, CRISPR, the ability to, like, edit genes; if I look at, um, you know, CAR-T therapies, which have cured certain kinds of cancer. There's probably dozens of discoveries like that lying around, and if we had a million copies of an AI system that are as knowledgeable and as creative about the field as all those scientists that invented those things, then I think the rate of those discoveries could really proliferate, and, you know, some of our really, really longstanding diseases, you know, could be addressed or even cured." Now, he added some caveats, of course, but that was a claim echoed on the same day, actually, I think, by OpenAI's Sam Altman: "One of our partners, Color Health, is now using GPT-4
for cancer screening and treatment plans, and that's great. And then maybe a future version will help discover cures for cancer." Other AI lab leaders, like Mark Zuckerberg, think those claims are getting out of hand: "But, you know, part of that is the open-source thing too, so that way other companies out there can create different things and people can just hack on it themselves and mess around with it. So I guess that's a pretty deep worldview that I have, and, I don't know, I find it a pretty big turnoff when people in the tech industry kind of talk about building this one true AI. It's almost as if they kind of think they're creating God or something, and that's not what we're doing. I don't think that's how this plays out." Implicitly, he's saying that companies like OpenAI and Anthropic are getting carried away. Later, though, in that interview, the CEO of Anthropic admitted that he was somewhat pulling things out of his hat when it came to biology: "And actually, with scaling, you know, let's say, you know, you extend people's productive ability to work by ten years, right? That could be, you know, one-sixth of the whole economy." "Do you think that's a realistic target?" "I mean, again, like, I know some biology, I know something about how the AI models are going to happen. I wouldn't be able to tell you exactly what would happen, but, like, I can tell a story where it's possible." "So, 15%. And when will... so when could we have added the equivalent of ten years to our life? I mean, how long, what's the time frame?" "Again, like, you know, this involves so many unknowns, right? If I try and give an exact number, it's just going to sound like hype. But, like, a thing I could imagine is, like, I don't know, like, two to three years from now we have AI systems that are, like, capable of making that kind of discovery; five years from now those discoveries are actually being made; and five years after that it's all gone through the
regulatory apparatus. And really, so, you know, we're talking about, you know, a little over a decade. But really, I'm just pulling things out of my hat here. Like, I don't know that much about drug discovery, I don't know that much about biology, and frankly, although I invented AI scaling, I don't know that much about that either. I can't predict it." The truth, of course, is that we simply don't know what the ramifications will be of the scaling, and of course of new research. Regardless, these companies are pressing ahead: "Right now, $100 million. There are models in training today that are more like a billion. I think if we go to $10 or $100 billion, and I think that will happen in 2025, 2026, maybe 2027, and the algorithmic improvements continue apace and the chip improvements continue apace, then I think there is, in my mind, a good chance that by that time we'll be able to get models that are better than most humans at most things." But I want to know what you think. Are we at the dawn of a new era in entertainment and intelligence, or has the hype gone too far? If you want to hear more of my reflections, do check out my podcasts on Patreon, on AI Insiders. You could also check out the dozens of bonus videos I've got on there, and the live meetups arranged via Discord. But regardless, I just want to thank you for getting all the way to the end and joining me in these wild times. Have a wonderful day.
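[Editor's footnote on the scaling numbers in this transcript, such as Claude 3.5 Sonnet's reported ~4x compute for an incremental boost: the diminishing-returns pattern the video keeps returning to is what a power-law scaling curve predicts. The sketch below is purely illustrative; the functional form and the exponent are assumptions for the example, not figures from any lab.]

```python
# Toy illustration of why "4x the compute" can buy only an incremental gain.
# Assumed power-law form: error(C) = a * C**(-alpha).
# alpha = 0.1 is an illustrative exponent, not a measured value.

def error_rate(compute, a=1.0, alpha=0.1):
    """Hypothetical benchmark error as a function of relative compute."""
    return a * compute ** (-alpha)

base = error_rate(1.0)        # 1.0 by construction
quadrupled = error_rate(4.0)  # about 0.87 under these assumptions
print(f"relative improvement from 4x compute: {1 - quadrupled / base:.1%}")
# under these assumed values, 4x the compute cuts error by only ~13%
```

The same curve also shows why each further "turn of the crank" costs more for less: going from 4x to 16x compute buys a smaller absolute reduction than the first quadrupling did.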