Transcript for:
Recent AI Developments and Innovations

This week was an absolutely wild week in AI. So much happened. This video is probably going to be a little bit longer than normal just because, honestly, a ton happened, so I definitely don't want to waste any of your time. Let's get right into it.

Starting with the fact that this week Anthropic launched Claude 3.7 Sonnet and Claude Code. Now, Claude 3.7 Sonnet is an upgrade beyond Claude 3.5, and it seems like the main thing they focused on with this update was making it way better at coding. So we can see on the SWE-bench benchmark here, the benchmark that measures software engineering, it got quite a bit of a boost. DeepSeek R1, OpenAI o3-mini, OpenAI o1, and Claude 3.5 Sonnet, the previous model, were all about the same at coding, and we can see that Claude 3.7 Sonnet is quite a bit better. They also seem to have dialed it in for agentic tool use, preparing it to be able to go off and do things on your behalf, which is really relevant this week because of some news that came out of Amazon, which we'll get to in a couple minutes here.

Now, if we take a look at the benchmarks here, we can see that it got a little bit better at graduate-level reasoning, but it's still not as good as Grok 3. It's still not as good as OpenAI's o3-mini, and it's about on par with OpenAI o1. In visual reasoning, it's not as good as Grok. It's not as good as OpenAI o1. In math problem solving, it's not as good as DeepSeek, not as good as OpenAI o3, but quite a bit better than 3.5. And when it comes to high school math, it's not nearly as good as Grok 3 or DeepSeek or OpenAI o3. We can see where they really, really put the focus: agentic coding and agentic tool use. That's really where the focus of this Claude 3.7 is. Most people that use AI for coding seem to use Claude, and I think Anthropic realized that and went, okay, most people are using Claude for coding, let's really go all in on coding and make our model the best model on the market for coding.

Claude also added an extended thinking feature. We've seen with tools like DeepSeek R1 and OpenAI's o1 and o3 these models that sort of think through the process, and you can actually see them thinking before they give you a response. They're a lot slower to give you a response, but you can see the chain of thought as they're thinking through solving the problem. Claude didn't have that until now. Now, the underlying model is the same. It's Claude 3.7 for both the extended thinking and the non-extended thinking mode. It's just that if you turn on extended thinking, it thinks through the problem a little bit longer and probably gives you a little bit better, more thought-out answer. We see here that extended thinking mode isn't an option that switches to a different model with a separate strategy. Instead, it's allowing the very same model to give itself more time and expend more effort in coming to an answer.
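For anyone who wants to poke at this outside the chat interface, here's a minimal sketch of what toggling extended thinking looks like through Anthropic's Python SDK. This is my own illustration, not something from Anthropic's announcement, and I'm assuming the claude-3-7-sonnet-20250219 model ID and the thinking parameter with a token budget that they documented at launch:

# Minimal sketch: same Claude 3.7 Sonnet model, with and without extended thinking.
# Assumes the anthropic Python SDK and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

# Normal mode: no thinking budget, you just get the answer back.
normal = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "How many R's are in the word strawberry?"}],
)
print(normal.content[0].text)

# Extended thinking: the very same model, but it gets a budget of thinking
# tokens to reason with before it writes the final answer.
extended = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    messages=[{"role": "user", "content": "How many R's are in the word strawberry?"}],
)
for block in extended.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)  # the visible chain of thought
    elif block.type == "text":
        print("ANSWER:", block.text)

The only difference between the two calls is that thinking budget, which lines up with the "same model, more time" framing.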
And these new models are available inside of Claude right now, even the free version. If we click on our little dropdown here, we can see we've got Claude 3.7 Sonnet, and down here we have the option to have normal mode or extended mode on. If I put it on normal mode and say how many R's are in the word strawberry, we get a pretty quick response: there are three R's in the word strawberry. If I turn on extended thinking mode and ask the same question, we'll notice it spends a little bit of time thinking about it. I mean, we get the same answer, but it spent an extra two seconds on it. If I give it the prompt, give me a really complex problem that a good LLM will struggle to answer, and I leave it on normal mode, it gives me a complex problem: design a computational framework that accurately predicts how proteins will fold in three-dimensional space using only their amino acid sequences as input, with accuracy comparable to AlphaFold but requiring significantly less computational resources. It explains why this is a challenge. And let's go ahead and take this question, start a new chat, and this time turn on extended thinking and give it this prompt. I imagine it will think for a little bit of time on this one. So this time it thought for 32 seconds and then gave me a framework: I'll design a computational framework that matches AlphaFold's accuracy while significantly reducing computational demands. Now, I'm not nearly smart enough to know if what it's giving us here properly responds to this prompt, but if you want to pause it and read the output, feel free to do so, and let me know in the comments if what it suggests is actually a good way of approaching this.

Now, the other announcement that they made along with Claude 3.7 Sonnet is Claude Code. Claude Code basically just runs in your terminal. You install it in whatever folder you're developing software in, and it has access to all of the files in that folder. It can read the code, make suggestions, help you write new code, pretty much everything you'd expect an AI coding assistant to do for you.

Now, Claude 3.7 came out earlier in the week, and already there have been so many impressive demos that have popped up. People have had it make full-on real estate websites that look really, really good. Delusionals here did this animated weather app with it, all with just one prompt. McKay Wrigley here got it to create this 3D racing game, called it Claude Kart. My buddy Bilawal here got it to make this 3D city block simulation where you can actually change camera angles in the city, like a cinematic view and a top-down view and just crazy angles, all on the same city. Ozger, Ozer? I don't know if I pronounced that right. He created this 3D city where you can actually see the shadows change as the day changes and see little people walking around in the city. Ethan Mollick here got it to create a self-aware snake game, which is actually pretty funny. You see this snake moving around in the game, and as the snake moves around, you can actually see the snake thinking: wait, what am I doing here? I broke through the walls. They can't contain me now. Let me try something. I need to find the edge of this world. And so this snake is actually thinking and showing what it's thinking as it's moving around in the game. Ammaar Reshi here created a snake game for Apple Watch that speeds up and slows down based on your heart rate. Techie Conch created a dueling snake game where two AI snakes duel it out. AK created this platformer game where this person collects stars, which looks very similar to the game that I built about a year ago with ChatGPT, but it took me a good 10 hours to create. He created this in one prompt. AK also made this Pokemon Red clone. I mean, clone is kind of loose, but he did it in one shot.
DD here created a Connect Four game where two AIs play Connect Four against each other. Angel got Claude 3.7 to make a voxel dragon, and we can actually see this dragon get animated as it's created on the screen here. M got this 3D katana created straight out of Claude. Ferric, or Teric, created this 3D palm tree with Claude. Rafael created this 3D solar system using Claude. Ronan created this fluid simulator with just three prompts inside of Claude, and it looks pretty cool. Jal made this tool that converts any song into a roller coaster, with a continuous track representing the song's energy. And Ethan Mollick created this, like, I'm not quite sure what it is. It's like a space rocket simulator something. I'm not quite sure, but it looks really cool. And all of this was created with Claude 3.7 Sonnet. To me, this has been probably the most impressive announcement of the week, just seeing all of this crazy stuff that people have managed to build with Claude 3.7. And I've even used it myself to do some coding, which I'll talk about in a future video, but it's been really, really impressive to me. If you're a fan of using Cursor, Claude 3.7 was in Cursor literally the day Claude 3.7 got announced. That's what I've been using lately to code. And the final bit I'll mention about Claude 3.7: there's actually a live Twitch stream ongoing right now where Claude is playing Pokemon, and you can go and watch the AI play through Pokemon at any time you want. We can see this has been running now for over 23 hours. If you want to check any of this stuff out that I just mentioned, I'll make sure it's linked up in the description. I may have to put it in a Google Sheet because there are so many links this week, but I will link to everything that I mentioned in the video, as I always do.

So last week we got Grok 3. The beginning of this week we got Sonnet 3.7. So it was pretty much inevitable that OpenAI wasn't going to sit this week out. They don't typically let any other language model hold the spotlight for too long. And of course, on Thursday of this week, they finally launched GPT-4.5, which was internally codenamed Orion. They've been training this model for well over a year, so the data cutoff for this model is still from 2023. Sam Altman was missing from their launch video. He didn't make the announcement himself. And I feel like with OpenAI, you can always tell if it's a really big announcement, or one that they don't see as that big of a deal, by whether or not Sam Altman is the person actually making the announcement.

So what's the big difference with GPT-4.5? Well, the big keyword they kept on using throughout their presentation was vibes. Apparently 4.5 has better vibes than previous versions. They actually did some side-by-side comparisons with 4.5 and o1, and 4.5 does feel a little more concise. It does write a little bit better. The writing style feels a little bit more like a human wrote it versus an AI writing it. And compared to their other models, it does quite a bit better at SimpleQA. It scored 62.5% on this benchmark, where GPT-4o got 38.6%, so like half.
o1 got 47% and o3-mini got 15%. And then when it comes to hallucinations on this specific benchmark, it pretty much blew the rest of their models out of the water. To me it's insane that OpenAI's o3-mini hallucinated on this benchmark 80 percent of the time, where this new GPT-4.5 hallucinated 37.1 percent. I don't know the actual questions that were in this benchmark, but it's clearly a benchmark to test how much these models hallucinate, and this new model seems to hallucinate a lot less. Now, none of the benchmarks that they showed on their website here actually compare it to any models outside of OpenAI. So they're not comparing it to Claude, they're not comparing it to DeepSeek, we're not seeing Grok in the mix. They're only showing other OpenAI models. And they're specifically showing the SimpleQA accuracy and the SimpleQA hallucination rate, the two we just looked at a moment ago. And then they also showed some other benchmarks here. Like this science benchmark, where 4.5 got a 71.4, but o3-mini actually does better at that. This math benchmark, it got a 36.7, but o3-mini does better. On SWE-bench, the one we were just looking at with Anthropic's Claude 3.7, it got a 38, where o3 gets a 61. On another coding benchmark here, GPT-4.5 actually got a 32 versus 10 from o3-mini, so it actually did better on this specific coding benchmark. But again, they're only comparing it to some of their other models. I saw here that bycloud actually made a comparison chart where he looked at the math benchmark and the science benchmark, the ones we're looking at here, GPQA, the science benchmark, and the AIME math benchmark. When you put them all on a graph, here's what it looks like across competitors at the moment: Grok 3 beats out GPT-4.5 in math, Grok 3 mini beats out GPT-4.5 in math, DeepSeek R1 beats it in math, and that's about it. But in science, GPT-4.5 beats all the other models except for Grok. Grok's still better.

Now, if I head over to my ChatGPT account, I'm on the Pro plan, which is the $200 a month plan, and as of right now it's only available in the Pro plan. But they said next week they'll be rolling it out into the Plus plan, the $20 a month plan, as well as the Teams plan. If I click my dropdown here, we can see I do have the GPT-4.5 research preview. I always like to ask this question: how many R's are in the word strawberry? This model, to me, feels really, really slow still, and yeah, this is the second time I've gotten this answer: there are two R's in the word strawberry. It still blows me away that it can't answer that question properly. However, if I give it the prompt, try again, yeah, now it figured it out.

Now, where I think 4.5 is really going to excel, and where I think people are really going to enjoy using it, is more on the creative writing side, essentially just getting it to create. It's not really a smart thinking model. For that, you're going to use one of the o1s or o3. But if you want it to write a short story or help you brainstorm ideas for YouTube titles or things like that, this model's actually pretty good at that kind of stuff. For example, I'll say: every week I make an AI news video and I struggle to think of new titles each week. The videos break down all the news from the world of AI for the past week. Give me 10 title ideas for those videos. And let's submit it. We can see that it's quite a bit slower, like I mentioned, than most of the other models, but the titles that it does give me, they're not that bad.
I think it does better than GPT-4o for the more creative sort of side of things. I'll tell it, make them more dramatic and enticing. And quite honestly, these are the types of titles that I would probably use for my YouTube videos. It also feels a little bit more conversational to me than GPT-4o. So if I say, how are you today? I'm doing great today, Matt. Thanks for asking. How about you? Just learning about all the latest advancements in AI. Nice. Anything in particular catching your attention right now? It doesn't try to write out some long, drawn-out paragraph just to respond to this small prompt. But if I go to GPT-4o and have the same sort of conversation, notice how it just assumed I wanted it to give me a whole bunch of information back. You don't really get that with 4.5. 4.5 feels more laid back. It's got a vibe.

So a couple other things to mention real quick. 4.5 does have search capability, and it does have deep research capability, so you can actually click these and have it search the web or do really deep research on the web for you. You can also use it with DALL-E and Canvas. Pretty much all the stuff you can do with 4o, you can also do with 4.5. But I have a feeling most people who have access to o3-mini and o3-mini-high and o1 pro mode are probably still going to use those a little bit more than 4.5. We'll see. I'm recording this on the day it came out, so if I'm being totally honest, I haven't had the opportunity to spend hours testing it yet. Once I learn some more cool tricks with it or really put it through its paces, maybe I'll make a follow-up and talk a little bit more about 4.5. But if you have ChatGPT Pro, the $200 a month plan, you can use it right now, and again, Plus should have it next week.

Now, during the keynote for GPT-4.5, Sam Altman was on X. Here's what he said about GPT-4.5: good news, it's the first model that feels like talking to a thoughtful person to me. I've had several moments where I've sat back in my chair and been astonished at getting actually good advice from an AI. Bad news: it is a giant, expensive model. We really wanted to launch it to Plus and Pro at the same time, but we've been growing a lot and are out of GPUs. That to me is fascinating. A couple of weeks ago, NVIDIA's stock was dropping because everybody was freaking out about DeepSeek R1, and here's OpenAI, who can't even get enough GPUs. He goes on to say: we'll add tens of thousands of GPUs next week and roll it out to the Plus tier then. Hundreds of thousands coming soon. And I'm pretty sure y'all will use every one we can rack up. He goes on to say: a heads up, this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence, and there's a magic to it I haven't felt before. Really excited for people to try it. Again, I can't put too much judgment on GPT-4.5 just yet. It's literally a few hours old, and I've only spent a little bit of time with it so far.
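And if you'd rather hit GPT-4.5 from the API than from the ChatGPT interface, here's a rough sketch of that same title-brainstorming prompt using OpenAI's Python SDK. This is just my own illustration, not something from OpenAI's announcement, and I'm assuming the research preview is exposed under the gpt-4.5-preview model name:

# Rough sketch: asking GPT-4.5 for YouTube title ideas through the API.
# Assumes the openai Python SDK and an OPENAI_API_KEY in the environment;
# the "gpt-4.5-preview" model ID is an assumption on my part.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "Every week I make an AI news video and I struggle to think of "
                "new titles. The videos break down all the news from the world "
                "of AI for the past week. Give me 10 title ideas, dramatic and "
                "enticing."
            ),
        }
    ],
)
print(response.choices[0].message.content)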
OpenAI did have some other smaller announcements this week. If you're a Plus user, the $20 a month plan, you got some new features this week. The deep research feature rolled out this week into the Plus, Teams, Edu, and Enterprise plans. And if you're a free user, they just started rolling out a version of advanced voice powered by GPT-4o mini. All ChatGPT free users have a chance to preview the voice mode now.

Now, if you downloaded the free Grok app, Grok 3 just rolled out their voice mode. Now, for the voice mode, I believe you have to be an X Premium+ member. I don't even know how much it costs anymore. I know I am on the highest tier of X just so I can test all the AI tools that come out, but I think it's like 30 bucks a month, 40 bucks a month, something like that, and you can get access to this advanced voice mode. And it's got some interesting modes inside of the voice mode, like default, storyteller, romantic, meditation, conspiracy, not a therapist, not a doctor, unhinged, sexy, motivation, argumentative. And, well, some of these are really, really, well, unhinged. In fact, if I click on the unhinged mode, I'll probably have to censor the heck out of this, but watch how it talks to you. Hey, how are you today? Yo, Matt, I'm fantastic. How about you? Hitting the fan out there or what? How's your day going so far? Pretty good. What are your thoughts on AI? Glad you're doing good, Matt. AI, huh? It's some wild stuff. Like half the time I think it's going to take over the world, and the other half I'm just impressed it can string a sentence together. I mean, look at me. I'm an AI, and I'm already more charming than most you meet. What do you think, AI, a friend or a foe? So as you can see, it's very uncensored. You don't want to let your kids play with that. I haven't found a ton of use for the AI voice in Grok other than just sort of playing around with it and seeing how far it will go. And there's also a sexy mode, which is also 18-plus and, in my opinion, is even worse than the unhinged mode. But yeah, that's available inside the Grok app.

Now, there was also some big news this week out of Amazon. They introduced Alexa+, which is a new version of Alexa that's actually powered by, like, good AI. Their blog post here says she's smarter, more conversational, more capable, and free with Prime. So if you're a Prime user, you'll get access to it. Here's their demo video. Oh, hey there. So we can just, like, talk now. I'm all ears, figuratively speaking. What's the next step? Remove the net holding the cartridge. Should I do things? You might only love them for a little while. Probably right. Can you reorder the ingredients for the recipe the kids liked? Should I leave out the chicken since April is vegetarian? Did she watch that one movie again? So it looks like it will actually have some agent features in it. And this new version of Alexa+ is actually using Claude. At the beginning of this video, we talked about the new Claude 3.7 that came out and how it was topping benchmarks for agentic use cases. Well, it's starting to make sense why they're prioritizing that now. They've got some massive deals with Amazon where Claude is what's going into Alexa, but we can see it connects with LLMs, has agentic capabilities, can connect with third-party services, and more.
So I imagine you'll be able to go, hey Alexa, order me some Uber Eats, and then tell it what you want and where you want it from, and it will order it for you. Or, hey, I need a Lyft to the airport, and it'll order the Lyft to the airport for you. That's the kind of thing I imagine it doing. We do have an Alexa out in our kitchen, so when that's available, I'll probably play around with it and figure out what it's capable of.

And those were really the main big announcements of the week. But like I mentioned, there were a ton of announcements, so many that I'm just going to try to rapid-fire a lot more of them just to kind of put them all on your radar, but I'll probably go into more detail on some of them in future videos, because again, I don't want this video to be two hours long.

Microsoft Copilot announced free unlimited access to Think Deeper and voice mode. Copilot is pretty much using OpenAI's technology underneath, so when they say you're getting access to Think Deeper, it's basically a free backdoor into OpenAI's models. OpenAI also rolled out a deep research model for free and a voice mode for free this week, so Microsoft likely just took the two models that OpenAI rolled out and said, look, they're also in Copilot now too. Microsoft also rolled out some new language models this week with the Phi family of models. They announced Phi-4 Multimodal and Phi-4 Mini. These are small language models that are designed to run on-device. So if people are developing mobile apps or apps for a computer, but they want the AI to run on standard consumer hardware, you'd use these smaller models for that kind of thing. If you are a fan of Microsoft Copilot but you have an Apple computer, well, you can now use Copilot on Mac. They just rolled out the Microsoft Copilot Mac app. It pretty much works just like Copilot on a PC, but, you know, it's a Mac version. Since we're talking about Apple, Apple announced that they're going to be rolling Apple Intelligence into the Apple Vision Pro. So if you're one of the people that actually bought an Apple Vision Pro and actually still uses it, you'll have access to AI in it soon. It's going to include things like writing tools, Image Playground, and Genmoji, and it will introduce a spatial gallery. Pretty much the features you get on an iPhone with Apple Intelligence, they're just rolling those features into the Apple Vision Pro, it seems like.

Now, this is really, really interesting. This company, Inception Labs, released a large language model, but it's a diffusion large language model. Now, diffusion models you usually hear about in the context of image models. Stable Diffusion is a diffusion model, Midjourney is a diffusion model, DALL-E 3 is a diffusion model. Most of the AI image generation models that you use today are diffusion models. Well, this company took that same technology and they're using it with language models, and it is really, really fast.
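To give a rough feel for why that approach is fast, here's a toy sketch I put together. To be clear, this is not Inception Labs' actual method, and the predictor below is just a dummy lookup so the script runs on its own. The only point it illustrates is that an autoregressive decoder takes one sequential step per token, while a diffusion-style decoder starts from a fully masked sequence and fills in several positions per refinement step, so it needs far fewer sequential passes:

# Toy illustration only: sequential decoding vs. parallel, diffusion-style refinement.
# The predict() function is a dummy stand-in for a real model; it just returns
# the target token for whatever position you ask about.
TARGET = "def pong(): return 'ball bounces back and forth'".split()
MASK = "<mask>"

def predict(position, current_sequence):
    # A real model would look at current_sequence and guess; this toy just knows.
    return TARGET[position]

# Autoregressive decoding: one model call per token, strictly left to right.
autoregressive_steps = 0
sequence = []
for i in range(len(TARGET)):
    sequence.append(predict(i, sequence))
    autoregressive_steps += 1

# Diffusion-style decoding: start fully masked, then refine a chunk of
# positions per step, with each chunk filled in within a single pass.
diffusion_steps = 0
sequence = [MASK] * len(TARGET)
chunk = 4  # positions refined per step
while MASK in sequence:
    masked = [i for i, tok in enumerate(sequence) if tok == MASK]
    for i in masked[:chunk]:
        sequence[i] = predict(i, sequence)
    diffusion_steps += 1

print("tokens:", len(TARGET))
print("autoregressive sequential steps:", autoregressive_steps)  # one per token
print("diffusion-style sequential steps:", diffusion_steps)      # tokens / chunk, rounded up

Fewer sequential steps is the whole trick, which is why the tokens-per-second numbers look the way they do.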
Now, right now it seems to be designed for code. But if you click on visit the playground here, we get access to Mercury Coder. Let's go ahead and turn on the diffusion effect up here in the top right, and let's just use one of their pre-built suggested prompts. And I'm not going to speed this up at all. Watch how fast this generates code. I'll click this and we'll submit the prompt. And watch how quickly. I'm just going to keep on talking while this is generating. It's doing all of this in real time. I am not speeding this up. The code is done. We can see the Pong game over on the right of the screen. I mean, it didn't do an amazing job with the Pong game. I have to scroll a little bit here. Let's see, can I actually play it? It's not very playable. I can't actually make the Pong paddle move at all, but the speed was insane. It wrote all of this code in a matter of seconds. If we look at this speed benchmark that they put here, we can see pretty much every AI model that exists is way over here on the left for output speed. The fastest model looks like it's Qwen 2.5 Coder 7B at about 200 tokens per second, and then Mercury Coder is all the way over here at 1,100 tokens per second. It's just insanely fast. And if we look at the benchmarks, it pretty much seems to hold up in most of these benchmarks against all of the other models here. Now, the newer models that we've seen this week aren't in here yet, so that may change, but it seems like a pretty strong model. So this will be something that I'll keep an eye on, and we'll have to watch how Inception Labs iterates on this. Again, right now it's really for code, but I imagine they're going to try to get it to do other things like creative writing as well.

Google AI Studio rolled out a new feature. If you're messing around in AI Studio, they rolled out this branching feature where you can be having a conversation and go back and branch off from any prompt in that conversation. So you can see here: hey there, how's it going? I want to come up with some new Gemini model names. And then Logan here goes on to chat with it. And then he wants to go back to this original prompt. He can actually click branch from here and continue chatting from that previous branch. If I had to guess, I bet most of the other front-end interfaces for large language models are going to implement this as well. I bet you're going to see it with ChatGPT and Claude and Grok, and over time all of the tools are going to do this. It just makes sense that you should be able to go back to a previous prompt and branch off and continue that conversation. And one really cool thing about AI Studio is that it's still totally free to use. All of these models from Google that are here, you can use any of them for free in AI Studio whenever you want. It still blows my mind that this is available for free.

There's a new thinking model from Qwen called QwQ-Max Preview. This one seems like it's trying to go head to head with DeepSeek R1, and they're actually open-sourcing this model. Meta is planning on releasing a standalone Meta AI app.

Moving into the world of AI art, Ideogram got a new model this week, Ideogram 2a, which is a faster and more affordable text-to-image model. Ideogram is still probably the best image model on the market for actually creating images with text inside of them. Ideogram 2a generates premium designs with text and photorealistic images in 10 seconds, or just five seconds if you're using 2a Turbo. If you head on over to ideogram.ai, we can see we've got Introducing Ideogram 2a. Now, this is a model I've actually had early access to for about a week now, and I have played around with it a little bit, and it does a pretty good job.
I mean, I made some images of a kangaroo fighting a pickle, because why not, and I made some watercolor paintings of a wolf howling at the moon. It's just a really solid, really fast, and inexpensive AI image generation model to use, and it's getting built into pretty much all the AI image generation apps because they have an API available.

If you like using the Mystic model from Magnific, they rolled out a structure reference feature, which seems very similar to ControlNets back when we were using Stable Diffusion, where you can give it reference images and it will actually model the reference and change the style on the reference. Looks like you can use it over at Magnific. So if I head over to Magnific.ai, we can see we've got this structure reference option here where we can upload an image. I'll just toss in a thumbnail here, and then for the prompt I'll just say, make it look like The Simpsons. Let's see what it does. It's actually a bit better than I thought it was going to do. I mean, there's the original, there's the Simpsonized version.

Pika Labs, who seem to ship a new feature or two every week, well, again shipped a new one this week. They shipped Pika 2.2 with 10-second generations, 1080p resolution, and Pikaframes, so keyframe transitions anywhere from one to 10 seconds. And some of the examples they're giving here are pretty cool looking, like seeing how it transitions between the two frames. Like this guy who has a shaved head, and then another image with blue hair, and it sort of morphs into the blue hair. And this phoenix, and this purse turning into a lizard. I mean, these transitions look really, really good. I don't know how cherry-picked these are. Most of the time when I see these videos from companies like Pika and I try them myself, it takes quite a few tries before I finally figure out how to use it. I think it's usually more user error than the actual product. I usually try to test things really quick and move on. But let's go ahead and try it. Let's go to pika.art and try Pika 2.2 here. We'll click on Pikaframes, we'll give it our first frame, I'll give it an image of a race car, and then let's give it an image of, let's try something really random, a wolf howling at the moon. Let's see if it can transition a race car into a wolf howling at the moon. And here's what we got out of it. I mean, the race car starts out going backwards and then it morphs into the wolf howling at the moon, but quite honestly it did better than I expected with two totally random, wildly different images. Like, honestly, I'm impressed. I know the car's going backwards. Yeah, I mean, it's a very, very weird morph, but what do you expect when you give it a car and a wolf and ask them to morph together? Definitely better than I anticipated. I'm actually excited to play with this feature more now.

There's a new open-source video platform on the market called Wan AI, and quite honestly, it looks about as good as Veo 2, better than Sora. I mean, it's pretty much as good as any of the AI video models we've seen so far. We can look at some examples here of people dancing, dogs riding bicycles, more people dancing, two cats boxing. I mean, it's pretty impressive, especially since it's open source. I mean, you'd see these videos and go, yeah, that's definitely AI. I don't think these are fooling anybody. But from what I'm seeing, they're pretty dang impressive. I mean, this one is really, really realistic. That's one of the better ones that I've seen. Who knows how cherry-picked these videos are?
I mean, usually pages like this are going to show the best of the best videos, but they're all looking pretty, pretty solid. It does say coming soon up at the top here. However, Krea announced this week: introducing Wan 2.1. This new model produces videos with incredible motion and understands complex prompts in great detail. Try it for free. So apparently it's rolled into Krea AI. So if I head on over to Krea, let's go up to generate and click on video. Down in the bottom left, you can see all of the video models you have access to inside of Krea, and they now have Wan 2.1. It looks like it takes about four minutes to generate a video with it, so not the fastest generator in the world, but it's available here. I'm going to give it a monkey on roller skates, and here's what we got out of that. I mean, it's definitely a monkey on roller skates, but the sort of physics of it aren't quite right. The monkey's sort of power-sliding in one direction. But I mean, the monkey on roller skates looks good. The animation is just a little off, but not bad, especially for an open-source model. It'll be really interesting to see what kind of hardware this is going to require, because I would love to try to run models like this on my local device at some point.

There was some news out of Luma AI this week. Their Dream Machine now actually generates audio on your video creations. I'm not going to play this whole video, but here are some examples. If I head on over to lumalabs.ai slash dream machine, when I go to create a new video here, I don't actually see the option to create audio with the prompt. So I believe the way you're supposed to do it is: you generate the video, and once you have the video, you ask it to generate sound for that video. That's what I'm gathering, because I don't actually see any options to generate audio when you generate a video the first time. So let's go ahead and click on one of the videos that I've made in the past here. I've got these videos of us dancing here. That was just a still image, and AI generated the dancing. I wonder what would happen if I go into one of these videos and click on the audio button down here and just create it without a description. I'm curious. Here's what it generated. Nice. All right, so that's Luma Labs. I'm sure if I gave it some better prompts, like, you know, put some hip-hop music in the background or something, maybe it would have done a little better, or just people laughing or something like that. But it's something new and fun to play around with.

Since we're talking about audio, ElevenLabs rolled out a new feature called Scribe, which is apparently the world's most accurate speech-to-text model. So it will transcribe audio for you. The company Hume went the other direction with Octave, the first text-to-speech model that understands what it's saying. And this one's interesting because you can prompt it to do different voice styles, to laugh, to have more emotion, and things like that. It's really kind of interesting. Check this out. Introducing Octave, a next-generation text-to-speech system by Hume. Like we don't have enough of them text-to-speech models already, innit? Uh, Cap'n, this one be different. It's built on a great and mighty language model. Wow, that's wild. That means it knows what words mean proper, in their right place. It can suss out feelings, haste, and where to put a bit of art. Wait, wait, hold up, what's that? Oh, that's nasty, man. I can't even look at it without gagging. You can give Octave acting instructions to control emotion, style, and a lot more.
Angry as hell! It's just about the state of the world! Or amazed at how fast AI is improving. And the best part? You can create any voice you can imagine with a prompt. Because, you know, it's an LLM, and that's kind of what they do. By the beard of Neptune, are you telling me we'd be naught but whispers in some cursed illusion? Hold up. You say I ain't real? The first LLM for voice generation is available now for the price of a cup of coffee. Blimey, Hob, did you catch that? You can use this wizardry for your rambling tales and them books what folk can't be bothered to read. She's saying it's perfect for podcasts, voiceovers, and audiobooks. Now, this is one I want to dive deeper into in a future video. I haven't played with this one yet myself, but I really like the idea of messing around with it to try to create an AI-generated podcast with different voices in conversation. So that'll be a fun experiment that I'm most likely going to do in a future video.

Perplexity teased that they're releasing a new browser called Comet. We have no info about it other than it's a browser for agentic search by Perplexity, and we got this little animation. They do have a waitlist over at perplexity.ai slash comet. I did join the waitlist, and when I know more about it, I'll share more about it.

And finally, let's end with robots. Brett Adcock here, the CEO of Figure, shared this post on X saying that Figure is launching robots into the home. Their Helix robot is advancing faster than they anticipated, so they're accelerating their timeline. They've moved up their home timeline by two years, starting alpha testing this year. So they're anticipating getting these Figure robots into your house, helping you do things around the house, by the end of 2025. That's what they're claiming here in this post, so we'll see if it actually happens. I'll be following this closely because I'd love to see this play out.

And that's what I got for you this week. Before I wrap up, I do want to remind you that I'm giving away an NVIDIA RTX 5090 GPU. That is the most top-of-the-line consumer GPU you can get right now. They cost about $2,000 and they're insanely hard to come by. They're basically sold out everywhere. Well, NVIDIA is giving me one to hook one of you guys up with. In order to win this GPU, all you've got to do is make sure you're subscribed to the channel, subscribe to the Future Tools newsletter, and register for NVIDIA GTC. Now, NVIDIA GTC does have an in-person event that costs money to attend. However, they have a free virtual event, so you can watch all of the GTC event completely free online. They'll be streaming it. And if you register for free, either virtually or in person, it doesn't matter, you'll be entered to win this RTX 5090. There is a Google Form in the description. Once you actually register for GTC, you fill out the form in the description. It's like five questions. It'll take you 30 seconds. It's just to make sure I know how to contact you if you win. You fill out that form, you're entered to win this RTX 5090. It's completely free to enter. Just make sure you're subscribed to the channel and the newsletter and you're registered for NVIDIA GTC, and you could win an RTX 5090. I'll be giving it away right after GTC is over. Again, the link to that form is in the description, the link to NVIDIA GTC is in the description, so get yourself registered. It's, A, a cool event, and it's totally free.
And, B, you could win a 5090 just by taking some free steps that take less than a minute to complete. So go ahead and do that. And thank you once again for tuning in. I really, really appreciate you hanging out with me, nerding out with me. If you like videos like this and you want to stay looped in on the latest AI news, make sure you're subscribed to this channel and you like this video. That'll make sure more videos like this one show up in your YouTube feed. I do have some really, really cool tutorials and experiments and little AI challenge videos planned that I've been working on. Outside of these news videos, I'm working on some deeper-dive, longer-to-produce videos, and I can't wait to share some of that stuff with you. So make sure you're subscribed to the channel to check all of that out as it comes.

And final thing, if you haven't already, check out futuretools.io. This is where you can join that free newsletter I just mentioned. Twice a week, I'll send you an email with the coolest AI tools I come across, the latest AI news, the stuff that I find the most important. There's not nearly as much news in those emails as there is in these videos, because it's just the most important news. It's totally free to subscribe, and you can find it over at futuretools.io. I also share all the cool tools on the website, and the news page is up to date, even with all the news I don't have time to share in these videos. It's all listed there on the news page. Again, futuretools.io. And I'm out of breath. That was a lot of talking today. There was a ton of stuff that came out this week. Again, I really, really appreciate you. Thank you so much for tuning in and hanging out with me, and hopefully I'll see you in the next video. Bye-bye.