Transcript for:
AI Developments and Innovations - April 2025

Well, we got a ton of new updates out of OpenAI this week, some cool new Google announcements, and an insane new AI video model. A lot has happened this week. I'm not going to waste your time. Let's just break it all down.

Let's start with the fact that OpenAI is actually taking away some models from us. They announced that effective April 30th, GPT-4, the model that came out around spring of 2023 and just blew everybody's minds, is going away. They're going to retire it and fully replace it with GPT-4o. Now that actually makes sense, but here's where things start to get confusing. They're also phasing out the GPT-4.5 model, which is one of the newer models that they've given us access to inside of ChatGPT recently. 4.5 only came out in late February of 2025. It's less than two months old. And quite honestly, up until recently, it's been my favorite model to use inside of ChatGPT. It was never the smartest model when it came to complex logic or doing math and reasoning, but when it came to creative writing and actually feeling like you were chatting more with a human, GPT-4.5 was really good at that.

Now, I said it's a little bit confusing, not because they're getting rid of it, but because of what's replacing it. This week, OpenAI rolled out GPT-4.1. Yeah, after we got 4.5. As of right now, GPT-4.1 is only available in the API. You won't find it inside of ChatGPT. But they did roll out three versions of it: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. Similar to GPT-4.5, these aren't thinking models, where when you give it a prompt, it sort of thinks through the response before giving you an answer. These are more like the models we're used to, where you just give it a prompt and it gives you an answer as quickly as it can.

We can see from the charts that these models are quite a bit smarter. They didn't actually label this here, but we can tell they're smarter, and they're also about as fast at generating a response as the GPT-4o models. These 4.1 models are also much better at coding than GPT-4o and o1-high, which I believe is the same as o1 Pro. They're better than o3-mini, and better at coding than GPT-4.5, the model that's named higher but was released earlier. Yeah, I know, super confusing. When it comes to instruction following, GPT-4.1 isn't quite as good as GPT-4.5, or as o3-mini or o1, the two models that actually think through after you give them the prompt.

Now, there are a few reasons 4.1 is seemingly replacing 4.5. One of them is the massive context window. This one has a 1 million token context window, meaning roughly 750,000 words of input and output text. Here's a chart that shows the needle-in-a-haystack benchmark, where they essentially take a large amount of text, bury some information somewhere in it, and then ask questions to see if the model can find the small bit of information they buried. It looks like it was 100% accurate at finding the text all the way up to that 1 million token context window. It's also on par with GPT-4.5 when it comes to vision, on par when it comes to math, and very slightly better at reasoning.

But if we're being real, this is the reason 4.1 is replacing 4.5: the pricing. GPT-4.1 costs about $1.84 per million tokens. Now, OpenAI has completely removed the 4.5 pricing from their pricing page, but if I look at the Wayback Machine from last month, GPT-4.5 with a 128,000 token context length was $75 per million input tokens and $150 per million output tokens. So yeah, when you compare how much it costs to use 4.5 versus how much it costs to use 4.1, this is the real reason it's getting replaced right now.
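Since GPT-4.1 is API-only for now, here's a minimal sketch of what a call looks like with the official openai Python SDK, assuming you have an API key set in your environment. The model IDs match the announcement; the prompt and settings are just placeholders.

```python
# Minimal sketch of calling GPT-4.1 through the OpenAI API (it isn't in ChatGPT yet).
# Assumes the official `openai` Python package and an OPENAI_API_KEY in the environment;
# swap the model string for "gpt-4.1-mini" or "gpt-4.1-nano" to try the smaller variants.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise writing assistant."},
        {"role": "user", "content": "Summarize this week's AI news in three bullet points."},
    ],
)

print(response.choices[0].message.content)
```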
But OpenAI released even more models this week, and these ones are available inside of ChatGPT. They also released o3 and o4-mini. These are thinking models, models where when you ask a question, it thinks through the problem after you give it the prompt before giving you a response, resulting in a slower response, but also a much more accurate, sort of polished response. Now, one thing OpenAI doesn't seem to do is compare their models to models that aren't also OpenAI models. They're not comparing these to Gemini 2.5 or Anthropic's Claude 3.7, or even Meta's Llama 4. They're only comparing them to their own past models. But we can see that when it comes to math, it's pretty impressive, and it gets even more impressive, which I'll show you in a minute. Their o3, without any tools, which I'll also talk about in a second, scored an 88.9% on this competition math benchmark, and o4-mini with no tools scored a 92.7%. These new models also appear to be pretty good at problem solving and coding, as well as at agentic tool use.

Now here's where things get really interesting about these models specifically. Like I mentioned, they're thinking models. You give it a prompt, and it actually thinks through and reasons about the response before finally giving you its final output. But while it's doing that reasoning, it's able to analyze images as part of the reasoning, and it's able to use tools during that reasoning process before finally giving you the response. Now, from my understanding, and I'm not saying I'm 100% certain on this, the way things like this used to work is you would give it a prompt, it might go and search the web for answers, and then use what it found as part of the context. It would sort of add it to the prompt and then give you a response based on your prompt and what it found. Now it's getting a little more agentic, where it will start thinking, maybe do a search for something, take that information back in, use what it found in the search or what it sees in your image as part of its thinking process, and then possibly even go do another follow-up search or another look at the image to clarify something.

We can see here that, for the first time, these models can integrate images directly into their chain of thought. They don't just see an image, they think with it. This unlocks a new class of problem solving that blends visual and textual reasoning, reflected in their state-of-the-art performance across multimodal benchmarks. They can even transform the images as part of the reasoning process, things like rotating, zooming, et cetera. And then we can see what it says here about agentic tool use: o3 and o4-mini have full access to the tools within ChatGPT, as well as your own custom tools via function calling in the API. The models are trained to reason about how to solve problems, choosing when and how to use tools to produce detailed and thoughtful answers in the right output formats, quickly. For example, a user might ask, how will summer energy usage in California compare to last year? The model can search the web for public utility data, write Python code to build a forecast, generate a graph or image, and explain the key factors behind the prediction, chaining together multiple tool calls. And all of this is happening during that thinking phase.
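That "custom tools via function calling" piece works the way function calling has generally worked in the API: you describe your tool as a JSON schema, the model decides during its reasoning whether to call it, and you feed the result back for the final answer. Here's a hedged sketch of that loop; the get_utility_data tool is a made-up placeholder, and whether "o4-mini" is exposed under that exact ID in your account is something to verify against the official model list.

```python
# Hedged sketch of custom tool use via function calling with a reasoning model.
# Assumes the `openai` Python SDK; get_utility_data is a hypothetical placeholder tool,
# and the "o4-mini" model string is an assumption to check against the model list.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_utility_data",
            "description": "Fetch yearly electricity usage for a US state.",
            "parameters": {
                "type": "object",
                "properties": {
                    "state": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["state", "year"],
            },
        },
    }
]

messages = [{"role": "user", "content": "How will summer energy usage in California compare to last year?"}]
first = client.chat.completions.create(model="o4-mini", messages=messages, tools=tools)

msg = first.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tool call(s)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # Stand-in data; a real tool would query an actual utility dataset here.
        fake_result = {"state": args["state"], "year": args["year"], "usage_gwh": 23000}
        messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(fake_result)})

final = client.chat.completions.create(model="o4-mini", messages=messages, tools=tools)
print(final.choices[0].message.content)
```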
And then, like I mentioned, they can search the web multiple times with the help of search providers, look at results, and then try new searches if they need more info during that thinking process. That's what makes these really powerful. o3 and o4-mini seem pretty powerful on their own based on the benchmarks we were just looking at. But when you add the ability to search the web, use various tools, and write Python code during that chain-of-thought thinking, these things become insanely intelligent. They also appear to be getting quite a bit smarter while maintaining relatively the same cost to actually use.

But there are some additional benchmarks that they showed inside of their live stream when they announced this that aren't actually in the blog post, and I think they show how impressive the tool use addition to the thinking really is. Here are some of the same benchmarks they had in their blog post, but with some additional info added. We can see that this now adds o3 with Python and o4-mini with Python. On the 2024 competition math, o3 with Python scored 95.2% and o4-mini with Python scored 98.7%. And on the 2025 competition math, o3 with Python scored 98.4% and o4-mini with Python scored 99.5%. They basically aced it.

Now, I've been playing around with o3 for the last couple of days and have been really, really impressed with it, honestly. To me, it actually seems better than using one of the other models with search or deep research turned on, because it kind of does that for you during its thinking process. I was having a discussion with it about the ethics of the training data that some of these models are trained on, and if we look at its thinking process here, we can see how it thinks, but we can also see that it searched the web for a bunch of resources to sort of add to its thoughts on it.

I also make a weekly roundup post on X where I share all of the news that I'm going to talk about in today's video, and I wanted it to clean all of this up for me. I found all the news and made a big list of the news articles that I want to share. I asked it, it cleaned the list up for me, and you can see it gave me a very cleaned-up list of the news articles. But when we look at the thinking process, we can see it think through, do some analysis using tools, think some more, and do some more analysis. I asked it to make sure everything was a specific character count, so it started counting the characters in each of these titles. And during this thinking process, it kept going back and using these tools. I believe it's using Python here. And yes, it's a little slower. We can see it thought for two minutes and nine seconds before finally giving me a response. But I mean, look at the depth of what it thought about, and it actually used tools during the thinking process to make sure that it was fulfilling what I asked it to do. Then it arranged them into a logical order, grouping OpenAI news together, followed by Anthropic and Google news. You can see it actually was using Python to figure out the order in which to arrange them, and then it finally gave me this output with my list of news articles that I wanted to share. It's really, really powerful.
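For a sense of what those Python tool calls during thinking probably look like, here's the kind of throwaway check it appeared to be running on my list: counting characters per headline and grouping by company. This is just my own reconstruction of the idea, not the actual code the model ran, and the headlines and limit here are placeholders.

```python
# Illustrative reconstruction (not the model's actual code) of the quick checks o3
# appeared to run with its Python tool: enforce a character limit per headline and
# group items so OpenAI news comes first, then Anthropic, then Google.
headlines = [
    "OpenAI retires GPT-4 and GPT-4.5, ships GPT-4.1 in the API",
    "Anthropic adds Research and Google Workspace integration to Claude",
    "Google launches Gemini 2.5 Flash with a thinking toggle",
]
MAX_CHARS = 80  # assumed character limit for the example

# Check each headline against the character limit.
for h in headlines:
    status = "OK" if len(h) <= MAX_CHARS else "TOO LONG"
    print(f"{len(h):>3} chars {status}: {h}")

company_order = ["OpenAI", "Anthropic", "Google"]

def group_rank(headline: str) -> int:
    # Rank a headline by the first company name it mentions; unknowns sort last.
    for i, name in enumerate(company_order):
        if name.lower() in headline.lower():
            return i
    return len(company_order)

# Arrange the list in the grouped order.
for h in sorted(headlines, key=group_rank):
    print(h)
```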
And while we're on the topic of o3 and these o4 models, there was an interesting article that came out in The Information. Apparently these new models are actually able to come up with novel ideas. From this article, it says: if the upcoming models, dubbed o3 and o4 mini, perform the way their early testers say they do, the technology might soon come up with novel ideas for AI customers on how to tackle problems such as designing or discovering new types of materials or drugs. The ability of OpenAI's upcoming models to synthesize new ideas represents one of several pillars the firm is developing to match or outperform humans at most economically valuable work, otherwise known as AGI. A lot of people are claiming this is just one giant step closer to AGI. Now, this article actually came out a day or two before the o3 and o4-mini announcements, so they word it as soon-to-be-released reasoning models. But it says what makes these models unique is that they can compute an answer or suggest an idea using information from multiple fields, physics and engineering, for instance, simultaneously. Most scientists must collaborate with experts from other fields to come up with similar answers or ideas. So in that sense, the AI aims to resemble the kind of inventors who blend information from multiple fields, such as Nikola Tesla and Richard Feynman, whose knowledge in physics, engineering, and math drove discoveries in electrical devices and quantum mechanics. I'm personally not aware yet of any novel drugs or materials that have come out of people using o3 or o4-mini, but apparently they're either capable or very close to being capable of doing that.

Another interesting revelation that's come out of these new models is that they're really, really good at figuring out where a location is from an image, to the point where it's getting really scary. We can see Ethan Mollick posted on X: the geoguessing power of o3 is a really good sample of its agentic abilities. Between its smart guessing and its ability to zoom into images, do web searches, and read text, the results can be very freaky. He showed off a couple of examples, and we can see it got it pretty dang accurate, all the way down to the latitude and longitude coordinates. Here's another example where it did the same, and here's one that swyx shared on X. They gave it this image and said, can you guess where in the world it is? And we can actually see how it figured it out. It noticed yellow-on-black license plates, tall wooden colonial houses with louvered shutters, and left-hand-drive cars while traffic keeps to the right. I don't even know how it can figure that out from this image. The language on the shop fascia looks like a Latin-alphabet business name rather than Spanish or Portuguese, and there's a low-lying sandy lot with drainage puddles. It guessed Paramaribo, Suriname, which I'm not sure how to pronounce, but which was apparently correct.

Along with these other announcements this week, OpenAI also rolled out their Codex CLI, or command line interface, which is an open source coding tool that can be used directly inside of your terminal. I haven't personally played with this one, but it's a coding agent that you can open up in the terminal, give it commands, and it sort of agentically works through the steps to help you code something up. But apparently OpenAI has bigger ambitions when it comes to AI coding, because rumor has it that they're in talks right now to buy Windsurf for about $3 billion. Windsurf is an AI coding agent interface. I believe it's a fork of Visual Studio Code with a whole bunch of extra AI agentic features sort of bolted into it.
If you've used Cursor, Windsurf is very similar, and apparently OpenAI wants to buy them. It also turns out that Windsurf wasn't necessarily their first choice. They also looked at buying Cursor, but Cursor is seemingly in the process of raising funds at a valuation of $10 billion, so I'm guessing they turned to Windsurf because it's more affordable.

I'm a big believer that everybody should have a website, even if it's just to share your social media links and any current projects you're working on. That's why, for today's video, I partnered with Hostinger. Hostinger is a no-code, AI-powered website builder and hosting company, so you don't have to worry about any of the technical details to get a website online. If you head on over to hostinger.com/mattwolff, you'll see that you already get 75% off, but if we come down and select the business website builder, which is the one I recommend because it's got all the AI tools, and we choose this plan, we can use the coupon code mattwolff to get an additional 10% off. Once you're logged in, head on over to the websites button, click on add new website, and then select the Hostinger website builder. This is the one that's going to do the whole thing for you with AI. We just tell it what we want and it's going to go build it for us. I'll call this one Matt Wolf's Personal Hub and give it the description: this is Matt Wolf's personal website, where he'll link to all his social media profiles, YouTube channel, and various projects that he's working on. Then I'll click create website and let Hostinger's AI do the work for me. In a matter of seconds, I have a pre-built website with the title of the website, my various projects, and of course links to all my various social profiles. Now I can just customize it however I want. Let's go ahead and change this background section. I'll toss in this cheesy image from a photo shoot, and now I've got the bones of a website that I can build out however I want. But check this out: over on the left side, they have a ton of AI tools to make it even easier, like an image generator, a writer, a page generator, and so much more. If I want to change the image associated with my current work, I can select this image, click generate image, and let's do something silly, like a wolf holding up a video camera with a computer in the background, and swap out the current work image. Why not? Again, Hostinger makes it super simple to get a website online within minutes. You can learn more by going to hostinger.com/mattwolff and using the coupon code mattwolff at checkout for an extra 10% off. Thank you so much to Hostinger for sponsoring this video.

It also looks like we're going to be getting even more announcements out of OpenAI in the coming weeks. Sam Altman took to X to say they expect to release o3 Pro to the Pro tier in a few weeks. So if you're on that $200 a month plan, just based on how impressive o3 is, I imagine o3 Pro is going to be kind of insane. And in another rumor circulating around OpenAI, apparently they're working on an X-like social media network. We don't have a lot of details on this. Apparently, Sam Altman has been privately asking people for feedback on the idea. Who really knows if this is going to pan out, but it is something a lot of people are talking about as potentially happening. And finally, in the last bit of news about OpenAI, they rolled out another sort of quality-of-life update for the image generation side of ChatGPT.
If you log into your ChatGPT account, they added a new library button over here on the left menu. If you click on it, you can see all of the various AI images that you've generated inside of ChatGPT, and if you click on an image, you can quickly scroll through the various images you've generated.

Microsoft had an announcement this week as well. They announced that they're going to be rolling out a computer use feature directly inside of Microsoft Copilot Studio. Now, this isn't something we have access to yet. It was just an announcement, and it's something they plan to show off more at Microsoft Build next month. From what I can tell, it's going to use OpenAI's computer use functionality to take control of your computer and do things on your behalf. I'll make sure you can find the link to the announcement article below, because at the bottom of the article they have a place where you can fill out a form if you want to be a tester.

We got some updates out of Google this week as well. They just rolled out a brand new large language model in Gemini 2.5 Flash. Now, Gemini 2.5 Pro has probably been the most popular model recently among coders, doing a little bit better job than even Anthropic's Claude 3.7. It's a thinking model, and it has a 1 million token context window, similar to the GPT-4.1 we talked about earlier. Well, this new Gemini 2.5 Flash is a bit lighter and a bit faster of a model, and it's their first fully hybrid reasoning model, meaning that developers can actually turn thinking on or off. So it still has that thinking ability, but if you need a faster response without it thinking through as much, you can turn that feature off and get maybe less of a thought-out response, but a quicker one. This new model is quite a bit less expensive than o4-mini, Claude 3.7 Sonnet, Grok 3, and even DeepSeek R1, although when you do turn reasoning on, the price boosts up quite a bit and puts it more in line with o4-mini, R1, and some of these other models. It's also pretty on par in science and mathematics with models of a similar size, and seemingly pretty decent at code, visual reasoning, and image understanding. Again, pretty on par with a lot of other models that have similar capabilities.

We can see in this chart that along the left is the LM Arena score, which is how users have essentially ranked these models based on how well they answered their prompts in a sort of side-by-side blind comparison, and along the bottom is the cost to use these models, with the cost getting lower and lower as we go to the right. We can see that Gemini 2.5 Flash outperforms DeepSeek R1, o3-mini-high, o1-preview, Claude 3.7 Sonnet even with thinking, and Grok 2 from this comparison standpoint on LM Arena. It's kind of beating out everything while being roughly in line with the cost of DeepSeek R1, an openly available model. When it comes to this sort of user preference testing that LM Arena provides, Gemini seems to be blowing all the other models out of the water lately. It always seems to be a Google model on top. Even when I log in right now, Gemini 2.5 Pro is still the top most upvoted model.

If you go over to ai.dev, it actually redirects you to Google's AI Studio. I just learned that today. You can test their brand new models over there for free. If I click this dropdown, you can see we've got the option for Gemini 2.5 Pro, and now, as of April 17th, Gemini 2.5 Flash.
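Before I show it in AI Studio: on the developer side, that hybrid thinking toggle is exposed through the API as a thinking budget. Here's a hedged sketch using the google-genai Python SDK; the model ID was a dated preview string at launch, so treat that and the exact budget values as assumptions to check against AI Studio's current model list.

```python
# Hedged sketch of the developer-side thinking toggle for Gemini 2.5 Flash, using the
# google-genai Python SDK. The model ID below is an assumption (it was a dated preview
# string at launch); thinking_budget=0 turns thinking off, a larger budget turns it on.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # or pull the key from the environment

MODEL = "gemini-2.5-flash-preview-04-17"  # assumption: check AI Studio for the current ID

# Fast path: thinking turned off for a quicker, cheaper response.
fast = client.models.generate_content(
    model=MODEL,
    contents="Give me a one-line summary of what a hybrid reasoning model is.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(fast.text)

# Careful path: allow up to ~2,048 thinking tokens for a more worked-through answer.
careful = client.models.generate_content(
    model=MODEL,
    contents="Write a sentence where every word ends in the letter A, exactly 10 words.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=2048),
    ),
)
print(careful.text)
```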
And when we go to use this model in AI Studio, you can see we have a toggle to turn thinking on or off. We can also turn on tools like structured output, code execution, function calling, and grounding with Google Search, and use it the same way you'd use any other chatbot like ChatGPT or Claude or whatever else you use. Let's give it this old prompt here, and we can see it thought about it for two seconds and then got the right answer. It even shows our token count down at the bottom, showing that we can go all the way up to 1 million tokens within this conversation. Then I gave it this prompt: create a sentence where every single word ends in the letter A and has exactly 10 words. This used a couple thousand more tokens, because it does use tokens as it goes through the whole thinking process, and after about 18 seconds it told me that it's nearly impossible, but it did give me a list of 10 words that end in A. Again, this platform is totally free to use. You just go over to ai.dev, and you can test out all the various Google models. It even saves your history of past chats, and you can jump back into old conversations.

Google also showed off something pretty interesting this week called DolphinGemma. It's a large language model that's helping scientists study how dolphins communicate and hopefully find out what they're saying. DolphinGemma is a foundational AI model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences. This approach, in the quest for interspecies communication, pushes the boundaries of AI and our potential connection with the marine world. They're also making this an open model. It's part of their Gemma series of models. The Gemini models are Google's closed models, while the Gemma models are their open models, which they're making available for people to essentially build and iterate off of.

Google also rolled out Veo 2 to more platforms. You can now generate videos directly inside of Gemini and Whisk. If you are a Gemini Advanced user, you'll have a Veo 2 option and can give a text prompt to generate a video. There's no image option in this version yet, but if I simply give it the prompt of a wolf howling at the moon, it will generate my video directly inside of Gemini and spit out the video right in line with the rest of the chat. I imagine over time you'll be able to do things like ask it to tweak certain elements of the videos, but right now it's pretty much a text-to-video generator inside of this chat interface. If you're a developer and you want to integrate Veo 2, that is now available for you to do. Veo 2 video generation is now available for developers inside of the Gemini API (there's a rough sketch of what that call looks like just below).

And if you're a student and you want to use some of these cool features that are available in Gemini Advanced, well, good news. College students in the U.S. are now eligible for the best of Google AI and two terabytes of storage for free. So college students can sign up to get Gemini Advanced, NotebookLM Plus, and more free of charge for this school year and next school year. Again, I will put the links in the description, so if you are a college student and you want to take advantage of this, you'll find the link below.
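For the developers, generating a Veo 2 clip through the Gemini API looks roughly like the sketch below, following the pattern in Google's quickstart at the time. I haven't run this myself, so treat the model ID, config fields, and download helpers as assumptions to double-check against the current docs.

```python
# Hedged sketch of generating a Veo 2 clip via the Gemini API with the google-genai SDK.
# The model ID and config fields are assumptions based on Google's quickstart; verify
# against the current documentation before relying on them.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed Veo 2 model ID
    prompt="A wolf howling at the moon, cinematic, night sky",
    config=types.GenerateVideosConfig(aspect_ratio="16:9"),
)

# Video generation is a long-running operation, so poll until it finishes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save whatever clips came back.
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"wolf_howling_{i}.mp4")
```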
Claude from Anthropic got an update this week as well. They now have a research feature as well as a Google Workspace integration to connect your email, calendar, and documents. The research functionality will actually connect up with your various Google services. We can see in their example here, it says starting research, and it's searching your Gmail messages, searching your calendar events, searching your Google Drive, searching the web, and then it pulls information together from all of these places. This is somebody planning a trip, and because it has access to emails and calendar and web search and all that kind of stuff, it's able to accurately help them plan this trip. Now, I'd demo it myself, but we can see here that research is only available in early beta on the Max, Team, and Enterprise plans. I'm on the $20 a month plan, Max is the $200 a month plan, and I'm not ready to subscribe to another $200 a month plan right now. The Google Workspace integrations, however, are available to all paid users, so even the $20 a month plan. If you jump into Claude, you can see there's a connect apps button, and you can actually connect Drive, Calendar, Gmail, GitHub, et cetera.

It also looks like we're about to get a voice mode for Anthropic's Claude, making Anthropic pretty much the last one to roll out this kind of feature. According to this article on Bloomberg, Anthropic is nearing the launch of a new voice assistant product for its Claude chatbot, nearly a year after rival OpenAI began rolling out a similar option for ChatGPT users. The new feature will be called voice mode, it could be released as soon as this month, and it will initially roll out on a limited basis, according to a person familiar with the matter. Limited basis probably means they're going to roll it out to the Max plan people first, which is their higher-end, more expensive plan, and then it'll probably trickle down to the rest of us later.

There were a couple of updates out of Grok from xAI this week as well. Grok released what they're calling Grok Studio, which adds code execution and Google Drive support. To me, Grok Studio looks very similar to OpenAI's Canvas, where it sort of pushes the chat to the side and then opens a new window on the right. It says Grok can now generate documents, code, reports, and browser games. If we head over to grok.com and say, create a snake game that I can play in my browser, and then submit it, we can see that it opens up a new code window to the right while leaving the chat over on the left. And when it's done, it automatically switches over to a preview mode where I've got my snake game, which seemingly works like a snake game should, with one prompt, one try. Pretty much all of the models could do this in one try now, though. Grok also rolled out a memory feature. Last week, OpenAI rolled out their memory feature, which remembers all of the conversations you've ever had with it to provide additional context to any new chats. Well, now that's also a feature available in Grok. We can see in this X post: Grok now remembers your conversations. When you ask for recommendations or advice, you'll get personalized responses. Memories are transparent: you can see exactly what Grok knows and choose what you want it to forget. This feature is in beta and available over on grok.com. So both this feature and the Studio feature are available on grok.com, but they don't appear to work yet if you just click on Grok inside of X.

Moving over to AI video news: this week, Kling launched the new 2.0 version of their AI video generation model. With this 2.0 model, Kling AI officially introduces a new interactive concept for AI video generation, multimodal visual language.
This concept enables users to efficiently convey complex, multidimensional creative ideas, such as identity, appearance, style, scenes, actions, expressions, and camera movements, directly to the AI by integrating multimodal information like image references and video clips. We can see it's got better adherence to actions: a man starts off smiling happily, then suddenly becomes furious, slams the table, and stands up. On the left is Kling 1.6; on the right, we see the new model. Better adherence to camera movements: we can see the camera follow this bee around. Better adherence to sequential actions, where it follows the apple. Enhanced dynamics, a more natural range of motion, more realistic movement speeds, cinematic visuals, style consistency, dramatic expressions. So it seems to be quite a big upgrade, and I've seen some really impressive videos coming out of Kling. For example, this video of a jet flying is just insane. It goes to the cockpit with the pilot, then to the first-person view of the jet, and then a front view of the jet, and this was all done with the new Kling model. Really, really impressive. This one came from Isaac Rodriguez; I found it on X. Here's another really impressive one from PJ Ace of some Native Americans riding horses through a desert and holding up fire, and the color and everything just looks really amazing on these, really super impressive. The whole video is a minute and 36 seconds, but I will make sure it's linked up if you want to check out the whole thing. It's quite cool. This is probably my favorite generation I've seen so far, from Blaine Brown, where they took this shot from Titanic, but then he threw her off the boat at the end.

And then here's a thread from Elcene, where they shared a whole bunch of different things it can do. For example, Kling can now swap out any actor from any film scene. So they swapped out Adam Scott with Taylor Swift here in Severance. This one appears to be Tom Cruise swapped out with somebody, I think maybe Taylor Swift, I'm not 100% sure. Here's another Severance one swapped out with Gal Gadot from Wonder Woman. Just this ability to take an existing scene and swap in another person is really quite impressive. Apparently, it's also really good at conveying emotions. Pierrick shared that facial expressions are really well handled in Kling 2.0's text-to-video model, and we can see the emotion really show through in the face here.

Since we're talking about emotion, here's a new tool I just came across earlier called Arcads.ai, and this one introduced gesture control. So you can actually prompt AI actors to generate specific looks, like crying, laughing, and more. Here are some examples they shared. Here's one of a podcaster showing some excitement or shock while on a podcast. Here's one where somebody is laughing; this was AI generated right here. Here's another one where somebody is celebrating: they uploaded their avatar and told that avatar to celebrate. Here's one where they do a heart sign. Here's another one where the actor is sad. Again, these are all AI generated, and the emotion they're showing is prompted emotion. Here's one where the actor is pointing to the top of the screen. We can't tell what's real anymore. Now, it looks like it's specifically designed for creating ads; it's called Arcads. What's interesting here is in their FAQ, it asks, are the actors real or AI actors?
And it says they are AI actors. You don't need to wait for them to get back to you. However, they are based on the image of real actors, so you do need to be careful with what you make them say. So they're not AI-generated images of actors; they're actually real images of real actors, but you can use AI to make them say anything or do various gestures. Now, I haven't tested this one myself yet, because the lowest plan is $110 a month for 10 videos, and I really don't like trying tools like this that don't at least give me one free trial before they require me to pay to use them. I've talked about this quite a bit in past videos; it's sort of a pet peeve of mine. If you want to charge somebody a hundred dollars a month to use your AI tool, let them try it at least once first before forcing them to pay to even get a test drive. But nonetheless, the technology looks pretty cool, and if you guys want to see me test this and you think it would make a good video, maybe I'll shell out the bucks to actually try it in a future video.

Luma Dream Machine rolled out a new feature this week as well. In fact, it's dropping the day this video drops, and it's the ability to actually adjust camera angles on the videos that you generate. Down inside of your prompt box inside of Luma, there's now a little camera icon. If you click on it, you can see you've got options for static, handheld, zoom in, zoom out, pan left, pan right, tilt up, tilt down, push in, pull out, truck left, et cetera. Tons and tons of different camera angles and shot types that you can get. So if I do point of view here, and then, since we already saw a video that did it earlier, looking out the cockpit of a fighter jet, let's see what this generates. It took a few minutes, but here's what we got out of it: a first-person view looking out of a fighter jet. I think there are a few more dials there than might actually be on a fighter jet, but we got a first-person view like we anticipated. This was my very first test using this, so with better prompts and more testing, I'm sure you can get way better outputs. But again, there's a lot here to play with.

All right, I have just a few more rapid-fire things that I want to share with you. This tool, Krisp, which has historically been known for removing background noise and making your audio sound better, now has a feature that actually helps people remove accents. So call centers can use it to make it sound like somebody in India has a Texas accent. Here's an example that they shared: this is where Krisp accent conversion comes in. Let me show you the voice preservation mode in action. This mode keeps my original voice intact while softening challenging parts of my accent. So now if you get a sales call, you'll never actually know if it's coming from the same country you're currently in.

Netflix is testing a new AI search engine to help better recommend shows and movies to you, and the new search engine is actually OpenAI-powered. Apparently it's going to let users search for shows based on more specific terms, things like the subscriber's mood, for example. It's rolling out in Australia and New Zealand first, and it's only available on iOS devices, but it's going to expand to more markets, including the US, pretty soon.

Apparently it's also Tim Cook's mission in life to bring true AR glasses to the world before Meta does. Meta showed off their Orion glasses.
They also have their Ray-Ban Metas, and there's been talk that a new version of the Ray-Ban Metas is going to come out with a heads-up display on them. The rumor is that Tim Cook wants to get something out into the world with a similar form factor before Meta actually does.

And finally, along the same lines, during a TED Talk, Google showed off their new glasses. These are glasses that I actually got to demo in person at Google DeepMind in London a few months ago, and they are really, really impressive. So check out this demo: "As you may have noticed, I snuck a peek back at the shelf a moment ago. I wasn't paying attention, but let's see if Gemini was. Hey, did you happen to catch the title of the white book that was on the shelf behind me?" "The white book is Atomic Habits by James Clear." "That is absolutely right. Absolutely right. I keep losing my hotel key card. Do you know where I last left the card?" "The hotel key card is to the right of the music record." "Great. This is my first time in Vancouver, and I love going on walks, so why don't you navigate me to a park nearby with views of the ocean?" "Okay, I am starting navigation to Lighthouse Park, which has magnificent views of the Pacific Ocean. Is there anything else I can assist you with?" "Honestly, with these directions and a 3D map, I should be all set, and hopefully I won't look like a tourist."

Now again, I did actually try these glasses on. I was kind of sworn to secrecy and wasn't allowed to talk too much about them, but they also have things like translation. So when somebody's talking to you, it will automatically translate it back, and you can look at signs that are in different languages and it will tell you what they say in English. And there is that little heads-up display you saw in that video where it showed the little map on the screen. Well, that can also show text, so if you're looking to translate something, it can put the text right in front of your eyes as well. It was really, really impressive technology, and again, it's a form factor that people would actually want to wear, similar to the Ray-Ban Metas. I don't know exactly when these are going to roll out to the world. I do know that Google I/O is coming up next month, so hopefully we'll have more information then about when these glasses are possibly going to see the light of day in public.

But that's what I've got for you today. Hopefully you feel more looped in with what's been going on in the world of AI this week. If you want to stay looped in, make sure you like this video and subscribe to this channel. I'm constantly putting out new videos that share all the latest updates in the AI world, as well as tutorials. I also have some interviews in the works that I'm really excited to share, with various AI experts, CEOs, and thought leaders, so make sure you subscribe to the channel for more stuff like that. And if you haven't already, check out futuretools.io. This is the website where I curate all of the cool AI tools that I come across, and I share news on a daily basis, so if you want to stay up to the minute with all the latest AI news, it's all here. I also have a free AI newsletter where I'll email you just twice a week with all of the most important AI news and the coolest tools that I come across. It's totally free, and if you sign up, I'll hook you up with access to the AI Income Database, a little database that I've been building of cool ways to generate side income using these various AI tools. Again, all free.
You can find it over at futuretools.io. Thank you so much for tuning in, and thank you to Hostinger for sponsoring this one. I really, really appreciate you guys. It's been fun nerding out this week, and hopefully I will see you in the next one. Bye-bye.