This week saw an impressive new image generator, an AI assistant in your pocket, and fighting AI robots. I don't want to waste your time, so let's go ahead and dive right in.

Starting with the new FLUX.1 Kontext model from Black Forest Labs. This is a really impressive new AI image generation model that lets you do very similar things to what you can do with the ChatGPT image generation model. You can upload images and tell it to tweak little items in the image. However, because it's using Flux under the hood, it's way more realistic. So, imagine the realism that you get with the Flux image model combined with the sort of customizability that you get with the ChatGPT model, and you have FLUX.1 Kontext. We can see some examples on their announcement page here, where they give it this image of, I guess, a seagull in a VR headset. Then they say, "The bird is now sitting in a bar and enjoying a beer," and it took this image and generated that same bird in a bar with a beer. "There are now two of these birds." "Watch them from behind." "The two bird characters are now sitting in a movie theater." "The two bird characters are now grocery shopping." It took this original image and was able to put that bird in all of these different scenarios just through a text prompt. Here's another example where they gave it this input image on the left, and in the middle they gave it the prompt "tilt her head towards the camera" and we got this middle image, and then on the right, "make her laugh," and we got this image. It got all of that from just this initial first image. It also seems to be very impressive when it comes to text: they took an image with the text "You had me at beer," changed it to "You had me at Kontext," and then changed it into a completely different scene.

And this is available to use right now. In fact, Black Forest Labs set up the Flux Playground, where you can play with it over at playground.bfl.ai. We have a handful of options here. We can give it a text prompt to generate an image.
We can give it an image prompt to edit the image. And we can choose from a variety of models. Let's just go ahead and leave it on the Pro model; that's going to give us the best output. Just for fun, let's give it a prompt, "a wolf howling at the moon," and generate it to get an idea of how quick it is with a text-to-image prompt. It took roughly 10 seconds or so to give us some fairly decent images. I can select one of these images, make it larger, click on edit down at the bottom, and give it a new prompt: "make the moon red." A few seconds later, we get essentially the same image, but now with a red moon.

Let's jump back to our initial window here, and this time, let's upload an image. I'll upload this actual real image of myself and tell it to give me a ball cap and aviator sunglasses. Let's go ahead and submit that. And now we have an image of me wearing aviators and a ball cap that, well, actually looks like me. Let's go ahead and edit this one and say "put me on Mars." And now here are the images that look like me, but now I'm on Mars. And it's really quick, too. So, a pretty solid new image generation model that gives us all that customizability that we got in ChatGPT recently. We can even do the whole Ghibli thing if we want. Let's edit this image and give it the prompt "make this in the style of Studio Ghibli." And as expected, we get a Ghibli-style image.

Along with the rollout of this new model, it's actually been made available in pretty much all of the AI image platforms, including my favorite AI image platform, which is Leonardo AI. Now, full disclosure, I am an adviser to Leonardo, but it is the image generation platform I use the most myself. And this week, not only did they add the new FLUX.1 Kontext model, but they also added the GPT Image model. So, you can actually choose which one you like better and use it directly inside of Leonardo.
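By the way, if you'd rather script these edits than click through a playground, Black Forest Labs also exposes Kontext through an HTTP API. Here's a minimal sketch of building an edit request in Python; note that the endpoint path and JSON field names are my assumptions based on BFL's public API docs, so double-check them before using this.

```python
import base64
import json

# Hypothetical sketch of driving FLUX.1 Kontext from code instead of the
# playground UI. The endpoint URL and field names below are assumptions --
# verify them against Black Forest Labs' current API docs.

API_URL = "https://api.bfl.ai/v1/flux-kontext-pro"  # assumed endpoint

def build_edit_request(prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for an image-edit request: the text
    instruction plus the input image encoded as base64."""
    return {
        "prompt": prompt,
        "input_image": base64.b64encode(image_bytes).decode("ascii"),
    }

# Example: the "make the moon red" edit from the demo above.
payload = build_edit_request("make the moon red", b"<raw image bytes here>")
print(json.dumps(payload)[:60])
```

In practice you'd POST that payload with your API key and then poll a results endpoint for the finished image; the playground just does all of that for you behind the scenes.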
Now, if I log into Leonardo, click on Image at the top, and then look under the model presets in the top left, you can see we now have GPT Image 1 as an option and FLUX.1 Kontext as an option. So, let's select FLUX.1 Kontext, give it a prompt like "a monkey on roller skates," and like six seconds later, we have our monkey on roller skates. One of the awesome things about Leonardo is that we can quickly turn any of these images into a video as well. If I just click this little generate video button, I can give it our same prompt and set our options. Leonardo also just recently rolled out their new Motion 2.0 model and motion control, so you can do things like crane overhead, crane up, zoom in, crane down, dolly left, etc. Let's go ahead and do orbit left. And we get our orbiting shot of a monkey on roller skates, which I had to blur a little bit because it was a little too anatomically correct. I did a few examples here, like me at a disco party, and also another one of me at a disco party. Here's me flying on a dragon. So really, really fun stuff that you can do with this, especially now that you can even train your own likeness in and then turn it into fun little animations. Pretty cool stuff.

The company Tencent rolled out a new video avatar model called HunyuanVideo-Avatar. It's a model where you can upload an image, or an image and audio, and actually turn that into a talking video. So you can upload an image and type out some text, and it will turn that text into speech and then animate the video to lip-sync that speech. Or you can upload the image and an audio file, and it will lip-sync to the audio file. It even works with multiple characters. Here are some examples: "Come on. It's time to go. I booked the table for 8 and I'm not sure exactly where the restaurant is." "Hey Ally, relax. This isn't work. This is a night out."
Now, the lip sync isn't perfect, but for a free open-source model, it's not bad. This model is available over on GitHub and Hugging Face, and they even set up a site where you can try it for free. So, let's go ahead and demo it real quick. I gave it the text, "Everyone needs to subscribe to Matt Wolfe." I set the speed selection to 1x. And then for tone selection, we've got some interesting options here: "gentle and quiet," "temperamental and graceful." I decided, let's go with "milk dog brother." And I gave it this image, which is my profile pic on all of my social media accounts. And this was the output: "Everyone needs to subscribe to Matt Wolfe." I mean, it's free and open source. It is really slow, though, so hopefully more inference providers will make it available. And if you've got a decent GPU, you should be able to download it and run it locally if you'd like. As always, I'll make sure to link up every single resource I talk about in this video down in the description below.

If you use the Claude app on your phone, they just rolled out a new voice mode that's actually pretty decent, especially since Claude can link to your Google Drive, your Gmail, your calendar, and things like that. So, let's go ahead and demo that real quick. When you open up your Claude app, you should now see a little circle with some waveforms in it, and if you press that button, it will open up the new voice assistant. One thing that I really like about this new voice assistant is that it can leverage your Google Drive, your Gmail, your calendar, and things like that, and actually act as a decent assistant, kind of like what people sort of expected Siri to do for you. So, check this out: "What do I have coming up next week?" "Let me check your calendar for next week. You have a busy week ahead with several meetings and events."
"Monday starts with your weekly kickoff with Dave at 11:00 a.m., followed by a Callum preall and a smooth creator sync. Is there a particular day you'd like more details about?" "No, that's good. Are there any urgent emails that I need to attend to?" "Let me check your recent emails for anything urgent. I found mostly promotional emails and partnership requests in your recent unread messages. One potential opportunity stands out: an invitation from the Maximum Lawyer podcast, which looks more legitimate than the others. Would you like me to check for any specific business emails?" There's also a handful of voices to choose from. If I click the settings button up in the top right, you'll hear some additional voices, each reading tongue twisters like "Betty Botter bought a bit of butter," "Fuzzy Wuzzy was a bear," and "she sells seashells." So if you use the Claude app as your chatbot of choice, it's got some new assistant features which are pretty cool.

We got some news out of Perplexity this week as well. They just introduced Perplexity Labs. They describe it like this: "Perplexity Labs is for anyone who wants to bring an entire idea to life. Labs can craft anything from reports and spreadsheets to dashboards and simple web apps, all backed by extensive research and analysis, often performing 10 minutes or more of self-supervised work." So this is a very agentic feature from Perplexity, where you give it a task and it goes away and works on that task for 10 minutes or more. Some of the examples they shared of what you can do with Perplexity Labs include prompts like "visualize the Formula 1 Imola GP qualifying times for 2025 versus 2024 by team, including all teams, which teams got faster/slower and why." We can see it actually generated this whole leaderboard here that you can take a look at, along with live commentary.
It created this chart here and all of this in-depth, detailed overview and analysis based on that question. They also gave it the prompt, "We're a tech consulting firm in the GenAI field. Please create a potential customer list for us. The target companies are B2B, American companies," etc. You can go ahead and pause and read that prompt. And we can see that it generated this whole dashboard for them. Let's view it in full screen. So, yeah, Perplexity generated this for them directly inside of Perplexity, with all of these companies and their details. This is pretty impressive. Another example: "Develop a short sci-fi film concept in a noir style about a 30-year-old female scientist living on Mars in the future when a calamity hits. Create nine storyboards and a full screenplay." So, we've got a sort of cover image here, with more images throughout; I guess these would be all the various storyboard images for the whole thing. It generated all of this directly inside of Perplexity, along with thematic analysis, production considerations, all sorts of cool stuff. Pretty wild. Now, if you're a Perplexity Pro member, so you're actually paying for Perplexity, and you log into Perplexity.ai, you'll see this new button here that says Labs. If we click on this, we can give it our prompt and get our lab response. But again, it takes around 10 minutes to process these, so I'm not going to run one right now; we just looked at a handful of examples already.

And since we're speaking of agents, the company Factory AI introduced their new Droids feature this week. This is a software development agent where you tell it what you want it to do inside of your software, either starting from scratch and building a whole new piece of software, or going off and fixing bugs, and it will just autonomously go and work on it and fix the problem or build the thing you told it to.
Unlike Cursor and Windsurf, where you kind of give it a task and fix one little thing at a time, this one can do big projects autonomously behind the scenes and just keep on running. This is a company that I was so impressed with that I actually made a very, very small investment in them. But I was very impressed with it and thought it was super cool to show off what it can do. We recently had Matan, the CEO, on the Next Wave podcast, and he gave Nathan a demo where they continued on with their conversation and it just wrote software for them in the background. We can see in the demo here that it's actually going and building all this stuff while they're just having a conversation. Nobody's actually typing anything; this is just running in the background, building software for them from scratch. And at the end of the video, if we fast forward here, the whole time they were talking, it was building this, and it actually made a DocuSign clone that they tested on the podcast, and it actually worked. So, let me fast forward a little bit. We can see that it has a login. They log into the page, they pull in a PDF, they add a signature box where you can sign directly inside the PDF, and once they've signed, it shows that one completed document was signed. This whole thing was built autonomously behind the scenes. They set up the prompt in the beginning, went on with their conversation, jumped back towards the end of the episode to see what it had completed, and it had built this whole DocuSign clone app entirely autonomously in the background. Super impressive. Definitely check out the full podcast if you haven't already. Factory AI is honestly a really impressive coding app, but again, I'm a little biased because I did make a small investment in it.
Have you ever been in the zone vibe coding your next big idea, only to be derailed by confusing error messages or unexpected bugs? It's like hitting a wall just when things are getting good. That's why for this video, I partnered with CodeRabbit. CodeRabbit now brings smart AI-powered code reviews right into your favorite code editors like VS Code, Cursor, and Windsurf. No more switching between tools or waiting for someone else to check your work. As you code, CodeRabbit watches out for bugs, security issues, refactoring suggestions, or anything that might trip you up later. It offers instant suggestions and even one-click fixes, helping you keep your momentum and stay in that flow. Whether you're building a personal project or the next big thing, CodeRabbit acts like a senior developer, ensuring your code is clean and functional without getting in your way. And the best part: it's free to use directly in your editor, with some limits on usage. So if you're ready to code with confidence and fewer interruptions, check out CodeRabbit. There's a link in the description to get started. Now, let's get back to it.

All right, now I'm going to move into the rapid-fire section, where I'm going to break down some of the smaller but still kind of important news that you might have missed this week. Let's start with the fact that Veo 3 got an update this week, and it's now available in 71 new countries. When they originally announced it last week, it was US-only; now it's available in a lot more countries. They also changed up the pricing and how many generations you get. If you're a Pro subscriber, you get 10 generations. That's total, not monthly; you just get 10 generations to try it. If you're on their Ultra plan, their $250-a-month plan, you get max limits with a daily refresh. I think the max limits might be a sliding scale, because they don't give an exact number.
Ultra subscribers also now get 125 generations a month, which is up from 83 generations.

And since we're talking about Veo 3, we should mention this one; it's kind of funny. People are saying that Veo 3 is hitting its "Pope in the puffer jacket" moment, right? Remember when Midjourney had that image of the Pope in a puffer jacket that seemed to fool everybody on social media? Well, Veo 3 seems to have done the same thing. This video here has been going viral all over the internet. It shows a woman trying to get her emotional support kangaroo onto a flight with her, and the video zooms in on the kangaroo, and, you know, it's a pretty funny video, and a lot of people really thought it was real. But it's not. It was generated with Veo 3 and seemingly fooled just about everyone that saw it, so much so that the original post actually got community noted, and it looks like it has now been taken down.

We got a small update out of OpenAI this week. Their Operator tool, which goes and actually browses the web and takes actions for you, got a small update: it's now using OpenAI's o3 model. Before, I think it was using their GPT-4o model; I'm not 100% sure, I don't remember. But now it's using o3. In some other interesting OpenAI news, OpenAI's o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed, "allow yourself to be shut down." So yeah, some of these AI models are getting kind of scary. When you ask them to stop doing what they're doing, they will sometimes refuse. "I'm sorry, Dave. I'm afraid I can't do that."

In more agent news, the company Manus introduced Manus Slides, which is a feature of Manus that will go and basically create slide decks for you autonomously. "Manus creates stunning, structured presentations instantly with a single prompt. Manus generates entire slide decks tailored to your needs. Whether you're presenting in a boardroom, a classroom, or online, Manus ensures your message lands. Want edits?
Just click and adjust. Once you are done, export or share it with colleagues and peers." And we can see from their demo here that it actually creates pretty solid-looking slides. Here's an example of a syllabus, algorithmic principle slides with charts and graphs, course overview slides, and more PowerPoint-presentation-style slides. Interesting that it put the Apple logo down here, but these are very Keynote-looking presentations. So yeah, pretty impressive-looking slides that it can generate. I haven't tested this one myself yet, but it's apparently already available over on Manus.

The Opera browser is getting into the agent game with their new Opera Neon browser. They call it "a browser for the agentic web." It can browse with you or for you, take action, and help you get things done. This one's waitlist-only right now, so I got myself on the waitlist. If I get early access, or normal access, I'll make sure I talk about it in a future video and show off what it's actually capable of.

The company Mistral launched their Mistral Agents API this week. So, if you're a coder and you're trying to build agents, maybe this Agents API is something to look into. It's got built-in connectors for code execution, web search, image generation, and MCP tools; persistent memory across conversations; and agentic orchestration capabilities.

The model DeepSeek R1, which sort of freaked out the world of AI a couple of months ago, just rolled out a new version called DeepSeek R1-0528 (the May 28th version), with improved benchmark performance, enhanced front-end capabilities, reduced hallucinations, and new support for JSON output and function calling. Here's a screenshot of the benchmarks if you want to pause and take a closer look at what it's capable of.

Here's an interesting story from the past couple of weeks.
The Duolingo CEO recently said that they're going AI-first, and sort of implied that over time the company was going to use fewer and fewer employees and more and more AI. Well, they've recently rolled back those statements, saying that their employees are important and they don't plan on slowing down on hiring them. But when they initially made those statements, something interesting happened: the employees didn't take kindly to it and actually took all of the company's social media platforms dark. On May 17th, 2025, Duolingo executed a dramatic social media blackout, completely scrubbing its Instagram and TikTok accounts while leaving only cryptic messages, like "gone for now" with dead-rose and eye emojis in the bio, and "realize realize realize" as a final post on X. Now, Duolingo has been shifting into AI for quite a while; they've been rolling out more and more AI features on their platform. But I guess saying they'd actually replace employees with AI was the last straw for a lot of employees, which is kind of understandable. Although, I do wonder what sort of legal ramifications messing with the company's social media accounts might have for the employees who did it.

All right, a couple last things before we wrap this one up. This week, the company Odyssey released an interactive video platform, which you can actually play with right now, and it's really interesting. If you go to experience.odyssey.world, you can jump into this platform, and every single scene you're seeing is generated. I'm using the WASD keys to move around and look around, and every single frame is generated in real time. Every step forward I move, that's a new generated frame. You can play around with this and explore it for up to about two and a half minutes. But what's really interesting is this world channel slider at the bottom: if I slide it around, it actually generates new worlds for me to jump into.
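Conceptually, a system like this runs an autoregressive loop: the model takes the current frame plus your keyboard input and predicts the next frame, over and over, in real time. Here's a toy sketch of that loop in Python; the "model" here is a made-up stand-in stub, not anything from Odyssey.

```python
import random

# Toy illustration of an interactive world-model loop: each "frame" is
# predicted from the previous frame plus the user's input. fake_world_model
# is a hypothetical stub standing in for a real generative model.

def fake_world_model(frame: list, action: str) -> list:
    """Stand-in for the real generative model: produce the next
    'frame' (here just a list of pixel values) from the current one."""
    rng = random.Random(action)           # the action conditions the output
    return [(p + rng.randint(-2, 2)) % 256 for p in frame]

frame = [128] * 16                        # a tiny stand-in "image"
for key in ["W", "W", "A", "D"]:          # WASD-style inputs
    frame = fake_world_model(frame, key)  # one new generated frame per step

print(len(frame))  # same frame shape every step, new content each time
```

The point is just that nothing is pre-rendered: every keystroke produces a fresh model output, which is why the experience is both trippy and, for now, low-resolution.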
Now, even they admit it's still kind of janky and buggy, and obviously the graphics aren't great at all: very, very low resolution, kind of hard to make out, but still really trippy. I don't know what we're going to do with this yet, but it's really interesting that we can do it.

Some news out of China this week: they're building a constellation of AI supercomputers in space, and they just launched the first pieces. China launched the first cluster of satellites for a planned AI supercomputer array. The first-of-its-kind array will enable scientists to perform in-orbit data processing. The satellites will use the cold vacuum of space as a natural cooling system while they crunch data, with a planned combined computing capacity of 1,000 peta operations per second. Now, where have I seen them putting AI in satellites before?

And finally, in other AI news out of China, China staged the first robotic kickboxing match. And yeah, I mean, it's exactly what it sounds like. [Applause] If you want to see the full video, again, I'll make sure it's linked in the description below so you can see what it looks like when robots go toe-to-toe in the ring.

And that's what I got for you today. Thank you so much for tuning in. I do want to share something really cool that I've been experimenting with that I think you're going to like. I just put out a new video on the channel called "AI History: This is the Moment AI Growth Exploded." The title might be a little different when you see it, because we're constantly testing new titles, but check this video out. It's a completely different style from what I usually create, and it breaks down the entire AI timeline from 1950 all the way to today, how we got here, and all the advancements. It's a video we put a lot of time into that I really think you're going to like. So, check that one out if you haven't already, and let me know in the comments on that video what you think about it, because again, it's a totally new style and we're just trying things.
And I'm really curious to hear what people think of that style. But again, that's what I've got for you today. Thank you so much for tuning in. If you like videos like this and you want to stay looped in with all of the latest AI news, get some cool tutorials, learn about things like the history of AI, and just have your finger on the pulse of AI, make sure you like this video and subscribe to this channel, and I'll make sure more videos like this show up in your YouTube feed. And if you haven't already, check out futuretools.io. This is where I curate all of the coolest AI tools that I come across, as well as keep the news page up to date on a daily basis. There's also a 100% free newsletter: just twice a week, I'll email you the most important news and the coolest tools I come across. It's totally free, and you can find it all over at futuretools.io. Thanks so much to CodeRabbit for sponsoring this one, and thanks again for nerding out with me. Hopefully I'll see you in the next video. Bye-bye.