Key Highlights in AI Developments

This is the first week in AI where many people are claiming that we've reached AGI. Look, that's a bold claim and it's definitely debatable, but the fact of the matter is that with the release of o3 from OpenAI, which we'll talk about today, you'll see in this very video ChatGPT doing things that were unthinkable before. And the crazy thing is that even beyond that, this was an amazing week in AI: state-of-the-art video generation, many more ChatGPT updates, multiple other models coming out of OpenAI, and so much more. This is AI News You Can Use, the show that rounds up all the AI releases from this week, filters them down to the ones you can put to work today, and then tests them or gives you a first look at everything that came out in generative AI this week.
As you might see, I'm still on the road; it's my last week here in Japan. What a country. I don't want to waste too much of your time, since we're here to talk about generative AI releases, but I will be using some pictures from my trip as part of our testing. All I'll say is: if you're a tech enthusiast and you haven't been to Japan before, oh boy, does this country have a lot in store for you. And it's so peaceful.
All right, let's get into this, starting with the OpenAI releases. There are multiple things here, and I'm going to bundle them, because we had two big announcements. The first one released multiple developer-focused models that are not available in ChatGPT: the GPT-4.1 series. These are non-thinking models, again not in ChatGPT, only available through the API for builders. The second announcement, and the big one this week, is the release of o3, the full version, just o3. That's the brain that was powering deep research and the thing that made it so damn good. You can now access it through ChatGPT on the Pro, Plus, and Team plans, and it's coming to the Education and Enterprise plans in about a week from now. Just to make it clear, it is not accessible on the free plan. With this o3 release they also released the o4-mini models: as you can see here on my Pro account, there's one called o4-mini and one called o4-mini-high.
But in this video I really want to focus on the o3 model, because I think this is the one that most people care about. On this show I always try to focus on the tools that really push the bar on what's possible with AI, and while o4-mini and o4-mini-high are impressive and perform better than o3-mini and o3-mini-high before them (the naming is a mess; the joke has been made a thousand times, and they'll fix it soon), the one model that really makes a difference for most users this week is o3. It's head and shoulders above anything we had access to before, except maybe deep research, but this thing is way faster and just different in certain behaviors.
So before we get into some examples and use cases, which I find absolutely fascinating and can't wait to show you, let's briefly outline some of the specs and which models you'd care about based on what you're doing. As I said, o3 is sort of the king here. As they show in the blog post, where you can see all the details, it leads on all the benchmarks and beats everything, including o4-mini on both settings. That's a really important takeaway: if you want the highest intelligence in ChatGPT as of now, you pick o3 on a paid plan.
The other release this week was GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. Those are developer-focused models that are particularly good at generating code. They're only accessible through the API, where you pay per request, and you can't find them inside of ChatGPT. A lot of tools are already adopting GPT-4.1 under the hood, for example the sponsor of today's video, Notion Mail, which we'll talk about later. They instantly switched the model that helps you manage your emails to 4.1 because it just performs better and works better, so you can expect an increase in performance across AI-powered applications that used other GPT models before. But these are non-thinking models, so let's not even compare their benchmarks to the thinking models, o3 and o4-mini, because thinking models are always going to crush non-thinking models.
So one more time: in the GPT series, the non-thinking models, there's the new 4.1 and smaller versions of it. o3 and o4-mini are thinking models, and o3 is the one you want to put to work today.
And if you wonder how they compare to the competition on the benchmarks: we started putting together a table to show you all of that, but in the middle of creating it we found a website that already does this really well. We're not affiliated; I just think it's a great resource. As you can see, you can sort by the various benchmarks that have been published; for example, on GPQA, o3 leads above all the competition.
Anyway, let's get to the more interesting part, because this is a model that you'll be using. I think we've talked about the different alternatives, releases, and benchmarks enough. The thing that really matters to me, and what I try to make the point of this channel, is democratizing access to all of these tools, techniques, and workflows, and I haven't seen a tool since deep research that does it as well as o3. Because this is a big point: o3 isn't just a thinking model, it's a thinking model that has access to tools. What kind of tools? Well, all the tools that ChatGPT has. Before, we saw limited releases: when o1-preview came out, you couldn't upload images, you couldn't upload files, you couldn't access memories, it could not use the data analysis tool, and so on. All of these things work automatically inside of o3 now. This is a massive deal, and we'll see why in the examples I have lined up next.
But before we do that, I want to share one more thing. Look, I've made it my life's work to stay on top of all of this, test all of it, and build an organization around it, and we teach it at scale to individuals and organizations. At this point, a release doesn't often stop me in my tracks and make me rethink the way I use these tools, the way I use technology, the way it integrates into my life. As I said in the opener, many people are stating that this is the first release that can be considered AGI, and I think that statement is debatable, because if you ask 100 people you're going to get 100 definitions of AGI. I make these videos to give you my personal take, and my conclusion is that this model makes me want to rerun every prompt I have ever tried, because it has the ability to surprise me with its intelligence and level of insight. That might not be the case on every single thing you run, but it did happen regularly in my first look here.
So with that being said, let's have a look at some of the things I tried right away. One of them is this quick prompt from Matt Shumer on Twitter, which says: "Do intensive research on Igor Pogany and give me a massive report on everything you find."
Obviously, you would replace my name with yours and try this for yourself. What happens here is so different from anything we've seen, except maybe deep research. It starts thinking, it makes a plan, and then it uses the web search tool to pull up multiple sites. It pulled up my Instagram, LinkedIn, and X; it found the company website and the company LinkedIn (we're now posting twice a day on our company LinkedIn, so go check that out if you're on that platform); and it even found my SoundCloud, which I haven't used in years, but fair enough. Then it figures out that, hey, I even made a typo writing my own name (embarrassing), and it proceeds to think about this, pull up more links, and find a dozen different LinkedIn posts and other websites, including some from my university. It comes up with something that deep research would always refuse to produce: a profile on me with all the details. It's quite comprehensive without being overwhelming, which is sometimes the case with deep research.
So the guardrails on this are lower than what we saw with deep research, and it's quite impressive how it pulls together so many relevant things. But let me tell you, there are six or seven mistakes here that are just flat-out wrong, and those are not mistakes that o3 made; they're mistakes that the websites it referenced made. For example, it says my nationality is Austrian. That's actually not the case: I lived in Austria for most of my life, but I still have a Slovak passport. The great thing is that it gives its source right here, and this website, which I don't know much about, is called Business ABC. It just somehow assumes that my nationality is Austrian, and that's why it's in the report. So it's not perfect, but clearly the big point is that the guardrails are lower; with deep research you could never have researched a person.
Some of the other insights here actually made me lean back in my chair and kind of surprised me. It made a comprehensive list of all the different paid products we offer, and it extrapolated our signature content themes, which made me think: yeah, we used to do a lot of AI workflow stacks and workflow videos, and we do fewer of those these days. We should do more of them; they get fewer views, but they seem to communicate more value. And then the overarching theme of the channel is productivity hacks. I mean, we do this in the form of generative AI tools and workflows, but that's what a lot of people here care about, so I thought that was great. Then it ran things like a SWOT analysis at the end.
That's really one of the big points I see here. It's not that none of this was possible before, but you had to bring in the context yourself, you had to know which prompts to use, and then you had to take the time to actually use them, wait for the results, think about them, and reprompt. This does all of it for you. You don't need to engage a search function, you don't need to give it specific websites to consider, and you don't need to prompt it to come up with a structure like this that really makes sense. It figures all that out for you with a combination of thinking and tool usage. So you could take this little prompt, replace the name, and run it on yourself or anybody else. Maybe they'll restrict this down the line, but as of now it works, and in deep research it never used to.
That was one of the most interesting things I wanted to try, but the second use case really shows the depth of the capability. This one was inspired by Matthew Berman on Twitter, doing something that many people on the internet are trying right now: giving it a picture and saying "figure out exactly where this person is." I did it on this wholesome picture of myself from the recent Japan trip, and o3 went off and started thinking about it. It cropped into different parts of the image and analyzed them in detail, translating the Japanese text and interpreting some of the signs; for example, here it notes that the crest has 16 petals, although it looks a bit stylized. It zooms in on different parts, and based on the caption and the text, it concludes that this is clearly Meiji Jingu, the Shinto shrine in Shibuya-ku, Tokyo, set in a park in the middle of the city. Here are the clues it followed, and it even states that this is the inner precinct exhibition area, which is exactly right. I was like, "Wow." It didn't just get the location right; it got the exact area within the shrine. Impressive, but it did have several clues, like the text and the logo.
So I wanted to give it something a bit harder. I took a second picture for my script that had no text and no logos; as a matter of fact, it was taken at night, and I figured there were no real clues in it. I asked the same thing, "figure out exactly where this person is," and it thought for about two and a half minutes. It started by cropping into various parts and coming up with some assumptions and candidate locations: maybe Kamakura's Hokoku-ji temple, or the Kaji temple in Kyoto. It ran some internet searches to find more images it could double-check against, and in typical o3 fashion, as you might know from deep research, it didn't just find a website or two; it found 11, ranging from Tripadvisor to specific travel blogs to Reddit to YouTube. Yes, it pulled up a YouTube video in the process to cross-reference snippets of this image. It noticed things like: this only shows one horizontal rail, while the other shrine I was considering typically has two, but that could be a trick of the vantage point, so I want to verify it. Then it looked for the specific trail it had in mind, again pulled in a bunch of different websites, and analyzed the images on them to cross-reference. Eventually it showed me multiple images that match mine and said: no, this is not my first assumption; it's a small hillside bamboo walk inside Kodai-ji temple in Kyoto's Higashiyama district. That is 100% correct. This is not some obvious spot: the main temple is at the bottom, you have to walk up into the gardens, and this is a small passage tucked away in the woods behind the smaller shrine that sits behind the big temple. It even says the person is standing roughly halfway along the illuminated bamboo trail that leads visitors back toward the main hall of Kodai-ji, which is exactly correct, and it gives you the exact coordinates. At this point I was just blown away by the accuracy of this thing, and by the fact that it looked up dozens of websites, looked at the images there, cross-referenced them, and identified this.
I told myself, "Okay, okay, okay, I've got to give it something even harder. How about something without a person and without a trail that people walk?" So I looked through my camera roll and found this little snap of a beautiful bird in a river, and I figured there was no way it could know this. I mean, it's just a river with some rocks and a generic-looking Japanese building in the back. Well, long story short, it got this right too. What? I thought it was really interesting that it initially pulled up Python and tried to extract the metadata from the image. I had already thought of that: none of these images included metadata, because I screenshotted the originals to make sure no GPS coordinates were embedded, to make this actually challenging. So that didn't work, and it did what it did before: it made some assumptions, ran some searches, looked at all the websites, and cross-referenced them. After only 43 seconds, it concluded that this elegant gray heron is standing in the Shirakawa Canal, and again it gave the exact coordinates and a description of where you can find this exact spot, along with reference images that look similar. And yep, that's the exact location where I took this picture. God damn, how is it this good?
At this point I felt a combination of excitement, but I was also kind of pissed at the tool. How can it be this good? How can I break this thing? So I took out my phone at the coffee shop where I was preparing this video earlier today. I was in a beautiful coffee shop overlooking a river in Osaka, and when I looked closely at the other side of the river, there were these rocks tied up in a net with some turtles sitting on them. I shot a little video so you can visualize what I'm talking about, and funnily enough, when I shot it, a little raven landed there and all the turtles that were sunbathing hopped off the rock; that was kind of a random moment. Anyway, with the telephoto lens on my iPhone I snapped a picture of this, because at this point I just wanted to give it something it couldn't figure out. And when I ran the prompt, after two and a half minutes, boy, it actually got it wrong. It figured this was in central Tokyo, which is not right; it's in Osaka. When I followed up and said it was wrong, not in Tokyo, it did several unusual things, like running Python code on the images. Look, it found these signs and tried to run some filters on top of them to boost the contrast and read what's on them, but the image was just too low-resolution to read the sign. (Side note: if they plugged one of the top AI upscalers into this, it might have gotten it right.) But after a total of 10 minutes of thinking, it actually figured out that these are the sandbags lining the inner moat of Osaka-jo, Osaka Castle in Osaka, Japan, which is correct. It's not exactly next to the castle; it's about a 25-minute walk down the canal, but just based on this low-resolution image and one follow-up prompt, it got it. I mean, this is just crazy.
So I hope this example shows you the capabilities of this thing. It's not just that the thinking model in o3 is better than anything we've seen before; it's also that it has access to every tool in ChatGPT. Before, even some power users might not have taken full advantage of the tooling in ChatGPT, just because you don't think of it, or often because you're too lazy. If you run a search, you have to wait for it, then take that context and prompt on top of it, then use the data analysis tool, and all of that took time. It does all of that for you now and just presents you with the result.
So this works really well if you do something I refer to as goal-based prompting rather than instruction-based prompting: you just tell it where you want to go and it figures out how to get there. That was always a strength of these thinking models. Yet any AI skill you might have acquired over the past few years is still relevant here, because if you know how to use the individual modalities, you can open up the thought process and reprompt it to do specific things for you if the results are not what you want. And there's still value in prompt engineering, because you can do more intricate things, like I started trying afterwards. But I'm still experimenting, and I don't want to spend an hour showing you every single interesting use case; if I find enough, I'll do a separate video as usual. Let me just say that many of the business use cases I now teach in workshops, like this meeting analyzer, just work better than anything before. In some cases they're slightly better, in other cases they're stunning, and in others they're the same as what you would have gotten with deep research or proper prompting. But one thing remains constant: results that would have taken quite a bit of effort to achieve, even with AI, you can now get from a simple one-line prompt. And the fact that the guardrails on things like searching for specific people were lowered in this release unlocks a whole new world of possibilities. Whether that's AGI or not is up to you to decide, but it's certainly impressive.
So that's my first look at o3. o4-mini and o4-mini-high are versions of it that are faster and cheaper when you use them through the API, but for most people o3 will be the one you want to be playing with right now. And for developers, the three variations of the 4.1 models are excellent at generating code; if you need a non-thinking model, that's the new go-to. They will be deprecating GPT-4.5 in the API, by the way, which is also kind of interesting.
Okay, there are two more things out of OpenAI. One of them is a mobile release, which I'll just quickly cover: there's a new tab called Library that collects all of your images, so if you're generating a lot of images, there's now an interface that's more focused on that. That's really nice. But more importantly, there was one more OpenAI release this week, OpenAI Codex. I thought about spending more time on this, but I decided to focus this video on o3, because I think most people will get more value out of that. This new Codex CLI, which they fully open-sourced, is an agentic product that works in the terminal and does pretty much exactly what Claude Code does, which Anthropic released about two months ago. I've been using that tool regularly since its release and it's really great. It runs in your terminal, so I guess you do need the very basics of development, but those are things you can figure out in about an hour with ChatGPT's assistance. Then you can do things like drag and drop images into your terminal and let it build little applications for you on demand, on full autopilot. It's obviously the same thing Anthropic did with Claude Code, and I think it's amazing that OpenAI decided to open-source this entire program; they said more is coming. But as of now, if there's one thing you should really check out this week, it's o3.
Okay, let's see what's next. This next one is exciting for me, because if you've been following the channel for a while, you'll know that I love using Notion. We use it for our prompt databases and the entire operating system of The AI Advantage, whether that's content production, the entire organization, or the community. Come to think of it, we use it for absolutely everything. I try to avoid other apps at all costs so we have one source of truth and also one source of context for our various AI workflows. So when they reached out and wanted to partner on a video, I said, "Yes, please," because the new feature they're releasing is something we're interested in ourselves: Notion Mail. It's a brand-new email platform that launched earlier this week, and there are a couple of features I want to highlight because you might want to consider it for your workflow.
The basic concept is simple: you connect your Gmail to Notion Mail and it upgrades your regular inbox with a ton of AI-powered organization and writing tools. When it comes to organization, there's the auto-label system, which is my personal favorite feature in the Notion Mail release; it's one of those things where, once you start using it, you wonder how you managed before. You just tell the AI that you want a new label, and it does all the work of labeling (or categorizing) your emails. Once the AI has labeled them, you can review its decisions and tell it whether it did well, basically fine-tuning its decision-making so you can rely on it moving forward. Sure, other email clients have this feature, but I found this AI integration to work particularly well, since it's a new product without a million existing features that the new one has to accommodate, and in practice it just works better.
Next up is a feature you've probably seen in other platforms too, but it's an important one: AI drafting your email responses. Notion Mail has one twist that sets it apart from the competition. Usually the AI writing your drafts, or the agentic app you plug into your inbox, lacks the context of, well, anything that matters. But since this is Notion, you can pull context from your Notion pages into the email client really simply. For example, if you want to draft something and say, "Hey, you can find all the relevant information about the sponsorship on this page," you just type @, pick the page, and that's it: the AI has all the context from that page. There are obviously more ways to add context, but if you already work in Notion, this is the smoothest thing ever. Those are two features, but there are many more ways Notion Mail will make your email management smoother, and it's completely free to use, so why not give it a try today? All you need to do is click the link at the top of the video description, link your Gmail account, and get started. Thanks again to Notion for sponsoring this video, and now let's get back to the next piece of AI news you can use.
Next up, we have some major updates from Canva across all of their AI integrations. I always like covering Canva because they take a lot of the trends and things that have proven to work out there and integrate them into their suite of products. As I understand it, they take complex tools for visual communication, package them, and make them easy to use. They were already doing a ton with AI, but now, during their Canva Create conference, they launched several new features, and I believe a few of these deserve our attention. We tried everything that's available now, and I think most significantly, the updated Canva AI is really close to what one might conceptualize as a visual AI tool that creates social media content for you. As you can see, we took the example of an online dog product subscription brand trying to create social media content, and it assisted us in the process. What I like is that it gives you various options: they already have a lot of templates, and this AI integration just helps you navigate between them. It creates a very smooth experience for somebody who might not be involved in design at all. Can you get higher-quality results by dialing in every single detail? Sure. But is this one of the quickest workflows out there? Yeah, probably. Look at that: you pick one of these presets, edit all the text, hit export, and you're done.
Honestly, it's kind of funny: it's getting to a point where, for smaller brands that just need to get their message out, it's really hard to argue for fully manual workflows where you pick your templates and fonts, maybe do photo shoots for custom imagery, put it all together in Photoshop, and then iterate on top of that. Here you just say what you want, you get a bunch of presets, you click one, change the text, and you're done. It's kind of great, and I think it's an easy recommendation for a lot of small businesses that just need content. But they went even further and integrated some of the AI trends we've seen across all kinds of applications. The one that surprised most people is Canva Code. Canva serves the opposite of a technical audience, but now it can write code and create little landing pages and so on. We tested it again on our imaginary dog subscription box business, and hey, it built the interface with an onboarding flow, surveys, and everything, right inside Canva.
So I found that interesting. But personally, when I looked through all of this, one thing really caught my attention: Canva now has its own type of sheets integration where you can upload data and it works with it. Unfortunately, this is one of the features that isn't available yet, but it looked really interesting, because you can add a sheet with data and then quickly process everything in it, making bulk workflows really simple. From what they showed, you could upload a list of all your products or campaigns or pretty much anything, and then use Canva at scale to generate multiple assets and get it done instantly. There are also data analysis features and things like that, but I thought it looked quite promising, especially for social media agency owners. So if that's you, have a look at these Canva features. Now let's move on to the next story.
Okay, next up we have some major updates to AI video generators, starting with the one most people agree is probably the best right now: Google's Veo 2 is, for a short time, freely accessible in Google AI Studio. That's their developer interface, and with new releases they put their products in there for a short time for developers to try out. As of the week I'm recording this video, Veo 2 is available there, so you can go ahead and generate a few videos yourself if you haven't tried it before. This model is head and shoulders above many of its competitors, as you can see from some of the examples on screen right now. Until now you could only access it through the API, where you paid per use, and in their experimental Whisk app that we talked about recently, but now you can just try it, so that's kind of nice.
Beyond that, we have Kling 2.0. Kling was one of the models competing with Veo 2 for the top rank, and let me tell you, this 2.0 model is incredible. It's probably the best text-to-video model we have right now, even better than Veo 2, though that's debatable; both have their strengths and weaknesses. But the point is, AI video has definitely not plateaued, and I wanted to show you a few comparisons of Kling 2.0 on some of the prompts we usually like to run for the show, to give you a feeling for how this stuff is evolving. Before we do that, though, I do want to say it's ridiculously expensive: right now a 5-second video costs 100 credits, and here's the pricing for the credits. So getting one clip that needs a few iterations might cost you $5 to $10, which is just crazy for most people. But as you can see from the comparisons on screen right now, the quality is incredible. This text-to-video example of a sloth hanging out in an inflatable donut really impressed me: the water physics are near perfect, and even the way its foot is submerged in the water works. AI video had already become indistinguishable from real footage in many cases, and releases like this and Veo 2 keep pushing that high bar further. Interesting. Now let's move on to the next one.
Here we have a few announcements that I'm going to bundle together; I would call these competitors catching up to OpenAI. If you're using Anthropic's Claude or Grok, great: there are a bunch of new features you should know about. I'll start with Anthropic's Research feature, which is essentially deep research, now integrated into Claude. Many people love Claude for good reasons: the tone is really unique, and the coding abilities are state-of-the-art, although Gemini 2.5 Pro and GPT-4.1 are now competing for that slot. Which one is best really depends on who you ask and what exactly you're doing. The point is, they added this Research feature, and it has one integration that, as far as I know, none of the competitors have (and by competitors I mean Google's Gemini Advanced, ChatGPT, and Grok): the ability to connect to external documents within the research. So you can connect it to your Gmail account, Google Calendar, and Google Docs, and it will consider those files while conducting its research. As of now it's available in beta for Max, Team, and Enterprise users in select countries, so yes, this is a paid feature, and no, the standard Pro plan does not have it yet.
Then we have updates from Grok, which released Grok Studio. There are things like data analysis as we know it, where it can generate documents and code and then visualize them in something like a canvas where you can collaborate on the content with the AI. They're clearly playing catch-up with ChatGPT features that have been around for almost two years in the case of data analysis (Code Interpreter, as it was called back then). They also shipped a memory feature similar to the one in ChatGPT, but it doesn't reference your past chats. We talked about that extensively last week; if you haven't heard that discussion, I think it's an important one, so consider checking out that segment from last week's video. The point being, Grok now has basic memories and is adding some of these essential features that I still love using all the time in ChatGPT.
Okay, on to the next one: an update to Gemini's deep research feature, which now uses Gemini 2.5 Pro under the hood, a model that has enjoyed massive popularity since its release. If I may, I'll give you a quick recap of the history of the deep research feature. It was first released within Gemini; they called it Deep Research, but it was very basic. Then OpenAI came along and put their o3 model under the hood of their deep research product, which blew everybody's mind. I think that's still the most useful AI thing out there today, and they just adopted that descriptive name. Now, as we just covered, everybody's following suit, with pretty much everybody having a deep research feature now, including Grok and even more specialized players like Perplexity.
What actually happened last week, and I'm only featuring it now, is that Google updated the model under the hood of their deep research to their best model, Gemini 2.5 Pro. I was really excited to see this, because I love me a good deep research, and I was hoping it could give OpenAI some competition. But long story short, I genuinely think OpenAI's is still better. I ran a few test prompts, and I consistently found the OpenAI results to be better; we'll put a result from our testing on screen right now. The gist of it is that the OpenAI reports are more targeted at exactly what you prompt, so one might say the prompt adherence is better, or in more human terms, it just listens to you, understands what you actually mean, and gets you exactly those results. Now don't get me wrong, it's still a great product; at this point in time, OpenAI's version is just superior. Oh, one more note: the Gemini deep research results are structured more like a research paper, so if that's what you need, I would definitely recommend Gemini's implementation, and if you use Gemini Advanced, it's perfectly fine. I'm just saying that if you start nitpicking, OpenAI's results are more relevant, and I find some of the insights from those reports to be truly novel, whereas I don't have that experience with pretty much any of the competing products.
All right, so that's really everything I have for this week. Next week I'll be back in my studio. If you enjoyed this, make sure to subscribe; I do the show every single Friday. My name is Igor, and I will see you very soon.
