ChatGPT Agent Overview

What happens if you combine OpenAI's Operator, Deep Research, and ChatGPT together? Well, OpenAI did just that, and it's called ChatGPT agent. Let me show it to you in action. And this video is brought to you by our fantastic partners at Vultr. More on them later in the video. So, here's ChatGPT. I'm going to click Tools and then Agent mode right there. When I click it, here are a bunch of example queries that I can give it: "Research carbon capture cost trends and projections." "Retrieve WHO health professional density data by profession." And you can see this one's going to be a report. This one's going to be in Excel. Another report. And then this one's going to be browsing: "Book a dog-friendly Hipcamp with a hot tub near San Francisco." Let's see what that one looks like. So I'm going to hit play. The first thing you're going to notice is it looks a lot different than ChatGPT Operator. With Operator you would actually see the websites. You would see Operator clicking through them, making mistakes, fixing them. But now you have this more simplified, streamlined view. So you do see the website. It's searching through it: "I'll use the browser tool to check for interactive features." And this can take a while: "I need to find a private hot tub." So right here, what you're seeing is the chain of thought. In the background, you're seeing the searches. And here you can see it's in reading mode. So it's actually reading the website, but it's not displaying it so far, which is interesting. And I actually really like the user interface for this. Down at the bottom you have a little scrubber, so I can scrub forward. It basically recorded the entire thing, and you can see how many different steps it took. So I'm going to stop right here. It loaded up the Hipcamp website, and we can actually see it clicking around looking for different dates. So I'm going to scrub forward a little bit.
You can see it's selecting the number of people, looking around, and at the very end it gave me all of the information. It double-checked that all the dates were available and found the best result: Kings Mountain Camp's apartment farm stay with hot tub in Woodside. Here we can see the key details, and it reasoned for 12 minutes to get there. If I click the little activity button right here, we can actually see every single step. Look at this. Look how many there are. Here is one where it had to navigate a website. Here's a search. If I scroll down a little bit more, we can see it goes on and on. And basically, it can do anything that you can do in a browser. So, obviously, it can also create spreadsheets. Here's an example: "Organize vegetarian recipes from Allrecipes by protein efficiency." Let's click play. It's going to get started. Let me scrub through it. You can see it's doing a bunch of searches in reading mode, and then it should finalize with a spreadsheet. There it is. So all of this looks really cool. It's basically a combination of Deep Research, meaning the ability to search a bunch of websites over a long-horizon task, and Operator, meaning the ability to actually perform tasks on the web. Now let me tell you about the details. At the core of this new capability is a unified agentic system. It brings together three strengths of earlier breakthroughs: Operator's ability to interact with websites, Deep Research's skill in synthesizing information, and ChatGPT's intelligence and conversational fluency. It has its own virtual computer, very similar to Manus. In fact, Manus put out a few different posts on X about the fact that ChatGPT agent is essentially joining their game, and in this thread they did a bunch of comparisons between Manus and ChatGPT agent. I'll drop that link down below. Now, this new feature is available on Pro, Plus, and Team accounts. So you don't need the $200-a-month account to use this, which is nice.
All right, so how about those benchmarks? This is Humanity's Last Exam. On the right is ChatGPT agent with browser plus computer plus terminal, and it achieved 41.6%, compared to Deep Research alone at 26 and OpenAI o3 with Python plus browsing at 24.9. Then there's ChatGPT agent with no tools. So this is interesting: having tool use obviously gave it a massive improvement on this benchmark. Now for context, Grok 4 Heavy, which just came out about two weeks ago, scored 44.4%, and that is multi-agent. Here is Grok 4 at 38.6%. Now for Grok 4 Heavy, we don't necessarily know what's going on under the hood, whether it has access to tools and the ability to have its own browser and environment. But what I do know is that Grok 4 is a brand new model, whereas ChatGPT agent seems to be one of OpenAI's existing models, possibly customized to be really good at agentic tasks. Now imagine this with GPT-5. And you know what else is on this level of intelligence? The sponsor of today's video. Vultr is the world's largest independent cloud provider, and they've been a fantastic partner to us, so I'm really excited to tell you about them again today. If you need to provision GPUs, whether you're just tinkering on your own AI project or you're scaling up to production, Vultr is the place to go. They offer the latest AMD and Nvidia GPUs across 32 locations on six continents, so you're going to get the lowest latency. They also offer industry-leading price-to-performance with serious accessibility and reliability. Vultr's global, fully composable cloud infrastructure lets you move your applications closer to your users and frees you from vendor lock-in, which, you know, I've talked about quite a bit on this channel. They also have Vultr Kubernetes Engine, which allows you to scale beyond just a single container. So if you're tired of waiting in line for other GPU providers, check out Vultr today.
They're offering my viewers $300 in credits for your first 30 days when you visit getvultr.com/berman. And remember to use code BERMAN300. Thanks again to Vultr. Back to the video. Here it is with FrontierMath. It gets 27.4%, compared to o4-mini with Python at 19.3 and o3 with Python at 10.3. And here it is with economically important tasks. Across the x-axis is the estimated time for humans to complete a task. On the y-axis we're seeing the win and tie rates versus a human. In blue we're seeing the new ChatGPT agent, green is o3, and yellow is o4-mini. And what we see is that, yes, ChatGPT agent is now holding its own against humans, which is really what this benchmark is supposed to test. It looks to be scoring right around 30% or higher. So think about that: more than 30% of the time, ChatGPT agent is beating or matching the human at these economically important tasks. Here is DSBench, which is designed to evaluate agents on realistic data science tasks spanning data analysis and modeling. And what we're seeing is ChatGPT agent beating humans. The striped blue is ChatGPT agent, and right here at 65% are the humans on data modeling. Same thing over here: 89.9 for ChatGPT agent and 64.1 for humans. Now here's SpreadsheetBench, which measures its ability to create and edit spreadsheets, and what we're seeing here is 71.3% for humans but 45.5% for the agent with Excel access. It still has some way to go to get as good as a human at Excel. Now, one last thing: this release marks the first time users can ask ChatGPT to take actions on the web, and so it introduces new risks, particularly because ChatGPT agent can work directly with your data. There are many risks there. There can be malicious actors on the web trying to convince ChatGPT agent to give them your information. So be very careful with what information you give to ChatGPT agent, because if Pliny the Prompter is out there, he's going to get it to tell him your Social Security number if you give it to the agent.
So, a couple of final thoughts on this one. It really does seem like humans are getting further and further away from the internet as we know it today, and I don't know what to think about it yet. On the one hand, I'm kind of excited, because there's so much garbage on the internet. If I don't have to sift through it all, and I don't have to do all these time-intensive tasks by myself, I can just assign an agent to go do it. I'm really excited about that. But at the same time, now I'm having to trust an agent to be the filter between me and the information on the internet, which has its own problems. And once again, thank you to Vultr for sponsoring this video. You get $300 in credits for your first month using code BERMAN300. I'll drop all the links in the description below. Thanks again to Vultr. If you enjoyed this video, please consider giving a like and subscribing.