Transcript for:
AI Agents: Innovations and Differences Explained

Are you looking to build AI agents in 2024? Then this video is for you, because there have been some big announcements in the world of AI in the last couple of weeks that you need to know about. In particular, some big players like Google, OpenAI, Microsoft and Meta have all been making waves and releasing tons of new AI products recently, all with their own names and unique angles, and it gets confusing super fast. In this video I'm going to break down the latest agent builders coming out from Google, Microsoft and OpenAI, and what they mean for you. There is very clearly a distinction between the different types of builders that are for consumers versus enterprise, and each company has its own angle as to what it's doing and what the applications and use cases are. So if you want to understand this whole context around AI agent builders and what it means for you, then stick around and let's jump into it.

So let's first start off with a bit of a definition from Sundar Pichai, the CEO of Google. We're going to jump onto the screen and listen to how he describes agents: what I mean by that, I think about them as intelligent systems that show reasoning, planning and memory, are able to think multiple steps ahead, work across software and systems, all to get something done on your behalf and, most importantly, under your supervision. So what's really interesting here is he's talking about reasoning, planning and memory, thinking multiple steps ahead, and working across software and systems. If you'll remember what the general ChatGPT interface looks like, we can obviously go and ask a bunch of different questions, like "explain superconductors", which is one of the three written questions they have here.

So we're basically prompting, asking a question, and we get a response. These chat interfaces are basically working off of a big large language model that's been trained; you prompt it and it's going to give you a response based on its training material. But when Sundar is talking about reasoning and planning and having memory, thinking in multiple steps and working across software and systems, this is very different. I'd like to jump into this video; it was actually an interview with LangChain's Harrison Chase.

It's on the Sequoia Capital YouTube channel, and they break down really nicely, effectively, what makes an AI agent. I think this is really helpful because it can be confusing.

An AI agent isn't an LLM. It uses large language models, but it is a different thing.

I do actually have a video which goes into this in more detail called AI Agents Explained. So I'll put this somewhere around here. Also link it in the show notes.

But an agent has different components. So firstly, it's got an ability to plan and reason.

Now, when you go to ChatGPT and you ask a question, for example, if I say, you know, where should I go on holiday in 2024? I'm going to get a response, but it just gives me some different options and some answers as to why. It's not really taking into consideration: okay, well, let me understand more about you. What are your preferences? Where do you like to go on holiday?

What's your availability? What time of the year is it? And so it's not bringing any actual cognizant planning and reasoning capacity to the question. Now, think of an agent, on the other hand, more like an employee; I think an employee would just be a better word for an agent, it's going to be less confusing.

But with an employee, if you give them a task, you can be pretty ambiguous with that task and say, okay, go and plan an offsite for our company. That employee is going to have to think about all these decisions, these little micro-decisions they've got to make; they're going to think about them, plan it out, and then go ahead and execute. Whereas when you just ask a language model that question, it's just going to spit out an answer straight away.

It's not really thinking, it's just predicting what the next word should be. And this planning phase is really important when we start thinking about autonomous agents, because they need to think across the whole spectrum of the task. Of the other three parts here, I think the most important are really the tools and the actions.

So when you again go and speak to ChatGPT, for example, I'm basically chatting in an interface with a large language model. At their core, that's what these interfaces are. If I go and look at all these other types of chat interfaces, they're very similar; they're basically the competitors to ChatGPT. We have Copilot from Microsoft. Now, Microsoft is actually partnered with OpenAI, and the models that power this chatbot

are the same ones that are powering ChatGPT. And ChatGPT is powered by OpenAI's models: GPT-4 Omni, GPT-4, GPT-3.5. So these get really confusing, because you've got OpenAI, which is a company, you have the models that they built, GPT-3.5, GPT-4, GPT-4 Omni, and then you have ChatGPT, which is this environment, this product that sits on top of it that you're looking at.

And Copilot is the chat environment that's built on top of the model, like GPT-4, which is actually owned by OpenAI, but Microsoft is the owner of Copilot. Whereas when you look at Meta, which is obviously formerly Facebook, they have built an open source language model called Llama. So Llama is the model, Meta is the company, but they've also called their chatbot Meta AI. Effectively, it's the same thing.

It's a chat interface that allows you to interact with their model. And lastly, Gemini. So Gemini used to be called Bard. This is from Google. So Google is the company.

Gemini 1.5 Pro is their latest model, and Gemini is also the name of their chat interface. So they've tried to brand it in a way which is a bit cleaner.

It used to be called Bard. So the model was Gemini, the chat was Bard, and the company was Google, which was just really confusing. So we just need to make this distinction for most of these chat environments.

At the very core, it is a chat interface on top of a language model. But as we can see here, these agents are a step above interacting with just a basic question-and-answer model. These agents are able to interact with different tools like calendars or calculators, code interpreters for data analysis, searching the web. So in this case, with the older versions, let's say ChatGPT running GPT-3.5, if I go and say, what was the latest news this week?

Let's see if it's actually able to pull out the latest news. So it says the latest news this week covers blah, blah, blah. It's just generic crap; there is nothing real or cited from the internet here.

So this is like the purest example of a chat on top of an LLM. However, when I go to the latest models, the chat interface has actually become slightly agentic in the fact that they have integrated built-in tools. There aren't many that they've built in, but in particular, they built in search.

So if I say, what are the latest news topics today? And this is using, you know, GPT-4o. You see that it says searching the web, searching different sites, and then it's going to pull out the news topics and give me the links.

So I've had to change to the latest model. But it's not just the model that is answering with these; it's using the latest model, but it's also got these tools, which are searching the web and scraping it for this information. So it's doing stuff on top of the language model. And this is the first, very basic expression of an agent. Now, the other two pieces here: the memory.

So this is something we see when we go over to OpenAI's announcement of GPTs, custom GPTs, back in November 2023. This is what they said: since launching ChatGPT, people have been asking for ways to customize ChatGPT to fit specific ways that they use it. We launched custom instructions in July that let you set some preferences, but requests for more control kept coming. Many power users maintain a list of carefully crafted prompts and instruction sets, manually copying them into ChatGPT.

GPTs now do all of that for you. So that's the first bit of context here: when they talked about custom instructions, it was the equivalent of saying, if I was to ask the question, give me the latest news, I might in my custom instructions say I want just very concise bullet points, or maybe the custom instruction is give me the news and then give me some more context about the things discussed in the news, and I want it formatted in this way. I can give custom instructions about how this chat interface should work with me.

But that is a form of memory. And then lastly, there are actions. Actions are similar to tools in the sense that the agent needs to use the tools to take action.

You know, in the example where it was pulling in the news, the action is to go to the SERP, the search engine results page, take the top articles for news today, and then go and scrape them. So that might be two different actions it's taking there. But these are all the key components of what agent behavior looks like.
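To make those components concrete, here's a minimal sketch of the loop an agent runs: plan the next step, call a tool, observe the result, and repeat until the task is done. Everything in it, the call_llm helper and the toy tools, is a hypothetical placeholder rather than any particular vendor's API.

```python
# Minimal sketch of an agent loop: plan, act with a tool, observe, repeat.
# call_llm and the toy tools below are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    raise NotImplementedError

TOOLS = {
    "web_search": lambda query: f"(top results for: {query})",
    "calendar": lambda date: f"(events on {date})",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = []  # running record of what the agent has done so far
    for _ in range(max_steps):
        # Planning step: ask the model what to do next, given the task and memory.
        decision = call_llm(
            f"Task: {task}\nSteps so far: {memory}\n"
            f"Reply 'FINISH: <answer>' or '<tool>: <input>' using one of {list(TOOLS)}."
        )
        if decision.startswith("FINISH:"):
            return decision[len("FINISH:"):].strip()
        tool_name, _, tool_input = decision.partition(":")
        # Action step: run the chosen tool and store the observation as memory.
        observation = TOOLS.get(tool_name.strip(), lambda _: "unknown tool")(tool_input.strip())
        memory.append((decision, observation))
    return "Stopped after reaching the step limit."
```

The point of the sketch is just the shape of the loop: the model does the planning, your code does the acting, and the running memory is what lets it work in multiple steps.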

So, going back to custom GPTs: when they introduced these back in November, they also built a GPT builder. And this was available to anyone; they really started on the consumer side, allowed people to build them, and then opened them up for the enterprise.

But I can go ahead here and create a custom GPT. I'm going to do a very quick example; let's say an assistant to help plan my meals. This is basically a chat builder to help me build my own assistant, which is very meta in that sense. Okay, it asks for the name; the name sounds good, so that's good. It's probably going to ask me a few more questions, so let's go ahead and skip through this. It's now generating a profile picture and so on and so forth. Long story short, it's creating the instructions in the background that are going to help me create a food planner assistant. Now, if I go over to the configure screen: I actually prefer this screen. It's just two different ways of doing the same thing; one's through a chat and one's just "let me enter in some fields", and I find this a lot easier.

But here we can see you give it a name, give a description. Now you've got instructions. So these are kind of like custom instructions.

And they're actually system prompts. So when you chat with ChatGPT, behind the scenes, there is like a master prompt at the top, and then there's the prompt, which is the question you ask. So if I say, what's the latest news, there's going to be a system prompt that tells it how to interact overall.

And then there's going to be the prompt itself. So let me give you an example. Here for the food planner, it's saying: the GPT acts as a personal meal planning assistant, helps users create balanced meal plans, et cetera, et cetera, and aims to make meal planning simple, easy, enjoyable, and personalized.

Answers should be super short and concise. So when I send a message in here, if I was to go and save this and then type a message, it's going to send the system prompt, which is this instruction on the left-hand side (you don't see that, but it will read it), and then it will also add on my message in a thread. So we've got this kind of memory around how the assistant should actually interact and what it should do.
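As a rough sketch of what that looks like if you call the model yourself through the API, the GPT's instructions travel as a system message and your typed message is appended as a user message. The model name and the instruction text here are just illustrative, not what the builder actually stores:

```python
# Sketch: the "Instructions" field is sent as a hidden system message,
# and whatever you type is appended to the thread as a user message.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    "This GPT acts as a personal meal planning assistant. "
    "It helps users create balanced meal plans. Answers should be short and concise."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": system_prompt},   # the instructions you don't see
        {"role": "user", "content": "Plan my dinners for next week."},  # what you type
    ],
)
print(response.choices[0].message.content)
```

That pairing of a hidden system message with your visible message is essentially all the configure screen is setting up for you behind the scenes.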

Now, that's pretty cool. But so far, there's nothing really agentic about this behavior; we've just stored some memory as to how the assistant should act. Where it gets interesting is down here with the knowledge and the capabilities.

So, knowledge: right now, it's going to answer my questions without any additional knowledge added, which basically means it has to use the LLM's training data to answer. So it's going to give me meals based off of what it's been trained on.

But if I have, for example, my family's secret collection of, I don't know, vegan recipes, and there's this stuff that I want my meal plans to be based off of, I could upload a PDF file, a spreadsheet, or a document that has that list of recipes, and it could then reference that. So it's going to pull from this custom knowledge, which again is like having some sort of long-term memory. It's going to use what we call RAG, or retrieval augmented generation, to go and pull out the recipe based off of the prompt that I've used and then insert that into the prompt as context for its answer.
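Here's a rough sketch of what that retrieval step might look like if you wired it up yourself: embed the recipes, find the one closest to the question, and paste it into the prompt as context. The embedding model name, the chat model name, and the recipe list are just placeholders for illustration; the GPT builder handles all of this for you when you upload files to the knowledge section.

```python
# Sketch of retrieval augmented generation (RAG): embed the knowledge base,
# retrieve the chunk closest to the question, and add it to the prompt.
from openai import OpenAI

client = OpenAI()

recipes = [
    "Lentil shepherd's pie: lentils, mashed potato, peas, thyme.",
    "Chickpea curry: chickpeas, coconut milk, spinach, garam masala.",
]

def embed(texts):
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in out.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

recipe_vectors = embed(recipes)  # done once, up front

def answer(question: str) -> str:
    q_vec = embed([question])[0]
    # Retrieval: pick the most relevant recipe for this question.
    best = max(range(len(recipes)), key=lambda i: cosine(q_vec, recipe_vectors[i]))
    # Augmentation: insert the retrieved recipe into the prompt as context.
    prompt = f"Using only this recipe:\n{recipes[best]}\n\nAnswer: {question}"
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```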

Let's just shut this other one because it's getting confusing. So that's a big kind of part of agentic behavior. And we can customize these to work in whatever ways we like.

So I'm going to upload a recipe one. Now we go down to the capabilities at the bottom here. And the capabilities are really interesting because, as I said, ChatGPT has already built in some pre-made tools. And in these tools, they've got web browsing and then they've got...

DALL-E for image creation. They've also got Code Interpreter, which is basically data analysis, where it's going to go and do math through code, because language models, as we said previously, are not very good at reasoning at the moment. So there are very specific ways that they're going to answer more mathematical questions.

And I can switch these on. So web browsing will help when it has a query that it thinks it should answer from the internet and live information rather than from what it was trained on; it can go out and pull that.

But what's really interesting to me is this section at the bottom, the actions, the create new action. Now, if I go on this screen, I can't really do anything here unless I'm a developer because I have to build this and code this. But what this page would allow me to do is it allows me to go and build tools. Not necessarily build tools, but it's going to allow me to interact with tools that already exist. So that could be APIs to, for example, ESPN.

Like maybe I'm going to create an assistant that gives me an update on the latest UFC fights and all the scores of those matches. I could maybe go to the ESPN API and call that data from there. And then once I've used that data, I could say, hey, every half an hour, send a text or a message through Slack to my friends to update them on the results. That could be tool number two.
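As a rough sketch of how a custom action like that gets wired up through the API, here's what it can look like with OpenAI's function calling. The get_latest_ufc_results helper is hypothetical (a real version would call a sports data API such as ESPN's), and the model name is just illustrative:

```python
# Sketch of function calling: describe a tool to the model, let it ask to use it,
# run the real code yourself, then hand the result back for a final answer.
import json
from openai import OpenAI

client = OpenAI()

def get_latest_ufc_results(event: str) -> dict:
    """Hypothetical helper; a real version would call a sports data API."""
    return {"event": event, "results": ["Fighter A def. Fighter B (decision)"]}

tools = [{
    "type": "function",
    "function": {
        "name": "get_latest_ufc_results",
        "description": "Get fight results for the most recent UFC event.",
        "parameters": {
            "type": "object",
            "properties": {"event": {"type": "string"}},
            "required": ["event"],
        },
    },
}]

messages = [{"role": "user", "content": "Who won at the last UFC event?"}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool

result = get_latest_ufc_results(**json.loads(call.function.arguments))
messages.append(first.choices[0].message)  # keep the model's tool request in the thread
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```

The custom actions screen in the GPT builder is describing this same handshake declaratively, with an API schema instead of Python.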

So I can start building out these tools through these custom actions, and in the API these are known as function calling. What's important to note here is that OpenAI, of course, kicked off this innovation back in November 2023, and what we're starting to see from Google and Microsoft in the last couple of weeks is their versions of these types of agent builders.

But they do have a slightly different flavor. In this particular scenario, anybody on ChatGPT can build their own custom GPTs. Now, of course, there are lots of businesses that are using this to write articles or create scripts for YouTube, for example.

But ultimately, it has kind of been more consumer-focused. If we go over and look at Google and Microsoft and what they have been doing in this space, it starts to get really interesting.

So let's take a look at what's happening at the latest Google I/O conference, where Google is talking about Gems. Just to give a little context here: if we go over to Gemini (again, remember, this is from Google, and it's the chat), there were strengths and weaknesses originally; the models historically haven't been as good, but Gemini has now started to get really powerful.

They have a 1-million-token context window that you can use when you're chatting, so you could take your entire thesis and documents and upload them here. But what's really interesting about Gemini is how integrated it is into the ecosystem of Google Docs, Gmail, Google Maps, and these sorts of things. So let's just take a look.

They say here: it lets you customize it for your own needs and create personal experts on any topic you want. We're calling these Gems. Just tap to create a Gem, write your instructions once, and come back whenever you need it.

For example, here's a Gem that I created that acts as a personal writing coach. It specializes in short stories with mysterious twists, and it even builds on the story drafts in my Google Drive. Gems will roll out in the coming months.

That reasoning and intelligence all come together in the new trip planning experience in Gemini Advanced. We're going to Miami. My son loves art.

My husband loves seafood and our flight and hotel details are already in my Gmail inbox. To make sense of these variables, Gemini... So that's like an important point here.

Basically, they're saying, you know, if I was to go and say, plan me a trip for XYZ, it's not just going to be talking to the LLM Gemini. It's going to be taking into consideration the memory, i.e. it's going to go and call things from your Gmail inbox. It has this way to interact with that.

And then there's tools. So it's going to be able to... access different things and take actions on your behalf.

So already we're going towards much more agentic behavior. All kinds of information from search and helpful extensions like maps and Gmail. The end result is a personalized vacation plan presented in Gemini's new dynamic UI. I think this is another interesting thing here. Like you can see the overall interface that they're talking about.

This travel planner looks very different from the main interface we were looking at before. It just looks very different in the way that you type and the way you get results back.

It's quite interesting because I think this is going to be one of the big plays on the consumer side: what the tooling or infrastructure you've already got can bring to the table is what makes it interesting. Here they've clearly got these listings from their Google Search and Google Maps and everything else they've already got internally, and that can make the chat experience very unique to the Google environment. It's something to think about when we're comparing these chat interfaces on top of proprietary models.

I like these recommendations, but my family likes to sleep in. So I tap to change the start time. And just like that, Gemini adjusted my itinerary for the rest of the trip. This new trip planning experience will be rolling out to Gemini Advanced this summer.

You can upload your entire thesis, your sources, your notes, your research, and soon interview audio recordings and videos too. It can dissect your main points, identify improvements, and even role-play as your professor. Maybe you have a side hustle selling handcrafted products.

Simply upload all of your spreadsheets and ask Gemini to visualize your earnings. Gemini goes to work calculating your returns and pulling its analysis together into a single chart. And of course your files are not used to train our models.

Later this year, we'll be doubling the long context window to 2 million tokens. So that's the kind of consumer side. We've now spoken about OpenAI and how they've got custom GPTs, where you as a consumer can create your own.

They also have an API that enterprises can build on top of, and enterprises can use the consumer interface as well. What Google has just announced here with Gems is basically the equivalent of custom GPTs, but for Google. I'd be super interested to see what this environment is like.

But to be honest, it just feels like it's going to be another custom GPT environment. So what is the difference that this is going to bring to the table? It's really going to be about the ecosystem of tools that surrounds the Google environment, like Gmail, Google Maps, Google Docs, Google Drive, and so on. I think that's where it starts to become really powerful.

But the general idea of AI agent builders is becoming pretty standard in terms of being able to build these things with no code. What's going to be interesting is what other tools in the ecosystem you can provide these agents; that's where it becomes really valuable. And is it easy for us as consumers, or even just business people, to build things that are relevant at work?

You know, if you're in a sales or marketing role, it's probably going to be quite difficult for you to use things like the custom GPT environment, where you have to code to build these actions. Having some sort of vertical solutions that are fully tailored to a particular workflow, maybe sales prospecting or marketing content creation, might give us better experiences. But right now, Google has clearly created these kinds of big general-purpose environments that are really well suited to their own ecosystem.

Now, something that I'm not going to go into in this video (there are whole episodes and videos you can watch on this) is that Google has this thing called Vertex AI. And Vertex AI is their agent builder ecosystem, but for the enterprise. This is way less user-friendly than what you just saw on the screen.

But it has exactly the same process, where you can add knowledge, you can add actions, and you can integrate other things into them. And the point is that you can then deploy these agents, which is what Google is calling them, for things like consumer-facing applications: customer support chats, or maybe a buying experience on an e-commerce website. These are being rolled out on the enterprise side.

And this leads us on to the next update, which is from Microsoft and Microsoft Build, their latest conference, which took place last week, and some of the announcements they had there. So again, a quick reminder: we just had Copilot, which was built on top of the language models from OpenAI, but is Microsoft's Copilot. Now, Copilot is, again, just a chat interface on top of a language model.

They have put this throughout all of their different products to try and create a seamless experience and enhance and add value to their existing suite. Think of Microsoft: you think of Microsoft docs, you think of Outlook and email, Microsoft Teams; all of their enterprise software is there too.

So it makes sense that these large companies like Microsoft and Google are building chatbots, agents, and assistants on top of their existing ecosystems to make them more valuable. But what's really interesting in the latest announcement is that Microsoft now has Copilot Studio, where you can build your own custom copilots.

So effectively, we can build copilots tailored towards our own things, just like you've seen with custom GPTs and with Google Gemini's Gems. So many words, and what a mouthful to get through all this stuff. But let's just take a look at what they say here: redefine business processes with Copilot Studio. Create copilots that act as agents, working independently for you.

Simply describe what you want your copilot to do. Easily configure your copilot with the details it needs, like instructions. So you can see you've got instructions there, exactly the same as what you've seen over here in the GPT builder. You go over to create, you configure, you have your custom instructions. It's exactly the same stuff.

You've now got triggers. This is all related to the actions, but a trigger is, you know, what thing is going to cause a workflow or an action to be taken. Effectively, when we talk about these tools and actions, we might have an action to search through your HR policy if someone asks a question, but what's the trigger to make that happen?

This is what they're talking about here: triggers, and then knowledge. Again, knowledge is the same sort of thing as where we could upload a PDF file, in the context of memory, but knowledge here could also be accessing your sales CRM or accessing docs. Now, obviously, Microsoft has their whole ecosystem of docs and files and sheets.

And they're probably going to make that super, super integrated and easy for you to access. But it's exactly the same context; there's nothing new or exciting here.
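As a loose illustration of how a trigger, some knowledge, and an action hang together (using the HR policy example from a moment ago), here's a small sketch. None of this is Copilot Studio's actual API; every function name here is made up.

```python
# Loose sketch of trigger -> knowledge -> action; not Copilot Studio's real API.

def search_hr_policy(question: str) -> str:
    """Hypothetical lookup over uploaded policy documents (the 'knowledge')."""
    return "Employees accrue 25 days of annual leave per year."

def call_llm(prompt: str) -> str:
    """Stand-in for a language model call."""
    return "You accrue 25 days of annual leave per year."

def send_teams_message(channel: str, text: str) -> None:
    """Hypothetical helper that would post the reply somewhere (the 'action')."""
    print(f"[{channel}] {text}")

def on_new_employee_question(question: str) -> None:
    """The trigger: runs whenever a new question arrives."""
    policy_text = search_hr_policy(question)
    answer = call_llm(f"Policy:\n{policy_text}\n\nAnswer this question: {question}")
    send_teams_message(channel="hr-help", text=answer)

on_new_employee_question("How much annual leave do I get?")
```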

Actions. Quickly test your copilot before you deploy, and seamlessly publish across multiple channels.

Watch it use memory for context, reason over user input, and manage long-running tasks. Copilot can learn from feedback to improve. So this screen is kind of interesting, because this is all the analytics around how the copilot is being used. You're not going to see this on OpenAI's custom GPTs on the consumer side, because no one really cares that much there.

But when you're rolling out copilots as a business enterprise solution, maybe a chatbot for customers or some sort of assistant to take in insurance claims, you want to know how it's performing and whether it's able to answer people's questions correctly. Put Copilot to work for you with Copilot Studio. So again, the builder itself, I feel, is not too interesting, other than the fact that this is way easier for a non-technical person who doesn't code.

And it's going to make it super easy for enterprises to roll this stuff out in a much more no-code fashion, fully integrated into the Microsoft ecosystem. One thing that's important to note here is that you can't use this unless you're a Microsoft 365 customer, and then you pay for it.

So you've got to pay for the studio, you've got to pay for the membership, you've got to pay to be able to build these custom bots. And it's more focused on the enterprise clients that are using it; it's a closed ecosystem. The thing I like about OpenAI is that it's much more open: anybody can contribute to it, and so on and so forth. So, in a nutshell, we have these big players building these AI agent builders. We can call them agents, we can call them assistants, we can call them chatbots, but the main thing to note is they are doing more than just getting a response from a language model.

They're not fully agentic in the sense of an autonomous agent where you give it a task and it just goes and does everything; a lot of this is having a back-and-forth with a human, but there are a lot of automated processes in here. So what are the key takeaways from this whole video? Number one: custom chatbots, custom GPTs, Gemini Gems,

or Microsoft Copilot Studio copilots: they're all basically the same. They have the same components making them up: actions, knowledge, memory, and tools. Number two: the different ecosystems in which they're applied is what makes them interesting. Google's integration with their whole G Suite and their tools like Google Maps and Docs and everything else is going to be really interesting for the experience it can provide to users.

Microsoft, on the other hand, has their own extensive suite of tools, and they're more enterprise as opposed to consumer, so they are going to bring a different value to theirs. Even though both of these also have studios where you can build your own custom ones, they're still going to be greatly impacted by the ecosystem of tools that you have access to. And I guess number three is that there's clearly a bifurcation between consumer and enterprise use cases. I still think OpenAI has really got the consumer side down, and they are trying to go more and more into the enterprise.

But right now, it's the only place someone can go to build a custom bot super easily without code. It's going to be very interesting to see how this ecosystem of AI agent building evolves. I think it's clear that building the agents themselves is not the hard thing.

There's kind of a known process now for this stuff to go through. The tough thing or the thing that's going to make them really valuable is what... What ecosystem are they a part of? And what tools are you able to equip them with?

It's just like hiring an AI employee. The employee itself is not the complicated thing; it's how you train it and how you equip it with the skills it needs to be able to carry out the task. The better that happens, or the better those skills, the more advanced they can be and the more value they can bring to your particular day-to-day, whether it's at work or in your personal life.

So if you like this video, please give it a like and subscribe. It's a new channel and there are going to be lots more videos like this. Let us know: what are you building with AI agents, and how far have you gone down this rabbit hole?

Does all the different terminology make sense to you or what questions are you asking on a day to day basis? Let me know in the comments below and let me know if you like this video. Thanks a lot for your time, everybody.

Take care.