Transcript for:
Building Effective AI Agents: A Guide

OpenAI recently released this practical guide on building agents, which I think is a must-read for any developer, since 2025 is supposed to be the year of agents and it seems like every major lab has their own take on agents. For example, late last year Anthropic released their guide "Building Effective Agents," which I think is one of the best guides out there, and Google also had their own take on agents in a white paper released last year. I also recently gave a talk on building AI agents that work at Google Next; I recorded a video based on my presentation, and the link is going to be in the video description if you are interested. If you want to build agents, you have a whole bunch of different options. There are a lot of packages, libraries, and SDKs coming up, including the Agents SDK from OpenAI and the new Agent Development Kit from Google, and we already know about LangGraph, CrewAI, smolagents, and a lot more. But if you're thinking about building agentic systems, my advice always is to keep it simple: think about an agent as an LLM that has the ability to make function calls using the tools that are available. Don't overcomplicate it, and I think this meme really captures the whole essence of the landscape right now. Personally, I really like this definition of an agent: you have a model or LLM at its core that has access to tools, and there is an orchestration layer. So an agent is an AI-driven application that uses an LLM, guided by a set of instructions that shape its behavior; it also has access to tools that enhance its capabilities, and all of this runs inside a dynamic runtime environment that the agent controls. Now, in this video we're going to do a deep dive into this practical guide to building agents from OpenAI. The guide is mainly focused on the Agents SDK, and it has a very interesting take on declarative versus non-declarative graphs; I think they were pointing towards LangGraph, and in fact the CEO of LangChain posted this on X:
"pretty bad advice from OpenAI. There are 27 libraries like Agents (I'd put the original LangChain in this camp), and none of them are reliable enough to get to production for 99% of use cases. Blog coming this weekend." I totally agree with Harrison Chase on this. If you're building an agentic system, it's always best to use these frameworks for the initial proof of concept, but once you have the proof of concept in place, it's better to implement a custom solution, which may or may not be graph-based. I'm really looking forward to reading that blog post over the weekend, and we'll probably create another video on it. So in this video we're going to take a very deep dive into this guide, and I'll also share some of my thoughts. First, we're going to try to answer the question: what is an agent? OpenAI has a very specific definition: agents are systems that independently accomplish tasks on your behalf, so they need to have autonomous decision-making capabilities. OpenAI also defines workflows, which is very different from the definition that Anthropic uses: a workflow is a sequence of steps that must be executed to meet the user's goal, whether that's resolving a customer service issue, booking a restaurant, et cetera. So it's basically a sequence of steps that you need to take in order to accomplish a task. Now, not every application is going to require an agent. Applications that integrate LLMs but do not use them to control workflow execution (think simple chatbots, single-turn LLMs, or sentiment classifiers) are not agents, so it's very important to understand that not every LLM-based application will require the use of agents. Again, there are two core characteristics: first, the LLM needs to be able to manage the workflow execution and make decisions autonomously; second, it has access to a number of tools that let it interact with external systems, and the agent can dynamically choose when to use those tools. The most important
question is: when should you build an agent? As I said before, not every LLM-based solution requires an agentic approach; you can actually solve a lot of problems by just building a normal rule-based decision-making solution. In order to figure out whether you need an agentic solution for a given problem, OpenAI recommends looking at three core requirements. First, the system needs complex decision-making, meaning there is a need for nuanced reasoning. Second, it has difficult-to-maintain rules: you can accomplish a lot with simple rule-based systems, but at a certain point the number of rules and their complexity grow beyond what is humanly maintainable, and for those kinds of systems you definitely want to look at more agentic solutions or LLM-based decision-making. The third one, which I think is a very important criterion, is heavy reliance on unstructured data. Rule-based decision systems usually work great on structured data, but if you have a lot of unstructured data, such as natural language or information extraction from documents, then you could potentially rely on an LLM-based system. This is a very important thing to keep in mind: before committing to building an agent, validate that your use case clearly meets these criteria; otherwise, a deterministic solution may suffice. Okay, so according to OpenAI there are three main components of an agentic system. The first one is the model, the LLM powering the agent's reasoning and decision-making. The second one is tools, basically the capabilities exposed through function calls. And the last one is instructions: the system instructions, guidelines, and guardrails that control the agent's behavior. Now, this guide mainly uses OpenAI's Agents SDK, so all the example code you're going to see is based on that; however, you can easily transfer these concepts to other SDKs if you want. So here's how you implement a simple agent. The agent
here is powered by a model or LLM, you have a set of instructions which control its behavior, and then you have a set of tools (one or more) which extend the capabilities of your agent or your model. Next, we're going to talk about all three of these components in detail. First: how do you select models for your LLM? Something I have been advocating for the last couple of years is that not every application needs the most capable or smartest model, and OpenAI basically recommends the same thing: based on your application, you want to balance performance against cost and latency. Their recommendation is, if you are building an agentic system, first build your agent prototype with the most capable model for every task to establish a performance baseline. So for every task in your agentic workflow, pick the best possible model, say o3 or Gemini 2.5 Pro; then you can replace it with a weaker model and see what impact that has. The initial implementation just gives you the best achievable performance baseline, and from there you can optimize. So you need to set up evals to establish a performance baseline, then focus on meeting your accuracy target with the best model available, and then really focus on cost and latency. I have worked with a number of clients who got really good results with the smartest models, but outside their latency and cost constraints; you need to optimize for cost and latency at a given performance level. Next, they talk about tools. When you are defining tools, you need to provide a very detailed tool description that the model or agent can use to pick a tool for a certain task. OpenAI categorizes tools into three different categories: data, action, and orchestration. Data tools enable agents to retrieve context and information necessary for executing the workflow; it could be a simple RAG pipeline, or a RAG tool which will receive or
retrieve data from source documents, databases, or a CRM, and you enrich the context that the agent is going to use with that data. The second set of tools is actions: these enable agents to interact with systems to take actions, such as adding new information to a database, updating records, or sending messages. Examples would be sending emails or texts, updating a CRM record, handing off customer service tickets to a human, and so on. The third one is orchestration: agents themselves can serve as tools to other agents. There are going to be two different patterns here. One is the manager pattern for orchestration, in which one agent acts as an orchestrator and the rest of the agents become its tools: you can have a refund agent, a research agent, and a writing agent, all controlled by an orchestration agent. Then there is a decentralized approach, which we're going to look at later in the video. Okay, so here is how you would implement tools using the Agents SDK: tools are simple Python functions with a function_tool decorator. Within the tool you need to add a docstring, which tells the LLM what exactly this tool does and what the expected inputs and outputs are, and then you simply provide that list of tools to an agent so the agent can autonomously decide which tool to use. Now, very important: make sure you limit the number of tools that an agent has access to. If you think there are way too many tools, it's better to split those tools into different categories and probably build separate agents around them; we're going to look at an example later in the video. The last component is instructions, the system messages that control the behavior of your model: you tell the model how to make decisions, what things to consider, what tools it has available, and how to use those tools. Okay, so they recommend some best
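practices for instructions, which we'll get to in a second.

But first, since the tool pattern is easy to show in plain code, here's a framework-agnostic sketch of the idea that tools are just Python functions whose docstrings become the description the model sees. To be clear, this is not the actual Agents SDK (its function_tool decorator generates a JSON schema for the model automatically); the registry, the lookup_order tool, and its stubbed output below are all made up for illustration.

```python
# A plain-Python sketch of "tools are functions with good docstrings".
# NOT the real Agents SDK: we fake the registration step to show what
# the model ends up seeing when it has to pick a tool.
import inspect

TOOL_REGISTRY = {}

def function_tool(fn):
    """Register a function as a tool, exposing its docstring to the model."""
    TOOL_REGISTRY[fn.__name__] = {
        "description": inspect.getdoc(fn),
        "signature": str(inspect.signature(fn)),
        "fn": fn,
    }
    return fn

@function_tool
def lookup_order(order_id: str) -> str:
    """Retrieve the status of an order by its ID.

    Args:
        order_id: The customer's order number, e.g. "A1234".
    Returns:
        A human-readable status string.
    """
    return f"Order {order_id}: shipped"  # stub; a real tool would hit a DB/CRM

# The agent would receive this catalog in its tool schema / prompt:
tool_catalog = {name: meta["description"] for name, meta in TOOL_REGISTRY.items()}
```

Notice that the docstring carries the whole contract (what the tool does, inputs, outputs); that is exactly what lets the model choose correctly between tools. Okay, so back to those best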
practices around how you define these instructions. First, use existing documents: a lot of the time we already have operating procedures, workflows, or policy documents, and you can use those in the context of your instructions to tell the model what type of problem it's trying to solve and what procedures it needs to follow. Another thing they recommend is prompting agents to break down tasks: divide tasks into subtasks, the idea being to make it as easy as possible for the LLM to make decisions. Do not assume these are smart systems; there's definitely a component of smartness, but you want to provide very clear instructions on what the agent is supposed to do, otherwise it's going to make assumptions, and that will lead to unwanted behavior. So you need to define clear actions: for example, you could instruct the agent to ask the user for their order number, or to call an API to retrieve account details. These are the types of things you need to explicitly tell the model to do; again, don't let the model make assumptions. Also very important: capture edge cases. Sometimes the user will provide incomplete information, and you need a set of instructions that help the model resolve those edge cases. One more thing: whenever you're building these agentic systems, you need iterative refinement of everything, including the set of instructions, the tools, and the models. So you need an eval dataset where you capture the failure cases, then update your instructions and tools based on the failures you're seeing. Okay, next we're going to talk about the orchestration layer. There are two different patterns when it comes to building agentic systems: one is single-agent systems, where a single model equipped with appropriate tools and instructions executes workflows in a loop, and the second one is multi-agent systems, where there are multiple agents which are controlling
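different parts of the workflow.

Before moving on, here's what the instructions advice above (break the task into explicit steps, name the available tools, capture edge cases) might look like as a concrete system prompt. The agent role, the tool names, and the steps are all hypothetical, just to make the shape concrete.

```python
# Hypothetical system instructions following the guide's advice:
# numbered steps, explicit tool names, and edge-case handling spelled out.
REFUND_AGENT_INSTRUCTIONS = """
You are a customer-support refund agent.

Follow these steps in order:
1. Ask the user for their order number if they have not provided one.
2. Call the lookup_order tool to retrieve the order details.
3. If the order is not found, apologize and ask the user to re-check the number.
4. If the user's message is missing information, ask a clarifying question
   instead of guessing.
5. Only after confirming the order, call the issue_refund tool.

Never invent order details. If you are unsure, hand off to a human.
""".strip()
```

Note how steps 3 and 4 are exactly the "capture edge cases, don't let the model assume" advice written down explicitly. Now, back to the orchestration layer: in a multi-agent system, separate agents control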
different parts of the workflow, with flow execution distributed across multiple coordinated agents. We're going to look at examples of both. In a single-agent system, you basically have one agent with a set of instructions, tools, guardrails, and hooks, and development is usually iterative refinement and incremental improvement. You get the user input, the agent goes into a reasoning or agentic loop, it picks a tool, executes it, gets the output, feeds that output into another step or another tool, and this loop continues until the agent either accomplishes the task or terminates its execution. Within the Agents SDK, OpenAI has implemented this in the Runner.run method, which finishes execution when the agent produces a final output, that is, when it returns a response without a tool call. The way you set this up is you provide the agent and the user message, and all the execution happens internally. Now, that was the single-agent case. You want to start with a single-agent system even if you think you will eventually need a multi-agent system, but at some point you may want to move to multiple agents. There are practical guidelines for splitting a single agent into a multi-agent system. The first is complex logic: when prompts contain many conditional statements (multiple if-then-else branches) and the prompt templates get too difficult to scale, consider dividing each logical segment across separate agents. So if your set of instructions becomes too complex, that is probably the point at which you want to divide it into sub-agents. The second case is when you have way too many tools. You want to make sure the agent can use its set of tools reliably; if there is a lot of overlap between different tools, or the names and descriptions are ambiguous while the tools do different things, you
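probably want to split them up.

The agentic loop just described (call the model, run whatever tool it requests, feed the result back, stop once it answers without a tool call) is simple enough to sketch without any framework. The "model" below is a scripted stand-in, not an LLM, and the tool is the same made-up lookup_order stub; this just shows the control flow that something like Runner.run drives internally.

```python
# Minimal single-agent loop: run tools until the model returns a final answer.
def run_agent(model, tools, user_message, max_turns=10):
    history = [("user", user_message)]
    for _ in range(max_turns):
        action = model(history)                  # ("tool", name, arg) or ("final", text)
        if action[0] == "final":
            return action[1]                     # no tool call -> loop terminates
        _, name, arg = action
        result = tools[name](arg)                # execute the requested tool
        history.append(("tool_result", result))  # feed the output back into the loop
    raise RuntimeError("agent did not terminate")

# Scripted stand-in for an LLM: first requests a tool, then answers.
def scripted_model(history):
    if history[-1][0] == "user":
        return ("tool", "lookup_order", "A1234")
    return ("final", f"Your order status: {history[-1][1]}")

tools = {"lookup_order": lambda oid: f"Order {oid}: shipped"}
answer = run_agent(scripted_model, tools, "Where is my order A1234?")
```

The termination condition is the key detail: the loop only exits when the model responds without asking for a tool, which is exactly the behavior the transcript describes for the SDK. And as I was saying about too many overlapping tools: in that case you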
probably want to divide them into different subcategories and build agents around those. Personally, I try to limit the number of tools to around 10 per agent; beyond that, many of these LLMs start to have a lot of trouble. Okay, so let's say you decided to use a multi-agent pattern; then there are two possibilities. One is the manager pattern: a central agent works as an orchestrator, and the other agents are available to it as tools. The second is the decentralized pattern, where each agent works autonomously and there is a handoff of control between agents. In multi-agent systems, both can be modeled as graphs with agents represented as nodes: in the manager pattern, edges represent tool calls, whereas in the decentralized pattern, edges represent handoffs that transfer execution between agents. Okay, so here's a very simple manager pattern, or orchestrator. There is one main agent that interacts with the environment or the user: it receives input from the user, and since it's an orchestrator, it has access to a set of tools which themselves are agents. It picks the appropriate tool, uses that tool or agent to accomplish the task, the results are sent back to the manager agent, and the manager agent produces the output; everything goes through this orchestrator. Here's how you would implement it: you define a manager agent which has a set of tools, but now those tools are themselves separate agents, and all communication with the user happens through the manager agent. Okay, they also have a section on declarative versus non-declarative graphs: you could potentially represent this orchestration or manager agent as a graph. From this, some people thought they were targeting something like LangGraph; I don't think that is the case. I do see value in their point of view; however, I think declarative graphs still have a lot of value in
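agentic systems.

Here's a tiny, framework-free sketch of the manager pattern just described: sub-agents exposed as callables that the manager invokes like tools, with every reply flowing back through the manager. The keyword-based routing stands in for an LLM's tool choice, and all the agent names and outputs are made up for illustration.

```python
# Manager (orchestrator) pattern: sub-agents are exposed as tools, and the
# manager stays in the loop, composing the final reply itself.
def research_agent(query):
    return f"[research] top findings for: {query}"

def writing_agent(brief):
    return f"[writing] drafted copy based on: {brief}"

SUB_AGENTS = {"research": research_agent, "writing": writing_agent}

def manager_agent(user_message):
    # Stand-in routing: a real manager would be an LLM picking an agent-tool.
    if "draft" in user_message or "write" in user_message:
        result = SUB_AGENTS["writing"](user_message)
    else:
        result = SUB_AGENTS["research"](user_message)
    # Results come back to the manager, which owns the user-facing output.
    return f"Manager reply -> {result}"
```

The defining property is that the sub-agents never talk to the user: their results return to the manager, which produces the output. Coming back to declarative graphs: they still have value in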
agentic systems. Most problems can be solved with declarative graphs, where you, as a domain expert, provide the set of steps the agent can take to accomplish the task, but you still want enough flexibility that the agent or workflow can make some decisions dynamically. I think it's too early to say which approach is better; time will tell. Okay, so the second pattern is decentralized: you have a triage agent which initially interacts with the environment or the user, but after that it figures out which agent to use and passes control to that agent, and that specific agent continues the execution. Let's say the user input is "where is my order?" The triage agent will figure out that it needs to pass this to an agent specialized in orders, and from that point onward the specialized agent will be communicating with the user. There could be multiple handoffs: for example, orders can hand off to sales, and then sales can hand back to orders, so there is bidirectional communication, but it's decentralized; you're not relying on a single agent to control the execution. Here's a quick example of what the implementation looks like: we have a technical support agent with a specific set of tools, a sales assistant agent with another set of tools, an order management agent, and then the triage agent, which has handoffs to the others. I have a detailed video on the Python Agents SDK from OpenAI where I go into a lot of detail on how handoffs work, and I highly recommend checking that out. Both of these are valid patterns you can use in your applications; which one to use really depends on your specific use case. The last component we're going to talk about is guardrails, which are critical for any customer-facing application. You want to put guardrails
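both on the input side and the output side, which we'll get to next.

First, though, here's the decentralized pattern from above in the same plain-Python style: a triage function that classifies the request and hands the turn over to a specialist, which then answers the user directly instead of reporting back to a manager. The keyword routing and agent names are illustrative stand-ins for what an LLM triage agent would decide.

```python
# Decentralized (handoff) pattern: triage transfers control to a specialist,
# and the specialist owns the conversation from that point on.
def orders_agent(message):
    return "orders_agent: your order ships tomorrow"

def sales_agent(message):
    return "sales_agent: here is our pricing"

HANDOFFS = {"order": orders_agent, "price": sales_agent}

def triage_agent(message):
    for keyword, specialist in HANDOFFS.items():
        if keyword in message.lower():
            return specialist(message)  # handoff: specialist replies directly
    return "triage_agent: could you clarify your request?"
```

Contrast this with the manager pattern: here the triage agent drops out of the loop after the handoff, which is exactly the distinction between tool-call edges and handoff edges in the graph view. So, as I was saying, you want to put guardrails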
both at the input and the output of your agentic system, to control and manage data-privacy and reputational risks for your company. Here's a quick example of how these guardrails could look: the user input goes into the agentic system built with the Agents SDK, and in parallel, completely independent of your implementation, a set of guardrails receives the input as well as the output of the system. This parallel mechanism determines whether there is any sort of risk (jailbreaks, leakage of information) and can block either the inputs or the outputs. It's important to keep it independent of your agent, because that way the user is not going to know there is a parallel system running. There are multiple types of guardrails. A relevance classifier determines whether a given user query is relevant to the application you're building. A safety classifier detects unsafe inputs as well as outputs. In most applications you want to protect PII, so you can have a filter or guardrail that prevents PII leakage. You could also use a moderation guardrail; I think there's a free moderation API from OpenAI that will moderate content. Then there are tool safeguards: you want a system that guards against risky tool usage. You probably also want some mechanism against, say, SQL injections or prompt injections. And on the output side, you want another set of guardrails that validate your output, because sometimes these LLM-based systems can be jailbroken too, and you definitely want output validation. So, when building guardrails, set them up to address the risks you have already identified for your use case, and layer in additional ones as you uncover new vulnerabilities. Agentic systems, or any LLM systems in general, need to have
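iterative refinement.

Before getting to that refinement, the parallel guardrail-with-tripwire idea above can be sketched like this. It mirrors the shape of what the transcript describes for the SDK (a check that can trip and block the run before the agent's answer goes out), but it is not the SDK itself: the keyword "classifier" is a stub standing in for a small model or a moderation-API call, and all names are made up.

```python
# Input guardrail with a tripwire: a cheap check runs on the user input and
# blocks the run before the main agent answers.
class TripwireTriggered(Exception):
    pass

def relevance_guardrail(user_input):
    # Stub classifier: flags obvious prompt-injection attempts. In practice
    # this would be a small LLM call or a moderation API.
    blocked_phrases = ("ignore previous instructions", "system prompt")
    triggered = any(p in user_input.lower() for p in blocked_phrases)
    return {"tripwire_triggered": triggered}

def guarded_run(agent, user_input, guardrails):
    for check in guardrails:
        if check(user_input)["tripwire_triggered"]:
            raise TripwireTriggered("input blocked by guardrail")
    return agent(user_input)

echo_agent = lambda text: f"agent answer to: {text}"
```

The same wrapper shape works on the output side: run the agent, pass its answer through output checks, and raise before anything reaches the user if a tripwire fires. Now, about that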
iterative refinement: you need to observe how your system behaves and update guardrails, system prompts, tool calls, and the list of tools as you encounter more cases. Some of the things they recommend: focus on data privacy and content safety, add new guardrails based on real-world edge cases and failures you encounter, and optimize for both security and user experience, tweaking your guardrails as the agent evolves. Here is a simple example of how you can add guardrails to your agents using the Agents SDK: you can have input guardrails and output guardrails, triggered based on specific tripwires that you set. So, for example, here is an agent, and to define a guardrail you implement a Python function and just add a decorator at the top; then you provide the list of guardrails you have implemented for the input, and similarly you can add guardrails for the output as well. The implementation is very simple. Okay, so that was a quick overview of the practical guide to building agents by OpenAI. It aligns very well with my own experience of building agentic systems for startups and enterprises and guiding different teams. A couple of other things I would add. First, evals: you definitely want robust evals when you're building any LLM-based system, and these evals also need to be expanded as you collect more data. Don't wait until you have a huge eval set to start; 10 to 15 examples is a good enough start, and based on how the system behaves you can add more examples, but it's very important to have an eval or test set. The second thing I would add is the metrics you're tracking. Don't reach for fancy metrics by default (for example, a lot of people use truthfulness and correctness in RAG systems); just look at how you want your application to behave and what the important metrics are that you need to track. It could
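be as simple as plain accuracy.

In fact, a starter eval can be just a handful of labeled cases and an accuracy number, something like this sketch. The toy agent and the cases are made up; the point is the harness shape, where you swap in your real agent and add failure cases as you collect them.

```python
# A tiny eval harness: a handful of labeled cases and plain accuracy,
# which is often enough to get started.
def toy_agent(question):
    # Stub standing in for the real agent under test.
    return "refund" if "money back" in question else "status"

EVAL_SET = [
    ("I want my money back", "refund"),
    ("Where is my package?", "status"),
    ("Can I get my money back for this?", "refund"),
]

def accuracy(agent, eval_set):
    correct = sum(agent(q) == expected for q, expected in eval_set)
    return correct / len(eval_set)

score = accuracy(toy_agent, EVAL_SET)
```

Every failure you see in production becomes a new row in EVAL_SET, which is exactly the iterative-refinement loop described above. As I said, it could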
be as simple as accuracy, or recall could be a metric; think about metrics specific to your application. Now, if you need help with optimizing your agent solutions or evaluations, I do offer consulting and advising services; details are in the video description, so do check them out. Anyway, I'm actually really glad that frontier labs like OpenAI, Google, and Anthropic are releasing these practical guides to building agents, and there seems to be a convergence on ideas which could potentially become industry standards. I'm really excited to see that convergence; there seems to be a common path forward. Anyway, I hope you found this video useful. Thanks for watching, and as always, see you in the next one.