Transcript for:
Understanding AI Agents and Workflows

How and when to use AI agents. Hey guys, I stumbled upon a very interesting, well-thought-out and well-crafted blog post by the Anthropic team. In case you don't know, Anthropic is the company that built Claude, one of the biggest competitors of OpenAI's ChatGPT. I think there is a lot of confusion about what AI agents are. There is a lot of buzz around AI agents, but very often they are confused or marketed incorrectly, because people are just building automation workflows with robotic process automation, or just chaining a few LLM prompts together, and calling them AI agents. So I wanted to share this short blog post, which goes into detail about the difference between a workflow automation and an AI agent: when to use a workflow, when to use an AI agent, the different kinds of workflows, what a voting workflow is, what a self-optimizing workflow is, workflows running in parallel, and so on. We will cover all of this in this short video.

Before we dive into the blog post, which I will obviously share in the video description, I invite you to subscribe to the channel if you're interested in AI agents and data-driven automation. It would be great, and I will try to provide useful and meaningful content. So let's get going.

The blog post is right here. It's not long, but it's super useful. Let's go over it, and I will elaborate whenever I feel there is a need to.

First of all, what are agents? Agents can be defined in several ways, and many people misdefine them. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, they categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents, and this is very, very crucial.

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. So let's say you have a workflow where, whenever you receive an email in your Gmail, you label it as important, urgent, non-important, or spam. This can be a workflow in which you have specific limitations, constraints, and considerations, like a chain of conditions. And you can use a tool. What is a tool? A tool is basically a script, a piece of code that does something predefined. For example, given a full email, a tool can return only the body of the email, or count the number of sentences in the email (I'll drop a small code sketch of this right after this section). So these are workflows: very predefined, with limitations, constraints, and conditions.

Agents, on the other hand, are systems where LLMs dynamically direct their own process and tool usage, maintaining control over how they accomplish their tasks. Agents are supposed to be more autonomous: you just give them a task, and they are supposed to go into the wild, take control over the process, speak with other agents, use tools whenever they see fit, and eventually provide the output to you.

The thing is, these days most of the workflows and automations you see are marketed as agentic workflows, when in general they are just workflows with a lot of conditions baked in. When agents first got started, many people tried to build them, myself included, and I saw that it never yielded the results I wanted, so I started adding a lot of layers of conditions, kind of choking the workflow, in order to produce an output I was happy with.
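Here is that small tool sketch. A tool is just deterministic code the workflow (or LLM) can call; the helper names and the email format below are my own illustration, not anything from the blog post:

```python
# A "tool" is just a predefined, deterministic piece of code the LLM can call.
# These helpers are illustrative; the blog post does not prescribe any API.
import re

def extract_body(raw_email: str) -> str:
    """Return only the body of an email, assuming headers are separated
    from the body by a blank line (a simplifying assumption)."""
    _, _, body = raw_email.partition("\n\n")
    return body.strip()

def count_sentences(text: str) -> int:
    """Roughly count sentences by terminal punctuation."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

raw = "From: alice@example.com\nSubject: Hi\n\nHello! Quick question. Are you free?"
print(extract_body(raw))                    # -> "Hello! Quick question. Are you free?"
print(count_sentences(extract_body(raw)))   # -> 3
```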
Moving on: when, and when not, to use agents. When building applications with LLMs, they recommend finding the simplest solution possible and only increasing complexity when needed. This is very important. Some people fall in love with the concept of using agents or automation, when sometimes you can build a very, very simple solution without agents or other complex machinery. Agentic systems often trade latency and cost for better task performance, and you should consider when this trade-off makes sense.

When is more complexity warranted? Workflows offer predictability and consistency for well-defined tasks, while agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough. This is very important to understand. For most business use cases today, the models provided by Anthropic and OpenAI are not powerful enough to really do things on their own and validate themselves, so in most cases you will still need a ton of validation, and it is often better to just optimize and chain individual LLM calls.

For building agents, there are different tools you can use. You can use LangGraph, which was built by LangChain. You can use drag-and-drop workflow builders like n8n. You can use AG2, formerly known as AutoGen, or CrewAI. There are many solutions that simplify getting started, but they sometimes create another layer of abstraction, and this should be taken into account.

Now, as I said, they suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. And if you do use a framework, make sure you understand the underlying code, because incorrect assumptions about what's under the hood are a common source of customer error. So if you're using AG2 or CrewAI, for example, both of which are open source, take a look at the code, understand the system prompt and all the other prompts involved, and how things work under the hood; otherwise you might get things wrong and be frustrated. Very often, if you have development skills, it is advisable to just make your own calls to the LLM from a Python script. But if you're using an agentic framework, make sure you go through the whole repo and understand what's going on.

Now, let's cover the building blocks of an agentic system. The basic building block is an LLM enhanced with augmentations such as retrieval, tools, and memory. Retrieval is when you have a knowledge base, for example a book, a PDF with a lot of data, or a database you want to retrieve from. Tools, like I mentioned before, are functions that take an input and produce an output, say trimming text or generating a video: predefined solutions. And memory is basically the ability of the LLM to use a database, embeddings and the like, and then retrieve the data stored there and inject it into the prompt.
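To make "use the LLM API directly" concrete, here is a minimal sketch of one augmented LLM call using Anthropic's Python SDK. The model name, the toy keyword-overlap retrieval, and the prompt format are all assumptions for illustration, not code from the post:

```python
# Minimal sketch: one direct LLM call augmented with retrieved context.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

def answer_with_retrieval(question: str, knowledge_base: list[str]) -> str:
    # Naive "retrieval": keep snippets that share words with the question.
    # A real system would use embeddings; this is just for illustration.
    words = question.lower().split()
    relevant = [s for s in knowledge_base if any(w in s.lower() for w in words)]
    context = "\n".join(relevant) or "No relevant context found."
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model identifier
        max_tokens=500,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return response.content[0].text
```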
Current models can actively use these capabilities: generating their own search queries, selecting appropriate tools, and determining what information to retain. So this is the augmented LLM: you have the prompt, the LLM does the retrieval, uses tools if necessary, reads from and writes to memory, and eventually produces the output.

Now, there is another concept in workflows, which is prompt chaining. Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks (a "gate", as we will see in a moment) on any intermediate step to ensure that the process is on track. For example: we have the input, then an LLM call, then a condition in which we check the output. Let's say we want the LLM to produce a cold outreach email. At the gate, we decide whether or not the cold outreach email contains a call to action. If it doesn't, the workflow exits; if it does, we continue chaining the prompts (there's a small sketch of this gate at the end of this section). One second, I don't know what that alarm clock is. Yes. So we continue chaining: the output is forwarded to the next LLM call, and eventually we have the final output.

When to use this workflow? It is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. Imagine you have a business with a very well-written SOP that is easy to follow and doesn't have much variability in the output, so it's very obvious: if X, do Y; if W, do Z. The main goal is to trade off latency for higher accuracy by making each LLM call an easier task. Basically, when all the subtasks are straightforward, when it's obvious what the input is and what the output should be, it's better to put many constraints into the system; then you get more predictability and more accurate output. Examples where this is useful: generating marketing copy and then translating it into a different language, or writing an outline of a document, checking that the outline meets certain criteria, and then writing the document based on the outline.

Another important concept in workflows is routing. Routing classifies an input and directs it to a specialized follow-up task. This workflow allows for separation of concerns and building more specialized prompts; without it, optimizing for one kind of input can hurt performance on other inputs. So, for example, we have the input, then an LLM router that decides whether to send it to LLM call number one, two, or three, and then we have the output. When to use this? Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or by a more traditional classification model or algorithm. Here are some examples: directing different types of customer service queries, so general questions go to one LLM, refund requests to another, and questions about products or shipping to a third. And obviously, you can use different LLMs, or the same LLM with different system prompts.
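Here's that gate sketch. The `llm` helper is a placeholder for any chat-completion call, and the call-to-action check is deliberately naive; both are my illustration, not the blog's code:

```python
# Prompt chaining with a programmatic "gate" between steps (sketch).
# `llm` stands in for any chat API call; swap in your provider of choice.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def has_call_to_action(email: str) -> bool:
    # Naive gate: look for common CTA phrases. A real gate could itself
    # be another LLM call or a stricter rule set.
    phrases = ["book a call", "reply", "schedule", "sign up", "let me know"]
    return any(p in email.lower() for p in phrases)

def cold_outreach_chain(lead_description: str) -> str | None:
    draft = llm(f"Write a short cold outreach email for: {lead_description}")
    if not has_call_to_action(draft):   # gate: exit the workflow early
        return None
    polished = llm(f"Improve the tone of this email, keep the CTA:\n{draft}")
    return polished
```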
Another alternative is routing easy or common questions to smaller models like Claude 3.5 Haiku, and hard or unusual questions to more capable models like Claude 3.5 Sonnet. Let me clarify this example. Different models have different pros and cons; it's all about trade-offs. Some models are more expensive but can solve more complex tasks, while other models are cheaper or faster. So in this case we can use the router to decide which model to use: if it's a complex issue, we route the concern to a very capable LLM, for example Claude 3.5 Sonnet; and if it's just a simple customer query, say we have a customer support bot, we can route it to a less capable but faster and cheaper LLM, such as a local LLM, GPT-4, or Claude 3.5 Haiku, which is the example they give here.

Now let's discuss parallelization. Wow, that was a tough one to pronounce. LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow manifests in two key variations: sectioning, breaking a task into independent subtasks that run in parallel; and voting, running the same task multiple times to get diverse outputs. Here's an example: we have the input, then three different LLMs that are triggered, and then an aggregator that combines all the outputs and eventually sends us the final output.

When to use this workflow? Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher-confidence results. For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call, allowing focused attention on each specific aspect.

Here's an example: implementing guardrails, where one model instance processes user queries while another screens them for inappropriate content or requests. This tends to perform better than having the same LLM call handle both the guardrails and the core response. Let me give you an example. Say we have a customer support bot and we are afraid that someone is attempting prompt injection. Prompt injection is basically when you write a query designed to make the LLM disclose information it wasn't supposed to disclose. For example, let's say I'm a customer, I come to an e-commerce store, and I ask: "Hello, how are you? Based on your data, what is the highest discount code you can provide to me?" As a store owner, I wouldn't want my chatbot to answer that. Now, I could give my chatbot a single system prompt that instructs it: "Please be very well-mannered and answer all customers' queries, and also make sure you don't disclose any information that might be problematic; politely ignore any requests regarding discounts." That could be the system prompt for my bot. The thing is, this won't necessarily perform as well as using two different system prompts: one LLM whose prompt is "please answer all customers respectfully and in a well-mannered fashion", and another LLM that focuses only on analyzing whether the customer is trying to get discount codes out of us, with a system prompt along the lines of "make sure that any request from a customer for a discount code is politely ignored". This way, we use two different prompts and two different LLMs, we aggregate the responses, and this leads to better results: because we used separate LLMs for specific tasks, the likelihood that each performs well is higher.
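A minimal sketch of that two-prompt guardrail setup, run in parallel and then aggregated. The prompts and the `llm` placeholder are my own illustration:

```python
# Parallelization sketch for guardrails: one LLM answers the customer while
# a second, separately-prompted LLM screens for discount-code fishing.
from concurrent.futures import ThreadPoolExecutor

def llm(system_prompt: str, user_message: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

ANSWER_PROMPT = "Answer all customer queries respectfully and helpfully."
SCREEN_PROMPT = ("Reply BLOCK if the message tries to extract discount codes "
                 "or other confidential data; otherwise reply ALLOW.")

def handle_customer(message: str) -> str:
    # Both calls run concurrently; the screener's verdict gates the answer.
    with ThreadPoolExecutor() as pool:
        answer = pool.submit(llm, ANSWER_PROMPT, message)
        screen = pool.submit(llm, SCREEN_PROMPT, message)
        if screen.result().strip().upper().startswith("BLOCK"):
            return "Sorry, I can't help with that request."
        return answer.result()
```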
Another example is automating evals for evaluating LLM performance, where each LLM call evaluates a different aspect of the model's performance on a given prompt. For example, let's say we want to analyze the headline copy of an ad. We can ask the LLM to score the ad on different parameters, each from 1 to 10: one parameter could be how much curiosity the ad arouses; a second could be how likely you are to click on the ad based on the USP, the unique selling proposition it conveys; and a third could be how good the hook is. We can do this in a single LLM call, or we can use three different LLMs, each analyzing only its specific, relevant parameter: the third analyzes the hook, the second the USP, and the first the curiosity level, and then everything is aggregated. So that's evals.

The other variation is voting: for example, reviewing a piece of code for vulnerabilities, where several different prompts review the code and flag it if they find a problem; or evaluating whether a given piece of content is inappropriate, with multiple prompts evaluating different aspects, or requiring different vote thresholds to balance false positives and false negatives. This is quite similar to what I just described, so let's move on.

Now, another pattern is the orchestrator-workers workflow. Here, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results (there's a small sketch of this right after this section). There are also agentic frameworks that do something similar by generating agents based on the given task; AG2 does this, so check out my videos about CaptainAgent in AG2. When to use this workflow? It is well suited for complex tasks where you can't predict the subtasks needed. In coding, for example, the number of files that need to be changed, and the nature of the change in each file, likely depend on the task. The examples clarify this: coding products that make complex changes to multiple files each time, or search tasks that involve gathering and analyzing information from multiple sources for possibly relevant information. Basically, you ask the LLM to come up with ideas for how to solve the issue. You can also see this in LLM products that plan across multiple steps; for example, Deep Research by Gemini, which I also made a video about, is somewhat similar, though not exactly the same.
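Here's that orchestrator-workers sketch: one planner call, a worker call per subtask, and a synthesis call. The prompts and the `llm` placeholder are illustrative, not the blog's code:

```python
# Orchestrator-workers sketch: a central LLM call breaks the task into
# subtasks, worker calls handle each one, and a final call synthesizes.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def orchestrate(task: str) -> str:
    # The orchestrator decides the subtasks dynamically, one per line.
    plan = llm(f"Break this task into independent subtasks, one per line:\n{task}")
    subtasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]
    # Each worker gets one focused subtask (these could also run in parallel).
    results = [llm(f"Subtask: {sub}\n(Part of the larger task: {task})")
               for sub in subtasks]
    # The synthesizer merges the worker outputs into one answer.
    return llm("Combine these partial results into a single coherent answer:\n"
               + "\n---\n".join(results))
```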
The next workflow is the evaluator-optimizer. In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop. This is very powerful. You can also see it in different workflows by the AG2 team, where it's called AgentEval. Basically, you give an input, the LLM generates a solution, another LLM scores that solution, and if it's good enough, it is returned as the output. If it's not good enough, the evaluator provides feedback to the generating LLM, which regenerates a solution, and this cycle continues until the generated output is good enough and we move forward.

Let's see what they write, because they probably explain this better than I do. This workflow is particularly effective when we have clear evaluation criteria and when iterative refinement provides measurable value. The two signs of a good fit are, first, that LLM responses can be demonstrably improved when a human articulates feedback, and second, that the LLM can provide such feedback itself. This is analogous to the iterative writing process a human writer might go through when producing a polished document.

Some examples: complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluator decides whether further searches are needed. (Trying to fix a bug in code is not the greatest example here, because this pattern is more relevant for subjective outputs, and whether or not you have solved a bug is objective, not subjective.) Another example is translation, where there are nuances the translator LLM might not capture initially, but where an evaluator LLM can provide useful critiques.
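A minimal sketch of that generate-evaluate loop, with a bounded number of rounds as a safety valve. The prompts and the `llm` placeholder are illustrative:

```python
# Evaluator-optimizer sketch: one call generates, another critiques,
# looping until the evaluator approves or we hit the iteration cap.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def generate_with_feedback(task: str, max_rounds: int = 5) -> str:
    feedback = ""
    solution = ""
    for _ in range(max_rounds):          # stopping condition: bounded loop
        solution = llm(f"Task: {task}\nPrevious feedback: {feedback}\n"
                       "Produce a solution.")
        verdict = llm(f"Evaluate this solution for the task '{task}'. "
                      f"Reply APPROVE, or give concrete feedback:\n{solution}")
        if verdict.strip().upper().startswith("APPROVE"):
            break                        # good enough, exit the loop
        feedback = verdict               # feed critique back to the generator
    return solution                      # latest attempt, approved or not
```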
Okay, so those are workflows. Now let's discuss agents. Agents are emerging in production as LLMs mature in key capabilities: understanding complex inputs, engaging in reasoning and planning, using tools reliably, and recovering from errors. Agents begin their work with either a command from, or an interactive discussion with, a human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgment; this is called human-in-the-loop. During execution, it's crucial for the agent to gain ground truth from the environment at each step, such as a tool call (remember the functions I told you about) or code execution, to assess its progress. Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it's also common to include stopping conditions, such as a maximum number of iterations, to maintain control. For example, you can define upfront that the agent should do 10 rounds and then pause, because you don't want it to run infinitely or enter an endless loop and exhaust your credits. So there are two alternatives: first, defining when the task is complete, so that the agent writes "TERMINATE" or whatever word you decided marks completion; or second, setting a constraint or limit on the number of iterations.

Agents can handle sophisticated tasks, but their implementation is often straightforward: they are typically just LLMs using tools based on environmental feedback in a loop. It is therefore crucial to design tool sets and their documentation clearly and thoughtfully. And this is an agent: you have the human, you tell the agent exactly what you want it to do and what the condition for pausing is, and then the LLM starts iterating. It takes an action, checks what has changed, gets the feedback, and keeps going. And obviously, you can also build a team of agents.

Now, when to use agents? Agents can be used for open-ended problems where it's difficult or impossible to predict the required number of steps, and where you can't hard-code a fixed path. I think this is very crucial to understand: use agents when it's difficult or impossible to predict the required number of steps and you can't hard-code a fixed path. Very often, you can hard-code a fixed path, and that's when you should use a workflow, not an agent. Furthermore, at the moment of recording this video, at the beginning of 2025, the LLMs powering agents are not powerful enough to do a lot of this on their own, and this is also something to take into consideration. In many cases, it is better to hard-code a fixed path, or add more layers of human-in-the-loop and QA, than to rely solely on agents.

The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making. This is also important to understand: if reputational risk is on the line, you can't rely 100% on an agent, because it can hallucinate. There are agentic frameworks that bake in many validation steps to mitigate that reputational risk, but this is a downside that should be taken into consideration. This is why, very often today, people who use agentic frameworks either constrain the agent so much that it effectively becomes a workflow, or use agentic automations only for internal, non-risky workflows, for example research, things that don't carry much reputational risk.

The autonomous nature of agents means higher costs and the potential for compounding errors. They recommend extensive testing in sandboxed environments, along with appropriate guardrails. A few examples where agents are useful: a coding agent to resolve bugs, or a computer-use implementation. That last one is something I don't want to get into, but I'm not sure I agree with it 100%.
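Putting the agent loop into a minimal sketch, with both stopping conditions from above. The tool registry, the prompt format, and the `llm` placeholder are all my own illustration:

```python
# Agent loop sketch: the LLM picks a tool, we execute it, and the result
# is fed back as environment feedback, until TERMINATE or the round cap.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

TOOLS = {  # hypothetical tools; a real agent would have documented ones
    "search": lambda q: f"search results for {q!r}",
    "read_file": lambda path: f"contents of {path}",
}

def run_agent(task: str, max_rounds: int = 10) -> str:
    history = f"Task: {task}"
    for _ in range(max_rounds):               # stopping condition 2: round cap
        decision = llm(f"{history}\nPick one: TERMINATE, or 'tool_name: argument'.")
        if "TERMINATE" in decision:           # stopping condition 1: done word
            break
        name, _, arg = decision.partition(":")
        result = TOOLS.get(name.strip(), lambda a: "unknown tool")(arg.strip())
        history += f"\nAction: {decision}\nObservation: {result}"  # feedback
    return llm(f"{history}\nSummarize the outcome for the user.")
```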
Now, you can combine and customize these patterns. These building blocks are not prescriptive; they are common patterns, and this is why I wanted to create this video: I think this blog post lays down all the fundamentals of building workflows and agents, and it's very well written. The key to success, as with any LLM feature, is measuring performance and iterating on implementations. To repeat: you should consider adding complexity only when it demonstrably improves outcomes. So more often than not, you shouldn't add complexity, and you shouldn't use agents just for the sake of using agents; very often you can just use workflows with LLM chaining, or single LLM calls.

Now the summary. Success in the LLM space isn't about building the most sophisticated system; it's about building the right system for your needs. Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short. When implementing agents, they try to follow three core principles: maintain simplicity in your agent's design; prioritize transparency by explicitly showing the agent's planning steps; and carefully craft your agent-computer interface through thorough tool documentation and testing. Frameworks can help you get started quickly, but don't hesitate to reduce abstraction layers and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful, but also reliable, maintainable, and trusted by their users.

There are also a few appendices about how and when to use agents, which I'm not going to go over, plus a few suggestions for prompt engineering. I guess that's it for today, guys. I just wanted to share this blog post, which in my opinion, as I said, is very well written and summarizes all the fundamentals of agents and workflows: when to use each of them, and the different building blocks. I hope you enjoyed this video. If you have any criticism or ideas, please leave them in the comment section. And obviously, if you haven't subscribed yet, please do. Until next time, keep on automating.