Transcript for:
AI Workflow Patterns Overview

Hey guys, today I'm going to share with you two very powerful design patterns that you can implement in your AI-based workflows to significantly improve their accuracy: the evaluator-optimizer pattern and human in the loop. I've actually already covered both of these separately in previous videos, but I decided to cover them together because of how similar they are in the way they work, with just one element being different, and that single difference makes them applicable in different kinds of scenarios, which is one of the things we're going to touch on today. The other reason is just how powerful these patterns can be, especially if you're building production-grade automations where reliability is key; these patterns can be the difference between your automations working and not working, so I'd definitely recommend you master them and understand them properly.

Let's start with the evaluator-optimizer pattern. We have a diagram here and a description: in the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop, and that's exactly what we see in this diagram. We have an input being passed to an LLM, that LLM generates an output based on the input, and the output gets passed to an evaluator LLM, which decides whether or not it actually aligns with the given criteria. Based on that, it's either going to approve it or reject it. When it rejects it, it provides feedback along with the rejection, which gets passed back to the initial LLM; that LLM reflects on the feedback, compares it to its current output, and generates a new output, which again gets passed to the evaluator LLM. This loop continues until the evaluator LLM is happy with the output, at which point it approves it and the workflow produces the final output.

When we look at when to use this workflow, it says this workflow is particularly effective when we have clear evaluation criteria and when iterative refinement provides measurable value. The two signs of good fit are, first, that LLM responses can be demonstrably improved when a human articulates their feedback, and second, that the LLM can provide such feedback. This is analogous to the iterative writing process a human writer might go through when producing a polished document.

The example I have prepared for us meets these requirements perfectly: we have clear evaluation criteria that the evaluator agent can refer to when deciding whether, in our case, the output of the customer support agent satisfies them. What we have here is essentially an email autoresponder workflow that takes in customer inquiries; those inquiries get passed to a customer support agent, which generates a response to those emails. We are basically optimizing this process using the evaluator-optimizer pattern.
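Before we run the demo, here's a minimal sketch of the loop that diagram describes, written in TypeScript purely for illustration; the function shapes, names, and the round cap are my own assumptions, not nodes in the n8n workflow itself.

```typescript
// A minimal sketch of the evaluator-optimizer loop, not the actual n8n nodes.
// The two LLM calls are passed in as plain functions so the sketch stays self-contained;
// in this workflow they correspond to the customer support agent and the evaluator agent.

interface Evaluation {
  pass: boolean;      // true = approved, the workflow can proceed
  feedback?: string;  // only present when the draft is rejected
}

type Generate = (input: string, feedback?: string, previousDraft?: string) => Promise<string>;
type Evaluate = (draft: string) => Promise<Evaluation>;

async function evaluatorOptimizer(
  input: string,
  generate: Generate,
  evaluate: Evaluate,
  maxRounds = 5,       // safety cap so a never-satisfied evaluator can't loop forever
): Promise<string> {
  let draft = await generate(input);                        // first attempt, no feedback yet
  for (let round = 0; round < maxRounds; round++) {
    const result = await evaluate(draft);
    if (result.pass) return draft;                          // approved: this is the final output
    draft = await generate(input, result.feedback, draft);  // rejected: revise using the feedback
  }
  return draft;
}
```

The detail worth noticing is that every call after the first receives both the previous draft and the evaluator's feedback, which is what the expression trick we'll look at later reproduces inside the agent's prompt.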
So without further ado, let's actually execute this workflow. I already have an inquiry email sent to the email account connected to this trigger, so let's see exactly how this pattern helps with the whole process. I'm going to go ahead and click on test workflow so we can see this in action. The email classifier classified it as a customer inquiry, so what we see here is that the customer support agent first received this email and then generated an output.

That output got passed to the evaluator, and initially the evaluator actually rejected it and provided feedback. That feedback got passed back to the customer support agent, which reflected on it, compared it to the current email, and generated a new output that aligned with the feedback. This output again got passed to the evaluator agent, and this time the evaluator was happy, so it approved it and allowed the workflow to proceed, which finally ended with us sending an email. When we look at the email we just sent, I'm going to expand this, and we can see that this is the final email.

Now let's take a look at what exactly happened under the hood. We got the email from the Gmail trigger and passed it to the email classifier LLM, which classified the email and decided it is a customer inquiry. That email got passed to the customer support agent, and when we look inside the customer support agent we can see that it generated two outputs: the first one here, which got rejected, and the second one here, which was finally approved. Now let's take a look at the prompt. We have JSON text here, which is the content of the email, and this is the system message. The system message has two sections: the first is just a simple instruction telling it what to do, "you're a customer support specialist at TechSpark Solutions", and so on. The second section is where something really cool happens, and we'll cover that in just a bit, but for now let's continue with the workflow.

So the customer support agent's output got passed to the evaluator agent, and again we have two outputs here. The first one was obviously a rejection, so it said pass equals false, and the feedback was: the email is clear and professional but lacks a proper sign-off as John Doe, and the placeholder "your name" should be replaced with John Doe; also, a friendlier tone could enhance the customer experience, though it is generally professional as is. When we look at this agent's prompt, we can see that we passed it some evaluation criteria, telling it to make sure the email gets signed off as John Doe and that the tone is professional and friendly, and we also emphasize clarity and completeness. Then we say the text should not include the subject, which was actually a problem in some of the outputs generated by the customer support agent, where it would include the subject line in the email content itself. Then we tell it to output in a structured manner: if it's a pass, just output an object with a field called pass set to true; if it's a fail, include the pass field set to false, but this time also include a feedback field explaining what the problem is. We also tell it "no extra keys or text outside the JSON structure", just to keep things airtight.

So in the first round it got rejected, and in the second round it re-analyzed the email, saw that it did include John Doe as the sign-off, was happy with it, and approved the email, which allowed the workflow to proceed, ending again with us sending the email.
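To make those two branches concrete, here is roughly what the evaluator's structured output looks like in each case. The pass and feedback field names come from the prompt described above; the object contents just paraphrase the rejection we saw in the demo.

```typescript
// Illustrative examples of the evaluator agent's structured JSON output.
// Only the two field names (pass, feedback) are prescribed by the prompt;
// the feedback text below paraphrases the rejection from the demo run.

const rejected = {
  pass: false,
  feedback:
    "The email is clear and professional but lacks a proper sign-off as John Doe; " +
    "replace the placeholder 'your name' with 'John Doe' and consider a slightly friendlier tone.",
};

const approved = {
  pass: true, // no feedback key and no extra text outside the JSON, so the workflow proceeds
};
```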
So what happened in the case where it failed is that the workflow went through this route here, where we have this feedback node. What we do in this node is set some fields to pass back to the customer support agent. The first one is the current email, which is the email the customer support agent generated; the second one is the feedback provided by our evaluator agent; and finally we have this text field, which again refers to the email content we got from the Gmail trigger. This time, this feedback node is what enters the customer support agent, instead of entering from the earlier part of the workflow, and this is where the cool thing I mentioned happens.

In this case, the system message has another section where we take advantage of n8n's expression field. What we have here is called a ternary operator: we're saying, hey, whenever the feedback field on the incoming JSON is populated, show this part of the condition. The expression is split into two parts, one for when the condition is true and one for when it is false. What happened initially, when we first entered the customer support agent from the main part of the workflow, is that we did not have any feedback yet; when the expression checked whether feedback was populated, it wasn't, so it just printed an empty string into the prompt. As far as the agent was concerned, it wasn't aware of any feedback or of any prepared email that we set as the current email. But now that we're entering the customer support agent from the feedback node, it is aware of the feedback, so the ternary operator returns true and this time it prints this section into the prompt. We can actually see the result of that on the right: we got the feedback ("the email is clear and professional...", the one we just saw), and we also gave it the prepared email, the current email it just generated, so that it can compare it with the feedback the evaluator agent provided and generate its new output. This way we don't even have to add any memory to the agent, because we're providing all the context it needs anyway.
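Concretely, that second section of the system message might look something like the snippet below. The exact field names ($json.feedback, $json.current_email) and the wording are assumptions based on the fields set in the feedback node, so treat this as a sketch rather than the literal prompt.

```
You are a customer support specialist at TechSpark Solutions. [...rest of the base instructions...]

{{ $json.feedback ? "Here is your previous draft:\n" + $json.current_email + "\n\nEvaluator feedback:\n" + $json.feedback + "\n\nRevise the draft so it addresses this feedback." : "" }}
```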
And that's essentially how the evaluator-optimizer pattern works. I'm sure you can already see how powerful it is and how useful it can be in many scenarios, especially in cases where the evaluation criteria are clear; those are the cases where the evaluator-optimizer pattern truly shines. You might be wondering why we didn't just include all of these criteria in the customer support agent's prompt in the first place, and that is a very valid question. The reason is that in real applications the prompts of AI agents can be really long, and, at least where we stand with this technology today, the models might miss nitty-gritty details. So it is very useful to have specialized agents that are focused on a single thing; in this case the evaluator agent is focused only on validating certain criteria. Of course, we could have also included these criteria in the customer support agent, which, by the way, I purposefully omitted for the sign-off as John Doe just so we could see the evaluator agent catch it and ask for a correction. In a real scenario you would include all of these criteria in the customer support agent as well, and also in the evaluator agent, just to keep things airtight and increase accuracy in general. And with that we have the evaluator-optimizer pattern covered, so now we can continue with human in the loop.

Hey guys, just a quick interjection. As always, I'm going to be uploading this template to my free school community, which has around 3.4K members. Once you're in, you'll just have to head to the YouTube videos and resources section and look for the post associated with this video. Obviously I haven't posted it yet since I haven't uploaded this video, but once you get to that post you'll just have to look for the JSON file attached to it, which is where the workflow will be; in this example it's a lead generation one. Once you click on that, you'll just have to click on download and then head over to your canvas. Once you're there, click on the three-dotted button, look for "import from file", select the file you just downloaded, and you'll be good to go. One more thing: I actually just launched my premium community yesterday, and as you can see we're currently at just 13 members since we just started. There's a special discount for the first 20 members as a thank-you to early joiners, so you might want to consider joining. Anyway, I've talked too much; with all of that said, I hope to see you inside the premium community, and of course the free one, but now let's get back to the video.

To be honest, the two patterns are so similar that I feel like we could just call the first one "AI in the loop" instead of evaluator-optimizer. In human in the loop, instead of having AI evaluate the output, we have a literal human evaluate the output and provide feedback, and that's exactly where the difference lies; everything else is pretty much the same. But because we have a human rather than an AI in this case, and because currently humans are more reliable than AI, human in the loop can be used for more critical tasks that require real decision making, for example purchasing something online or processing a refund request. Let's say you sell products online and you have a user-facing chatbot that can also process refund requests: just before it approves a refund, it could delegate that request to a human along with all the context of the conversation, and the human agent can make the final decision on whether to actually finalize the request. Those are the scenarios where you would use human in the loop and where evaluator-optimizer might not really be enough.

All right, so what we have here this time is a report generator workflow; you can think of it as a mini deep research. The way it works is that we provide it with a topic, and based on that topic it first generates key points by searching the internet. It then passes those key points to the sections generator, which generates the sections for us, and those sections are then passed to us for our feedback, just like what we had with the evaluator-optimizer. This time the sections generator takes the place of the customer support agent, and the human feedback, meaning us, takes the place of the evaluator.
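Since the only element that changes is who does the evaluating, the earlier loop sketch carries over almost unchanged. In the TypeScript sketch below, askHuman is a hypothetical stand-in for n8n's send-and-wait step (the Telegram message with the feedback button), not a real API.

```typescript
// Human-in-the-loop version of the same loop: only the evaluator is swapped out.
// askHuman is a hypothetical stand-in for the "send message and wait for response" step.

interface HumanDecision {
  approved: boolean;
  feedback?: string; // the free-text reply when the human requests changes
}

type WriteSections = (topic: string, feedback?: string, previousSections?: string) => Promise<string>;
type AskHuman = (sections: string) => Promise<HumanDecision>;

async function humanInTheLoop(
  topic: string,
  writeSections: WriteSections,
  askHuman: AskHuman,
): Promise<string> {
  let sections = await writeSections(topic);   // first draft of the report sections
  while (true) {
    const decision = await askHuman(sections); // the workflow pauses until the human replies
    if (decision.approved) return sections;    // approved: go on to generate the full report
    sections = await writeSections(topic, decision.feedback, sections); // revise and ask again
  }
}
```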
I'm just going to execute this. I'm going to go to the Telegram app and enter "obesity" as the topic; let me click on test workflow first and then hit send. All right, it's working: the key points generator is generating the key points for us, and those key points got passed to the sections generator, which is now generating the sections. Those sections are then sent to us on Telegram; what we have to do is click on "provide your feedback", and I'm going to click on open. When we do that, we can see all the sections it has generated. What I'm going to do now, just for the sake of providing feedback (sorry about the thunder, by the way, it's raining, but it does make filming the video more cozy), is ask it to please remove section three and instead add a new section called "impact of obesity on confidence". I'm going to click on submit, and this gets passed back to the sections generator, which is now analyzing our feedback and generating new sections accordingly, which it already did. Now we got the new sections back; let's click on open again and see what we have. We can see that it did indeed replace section three with "impact of obesity on confidence", so it did exactly what we wanted. I'm going to say "thank you very much" and "approved" to indicate that we approve of it. When we do that, it allows the workflow to proceed, this time passing everything to the report generator agent. The report generator agent is now generating the final report for us, which is then converted into a markdown file and sent to us on Telegram. And there we go, we have it; I'm just going to click on open. This is what the report looks like: we got the introduction, section one, and all the other sections, and when we look at section three, it is indeed the section we told it to include in the report.

So you can see how, using this pattern, we have more granular control over the flow of the automation, and it is actually super powerful when you think about it. The workflow we have here is just for example purposes, so we don't really see its full implications. Take a real deep research example, where you might do up to 50 API calls; I actually have one built, and I'm going to add the link to the video up here, where I replicated LangChain's deep research in n8n, and in that version it did around 50 API calls in total, depending on your configuration and the number of sections. Imagine if, in a case like that, we didn't use human in the loop: we would only be able to see the result once the whole report was generated, after all those 50 API calls, and you can imagine how much cost that could incur. Having this pattern there allows you, before the execution or research phase actually starts, to go back and forth with the AI and decide exactly what you want it to research for you. So again, in this case it gives us control over making the final decision on how we want the automation to progress. This is also an example where evaluator-optimizer is just not going to be as good as human in the loop, since the outcome depends on the preference of the person running the automation; unless the AI can read our minds and know exactly what we want to learn, we are still better off using human in the loop in these cases. But in cases where the criteria are clear and definitive, using evaluator-optimizer is going to be much more efficient.
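One practical detail before the walkthrough below: the workflow has to turn the human's free-text reply into a routing decision, which is the job of the feedback classifier we'll look at in a moment. Here's a hedged sketch of that branch; the field and category names are chosen for illustration, not taken from n8n.

```typescript
// Hedged sketch of the branch after the human replies. classify stands in for the
// feedback classifier LLM call; the field names mirror the feedback node (the topic,
// the current sections, and the reply) but are illustrative, not n8n's actual names.

type Route = "approved" | "modification_requested";

interface FeedbackFields {
  output: string;          // here, the original topic from the Telegram trigger
  currentSections: string; // the sections the human just reviewed
  feedback: string;        // the human's free-text reply
}

async function routeReply(
  fields: FeedbackFields,
  classify: (reply: string) => Promise<Route>,
): Promise<"generate_report" | "regenerate_sections"> {
  const route = await classify(fields.feedback);
  // "approved" lets the workflow proceed to the report generator; anything else loops
  // the three fields back into the sections generator, exactly like the email example.
  return route === "approved" ? "generate_report" : "regenerate_sections";
}
```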
Now, before we end this video, I just want to quickly go over how this is wired up, even though it is essentially the same thing as before. We have the sections generator, which generates the sections, and its output gets passed to this human feedback node, which is basically the human-in-the-loop node provided by n8n. What it does is send a message and wait for a response; in this case you can see that we're passing in the sections as the message, "obesity, introduction, section one, section two", and so on. Then we set the response type to free text, which is where we were typing in our feedback just now, and we also added a message button label, "provide your feedback". When we go to Telegram, that's exactly what we see: a button that says "provide your feedback". Our feedback then gets passed to the feedback classifier, which analyzes our feedback and, based on the categories we've set up here, decides whether it indicates approval or a request for modification. In our case we first rejected it, which routed the workflow through this path here, where we have this feedback field. In this feedback field we have exactly the same thing as in the evaluator-optimizer example: the output, the current sections, and the feedback. The only difference is the output, which here is the topic we provided, whereas in the evaluator-optimizer example we built, the output, or in that case the text field, referred to the content of the email. This feedback gets passed to the sections generator in exactly the same format: we pass in the feedback, we have the ternary operator here, and we have the feedback plus the current sections it generated. It then generates the sections again, passes them to this node, and waits for our response. This time we were happy with the generated sections, so we approved them, the feedback classifier determined that, and the workflow proceeded: it generated the report, converted it into markdown, and finally sent it to us on Telegram.

So that's pretty much it. With that, we have both the evaluator-optimizer pattern and human in the loop covered. I hope I was able to explain them clearly and show you how these two patterns really are super similar in the way they work, with the only difference being that one of them uses AI to evaluate the output while the other uses a human, and how that single difference makes them applicable in different scenarios. Human in the loop is more effective for tasks that are critical or depend on user preference, while evaluator-optimizer is more efficient in cases where the criteria and validation conditions are clear and well defined; in those cases, having evaluator-optimizer is enough, or even more efficient than implementing human in the loop. So it's really efficiency versus reliability. And yeah, that was the topic of today's video. Take care.