Hello legends. In this video, I'm going to be going over the GPT-4.1 prompting guide that was created by OpenAI. This guide covers the key differences between the 4.1 model and its predecessors like 4o, and explains that because of the differences in how it was built, there are actually differences in how you should prompt it as well. This article is fantastic, lots of really good information in here, but it is very long, very text-heavy, and very dense. So in this video, to give you a highlight reel of what we're going to be learning, I've taken that article and highlighted the interesting stuff, the key takeaways and key learnings, and I've annotated it as well. We're going to go through probably the most interesting stuff, the stuff I think you should actually know, although I do recommend you go and read the article yourself after this video too. I've also got an Excalidraw board of what's discussed in the article. If I just zoom out, we've got a bunch of different stuff here that we're going to be covering. I'm a very visual person, so for me this is much easier to understand than reading a couple of paragraphs of text, and I think it might be good for some people to just see some of the differences in the prompting visually. I also pull in extracts from the article and break them down into what they mean and how you should use them; I try to focus on little nuggets of information. And I use n8n a lot for my videos, so if you're watching my channel, you might use n8n or you might do custom coding; either way, I want to explain a bit more of the technical information about how you should set up your prompting.
And then at the very end of the video, we dig into what the actual recommended structure for a prompt is, how it should look. This is recommended in the article, and I've got a breakdown over here of each section, what it means, and some examples to really help you build your prompts. In addition to that, we've got a couple of different sources that we're going to be looking at. Over here in the OpenAI playground, we're looking at this little feature which helps you generate prompts for your AI models, which is pretty cool; it's an OpenAI feature. I'm also going to briefly go through some documentation from OpenAI. One cool thing, for example in this explanation here (if I zoomed out you would see it), is the use of XML tags within the context of a prompt. This is also referenced in the 4.1 prompting article: XML tags actually improve the output accuracy of the model when you use them. So I'm going to show you this cool little feature just to start broadening your understanding, or honestly just to make you a bit better at prompting. I then also go very quickly into Anthropic's public resource on how to learn prompt engineering. It's made for the Claude models, but really it's applicable to quite a few different models, and they also use XML tags. So XML tags are something I'm going to highlight in this video as pretty important to learn to use, and I've been using them more and more myself. But yeah, I really just want to go over this article, show you what I think is most important, and bring you up to speed with me. At this stage, if you're watching this video, I'd appreciate it if you could drop a like on the video.
That would help boost the engagement, push this video out to more people, and help me grow my channel and reach more people, which would be pretty cool. And at any stage, if you have more information about this, or if I say anything that's wrong, drop a comment below as well. I do want to learn, I do want to get better, and I'm happy to learn from anyone, no matter who it comes from; I'm pretty sure everyone watching this would want to know more about it, too. So where I want to start is by speaking about why I think 4.1 is important for us to pay attention to. The 4.1 model, from my understanding of this article, is firstly built differently to the 4o model, and specifically what we're looking at here is that the 4.1 model is better at following instructions. The connection that I'm making is: okay, the 4.1 model is getting closer and closer to being very good at agentic workflows. We've all been building agentic workflows now, and an agentic workflow, if you watch my channel or if you're building custom-coded stuff, is built around the concept of an AI agent. You have an AI agent that has a brain, like a 4.1 model, has a set of instructions, like a system message, and can do tool calling. So in the context of a workflow, an agentic workflow uses some kind of LLM system like this. We've all been doing that.
But what I think is happening with the 4.1 model from OpenAI, with the 3.7 model from Anthropic, with the 2.5 model from Google Gemini, is that all these models are stepping towards more and more of this next-level model. And to support what I'm saying here, to give you my own understanding of how I'm trying to deliver this message, I want to jump across to a Y Combinator article. Y Combinator, or YC, is a startup accelerator that invests in tech companies, and it has a very high success rate of picking out really good, really popular tech businesses. Over here you can see Humanloop, which is building reliable AI products, and this is, yeah, an AI product. Then we have Truin, which is building digital staff. Here's another one, Shephardd, which is AI-enabled self-study. So this list is specifically looking at AI startups, but YC actually has this so-called wish list of businesses that they're going to be investing in in 2025, and they're saying that they're putting all their chips in the basket of AI agents and AI employees that they can sell to companies. You have full-stack disruption: AI software employees, AI agents and personal assistants, AI employees for healthcare, voice AI, and a bunch of different AI staff members. So coming back into here: if one of the biggest firms that invests in and tries to pick out the best AI companies is looking at AI agents, well, it's in OpenAI's best interest to start building models that bring us closer to the space of these autonomous systems, where an AI agent actually has to replicate what a human does.
Because at the end of the day, an AI agent is a replacement for a human. A human has a brain; if you work somewhere, you've got a role and responsibilities, and a set of metrics, KPIs, against your own performance. So an AI agent, this system itself, is trying to replace what a human does, and replacing human thought, the human ability to do complex reasoning and complex problem solving, is not very easy. That's why I think this 4.1 model is very interesting. I don't think it's the be-all and end-all model, but I think in general with 4.1 we're taking more and more steps towards this specific area, and this area is all trending towards AI staff members: right now it might be AI staff, then it'll be AI teams, and then it'll be AI businesses. So over here we've got this reference to a benchmark, SWE-bench, and apparently the 4.1 model solves 55% of problems. SWE-bench is a benchmark that evaluates an AI model's ability to solve real-world software issues, and software issues are very difficult problems to solve. But if you're building something like AI staff members, going back to this here, like AI assistants or AI voice assistants to manage your email inbox, the problems these AI systems have to solve are not as complex as software problems. So what I'm saying is that if 4.1 is really good at solving really complex software problems, it's going to be very good at solving these day-to-day problems as well: an email agent, a healthcare agent, a customer support agent, an email assistant, whatever else. That's why I think it's important for us to pay attention to the 4.1 model.
And yeah, it's important for us in general to just understand that 4.1 is a little bit different to 4o. Even if you've got an existing automation or workflow that uses the 4o model, you can't just swap the tag that says which model you want to use, set it to 4.1, and walk away; to really unlock the power of the 4.1 model, you have to prompt it in a slightly different way. So now we're going to get into the actual crux of the article. I hope that we're now on the same page about, or at least you can understand, why I think 4.1 is pretty interesting and pretty important to pay attention to. But yes, we have over here a line that really underlines how good the model is at following instructions: if the model behavior is different from what you expect, a single sentence firmly and unequivocally clarifying your desired behavior will most likely steer the model into the right behavior. So let's say you have a prompt with 10 steps, a bunch of different information about how to take your problem to a solution. If you have a prompt over here, and on the left-hand side you've got a problem, and you've got to take the problem to a solution via the prompt, and it's not really giving you the desired output, what 4.1 can do, because it's that good at following instructions, is let a single added sentence shoot you into the right direction. You don't have to restructure your entire prompt, which sometimes you would have to do previously with the 4o model: fully change how the prompt is built, fully change your initial steps and how it tries to solve the problem.
A single sentence can fix it. I don't really have an example for this, so it's kind of hard to show just how useful it is, but that's the point the article underlines: it's very good at following instructions. So I'm just going to jump across to sections 1.1 and 1.2 to speak a bit more about this, which should push the point across more. Okay, so optimal context size: they observe very good performance on needle-in-a-haystack evaluations up to the full 1 million token context. Back in Excalidraw, I've got a highlight section here of all the cool things about the 4.1 model. Over here we have: the 4.1 model has a 1 million token context window. For perspective, there's a book called Crime and Punishment by Fyodor Dostoevsky, which is about 500 pages. I don't know if you've heard of it, but imagine a 500-page book; you can fit just over three of those books into the 1 million token context, which means you have roughly 1,500 pages of context within this 4.1 model when you're sending a message in. And the needle in the haystack: imagine you want to find five tokens within a sea of 1 million tokens; it's very good at finding those five tokens. So let's say I give you, as a human, 1,500 pages, which might have tens of thousands of sentences, and I say, hey, find this one sentence amidst these 1,500 pages. And imagine you could do that within a few seconds: okay, here is that specific sentence you're looking for. That is truly impressive. This actually feeds into later on, when we speak about how to build your prompt with this in mind; we'll be looking at different features of how to set up your prompt to work with these massive contexts. So that was 1.1 and 1.2; this is the next thing over here. Okay.
So, note that if your existing prompts include these techniques, they could trip 4.1 up. What this is saying is that unlike previous models, where you would have to use caps lock or do really sneaky things like bribe it ("I'll pay you five bucks if you really think deeply and solve this"), which are things people were actually doing with previous models, you don't have to do that with this model. It's so good at following instructions that if you simply take the 4.1 model and plug it into a prompt that was written for a previous model that was not really good at following instructions, you might actually screw the system up, because it could pay attention way too strictly to those old instructions. So that's a very interesting point as well. It's literally saying here that for this needle-in-a-haystack situation, across the 1 million tokens, the model can understand and observe every individual section of tokens as an independent contribution to the prompt, which is massive. And then it's also saying: be careful not to over-prompt it using these old-school techniques, because they can make your system fail. So yeah, I think that's really cool. Now, going back up to the top, the next section is system prompt reminders. Because we have that massive context, that 1 million tokens, it just makes sense that you need to be tapping it on the shoulder at certain times to say: I've given you a million tokens of context, and here's what I need done on it, to keep it calibrated towards the desired output. This section here I've actually copied across to Excalidraw. It says there are three techniques, persistence, tool calling, and planning, that OpenAI used when they were testing this model.
So persistence, tool calling, and planning, which together achieved an improvement of close to 20% against that benchmark score we spoke about before: 20% from three simple techniques. Over in Excalidraw, let's shift across to the section we want to speak about. Here we have the three sections. Now, for this section, think about it in the context of the 1 million tokens, with 1 million tokens being about 1,500 pages. What we're saying here is, for persistence: this ensures that the model understands it is entering a multi-message turn, and prevents it from prematurely yielding control back to the user. That first part goes in line with GPT-4.1 being really good at following instructions. Imagine you have a very basic prompt with a number of steps in order to take the problem across to a resolution. What these older models would do, if you didn't use caps lock and try to bribe them, is they might complete step one, step two, step three, and then kind of trick themselves, exit out of the sequence, and declare the problem solved, which would give you a failed result; it wouldn't actually be accurate, because you didn't complete steps four and five. In general, 4.1 is so good at following instructions that the article says it's much better now at following every single step it needs to. But to reinforce that persistence, so the model understands it is entering a multi-message turn, here is a sample sentence, an instruction you can put into your prompt to make sure it knows there are multiple steps here: "You are an agent. Please keep going until the user's query is completely resolved before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved."
That's a pretty straightforward instruction you would insert into the prompt to make sure, even more, that hey, there are multiple steps here, and while it might seem easy (step one, step two), it could be: first, analyze and review this entire problem you're receiving against the context that's provided to you; step three, make a tool call, and now re-review what you've just read and add the new context to the previous context you have. These can be very complex steps, so some of these systems can actually be very sophisticated. That's what the first one, persistence, does. After persistence, we then have tool calling. Tool calling, and you've probably experienced this already, is where hallucinating or guessing an answer can come from as well. This reminder encourages the model to make full use of its tools and reduces the likelihood of hallucinating, combined with the fact that it is good at following instructions. Again, whatever is in this code block is just reproduced here: "If you are not sure about file content or codebase structure..."; that wording exists because of the SWE benchmark for software problems, so you just adjust it to your own use case. If you're not sure about the content of the customer support ticket you've received, in order to answer the request, please use the tools that are provided. It's actually much better at using tools now, and later in this video I'm going to go over how you have to set up the tools, for example in n8n, to make sure you're marrying up the tool to the actual prompt. So yes, we have here: "...pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess and do not make up an answer." That's another thing you would add into your prompt to make sure you get the most successful output.
So a resolution, and not a failed resolution. And the final thing is planning. Planning is very interesting; I've again taken it out of this code block into here: "You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully." Once again, if you look at each step here, imagine these are all function calls: this is a function call, that's a function call, that's a function call. It's saying that before and after each function call, make sure you're doing it for a reason; you're not just doing it to follow the steps. Actually think a little bit more, kind of turning 4.1 into a bit of a reasoning model, which we also discuss later. Before you do the function call, consider: is this the right thing to do? After you do it, review it, and then analyze the entire context again. Re-review: here's the returned information; what does it mean? Do we actually go to step two, or do we go to step 2b? That's what it's speaking about here with planning, and that planning is not just at the very start of the prompt; it's planning, reviewing, and optimizing during the actual execution as it tries to take the problem to the solution. So all I've done here is look at the comparison with a barebones prompt and try to insert those three sections into it, to show you how they would sit. You have over here the core prompt with the instructions to plan and reflect. So you might put that plan-and-reflect sentence we had over here into this top section, for planning and reflecting, and tool calling; you might put this tool calling in too.
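To make those three reminders concrete, here is a minimal sketch of keeping them as reusable strings and prepending them to a system prompt. The wording is paraphrased from the guide's sample snippets, and `agentic_system_prompt` is my own helper name, not anything from the article; adapt each sentence to your own use case.

```python
# Sketch: the three "agentic reminders" (persistence, tool calling,
# planning) as reusable strings, prepended to the core instructions.
# Wording paraphrased from the 4.1 guide; helper name is invented.

PERSISTENCE = (
    "You are an agent. Please keep going until the user's query is "
    "completely resolved before ending your turn and yielding back to "
    "the user. Only terminate your turn when you are sure the problem "
    "is solved."
)
TOOL_CALLING = (
    "If you are not sure about content relevant to the user's request, "
    "use your tools to gather the relevant information. Do NOT guess "
    "or make up an answer."
)
PLANNING = (
    "You MUST plan extensively before each function call, and reflect "
    "extensively on the outcomes of previous function calls."
)

def agentic_system_prompt(core_instructions: str) -> str:
    """Prepend the three reminders to the core instructions."""
    return "\n\n".join([PERSISTENCE, TOOL_CALLING, PLANNING, core_instructions])

print(agentic_system_prompt("You are a customer support agent."))
```

In n8n this would simply be the first few lines you paste into the agent's system message field.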
Yeah, maybe you wouldn't put it at the bottom here, but this is just to show you: you'd put these two, planning-and-reflecting and tool calling, into the top section of the prompt. And now that you actually have the 1 million token context, this context section is where you would ingest things like the actual problem itself. You might ingest different context around the problem; maybe you have another API call that pulls in all the historical customer support tickets from this specific customer. All of your different data sources can be inserted into this prompt if you want, in addition to tool calling, which would then reach out into different data sources as well. But that's what you're doing here: you're literally building the start of the prompt to have some of these techniques built in. Then you have the actual context, and later in the article we also read about a final reminder, a final set of instructions after all the context. The model, and I think this applies to 4.1 and other models, will actually listen more to the final instructions at the very end when generating its output. In general, the longer the context, the better models pay attention to stuff at the start and stuff at the bottom, while the stuff in the middle sometimes gets lost in translation; the model is less likely to follow the instructions there. That space between the start and the end is where the 4.1 model shines, which is why you can put in a very large context. And like we spoke about, it has that needle-in-the-haystack ability, where it can literally pull out five tokens from a sea of 1 million tokens, which is truly impressive. But that's how you would start building your prompts. This is learning one: this is how you would structure your prompts. You can put a lot of context in here.
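Putting the placement advice together, here's a hedged sketch of that instructions-then-context-then-final-reminder sandwich. I'm using XML-style tags to delimit the context, which the guide (and Anthropic's docs) also recommend; the tag names and helper function are my own invention, not from the article.

```python
# Sketch of the prompt "sandwich": reminders and instructions at the
# top, the big context block in the middle (delimited with XML-style
# tags), and a final reminder repeated at the very end. Tag and helper
# names are illustrative only.

def sandwich_prompt(instructions: str, context_blocks: list, final_reminder: str) -> str:
    """Assemble top instructions, tagged context docs, and a final reminder."""
    context = "\n\n".join(
        f"<doc id='{i}'>\n{block}\n</doc>" for i, block in enumerate(context_blocks)
    )
    return (
        f"{instructions}\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"{final_reminder}"
    )

p = sandwich_prompt(
    instructions="You are a support agent. Plan before each tool call.",
    context_blocks=["Ticket: where is my order?", "History: 2 prior tickets."],
    final_reminder="Remember: think step by step, and do not guess.",
)
print(p)
```

The design point is simply that instructions live at the two positions the model attends to best, with the bulky context in between.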
You must have a final reminder at the end. And you're probably already doing some of this stuff now, but 4.1 is really good at following instructions, and all we're doing here is giving it more instructions and more structure and stability in how it completes its process. Now, going back to our guide: we've spoken about these system prompt reminders. Once again, 20%; that is literally this, this, and this. An extra one or two sentences throughout your prompt will give you a 20% better hit rate for your resolution over here. That's such a simple improvement for so much benefit. Scrolling down now to tool calls; this is another interesting one. Developers should name tools clearly to indicate their purpose, and also add clear, detailed descriptions in the description field of the tool. This is where we had this n8n visual over here. At a high level, if you're building an AI agent in n8n, the breakdown is: you have the AI agent, you have the system prompt, you've got tools attached to it, and you have the 4.1 model, for example, or any other model you're using, but in this context, 4.1. Within each tool, you have the tool name and tool description, and I've literally opened up the tool we had here into this picture, so you can see the tool name and tool description. This is super important, because when you're adding new tools in n8n (and I'm going to show the code version from the article as well), don't just leave the name as the default, whatever it is, because the name you're using here you must also use in the system instructions. That's what this next section is saying: in your system instructions, you must use the same name as what you have here, to get higher accuracy of actually calling the correct tool during execution.
And the tool description is also important. What you're looking at here is a visual interface, but it's applied at a code level; n8n is just drag-and-drop for code. It makes it easy for you, you don't have to code, but in the back end it is code, so this structure is actually built out the way OpenAI expects it. Changing this tool name and tool description does feed into the higher success of the 4.1 model. We just looked at how to do it with n8n, and if I go back to this guide, actually this one will be better. Where is our tool section? Chain of thought, long context, and then tools. Okay, it's this section here, tool calls. How it's done here is that you have your system prompt, so all this information is the system prompt, but then that section we saw for the HTTP call, that's this over here. So there's the tool name, the Python bash patch tool, and then the description, which in this case is a very complicated description; they've actually got a standalone section for it, but it's this entire bit here. It's a little bit different to n8n, because when you're doing custom code you can put in proper instructions with much more detail and action steps, whereas with n8n you kind of have guardrails up; it's a slightly more restricted environment. But look at the description they're using over here: exactly what the tool is, what it does, how to use it, all the different bits and pieces. Then, when you're calling this tool, you can have a really high success rate, and again, it's all about getting the right answer; it's all about that SWE benchmark we referenced before.
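To make the naming and description points concrete, here is a hedged sketch of what a clearly named, well-described tool definition can look like, in roughly the JSON shape OpenAI's function-calling APIs use. The `lookup_order` tool and its parameters are invented for illustration; the check at the end shows the consistency rule, that the exact tool name is reused in the system prompt.

```python
# Sketch of a tool definition with a clear name and a detailed
# description, plus a system prompt that references the SAME name.
# "lookup_order" and its schema are hypothetical examples.

lookup_order_tool = {
    "type": "function",
    "name": "lookup_order",
    "description": (
        "Look up a customer's order by its order ID. Use this whenever "
        "the user asks about shipping status, delivery dates, or order "
        "contents. Returns order status, items, and tracking number."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order ID, e.g. 'ORD-1234'.",
            },
        },
        "required": ["order_id"],
    },
}

SYSTEM_PROMPT = (
    "You are an order-tracking assistant. For any order tracking "
    "request, call the lookup_order tool before answering. Do not "
    "guess tracking details."
)

# Consistency check: the name referenced in the system prompt must
# match the tool's registered name exactly.
assert lookup_order_tool["name"] in SYSTEM_PROMPT
```

In n8n, the same rule applies: whatever you type in the tool's name field is the string you should use when referring to the tool in your system message.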
But for the agents we're building, these AI systems or AI agents, the stuff Y Combinator is looking to really invest in, the people building them are definitely going into the prompts and making sure the prompt is accurate for the model; they're doing lots of testing, which we're going to speak about later in this video as well; and they're making sure the tool calls are using the appropriate configuration, the appropriate naming conventions, the appropriate descriptions. If you want to get into that space, you have to be doing this stuff. So we have a tool call over here. If your tool call is particularly complicated, you can create an example section in your system prompt, which is what we were looking at before; this is the system prompt, so within it you would actually have an example section showing how to execute that tool. I don't think for n8n workflows we would necessarily need to use that, but for custom-coded workflows you would have to use something like that, rather than adding the examples to the description field. And that description field, like what we saw here: if this is a pretty complex tool (which it really won't be in n8n), you don't put all the instructions into the description; you add them into the system prompt. That's how you actually do it, and that's why in this coding example we had the tool description as its own standalone section. So, to finish off this tools section: it says over here, remember you can use the generate feature in the prompt playground, and it's a good starting point for defining your new tool definitions. Going to the playground, that's what we have over here.
This is literally just copied from that; I've clicked that hyperlink, and we have this prompt section, and you can press this button to help you generate some of these prompts. Although I would actually just use the techniques from the article and from this video to start getting better at creating the prompts yourself, instead of constantly relying on AI, because everyone's relying on the AI, so what sets you apart? Just learn how to do it. And this tool section here can create different tools; you can play around in this playground, and it's got sample definitions. This is more for custom-coded stuff, yes, for sure, but it's a resource we're being linked to. Otherwise, if you're using n8n, just make sure this stuff is actually filled out: the tool name, the tool description, reference the tool name in here, and put some more information about the tool into the system prompt if you need to. The more information, the more context, the better. All right, so: prompt-induced planning and chain of thought. Like we already spoke about, developers can optionally prompt agents built with 4.1 to plan and reflect between tool calls, instead of silently calling tools in an unbroken sequence. That is specifically this thing here, the planning and reflection between tool calls; again, these three components all added up to 20%, and this one specifically added about 4% to the success rate for the desired output. And 4.1 is not a reasoning model; that's another important thing. What we're doing here by prompting planning and thinking is, let's imagine for argument's sake these are all different tool calls, 1, 2, 3, 4, 5. We're saying that at the start of each tool call: stop, think, and reflect. Okay, what are we doing here? Okay, cool, make the tool call. After the tool call: stop, think, and reflect. Okay, what have we done? What's the new context?
How does everything look now? Okay, tool call two. All right, what's the intention of this tool call? What's the context we currently have? We've just completed one tool call, so there's additional context. It's literally stopping at every single stage between tool calls; it's not turning the model into a reasoning model, but it's giving it the ability to do this reasoning. And that itself is important, because as we have here, 4.1 is not a reasoning model: it's not an o3, it's not an o1, models which take time and spend tokens to actually think about problems. But what 4.1 is good at is following instructions. So what you're trying to do here is fuse those two elements together: a model that's really good at following instructions, and behavior where it stops, thinks, and reflects, because thinking and reflection give you a better result. Planning in general (success favors the prepared) gets you a better result than just blindly following the steps and not really paying attention to what you're doing. So with 4.1, that's what they're saying here: it's not a reasoning model, but you are trying to make it resemble one, or do things a reasoning model would do, because it gives you higher accuracy. That's the 4% we're speaking about here. Now, if I zoom out, there is actually a technique OpenAI describes for how to get that chain-of-thought prompting into your actual prompt. The actual bit we'd be speaking about is this planning, this thinking and reflecting, this instruction set here. If we go to this part here, that's what it's saying: how do you actually optimize this, how can you actually build this out?
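Before we get to that, the stop, think, act, reflect cadence I just described can be sketched as a toy loop. The "model" is stubbed out here with plain strings and lambdas; in a real agent, the PLAN and REFLECT entries would be text the model itself generates before and after each tool call, prompted by the planning reminder.

```python
# Toy sketch of the plan -> act -> reflect cadence between tool calls.
# Tools are stubbed with lambdas; in a real agent the PLAN/REFLECT
# lines would be model-generated text, not fixed strings.

def run_agent(steps):
    """Walk through tool calls, recording a plan before and a
    reflection after each one."""
    transcript = []
    for name, tool in steps:
        transcript.append(f"PLAN: about to call {name}; why is this the right step?")
        result = tool()  # the actual tool call
        transcript.append(f"RESULT: {result}")
        transcript.append(f"REFLECT: given {result!r}, does the plan still hold?")
    return transcript

log = run_agent([
    ("fetch_ticket", lambda: "ticket #42: warranty question"),
    ("search_kb", lambda: "warranty policy: 12 months"),
])
for line in log:
    print(line)
```

The point is just the interleaving: no tool call happens without a planning step before it and a reflection step after it.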
So this bit here from this code block I've copied across to Excalidraw — and the contrast here is actually disgusting, I'm sorry, guys. What it's saying is, first, we want you to start small. We recommend starting with a very basic chain-of-thought instruction at the end of your prompt. Okay, so that's what it's saying: at the very end of the prompt — I'm jumping around a lot, sorry, guys — is this bit here, the final reminder after all the context. The final reminder. So it's like: okay, here are the instructions, here's your context. Now slow down, breathe, stop and reflect after each step. Make sure you're on course, that you're doing the right thing. That's what it's saying here, and it's giving you instructions on how to actually build that sentence accurately. So in Excalidraw we have the text copied out: first, think carefully step by step about what documents are needed to answer the query. Then print out the title and ID of each document. Then format the IDs into a list. This is what you would put at the very end of the prompt. It's obviously specific to a certain use case, but let's say you're doing customer support. In customer support you might get multiple different types of queries: general questions, warranty concerns, and order tracking. Each of these would have a separate set of steps you need to follow to get the answer. Like, you wouldn't ping the RAG database for an order-tracking request. You just wouldn't do that.
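In code, "put the chain-of-thought instruction at the very end" is just string concatenation. Here's a minimal sketch — the function name is mine, and the reminder wording is adapted for the customer-support example rather than copied from the article:

```python
def with_final_reminder(system_prompt: str, reminder: str) -> str:
    """Append a chain-of-thought reminder as the very last thing the model reads."""
    return f"{system_prompt.rstrip()}\n\n# Final instructions\n{reminder}"

base = (
    "You are a customer support agent.\n\n"
    "# Context\n"
    "...ticket history and knowledge-base extracts go here..."
)
reminder = (
    "First, think carefully step by step about what the customer is asking. "
    "Categorize the request as a general question, a warranty concern, or an "
    "order-tracking request, and print the category before doing anything else."
)
prompt = with_final_reminder(base, reminder)
```

Start with a single sentence like this, test it, and only grow it into a fuller reasoning strategy once you've seen the outputs.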
So maybe you would start with something simple that says: first, think carefully about what request you're receiving and categorize it into one of the three — general questions, warranty, or order tracking. That would be your initial first step. Then you might say: print out what the ticket category is, what the contact reason is. And it's saying that when you do this, look at the desired output and iterate step by step. You might then get to something that's a bit more complex, but you would literally break it down bit by bit until you have an entire set of steps as your reasoning strategy — your planning and thinking strategy. So I've got that copied across in Excalidraw. It's this reasoning strategy here: break down and analyze the query until you're confident what it might be asking. Consider the provided context to help resolve any ambiguous or confusing information. Context analysis. Again, this is all just: spend tokens thinking. Don't do tool calls. Don't do actual work yet. Generate an action plan for how you're going to solve this. And this all goes at the very, very end of the prompt, in that final section. Then: carefully select and analyze a large set of potentially relevant documents — you might actually have the documents within the context, or it might be a tool call — and summarize which documents are most relevant, including all documents with a relevance rating of medium or higher. For our customer-support case, it might be: analyze what the customer is asking about. Is it a general question? Is it a warranty concern? Is it an order-tracking request? Based on that information, work out whether you have all the information you need to actually successfully answer the ticket.
If it's an order-tracking request, do you have the customer's name or order number that's going to be needed? Okay, if you do, then generate a small action plan for how you're going to approach this problem. Here are the tool calls available to you. Generate your plan, and then execute. So that's how you would approach this. This is a reasoning strategy, but it's essentially the planning and thinking between tool calls — or the planning and thinking for how to solve the problem in general. Now, this other cool section speaks about where errors tend to come from. Just like we spoke about at the very start of this article, the GPT-4.1 model is very good at following instructions, so typically one sentence can get it back on track. If you've created a prompt, you don't have to worry about rewriting the entire prompt; you'll usually be able to get away with making minute changes. It's saying that, generally speaking, errors tend to occur from misunderstanding of user intent — when you haven't actually articulated your problem and what you need done. So, let's say this is the customer-support ticket: the model isn't understanding the user's intent in the context of being a customer support agent — what they're messaging about. Maybe their message is just "hi, I need help." Of course the model is going to fail, because what do you need help with? We can't actually solve it. In that case you might have a fallback option that says: if the user doesn't give enough information, ask for more information. The second source: insufficient context gathering or analysis.
So maybe it's not looking at all the context — we haven't given it the appropriate instructions to say: here are the different data sources you can tap into, so tap into these to figure out your next step or how to solve this problem. And then: insufficient or incorrect step-by-step thinking. Again, this is all about prompting. It can all be fixed by prompting — just telling the system to work a little slower and to figure out an appropriate plan for itself. So essentially, this is an example of how you would solve all of these problems: by building out a reasoning strategy like this, and telling the model at the very end of the prompt — so the final thing it sees is that reasoning strategy, that planning-and-reflecting strategy over here. That's what's going to get you this really high-quality output. Again, it's 20% from all three of these combined, and planning is a major chunk of it. And this is what the guide is basically telling you: put that section at the very end of the prompt. So, to keep going through this article, we have optimal context size, prompt organization — we spoke about this — and the recommended workflow. For this section, I think what a lot of people would do, and what I actually used to do, is ask ChatGPT to write the prompt for me. I'd just go into the ChatGPT window and say "write this prompt," and that was working for me so far. But what this process is now saying is: don't just go and create a prompt using ChatGPT — actually think about the structure of the prompt. Now that we have Excalidraw, we can see how the prompt should be formatted.
I don't think ChatGPT is going to natively understand how the 4.1 model should be prompted — unless, of course, you take this entire article, upload it into the context of a ChatGPT project, for example, and say: here's my problem, here's the prompt I want to create, follow the rule set from this article. You could do that, for sure. But this is saying: don't just go use ChatGPT — actually start building your process like this. Start with high-level bullet-point instructions. When you're first building your prompt, keep it very simple. Don't put any of the other techniques in here yet — no planning, no mention of tool calling — just keep it to about five bullet points. It's really like what we're doing here with this chain of thought; it's the same. It's a really good way to solve problems: start small, give it one or two instructions, see what the output is, and then keep iterating. This is actually how I build my automations. My workflow is: I don't try to build the entire thing. I build one section, run it, get the result, see what data I have, then build section two — and sometimes that's node by node. It's literally so slow, but the accuracy is through the roof, because I don't end up with 20 nodes and have to figure out where things are screwed up. I go node by node and build it out. So building out your prompt step by step is also a very good idea. You do three or four steps, run it, see what the output is, and iterate. You don't have to try to one-shot the prompt. So, back in these instructions: start with high-level bullet points. Test and review — run it a couple of times, see what feedback or information you get, see what's missing. Add section by section.
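That test-and-add loop is easy to mirror in code: keep each section as its own string and only add the next one once the current version works. A sketch — the `assemble_prompt` helper and the section names are placeholders I've made up for whatever structure you land on:

```python
def assemble_prompt(sections: dict[str, str]) -> str:
    """Join named prompt sections into one markdown-structured prompt."""
    return "\n\n".join(
        f"# {title}\n{body.strip()}" for title, body in sections.items()
    )

# Version 1: instructions only. Run it, review the output...
sections = {
    "Instructions": "Answer support tickets concisely and professionally.",
}
v1 = assemble_prompt(sections)

# ...then add the next section and test again.
sections["Reasoning steps"] = "Think step by step before answering."
v2 = assemble_prompt(sections)
```

Because each section is a separate entry, you always know exactly which addition changed the output.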
So we're going to speak about which sections you actually need for building out your prompt in general — this stuff here: prompt structure, role and objective, instructions, reasoning steps, examples, final instructions. Don't build out everything at once, because you won't know what's actually producing the good result or what's pulling you away from a good result. Build it out step by step. Probably start with instructions first. Once you have a relatively good output that just needs some tweaking, then you might add some reasoning steps. Then you might add some examples. Then you might start giving it some character and say: this is your role and objective. All these different parts are subcomponents of getting to your solution. So that's what this is saying over here: start with high-level bullet points like the instruction section, test and iterate, add section by section, KISS — keep it simple, stupid. Use examples of desired output. And no, don't use caps and don't bribe. We already spoke about this section before: we don't need to oversell with things like "focus on this section the most." We don't need to do that anymore, because we know from needle-in-a-haystack testing it can pull five tokens out of 1 million tokens really easily. And second: if you are updating to 4.1 in existing workflows — if you're already using an AI agent in n8n, for example, and you change your model from 4o to 4.1 — you must change the system prompt. Just take the system prompt, put it into ChatGPT if you want to do it the easy way, and ask it to help you revise it based on the instructions you have over here. That's if you really want to take the easy way out.
But as a fun challenge, you can just start building the prompt from scratch — you have the framework here. So no: if you are updating to 4.1 in existing workflows, you must edit the prompt, and that's going to give you more accuracy. That's the Ferrari analogy. Moving to 4.1 is like getting access to a Ferrari, but if you don't modify the prompt to suit 4.1, you're stuck in first gear — just smashing first gear the whole time — when second gear alone would 3x your speed. Unlocking just one level would 3x your output, and you've got more levels beyond that; a Ferrari has six gears, for example. So do not limit yourself by not writing an appropriate prompt. And from this article so far, what we see is that it's very easy to build a prompt. We've got the framework here, but even before the framework, just build it section by section. We've got a very visual, basic understanding of what's important: 20% of the overall 55% accuracy is from these three sections. So if you want a really accurate prompt, have these three sections in this specific order, and you'll get a big chunk of the way to your results. So that's what we spoke about here. Now, the final thing I want to speak about is delimiters. A delimiter is just a tag that you use within your prompts. What you're probably already used to is markdown. If I go to my delimiter section — same thing here — markdown is using a hash to say this is an H1, the most important heading, and then an H2 is the second most important. You're giving the model an understanding of what's important and what's less important — what to focus on as it goes through the prompt, so it knows where to put more attention. Then you have things like asterisks.
So these are all just varying levels of importance, which you could use to denote sections, subsections, or important steps. So that's markdown — everyone is using it. But it's actually recommended to use XML as well, and XML is what I spoke about before. Let's say, within the actual context of your system prompt, you're putting in a bunch of information — the problem, some historical tickets, a bunch of different context, those dynamic variables that change customer to customer. You can insert them using these XML tags. Here's an example. What this is telling the model is: this is the start of the user query, and with this forward slash, this is the end of the user query. Everything between these two tags is exactly what it must process. In a very long context, it doesn't get confused — the model can locate the things it needs to locate, but in order to distinguish between different bits of information, where each one stops and starts, you use XML tags. The interesting thing is that they speak about the delimiter types you can use: markdown, XML — what we just spoke about — and JSON. And Anthropic's public prompt engineering resource constantly says to use these XML tags — it's in literally every single instruction. If you put dynamic information in there, you should use XML tags; you don't have to, but you're going to get higher accuracy. It's actually a really cool resource. If you want to see a video of me going through it from start to finish, just comment below that you want to see that.
Sorry for that little jolt there, but the interesting thing is: XML performed well in the long-context tests. So when you're using 1 million tokens, for example — the longest possible context — use XML tags to denote the different parts of the actual prompt. So here: a doc tag with an ID and a title, wrapping "The quick brown fox jumps over the lazy dog." This is the start of the XML tag, and this is the end of it. The cool thing is you can actually put metadata into the tag itself, which is 10-out-of-10, next-level stuff for super long context. Not only can you distinguish the user query from everything else, but the metadata gives the model even more accuracy in understanding how it should use that specific bit of information. This specific technique is absolutely wild — I might even do a standalone video on the XML metadata technique for prompting. But it said that while XML performed well, JSON performed particularly poorly. So using JSON within your prompts is not so good. Isn't that interesting? In a lot of my prompts, I would actually use JSON within the prompt itself — like, sometimes I would use JSON to say "here are the categories you can choose between." No — don't use JSON. It's not so good, especially with long context. You want to use XML tags. So yeah, that's interesting. That's the final bit we're going to be discussing in this video anyway — ending it on delimiters. But it's little bits and pieces like this that probably add five or ten points to the outcome of whatever you're creating. Actually, there's one more section.
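Before that last section, here's what a tiny XML-delimiter helper might look like, producing tags with metadata attributes like the `<doc id=... title=...>` example. This is a sketch of my own — it does no escaping, so it assumes the content doesn't itself contain the closing tag:

```python
def xml_wrap(tag: str, content: str, **attrs: str) -> str:
    """Wrap content in an XML-style delimiter, with optional metadata attributes.

    Note: no escaping is done -- this sketch assumes the content
    doesn't contain the closing tag itself.
    """
    attr_str = "".join(f' {key}="{value}"' for key, value in attrs.items())
    return f"<{tag}{attr_str}>\n{content}\n</{tag}>"

doc = xml_wrap(
    "doc",
    "The quick brown fox jumps over the lazy dog.",
    id="1",
    title="The Fox",
)
# doc:
# <doc id="1" title="The Fox">
# The quick brown fox jumps over the lazy dog.
# </doc>
```

You'd concatenate a list of these wrapped documents straight into the context section of your prompt, with the metadata carrying the ID and title alongside each chunk.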
So, in Excalidraw here, the final takeaway is the ideal prompting structure. In the actual prompting instructions we have instructions to follow, customer service, general advice. This is the prompting structure you should follow — how you should build it out. You don't need every single one of these sections, but this is how it should be laid out. Again: context here, and then the final instructions for the prompt to think step by step, which is what we already discussed in this section. The final instructions are that planning stuff we were looking at — I know I'm jumping around a lot — that chain of thought at the very, very end, whose contribution was part of that 20% across these three different features. So the prompt structure: role and objective, instructions, subcategories with more detailed instructions. It's using markdown over here, it's using the same technique of context followed by final instructions — which is the 20% against the SWE-bench benchmark — and then you'd also use XML tags. You can slowly start to see all the stuff mentioned in the article: yes markdown, yes the structure of the prompt, yes the inclusion of instructions here. It's all purposeful, to get you as close as possible to the desired output. There's a science to this; it's not just an art. So here's the breakdown. Role and objective: define what the role is and what it's trying to do — a helpful research assistant whose goal is to extract clear summaries and highlight key technical points. Instructions: high-level guidance. Be specific on what to do and what to avoid; include tone, formatting, and restrictions. So: always respond concisely and professionally, avoid speculation, and just say "I don't have the information" if you're not sure.
And it's much better at following an instruction like that, too. Format your answer using bullet points. Sub-instructions — this is another interesting one, because I've actually done some reading on this. Sample phrases: when you put sample phrases in here, the model will follow the instructions perfectly, which means if you give it two or three sample phrases, it will always use them. So in a live-chat environment, it's going to constantly use those same three phrases if you don't give it a bit more leeway. You have to be specific when you're prompting and say: don't just use these phrases — here's a pool of other phrases you could use, or use one every five sentences to really mix it up. So sample phrases inside sub-instructions is a new one for me — I've never done sub-instructions before, so that's really cool. Prohibited topics: don't discuss politics or current events. These second-layer items are kind of operational instructions: you're meant to help highlight really good information, but while you're doing it, keep it conversational, don't speak about this and that. You're putting up guardrails — stage-two guardrails for the actual operation, the personality side of things. Then step-by-step reasoning and planning: think through the task step by step before answering, make a plan before taking any action, reflect after each step. That's the stuff we were speaking about over here — planning and reflecting between each of these steps. And then you had chain of thought.
Planning and reflecting is a little bit different from chain of thought, but both work toward giving the model reasoning — making it more like a reasoning model without being an actual reasoning model. So you have these step-by-step instructions, then your output format: specify exactly how you want the output to look. In here I would typically even put JSON outputs — I'd say, this is the JSON structure that I want — and I'd use that when working in n8n or on custom code projects, because I'd specifically want to use the data somewhere else afterwards. So my go-to is JSON outputs, but you can see here: respond in this format — summary, key points, and conclusion. Sorry, it's a little blurry. Then you pass some examples as well, and in this case the examples are using markdown. Examples are about showing GPT what good looks like — you give a couple of examples for a customer support query or whatever work you're trying to complete. Final instructions: repeat the key parts at the end to reinforce the model's behavior. So final instructions, once again, is just that section — and I keep coming back to it, so you should probably have it memorized by now. It's a very simple layout, nothing too crazy, but that's a run-through of how you would actually build it out. And then we have bonus tips from the guide: put key instructions at the top and bottom of longer prompts — the stuff at the very bottom is going to be retained and followed more closely. Use markdown headers and XML to structure the input, to mark the importance of different sections, and to introduce additional dynamic context. Break things into lists or bullet points. And if things break down, try reordering, simplifying, or isolating specific instructions.
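Pulling that whole breakdown together, here's what the recommended layout might look like as one skeleton. The section headings follow the structure we just walked through; all of the body text is my own filler, and `{dynamic_context}` is a placeholder you'd substitute in yourself:

```python
# A skeleton of the recommended prompt layout. The headings follow the
# structure discussed above; the body text is illustrative filler, and
# {dynamic_context} is a placeholder for your own injected context.
prompt_skeleton = """\
# Role and objective
You are a helpful research assistant. Your goal is to extract clear
summaries and highlight key technical points.

# Instructions
Always respond concisely and professionally. Avoid speculation; say
"I don't have that information" if you're not sure.

## Sample phrases
Vary your wording -- don't reuse the same opener in every reply.

## Prohibited topics
Don't discuss politics or current events.

# Reasoning steps
Think through the task step by step before answering. Make a plan
before taking any action, and reflect after each step.

# Output format
Respond with three sections: Summary, Key points, Conclusion.

# Examples
<example>
User: Summarize this paper for me...
Assistant: Summary: ... Key points: ... Conclusion: ...
</example>

# Context
<context>
{dynamic_context}
</context>

# Final instructions
Think carefully step by step, and check your plan before you answer.
"""
```

Note how the markdown headers mark the sections, XML tags fence off the examples and the dynamic context, and the chain-of-thought reminder sits at the very end — all the pieces from the article in one place.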
So, because 4.1 is so good at following instructions sentence by sentence, you can strip things away bit by bit and find the source of the problem. All right, guys, thank you very much for watching to the end of the video. If you haven't subbed to my channel or liked the video, please do that as well. Like I said at the start of the video, it helps push this video out to more people and helps me grow my channel, so I'd appreciate the support.