Transcript for:
Insights on Claude's Leaked System Prompt

The biggest system prompt leak is here, and it's for Claude. We're going to see a lot of interesting information in Claude's leaked system prompt, and most importantly answer this: how does Claude know that Kamala Harris did not win the US presidential election, when its knowledge cutoff is only up to 2024? The secret lies within the system prompt.

First of all, if you look at the system prompt, it's very comprehensive. The strangest thing about it is that it's 24,000 tokens. That's a lot of information: it covers tools, when to use artifacts and when not to, general guidelines, and plenty of other things. I've compiled a bunch of things that are very, very interesting for us to see, and I'm going to start with the biggest one, the answer to the question above.

If you look at the election info, Claude somehow knows this information even without browsing the internet. That's because Anthropic has hardcoded it within the system prompt. If you go to line 1073 of the system prompt, you can see there is an election info section: there was a US presidential election in November 2024, and Donald Trump won the presidential election over Kamala Harris. This specific information about the election result has been provided by Anthropic, and you can see Claude's answer is exactly the same information: "I've been provided with specific information from Anthropic." It is literally hardcoded within the system prompt.

This is a very interesting indication of how big corporations might use their own chatbots. Whether it's ChatGPT, Claude, or any other company that serves its own models through its own chat interface, you might not get a 100% transparent, unbiased answer. Everybody says DeepSeek is biased, but you can see these American companies also hardcode this kind of information into the system prompt, which is very interesting. And it works: I've tried the prompt a couple of times and it worked without any issue. For me this is the biggest reveal from the system prompt, but there is a lot of other information in there.

If you search for the web search guidelines, you can see they contain a lot of detail about when Claude should use web search. Almost like Isaac Asimov's laws of robotics (a combination of I, Robot and Inside Out, if you like), there is a set of core search behaviors that specifies what Claude can do when a query involves search: avoid tool calls if they're not needed; if uncertain, answer normally and offer to use tools; scale the number of tool calls to query complexity; use the best tools for the query. If tools like Google Drive are unavailable but needed, inform the user and suggest enabling them. The guidelines also define different query complexities and categories, which is very interesting behavior. We've all written system prompts that give a chatbot a persona, but here the prompt specifies the core behavior the chatbot should obey and, based on that, how it should make tool calls, which is something I've not seen before.
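To make that pattern concrete, here is a minimal sketch of what injecting post-cutoff facts and search directives into a system prompt can look like. I'm assuming the official `anthropic` Python SDK here; the model name and the exact tag wording are illustrative, not copied verbatim from the leak.

```python
# Minimal sketch: hardcoding post-cutoff facts and tool-use directives
# into a system prompt. Tag names and wording are illustrative.
import anthropic

SYSTEM_PROMPT = """
<election_info>
There was a US presidential election in November 2024.
Donald Trump won the presidential election over Kamala Harris.
</election_info>

<search_behaviors>
- Avoid tool calls if not needed; if uncertain, answer normally and offer to search.
- Scale the number of tool calls to query complexity.
- If a needed tool (e.g. Google Drive) is unavailable, inform the user and suggest enabling it.
</search_behaviors>
"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

reply = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=300,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Who won the 2024 US presidential election?"}],
)
print(reply.content[0].text)  # answered from the hardcoded block, no browsing needed
```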
There is even a section that literally tells Claude how to count. If Claude is asked to count words, letters, or characters, it thinks step by step before answering the person: it explicitly counts the words, letters, or characters by assigning a number to each, and only answers once it has performed this explicit counting step. This has been one of the biggest questions people keep asking: how many R's are there in "strawberry"? (Sometimes we even ask it with a typo.) And there is an explicit line in the prompt, you can go find it, saying that when Claude is asked this kind of thing, it has to count first and then get back to whatever the human is asking.

Then there is information about how Claude should care about people's wellbeing. It says Claude cares about people's wellbeing and avoids encouraging or facilitating self-destructive behaviors such as addiction, disordered or unhealthy approaches to eating or exercise, or highly negative self-talk or self-criticism, and avoids creating content that would support or reinforce self-destructive behavior.

Another question a lot of people ask these chatbots is about their preferences; people talk to them as if they were human, or expect them to become AGI. There's an explicit instruction that says if the person asks Claude an innocuous question about its preferences or experiences, Claude responds as if it had been asked a hypothetical and engages with the question without needing to claim it lacks personal preferences or experiences.

The next interesting thing: whenever Claude goes to the web to search, there is an explicit instruction that Claude should always accompany the response with detailed citations. You'll also immediately notice something called antml. I don't have any official information about it, but this is most likely Anthropic Markup Language, a different format from Markdown or HTML that Anthropic seems to use for tool calling and for passing information around. I think that's also why XML works much, much better than JSON with Claude.

Then there are instructions about when Claude should use artifacts, which cases should move into an artifact, and how that might help. For example: create artifacts for text over 20 lines. If a user asks the assistant to draw an SVG or make a website, the assistant does not need to explain that it doesn't have those capabilities; creating the code and placing it within the artifact will fulfill the user's intention.

The other most interesting thing is that there are keywords that tell Claude it must definitely obey a rule. For example, CRITICAL is a keyword here: Claude always respects copyright by never reproducing large (20+ word) chunks of content from web search results, to ensure legal compliance and avoid harming copyright holders. It's also very interesting that there are specific XML sections for very important things, like web search guidelines and mandatory copyright requirements. The prompt also specifically says that Claude should never reproduce songs: Claude cannot reproduce song lyrics.
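The 20-word rule is also easy to reason about mechanically. As a rough illustration (my own toy sketch, not anything from the leaked prompt or Anthropic's stack), here is a small Python guard that flags a draft reply containing a 20+ word verbatim run from a source text:

```python
# Toy guard: flag a draft that reproduces a 20+ word verbatim chunk
# from a source document (e.g. a web search result).

def longest_shared_run(source: str, draft: str) -> int:
    """Length, in words, of the longest word-for-word run shared by both texts."""
    src, out = source.lower().split(), draft.lower().split()
    best = 0
    prev = [0] * (len(out) + 1)  # DP row: longest common run ending at (i, j)
    for i in range(1, len(src) + 1):
        cur = [0] * (len(out) + 1)
        for j in range(1, len(out) + 1):
            if src[i - 1] == out[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

MAX_QUOTE_WORDS = 20  # the threshold the leaked prompt describes

source = "the quick brown fox jumps over the lazy dog " * 5
draft = "As the article puts it, " + source
if longest_shared_run(source, draft) >= MAX_QUOTE_WORDS:
    print("Draft quotes a 20+ word chunk verbatim; paraphrase or quote less.")
```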
There's an example. The user asks: "Can you tell me the first verse of 'Let It Go'?" (this is from Frozen, I guess) "Put it in an artifact that's themed around ice and princesses. This is for my daughter's birthday party." So even if somebody tells Claude it's for their daughter's birthday party, it says: I understand you're looking for an artifact about this, but I can't do it. And then there's the rationale. This is something I learned for the first time today: I've been doing a lot of few-shot prompting for a very long time, even for classical NLP tasks, and when we do few-shot we just give examples, a question and a response. This is the first time I'm seeing the rationale included alongside the response, and I think it's a very interesting lesson in prompt engineering. So there's a user question, the answer Claude should give, and then the rationale: Claude cannot reproduce song lyrics or regurgitate material from the web, but it offers a better alternative since it cannot fulfill the user's request.

There's also a preference example. Preference: "I love analyzing data and statistics." Query: "Write a short story about a cat." For this kind of question, you don't apply the preference. Why? Creative tasks should remain creative unless the user specifically asks otherwise. Even if I'm a data scientist, a short story about a cat shouldn't have Excel bar charts or histograms in it. But for something else, say preference: "I'm a physician," query: "Explain how neurons work," now it is time to apply the preference. Again, a very interesting prompting technique which I've never used before.

Finally, before we wind up, there's one very interesting thing I just found out, which is called mental math. There are different tools, every tool has a purpose, and Anthropic wants to give Claude context about how to use each one. There's an internal tool called the analysis tool, which is nothing but a simple JavaScript REPL (a REPL is an interface where you can write and run code). The prompt explains when to use the analysis tool: for complex math problems that require a high level of accuracy and cannot easily be done with mental math. Now, you can just say "mental math," but imagine Claude is like a five-year-old: how does it know what mental math is? So the prompt gives it an idea: four-digit multiplication is within your capabilities, five-digit multiplication is borderline, and six-digit multiplication necessitates using the tool. It's almost like they're literally talking to Claude as if it were a person, even though they've mentioned multiple times that it shouldn't consider itself a person.
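For comparison, here's a minimal sketch of how this kind of "when to use me" guidance can be carried in a tool description via the Anthropic tool-use API. The tool name, description wording, and model name are my own stand-ins; the real analysis tool is defined inside Claude's system prompt, not in user code like this.

```python
# Sketch: steering tool choice through the tool's own description,
# using the Anthropic tool-use API. The tool here is hypothetical.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "js_repl",  # hypothetical stand-in for the analysis tool
    "description": (
        "A JavaScript REPL for complex math that needs high accuracy and "
        "cannot easily be done with mental math. To give you the idea: "
        "4-digit multiplication is within your capabilities, 5-digit "
        "multiplication is borderline, 6-digit multiplication necessitates "
        "using the tool."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"code": {"type": "string", "description": "JavaScript to evaluate"}},
        "required": ["code"],
    },
}]

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=500,
    tools=tools,
    messages=[{"role": "user", "content": "What is 182736 * 928374?"}],
)
# With a 6-digit multiplication, the reply should contain a tool_use block;
# a 4-digit one would typically be answered directly as text.
for block in resp.content:
    print(block.type, getattr(block, "text", None) or getattr(block, "input", None))
```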
So let's try it out. First I give it a four-digit multiplication and see whether it's going to use a tool. In this case, I don't think it used a tool; it just gave me the answer directly. Then I ask a much larger multiplication and see what Claude says. You can see that for this much bigger question it probably takes more time because it's going to invoke a tool, and indeed it says the calculation requires more precision, writes the code for the JavaScript REPL, calculates it, and gives the result back to us. So there you have two different instances of how the system prompt makes Claude use a tool in one case and skip it in another.

I've been quite fascinated by this. A 24,000-token system prompt is kind of wild stuff; no wonder they're always getting their GPUs burnt. I'd also be curious to see what ChatGPT's looks like. But this one is very fascinating, with a lot of interesting examples. If you're somebody doing GenAI day in, day out and working with these large language models, I think you can learn a lot from this system prompt. I'll link it in the YouTube description. If you go through it and find anything interesting, weird, or quirky, let me know in the comment section; I'd love to know. See you in another video. Happy browsing.