Transcript for:
Long-Term Memory Implementation in AI Chatbots

Hello, it's Mike again. Now, in the last video, what were we talking about? Hang on a sec. Yes, we were talking about agents. It's really helpful to write things down if you want to remember them, and also if you want a really tedious segue into what I want to talk about now, which is long-term memory with generative AI, specifically with large language models.

You see, the thing is that large language models don't have any state. They don't remember anything. So if we're using a chatbot such as this, let me introduce myself to the chatbot, say hello, and then tell it some information about me: I like to drink tea and take walks on the beach. OK, so I've given it some information about me, but at no point in this conversation so far has it had to remember anything from earlier on in the conversation. But it can do that, because as part of the application here we have saved this conversational history. So when we type in the next question, we have to send this conversational history and the next question into the model so that it knows what we're talking about. So: what was the first thing I said? And it should know: yes, the first thing I said was hello. List the things that I like. And yeah, it's able to list out the things I like, drinking tea and taking walks on the beach, and that's again because every time I send a question in, the application also has to send in the whole of that conversational history as well. That's how it manages to remember things.

But this is really short-term memory. If I reset this chat and sort of simulate coming back the next day and then say "list the things that I like", then it says I don't have enough information to be able to answer that question, and that's because we aren't passing in conversational history. We aren't giving it any information about what has happened in the past. So how can we solve this? Well, one way we can solve this is at the end of the chat session we can have our application take that conversational history, summarize it, probably by
using another large language model, and then store that. Then the next time we come back to the chat, we can recall that stored memory, that stored conversational history summary, and then carry on the chat from there. And that's a lot of legwork to do, so let me show you how you can do that with Agents for Amazon Bedrock.

So if I switch tabs here, we're inside of Amazon Bedrock. If I go to the menu on the left-hand side and scroll down, I can go to Agents. So we're going to create an agent here, and an agent is basically a chatbot which is capable of integrating with other tools. But we don't have to integrate it with other tools; we can just have an agent standalone, which is what we're going to look at in this particular case. But remember that you can combine this with other tools to reach out to the outside world, to do things like retrieval augmented generation and so many other things.

But let me show you how I can create an agent, start to chat with it, and get it to have this long-term memory that we talked about. So I'll click Create agent. I'm going to call this "my demo agent", because no imagination, and let's click Create. Once it's done that, we're able to start to configure our agent, and again it's going to be super simple. So I'm going to select a model that I want to use. I'm going to select either Sonnet or Haiku in order to work with this particular method, this long-term memory, so I'm going to choose Haiku because it's intelligent and it's fast. Then if we scroll down here, I have this new section. This is in public preview as I'm recording this video, and it allows me to enable memory. So let's click Enable, and then we have this option of how long we want the memory to stay around for. The longest you can have, and the default value here, is 30 days, so this number here is measured in days. So that's all I need from that perspective. Let's scroll up and give our agent some instructions: you are a chatbot answering questions that
users might have, do your best to answer the question. This is just a demo. OK, if I scroll to the top and I click Save and then I click Prepare, I can now test this agent and show you what I mean.

So let's have the same kind of conversation that we had before. I'm going to say hello, and it says "how can I assist you today?", and so I can say I like drinking tea and taking walks on the beach. Let's run that. It's giving it a bit of information about us. OK, "sounds fantastic". All right, now let's imagine that we're going to leave this conversation and walk away. In the testing ground that we've got here inside of the Bedrock console, I can click this little End session button right here. If I do that, then it all goes away and we have reset our conversation.

Now what's happening at this point is that the conversation is being taken and summarized for us, and then that summary will be available to the next conversation that we have with the agent, as long as we're using a memory ID which matches. We'll take a look at this in code in just a moment, but the memory ID from this particular chat session obviously has to be given to the next chat session so that the memory aligns to the particular user who's using this session. The memory takes around a minute, a minute and a half, to actually be generated, and so once that time's elapsed we can jump back in and have another conversation.

So let's start another conversation. I'm going to say hello, and this time let's get rid of me so you can see it a little bit better. If I go to Show trace and I go to Memory, you notice here that it actually has this saved memory: the user enjoys drinking tea and walks on the beach, they find it a very relaxing and rejuvenating experience. So it's actually thought a little bit more deeply about the comment that I made to it than just what I said, but it's summarized the conversation there. So if I go back in here and ask it a question that relates to that, so: what things
do I like to do, question mark? "I recall from a previous conversation that you enjoy drinking tea and taking walks on the beach." So it's got that memory of the previous conversation built into our next conversation.

All right, let me show you now how to implement this in code so that you can build this into your own application. To do that I've got this Jupyter notebook here, and I've got a link to this Jupyter notebook in the description beneath this video. So let's take a look quickly at how this code works. First of all, yes, Jupyter: I need to remember to run all of the cells here. We're going to import some Python libraries, so we've just got the usual suspects. We're using boto3, which is the AWS SDK for Python. The next thing we're going to do is define a region. This will work in a number of different regions. Again, this is in preview right now, so you need to look for a region where you've got Amazon Bedrock, where you've got agents, and where you've got access to the models that are required, and it's the Claude Haiku model that we're going to be using for this particular demonstration.

Once we've got the region defined, we can start to create our agent. Now this next cell is huge. There's lots of things going on in here, so let me just quickly step through it and then we'll run it. This basically goes through the process of creating an agent, then enabling that long-term memory and getting it ready so that we can experiment with it. So this is going to do all of those things that we just did in the console, plus a little bit more. We're going to get ourselves the boto3 client object for the Bedrock agent. Now do note that there's a couple of different client objects that are available: there's the Bedrock agent and there's the Bedrock agent runtime. We'll look at the runtime in a moment. This Bedrock agent client is for us to be able to configure a new agent or to make some updates to the agent itself. We're also going to
grab ourselves the IAM client, because we'll need to make a role in order to create this agent. So we're going to just define a name that we're going to use for this, and then an instruction. Now previously, when I was in the console, I typed out something pretty simple, but I actually got Bo to help me create a prompt, or an instruction, here that we can give to our agent which is much more comprehensive. This will actually form part of the system prompt: a significant component, but only a component, of the system prompt. The agent service will add more things in as well. So have a look through this prompt. It's just a generic chatbot prompt, but see as well that it's not just a couple of sentences; I'm being quite specific about how I want the model to behave. Then I'm going to define what the foundation model is going to be, in this case Haiku. I'm going to get a random string so that I've got some ability to name some resources and not stamp over things that are already inside my account.

And then we get on to the creation of stuff. So first of all, the trust policy and the IAM policy that we need in order to be able to attach this to the agent. The agent needs to be able to assume this role, and the agent needs to be able to invoke the model; in other words, it needs to be able to call the Amazon Bedrock service itself to call the particular foundation model that we're going to be working with. From that, we can then go ahead and use the boto3 client to create the role and then put a policy on that role, that policy being the one that we just looked at there. So with that set up, we've got an ARN for the role that we need to give to the agent. All of this happened for us automatically inside of the console, but when it comes to infrastructure as code, this is how we do it.

So with that role created, we now get on to actually creating the agent itself. For that we're going to use our Bedrock agent client, and we're going
to call create agent. We pass in the name that we want, the foundation model, those instructions that we defined, and the role, and then this part here is where we configure the memory. It's where we enable that checkbox, essentially, to say that we want to have memory. So we have this memory configuration: we're using session summary memory, and we're having storage days of 30, or whatever it is that you want.

So with all of that, we've run create agent and we get the response back. As part of that response we're going to get the agent ID. Now the agent ID is one of the pieces of information you need in order to be able to invoke the agent, so I'm going to store that in a variable here called agent ID. Then I'm going to wait just briefly for the agent to be created and enter into a state of not prepared. It sounds like we should wait even longer, but that's actually the state that we're waiting for, the state of not prepared, because the very next thing that we're going to do is prepare the agent. We do that using the prepare agent command, fairly straightforward, and then we're going to wait for it to enter the prepared state. This is the equivalent, if you remember, of when I was in the console and I specifically clicked on the Prepare button. We need to do that because the next thing that we're going to do is create an alias, and that alias creates a version, and you need to have a prepared agent before you can do any of those things.

So we're going to call create agent alias. Now an alias is the second piece of information that we need if we're going to call our agent: we need the agent ID and we need an agent alias. You can think of an alias as a way to handle the deployment process of your agent. You can have different agent aliases for different versions of your agent, but for now we're just going to have one agent ID and one agent alias, which is going to relate to one version. We'll move forward with that. So once we've
got our alias, and it itself is prepared, then we're done. We've got our agent ID and our agent alias ID. Now, you can create agents with CloudFormation, but not all of the preview and public preview services are going to be available in CloudFormation yet, so that's another reason why we're doing it in Python here.

Now let me run that cell and have all of those things happen. You'll notice that in that code there was a bunch of print lines to show us what was happening at the different stages. It really just does take a few seconds for all of that to happen, and so in just a moment we'll have our agent fully created and we'll have, there we go, the agent ID and the agent alias ID there for us. OK, so now we've completely set up an agent. If we were to go back to the console, we'd be able to see it in there, but let's have a look at how we can invoke the agent using code as well.

So if I just scroll down here, I need to get myself the boto3 object for the Bedrock agent runtime now, so I can actually invoke the agent, which is now running. Let's get into this. Some of the things that we're going to need: first, the session ID. The session ID has to be the same across all of the session where you're communicating in that moment with the agent. This is all the time whilst you're chatting with the agent, up until the point where you click that button to say finish, or the programmatic equivalent of that. That's going to be a single session. So I'm just creating a UUID, a random string essentially, to represent my session ID. And then I have this memory ID, and in this particular case I'm setting that to a static string that I'm going to use for the demonstration. That's then basically going to align to me as a user, so that when I come back to this agent and this agent alias using this memory ID, I should get access to the previously stored memory of our conversations. So I need to make sure I run all these cells, so
that's run and defined, and we can look back on setting up another session ID later when we try to reset.

Next up, I've got a very large, Swiss-army-knife kind of function here which is going to help us to test this out. This is the invoke method that I've created, and it's basically abstracting us away from the invoke agent call to the Bedrock agent runtime. I've set out here all of these different parameters and what they do, but basically we're calling the agent with the agent ID and the agent alias, and we're sending in some text, the input text, which it's going to get from this function call. This is essentially going to be the "hello" and "I like drinking tea and taking walks on the beach" part of the conversation. It will come in here, and then we'll get a response back. We're also passing in the memory ID so that once this conversation is finished, it will store it in that particular location.

So I get my response back. Inside of my response I have a completion, and that is an event stream. From that event stream we need to get all of the events off and wait for it to close. The event stream will be filled with all kinds of things, but there's only really a couple here that I'm particularly interested in. One, primarily, and that's chunk. Chunk basically represents the textual output from the agent, so whenever it responded to those questions, they were arriving as chunk. Now, the agent doesn't support streaming of content back; it only supports it all coming back in one go. So yes, this is called chunk, but it will just arrive all in one go: here is all of your text.

The other thing that we'll get out from the event stream, and you might notice that we actually have an option here for show trace, which is defaulted to false: if we have show trace on, it will take out what we call the trace events. These trace events are debug, essentially, showing what the thoughts are: the reasoning steps,
the system prompt, and everything which is happening behind the scenes with our agent will come out from this trace. I have a sort of helper function here which is going to extract from the trace the memory that was stored, so that we get to see it inside of this notebook, and we'll have a look at that later on in the code execution. The final thing in here is that I have an end session function, so essentially we have the ability to finish the session such that we can start it again and see if the memory worked for us. OK, so let's make sure we run that and define that function. It's really just to help us test the concept of working with an agent, and then also test the concept of working with memory.

So, well, let's have a go. It all becomes much easier now. Let's just run the hello. We should get something very similar back to what we got previously, just a little bit more verbose: hello, I'm curious, a, sorry, AI assistant created to be knowledgeable, engaging. This much longer response here is probably very much in response to that big system prompt that we made available, or the instructions that we gave it, telling it quite specifically how we wanted it to behave.

Next, let's tell it some information about us: I really like drinking tea, it's my favorite drink, I don't much like coffee. Slightly different from what we typed in before, but it's some information about me. I then go on to tell it a bit more, and you notice it is conversing with me: "That's wonderful, I'm a big tea fan as well." Not entirely sure I believe that, but you know, a large language model, maybe it likes tea, I don't know. Then let's go to the next one here, telling it specifically how I like black tea, no sugar, blah blah blah, and more information about me. OK, so we've started that brief conversation. Now let's end the session. If I was to go back to the chat and ask it a question about me at this point, it wouldn't quite know,
because it needs to spend some time taking that conversational history and summarizing it into a memory object, which it's then going to use for subsequent conversations. So I've created this quick function here, and basically what it does is call get agent memory, which will give us the agent memory that's currently stored. I'm basically waiting for agent memory to contain something, so we're waiting here, and we'll just need to wait for maybe a minute or so for that summary to be made.

OK, so that took, well, 30 seconds in the running of this, but it took me a while to press play, so roughly about a minute in order for the memory to be captured. We can see here the memory that has been captured: the user expressed a strong preference for black tea over coffee, blah blah blah, and it starts to have more information in its memory of our conversation. So let's start a new session, and all we need to do is create a new session ID. I'm going to create a new random value for our session ID and then ask it something again. In this case I'm going to say: if you could make me a drink, what would it be? And it says, "Given your preference for black tea that we discussed previously, I believe the drink I would make for you is a nice hot cup of plain black tea." So it's remembered me from our previous conversation. We've seen there how we can see what the memory is, and we can see the memory in action here. I think now is time for a nice biscuit.

Now, what I've done here with this particular invoke is I've sent in this True value, which will go to our super-helpful function that we had before, and it will actually print out some of those trace elements for us, the debug elements. So this is what we can see. We've got the system prompt here, which is very, very wide, which is why it doesn't look very long, but that's the system prompt. The whole thing here has been built up by a
template mechanism inside of Agents for Amazon Bedrock. So we've got our particular instruction at the top of the system prompt. It then tells it about some functions and tools that it could use. We didn't configure any for this agent, but that is a normal use case for an agent. And then, well, I'm going to make this a scrollable element so I can see a bit more, because what I wanted to show you in here is how it presents the memory that it has. So it's got some more instructions here about how to work, and if I have a look here, yeah, here we go, this is where the memory is actually given over to the model.

Now all of this that we're looking at here, even though it looks a bit like XML, is actually passed to the large language model just like this, so this is what it's actually reading. More than just that summary of a memory that we saw in the previous function call, we can see here what it actually puts into the template. So "the user expressed a strong preference for black tea over coffee", which is something we've seen before, but it's pulled this out into something called facts: some facts that it's figured out from the conversation. Some user goals: what were the goals of the user during our last conversation? Assistant actions: what did the assistant actually do? And the action results. Now if this agent had access to some actual tools, some actions that it could take, then those kinds of responses could be summarized in here as well, and the agent would have this sense of what happened last time, what the preferences of the user were last time, what they got up to, so it can carry on the conversation.

OK, so let's scroll all the way down. There are a couple of other cells in this notebook, if you are following along, so you can clean up. You can run this and it'll remove the agent, and run this and it'll remove the role. But if you want to, you can go back into the code and start playing around with it a bit more, or indeed
we can jump into the console, because if we jump back into the console, if I just close this down for a moment and go back to the agents page, we'll find that yes, there's my demo agent, and there's my long-term memory test agent that we've just created using that notebook.

OK, I hope that was useful. What we've covered there is a little bit about memory inside of large language model applications, chatbots and even agents, and how we can use long-term memory inside of Agents for Amazon Bedrock to remember what happened in a previous conversation. If you are interested in this, then please do drop me a comment beneath this video. This is in public preview right now, which means we really want to hear your feedback about this service. If you've got any other questions about anything generative AI, then please do feel free to reach out. I hope you've enjoyed this video. If you have, please give it a like, subscribe to this channel for more videos just like this, and until the next time, I'll see you later.
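
For readers following along without the notebook, the invoke helper and the wait-for-memory step described in the walkthrough can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the names `memory_configuration`, `invoke`, `wait_for_memory` and `StubRuntime` are made up for illustration, and the sketch assumes the boto3 `bedrock-agent-runtime` client exposes `invoke_agent` and `get_agent_memory` with the keyword arguments shown. The stub stands in for the real client only so that the example runs without AWS credentials.

```python
import time
import uuid


def memory_configuration(storage_days=30):
    """Build the memoryConfiguration argument for create_agent (session summaries)."""
    return {"enabledMemoryTypes": ["SESSION_SUMMARY"], "storageDays": storage_days}


def invoke(runtime, agent_id, alias_id, session_id, memory_id, text, end_session=False):
    """Send one turn to the agent and return the text of its reply."""
    response = runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,   # same value for every turn of one conversation
        memoryId=memory_id,     # ties the stored summary to one user
        inputText=text,
        endSession=end_session,
    )
    parts = []
    for event in response["completion"]:   # event stream: chunk and trace events
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts)


def wait_for_memory(runtime, agent_id, alias_id, memory_id,
                    timeout=180, poll=10, sleep=time.sleep):
    """Poll get_agent_memory until a summary exists; return [] on timeout."""
    waited = 0
    while True:
        contents = runtime.get_agent_memory(
            agentId=agent_id, agentAliasId=alias_id,
            memoryId=memory_id, memoryType="SESSION_SUMMARY",
        ).get("memoryContents", [])
        if contents:
            return [item["sessionSummary"]["summaryText"] for item in contents]
        if waited >= timeout:
            return []
        sleep(poll)
        waited += poll


class StubRuntime:
    """Hypothetical offline stand-in for the real runtime client."""

    def __init__(self):
        self.calls = []
        self.memory_polls = 0

    def invoke_agent(self, **kwargs):
        self.calls.append(kwargs)
        return {"completion": [
            {"trace": {}},
            {"chunk": {"bytes": b"Hello, how can I assist you today?"}},
        ]}

    def get_agent_memory(self, **kwargs):
        self.memory_polls += 1
        if self.memory_polls < 2:   # first poll: summary not generated yet
            return {"memoryContents": []}
        return {"memoryContents": [
            {"sessionSummary": {"summaryText": "The user enjoys drinking tea."}}
        ]}


runtime = StubRuntime()
session_id = str(uuid.uuid4())      # new random ID per conversation
memory_id = "demo-user-memory"      # static string, as in the demonstration
print(invoke(runtime, "AGENT_ID", "ALIAS_ID", session_id, memory_id, "hello"))
invoke(runtime, "AGENT_ID", "ALIAS_ID", session_id, memory_id, "end", end_session=True)
print(wait_for_memory(runtime, "AGENT_ID", "ALIAS_ID", memory_id, sleep=lambda s: None))
```

With a real account you would pass `boto3.client("bedrock-agent-runtime", region_name=...)` as `runtime` instead of the stub, and pass `memory_configuration()` as the `memoryConfiguration` argument when calling `create_agent` on the `bedrock-agent` client. Starting a new conversation that can see the stored summary is then just a fresh `session_id` with the same `memory_id`.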