Hi everyone, welcome back to KDAN Live, we are at episode eight. At KDAN we bring you actionable insights so that you can empower your business, your team, or even yourself with the power of AI and generative AI, like LLMs. Before we get started, don't forget to subscribe to our channels: we have YouTube, LinkedIn, Facebook, and TikTok.

Today we have two interesting sections and topics. The first section is about Ollama, an application that allows you to manage open-source models like Llama 3, Gemma, and Phi-3. You can download and run these LLMs on your laptop or on your machines, and you can even host an API so that you can build applications that call into that API. In the second part we are going to build an application that summarizes a YouTube video for you, and in that section we are going to use four tools: the first one is Ollama itself, the second is the Llama 3 model (the LLM), the third is the LangChain framework, and the last is Gradio, to build a user interface. Let's get started.

The first section is about Ollama. We will walk you through how to use the tool, but first, what is Ollama? Ollama is an open-source application that allows you to do these things. You can use open-source large language models like Llama 3, Gemma, Phi-3, or Mistral: you download them to your machine, to your laptop, and after that you can run these LLMs and have a chat, like an assistant, even on your laptop. Then you can customize your chat sessions with the LLM by configuring system prompts or other parameters such as temperature or context length. And lastly, this is important, and this is key for Ollama: by using a Modelfile you can easily manage your LLM configurations. For example, after you have set a system prompt, you can save it and then easily load it back, so you can chat with your system-prompted LLM. And when you have this Modelfile, you can send it to other people and share your configuration with them easily.

Ollama will probably be liked by developers, because you will likely need to interact with it via a command line, like this, so it might look overwhelming to beginners. But it's not that difficult, I think. I can walk you through it, and if I can do it, you can do it.

Now, what kinds of models are supported directly by Ollama? As of May 2024, these are the models supported by the Ollama framework, or application. For example, you can download Llama 3 using the "ollama run llama3" command line. You have the Phi-3, Mistral, Neural Chat, and Starling models; you have Code Llama; you have the LLaVA model, which is a multimodal model, so it allows you to input text and an image. For example, you add an image and you ask a LLaVA model to describe the image. And obviously you have Google's model, Gemma, in your inventory as well. Ollama will gradually support more models in the future, for sure, this is how it's going, and you can actually check here for supported models: you can search for a name you are interested in and see whether it's supported by Ollama or not. You can see here that it also supports the Command R model from Cohere, I suppose; CodeGemma, a Gemma fine-tuned to be really good at coding and programming; DBRX from Databricks; Qwen, for example (I didn't know where that one is from... oh, it's from Alibaba); you have Dolphin, Llama 2, and the list keeps going. And when other LLM providers release open-source models, they should be added to the Ollama platform shortly after.
Now, the Ollama platform works most easily with GGUF files: if you have a GGUF file, you can import it into a model very, very easily. But if you are familiar with, or want to use, other formats like safetensors (I forgot the "e" here on the slide, there should be an "e" over there; it's another format that stores the model weights), you can work with that in Ollama as well, but that is a more difficult path. So yes, you can import safetensors files, but you have to go through all these steps. It's not that bad, I did it myself, and you can easily follow the guide.

So what is new in May 2024? There's Llama 3 LLaVA, a multimodal (meaning text-and-image) enhanced version of Llama 3, and this is what we are going to have a play with today as well. We have a Phi-3 enhanced with LLaVA, so multimodal as well. You have StarCoder2 and CodeGemma; these are interesting if you are a programmer. Maybe you want a large language model to fill in the docstrings for your code, and Code Llama might be able to do that as well.

So what can you do with Ollama? These are the things you can do once you download the model, the LLM. You can start a chat conversation with it, so you have access to a GPT- or Gemini-style assistant: you can ask it whatever you like and carry a conversation with it. The next thing is that you can also ask an LLM for an answer based on a given text. This is an application, or technique, called RAG, retrieval-augmented generation: you ask something and you want the LLM to answer based on that given context, and Ollama has a function that supports that directly as well. The third thing you can do is ask the LLM about an image, given that you are using a multimodal model, like the LLaVA variants, for example. The last two things are about serving your application: Ollama allows you to serve up the LLM as a local API, so you can host your LLM locally on your machine, and you can also serve it from Docker as well, but I'm not going to cover that today.

OK, nice, you can do all these things, but in what ways can you interact or work with Ollama? There are three main categories. The first one is interacting with Ollama via a command line, for example the Terminal on macOS; this is what I'm using today. Secondly, you can work with Ollama via a Python library as well, if you have installed it. This is how you start: you import ollama, and then you can do most of the things that the command-line commands can do with Ollama. And the last thing is that it has a lot of integrations, so you can check out this page, go down... oh, maybe this is the wrong one, I need to correct it, that's my bad. But basically, if you come to the correct README file and get to the bottom, there's a section about community integrations, and you can check out all these tools and see which ones are interesting to you. One that I looked into is an integration with a database tool called MindsDB; if your application will involve any database, maybe check out that integration as well. OK, I need to correct this link.

In more detail: when you download Ollama from its website and install it, it comes as an app, and if you run the app it will automatically set up a port for you. You can access it via localhost, and the default port value is 11434. So that's the way to call in to your locally hosted LLM.
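By the way, since that port is just a plain HTTP endpoint, you can already poke it yourself. Here is a minimal sketch (not the exact code from the video; it assumes the default port and that llama3 has already been pulled) of calling the locally hosted server from Python:

```python
import requests

# POST to the default local Ollama endpoint; stream=False returns one JSON object
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])  # the generated text
```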
Now, you can change the behavior of these defaults, and other default values, in the environment variables. I'm not going to show you today, but you can do that, and you can see here that these are the environment variables you can change; one of them covers the host and port number. I'm just going to use the default value, because I don't have any problem with it.

OK. If your Ollama app is running, this port will be occupied, and when you run "ollama serve" it will crash: it will throw an error saying the address is already in use. Basically it's telling you that Ollama is running at this port already and you don't need to start it again. But if it happens that you have shut down the application, basically you killed the app... actually, I can show you right now. Here at the top of my screen, this is the Ollama icon; it means it's running right now. So if I kill this app and I want to serve my API locally, I need to run the command "ollama serve". That's what it is.

All right, I'm going to show you how to work with Ollama in steps, but before I do that, I will explain what is happening. The basic workflow is this: you need to download the large language model to your computer first. For example, if you want to use Llama 3, you need to download Llama 3 from the cloud to be stored on your laptop, and you do this with the command "ollama pull llama3". It will take some time, depending on the size of the model you are downloading and the internet bandwidth you have right now. This is an example from when I started downloading Llama 3: I just typed in the command "ollama pull llama3" and it started the download. After you have the model on your computer, you can start a session chatting with it, and you do this by saying ollama run plus a model. As an example, this is the way to do it: "ollama run llama3". So let's just do it right now, and then we'll come back to that website later.

One of the other commands lists out the models I have on my local computer: I can do "ollama list". It gives me the models I have. This is the original model that I downloaded, and these are derivative models; basically I set some system prompts and saved the models, so I can load them back up later, but that will be discussed later today. Let's focus on what we have right now: we downloaded Llama 3 and we want to start chatting with it, so I will say "ollama run llama3", and it's now preparing the chat session.

This chat session has history as well; it keeps history by default, but you can turn that off later. It's easy, you don't need to remember many of the commands here, because the interface is nice. If I have any doubt about what I should do, I can just type /? and these are the available commands. Say I want to know the model information: I can type /show... oh, OK, I need to be more specific, /show info. So right now I am chatting with the Llama family. This model has a size of 8 billion parameters and has been quantized to a level of Q4_0; you can look up the meaning of this quantization on the internet, but that's not the topic for today.

OK, let's start chatting. "Hello, I'm Ben, who are you?" Here you go, the Llama model starts chatting with me: "nice to meet you, Ben", blah blah blah, and so on. All right, that's the first one. "Tell me, what is 1 + 1?" It should be two, right? Drum roll... right, and it has some jokes in it.
So I can say that large language models are not so good at calculating math, but this one is easy enough for the model: 1 + 1 is all over the internet, and 1 + 1 equals 2. All right, I'm going to test whether it has memory, whether it can basically remember what my name is. "Can you recall my name?" Right, so you can see it has some memory to start off with. If you go into the detail, you can dig up how many turns of conversation it can keep in memory, and what happens if it overflows, meaning you have too many turns of chat, and what Ollama's default way of handling that situation is. You can dig into that in more detail, but this is a beginner class, so I'm just going to keep it clean to start off.

All right, here we go. What other commands can I use? The /set command allows you to set many, many things. What things? Well, you can set parameters, you can set system prompts, you can set templates, you can turn history on and off, you can set whether you want word wrap or not, and many other things, like quiet, which disables the LLM stats (I'm not sure why I would need that, but anyway). So, let's say parameters: what can I do inside parameters? You have many things, and one of them is temperature. A low temperature means the model is more deterministic; a higher temperature increases the likelihood of the model generating creative answers. So if you want a model to help you create stuff, turn up the temperature. I'll try "/set parameter temperature 0". OK, now it's set to zero. Let's say: "write a one-stanza poem about a cockatoo flying in the sky." I'm not sure if it's going to do it well; I turned the temperature down to zero, so it should not be so creative. "With feathers bright ... the cockatoo ... its spirit bold, it rides the wind, with wings outstretched, a flash of white against the blue ..." Well, not bad. Let's try it again, but with the temperature set high. "Write a one-stanza poem about a cockatoo flying in the sky." "Softly swooping through the azure air, the cockatoo's majestic flight is shared, with feathers ruffled and eyes aglow, it rides the wind, its wings spread, soaring low." I think that's better than the temperature-zero one.

OK, so now you know how to set things up. Another command I can use is /clear; basically, it clears out everything. Let's try it: clear out everything, and now I want the model to answer my name. "Can you recall my name?" It should not be able to answer, because we have cleared all the conversation that happened before this point in time. It keeps apologizing, but basically it doesn't have any context to work on, and that's fair. All right, if you want to quit the chat session, one way is just to say /bye, and that's it, you have quit.

Now, you can see that there are also load and save commands. You can try these yourself; I invite you to. Keep chatting a few turns, maybe introduce your name to it, then save the model, and when you load it back, check whether the session chat history is carried along and retrieved. I want you to just try it out. So that's one way: you download the model, and once the download finishes, you just start using it right away.
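Everything we just did in that CLI session, the memory and the temperature setting, can be reproduced from the Python library mentioned earlier. A hedged sketch (my own illustration; it assumes llama3 is pulled and Ollama is running): the "memory" is nothing magical, you just resend the conversation so far.

```python
import ollama

# first turn; options carries sampling parameters such as temperature
history = [{"role": "user", "content": "Hi, my name is Ben."}]
reply = ollama.chat(model="llama3", messages=history, options={"temperature": 0})
history.append(reply["message"])  # keep the assistant's turn in the history

# second turn: because we resend the history, the model can recall the name
history.append({"role": "user", "content": "Can you recall my name?"})
reply = ollama.chat(model="llama3", messages=history, options={"temperature": 0})
print(reply["message"]["content"])
```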
Another way, which is quite interesting, is by using a Modelfile. Ollama provides a way to save a configuration and load it back up very, very quickly, and then you can even share that configuration with your friends. So let me just clear, and go to the location where I prepared my stuff. All right, let me list out the models I have. One of the commands removes a model you already have, so let me show you that: "ollama rm pizza-chef"... ah, OK, sorry, I need to give the exact model name... there, it's deleted now. So these are the models I have at this point.

OK, what is a Modelfile? A Modelfile is a file that contains the configurations you want. You have seen previously that you can set the temperature and you can set a system prompt; you can do that here, in this Modelfile, and when you create a model from this Modelfile (I will show you in a moment), the settings, the configurations in the Modelfile, will be set properly for you. One of the Modelfiles I have created is a pizza chef. Use Vim to create and edit it, and make sure you do not append any extension to this file: for example, this file must be "PizzaChef", not "PizzaChef.txt". At least, I tried a version with the extension myself and then I couldn't work with Ollama, so remove the extension.

All right, I'm just going to open up PizzaChef. In this file, you always start with FROM: FROM tells Ollama which base model this model is going to be based on, in this case the Llama 3 base model. Then I prime the model with a system prompt; I ask it to be this: "You only answer about pizza, and you hate pineapple on pizza." That's it. You can have more parameters in it, like temperature; you can add more, and the way to do this in detail is to come to this file on GitHub and read the documentation. If you want to set a parameter, say a temperature, what are you going to do? So let me just add a temperature in there... oh, OK, I'm not so good at using Vim... oops, OK, here we go, I just type "i", isn't it? Uh-huh, OK. I put in PARAMETER, and what else? It needs to be followed by temperature, right, and here is an example, temperature in lowercase. Let's say I want it to have a very high temperature. All right, I saved my PizzaChef; just to make sure the thing stays here: this is PARAMETER, and temperature 0.9.

So I have my Modelfile ready. I can send it to other people, so that when they receive this Modelfile they can create a model with the right parameters set, as in this PizzaChef file. They would do this: "ollama create", then you give it a name (I would say pizza-chef), and you point Ollama at the Modelfile you want the model created from. That's it, it's as simple as that. Now you have a model called pizza-chef on your laptop.
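If you would rather script this than edit the file in Vim, here is a sketch of the same Modelfile workflow driven from Python. The file contents mirror the PizzaChef example above; writing the file and shelling out to the documented "ollama create" command is my own workaround, not something shown in the video:

```python
import subprocess

# the same three pieces: base model, system prompt, and a parameter
modelfile = """FROM llama3
SYSTEM You only answer about pizza, and you hate pineapple on pizza.
PARAMETER temperature 0.9
"""
with open("PizzaChef", "w") as f:  # note: no .txt extension
    f.write(modelfile)

# equivalent to typing `ollama create pizza-chef -f ./PizzaChef` in a terminal
subprocess.run(["ollama", "create", "pizza-chef", "-f", "./PizzaChef"], check=True)
```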
And then you can start chatting with it: "ollama run pizza-chef". Now the model is being loaded, and here: "Hello!" Oh, very Italian. "Congratulations, welcome to my pizzeria, what can I do for you?" See this? I think this is the effect of the temperature being very high, like 0.9; the text generation looks very creative. "We got a fresh batch of dough just out of the oven, and I'm telling you, it's going to be a masterpiece. You like a classic margherita, or maybe some meat lovers? Just don't even think about putting pineapple on it, capisce?" Good. OK, I'm just going to ask: "what is the pizza that has mushroom toppings on top?" What's the answer? "You're talking about the funghi, favorite amico." Is "amico" Italian? I do not know. All right, so it has a pretty good answer: "it's the famous Funghi Italiano, a classic combination of sautéed mushrooms, fresh parsley, and a sprinkle of Parmesan cheese on a bed of rich tomato sauce, and let me tell you, it's going to be love at first bite." Huh, no pineapple. Remember that I primed the LLM with a system prompt to hate pineapple on pizza. How is it going to handle it if I say "I like Hawaiian pizza"? "Oh no, oh no, oh no, you can't be serious. Hawaiian, with pineapple and ham? That's like putting ketchup on a good steak." Oh yeah, that's true. "No way." All right, so this is it: you have created a model based on a Modelfile. Now, what happens when you save your sessions? Check it out, try it out. Eventually you will be saving another model, so go check whether the history gets tagged along with it. But for now, I am going to say goodbye to my Italian chef.

All right, where were we? We were here: I have introduced you to another workflow, where you create a Modelfile, and this Modelfile holds the important parameters and configurations you want to set up, like the FROM keyword and a system prompt; if you want, you can set up other parameters as well.

In the next section we take a look at popular, often-used commands for the command line. You have seen "ollama list": it lists the models you have on your laptop. "ollama pull" plus the model name will download or update the model you want. You have already seen "ollama rm": it removes a model you don't want anymore. "ollama serve" you haven't seen, but I can show you right now. Here I have the Ollama app running in the background, meaning one of the ports, port number 11434, is already taken by Ollama, so if I type "ollama serve" I should get an error. OK, "ollama serve"... here, I've got an error saying this port is already in use. Basically, I can make an API call to this server right now; but if I turn off, or kill, my Ollama app, you will see later that with a Python API call I will get nothing, I will get an error actually, because Ollama is not running anymore. What you need to do is serve it, and you can do so like that, and now you have the Ollama application running in the background again, just not from the app interface. But I don't want this one from the terminal, so I'm just going to kill it and turn Ollama back on from the app. Here we go, the little llama icon is back. And you have seen this type of command, "ollama create", which creates a model based on a Modelfile. You can find more commands in this reference, so you don't have to remember everything right now; you can always come to our blog on the KDAN website, and you will see the Ollama beginner guide as one of the posts there.
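For reference, those command-line verbs have Python twins in the ollama library. A quick sketch, assuming the app (or "ollama serve") is running; the exact shape of the returned objects varies a bit between library versions, so treat this as illustrative:

```python
import ollama

print(ollama.list())          # ~ `ollama list`: models on this machine
ollama.pull("llama3")         # ~ `ollama pull llama3`: download or update
print(ollama.show("llama3"))  # ~ `/show info` inside a chat session
ollama.delete("pizza-chef")   # ~ `ollama rm pizza-chef`
```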
OK, what we'd like to cover now is inputs, the ways to input your text during a session. I'm going to go over multi-line input rather quickly: if you want to type in multiple lines, you can use triple quotes, similar to what you would use in the Python programming language for multi-line text.

What is more interesting is multimodal input, multimodal meaning text and image. One way is to type in the text that you want, and at the end add a path to your image, and this is what I'm going to show you. But again, make sure you are running a model that is capable of multimodal input, and one of them is the LLaVA model. You download it first, "ollama pull llava", and then you can start using it. (In a moment I'll also show the same call from the Python library.) Before I go into that: I have my image in this folder right here. It's actually a picture of me, but style-transferred to be a little bit more punky. I don't want to show you what it looks like right now, but if you've seen my social profiles, that's the image I'm talking about.

All right: "ollama run llava"... it's taking quite some time... here we go, it's ready. "Hello", let's start by greeting. OK, great... now, OK, what is happening? Well, I don't understand that. Anyway: "describe this image in detail", and the path is ./image.png. What happens in the background is that the image is loaded and provided to the model along with the text "describe this image in detail". The answer comes back: "the image is a digital illustration of an adult male" (correct); "he has short black hair and a light beard" (OK, correct); "his eyes are looking to the left side of the image with a slight smile; he is wearing glasses" (correct) "with a bluish tint on his face, and he's dressed in a dark-colored shirt" (OK, sort of correct) "with rolled sleeves" (correct); "the character appears to be standing outdoors" (incorrect, but whatever) "as suggested by the blurred background, which includes what looks like an overcast sky and buildings in the distance; there is no visible text or distinctive markings on the character's clothing or in the image" (correct). So it's doing really, really well... not really really, it's doing a decent job of explaining this image to me. I'm a little bit surprised at it replying to me in Korean at first, but whatever it is, the LLaVA model can handle two types of input, text and an image, and now you know how to work with this model in Ollama. /bye.

Passing an argument is something you can do as well, and this is similar to, actually it is, a built-in RAG-style application. You can try it yourself. This is an example: the text is "summarize this file:", and then you point to the file location, but make sure to wrap it with the dollar sign and brackets, and you use the cat command to load that file, like "$(cat ...)". I suggest you go take a look and play with this one.
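Here is that LLaVA call again, from the Python library this time. A small sketch (my own illustration; the image path is the same placeholder as in the demo): the message simply carries an images list next to the text.

```python
import ollama

reply = ollama.chat(
    model="llava",  # a multimodal model; pull it first with `ollama pull llava`
    messages=[{
        "role": "user",
        "content": "Describe this image in detail.",
        "images": ["./image.png"],  # loaded and sent along with the prompt text
    }],
)
print(reply["message"]["content"])
```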
OK, now I think it's the last section. Yes, it is the last section. When you have Ollama running in the background, meaning you have it running as an app or you have called "ollama serve", you can use Ollama's integration with Python. Why is this useful? Because in Python you have a plethora of other tools for building applications, like LangChain and Gradio, and this is going to be our next topic. But before I go there, this is one of the ways you can write code that calls it from Python: you import ollama, then you say chat, "I want to chat with llama3, and these are my messages", and you send that to your locally served, locally hosted Llama 3, and you get the response back. So in this code you are actually using the ollama library to communicate with the server you have at the back.

You can also do this in the LangChain framework, and this is the way: you import Ollama from LangChain's langchain_community.llms module, which has a nice integration. Similar to what we usually do, you say which model you want to use, you point the base URL to your locally served API at localhost 11434, and you can also set the temperature along with it.

I think this blog page gives you decent information, as a beginner, on using Ollama as a tool to download selected open-source LLMs, so you can run a ChatGPT-style assistant and have a chat with it. It can help you with many, many tasks, like creative tasks, summarization tasks, or extraction tasks; most NLP tasks, this model will be able to handle for you. You can even have your own locally hosted, local API, so that when you build other applications, you can call back to this API and get the LLM's response.

All right, before we move on, don't forget to subscribe to our channels: we have YouTube, LinkedIn, Facebook, and TikTok. Subscribe to our channel, or go to our website and subscribe there as well; it's a way to make sure you get notified of our latest updates, or tips, or maybe some freebies. I'm planning to give out some freebies in the future, soon, so don't forget to subscribe to our channels.

All right, march on to the second part, where we are going to build an application that lets you summarize a YouTube video. One of the tools we'll use is Ollama, but we'll also use the Llama 3 model, LangChain, and Gradio to build it. This is the output of today's code: a YouTube summarizer powered by the Llama 3 large language model. It starts with putting in the URL of the YouTube video you want; then you can get the title and the description of that YouTube video; then you click this button and you get a transcription, and you can see the full transcript here ("get transcript", I should just change that label, but whatever). You can also see an estimated token count for this transcript. And when you click "summarize", this full transcript is summarized by the LLM for you. That's the end of today's journey.

Let's start with the basic idea of the summarization task. Basically, you have a source that you want to summarize; it can come in many forms, like text files, PDF files, Excel, Word. It can even be a URL, in HTML format, or perhaps a database, like data frames. You have a source and you want the LLM to summarize it. When you have a source, you need to load the text so that it's ready for summarization, and when you load it up, you form the prompt. This is one of the prompts that is possible for this task: the prompt says "summarize the following text", then you put what you have loaded into that prompt, and you end the prompt there. This prompt will effectively be understood by the LLM, so that it creates a summary of the given text.
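Here is a minimal sketch tying those two things together: the LangChain Ollama binding from a moment ago, plus the basic summarize prompt. It assumes Ollama is serving Llama 3 on the default port, and the prompt wording just mirrors the slide:

```python
from langchain_community.llms import Ollama

llm = Ollama(
    model="llama3",
    base_url="http://localhost:11434",  # the locally served API
    temperature=0,                      # keep the summary deterministic
)

text = "(whatever you loaded from your source)"
print(llm.invoke(f"Summarize the following text:\n\n{text}"))
```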
This is the basic idea: you have a source, you load it into text, you craft a final prompt with the text you want to summarize embedded in it, and then you give this prompt to the LLM. Simple. But there are two main issues here. One of the issues is that there can be non-text elements in the files, for example images, or embedded objects like audio files and graphs. Most LLMs cannot handle these elements out of the box; unless people do some coding or manipulation, the LLM will usually not handle non-text elements. So that's the first challenge. The second challenge is that the text you want to summarize can end up being too long, longer than the LLM's context length. Remember, the LLM's context length is the maximum number of tokens that can fit into the LLM; if you have something longer than that, you cannot fit it in, and you'll get an error. If you think of it as a human eating, the context length of the LLM is how big the human's mouth is for eating and digesting the food. So if you have something you want to summarize and it's too long, over the context length, what are you going to do? That's the second challenge.

In this video we will provide a way to handle challenge number two, where the context length is overflowing, or the text is too long. The idea is this: you add a few more steps in between. Again, you have the text you want to summarize, assuming you have loaded it into memory already, and you know it's too long. What you can do is break it into chunks. The green chunk is the first chunk, and you have a yellow chunk, that's the second chunk, and you can see that it's possible to have an overlap between adjacent chunks. This is to make sure, or to encourage, the model to at least understand what happened in the chunk before, and it's one of the parameters you can adjust in your chunk-splitting process. Then you can imagine having as many chunks as you need to cover the whole text. For each chunk, you craft an associated prompt, similar to the prompt type before, but in the text placeholder you only put the chunk you want summarized, and you provide that prompt, say prompt one, to the large language model to summarize, and you get one summary. Then prompt two and summary two, prompt three and summary three, and so on. This step, by the way, is the map step: you map a chunk to its summary. The final one is the reduce step: now you have a list of summaries, and you just want to reduce it to one final summary. The final prompt might not be exactly like this, you can customize it, but it's going to be along the lines of "combine the following separate summaries into a coherent summary", and you provide the list of summaries to the end. Then you give this to the LLM, and you get the final summary. So that's how easy it is, but it can be a lot of coding if you want to do that yourself.
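Before we get to the LangChain one-liner, here is that map-reduce idea by hand, as a rough sketch. The prompt wording is mine; it assumes llm is a binding like the Ollama one above and that each chunk already fits within the context length:

```python
from langchain_community.llms import Ollama

def map_reduce_summary(llm, chunks):
    # map step: one partial summary per chunk
    partials = [llm.invoke(f"Summarize the following text:\n\n{c}") for c in chunks]
    # reduce step: merge the partial summaries into one final summary
    joined = "\n\n".join(partials)
    return llm.invoke(
        "Combine the following separate summaries into one coherent summary:\n\n" + joined
    )

# usage: map_reduce_summary(Ollama(model="llama3"), ["chunk one ...", "chunk two ..."])
```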
Luckily, LangChain, a famous tool for building LLM applications, has a wrapper function called load_summarize_chain that lets you build this type of summarization application, especially when the context length is overflowing; LangChain provides the tools to do this in just a few lines of code. And this is an example over here. In this example, some functions are imported properly, the source is a URL, and the text of the source is loaded into documents. Then, in this example, LangChain says: OK, I'm going to use an OpenAI chat model, GPT-3.5 Turbo, as my language model. And that's it: you get a chain by calling load_summarize_chain, passing in the LLM and the chain type you want.

Now, there are three chain types. The first one is "stuff": stuff is when you stuff everything into one prompt. It's applicable when the text you summarize is small enough to fit within the context length; yeah, it's just the basic thing, stuff everything in there. But if you know the context length is overflowing, you need to use either map_reduce or refine. I'm going to leave the refine chain type for you to discover, but yes, it can be as simple as replacing "stuff" with "map_reduce".

Instead of that presentation slide, you can also come to my blog to read over this again; I think I provide more information in text there. I start with why we need summarization. It has value because there are many use cases. For example, you can summarize reports and documents if they're too long. You can summarize a customer chat: if a customer is chatting with your company, you want to summarize what is happening and what kind of action your company should take. And if you are learning new things on YouTube all the time, and most of the videos are very, very long, an hour at least, like this live video for example, you can use this to summarize your YouTube videos as well. There's a catch: as I mentioned, the LLM we're using, Llama 3, works on text, so it's not going to summarize the YouTube video based on its video; it summarizes it based on its transcript, its transcription. Another application, similar to the YouTube video, is summarizing a very long podcast, given that you can get a transcript from it. OK, so here you have the basic idea, the potential issues, and the map-reduce approach.
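Sketched with the locally served Llama 3 instead of the GPT-3.5 model on the slide (that substitution is mine), the whole thing really is a few lines; the Document construction here is just to make the example self-contained:

```python
from langchain_community.llms import Ollama
from langchain.chains.summarize import load_summarize_chain
from langchain_core.documents import Document

llm = Ollama(model="llama3", base_url="http://localhost:11434", temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")  # or "stuff" / "refine"

docs = [Document(page_content="first chunk ..."), Document(page_content="second chunk ...")]
result = chain.invoke(docs)   # newer style; more on run vs. invoke below
print(result["output_text"])  # the final combined summary
```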
One more thing: I showed previously that you can use chain.run. That was the old style; in the newer updates, the run function is encouraged to be replaced by the invoke function, and it's just as simple as replacing run with invoke.

All right, enough theory and presentation, let's go to the coding. Here we go. So, I have prepared this so that it's ready to run and produce a Gradio interface, but I can walk you through what is happening here. (Oh, sorry guys, my eyes are getting itchy.) Anyway, first we load whatever tools we need. We are using the pytube library to get information from a YouTube URL link. Gradio is for building an interface. requests is for retrieving information via a URL, and re is needed to help us, in this case, extract the YouTube description from a given link. LangChain is going to help us work with large language models, prepare prompts, and parse the outputs for us. tiktoken is the library we are going to use to estimate the token counts we need to process.

All right, that's the imports; this next section is the helper functions. This function, as the name says, gets the YouTube description if I provide it with a URL link. I didn't write this myself, it's from Stack Overflow; thank you very much to those who provided it to the public. This one is the YouTube video info: given the YouTube URL link, what I get out is the title of that video and the description, where I call the previously mentioned function. This one is going to help me load a YouTube transcript: so now I have a YouTube transcript as a source, and I want to load it so that I can be ready for the splitting step. I'm using a tool from LangChain here as well: LangChain has a loader called YoutubeLoader, where you give it a URL and it just extracts the transcription for you. Play with it, and you'll see what it looks like. Then I wrap it up so that I can display it in my Gradio interface, nothing more than that.

OK, so get_text_splitter is handling this part, the splitting part. By the way, when I showed you guys those colored chunks, that was just theoretical; it's not as rigorous as when you actually use tools like RecursiveCharacterTextSplitter or a sentence splitter. For those kinds of technical details you can read the documentation. But again, LangChain provides text splitter tools for us to use, and the one I'm going to use is RecursiveCharacterTextSplitter, where I provide the chunk size, how big each chunk is, how many tokens I want as a maximum, and the number of overlaps between adjacent chunks. Then, for this one, I give in the URL of the YouTube video I want, and I get the transcript back as text, and also how many tokens are in it.

And eventually, this is the punch line here: this function gets called when the summarize button is clicked. It takes an input URL, the YouTube URL, and it has a temperature, a chunk size, and an overlap size. So basically what happens is: first, the document, the source, the URL, is loaded as the text to summarize; then the chunking step comes in, using a text splitter, and now I get chunks; then I define the language model I'm going to use, and thanks to Ollama running in the background, my LLM is sitting on my laptop, available at localhost port 11434, with whatever temperature I set.
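A condensed sketch of those helper steps (the URL is a placeholder; YoutubeLoader needs the youtube-transcript-api package installed). One honest caveat in code: the splitter counts characters by default, and the tiktoken encoder is a GPT-style tokenizer, so both numbers are only estimates for Llama 3:

```python
import tiktoken
from langchain_community.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

url = "https://www.youtube.com/watch?v=..."        # placeholder
docs = YoutubeLoader.from_youtube_url(url).load()  # transcript as document(s)

enc = tiktoken.get_encoding("cl100k_base")         # GPT-style tokenizer, so this
tokens = len(enc.encode(docs[0].page_content))     # count is only an estimate
print(f"~{tokens} tokens in the transcript")

# chunk_size / chunk_overlap are measured in characters by default
splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
chunks = splitter.split_documents(docs)            # a list of smaller documents
```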
Then I create a chain using the helper function load_summarize_chain, and I want a map_reduce process, or approach. When I have my chain, I can just invoke it, I get an output, and eventually an output text. Very simple, right? You might see that there are many functions, but most of them follow the same steps: load the text from a source, split the text, define your LLM, define a chain, and run it. And these functions are coupled with the Gradio interface. I am going to provide this code on the KDAN GitHub, so don't worry, you don't need to copy and paste from here. When I run this code, it creates a user interface via Gradio, where all the buttons are placed according to my design, and yeah, we can have a go.

OK, by the way, this is a Jupyter notebook, so you can have the Gradio interface embedded here, but what I want to show you is actually here: I prepared main.py, which can be run from the command line, and at the end of the code, in demo.launch, I set share to true, meaning I eventually get a link that I can send somewhere else, so that anywhere in the world (with internet, of course), the user should be able to run this app. So let's get started. OK, let me just pull up a terminal. All right, this one is... yes, I'm using the right environment, so I can run it. Ta-da... not yet... OK. So if I want to do it locally, it's here, but if I want to send someone else this link, they will be able to run this too, so let's actually use that one.

Now I put in the URL, and it's loading the interface. OK, this is how it looks. Now I have all the default values, just to show you quickly. OK, I want to get the info. It takes a while, but what it's doing here is initializing whatever it needs to talk to the YouTube server, getting the title back, getting the description back, and populating this. To make sure that I'm not fooling you guys: here we go, yes, this is the video. I created it; it's about using generative AI when I traveled to Tokyo, my first time there, and what kind of help I can get from generative AI when I travel. Check it out when you have time. So this is the title, "GenAI helped me travel in Tokyo", blah blah blah, and the description is here: it starts with "three examples" and ends with ChatGPT (oops, sorry guys, not this page), so here, it starts with "three examples" and ends with ChatGPT. So I was able to retrieve the title and the description of a YouTube URL correctly.

Next, I can get my transcription. Clicking this button talks to the YouTube server again, gets back the transcript, and computes, using tiktoken, how many token counts there are for this transcription. So here is the transcript from my YouTube video, here we go, and now you are ready to summarize this transcript. Now, make sure you note this: I made it in such a way that you can change the chunk size and overlap size parameters easily, and the temperature as well. But again, this is a summarization task, so typically (excuse me) you want to set the temperature as low as possible. Now, Llama 3's context length is about 8,000, if I remember correctly, so this one should be able to fit in one go, with this one prompt.
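A stripped-down sketch of how an interface like this gets wired up (the real main.py has more panels and helpers; the function body here is a stand-in):

```python
import gradio as gr

def summarize(url, temperature, chunk_size, overlap):
    # stand-in: load the transcript, split it, build the chain, return the summary
    return "summary goes here"

with gr.Blocks() as demo:
    url = gr.Textbox(label="YouTube URL")
    temperature = gr.Slider(0, 1, value=0, label="Temperature")
    chunk_size = gr.Number(value=2000, label="Chunk size")
    overlap = gr.Number(value=200, label="Overlap")
    out = gr.Textbox(label="Summary")
    gr.Button("Summarize").click(
        summarize, inputs=[url, temperature, chunk_size, overlap], outputs=out
    )

demo.launch(share=True)  # share=True hands you a public link you can send around
```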
But let's fake it out... well, you don't have to fake it out. Let's say the chunk size is 2,000. Actually, it's a no-brainer that it will take one computation, one summarization task, and that's it, because this thing can fit into that. While it's being calculated, let me just prepare a second video, which is much longer, from my channel... OK, right here, I am going to go to a very, very long live session, which is almost two hours, and I copy that URL.

All righty. So, it's taking quite a while; remember, I'm running this LLM locally on my laptop, and my laptop is an M1 laptop, so it can calculate it, but it takes quite some time. I forgot to show you that if you run "ollama serve" (not the app), you can actually see the computation happening in the background. OK, it's taking quite a long time, to be honest, so while it's being calculated, I can point out several improvements we can make to this. First thing is... OK, now it just interrupted me. All right, sorry, I'll come back to the ideas later, but this is the summary: "A traveler uses Google's Gemini app to navigate Japan without speaking Japanese" (correct), "relying on generative AI for local information". Part one, yes: if you don't know anything about that particular region of the world, you can ask for the culture and manners you should observe in public, or areas you should avoid. You can get Gemini to help with recommendations for cafes and restaurants, because it has an integration with Google Maps. And lastly, translation. This is a little bit of an understatement: in my video I actually mean that Gemini helps you understand from an image. If you go somewhere and you see, OK, I want to buy this product, but it's not labeled in English at all, you can try asking Gemini to explain that product by taking an image of it and asking it to explain it to you. This is what I did, and it was actually quite helpful: I had a headache and wanted to take some medicine, but I wasn't sure which one I should take, and with some help from Gemini I found the medicine I wanted. Next: "while the AI is helpful, it has limitations in understanding cultural nuances and providing information on low-resource languages like Japanese" (correct, that's what I said). "The speaker concludes that while AI can be a useful tool, human interaction and research are necessary for a deeper understanding of a place." That's right, at least for now, when the internet, or the LLMs, haven't been able to do extensive research on their own for low-resource languages like Japanese.

All right, so that was a short transcript that fits in one go. Now let's try another URL, the longer one. Let's get the title: "how to hire" gen AI, LLM tokenization and tokens, and another topic about building a customer chatbot with agents and RAG. This is correct; you can go back here and check. Now let's get a transcription from it. Let's see... wow, that's quite a lot: 14,000 tokens, definitely overflowing the capability of Llama 3, which, at least in the version I use, is 8,000. So for the chunk size I am going to say: OK, try your best to understand a chunk size of 5,000. That should put me at around four or five chunks, depending on the overlap size. Let's say I want to keep the overlap at about 10% of the chunk size. And just for fun, maybe increase the temperature... well, OK, I'm not going to do that; I want to make sure it works properly, so I keep the creativity of the model low.
So now I'm going to hit the summarize button, and it's going to power through the different chunks, creating a summary for each chunk, and with all those sub-summaries it's going to write a final summary for us. Now, while it's doing its job, for, I hope, not longer than two minutes, I can point out what can be improved upon.

First: you know, this can be changed to accommodate many other things, like uploading a file. You can add tabs, where one tab is a YouTube link and another tab is a PDF, or (excuse me) an Excel file, for example. This is just a source, a location of a source: for YouTube it can look like this; for a file it can be a place where you upload or drop in a file. And this section can be anything, like information related to that file, or a sneak peek into it; for YouTube I could add another panel that displays the cover image of that YouTube video, I could do that too, and I could add the comments, if I have APIs to retrieve that kind of information. That's the easy, file-related part.

For the summarize step, you can improve the set of parameters you can tweak in the algorithm. Right now I force you to use Llama 3; maybe you want a dropdown to select the model you want to work on this summarization task. That's possible as well. Other parameters could be a port number: if you decided to change the default port value, or you think it's possible that one day you are using this port and another day a different one, you can have a box that allows the user to provide a target port, and a model to load with it. That's one way. What else can you do? Well, let me just go back to my code, and then you can actually help me spot what can be done to improve, on this tab right here. Pretty much anywhere you hardcode anything, it can be improved, right? Yeah, that's what I said: you can find a way to allow the user to select the model they want to use. For tiktoken as well, you might allow users to select an encoder to help estimate the right number of tokens. Right now, you see, the tokenizer is for ChatGPT, for GPT-4, but my model is Llama 3. That's a mismatch, and that's the reason I said it's an estimated token count; you can improve this as well. You can have the user select which chain type they want to use: it could be a stuff chain type, map_reduce, or a refine process, something like that. And another possibility is to calculate how many chunks there will be, you know, before the user hits the summarize button. Maybe they want to know what they are getting themselves into: did I split it too small, do I have to compute a thousand chunks where 20 would suffice? That's a design option you can add.
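That chunk-count preview is only a few lines. A back-of-envelope sketch (mine, and approximate: real splitters break on separators, so the actual count can differ a little):

```python
import math

def estimate_chunks(total_tokens, chunk_size, overlap):
    # the splitter strides forward by (chunk_size - overlap) after the first chunk
    if total_tokens <= chunk_size:
        return 1
    return 1 + math.ceil((total_tokens - chunk_size) / (chunk_size - overlap))

print(estimate_chunks(14_000, 5_000, 500))  # -> 3 for the long live stream
```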
And another thing, and this is maybe a critical one: as you can see here, the summary is short. I talked for two hours on three different topics; to be honest, I'm not going to be happy with this amount of summarization. It's too little; it doesn't have enough detail for me to learn from it. Think about it: if this were about summarizing a lecture video and you got this much summary, there would be no point. So there is room for improvement on many levels of the summarization task. If you just want a few-liner, this is it; but if you want the summary to be succinct but also detailed enough that you can actually learn or pick up specific information from it, then you need to optimize and customize this process.

All right, just to make sure I understand this correctly, I'm going to read through this summary. "The text appears to be a conversation between two AI models discussing..." Between two AI models? I'm not an AI. It's a human discussing two AI models along several dimensions, including capabilities, strengths, potential applications, and so on; so this part is wrong. "One model suggests creating a career development plan based on user goals and assessments. The conversation also touches on tokenization, natural language processing (NLP), and building a customer chatbot." This part is correct. "The speaker demonstrates the chatbot's capabilities and limitations, suggesting improvements to handle specific requests." Now it refers to the human speaker, me. Right; well, I am requesting that this particular chatbot be improved. Anyway, so it's just the way it is: I used a pre-made function from LangChain, and I got a pre-made summary. So this lesson actually tells you: a summarization task can be easy, and it can be detailed. If you want a high-quality summarization for your goal, you probably need to dig deeper and customize your steps. And it's not going to be that difficult, guys; you've seen the concepts here. It's just a question of what prompts you are going to ask and how many layers you are willing to go through, and that's it, in theory.

OK, and that brings us to a close, guys. In summary, today you have learned two major things. The first one is Ollama: an application that allows you to download an open-source model like Llama 3, run it on your laptop, host a local API, and build applications from that locally hosted API. The second part builds on top of Ollama and shows you that, with Llama 3 hosted locally, you can build a chatbot that summarizes a YouTube video based on YouTube's transcripts. You have seen a foundation of the summarization task, and you have seen the corner case you need to handle, where the text is too long and doesn't fit the LLM; you now know about a thing called the map-reduce process. And you have seen the good and bad sides of the simple tool LangChain provides, called load_summarize_chain.

That brings us to a close today, so thank you very much for staying with me until now. I'll leave you with a thank you, and also: don't forget to subscribe to our channel, share it with your friends, and comment. If you have tried anything, like Ollama, and it worked, or if it didn't, I'll be happy to hear it, and perhaps I can lend you a hand as well. So share it with your coworkers, subscribe, and comment. Thank you very much, have a good day!