Transcript for:
Creating a Custom RAG Pipeline

Hello and welcome to the channel. In this video I'm very excited to share with you a step-by-step guide on building an end-to-end RAG pipeline using some amazing tools and technologies. RAG stands for retrieval-augmented generation, and it is a way to give large language models context from your own data. These large language models are trained on a huge corpus of general data and have no knowledge of your custom data, so with the help of a RAG pipeline we provide these LLMs context around your own data.

In this video we will be using LangChain to create the RAG pipeline. We will be using the Triplex model from SciPhi to extract entities and their relationships, and we will store these entities and relationships as embeddings (numerical representations) in our vector store. Both Triplex and the general inference model, Mistral, will run through Ollama (you could use any model), and then we will build a graphical user interface around a chatbot with the help of Gradio to knit all of this together.

We'll be working with custom data: I will be using my own personal data to show you how to set up the RAG pipeline using LangChain, which is a very popular framework for building LLM applications. We will extract entities and relationships using the Triplex model, which I covered in much more detail in another video yesterday, so please search the channel; you won't be disappointed with that Triplex video. Then we are going to store the extracted data in a vector store. You could use any vector store you like, but I will be using FAISS. Finally, I will build a GUI with Gradio to interact with the pipeline. That is the whole technology stack we are going to use. I will put the complete end-to-end code from this video on my blog and drop the link in the video's description, so don't worry about writing it down; you can just go to my blog from the video description and copy-paste it. By the end of this video you'll have a fully functional end-to-end RAG pipeline that you can use for your own projects in a production environment. This is not just a lab project; you can readily use it in your own company, for your own purposes. So let's get right into it.

Before I show you the installation, let me also give a huge shout-out to our good friends at Massed Compute, who are sponsoring the VM and GPU for this video. If you're looking to rent a GPU at affordable prices, I will drop the link to their website in the video description, and I'm also going to give you a coupon code for a 50% discount on a range of GPUs.

Let me launch my terminal on the same system. As you can see, I am running Ubuntu 22.04 with 48 GB of VRAM. Let me clear the screen. Also make sure that you have Conda installed; Conda lets you keep everything separate from your local system (not mandatory, but highly recommended). You can see that I have this version of Conda installed. Let me create the Conda environment; I'm just calling it ragpipe, with Python 3.11. Let's wait for it to get created, and that is all done. Now you need to install a lot of packages here, so I'm just going to paste the command; it installs LangChain, FAISS, Gradio, Ollama, pdfplumber, pypdf, and a lot of other things (the commands are sketched below). Let's wait for it to get installed; this is going to take four to five minutes. Everything is installed now, so let me clear the screen. The next thing you need to do is make sure that Ollama is installed and running. If you don't know how to do that, please search my channel; I have done hundreds of videos on it.
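For reference, here is roughly what that terminal setup looks like. The full package list isn't readable on screen, so treat these commands as a reconstruction of what the video installs rather than the literal session:

```bash
# Create and activate an isolated environment (optional but recommended)
conda create -n ragpipe python=3.11 -y
conda activate ragpipe

# Core stack: LangChain, FAISS, Gradio, the Ollama client, PDF parsing,
# and Jupyter; the exact set of extra packages may differ from the video
pip install langchain langchain-community faiss-cpu gradio ollama pdfplumber pypdf notebook
```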
But just to give you a quick idea: go to ollama.com and click on Download. If you're using Linux, simply copy the command shown there and run it in your terminal, and you should be able to install Ollama. For Windows, download the exe file and click Next a few times; it will install on your local system, and that's about it.

Let's go back to our terminal. I already have Ollama running, so if you do ollama list you will see that I already have various models. You would just need Mistral and Triplex from here. To install Triplex, all you need to do is run ollama run sciphi/triplex; it is going to download and run Triplex for you. Similarly, I will be using Mistral for inference, and you can simply do ollama pull mistral. I already have both of them, as you can see on your screen. Let me clear the screen.

Next up, let me install Jupyter Notebook and launch it in the browser; we will do the rest of the code there, where it will be easier to look at. Let's wait for it to launch, and the Jupyter notebook has launched.

Now let's first import everything we have installed, and it includes a lot of pieces: pdfplumber, because the file containing my personal information is a PDF; FAISS for the vector store; and the LangChain prompt and LLM chain classes. Just to give you a bit more information at a high level: what happens in a RAG pipeline is that you take your own data, in any file, whether a text file, a PDF file, or whatever; any unstructured raw data. For example, I have created this my_pdf.pdf file that contains my own data: information about who Fahd Mirza is, where he lives, what he does. Of course the LLM doesn't know about me, which is why I have put in my information, and just think about it: you could replace it with any of your own data, with multiple files of course. So this is the raw data I'm going to use. In the first step of the RAG pipeline I will load this data and chunk (or split) it into smaller parts; then it will be converted into numerical representations, also called embeddings or vectors. For our purposes, we will first extract the entities and relationships from the PDF I just showed you, and then convert them to numerical representations. So the whole flow is: load the data from the PDF; extract the entities and their relationships with Triplex; store those entities and relationships in a vector store; retrieve them from there with the help of the RAG pipeline and augment the prompt; and then create the GUI with Gradio.

First step: let's import all of the libraries. You can ignore this warning; let's wait. Okay, that is done. Next up, let's define a custom function which is going to call Triplex for us. This is a point where, if you haven't watched my other video about the Triplex model, it will help a lot with understanding what exactly I mean by Triplex. Let me go to the model page just to show you. I had raw text in a PDF file; I provide that text plus what I want, for example telling the model to extract person and organization entities from that data, along with the relationship FOUNDED_BY. Triplex will go through that text and return the result in the format shown on the right-hand side: a person entity, an organization entity such as SpaceX, and a FOUNDED_BY relationship linking the two, meaning SpaceX was founded by Elon Musk.
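Here is a minimal sketch of that extraction helper, assuming the ollama Python client and the instruction format published on the sciphi/triplex model card; the exact prompt wording used in the video may differ slightly:

```python
import json
import ollama

def triplextract(text, entity_types, predicates):
    """Ask the locally running Triplex model to pull entities and
    relationship triples out of raw text."""
    # Triplex expects the entity types, predicates, and source text
    # embedded in one instruction prompt (format per its model card).
    input_format = (
        "Perform Named Entity Recognition (NER) and extract knowledge graph "
        "triplets from the text. NER identifies named entities of given "
        "entity types, and triple extraction identifies relationships "
        "between entities using specified predicates.\n\n"
        "**Entity Types:**\n{entity_types}\n\n"
        "**Predicates:**\n{predicates}\n\n"
        "**Text:**\n{text}"
    )
    prompt = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )
    # Use the full model name as shown by `ollama list`; passing just
    # "triplex" triggers the name-mismatch error fixed later in the video.
    response = ollama.chat(
        model="sciphi/triplex",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]
```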
What this does is basically create a knowledge graph. In a knowledge graph we have not only the data but also the relationships within that data, which makes the LLM's job easier: it can return more coherent, more context-aware answers because it is more grounded in your information. In normal RAG we just store the data and do a similarity search, but we don't really know semantically how things are connected to each other, how they relate to each other. That is the advantage of a knowledge graph. This approach was made popular by Microsoft's GraphRAG, which we have already covered a lot on the channel, and this is basically an evolution of that.

So that is what we are going to do. What we have done here is define a simple function (sketched above) which accepts our text from the PDF file, the entity types we want to extract, and the predicates, or relationships, we want to extract. We create an input prompt on top of it, because that is the format required by Triplex; then we create a chat message, assign that message to the prompt, call the Triplex model from Ollama, which is running locally, and return the output. That is what this function does, and the function is created, as you can see.

Now, in the next step, let's extract those entities and predicates (relationships) from the data. As an example, I want to extract person and location entities, plus the person's profession and where the person is based. Of course, if you're building your end-to-end application, you can create a variable around this and take it as input from the user or from your API call; that's up to you. First I specify my file here and then the text: for the text, I go through the pdfplumber reader, extract the text from the file, and put it in a text variable; then I call our triplextract function from above with the text, entity types, and predicates. Then I massage the data a little, because the returned output has these code fences around it, so I remove them, extract only the entities and triples, and print them out (this step is sketched below). By the way, when we say triples, we mean exactly this: which things are related to which.

Let's run it and wait. Okay, you might get this error. Okay, I know what the problem is, so let me actually show you. Let me open another tab. If you run ollama list here, you see the name of the model is sciphi/triplex, whereas in our Ollama function call I just gave it triplex. That is the problem, so let me put sciphi/triplex there and run it again. Now when I run it, it is going to give me the entities and triples, so let's wait. There you go: it has extracted person Fahd Mirza, location Sydney, location Australia, and location YouTube (because I wrote that I am at YouTube, that is why it has given me this). So this is what it has extracted from that data. Cool.

Okay, in the next step, let's write this extracted data, the entities and triples (relationships), into a text file, and then use the loader from LangChain to load it into the docs variable. That is done.
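Putting that together, here is a sketch of the extraction step, assuming a local my_pdf.pdf, the triplextract() helper from above, and the entities_and_triples output key shown on the Triplex model card; LangChain import paths vary by version:

```python
import json
import pdfplumber
from langchain_community.document_loaders import TextLoader

entity_types = ["PERSON", "LOCATION"]    # what to extract
predicates = ["PROFESSION", "BASED_IN"]  # how the entities relate

# Pull the raw text out of the PDF, page by page.
text = ""
with pdfplumber.open("my_pdf.pdf") as pdf:
    for page in pdf.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n"

raw_output = triplextract(text, entity_types, predicates)

# The model wraps its JSON answer in code fences; strip them, then parse.
cleaned = raw_output.strip().removeprefix("```json").removesuffix("```").strip()
parsed = json.loads(cleaned)
print(parsed.get("entities_and_triples", []))

# Persist the extracted knowledge, then load it back as LangChain documents.
with open("extracted_data.txt", "w") as f:
    f.write(json.dumps(parsed, indent=2))

docs = TextLoader("extracted_data.txt").load()
```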
Now let's do the usual RAG stuff. In this RAG stage, we first split the documents into smaller chunks so that they are easier to deal with; then we instantiate the embedding model, which converts those chunks of text into numerical representations, or embeddings; then we create a vector store; and then we specify a retriever, which is going to fetch that information whenever it is needed. I'm using FAISS here, but of course you can use any vector database of your choice, like Chroma, Weaviate, or whatever. If you want to know which vector store to use in which situation, go to my channel and search for "top vector databases for AI"; that video, which I did yesterday, tells you exactly when to use which vector database. But I'm just going to go with FAISS. Let me run it, and that is done.

In the next step, let's specify our prompt template after defining the LLM. I'm going with Mistral for inference, from Ollama, and then we define our prompt; you can write your own. All I'm doing here is saying: use the following pieces of context to answer the question at the end; if you don't know the answer, just say that you don't know, but don't make up an answer on your own; keep the answer crisp and limited to three or four sentences. This is what I always use for RAG pipelines. The reason is that I don't want the LLM to go astray: if my custom data doesn't contain the information the user is asking about, the LLM shouldn't make it up, or in other words, shouldn't hallucinate. So we have defined our prompt template the way we want it; of course, this will depend on your application too.

Next up, define an LLM chain. This is a LangChain concept where everything is composed as a chain: in this chain we have the LLM component, which we defined above (Mistral), and the prompt we defined; we are not using any callbacks, and we are making it verbose to show what is happening. That is done. Now let's define our document prompt; this is where we knit together everything from above using these variables. So this is the document prompt we are using, and then we combine it with the LLM: we have the LLM, the documents (which are our own data, in simple words), and then we define a retriever on top. Interestingly enough, this style makes it easier to read, but it is also one of the criticisms of LangChain that they add more steps than necessary; they could have done it in one. But that is fine. The whole retrieval setup is sketched below.

Next up, let's define the respond function, which responds whenever the user asks a question; it sits just before our Gradio part, and it will make more sense as I show you the Gradio part. In this respond function we pass the user's question and the chat history; it in turn calls the QA chain above, and that QA chain calls the rest of the chain, so it is all chained together. This is what I meant earlier, and it gives us a simple chatbot. Let me run it, and you see it is now running on this URL; if I click here, it is going to open our Gradio chatbot.
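Before we try it out, here is a minimal sketch of the retrieval setup we just walked through, from chunking to the QA chain. It follows the classic LangChain LLMChain plus StuffDocumentsChain plus RetrievalQA pattern described above; the import paths, chunk sizes, and the choice of Mistral for embeddings are assumptions, and docs comes from the previous step:

```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, RetrievalQA, StuffDocumentsChain
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

# 1. Split the loaded documents into smaller, retrievable chunks.
splits = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks, index them in FAISS, and expose a retriever.
vectorstore = FAISS.from_documents(splits, OllamaEmbeddings(model="mistral"))
retriever = vectorstore.as_retriever()

# 3. Grounding prompt: answer only from context, stay crisp, don't invent.
prompt = PromptTemplate.from_template(
    """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know; don't try to
make up an answer. Keep the answer crisp, limited to three or four sentences.

{context}

Question: {question}
Helpful Answer:"""
)

# 4. Chain it all together: LLM + prompt, stuff retrieved documents into
#    the {context} slot, then hook in the retriever.
llm_chain = LLMChain(llm=Ollama(model="mistral"), prompt=prompt, verbose=True)
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="Context:\n{page_content}"
)
qa = RetrievalQA(
    combine_documents_chain=StuffDocumentsChain(
        llm_chain=llm_chain,
        document_variable_name="context",
        document_prompt=document_prompt,
    ),
    retriever=retriever,
    verbose=True,
)
```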
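And here is a sketch of the respond function wired into Gradio's built-in chat UI; qa is the chain from the sketch above, and like the video's version it simply ignores the chat history:

```python
import gradio as gr

def respond(message, history):
    # Route every user question through the retrieval QA chain;
    # the chat history is accepted but not used.
    result = qa.invoke({"query": message})
    return result["result"]

# ChatInterface provides the chatbot UI; launch() prints a local URL.
gr.ChatInterface(respond, title="Local RAG Chatbot").launch()
```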
Now, in this Gradio chatbot, you can ask it, for example: who is Fahd Mirza? Let's wait for it; it is going through my own data. There you go: Fahd Mirza is a cloud engineer, blah blah blah. And you can ask it what Fahd likes to eat. Now, this information is not in the data, so let's see what the model does. Does it make it up? No: you see, the model says it doesn't have information about what Fahd Mirza likes to eat, as the provided context does not mention his dietary preferences or habits. So let's ask something which is in the information: where does he live? He lives in Sydney, Australia. And then, of course, you can ask for general information, such as which state Sydney is in. This information is not in my custom PDF, but it is general information the model should have been trained on. There you go: New South Wales. But when I asked something about Fahd Mirza which is not in the data and which the model hasn't been trained on, you saw that the model simply refused.

So this is it, guys: you have your end-to-end RAG pipeline running on your own data, all local, no API calls. We used the Triplex model to generate the entities and their relationships; we stored them in the FAISS vector store; Triplex was also run from Ollama; and we used LangChain to knit it all together, with a Gradio interface on top. And that's it. I hope you enjoyed it; let me know what you think. If you like the content, please consider subscribing to the channel, and if you are already subscribed, please do me a favor and share it with your network, as it helps a lot. Thanks for watching.