Hello everyone, welcome back to my YouTube channel. My name is Sunny, and in this video we are going to discuss Retrieval Augmented Generation, that is RAG. I recently started this channel, and on it I'm uploading content related to generative AI, so if you haven't checked it out, please do — you will find multiple videos on generative AI there. I'm trying to cover everything in sequence, and I'm maintaining a playlist as well; in the playlist section you will find one called "Generative AI from Basic to Advance". In today's video I will teach you about RAG — and not just a quick overview; I will teach you the end-to-end picture of RAG, which is how I try to record every tutorial. If you find that this channel is really helping you build your GenAI concepts, then please subscribe, like the video, and leave your thoughts. In the description of my videos (apart from the introduction and roadmap ones) I have added a Google form where you can write your thoughts, so that I get to know what my audience wants and can record content based on that. I have rich experience in artificial intelligence — mainly machine learning and deep learning — and I work with GenAI and MLOps as well. Along with GenAI, I'm thinking of recording one
extensive playlist on MLOps, so don't worry, I'll be coming up with MLOps, and with Azure Machine Learning and AWS machine learning too — there are so many things I'm planning to record. So let's start with the session now; please subscribe to the channel if you find it valuable. In this video we are discussing Retrieval Augmented Generation. I have divided this topic into various parts, and this is the first part, because there is a lot to discuss here. What I have seen is that beginners — anyone with zero knowledge of generative AI, or of AI, machine learning, and deep learning — are not able to cope with topics like RAG, fine-tuning, or transfer learning; they keep lagging there. So I decided not to give just an overview but to cover these topics in depth. In this first part I'm going to cover four points. Then, in the next parts, I will build one RAG system from scratch; I will show how you can use a RAG system with multimodal data (text-to-image or image-to-text); we'll discuss RAG versus fine-tuning and which is the better option; we'll understand the evaluation metrics of RAG — because if you are going to build a RAG system, you must know how to evaluate it, and that is very necessary; after that we'll discuss the important research papers; and at the end we are going to implement different types of RAG in Python itself. So let's begin. In total, I think there will be four to five parts,
so this is the first part of RAG, and in this video I will teach you the end-to-end RAG pipeline: what components are there and everything that goes into the end-to-end pipeline. Please make sure you watch this video till the end. One more thing: since I have to cover many things, I will go at a slightly slower pace so that I can explain things better. If you find it slow, you can speed up the video — watch it at 1.5x for a better experience; that's my personal suggestion. So let's begin. First we'll start with the introduction, where we'll discuss what RAG is. I have kept everything on my blackboard itself — I have created a document of sorts, so there's no need to go anywhere else; all the links and images are right here. See how I have created it — it took a lot of effort, so if you find it valuable, please like the video and leave your comments. Now, I have divided this pipeline into various sections: we'll discuss the ingestion process, then the retrieval process, and then the generation process, and I have also included many architectures and advanced terms. This video can be quite long, around 30 to 40 minutes, so please watch till the end if you want the full picture of the RAG system. So let's now discuss the RAG system.
First, let's start with the definition itself. What is the definition of RAG? I have kept multiple definitions here, so let's see. RAG stands for Retrieval Augmented Generation; it's an architecture used to help LLMs. LLM means large language model, like GPT-4, Gemini, or Gemma — Gemma is an open-source model released by Google very recently, I think just the day before yesterday; you can go and check it out. LLaMA is there, Mistral is there. If we want a better experience with these LLM models, we use the RAG system — RAG, again, means Retrieval Augmented Generation. If I want a better response from a model, then, because the model does not hold every piece of information, we will definitely have to augment the information. How can we do that? We have two ways: the first is the RAG system and the second is fine-tuning. I will come to fine-tuning later; for now, let's understand RAG. So here you can see: RAG improves the response by using relevant information from additional sources and reduces the chance that the LLM will produce incorrect information. So along with the LLM, I'm going to connect some other information. And how do we do that? Through documents — documents meaning my data, which I keep somewhere in a database — and it is this database that I'm going to connect with the LLM. That is the overall idea behind the RAG system. If I want to make my model much more
efficient, I will have to use RAG; and if I want to make my model task-specific, domain-specific, or problem-specific, I will definitely have to use RAG or fine-tuning. Here, we are talking about the RAG system. Now another definition, similar to the first: RAG is a technique that enhances language model generation by incorporating external knowledge — external knowledge meaning knowledge from outside the model. This is typically done by retrieving relevant information from a large corpus of documents and using that information to inform the generation process. It conveys the same thing. Now the third definition: with RAG, the LLM can leverage knowledge and information that is not necessarily in its weights — meaning it was not part of the training. What does that mean? When I was training the LLM, I had not included that information. So what do I do instead? I connect my LLM with a database — with a knowledge base — so that it can provide me accurate information. All three definitions convey the same idea. Now, why should we use RAG? What is the motivation? First, limited knowledge access: as I told you, an LLM does not hold every piece of information. If we are talking about GPT, GPT has been trained only on data up to 2022 — up to that particular year. Gemini is a little better than GPT in that respect, and there are many models you can
compare based on their training. Still, these models are not able to answer specific questions. If you ask such a model, "What is the current GDP of India?", it is definitely not going to answer that question. If you ask, "What did Sunny sir teach in this particular video?", it definitely won't answer that either. Or take a domain-related question, say an e-commerce one: you are going to purchase something on Amazon, and you ask the GPT model, "Can you tell me the reviews, or the price, of this particular product?" — it definitely won't be able to answer. So in that case, what do I have to do? Either I fine-tune my model on my specific task, or I connect it to external sources — to a knowledge base. That is the exact idea behind RAG. So limited knowledge access, I think, is clear. Second, lack of transparency: LLMs struggle to provide transparent or relevant information, so they might mislead you. And in the next point I have mentioned hallucination in answers: the model can mislead you with made-up information, and that's another reason we use the RAG system. Now, if we are talking about the RAG architecture: here is my RAG architecture, and I have divided it into three main sections. In the first section there is the ingestion of the data; the second section is retrieval, meaning the obtaining
of the data — we are obtaining the data — and the third one is generation. So we have three stages of the RAG architecture: ingestion, retrieval, and generation. Now let's understand these three processes inside the architecture itself. Here you can see what we are doing: we have data, and this data can be anything — documents, PDF, text, HTML, XML, anything at all. What do we do? We extract the data, and after extracting we divide it into chunks — various chunks. I will come to every point: why we should divide into chunks, what the benefit is — we'll discuss all of it right here. Then we convert our data into embeddings. What is an embedding? An embedding is nothing but a numerical representation of the data. I will give you a glimpse of it here, and don't worry — I'm going to create a dedicated series on vector databases and embeddings, because it is very important; nowadays we can convert any type of data into embeddings. Then we build semantic indexes — I will come to that also — and then we store everything inside a knowledge base. This knowledge base can be anything: a vector database, a graph database, a SQL-based database; I will come to that too, because I have broken this pipeline into many sections. And believe me, if you watch this video, you won't need to go anywhere else for RAG. This whole process — from the documents to the database — is called ingestion. This is the first part.
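As a rough sketch, the ingestion flow just described — extract, chunk, embed, store — can be put into a few lines of Python. Everything here is illustrative: `fake_embed` is a stand-in for a real embedding model, and a plain list stands in for a vector database.

```python
import hashlib

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: derives a few numbers from a hash.
    A real pipeline would call an embedding model here instead."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]     # tiny 4-dimensional "embedding"

def chunk(document: str, size: int = 8) -> list[str]:
    """Split a document into chunks of at most `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

knowledge_base = []                          # stand-in for a vector database

def ingest(document: str) -> None:
    """Ingestion: chunk the document, embed each chunk, store it."""
    for piece in chunk(document):
        knowledge_base.append({"text": piece, "embedding": fake_embed(piece)})

ingest("RAG connects a large language model to an external knowledge base "
       "so the model can answer questions beyond its training data.")
print(len(knowledge_base), "chunks stored")
```

In a real pipeline you would swap `fake_embed` for an actual embedding model and append to a vector store instead of a list; the structure stays the same.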
Then comes retrieval. What is the meaning of retrieval? Retrieval means obtaining — obtaining or getting the knowledge from the knowledge base. Let's say my user asks something: here is my question; I convert it into an embedding, then I perform a semantic search, and based on that the system finds the top results. That particular process is called retrieval. Now the third one: generation. What happens here? My retrieved result goes to the LLM — the result plus the prompt. That result-plus-prompt is what I pass to my LLM, and then finally my user gets the answer. This is the entire idea behind RAG, and in how many steps have I broken the architecture? Three. Now let's understand everything in detail, but before that, let me give you some idea about embeddings — and first let me clarify a few terms, in case you are not comfortable with them. Retrieval, once more, is nothing but obtaining the data — getting the data from the knowledge base, from the database. Next, augmentation: what does augmentation mean? Augmentation means enhancing the information that the LLM works with.
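The retrieval and augmentation steps can be sketched in the same toy style. Word-overlap scoring stands in here for real embedding-based semantic search; the function names and stored sentences are my own illustration, not a library API.

```python
knowledge_base = [
    "RAG stands for retrieval augmented generation.",
    "Chunking splits a corpus into smaller pieces before embedding.",
    "A vector database stores embeddings for semantic search.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy semantic search: rank stored chunks by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda chunk: len(q_words & set(chunk.lower().split())),
                    reverse=True)
    return scored[:k]                        # the top-k results

def augment(query: str) -> str:
    """Augmentation: retrieved context + the user's prompt, ready for an LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# A real system would now send this prompt to GPT-4, Gemini, LLaMA 2, etc.
print(augment("what does a vector database store"))
```

The string returned by `augment` — context plus question — is exactly the "result plus prompt" described above; generation is then just the LLM completing it.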
So what am I providing to this LLM? Information from the knowledge base, plus the prompt — which is nothing but the user query. That covers augmentation and retrieval. Now, what does generation mean? Generation is the final stage, where the LLM generates my output. This LLM can be anything: Gemini, LLaMA 2, GPT-4, or whatever model you are going to use. So I hope "retrieval augmented generation" is now clear to all of you in its simplest sense. Now let's discuss embeddings. An embedding, again, is nothing but a numerical representation of the data. First of all, let me draw a graph: here is my y-axis and here is my x-axis. Now, if I take a vector along the x-direction, what will its coordinates be? The y-component will be zero and the x-component will be some value. If I take a vector along the y-direction, the x-component will be zero and the y-component will be some value y. And if I take any vector, any point in this space, it will have both an x-component and a y-component. So what am I doing? I'm going
to represent this particular point in 2D space, and its coordinates are nothing but what we call a vector. In linear algebra, the simple definition of a vector is that it's a collection of numbers. So I can represent this point — let's name it P1 — by its x and y; that pair is the vector. Say there is a P2: I can represent P2 as (x, 0). And what about P3? P3 is nothing but (0, y). So a vector is a representation of a point — a representation of the data. How many features does this point contain? Two features, x and y. I hope you are getting my point. Now let's take a different example. Let's say I have the word "king", the word "queen", and the word "man" — three words — and I can represent these three words as vectors. Don't worry, I will come up with a dedicated session on word embeddings. Let's say I represent "king" with the vector [0.1, 0.5, 0.8, 1, 0], "queen" as [0.2, 0.5, 0.7, 0.9, 0], and "man" as [0.8, 0.1, 0.2, 0.3, 1]. Now what am I doing here?
I'm representing the word "king" in five dimensions — how many values are there? Five values, so this vector lives in five dimensions. And every value represents some property, some feature, of the king. For example: is the king poor? No, the king is not poor, so you see a low value there, while you see a higher value for "man". Does the king eat junk food? A little bit — and the queen too, you can see. Is the king wealthy? Yes — look at the numbers: the king is wealthy, the queen is wealthy, the man is not. Does the king have power? Yes — you can see that in the corresponding values. So each position is a feature, and I'm representing "king" in five dimensions, whereas the earlier vector was in two dimensions, x and y. So I think you now have the basics of a vector: a vector is nothing but a set of numbers, and I can certainly represent a point in any number of dimensions. If I treat "king" as the point P1, then in the 2D picture I'm representing the king with just two features, x and y, while here I represented him in five dimensions, each one a specific feature. So x and y can themselves be two features.
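The king/queen/man example can be written down directly in NumPy. The feature names and numbers are the made-up blackboard values from above, not learned embeddings; the point is only that words with similar properties end up close together.

```python
import numpy as np

# Hand-picked 5-dimensional "embeddings" from the blackboard example; each
# position is an illustrative feature (poor, eats junk, wealthy, power, ...).
king  = np.array([0.1, 0.5, 0.8, 1.0, 0.0])
queen = np.array([0.2, 0.5, 0.7, 0.9, 0.0])
man   = np.array([0.8, 0.1, 0.2, 0.3, 1.0])

# King and queen share most properties, so their vectors are much closer
# together than king and man:
print("king-queen distance:", np.linalg.norm(king - queen))
print("king-man   distance:", np.linalg.norm(king - man))
```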
Say x is the feature "poor" and y is the feature "junk"; using just these two features you can represent the king. Got it? So those are the basics of embeddings. Now, if I want to check the similarity between embeddings — between vectors — we have different ways. The first way is called the dot product. The second way is called cosine similarity. There is another way as well: Euclidean distance. And we have Jaccard similarity too. So we have several options; let me show you cosine similarity and the dot product with one example. Let's say I have an x-axis and a y-axis, and two features: one feature represents "tech" and the second represents "fruit". Now let's say I have two words: the first word is "apple" — the fruit — and the second word is "iPhone". If I want to represent apple in terms of these two features — two dimensions, one being tech and the second fruit — then, since apple is a fruit, it will point along the fruit direction and its tech part will be zero. So I can represent apple as (f, 0): the tech component is zero. Now if we are talking about iPhone, how can I represent that?
For iPhone, the fruit part will be zero and the tech part will be some value t — this is the representation along the tech direction. So in two dimensions, my final vector for apple is (f, 0), and for iPhone it is (0, t). Now let's calculate the similarity. If I do a dot product: (f, 0) · (0, t) = f × 0 + 0 × t, so finally it will be zero. So I'm saying these two vectors are not similar to each other. If instead we compute the cosine similarity, I take the cosine of the angle between the vectors: the angle here is 90°, and we know that cos 90° = 0. So in both cases — dot product and cosine similarity — I get zero, and visually too I can see there is no similarity between these two vectors. iPhone is different, and apple is completely different from iPhone: apple is a fruit which I can eat, and iPhone is a phone which I use. I hope you now understand embedding similarity and the basics are clear; don't worry, I'll be coming up with a dedicated video on this topic, and then it will become even easier. Now, back to the RAG pipeline.
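We can verify the apple/iPhone calculation in code; `f` and `t` below are arbitrary positive magnitudes along the "fruit" and "tech" axes.

```python
import numpy as np

f, t = 3.0, 4.0                 # any positive magnitudes will do
apple  = np.array([f, 0.0])     # pure fruit, zero tech
iphone = np.array([0.0, t])     # pure tech, zero fruit

# Dot product: f*0 + 0*t = 0
dot = float(np.dot(apple, iphone))

def cosine_similarity(a, b):
    """cos of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cos = cosine_similarity(apple, iphone)   # angle is 90°, and cos 90° = 0

print(dot, cos)  # both 0.0: the vectors share no similarity at all
```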
You can see the RAG pipeline here; I have defined it in three parts: the first is ingestion, the second is called retrieval, and the third is called synthesis. In ingestion we have documents, we convert them into chunks, then we convert the chunks into embeddings, we do the indexing, and finally we save everything somewhere — into the database, as you can already see in the diagram itself. In retrieval, we make a query, we fetch the results from the database, and out come my top-k results. Synthesis is nothing but generation: we combine the data we fetched from the knowledge base, pass it to the LLM along with the given prompt, and finally generate a response. So how many processes were there? Three processes inside the basic RAG pipeline. I will come back to this architecture diagram too — I kept it here intentionally, because whenever you deal with RAG, retrieval augmented generation, you will find different kinds of architecture diagrams, but the meaning of every one of them is the same: they all implement exactly this, nothing else. So now I can come to the ingestion process. What are we doing in ingestion? In ingestion we have five main stages: the first stage, you can see, is the documents, then we have chunking, and after the
chunking we have embedding; the fourth is the vector embedding; and the fifth is the database, or retriever. So I have divided ingestion into these five sections; let's explore them one by one. The first stage is the document. What does document mean? It can be any file: a local file, a CSV file, a TSV file, a JSON file, PDFs, docs, web pages, XML, a database, cloud storage, or any other location — wherever you are getting the data from. The next step is chunking: after collecting the documents, I will create chunks from the data. I have written down several questions related to chunking, but first the definition: chunking means we collect the data from various resources and split the data into chunks. The first question I have written is "why is chunking required?" and the second is "how to figure out the ideal chunk size?" — these are the two major questions people run into. Before answering them, I would like to highlight some terminology. Let's say we have data — the entire data. This entire data is called the corpus. Now I can divide this data into small chunks; these chunks are also called documents, and a chunk is nothing but sentences. So this
is my entire paragraph — this data, this corpus, is my entire paragraph — and from the paragraph we create chunks, meaning sentences. We have one more term, which is called tokens: from the chunks themselves we get the tokens, and a token is nothing but a word. This is terminology you need to understand for NLP: we have words, meaning tokens; from the tokens we form the sentences; the sentences are called documents, or chunks; and by collecting many chunks we get the paragraph, which is the entire data — the corpus. I hope you now see the hierarchy. Now, the first question: why is chunking required? We could feed the entire document — whatever data we have — to the LLM, but there are two issues with that. First, the LLM would be overloaded: if we pass the entire data, the LLM gets swamped with information, which is why I have written that no LLM should be overloaded with information. Second, these LLMs also have limits: if we are talking about GPT, it has a limit of some number of tokens; even Gemini has a limit
related to tokens; Claude has a token limit too; and other models like LLaMA as well — you can check the different variants and you'll find their respective limits. So these models have limits with respect to tokens, and that's why we cannot feed the entire data to the model. Now, how do you figure out the ideal chunk size? I mention two things here. The chunk should not be too small: if we pass too small a chunk, it will not carry sufficient information, and the retrieval process will be affected. And the chunk size should not be too large either: if you pass a very large chunk, your model can get misled — it gets overloaded with information. So we have to maintain a sensible chunk size. I think the terminology is clear by now: words are called tokens; sentences are called chunks (and the sentences are also called documents); and the paragraph is the entire data, the entire corpus. Now, based on what should you decide the chunk size? I have mentioned a few points, so let's read them: first is data characteristics; second is retrieval constraints; third is memory and computation resources; fourth is task requirements; fifth is experimentation; and sixth is overlapping considerations. I have also attached one link; let me show it to you.
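A minimal fixed-size splitter makes the corpus → chunk → token hierarchy concrete. Whitespace-separated words stand in for a real tokenizer here, and the chunk size is deliberately tiny so the effect is visible:

```python
def chunk_by_tokens(corpus: str, chunk_size: int) -> list[str]:
    """Split a corpus into chunks of at most `chunk_size` tokens (words)."""
    tokens = corpus.split()          # token = word, per the hierarchy above
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

corpus = ("we collect the data from various resources "
          "and split the data into chunks")
for piece in chunk_by_tokens(corpus, chunk_size=5):
    print(piece)        # 13 tokens -> chunks of 5, 5, and 3 tokens
```

Changing `chunk_size` here is exactly the trade-off discussed above: too small and each chunk carries too little meaning for retrieval; too large and you run into the model's token limit.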
you can check. Okay, so let me copy and paste it into Google, and let me show you a beautiful article with respect to this chunking. So here — chunk size, what should be your chunking strategy — Pinecone has covered each and every thing: chunking strategies for LLM applications. Here you can see the chunking considerations, meaning based on which properties you can decide the chunk size, and each chunking method: fixed-size chunking; content-aware chunking; recursive chunking, using different modules like NLTK and spaCy; and specialized chunking using LangChain — Markdown and LaTeX splitters are there inside LangChain itself. So it gives you many methods for chunking. You can read this particular article and you will get the different ways of chunking; you can check the different libraries and the different methods like recursive chunking, sentence splitting, or the specialized chunking which is already there inside LangChain and the other libraries.

Now coming back to the points. First, data characteristics: if your data is clean and well structured, the chunk size will be one thing; if your data is very messy and very complex, you will have to keep a different chunk size. Next, retrieval constraints: if you have any retrieval-related constraints — I will come to databases later, and with that this idea will be clearer to all of you — the point is that if we are not going to build specific indexing, and we create lots of very small chunks, or very big chunks, then the model will be misled; it won't be able to retrieve the specific information. Next, memory and computation: if you have a memory limitation in your system, that is also a reason to optimize the chunk size. Task requirements: for which task are you creating the chunks? Text preprocessing, code generation, text generation, text summarization — there are different tasks, and the chunk size depends on the task. Experimentation: you can run multiple experiments — first take one chunk size, then take a different chunk size, and compare.

And finally, overlap consideration. If you are going to create chunks, let's say this is my data: from here to here is my first chunk, from here to here is my second chunk, and from here to here is my third chunk. With overlap, the second chunk does not start exactly where the first chunk ends; instead, it starts a little before that, inside the first chunk, so the two chunks share some text. In the same way, the third chunk starts a little inside the second chunk. That shared region is the overlap, and it helps make sure no sentence loses its context at a chunk boundary.
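The overlapping-chunk idea described here can be sketched in a few lines of plain Python. This is only an illustration — real pipelines typically use something like LangChain's text splitters — and the character-based sizes here are arbitrary toy values:

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into fixed-size character chunks, where each chunk
    re-uses the last `overlap` characters of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

corpus = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(corpus, chunk_size=10, overlap=3)
# each chunk starts 7 characters after the previous one,
# so 3 characters are shared between neighbours
print(chunks)  # → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Notice how `"hij"` appears at the end of the first chunk and again at the start of the second — that repeated region is exactly the overlap being discussed.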
So overlap is also something you can consider with respect to your chunk size. I hope you now understand the meaning of chunking. You can go and check the Pinecone documentation — they have covered everything in a precise way, the different techniques along with the code — and based on these particular points you can define the size of your chunks.

Now coming to the next part, which is very, very important: embedding. After chunking, we convert each chunk into a vector embedding — in the architecture I showed you the same thing — and a vector embedding is a numerical representation of the data. We have lots of ways of creating embeddings. Here you can see BoW, TF-IDF, and n-grams: these are frequency-based techniques. The second family is the neural-network-based techniques: word2vec, fastText, BERT, ELMo, OpenAI embeddings, Gemini embeddings — each of these models uses a neural network behind it. Nowadays we are not going to use the frequency-based embeddings, because they are not going to work well here: with a frequency-based embedding we are not able to capture the semantic information. So we definitely have to use the neural-network-based embeddings — using word2vec, fastText, BERT, OpenAI embeddings, or Gemini embeddings we can get the embeddings, the numerical representation of the data. And generally you will see that nowadays people are not using word2vec either, and not going with BERT either; people are interested in the OpenAI-based or Gemini-based embedding models, and we have lots of such models. So here I have kept the different models, and I
will show you the leaderboard. Okay, now here you can see: if we are specifically talking about embedding — I explained the embedding concept to you earlier; if you scroll up you will find it — what we were doing there was converting each word into a vector, which is called word embedding. But if you look into today's RAG systems, the advanced RAG systems, they are not going to perform word-level embedding. What they do instead is sentence-level embedding. Let's say this is my data: I am not going to convert it word by word into vectors; instead, I am going to create a chunk from it, and from that specific chunk I am going to create a single vector. Getting my point? Then how can we do that, what is the way behind it? First of all, let me show you the leaderboard where you will find all the embedding models, and then I will show you how you can perform the sentence-level embedding. So first, let me copy this particular link — and don't worry, I will provide you this documentation, these notes, so that you can go through them. Here Hugging Face has provided one leaderboard, and in this leaderboard you will find the different models — these are the embedding models; word2vec is an embedding model, BERT is an embedding model, a neural-network-based model; if you don't know about those, don't worry, I will cover them in an upcoming session along with the complete vector-database topic. Now here you can see the model size, the embedding dimension, the max tokens — how many tokens you can provide; the average over the 56 datasets they evaluated on; the classification average, clustering average, pair classification, reranking average, retrieval average, and so on. There are many parameters they have kept over here, and if a few of them are not clear to you — like reranking, retrieval average, or summarization average — don't worry: once I explain the evaluation techniques of RAG and come to the retrieval operation, these things will be clarified. So you can go through this particular website and check the embedding models with respect to the different values; and if some model is not inside this leaderboard, you can definitely search for it separately and get the information. But by seeing this leaderboard you will get the idea of how to evaluate embedding-based models.

Okay, so the leaderboard is fine. Now, I was talking about sentence-level versus word-level embedding. You can check out this next link — let me copy and paste it over here — and here you will find Sentence Transformers. The Sentence Transformers library is what is responsible for, what is able to do, the sentence-level embedding: Sentence Transformers is a Python framework for state-of-the-art sentence-level embeddings, and the whole thing is already there on Hugging Face.
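To make the sentence-level idea concrete, here is a small sketch. The Sentence Transformers call is shown only as a comment (the model name `all-MiniLM-L6-v2` is a commonly used choice, shown as an assumption, not run here), and in its place two hand-written toy vectors stand in for real chunk embeddings so we can compute cosine similarity in plain Python:

```python
import math

# In a real pipeline you would do something like (assumption, not run here):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vec = model.encode("my chunk of text")

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

chunk_a = [0.9, 0.1, 0.0]   # pretend embedding of "the cat sat"
chunk_b = [0.8, 0.2, 0.1]   # pretend embedding of "a cat was sitting"
chunk_c = [0.0, 0.1, 0.9]   # pretend embedding of "stock prices fell"

print(cosine_similarity(chunk_a, chunk_b))  # high: similar meaning
print(cosine_similarity(chunk_a, chunk_c))  # low: unrelated meaning
```

The whole point of a good sentence-level embedding model is that semantically similar chunks end up close together under exactly this kind of similarity measure.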
Using that, you can convert your sentences into embedding vectors, and many of the embedding models you see — like the OpenAI embedding models — are doing this same kind of sentence-level embedding under the hood. I hope you are getting it. We do have other models as well, like BERT and ELMo, but those models work at the word level, the token level; word2vec is also a very old model — it works on a neural network, yes, but it works at the word level, the token level — and we don't require that here. Why don't we require it? Because here we want to capture the semantic meaning with respect to the chunks, with respect to the sentences. I hope this is clear. You can go through the documentation as well — this is the official documentation of Sentence Transformers — and if you want to check more about it, you can search for "sentence level embedding": you will get different models and many articles, and you can definitely perform the sentence-level embedding yourself; just go, check, and try to utilize it.

So I hope embedding is clear to all of you. Now coming to the next point over here. Here I have mentioned some description as well — first let me remove this particular part, and just try to read the description: "How do Sentence Transformers differ compared to token-level embedding models such as BERT?" There is a complete description there; you can read it and you will definitely get it.

Okay, now the next thing: vector embedding indexing. What is the meaning of indexing? See, whenever I have my data — here you can see, this is my data — I am going to convert it into
the vectors, and along with each vector I am going to attach an index value as well. Now, we have different ways of creating these indexes. Here you can read it: a vector database stores indexes along with the vector embeddings for fast retrieval and similarity search. So why is this indexing required? Exactly for that: fast retrieval and search. You can go and check out this particular article as well, and you will get a lot of information related to vector indexing — we have different methods for indexing. Let me search for it and paste the link: see, this is the article I mentioned inside my document, and here you will get a variety of methods for creating the index — flat indexing, LSH indexing, and here you will find inverted-file (IVF) indexing — so many methods related to indexing. So please go and check this article as well, and you will get to know how to create the indexing. I hope you have got a lot of information so far, and now you can easily understand this particular architecture.

I know the video is getting quite long, so what I will do is explain the last part of the ingestion process, then I will stop this video; the retrieval part I will cover in the next video, and there I will also create one RAG system from scratch. So let's understand the last part of the ingestion process. Here you can see the last part is nothing but the database. Now guys, this database can actually be anything — generally we talk about the vector database, but at the end I will show you one architecture, and with that you will understand that we can use any sort of database.
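The simplest of the index types mentioned above — flat indexing — can be sketched in plain Python: search just compares the query against every stored vector, which is exact but slow, and that is exactly why the smarter structures (LSH, IVF) exist for large collections. Treat this as an illustration, not a production index (in practice you would use something like FAISS):

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FlatIndex:
    """Brute-force ("flat") index: search scans every stored vector,
    so results are exact but the cost grows linearly with the data."""
    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query, k=1):
        scored = sorted(
            zip(self.ids, self.vectors),
            key=lambda item: l2_distance(query, item[1]),
        )
        return [doc_id for doc_id, _ in scored[:k]]

index = FlatIndex()
index.add("chunk-1", [0.0, 0.0])
index.add("chunk-2", [1.0, 1.0])
index.add("chunk-3", [0.9, 1.1])
print(index.search([1.0, 1.0], k=2))  # → ['chunk-2', 'chunk-3']
```

A vector database does essentially this — store (id, vector) pairs and answer nearest-neighbour queries — but with index structures that avoid scanning everything.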
Here you can see we are creating different APIs for the different databases — getting my point? So here I have mentioned three types of database. Generally we use the vector database, yes, but we have different types. Let's read it: this database is also called the retriever — okay, the database is also called the retriever — and the types are the vector database, the graph-based database, and the regular (SQL) database. What does a vector database do? It stores the embeddings for retrieval and for similarity search, and we can perform different operations on it — CRUD operations, metadata filtering, horizontal scaling, serverless deployment. Nowadays vector databases have become very, very popular, and the companies behind them have raised funding in the billions; if you check Pinecone, Weaviate, or the other databases — Chroma DB, and FAISS, which comes from Meta — you will find that people are using them a lot. I will let you know at the end which one you should use; as of now, in a basic RAG pipeline you can use a vector database, but whenever we create an advanced pipeline, at that time we will use a combination of databases.

Now the other database is the graph database. In a graph database, everything is stored in the form of nodes and edges. Nodes are the primary entities in the graph database; each node holds all the data about a person, product, business, event, or another entity. And the edges are the connecting part of the graph database: they show the similarity, the relationships, the communities, and you can define properties and weights on the edges to fit your purpose. So in short, what I want to say here is that by using the graph database I am going to use the complete power of the
data: I am going to capture each and every relationship inside the data, because in the graph database we design the data in the form of nodes and edges. And we use graph databases in many places — search engines, logistics, business applications, social networks — everywhere.

Now, SQL databases — I think we already know about the SQL database. It offers structured data storage and retrieval, but it is lacking in semantics and flexibility, so we generally don't use a SQL-based database here; only in a few scenarios will we use it. Generally we focus on the graph database and the vector database. We use the other forms of NoSQL database as well: MongoDB, which is a document-based database; Cassandra, which is a columnar database; and graph databases like Neo4j. So yes, we can use the NoSQL-based databases and they will work fine.

Now here you can see where we should use which database. In a RAG system, if we are talking about a simple one, you can use the vector database only; but if you want to represent a lot of relationships between the data, some other database will be required, and the graph database is the best choice for that. Here I have clearly mentioned it: it is hard to make the choice between the graph database and the vector database; with generative AI, large language models, and real-time data playing an increasing part in modern applications, we are seeing an increase in combined solutions. So in LLM-based applications, in generative-AI applications, we are going to find combined solutions — getting my point? And here you can see Neo4j: this particular database recently added the ability to perform vector similarity search as well, aiming to make more sense of the data.
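The nodes-and-edges picture above can be sketched with a tiny in-memory structure. In a real system this would live in a graph database such as Neo4j and be queried with Cypher; the entity names here are made up purely for illustration:

```python
# Nodes hold the entity data; edges hold the relationships.
# (Toy in-memory stand-in for a real graph database.)
nodes = {
    "alice":  {"type": "person",  "name": "Alice"},
    "acme":   {"type": "company", "name": "Acme Corp"},
    "widget": {"type": "product", "name": "Widget"},
}
edges = [
    # (source node, relationship type, destination node)
    ("alice", "WORKS_AT", "acme"),
    ("acme",  "SELLS",    "widget"),
    ("alice", "BOUGHT",   "widget"),
]

def neighbours(node_id):
    """Return (relationship, other-node) pairs going out of a node."""
    return [(rel, dst) for src, rel, dst in edges if src == node_id]

print(neighbours("alice"))  # → [('WORKS_AT', 'acme'), ('BOUGHT', 'widget')]
```

This is exactly the extra power the graph database gives you over a plain vector store: not just "which chunks are similar to the query", but "how are the entities in my data related to each other".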
They want to combat hallucination — the misleading output from an LLM — by blending feature vectors similar to the input vector, found through a lookup into the knowledge graph. Getting my point? So by using the knowledge graph we represent the relationships in the data, and by using the similarity search we provide the relevant information. I hope you are getting it.

Now, we have lots of databases, but which one should you select? You can go and check this particular link, and there you will find a complete and detailed comparison of these databases: "Vector DB Comparison". Here is the link, guys — just go and check it, and you will find all the databases. Chroma is here, and you will find the different features as well: license, dev language, VSS launch date, filters, hybrid search, facets, geo search — based on these different features you can evaluate your vector database, and your graph database too. Pinecone is here as well; just check it — is OSS there? No, OSS is not there for Pinecone (you can look up what OSS, open-source software, means); you can check the licenses, the development language, the VSS launch, and so on. All of this is the meta-information about each database, and then there is the search-and-models section, showing which embedding models each database supports, and the API operations — every piece of information related to these vector databases is there, and you can easily make a comparison. I hope you are getting it, guys. So this was the complete ingestion process; now let me summarize it.
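One of the features being compared in that table — metadata filtering — is easy to picture as a pre-filter applied before the similarity search. The sketch below is a toy in-memory version of what a vector database does for you (record names and metadata fields are invented for illustration):

```python
import math

# Tiny in-memory stand-in for a vector database's record set.
# Each record: an embedding vector plus metadata used for filtering.
records = [
    {"id": "c1", "vec": [1.0, 0.0], "meta": {"source": "faq",  "year": 2023}},
    {"id": "c2", "vec": [0.9, 0.1], "meta": {"source": "blog", "year": 2024}},
    {"id": "c3", "vec": [0.0, 1.0], "meta": {"source": "faq",  "year": 2024}},
]

def search(query, where, k=1):
    """Apply the metadata filter first, then rank the survivors
    by L2 distance to the query vector."""
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == val for key, val in where.items())
    ]
    candidates.sort(key=lambda r: math.dist(query, r["vec"]))
    return [r["id"] for r in candidates[:k]]

# Without the filter, c2 would win; the filter restricts us to FAQ chunks.
print(search([1.0, 0.0], where={"source": "faq"}, k=1))  # → ['c1']
```

Real vector databases expose the same idea through their query APIs, usually combined with the indexing structures discussed earlier so the filtered search stays fast.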
Then I will conclude the session; in the next class I will explain the retrieval part and the generation part, because this video is already getting quite long, and that's why I'm going to stop here. So, the first thing: the introduction is clear, and the RAG architecture is clear — let me show you this particular thing once more; the architecture is clear. Then, from the end-to-end pipeline, I explained the ingestion pipeline, and in the ingestion pipeline I defined its various components. Step by step you can see them over here — first look into the architecture and then go through the components: first the document, then the chunking, then the embedding, then the vector embedding indexing, and then the database, or retriever. So with that I have completed the ingestion pipeline. In the next one, in part two and part three, we will try to discuss the retrieval pipeline, and then there are the other topics: I will be coming up with multimodal RAG, RAG versus fine-tuning, the evaluation metrics, and how to improve a RAG system with different research papers. So guys, don't miss out on this series — it's going to be very, very interesting. I think that is it for this particular video, so thank you, thank you for watching this video; we will meet soon in the next session. Until then, thank you, bye-bye, take care!