Hey, Daniel here. If you're using vector stores to ground your AI agents in your own data, you may have noticed that you don't always get accurate results. One of the reasons for this is that while vector search is great for handling natural language queries, it can struggle when dealing with specific names, terms, acronyms, or codes in the knowledge base. In this video I'm going to show you how to fix this by implementing hybrid search, and I'll demonstrate it using both Supabase and Pinecone.

To explain the concept of hybrid RAG, let's first look at what vector search is. Vector search is great at capturing the semantic intent, the meaning, of a user's query. Let's take a quick example: say we have an AI agent deployed on an e-commerce store, and a customer on the front end asks it, "Can I see the various blue cotton t-shirts?" That query goes to the agent, which digs into the knowledge base to fetch similar products, and not only will it retrieve a blue t-shirt, it may also retrieve other t-shirts that are cotton, for example.

What's happening behind the scenes is that the query is converted into a vector, in this case a dense vector: a numerical representation of the query itself. That vector is sent into the knowledge base, which already contains embeddings for all of the products, and what comes back are similar results. So for a query like "show me lightweight cotton t-shirts", it will pick up products with a similar meaning: comfy summer tops, breathable shirts for hot weather, soft tees for everyday wear. That's how you can end up with quite a broad selection of results for a query like that, and it's the real strength of semantic (vector) search.

Where it falls down, though, are questions like this: if someone asks "show me blue t-shirts that are medium-sized", you may get back results that are blue but not medium. Or if you ask the agent to show you a blue t-shirt with a specific product code, it will again fetch a lot of blue t-shirts, and it may or may not include the product with that actual code. This is an example of how semantic search can be a little too broad in certain scenarios.

Compare this to the more traditional keyword search, or full-text search. Here we're matching exact words or partial words, and it's great for precision. Back to my example: if the customer asks for blue cotton t-shirts and the agent uses keyword search instead of vector search, it will return products that have the specific words "blue", "cotton", and "t-shirt" in the title or the description. There are a number of ways this can be done behind the scenes, but one of the most common is for the query to be converted into a sparse vector, which is then sent into the vector store, and matching results are returned. The strength of this approach: if you ask "do you have a black cotton t-shirt?", it can return a product like the Apex 25 black cotton t-shirt, because those are exact-match terms within the title or the description. The weaknesses are the opposite of semantic search: it has no understanding of meaning. "T-shirt" doesn't equal "tee", for example, unless that was explicitly added via synonyms. So while keyword search has a lot of strengths in getting an exact match back, it really isn't that flexible.
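To make the two representations concrete, here's a minimal TypeScript sketch of the data shapes involved. The numbers are made up for illustration; real dense embeddings have hundreds or thousands of dimensions, and real sparse indices map into a large vocabulary.

```ts
// A dense embedding: a fixed-length array of floats that encodes meaning.
// Every position is populated, so "blue t-shirt" and "navy tee" end up close.
const denseVector: number[] = [0.012, -0.231, 0.084 /* ...1536 values for
  a model like OpenAI's text-embedding-3-small */];

// A sparse vector: only the vocabulary positions that actually occur in the
// text are stored, each with a weight. Great for exact terms and codes.
const sparseVector = {
  indices: [102, 5048, 99301], // hypothetical token ids for "blue", "cotton", "t-shirt"
  values: [0.91, 0.74, 0.88],  // per-term weights
};
```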
This is where hybrid search kicks in, because you get the best of both worlds: the precision of keyword search and the semantic understanding of vector search. With this approach you're actually merging the results from both systems. The customer looking for the blue cotton t-shirt receives a result set with the blue cotton t-shirt at the very top, followed by other similar t-shirts they might be interested in. Behind the scenes, the query that goes to the AI agent is converted into a dense vector embedding for the semantic search as well as a sparse vector for the keyword search, and two different result sets are produced by those two different algorithms. A key aspect of the hybrid result set is that it's then ranked: you end up with scores from both systems, and different weightings are applied so that the merged set reflects the strongest results from both data sets (I'll sketch this fusion logic in code below).

If you look at a typical AI agent within n8n, there is no option for hybrid search within the tools. You have various vector stores, for example, but all of them do plain semantic search; there's no full-text, lexical keyword search. So there is a little bit of custom work you'll need to do to get these systems up and running, but if you're struggling with the accuracy of your RAG agent when it comes to retrieving data from your knowledge base, it's definitely worth testing out hybrid search.

In this video I'll be going through this implementation of Supabase's hybrid search, and this is the resulting n8n workflow, which is pretty simple; most of the work is actually on the Supabase side. And this is the end result in Supabase: my documents table has not only an embedding column for the dense vector embeddings, but also a tsvector column for full-text search. Interestingly, if we copy one of those values out and look at it in Notepad, we can see the various keywords (or partial words) within the chunk, along with their positions within it; this is what the keyword search algorithm uses to find exact matches.

The Pinecone hybrid search implementation is a little more fleshed out on the n8n side, but as a result there's actually less to do on the Pinecone side. Here we have our hybrid search ingestion flow, plus our AI agent that can hit the knowledge base, which just triggers the search query flow you see here. On the Pinecone side you can see our full knowledge base, and if we click into one of the chunks you can see the metadata; under values are the dense embeddings, and sparse values represents the full-text search data.

Before I jump into the implementation: if you'd like to get access to these workflows so you can test them within your own AI agent, check out the link in the description to our community, The AI Automators, where you can join hundreds of fellow automators all looking to leverage AI to automate their businesses.

Here we'll be implementing hybrid search on Supabase. Supabase has full documentation on how to get this up and running, so come into this page (I'll link it in the description below). The first thing you need to do is copy out the query that creates the actual documents table.
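Here's that fusion step sketched in TypeScript. This is illustrative only: the real work happens inside a Postgres function on Supabase (covered next), which uses reciprocal rank fusion (RRF). The parameter names mirror the ones in Supabase's sample (`rrf_k` plus full-text and semantic weights), but treat this as a conceptual sketch, not their implementation.

```ts
// Reciprocal rank fusion: each result list contributes weight / (k + rank)
// per document, so items ranked highly by either search float to the top.
type Ranked = { id: string; rank: number }; // rank is the 1-based position

function rrfFuse(
  fullText: Ranked[],
  semantic: Ranked[],
  k = 50,        // smoothing constant (rrf_k in Supabase's sample)
  ftWeight = 1,  // weighting for the keyword result set
  semWeight = 1, // weighting for the semantic result set
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { id, rank } of fullText) {
    scores.set(id, (scores.get(id) ?? 0) + ftWeight / (k + rank));
  }
  for (const { id, rank } of semantic) {
    scores.set(id, (scores.get(id) ?? 0) + semWeight / (k + rank));
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Note that a document appearing in only one of the two lists still gets a score, which is why purely semantic matches can still surface for vague queries.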
What you'll see is that the table not only has an embedding column for the vectors, it also has a full-text search column, and when you insert data into the table, that column gets populated automatically.

So come into Supabase. I'll create a new project for this, and when you land in your dashboard, go to Database and click on Extensions, because the first thing you need to do is enable the vector extension. From there, go to the SQL Editor and paste in the query that creates the documents table. The one thing you need to change is the number of dimensions for the vector embedding column, and this depends on which embedding model you're going to use. I'll be using OpenAI's text-embedding-3-small model, which has 1536 dimensions. Click Run, and you get "Success. No rows returned". You can verify it was created by going to Database, where you can see it; clicking the table icon brings you into the table view, though obviously we don't have any data yet.

Okay, back to the SQL Editor, and let's paste in the next snippet. Here we're creating indexes for both the full-text search column and the vector search column, to make sure we get really fast results back from this table. Again we get success, and to verify it, go to Database > Indexes and you can see those indexes created.

Now let's grab the next snippet. This creates the hybrid search database function: it carries out a vector search and a full-text search, and then fuses both result sets together using reciprocal rank fusion, the process sketched above. Copy that out and drop it in; again, at the very top you need to specify the number of dimensions of your embedding model. Leave everything else as is and click Run. That returns success, and to verify it, go to Database > Functions, where you'll see the hybrid search database function that was just created.

At this point we have the database table and the function that carries out the hybrid search; we just need a way to trigger it from n8n. On the left-hand side, click Edge Functions, and let's create this edge function using the sample code Supabase provides: on this page, choose to create your first edge function via the editor, grab the code and drop it in. There are a few changes we need to make, again around the embedding model: change the dimensions to 1536 and the model name to text-embedding-3-small. What's happening here is that this function takes the user's query, goes to OpenAI to generate the embeddings for it, and passes them to the search function; so we won't need to do that on the n8n side when triggering this. As a result, though, we'll need to provide our Supabase project with an OpenAI API key it can use. A trimmed sketch of what this edge function roughly looks like follows below. Then at the bottom, give the function a name. Okay, that has succeeded, and now we have our endpoint URL. Further down it gives an example of how to invoke this function, but just be aware that the payload it suggests you pass is incorrect; it's not actually based on the code in the function, so you will need to make some changes there. What to do is click on Code and copy out the name of the OpenAI key variable the function reads.
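For orientation, here's a trimmed TypeScript (Deno) sketch of what that edge function roughly looks like, based on the pattern in Supabase's hybrid search docs. Treat the exact imports, the `hybrid_search` parameter names, and the `match_count` default as assumptions; the sample code in Supabase's documentation is the source of truth.

```ts
import { createClient } from 'npm:@supabase/supabase-js@2';
import OpenAI from 'npm:openai';

// The OPENAI_API_KEY secret is what we add to the project in a moment.
const openai = new OpenAI({ apiKey: Deno.env.get('OPENAI_API_KEY')! });
const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

Deno.serve(async (req) => {
  const { query, match_count = 10 } = await req.json();

  // Embed the user's query with the SAME model used at ingestion time.
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions
    input: query,
  });

  // Hand off to the hybrid_search Postgres function created earlier.
  const { data, error } = await supabase.rpc('hybrid_search', {
    query_text: query,
    query_embedding: embedding.data[0].embedding,
    match_count,
  });
  if (error) return new Response(error.message, { status: 500 });
  return Response.json(data);
});
```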
Then go into Secrets, because you need to add a new secret with that exact name, holding the API key. Go to the OpenAI platform to create a new key; I've copied that out, so now I can paste it in there and click Save. Now the edge function will have access to this key when it's triggered, so it can generate the embeddings.

Back into Functions, and let's copy out the example curl it provides. Then come into n8n, and for the first step add a Chat Trigger; after that, click on plus, add an HTTP Request node, and import this curl. You'll see that populates the method and the URL as well as the Supabase API key. Further down you can see the body parameters, and this is where you need to make some changes: where it says request JSON, you can see the function is looking for a constant called query, so just copy "query" out and bring it in there. Our query is going to be what we get from the chat trigger, so if we click "execute previous nodes" you can now see the chat input message, and we can drag that into our query, because this is what we're going to send into the vector store. We'll send "hello", and that has gone to Supabase's edge function. We're getting no output back, which is probably correct because there's nothing in the database, so let's load some data into this vector store so we can test it properly. (If you want to sanity-check the endpoint outside n8n first, there's a sketch of the equivalent fetch call below.)

Let's make some space on the canvas, and click the plus to add a Manual Trigger for the moment. In our RAG masterclass we built out a full data ingestion pipeline for Supabase's vector store using Google Drive and web scraping, as you can see here on screen, so check out the masterclass for that; that workflow is also available for download in our community. From here, type in Supabase and use the "Add documents to vector store" action. Click on that, and from there you'll need to create a credential: on the dropdown click "create new credential", enter your host and a service role secret, paste it in and click Save, and you'll get a green connection message, which means it's able to connect. Choose the table, which is documents, and you now need to add two more elements. Click Embedding and add the embedding model: this is OpenAI, and again you need to create a connection if you don't have one, dropping in the API key you created previously. From there, choose text-embedding-3-small; this defaults to 1536 dimensions, so you don't even need to set it. It's crucial that the embedding model you use when uploading documents is the same as the one used when querying the vector store. Then for Document, we're just going to use a Default Data Loader and load all data that's incoming into the Supabase node, so you can leave it as is. I generally use a Recursive Character Text Splitter, and we can leave it at a chunk size of 1,000.

If we click "test workflow"... okay, we've hit an error: it could not find the metadata column of documents. The reason for that is the example code we used to create the table doesn't have that metadata field, so we just need to add it manually. Back into Supabase, go to the database, open the table, and you'll see a plus where you can add another column. Click on that, type in metadata, and set the column type to jsonb, which is binary JSON data.
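As an aside, you can hit the edge function directly to sanity-check it, without n8n in the loop. A hedged fetch sketch; the project URL, the function name, and the anon-key environment variable are placeholders for your own values:

```ts
// Call the deployed edge function directly. Replace YOUR-PROJECT and the
// function name with your own; the anon key goes in the Authorization header.
const res = await fetch(
  'https://YOUR-PROJECT.supabase.co/functions/v1/hybrid-search',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.SUPABASE_ANON_KEY}`,
    },
    body: JSON.stringify({ query: 'engine intake air' }),
  },
);
console.log(await res.json()); // an array of matching chunks
```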
Then if you test it again, that column now exists, and it worked. We now have our "hello" content in the vector store; you can see the dense embeddings here, and then this is the full-text search field, which is a tsvector. This is what the full-text search uses when figuring out the most relevant results in the database.

Okay, let's load in a document. The example document I use quite regularly in my RAG videos is this Formula 1 technical regulations document, which is 180 pages long. We'll copy out that URL, since it's hosted on the FIA's website, come into here, click plus, and download the document; for the moment, just to demonstrate hybrid search, we'll use this one document. So there's the request node, paste the URL in, click "test step", and we get the binary back. Into our Default Data Loader, and we'll process the binary as opposed to the JSON. Let's trigger this workflow, and you'll see it processing through the PDF. It has just finished, and we have 683 chunks. Now if we go back into our hybrid search table and refresh, we have that 180-odd-page document upserted into our vector store; it's embedded, as you can see there, but we also now have the full-text search fields populated as well. If you copy one of them out and bring it into Notepad, just so you can see the format, it's not a vector the way the embedding is: it's a set of words or partial words with weights and positions, something like `'air':12 'engin':4 'intak':11`, i.e. stemmed terms followed by where they appear in the chunk.

Now that we have data, let's come back to our chat message and pass in a query that appears in the document, maybe "engine intake air". Drop that in, and we're getting 10 items; that's because, if you look at the edge function code, match_count is set to 10 there. If we change that to, say, 20 and deploy the update, we now get 20 items back. If we have a quick look at the data, we're retrieving the content of the chunk, the full-text search data, and the actual embedding itself, which we don't really need. I think what would be interesting, though, is to understand where each chunk ranked in both the vector search and the full-text search; we'll add that in a couple of minutes.

For now, let's bring this into an AI agent. Disconnect the chat message from the request node and add our AI Agent. When it comes to a tool, the way you would have done this before is to click on plus and add Supabase. We can't actually do that with hybrid search, because the database function we created has specific required parameters that this node can't pass. So instead of the traditional node, delete it, and use a direct HTTP request to our edge function. We essentially need to hook this up: click plus, HTTP Request Tool, and copy across the parameters you set earlier. Bring in the URL, bring in the header auth, and for the body, query is the name, but for the value just press this button so the AI agent itself can define what it sends into the Supabase edge function. Before we test this, let's do a quick tidy-up: we need to give this tool a name.
Then, for the AI agent itself, let's set a system message: "Only generate an answer based on results from the connected knowledge base." Now let's ask a question: "What are the rules for the engine air intake?" All right, and there you go, we're getting quite a comprehensive answer back. We're still getting all of the vectors and the full-text search data, though, so we need to strip all of that out: it's all going to the agent, and it just isn't required for generating an answer. (If you'd rather not touch the SQL, there's an n8n-side alternative sketched after this section.) Go back into Supabase, to the database function (your hybrid search function), click the three dots on the right, choose "edit function with assistant", and you can vibe-code the changes with Supabase's AI assistant: "Update this function to output the content and metadata columns in the documents table." And let's ask it for something else too: "Can you provide the rankings from the vector and keyword searches?" That way we can see which chunks performed better in which search before they were fused into the main list. Yeah, so it's adding the full-text rank and the semantic rank, and we'll click "run query". Okay, we're getting an error: there's a mismatch in the return type, which you can actually see there. It doesn't seem able to change the return type via the AI assistant, so I'll just delete the function, and if we run it again it should create it successfully, which it has. Now if I refresh, we have our new hybrid search function with the correct return type: we've removed the embeddings from the output and added in this full-text rank and semantic rank.

Let's test it out, back in n8n: "What are the rules for the engine air intake?" Yeah, that was a lot faster, and we're getting a decent answer back. If we double-click into it: perfect, we're getting our metadata, the content, and the rankings from the full-text search and the semantic search.

Now let's test it with some terms from the actual document. There's one here, "metal matrix composites (MMCs)", and in theory full-text search should work much better for an example like that, whereas semantic search might totally miss the point and pick up things semantically related to metal, such as iron or steel; full-text search will look for that specific phrase. After triggering it, the first chunk was first in both the full-text rank and the semantic rank; the second item returned actually finished third in both rankings; and item number three in the list finished fourth in the vector search and sixth in the full-text search. So you get the idea: these are different search algorithms, and what determines the order of the chunks we get back is a fusion of both rankings.

These types of search terms are great examples of where hybrid search, and full-text search in particular, works very well. If someone searches for a specific term like ISO16220, "What is ISO 16220?", we get back a variety of chunks, but if you look at the very first one, its semantic ranking wasn't even in the top 30 results, while it was the top result for the full-text search; and if you look at it, there it is, plain to see in the text. That's a perfect example of how vector search alone would produce a really bad answer for that type of query, whereas full-text search nailed it.
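As mentioned above, an alternative to editing the database function is to trim the payload on the n8n side, with a Code node placed between the HTTP request and whatever consumes the results. A hedged sketch, assuming the response items carry `content`, `metadata`, `embedding`, and `fts` fields matching the table columns:

```ts
// n8n Code node ("Run Once for All Items"): keep only what the agent needs.
return items.map((item) => ({
  json: {
    content: item.json.content,
    metadata: item.json.metadata,
    // embedding and fts are deliberately dropped: they bloat the context
    // sent to the agent without helping it generate an answer.
  },
}));
```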
Whereas if you ask a general, vague question that doesn't have any direct matches in the text, like "What is the impact of wind on an F1 car?", the actual regulations probably don't use that language: they talk about airflow, they talk about aerodynamics. So the first 10 results we get back simply reflect the vector search. It's not that the full-text search isn't running for this query; it's just that the relative scores it returns are quite low, so when the two result sets are fused together, they fall below the threshold the semantic results are setting.

But then this is where re-ranking kicks in, because this fusion of the two result sets isn't bulletproof. By putting all of these results through a re-ranking model like Cohere, you can get a really accurate ordering of the results, and then provide the top subset of them to an AI agent to generate the answer. In our RAG masterclass on this channel I've built out that re-ranking system, which you can see here; it's using Cohere Rerank 3.5 on top of this hybrid search system with Supabase. If you'd like to learn more about this, check out the masterclass; I'll link it in the card above. So that's how you can set up hybrid RAG in n8n using Supabase for both vector search and full-text search.

On to our Pinecone implementation. I won't go through this build step by step; I'll just show you how it's set up. Here we're following the instructions listed on Pinecone's hybrid search page; specifically, we're using a single hybrid index, as opposed to having separate dense and sparse indexes. This will make more sense as I go through it. To get up and running, create an index, give it a name, and come into custom settings: choose dense as the vector type, and put in your number of dimensions. I have 1024 set because I'm using the multilingual-e5-large model, which has 1024 dimensions, as you can see there. When it comes to the metric, you must choose dotproduct: Pinecone only supports hybrid search in a single index if you use the dotproduct metric. Now you should have a blank index, and you'll be able to access the host for that index here.

To talk through the ingestion flow: I have a manual trigger here for the moment. Next is a Set Fields node, where I'm setting the index host (essentially this URL, dropped in there) and a namespace, which I've just set to n8n. Then I'm downloading this file, a test file I've used as part of this build: the F1 technical regulations document, which is quite large. You can see this outputs a binary file, so the next node extracts all of the text from the PDF, as you can see there, and if we scroll to the bottom we have a field called text that contains all of the text in the document. This is a machine-readable document; if it were a scanned PDF, you'd need something like Mistral OCR to extract the contents. Again, I go through that in my RAG masterclass on this channel.

Next up, we need to do our own chunking of this document. While there are text splitters within n8n, none of them are standalone nodes, so you need to bring in your own chunking logic. For this, I vibe-coded it with ChatGPT: I just asked it to generate a JavaScript chunking script where I can specify different chunk sizes and chunk overlaps, along the lines of the sketch below.
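Here's a hedged sketch of that kind of chunking logic: fixed-size character windows with overlap. The chunk size and overlap values are assumptions; tune them for your own documents.

```ts
// Split text into overlapping character windows. With chunkSize 1000 and
// overlap 200, each chunk starts 800 characters after the previous one.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// In an n8n Code node you'd return one item per chunk, something like:
// return chunkText($json.text).map((chunk) => ({ json: { chunk } }));
```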
Here I've specified that the text input is what I'm getting from the PDF, which is here, so I just drag that in, and that's basically what this code node does. There was a little bit of back and forth as I tested it out, but nothing too complicated. At this point, if I trigger that, you can see we've ended up with 1,152 chunks; this is a 180-page document, so that makes sense.

Then I'm looping through these in batches of 96. The reason it's 96 is that the dense embedding model I'm using has a maximum batch size of 96. You can see those 96 items in this batch flow through to the next node, where I'm simply aggregating them all into an array so I can go back to a single item. And there we have it: if we jump into this, here we can see we have 96 items, and here we have one item.

Next up, I'm converting the format, because what I need is an inputs array where each entry is an object with a text property; this is what's required by the Pinecone API. Here we had an array of strings, whereas now we have an array of objects that each have a text parameter that's a string. Now we can actually trigger the embedding of these chunks, and I'm using Pinecone's own inference API to embed them, hitting the api.pinecone.io/embed endpoint. I'm passing my header auth, which is the Api-Key header, and then on to the body: the model here is multilingual-e5-large, and I'm passing the parameter input_type: passage. That's pretty important, because when you're actually inferring you need to pass input_type: query instead, and I got stuck on that for a while. The inputs are what we set up in the previous node, and you can see them there on the right.

Now we do the same for our sparse embeddings; the only difference is we're passing a different model name, pinecone-sparse-english-v0. Press play on that and it generates our sparse embeddings, and you can see they're a bit different: we have sparse values and sparse indices, which is the format Pinecone requires when you're upserting the data.

Then we have a code node to build out the vector array. Pinecone's upsert endpoint requires a specific format, and that's what I build in this node: I generate a unique ID for each vector, load in the arrays of dense embeddings, sparse embeddings, and chunks, and then simply loop through all of that, building out the new vector array we can send to Pinecone; all of that is returned off the back of this node. If I click "test step" here, you can see this is the exact output it requires. And finally, the upsert vectors node hits the vectors/upsert endpoint with the JSON output we're getting from the previous node. (Both API calls are sketched below.)

Now if I come into here and click "test workflow", it downloads that 180-page document, goes through the loop, upserts the vectors, generates new embeddings, upserts them, generates new embeddings, upserts them, and so on. This is essentially what happens with the Default Data Loader and the chunking strategy in a standard AI agent, except here we've built it out manually. It's pretty fast with these batches of 96 items. Okay, and we're done. Now if we come into Pinecone, refresh, and jump into any single chunk, you can see the text of the chunk as well as the dense vector embeddings and the sparse values for the keyword search.
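For reference, here's a hedged TypeScript sketch of those REST calls as I understand them from Pinecone's API docs. The index host, namespace, API version header value, and the exact response field names (`values`, `sparse_values`, `sparse_indices`) should be checked against the current documentation:

```ts
const headers = {
  'Api-Key': process.env.PINECONE_API_KEY!,
  'Content-Type': 'application/json',
  'X-Pinecone-API-Version': '2025-01', // assumption: pin to a current version
};
const chunks: string[] = ['...']; // up to 96 chunks per batch for this model

// 1) Dense embeddings via Pinecone inference (input_type 'passage' at
//    ingestion time, 'query' at inference time).
const dense = await fetch('https://api.pinecone.io/embed', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'multilingual-e5-large',
    parameters: { input_type: 'passage' },
    inputs: chunks.map((text) => ({ text })),
  }),
}).then((r) => r.json());

// 2) Sparse embeddings: same endpoint, sparse model.
const sparse = await fetch('https://api.pinecone.io/embed', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'pinecone-sparse-english-v0',
    parameters: { input_type: 'passage' },
    inputs: chunks.map((text) => ({ text })),
  }),
}).then((r) => r.json());

// 3) Upsert both into the single hybrid index (INDEX_HOST is a placeholder).
await fetch('https://INDEX_HOST/vectors/upsert', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    namespace: 'n8n',
    vectors: chunks.map((text, i) => ({
      id: crypto.randomUUID(),
      values: dense.data[i].values,
      sparseValues: {
        indices: sparse.data[i].sparse_indices,
        values: sparse.data[i].sparse_values,
      },
      metadata: { text },
    })),
  }),
});
```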
So then, on to the actual inference phase. We have our AI agent, and as I mentioned, we can't use the standard Pinecone vector store tool because it doesn't support hybrid search. Instead, we just call an n8n workflow: we're calling this exact same workflow, which means we can add this node here, so essentially the AI agent triggers this, which triggers this. Within that tool, we're specifying query as a workflow input and getting the agent to define it automatically; that's done by just pressing this button to let the model define the parameter. Then over here, if you look at the "When executed by another workflow" trigger, we've had to add that query as an input field for this flow. Great, so we now have the query from the AI agent showing up here.

Then we do the same thing: set some variables, which are our index host and our namespace, and generate the dense and sparse embeddings again, the key difference being that this time we set input_type to query, as opposed to passage. So now if I click on "generate the dense embeddings", that goes to Pinecone, sends in the query, and generates a dense vector embedding we can then query with; it's the same with the sparse embedding, where we get back a sparse representation of the query.

Then we can query Pinecone. This hits the query endpoint, again passing header auth, and this is the structure of the query, all documented in Pinecone's API documentation: we pass the namespace, the dense vector, the sparse vector with its indices and values, and the topK, since we want back 10 results. We don't need the vectors back, but we do need the metadata back, because that's the text. (A hedged sketch of this call is included below.) This is what the output looks like: we're getting our chunks, and we're getting the score as well; this is the hybrid result set, with a combined score across both systems. All of these chunks are then sent back into the AI agent to actually generate its output.

Okay, let's test it out: "Can you explain the plank assembly rules?" And there's our result, 11 different rules. If we jump into the tool call itself, you can see the chunks that were provided to generate that text, and if we search for, say, "plank assembly", or just "plank", there are a lot of exact matches on "plank". This is the real benefit of hybrid search: if you search for very specific terms, you're going to get far more results containing that term than you would with semantic search alone.

I hope you found this video useful. Make sure to like and subscribe to our channel for more, and thanks for watching; I'll see you in the next one.
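For completeness, here's a hedged sketch of that hybrid query call. The index host is a placeholder, and the two query embeddings are assumed to come from the /embed calls sketched earlier, with input_type set to 'query':

```ts
// Placeholders standing in for the /embed responses at query time.
declare const denseQueryEmbedding: number[];
declare const sparseQueryIndices: number[];
declare const sparseQueryValues: number[];

const results = await fetch('https://INDEX_HOST/query', {
  method: 'POST',
  headers: {
    'Api-Key': process.env.PINECONE_API_KEY!,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    namespace: 'n8n',
    vector: denseQueryEmbedding,
    sparseVector: { indices: sparseQueryIndices, values: sparseQueryValues },
    topK: 10,
    includeValues: false,  // we don't need the vectors back
    includeMetadata: true, // but we do need the chunk text
  }),
}).then((r) => r.json());

// results.matches: [{ id, score, metadata: { text } }, ...], i.e. the hybrid
// result set scored across both the dense and sparse components.
```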