Implementing RAG with Azure SQL and AI

Hey, hi everyone. I'm Davide Mauri, a Principal Product Manager of Azure SQL. Today, actually right now, I'm going to talk to you about how to do RAG retrieval augmented generation on your data using.NET, AI, and Azure SQL. So let's get into it right away. So first of all, for those who doesn't know about Azure SQL yet, probably just a few, but just for those few, let's recap what Azure SQL is.

Azure SQL is the enterprise AI database. It is a modern relational database with full true multi-model support, means that you can store JSON, graph data, even XML if you still be using it, geospatial, column store, and query all these things together. It's scalable up to 100 terabytes.

You can have up to 80 vCore, 30 replicas, so it can really sustain any kind of work where you can throw at it. And of course, we are working on adding vector support. Vector support is in early adapter preview.

You can see the link on the slide where you can just go and submit your request to enter the private preview. The private preview is exactly what I'm going to use today to build the Rack solution. Not because you can't without it, but with the new vector support that we are adding, you will have just a much easier way to build a complex solution that today are common in our complex world.

There are other cool stuff that Azure SQL offers to developers, not to mention, of course, enterprise-grade solutions so you don't have to worry about security. We take care of it. Everything is encrypted.

You can encrypt the column. You can even encrypt the data flowing between the SQL Server or Azure SQL in your application in such a way that no one except the receiver can decrypt the data. Of course, as a developer, you probably want to connect with Azure services like Azure Function that we'll be using today, Static Web Apps that you will be using today.

or any other services, Stream Analytics, Fabric, Power BI. Of course, that is absolutely easy with Azure SQL. As a developer, again, you probably want to use the common libraries that everyone uses today from.NET EF Core, to Semantic Kernel, to long chain or any other libraries that you want to use.

Again, Azure SQL is integrated with those libraries, so you can just have a great experience no matter what you want to use. Now, the other thing that I want to discuss before moving into the demo is, what are the use cases for having AI into Azure SQL? So why you want to move AI to your data, instead the other way around.

Well, the first and most common use case that even today, I'm attending and actually speaking at another conference at the same time we are recording this session. Again, all customer come to me and ask, I have this very common situation. I have my data in Azure SQL or SQL Server.

And I would like to add the vector search, because I would like to be able to filter by, let's say you are an insurance company, and you have all your data already in SQL, so you want to find all the contracts that are related to a specific person or created by a specific employee. And within those contracts, you want to chat with your data, ask a question in natural language like, For example, find all the documents from Acme customer written by John Doe that are related to the new security policies they need to apply. You can do that in SQL because you will have like the author of the document, the tagger that you use to tag the document, and all other information already stored in your database in some column or some document in JSON format that you stored in SQL.

So adding vector capabilities and then AI support into database is the natural choice. for us and then for you to use because that will just make your life easier because you will add vector support and semantic search to the queries that you already need to run in order to filter out the data that are specific to a customer or something else. That's the most common use case, what we call a hybrid search where you mix vector or semantic search with full text, maybe you are already using it, and then with the regular filters.

Then of course the other use case is to store chatbot memories. For example, we have a release and integration with semantic kernel that allows you to build a chatbot and store the memories in Azure SQL, so then you can then analyze them, find what is the most common question. For example, the problem that is arising from your customers chatting with your chatbot. Of course, the other very common use case is RAG, retrieval augmented generation, which is exactly what I'm going to show you right now.

This is actually also the most complex pattern, so that's why I prepared a little bit of a high-level architecture and a high-level design. So, retrieval augmented generation is a sequence of steps, actually two, that you have to do so that you can take the data that is stored in your database, filter it using semantic search and thus vector search. So, let's say, for example, I have a database that contains all the sessions of a conference.

And let's say that I want to filter all these sessions based on some topic or some day that I want to go and attend to the conference. But then I also want to ask to an AI model, out of the sessions that are available today, what are the sessions that are about SQL and AI? And I want to do it in a natural language.

And I want to get the answer in a natural language. So the first part is where I do the hybrid filtering and the vector search. So I limit out of the hundreds of thousands of sessions that I have available in my database, I only limit the scope to the ones that are related to the question I'm doing, like today. The session must be today. And then once I filter that, and I also make sure that I filter only the the session that are somehow related to the topic I expressed the interest into, like for example, SQL and AI or security or anything, then I take those sessions and I send them to the AI model, like a chat GPT model, a GPT model, and then ask it to evaluate the question in the natural language and return to me the answer in natural language.

So the first step is similarity search, that's why we need the vector search. And the second step is sending to the AI model the result of the first step and explaining the AI model using prompt engineering, what it needs to do with the data I'm providing him so that he can provide me the answer I need. So that's what we need to build, and that's what we are doing. And we are doing it with the aid of some Azure services.

For example, we are using Azure Function because the moment someone writes a new or updates a new abstract or a title of a session in my SQL database, I need to immediately be able to be notified about the fact that the session has been added or updated, so that I can calculate the embeddings, get the vectors that are representing my topic or my session and store them back into SQL, because all the AI model, all the AI world works on vectors and embeddings. So we need to convert our text into those vectors and embeddings. And again, here I'm using an Azure function that will be automatically triggered by Azure SQL as soon as a new title or a title has been updated or the abstract has been updated.

Then we will be using, of course, Azure OpenAI because my function will take the text that has been updated in my database and send it to OpenAI so that it can be converted into a vector and then stored back in SQL. And then in SQL, I have a stored procedure in this case. Let's see.

that I want to use to search for all the session that are in a specific day or related to a specific topic. Once I have that function that started procedure already, I want to expose it so that it can be used by my front-end. Since I'm building a full stack application, and specifically a Jamstack application, my front-end is only able to communicate with my back-end using JSON and REST API. Unfortunately, my stored procedure is just in SQL. The problem I have to solve right now is that how do I expose that story procedure, but it could be a table or a view, as a REST endpoint or maybe even a GraphQL endpoint.

Well, typically, I would have to create the service and code everything by myself. But luckily, today, we have something called Data API Builder that allows my database to be exposed as a REST endpoint or a GraphQL endpoint just in a second using a configuration file, and that's it. And the beauty of Data API Builder is not only that it's... open source and available for free, both on-premises and in Azure. But in Azure, it's also integrated with Static Web Apps.

So if I'm creating a Static Web Apps, I don't even have to worry about deploying the API Builder and configuring it to connect my database. It will just happen automatically as part of my Static Web Apps solution. That's exactly what we are going to use. Then of course, my front-end in this case is just a React front-end, I will just communicate with my back-end, my Static Web Apps back-end.

using REST queries, REST calls, and JSON. So that's the application we are going to build. All this solution is something you can actually use right now, and deploy right now in Azure, because it's fully available on GitHub.

It supports AZD, so you can just easy do AZD app and everything deployed for you. So you can try it out on yourself. You don't need to use Azure if you don't have a subscription because everything also works locally, and that's exactly what we'll be doing today. So you can even try it without having to use Azure if you don't have it.

The only exception would be for Azure SQL because the private preview we are running where you can use vectors as types and vector function in SQL is only available in Azure SQL at the momentum. So you need to use Azure SQL. But luckily for you, there is a free tier for Azure SQL. It's completely free, forever free, that you can use so you can really try out the example without having to worry about costs or money.

Now, before going to the demo, let me show you how it works and you can actually try it on yourself right away. Because this website that I built for another conference where I was in I was invited to talk. I was allowed also to keep it running, so I can use it as a kind of a reference architecture. And here, let's say I want to learn about a session on security.

And what I want to do, I already loaded my database with a session of that conference. Maybe I want to ask, is there any session on security? And even if I misspell it, thanks to the fact that the data is being converted into an embedding and a vector, and then...

semantic search is done in Azure SQL using vector search, and then everything is sent to the iModel, the iModel is still able to understand my query, or in this case, my question, even if I didn't spell everything right. And yes, there is a session on security. The session is titled How to Build Secure, and blah, blah, blah, you can read it for yourself.

And you can see we have all the data here. Now imagine doing that on your own data. So the example I'm going to show you right now is of course we don't have The full day here, so I'm going to show you the important part of the architecture.

So you can take the architecture and change it to run on your own data. So let's dive into the example, starting from the GitHub repository, that you can reach out from the little GitHub icon here. So that's the repository, and that you can clone.

And that's, of course, what I've already done on my machine. And this is the full structure of my example. Let's start from, well, the database. You have all the script here. I already deployed.

The interesting part I want to show you is that tables are super simple. We have the session table, and we are going to store the embeddings and thus the vectors directly into SQL for now as a binary object. And then we have the speaker table, and of course here also we have the embedding and the vector associated with the speakers. And then we have a very simple relationship. table that connects the session to the speaker who are delivering the session.

It's a many-to-many relationship, so we need this associative table. That's it. Just three table, nothing complex. The only additional stuff is the ability to store embeddings right into Azure SQL.

Once I will have at some point the embedding in my table and I show you how, then I want to find the session. I want to find the session that is about SQL and security. So I have this story procedure that will take my text, will call OpenAI using this get embedding story procedure, that again you have available, you can see how it is done, but it's very easy.

It's just calling OpenAI through SP invoke external rest endpoint, which is a story procedure that allows you to call an external rest endpoint, for example, OpenAI. And then it will convert this text into an embedding, which is needed to do vector search. Then we'll take our vector that is stored in a binary format, and then we can calculate the distance between our vector that represents the topic I'm interested in, and I can compare its distance to the old embeddings already stored in my database that represents the session that I have in my conference. I will take the session that are closer, the topic I'm interested in, and then I will return the result to my query to my user.

And this is a stored procedure, so that's it. That's the only thing I need to do. Since I have Data API Builder, that will help me to make this stored procedure a rest endpoint without writing code, but just by configuring it.

And the first, before going to how to expose this rest endpoint, let's play with this data a little bit. Right now, my database is completely empty, right? So the first thing I want to do is to add some data and make sure that my data will be automatically converted into an embedding because You will see that here I have some sample data here.

You will see that here, I'm just adding my text. I'm not really worrying about adding a vector. A vector is a very long sequence of numbers that will not be able to type in here. I need an AI model to generate it for me. So I need an AI model to take this content and turn it into a vector.

So in order to do that, I have a function here. This function is using something called, a SQL trigger, that means that as soon as there is a change in the web sessions table, this function will be fired, and we will get information on which row were changed, or inserted, or deleted. And then we can basically take this I read only list of rows, and in this case I'm writing a link you query, so that I can take the title of the changed or updated row, or inserted row, the abstract. create one payload and then ask it to be processed for changes, which means in this case I am just calling getEmbeddingAsync on my OpenAI client. So I'm using the OpenAI SDK to call OpenAI and get the embeddings of my text.

And then I'm saving back the embeddings that I get into my database. Let me show you how the function works behind the scene. And I just need to start.

Static Web Apps because I'm using Static Web Apps, so I have the frontend and the function already. So let's just do swap start. This will start the function, will start the API builder for me.

As soon as the function is started, it will be monitoring my database for changes. So I will be adding some sample data, and as soon as I will add the session, it will be sent to OpenAI and convert it to an embedding. So it just will take a few seconds to start and then as soon as, here we go, we have the trigger ready. So what I can do right now is, let's just make sure that everything is ready here. It should be.

So let me just go on and add the speaker, the session, and the connection between speaker and session. And that's it. And now here you will see that, yep, the processing is starting. So. the function detects that there is something to be changed, there's something to be processed, it's connecting to OpenAI, getting the embedding.

And then this means that I can, as soon as this is done, I can go back to my table and I will see that even if I didn't add any embedding here, when I query my session or speaker stable, yeah, you see, I have an embedding. Now, I have the vector, and of course, vector is saved in a binary format, so I can do vector search using the function vector distance that we just added to Azure SQL in the private preview to find the closest session to some topic. So let me add another session and so also this will go through the entire process of getting the embeddings and you will see here that again there is something new detected so it's connecting to OpenAI getting the embedding. And in just a few seconds, my session is converted to a vector. Now that my session is converted to a vector, I can go to the local website that's been created for me, localhost 4280, and this is the application I am running locally.

So my cool conference rag sample. If I click on about, it automatically recognizes there are two sessions indexed. Why it automatically recognizes there are two sessions indexed?

Because if I open, Data API Builder behind the scene that is again running automatically because it's integrated with the static web apps. You see, for example, that I have my database is exposed and I decide to expose the session table, for example, as a GraphQL and REST endpoint. So it's very easy for my front-end application to query the table or to query my database just using regular REST and GraphQL actions. For example, I can have a GraphQL query or I can just execute the story procedure using the find endpoint. Okay.

So this is all free for me because I have Static Web Apps with Data API Builder integration, which means I can just focus on my front-end, my website. I'll need this, so I can just go back to. And now, for example, I can ask some question like, is there any session to learn how to use AI on my own data?

So, what is happening here is that my Yeah, this is interesting. So, this is hallucinating a little bit this time, but basically, it will be using all the data that I have in my database and basically running the query and returning to me the answer based on the data that they have in my database. How this is happening is because this ask button here is actually Going to this specific function, here we go, chat handler, where I am doing a couple of interesting things. And here you can see the rag pattern at work. Here I am first of all retrieving the similar session, so I'm executing in this case directly the web find session so it's easier for this demo to show what's happening.

And I'm getting the result of find session. based on the topic I'm searching for like security. And then I'm getting the result of the query, and I'm sending it to my AI model, basically concatenating the result by taking the title, the abstract, the speaker, and separating them by a pipe. Why I'm doing this? Because I'm doing some prompt engineering, and I'm saying to my AI model, hey look, you are a system assistant who helps user to find the right- session to watch from the conference.

How do you know which session are available in the conference? Well, because I'm providing them to you in this format. So, I'm providing the instruction on how the model should behave. I'm providing a list of the session to the iModel using this format.

Then, of course, I'm also providing the query that the user typed. So, the model has the information on how it should behave, the context in the and the question from the user, and it can actually answer us by telling us what are the session that are about some interesting stuff. So, let's go to the demo.ai. So, here I can say, for example, another question could be, is there any session by Davide, for example. and it will be able to scan the session, doing vector search using this query, this text.

Then here we go, then it will get the session after we scan the database for similarity search, and then the AI model will actually understand the question and answer as in natural language. This is all coming from the AI model. So with these steps, you can really easily have static of app function. Here is the client. I'm not going into the client because it's just a regular React client, not super interesting.

I would say the most interesting part is that, for example, thanks to the fact that database is exposed as a rest in point as I showed before, calling the stored procedure from the front-end is as easy as doing a simple actually post using the fetch command in JavaScript. So that is what make the API builder super nice because it takes your database and expose it as a rest endpoint. So it means I can easily use it from JavaScript or TypeScript without having to worry about drivers in anything. And how do I expose the storage procedure?

That's the last thing I want to show you. It's using this swadb-connection folder. When you configure static web apps to use Data API Builder, you need to have this configuration file that will be created for you. And here you just specify that that you want your storage procedure, not this one, but this one, your storage procedure to be exposed at the slash find subpath.

So anything that will go, will be routed to Data API Builder, and this is something that automatically happen in Static Web Apps, and that's why here you see, if we go back to the client, you see that I'm in the slash Data API Builder because that, Data API, because that's is how Static Apps knows that it has to route this question to the API builder. And then the rest end of find, because we are using the rest endpoint, and find because that is where my story procedure has been published. So this is super easy. I can have my story procedure published as an endpoint, and then I can just focus on the front end.

So front end, as I was saying, is in the client folder, is all React. Then the database in the database folder, the function that takes care of... getting the title and the abstract and monitoring the table and turning them into embeddings is in the session processor. The chat handler is where you actually are doing the rag pattern.

So you basically, first of all, retrieve the similar session here, explain how the model should behave with this system message, and then get the result after adding the user prompt, so that you can actually basically send the data to the iModel and get the result and send it back. as a JSON payload that can be consumed by the front-end. Then the last part is SWAT-DB connection where you actually configure database to be REST and GraphQL endpoint. Let's see, that's how you build static web apps rag solution with static web apps and SQL.

If you are interested in using vector search in Azure SQL right away, there is in here, This is the website you have to go in order to be able to request to be part of the early adopter preview. And as you saw, we are adding the ability to store vector in binary format. But most importantly, you have the option to calculate the distance between vector, which is the clue of doing similarity search using this new feature called vector distance that allows you to calculate distance.

between two vectors using cosine, Euclidean, or dot product, which are the most common way to calculate distance between vectors. With that said, my recommendation is to go to the website, to the GitHub repository, download it, run on your machine, understand how the architecture is done. Everything is well explained here. Again, with all the details that you need to do, and then start to run it on your machine, and then add your data and see.

where you can go from there. You will see a lot of interesting options and a lot of fun. Thanks a lot for listening.

See you next time.

Transcript for:Implementing RAG with Azure SQL and AI

Transcript for:
Implementing RAG with Azure SQL and AI