You've heard a lot about LLMs, but what about small language models? Learn all about them and how they work with Azure SQL Database this week on Data Exposed.

Hi, I'm Anna Hoffman, and welcome to this episode of Data Exposed. Today, I'm joined by Muzma, a leader on the SQL team. Muzma, thanks so much for joining us.

>> Thank you, Anna, for having me.

>> It's great to have you. Today we're going to be talking about something I don't know a ton about. I'm really excited to learn about small language models. That's right, SLMs?

>> Yes, SLMs.

>> Tell us all about them. What are these things, and what does SQL DB have to do with them?

>> Thank you, Anna. Small language models are a new family of models; a few weeks ago, Microsoft announced Phi-3, which is one of these small models. There are a lot of benchmarks out there comparing them against mid-size models and large language models. So what is a small language model? Phi-3, for example, is a 3.8-billion-parameter language model developed by Microsoft Research. The Phi-3 family actually comes with three options, Phi-3 mini, small, and medium, covering a range of parameter sizes. I'm going to show you an example with Phi-3 mini.

>> Cool. What are some considerations people should think about between these three models, or in general, with Phi-3?

>> This is a common question we get from customers: why should we consider a small language model versus a large language model? The first question you need to answer is whether you're going with the RAG pattern, which is retrieval-augmented generation, or with fine-tuning, where you have your own data and you want to customize the model with your own data set. I'm going to show you an example of fine-tuning a small language model. The other thing is latency. If you're sending your data to a large language model hosted by, let's say, OpenAI or somewhere else, versus hosting a small language model right next to your data, there is definitely a latency gain with the small language model. Then there's the total cost of ownership: how much you want to pay and what the total cost is. And of course, there are the optimizations you want to make to your code. Those are some of the things you have to consider when it comes to small language models.

>> Cool. Generally, with small language models, you get lower latency because you can put the model closer to where your data is. Is it cheaper because it's smaller?

>> It is cheaper in that sense, but it depends on how much compute you need to host it. Because the model is smaller, the compute requirement is smaller as well. There are other advantages to small language models, too. For example, they're lightweight: Phi-3 mini can be loaded on your mobile device, so you can really take it with you wherever you want, or you can fine-tune it on your own data. We've talked about latency, of course. One other factor I really want to mention is sustainability, since a smaller model has a lower carbon footprint. And of course, there's the cost-effectiveness.

>> Cool. Awesome. How does it work with SQL Database?

>> From your SQL database perspective, you first need to understand what data you want to train the model on. You can take the model as it is and just run with it, or you can fine-tune it, as we talked about. In terms of the fine-tuning, you need a compute node where you load the model, then you run the fine-tuning, which is a set of Python scripts or one big script, and then you can store the result back in the SQL database. I will, again, show you some examples of how to do that.

>> Cool. Awesome.
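For readers following along, that first step, pulling the training data out of Azure SQL Database, can look roughly like the sketch below. This is not the demo's actual script: the dbo.SqlDocs table, its columns, and the connection string are hypothetical placeholders, and it assumes the pyodbc package with ODBC Driver 18 for SQL Server installed.

```python
# A minimal sketch of the data-extraction step, not the exact demo script.
# Assumptions: a hypothetical dbo.SqlDocs table with doc_title/doc_text columns,
# plus pyodbc and ODBC Driver 18 for SQL Server installed locally.
import json

import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;"
    "Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    rows = conn.execute("SELECT doc_title, doc_text FROM dbo.SqlDocs").fetchall()

# One JSON object per line; a "text" field is the default that many
# fine-tuning scripts look for.
with open("sql_docs_train.jsonl", "w", encoding="utf-8") as f:
    for title, text in rows:
        f.write(json.dumps({"text": f"### {title}\n{text}"}) + "\n")
```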
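The fine-tuning scripts themselves aren't shown on screen. One common approach, and only an assumption about how this demo was built, is LoRA fine-tuning with the Hugging Face datasets, peft, and trl libraries; exact SFTTrainer and SFTConfig arguments vary between trl versions.

```python
# A hedged sketch of the "set of Python scripts" for fine-tuning.
# LoRA via Hugging Face peft/trl is one common technique, assumed here;
# the demo's actual scripts may differ.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# The JSONL file produced in the previous step.
dataset = load_dataset("json", data_files="sql_docs_train.jsonl", split="train")

# LoRA trains a small adapter instead of all 3.8B parameters, which is what
# makes fine-tuning an SLM feasible on a single GPU compute node.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = SFTConfig(
    output_dir="phi3-sql-adapter",
    num_train_epochs=3,
    per_device_train_batch_size=2,
)

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-4k-instruct",  # public checkpoint on Hugging Face
    train_dataset=dataset,
    peft_config=peft_config,
    args=args,
)
trainer.train()
trainer.save_model("phi3-sql-adapter")  # the adapter referenced when hosting later
```

The saved adapter directory is the "result" the speaker mentions: a small artifact you can store back alongside your data and hand to a hosting engine in the next step.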
>> In terms of the overall architecture, a question I often get is: can we use other models? You can use any small language model out there. Mistral-7B, which has seven billion parameters, is another one you can use. Again, you take the data out of SQL and fine-tune your small language model. Then you can use an orchestration engine called Ollama to host your model. I'll show you an example of how to host your fine-tuned model there, and then, just using a prompt, you can see whether your model really gives you the results you're hoping for.

>> Nice. I'm ready to see this in action.

>> Let's do this. In this example, I have a Phi-3 model that was fine-tuned on SQL documentation, taken from the SQL documentation pages and also stored in a SQL database. I'm going to show you the end result, and then we'll talk about how it was done. This is Open WebUI, a Docker-based front end running on Ollama, which is currently hosting the fine-tuned model. It's a very nice chat interface where you can just go and select the model; this is the Phi-3 database fine-tuned model. I'm asking a question like, does Azure SQL Database support vector functions? Phi-3 itself does not have this information, since we were only announcing the feature this week, so this shows it is picking up the data we fine-tuned it on.

>> Awesome. That's super cool. How complicated was it for you to get this whole thing up and running?

>> The fine-tuning process is a little bit intensive. You need to understand how the model works, and there are different techniques. But once you have the script running, you can keep tuning it further. The part where you actually host the model and make it run takes less than five minutes, and I can quickly show you how to do that. This is my Ollama environment. If you go here and look at the model file, this is my Modelfile, where I'm taking the quantized Phi-3 model in GGUF format. What I'm doing is adding my adapter file, which is the fine-tuned output of the model I fine-tuned. Then I define some additional parameters, such as the temperature of the model, which controls how creative you want the model to be and what type of results you want to see, and then you just save this Modelfile. You can then run "ollama run", provide it the model name, and it gives you the prompt, and I'm going to ask the same question: does Azure SQL Database support vector functions? It will go into its training information and hopefully give the same answer, which is that, while the specific details of new vector features are yet to be announced, Azure SQL is known for its continuous updates, and then it will give you a wide range of options that are available there.
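To make that five-minute hosting step concrete, here is a rough sketch. FROM, ADAPTER, and PARAMETER are Ollama's actual Modelfile directives, and /api/generate is its local REST endpoint; the GGUF file name, the adapter path, and the phi3-sql model name are placeholders, not the demo's actual files.

```python
# A minimal sketch of hosting the fine-tuned model with Ollama.
# Placeholders: the GGUF file name, the adapter directory, and "phi3-sql".
# Assumes a local Ollama server is running and the requests package is installed.
import subprocess

import requests

modelfile = """\
FROM ./phi-3-mini-4k-instruct-q4.gguf
ADAPTER ./phi3-sql-adapter
PARAMETER temperature 0.2
"""
with open("Modelfile", "w", encoding="utf-8") as f:
    f.write(modelfile)

# Register the fine-tuned model with Ollama (the step shown in the demo).
subprocess.run(["ollama", "create", "phi3-sql", "-f", "Modelfile"], check=True)

# Equivalent to typing a question at the `ollama run phi3-sql` prompt,
# but via Ollama's local REST API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3-sql",
        "prompt": "Does Azure SQL Database support vector functions?",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```

A low temperature such as 0.2 keeps answers close to the fine-tuned documentation rather than creative, which is the "how creative you want the model to be" knob mentioned in the demo.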
>> Awesome. Cool. This is really cool. Muzma, for folks who are just getting started with this, what do you recommend? How do they get started, and how do they learn more?

>> We have an aka.ms link, definitely check it out: aka.ms/azuresql-slm. And you can always find all the SQL samples at aka.ms/sqlai and aka.ms/sqlsamples.

>> Awesome. Muzma, thanks so much for coming on the show. I learned a lot about SLMs; we saw one in action, and we saw it running locally. To our viewers, if you liked this episode, go ahead and give it a like, and check those links; we'll put them in the description for you to learn more. And leave us a comment to let us know what you think. We hope to see you next time on Data Exposed.