In this video, I'm going to show you how you can create your own LLM classification system in five simple steps. If you don't know who I am, my name is Dave Ebbelaar. I'm the founder of Datalumina, I've been building custom data and AI solutions for the past five years, and next to that I create educational videos like this one to help you do the same and ultimately get started with freelancing. So let's dive in.

All right, so how do you create an LLM classification system in five steps? For context, in this video I'm going to walk you through some examples of classifying text, more specifically tickets from a customer care ticketing system that we're going to emulate. The goal is, given these tickets, to put a label on them to categorize them, and later, as you will see, to also add other metadata and information that is relevant to the system. Now, while we're going to use text here, with the capabilities of OpenAI and other models you can apply these same principles to classifying images or audio; at this point you can classify almost any data using the approach and the techniques I'll share in this video.

To put this into perspective: classification in general used to be a pretty hard problem to solve. One of the most well-known examples is your email inbox, where a spam filter is always running in the background to route emails, and you used to need a machine learning engineer to build classification systems that do this really well. But now everyone, including you, can build systems like this in a few lines of code, and that's what I'm going to share with you today.

Let's start with the two tickets that we have over here. We'll be using the OpenAI library for this, and we'll do this example in Python, but you can do this with other language models as well, and in almost any programming language. So let's start up an interactive Python session and load the tickets into place.

First, I'm going to show you an example of how you might naively tackle this problem. We use OpenAI, and I'm assuming you're already familiar with OpenAI chat completions, so I'll skip over that a little bit and just assume we have a system prompt that says "Classify the following customer support ticket into a category" (we leave it very open), and then for the user role we insert the ticket text as the content. We create this little function, run it over here on the right, and print the result, and what we get is "Category: Order Issue".

The thing you can immediately see here (and we could really improve this with prompting techniques and so on, but I just want to show you a plain example) is that from a system design perspective this is not ideal. We just ask the model to classify the ticket into a category, so we have no idea what all the potential categories could be. We also see that it returns a response like "Category: Order Issue", so if we wanted to put this into a system, we would first have to parse it: we'd have to get rid of "Category:", because we only want "Order Issue", the value we would store in a database, for example. Right now the response is still the same, but let's say we change the temperature to one and run it one more time: you can see we get a completely different response, "Order Issue - Incorrect Item Received". Here's a minimal sketch of this naive setup.
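This is a sketch of the naive approach under stated assumptions: the openai Python SDK v1 client, gpt-4o as the model, and a paraphrased ticket text (the exact wording and the order number aren't shown in full in the video):

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAI()

# Paraphrased example ticket; the wording is a reconstruction, not the video's exact text
ticket1 = (
    "I ordered a laptop from your store but received a tablet instead. "
    "This is unacceptable! I need the laptop for work urgently. "
    "Please resolve this immediately or I will dispute the charge."
)

def classify_ticket_naive(ticket_text: str) -> str:
    """Plain chat completion: returns free-form text that still needs parsing."""
    response = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=[
            {
                "role": "system",
                "content": "Classify the following customer support ticket into a category.",
            },
            {"role": "user", "content": ticket_text},
        ],
    )
    return response.choices[0].message.content

print(classify_ticket_naive(ticket1))  # e.g. "Category: Order Issue"
```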
When we just use this approach, there are a couple of drawbacks: there are no structured outputs, making it difficult to integrate into automated systems; there's no validation, leading to inconsistent categorization; we extract limited information (we only get the category and miss important details, so we would have to expand the prompt or make multiple calls to the large language model); and there's no confidence score, making it hard to flag uncertain classifications for human review. So there are a lot of things we can improve. Let's continue with that and get into step one, which is to get clear on your objective.

This is pretty simple, and you can even do it without any coding, but from a system design perspective it's really important to think about your classification problem, about where it fits into the business context, and about what other values, metadata, or keywords you potentially want to extract from the data to improve the system overall. For this classification problem, I've identified the following objectives: we want the category, and we want it to be accurate; we want to assess the urgency and the sentiment of a ticket; we want to extract key information for quick resolution; and we want the system to provide a confidence score to flag uncertain cases for human review.

Then let's also consider the business impact: why is this relevant to implement and to understand? We want to reduce the average response time by routing tickets to the right department, improve customer satisfaction by prioritizing urgent and negative-sentiment tickets, increase efficiency by providing agents with the key information up front, and optimize workforce allocation by automating routine classifications.

That's the first step. Now let's get into the second step, which is a little more tactical, and that is using the instructor library. Let me make sure we import it over here as well; you can do a simple pip install to get the library. This is literally my secret weapon for building reliable large language model applications that can make it to production. If you're new to the instructor library, I highly recommend checking out the docs on the website; it's available for different languages. In essence, instructor makes it easy to get structured data like JSON from large language models, and you can see some examples over here. We're going to use data models, more specifically Pydantic data models, to first get clear on the objectives we've identified, and then to put validation in place using the instructor library to ensure our system is robust. It's really easy to get started with instructor, because we can just patch the OpenAI client, as in the simple example below, and then we have a client that is instructor-ready and that we can provide with response models.
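A minimal sketch of that patch, assuming a recent instructor release (instructor.patch is the classic entry point; newer versions also offer instructor.from_openai):

```python
# pip install instructor openai
import instructor
from openai import OpenAI

# Patch the regular OpenAI client; the patched client gains the
# response_model and max_retries parameters used in the next steps
client = instructor.patch(OpenAI())
```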
Before we can use response models, though, we first have to define those models, and that's going to be step three. So let's get into that now. What the following code will do is define structured data models for the classification, and also for the sentiment and the urgency. If you're new to Python data models or Pydantic, this can be a little confusing in the beginning, so I'll give a general overview, but if you're completely new, I highly recommend looking further into the Pydantic library.

What we're going to do is use a combination of enums (enumerations) and a Pydantic model that inherits from BaseModel. Let's first look at the categories; let me make sure to run this one more time so we have everything in memory, and let's look at the ticket category first. What we can do is first tackle the question of which predefined categories our system can accept. Once we have defined that TicketCategory enum, we can run it, and we can see that if we put in "order_issue", we correctly get a TicketCategory member of type ORDER_ISSUE back. That works fine, but if we now pass in, say, "general", it will hit us with a ValueError, because "general" is not a valid ticket category. What this allows us to do is get really clear on what our system accepts: if OpenAI or any other language model gets creative and generates other kinds of categories, our system will raise a ValueError. That's how the enumerations work, and we can do the same for sentiment and for ticket urgency; these are just some examples. So those are the enums; make sure we have those in memory.

Now let's look at the Pydantic model over here. You can see that it is a Pydantic model because we plug in BaseModel, which we import from pydantic, and because of this we can leverage all of the powerful data validation features that are part of the Pydantic library. We can now define a data model with a couple of fields: first of all the category, the urgency, and the sentiment, which are expected to be members of the enums we just defined. Then we have a confidence score, where we add a description ("the confidence score of the classification") and specify that it has to be a float between zero and one, meaning that if we put a different value in there (as I'll show you in a bit) we will get an error. We also define the key information, which is a list of strings, and finally the suggested action, which is just a string: a brief suggestion for handling the ticket.

Let's see what that looks like if we create a TicketClassification object (making sure we have the class in memory as well). We can have a look at the object and see all of the information in there, but like I've said, this uses Pydantic under the hood to validate everything. Let's say the language model thinks the confidence score needs to be between 1 and 100; the model will raise a validation error saying the input should be less than or equal to one, which is exactly what we specified. And what happens if we try something else, say a category of "general" for a general inquiry? The model will tell us that "general" is not a value we can use here. I hope by now you understand, at a high level, what we can do with these data models and how we can use them for validation. The next step is to take these powerful features of Pydantic and combine them with the powerful features of large language models, and that's really where the instructor library comes in, because instructor makes this very easy. Here's a sketch of what these models can look like.
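A reconstruction of the enums and the Pydantic model described above; the field names follow the walkthrough, but any enum members beyond the ones mentioned in the video (order issue, account access, angry, frustrated, high) are assumptions:

```python
from enum import Enum
from pydantic import BaseModel, Field

class TicketCategory(str, Enum):
    ORDER_ISSUE = "order_issue"
    ACCOUNT_ACCESS = "account_access"
    PRODUCT_INQUIRY = "product_inquiry"  # assumed extra member
    OTHER = "other"                      # assumed extra member

class CustomerSentiment(str, Enum):
    ANGRY = "angry"
    FRUSTRATED = "frustrated"
    NEUTRAL = "neutral"     # assumed extra member
    SATISFIED = "satisfied" # assumed extra member

class TicketUrgency(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class TicketClassification(BaseModel):
    category: TicketCategory
    urgency: TicketUrgency
    sentiment: CustomerSentiment
    confidence: float = Field(
        ge=0, le=1, description="Confidence score of the classification"
    )
    key_information: list[str] = Field(
        description="List of key points extracted from the ticket"
    )
    suggested_action: str = Field(
        description="Brief suggestion for handling the ticket"
    )

# A valid value maps onto the enum...
print(TicketCategory("order_issue"))  # TicketCategory.ORDER_ISSUE
# ...while invalid input raises, e.g.:
# TicketCategory("general")                 -> ValueError
# TicketClassification(confidence=90, ...)  -> ValidationError
```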
All right, now in step four we bring everything together in a single function. If you look at it, it's almost identical to what we started with, but there are some subtle differences that make all the difference. First and foremost, since we have now patched the OpenAI client with instructor, we can pass in a response_model parameter that accepts a Pydantic data model. We can now tell OpenAI, tell our system, that we want to receive a TicketClassification data model back. We still call client.chat.completions.create, similar to how you would normally interact with OpenAI, but instead of getting a chat completion back, we get our response model back: an actual TicketClassification object. In doing so, we immediately put that validation in place, because as we've just seen, if we try to plug any information into this TicketClassification model that is not in line with the specification, it will throw an error.

The cool thing about Pydantic is that the errors are very easy for humans to interpret; it's very good at using natural language to explain what's going on and what the error is. Because of that, we can use the error as feedback for the large language model to iterate on, and that's where the next parameter comes in: max_retries. This is another parameter that is now available because we've patched the client with instructor. If a query comes in and OpenAI returns a response that we cannot load into the TicketClassification object, it will throw an error; instructor will then feed that error, along with all the other context, back to the API, and the model will self-correct. Since these errors are in natural language, this tends to work really well, and usually with no retries or very few retries you can make pretty robust systems this way. You do have to keep in mind that multiple queries can increase the cost and the latency, but for now we're just looking at how to make the system really robust. So let's bring all of this together; here is a sketch of the full function.
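A sketch of that function, assuming the patched client, the TicketClassification model from step three, and the ticket1 text from earlier; the system prompt wording and the retry count are assumptions rather than the video's verbatim code:

```python
def classify_ticket(ticket_text: str) -> TicketClassification:
    """Classify a ticket and return a validated TicketClassification object."""
    return client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        response_model=TicketClassification,  # added by the instructor patch
        max_retries=3,  # validation errors are fed back so the model can self-correct
        messages=[
            {
                "role": "system",
                # Wording is an assumption; the video's exact prompt isn't shown in full
                "content": "Analyze the following customer support ticket and extract the requested information.",
            },
            {"role": "user", "content": ticket_text},
        ],
    )

result1 = classify_ticket(ticket1)
print(result1.model_dump_json(indent=2))
```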
Now let's run the system on ticket one. Again, let's quickly have a look at what ticket one was: the customer ordered a laptop from the store, quoting an order number, but received a tablet; they find it unacceptable, they need the laptop for work urgently, and they ask us to please resolve it immediately. So this is a pretty heavy problem, and the urgency seems high. If we take a look at result one, we can see that this is an actual TicketClassification object, and we now also know for sure that it is in line with the specification, because otherwise it would have thrown an error. We can also print it by calling model_dump_json and have a look at all the information in there: the category is order issue, the urgency is high, the sentiment is angry, the confidence is very high, and there's some key information ("customer ordered a laptop but received a tablet", "needs the laptop for work", "customer is considering disputing the charge if it's not resolved immediately") plus a suggested action: apologize to the customer. You get the idea.

Now, this is a very cool classification system. Imagine being part of a customer care team, and this is the kind of information that already comes your way. Depending on whether it's an order issue and whether it's an angry customer, you might immediately hand the ticket over to a manager who, for example, is allowed to give refunds, whereas a more junior customer care rep might not be able to handle those kinds of tickets, or not be authorized to do so. Purely based on department and seniority, you could already start to route these tickets. You could also store this in your ticketing system, or in a separate database, where all of a sudden, next to just the ticket, you have a lot more metadata and information that you can use to perform analytics: you can track the categories, track the frequency, track the sentiment over time; you could create a whole dashboard based solely on these metrics. And this is just the start, because you can expand the amount of data you extract from this system almost infinitely, depending on your use case. That's why step one, getting clear on your objectives, is so important: you can see that this opens up so many possibilities.

Let's have a look at result number two and run that. Ticket number two: the customer is having trouble logging into their account, resetting the password is not working, and they ask for help; they've been a loyal customer for years and have several pending orders, so it's pretty serious. If we scroll down, we have: account access, urgency high, customer frustrated (not angry, but frustrated), a high confidence score, and again the key information along with a suggested action. So that is step four: bringing all of this together into a robust system. But now that we have this running, there's one more thing we can do to improve it.

So then the final step, step five, now that you understand these principles, is to optimize your prompts and experiment, because there's a lot you can do here. First of all, we can look at some prompting structures. This is just an example, where we provide some context, define the task, really identify the role, and list some key things to remember. This helps to provide context and will definitely change the outcomes in certain situations. For example, you could define what "urgent" means for your company and what counts as "angry", and you could give examples of the different kinds of categories. Those are all elements you can put into a system prompt, like the illustrative sketch below.
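An illustrative structured system prompt along those lines; the context, definitions, and wording are assumptions, not the video's verbatim prompt:

```python
# Hypothetical structured system prompt: role, context, task, things to remember
SYSTEM_PROMPT = """
You are an AI assistant for a customer support team at an e-commerce company.

Your task is to analyze incoming support tickets and extract structured
information so tickets can be routed and resolved quickly.

Key things to remember:
- "high" urgency means the customer is blocked or facing a deadline.
- "angry" implies hostile language; "frustrated" implies repeated failed attempts.
- Keep key_information factual and concise.
- Use a lower confidence score whenever the ticket is ambiguous.
"""

result = client.chat.completions.create(
    model="gpt-4o",
    response_model=TicketClassification,
    max_retries=3,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": ticket1},
    ],
)
```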
Then of course you also have the data models themselves, so you can experiment with and expand the models to get more or less information from the system, and maybe put even more validations in place. Another thing you can experiment with is different models. In the first example I used GPT-4o, but for simple classification tasks, smaller models are generally also very capable, and of course this will decrease the latency and also reduce costs. That's important to consider, because these kinds of intermediate metadata and classification tasks are typically (not always, but typically) part of a larger system as a whole, where you have various steps throughout the system that leverage AI. A next step in this system, for example, could be to already generate a reply for certain tickets: if the model has identified a ticket as a general or simple inquiry, and you've established throughout your system that those are the types of queries the AI can answer perfectly, it can reply on its own; but if it's more complex, for example a complaint about a customer receiving a tablet instead of a laptop, that's probably not something an AI can handle, so you want to route it to a human. That's why you can consider experimenting with various models, where for generating the reply you might use a more sophisticated model, but for the simple classification you might use something like GPT-3.5 Turbo or Anthropic's Claude Haiku, and experiment to see how the overall system performs.

So if we look at this same example using GPT-3.5 Turbo, we can run it over here, wait for it to finish, and then we have the results. We can see that in this case (although it's a pretty obvious example) the ticket is still categorized as an order issue, the urgency is high, the sentiment is angry, and there's a high confidence score. You can see there is slightly less key information, which is an interesting thing to spot. And again, for ticket two we get account access, high urgency, and frustrated. So the key variables are the same for both the more powerful model and the simpler model.

And that's how you create a large language model classification system in five steps and really make it production ready. These are the types of techniques and principles that we use in the applications we build for our clients. All right, that's it for this video. By the way, if you're a developer and you want to get started with freelancing but you've struggled to find clients, you might want to check out the first link in the description: it's a video of me going over how my company can help you solve that problem. In all transparency, it's a funnel designed to get leads for my company, so please keep that in mind; you don't have to click it, but if you want to get started with freelancing and need a little help with that, you might want to check it out. If you found this video helpful, please leave a like and consider subscribing. And if you want to learn more about how we at Datalumina find, build, and deploy these generative AI applications, make sure to check out this video next, where I go over the entire process that we follow right now.