This is amazing. Now we are going to fine-tune using Unsloth. Using this tool, we are able to fine-tune Mistral, Gemma, and Llama 2 five times faster with 70% less memory.
It supports various models, as you can see here, with 0% loss in accuracy. It works on Linux, and on Windows via WSL.
It supports 4-bit, 16-bit, QLoRA, and LoRA fine-tuning. You can see, for all these datasets, a comparison between Hugging Face and the open source Unsloth: it is about twice as fast as Hugging Face. That's exactly what we're going to see today.
Let's get started. Hi everyone, I'm really excited to show you fine-tuning with Unsloth. We are going to fine-tune the Mistral 7 billion parameter model. This is how you get a response without fine-tuning, when you ask a question like this: what are the tips for a successful business plan?
But after fine-tuning, you'll get a response like this. To do this, we are going to use the OIG dataset, a dataset for instruction following.
I'm going to take you through this step by step. But before that: I regularly create videos about artificial intelligence on my YouTube channel, so do subscribe and click the bell icon to stay tuned. Make sure you click the like button so this video can reach many others like you.
First we are going to load the data and the model, then we are going to compare how it looks before fine-tuning and after fine-tuning, and finally we are going to upload the model to Hugging Face. First, conda create -n unsloth python=3.11, and press enter. Next, conda activate unsloth, and press enter. Now pip install the Hugging Face Hub, IPython, and then Unsloth's conda extra from the Git repository. I'll put all this information in the description below. Then press enter.
Now export HF_TOKEN= followed by the Hugging Face token, which you can generate from the Hugging Face website, and press enter. If required, you can also use huggingface-cli login, press enter, and then enter the Hugging Face token. This is required if you are planning to upload your model to Hugging Face using the CLI.
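Written out, the setup commands might look like this; the exact Unsloth install spec is an assumption based on the install style the project used at the time, so check the repository's README for the current command:

```bash
# Create and activate a fresh environment
conda create -n unsloth python=3.11
conda activate unsloth

# Install dependencies; the Unsloth spec below is an assumption, check the repo
pip install huggingface_hub ipython
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

# Set the Hugging Face token (needed for uploading models)
export HF_TOKEN=<your_huggingface_token>

# Optional: log in via the CLI instead
huggingface-cli login
```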
You can see the token is valid and the login is successful. In this tutorial, we are not going to use the CLI to upload, but this is just for your reference. I'm using an NVIDIA RTX A6000 with 47 GB of RAM and 6 virtual CPUs.
Now create a file called app.py and let's open it. First import os, then FastLanguageModel from Unsloth, next torch, then SFTTrainer from TRL, TrainingArguments and TextStreamer from Transformers, and finally load_dataset for loading the dataset.
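As a sketch, the top of app.py might look like this (note that SFTTrainer lives in the trl package and load_dataset in datasets):

```python
# app.py: imports for the fine-tuning script
import os
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments, TextStreamer
from datasets import load_dataset
```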
So the first step is to load the dataset. First we enter the URL where the OIG dataset lives. It is in JSONL format, which means every single row is a JSON object, and this is how it looks, with text as the key and the value toward the end.
So if I take only one row, this is how it looks: with text, metadata, and source fields. And if I extract only the text, this is how it looks, with a human tag and a question: what are some tips for creating a successful business plan? We are teaching the large language model that it should respond with point number one, point number two, point number three, point number four, and so on. Similarly, there are different types of instructions in this dataset, and we are going to tune our large language model using them.
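To make that concrete, here is a hypothetical row in the OIG style; the exact field layout is an assumption, not copied from the dataset:

```python
# Hypothetical OIG-style row (illustrative only; field layout assumed)
row = {
    "text": "<human>: What are some tips for creating a successful business plan?\n"
            "<bot>: 1. Define your goals. 2. Know your market. 3. ...",
    "metadata": {"source": "unified_chip2"},
}
```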
Now let's load the dataset using the load_dataset function. It's in JSON format, and here we mention the split as train. Generally, we can split the data for training and testing, usually in a 9:1 ratio, which means 90% of the data is used for training and 10% for testing. Here we are going to load only the training data.
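A minimal sketch of the loading step; the URL is an assumption based on the OIG shard Unsloth's examples used at the time:

```python
from datasets import load_dataset

# Assumed URL: the OIG "unified_chip2" shard hosted on Hugging Face
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"

# Each line of the JSONL file becomes one row with a "text" field
dataset = load_dataset("json", data_files={"train": url}, split="train")
```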
So step number two is to load the large language model, which is Mistral. We define the max sequence length, then define the model and the tokenizer using the FastLanguageModel.from_pretrained function. These models are unique to Unsloth, and I am loading a 4-bit quantized version, which means it's much easier to load on my computer. Next I call FastLanguageModel.for_inference and pass in the model; this speeds up inference when you want to generate text. Now we're going to compare how it looked before training, so we create a function called generate_text. Here, same as before, we enter a text that is converted to numbers using the tokenizer. Streaming is just used for streaming the response. Those tensors are sent to the large language model, and the large language model generates the output. The output is again numbers, and those numbers are converted back to words using the tokenizer.
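Here is a sketch of the loading and generation code, assuming Unsloth's FastLanguageModel API and one of its published 4-bit Mistral checkpoints (the model name and generation settings are assumptions):

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 2048  # assumed value; adjust to your needs

# Load a 4-bit quantized Mistral 7B; the exact checkpoint name is an assumption
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect: float16 on older GPUs, bfloat16 on newer
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

def generate_text(text):
    # Words -> numbers: tokenize the prompt into tensors on the GPU
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    # Stream generated tokens to stdout as they are produced
    streamer = TextStreamer(tokenizer)
    # The model predicts token IDs; the streamer decodes them back into words
    model.generate(**inputs, streamer=streamer, max_new_tokens=256)

print("Before training")
generate_text("What are the tips for a successful business plan?")
```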
To recap: the words are converted to numbers using the tokenizer; the large language model takes those numbers, which are tensors, and gives us an output that is again numbers, the predictions; and those numbers are converted back to words using the tokenizer. That's what is happening here. Now we print "before training" and then call generate_text with "what are the tips for a successful business plan". Now we're going to run this code. We need to install one more package, the Unsloth Colab extra, which pulls in many of the required packages, so I'm going to enter that. Now it's installing the other required modules. Now python app.py, and press enter. It's printing out the response; it mentions Unsloth is using its fast Mistral patching release. As you can see in the response, we asked what are the tips for a successful business plan, and it gives us a continuous completion that is not relevant to what we want. We want the business plan in points.
Now we have completed loading the data, loading the model, and the before-training comparison. Next we are going to train the model and then upload it to Hugging Face. So step number four: training. We are going to do model patching by adding fast LoRA weights.
I will tell you what that means. Here we define the model. We use these projections, which I have already covered in a previous video that I will link in the description below.
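As a rough sketch, assuming Unsloth's get_peft_model API and typical demo values, the patching step might look like this:

```python
# Patch the model with LoRA adapters on the attention and MLP projections
# (all hyperparameter values here are assumptions; tune them to your needs)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,            # 0 is the optimized path in Unsloth
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)
```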
These projections are the weights we are going to fine-tune, and we define all the other variables, which you can change based on your requirements. Now we define the SFTTrainer, that is, the supervised fine-tuning trainer. Here we define the model and the dataset.
That is the dataset we feed in for training, and we mention the text field, which you can see here. Next we provide the other values: we use the AdamW optimizer, and I'm setting maximum steps to 60 to keep this tutorial short, but you may need to increase these values to get a good model. Finally, trainer.train() will automatically train the model. After training, I'm going to see how it performs, calling the same function with the newly fine-tuned model. Finally, we save the model in the outputs folder.
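A minimal sketch of the trainer setup; the hyperparameters are assumptions for a short demo run:

```python
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",       # the key holding each training example
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=60,                # short for the demo; increase for quality
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```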
Next we save the merged model with save_pretrained_merged, which saves into the outputs merged folder in 16-bit. So what's the difference between this and this?
This saves only the adapter. That means you have the base large language model, Mistral, and you use the adapter on top of it, and you will get the same response. So each time you want to run the model, you have to load the base model, that is Mistral, and then load the adapter to ask a question. But after merging, the adapter and the base model become one.
So you don't need to load two different things at inference time. Now model.push_to_hub_merged: I mention the model path, then the save method merged_16bit, and I get the Hugging Face token from the environment variable. Similarly, we push the LoRA model.
That is only the adapter, pushed to this location with the token from the environment variable. That's it.
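Putting the saving and uploading steps together, a sketch might look like this; the repo names are placeholders, and the save methods assume Unsloth's merged-save helpers:

```python
import os

# Save only the LoRA adapter locally
model.save_pretrained("outputs")

# Merge adapter + base model into one set of 16-bit weights and save locally
model.save_pretrained_merged("outputs_merged", tokenizer,
                             save_method="merged_16bit")

# Push both to the Hub; repo names below are placeholders
token = os.environ.get("HF_TOKEN")
model.push_to_hub_merged("your-username/mistral-7b-oig", tokenizer,
                         save_method="merged_16bit", token=token)
model.push_to_hub("your-username/mistral-7b-oig-lora", token=token)
```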
So overall: we loaded the dataset, loaded the model, configured training using the SFTTrainer, ran the train function, and finally pushed everything to Hugging Face. Now we are going to make a slight modification in the code.
We are going to disable the text streamer. For some reason, I am not able to compare before training and after training; it could be because of some cache, or the way we structured the code. Now run this code: in the terminal, python app.py, and press enter.
Now we can see it has started training. Ultimately, we need to get this loss lower. Here we have given one epoch and 60 steps in total; it's better to increase these values. After around two minutes, training is complete.
So after training, this is what it prints: what are the tips for a successful business plan, and the answer is in points format. That is exactly what we expected. Now you can see it's being uploaded to Hugging Face. The model upload completed, and here is the uploaded model on Hugging Face.
You can use it with these commands. So we're going to test this model. Same as before, we load the model, the merged version, which you can see here, and we generate text by asking this question: what are the tips for a successful business plan? I'm going to run this code: in the terminal, python test.py, and press enter.
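A sketch of what test.py might contain, assuming the placeholder repo name from the upload step:

```python
# test.py: a minimal sketch for trying the uploaded model
# (the repo name is a placeholder for whatever you pushed above)
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/mistral-7b-oig",  # the merged 16-bit model
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Ask the same question as before and stream the answer
inputs = tokenizer("What are the tips for a successful business plan?",
                   return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=streamer, max_new_tokens=256)
```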
Now it's downloading the model, and here is the response: what are the tips for a successful business plan, and it gives us the answer in points. This is exciting. Now you can fine-tune much faster with less memory. I'm really excited about this. I'm going to create more videos similar to this, so stay tuned.
I hope you liked this video. Do like, share, and subscribe, and thanks for watching.