Transcript for:
Fine-Tuning PaliGemma: An Efficient Vision-Language Model by Google

In this video you will learn how to fine-tune PaliGemma, a new, efficient vision-language model by Google. This tutorial is broken down into four main steps: we're going to download our dataset, prepare it, and show you what it looks like; I'll show you the model and how to download it; then the training arguments required for training; and finally the training itself. By the way, I make similar videos on LLMs, machine learning, and other data science tools, so please feel free to subscribe.

Now, what is PaliGemma? PaliGemma is a new model released by Google. If you haven't heard about Gemma, Gemma is a suite of open-source large language models provided by Google, and they've now released a new model called PaliGemma. This is a state-of-the-art vision-language model, and if you didn't know, I've been making videos on Vision Transformers and vision models for a couple of years now, so this really excites me. Some of the things you can do with PaliGemma are image captioning, visual question answering, detection, and referring-expression segmentation. Not only that, it can also handle document understanding and more. In this video I'm going to show you how to adapt such a model to your own domain data by fine-tuning it.

First things first, make sure you've installed the packages over here: datasets, transformers, and accelerate. After that's done, log in to the Hugging Face Hub; that's what will let you save the model at the end.

Now you're going to load the dataset. We're going to load a very, very small subset of the training data to keep this tutorial short. Then we'll prepare the dataset a bit more by removing certain columns and split it so we take an even smaller subset; the final dataset has around 2,000 samples. Each row has a multiple-choice answer, a question, and an image. If we inspect the training data, it looks like this: you have various labels over here, you have a question, and then you have the path to the image.

Next we're going to process our data. Because this is a vision model, there are some special steps we have to take before we start training, so we're going to download a processor, the PaliGemma processor. To use the PaliGemma model you have to make sure you've been granted access to it on the Hugging Face Hub; after you've done that, you shouldn't run into any problems. We're then going to use the processor to convert our input data, each row, into tokens, and we do that with a collate function. To set that up we import torch and set the device to CUDA, and I'll explain why we do that. Then we define our function; this is the collate function I was talking about. We take the text from each example, the labels, and the images, and get the tokens from the processor; at the end we move everything onto the GPU, where we finally get our tokens, so essentially the input data converted into tokens.

And now we get to load our model. We do the imports, call the model to download it, and load it onto our GPU, which is why we have .to(device) at the end. Also, to download the Gemma model you have to make sure you have access to it through the Hugging Face Hub. And there we go, the model is ready. The code sketches below walk through each of these steps, the setup, the dataset, the collate function, and the model, in the same order.
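Roughly, the setup cells the narration refers to look like this. Note that peft and bitsandbytes are my additions here, assumed because the LoRA and quantization steps later in the video need them:

```python
# Install the libraries mentioned in the video (datasets, transformers, accelerate),
# plus peft and bitsandbytes for the LoRA / quantization steps that come later.
!pip install -q datasets transformers accelerate peft bitsandbytes

# Log in to the Hugging Face Hub so the fine-tuned model can be pushed at the end.
from huggingface_hub import notebook_login
notebook_login()
```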
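Next, a sketch of the dataset step. The transcript doesn't name the dataset, so the HuggingFaceM4/VQAv2 identifier, the slice sizes, and the dropped columns are assumptions chosen to match what is described (a question, a multiple-choice answer, an image, and a final training set of roughly 2,000 samples):

```python
from datasets import load_dataset

# Load only a small slice of the training split to keep the tutorial fast.
ds = load_dataset("HuggingFaceM4/VQAv2", split="train[:10%]")

# Remove columns the model doesn't need; keep question, multiple_choice_answer, image.
cols_to_remove = ["question_type", "answers", "answer_type", "image_id", "question_id"]
ds = ds.remove_columns([c for c in cols_to_remove if c in ds.column_names])

# Split again and keep the small piece so the final training set is ~2,000 samples.
split = ds.train_test_split(test_size=0.05)
train_ds = split["test"]

print(train_ds)                 # columns: question, multiple_choice_answer, image
print(train_ds[0]["question"])  # inspect a single row
```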
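The processor and collate function could look like the sketch below. The checkpoint name google/paligemma-3b-pt-224 is an assumption (any PaliGemma checkpoint you've accepted the license for works the same way), and prefixing questions with "answer " follows the usual PaliGemma VQA prompt format:

```python
import torch
from transformers import PaliGemmaProcessor

device = "cuda"
model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint; requires accepted access on the Hub
processor = PaliGemmaProcessor.from_pretrained(model_id)

def collate_fn(examples):
    # Text prompts: PaliGemma expects a task prefix, "answer " for VQA-style questions.
    texts = ["answer " + ex["question"] for ex in examples]
    labels = [ex["multiple_choice_answer"] for ex in examples]
    images = [ex["image"].convert("RGB") for ex in examples]

    # The processor tokenizes the text, preprocesses the images, and builds the
    # label ids from the suffix in a single call.
    tokens = processor(
        text=texts,
        images=images,
        suffix=labels,
        return_tensors="pt",
        padding="longest",
    )
    # Cast floating-point tensors to bfloat16 and move the whole batch to the GPU.
    return tokens.to(torch.bfloat16).to(device)
```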
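And the model-loading cell, again as a sketch; the trailing .to(device) is the part the narration calls out:

```python
import torch
from transformers import PaliGemmaForConditionalGeneration

# Download the PaliGemma checkpoint (model_id defined above) and move it to the GPU.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)
```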
Now what we're going to do is define our LoRA configuration. Over here we set the BitsAndBytes (BnB) config and the LoRA config, specifying which modules we want to fine-tune. After we've set the model and LoRA config, we apply the configuration to the model.

After that's done, we set our training arguments. Over here we set the number of epochs to 2, plus a few other settings for the batch size, learning rate, Adam beta, weight decay, and a handful of other parameters. This is something you just have to play with yourself to see what performs best for your use case.

Once that's done, the only thing left is to start training. We define our Trainer over here, passing in the model, the training dataset, the collate function, and the arguments, and that's it. Now everything is ready and set; all we have to do is train the model. This is going to take some time depending on how fast the computer is, but over here I'm using a Google Colab notebook, so according to the estimate it should take around 10 minutes.

Okay, now the fine-tuning is done, and here are the metrics. What you can do now is take the trainer and push the model to the Hub, the same Hub you authenticated with at the beginning of the notebook, which will automatically send the model to the Hugging Face Hub. This is a very smooth way to save a multi-gigabyte model on a free Hub. Over here there's a link to the inference docs on how to use the Hub and the saved PaliGemma model to run inference on your own images. Sketches of the LoRA configuration, the training arguments, and the Trainer cell follow at the end of this transcript.

And there you have it, that's how to fine-tune PaliGemma in the simplest way possible. I hope you found it insightful; if you did, please feel free to subscribe, and I'll see you in my next video. Have a nice day.
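A sketch of the quantization and LoRA configuration described above; the rank, the target-module list, and the 4-bit settings are illustrative choices rather than the video's exact values:

```python
import torch
from transformers import BitsAndBytesConfig, PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model

# 4-bit quantization so the base weights fit comfortably in Colab GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA config: target_modules is where you specify which projection layers to
# fine-tune; r=8 and this module list are common defaults, not the only option.
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Re-load the model with quantization applied, then wrap it with the LoRA adapters.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,                       # checkpoint id defined earlier
    quantization_config=bnb_config,
    device_map={"": 0},
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```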
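The training arguments could look like this; num_train_epochs=2 matches the narration, while the batch size, learning rate, Adam beta, weight decay, and the rest are illustrative values you should tune for your own use case:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="paligemma-vqa-finetune",  # hypothetical output / Hub repo name
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=1e-6,
    adam_beta2=0.999,
    warmup_steps=2,
    logging_steps=100,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=1,
    bf16=True,
    remove_unused_columns=False,   # keep raw columns so the collate function sees them
    dataloader_pin_memory=False,
    push_to_hub=True,              # lets trainer.push_to_hub() upload the result later
    report_to=["tensorboard"],
)
```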
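And finally the Trainer cell: pass in the model, the training dataset, the collate function, and the arguments, then train and push to the Hub:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    train_dataset=train_ds,
    data_collator=collate_fn,
    args=args,
)

trainer.train()        # roughly 10 minutes on a Colab GPU for ~2,000 samples, per the video

# Push the fine-tuned model to the Hugging Face Hub account you logged into earlier.
trainer.push_to_hub()
```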