Hey guys, in this video we're going to have a look at how we can fine-tune YOLO version 5 using our custom dataset, and see how well it performs on some images from our test set. Let's get started. YOLO stands for You Only Look Once, and the most current or latest version is known as YOLO version 5. Note that there is a huge controversy going on right now about the naming of this project. There is a post on Hacker News, and at least the first comment right here says that this is just bullshit.
And there are a lot of accusations calling out the authors, or the author, of YOLO version 5. I'm not going to spend any more time on this issue, but just be warned that this is something ongoing. This video is going to focus on this repo, which is done by Ultralytics: YOLO version 5. So we're going to have a look at that one.
YOLO version 5 is completely written in PyTorch, and in my opinion this project is very well done. This implementation is heavily based on YOLO version 3 and the experience that the guys at Ultralytics have with YOLO version 3, at least that is roughly what they say. And this model appears to be very efficient and to perform very well, at least compared among the YOLO implementations. Of course, you can see here that EfficientDet, which is a model introduced by Google, is better at this task.
And the task that we're talking about right here is real-time object detection; I'm going to have a look at the leaderboards for it in a minute. You can see that the authors acknowledge that in the future they might include updates using some of the features presented in YOLO version 4. YOLO version 4 was introduced in the paper "YOLOv4: Optimal Speed and Accuracy of Object Detection". Of course, this paper is available on arXiv, and this version is dated April 23rd, 2020. You can see that right here we have a similar chart, comparing the performance of YOLO version 4 against EfficientDet on the MS COCO object detection validation set. At least in this chart, YOLO version 4 gets an AP of around 44, while YOLO version 5, at least the largest model, gets around 47 or 48, I would say. So from this very barebones comparison, we see that YOLO version 5 might be performing better. This paper introduced a lot of cool new features and cool new ways to speed up YOLO version 3. And here we are seeing a chart, or an image, that is available in this paper.
And this is the main difference between a one-stage detector and a two-stage detector. All of the YOLO implementations are known as one-stage detectors. So they basically get some input and run it through some backbone network.
And then they produce those dense predictions. The purpose of all this is to have something that is very fast and runs in real time, which means something like 30 or even 60-plus frames per second, so you are very good at those kinds of tasks. If you want more accuracy, and you don't care that much about the speed of the inference of your model, then you might want to use something like RetinaNet or Faster R-CNN. When you use those, you have a second stage, which is essentially a classifier: it takes the dense predictions coming out of the first stage of the detection, from something like YOLO, and scores each one by how believable or how probable that prediction really is. So of course, two-stage detectors are more accurate, but they're slower, and there is a trade-off depending on the task that you want to handle. Finally, we are going to have a look at the leaderboards provided by Papers with Code, which was, of course, acquired by Facebook research. On real-time object detection on COCO, at least as this task is defined right here, you can see that EfficientDet with the largest number of parameters performs very well compared to YOLOv4, but look at the frame rates. I believe these were measured on a Tesla V100, though I'm not really sure about that: we have around 9 frames per second with EfficientDet, which is very poor, but with YOLO version 4 we have 62 frames per second. And on Hacker News we have a post right here, by some guy or girl, I'm not sure, with a breakdown which states that YOLO version 5 is much smaller, it's much faster, and it is right about there on the accuracy. So you might expect, and of course this again can be controversial, that you get a similar accuracy right here, but much faster inference speed, using YOLO version 5. So in the next video we are going to try out YOLO version 5, maybe on a mobile device, and see how well it performs in the real world.
Finally, we are going to compare YOLO version 4 with the state-of-the-art object detectors, where we are not that interested in speed, only in how far the accuracy goes. So here we have a leaderboard, again on Papers with Code, and you can see that this DetectoRS, or something like that, which was introduced during this year, has the best results on object detection, again using the COCO test set. If you look at this result, we have almost 55 box AP, where AP stands for average precision.
If you look for YOLO version 4, which is the first result right here, we have 43.5. So this is a very low result, at least compared to the first model, and you can see that the leaderboard position is low as well. All right, it is time to start coding. I'll open up a Google Colab notebook, and right here I'm going to start by checking the current GPU that we have for this machine: we have a Tesla P100 with 16 gigabytes of VRAM. Right here I'm going to install the tree command-line tool, and then I'm going to copy and paste some of the requirements for YOLO version 5. You can see that I'm installing PyTorch 1.5.1, which is currently the latest release and contains some very important bug fixes.
I'm also installing torchvision version 0.6.1. We are fixing, or specifying, the version of NumPy which is required by the YOLO version 5 project, and we are also going to use PyYAML for some configuration files. Then I'm installing the COCO API, which is again required by the YOLO version 5 project. If I run this, it will go ahead and start downloading everything. After the installation is complete, we are required to restart the runtime, so I'm going to do that, and then I'm going to install the final dependency, which is a project known as Apex, provided by NVIDIA. I'm going to paste in the command right here and run it; what this does is roughly clone the repo and start the installation of the project. The setup cells look roughly like this:
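A minimal sketch of those cells, assuming a standard Colab runtime; the NumPy install is left unpinned here as a stand-in, since the exact pin comes from the YOLOv5 requirements at this revision:

```python
# Colab setup cells -- check the GPU, then install the dependencies
!nvidia-smi                                   # should report the assigned Tesla P100

!pip install torch==1.5.1 torchvision==0.6.1  # the PyTorch release with the bug fixes
!pip install numpy PyYAML                     # the video pins NumPy to the version YOLOv5 requires
!pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# ...restart the runtime here, then install NVIDIA Apex for mixed precision:
!git clone https://github.com/NVIDIA/apex
!pip install -v --no-cache-dir ./apex
```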
And what is this thing called, Apex? This is a recommended installation from the author of YOLO version 5, and it is a PyTorch extension which is used for mixed-precision computations.
YOLO version 5 can be trained using more than one GPU, and I guess they're using mixed precision, so they might be mixing FP16 and FP32, that is, different floating-point precisions, and this extension speeds things up when you're doing those kinds of computations. So this is a nice additional benefit when you're training YOLO version 5. Now that the installation of Apex is complete, we are going to rerun basically the whole code that we used to generate the dataset, and I'm going to run all of the cells below this one. This will set up some imports, download the JSON file, and then we'll see the example image right here.
And next we are going to execute our create dataset function for both the training and the validation set. Then we are going to see the examples right here for at least one of the annotations. The one thing that I did right here, compared to the last video, is that I've taken the categories, converted them to a list, and then sorted them, so we have a reproducible ordering of the
class names right here, which is going to be important when we are training our YOLO model. So the dataset creation is now complete, and I'm going to start the fine-tuning of YOLO version 5. I'm going to go to the GitHub repo, get the clone URL, and copy that one. Then I'm going to git clone the repo right here. All right, and then I'm going to check that the repo is indeed here; so here it is. Next, just for the purposes of reproducibility, I'm going to cd into the yolov5 directory, and then I'm going to check out a specific commit that I've tested the project with. This might change in the blog post that I'm going to write, and it might move a bit further along if there are some bugs that need to be fixed in the current implementation. The commands look roughly like this:
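A sketch of those cells; the commit hash is a placeholder, so substitute the one from the accompanying blog post:

```python
# Clone the YOLOv5 repo and pin it to the tested revision
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!git checkout <tested-commit-sha>  # hypothetical placeholder, not a real hash
```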
Next, I'm going to go to the repo once more and have a look at the types of models that we have right here. So we have pre-trained checkpoints for YOLOv5 s, m, l, and x, plus an SPP variant. For our purposes I'm going to take the best performing model, which is of course the slowest, but it has the largest number of parameters, and I expect that this will indeed give us somewhat good performance. At least in our case we are not that interested in real-time object detection, while you might be, and you will have to take that into consideration depending on your hardware and the accuracy that you want to achieve. So for this purpose, I'm going to show you how you can create a specification for your dataset, and then we are going to download or create a configuration file for the model that we're going to use. Both of those files are stored in Google Drive, and I'm going to show you their contents right now.
So I'm going to download the two files; I'm going to copy and paste the commands.
So we are in the YOLO version 5 directory, and you can see that we are downloading data/clothing.yaml and yolov5x.yaml, which is again for the largest model. Those files will be stored right here; the first one goes into data/clothing.yaml, so let me show you the contents of this
clothing.yaml file. Here we have a path to the training set, and this, if you recall from the last video, is the directory called clothing. The structure right here says that the dataset should sit outside of the yolov5 directory, and we have the training and the validation set; if you recall, right here we have the images and the labels, for both training and validation. Then we have nc, which stands for number of classes, and we have nine classes. And this one is really important: the class names should be the same as those that we printed out right here, for all of the categories that we have, because you just want them to match. This is why the sorting right here was required.
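Here is a minimal sketch of data/clothing.yaml, assuming the directory layout from the last video; only the class names actually mentioned in this video are listed, the remaining six are elided:

```yaml
# paths are relative to the yolov5/ directory -- the dataset sits one level up
train: ../clothing/images/train/
val: ../clothing/images/val/

# number of classes
nc: 9

# the nine class names, in the same sorted order we printed earlier
# (only three are mentioned on screen; fill in the rest for your dataset)
names: ['jacket', 'jeans', 'trousers']
```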
Alright, so what about the next file, which is this yolov5x.yaml?
So let's go and check it; I'm going to download this as well, open it, and show you the config for this one. Essentially, the only thing that I've changed right here, compared to the default file, was again the number of classes
that we have right here. So again we have 9, and then comes the specification of the backbone and the anchors, which are the initial box shapes and positions at which the bounding boxes are looked for. This is pretty much unchanged, the same as what ships with the current implementation.
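For reference, the top of the modified models/yolov5x.yaml looks roughly like this; the multiplier values shown are the stock yolov5x defaults:

```yaml
nc: 9                 # number of classes -- the only edited line (was 80 for COCO)
depth_multiple: 1.33  # yolov5x defaults, untouched
width_multiple: 1.25

# the anchors, backbone and head sections below are left exactly as shipped
```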
So now that we have an understanding of those files, we can continue. Next, we are going to basically follow the guide, which says that if you want to train the YOLO model, you have to run this train.py file and pass it some configs. So I'm going to do just that. I'm going to call the file, and I'm going to specify the maximum image size, which is going to be 640 pixels. I'm going to specify that each batch should contain four images; this is roughly how many we can fit into the memory of one P100. You might try six, and it might work as well, but I'm not risking out-of-memory exceptions here. Then we are specifying the number of epochs, and I'm going to fine-tune this for 30 epochs. I'm going to specify the config for the dataset, which says that we are going to use the clothing dataset file I've just shown you. Next, I'm going to specify the config of the model, which is again models/yolov5x.yaml. So this looks very good, and another thing that is very important to pass in right here is the weights. I'm going to start from yolov5x.pt, which is a pre-trained checkpoint for the PyTorch implementation of YOLOv5.
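Putting all of those flags together, the training invocation looks roughly like this; a sketch, with flag names following the train.py of the revision we checked out:

```python
# Fine-tune the pre-trained yolov5x checkpoint on the clothing dataset
!python train.py --img-size 640 --batch-size 4 --epochs 30 \
  --data ./data/clothing.yaml --cfg ./models/yolov5x.yaml \
  --weights yolov5x.pt --name yolov5x_clothing --cache-images
```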
And this checkpoint will be automatically downloaded, since it is not currently available in the directory of our project. I'm also going to specify the name of the model, so I'm going to name it yolov5x_clothing, and finally I want to cache the images, which helps for later runs if you are inclined to do some more fine-tuning or hyperparameter tuning. So the whole process has started, at least for now, and you can see that this is using CUDA and Apex, so the actual Apex installation is being used. We have some of the hyperparameters that are going to be used for fine-tuning our model, and you can see this is basically the whole architecture of the model. Another interesting thing right here is that this is actually starting TensorBoard for us, so the results will be logged right there if you're interested in checking those out (see the TensorBoard cell sketched after this paragraph). And you can see that we are actually downloading the weights of the model, yolov5x.pt. How large is this checkpoint? Let's see if we have the information for this one; I'm going to open the models page and have a look at, yeah, yolov5x.pt, so it's roughly under 200 megabytes, at least that checkpoint. Next, you see that we are training for 30 epochs. We are analyzing some anchors, which is basically a smart way to adjust those anchor values that we've seen in the model config file and fine-tune them a bit for our dataset. We have 453 images that we're going to use for training, and we have 51 examples, or images, for validation, and you can see that we are using around 10 gigabytes of VRAM. Now that the training is finally complete, you can see that we are done in approximately 0.45 hours. So that's a bit strange.
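About the TensorBoard mention above: in Colab, the board is typically surfaced with the notebook magics, roughly like this (a sketch, assuming the default runs/ log directory):

```python
%load_ext tensorboard
%tensorboard --logdir runs
```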
Anyway, so we've done 30 epochs, and you can see that the best and the last weights were stored right here in the weights directory. So here you can get your fine-tuned model, or rather the checkpoint with the best mAP, which probably stands for mean average precision. You can see that the final results are actually very good, at least on the validation images that we've tested right here. Another thing that you can see in the project structure is that you have some images, and unfortunately I don't think that any predictions were drawn on any of those. And, for example, you can have a look at some training batches.
So you can see the annotations from some of the images, including, of course, some image augmentations that were done by YOLO version 5. Next, I want to show you this image, which we are going to see in a minute. It takes its information from, sorry, the directory called runs actually holds the TensorBoard runs; the results were taken from this text file, which is a summary of all of the metrics recorded during each epoch.
So I'm going to show you a brief plot built from these results. From the utils I'm going to import plot_results, and then I'm going to call the plot_results function right here (see the sketch just after this paragraph). This should pick up the charting preferences that we set, and you can see that, roughly, we have a recall that was very close to 1 at some point, while the precision kept increasing up until about epoch 20. You can see that the classification loss was steadily decreasing, and overall the model trained very well. You might argue that you could benefit from training this model for even more epochs, for example 50 or more, or you might want to go ahead and fine-tune the hyperparameters that were used. This is not something that I'm going to do, at least for this video, but if you're working on some real-world project, I would suggest that this might be a viable next step, of course, if you're using YOLO version 5.
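A minimal sketch of that plotting call; at this revision, plot_results lives in the repo's utils/utils.py and reads the results.txt written during training:

```python
from utils.utils import plot_results

plot_results()  # charts the losses, precision, recall and mAP for each epoch
```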
So next, I'm going to prepare some images on which we are going to do inference: I'm going to select images from the validation set, take just the first 50, and copy those into the inference images folder. So this is the folder right here, and we have two sample images in it already. And I'm going to copy some of the validation images right here. Into inference... I made a typo. And again I have an error: clothing, oh, I should exit this directory. Alright, so head, 50, yep, and it should copy the images right here. We have those (see the sketch of the command below).
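The copy step ends up looking roughly like this; the paths and the .jpg extension are assumptions based on the dataset layout from the last video:

```python
# Take the first 50 validation images and drop them into the inference folder
!find ../clothing/images/val -name '*.jpg' | head -50 \
  | xargs -I{} cp {} ./inference/images/
```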
So, from this directory, I'm going to run the detect.py file, and I'm going to pass in the weights that did best during our training. I'm going to specify that, again, I want the images to be at most 640 pixels, I want the predictions to be at least 40% confident, and I'm going to pass in the source, which is again the inference images folder. So I guess this should be it, but let me just check the command again. Yeah, let's run this. It should be relatively fast; the slowest part, I guess, is the loading of the model, and after that you can see that each inference is done in under 0.1 seconds.
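Here is the detection command as a sketch; the checkpoint file name assumes this revision appends the --name we passed to train.py when saving the best weights:

```python
# Run inference with the best checkpoint; outputs land in inference/output/
!python detect.py --weights weights/best_yolov5x_clothing.pt \
  --img-size 640 --conf-thres 0.4 --source ./inference/images/
```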
So if I open this and go to the output folder, let's open up an image right here. You can see that for this one the model is drawing this bounding box, which is very good, and it predicts with a confidence of 46% that this is indeed correctly identified as jeans.
So let's open up another one. This should be trousers, I guess, which is again very good. Again jeans, I guess.
Yeah, the labels of the bounding boxes are not rendered very well in this project. So this is a jacket; this is very good. And let's see, we have this image, which was not part of the dataset in any way, and you can see that at least one of the jackets was detected right here. So this is, again, an image that the model hasn't seen and hasn't been fine-tuned on, because all the images that we have right here are in this format. I would say that this model performs much better than I expected, because YOLO models are not really known for their accuracy, but this one does a very good job. So now we know how to fine-tune a YOLO version 5 model using your own custom dataset, and you can see that the output, the best performing model, was stored as a checkpoint which is available for you to download. In the next video, I'm going to show you how you can use that checkpoint, load it on a mobile device, deploy it right there, and we'll build a very simple mobile app in which we are going to use our own custom model. So please make sure to watch the next video as well. Thanks for watching, guys, I'll see you in the next one. Please like, share and subscribe. Bye bye!