Transcript for:
Football Video Analysis with YOLO

hello, in this video I will teach you how you can build this football analysis project from scratch. The project detects and tracks players, referees, and the ball across a whole video using YOLO, which is one of the best AI object detection models right now. We will also be training it to improve on the output of the out-of-the-box models. Then we'll assign players to teams according to the t-shirt colors they are wearing; this will require us to segment and cluster pixels with K-means so that we only choose t-shirt pixels from a player's bounding box. From here we can measure a team's ball acquisition percentage in a match. Then, using optical flow, we will measure how much the camera moves from one frame to another so we can precisely measure a player's movement. Afterwards we will work on perspective transformation, which takes the camera's distorted viewpoint of the 3D world and accurately represents the scene's depth and perspective; this will help us know how much a player moved in meters and not only in pixels. And lastly, we will measure each player's speed and the distance covered. This project covers a lot of concepts and deals with real-world problems, so whether you're a beginner or an experienced machine learning engineer, this project will totally make your resume shine. So let's jump right in.

Let's start by getting the video that we are going to use to develop this project. On Kaggle I found a dataset of football matches called the DFL Bundesliga Data Shootout. If you scroll down a little bit you're going to find a lot of videos, and if you open any one of them you'll see it's an eagle-eye camera point of view like this. The one we are going to use is 08fd334. Remember to log in before you download it; it's free, so just click on it, and this is the video we are going to use. Just press download; it shouldn't take long because it's not a long video. So I just have an
empty folder called football_analysis; this is my project folder. Make a new folder in it called input_videos that we are going to put our example video in. In your downloads you should find the video, so just cut it and paste it in. Then we can open this project in Visual Studio Code; I'll open it like this and close all previously opened files.

The first step in this project is going to be detecting the players and the ball inside the video, so let me explain what object detection really is. Object detection is just a neural network that is able to draw a bounding box around an object and tell what object it is. So it can say this is a bicycle, this is a person, this is a car, and it knows where each one is inside the image. One of the best convolutional neural networks for this, and the state of the art right now, is called YOLO, and we can use YOLO with ease through a library called Ultralytics. You can just `pip install ultralytics` like this; it doesn't take any time on my machine because I already have it, but it should take a little more time on your end. We can play around with these YOLO models and Ultralytics in a new file; we can call it yolo_inference.py. It's just going to be a file where we play around with Ultralytics, get to know it, and understand how it works. To use this YOLO model, which is the state-of-the-art object detection model, we can just import it from Ultralytics: `from ultralytics import YOLO`. To load the model we write `YOLO` and then give it the model that we want. We conveniently have everything up to YOLOv8; there is a YOLOv9, but it's not convenient to use yet. So you can take YOLOv8, and then
you can pick whatever model size you want. We have different models for YOLOv8: if you open the GitHub page you can see them, YOLOv8 nano, small, medium, large, and x. The difference between them is the number of parameters; this one is 3.2 million, this one is 11 million, and of course the higher the number of parameters, the higher the accuracy. You can see the accuracy increasing, but also more computer resources being consumed. I have a good amount of RAM; I don't have a GPU, but I do have a good CPU, so I'm going to use YOLOv8x, the largest one, to get the best accuracy. If it doesn't work on your end you can try the medium or the small, and the code stays exactly the same.

Now, running the model is simple: we just call `model.predict` and give it the path of the video, so `input_videos/` plus the mp4 file name. We can also specify that we want to save the results with `save=True`; this saves an output video of the results, and it's false by default because predict returns the results back to a variable, and that variable holds the detections. So let's keep the results variable and also have `save=True`. Let me also show you what the results contain: we can print `results[0]`, which is the first frame, and then loop over the boxes in the first frame, `results[0].boxes`, and print each box, with some separator line in between so we can tell which is which, and then run it. That's all there is to the inference for object detection. The first time you run it, the script is not going to find the YOLOv8 model locally.
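As a reference, here is what the yolo_inference.py steps above might look like gathered in one place. This is a hedged sketch, not a verbatim copy of the video's file: the function name and video filename are mine, and the import sits inside the function only so the sketch can be loaded without Ultralytics installed (`pip install ultralytics` first).

```python
def run_inference(video_path, model_name="yolov8x.pt"):
    """Load a pretrained YOLOv8 model, predict on a video, save an
    annotated copy under runs/detect/predict, and print the boxes
    found in the first frame."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO(model_name)  # weights are downloaded on first use
    results = model.predict(video_path, save=True)

    print(results[0])           # full result object for the first frame
    print("=" * 40)             # separator so we can tell which is which
    for box in results[0].boxes:
        print(box)              # cls, conf, xywh, xyxy, ...
    return results
```

Called as, for example, `run_inference("input_videos/match.mp4")`, where the filename is whatever you saved from Kaggle.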
So it's going to download the weights, but afterwards it will find them locally and it shouldn't download them again, so it will be a little bit faster. After it downloads, it starts predicting on the video, and it takes a little while to get through the whole thing, but let me show you the prediction output right now. You can see it's predicting on one video, this video has 750 frames, and in the first frame it found 23 persons and one sports ball, in around 500 milliseconds. It's going to go through all the frames; if you have a GPU it will be way faster, but right now I'm just going to cut the video and come back when it's finished.

Okay, now it's done. We have the printed output for the boxes and for the results, but let's go through those after we see the output video. After you run the script with `model.predict` and `save=True`, it automatically creates a new folder called runs, and inside it you're going to find the output video: runs, then detect, then predict. First let me explain the output you see right here. For each object the model detects, the output is basically two things: the bounding box, which is the box around the object, and the class name together with the confidence. So for the person right here, or this object here, which is for example a sports ball, the class is going to be an ID: the model outputs a number, and we map it to whatever name it was trained on, and I'll show you where we can get this
mapping. Then we also have a bounding box, and the bounding box basically defines where in the image the object is. This can be defined in multiple ways. Let's choose a blue color first: we can define it by the center point, like here, which is the x and y coordinates on the image, and then the width and the height of the bounding box. Another way to represent the bounding box is by two points: the point at the top left and the point at the bottom right. We call this x1, y1, x2, y2. These are just different formats for representing the same bounding box, so you don't need to confuse yourself, but we are going to stick with this one, the xyxy format. For me this format is straightforward and it defines the bounding box positions directly; when we draw, we can just provide those xyxy positions and it will draw the bounding box. So that's what object detection is: a convolutional network that detects bounding boxes along with class names and class IDs.

Now let's look at the results and map what we see to the printed output. You can see it was running on all the frames first, and then I printed out the results. The results first have boxes, which we are going to print later on; you can see boxes is an object and we will open it up in a couple of seconds. Then we have keypoints and masks, but we are not using those: keypoints are for pose detection, which detects joints, and masks are for segmentation, so we're using neither of them.
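The two bounding-box formats described here can be captured in a couple of small helper functions (a sketch for illustration; these exact helpers are not in the video):

```python
def xywh_to_xyxy(x_center, y_center, width, height):
    """Center-plus-size format -> top-left and bottom-right corners."""
    return (x_center - width / 2, y_center - height / 2,
            x_center + width / 2, y_center + height / 2)


def xyxy_to_xywh(x1, y1, x2, y2):
    """Corner format -> center point plus width and height."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
```

Both describe the same box: a 20x10 box centered at (50, 40) is (40, 35, 60, 45) in xyxy format.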
Since we use neither of those, they're going to be None. Then we have `names`, which just maps the class ID to the actual name. Zero right here means person: when the model predicts zero, it goes into the frame and, instead of writing zero, writes "person" so that we can understand it. One is bicycle, two is car, and so on and so forth; you can see all the classes here, up to ID 79. Then we have the original image in pixels, which you can basically ignore, and the save directory and other statistics about the runtime, which you can also ignore.

Then we have the bounding boxes, so this is one bounding box; let's see what it has. First it has `cls`, the class ID, and the first one is zero; if you scroll up you can map it to the name, so zero means person, and this bounding box is for a person. Then you have the confidence: how confident the model is that this is, say, a person. If the confidence is low, like this one, the model is not quite certain whether this is a person or not, but the higher the confidence, like 0.8 or 0.7, the more likely it is an actual person and not, for example, background. Afterwards you'll find something called track; we are going to cover tracking later on, but right now we're only predicting, not tracking, so it's False. Then we have the bounding box itself: first xywh, which is the center point I showed you before plus width and height. You also have the other ways of representing the same bounding box, but we are going to use xyxy, the far corners of the rectangle, like I showed you. So this is one bounding box, and
we have the other bounding boxes as well; I'm looping over all of them in the first frame, and this is the output. So, back to our results right here. In these results we have multiple inconveniences. The first one: you can see the sports ball right here is being detected, but after a while it's not, and it's going to be detected very rarely afterwards; you can see it's almost ignored in most of the frames. So we will need a model that detects the sports ball, this football, a little bit better. That's one thing we need, so that we can analyze ball acquisition by players and other things about the ball; having very few detections is going to hurt us, so increasing the accuracy here is important. Another thing: we also see that people outside the court who are not players, like this one and this one, are being detected. It would be good for us if the model out of the box could ignore them, so that we don't have to write code to detect the lines of the court and then drop the people outside the lines; if the model does this for us automatically, it will save us a bunch of code. And also the referees right here: for example, this is a referee wearing black. If the model could also differentiate between referees and players, that would be awesome. I know we could do it by hand, by looking at the color the person is wearing and saying that if it's black then this is a referee, but again, it saves us a little bit of code if the model does it for us.

For this I have found a dataset on Roboflow called the football player detection image dataset. Roboflow is basically a Python library
that you can use to download any dataset you want; the website also helps you upload datasets and even annotate them, and it lets you download a dataset in different formats so you don't have to deal with wrangling the data into the shapes that different models expect. Let's open one of the images of the dataset and see what we get. All players have a bounding box; the people outside the court are not annotated, like you see here; the referee is detected as well, like this one, and it has a different class, so this is a player and this is a referee; the sideline referees are annotated too; the ball is annotated; and the goalkeeper is annotated as a goalkeeper, not a player. These are very good annotations for us: they'll help us understand, out of the box, who the referees are, and they ignore the people outside. So this is exactly what we need. The dataset is only 612 images, which is a little bit low to start training on, but it will be good enough for our project. So right now our task is to run a training trial on YOLO and fine-tune it on this dataset so we can get this neat output. Before we proceed, you can also create an account with Roboflow so that we can download the dataset later.

To start training the model, create a new folder called training, and we'll train in a notebook: football_training_yolo_v5.ipynb; the .ipynb extension makes it a notebook. I chose YOLOv5 because it has the best out-of-the-box accuracy at detecting the ball, which is the main issue we have. So the first thing is that if you didn't
install Ultralytics yet, you can `pip install ultralytics` right here, and you can also `pip install roboflow`, which is the library we are going to use to download our dataset from Roboflow. Afterwards we can get the dataset: open up Roboflow, click on YOLOv5 because this is the model we're going to use, make sure "show download code" is checked, then press continue, and you are presented with a code snippet that downloads the dataset automatically for you. It contains an API key; I am going to blur mine, and your API key will be different from mine, but you can just paste the snippet in like that, remove the first line because we already installed the library, and run it. You can see right now that it is downloading the football dataset. Now it's finished, and when you go back and refresh, you can see football-players-detection-1 appear; this folder has our dataset, and it holds the test, train, and valid sets. Open the validation set, for example, and you can see the images right here; it's the same type of images as the video we're trying to predict on, the same camera angle and everything.

You can also see the labels; let me explain them a little. The first number on each line is the class ID; this is what says whether it's a player, a referee, or a ball. The 2 here is the class ID, and if you look at the names, 0, 1, 2, then 2 is the player, so this label is for a player. It's followed by the bounding box: the x center and y center, then the width and height, all relative to the width and height of the frame. So 0.32 just tells you it's almost 32% of the way across the width of the frame. You don't have to worry too much about this detail.
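A label line like the one described can be decoded into pixel coordinates with a few lines of Python (an illustrative sketch, not code from the video):

```python
def parse_yolo_label(line, img_width, img_height):
    """Decode one line of a YOLO .txt label file.

    Format: "<class_id> <x_center> <y_center> <width> <height>",
    where all four box values are normalized to [0, 1] relative to
    the image size. Returns (class_id, (x1, y1, x2, y2)) in pixels.
    """
    parts = line.split()
    class_id = int(parts[0])
    x = float(parts[1]) * img_width
    y = float(parts[2]) * img_height
    w = float(parts[3]) * img_width
    h = float(parts[4]) * img_height
    return class_id, (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
```

So, for example, `"2 0.5 0.5 0.25 0.5"` on a 1920x1080 frame is a player (class 2) whose box spans pixels (720, 270) to (1200, 810).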
I just wanted to show you what the labels of this dataset look like: as you can see, it's just a bounding box and a class ID. Now you can close that. You also have data.yaml, which is very important: it tells you what the class names are and where the images live. You can close this too, and now you have the dataset right here; `dataset.location`, for example, will tell you where this dataset resides on our local machine.

One more thing before we proceed: we need to move this dataset into another folder with the same name. So football-players-detection-1 needs to contain another folder also called football-players-detection-1, with the test, train, and valid sets inside it. This is the layout the training code expects, and it crashes without it, so right now we just need to do this simple step. I could do it manually, just like that: create a new folder called football-players-detection-1 and then move the test, train, and valid folders into it. You can do it manually if you want, but I want to make the code reproducible, so I'm going to do it with Python. I'll import a library called shutil; shutil helps us move and copy files and folders on our machine using Python. So we call `shutil.move`, give it 'football-players-detection-1/train' as the thing we want to move, and 'football-players-detection-1/football-players-detection-1' as where we want to move it. Again, this is just a requirement of the Ultralytics training code. I'm going to do the same thing for the test set and the same thing for the validation set, and then you can just run it. So now it's running.
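The folder shuffle above can be wrapped in a small helper; this is a sketch of the same idea (the function name is mine, not from the video):

```python
import shutil
from pathlib import Path


def nest_dataset(dataset_dir):
    """Move the train/test/valid splits of a Roboflow download into a
    nested folder of the same name, i.e.
    <name>/{train,test,valid} -> <name>/<name>/{train,test,valid},
    which is the layout the training code expects here."""
    root = Path(dataset_dir)
    inner = root / root.name          # nested folder with the same name
    inner.mkdir(exist_ok=True)
    for split in ("train", "test", "valid"):
        src = root / split
        if src.exists():
            shutil.move(str(src), str(inner / split))
    return inner
```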
Next I'll create a training section. Now the move is done, and to start training, all we have to do is write a terminal command with Ultralytics. We write `yolo`, then tell it the task is `detect` because we're detecting, then set the mode for training, `mode=train`, and then specify the model that we want to train. We'll train YOLOv5, and again YOLOv5 also comes in nano, small, medium, large, and extra large; I am going to train the extra large version, and the x stands for extra large. Then I specify the dataset location: the training needs the data.yaml file, so I put `data={dataset.location}/data.yaml`. Then we specify the epochs, which is the number of iterations the model makes over the whole dataset to learn from it, and also the image size, `imgsz=640`. That's all we have to do to start the training and make our own model.

I don't have a GPU on my machine, so I'll need to train on Google Colab; Google Colab gives you a couple of hours of GPU training per day, and we will be able to utilize that. To use it, go to your Google Drive, right-click, choose More, and create a Google Colab notebook right there. After you create a new Colab notebook, you can just copy-paste the cells; there are only a few, and the first cell installs the requirements. Or you can press File and then Upload notebook. For me, I already did that: I copy-pasted the code we wrote together and I have already run it, so we don't have to wait, but training takes a little bit of time, so be patient.
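Collected in one place, the training cell described above looks something like this (in a Colab notebook it is prefixed with `!`, and `{dataset.location}` is substituted with the path printed by the Roboflow download step):

```shell
yolo task=detect mode=train model=yolov5x.pt data={dataset.location}/data.yaml epochs=100 imgsz=640
```

The four arguments map directly onto what was just explained: the task, the training mode, the model size, the dataset's data.yaml, the number of passes over the dataset, and the input image size.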
This run was 100 epochs. You'll see the training steps for every epoch: the first epoch has around 1.4 loss, and then you can watch the loss decreasing steadily. After it finishes, like here, you'll also have the runs folder; if you don't see it, just press the refresh button right here to refresh the folders and show the latest output. Inside runs you have detect, then train, and then the weights: the best weights and the last weights. You can download them both by pressing the three dots right there and pressing download, or, if you want to move them to your Drive, press this button to mount your Drive (it's already mounted on my end, so it's unmounting right now, but you can mount it) and then run a copy command: copy this file, which is in runs/detect/train/weights/best.pt, to your Drive, under My Drive, wherever you want. I have chosen to paste it in the Colab folder, under football analysis project and then a folder called yolo_v5; I copied both best and last there, and then you can just download them.

After you download them, go back to your project and create a new folder called models, and paste in the models that you downloaded. So now you have another folder right here called models that holds best and last, and you can save and close this notebook. You can also close the training folder; we're done with the training. Right now we just want to see the output of this model, so all we have to do is, instead of giving the inference script yolov8x, we give it models/best.
pt, the path to the best weights file. Save it, clear the output, and run it again. It will take a little bit of time again, and you can see it's doing the same thing, but notice that it is predicting balls, goalkeepers, players, and referees instead of person and sports ball; it has different class names. Again, I'm going to cut the video and come back when the run is over.

Okay, now it's done. Just like last time it's in runs, and now we have predict2, which is the latest prediction, so we can open runs/detect/predict2. You can see right here that the people outside the court are not being detected, the players are detected as players, the three referees are detected as referees, and the ball is also detected quite well; it's being detected a lot more than it used to be, especially when it is hit. In this part we used to have no detections at all, and now we at least have some. So this is how you can use your fine-tuned model for predictions, and just like with the original model, we still have the bounding boxes and the results in the same format; nothing has changed.

Currently we just want to write code to make the annotations on the video a little bit neater. I'd rather have circles below the players than those clunky bounding boxes, and a triangle for the ball instead of this clunky box, because there's a lot happening in this frame and in the video, and this type of annotation overlays a lot of things. For example, with these boxes we can't clearly tell apart the referee and the
players, and it would be better for us to have a different type of annotation. I showed you the annotations at the beginning of the video, but let me show you again what I mean. This is what I mean: instead of those bounding boxes we just have circles, which represent the same thing, and instead of a bounding box on the ball we just have a triangle on top of it. You can see everything a lot better this way, and you can see the players a lot better, so we're going to do those annotations next.

Now we can close yolo_inference.py; we know how Ultralytics and YOLO work. We create a main.py, define a main function, print hello world inside it, add `if __name__ == '__main__':`, and call main. This is just a hello world application that we are going to build on top of, and you can see the hello world runs. The first step in creating this project: make a new folder called utils, and inside it create video_utils.py; this file will have the utilities to read in a video and save a video. To do that we are going to use a library called cv2, which is OpenCV; you can open its page right here and you'll find the pip install command to install OpenCV. I already have it, but it's as straightforward as copying and pasting the command into the terminal. We now want a function called read_video that takes a video path and returns the list of frames of the video. A video is just a series of images presented one by one: you can see this is an image, this is an image, this is an image, presented at 24 images per second, and we call those images frames. So this is 24
frames per second. To read in the video we create a VideoCapture object with cv2, and I am going to take in the predicted code from GitHub Copilot (so those predictions were from GitHub Copilot, if someone asks). It creates a video capture, initializes an empty list of frames, then loops with while true, reading the next frame each time. The read call puts the next frame right here and also returns a flag for whether there actually is a frame or the video has ended; if the flag is false, the video has ended and we break out of this loop, and if it's true, we just append the frame to the list of frames and eventually return it. So this is just the list of frames of the video.

While we're at it, let's also define the save_video function. It takes the output video frames, which is again a list of frames, and an output video path. For this we are also going to take the GitHub Copilot recommendation, but change it a little bit. The first thing is that we define an output format, which is going to be XVID. Then we define a VideoWriter that takes in the video path (just a string), then the output video type, then the number of frames per second, which right now is going to be 24, and then the frame width and the frame height. For those we can just take `output_video_frames[0].shape[1]` (let me write shape correctly) for the width, and copy-paste it with `shape[0]` for the height; these are the width and height of the frame. Then we just loop over each frame, write it to the video writer, and that is it.
So let's test it out. Before going to main, in utils create a new file called __init__.py; this init exposes functions from inside utils to outside utils. Right here you write `from .video_utils import read_video, save_video`, and in main you just write `from utils import read_video, save_video`. Now let's read in the video: `video_frames = read_video(...)`, giving it the path of the video that we want, which is going to be `input_videos/` plus the name of the mp4 file, which you can copy-paste. And let's also save the video, so we know whether it's working or not: call save_video, give it the video frames and then an output path in output_videos, and call it output_video.avi.
And yeah, that's it. Before running, just create the folder called output_videos, and then you can run it. It shouldn't take long to just read and write the video, a second or two. That is it; open up the output videos folder like this and you're going to find the same video being output again. Nothing special here, but now we know that read_video and save_video are working correctly.

Now what we want to do is write our detection and tracking, and before we proceed, let me explain what tracking is. In the output videos you have bounding boxes like this one around each object, so every object has a bounding box, and this is the current frame. A bounding box is just x, y then x, y: a number, a number, a number, a number, plus whether this is a person or not. That's it for the detection, and you're going to have the same output format for this object too: it's going to be x, y, x, y, and it's going to be a person. The problem now is in the next frame. If you go out to the next frame, this frame, and try to predict again, let's draw it in red now: you have these red bounding boxes, and the previous bounding boxes were right there, the blue ones that I drew from the frame before. The idea right now is that we have nothing telling us this red bounding box is the same as that blue bounding box, because we only have the x, y and then x, y;
nothing is telling us that this bounding box is the same as that one. Now, we could say that the closest bounding box belongs to the same player: because in the last frame this blue one is closest to this red one, this is the same player, basically. Assigning the same entity to a bounding box across multiple frames is what we call tracking: we want to track this player and say he was there and now he is here. The problem is that if you just take the closest bounding box, maybe some other bounding box is closer to the blue one than the right red one. So a smart tracker would also predict the movement, the trajectory the object is moving in, and it would take in some visual features: this player was wearing white and this one is now wearing white and is close enough, and this one is wearing green, so most probably it's the other one. That is what tracking is; it assigns bounding boxes an ID so that we can understand where the object was before and where it is in the current frame. In our case we are going to use a tracker called ByteTrack, and it is going to be more than enough for us to track objects across frames. In order for us to write this tracker, it will need the detections first and then it will track them: it needs the bounding boxes, and then it matches bounding boxes to IDs. So we can create a new folder called trackers, and like we did in utils we can create an __init__.py that exposes the
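To make the "closest bounding box" idea above concrete, here is a toy nearest-center matcher. This is a hypothetical illustration, not ByteTrack: it shows exactly the naive strategy the video describes, and its weakness (no motion or appearance cues) is why a real tracker is needed:

```python
def box_center(bbox):
    # bbox is (x1, y1, x2, y2)
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def match_by_distance(prev_boxes, curr_boxes):
    # prev_boxes: {track_id: bbox} from the last frame
    # curr_boxes: list of bboxes detected in the current frame
    # Returns {track_id: index into curr_boxes} by nearest center.
    matches = {}
    for track_id, pbox in prev_boxes.items():
        px, py = box_center(pbox)
        best_i, best_d = None, float("inf")
        for i, cbox in enumerate(curr_boxes):
            cx, cy = box_center(cbox)
            d = (px - cx) ** 2 + (py - cy) ** 2
            if d < best_d:
                best_i, best_d = i, d
        matches[track_id] = best_i
    return matches
```

ByteTrack improves on this by predicting each track's motion with a Kalman filter and by also associating low-confidence detections, which is why it holds IDs through occlusions far better than nearest-center matching.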
classes and functions outside the trackers folder. Then we can create a file called tracker.py, create a class called Tracker, and create the __init__, which gets called when we initialize the class. We can also have a function called get_object_tracks, which takes self and the frames. Inside the init we just want to load the model and the tracker, so first of all import from ultralytics import YOLO, then set self.model = YOLO(model_path), and we get the model path from the init; this is the model we are going to use. Right now we just want to get the detections first, and then from the detections we can get the tracking: detections = self.detect_frames(frames). We don't have this function yet, so let's write detect_frames. (GitHub Copilot predicted this code wrong, but we can write the correct one right now.) Basically, when we used to write it in the YOLO inference, we did self.model.
predict and just gave it the video path; now we can just give it the frames. So we have the results, or detections, right here, and return detections. This would basically work, but I would also add a batch size so that we don't run into memory issues and things like that. Instead of predicting on the whole list of frames at once, we give it 20 frames at a time. So we define a batch_size of 20, set detections to an empty list, then for i in range(0, len(frames), batch_size): the first iteration has i = 0, the next has i = 20 because it increments by the batch size, then 40, and so on. Then we get the batch detections: detections_batch = self.model.predict(frames[i:i+batch_size], conf=0.1). This 0.1 is the minimum confidence we are setting; I found 0.1 is good enough for detecting a lot of objects without having many false detections. After we get the detections batch we add it to the whole detections list. So now we have the detections, and as I showed you before in the YOLO inference, the result has the boxes and the class names, in the same format, basically. After this we just want to run a tracker. Actually, instead of model.predict we can actually run model.
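The batching loop described above can be shown on its own, independent of YOLO. In this sketch, model_predict is a stand-in parameter for the real self.model.predict(batch, conf=0.1) call, so the chunking logic is runnable by itself:

```python
def detect_frames(frames, model_predict, batch_size=20):
    # model_predict stands in for self.model.predict(batch, conf=0.1);
    # the point here is the slicing: frames are consumed 20 at a time.
    detections = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]      # i = 0, 20, 40, ...
        detections += model_predict(batch)    # one result object per frame
    return detections
```

Note that Python slicing is safe at the end of the list: the final batch is simply shorter (for 45 frames you get batches of 20, 20, and 5), so no special case is needed.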
track, and it would provide a track ID for the objects. But I wanted to show you something before we proceed: in runs/detect/predict you can see that the goalkeeper is being detected as a player, and it keeps switching between player and goalkeeper. I assume this happens because of the small dataset, just 600 images. We are going to treat the goalkeeper as a normal player, because we are not doing any special statistics for the goalkeepers. And in order to overwrite goalkeeper with player, I won't be able to use track here; I'm going to use predict, then override the goalkeeper with the player, and then run the tracker on it. In order for us to run a tracker after the detections, we are going to use a library called supervision: import supervision as sv, and this has the tracker. Like I said, we are going to use ByteTrack, so the tracker is going to be sv.
ByteTrack(), and this is the tracker that we are going to use. Before we start tracking, we just need to override the goalkeeper with the player, so let's do that right now: for frame_num, detection in enumerate(detections). Basically I'm looping over the detections one by one, and enumerate just puts in the index of the list, so it's going to be 0, 1, 2, 3, 4, and so on. Then we take the class names, cls_names = detection.names, and we also make an inverted version of the class names. If you saw it before, when I showed you, the mapping was 0 to a class name like person, 1 to goalkeeper, and so on; I want the inverse of it, so that person maps to 0 and ball maps to, for example, 2, and so on. This is just going to be a little bit convenient for us to know which is which quite easily without having to search anything. In order to do that, I loop for k, v in cls_names.items() and just switch them: instead of key and value, I put the value as the key and the key as the value. Then we want to convert the detection to the supervision detection format, and the way we do that is detection_supervision = sv.Detections.from_ultralytics(detection). Now I can show you what the detection has, so I'll print it right here, and in order not to run on all the frames I'm just going to break here; we'll remove the break when
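The name inversion above is a one-line dict comprehension. Using the class mapping that gets printed later in the video (the exact IDs come from the trained model's dataset):

```python
# id -> name, as YOLO's result.names provides it
cls_names = {0: "ball", 1: "goalkeeper", 2: "player", 3: "referee"}

# name -> id: swap every (key, value) pair
cls_names_inv = {v: k for k, v in cls_names.items()}
```

After this, cls_names_inv["player"] gives the numeric class ID directly, which is exactly what the goalkeeper-to-player override needs.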
we are ready to run on all the frames. Right now I just want to expose this and show you the supervision format. In the trackers __init__.py I'm going to write from .tracker import Tracker, and then in main we can write from trackers import Tracker. First we initialize the tracker, tracker = Tracker('models/best.pt'), giving it the model path, and then we run tracks = tracker.get_object_tracks(video_frames), and that would be it. So let's clear the output and run again, and you should see the detections in the supervision format. Here it is; it's in a loop so it will keep repeating, but I'll take only the last frame and show you. Detections is in xyxy format: each row is a bounding box given by four pixel coordinates, x1 y1 x2 y2, and each row is a different bounding box for a different object. Afterwards we have mask, which is not going to be used, also confidence, which we're not going to need, and then we have the class IDs: for example, the 2 here means the first object's class is 2. I can print out the class names as well, so that we can see what 2 is, whether it's a goalkeeper, a player, or a ball, but each bounding box right here has its corresponding class ID: 3, 2, 1, 0. This is the supervision format. Let me now also print the class names, and let me also break so we don't print all this again. It ran, and you can see the whole list of the classes; right here you can
see that 0 is ball, 1 is goalkeeper, 2 is player, and 3 is referee, and you can simply say that the 2 on the first bounding box means player and the 3 on the second bounding box means referee, and so on. You can also see the class names, where 0 maps to ball, and the class names inverse, where ball is the key and 0 is the value. Let's now convert the goalkeeper to a normal player. What we are going to do is, every time we see a 1 in the class IDs of the output, we switch it with a 2, because 2 is the player ID. So: convert goalkeeper to player object. for object_ind, class_id in enumerate(detection_supervision.class_id): we check if cls_names[class_id] == 'goalkeeper'. Basically I'm checking whether the class name of what we're looping over is goalkeeper, and if it is, I'm going to switch it. So here we set detection_supervision.
class_id[object_ind] = cls_names_inv['player'], and this returns 2. I'm not hardcoding it to 2 or 1 because this way is more robust: whatever class names you got and whatever dataset you have, you can just run it and it will work fine. So this replaces the 1 with a 2, basically, and if we run it again you'll find that the 1 has disappeared and it's now a 2. Whoops, I made a small mistake right here: this is not going to be 'person', it's going to be 'player'. Save it, clear the output, and run again. Right here you can see the 1 has disappeared and is now replaced by a 2, and that's it. Now we are ready to write our tracker: track objects, then detection_with_tracks = self.tracker.update_with_detections(detection_supervision), and that is it. This adds a tracker object to the detections, so let me print it out, and let me not break this time, to show you how we track across frames. Run it again: the tracker is now running fine, and you can see we have a new field right here called tracker_id, and it tells us that player number one is track ID 1, this bounding box is tracker 2, then tracker 3, and those track IDs are going to be consistent throughout the frames. It's going to match each bounding box with a number, which is the track ID. Right now it's in ascending order, but note that when the players move this will switch according to the bounding box position: if this player, for example, comes
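The goalkeeper-to-player override above is a short loop over supervision's class_id array. Here is a runnable sketch of just that step, with a made-up class_id array standing in for a real sv.Detections object:

```python
import numpy as np

cls_names = {0: "ball", 1: "goalkeeper", 2: "player", 3: "referee"}
cls_names_inv = {v: k for k, v in cls_names.items()}

# class_id as supervision holds it: one entry per detected box;
# this particular array is invented for the example.
class_id = np.array([2, 3, 1, 2, 0])

# Convert goalkeeper to player object
for object_ind, cid in enumerate(class_id):
    if cls_names[cid] == "goalkeeper":
        class_id[object_ind] = cls_names_inv["player"]
```

Looking the name up through cls_names and writing back through cls_names_inv, rather than hardcoding 1 and 2, keeps the code correct even if a retrained model assigns different numeric IDs.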
here, then his ID should follow him; it's not just a number in ascending order, it's going to assign it correctly. So that's the tracker, and the tracker is now finished; we just need to put this output in a format that we can utilize easily. I will choose a format which is a dictionary of lists, with players, referees, and ball as different objects so that we can reference them quite easily. Let me show you what I'm talking about. I'm going to make a tracks object with 'players' as an empty list, 'referees' as an empty list, and 'ball' as an empty list, and for each frame I append to tracks['players'] a dictionary whose key is the track ID and whose value is the bounding box, and we do the same for the referees and for the ball. We loop over each detection with tracks: for frame_detection in detection_with_tracks, then we extract the bounding box, which is frame_detection[0].tolist(), because index 0 is the xyxy detections; then 1 is the mask, 2 is the confidence, and 3 is the class ID, so cls_id = frame_detection[3]. We also want the track ID, which is just after the class ID: track_id = frame_detection[4]. Now we just want to add tracking for the players and referees but not for the ball; the ball is just one ball, so no need to track it,
just the bounding box is going to be enough; we know it's only one across frames, so we don't need to do it. So we can simply say if cls_id == cls_names_inv['player'], then tracks['players'][frame_num][track_id] = {'bbox': bbox}. This frame_num is the list index; right now it's 0, if we're looping over frames one by one, then comes the track ID, which is one of those IDs, and then we put in the bounding box. Because we might have multiple other pieces of information for the player besides the bounding box, we put it in a dictionary, like {'bbox': bbox}. Now let's do the same thing for the referees: same frame_num and track_id, with the bbox. For the ball we don't want to loop over detection_with_tracks; we loop over the detections without tracks, so for frame_detection in detection_supervision, then we also extract the bounding box, frame_detection[0].tolist(), and the class ID, frame_detection[3], and if the class ID equals ball, we add it to the ball tracks the same way, but instead of a real track ID we just hardcode it to 1, because it's only one ball. Now we are done with the tracker and we just want to see the output, but before we run it again, this is going to be our final output right
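The tracks-building loop above can be sketched end to end with hand-made detection tuples. Note the tuples here are invented stand-ins for what iterating a real sv.Detections yields, which is a similar (xyxy, mask, confidence, class_id, tracker_id, ...) shape:

```python
cls_names_inv = {"ball": 0, "goalkeeper": 1, "player": 2, "referee": 3}

# One fake frame of tracked detections: (bbox, mask, conf, class_id, track_id)
detection_with_tracks = [
    ([100.0, 50.0, 120.0, 90.0], None, 0.9, 2, 7),   # a player, track 7
    ([300.0, 60.0, 315.0, 95.0], None, 0.8, 3, 11),  # a referee, track 11
]
# The untracked detections also contain the ball
detection_supervision = detection_with_tracks + [
    ([200.0, 80.0, 208.0, 88.0], None, 0.5, 0, None),
]

tracks = {"players": [{}], "referees": [{}], "ball": [{}]}
frame_num = 0

for frame_detection in detection_with_tracks:
    bbox = frame_detection[0]
    cls_id = frame_detection[3]
    track_id = frame_detection[4]
    if cls_id == cls_names_inv["player"]:
        tracks["players"][frame_num][track_id] = {"bbox": bbox}
    if cls_id == cls_names_inv["referee"]:
        tracks["referees"][frame_num][track_id] = {"bbox": bbox}

for frame_detection in detection_supervision:
    bbox = frame_detection[0]
    cls_id = frame_detection[3]
    if cls_id == cls_names_inv["ball"]:
        tracks["ball"][frame_num][1] = {"bbox": bbox}  # hardcoded ID 1
```

Storing each entry as a dict rather than a bare bbox pays off later, when extra per-player fields like team, ball possession, or speed get added alongside 'bbox'.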
here: from this function we just return the tracks, which is a dictionary of lists of dictionaries. Let me tell you again what's inside. For each frame there's a dict: say frame 0 has track ID 1 mapped to its info, track ID 2 (or maybe 21, no problem) mapped to its info, and so on. That is the dictionary of only one frame; the next frame has the same kind of set of bounding boxes, and maybe another player came in, like ID 10, while another moved out of the frame so he's not there anymore. So this is for frame 0, this is for frame 1, and so on until we finish the frames. That's the output. Right now we run the detections and then the tracker, but I'd also like to save the results, so that when developing we don't need to run it and wait ten minutes for it to finish. I'll save this tracks object as a pickle file after we finish, and if I find it, I can just read it and not run the whole thing again. So let's add another parameter called read_from_stub, which is False by default, which makes us run from scratch, and then a stub_path, which is just where the output, the tracks, will be saved. Basically, if the stub_path is not None, we save the tracks: import pickle, then with open(stub_path, 'wb') as f, then pickle.dump the tracks. So we're going to save it right
here. That's how we save it; let's also put in the reading. The first time we run it, it's probably not going to read because the file isn't there, but we say: if read_from_stub is True and stub_path is not None, and let's also check that the stub path exists, so import os and check os.path.exists(stub_path). If it exists, we just open the stub path, load the tracks, and return them, and after we return we don't need to run the rest again. That's it, and now we can remove the break. Let's now pass read_from_stub=True and a stub_path equal to 'stubs/track_stubs.pkl' (with an underscore, not a bracket). Right now it's set to True, but it's not going to find the pickle file, so it's going to run. Let's create the stubs folder right there and run it. It's going to take a while, so I'm going to cut the video and come back when it's done. Now it's finished, and you can see it took a while to run, but we can run it again and make sure it's reading from the file instead of running: I just run it again and it takes only a couple of seconds to finish, because it went and read the file instead of predicting again. Right now we have the predictions in the tracks, but we want to see and visualize them, and instead of the bounding boxes we want to have those circles. So let's write the code to draw the circles. Close down these functions and write def draw_annotations, which takes self, the video_frames, and it's
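The stub-caching pattern just described is worth seeing in one piece. This sketch factors it into a standalone helper (the function name and compute_fn parameter are mine; in the video the same logic lives inline at the top and bottom of get_object_tracks):

```python
import os
import pickle

def get_or_compute_tracks(compute_fn, read_from_stub=False, stub_path=None):
    # Load cached tracks if allowed and present; otherwise compute and cache.
    if read_from_stub and stub_path is not None and os.path.exists(stub_path):
        with open(stub_path, "rb") as f:
            return pickle.load(f)
    tracks = compute_fn()                 # the expensive detect + track run
    if stub_path is not None:
        with open(stub_path, "wb") as f:  # cache for next time
            pickle.dump(tracks, f)
    return tracks
```

Checking os.path.exists as well as read_from_stub means the very first run falls through to the computation and writes the cache, so the flag can stay True the whole time you develop.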
going to also take the tracks. Then it has output_video_frames, which will be the video frames after drawing the output on them. Then we loop over each frame: for frame_num, frame in enumerate(video_frames), and we can start the drawing process. First we copy it, frame = frame.copy(), so that we do not pollute the incoming video frames; we copy it right here so the original list is not drawn on. Right now we just want to draw a circle beneath each player, so let's extract the player dict: player_dict = tracks['players'][frame_num], and then do the same for the ball and the referees. Now we have the dictionaries of the players, the ball, and the referees for this frame, so we can start by drawing the players: for track_id, player in player_dict.items(). What we want to do is draw an ellipse, so frame = self.draw_ellipse(...). An ellipse is like a circle but with different diameters horizontally and vertically; I'll show you what it looks like in a bit. The draw_ellipse function takes in the bounding box, and it also takes a color; the color is in BGR, so give it red for now, (0, 0, 255), and we also give it the track ID. This is the draw_ellipse call, but we don't have the function, so we need to write it: def draw_ellipse(self, frame, bbox, color, track_id). We want it to be put at the bottom of the
bounding box, which is y2; y2 is at the bottom, so we take bbox[3] and cast it to an integer. We also want it centered in x, so we want the center of the ellipse to be the center of the bounding box, in x at least. So let's go to utils and create a new file called bbox_utils.py, and create a function there called get_center_of_bbox: def get_center_of_bbox(bbox), where bbox is x1, y1, x2, y2. The center is ((x1 + x2) / 2, (y1 + y2) / 2), and before we return it we cast it to an integer. While we're at it, I'll also need the width of the bounding box, so let's also have a function called get_bbox_width that takes a bounding box and returns its width: return bbox[2] - bbox[0], which is x2 - x1. That's basically it. Let's expose those functions outside utils: from .bbox_utils import get_center_of_bbox, get_bbox_width. We can then go to the tracker, import sys, then sys.path.
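The two bbox helpers just described are small enough to show in full, as a sketch of bbox_utils.py:

```python
def get_center_of_bbox(bbox):
    # bbox is (x1, y1, x2, y2); return the integer center point
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def get_bbox_width(bbox):
    # width is simply x2 - x1
    return bbox[2] - bbox[0]
```

Casting the center to int matters because cv2 drawing functions expect integer pixel coordinates, while the bboxes coming out of the model are floats.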
append, and go one folder up, so we can reach the project folder, and now we can from utils import get_center_of_bbox, get_bbox_width. I think I misspelled 'width', so let me just go and fix it: it's width right here and also in bbox_utils. Now that we're done with that, we can continue writing the draw_ellipse function. Right now we want the x center of the bounding box; get_center_of_bbox returns x_center and y_center, but we are not using the y center, because we use y2 as the bottom y, not the center y. We also want the width, and the width here will be one of the radii of the ellipse: width = get_bbox_width(bbox). Now let's draw the ellipse with the cv2 function; we did not import cv2, so let's import it first, then cv2.ellipse: give it the frame, give it the center, which is (x_center, y2), then give it the axes. The axes here are like the radius of a circle, but since an ellipse has two, we need to provide two: the minor axis, which is this one, and the major axis, the wider one. So we can simply say int(width) as one of the axes, and for the other we can just use about 35% of it, int(0.35 * width). You can play around with the shape if you want, but this shape is what you saw earlier. We can then define the angle, then the start angle, which right here is going to be 45, and the end angle, which is 235, and I'm
going to tell you that those mean a bit of the circle is not going to be drawn, and I think this looks good on my end. You can see right here that, for example, this circle doesn't have this bit drawn, and it's because we draw from 45 to 235 and don't draw the other bit. If you want to draw it, you can, but I think this gives it a gamified look, so we'll have it like this. After this we give it the color, then the thickness, and 2 is quite good; then we also give it a line type, and I'm going to choose cv2.LINE_4, which looks good to me, and at the end we just return the frame. So right now we are drawing the players. Before we run it, let's call it: draw output, then output_video_frames = tracker.draw_annotations(video_frames, tracks), and then instead of saving the video frames as we read them, we save the output video frames. So let's run it again and see the output. I got an error right here because in draw_annotations I forgot to return the frames: after we have the frame, we append it to output_video_frames, then we return output_video_frames. So clear the output and run again. It's done now, and we can go and open it: go to the output video files, open it, and you can see the semicircles being drawn quite correctly, but I think the part that is not being drawn is way too much, so I'm going to revisit the angles, give me a minute: it's going to be -45 to 235, not 45. So with that being
said, we can just run it again and see the output now. Yes, this is the output that we expect, and you can see the ellipses being drawn on the players. For example, when you concentrate on this player: when he is running and the bounding box is wide, the ellipse is wide, and when he straightens up the ellipse gets narrower, because it's tied to the width of the player's bounding box. So that's it; we'll do the same thing for the referees. We get the referee_dict, loop over it, and call draw_ellipse with the bounding box, but instead of red I want yellow so it has a different color; that's 255 in both the green and red channels, which gives us yellow. Run it again and you can see the referees now have a yellow circle below them. Next we want an identifier, the track ID, below the players. Let me show you the output we're going for: this player always has 6 below him, this one always has 11, this one 12, and this one 13. We want to add this, and it will be in two steps: the first step is drawing this rectangle, the second step is just writing the number. So let's do that: we go back to draw_ellipse, and now we want to draw the rectangle. We define some things for the rectangle first: rectangle_width = 40, rectangle_height = 20, and x1_rect... by the way, we are now going to define the x, y positions of the corners, the top-left corner and the bottom-right corner of the rectangle. So it's going
to be x1, y1 and then x2, y2. x1_rect is x_center, the center of the bounding box in x, minus rectangle_width divided by 2, which just moves half of the width left from the center, and I do the same for x2_rect but adding instead of subtracting; this way the rectangle is centered on the player. Then y1_rect is basically y2, the bottom of the player, minus rectangle_height divided by 2, plus 15, which is just a buffer, and y2_rect is the same but adding instead of subtracting; this buffer makes things a little bit neater. Then I check if track_id is not None, so there is a track ID, and draw a rectangle: give it the frame, then (x1_rect, y1_rect) and (x2_rect, y2_rect); we can just make sure those are ints, I think they are, but let's make sure. We also give it the same color we got, and the -1 thickness we can replace with cv2.FILLED, which makes the rectangle filled. Now we just want to write the number, so we define where the number is written: its x position is x1_rect plus a little bit of padding, 12 pixels, and if the track ID is greater than 99 I do x1_text -= 10, because it's a bigger number and I want it to start a little to the left so it's also centered in the rectangle. This is just for visual polish; if you're getting confused, don't worry about it, you can delete it and the number will just be a
little bit off to the right for bigger numbers. So in order to put the text, we just call cv2.putText and give it the frame, then what to write, which is the track ID, then the position, which is the x1_text we defined plus y1_rect + 15 for some vertical padding. Afterwards we define the font type, and I think cv2.FONT_HERSHEY_SIMPLEX is quite nice, then the font scale, which is going to be 0.6, then the color, black, so we can see it easily, then a thickness of 2, and then we return the frame. With that done we can run again and see the results. I got an error here because track_id doesn't have a default value, so we can just default it to None, because I'm not going to pass a track ID for the referees; I don't care about their individual tracks and statistics, so I just don't provide one and the number isn't drawn, since we only draw it when track_id is not None. Save, clear the output, and run again. It's done now, so we can open it and see it quite nicely: you can see number 6 is being tracked across frames, and the bounding box keeps saying this is number 6 in every frame. That's what tracking is. Now we just want to put a pointer on the ball, so let's do that next.
Before that, let me show you how we're going to make this pointer. It's basically an inverted triangle, so we have three points: the bottom tip at (x, y), then a point up and to the left, where we subtract a little from x and from y, and a point up and to the right, where we add to x and subtract the same amount from y. It also has a fill color and a border, and the border is just the same triangle drawn again without the fill. So let's write it: we define draw_triangle, which takes self, the frame, the bbox, and the color. The y is int(bbox[1]), and I'm choosing y1 and not y2 because right now we want the triangle to sit on top of the ball. I also want x to be the x center, so we can call the same get_center_of_bbox helper with the bounding box and take the x center. Now we can define the three points: triangle_points is a numpy array of the three points; let's import numpy as np first. The first point is (x, y), the second is (x - 10, y - 20), and the third has the same y - 20 but x + 10 instead of x - 10, and the triangle is going to be filled. Those are the points, so to draw it we just call cv2.drawContours: we give it the frame, then the triangle points, then we give it
the contour index, then we give it the color, and then we say whether it's filled or not; right now it's going to be filled, so we can pass cv2.FILLED. Afterwards we also want a border, and drawing a border is quite easy: we just draw another triangle with the same coordinates, give it a color of black, and don't fill it, so it only draws the edges. And that's it; at the end of the day we return the frame and the function is ready. Right now we can also draw the ball, so we loop over the balls: for track_id, ball in ball_dict.items(), although we won't really need the track ID because there's only one ball, and then frame = self.draw_triangle(frame, ball["bbox"], color), where the color is green in BGR. And that's it for the ball, so let's clear the output and run again. Now it's done, we can open it, and you can see the ball is being detected with the green annotation. Quite good. Now we just want to assign each player to a team according to the t-shirt they're wearing, and we're going to do that with color clustering: understand which color the player is wearing, then assign it to a team. In order to do that in a clear manner, I'll start by doing it in a notebook, so you can see the step-by-step process and the output of each step, and then we'll move the same code into a module with a class. So let's create a new folder called development_and_analysis, where we can add as many Python notebooks as we want, and create color_assignment.ipynb.
And yeah, that's the notebook we're going to use. So we can go back to the video, take any one of those players, and try to take their image and save it, so let's save the cropped image of a player. Right now we have the video frames, and let's do the cropping after we get the tracks, because then we know where to crop. Basically we want: for track_id, player in tracks["players"][0].items(), which is frame number zero, where each player entry has the bounding box. Then we take the first frame and crop the bounding box out of it, so cropped_image = frame[y1:y2, x1:x2], and then we just want to save it: we write cv2.imwrite, give it the output path, maybe in output_videos since we have that folder already, and we can call it cropped_image.
jpg, and we give it the cropped image. Let's import cv2 first because we didn't, and let's also break; don't forget to break, we just want one image. Run it again. There's an error because I didn't specify the bounding box here, so just specify the bounding box first. Then it tells us that slice indices must be integers, so make sure x1, y1, x2, y2 are cast to int before we proceed, here, here, and here. Run again and see the output. Now it's done, so open output_videos, look at the cropped image, and you can see a guy cropped according to his bounding box. Now we can try to segment the color of his t-shirt. We don't need the save-cropped-image code anymore, we only wrote it to run once, so you can delete it, and open the color_assignment notebook again. Import cv2, we're going to need that, then import matplotlib.pyplot as plt, which will help us visualize the images inside the notebook, and let's also import numpy as np. Now let's specify the image path: image_path is going to be "../output_videos/cropped_image.jpg".
Then we read it: image = cv2.imread(image_path), which reads the image as BGR, so we convert it to RGB, and that's going to be it. Oh, I misspelled "videos" in the path right here, so don't do that. Now let's show the image with plt.imshow and plt.show, and you'll see the image. Wonderful. The thing is, we have the whole image, but the t-shirt is in the top half of the image almost always, because a player won't be upside down in this use case, so let's just take the top half of the image: top_half_image = image up to the height divided by two, taking the whole width in x. Let's also show it with plt.imshow and plt.show, and now we have the top half of the image. Now, if you take the average color of this, the problem is that there's a lot of background; you can see the background pixels, the green ones, are numerous, and we need to segment out, or remove, the background from the t-shirt color. To do that we can cluster the image into two colors, so that the background color goes in one cluster and the t-shirt color in another, and then we can simply take the average color of the t-shirt cluster. So in order to cluster the image into two clusters, we first want to reshape the image into a 2D array, because we don't need it in image format: image_2d = image.reshape(-1, 3). Then we perform K-means clustering with two clusters, which requires us to import KMeans from scikit-learn, so from sklearn.cluster import KMeans. Let's run it, go down here, and now write KMeans
with two clusters and a random_state of zero, so we can replicate the results, and then we fit it. You can chain it in one line or have it in two different steps, whatever you want. Then we get the cluster labels, so for each pixel we'll know whether it belongs to cluster 0 or cluster 1: labels = kmeans.labels_. Then we have to reshape the labels back into the original image shape so it becomes an image again: clustered_image = labels.reshape(...). Now we just want to display the clustered image, so plt.imshow and plt.show. Oh, I passed the wrong variable there; we took the full image, but let's reshape with the top_half_image dimensions instead. Run it again, and there you go: you can see the background pixels are in one color and the t-shirt pixels are in another. So now we can just segment it: take all the pixel colors in the shirt cluster and get their average. That's going to be very easy for us, but in order to know whether the background is cluster 0 or cluster 1, we have to use our brains a little, because KMeans can assign the background to either label. Usually the background is in the corners, those four corners, so what I'm going to do is get the cluster label for each corner, and the most abundant label, the one that appears the most, is the background; for example, if the corners come out 0, 0, 0, then cluster 0 is the background and cluster 1 is the foreground. So let's get the corners: corner_clusters is going to be clustered_image[0, 0], and
then clustered_image[0, -1], which is this corner, then clustered_image[-1, 0], then clustered_image[-1, -1], which is this one. Then the non-player cluster is just the label that appeared the most across those four corners; print it, and here the non-player cluster is cluster 1. To get the player cluster it's simply player_cluster = 1 - non_player_cluster: if the non-player cluster is 1 this gives 0, and if it's 0 this gives 1. Now this is very simple for us, and we can choose the color: kmeans.cluster_centers_[player_cluster], since we now know which cluster is the player. That gives 171, 235, 142, and you can just search for "RGB color", punch in 171, 235, 142, and indeed it's green, the same color the player is wearing, or very close, so the clustering is working fine. We can now put it into a module that assigns player teams according to their t-shirts. So let's save this first, maybe close it, and create a new folder called team_assigner, plus a file called __init__.py to expose it, and a file called team_assigner.py. Inside it we create the class: class TeamAssigner, then define __init__ with self, and just pass for now. Then we want to get the player color and assign team colors, so let's first assign every team a color, say team one gets white and team two gets green; for that we're going to write assign_team_color.
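The whole notebook experiment above can be condensed into one runnable sketch; the synthetic green-background/white-shirt crop here stands in for the real cropped_image:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic "cropped player": green background with a white shirt block
image = np.zeros((40, 40, 3), dtype=np.uint8)
image[:, :] = (50, 200, 60)             # green-ish background
image[10:30, 10:30] = (240, 240, 240)   # white-ish shirt

# The shirt is almost always in the top half of the crop
top_half_image = image[0 : image.shape[0] // 2, :]

# Reshape to a flat list of pixels and cluster into 2 colors
image_2d = top_half_image.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, random_state=0).fit(image_2d)

# Reshape the labels back into image form
labels = kmeans.labels_
clustered_image = labels.reshape(top_half_image.shape[0], top_half_image.shape[1])

# The background cluster is whichever label dominates the four corners
corner_clusters = [clustered_image[0, 0], clustered_image[0, -1],
                   clustered_image[-1, 0], clustered_image[-1, -1]]
non_player_cluster = max(set(corner_clusters), key=corner_clusters.count)
player_cluster = 1 - non_player_cluster

player_color = kmeans.cluster_centers_[player_cluster]
```

On a real crop the two cluster centers won't be this clean, but the corner-majority trick for picking the background works the same way.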
It's going to take self, the frame, and the player_detections as input, and we're going to collect the player colors in a list: for each player we just put their color into the list. So for _, player_detection in player_detections.items(), since we don't care about the track ID right now, we get the bounding box, which is player_detection["bbox"], and then we get the player color with another function, get_player_color, giving it the frame and the bounding box. Again, this function is going to be very similar to the code we already wrote in the notebook, so let's do it: get_player_color takes self, the frame, and the bounding box, and we do the same steps as before. First we take the image and crop it, and don't forget to convert the bbox values to integers, so we don't repeat the mistake I made before. After cropping we get the top half of the image again, top_half_image, and then we want the clustering model: kmeans = self.get_clustering_model(top_half_image). That's another function; I'm splitting things into functions so we can reuse them and keep it easy, but it's the same code I have in the notebook. So in get_clustering_model, which takes self and an image, first we reshape the image into a 2D array: image_2d = image.
reshape(-1, 3), and then we perform K-means with two clusters: kmeans = KMeans(...), and let's import it first, from sklearn.cluster import KMeans. Here we use n_clusters=2, init="k-means++", and n_init=1; I wanted the fewest iterations possible, and k-means++ helps us get good clusters faster. It's the same clustering algorithm, just a different way of initializing the clusters. Then we fit it and return it, and that's get_clustering_model done. Back in get_player_color, after we have the kmeans model we get the cluster labels for each pixel, labels = kmeans.labels_, and just like we did last time we reshape them back to the dimensions of the image: clustered_image = labels.reshape(...). Then we get the player cluster the same way: get the corner clusters first, choose the non-player cluster as the one that appears most in the corners, and the player cluster is 1 minus that; we could write it as an if condition, but 1 - non_player_cluster works fine. Then the player color is kmeans.cluster_centers_[player_cluster]. I skimmed over this code because we already covered it in the notebook, so if you're not quite catching what I'm doing here, just go back to the notebook; I think it's more visual there, since you can see the output of each step. So right now we
have the player color for each player, and back in assign_team_color we can just append it: let's rename the list to player_colors and do player_colors.append(player_color). Now we have all the player colors from the first frame we pass in, and we just want to divide those colors into two, so one team is white and the other is green. To do that we run another K-means, clustering the player colors with n_clusters=2, again choosing init="k-means++" and n_init=1, and fit it on the player_colors we have. Then we'll have two colors, so first, in __init__, let's define self.team_colors as a dictionary, and here we set self.team_colors[1] = kmeans.cluster_centers_[0] and self.team_colors[2] = kmeans.cluster_centers_[1]. So now we have a color for each team, and we just need to match the player color with a team color. Before moving on, let's take this kmeans model and save it on self so we can refer to it later. Now let's create another function that assigns players to teams, because right now we only have the team colors: get_player_team. It takes self, then the frame, then the player bbox, and the player ID as well. What we want is basically a memory: if we already know a player ID's team, we don't have to run K-means on them again, which saves us a little time. So we have self.player_team_dict, a dictionary mapping the player ID to whether he is
in team one or team two, I mean. So what we do first is check the player ID: if player_id is in self.player_team_dict, we just return his team from the dictionary; the first time through, on the first frame, we shouldn't enter this condition. If not, we get the player color: player_color = self.get_player_color(frame, player_bbox), which runs the K-means segmentation and returns the shirt color. Then we get the team ID by running self.kmeans.predict, the model we saved earlier, on player_color.reshape(1, -1), and we take index [0]. The prediction is going to be 0 or 1, and I'd like it to be 1 or 2, so team_id += 1. Before just returning the team ID, I save it first in player_team_dict: self.player_team_dict[player_id] = team_id, so next frame, if I already have this player, I won't have to run the clustering, I'll just return it. And that's it, I return team_id. Okay, so now we're done, and we can call those functions from main and start assigning players to teams. So let's go back: first, in the team_assigner __init__.py, we add from .team_assigner import TeamAssigner, and then in main, from team_assigner import TeamAssigner. Then we assign player teams: let's initialize it first, team_assigner = TeamAssigner().
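Here's the whole TeamAssigner condensed into one runnable sketch, with a synthetic two-player frame standing in for real footage (the class shape follows the video; the pitch and shirt colors, bboxes, and detection dict in the demo are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

class TeamAssigner:
    """Assign each player to one of two teams by clustering shirt colors."""

    def __init__(self):
        self.team_colors = {}
        self.player_team_dict = {}  # cache: player_id -> team id

    def get_clustering_model(self, image):
        image_2d = image.reshape(-1, 3)
        # k-means++ init with a single run keeps iterations low
        kmeans = KMeans(n_clusters=2, init="k-means++", n_init=1)
        kmeans.fit(image_2d)
        return kmeans

    def get_player_color(self, frame, bbox):
        x1, y1, x2, y2 = map(int, bbox)
        image = frame[y1:y2, x1:x2]
        top_half_image = image[0 : image.shape[0] // 2, :]

        kmeans = self.get_clustering_model(top_half_image)
        clustered_image = kmeans.labels_.reshape(top_half_image.shape[0],
                                                 top_half_image.shape[1])
        # the background cluster dominates the four corners
        corner_clusters = [clustered_image[0, 0], clustered_image[0, -1],
                           clustered_image[-1, 0], clustered_image[-1, -1]]
        non_player_cluster = max(set(corner_clusters), key=corner_clusters.count)
        return kmeans.cluster_centers_[1 - non_player_cluster]

    def assign_team_color(self, frame, player_detections):
        player_colors = [self.get_player_color(frame, det["bbox"])
                         for _, det in player_detections.items()]
        kmeans = KMeans(n_clusters=2, init="k-means++", n_init=1)
        kmeans.fit(player_colors)
        self.kmeans = kmeans
        self.team_colors[1] = kmeans.cluster_centers_[0]
        self.team_colors[2] = kmeans.cluster_centers_[1]

    def get_player_team(self, frame, player_bbox, player_id):
        if player_id in self.player_team_dict:
            return self.player_team_dict[player_id]  # skip re-clustering
        player_color = self.get_player_color(frame, player_bbox)
        team_id = self.kmeans.predict(player_color.reshape(1, -1))[0] + 1
        self.player_team_dict[player_id] = team_id
        return team_id

# Demo: a synthetic pitch with a white-shirted and a red-shirted player
frame = np.zeros((100, 200, 3), dtype=np.uint8)
frame[:, :] = (40, 160, 50)               # green pitch
frame[18:35, 16:34] = (240, 240, 240)     # player 1 shirt (white)
frame[18:35, 106:124] = (200, 30, 30)     # player 2 shirt (red)
detections = {1: {"bbox": (10, 10, 40, 70)}, 2: {"bbox": (100, 10, 130, 70)}}

team_assigner = TeamAssigner()
team_assigner.assign_team_color(frame, detections)
team_1 = team_assigner.get_player_team(frame, detections[1]["bbox"], 1)
team_2 = team_assigner.get_player_team(frame, detections[2]["bbox"], 2)
```

The player_team_dict cache is the reason later frames stay cheap: K-means only runs once per player ID.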
Then we want to assign the teams their colors: call team_assigner.assign_team_color, give it the first frame, and give it the tracks, but not all the tracks, only tracks["players"][0], the players in the first frame. This gets the colors of the players in the first frame and assigns them to the teams; remember, the team colors live in self.team_colors. Now we just want to loop over each player in each frame and assign them to a team: for frame_num, player_track in enumerate(tracks["players"]), then loop over each player in the frame, for player_id, track in player_track.items(). What I want is the team: team = team_assigner.get_player_team, giving it the video frame we're currently on, the bounding box, and the player ID. This returns the team, and to save it we take tracks["players"][frame_num][player_id] and introduce a new key in the dictionary called "team", assigning the team to it. We can also give it a color: tracks["players"][frame_num][player_id]["team_color"] = team_assigner.team_colors[team], which gets us the color we stored from the clusters. So that's going to be it. Note that in order to add anything to the tracks, we just add a key and then the values. Right here I added it in the main, but in future modules I'll be adding it into the module itself, so there's less code in main; I wanted to show you it happening in main so you can connect the dots. So now we can run it; let's clear the
output and run it from here. I'm getting an error for the team assigner because I misspelled the init file; it should be __init__.py with the underscores, so fix that and run again, and it should work. Also, I think I misspelled KMeans; it should have a capital K and a capital M, so let me go fix that right now, clear, run again, and hopefully no more errors. Okay, now it works fine. Now, to use the color in the drawing, we go back to the tracker, and instead of giving it a hard-coded color we give it the color of the team: color = player.get("team_color", ...), and if it doesn't find any team color we just fall back to red, but it should have all the team colors in place. So that's going to be it; let's clear the output and run again. Now it's finished, so let's go to the output videos, and we can see this player has a white circle and this player has a green circle, which is quite correct: all players are assigned to the correct teams, and now we can differentiate the teams from each other. Each player has the team color and the team number as well as the bounding box. Now let me open this up and point you to the next thing we're going to do. Here, for example, you're going to find that the ball is detected in one frame, then not detected in this frame or the next, and then detected again. The idea is that for those two missing frames we can interpolate where the ball is. So let me try to visualize it for you: we have this detection here, then we have this detection in a later frame, like this:
this is frame number one, frames two and three are not detected, and frame four is detected again. The idea is that the ball moves in roughly a straight line, so it's going to travel like this, and it's easy for us to just say: take this distance, divide it equally between the missing frames, and put a box here and here along the same line the ball is moving in. This helps us fill in the missing ball positions and makes our detections more complete. Now, we could do this by hand, but there's a faster way with pandas, so I'm going to show you that. We go into the trackers again, and let me close all those unnecessary files to clean things up a little, and let me create a new function called interpolate_ball_positions; interpolation just means filling in the missing values, and pandas is going to help us interpolate them because it already has a function for that. So, interpolate_ball_positions takes self and ball_positions. We're going to convert the ball positions into a pandas DataFrame, and for that we first need them as a plain list: for x in ball_positions we get x.get(1, {}), key 1 being the track ID, falling back to an empty dict when there's no detection, then .get("bbox", []), falling back to an empty list when there's no bbox, and that empty list is what the pandas DataFrame will interpolate.
Also, while we're at it, let's import pandas as pd. Then convert to the DataFrame: df_ball_positions = pd.DataFrame(ball_positions, columns=["x1", "y1", "x2", "y2"]), and then interpolate the missing values: df_ball_positions = df_ball_positions.interpolate(). Now we've interpolated it, but there's one edge case: if the missing detection is the very first one, it's not going to be interpolated, so we handle that by replicating the nearest detection we can find, which is just a backfill. Interpolate should fill in almost 99% of the missing detections; backfilling is only for the first frame, or the first two or three frames, if they're missing. Then we return the ball positions in the original format, the list of dictionaries of lists: we loop for x in df_ball_positions.to_numpy().tolist() and put it into a dictionary where 1 is the track ID and the value is a dictionary with the bounding box as x, then return the list. So that's it for the ball positions, and now we can just go to main and call it there: after we get the tracks, we set tracks["ball"] = tracker.interpolate_ball_positions(tracks["ball"]).
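The interpolation step above can be sketched as a standalone function; the four-frame demo track at the bottom is made up to show both the interpolation and the backfill edge case:

```python
import pandas as pd

def interpolate_ball_positions(ball_positions):
    """Fill in missing ball detections by linear interpolation over the frames."""
    # each frame is {1: {"bbox": [...]}} or {} when the ball was missed
    ball_positions = [x.get(1, {}).get("bbox", []) for x in ball_positions]

    # empty lists become all-NaN rows in the DataFrame
    df = pd.DataFrame(ball_positions, columns=["x1", "y1", "x2", "y2"])
    df = df.interpolate()  # fill interior gaps along a straight line
    df = df.bfill()        # edge case: detections missing at the very start

    # back to the original list-of-dicts format, 1 being the ball's track ID
    return [{1: {"bbox": x}} for x in df.to_numpy().tolist()]

# Demo: the ball is missed on the first and third frames
tracks_ball = [
    {},
    {1: {"bbox": [0.0, 0.0, 10.0, 10.0]}},
    {},
    {1: {"bbox": [20.0, 20.0, 30.0, 30.0]}},
]
filled = interpolate_ball_positions(tracks_ball)
```

Frame 2 lands exactly halfway between its neighbors, and frame 0 is backfilled with the frame-1 box.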
So that's it; we can just run it. Now it's done, and we can go look at the output. In the previous run you saw frames where the ball annotation stopped showing; right now you can see it shows in every frame, and it's quite accurate. These two are filled-in positions, and you can see they point at the ball almost exactly. That might not always be the case; if you scrub around you might find some scenarios where it lags a little, but it's close enough for our use case, and in most cases it points at the ball correctly. Now what we want to do is assign which player has the ball at any moment. Let me show you the final output video again: the player who has the ball gets a red triangle on top of him, like this one, and basically I want to replicate that logic. What I'm going to do is find the closest foot position to the ball and which player it belongs to; in this case it belongs to this player, so this player has the ball right now. The other idea is that if the ball is far enough away, I don't want to assign it to anyone, so I'm going to have a maximum distance between the player and the ball for assignment, and if it goes beyond that, no one has the ball until it gets very close to a player again. In order to work on this I'll need a new module, so let's create a new folder right here called player_ball_assigner. This one is going to have
an __init__.py file again, and a file called player_ball_assigner.py, and in this file we're going to write the logic for it; it's going to be super simple. We want to import sys, then from utils import get_center_of_bbox. We create the class PlayerBallAssigner, and we have the __init__; in the init we set the maximum distance between the player and the ball, self.max_player_ball_distance = 70 pixels; anything above that and the ball isn't assigned to anyone. Then we have a function called assign_ball_to_player, which takes self, the players, and the ball bbox. I'll need the ball position, which is going to be its center, so ball_position = get_center_of_bbox(ball_bbox), and then I loop over each player: for player_id, player in players.
items then I want to have the player BB box which is going to be player of bounding box um then I want to have the distance like here uh I want to have the distance of this foot and the the distance between this foot and the ball and the distance between this foot and the ball and this is going to help me uh like get the distance to the nearest Foot so if the user is running this way or running this way um I will get the minimum distance between them so uh in order for me to get the distance we are going to go back to theils then uh bounding boxes and then I'm going to write a function called measure distance um so I'm going to Define measure distance like this it's going to take P1 and uh P2 and it is going to return the distance between those two points uh which is going to be return and uh then uh it will be like this P1 uh yeah so this is this is the correct one so this is the uh equation for me to get the distance between any two points um and it is just the the the points then uh like power two uh X is power two then plus y power two then um get the root of that um again let's return back to the uh uh utils in it and then let's add this which is going to be measure distance and let's get back to the ball assigner let's also add the measure distance and let's then uh return back uh to the uh for Loop that we were in so we are just going to have the distance left which is on the left foot so measure distance and we are going to get it the player BB box of uh zero and uh player uh BB box of uh -1 which is Y2 which is the bottom y so this is going to be the X and the bottom Y and then uh this is going to be one point then the other point is going to be the ball position like this and the distance is going to be this is distance left then we want distance right which is going to be instead of zero it's going to be two uh then the distance is going to be the minimum distance between the two like this um then I want to get the closest player so in order for me to 
get the closest player I need to get the minimum distance so like this and assigned um assigned player which is going to be negative 1 uh when it's uh like uh at the beginning it's going to be negative 1 so if the distance is less than self dot Max uh This One max player uh ball distance then we can uh start proceeding on and see if the distance is less than the minimum distance then minimum distance is equal to distance assigned player is equal to the player ID the then at the end we just return back the assigned player and this is how we assign uh players to bols so we can go back here and say from um player ball assigner uh import the ball assigner then um here we can also import it so from player po assigner import player assigner um and then we can just write here uh like assign ball to assign ball acquisition basically acquisition so we can have the player assigner which is equal to player uh ball assigner which is uh we're just initializing it so team uh let's let's ignore that for now but uh let's have the four frame number in player and enumerate player tracks for tracks of players like this and then we are going to get the uh ball uh BB box and it's going to be the tracks of ball of frame number of one which is the uh uh the track ID then the box and then we want to get the assigned player so assigned player is equal to player assigner do assign ball to player give it the uh player track then the ball box and uh player assigner I think I misspelled it give me a minute yeah I I misspelled it like there is a G that I needed to put so this is the uh assigned player that we assigned so if the assigned player is equal to1 then there is no player that was assigned so uh we can basically uh don't do anything but if there is like this so if the tracks of players of then of frame number of um ass assigned player um then we will make um a new parameter in the dictionary called has ball and we will make it is equal to true and yeah that is going to be it um so we 
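Putting the pieces together, the assigner might look like this sketch. The bbox layout `(x1, y1, x2, y2)` and the helper names follow the video's utils, but treat the exact signatures as assumptions:

```python
import math

def get_center_of_bbox(bbox):
    # bbox = (x1, y1, x2, y2)
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def measure_distance(p1, p2):
    # Euclidean distance between two (x, y) points
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

class PlayerBallAssigner:
    def __init__(self):
        # Maximum player-to-ball distance (pixels) for an assignment
        self.max_player_ball_distance = 70

    def assign_ball_to_player(self, players, ball_bbox):
        ball_position = get_center_of_bbox(ball_bbox)
        minimum_distance = float("inf")  # start above any real distance
        assigned_player = -1
        for player_id, player in players.items():
            player_bbox = player["bbox"]
            # Distance from each foot (bottom-left / bottom-right corner) to the ball
            distance_left = measure_distance((player_bbox[0], player_bbox[3]), ball_position)
            distance_right = measure_distance((player_bbox[2], player_bbox[3]), ball_position)
            distance = min(distance_left, distance_right)
            if distance < self.max_player_ball_distance and distance < minimum_distance:
                minimum_distance = distance
                assigned_player = player_id
        return assigned_player
```

If no player's foot is within 70 pixels of the ball center, the function returns -1 and the frame gets no possession marker.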
Now that players can have this `has_ball` attribute, we can use it in `draw_annotations`. Open it, go down to where we draw the players, and simply say: if `player.get('has_ball', False)` — returning `False` when the key is missing — then call `self.draw_triangle` and give it the frame, the player bounding box, and a red color. Let's clear the output and run it again. Now it's finished, so we can open it and see. (At first I opened the wrong video — make sure you open the one in the output videos folder.) The red triangle is being assigned quite correctly, and I think we can move on.

The next part is assigning the team ball control: the percentage of time that each team has possession. You can see that team 1, the white team, has the ball most of the time right now, and its share will keep increasing until the other team gets the ball. So we return to our code and add it in the main, right here. It's going to be simple, so I don't want to make a new module for it — we can have it right here and draw it inside the same tracker's `draw_annotations` — although if you're doing things by the book, it's better for it to have its own module.

So we have `team_ball_control.append(...)` with the team of the player that has the ball: `tracks['players'][frame_num][assigned_player]['team']`. Each entry in this list is therefore the team (1 or 2) in possession at that frame number. In the else branch — when no one was assigned — we just append the last value in the list, so during a pass or a loose ball the possession is counted for the last player who had it. That's it: this array tells us, for each frame, whether team 1 or team 2 has the ball.

Like I said, we can draw it in `draw_annotations`: we'll add code for a semi-transparent rectangle at the bottom-right position and write the statistics there. At the bottom, before we append to the results, we write `frame = self.draw_team_ball_control(frame, frame_num, team_ball_control)`, and `team_ball_control` needs to be taken into `draw_annotations` as a parameter. Now let's define `draw_team_ball_control(self, frame, frame_num, team_ball_control)` and draw the semi-transparent rectangle. We take an overlay, `frame.copy()` — this overlay is just going to help us with the transparency — and draw a normal filled rectangle on the overlay. For the position, I just played around with the numbers until I found something I liked; draw any rectangle and keep moving it until it sits where you want. The rectangle is white, and it's filled, so the thickness is -1 (or you can write `cv2.FILLED` if you want). Then we have an alpha for the transparency, which is 0.4, and to add it onto the original frame we combine the overlay, the alpha, and the frame with `cv2.addWeighted`. That gives us the semi-transparent rectangle.

Now we just want to calculate the percentage of time that a team has the ball. We take `team_ball_control_till_frame`, which is `team_ball_control[:frame_num + 1]` — the list up to the frame we're standing on. Then we count the number of frames each team had the ball: `team_1_num_frames` is the count of entries where the value equals 1, via `team_ball_control_till_frame[team_ball_control_till_frame == 1].shape[0]`. This requires the list to be a numpy array, so back in the main we convert it with `np.array(...)` and `import numpy as np` so we can use those functions with ease. Back in the tracker we do the same thing for team 2 — copy it, paste it underneath, change the 1s to 2s — and then the statistic: team 1's share is `team_1_num_frames / (team_1_num_frames + team_2_num_frames)`, and the same for team 2.

Now we just want to write it on the frame with `cv2.putText`: the text is something like "Team 1 Ball Control:" plus the team-1 share multiplied by 100 to make it a percentage, limited to two decimal places, with a percent sign added. We give it a position of (1400, 900), specify the font as `cv2.FONT_HERSHEY_SIMPLEX`, and specify the color and a thickness of 3. Do the same for team 2 at y = 950 so it sits underneath. Are we returning the frame? No, we're not — so return the frame. Clear the output, run it, and we get "numpy is not defined", so make sure that when you go back to the main you `import numpy as np`, save it, clear the results, and run again.

Now it's done, so go back to your folder, open the output video, and watch the team ball acquisition. Only the white team has the ball until it reaches the goalkeeper — and the goalkeeper is being detected as the opposite team, because he isn't wearing white or green. We don't have to bother much with goalkeepers right now, so we can just hard-code him to the white team. Let's fix that: in the team assigner, inside the assign-team function, we check for the goalkeeper's player ID.
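A minimal sketch of the overlay blend and the possession statistics, in pure numpy — `cv2.addWeighted` and `cv2.putText` do the blending and text in the actual tracker, but the math is the same:

```python
import numpy as np

def draw_semi_transparent_rect(frame, top_left, bottom_right,
                               color=(255, 255, 255), alpha=0.4):
    # Blend a filled rectangle onto the frame; this is the same math that
    # cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0) performs.
    overlay = frame.copy()
    x1, y1 = top_left
    x2, y2 = bottom_right
    overlay[y1:y2, x1:x2] = color
    return (alpha * overlay + (1 - alpha) * frame).astype(np.uint8)

def ball_control_stats(team_ball_control, frame_num):
    # Possession share for each team over frames 0..frame_num (inclusive)
    till_frame = team_ball_control[: frame_num + 1]
    team_1 = (till_frame == 1).sum()
    team_2 = (till_frame == 2).sum()
    total = team_1 + team_2
    return team_1 / total, team_2 / total

# Example: team 1 holds the ball for 6 of the first 8 frames
team_ball_control = np.array([1, 1, 1, 2, 2, 1, 1, 1])
t1, t2 = ball_control_stats(team_ball_control, 7)
print(f"Team 1 Ball Control: {t1 * 100:.2f}%")  # Team 1 Ball Control: 75.00%
```

Because possession carries over when no player is assigned, the two counts always sum to the number of frames seen so far, so the percentages always add up to 100.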
The goalkeeper's player ID is 91, so right there in the team assigner we can say: if `player_id == 91`, then the team ID is 2. Save it, clear the results, and run it again. Now it's done and we can see the results — and actually it needs to be team 1, not 2; that was a mistake, so let me change it right here. Let me also make the clustering of the player colors more confident: in the K-means call, change `n_init` from 1 to 10 and run again so we can get the results. Now it's done; we can open the output and see that this goalkeeper is detected as the white team and the other goalkeeper is detected as green, so everything is good here and we are good to go, and the percentages are moving along quite nicely.

The next step is to measure the camera motion. As you can see, the camera is moving all around, and because the camera moves, the bounding boxes move even when the players don't. So we need to counteract the player bounding-box movement with the camera movement, so that we don't exaggerate the player movement. This will be essential when we measure player speed and the distance each player has covered, because those would be contaminated if the camera moves a lot.

Okay, so now we want to build the camera movement module. Let me explain what we're going to do first and then go back and code it. The idea is that I want to detect some features — specifically corners — for example this corner right here, corners at the top and corners at the bottom. I'm going to choose features from the top banner and from the bottom banner and ignore features from the middle, because everyone is moving in the middle; those two regions shouldn't move in reality, and the only thing that can make them move is the camera moving. So I'll extract corner features from this part at the top and the part that is down at the bottom, and from each frame to the next I'll detect how much they moved with optical flow — I'll see how much it moved here and how much it moved there. That feature movement is, in the end, the camera movement. It's going to be quite simple.

So let's open the project again and create a new module: a new folder called `camera_movement_estimator`, a new `__init__.py` file inside it, and then `camera_movement_estimator.py`, which will of course have the class. So: `class CameraMovementEstimator`, with an `__init__` function that takes `self` and a frame, which we'll pass in shortly. Then we also want a function called `get_camera_movement` that takes `self` and the frames, and it is also going to read from stubs, because this is a time-consuming operation — I'll compute it only on the first run, save it, and subsequent runs will read from the stub. So the signature gets `read_from_stub=False` and `stub_path=None`, and because we're saving, it's going to be pickle, so `import pickle`. Right here is where we'd read the stub when we have it, but let's ignore that for now.

Let's now compute a camera movement per frame: `camera_movement = [[0, 0]] * len(frames)` — the outer list is indexed by frame, and the inner pair holds the x and y movement, so it is `[0, 0]` for all frames when we initialize it. Then I want to convert the image into a grayscale image so that we can extract the features. I'll call it `old_gray` because it is the previous frame — I'll refer to all previous frames as "old" — so `cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)` (and let's import `cv2`). Afterwards, for the old features, we extract features from this image: `cv2.goodFeaturesToTrack`, giving it `old_gray` and whatever parameters we want — but let me explain the parameters and put them in the `__init__` to keep things cleaner.

In the `__init__`, take the frame and make a grayscale first frame, `first_frame_grayscale`, again with `cv2.cvtColor`. I'm going to restrict feature extraction to the banners — this strip and this strip, which shouldn't move much — so I build `mask_features`, which selects just the top and bottom: it starts as `np.zeros_like` of the gray image (and let's `import numpy as np`), then I set the first 20 rows of pixels to 1, and also the bottom band, rows 900 to 1050 — so the mask takes the banner at the top and the banner at the bottom. Then `self.features` is a dict of the `goodFeaturesToTrack` parameters. First, `maxCorners = 100`: the maximum number of corners we can utilize. The `qualityLevel` is 0.3 — the higher the quality level, the better the features but the fewer you get, and 0.3 is quite good for our use case. Then the minimum distance between features, `minDistance = 3` pixels, then the `blockSize`, which is the neighborhood size used when scoring features, and finally the mask with the top and bottom rectangles so it knows where to extract the features from. Back in the `goodFeaturesToTrack` call we pass `**self.features` — the double star expands the dictionary into keyword parameters — and let's remove that extra bracket.

Right now we can loop over each frame: `for frame_num in range(1, len(frames))` — starting at 1 because we already used frame 0 as the previous frame, like I said, and this one is the current one. I get `frame_gray` by converting `frames[frame_num]` to grayscale, then I want to see where the new features are, so `new_features` comes from optical flow: `cv2.calcOpticalFlowPyrLK`, which we give the old gray image.
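The feature-extraction setup in the `__init__` might look like this sketch. The 1080×1920 resolution and the `blockSize` value are assumptions (the video doesn't state them), and the transcript's "top and bottom" bands are taken as pixel rows:

```python
import numpy as np

# Assumed 1080x1920 video; in the estimator this would be the grayscale first frame
first_frame_grayscale = np.zeros((1080, 1920), dtype=np.uint8)

# Mask that keeps only the top and bottom banner bands, where nothing on the
# pitch itself should be moving between frames
mask_features = np.zeros_like(first_frame_grayscale)
mask_features[0:20, :] = 1        # top banner: first 20 rows of pixels
mask_features[900:1050, :] = 1    # bottom banner: rows 900 to 1050

# Parameters for cv2.goodFeaturesToTrack, as described above
# (blockSize=7 is an assumed value)
features = dict(
    maxCorners=100,     # use at most 100 corner features
    qualityLevel=0.3,   # higher = better corners, but fewer of them
    minDistance=3,      # minimum pixel distance between detected corners
    blockSize=7,        # neighborhood size used to score each corner
    mask=mask_features,
)
```

Passing this dict as `cv2.goodFeaturesToTrack(old_gray, **features)` expands each key into a keyword argument.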
We also give it the new gray image, then the old features, and then some parameters: `self.lk_params`. Let's fill those up in the `__init__` as well. `self.lk_params` is a dictionary, and the first entry is the window size — the size of the window we are going to search — so `winSize = (15, 15)`. Then, for this optical flow, we have pyramids that downscale the image to capture larger motions, so we set `maxLevel = 2`, meaning the image can be downscaled up to twice; that's big enough for our use case, so no need to increase it on top of that. Then we define the stopping criteria: either we iterate the number of times specified in the criteria, or we stop after 10 iterations without improving beyond the quality threshold we set — whichever of those comes first is our stopping criterion. And that's the optical flow: it provides us with the new features.

Now we want to measure the distance between the old features and the new features, to see whether there was camera movement. Each frame has multiple features, and we just want the maximum displacement between any matched pair. So `max_distance` starts at 0, and `camera_movement_x` and `camera_movement_y` start at 0 and 0. Then `for i, (new, old) in enumerate(zip(new_features, old_features))` — I had written old first and then new, so let's swap them, since new-then-old is the correct order.
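The Lucas–Kanade parameter dictionary just described, written out as a sketch — the 0.03 epsilon is the common OpenCV example value, assumed here since the video doesn't state it:

```python
import cv2

# Lucas-Kanade optical-flow parameters
lk_params = dict(
    winSize=(15, 15),  # size of the search window at each pyramid level
    maxLevel=2,        # pyramid levels: the image may be downscaled up to twice
    # stop after 10 iterations, or earlier once the update falls below 0.03
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03),
)

# Usage sketch inside the frame loop:
# new_features, status, err = cv2.calcOpticalFlowPyrLK(
#     old_gray, frame_gray, old_features, None, **lk_params)
```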
With the new and old features paired up, we can get the actual feature points — so we know the coordinates — with `new.ravel()` and `old.ravel()`, and the distance between them is `measure_distance` given the new feature point and the old feature point. It comes from utils, so we `import sys`, do `sys.path.append('../')` to go one level up, and then `from utils import measure_distance`. Now we have the distance, and if the distance is greater than `max_distance`, then `max_distance = distance`, and we get the camera movement in x and y. We could just compute it inline, but instead let's have a utility function do it for us: define `measure_xy_distance`, which takes `p1` and `p2` and returns the separate x and y differences, `p1[0] - p2[0]` and `p1[1] - p2[1]`. We expose it in the utils `__init__` like before, import it back in the camera movement estimator, and call it with the new feature point and then the old one (let's swap the argument order here too, because that's the correct one).

Now we also want to make sure the camera movement isn't just tiny noise, so let's define the minimum camera movement that is required: `self.minimum_distance = 5`. If the largest feature moved at least 5 pixels, it's significant for us; if it didn't move 5, we consider that the camera didn't move at all. So basically, if `max_distance > self.minimum_distance`, we record the movement for this frame.
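The per-frame measurement just described can be condensed into a standalone sketch. Here `new_features` and `old_features` stand in for the matched point lists that `cv2.calcOpticalFlowPyrLK` would return, flattened to (x, y) pairs:

```python
def measure_xy_distance(p1, p2):
    # Separate x and y displacement between two points
    return p1[0] - p2[0], p1[1] - p2[1]

def camera_movement_from_features(new_features, old_features, minimum_distance=5):
    # Take the feature pair that moved the furthest between frames; if it moved
    # at least `minimum_distance` pixels, treat that displacement as the camera
    # movement for this frame, otherwise report no movement at all.
    max_distance = 0
    camera_movement_x, camera_movement_y = 0, 0
    for new, old in zip(new_features, old_features):
        distance = ((new[0] - old[0]) ** 2 + (new[1] - old[1]) ** 2) ** 0.5
        if distance > max_distance:
            max_distance = distance
            camera_movement_x, camera_movement_y = measure_xy_distance(new, old)
    if max_distance > minimum_distance:
        return [camera_movement_x, camera_movement_y]
    return [0, 0]
```

Using the single furthest-moving feature keeps the estimate simple; features on the static banners all move together with the camera, so the maximum is a reasonable proxy for the whole frame's shift.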
So the camera movement at this frame number becomes `[camera_movement_x, camera_movement_y]`. Then we refresh the features: `old_features = cv2.goodFeaturesToTrack(frame_gray, **self.features)` gets the features for the current frame and stores them as the old ones, because on the next loop iteration this frame becomes the previous one. At the end of the iteration we override `old_gray` with `frame_gray.copy()`, and when the loop finishes we just return the camera movement.

The only thing missing right now is that we're not utilizing the stubs, so let's add that. At the bottom, if `stub_path` is not `None`, we `pickle.dump` the result into it; at the top, if `read_from_stub` is `True` and `stub_path` is not `None` and the file also exists (let's `import os` for that), then we load and return the saved result instead of recomputing.

Now we open the module's `__init__.py` again and expose it: `from camera_movement_estimator import CameraMovementEstimator`. Back in the main, import it the same way, and now we just want to calculate the camera movement, and then we can also draw it. First let's add it: initialize the camera movement estimator with the first frame, then `camera_movement_per_frame = camera_movement_estimator.get_camera_movement(video_frames)` with `read_from_stub=True` and a stub path like `stubs/camera_movement_stub.pkl`. That's going to be it — but before we close this out, I also want to draw the camera movement so that we can see it.
So in the main we add `output_video_frames = camera_movement_estimator.draw_camera_movement(...)`, giving it the output video frames and the camera movement per frame. Of course this function isn't done yet, so let's go work on it. Right here we add a new function, `draw_camera_movement(self, frames, camera_movement_per_frame)`. I start with an empty `output_frames` list, then loop over each frame with `for frame_num, frame in enumerate(frames)`, and first do `frame = frame.copy()` in order not to contaminate what was passed into the function. We utilize the same overlay trick we used before, so I'll just skim over it: `overlay = frame.copy()`, then the rectangle function on the overlay from (0, 0) to (500, 100), colored white, kept filled with -1, and an alpha of 0.6 for the transparency; then `cv2.addWeighted` takes the overlay, the alpha, and the frame and adds them together, which gives us the white semi-transparent rectangle.

Next I take `x_movement, y_movement` from `camera_movement_per_frame[frame_num]`, then put some text for each axis: `cv2.putText` on the frame with "Camera Movement X:" plus the x movement formatted to two decimal places, at a position of (10, 30), also giving it the font type, the size, the color, and the thickness. I do the same thing for y — copy, paste, change X to Y — and push it a little bit down so it shows, then add the frame to the output list and return it. Back in the main, make sure we're calling it correctly — I had typed `draw_camera` instead of `draw_camera_movement` — so everything should be good now; save and run it again. (In the main I had also left a stray dot right there, so remove it, clear the output, and run again.)

I had two mistakes. The first is that I didn't put the `zip` in: in order to loop over any two lists together, you have to zip them first, and then you can enumerate. The second is the bracket placement — the inner `[0, 0]` needs its own brackets so the multiplication by `len(frames)` repeats the pair, i.e. `[[0, 0]] * len(frames)`. Now it's done, and if you open the output you can see that while the camera isn't moving the values stay at zero, and as soon as it moves, the values move too. So this is working fine right now.

Next we want to make the player positions robust to the camera movement. The first thing is to get the player positions: we add the positions to the tracks first, and then we subtract the camera movement from them. So in the tracker we add a function called `add_position_to_tracks`. Open the tracker and go to any place — it doesn't really matter, but I'd prefer to have it near the top so I'm not writing between the drawing logic. Define `add_position_to_tracks(self, tracks)`, then `for object_name, object_tracks in tracks.items()`, then `for frame_num, track in enumerate(object_tracks)`, and then, with the track ID, `for track_id, track_info in track.items()` — note it isn't the bounding box directly, it's the track info dict, which contains a bounding box, so `bbox = track_info['bbox']`. Now that we have the bounding box: if the object we're looping over equals "ball", we want the position to be its center, so `position = get_center_of_bbox(bbox)`; otherwise we want the foot position, which is the center in terms of x but the bottom in terms of y. So we add a utility function called `get_foot_position` that takes in the bounding box and basically returns the middle position of x and the end position of y, both as integers. Then — back in the tracker — `position = get_foot_position(bbox)`, and there you go. At the end of the day we just want to write it back: `tracks[object_name][frame_num][track_id]['position'] = position`. That is going to be it.

In order to utilize it, we go back to the main, and just after getting the tracks we write `tracker.add_position_to_tracks(tracks)`, with a comment that says "get object positions". Let's run it and make sure nothing broke — we have to put `get_foot_position` in the utils `__init__` to expose it — then clear, run it again, and go back to the main. It's working fine now.
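The position bookkeeping above, as a sketch — the tracks layout (`{object_name: [per-frame dict of track_id -> info]}`) is inferred from the video's dictionary structure:

```python
def get_center_of_bbox(bbox):
    # bbox = (x1, y1, x2, y2)
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def get_foot_position(bbox):
    # Center in x, bottom in y -- where the player touches the ground
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int(y2)

def add_position_to_tracks(tracks):
    for object_name, object_tracks in tracks.items():
        for frame_num, track in enumerate(object_tracks):
            for track_id, track_info in track.items():
                bbox = track_info["bbox"]
                if object_name == "ball":
                    # The ball floats, so its center is the best single point
                    position = get_center_of_bbox(bbox)
                else:
                    position = get_foot_position(bbox)
                tracks[object_name][frame_num][track_id]["position"] = position
```

Using the foot point rather than the box center matters later: when we project positions onto the pitch, it's the point on the ground that maps correctly.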
What I need next is to go to the camera movement estimator and adjust the positions we just added to the tracks according to the camera movement. So in the camera movement estimator we add a function called `add_adjust_positions_to_tracks` that adjusts the position according to the camera movement; it takes `self`, the tracks, and the camera movement per frame. The idea is that we loop over each of the objects and their tracks, then over each frame in the tracks, and for each frame we loop over each track ID and get the position. Then we adjust it — the autocompleted version here was predicted wrong, so let me write the correct one: we take `camera_movement = camera_movement_per_frame[frame_num]`, and the adjusted position is the position's x minus the camera movement's x, and the position's y minus the camera movement's y — we have to subtract, not add. Let's also name the camera movement variable properly so it's readable, and after that we just assign it back: `tracks[object][frame_num][track_id]['position_adjusted'] = position_adjusted`. In order to call it, we go back to the main, right underneath the camera movement, and write `camera_movement_estimator.add_adjust_positions_to_tracks(tracks, camera_movement_per_frame)`. Now it should be working fine.
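The adjustment step can be sketched the same way; the `position` key is the one added earlier by the tracker's `add_position_to_tracks`:

```python
def add_adjust_positions_to_tracks(tracks, camera_movement_per_frame):
    # Subtract the per-frame camera movement so a stationary player whose
    # bounding box drifted with the camera ends up with a stable position.
    for object_name, object_tracks in tracks.items():
        for frame_num, track in enumerate(object_tracks):
            for track_id, track_info in track.items():
                position = track_info["position"]
                camera_movement = camera_movement_per_frame[frame_num]
                position_adjusted = (position[0] - camera_movement[0],
                                     position[1] - camera_movement[1])
                tracks[object_name][frame_num][track_id]["position_adjusted"] = position_adjusted
```

In the example below, the camera pans 10 pixels right in frame 1, so the player's apparent 10-pixel shift cancels out and the adjusted position stays put:

```python
tracks = {"players": [{7: {"position": (100, 200)}}, {7: {"position": (110, 200)}}]}
add_adjust_positions_to_tracks(tracks, [[0, 0], [10, 0]])
```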
So let me clear the output and run it again. Everything is working fine now; it ran without any crashes, and the positions are now adjusted according to the camera movement. So we are good, and we are ready to move to the next step, which is going to be the view transformer. Basically, we are going to do some perspective transformation so that we know how much a player has moved in meters, in the real world. Let me show you exactly what I mean. In this image, you know that this is the middle of the court, so in reality the number of meters on this side of it is exactly the same as the number of meters on the other side. If the camera had been shot at a perpendicular angle, the pixels would be proportional to the actual world, but here you can see that this half of the court is only 214 pixels while the other half is 543 pixels. So a player here who moves 54 pixels covers the same real distance as a player over there who moves around 200 pixels. With this in mind, we will need to adjust this view to be laid out the same way it is in the real world; we need the computer to understand that this amount of pixels equals that amount of pixels. This is what we call a perspective transformation. It is going to take this point, this point, this point, and this point, this trapezoidal shape, and convert it into a rectangle, so that all this place in blue, which is only about 200 pixels out of 750 even, so this blue
strip is going to take up half of the rectangle, like this, and this orange bit right here is going to take up the rest of it. So this is what a perspective transformation is for: the pixels don't correspond one to one with the actual distance covered in meters, so we need to transform the view into a rectangular shape where the space is proportional, so this part is half of the court and that part is the other half, which is how it is in actual reality, in actual meters. This can be done with OpenCV, where we give it the positions of the red trapezoidal shape, and we also give it the dimensions in meters of the rectangle. The rectangle is quite easy to get. This is the width of the court, and from a quick Google search we can get it, and I am going to get it with you. The width of the court is going to be 68 m. The court length that we need can also be calculated quite easily, because we know the length of this half of the court and it is divided equally between those stripes, 1, 2, 3, 4, 5, 6, 7, 8, and 9, so we can divide the half-court length by the number of stripes. So let me open this up and search for football court dimensions, and you can see that it is 105 m by 68 m, so 105 is the length and 68 is the width. Again, this width is 68 m and the whole length here is 105 m, so we can open a
calculator, like this. So half the court is going to be 52.5 m, and because it is divided equally between those stripes, 1, 2, 3, 4, 5, 6, 7, 8, and 9, we can also divide it by nine, so one stripe like this is going to be about 5.83 m. So if we have the same trapezoidal shape that I showed you before, which starts from here, here, here, and here, we know that its width is 68 m, and its length is going to be 5.83, the length of one of those stripes, times the 1, 2, 3, 4 stripes it spans, which is about 23.32 m. If we know those two pieces of information, the trapezoid's vertices and the actual dimensions in meters of the region we are measuring, we can transform the positions to be relative to a rectangle, which is going to be the real-world position. So that is what we are going to do right now: convert the positions that were adjusted after the camera movement into real-world positions, so that we can then track speed and distance quite easily. So let me close this, come back here, and create a new folder called view_transformer. Let's create a new file called __init__.py, and let's also create another file called view_transformer.py.
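The court dimensions worked out above can be sanity-checked in a couple of lines; note that rounding the stripe length to 5.83 m first is what yields the 23.32 m court length used in the code.

```python
half_pitch = 105.0 / 2               # half the 105 m pitch length: 52.5 m
stripe = round(half_pitch / 9, 2)    # one of the nine stripes: 5.83 m
court_length = round(stripe * 4, 2)  # the 4 stripes the trapezoid spans: 23.32 m
court_width = 68.0                   # full pitch width in metres

print(stripe, court_length)
```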
In view_transformer.py we can create a class ViewTransformer that will have an __init__, and it will hold the actual dimensions. So the court_width, like we said, is going to be 68 m, and the court_length is going to be 23.32 m; these are the actual meters. Then we can define the vertices of the trapezoid that I showed you, so we have pixel_vertices equal to np.array, and let's import numpy as np. We give it the positions of the points. I knew these positions because I just tried a couple of points and kept moving them until I reached the correct point that I wanted, but you could actually write a program where you click on the frame and print the pixel coordinates, or you can estimate them with a screenshot tool or anything like that. What I did was draw a vertex and just keep moving it: for example, I started with 100 and thought, okay, 100 is off, so let's try 105, and then 110, which was on the spot, so I kept it, and so on and so forth. So let's define the rest of them: it's going to be 265 and 275, then 910 and 260, and the last one is going to be 1640 and 915. These are going to be the pixel_vertices. Now let's also have the target vertices, so self.
target_vertices, which is also going to be an np.array, and it is going to go from (0, court_width) to (0, 0), then (court_length, 0), then (court_length, court_width). So this is a rectangle, while the pixel_vertices form a trapezoid, and the rectangle is what we want the trapezoid to become after we transform it. Then we just want to cast both to float, so we take the pixel_vertices and call .astype on them, and we do the same for the target_vertices. Then we can initialize the perspective transformer: this is going to be cv2.getPerspectiveTransform, and let's import cv2, then give it the pixel_vertices followed by the target_vertices. And there you go, you have the transform that will map the pixel vertices to real-world vertices, just like that. Afterwards, we just want to transform the adjusted points through this perspective transformation. So we can write a function add_transformed_position_to_tracks, which takes self and tracks, then loops over the objects and their tracks, then over the frame number and the per-frame tracks, then over the track ID and track info. We take the position_adjusted, transform it into an np.array, and then we transform the position: position_transformed is equal to self.transform_point, giving it the position. This is how we transform the position. Then, if the position_transformed is not None, we can set position_transformed to position_transformed
.squeeze(), because we want to flatten it back down to a plain numpy point, and then we basically write it back in: tracks[object][frame_num][track_id]["position_transformed"] gets the position_transformed. So this is going to be it, and we can put None there otherwise and deal with it later. The only thing that is left is that we need to write the transform_point function, so we can go up here and write transform_point. The raw point alone is not going to cut it, so we basically say p is equal to int of point[0] and int of point[1], and we put it into a tuple; that's the point. Let's also check whether the point is inside the trapezoid, and if it's not, we ignore it and skip calculating its speed. So is_inside is cv2.pointPolygonTest, given the pixel_vertices and the point, checked to be greater than zero, and if it is not inside, we just return None. If it is inside, we get the reshaped point, which is going to be point.reshape, reshaping it into the format that the perspective transformer reads; this is just some reshaping, it is not going to do anything to the numbers, and we also want to have it as float. Then we transform the point: we do cv2.perspectiveTransform and give it the reshaped point, then give it the perspective transformer. At the end of the day we just want to return it, reshaping it back to two columns before we return it. So this is going to calculate the position relative to the actual court, in meters. Before moving on, we can go back to the __init__ and add the import.
So we write: from view_transformer import ViewTransformer. I have written it wrong, so let me fix the spelling. And yeah, we can go back to main and try to import it there too: from view_transformer import ViewTransformer. Let's also copy and paste it right here: after the camera movement estimator we add the view transformer, so we have the view_transformer right there, and we call add_transformed_position_to_tracks, give it the tracks, and it will compute the transformed positions for the tracks. We have a small error; let me trace it back and come back to you. So in the __init__ I misspelled Transformer again, so just fix it right here, clear, and run it again. Also in main, make sure ViewTransformer is written correctly, correct it here as well, then clear and run again. Then I have another issue: I added curly braces here where I should have added square brackets, so you can go to the view transformer and, in this array, use square brackets instead of dictionaries, and also make sure nothing else in the rest of the code has this mistake; I don't think so. Then clear and run again. Currently I have another error, in transform_point, so fix that too; I'm writing so fast that I make spelling mistakes, so clear it and run again. I also misspelled pixel_vertices, so correct that right here, and I did not spell perspective_transformer quite correctly either, so fix that as well. So everything now is working fine, and the last step of this project is just to calculate the speed. After we have added position_transformed, everything is in meters, so it's going to be quite easy to calculate the distance covered and the speed in kilometers per
hour. So let's do that next. Let's close down all of those files, just to clean things up and make things tidier for us to develop in. Then let's create a new folder and call it speed_and_distance_estimator. Let's also create a new file called __init__.py, and another file called speed_and_distance_estimator.py. In there, let's create a class called SpeedAndDistanceEstimator and define the __init__ function. Inside the __init__ we are going to set up how we calculate the current speed of a player, which is going to be over a certain window of frames. So let's call it frame_window and calculate the speed of the player every five frames, so frame_window is going to be 5, and let's also define our frame rate, which is going to be 24 frames per second. Then let's have a function called add_speed_and_distance_to_tracks, which takes self and also takes tracks, and let's now try to calculate the speeds for each object. My first attempt wasn't right, so let's do it like this: for object, object_tracks in tracks.
items(), and we want to compute the speed and distance only for the players. So if the object equals 'ball' or the object equals 'referees', then we continue; we don't need to do anything. Otherwise it's a player, and we want to compute the speed and distance of the player. So let's get the number of frames, which is going to be the length of object_tracks, and let's now loop over the frame number in range from zero to the number of frames, stepping in batches of frame_window. So this is going to be 0, then 5, then 10, and so on and so forth: every loop iteration doesn't add one, it just adds five. Then we can define the last frame of the batch, because when you add frame_window, the last frame might be frame_num plus 5 or it might be less than that near the end of the video, so we take the minimum of frame_num plus frame_window, which is self.
frame_window, and the number of frames, whichever is smaller; this is just so we don't go out of bounds. Then, for each track_id in object_tracks[frame_num].items(), if the track_id is not in object_tracks[last_frame], we continue. What this basically says is that if the track is in the first frame of the batch but not in the last frame, we continue and don't do any calculation for this player, because in order to measure the distance and the speed, the player needs to exist in both the first frame and the last frame of this batch. Then we can quite easily say that the start_position is object_tracks[frame_num][track_id]["position_transformed"], which is going to be the position in meters, and we want the end_position as well, which is the same but for the last frame instead of the frame number. And if for any reason the start or end position is None, and it is None when the player is outside the trapezoidal shape, outside the view transformer region, then we don't want to measure the speed, because we measure speed only inside the trapezoid, which covers most of the court and most of the frame anyway. So if start_position is None or end_position is None, we just continue. Otherwise we measure the distance covered, and we can do that with the utils function measure_distance. So import sys, then sys.path.
append('../'), then from utils import measure_distance, and yeah, this is it. So we can come here and say measure_distance with start_position and end_position. Then the time elapsed is going to be the last frame minus the first frame, divided by the frame rate, which is the 24 frames per second; this is the elapsed time in seconds. Then the speed in meters per second: we now have the distance covered in meters and the time in seconds, so the speed is distance over time. We want the speed in kilometers per hour because that is more widely used, and to convert from m/s to km/h we just multiply by 3.6. Then we just want to save the total distance of the player. At the top we can have total_distance and make it a dictionary: if the object is not in total_distance, we make its entry a dictionary, and then if the track_id is not in total_distance[object], we initialize total_distance[object][track_id] with a zero. Then we can simply add to it: total_distance[object][track_id] += distance_covered. Then we basically want to have some annotations, labeling the speed and the distance so that we can go ahead and write them on the frames. So we just go: for frame_num_batch in range from frame_num to last_frame, and if the track_id is not in tracks[object][frame_num_batch], we continue; otherwise we set tracks[object][frame_num_batch][track_id]["speed"] to the speed in km/h, and let's also do the distance, so ["distance"] right here is going to be total_distance[object][track_id]. And yeah, that would be it.
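Condensed into one function, the logic just walked through might look like this sketch. Two assumptions: measure_distance is defined inline here as a plain Euclidean helper standing in for the utils import, and a small guard is added for the zero-length batch that can occur at the very end of the video (which the narration does not cover).

```python
import math

FRAME_WINDOW = 5   # compute speed over batches of 5 frames
FRAME_RATE = 24    # video frame rate, frames per second

def measure_distance(p1, p2):
    # Stand-in for the utils helper: Euclidean distance between two points.
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def add_speed_and_distance_to_tracks(tracks):
    total_distance = {}
    for obj, object_tracks in tracks.items():
        if obj in ("ball", "referees"):
            continue  # only players get speed and distance
        number_of_frames = len(object_tracks)
        for frame_num in range(0, number_of_frames, FRAME_WINDOW):
            last_frame = min(frame_num + FRAME_WINDOW, number_of_frames - 1)
            if last_frame == frame_num:
                continue  # guard: zero-length batch at the end of the video
            for track_id in object_tracks[frame_num]:
                # The player must exist at both ends of the batch.
                if track_id not in object_tracks[last_frame]:
                    continue
                start = object_tracks[frame_num][track_id].get("position_transformed")
                end = object_tracks[last_frame][track_id].get("position_transformed")
                if start is None or end is None:
                    continue  # outside the trapezoid, no speed
                distance_covered = measure_distance(start, end)
                time_elapsed = (last_frame - frame_num) / FRAME_RATE
                speed_kmh = distance_covered / time_elapsed * 3.6
                total_distance.setdefault(obj, {}).setdefault(track_id, 0.0)
                total_distance[obj][track_id] += distance_covered
                # Annotate every frame of the batch with the batch's speed
                # and the running total distance.
                for frame_num_batch in range(frame_num, last_frame):
                    if track_id not in object_tracks[frame_num_batch]:
                        continue
                    object_tracks[frame_num_batch][track_id]["speed"] = speed_kmh
                    object_tracks[frame_num_batch][track_id]["distance"] = (
                        total_distance[obj][track_id]
                    )
```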
So this is adding the speed and distance to the tracks, and last but not least, we just want to add them to the annotations. So we can go here and write draw_speed_and_distance, and we will just write the distance and speed underneath the player. So we take self, the video frames, and the tracks, and then we have output_frames, which is equal to an empty list. Then, for frame_num and frame in enumerate of the frames, we can loop over the objects: for object and object_tracks in tracks.items(), and if the object is 'ball' or 'referees' we continue without doing anything; if it is a player we carry on. We take the track info, let's call it track_info, and we are not going to do anything with the track ID right here, so let's delete the rest: track_info in object_tracks[frame_num].items(). Then, if 'speed' is in track_info, we can write it down: speed is equal to track_info.get('speed'), and if there is no speed we just put None, and we can do the same for the distance, so distance right here. If speed is None or distance is None, we continue; otherwise we just want to write the speed and distance below the bounding box. So let's get the bounding box first: this is going to be track_info['bbox']. Let's get the position where we want to write it: it is going to be get_foot_position of the bbox, and we also want a buffer underneath the bounding box, so we will just add about a 40-pixel buffer to position[1], so that the text sits a little bit lower and doesn't overlap with the circle or the rectangle that has the track ID. And then we can basically make the tuple
like this: cast it to a tuple with map(int, position). Then we can put it as text with cv2, so let's also import cv2 at the top. We want cv2.putText: we pass the frame, then the speed formatted with :.2f so we only keep two decimal points, with 'km/h' appended. Then we add the position, then the font type, then the font scale, which is going to be 0.5, then the color, let's make it black, and let's set the thickness to 2. And let's do the same thing for the distance: everything else is going to be the same, except instead of that position we use the same x but with 20 added to the y, so it sits a little further down. At the end we just want to append the frame and return the output frames. So this is it, and let's expose it: from .speed_and_distance_estimator import SpeedAndDistanceEstimator; it's not autocompleting, so I'm going to copy it. We can copy this, go to main, and import it there too; don't forget to remove the dot. Now we can use the speed estimator in main: you can go right here, above where we assign players, and create speed_and_distance_estimator as a SpeedAndDistanceEstimator, then call add_speed_and_distance_to_tracks on the tracks. At the bottom, we now want to draw it, so right here you can call speed_and_distance_estimator.draw_speed_and_distance and basically give it the output frames and the tracks. Save it and run it. There is an error right here saying the last frame is out of range, and to fix it we just need to subtract one from the length of the frames.
So right here we can just go and subtract one, because it's zero-indexed, and then we run it again and see how it goes. Now it's done, so let's go and check the output. We can open it from the output_videos folder, and you can see the km/h output, but I don't see the distance; the distance is not correct, I think. Let me trace it and come back to you, give me a minute. So the problem was that I had position[0] plus 20 where I needed position[1], because that is the y coordinate, and you should also change the unit label: the distance should be in meters, not the km/h I had there. If you run it again, it should now be working fine, so give it a couple of seconds. Now it's finished, and you can open the final output again. You can see that the players inside the trapezoidal area are being detected for speed, and if a player is running, like this one, the speed is 17 km/h, and when he slows down, the number goes down as well, and next to it is the total number of meters that he covered. And yeah, so this is the whole project. In this project you did a lot: you worked with object detection and trained your own object detector with YOLO, which is a state-of-the-art model. You learned what trackers are and how to utilize them, and you learned how to cluster an image into different colors and pick out the color you want from an image. You then worked on a bit of logic for ball acquisition, and on doing a bit of statistics on that for team ball control. You also learned how to calculate camera movement and how to account for it when estimating players' speed and distance. Then you worked on the view transformer, the perspective transformer that helps us move from the camera's skewed view to the normal real-world view in
meters, and then we calculated the speed and the meters covered for each player. And yeah, that's the whole project. It's a beefy project that is going to beef up your resume, and if you stuck through all of it, you've gained a ton of experience. Whether you're a beginner or an experienced machine learning engineer, this is going to be very valuable for you. Congratulations for staying this long, and see you in the next one. Bye!