Transcript for:
T20 Cricket Data Analytics Project Overview

T20 Cricket World Cup was finished just few weeks back with England claiming victory over Pakistan, and today we are beginning a cricket data analytics series using same T20 World Cup data. We will begin this project by scraping data from ESPNcricinfo website. Then doing data cleaning and transformation in Pandas, and then eventually building dashboards in Power BI. Before we begin any technical work, let's look at the exact problem statement and our special stakeholders that we got in this project. The whole atmosphere is charged with one title. Thursday [Music] [Phone Ringing] We like your cricket! Surrender, Earth! You need to fight 8 billion people for that! We destroy! Then you get nothing! Negotiate. You defeat us in cricket, you get Earth! If you lose join me as an intern! Deal! Tony, we got to save Earth give me your best 11! Best 11 what? Data analysts? No, cricket players! As you saw Planet Sporta has challenged planet Earth to play cricket, and Nick Fury has assembled the secret agents of field to work on this project to find out the best 11 players based on the T20 World Cup Cricket data. Tony Sharma is in charge of this project. He is not only a senior data analyst, he's also a cricket subject matter expert. So, in the next video you will see what kind of algorithm Tony Sharma comes up to pick this best 11 team who can go and defeat the aliens. At the end of the video, we are going to give you a challenge based on this project and by working on that challenge you will be able to win exciting prizes. So make sure you watch till the end. Nick, this is the requirement you gave me. You said we don't know the strengths and weaknesses of our opponents, but give the best eleven from the planet. So, this is what I'm going to do: I'm going to give you a team that will score 180 runs on an average. At the same time, this team will be able to defend 150 runs. So you have a margin of 30 runs to play. Do you think that's enough for you? 
Yes, this sounds like a good target because although aliens have not played cricket, they have a technology where they can learn things really fast. Okay, so this is how I've done this: I've made different positions for players, and I've selected parameters for each of them. I'm going to quickly show these parameters to you, and let me know if you want to add some parameters. Okay. This is just an idea I want to give of how I'm going to select the team. The first ones are the openers. They are going to open the innings; these are the power hitters. So, they will be hitting the balls out of the park, and they will also score runs; it's not just about hitting. So that's how I've selected the parameters: batting average, strike rate, and boundary percentage, which is very important: they should be scoring over 50% of the runs by boundaries, because in the powerplay the fielders are going to be inside and they should be hitting outside the inner circle. That's the plan, and this partnership should be giving us at least 50 runs within the first five overs. Got it! So we'll basically get players like Sehwag, correct? Nick, Sehwag isn't playing anymore. Oh okay, I see. Okay? All right, then we have the middle order, or the anchors. They won't be hitting the ball as hard as the openers, but they can shift gears and hit the balls if needed. So, they will have a better batting average, and they will bat for a longer time; that's why I've also included the average balls faced. They can also bat aggressively if they want. So those are the kind of players you'll have here. I'm going to select three players for this position, so overall we have five players now. So, these five players will give us at least 120-plus runs in 13 to 14 overs; that's the plan. I would like to see Virat Kohli in this list of five. Oh, he is playing.
I think he's certainly a part of this team. Great! And of course, the special requirement you asked for is to consider the T20 World Cup 2022 for the selection of the team, so I'm picking only that particular data. Okay! All right, so this is going to be an interesting role. I'm going to get one player for this role, a finisher role. So, if we are chasing, this particular person should be able to hit like crazy, go berserk. But if we lose wickets early, like if we lose the middle order badly, this person will be able to stabilize the innings and guide the rest of the lower-order batsmen. So that's the kind of player we're looking for; we're looking for more of a batsman here than a bowler. So ideally I would need an all-rounder here, but a batting all-rounder rather than a bowling all-rounder. This will leave us with five more players, and my seventh and eighth players will be all-rounders or lower-order batsmen. Mostly these can be spinners, because the last three slots I'm going to keep for fast bowlers. So, these spinners are the ones who can also hit, and they can hit without thinking. Like, they come to bat and they just start hitting. So that's the kind of players we want, because they will mostly be coming in around the 17th or 18th over. That's why I kept the parameters in such a way: the batting average should be at least 15, and you can see the strike rate is more than 140. So, even if they score 15 runs, they should be scoring that in around eight or nine balls. And you can see the bowling economy is really good; it's less than seven, which means if these two players bowl all the 20 overs, they will be giving only 140, which is good for us because our team target is to keep the defense under 150.
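The thresholds quoted above for the seventh and eighth players can be written as a simple filter. This is only a minimal sketch: the player rows are made up, and the field names (`batting_avg`, `batting_sr`, `economy`) are hypothetical, not the column names of the real dataset.

```python
# Hypothetical sample rows; names and numbers are invented for illustration only.
candidates = [
    {"name": "Player A", "batting_avg": 18.0, "batting_sr": 150.0, "economy": 6.5},
    {"name": "Player B", "batting_avg": 12.0, "batting_sr": 160.0, "economy": 6.0},
    {"name": "Player C", "batting_avg": 20.0, "batting_sr": 145.0, "economy": 8.2},
]


def is_lower_order_all_rounder(player):
    """Thresholds quoted in the discussion: average >= 15, strike rate > 140, economy < 7."""
    return (
        player["batting_avg"] >= 15
        and player["batting_sr"] > 140
        and player["economy"] < 7
    )


shortlist = [p["name"] for p in candidates if is_lower_order_all_rounder(p)]

# Economy is runs conceded per over, so an economy of 7 held across
# all 20 overs concedes 7 * 20 = 140 runs, which is under the 150 target.
runs_conceded_at_limit = 7 * 20

print(shortlist, runs_conceded_at_limit)
```

Player B fails on batting average and Player C fails on economy, so only Player A survives the filter in this toy data.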
And their strike rate is less than 20, which means if they bowl 20 balls, they will get at least one wicket. If they bowl all the 20 overs, they should be getting at least six wickets. Nice! Yeah, and here come the specialist fast bowlers, because we need to rattle the team, we need to rattle the Sportans; I don't think they should be having the capability to face these bowlers. This is the wolf pack we've got. They are threatening; they can take a wicket at least every 16 balls or even less than that. This is the parameter I've set for them, and they can bowl very fast, and they can also bowl dot balls over 40% of the time, which means if they bowl four overs, around 10 of those balls will be dot balls, which is great for a T20 game. Exactly, dot balls are so precious for T20! Yes, and with these 11 players I'm pretty sure we'll save the Earth. What do you think? The algorithm looks pretty solid to me. I am excited to see the final results which our Power BI dashboard can produce. Yeah. So this is already in progress. So, I'll be showing you the final dashboard, and we can do an analysis together to pick the final eleven. Looking forward, thank you! Thank you, Nick! See you then, bye! Folks, we had our resources on GitHub, but we have moved to this resources section on Codebasics.io because the file sizes were getting very big. So go to Codebasics.io, click on resources here, and you will be able to find all the files here. So, here you can click on download, and then you can just create an account for the Codebasics website and log in. Once you're logged in, you will be able to download all these files. Now you can download whatever files you need; sometimes people may not need all the files. We are going to provide a download-all button later, but this way you can download just the required files.
Here, for example, I am going to download my web scraping code, and once it is downloaded, you can open it, unzip it, and here you will be able to see all those JS files that we need in Bright Data. Similarly, you can download the rest of the files as well. [Music] Folks, why so much fuss? Just use something readymade. For example, Bright Data! [Music] Bright Data uses proxy networks for building web scrapers that work seamlessly. They have various solutions, such as residential proxies, web unlocker, and so on. They also provide readymade datasets. The tool that we are going to use specifically is called data collector, and if you use the link which is given in the video description below, it's a special link for Codebasics, you can log in. Here it says work email, but don't worry, you can use your personal email ID and still log in. Just select some values from these two drop-downs and you will be able to create an account. Once the account is created, if you look at the dashboard you will find that you have a $15 balance. I want to thank Bright Data for giving this $15 free credit to all Codebasics viewers. For our data scraping work we hardly need three or four dollars. So, this is more than enough! You can use this credit for your other projects too! We are going to capture four types of data for our project. Number one is this detailed match results table. So we will scrape this entire table. When you click on this particular scorecard link, you will get a detailed bowling and batting scorecard. So, we are going to grab all these tables as well. And then when you click on any player, you will get player-specific information. So we'll grab a few fields from this also. I'm going to click on user dashboard and go to something called a data collector. Now, a data collector is basically a web crawler which will go to the website and collect the data. I have already created the four collectors. So, this one is for the match results.
So, let's say you want to grab this page, right? Let's look at that code and understand how it works. So, when you right click and say inspect, it will show you the HTML tags for that particular page. And you can click on this, and let's say you are getting this particular table, or this particular cell. When you look at this, see, this is one row, this is the second row, third row, and so on, and this particular table is inside this tbody, and the table class is enginetable. So, now let me show you my collector. So, I will go to my collector here and say edit code, and in this one you will see the interaction code as well as the parser code. Now, this is JavaScript code. JavaScript is better suited for web scraping because JavaScript code runs inside the browser. So, it becomes easier for that. So, here this particular link that you're seeing is nothing but this particular page. So, what I'm saying in my collector is go to that page and then collect parse. Now when you say parse, it is going to execute all this code. Now once again this is JavaScript code, where I am locating the enginetable table, then I am going into tbody and then tr.data1. So, if you look here, see, there is an enginetable, then tbody, and then tr.data1, and this one is an array. So if you go through this array one by one, you'll be able to get all these records, and that's exactly what I'm doing here. See, the first element is team one, second is team two, then winner, and so on. So if you look at any row here, see, the first one is team one, second one is team two, you can see on the left side, then the winner, the margin, the ground, and so on, so that's exactly what we are doing here. And then it will grab all this data into the match summary array, this is basically a JavaScript array, and it will return that, okay? So let me just run this here. It will take some time, but you will see that in the browser the page is loaded.
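The collector's parser itself is JavaScript running inside Bright Data, and that code is not reproduced here. The row-walking idea it uses (find the results table, visit each tr.data1 row, read the cells in a fixed order) can be sketched in plain Python with the standard library, just to make the logic concrete. Everything in this sketch is an assumption for illustration: the tiny HTML sample, the placeholder team names and IDs, the `engineTable` spelling of the class (the transcript only gives it phonetically), and the field names.

```python
from html.parser import HTMLParser

# Tiny stand-in for the results page; all values are placeholders, not real data.
SAMPLE_HTML = """
<table class="engineTable"><tbody>
<tr class="data1"><td>Team A</td><td>Team B</td><td>Team A</td>
<td>10 runs</td><td>Ground X</td><td>Day 1</td><td>ID-001</td></tr>
<tr class="data1"><td>Team C</td><td>Team D</td><td>Team D</td>
<td>3 wickets</td><td>Ground Y</td><td>Day 2</td><td>ID-002</td></tr>
</tbody></table>
"""

# Hypothetical field names, in the same left-to-right order as the table cells.
FIELDS = ["team1", "team2", "winner", "margin", "ground", "matchDate", "scorecard"]
TABLE_CLASS = "engineTable"  # assumed spelling of the table's class
ROW_CLASS = "data1"


class ResultsTableParser(HTMLParser):
    """Collects one dict per <tr class="data1"> inside the results table."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_row = False
        self.in_cell = False
        self.cells = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and TABLE_CLASS in classes:
            self.in_table = True
        elif tag == "tr" and self.in_table and ROW_CLASS in classes:
            self.in_row = True
            self.cells = []
        elif tag == "td" and self.in_row:
            self.in_cell = True
            self.cells.append("")

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data  # cell text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.in_row:
            values = [cell.strip() for cell in self.cells]
            self.rows.append(dict(zip(FIELDS, values)))
            self.in_row = False
        elif tag == "table":
            self.in_table = False


parser = ResultsTableParser()
parser.feed(SAMPLE_HTML)
match_summary = parser.rows
print(match_summary)
```

The result is a list with one dictionary per table row, which is the same shape as the match summary array described above.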
So it executed this navigate function here, and now it is collecting all this information. So if you look at the output, you will see this information is now available as JSON, and if you download that JSON file, this is how it will look. Now, let me just format it so that you can see it properly. You can see that now you have team one, team two, who is the winner, margin, ground, match date, and the scorecard. So, we grabbed this particular first table in its entirety; we have the entire match result for the T20 World Cup. Now, the way Bright Data collector works is they're going to use a smart proxy network, okay, and using that infrastructure, it will do web scraping so that you have seamless data collection without having to worry about some website blocking your IP, and so on, because it is sort of like a VPN: it is using different IPs, and you will not have the kind of trouble in your data collection process that you have when you are using a plain Python script. If you're new to Bright Data, what you can do is use one of the templates. So for example, I can click on develop a self-managed collector, and you can use one of these templates. See, here you are collecting data from Quora, for example; here you're collecting data from YouTube, okay? So, you use that template and you can run that collector and you will get an understanding of that. I have provided the code for all these four collectors on GitHub. The link is in the video description below. So, if you look at the web scraping code, see, here is the batting summary, here is the bowling summary, so this is the code that you have: you have interaction code and you have parser code. Now let me show you the batting summary web collector. So, here I can go to batting summary, click here, and then advanced action, edit the code. Here what we are doing is this: the first page that you have is this page only, okay?
So, you're going through that match summary page, and then you are collecting all the links. All right, which links? So, these links: see, when you right click and say copy link address, you get this particular link, right? So we are collecting all those links in this particular code, all right? Then we are calling next stage. So, in the next stage it will execute this particular code, okay, and this will be the link of that match scorecard, and if you want to see it from the previous run, see, I have all these links which I collected from the first stage. And in the second stage what you will do is you will go through this scorecard. So, for example, this is my scorecard, and if you inspect this particular page, and let me click on this element and go here, right, you will find that you are collecting data from a table called ci-scorecard-table, and if I check that in my collector here, you will find that here I am collecting that particular table and going through my first innings, second innings, etc. This code is not as hard as you think. It is just HTML; you need to have basic knowledge of HTML. You're going through those HTML elements and you're just trying to grab data from them. So, you're going through an entire list, all the scorecards one by one, and you're putting that summary here, okay? And when you click on this it's going to take time. What it will do is it will load this page, then it will go to this link, then it will grab all these tables, the batting tables, then it will go back, go to the second link, collect all the tables, and so on. When you click on this it will do a sample run for one or two matches. But if you want to run this collector for the entire data collection, what you can do is you can click here and you can say initiate manually. So, it will run the whole collector. You can also set it on a schedule or initiate it by API from some Python code as well.
I am going to show you the match results execution because it doesn't take much time. So you click on this, and in the delivery preferences I'm going to type in my email ID here. So, it will deliver the data as JSON to my email ID. All right? So, just say initiate manually, start, and see, it is starting; now it will run internally. It is using the smart proxy framework and it is going to the ESPNcricinfo website, grabbing that data, and once the data is ready it will send me an email. All right! When I check my email, I have got the match results, and when you click on download results it downloads this particular JSON file, and when you look at this JSON file, see, I have the entire match summary, okay? So, this way I ran all the collectors and grabbed all the data, and I have put that data on GitHub in the T20 JSON and CSV files folders, all right? So, these are the JSON files; if you want to directly get the data you can get it from here, but I highly recommend that you use Bright Data for data collection, because data collection is a super important part of any data science or data analytics project. When you log in, make sure you are seeing this $15 credit, and then you can go to data collection, collectors, and create your first collector by clicking on this open IDE. So when you do that, you can just say start from scratch; you're creating a blank collector. You're not using any of these templates, and when you see the JavaScript UI, what you will do is use the code which I have provided. Again, check the video description carefully for all the download instructions. You will have this file, for example T20 World Cup matches results.js, so open that file, and it has two sections: interaction code and parser code. So, for interaction code I can just copy and paste this here, and use that code here. Whereas, for the parser code, copy and paste just that particular portion here in the parser code, and that's it!
This is your collector; it is ready, okay, and you can run it, verify it, you can just say finish editing, and you can give a name to that collector. So, let me just say finish editing here. So, it takes a few seconds and then your new collector is ready. You can edit the collector name; you can call it whatever, T20 World Cup match summary, okay? You saved it; if you just cancel that icon, you will see that collector. Now you can run that collector manually. If you want to edit the code you can once again go inside. So, this way you create four different collectors. Again, I have provided the code to you. If you need any help you can click on the help button, you can do learning, and you can watch Bright Data tutorials on YouTube as well. We are done with the web scraping part, and the files that we extracted from ESPNcricinfo are available in the T20 JSON files folder. Now, if you want to use these files readymade, check the video description below; you should be able to download these files. And if you look at these four JSON files, for example batting summary, we have this batting summary JSON element for each match. For example, Namibia Vs Sri Lanka. So, there is one element in this array. Then the second element in this array is UAE Vs Netherlands, and so on. Now, when we pull this data into Power BI it would be beneficial if the data is available in a CSV file, something like this, where it's just a single flat table and you have the cricket match and the corresponding batting score, right, and then see, here also there is UAE and the batting score and so on, and here is the batting position. So we need to do transformation basically. We need to transform this JSON file into this particular CSV format. You will also see an additional column. For example, whether the player is out or not out: in the JSON file we did not have that information, okay? All we had was, let's say, if dismissal is blank, that means the player was not out. But if the dismissal had some string, that means the player was out.
So, we have to do this data transformation, and Python pandas is probably the best way of performing this transformation. If you don't know the Python programming language, you can go to Codebasics.io; there is a super affordable, easy Python course for total beginners, and you can follow that. Then for pandas, search Codebasics pandas tutorial and you will find my playlist, which is very popular; just watch the first six or seven videos, and that's it, it will take you less than one hour. Assuming you now have some Python and pandas knowledge, let's start a Jupyter notebook and work on the transformation. I went to my cricket analytics folder and launched Jupyter notebook by running this command, and it looks like this. See, I am at a location where the T20 CSV files and JSON files directories are at the same location. I am going to create a new Python 3 notebook and import some necessary libraries. We are going to use pandas and another library called json, and then I will open the first file, which is the match results, okay? So I will say open match results as f, and then data is equal to, in json you can just say json.load(f), passing the file pointer, and it will load that data for you. So, let's print what kind of data it loads. So here I have the match summary; remember the match summary table. So, let me show you the JSON file so that you get an idea. So, my match summary is this one, where basically it says who won the match, right? So between these two, Namibia won the match by this many runs, and so on. And if you look at the match summary element, see, it's just one element, and if you look at this array, this array also has one element, okay? So, let's look at that element. So, first of all the data array has only one element; you can confirm it by printing just that one element, and then data[0] and then match summary, if you print that, is your main list where you have all the match results. So, what I can do is create a dataframe out of it, and I will just say df_match is equal to pd.DataFrame, and in the DataFrame you can supply the entire list as an array, and if you print the head of that dataframe, see, wonderful! So, my dataframe is created, and I can just quickly check how many elements it has. There is an attribute called shape; when you use that it prints the shape, basically 45 rows and 7 columns. And I'm going to do a few processing steps: I will use this scorecard as kind of a key of this particular dataframe. What I mean by that is I want to treat this as a match ID so that I can connect with other tables, because I have other tables, and when I import them in Power BI I need a way to link them. Sort of like a primary key and foreign key in SQL terms. So, I want to treat this scorecard, which is basically a unique ID, as a primary key for this. So, let's do that. So, I will just rename that column; I will just say df_match.rename, and how do I want to rename the column? So, I want to say rename scorecard as match ID, with axis is equal to 1, because you have a column axis and a rows axis, and when you print head, you will see the column is renamed. If you're not getting this, just hold on; later on you will understand why I am calling it a match ID. Now, once I have this particular dataframe I want to export this data into a CSV file, okay, so that all the data is in a single CSV with nice columns. But before I do that, let me process the batting summary. So, the other file that I have is batting summary, okay, so I will say batting summary; you can add markdown cells in the notebook. And let's see what this one looks like. So, batting summary has this particular format; let's see it in Notepad++. Where is my batting summary? So, in batting summary, first of all the outer array has multiple JSON objects; you see, these are all multiple JSON objects. If I open the first JSON object, it has an element called batting summary.
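The match-results steps described above (load the JSON, pull out the inner list, build a dataframe, rename scorecard to a match ID) can be sketched as follows. This is a minimal sketch under stated assumptions: the JSON is an inline stand-in rather than the real file, the key name `matchSummary` and the field names are guesses at the file's structure, and the rows are placeholders.

```python
import json

import pandas as pd

# Stand-in for the match-results file: a one-element array whose single object
# holds the list of results under a "matchSummary" key (key names are assumed).
raw = """
[{"matchSummary": [
  {"team1": "Team A", "team2": "Team B", "winner": "Team A", "scorecard": "ID-001"},
  {"team1": "Team C", "team2": "Team D", "winner": "Team D", "scorecard": "ID-002"}
]}]
"""

# With a real file you would use: with open(path) as f: data = json.load(f)
data = json.loads(raw)

# The outer array has a single element; the list of matches sits inside it.
df_match = pd.DataFrame(data[0]["matchSummary"])

# Treat the scorecard value as the key that other tables will link to.
# columns={...} is equivalent to passing the mapping with axis=1.
df_match = df_match.rename(columns={"scorecard": "match_id"})

print(df_match.shape)
print(df_match.head())
```

Note that `rename` returns a new dataframe unless `inplace=True` is passed, which is why the result is assigned back to `df_match` here.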
See, it has just one element, batting summary, and in that batting summary there is the score of one match. So, Namibia Vs Sri Lanka: see, all the players, number one player, number two player, they are in one element, and if you close this and open the second element, okay, you will find the UAE Vs Netherlands match. So, each of these matches is represented by one single JSON element inside my array, okay? So now what I will do is go through those records, so I will say for record in data, and each record is one match, and one match has multiple records for the players, right? So if I print, for example, record batting summary, what I will get is the batting summary of one match. So, Namibia Vs Sri Lanka, okay? So I'll get in total, I think, 11 records, and you want to just append everything into one array, right, because our eventual goal, folks, is to get a single list. Basically, I should have a single list where all the matches are present, and if you want to create a single list, what I can do is create a list called all records, and then I can just keep on appending. I can just say all_records.extend, and when I say extend, basically you have one array, and you are appending another array after that. So, let me just show you: let's say you have an array called a, okay, and you have another array called b, and if you say a.extend(b) and print a, you get this. So it's just joining those arrays, okay? So similarly, here I am joining all the records, and after I join all of them, I can print all records. See, I got all the records in a single flat list. So, Namibia Vs Sri Lanka, and if I just scroll down I will see UAE Vs Netherlands, and so on, and from this I can create a dataframe like this, and when you print it, see, it is continuous, so if you print the tail of it you will see the final match, Pakistan Vs England. I don't know if you have seen it; it was a pretty interesting match, the World Cup final 2022.
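The flattening step described above, going through each match and extending one list, can be sketched in plain Python. This is a minimal sketch with placeholder data; the key name `battingSummary` and the field names are assumptions about the file's structure.

```python
# Stand-in for the batting-summary JSON: one object per match, each holding
# that match's rows under a "battingSummary" key (key and field names assumed).
data = [
    {"battingSummary": [
        {"match": "Team A Vs Team B", "batsmanName": "P1", "runs": "20"},
        {"match": "Team A Vs Team B", "batsmanName": "P2", "runs": "5"},
    ]},
    {"battingSummary": [
        {"match": "Team C Vs Team D", "batsmanName": "P3", "runs": "41"},
    ]},
]

all_records = []
for record in data:
    # extend() adds each row of this match to the flat list;
    # append() would instead nest the whole per-match list as one element.
    all_records.extend(record["battingSummary"])

# The same behaviour on two small lists, as in the a/b demonstration.
a = [1, 2]
b = [3, 4]
a.extend(b)

print(len(all_records), a)
```

Once every match has been folded into `all_records`, the flat list of dictionaries can be handed straight to `pd.DataFrame(all_records)`.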
Now that I have a dataframe ready, I just want to do some analysis; I'll just print, let's say, the first 11 elements or so. And see, I want to do a couple of things here. First of all, the dismissal column: I want to convert this column into out or not out, okay? So, I want to have a column like this which tells me if the player is out or not out. And the way you can do that is to look at the dismissal column, and if you don't find anything in it, if it is blank, it means the player was not out. So, let's first do that, and how do you do that? Well, you can create a new column called out or not out in pandas just by doing this, based on the dismissal column. So, my dismissal column is this, okay, and on this you can use the apply method. If you've seen my pandas tutorial you will know. So you can say, on the dismissal column, apply some transformation and create a new column called out or not out. So this is the new column, this is how we create a new column, and on the existing column I want to apply some transformation. What is that transformation? Well, lambda x; a lambda is just a short way of writing a Python function. So, here I am going to use the ternary operator. When I say x, you're getting each value from this column, okay, so I'll say the player is out if the length of x is greater than 0, else the player is not out, okay? So, let's see. Okay, df is not defined, obviously; I am used to writing df all the time. So, you see, you have out and not out, and if I print a few more records you will see that whenever there is a blank it is not out. Otherwise, it is out, okay? All right, now that I have got out or not out, I don't need this dismissal column. So, I can just drop that column, okay, and this is how you drop it: you will just say drop, which column you want to drop, and inplace is equal to True. If you don't specify inplace is equal to True, it will not modify that dataframe. It will return a new dataframe, okay?
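The out or not out derivation and the column drop described above can be sketched like this. It is a minimal sketch on placeholder rows; the column names (`batsmanName`, `dismissal`, `out/not_out`) are assumptions and may differ from the notebook's.

```python
import pandas as pd

# Placeholder batting rows; a blank dismissal means the player was not out.
df_batting = pd.DataFrame({
    "batsmanName": ["P1", "P2", "P3"],
    "dismissal": ["c X b Y", "", "run out"],
})

# apply() calls the lambda once per value; the conditional expression
# returns "out" for any non-empty dismissal string and "not_out" otherwise.
df_batting["out/not_out"] = df_batting["dismissal"].apply(
    lambda x: "out" if len(x) > 0 else "not_out"
)

# inplace=True modifies df_batting itself; without it, drop() returns a
# new dataframe and leaves df_batting unchanged.
df_batting.drop(columns=["dismissal"], inplace=True)

print(df_batting)
```

The `len(x) > 0` check assumes every dismissal value is a string; a missing value (NaN) in that column would need to be handled before calling `len`.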
So, when you run this, dismissal is gone, and all you have is out or not out. And you have another column called batsman name which has some issues; see, you have this kind of special character. You want to remove this special character, and you can simply locate all the special characters and then apply some lambda function. For example, you can say, on the batsman name column, apply this; you can use a regular expression, you can use a replace function, you can do any number of things. But just to keep things simple, I found this character, and there are other records where I found this other character. So I'm just removing them, and now you see in Kusal Mendis you don't see that particular character. So, let's see, yes, see here, you don't see that extra character. All right! Now, how do you connect this particular dataframe with the match dataframe? Because, as I said, for our visualization purpose later on in Power BI we need a way to link all these tables. Now, just carefully notice these two tables, okay? So, this is one table that I have, okay, and this is the second table that I have. I can just use Snipping Tool and take a screenshot of that so that you get an idea. So, let's say I have this particular table, right, and then I have this other table. So, now let's try to match these two. So, I want to now connect these two tables. How do I do that? See, I have Namibia and Sri Lanka, right? I have the team names, and here I have Namibia Vs Sri Lanka. I mean, that is the only key I have between these two tables with which I can join them or link them, because I have scorecard here, but I don't have a scorecard in this other table. So the only thing I have is match, right? So, here it says Namibia Vs Sri Lanka, and here it says team one and team two. Now I can say, maybe take team one, then use Vs in the middle, and then team two, right, and that way I can join them.
But the problem could be that in this table the names could be reversed, right? So here, let's say it's Netherlands Vs UAE, whereas here, let me print this, see, here I have UAE Vs Netherlands, so if I simply say Team 1 Vs Team 2, it's not going to work. I have to use both combinations, team 1 Vs team 2 and team 2 Vs team 1; that is the way I can connect, and for doing that I need to go back to this code and create a kind of dictionary, okay? So, a dictionary like this. So let me just do that here. I want to create a match IDs dictionary, okay, and the dictionary will look something like this: let's say there is Namibia Vs Sri Lanka, okay? So, let's say it looks something like this, and then I can have a match ID as the value, right? And then I can have the same thing but in the reverse order. Why do I need this? Because the order is not guaranteed. So I need to have both, and then I have, let's say, Netherlands Vs UAE, correct? So, let's say I have this kind of Python dictionary where I have Team 1 Vs Team 2, Team 2 Vs Team 1, and the match ID; then that is going to be helpful. So I can create that dictionary by going through the rows. Let me just remove this; I think you've got the idea, or let me put it in the second cell. So, I can say for index, row in df_match.iterrows(). So, there is a function called iterrows which will go over each row one by one, and every row has Team 1 and Team 2, okay, and you can use that as your key 1. So this will be your key 1, correct, because see, Namibia Vs Sri Lanka, so I'm creating that by doing this, and then for Sri Lanka Vs Namibia you can have key 2. So key 2 is nothing but reversing the order, Team 2 Vs Team 1, and then you want to add that into this dictionary.
So, here I will say add this into my dictionary under key 1, okay, so key 1 is this, right, and what is the value you are adding? The match ID, so you will say row match ID, and you will do the same thing for key 2, and when you now look at match_ids_dict it looks something like this. See, every pair of teams, their original order, reverse order, and the corresponding match IDs. Now, you'll be like, okay, so how do I use this dictionary? Well, see, okay, let's go back. So, the way you use this dictionary is to bring a match ID column into this particular dataframe. So how do I bring a match ID column into this particular dataframe? Well, this has a match column, right? So, I can look into my match IDs dictionary. So you see, for this match I want to get a match ID; how do I get it? So I will say match_ids_dict and this match name in quotes, and if you give this, see, you get the match ID! So, I can create a new column in this dataframe, okay, call it match ID, and set it equal to a transformation on the batting dataframe's match column. So I will say on this column apply a transformation; map is a function similar to apply, and you are going to give it this mapping. So you are just mapping it basically, okay, and when you do this you get the match ID. See, now you have found a way to link these two tables, okay? And then you will export this particular table as is into a CSV file, and the way you can export this table is by doing this. Okay, I have that file open by the way, that's why. But I'll just say temp.csv for example, and when I look for temp.csv, see, in my CSV files folder there is temp.csv, and it looks something like this: it is the same file I was showing before, basically it is this file, okay? Now, in the interest of time I'm not going to go over the entire transformation for the other files too, because the code can be bigger and the tutorial will get too long. So, I'm going to share the entire notebook with you; you can take a look later on.
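The two-way lookup dictionary and the map step described above can be sketched as follows. This is a minimal sketch on placeholder data; the column names (`team1`, `team2`, `match_id`, `match`) are assumptions about the dataframes, and the second batting row deliberately lists the teams in reverse order.

```python
import pandas as pd

# Placeholder match table, already carrying the renamed match_id column.
df_match = pd.DataFrame({
    "team1": ["Team A", "Team C"],
    "team2": ["Team B", "Team D"],
    "match_id": ["ID-001", "ID-002"],
})

# Placeholder batting table; it only knows the match as a "X Vs Y" string.
df_batting = pd.DataFrame({
    "match": ["Team A Vs Team B", "Team D Vs Team C"],
    "batsmanName": ["P1", "P3"],
})

# Register both orderings of every fixture, because the batting table
# does not guarantee which team is written first.
match_ids_dict = {}
for index, row in df_match.iterrows():
    key1 = row["team1"] + " Vs " + row["team2"]
    key2 = row["team2"] + " Vs " + row["team1"]
    match_ids_dict[key1] = row["match_id"]
    match_ids_dict[key2] = row["match_id"]

# map() looks each match string up in the dictionary.
df_batting["match_id"] = df_batting["match"].map(match_ids_dict)

print(df_batting)
```

Two caveats on this approach: a match string that is missing from the dictionary maps to NaN rather than raising an error, and if the same two teams met more than once, the later fixture would overwrite the earlier one in the dictionary. After this step, `df_batting.to_csv("temp.csv", index=False)` would write the table out, with the file name being just an example.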
But see, here I first process match results, then I process the batting summary, and at every point I am exporting the file. So I exported the match summary first, then the batting summary, and then the bowling summary and the player information. Now, for the player information, the name I have given is no images, because for each player I need their image as well. So, I have two files: a players CSV with no images and one with images. Let me show you that. Let me remove this temp.csv, and see, I have dim players.csv and the no images one. The no images file is what my notebook is producing, and you can see there is the player name, team, batting style, bowling style, playing role, and the description. Some players don't have a description but some do. We collected the players' images manually and added them in dim players.csv. So, if you open that file you will see an extra column. I have dim players open already, it looks like, and in that we have an extra column called image, and if you click on any image link you will see that particular player's picture. So, I hope you had fun, and now in the next step we will be importing these CSV files into Power BI. I'm going to launch Power BI Desktop. I have already installed this application. If you don't have it, you can just search YouTube and find the installation instructions. So, click on Power BI Desktop, close this, and just say file, save as, go to downloads, and give some file name. I will just say T20 cric. Then you will go to get data, click on more, and import the entire folder of CSV files. Now, for our downloads folder, just check the video description below, where I have given the instructions to download the CSV files.
So, these CSV files, either you have used Bright Data to capture them or you can use the ready made files which I have given to you, and here in this folder you see five CSV files. Now for dim players we have two versions: one with images, one without images. So, I'm going to delete the no images CSV file because that's not going to be useful, and here I will click on folder, connect, and go to that folder and grab all four remaining files. So, in this PC you go to downloads, T20 CSV files, and it's going to import all those four files. Now, here you will click on transform data to perform data transformation in Power Query. Power Query is a component inside Power BI that allows you to do data transformation. Here, right click and say duplicate, I'll tell you why I'm doing that. But go to the first step and then click on this binary. When you click on that, it's going to expand that file. So, if you look at the steps on the right hand side, our first file is dim match summary, and when we clicked on it, it expanded that particular file. So, I will just call this dim match summary, and similarly I'll duplicate these raw steps multiple times, and then here I will expand dim players. So, once again click on this and it's going to expand it. I will call it dim players and do similar things for the next two files. Now that I have expanded all the files, I will quickly look at the data and perform some transformations. When I'm looking at dim players, I see it did not recognize the column names properly. This can happen. So, what you will do is go to the transform tab and then say use first row as headers. See, now it is using name, team, image, etc. In the previous step it did not have that, right? It was having column 1, column 2, etc., and now I said use this first row as the header row. After that it looks something like this, and when I glance through the data I notice a couple of issues.
Number one is that the player name has this C in brackets, which means captain. In a cricket scorecard, if someone is the captain, you will see a C in brackets. I don't need that, I just need the player name. So, you are going to apply a transformation where you use the extract option and say text before delimiter. The delimiter is the opening bracket, right? So, you want to get the text before the bracket. For Shakib Al Hassan (c), if you get the text before this bracket you'll get only Shakib Al Hassan. So, you'll say okay, and you see that the C is gone. In the previous step you see Shakib Al Hassan with C in brackets, but in the next one that C is gone. It is a usual practice to do data trimming after this delimiter step. So you will say format and trim, and just in case there are extra spaces, it is going to remove those. One other thing that I do is sort these values. If I say sort ascending and quickly glance through these values, I find some duplicates. See, for Matthew Wade the same record for some reason appears two times, and you want to remove these duplicates. In real life you will always find these kinds of data problems. So I'm going to remove the sorted rows step here and then right click and say remove duplicates. That will remove the duplicate values. See, right now the row count is 213 and in the previous step we had 219 rows, so it removed six duplicates. Many times what will happen is that you are building the dashboard, you encounter issues at that time, and then you have to go back to Power Query and perform these data transformations. I already knew about these steps so I'm doing them up front, but in general you will glance through the data, sort it, and do a bunch of things to figure out all these data issues. And now we are done with the dim players data transformation. We can move to the next one, which is dim match summary.
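For readers who would rather do the same cleanup before the data reaches Power BI, here is a small plain Python sketch of the three steps above: text before the bracket, trim, and remove duplicates. The function name and the sample list are hypothetical, and deduplicating on the cleaned name alone is an assumption; Power Query removes duplicates on whichever columns are selected.

```python
def clean_player_name(raw_name):
    """Keep the text before the first '(' and trim extra spaces."""
    return raw_name.split("(", 1)[0].strip()

# Hypothetical sample: a captain marker, a trailing space, and a duplicate.
raw_names = ["Shakib Al Hassan(c)", "Matthew Wade ", "Matthew Wade"]

cleaned = [clean_player_name(name) for name in raw_names]

# dict.fromkeys drops repeats while keeping the original order.
unique_names = list(dict.fromkeys(cleaned))
print(unique_names)
```

Trimming before deduplicating matters here: "Matthew Wade " and "Matthew Wade" only count as the same record once the stray space is gone.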
Here, when I glance through this data, it all looks good. The only thing I'm going to do is create a new column called stage. In this particular T20 World Cup of 2022, any match that was played before 22nd October was a qualifier match, otherwise it was considered to be Super 12. So the logic that we're going to use is: if the date is less than 22nd October 2022, then the match was a qualifier, otherwise it was Super 12. I'm going to create a new column called stage, and it will be a conditional column. Let me show you so that you get an idea. Click on add column and say conditional column, and in that conditional column you have match date. The new column name, by the way, is stage, and stage could be qualifier or Super 12. If match date is before, before which date? Well, 22nd October, so I'll go and say 22nd October. It's just one fact that I know. If that is the case, then the stage was qualifier. You know, in any tournament there are qualifiers and then there is the main tournament. Otherwise it is called Super 12. When you say okay, you get something like this. All of these were qualifiers, and then the main tournament started from here. Actually, it's not less than or equal to, it is less than, so I'm just going to modify the formula and hit OK. So Scotland Vs Zimbabwe was the last qualifier match, and from the next match onwards we had the main tournament, which is Super 12. Just to summarize, I created this new stage column here, and then I'm going to change the type to text. Here ABC123 means text or number, so I'll change it to only text. The next one is fact bowling summary, so I go here and I will just rename a few columns. For example, bowling team I can just call team. So double click this and just say team. And then you have things like 0s.
Instead of having 0s, I feel zeros is a better column name. Once again, you will see that you might be renaming a lot of columns. So, here I will say fours, sixes, and so on. Now, for calculating statistics for bowling performance, I am going to do some transformation with overs. Many times you have 2.5 overs, which is just 2 overs and five balls, so it is better if I have a column called balls, and then I can use runs and balls to calculate the bowling statistics and so on. Therefore I will create a balls column from this. The way you do that is you go to overs and first split the column by delimiter, and that delimiter is dot. You'll just say okay, and it creates two columns: overs.1 and overs.2. For example, where I previously had 2.5, it will now say 2 and 5 in the first and second column. Then in this second column you can replace the null values with zero, right? 4 and null is basically 4 overs, so this can be replaced with 0. You just go here, replace values, and replace null with zero. Now you can add a new custom column. You will say add custom column, call it balls, and balls is nothing but overs.1 into 6, correct? Two overs will be how many balls? Two into six, twelve. Then if you have any additional balls you add them, and that is overs.2. So, you insert that and you get this particular formula. If you look at different overs here, for example this first one is four overs, so that will be 24 balls, correct? See, 24 balls. This row is 18 balls, but let's check this one, which is 17 balls. 17 balls is 2 overs and five balls: two overs is 2 into 6, 12, and then plus 5 is 17. See, 17 here. And then let's perform some quick transformations in the batting summary as well.
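The two derived columns from this part of the walkthrough, stage and balls, follow simple rules that can be checked outside Power Query. The sketch below is plain Python with hypothetical function names; it assumes overs arrive as text such as "2.5" (two overs and five balls) and that the cutoff date is 22nd October 2022 as stated above.

```python
from datetime import date

SUPER_12_START = date(2022, 10, 22)

def match_stage(match_date):
    """Matches strictly before 22 Oct 2022 are qualifiers, the rest Super 12."""
    return "Qualifier" if match_date < SUPER_12_START else "Super 12"

def overs_to_balls(overs):
    """Split on the dot: whole overs * 6 plus any extra balls (missing part = 0)."""
    whole, _, extra = str(overs).partition(".")
    return int(whole) * 6 + int(extra or 0)

print(match_stage(date(2022, 10, 21)), match_stage(date(2022, 10, 22)))
print(overs_to_balls("4"), overs_to_balls("2.5"))
```

The `int(extra or 0)` part plays the role of the replace null with zero step: a value like "4" has nothing after the dot, so it contributes no extra balls.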
Again, I'm going to rename these columns to fours and sixes, and I don't like that this column is in a text form. So, maybe I can have a simple column called out: if the batsman is out then the value is 1, otherwise it's 0. So, not out is 0. How do you do that? You can just go to transform and say replace values, and if the value is not out, replace it with 0. Then once again you say replace values, and if the value is out, replace it with 1. So, now we have a nice binary 0 1 type of column: not out is 0 and out is 1. See, something like this. And by the way, this balls column should be a number, so I'll just say this is a whole number. In the batting summary you also need to do a similar transformation for the captain, where you remove the bracket C text. So, once again you click this and you extract text before delimiter, with the bracket as the delimiter, and you will see that the C thing is gone. See, Gerhard whatever with C in brackets, and in the next one it is gone. So, now we are done with our transformations, and you can go to home, close and apply. After data transformation, we need to look into data modeling. For that, you can go to this particular tab and you will see it has already established some relationships based on column names. Let me just pull the fact tables into the middle and arrange them nicely. Fact tables are basically the transactions, and dimension tables are basically the attributes. You can just Google it if you want to read more about fact and dimension tables and the star schema. Now you can see that when you hover your mouse cursor here, it has established this link between the two based on match_id. It's a one to many relationship, and similarly here also it has established a relationship based on match ID.
Now we will link the dim players table with this one. The player name is called bowler name here and name there. So, I will left mouse click on bowler name, drag and drop it on name, and that will establish a relationship. When you hover your mouse cursor over here, see, bowler name and name are highlighted, which means these two tables are linked through that particular column. Same thing here: batsman name is linked to name, and that's it. Our data modeling in this case is pretty simple. Once data modeling is done, the next step is to create DAX measures. DAX measures are something that we'll be using in building the actual visuals. For that, I will create a category or a folder where I can keep all those DAX measures. So, I'll click on enter data here and call it key measures. It shows key measures 2 here, just ignore that. This is just a category where you're going to add all your measures. So here I will click on this and say new measure. Now we need a couple of measures, and I have given an Excel file, again check the video description. You can download it from there; I have given a complete list of the measures that you need for this project. The first one is total runs, and that one is pretty simple. You can click on this icon to expand the formula bar and Ctrl scroll to make it bigger. Total runs is equal to the sum of all the runs in the fact batting summary table, right? So, here it is the sum of fact batting summary runs, and that's it. That creates the total runs measure. This column one is not needed, so you can right click and delete it. So, total runs is one measure that we have already created. Now, let's create the other one that we need, which is total innings batted. Once again I click here and I'm going to just copy paste that formula. So Ctrl C, Ctrl V, okay?
So, on fact batting summary you're getting a count of match IDs, and that will be the total innings batted. Now, what's the purpose of these measures exactly? If you have no idea about DAX, etc., I would suggest that you check my Power BI course on Codebasics.io, where I have explained Power BI in detail with a lot of practical and fun learning. We completed a real life project with one and a half million records, and the course has received amazing reviews. So, go check that out. I'm going to go quickly through DAX measure creation because this is not a detailed Power BI course. All right! So, let's say I have created these two measures. If I want to quickly check them, I can pull the player's name here and then add those measures in that table. Let's say I add total runs, for example, and look for a particular player. My favorite one, especially in the last series, was Suryakumar Yadav, and here it says he scored 239 runs, right? You can quickly check if the measure is correct or not by opening the fact batting summary CSV file. When that file is open, you can create a filter: go to data and filter, and in the filter type in Suryakumar Yadav. If you look at his runs and just highlight this column, you will see 239, and that's what I have, 239, right? Similarly, you can create other measures. So what other measures do we have here? Let's check that. The next one is total innings dismissed. So here I'll say new measure, total innings dismissed is equal to sum of, once again you can do Ctrl scroll to see the whole thing, fact batting summary out. So, that is how many times the player got out. That's total innings dismissed, and then you pull that here.
So, when I add total innings dismissed here, see, I'm getting some error, and if I click on it, it says that the function SUM cannot work with values of type string. We created this measure based on the out column, and the out column is not a number, it looks like. So, you can go to home and click on transform data to go to Power Query, and you see the out column in fact batting summary shows ABC, which means it's text. You can change it to a whole number. So, just change it to whole number, say close and apply, and then it will show up fine. So, Aaron Finch was out two times, and you can see all these statistics. In a similar fashion you will be creating all the rest of the DAX measures. Once again, this file is given to you, you download it, and I want you to create all these measures. Once you create all these measures, it's going to look something like this. You will create all these measures and also group them in these kinds of folders. So how do you group them? Let me show that. Let's say you have these three measures and you want to group them. What you can do is enter a display folder here: you can say batting, and it will put the measure in a batting folder, and then you just drag and drop the others into it. This way, see, you have a nice batting folder. So your goal, by the way, is an exercise for you. Folks, this is not hard. Don't worry, you can do it! Your goal is to create all the measures. I just showed you two or three measures, but you have to create all of them on your own and put them in their respective folders. Once you're done creating the measures, you also have to create some calculated columns. Calculated column is just jargon. Actually, it's nothing but something similar to an Excel formula.
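To make the three measures shown so far concrete, here is a plain Python sketch of what they compute for one player. It is not DAX and not the project's code: the row layout, the column name `batsmanName`, and the sample numbers are assumptions. It also reproduces the type problem above, since summing text values fails until they are converted to whole numbers.

```python
# Hypothetical rows from a batting summary table; "out" is still stored as text.
fact_batting_summary = [
    {"match_id": "M1", "batsmanName": "Player A", "runs": 30, "out": "1"},
    {"match_id": "M2", "batsmanName": "Player A", "runs": 45, "out": "0"},
    {"match_id": "M1", "batsmanName": "Player B", "runs": 12, "out": "1"},
]

def rows_for(player):
    return [r for r in fact_batting_summary if r["batsmanName"] == player]

def total_runs(player):
    # Sum of the runs column.
    return sum(r["runs"] for r in rows_for(player))

def total_innings_batted(player):
    # Count of non-blank match IDs.
    return sum(1 for r in rows_for(player) if r["match_id"] is not None)

def total_innings_dismissed(player):
    # Summing the raw text values would raise TypeError, so convert first.
    return sum(int(r["out"]) for r in rows_for(player))

print(total_runs("Player A"), total_innings_batted("Player A"),
      total_innings_dismissed("Player A"))
```

In Power BI the player filter comes from the visual itself (the row of the table), so each measure is written once and evaluated per player automatically; the explicit `player` argument here stands in for that filter context.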
So, think of an Excel formula. Let's say I go to this CSV file and remove this filter, and I want to know the total runs from boundaries. Let's call it boundary runs. If I want to do that, I can use a formula in Excel, correct? Whatever number of fours you have into 4, plus whatever number of sixes you have into 6, that will be your boundary runs. And if you drag this formula down like this, you will see the boundary runs for every row. For example, this guy scored one four and one six, so that makes 10 runs. Similar to this Excel formula, we are going to create a calculated column in DAX. So, let's go to our file here, and by the way, this visual was just for your validation, so you can delete it. Here in the batting summary I will create my first calculated column, which is boundary runs. So, how do you do that? You can go to this table view and say new column, and this will be the name of the new column. Boundary runs is equal to fact batting summary fours into four, plus fact batting summary sixes into six. If you have a slight bit of Excel knowledge, this is pretty straightforward. So, you create a new column here, and once again for validation you can look at any row. See this one for example: this person, Pat Cummins, hit two fours and one six. Two fours is two into four, eight, and one six is six, so eight plus six is fourteen runs. Pretty straightforward. So, create all three of these calculated columns, and once again, if you want to look at the final pbix file which I have given, check the video description. You are given all the assets, and that final file has all those measures. Anything which has this symbol is a calculated column. So, see, it has boundary runs. It has pretty much everything that you need.
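The boundary runs rule is simple enough to verify by hand, and a tiny Python sketch makes the two worked examples above explicit. The function name is hypothetical; only the formula (fours times 4 plus sixes times 6) comes from the walkthrough.

```python
def boundary_runs(fours, sixes):
    """Runs scored from boundaries: each four is worth 4, each six is worth 6."""
    return fours * 4 + sixes * 6

# The two examples from the walkthrough.
print(boundary_runs(1, 1))  # one four and one six
print(boundary_runs(2, 1))  # two fours and one six
```

Like a calculated column, this is evaluated once per row of the batting summary, which is what distinguishes it from a measure that aggregates over many rows.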
The visual here correctly represents how much attention dashboarding gets, and now we are going to start dashboarding for our project. When you work as a data analyst in any company, usually business managers will provide you some kind of rough mock-up with their understanding of how they want to see the dashboard. Here they have given this image, which they can draw with pen and paper, where they want to see different tabs for power hitters, anchors, and fast bowlers. Remember, previously we covered different criteria for each of these categories. When you select any category, let's say hitters or anchors, you will see the players in that category along with statistics such as their runs, strike rate, batting average, etc. On the right hand side you want to have a criteria filter, where based on our criteria we can see a list of players, and then at the bottom you will have some kind of trends for various statistics, and at the bottom right there would be a scatter chart of strike rate against batting average. Now, they can provide this mock-up in a rough format like this, or sometimes people use PowerPoint or some other tool to draw their rough ideas, and as data analysts it is our responsibility to communicate back. Communication is a very important skill in a data analyst career. We are going to provide you all these mock-ups. The name of this file is mockup.txt, which you will find through the video description below; when you download all those files you will see it. Now, I'm going to take this stage 2.pbix file, which has all the DAX measures created. You can do the same and just get that file, and I will start building our visuals. Here the first thing that we are building is the page for power hitters, and what I will do is go to the dim players table, grab the name of the player, and just drag and drop it here.
So it shows me the names of every single player, and if you check our mock-ups, we want to have certain columns in this table, such as team, then the batting style, then the innings batted, so you can go to key measures and pick innings batted, and then the total runs that these players made, total balls that they faced, strike rate, their batting average, and so on. Once you have all these columns, you want to look at your criteria for openers, your power hitters, and here it says their batting average should be greater than 30 and their strike rate should be greater than 140. So, now you will use this filter pane to filter those players, because this list is showing all the players, right? You want batting average to be greater than 30, so I can say batting average is greater than 30, apply filter, and see, it filters all those players. Then you want strike rate to be greater than 140, so I will go to strike rate and say strike rate is greater than 140. Then innings batted greater than 3, and boundary percentage greater than 50. So, innings batted, where is it, okay, innings batted should be greater than 3. Then boundary percentage is greater than 50. 50 percent is 0.5, right, so that is greater than 0.5, apply filter. The last one is that batting position should be less than four. In whatever matches they played, their batting position should be somewhere in the opening slots, so it is less than four. When you apply all these criteria, you see a nice list of players who can be your potential power hitters in your final 11. You can also do some visualization related changes. For example, I want to see total runs as a bar chart, like a horizontal bar chart.
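The power hitter criteria applied through the filter pane above amount to a single combined condition per player. The following Python sketch shows that condition on made-up data; the field names and all three sample players are hypothetical and are not taken from the tournament statistics.

```python
# Hypothetical player stats for illustration only.
players = [
    {"name": "Player A", "batting_avg": 42.0, "strike_rate": 155.0,
     "innings_batted": 6, "boundary_pct": 0.62, "batting_position": 1},
    {"name": "Player B", "batting_avg": 28.0, "strike_rate": 160.0,
     "innings_batted": 5, "boundary_pct": 0.58, "batting_position": 2},
    {"name": "Player C", "batting_avg": 35.0, "strike_rate": 150.0,
     "innings_batted": 5, "boundary_pct": 0.55, "batting_position": 5},
]

def is_power_hitter(p):
    """All five criteria must hold at once (boundary % is stored as a fraction)."""
    return (
        p["batting_avg"] > 30
        and p["strike_rate"] > 140
        and p["innings_batted"] > 3
        and p["boundary_pct"] > 0.5
        and p["batting_position"] < 4
    )

power_hitters = [p["name"] for p in players if is_power_hitter(p)]
print(power_hitters)
```

Player B drops out on batting average and Player C on batting position, which mirrors how each filter added in Power BI narrows the list a little further.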
So you can now go to the visualization tab and say that for total runs I want to do conditional formatting and display data bars, and the bars look something like this. You can sort these columns as well, by the way. If you click on the header you can see the player with the highest runs. We all know Jos Buttler. I still remember their innings against India: Jos Buttler and Alex Hales just killed it in the semi-final. So, you see they have pretty good runs, strike rate, and so on. Now, folks, building the whole dashboard is a process of a few hours, and I don't want to waste your time going over all of that. So, I'm just going to show you a stage 3 file. Once again, in the files you have downloaded, check the video description, you will find this file called stage 3, and that has all the raw visuals created. You will have a page for power hitters, for example, then a page for anchors or middle order. How do you create this page? Well, this will be page one, so you name it power hitters or openers, and then you create a new page for anchors or middle order. You just double click and type in the name, anchors middle order, and then you drag and drop and start building the visuals here. When you build the visuals in the raw format, ultimately they will look something like this. Now, once again, if you are new to Power BI and you don't know the basics of these visuals, you can go to Codebasics.io and take our Power BI course. It covers all of those concepts in detail. But for this video, I'm going to give you this stage 3 file so that you can check various properties of each visual in case you don't know how to build it. When you click on a visual, see, it will show you what kind of visual it is.
So, this one is a card visual, and if you want to look at the formatting, for example, you can click here and look at its various properties. We are going to assume that you have built all these raw visuals, and now it is time to beautify them, to make them look good and connect the different pieces. Once you beautify your dashboard, it's going to look something like this. This is what we have built, but you can build it as per your own preference for colors and visual behavior. We have provided you all the dashboarding tips here, which you can use to build a dashboard. See, I'm not going to spoon feed you, because when you go work in the industry, it becomes essential that you use your Googling skills to figure things out. If you are trying to change the color of some visual, you can just Google it. Googling is an art that can be tremendously helpful, and this is a unique opportunity for you to use that art to learn certain things on your own while we are providing you full assistance. So, read the dashboarding tips and then try to make the entire dashboard look good. In our Power BI course we also have an entire chapter on designing an effective dashboard. Now, you'll be glad to know that this particular dashboard was designed by one of our students. He took our Power BI course, and his name is Ashish Babaria. He also participates in our Codebasics.io resume project challenges. If you go to the resume project challenges section of the website, which is free for everyone to participate in, he has won first prize in two challenges, and if you click on this LinkedIn icon you will see his posts, where he builds beautiful dashboards. Now, if you check Ashish's background, he's a trade agro specialist. So, he comes from a non-technical background and doesn't have formal training in data analytics, etc.
He learned Power BI mainly from the Codebasics channel and various other resources, and through his participation in the resume project challenges we were able to spot his talent. He is now working with us as a freelancer, helping us with all these projects. So, look at the quality and professionalism of this dashboard. He learned things on his own at a later part of his career, and you can become like Ashish too. Now, you can click on the various player categories. In the power hitters you want to have a few players from this particular list, and if you look at the filter criteria, see, these are all the filter criteria that we had. Now, look at the filter criteria for fast bowlers, for example. For fast bowlers you want bowling economy to be less than seven, and they should be taking a wicket every 16 balls. So, if you control click on fast bowlers and then click on this visual, you will see all the filters that we have. See, bowling strike rate is less than 16, dot ball percentage is greater than 40%, and so on. You see all those details here, and if you want to modify any criteria it's super easy. You just go here, type in a value, and you can play with different things. In the next part of this session we're going to invite Tony Sharma, who is a subject matter expert and in charge of this project. He will help us decide the final 11. If you control click on it, see, we have our final 11 almost decided. But you can modify certain criteria and substitute different players. If you know cricket, just look at this team. It looks pretty solid, like an unbeatable team. So see, Power BI data analytics can help you generate data driven insights that can make a huge difference in the problem that you are trying to solve.
We are going to provide you this final file as well, so if you have a question on some visual behavior, you can click on it and check the different visuals that we have. You can look at the format pane and the various properties that these elements have, and I will quickly play a time lapse so that you have some idea of how this dashboard was built. [Music] So, Nick, we are just a few minutes away from saving the world. How does this dashboard look? The dashboard looks amazing! But can you show me the best 11? Not yet. We are getting there, just give me a few minutes. Let me explain how we've done this. In the previous session we had, I explained the parameters I'm using, right, and now you can see these parameters coming into action, coming live. So, this is how I've created the parameters. You can see how I filter the players for the openers. These are the power hitters, and I also have a graph of them to understand their consistency, their playing trend, and all of that, and I also have a scatter plot to show how their batting average fares against their strike rate. So, you can see these are the players I've got here, correct? I have players who can strike the ball at 160, even close to 170, and at the same time give me an average of 35. You can see that, and Jos Buttler here gives me the highest average. But he is a good striker as well. He strikes at 140 plus, which fits our parameter. So, out of these players I'm going to select Jos Buttler, because you can see he's consistent across all the matches, he has played all the matches pretty decently, and he's a good wicket keeper as well. We need a wicket keeper, so he's going to be a wicket keeper batter, and his partner Alex Hales is a good choice for a second opener.
But again, he is not consistent in all the matches. Also, I need a left hand combination with a better strike rate. So, I'm going to choose Rilee Rossouw from South Africa. He's a better option for me. He can strike the ball like crazy. I can show you the combined performance of these two by selecting both of them. If these two players, Rilee Rossouw and Jos Buttler, play together, they will give us 40 runs on average at a strike rate of 150 plus. So, if these two bat without losing a wicket, we'll hit our target of 180. If they bat 120 balls they will give us 180 runs. And they will stand for at least four overs on average, because the average balls faced, you can see, is 23.9, and they score 60% of their runs in boundaries, which is perfect for what we need. You can see the consistency is pretty much there. It's dropping here and there, but together as a package they will give us what we need. I love this feature that you can select two players and see their statistics on an average basis, because this way, if these two players are playing and I have questions on their partnership statistics, I can get a view of those numbers easily here. It is not exactly like a partnership, but it will give you their combined performance. Got it! I got the point! Yes, and I'm also going to select Alex Hales as my reserve opener. So, I'm going to select three players for this position potentially. But the two I'm going to play are Jos Buttler and Rilee Rossouw. All right! Let's move to the anchors, where we'll be selecting three players. Here again you can see the filters I applied on this data: batting average, strike rate, innings batted, exactly like we discussed, right? Here the interesting part is that we have Virat Kohli on the top with the most runs, followed by Suryakumar Yadav. So, let's check this chart, the scatter plot.
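The opening pair projection above is straightforward arithmetic on strike rate and balls faced. A small Python sketch of it follows; the function names are hypothetical, and it assumes the usual definition of batting strike rate as runs per 100 balls.

```python
def projected_runs(strike_rate, balls):
    """Runs expected from a number of balls at a given strike rate (runs per 100 balls)."""
    return strike_rate / 100 * balls

def balls_to_overs(balls):
    """Convert a ball count to overs as a plain decimal (6 balls per over)."""
    return balls / 6

# A strike rate of 150 over a full 20-over innings of 120 balls.
print(projected_runs(150, 120))

# Average balls faced of 23.9 is just under four overs.
print(round(balls_to_overs(23.9), 2))
```

This is a best-case projection, since it assumes the pair faces all 120 balls without losing a wicket, which is the condition stated in the discussion.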
Virat Kohli is clearly the winner because he gives us a lot of runs. He's a run machine, we definitely need to pick him. And for the second player, I couldn't think of anyone better than Suryakumar Yadav, because you can see his average is 60. He could give us 60 runs on average and he's striking at 190. This is the best we have in the team. So, even if our openers did not score runs at a good strike rate, this guy can come in and propel the score. So we need him in the team for sure, and these two players are solid, they have a good partnership, it's a great idea to play them together. So, for my fifth position, the three options I have are Lorcan Tucker, Glenn Phillips and Daryl Mitchell. Based on the statistics, based on the averages, it's easy for me to choose Glenn Phillips. He has a strike rate of 160, which is really high for the position, and he still scores at an average of 40. So, that's the reason I go for him. He's clearly my number five. Okay. Right. Let's move on to the next one. So this is going to get slightly tricky because I have many players in this position whom we can pick. Like I said, I'm looking for a batting all-rounder here, and that batting all-rounder could be a batsman who can score runs at a very high strike rate and at the same time anchor the innings, and this person can be a fast bowler or a spinner as well. In order to justify my selection, let me come back to this place later. I will go and select my specialist fast bowlers first, so that it becomes very easy for me to decide whether I need a fast bowler in that position, or whether I need a bowler at all there, whether I need a good bowler or a good batsman. So, I will select my fast bowlers first. Here my selection is super, super easy. I'm going to select this guy: Sam Curran.
You can see his economy is 6.53, and his bowling average is 11.38, which means he gets a wicket for every 11 runs that he concedes, and his bowling strike rate is also staggering. He gets a wicket every ten and a half balls, which means if he bowls his full four overs we are getting two wickets on average. And look at this guy, he has got 11 wickets at an economy of 5.37, he gives away less than six runs an over. We definitely need him on our team. He's a fast bowler and he picks up a wicket in less than 10 balls. So, these two are definitely in our team. And of course Shaheen Shah Afridi, how can we ignore him? He is a left-arm fast bowler and he can really rattle the batsmen. He's one of my favorites! Yeah, so you're picking these three. Tim Southee is good, but if I have to pick three, it's going to be just these three. So, these are the parameters I've applied here, you can see: innings bowled, bowling strike rate, bowling average, exactly like we discussed in the parameter session. So, let's see the combined performance together just to get an idea. Sam Curran, Anrich Nortje, Shaheen Shah Afridi. You can see these guys will pick up a wicket for every 11 runs they give, which means if these three bowled all 20 overs they would get the full team all out for 110 runs, 113 runs, and they get a wicket every 11 balls. If they bowl all 20 overs, the team is all out. And imagine they're bowling the first three or four overs. Let's say these players bowl the first six overs, how many wickets will they take? If they bowl the first six overs, they will pick up a wicket every 12 balls. Six overs is 36 balls, so they will pick up at least three wickets, and that's on average. That would be awesome! If you pick three wickets in the powerplay, that's going to be amazing! They'll definitely scalp the top three, they will open up the middle order for us.
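The projections quoted above are simple multiplications and divisions on bowling average (runs per wicket) and bowling strike rate (balls per wicket). A small sketch of that arithmetic follows. The function names are made up for illustration, and the 11.3 combined average is an assumption inferred from the "113 runs" figure rather than a number read off the dashboard.

```python
BALLS_PER_OVER = 6


def runs_to_bowl_out(bowling_average: float, wickets: int = 10) -> float:
    """Runs conceded while taking `wickets` wickets at a given bowling average."""
    return bowling_average * wickets


def expected_wickets(overs: float, bowling_strike_rate: float) -> float:
    """Wickets expected from `overs` overs at a given balls-per-wicket rate."""
    return overs * BALLS_PER_OVER / bowling_strike_rate


# Combined trio: about 11.3 runs per wicket (assumed) means ten wickets
# cost roughly 113 runs.
print(round(runs_to_bowl_out(11.3)))

# Powerplay: six overs at one wicket every 12 balls.
print(expected_wickets(6, 12))

# A single bowler's four overs at one wicket every 10.5 balls.
print(round(expected_wickets(4, 10.5), 2))
```

These are averages extended in a straight line, so they are best read as a rough expectation for comparing bowlers, not as a guarantee of wickets in any one match.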
And look at the economy. The combined economy is six, which means if they bowl the first six overs they will give away just 36 runs. And the dot balls they produce are close to 50 percent, which means if they bowl all 20 overs, the batsmen are able to score off only half of the deliveries, like only 10 overs' worth; for the other 10 overs they're scoring nothing, it's a dot ball. These three are the major strength of our team. All right, so let me come to the all-rounders now before I go to the finisher role. So, we have three solid fast bowlers. So, in my all-rounders I'm going to have the spin element, but at the same time I want these two players to bat as well, at a higher strike rate. That's the reason, as I explained earlier, I've kept the strike rate parameter at 140. If I relax the strike rate, let's say I set it to 100, I'm going to get more players, I even get Ben Stokes here. But I'm considering a strike rate of 140. Oh, got it! I was expecting to see Ben Stokes. Now I know why he's not showing up. Yeah, I will explain why Ben Stokes is not here on the other page as well, because I wanted him in that role earlier. So, you can see from this graph Rashid Khan is really high in terms of bowling strike rate. I can't say that's really good, because bowling strike rate should be lower, right? So it's not good. And you have Sikandar Raza. We are finding people in this zone, this is the best zone: players with a lower economy and a lower bowling strike rate. So Shadab Khan is really the winner. He definitely deserves a spot because you can see his performance, he's quite consistent as well. His bowling average is good, and whenever he got a chance to bat he has batted well too. This is his bowling and batting performance, it's highly consistent. But Shadab Khan, can I play him at number seven?
I'm not sure, because if my number six did not bat well, I want someone more capable to bat at number seven. So, I would play Shadab Khan at number eight. And at number seven I would play Sikandar Raza. This is the person I'm going to play at number seven because, look at the strike rate, it's 147, and he also has a very high batting average of 27 runs. He normally played at number five or number four, but for a number seven, if you have this kind of average with this strike rate, it's great. So, that's going to be my number seven, and I have my number eight, and I've got my 9, 10 and 11 as well. So, I just need to select my number six. Again, choosing this position is slightly difficult. I will show you why. Let me take the filter off this diagram, right? You would see that I also have Ben Stokes here. Ben Stokes could not make this list because his strike rate was too low in the series. His strike rate was just 105. If the batsman is scoring at just a run a ball, we cannot have them at number six, because this position might require them to hit like crazy. So, that's the reason Ben Stokes does not find a place, so I'm going to set it to greater than 130. And again, whom am I going to pick from here? We have three fast bowlers and two spinners. Glenn Maxwell looks like a good option. But he's a spinner again. I'm not sure whether my sixth bowling option should be a spinner. But his bowling average is good, you can see his economy is six, and his bowling strike rate is 6.33, which means he has got a wicket almost every six balls. So, he is one of my options. My other option is Marcus Stoinis. I'm going to play with this guy. His bowling average is not that good, his economy is not that good, but he's a good striker of the ball and he's someone I can also trust to anchor the game.
So, I could go for Marcus Stoinis if I want a better batsman with a high strike rate, or if I want someone more balanced, I could go with Hardik Pandya. This guy will give you a good batting average, but his strike rate is not that great, and his consistency is also not that good in this World Cup. But his bowling is good. If I really need a sixth bowling option, I would go for Hardik Pandya. But with the kind of bowlers that I have, Shaheen Shah Afridi, Sam Curran, Anrich Nortje and all those players, I don't think I would need Hardik Pandya in my team. Then I will go for Marcus Stoinis. So, I will make my decision easier by showing the final eleven, right? If I go to the final eleven, I've already picked the top 11, like I said. So you can see this is our batting 11. Jos Buttler and Rilee Rossouw will be opening the game, then Virat Kohli followed by Suryakumar Yadav, Glenn Phillips, and at number six I've selected Hardik Pandya here. But I want to go with, let's say, Marcus Stoinis. You can see the combined performance of the team: the batting average is 37.76, so the team on average scores 37 runs, and the strike rate is 150. You know all the facts, they're given here. But since my sixth batting position should increase the batting strength, and I'm focusing more on the batting strength now because our bowling is already great, I'm selecting Marcus Stoinis and removing Hardik Pandya. You can see my batting average improved from 37 to 39.6, and my strike rate improved from 151 to 154.4. So, this is the reason I would go for Marcus Stoinis. But I can also go for Maxwell, right? You can see that the performance of Maxwell and Marcus Stoinis is very much comparable. They have a very similar strike rate, almost very close, and Maxwell has bowled well. He has picked up wickets well. But he's a spin bowler, he's a right-arm off-break bowler.
So, since I would want my sixth bowling option to be someone who can bowl pace, I'm going for Marcus Stoinis. But on a given day we will still have options. You can still choose Glenn Maxwell on a given day depending upon the pitch: if the sixth bowling option can be a spinner, then my second option is Maxwell. If I need someone more on the bowling side, if some of our fast bowlers get injured, then I would choose Hardik Pandya for number six. So, I would keep all three options open. But right now in my final 11 I would have Marcus Stoinis, and for the opening I would go for Rilee Rossouw, but if you want Alex Hales, he is also available for selection. Got it. Looking at this list I am feeling super excited. Anyone who follows cricket, if they look at this list of players, they will be like, this team is unbeatable. This is the best team that we have got on our planet, and I'm sure you will win this game and save our planet. So, Tony, this team is great, but I know that Mitchell Starc is a pretty good fast bowler from Australia. Can you tell me why he's not in the final 11? Sure, I think he must not have met our standards. Let me check. I'm pretty sure he's there, but he didn't meet our standards. So, if I recall correctly, maybe his bowling strike rate is not under 16. So, let me take this filter out. And maybe he didn't bowl dot balls like we want, and his bowling average might not be less than 20, I should remove that as well, and his economy is definitely not less than seven. Let me take that out and see if we have him in the list. I don't think we have him yet. So maybe he didn't even play the four innings. Yeah, you can find him here now, Mitchell Starc. You can see his economy is 8.5, which is more than seven, his bowling average is 34, and his bowling strike rate is 24. So, that's the reason why he's not there, and he has also bowled in just three innings.
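The check just walked through, removing one filter at a time to see why a player is missing, can also be done in a single pass. Below is a minimal pandas sketch using the thresholds mentioned for the specialist fast bowlers (at least four innings, economy under 7, bowling average under 20, bowling strike rate under 16) and Mitchell Starc's figures as quoted. The column names and the second row are hypothetical placeholders, and the dot-ball filter is left out because no threshold for it is stated.

```python
import pandas as pd

# Thresholds as stated in the walkthrough; the column names are assumptions.
CRITERIA = {
    "innings_bowled": lambda v: v >= 4,
    "economy": lambda v: v < 7,
    "bowling_average": lambda v: v < 20,
    "bowling_strike_rate": lambda v: v < 16,
}

bowlers = pd.DataFrame(
    [
        # Figures quoted in the walkthrough.
        {"name": "Mitchell Starc", "innings_bowled": 3, "economy": 8.5,
         "bowling_average": 34.0, "bowling_strike_rate": 24.0},
        # Hypothetical placeholder row, NOT real tournament data.
        {"name": "Bowler X (placeholder)", "innings_bowled": 5, "economy": 6.5,
         "bowling_average": 12.0, "bowling_strike_rate": 11.0},
    ]
)


def failed_criteria(row: pd.Series) -> str:
    """Return a comma-separated list of the criteria this bowler misses."""
    return ", ".join(name for name, ok in CRITERIA.items() if not ok(row[name]))


bowlers["failed"] = bowlers.apply(failed_criteria, axis=1)
shortlist = bowlers[bowlers["failed"] == ""]

print(bowlers[["name", "failed"]])
print(shortlist["name"].tolist())
```

Keeping the reasons in a `failed` column answers the "why is he not in the 11?" question directly, instead of loosening each slicer in turn.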
We wanted at least four innings. Yeah. That's the reason. All right, it's clear now. Okay, here's your final 11, Nick. Go get us the cup and save the planet. The future of this planet is focused on these 11 players. They are traveling to the wilderness out in the dark, to bring us light. And may we all, citizens of data, hope and pray our analysis and insights shall work. Defeat the Sportans and bring us glory! [Music] Three weeks later! You defeat us in cricket, you get Earth! If you lose, join me as an intern. Join me as an intern. Now comes the most interesting part of this entire project series, which is an exercise, and by working on this exercise you will be able to win an exciting prize, which is a 20% scholarship on one of our premium courses on Codebasics.io. For the exercise, what you need to do is, number one: in the tooltip for players, let's say for Suryakumar Yadav, you are seeing India Vs Pakistan, India Vs whatever team. You already know he is from India. So, you need to remove that IND and just have Vs Pakistan, Vs Sri Lanka. So, you need to update that tooltip, that is exercise number one. The second one is you have to update the visual look and feel of the entire dashboard. So you can change the colors, the placement, different visual aspects of the entire dashboard. So come up with your own design and colors. Number three is providing some more insights. So, in addition to whatever we have covered in the dashboard, try to add more insights, and then once you have that dashboard created, you can write a nice LinkedIn post, tag me and Hemanand, and add the following hashtag so that we know that you have submitted this. We are going to provide some sample LinkedIn posts in the description below. These are for resume projects.
But basically you can write a nice post, upload your project on NovyPro, or maybe just shoot a simple video and make a LinkedIn post on it. I wish you all the best and I hope you learned a lot of different things in this project. This is going to be an excellent project for your resume, and make sure you write a LinkedIn post, because that way you can draw attention from potential recruiters. If you liked this video, give it a thumbs up and please share it with your friends. If you have any questions or comments, post them in the comment box below, or we have a Discord server for Codebasics, go there and you can post your question there as well! Thank you! [Music]