Review of T20 Cricket World Cup (England vs Pakistan)
Project focus on cricket data analytics using T20 World Cup data
Steps involved: scraping data, data cleaning and transformation, building dashboards in Power BI
Project Steps
1. Data Scraping
Source: ESPNcricinfo website
Tools: Web scraping techniques to extract relevant data
Use of Bright Data for scraping with proxy networks
Data types to capture:
Match results table
Detailed scorecards for batting and bowling
Player specific information
2. Data Cleaning & Transformation
Python and Pandas: Tools used for data cleaning and transformation
Objective: Transform JSON data into a flat CSV format for easier analysis in Power BI
Steps include:
Renaming columns
Creating new columns (e.g., match ID, out/not out indicator)
Handling missing values and data types
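The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the project's actual script: the JSON layout, the field names (match, battingSummary, batsmanName, dismissal), the placeholder players, and the match_ids lookup are all assumptions made for the example.

```python
import json
import pandas as pd

# Hypothetical scraped record; real field names and nesting may differ.
raw_json = """
[
  {
    "match": "Team X Vs Team Y",
    "battingSummary": [
      {"batsmanName": "Player A", "runs": "26", "balls": "17", "dismissal": "c Fielder b Bowler"},
      {"batsmanName": "Player B", "runs": "52", "balls": "49", "dismissal": ""},
      {"batsmanName": "Player C", "runs": null, "balls": null, "dismissal": null}
    ]
  }
]
"""
records = json.loads(raw_json)

# Flatten nested JSON: one row per batter, with the match label carried along.
df = pd.json_normalize(records, record_path="battingSummary", meta=["match"])

# Rename columns to shorter, analysis-friendly names.
df = df.rename(columns={"batsmanName": "name"})

# New column: match ID, looked up from an assumed match-label-to-ID mapping.
match_ids = {"Team X Vs Team Y": "M001"}
df["match_id"] = df["match"].map(match_ids)

# New column: out/not out indicator (an empty or missing dismissal is treated as not out).
df["dismissal"] = df["dismissal"].fillna("")
df["out_not_out"] = df["dismissal"].apply(lambda d: "out" if d else "not_out")

# Missing values and data types: coerce text to numbers, fill gaps with 0 (an assumption).
for col in ["runs", "balls"]:
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)

# Flat CSV text, ready to save and load into Power BI.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Whether missing runs should become 0 or stay blank depends on how the averages are later computed, so that choice is worth revisiting against the real data.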
3. Building Dashboards in Power BI
Dashboard Features:
Categorized player selection (openers, anchors, fast bowlers)
Display player statistics: runs, strike rate, batting average
Filtering capabilities for selection criteria
Visualizations like scatter plots for performance comparisons
Problem Statement
Assembling a cricket team to defeat Planet Sporta
Required team performance metrics:
Average runs scored: 180
Runs to defend: 150
Parameters for Player Selection
Openers
Criteria:
Batting average
Strike rate
Boundary percentage
Target: 50 runs in the first 5 overs
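The three opener criteria follow the usual cricket definitions, sketched below in plain Python. The function names and sample numbers are illustrative, and boundary percentage is assumed here to mean the share of a batter's runs that came from fours and sixes.

```python
def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored per dismissal; infinite if the batter was never out."""
    return runs / dismissals if dismissals else float("inf")


def strike_rate(runs: int, balls_faced: int) -> float:
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls_faced if balls_faced else 0.0


def boundary_percentage(fours: int, sixes: int, runs: int) -> float:
    """Percentage of total runs scored in boundaries (assumed definition)."""
    boundary_runs = 4 * fours + 6 * sixes
    return 100 * boundary_runs / runs if runs else 0.0


# Made-up tournament totals for one batter.
avg = batting_average(runs=200, dismissals=5)
sr = strike_rate(runs=200, balls_faced=140)
bp = boundary_percentage(fours=20, sixes=8, runs=200)
print(f"average={avg:.1f}, strike rate={sr:.1f}, boundary %={bp:.1f}")
```

For context, 50 runs in the first 5 overs is 50 runs from 30 balls, so an opening pair would need a combined strike rate of roughly 167 to hit that target.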
Middle Order (Anchors)
Criteria:
Ability to shift gears and bat for a longer duration
Higher batting average
Average balls faced
Finishers
Criteria:
Ability to chase down targets and stabilize the innings
Preferably all-rounders with batting focus
All-rounders
Criteria:
Should be capable of hitting hard and also bowling effectively
Fast Bowlers
Criteria:
Bowling economy below 7
Bowling strike rate: a wicket roughly every 16 balls
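Both fast-bowler criteria can be computed from balls bowled, runs conceded, and wickets. The sketch below uses the standard definitions; the function names, the sample figures, and the choice to treat 16 as an inclusive upper bound are assumptions for illustration.

```python
def economy(runs_conceded: int, balls_bowled: int) -> float:
    """Runs conceded per over (6 balls)."""
    return runs_conceded / (balls_bowled / 6) if balls_bowled else float("inf")


def bowling_strike_rate(balls_bowled: int, wickets: int) -> float:
    """Balls bowled per wicket taken; infinite if no wickets."""
    return balls_bowled / wickets if wickets else float("inf")


def is_fast_bowler_candidate(runs_conceded: int, balls_bowled: int, wickets: int) -> bool:
    """Economy below 7 and a wicket at least every 16 balls (assumed thresholds)."""
    return (
        economy(runs_conceded, balls_bowled) < 7
        and bowling_strike_rate(balls_bowled, wickets) <= 16
    )


# Made-up figures for two bowlers.
bowler_1 = is_fast_bowler_candidate(runs_conceded=130, balls_bowled=120, wickets=8)
bowler_2 = is_fast_bowler_candidate(runs_conceded=128, balls_bowled=96, wickets=4)
print(bowler_1, bowler_2)
```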
Data Collection Process
Bright Data: Utilized for efficient data collection
Created multiple collectors for different data types
Example of data extraction process shown using JavaScript code for web scraping
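The project's collectors were written in JavaScript inside Bright Data. As a rough Python analogue of the same idea (walk a results table and emit one record per row), the sketch below parses an inline HTML fragment with the standard library. The HTML, column names, and team names are placeholders, not the real ESPNcricinfo markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder markup standing in for a scraped match results table.
html = """
<table>
  <tr><th>Team 1</th><th>Team 2</th><th>Winner</th><th>Margin</th></tr>
  <tr><td>Team X</td><td>Team Y</td><td>Team X</td><td>5 wickets</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
header, *body = parser.rows
results = [dict(zip(header, row)) for row in body]
print(results)
```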
Data Modeling in Power BI
Establishing relationships between tables based on match IDs and player names
Creating DAX measures for calculations (e.g., total runs, innings batted)
Calculated columns for derived metrics (e.g., boundary runs)
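The modeling itself happens in Power BI with DAX, but the same logic can be prototyped in pandas to sanity-check the numbers. The sketch below is only an analogue: merges stand in for table relationships, a derived column for the calculated column, and a grouped aggregation for the measures. All table contents and column names are made up.

```python
import pandas as pd

# Hypothetical fact table: one row per batter per match.
batting = pd.DataFrame({
    "match_id": ["M001", "M001", "M002", "M002"],
    "name": ["Player A", "Player B", "Player A", "Player B"],
    "runs": [30, 12, 45, 8],
    "fours": [3, 1, 4, 0],
    "sixes": [1, 0, 2, 1],
})

# Hypothetical lookup tables, related by match ID and player name.
matches = pd.DataFrame({"match_id": ["M001", "M002"], "ground": ["Ground 1", "Ground 2"]})
players = pd.DataFrame({"name": ["Player A", "Player B"], "playing_role": ["Opener", "Finisher"]})

# Analogue of a calculated column: runs scored in boundaries, row by row.
batting["boundary_runs"] = 4 * batting["fours"] + 6 * batting["sixes"]

# Analogue of relationships: join on match_id and on player name.
model = batting.merge(matches, on="match_id", how="left").merge(players, on="name", how="left")

# Analogue of measures: aggregates evaluated per player.
summary = model.groupby("name").agg(
    total_runs=("runs", "sum"),
    innings_batted=("match_id", "nunique"),
    boundary_runs=("boundary_runs", "sum"),
)
print(summary)
```

Joining on player names works only if names are spelled identically across tables, which is one reason the cleaning step matters.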
Dashboard Creation
Mock-up provided for initial design
Creating visuals based on player categories and performance metrics
Emphasized the importance of visual aesthetics and layout
Final Team Selection
Analysis of player performance using the created dashboard
Selected players based on statistical performance and roles needed
Emphasized the importance of pairing players effectively for optimal results
Challenge and Further Learning
Participants given an exercise to improve the dashboard and add new insights
Encouraged to share progress on platforms like LinkedIn for networking and visibility
Resources and codes provided for further exploration and practice
Conclusion
The project aims to provide hands-on experience in cricket data analytics through practical application of data scraping, cleaning, and visualization techniques.
Encouraged students to leverage the knowledge gained for real-world applications and potential career opportunities.