This guide covers the five key stages of the data analysis process.
It gives an overview of each stage and introduces some of the tools used to carry them out.
Step 1: Defining the Question
First step: Define your objective, also known as the problem statement.
Objectives: Formulate a hypothesis and figure out how to test it.
Key Question: What business problem am I trying to solve?
Example: A company like "Top Notch Learning" may need to explore why they have low repeat business by asking, "How can we boost customer retention whilst minimizing costs?"
Data analysts need to understand business goals thoroughly.
Tools:
Business metrics and KPIs (Key Performance Indicators).
Monthly reports.
Tools for business data analysis: Databox, Dasheroo, Grafana, Freeboard, Dashbuilder.
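To make the retention question from the example concrete, a KPI such as the repeat purchase rate can be computed from an order log. The following is a minimal sketch in Python: the order data and the function name repeat_purchase_rate are hypothetical, and the definition used here (share of customers with more than one order) is only one of several possible retention measures.

```python
from collections import Counter

# Hypothetical order log: one (customer_id, order_id) pair per purchase.
orders = [
    ("c1", "o1"), ("c1", "o2"),
    ("c2", "o3"),
    ("c3", "o4"), ("c3", "o5"), ("c3", "o6"),
    ("c4", "o7"),
]


def repeat_purchase_rate(orders):
    """Return the share of customers who placed more than one order."""
    counts = Counter(customer for customer, _ in orders)
    if not counts:
        return 0.0  # no customers, so no repeat business to measure
    repeat_customers = sum(1 for n in counts.values() if n > 1)
    return repeat_customers / len(counts)


rate = repeat_purchase_rate(orders)
print(f"Repeat purchase rate: {rate:.0%}")
```

Tracking a figure like this month over month gives the problem statement a measurable target.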
Step 2: Collecting the Data
Create a strategy for collecting and aggregating the right data.
Determine the type of data needed: Quantitative (numeric data) or Qualitative (descriptive data).
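The quantitative/qualitative distinction can be checked mechanically on a small sample before committing to a collection strategy. The sketch below assumes survey responses stored as Python dictionaries; the field names and the helper classify_fields are hypothetical and serve only as illustration.

```python
from numbers import Number

# Hypothetical survey responses mixing numeric and free-text fields.
responses = [
    {"satisfaction": 4, "spend": 120.50, "comment": "Great course"},
    {"satisfaction": 2, "spend": 35.00, "comment": "Too expensive"},
]


def classify_fields(records):
    """Label a field quantitative if every value is a number, else qualitative."""
    labels = {}
    for field in records[0]:
        values = [record[field] for record in records]
        # bool is a Number subclass in Python, so it is excluded explicitly.
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        labels[field] = "quantitative" if numeric else "qualitative"
    return labels


print(classify_fields(responses))
```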
Data Categories:
First-party data: Data collected directly from customers (e.g., transaction data, CRM data).
Second-party data: Another company’s first-party data, often structured and reliable.
Third-party data: Data collected and aggregated from multiple sources by a third-party organization; often includes big data.
Example data sources: Customer satisfaction surveys, focus groups, purchase history, shipping data.
Tools: Data Management Platforms (DMPs) like Salesforce DMP, SAS, Xplenty, Pimcore, D:Swarm.
Step 3: Cleaning the Data
Objective: Prepare data for analysis by cleaning it.
Key Tasks:
Remove errors, duplicates, and outliers.
Remove irrelevant observations.
Fix typos and layout issues.
Fill in major gaps.
Data analysts are often estimated to spend around 70-90% of their time cleaning data.
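Several of the key tasks above can be sketched with pandas. The example below is illustrative only: the data, the column names, and the clean function are hypothetical, and the specific choices (median fill for gaps, the 1.5 x IQR rule for outliers) are assumptions that should be adapted to the data at hand.

```python
import pandas as pd

# Hypothetical raw purchase records with typical quality problems:
# inconsistent casing and whitespace, a duplicate, a gap, and an outlier.
raw = pd.DataFrame({
    "customer": ["Ann", "ann ", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "course": ["Python", "Python", "SQL", "Excel", "SQL", "Python", "SQL"],
    "amount": [100.0, 100.0, 110.0, None, 95.0, 105.0, 5000.0],
})


def clean(df):
    """Return a cleaned copy of df; the input frame is left unchanged."""
    df = df.copy()
    # Fix typos and layout issues: trim whitespace and normalize casing.
    df["customer"] = df["customer"].str.strip().str.title()
    # Remove duplicates (rows identical in every column after normalization).
    df = df.drop_duplicates()
    # Fill gaps: replace missing amounts with the median (an assumption).
    df = df.assign(amount=df["amount"].fillna(df["amount"].median()))
    # Remove outliers: keep values within 1.5 * IQR of the quartiles.
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether an extreme value is an error or a genuine observation is a judgment call, so it is worth inspecting outliers before dropping them.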
Tools for cleaning data:
Open source: OpenRefine.
Coding tools: Python libraries (Pandas) and R packages.
Enterprise tools: Data Ladder.
Step 4: Analyzing the Data
Types of Analysis:
Descriptive Analysis: Identifies what has already happened.
Diagnostic Analysis: Understands why something has happened.
Predictive Analysis: Identifies future trends based on historical data.
Prescriptive Analysis: Makes recommendations for the future.
Choice of analysis technique depends on goal and types of insights needed.
Example techniques: univariate analysis, bivariate analysis, time series analysis, regression analysis.
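Three of these techniques can be illustrated on a tiny dataset using only the Python standard library. The figures below are made up, and the helper names pearson_r and fit_line are hypothetical; the regression is a simple least-squares line, shown as a sketch rather than a recommended modeling approach.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical monthly figures: reminder emails sent and repeat orders received.
emails_sent = [1, 2, 3, 4, 5]
repeat_orders = [12, 15, 15, 19, 20]

# Univariate (descriptive): summarize one variable on its own.
print("mean:", mean(repeat_orders), "median:", median(repeat_orders))


def pearson_r(xs, ys):
    """Bivariate: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(emails_sent, repeat_orders)
slope, intercept = fit_line(emails_sent, repeat_orders)
# Predictive use: extrapolate the fitted line to a sixth month.
forecast = slope * 6 + intercept
print(f"r = {r:.3f}, slope = {slope:.2f}, forecast = {forecast:.1f}")
```

A strong correlation here describes the sample only; it does not by itself show that sending more emails causes more repeat orders.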
Step 5: Sharing Your Results
Objective: Interpret and present insights to stakeholders.
Presentation of results should be clear and unambiguous.