📊

Customer Churn Prediction Overview

Jul 28, 2025

Overview

This lecture covers customer churn prediction using artificial neural networks, including data cleaning, feature engineering, building a model in TensorFlow, and evaluating model performance using metrics like accuracy, precision, recall, and F1 score.

Introduction to Customer Churn

  • Customer churn is when existing customers leave a business or service.
  • Predicting churn helps businesses identify at-risk customers and take action to retain them.
  • Deep learning models (artificial neural networks) can learn churn patterns from customer attributes and flag customers who are likely to leave.

Data Preparation & Cleaning

  • Use the telecom customer churn dataset, whose features include gender, tenure, monthly charges, total charges, and more.
  • Drop columns with no predictive value, such as the customer ID.
  • Convert columns that hold numbers stored as text (object dtype, e.g., total charges) to numeric; entries that fail to parse, such as blank strings, become missing values, and the affected rows are dropped.
  • Replace "no internet service" and "no phone service" with "no" for consistency.
  • Apply label encoding to binary categorical columns (yes/no → 1/0, male/female → 1/0).
  • Apply one-hot encoding to categorical columns with more than two categories using pandas get_dummies.
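A minimal pandas sketch of the cleaning steps above. It runs on a tiny hand-made DataFrame instead of the real file; the column names (customerID, TotalCharges, PhoneService, and so on) are assumed from the Kaggle Telco Customer Churn CSV, and every row value is invented for illustration.

```python
import pandas as pd

# Invented stand-in rows; column names are assumed from the Telco churn CSV.
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "gender": ["Female", "Male", "Male", "Female"],
    "tenure": [1, 34, 0, 45],
    "PhoneService": ["No", "Yes", "Yes", "Yes"],
    "MultipleLines": ["No phone service", "No", "Yes", "Yes"],
    "OnlineSecurity": ["No", "Yes", "No internet service", "Yes"],
    "InternetService": ["DSL", "DSL", "No", "Fiber optic"],
    "MonthlyCharges": [29.85, 56.95, 20.25, 99.65],
    "TotalCharges": ["29.85", "1889.5", " ", "4400.3"],
    "Churn": ["No", "No", "Yes", "Yes"],
})

# 1. Drop the identifier column: it carries no predictive signal.
df = df.drop(columns=["customerID"])

# 2. TotalCharges is stored as text; unparseable blanks become NaN, then those rows go.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna(subset=["TotalCharges"])

# 3. Collapse "No internet service" / "No phone service" into plain "No".
df = df.replace({"No internet service": "No", "No phone service": "No"})

# 4. Label-encode the binary columns.
for col in ["PhoneService", "MultipleLines", "OnlineSecurity", "Churn"]:
    df[col] = df[col].map({"Yes": 1, "No": 0})
df["gender"] = df["gender"].map({"Male": 1, "Female": 0})

# 5. One-hot encode columns with more than two categories.
df = pd.get_dummies(df, columns=["InternetService"], dtype=int)
print(df.columns.tolist())
```

For a binary column such as gender, which value maps to 1 does not matter as long as the mapping is applied consistently.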

Exploratory Data Analysis & Visualization

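  • Visualize tenure and monthly charges, split by churn status, to see how each relates to churn.
  • Histograms compare tenure (a proxy for loyalty) and monthly charges between customers who left and customers who stayed, showing how the likelihood of leaving varies with each.
  • Add a legend, distinct colors, and axis labels to make the charts easier to read.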
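One way to draw the tenure comparison with matplotlib, assuming a cleaned DataFrame with a numeric tenure column and a 0/1 Churn column; the values below are made up for illustration. Passing both groups to a single hist call draws their bars side by side on shared bins.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data: tenure in months and a 0/1 Churn label.
df = pd.DataFrame({
    "tenure": [1, 2, 5, 8, 12, 24, 36, 48, 60, 72, 3, 70],
    "Churn": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
})

tenure_churned = df.loc[df["Churn"] == 1, "tenure"]
tenure_stayed = df.loc[df["Churn"] == 0, "tenure"]

fig, ax = plt.subplots()
# Both groups share the same bins, so bar heights are directly comparable.
counts, bin_edges, _ = ax.hist(
    [tenure_churned, tenure_stayed], bins=6, color=["red", "green"],
    label=["Churn = Yes", "Churn = No"])
ax.set_xlabel("Tenure (months)")
ax.set_ylabel("Number of customers")
ax.set_title("Customer churn by tenure")
ax.legend()
fig.savefig("tenure_vs_churn.png")
```

The same pattern works for monthly charges by swapping in the MonthlyCharges column.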

Feature Scaling

  • Scale numerical features (tenure, monthly charges, total charges) to 0-1 using MinMaxScaler.
  • Neural networks trained with gradient-based optimizers tend to converge faster and more stably when input features share a similar range.
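A sketch of the scaling step with scikit-learn's MinMaxScaler, on invented rows that use the assumed column names tenure, MonthlyCharges, and TotalCharges.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows; column names are assumed from the telecom dataset.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, 1889.50, 108.15, 1840.75],
})
cols_to_scale = ["tenure", "MonthlyCharges", "TotalCharges"]

scaler = MinMaxScaler()  # default feature_range=(0, 1)
# Each column is mapped to (x - min) / (max - min), using that column's own min and max.
df[cols_to_scale] = scaler.fit_transform(df[cols_to_scale])
print(df)
```

Fitting the scaler on the full table before splitting uses test-set minima and maxima during training; fitting on the training split and calling transform on the test split avoids that leakage.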

Model Building with Neural Networks

  • Split the data into training and test sets (typically an 80/20 split).
  • Build a Sequential neural network in TensorFlow/Keras with an input layer, one or more hidden layers, and an output layer.
  • Use ReLU activation for the hidden layers and sigmoid for the single binary output.
  • Compile the model with binary cross-entropy loss and the Adam optimizer.
  • Start with a small number of epochs, then increase it if accuracy is still improving.
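The splitting and model-building steps might look like this in Keras. The data is random and synthetic so the block runs on its own; the feature count (26) and the hidden-layer sizes (26 and 15 units) are illustrative assumptions, not necessarily the tutorial's exact architecture, and five epochs stands in for a small starting value.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, scaled features X and 0/1 churn labels y.
rng = np.random.default_rng(5)
X = rng.random((500, 26)).astype("float32")
y = (X[:, 0] + X[:, 1] > 1.0).astype("float32")

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),             # one input per feature
    tf.keras.layers.Dense(26, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(15, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # churn probability in (0, 1)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Start with few epochs; raise the count if accuracy is still climbing.
history = model.fit(X_train, y_train, epochs=5, verbose=0)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {acc:.3f}")
```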

Model Evaluation

  • Evaluate the model on the test set; the model in this tutorial reaches about 80% accuracy.
  • Call model.predict and convert the sigmoid outputs (probabilities) to binary labels (0/1) using a 0.5 threshold.
  • Generate a confusion matrix to count true/false positives and negatives.
  • Calculate accuracy as (correct predictions) / (total predictions).
  • Compute precision and recall for both classes (churned and retained), along with the F1 score, and interpret what each means.
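A sketch of the evaluation steps, assuming y_prob holds the (n, 1) array that model.predict returns for a single sigmoid unit. The eight probabilities and labels are invented so the counts are easy to check by hand, and the hand-computed metrics are cross-checked against scikit-learn.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical sigmoid outputs (shape (n, 1), like model.predict) and true labels.
y_prob = np.array([[0.91], [0.12], [0.55], [0.30], [0.75], [0.08], [0.48], [0.66]])
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 1])

# Threshold at 0.5: probability above 0.5 -> churn (1), otherwise stay (0).
y_pred = (y_prob.ravel() > 0.5).astype(int)

# Rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_churn = tp / (tp + fp)  # of predicted churners, share that churned
recall_churn = tp / (tp + fn)     # of actual churners, share that were caught
f1_churn = 2 * precision_churn * recall_churn / (precision_churn + recall_churn)

# The same ideas for class 0 (customers who stay).
precision_stay = tn / (tn + fn)
recall_stay = tn / (tn + fp)

# Cross-check against scikit-learn (pos_label=0 scores the "stay" class).
sk_churn = (accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
            recall_score(y_true, y_pred), f1_score(y_true, y_pred))
sk_stay = (precision_score(y_true, y_pred, pos_label=0),
           recall_score(y_true, y_pred, pos_label=0))

print(cm)
print(f"accuracy={accuracy:.3f} precision(1)={precision_churn:.3f} "
      f"recall(1)={recall_churn:.3f} f1(1)={f1_churn:.3f}")
```

With these numbers, 5 of 8 predictions are correct (accuracy 0.625), 3 of the 4 predicted churners actually churned (precision 0.75), and 3 of the 5 actual churners were caught (recall 0.6).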

Key Terms & Definitions

  • Customer Churn: When a customer stops using a company's service.
  • Label Encoding: Converting binary categorical fields (e.g., yes/no) to numeric 1/0.
  • One-Hot Encoding: Converting a multi-category column into one binary column per category.
  • MinMaxScaler: Scales each feature to the [0, 1] range using that feature's minimum and maximum.
  • Confusion Matrix: Table of actual vs. predicted classes for a classification model.
  • Precision: Proportion of true positives among all predicted positives.
  • Recall: Proportion of true positives among all actual positives.
  • F1 Score: Harmonic mean of precision and recall.
  • Accuracy: Proportion of correct predictions (both positives and negatives).
  • Sigmoid Activation: Maps any input to the range (0, 1), so the output can be read as a probability for binary classification.
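The glossary entries for scaling, sigmoid, and the evaluation metrics correspond to these standard formulas, where TP, TN, FP, and FN are the confusion-matrix counts and x_min, x_max are the minimum and maximum of a feature column:

```latex
\begin{aligned}
x' &= \frac{x - x_{\min}}{x_{\max} - x_{\min}}, &
\sigma(z) &= \frac{1}{1 + e^{-z}}, \\[4pt]
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\[4pt]
\text{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```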

Action Items / Next Steps

  • Download and clean the bank customer churn dataset from the provided Kaggle link.
  • Build and evaluate a similar neural network model as shown in this tutorial.
  • Analyze accuracy, precision, and recall on the new dataset.
  • Review any referenced pandas, matplotlib, and one-hot encoding tutorials as needed.