Now that we've completed our exploratory data analysis (EDA) on the Telco customer churn dataset, it's time to move on to the next phase: model building for churn prediction.

Let's start by importing the necessary libraries and preparing our data for modeling. In this script we're importing pandas for data manipulation, as well as the necessary modules from scikit-learn for model building and evaluation. We'll start by splitting our dataset into features X and the target variable y, then further split the data into training and testing sets. Next, we'll train a decision tree classifier on the training data and evaluate its performance on the testing data.

To kick things off, let's load our pre-processed Telco customer churn dataset and take a quick look at the first few rows. In this script we're using pandas to read the CSV file that contains our pre-processed data, then displaying the first few rows to confirm that the data has been loaded correctly and to get a glimpse of its structure.

Next we're removing the "Unnamed: 0" column from our dataset using pandas. This column often appears when a DataFrame is saved to a CSV file without suppressing the index; it simply stores the index of the original DataFrame. Since we already have that index information, the column is redundant and can be dropped. We use the drop method to remove "Unnamed: 0" from the DataFrame df along the columns axis (axis=1), so the column is removed and the DataFrame is updated accordingly.

Now let's separate our features from the target variable in preparation for model training. We create a new DataFrame X containing only the features by dropping the Churn column (axis=1) from our original DataFrame df, so X holds all the features we'll use to predict customer churn. Then we extract the target variable by selecting the Churn column from df into a new variable y, which records whether a customer churned or not and will be our prediction target.

Before we proceed with model training, let's split our data into training and testing sets using the train_test_split function from scikit-learn. We pass in our features X and target variable y and specify the test_size parameter to determine the proportion of data that goes into the testing set, in this case 20%. Finally, we display the sizes of the training and testing sets to confirm that the split was successful.

Now let's set up a decision tree classifier for our training data. We initialize a DecisionTreeClassifier with a few hyperparameters: criterion is set to "gini", the impurity measure used to split nodes in the tree; random_state is fixed for reproducibility; max_depth controls the maximum depth of the tree; and min_samples_leaf sets the minimum number of samples required at a leaf node.
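As a point of reference, here is a minimal sketch of the data-preparation steps just described. The CSV file name, the random_state values, and the max_depth and min_samples_leaf settings are illustrative placeholders rather than values confirmed in the video.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the pre-processed dataset (the file name is a placeholder for the CSV
# produced at the end of the EDA stage).
df = pd.read_csv("telco_churn_preprocessed.csv")
print(df.head())

# Drop the redundant index column written out when the DataFrame was saved.
df = df.drop("Unnamed: 0", axis=1)

# Separate the features from the target variable.
X = df.drop("Churn", axis=1)
y = df["Churn"]

# Hold out 20% of the data for testing (random_state is an assumed value).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)

# Decision tree with gini impurity; depth and leaf-size values are illustrative.
model_dt = DecisionTreeClassifier(
    criterion="gini", random_state=100, max_depth=6, min_samples_leaf=8
)
```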
With our decision tree classifier initialized, let's now train it on the training data: model_dt.fit(X_train, y_train). The fit method trains the classifier on the training features X_train and the corresponding target variable y_train. This process constructs the decision tree by recursively partitioning the feature space based on the training data.

Now that our decision tree classifier is trained, let's use it to make predictions on the testing data: y_pred = model_dt.predict(X_test). The predict method generates predictions for the testing data X_test using our trained model, and these predictions will let us evaluate its performance on unseen data. Displaying y_pred shows, for each customer in the test set, whether the model predicts churn (1) or no churn (0) based on the features provided.

Next, let's evaluate the performance of the trained model on the testing data. We use the score method to calculate the accuracy of the model on X_test against the actual labels y_test; the accuracy score is the proportion of correctly classified instances out of all instances in the testing set.

Let's also take a closer look at the classification report for our decision tree classifier. We use the classification_report function from scikit-learn to generate a detailed report for the model's predictions, which includes metrics such as precision, recall, F1 score, and support for both the positive (churned) and negative (non-churned) classes.

In situations where we have imbalanced data, meaning one class is significantly more prevalent than the other, resampling techniques can be employed to address the imbalance. Here we use the SMOTEENN algorithm from the imblearn.combine module, which combines the Synthetic Minority Oversampling Technique (SMOTE) with Edited Nearest Neighbours (ENN): SMOTE generates synthetic samples for the minority class, while ENN removes noisy samples by editing the dataset. In this script we first import SMOTEENN from imblearn.combine, then initialize the resampler, and finally resample the features X and target variable y with the fit_resample method, which performs the resampling operation.
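Below is a compact sketch of the training, evaluation, and resampling steps above. It assumes the model_dt, X, y, X_train, X_test, y_train, and y_test objects from the previous sketch, and that the imbalanced-learn package is installed for SMOTEENN.

```python
from sklearn.metrics import classification_report
from imblearn.combine import SMOTEENN

# Train the decision tree and predict on the held-out test set.
model_dt.fit(X_train, y_train)
y_pred = model_dt.predict(X_test)
print(y_pred)

# Overall accuracy plus per-class precision, recall, and F1.
print(model_dt.score(X_test, y_test))
print(classification_report(y_test, y_pred))

# Address class imbalance: SMOTE oversamples the minority class,
# ENN removes noisy samples.
sm = SMOTEENN()
X_resampled, y_resampled = sm.fit_resample(X, y)
```

Note that fit_resample is the method name in current imbalanced-learn releases; older versions also exposed a fit_sample alias.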
After resampling our data to address class imbalance, let's split the resampled data into training and testing sets. We again use the train_test_split function from scikit-learn, this time on the resampled features X_resampled and the resampled target variable y_resampled, with the test_size parameter set to 20%.

To continue with the resampled data, let's initialize a decision tree classifier with the same hyperparameters as before. This model will be trained on the resampled training data to address class imbalance and improve predictive performance. We train it on the resampled training data xr_train and the corresponding resampled target variable yr_train, then make predictions on the resampled testing data xr_test, calculate the accuracy score on that test set, and print it. Finally, we print the classification report for the resampled testing data, which includes metrics such as precision, recall, and F1 score.

Let's also look at the confusion matrix for the decision tree classifier trained on the resampled data. We use the confusion_matrix function from scikit-learn to generate the confusion matrix for the resampled testing data. The confusion matrix gives a tabular view of the true positive, false positive, true negative, and false negative predictions made by the model, and printing it lets us assess the model's performance in detail.

Now let's explore training a random forest classifier on our data. We initialize a RandomForestClassifier with specified hyperparameters such as the number of estimators, the criterion, the maximum depth, and the minimum samples per leaf. We train the model on the training data X_train and target variable y_train, make predictions on the testing data X_test, and calculate the accuracy score on the test set. Finally, we print the classification report for the testing data, with precision, recall, and F1 score for both classes.

To address class imbalance for the random forest as well, let's use the SMOTEENN algorithm to resample our data again. We initialize a SMOTEENN object and apply its fit_resample method to the features X and target variable y. After resampling, we split the resampled features X_resampled1 and resampled target variable y_resampled1 into training and testing sets with train_test_split, again specifying a test_size of 20%.
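The sketch below walks through these steps, continuing from the previous sketches. The model variable names and the random forest hyperparameter values are illustrative assumptions rather than the exact settings used in the video.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.combine import SMOTEENN

# Split the SMOTEENN-resampled data 80/20.
xr_train, xr_test, yr_train, yr_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2
)

# Decision tree with the same hyperparameters, now on balanced data.
model_dt_smote = DecisionTreeClassifier(
    criterion="gini", random_state=100, max_depth=6, min_samples_leaf=8
)
model_dt_smote.fit(xr_train, yr_train)
yr_pred = model_dt_smote.predict(xr_test)
print(model_dt_smote.score(xr_test, yr_test))
print(classification_report(yr_test, yr_pred))
print(confusion_matrix(yr_test, yr_pred))

# Random forest on the original (imbalanced) split for comparison.
model_rf = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=100,
    max_depth=6, min_samples_leaf=8
)
model_rf.fit(X_train, y_train)
print(model_rf.score(X_test, y_test))
print(classification_report(y_test, model_rf.predict(X_test)))

# Resample again for the random forest and split 80/20.
X_resampled1, y_resampled1 = SMOTEENN().fit_resample(X, y)
xr_train1, xr_test1, yr_train1, yr_test1 = train_test_split(
    X_resampled1, y_resampled1, test_size=0.2
)
```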
Now let's initialize a random forest classifier with the same hyperparameters as before, but this time for the resampled data. This model will be trained on the resampled training data to address class imbalance and improve predictive performance.

With the random forest classifier initialized, let's train it on the resampled training data. We use the fit method on xr_train1 and the corresponding resampled target variable yr_train1. This process builds multiple decision trees on bootstrap samples of the resampled data and averages their predictions to improve generalization. Now that the random forest is trained on the resampled data, we use the predict method to generate predictions for the resampled testing data xr_test1, which lets us evaluate the model on unseen resampled data.

Let's calculate the accuracy score of this random forest classifier. We use the score method on the resampled testing data xr_test1 and the corresponding actual labels yr_test1; the accuracy score is the proportion of correctly classified instances out of all instances in the resampled testing set. To take a closer look at performance, we print the accuracy score along with the classification report for the resampled testing data, which includes precision, recall, and F1 score for both classes. We also examine the confusion matrix, generated with the confusion_matrix function from scikit-learn, which tabulates the true positive, false positive, true negative, and false negative predictions and lets us assess the model in detail.

To further enhance our model training process, let's apply principal component analysis (PCA) for dimensionality reduction. We use the PCA class from scikit-learn, initialized with the parameter 0.9 to indicate that we want to retain 90% of the variance in the original data. We then apply PCA to the resampled training and testing data to reduce their dimensions while preserving most of the variance, and retrieve the explained variance ratio, which gives the proportion of variance explained by each principal component.

Next we initialize a random forest classifier, with the same kind of hyperparameters as before (number of estimators, criterion, maximum depth, and minimum samples per leaf), to train on the PCA-transformed data and explore its performance in predicting churn. We train it with the fit method on the PCA-transformed training data xr_train_pca and the corresponding resampled target variable yr_train1, again building multiple decision trees on bootstrap samples and averaging their predictions. We then use the predict method on the PCA-transformed testing data xr_test_pca and calculate the accuracy score against the actual labels yr_test1, the proportion of correctly classified instances in the PCA-transformed testing set. Finally, we print the accuracy score and the classification report for the PCA-transformed testing data, with precision, recall, and F1 score for both classes.

To save our trained random forest classifier for future use, we serialize it using the pickle module. We import pickle, define the file name model.sav, and use the pickle.dump function to serialize and save the trained model model_rf_smote to that file, opened in binary write mode ("wb"). To use the saved model in future applications, we load it back into memory with the pickle.load function, reading model.sav in binary read mode ("rb"). Now that the saved model is loaded, we evaluate its performance with the score method on the resampled testing data xr_test1 and the corresponding actual labels yr_test1.
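Here is a sketch of these final modeling and persistence steps, continuing from the earlier sketches. The names model_rf_smote, xr_train_pca, and model_rf_pca, the hyperparameter values, and the choice to fit PCA on the training split and then reuse that projection on the test split are assumptions made for illustration; the narration only states that PCA is applied to the resampled training and testing data.

```python
import pickle
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Random forest on the resampled data (hyperparameter values are illustrative).
model_rf_smote = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=100,
    max_depth=6, min_samples_leaf=8
)
model_rf_smote.fit(xr_train1, yr_train1)
yr_pred1 = model_rf_smote.predict(xr_test1)
print(model_rf_smote.score(xr_test1, yr_test1))
print(classification_report(yr_test1, yr_pred1))
print(confusion_matrix(yr_test1, yr_pred1))

# PCA keeping 90% of the variance; fit on the training split,
# then apply the same projection to the test split.
pca = PCA(0.9)
xr_train_pca = pca.fit_transform(xr_train1)
xr_test_pca = pca.transform(xr_test1)
print(pca.explained_variance_ratio_)

# Random forest on the PCA-transformed features.
model_rf_pca = RandomForestClassifier(
    n_estimators=100, criterion="gini", random_state=100,
    max_depth=6, min_samples_leaf=8
)
model_rf_pca.fit(xr_train_pca, yr_train1)
yr_pred_pca = model_rf_pca.predict(xr_test_pca)
print(model_rf_pca.score(xr_test_pca, yr_test1))
print(classification_report(yr_test1, yr_pred_pca))

# Persist the resampled-data random forest with pickle and reload it.
filename = "model.sav"
with open(filename, "wb") as f:
    pickle.dump(model_rf_smote, f)
with open(filename, "rb") as f:
    load_model = pickle.load(f)
print(load_model.score(xr_test1, yr_test1))
```

Re-scoring the reloaded model on xr_test1 and yr_test1 should reproduce the accuracy reported in the video.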
With an accuracy score of 92.5%, the random forest classifier trained on the resampled data demonstrates strong performance in predicting churn. Through thorough exploration, model training, and evaluation, we've gained valuable insights into customer churn prediction.

Thank you for joining us on this journey. Remember, understanding churn patterns empowers businesses to make informed decisions and retain valuable customers. If you found this video helpful, don't forget to subscribe to our channel for more insightful content on data science and machine learning, and feel free to leave a comment below with your thoughts or questions. We'd love to hear from you.