Targeting Sales Campaign with Machine Learning

Abhishek Sharma
Jan 12, 2021

Sales campaigns through cold calling are highly ineffective. We will look at a machine learning prototype in which a bank can monetize its past sales campaign data to understand the factors behind successful calls and sales conversions, which can then be used to better qualify prospects for future campaigns.

We will follow the structure below:

  1. Data Loading and preparation
  2. Exploratory Data Analysis
  3. Dataset splitting into train & test
  4. Modelling the data through multiple algorithms
  5. Comparing the performance of models and conclusion

Step 1: This dataset provides information on a previous sales campaign for cross-selling personal loans to the bank’s liability customers.

Reading the Data
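The loading step could look roughly like the sketch below. The file name is not given in this article, so the example reads a small in-memory sample instead; the sample rows, the exact column headers and the helper name `load_campaign_data` are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the features described in this article.
# The real file name and exact column headers are assumptions.
SAMPLE_CSV = """ID,Age,Experience,Income,Family,CCAvg,Education,Mortgage,Personal Loan,Securities Account,CD Account,Online,CreditCard
1,25,1,49,4,1.6,1,0,0,1,0,0,0
2,45,19,34,3,1.5,1,0,0,1,0,0,0
3,39,15,11,1,1.0,1,0,0,0,0,0,0
4,35,9,100,1,2.7,2,0,0,0,0,0,0
5,34,9,180,1,8.9,3,0,1,0,0,0,0
"""


def load_campaign_data(source):
    """Read the campaign data from a file path or a file-like object."""
    df = pd.read_csv(source)
    # Strip stray whitespace from the column headers.
    df.columns = df.columns.str.strip()
    return df


df = load_campaign_data(io.StringIO(SAMPLE_CSV))
print(df.shape)                 # rows and columns
print(df.isnull().sum().sum())  # total number of missing values
```

With the real data, the same function would be called with the path to the CSV file in place of the in-memory buffer.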

Step 2: Input Features

Continuous Variables
Customer ID: Can be ignored, as it has no effect on the model. It is only an identifier used to keep the records in serial order.
Age: Age of the customer
Experience: Years of work experience of the customer
Income: Annual income of the customer, in thousands of dollars
CCAvg: Average monthly spending on credit cards, in thousands of dollars
Mortgage: Value of the house mortgage, if any, in thousands of dollars

Binary Categorical Variables:
CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
Securities Account: Does the customer have a securities account with the bank?
Online: Does the customer use the bank’s online banking facility?
Credit Card: Does the customer have a credit card issued by the bank?
Personal Loan: This is our target variable, which we have to predict. It indicates whether or not the customer has taken the loan.

Ordinal Categorical Variables:
Family: Number of family members of the customer
Education: Education level of the customer. In our dataset it ranges from 1 to 3, corresponding to Undergraduate, Graduate and Postgraduate respectively.
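Given the feature descriptions above, separating the inputs from the target might be sketched as follows. The tiny data frame and its column names are placeholders chosen for illustration, not the article's actual data.

```python
import pandas as pd

# Hypothetical miniature version of the dataset (column names assumed).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [25, 45, 39, 34],
    "Income": [49, 34, 11, 180],
    "CD Account": [0, 0, 0, 1],
    "Personal Loan": [0, 0, 0, 1],
})

# The ID carries no predictive information, so it is dropped together
# with the target column when building the feature matrix.
X = df.drop(columns=["ID", "Personal Loan"])
y = df["Personal Loan"]

print(list(X.columns))
```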

Step 3: Exploratory Data Analysis

CD Account can be a good predictor of Personal Loan purchase.
Family size doesn’t seem to have much predictive power; however, families of 3 or 4 are more likely to take a personal loan than families of 1 or 2.
Customers who have a credit card and higher monthly credit card spending are more likely to take a loan.
Customers who have taken a personal loan have higher income than those who did not, so high income seems to be a good predictor of whether a customer will take a personal loan.
Securities Account doesn’t seem to be a predictor.
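One simple way to check observations like these is to compare the loan acceptance rate across the levels of a categorical feature. The sketch below uses made-up values purely to show the mechanics; the numbers are not taken from the article's dataset.

```python
import pandas as pd

# Made-up values for illustration only (column names assumed).
df = pd.DataFrame({
    "CD Account":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Personal Loan": [0, 0, 0, 1, 1, 1, 0, 1],
})

# The mean of a 0/1 target within each group is the share of customers
# in that group who accepted the loan.
rate_by_cd = df.groupby("CD Account")["Personal Loan"].mean()
print(rate_by_cd)
```

A large gap between the groups suggests the feature carries signal; a similar rate in every group, as described for Securities Account, suggests it does not.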

Bivariate Analysis:

Correlation Heatmap

Inference from Correlation:
‘Experience’ and ‘Age’ have a strong positive association.
‘Income’ and ‘CCAvg’ have a moderate positive association.
‘Mortgage’ is moderately correlated with ‘Income’.
‘Personal Loan’ has its highest correlations with ‘Income’, ‘CCAvg’ and ‘CD Account’.
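The numbers behind a correlation heatmap can be computed with pandas alone. The following is a minimal sketch on made-up values (column names assumed); the resulting matrix could then be passed to any plotting library to draw the heatmap.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Age":           [25, 35, 45, 55, 65, 30],
    "Experience":    [1, 10, 20, 30, 40, 5],
    "Income":        [40, 60, 150, 80, 200, 45],
    "Personal Loan": [0, 0, 1, 0, 1, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of every feature with the target, strongest first.
target_corr = (
    corr["Personal Loan"]
    .drop("Personal Loan")
    .sort_values(ascending=False)
)
print(target_corr)
```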

Step 4: Split the data into train and test

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, shuffle=True, stratify=y)
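The `stratify=y` argument matters because loan acceptors are typically a small minority: it keeps the share of positives the same in both splits. Here is a small sketch on synthetic data (the 10% positive rate is an assumption for the demo, not a figure from the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 customers, 10% of whom accepted the loan.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 100 + [0] * 900)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True, stratify=y
)

# With stratification both splits keep roughly the same positive rate.
print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())
```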

Step 5: Model Building

Model 1: Logistic Regression

Classification Report for Logistic Regression
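A classification report of this kind can be produced with scikit-learn as sketched below. The data here is synthetic (generated with `make_classification`), and the hyperparameters are assumptions, so the printed numbers will not match the article's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the campaign data.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# max_iter is raised so the solver has room to converge.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

# Per-class precision, recall and F1, plus overall accuracy.
report = classification_report(y_test, y_pred)
print(report)
```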

We will also build KNN and Naive Bayes models to compare and identify the most suitable model.

Model 2: KNN (K nearest neighbours)

Since KNN is a distance-based algorithm, the features must be scaled before modelling.

Scaling the input features
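The article does not say which scaler was used; the sketch below assumes standardization with scikit-learn's `StandardScaler`. The scaler is fitted on the training data only and then applied to the test data, so no information from the test set leaks into preprocessing. The values are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up [Age, Income] rows; the two columns live on different scales.
X_train = np.array([[25, 40.0], [35, 60.0], [45, 150.0], [55, 80.0]])
X_test = np.array([[30, 100.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse those statistics for the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 per column
print(X_train_scaled.std(axis=0))   # approximately 1 per column
```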
Check the accuracy for different odd values of K. Maximum accuracy is achieved at K = 3.

Variation of misclassification error with different values of K
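The search over odd values of K can be sketched as a simple loop that records the misclassification error (1 minus accuracy) for each K. The data is synthetic and the range of K values is an assumption, so the best K found here need not be 3.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the campaign data.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale using statistics from the training data only.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Odd values of K avoid ties in the majority vote for a binary target.
k_values = list(range(1, 20, 2))
errors = {}
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_s, y_train)
    errors[k] = 1 - knn.score(X_test_s, y_test)  # misclassification error

best_k = min(errors, key=errors.get)
print(best_k, errors[best_k])
```

This mirrors the approach of checking accuracy on the held-out data; choosing K by cross-validation on the training set is a common alternative that keeps the test set untouched.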

Classification Report for KNN at K = 3. As we can see, KNN improves on Logistic Regression in both precision and recall.

Model 3: Naive Bayes Algorithm

Classification Report for Naive Bayes
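The article does not state which Naive Bayes variant was used. Since most inputs are numeric, the sketch below assumes the Gaussian variant (`GaussianNB`) and runs it on synthetic data, so the printed report is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for the campaign data.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Gaussian Naive Bayes models each feature as normally distributed
# within each class and treats features as independent given the class.
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)
```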

Step 6: Comparison of Performance of Models:

Hence, among the three algorithms applied to the underlying dataset, KNN would be the best choice for predicting which customers will accept the personal loan, with an accuracy of 96%, precision of 92% and recall of 66%.
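To make these three metrics concrete, the sketch below computes them from confusion-matrix counts. The counts are hypothetical, picked only so that the results land near the figures quoted above; they are not the article's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the customers predicted to accept, how many did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who accepted, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for 1000 test customers, 100 of whom accepted.
acc, prec, rec = classification_metrics(tp=66, fp=6, fn=34, tn=894)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Read this way, high precision means few calls are wasted on customers who decline, while a recall of about two thirds means roughly one in three likely acceptors would still be missed.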