Sales campaigns run through cold calling are highly ineffective. We will look at a machine learning prototype in which a bank monetizes its past sales campaign data to understand the factors behind successful calls and sales conversions, which can then be used to better qualify prospects for future campaigns.
We will follow the structure below:
- Data loading and preparation
- Exploratory data analysis
- Splitting the dataset into train and test
- Modelling the data with multiple algorithms
- Comparing the performance of the models and conclusion
Step 1: This dataset provides information on a previous sales campaign for cross-selling personal loans to the bank's liability customers.
Step 2: Input Features
Customer ID: The Customer ID variable can be ignored because it will not have any effect on our model. It only serves to keep the records in serial order.
Age: Age of the customer
Experience: Years of work experience the customer has
Income: Annual income of the customer, in thousands of dollars
CCAvg: Average monthly spending on credit cards, in thousands of dollars
Mortgage: Value of the house mortgage, if any, in thousands of dollars
Binary categorical variables:
CD Account: Does the customer have a CD account with the bank or not?
Security Account: Does the customer have a security account with the bank or not?
Online: Does the customer use the bank's online banking facility or not?
Credit Card: Does the customer have a credit card issued by the bank or not?
Personal Loan: This is our target variable, which we have to predict. It indicates whether the customer has taken the loan or not.
Ordinal categorical variables:
Family: Number of family members of the customer
Education: Education level of the customer. In our dataset it ranges from 1 to 3, which correspond to Undergraduate, Graduate and Postgraduate respectively.
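The preparation implied by the feature list (drop the identifier, separate the target) can be sketched as follows. The rows are made up for illustration, and the column names are assumptions taken from the labels above; the real file may spell them differently.

```python
import pandas as pd

# Tiny made-up sample; column names mirror the feature list and are assumptions
df = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4],
    "Age": [25, 45, 39, 35],
    "Experience": [1, 19, 15, 9],
    "Income": [49, 34, 11, 100],
    "CCAvg": [1.6, 1.5, 1.0, 2.7],
    "Mortgage": [0, 0, 0, 0],
    "Family": [4, 3, 1, 1],
    "Education": [1, 1, 1, 2],
    "CD Account": [0, 0, 0, 0],
    "Security Account": [1, 1, 0, 0],
    "Online": [0, 0, 0, 0],
    "Credit Card": [0, 0, 0, 0],
    "Personal Loan": [0, 0, 0, 1],
})

# Drop the identifier, then separate the predictors from the target
X = df.drop(columns=["Customer ID", "Personal Loan"])
y = df["Personal Loan"]

print(X.columns.tolist())
```

Education is left as the integers 1 to 3 here, which preserves its ordering without extra encoding.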
Step 3: Exploratory Data Analysis
Inference from Correlation:
‘Experience’ and ‘Age’ have a strong positive association.
‘Income’ and ‘CCAvg’ have a moderate positive association.
‘Mortgage’ is moderately correlated with ‘Income’.
‘Personal Loan’ has its highest correlations with ‘Income’, ‘CCAvg’ and ‘CD Account’.
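Inferences like these are usually read off a correlation matrix. Below is a minimal sketch using pandas on synthetic data; the values are generated only so that the example runs, and they are not the bank's data. The threshold used to build the fake target is likewise an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic columns: Experience is built to track Age closely,
# and the fake target depends only on Income
age = rng.integers(23, 68, size=n)
experience = age - 23 + rng.integers(-2, 3, size=n)
income = rng.gamma(shape=2.0, scale=35.0, size=n)
loan = (income > 120).astype(int)

df = pd.DataFrame({
    "Age": age,
    "Experience": experience,
    "Income": income,
    "Personal Loan": loan,
})

# Pearson correlation between every pair of columns
corr = df.corr()

# Rank the features by absolute correlation with the target
target_corr = (
    corr["Personal Loan"].drop("Personal Loan").abs().sort_values(ascending=False)
)
print(corr.round(2))
print(target_corr.round(2))
```

Passing `corr` to a heatmap function is a common way to visualise the same matrix.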
Step 4: Split the data into train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, shuffle=True, stratify=y)
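The `stratify=y` argument matters when the target is imbalanced, because it keeps the share of positive cases the same in both splits. A small sketch on made-up data (the 10% positive rate is an assumption chosen for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Made-up data: 1000 rows, 10% positives
X = rng.normal(size=(1000, 3))
y = np.array([1] * 100 + [0] * 900)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, shuffle=True, stratify=y
)

# With stratification, both splits keep roughly the original class ratio
print("train positive rate:", y_train.mean())
print("test positive rate:", y_test.mean())
```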
Step 5: Model Building
Model 1: Logistic Regression
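A minimal sketch of fitting a logistic regression with scikit-learn is shown below. It uses synthetic data from `make_classification` as a stand-in for the bank dataset, so the scores it prints will not match the ones reported in this article; `max_iter=1000` is an assumption made to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the real data (about 10% positives)
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the model on the training split and predict on the held-out split
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

# Per-class precision, recall and F1
print(classification_report(y_test, y_pred, zero_division=0))
```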
We will also build KNN and Naive Bayes models to compare the results and identify the most suitable model.
Model 2: KNN (K-nearest neighbours)
Since KNN is a distance-based algorithm, scaling must be performed before modelling.
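One way to apply the scaling step is to chain a scaler and the classifier in a scikit-learn pipeline, so the scaler is fitted on the training split only. The sketch below assumes `StandardScaler`; the article does not say which scaler was used, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The scaler learns mean and variance from the training data only,
# then the same transformation is applied to the test data at predict time
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

print("test accuracy:", knn.score(X_test, y_test))
```

Without scaling, a feature measured in large units would dominate the distance calculation and drown out the others.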
Variation of misclassification error with different values of K
Classification report for KNN at k=3. As we can see, KNN improves on Logistic Regression in both precision and recall.
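A curve like this can be produced by refitting KNN for a range of K values and recording the misclassification error (1 minus accuracy) each time. The sketch below uses synthetic data and an assumed range of odd K values from 1 to 19; the range used for the original plot is not stated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

k_values = list(range(1, 20, 2))  # odd values avoid tied votes in a binary problem
errors = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    errors.append(1.0 - model.score(X_test_s, y_test))  # misclassification error

best_k = k_values[int(np.argmin(errors))]
print("K with the lowest error:", best_k)
```

Plotting `errors` against `k_values` gives the curve. In practice the sweep is often run with cross-validation on the training data, so that the test set stays untouched until the final evaluation.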
Model 3: Naive Bayes Algorithm
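A minimal Naive Bayes sketch follows. It assumes the Gaussian variant (`GaussianNB`), which is a common choice for continuous features such as income; the article does not state which variant was used, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# GaussianNB models each feature as normally distributed within each class
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))
```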
Step 6: Comparison of the performance of the models:
Hence, among the three algorithms applied to the underlying dataset, KNN would be the best choice for predicting which customers will accept the personal loan, with an accuracy of 96%, a precision of 92% and a recall of 66%.
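A comparison of this kind can be sketched by fitting the three models on the same split and collecting accuracy, precision and recall for each. The data below is synthetic, so the printed numbers will differ from the figures quoted above; the model settings are assumptions as well.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN (k=3)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the same training split and score it on the same test split
results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because only a small share of customers accept the loan, precision and recall on the positive class are more informative than accuracy alone.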