Employee Retention Predictive Analysis - Human Resources Data Set
Client Details:
The HR department of a large software company that follows a retroactive retention process, with the exit interview as the only source of insight.
The client wants to roll out a new initiative: Proactive Retention (for permanent, non-temporary employees only).
Business Problem:
Haphazard approach, as the exit interview depends heavily on the skill of the interviewer.
No systematic aggregation of insights across employees who have left.
Difficulty in being proactive, as policy changes are driven only by exit interviews held after the employee has left the organisation.
Objective of the study:
Predict whether an employee is likely to leave, and provide insights so the business can plan its retention strategy accordingly.
Data Acquisition & Deliverables:
Data was obtained from the HR department in CSV format.
Build a classification model using that dataset.
Deliverable: a predictive machine learning API
Machine learning task: classification
Target variable: status (employed/left)
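As a minimal sketch of the classification setup described above, the CSV can be read and the status column mapped to a binary target. The column names, the sample rows and the label strings ("Employed", "Left") are assumptions for illustration; the actual HR export may differ.

```python
import csv
import io

# Hypothetical sample standing in for the HR CSV export; values are made up.
SAMPLE_CSV = """satisfaction,n_projects,avg_monthly_hours,status
0.82,4,180,Employed
0.11,7,290,Left
0.65,3,160,Employed
"""


def load_features_and_target(handle, target_column="status", positive_label="Left"):
    """Split CSV rows into feature dicts and a 0/1 target (1 = left)."""
    features, target = [], []
    for row in csv.DictReader(handle):
        label = row.pop(target_column).strip()
        target.append(1 if label == positive_label else 0)
        features.append(row)  # remaining columns are the features (still strings)
    return features, target


X, y = load_features_and_target(io.StringIO(SAMPLE_CSV))
print(y)
```

Encoding "left" as the positive class (1) keeps recall and precision focused on the employees the retention initiative is trying to identify.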
Data Preparation:
The dataset contains 14249 rows and 10 columns (features).
The observations span 12 different departments.
After cleaning the records, 79.83% of the original data is retained for analysis.
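The retained share can be checked with a short calculation. The sketch below assumes that cleaning means dropping exact duplicate rows and rows belonging to temporary staff; the cleaning steps are not listed here, and the department label "temp" and the sample rows are hypothetical.

```python
def clean_records(rows, temp_label="temp"):
    """Drop exact duplicates and temporary-staff rows (assumed cleaning steps)."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or row.get("department") == temp_label:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def retained_percent(n_original, n_cleaned):
    """Percentage of the original rows that survive cleaning."""
    return 100.0 * n_cleaned / n_original


# Hypothetical rows, not taken from the HR dataset.
rows = [
    {"department": "sales", "tenure": 3},
    {"department": "sales", "tenure": 3},   # exact duplicate
    {"department": "temp", "tenure": 1},    # temporary staff
    {"department": "support", "tenure": 5},
    {"department": "engineering", "tenure": 2},
]
cleaned = clean_records(rows)
print(len(cleaned), retained_percent(len(rows), len(cleaned)))
```

Applied to the figures above, 79.83% of 14249 rows corresponds to roughly 11375 retained rows.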
Data Visualisation: Descriptive Analysis
For Tenure: Average time spent by employees in the organisation is 3.5 years
Median is 3 years
SD is 1.46 years
For Employee Status: At present 10857 are employed
3392 have left the organisation
For Departments: The sales department employs the largest number of people, followed by engineering and support
Temporary staff form the smallest group, which is why the analysis focuses on predicting retention for permanent staff
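The descriptive figures above are standard summary statistics. A minimal sketch using Python's statistics module follows; the tenure sample is hypothetical, while the status counts are the ones reported above.

```python
import statistics

# Hypothetical tenure values in years; not taken from the HR dataset.
tenure_years = [2, 3, 3, 4, 5, 3, 2, 6]

mean_tenure = statistics.mean(tenure_years)
median_tenure = statistics.median(tenure_years)
sd_tenure = statistics.stdev(tenure_years)  # sample standard deviation

# Status counts as reported in this analysis.
status_counts = {"employed": 10857, "left": 3392}
total = sum(status_counts.values())
share_left = 100.0 * status_counts["left"] / total

print(mean_tenure, median_tenure, round(sd_tenure, 2))
print(total, round(share_left, 1))
```

Reporting the median next to the mean is a cheap check for skew: a mean noticeably above the median suggests a tail of long-tenured employees.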
Exploratory Analysis:
Top 3 most important features impacting employee retention:
N_projects
Satisfaction
Avg_monthly_hours
3 least important features impacting employee retention:
Department
Filed_complaint
Recently_promoted
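How the feature ranking above was computed is not stated. One common, model-agnostic option is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch below is an illustration of that technique only; the toy model, the generated rows and the lowercase column names are all hypothetical.

```python
import random


def toy_model(row):
    """Stand-in classifier: predicts 1 (left) from satisfaction and n_projects only."""
    return 1 if row["satisfaction"] < 0.4 or row["n_projects"] > 5 else 0


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, feature, rng):
    """Accuracy drop after shuffling one feature column across rows."""
    baseline = accuracy(model, rows, labels)
    shuffled_values = [r[feature] for r in rows]
    rng.shuffle(shuffled_values)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return baseline - accuracy(model, permuted, labels)


rng = random.Random(0)
departments = ["sales", "engineering", "support"]
# Hypothetical rows; labels come from the toy model, so baseline accuracy is 1.0.
rows = [
    {
        "satisfaction": rng.random(),
        "n_projects": rng.randint(2, 7),
        "department": rng.choice(departments),
    }
    for _ in range(200)
]
labels = [toy_model(r) for r in rows]

importance = {
    name: permutation_importance(toy_model, rows, labels, name, rng)
    for name in ("satisfaction", "n_projects", "department")
}
for name, drop in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {drop:.3f}")
```

Because the toy model never reads the department column, shuffling it changes nothing and its importance is exactly zero, which mirrors how a low-impact feature would appear in such a ranking.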
Model Evaluation Metrics (all values in %):
Accuracy: 97.6
Misclassification rate / error rate: 2.35
True positive rate (sensitivity or recall): 94.17
False positive rate: 1.25
Specificity: 98.75 (100 minus the false positive rate)
Precision: 95.9
Prevalence: 24.12
F1 Score: 95.04
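All of the metrics listed above derive from the four cells of a binary confusion matrix, with "left" as the positive class. The sketch below shows the standard formulas; the counts passed in are hypothetical and are not the confusion matrix of the reported model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the standard rates from confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # true positive rate / sensitivity
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "prevalence": (tp + fn) / total,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts chosen only to exercise the formulas.
m = classification_metrics(tp=90, fp=5, tn=295, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}")
```

Two identities are useful as sanity checks on any reported set of metrics: accuracy and error rate sum to 100%, and specificity and false positive rate sum to 100%.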