Employee Retention Predictive Analysis - Human Resources Data Set

November 6, 2019
Client Details:
HR department at a large software company that follows a retroactive retention process, with the exit interview being the only source of insight. The client wants to roll out a new initiative: Proactive Retention (for permanent, non-temporary employees only).

Business Problem:
The current approach is haphazard, as the exit interview depends heavily on the skill of the interviewer.
There is no systematic aggregation of insights across employees who have left.
It is difficult to be proactive, as exit interviews drive policy changes only after the employee has left the organisation.

Objective of the Study:
Predict whether an employee is likely to leave, and provide insights so that the business can strategise accordingly for retention.

Data Acquisition & Deliverables:
Data was obtained from the HR department in CSV format.
Build a classification model using that dataset.
Deliverable: predictive machine learning API
Machine learning task: classification
Target variable: status (employed/left)

Data Preparation:
Total number of rows: 14249
Total number of columns (features): 10
The observations span 12 different departments.
After cleaning the records, 79.83% of the original data is retained for analysis.

Data Visualisation:

Descriptive Analysis

For tenure:
Average years spent by employees in the organisation: 3.5 years
Median: 3 years
Standard deviation: 1.46

For employee status:
Currently employed: 10857
Have left the organisation: 3392

For departments:
The sales department employs the largest number of people, followed by engineering and support.
Temporary staff form the smallest group, which is why the study focuses on predicting retention for permanent staff.

Exploratory Analysis:

Top 3 most important features impacting employee retention:
N_projects
Satisfaction
Avg_monthly_hours

3 least important features impacting employee retention:
Department
Filed_complaint
Recently_promoted

Model Evaluation Metrics:
Accuracy: 97.6%
Misclassification rate (error rate): 2.35%
True positive rate (sensitivity or recall): 94.17%
False positive rate: 1.25%
Specificity: 78.5%
Precision: 95.9%
Prevalence: 24.12%
F1 score: 95.04%
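
For reference, the evaluation metrics listed above are all derived from the four cells of a binary confusion matrix, with "left" treated as the positive class. The following is a minimal Python sketch of those standard definitions. The function name classification_metrics and the example counts are hypothetical and chosen only for illustration; they are not the confusion matrix from this study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp: employees who left and were predicted to leave
    fp: employees who stayed but were predicted to leave
    tn: employees who stayed and were predicted to stay
    fn: employees who left but were predicted to stay
    """
    total = tp + fp + tn + fn
    actual_positive = tp + fn   # everyone who actually left
    actual_negative = tn + fp   # everyone who actually stayed
    if total == 0 or actual_positive == 0 or actual_negative == 0 or (tp + fp) == 0:
        raise ValueError("counts must give non-zero denominators")

    accuracy = (tp + tn) / total
    recall = tp / actual_positive            # true positive rate / sensitivity
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,          # misclassification rate
        "true_positive_rate": recall,
        "false_positive_rate": fp / actual_negative,
        "specificity": tn / actual_negative, # equals 1 - false positive rate
        "precision": precision,
        "prevalence": actual_positive / total,
        "f1_score": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts, for illustration only (not taken from the HR data set).
example = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

By these definitions, the error rate is the complement of accuracy, and specificity is the complement of the false positive rate, so each pair sums to 100%.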