Ans 1-9, Business Intelligence- ISM633 Submitted by: Sargam Palod (1810120031)
Ans 1. Voluntary Termination - 88
Reasons:
> Another position (20)
> More money (11)
> Hours (9)
> Career change (9)

Ans 2. Another position - 14

Ans 3. Near Future - 11

Ans 4.
> Career builder - 7790/employee
> Pay per click - 1323/employee
> MBTA ads - 646/employee
> On-campus recruiting - 625/employee

Ans 5. Retention Rate - 82.76%

Ans 6. Building a linear regression model:
Dependent Variable = Pay Rate, because this is the rate of money attained by an employee.
Independent Variables = position, manager name, employee source, performance score and department.
R^2 = 83.26%
This model explains that monetary pay depends mostly on the position, the department and the manager, which is consistent with the general rule of how salary is earned.

Ans 7. In the logistic regression model, Dependent Variable = Employment Status, which is categorical in nature. This helps us understand which variables have an impact on the status of employment. The factors taken for the logistic regression are pay rate, department, position, manager name and performance score. These variables are taken into consideration because they strongly affect the status of any employee.

Ans 8. When using the CART model, Dependent Variable = Employment Status. The CART algorithm is the process of answering a sequence of questions on a hierarchical basis. In CART, we use the same variables as in Logistic Regression, i.e. EmpStatus ID is based on sex, unit, age, efficiency, director and marital status, while reducing the number of decision trees to 1 and evaluating the results for better analysis. So, to predict better results, we start analysing from the top level, such as department, and drill down to the employment status.

Ans 9. Confidence = The ratio of the number of transactions that include all items in both the antecedent and the consequent (the support count of the rule) to the number of transactions that include all items in the antecedent.
Lift = Strength of the association between the two variables, relative to what would be expected if they were independent.

a. The larger the lift ratio, the more significant the association between the two variables. A lift ratio of 1 means the two variables are independent; a lift ratio larger than 1 means they occur together more often than independence would predict. In the first case, performance score = fully meets and employee status = terminated for cause have a high association.

b. A lift ratio larger than 1 implies that the relationship between marital description = divorced and employee status = voluntarily terminated is more significant than would be expected if the two sets were independent. Also, the count of 3 shows the three cases where a divorced person has voluntarily left the organisation.

c. If the employee meets the performance score at 90 days, the employee status is voluntarily terminated. A lift ratio larger than 1 implies that the relationship between performance score = 90 days meets and employee status = voluntarily terminated is more relevant than would be expected if the two items were independent. The count of 3 shows the three transactions where a person has met the performance score and has voluntarily left the organisation.

d. If the employee's marital description is divorced and the performance score ranges from N/A to too early to review, the employee status is voluntarily terminated. Confidence measures the reliability of the inference made. In this case, the confidence is 100% at a count of 4, which means that all 4 employees in the data who are divorced and whose performance score is N/A to too early to review were voluntarily terminated. The support value is equal in 4 of the 5 rules, which signifies that the item sets in those 4 rules appear equally often in the data set.

e. When the employee is male and a Production Technician II, the employee status is voluntarily terminated. The lift value is 3.349, which is simply the ratio of two values: the target response (the rate of voluntary termination among employees with this position and gender) divided by the average response (the rate of voluntary termination among all employees). This value of lift tells us the degree to which the two occurrences depend on one another, and makes such rules potentially useful for predicting the outcome of future such occurrences.
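
The count, support, confidence and lift used in Ans 9 can be checked by hand with a short script. The sketch below is only an illustration: the function name rule_metrics and the ten toy records are hypothetical and are not taken from the HR data set, so the numbers it prints do not correspond to any rule reported above.

```python
from typing import Dict, Iterable, List, Set


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Count, support, confidence and lift for the rule antecedent -> consequent."""
    a = set(antecedent)
    c = set(consequent)
    n = len(transactions)

    # Number of records containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)

    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_ac / n            # share of all records matching the whole rule
    confidence = n_ac / n_a       # share of antecedent records that also match the consequent
    lift = confidence / (n_c / n) # confidence relative to the overall consequent rate
    return {"count": n_ac, "support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy records (made up for illustration, not the assignment data).
records = [
    {"Marital=Divorced", "Status=VoluntarilyTerminated"},
    {"Marital=Divorced", "Status=VoluntarilyTerminated"},
    {"Marital=Divorced", "Status=VoluntarilyTerminated"},
    {"Marital=Divorced", "Status=Active"},
    {"Marital=Married", "Status=VoluntarilyTerminated"},
    {"Marital=Married", "Status=Active"},
    {"Marital=Married", "Status=Active"},
    {"Marital=Single", "Status=Active"},
    {"Marital=Single", "Status=Active"},
    {"Marital=Single", "Status=Active"},
]

metrics = rule_metrics(
    records,
    antecedent=["Marital=Divorced"],
    consequent=["Status=VoluntarilyTerminated"],
)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In this toy data, 3 of the 4 divorced employees left voluntarily (confidence 0.75), while 4 of all 10 employees left voluntarily (average response 0.4), so the lift is 0.75 / 0.4, which is larger than 1. A lift of exactly 1 would mean that being divorced tells us nothing about voluntary termination.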