Predicting the vehicle price using Boosted Decision Tree
Predicted Variable: Price

Train Data/Score Data: 70/30

**Data Manipulations performed:**

- Used median values (as the data is not normally distributed) to fill in missing data for the following variables: bore, stroke, horsepower, peak-rpm

  Reason: If we dropped the records, we might not get accurate predictions for a few of the 'makes'. The data might be insufficient for these makes, and overall we would get a lower coefficient of determination.

- Used the custom value 'four' for the following variable: num-of-doors

  Reason: This is a categorical variable, so I used the most frequent value. If we dropped the records, there would be less data for a couple of makes, which might result in incorrect predictions or a lower coefficient of determination.

**Linear Regression observations:**

Coefficient of Determination: 0.82

Tested with 3 different records and successfully got accurate values.

**Boosted Regression Tree observations:**

Trees generated: 100

Coefficient of Determination: 0.72

Tested with 3 different records and successfully got accurate values.

Recommendation: As there are many categorical variables in the data, the results would be difficult to analyse if decision tree algorithms are used. Using the linear regression algorithm might give more accurate predictions, as the data has many categorical variables and few correlated variables; this is consistent with its higher coefficient of determination here (0.82 versus 0.72).
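
The imputation steps and the coefficient of determination described above can be sketched in plain Python. This is only an illustration, not the experiment's actual pipeline: the `?` missing-value marker, the helper names (`impute_median`, `impute_mode`, `r_squared`) and the toy rows are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

MISSING = "?"  # assumed marker for missing values; adjust to match the real data


def impute_median(rows, column):
    """Fill missing entries of a numeric column with the median of the observed values."""
    observed = [float(r[column]) for r in rows if r[column] != MISSING]
    fill = median(observed)
    for r in rows:
        r[column] = fill if r[column] == MISSING else float(r[column])
    return fill


def impute_mode(rows, column):
    """Fill missing entries of a categorical column with its most frequent value."""
    observed = [r[column] for r in rows if r[column] != MISSING]
    fill = Counter(observed).most_common(1)[0][0]
    for r in rows:
        if r[column] == MISSING:
            r[column] = fill
    return fill


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy rows, not taken from the real dataset.
rows = [
    {"make": "audi", "num-of-doors": "four", "horsepower": "102"},
    {"make": "audi", "num-of-doors": "?", "horsepower": "115"},
    {"make": "bmw", "num-of-doors": "two", "horsepower": "?"},
    {"make": "bmw", "num-of-doors": "four", "horsepower": "121"},
]

hp_fill = impute_median(rows, "horsepower")     # median of the observed horsepower values
door_fill = impute_mode(rows, "num-of-doors")   # most frequent door count
```

Filling values in this way keeps every record, so makes with only a few rows are not thinned out further before the 70/30 split. The `r_squared` helper can then be applied to the actual and predicted prices of the scored 30% to compare the two models on the same footing.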