Personal Project: Predict auction sale price for Bulldozer Inc


JMP, Data Analysis, Prediction

This analysis can help Bulldozer Inc grow as a business by being more transparent with its buyers, which could attract more of them. It can also help the company understand what its profits will likely be and make its financial projections more reliable and accurate. The response variable is Sale price, since this is what we are trying to predict. The predictor variables are Product Group, ProductGroupDesc, Enclosure, Hydraulics, Coupler, fiProductClassDesc, Year[Saledate], YearMade, Saledate, state, fiModelDesc, and fiBaseModel. Since the sale price may be related to how old an item is when it is sold, a column has been added showing the difference between the sale year and the year made. This column is called “age.”
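The “age” column was built in JMP, but the same derivation can be sketched in Python for illustration. This is a minimal sketch, not the JMP formula itself: the function name add_age, the dictionary layout, and the two sample rows are hypothetical, and only the column names Saledate and YearMade come from the data set.

```python
from datetime import date


def add_age(records):
    """Return copies of the records with an 'age' field added.

    age = year of Saledate minus YearMade.
    Rows with a missing YearMade get age = None instead of a number.
    """
    result = []
    for rec in records:
        sale_year = rec["Saledate"].year
        year_made = rec["YearMade"]
        age = sale_year - year_made if year_made is not None else None
        result.append({**rec, "age": age})
    return result


# Hypothetical sample rows, for illustration only (not taken from the data set).
sample = [
    {"Saledate": date(2011, 5, 26), "YearMade": 2004},
    {"Saledate": date(2009, 3, 12), "YearMade": None},
]

aged = add_age(sample)
for row in aged:
    print(row["Saledate"].year, row["YearMade"], row["age"])
```

Leaving the age empty when YearMade is missing is one possible choice. Filling in a default year would turn missing data into a misleading age.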

Process

Data

The column viewer shows that the data set has 54 variables. About half of them are missing a substantial amount of data. The data is mostly categorical, with some numerical variables.

Analyze: Modeling

Three Model Comparison

Model I: Stepwise Modeling

The Stepwise model has an R-squared of .367, so this model is not very good at predicting sale price. The VIF, or variance inflation factor, shows whether there is multicollinearity: the higher the value, the more likely it is that a variable is collinear with the other predictors. For example, the second and fifth variables have VIFs of 45.01 and 36.20 respectively. Overall, most variables do not have a high VIF.
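To make the VIF values easier to read, recall the standard definition: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below only applies this formula. The function names are hypothetical, and nothing is recomputed from the actual data.

```python
def vif_from_r_squared(r2):
    """VIF = 1 / (1 - R^2), where R^2 is from regressing one predictor on the others."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must be in [0, 1) for a finite VIF")
    return 1.0 / (1.0 - r2)


def r_squared_from_vif(vif):
    """Invert the formula: the R^2 implied by a reported VIF (VIF must be >= 1)."""
    if vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# The two high VIFs reported for the stepwise model.
for reported_vif in (45.01, 36.20):
    implied = r_squared_from_vif(reported_vif)
    print(f"VIF {reported_vif}: implied R-squared {implied:.3f}")
```

Read this way, VIFs of 45.01 and 36.20 mean that roughly 98% and 97% of the variation in those two variables is explained by the other predictors, which is why they stand out.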

Model II: Decision Tree

Sale price is set as the dependent variable (y) and the seven possible predictors are set as x. The diagram shows that the model ran at most 49 splits, which was the best it could find. This results in an R-squared of .415, which is better than the stepwise model but still low.

Column Contributions

Looking at the column contributions, fiBaseModel has the most influence in the Decision Tree, followed by “age.” It should also be noted that Enclosure and ProductGroupDesc have no influence in this model.

Model III: Neural Network

Lastly, for the Neural Network, sale price is set as the dependent variable (y) and the 7 possible predictors are set as x. The random seed is set to 1234 and the default settings are used. The R-squared is weak in both training and validation. Training is better than validation, but this is expected. The most influential factor is “age,” the difference between the sale year and the year made, with fiBaseModel contributing a bit as well. The neural network has 4 nodes. Overall, it is not a very good model.


Model Comparison

Comparing the models on the validation set shows which one is best. Partition has the best R-squared on all three sets: training, validation, and test. This is somewhat unexpected, since the neural network is usually best, but it is not out of the ordinary. However, the R-squared of the partition is not far from that of the neural network. The Neural network has .41, .37, and .37 and the partition has .44, .39, and .39 for training, validation, and test respectively. The test set can only be used if there is a validation set; it is not used in model building and is only a measure of the model’s predictive capability.
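The selection rule used here can be written out as a short sketch. The R-squared values are the ones reported for the two models. The dictionary layout and the function name best_model are hypothetical, and JMP's Model Comparison report does this work in the actual analysis.

```python
# R-squared values reported for each model on each data split.
r_squared = {
    "Neural": {"training": 0.41, "validation": 0.37, "test": 0.37},
    "Partition": {"training": 0.44, "validation": 0.39, "test": 0.39},
}


def best_model(scores, split="validation"):
    """Return the model name with the highest R-squared on the given split."""
    return max(scores, key=lambda name: scores[name][split])


# Choose on validation; the test split is only a final check, not a selection tool.
best = best_model(r_squared)
print("Best on validation:", best)
print("Test R-squared of chosen model:", r_squared[best]["test"])
```

Selecting on validation and reporting test afterwards keeps the test score an honest estimate, since it played no part in choosing the model.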

Conclusion

The best model for Bulldozer Inc to use to predict the sale price of auction items is Partition, since it has the highest R-squared on the training, validation, and test sets.

Alondra Salazar

© 2022 Alondra Salazar