Predicting Retail Store Department Sales using XpredictR
Domain : Retail Forecasting
This experiment demonstrates how to predict department-level sales for a retail store.
Description
Using machine learning to understand and evaluate the retail segment, and in particular to analyze consumer behavior, is still at an early stage. This sample demonstrates how store sales are indirectly affected by certain attributes (features) related to the retail customer's world.
Dataset
The dataset has various features such as temperature, store location, Consumer Price Index (CPI), and product type. One can determine how weather patterns, fuel prices, holidays, and unemployment affect sales.
Similar datasets are available on the web and in Kaggle competitions, such as:
https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
Data Processing
Upon analysis, one of the most suitable ways to segment the dataset was to divide it by department. To help AzureML perform seasonal analysis, the date-time attributes were converted to a simpler format, along with derived features such as the number of days since the first day and the current week number. The historic sales around the current week were added, and the regression slope for each store-department combination was calculated. The derived normalized values were used with attributes such as fuel price and temperature to implement categorization wherever it improved results.
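The date and history features described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the experiment's actual pipeline: the sample figures, the `window` of three preceding weeks, and the helper names `week_features` and `ols_slope` are all assumptions made for the example.

```python
from datetime import date

def week_features(d, first_day):
    """Turn a calendar date into simple numeric features."""
    return {
        "days_since_start": (d - first_day).days,
        "week_of_year": d.isocalendar()[1],  # ISO week number
    }

def ols_slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (0.0 for fewer than 2 points)."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Hypothetical weekly sales for a single store-department pair.
weekly_sales = [
    (date(2010, 2, 5), 100.0),
    (date(2010, 2, 12), 110.0),
    (date(2010, 2, 19), 120.0),
    (date(2010, 2, 26), 130.0),
]
first_day = weekly_sales[0][0]
window = 3  # assumed number of preceding weeks used as history

rows = []
for i, (d, sales) in enumerate(weekly_sales):
    # Sales from up to `window` weeks before the current one.
    history = [s for _, s in weekly_sales[max(0, i - window):i]]
    row = week_features(d, first_day)
    row["prev_mean"] = sum(history) / len(history) if history else 0.0
    row["prev_slope"] = ols_slope(history)
    row["sales"] = sales
    rows.append(row)
```

Each entry in `rows` then holds the numeric date features, a summary of recent sales, and the local trend for one week. In practice the same computation would be repeated per store-department combination.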
Training the Model
Training was done using the Boosted Decision Tree algorithm. Because the data was limited, the number of samples required for a single node was kept low. The number of leaves was set to 700.
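To show what boosting does, here is a toy sketch of gradient boosting for squared error that uses one-split trees (stumps) on a single feature. It is not the AzureML module and is far smaller than a 700-leaf model. The data and the names `fit_stump`, `fit_boosted`, `n_trees`, `learning_rate` and `min_leaf` are assumptions for illustration. The `min_leaf` argument plays the role of the minimum number of samples per node mentioned above.

```python
def fit_stump(xs, residuals, min_leaf=1):
    """Find the single threshold split that minimises squared error on the residuals."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # split would leave too few samples in a node
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        # No valid split: predict the mean residual everywhere.
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    thr, left_value, right_value = stump
    return left_value if x <= thr else right_value

def fit_boosted(xs, ys, n_trees=50, learning_rate=0.3, min_leaf=1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals, min_leaf)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical data: week index versus sales, with a jump after week 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.0, 12.0, 11.0, 13.0, 30.0, 32.0, 31.0, 33.0]

model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict_boosted(model, x) for x in xs])
```

A small `min_leaf` lets the trees split sparse data into very small groups, which is the trade-off the limited dataset forced here. It fits the training data more closely but raises the risk of overfitting.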
Choosing the Algorithm
We had a host of algorithms to choose from, ranging from linear algorithms to neural networks. Linear algorithms did not give good results, as the data seemed too complex for them. Among the tree-based options (random and boosted trees), Boosted Decision Tree gave the best results.
We faced one problem while modeling with Boosted Decision Trees: the model did not handle exceptional values well. Suppose that on a particular day, perhaps in the holiday season, sales rise above normal or fall outside the range of the training dataset. The model then gives poor results even if the pattern is present in the data, because it usually predicts within the range seen during training.
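This behavior follows from how a regression tree predicts. Each leaf returns the average of the training targets that fell into it, so a single tree cannot output a value beyond the largest or smallest target it saw. A boosted ensemble is a sum of such piecewise-constant trees, so its predictions also go flat outside the training range. The sketch below shows this with one small tree on made-up, steadily rising sales. The data and the names `build_tree` and `predict` are assumptions for illustration.

```python
def sum_sq(points):
    """Sum of squared deviations of the y values from their mean."""
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(points, min_leaf=1):
    """points: list of (x, y). Returns (threshold, left, right) or a float leaf value."""
    ys = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [p for p in points if p[0] <= thr]
        right = [p for p in points if p[0] > thr]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        sse = sum_sq(left) + sum_sq(right)
        if best is None or sse < best[0]:
            best = (sse, thr, left, right)
    if best is None:
        return sum(ys) / len(ys)  # leaf: mean of the training targets
    _, thr, left, right = best
    return (thr, build_tree(left, min_leaf), build_tree(right, min_leaf))

def predict(tree, x):
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree

# Hypothetical training data: sales rise by 10 every week for weeks 1 to 10.
train = [(week, 100.0 + 10.0 * week) for week in range(1, 11)]
tree = build_tree(train)

max_seen = max(y for _, y in train)    # largest sales value in training
future_prediction = predict(tree, 20)  # week 20 is outside the training range
trend_value = 100.0 + 10.0 * 20        # what the linear trend would give
```

Here `future_prediction` stays at `max_seen` rather than following the trend up to `trend_value`, which mirrors the holiday-season case described above.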