Train a decision forest model to predict surface reflectivity of streets

December 2, 2019
Train a decision forest regression model to predict street albedo using band values (red, green, blue, NIR) from NAIP imagery as input.
**Input Data:**

Training/validation:

- Measured or known albedo values of specific materials/sites at specific times. We collected pavement albedo measurements from the City of Los Angeles, Federal Highway Administration Albedo Study. Currently we have known albedo values associated with specific measurement or installation dates for over 30,000 unique roofs or pavement locations in approximately 45 US states. Overall, 2294 pavement samples were used as input for the pavement model. The samples were selected to make the training data more diverse and balanced.
- Geometries of features of interest. For streets, the official street centerline data from the LA city government was used for LA, and OSM street data was used for other cities.
- High-resolution four-band imagery of training sites from within 6 months of site measurement. The National Agriculture Imagery Program (NAIP), which has 1 m resolution, was used as the source of imagery.

In summary, the input data consists of 2294 rows, each containing 5 columns. "norm_red_mean", "norm_green_mean", "norm_blue_mean" and "norm_nir_mean" are the main inputs and represent preprocessed, normalized mean band values calculated from 20 random pixels within each feature. Each of the 2294 street entries is associated with an expected albedo value, which is used to train the model.

**Data processing:** The data is split 70/30: 70% of the data is used for training and the remaining 30% is used for validation.

**Model description:** A decision forest regression algorithm was used to train the model. The model was tuned to find the optimal parameter settings.

**Results and evaluation of model performance:** The model is scored and evaluated on the validation data. The "Scored Label Mean" column from the "Score Model" module represents the estimated mean reflectivity (unitless albedo, 0-1) for each street. The output from the "Evaluate Model" module shows that the model performs well, with a coefficient of determination of 0.990903.
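
The 70/30 split and the coefficient of determination reported above can be illustrated with a minimal, standard-library Python sketch. It is not the pipeline used for this experiment: the function names `split_rows` and `r_squared`, the random seed, and the rounding of the 70% cut are all assumptions made for illustration, and the rows are placeholder indices rather than real albedo samples. Model training itself is left out.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into (train, validation).

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder row indices only; no real albedo data is used here.
train, validation = split_rows(range(2294))
print(len(train), len(validation))  # 1606 688
```

With this rounding, 2294 rows yield 1606 training rows and 688 validation rows; the original tool may place the cut one row differently. A score of 1.0 from `r_squared` means the predictions match the expected albedo values exactly, while 0.0 means the model does no better than always predicting the mean, which puts the reported 0.990903 in context.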