Step 3. Operationalize a VW model
This example shows how to operationalize a trained VW model.
This is step 3 in the [Vowpal Wabbit Samples Collection](https://gallery.cortanaintelligence.com/Collection/Vowpal-Wabbit-Samples-2). To see how the trained model is created, please see [step 2](https://gallery.cortanaintelligence.com/Experiment/Train-a-VW-Model-with-Small-Dataset-1).
In the previous experiment, we successfully trained a VW model using the Adult Income Census data. Right-click on the output port of the Train Vowpal Wabbit module and save the trained model with a unique name. The model will be saved in your workspace on the Trained Models page and become visible in the Trained Models category in the module palette. Now we are ready to create a predictive experiment using that saved trained model.

This predictive experiment differs a bit from a regular one in that you will probably need to convert the dataset to be scored into the VW input format. Here again you can use a bit of Python script to do that.
```python
# convert a dataframe into VW format
import pandas as pd

def azureml_main(inputDF):
    colsToExclude = ['workclass', 'occupation', 'native-country']
    numericCols = ['fnlwgt']
    output = convertDataFrameToVWFormat(inputDF, colsToExclude, numericCols)
    return output

def convertDataFrameToVWFormat(inputDF, colsToExclude, numericCols):
    # remove '|' and ':', which are special characters in VW
    def clean(s):
        return "".join(s.split()).replace("|", "").replace(":", "")

    def parseRow(row):
        line = []
        # set all labels to 1 since the label is not used in scoring
        line.append("1 |")
        for colName in featureCols:
            if colName in numericCols:
                # format numeric features as name:value
                line.append("{}:{}".format(colName, row[colName]))
            else:
                # format string features
                line.append(clean(str(row[colName])))
        vw_line = " ".join(line)
        return vw_line

    # drop columns we don't need (drop returns a copy, so assign the result)
    inputDF = inputDF.drop(colsToExclude, axis=1)
    # select feature columns
    featureCols = inputDF.columns
    # parse each row
    output = inputDF.apply(parseRow, axis=1).to_frame()
    return output
```
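To see what the converted rows look like, here is a minimal, self-contained sketch that applies the same per-row logic to a toy two-row frame (the column names are a small subset of the Census schema; the helper names are illustrative, not part of Azure ML):

```python
import pandas as pd

def clean(s):
    # strip whitespace and VW's special characters '|' and ':'
    return "".join(str(s).split()).replace("|", "").replace(":", "")

df = pd.DataFrame({
    "age": [39, 50],
    "education": ["Bachelors", "HS-grad"],
    "fnlwgt": [77516, 83311],
})
numericCols = ["fnlwgt"]

def to_vw(row):
    parts = ["1 |"]  # dummy label; it is ignored at scoring time
    for col in df.columns:
        if col in numericCols:
            parts.append("{}:{}".format(col, row[col]))
        else:
            parts.append(clean(row[col]))
    return " ".join(parts)

lines = df.apply(to_vw, axis=1)
print(lines[0])  # 1 | 39 Bachelors fnlwgt:77516
```

Each row becomes one VW example line: a dummy label, the namespace separator `|`, then the features.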
Again, this approach is totally fine if you are planning to call this web service in real-time request-response style, or if you are planning to call it as a batch service with a rather small batch size. If the batch is large, I recommend that you do the VW format conversion outside of Azure ML to avoid running out of memory in the Execute Python Script module.
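For the large-batch case, a hedged sketch of doing the conversion outside Azure ML might look like the following: stream the CSV in chunks so the full dataset never sits in memory (the function and file names here are illustrative assumptions, not an Azure ML API):

```python
import pandas as pd

def clean(s):
    # strip whitespace and VW's special characters '|' and ':'
    return "".join(str(s).split()).replace("|", "").replace(":", "")

def convert_csv_to_vw(in_path, out_path, colsToExclude, numericCols,
                      chunksize=100000):
    # read the CSV in chunks and append VW-format lines to the output file
    with open(out_path, "w") as out:
        for chunk in pd.read_csv(in_path, chunksize=chunksize):
            chunk = chunk.drop(columns=colsToExclude)
            for _, row in chunk.iterrows():
                parts = ["1 |"]  # dummy label, ignored at scoring time
                for col in chunk.columns:
                    if col in numericCols:
                        parts.append("{}:{}".format(col, row[col]))
                    else:
                        parts.append(clean(row[col]))
                out.write(" ".join(parts) + "\n")
```

The resulting file can then be uploaded for batch scoring instead of converting inside the experiment.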
In the Score Vowpal Wabbit module, the VW arguments field is specified as:

```
--link logistic
```
This tells the VW engine to map the raw scores into the [0, 1] interval as probabilities. Finally, after scoring is done, we use a bit of R code to set the threshold at 0.5 and compute the predicted class.
```r
# read input data
dataset <- maml.mapInputPort(1)
# set the scoring threshold to 0.5
threshold <- 0.5
# set negative class
dataset$MyScoredLabels[dataset$Results < threshold] <- -1
# set positive class
dataset$MyScoredLabels[dataset$Results >= threshold] <- 1
# Results is the probability when "--link logistic" is on
dataset$MyScoredProbabilities <- dataset$Results
# drop unused columns
dataset$Results <- NULL
# set output data
maml.mapOutputPort("dataset");
```
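The same thresholding can be sketched in Python for readers who prefer it, assuming the scored probabilities arrive in a pandas column named `Results` (with `--link logistic`, VW passes the raw linear score x through the sigmoid 1 / (1 + exp(-x)), so `Results` is already a probability):

```python
import numpy as np
import pandas as pd

# toy scored output; the Results values here are made up for illustration
scored = pd.DataFrame({"Results": [0.12, 0.87, 0.50]})

threshold = 0.5
# map probabilities to class labels -1 / 1 at the 0.5 threshold
scored["MyScoredLabels"] = np.where(scored["Results"] >= threshold, 1, -1)
scored["MyScoredProbabilities"] = scored["Results"]
scored = scored.drop(columns=["Results"])
print(scored["MyScoredLabels"].tolist())  # [-1, 1, 1]
```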
After a successful run, you can then [deploy this predictive experiment as a web service API](https://channel9.msdn.com/Blogs/Windows-Azure/Deploying-a-predictive-model-as-a-service-part-I-).