Parse Custom Delimiters

April 8, 2015

This experiment shows how to import and transform a text dataset that uses a custom delimiter.
AzureML currently does not support parsing text data with custom delimiters. The workaround is to first import the dataset as a tab-separated file, which creates a table with a single column of text. You can then use Execute Python Script to tokenize the text on the custom delimiter and create a dataset with the correct columns. Data parsed this way is often heterogeneous (i.e., it contains a mix of types), so we use the Metadata Editor module to coerce columns to the appropriate types. Thanks to AzureML user Taheer-Naveed for posting this question in the forum (https://social.msdn.microsoft.com/Forums/azure/en-US/4e9ec16b-1504-4b52-a791-173a80ad35bc/pipe-delimited-column?forum=MachineLearning).
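The tokenizing step might look roughly like the sketch below. It is not the script used in the experiment. It assumes a pipe delimiter (as in the forum question), that the imported table has exactly one text column with no header row, and it invents placeholder column names (col1, col2, ...). The azureml_main entry point is the function that Execute Python Script calls. All parsed values are left as strings so that the Metadata Editor module can coerce the types afterwards.

```python
import pandas as pd

# Assumption: pipe-delimited input; change this to match your data.
DELIMITER = "|"


def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is the single-column table produced by the tab-separated import.
    lines = dataframe1.iloc[:, 0].astype(str)

    # Split each line on the literal delimiter. Rows with fewer fields
    # than the longest row are padded with None by the DataFrame constructor.
    rows = [line.split(DELIMITER) for line in lines]
    parsed = pd.DataFrame(rows)

    # Hypothetical placeholder names; replace with the real column names.
    parsed.columns = ["col{}".format(i + 1) for i in range(parsed.shape[1])]

    # Execute Python Script expects a sequence of DataFrames as the result.
    return parsed,


if __name__ == "__main__":
    # Local check with made-up sample rows.
    raw = pd.DataFrame({"text": ["1|alice|3.5", "2|bob|4.0"]})
    result, = azureml_main(raw)
    print(result)
```

Splitting with Python's own str.split treats the delimiter as a literal string, which avoids surprises with characters such as "|" that have a special meaning in regular expressions.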