Text to Columns

October 21, 2016

This module converts text in a single column into multiple columns based on the selected delimiter.

This experiment has data from many sensors in a single column, separated by spaces. The first step in data preparation is to split this column into multiple columns: one for each sensor, plus additional metadata about the sensor data such as id and cycle. This new module converts text in a single column, separated by a delimiter (such as a comma), into multiple columns. It also names the newly created columns in the output data frame using the column names provided in dataset2.

### Module Configuration ###

This module is very easy to configure. The user selects a column to be converted into multiple columns, provides a delimiter, and defines the output behavior (append vs. replace).

**Select Column** - Specifies which column is to be converted into multiple columns. By default, this parameter picks all string columns, even though the module only works on the first string column.

**Split Character** - The delimiter used to convert the single column into multiple columns. By default, this module uses a space as the delimiter.

**Include Original Column** - Defines replace vs. append behavior. If set to "include", both the existing columns and the new columns are sent to the final output port. If set to "exclude", only the new columns are output.

![](http://neerajkh.blob.core.windows.net/images/TextToColModuleConfig.PNG)

### Parameter Restrictions ###

1. This module only works on a single column. If more than one column is selected, it picks the first column.
1. This module expects column names to be provided as the optional dataset2. If the number of values in dataset2 is more or less than the number of column names needed, the behavior is not deterministic.
   If dataset2 has more than one row, the module picks column names from the first row only and ignores the remaining rows.

### Sample Input Dataset ###

![](http://neerajkh.blob.core.windows.net/images/SensorDataInARow.PNG)

### Sample Output Dataset ###

![](http://neerajkh.blob.core.windows.net/images/TextToColOutput.PNG)

### Overall Sample Experiment Graph ###

![](http://neerajkh.blob.core.windows.net/images/OverallTextToCols.PNG)

### Source Code ###

Source code for this module is located here - [https://gist.github.com/nk773/09b25c63421fe631c12a009ea5c40d4e](https://gist.github.com/nk773/09b25c63421fe631c12a009ea5c40d4e)
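
### Illustrative Sketch ###

For readers who want to see the split, naming, and include/exclude behavior in miniature, here is a minimal Python sketch. It is not the module's source code (that lives in the gist linked above). The function `text_to_columns`, its parameter names, and the fallback column names `V1`, `V2`, ... are all hypothetical, chosen only for illustration. Because the module's behavior is not deterministic when the number of supplied names does not match, this sketch simply raises an error in that case, which is an assumption rather than what the module does.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


def text_to_columns(
    rows: List[Row],
    column: str,
    delimiter: str = " ",
    names: Optional[List[str]] = None,
    include_original: bool = True,
) -> List[Row]:
    """Split the text in `column` on `delimiter` into new columns (hypothetical helper)."""
    result = []
    for row in rows:
        # An explicit delimiter is used, so repeated delimiters yield empty strings.
        parts = row[column].split(delimiter)
        if names is not None and len(names) != len(parts):
            # Assumption: fail loudly instead of leaving the outcome undefined.
            raise ValueError(
                f"expected {len(parts)} column names, got {len(names)}"
            )
        # Fallback names V1, V2, ... are an illustrative choice, not the module's.
        labels = names if names is not None else [
            f"V{i + 1}" for i in range(len(parts))
        ]
        # "include" keeps the existing columns; "exclude" keeps only the new ones.
        new_row = dict(row) if include_original else {}
        new_row.update(zip(labels, parts))
        result.append(new_row)
    return result


# Made-up sensor rows: id, cycle, and two sensor readings in one column.
sensor_rows = [{"data": "1 1 0.5 641.8"}, {"data": "1 2 0.7 642.1"}]
column_names = ["id", "cycle", "s1", "s2"]  # plays the role of dataset2's first row

split_rows = text_to_columns(
    sensor_rows, "data", names=column_names, include_original=False
)
```

With `include_original=False`, each output row holds only the four new columns; with `include_original=True`, the original `data` column is kept alongside them.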