Discover Association Rules

By for October 27, 2016

Discover a set of association rules or frequent itemsets, along with relevant metrics, from the input dataset.
This custom module is based on the popular [CRAN R package arules]( and uses the Apriori algorithm to calculate association rules and frequent itemsets. Here is a simple example using the Adult census dataset.

![Sample experiment screenshot](

You can format your input dataset in three different ways, and you must specify which format you chose in the Input Dataset Type parameter.

## 1. Use a regular data table

All selected columns are implicitly converted to categorical columns. Below is an example using the Adult census dataset.

![adult census dataset screenshot](

To see which rule sets lead to high or low income, set the Right Hand Side value to:

income=<=50K, income=>50K

Note that these items are really *income="<=50K"* and *income=">50K"*, but arules matches items by their string labels, so the quotes are not needed.

## 2. Use an item list

The list is stored in a single column of the dataset. Each row contains all the items of one transaction, delimited by commas. Tip: to keep Azure ML from treating the commas as CSV delimiters, wrap each list in a pair of quotes.

![Grocery items list](

To see what else a customer might purchase after buying bread and butter, simply set the Left Hand Side value to:

bread, butter

## 3. Use an item matrix

The matrix is a dataset with item names as column headers and 1s and 0s as values, as in the example below:

![Grocery items matrix](

## Output

By default, the module outputs a data table of discovered rules, with rule ID, left-hand side (LHS), right-hand side (RHS), support, confidence, and lift as columns.

![output example](

You can also specify the minimum support, the minimum confidence, the maximum number of items in each discovered rule (the smallest allowed value is 2), and whether to prune the rule set by merging redundant rules. In addition, you can choose to output frequent itemsets, maximally frequent itemsets, closed frequent itemsets, or hyperedgesets. These output types are automatically sorted by support in descending order.

For more information about these parameters, see the [arules CRAN package documentation]( The source code of this custom module is on [GitHub](
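Conceptually, all three input formats reduce to the same thing: a list of transactions, each of which is a set of item labels. The following Python sketch shows one way to do that conversion. It is not the module's code. The function names `from_data_table`, `from_item_list` and `from_item_matrix` are hypothetical, and so is the choice to model a transaction as a `frozenset` of strings. The `column=value` labels built for the data-table case match the `income=<=50K` item names used for the Right Hand Side value.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: a transaction is a frozenset of item-label strings.
# (arules itself uses its own sparse "transactions" class.)
Transaction = FrozenSet[str]


def from_data_table(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> List[Transaction]:
    """Format 1: every selected column is treated as categorical, and each cell
    becomes an item labeled "column=value", e.g. "income=<=50K"."""
    return [frozenset(f"{col}={val}" for col, val in zip(columns, row)) for row in rows]


def from_item_list(lines: Sequence[str]) -> List[Transaction]:
    """Format 2: each row holds one comma-delimited list of items. Surrounding
    quotes (used to protect the commas from the CSV reader) are stripped."""
    transactions = []
    for line in lines:
        body = line.strip().strip('"')
        items = (item.strip() for item in body.split(","))
        transactions.append(frozenset(item for item in items if item))
    return transactions


def from_item_matrix(item_names: Sequence[str], matrix: Sequence[Sequence[int]]) -> List[Transaction]:
    """Format 3: item names are the column headers, and a 1 marks the item as present."""
    return [frozenset(name for name, flag in zip(item_names, row) if flag == 1) for row in matrix]
```

For example, `from_item_list(['"bread, butter, milk"'])` and `from_item_matrix(["bread", "butter", "milk"], [[1, 1, 1]])` both yield the same single transaction `{"bread", "butter", "milk"}`. That is why the downstream rule mining does not need to care which input format was chosen.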
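The metrics in the output table follow the standard definitions. For a rule X => Y, support is the fraction of transactions that contain every item of X and Y. Confidence is support(X and Y) / support(X). Lift is confidence / support(Y).

The pure-Python sketch below illustrates an Apriori-style, level-wise search together with these metrics. It is not the arules implementation. All function and parameter names (`frequent_itemsets`, `association_rules`, `min_support`, `min_confidence`, `max_len`, `lhs_items`, `rhs_items`) are hypothetical. Like arules' `apriori`, the sketch only produces rules with a single item on the right-hand side. The `lhs_items` and `rhs_items` filters are a simplified assumption about how the Left Hand Side and Right Hand Side parameters restrict rules.

```python
from itertools import combinations
from typing import AbstractSet, Dict, FrozenSet, List, NamedTuple, Optional


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str
    support: float
    confidence: float
    lift: float


def frequent_itemsets(transactions: List[FrozenSet[str]], min_support: float,
                      max_len: int) -> Dict[FrozenSet[str], float]:
    """Level-wise Apriori search. A (k+1)-itemset can only be frequent if all of
    its k-item subsets are frequent, so candidates are built from frequent k-itemsets."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    level: Dict[FrozenSet[str], float] = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        result.update(level)
        if k == max_len:
            break
        prev = level
        keys = list(prev)
        # Join step: unions of two frequent k-itemsets that have exactly k + 1 items.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        level = {}
        for c in candidates:
            # Prune step: skip candidates that have an infrequent k-subset.
            if all(frozenset(sub) in prev for sub in combinations(c, k)):
                s = support(c)
                if s >= min_support:
                    level[c] = s
        k += 1
    return result


def association_rules(transactions: List[FrozenSet[str]], min_support: float = 0.1,
                      min_confidence: float = 0.8, max_len: int = 10,
                      lhs_items: Optional[AbstractSet[str]] = None,
                      rhs_items: Optional[AbstractSet[str]] = None) -> List[Rule]:
    if max_len < 2:
        raise ValueError("max_len must be at least 2: a rule needs an LHS and an RHS item")
    freq = frequent_itemsets(transactions, min_support, max_len)
    rules: List[Rule] = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:  # exactly one item on the right-hand side
            lhs = itemset - {rhs}
            # Every subset of a frequent itemset is frequent, so these lookups succeed.
            confidence = supp / freq[lhs]
            lift = confidence / freq[frozenset([rhs])]
            if confidence < min_confidence:
                continue
            # Assumed RHS filter: RHS must come from rhs_items, and the LHS may not use them.
            if rhs_items is not None and (rhs not in rhs_items or lhs & rhs_items):
                continue
            # Assumed LHS filter: LHS must be drawn from lhs_items, and the RHS may not be one of them.
            if lhs_items is not None and (not lhs <= lhs_items or rhs in lhs_items):
                continue
            rules.append(Rule(lhs, rhs, supp, confidence, lift))
    # Sorting by support, then confidence, is a choice of this sketch.
    rules.sort(key=lambda r: (-r.support, -r.confidence))
    return rules
```

Take six made-up baskets: three with bread, butter and milk (one of them also with jam), one with bread and butter, one with milk and jam, and one with bread and jam. Calling `association_rules(baskets, min_support=0.5, min_confidence=0.7, lhs_items={"bread", "butter"})` keeps `{butter} => milk` and `{bread, butter} => milk`. Each has support 0.5, confidence 0.75 and lift 1.125. It drops `{bread} => milk` because that rule's confidence is only 0.6. The prune step is what makes the search Apriori rather than brute force: an itemset such as `{milk, jam}` below minimum support is never extended into larger candidates.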