Customer segmentation groups customers into clusters based on shared characteristics. It provides customer insights and enables targeted marketing campaigns.

## Scenario

GOOD TIME Pte Ltd is a trendy bar based in Singapore. Over the years, the bar has collected information about its customers, such as their preferences and demographics. Given the recent economic crisis, GOOD TIME wants to improve the effectiveness of its marketing efforts. The bar wishes to explore **targeted advertising** by creating advertisements specifically for **each segment of its customers**.

This walkthrough shows how to build a machine learning model that will **determine the optimal number of clusters** in the dataset and **allocate each customer to the appropriate cluster**.

## Dataset

### Description

The dataset consists of metadata about customers. Each row represents the demographics and preferences of one customer. It contains both categorical data (e.g. dress_preference, drink_level, and transport) and non-categorical data (e.g. height, weight). The column titles are descriptive and self-explanatory.

![Restaurant Dataset Overview](https://drive.google.com/uc?export=view&id=1x0fCj2Rp6PpMtvmmcEOywd2Kyr6jeuQe)

### Attribute Information

The attribute information describes the dataset in further detail:

1. Data type: numeric or nominal
2. The number of missing values
3. The number of distinct categories (for nominal columns)

The breakdown of each column's attribute information is as follows:

- Rows: 138
- Columns: 19
- userID: Nominal
- latitude: Numeric
- longitude: Numeric
- smoker: Nominal, Missing: 3, 2 [false, true]
- drink_level: Nominal, 3 [abstemious, social drinker, casual drinker]
- dress_preference: Nominal, Missing: 5, 4 [informal, formal, no preference, elegant]
- ambience: Nominal, Missing: 6, 3 [family, friends, solitary]
- transport: Nominal, Missing: 7, 3 [on foot, public, car owner]
- marital_status: Nominal, Missing: 4, 3 [single, married, widow]
- hijos: Nominal, Missing: 11, 3 [independent, kids, dependent]
- birth_year: Nominal
- interest: Nominal, 5 [variety, technology, none, retro, eco-friendly]
- personality: Nominal, 4 [thrifty-protector, hunter-ostentatious, hard-worker, conformist]
- religion: Nominal, 5 [none, Catholic, Christian, Mormon, Jewish]
- activity: Nominal, Missing: 7, 4 [student, professional, unemployed, working-class]
- color: Nominal, 8 [black, red, blue, green, purple, orange, yellow, white]
- weight: Numeric
- budget: Nominal, Missing: 7, 3 [medium, low, high]
- height: Numeric

## Content

This walkthrough demonstrates how to:

1. Build a clustering model to perform customer segmentation
2. Interpret the results of the clustering
3. Improve performance by tuning the parameters of the model

## Outcome

The graph below shows the results of segmenting customers into clusters. [Principal Component Analysis](https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis) is used to compress the multi-dimensional features into two dimensions for ease of visualisation. The ML model identified 15 different clusters of customers in the dataset. GOOD TIME Pte Ltd can use this result to create one advertising campaign for each customer cluster and so improve its marketing efforts.

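
To make the clustering and PCA steps concrete, here is a minimal sketch of how such a result could be produced. It is not the walkthrough's own implementation: it assumes scikit-learn and K-means, uses a hypothetical random table containing only a handful of the columns listed above, and picks median and most-frequent imputation as one possible way to handle the missing values. The cluster count of 15 simply mirrors the figure reported here.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 138  # same row count as the dataset described above


def with_missing(values, n_missing):
    """Return an object array whose first n_missing entries are NaN."""
    out = values.astype(object)
    out[:n_missing] = np.nan
    return out


# Hypothetical stand-in data: random values, NOT the real customer records.
customers = pd.DataFrame({
    "height": rng.normal(1.67, 0.10, n),
    "weight": rng.normal(65, 12, n),
    "drink_level": rng.choice(
        ["abstemious", "social drinker", "casual drinker"], n),
    "budget": with_missing(rng.choice(["medium", "low", "high"], n), 7),
    "smoker": with_missing(rng.choice(["false", "true"], n), 3),
})

numeric_cols = ["height", "weight"]
categorical_cols = ["drink_level", "budget", "smoker"]

# Impute and scale numeric columns; impute and one-hot encode nominal ones.
# sparse_threshold=0 forces a dense array so PCA can consume it directly.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0,
)
features = preprocess.fit_transform(customers)

# Assign every customer to one of 15 clusters (assumed count, see lead-in).
kmeans = KMeans(n_clusters=15, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Compress the encoded features to two components for plotting only.
coords = PCA(n_components=2).fit_transform(features)

result = customers.assign(cluster=labels, pc1=coords[:, 0], pc2=coords[:, 1])
print(result[["cluster", "pc1", "pc2"]].head())
```

Plotting `pc1` against `pc2` and colouring the points by `cluster` would give a scatter plot of the same kind as the graph in this section. Note that PCA is applied only after clustering here, so it affects the visualisation but not the cluster assignments.
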
![Clustering Result Graph](https://drive.google.com/uc?export=view&id=12sZIEY-TezvmPUIWDRP7ZQQEGFNthoWw)

Each customer (row) is tagged to a specific segment. For instance, the first customer in the dataset is tagged to cluster 0 and the second customer is tagged to cluster 6. GOOD TIME Pte Ltd can use this information to perform further analysis and create advertisements that specifically target each customer segment.

![Clustering Result](https://drive.google.com/uc?export=view&id=1f1JZlDkQLLsgwh1ZPtqV8QCPNJ-C_fM-)

## Further Reading

Motivated learners can read further about [using K-means to perform customer segmentation](https://makerspace.aisingapore.org/2020/04/the-most-important-data-science-tool-for-market-and-customer-segmentation/).
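
As a starting point for experimenting with K-means, the sketch below shows one common heuristic for choosing the number of clusters: fit the model for a range of candidate values and keep the one with the highest silhouette score. This is an assumption about how the "optimal number of clusters" might be determined, not necessarily the method used in this walkthrough. The feature matrix `X` is hypothetical synthetic data from `make_blobs`, standing in for the encoded customer features, and the candidate range of 2 to 10 is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical numeric feature matrix (synthetic, not the customer data).
X, _ = make_blobs(n_samples=138, centers=4, n_features=5, random_state=0)

# Fit K-means for each candidate k and record the mean silhouette score,
# which lies in [-1, 1]; higher means tighter, better-separated clusters.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k:2d}  silhouette={score:.3f}")
print(f"best k by silhouette: {best_k}")
```

The silhouette score is only one criterion. The elbow method on `inertia_` or business constraints (for example, how many distinct campaigns the bar can realistically run) may point to a different choice.
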