#Clustering: Color Quantization#
This experiment demonstrates how the **K-Means Clustering** module can be used to perform color quantization of an image.
In this experiment, we used **K-Means Clustering** with the Bill Gates Image dataset to reduce the overall number of colors in that image.
This process of reducing the number of distinct colors in an image is called **color quantization**. For additional technical detail, see the [Wikipedia article on color quantization](https://en.wikipedia.org/wiki/Color_quantization).
##Data##
The sample dataset used in this experiment contains a CSV representation of an image of Bill Gates measuring 160 x 160 pixels. Each row in the data table represents a pixel, and the columns represent the X and Y coordinates of each pixel, followed by the red, green, and blue (RGB) color values of each pixel. Each of the color values is an integer between 0 and 255, so a pixel can take any of 256 x 256 x 256 (about 16.7 million) possible colors. The image itself has 160 x 160 = 25,600 pixels, and therefore contains at most 25,600 distinct colors.
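To make the table layout concrete, here is a minimal sketch in R that builds a small stand-in for the pixel table. The column names X, Y, R, G, and B match the ones used by the scripts in this experiment; the tiny image size, the random color values, and the row order are assumptions made for illustration only.

```r
# Hypothetical stand-in for the pixel table: one row per pixel,
# columns X, Y, R, G, B (the real dataset has 160 x 160 = 25,600 rows).
set.seed(1)
width  <- 4
height <- 3
pixels <- expand.grid(Y = seq_len(height), X = seq_len(width))
pixels <- pixels[, c("X", "Y")]
n <- nrow(pixels)

# Random colors, each channel an integer between 0 and 255.
pixels$R <- sample(0:255, n, replace = TRUE)
pixels$G <- sample(0:255, n, replace = TRUE)
pixels$B <- sample(0:255, n, replace = TRUE)

# Number of possible 24-bit RGB colors versus colors actually present.
possible_colors <- 256^3
distinct_colors <- nrow(unique(pixels[, c("R", "G", "B")]))

print(head(pixels))
cat("rows:", n, "possible colors:", possible_colors,
    "distinct colors present:", distinct_colors, "\n")
```

The number of distinct colors present can never exceed the number of rows, which is why the 160 x 160 image holds far fewer colors than the RGB space allows.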
##Model##
The objective of this experiment is to use the K-Means clustering algorithm to group pixels into clusters based on their colors.
After the pixels have been assigned to clusters, all pixels in the same cluster have their RGB values replaced by the mean RGB values of that cluster. This process (quantization) reduces the total number of colors in the image to at most the number of clusters (k) in the model.
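The idea can be sketched outside the experiment with the `kmeans` function from base R's stats package. This is not the **K-Means Clustering** module itself; the synthetic pixel data, the value of k, and the `iter.max` and `nstart` settings are assumptions chosen only to keep the example small.

```r
set.seed(42)

# Synthetic pixels (hypothetical data, not the Bill Gates image).
n <- 500
pixels <- data.frame(
  R = sample(0:255, n, replace = TRUE),
  G = sample(0:255, n, replace = TRUE),
  B = sample(0:255, n, replace = TRUE)
)

k <- 8  # number of clusters; the experiment itself uses 32
fit <- kmeans(pixels[, c("R", "G", "B")], centers = k,
              iter.max = 50, nstart = 3)

# Replace every pixel's color with the mean color (centroid) of its cluster.
quantized_matrix <- fit$centers[fit$cluster, , drop = FALSE]
rownames(quantized_matrix) <- NULL
quantized <- as.data.frame(quantized_matrix)

colors_before <- nrow(unique(pixels))
colors_after  <- nrow(unique(quantized))
cat("distinct colors before:", colors_before, "after:", colors_after, "\n")
```

After the replacement, the number of distinct colors is bounded by k, because every pixel now carries one of the k centroid colors.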
When you configure the **K-Means Clustering** module, you can select different distance measures to use in separating the clusters. For this experiment, we used the Euclidean distance, computed in RGB space between each pixel and each cluster centroid. The K-Means algorithm assigns every pixel to its nearest centroid, recomputes the centroids, and repeats. After several iterations, it returns the cluster assignment of each pixel. The result is a local optimum that depends on the initial centroids, and is not guaranteed to be the best possible clustering.
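As a rough illustration of how Euclidean distance drives the grouping, the following sketch performs a single assignment step and a single centroid update by hand. The centroid and pixel values are made up for the example, and the helper name `dist_to_centroids` is hypothetical; the module's internal implementation may differ.

```r
# Three hypothetical centroids in RGB space (illustrative values only).
centroids <- rbind(c(250, 10, 10),   # reddish
                   c(10, 250, 10),   # greenish
                   c(10, 10, 250))   # bluish

# Four hypothetical pixels, one row each (R, G, B).
pixels <- rbind(c(200, 30, 40),
                c(20, 220, 60),
                c(5, 15, 180),
                c(240, 60, 20))

# Euclidean distance from one pixel to every centroid.
dist_to_centroids <- function(pixel, centroids) {
  sqrt(rowSums(sweep(centroids, 2, pixel)^2))
}

# Assignment step: each pixel goes to its nearest centroid.
assignments <- apply(pixels, 1, function(p) {
  which.min(dist_to_centroids(p, centroids))
})
print(assignments)

# Update step: each centroid moves to the mean of its assigned pixels.
new_centroids <- apply(pixels, 2, function(channel) {
  tapply(channel, assignments, mean)
})
print(new_centroids)
```

K-Means repeats these two steps until the assignments stop changing or an iteration limit is reached.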
##Transforming the Image##
The clustering module outputs the original pixel coordinates and RGB values, in addition to the new cluster assignments (the column 'Assignments').
We applied a custom R script to compute the mean RGB values for each cluster and to replace the original values with the cluster means.
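The replacement step can be tried on a toy table. The sketch below uses base R's `ave()` function, which is a different technique from the one in the experiment's own script; the four-pixel table and its values are hypothetical, and only the column names follow the clustering output described above.

```r
# Hypothetical clustering output: coordinates, colors, and cluster labels.
img <- data.frame(
  X = c(1, 1, 2, 2),
  Y = c(1, 2, 1, 2),
  R = c(200, 210, 10, 30),
  G = c(20, 40, 100, 120),
  B = c(0, 10, 250, 230),
  Assignments = c(0, 0, 1, 1)
)

# ave() returns, for every row, the mean of that row's group,
# so the original row order is preserved.
transformed <- data.frame(
  R = ave(img$R, img$Assignments),
  G = ave(img$G, img$Assignments),
  B = ave(img$B, img$Assignments)
)
print(transformed)
```

Because `ave()` keeps the rows in place, no merge or re-sorting by X and Y is needed in this variant.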
##Displaying the Image##
The original and transformed images can be displayed using another R script that directs the graphics output to the R Device output port.
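The display logic relies on turning a table of pixels into a matrix of color strings. Here is a small sketch of that conversion, assuming a hypothetical 2 x 3 image in place of the 160 x 160 one; the drawing call is guarded so that it only runs in an interactive R session.

```r
library(grid)

# Hypothetical 2 x 3 pixel table (the experiment's image is 160 x 160).
img <- data.frame(
  R = c(255, 0, 0, 255, 0, 128),
  G = c(0, 255, 0, 255, 0, 128),
  B = c(0, 0, 255, 0, 0, 128)
)

# rgb() turns each row into a hex color string such as "#FF0000".
img_rgb <- rgb(img$R, img$G, img$B, maxColorValue = 255)

# Assigning dim fills the matrix column by column,
# so the first two values form the first column.
dim(img_rgb) <- c(2, 3)
print(img_rgb)

# Draw only in an interactive session; otherwise the matrix is just printed.
if (interactive()) {
  grid.newpage()
  grid.raster(img_rgb, interpolate = FALSE)
}
```

Since the matrix is filled column by column, the row order of the pixel table determines where each pixel lands in the displayed image.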
##Running the Experiment##
The following diagram shows the overall workflow of the experiment:

###Experiment Details###
1. From the **Saved Datasets** group, drag and drop **Bill Gates RGB Image** onto the experiment canvas.
2. Add a **Metadata Editor** module, connect it to the dataset, and open the Column Selector.
3. Select the columns R, G, and B. From the **Fields** dropdown list, choose **Features** to indicate that these columns are features for use by the K-Means algorithm.
4. Add a **K-Means Clustering** module and set the number of centroids to 32.
5. Add a **Train Clustering Model** module. Connect the **K-Means Clustering** module to its left input port, and connect the **Metadata Editor** module from step 2 to its right input port. The **Train Clustering Model** module performs the actual clustering.
6. Add an **Execute R Script** module. Connect the **Train Clustering Model** output that carries the clustered dataset (including the 'Assignments' column) to the leftmost input port.
7. Type the following R script in the Properties pane of the **Execute R Script** module. This script implements the color transformation:
```
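# Port 1 carries the clustering output: X, Y, R, G, B, and Assignments.
img = maml.mapInputPort(1)

# Return one color channel with every value replaced by its cluster mean.
transform = function(colorColumnName, img){
  newName = paste(colorColumnName, "New", sep="")
  # Mean of this channel for each cluster.
  clusterMeans = tapply(img[ ,colorColumnName], img$Assignments, mean)
  clusterMeans = data.frame(clusterMeans, as.integer(names(clusterMeans)))
  colnames(clusterMeans) = c(newName, "Assignments")
  # Attach the cluster mean to every pixel, then restore a fixed pixel order.
  merged = merge(img, clusterMeans, by="Assignments")
  return(merged[order(merged$X, merged$Y), newName])
}

# Apply the transformation to each channel and rebuild the RGB table.
transformed = as.data.frame(lapply(c("R","G","B"), transform, img=img))
colnames(transformed) = c("R","G","B")
maml.mapOutputPort("transformed")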
```
8. Add another **Execute R Script** module to display the transformed image. Connect the left output port of the transformation module (the **Execute R Script** module that runs the script above) to its leftmost input port, and add the following R script:
```
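# Port 1 carries a table with one row per pixel and columns R, G, B.
img = maml.mapInputPort(1)
# Convert each pixel to a hex color string.
img_rgb = rgb(img$R, img$G, img$B, maxColorValue = 255)
# Reshape the color vector into a 160 x 160 matrix, one cell per pixel.
dim(img_rgb) = c(160,160)
library(grid)
# Draw the matrix as an image without smoothing between pixels.
grid.raster(img_rgb, interpolate=FALSE)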
```
9. Add another **Execute R Script** module and type in the same display script to show the original image. Connect the original dataset **Bill Gates RGB Image** to the leftmost input port of this module.
10. To view the results of the color quantization process and compare the transformed image to the original, click the **R Device** output port of each of the two **Execute R Script** modules that display an image, and select **Visualize**.
In the following graphic, you see the transformed image on the left, and the original image on the right.
<table width=100% border=0>
<tr>
<td align=center></td>
<td align=center></td>
</tr>
</table>