Loy Iris

June 16, 2019
Which category does this flower belong to? Train a very simple model on a small dataset and get high-precision results.
**Predict the flower category**<br>
<br>
Iris is a genus of flowering plants that contains several species, such as Iris setosa, Iris versicolor, and Iris virginica.<br>
<br>
Work steps:<br>
• Create the dataset<br>
• Place the dataset<br>
• Drop rows with missing data<br>
• Split the data<br>
• Train, score, evaluate<br>
• Understand the confusion matrix<br>
<br>
**Problem**<br>
To demonstrate the clustering API in action, we will use three types of iris flowers: setosa, versicolor, and virginica. All of them are stored in the same dataset. Even though the type of each flower is known, we will not use it; the clustering algorithm runs only on flower parameters such as petal length and petal width. The task is to group all flowers into three different clusters. We would expect flowers of different types to belong to different clusters.
<br><br>
The inputs of the model are the following iris parameters:<br>
<br>
petal length<br>
petal width<br>
sepal length<br>
sepal width<br>
<br><br>
**ML task - Clustering**<br>
The generalized problem of clustering is to group a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
<br><br>
Some other examples of clustering:<br>
group news articles into topics: sports, politics, tech, etc.<br>
group customers by purchase preferences.<br>
divide a digital image into distinct regions for border detection or object recognition.<br><br>
Clustering can look similar to multiclass classification, but the difference is that for clustering tasks we don't know the answers for the past data. So there is no "tutor"/"supervisor" that can tell whether our algorithm's prediction was right or wrong. This type of ML task is called unsupervised learning.
<br><br>
**Solution**<br>
To solve this problem, we will first build and train an ML model. Then we will use the trained model to predict a cluster for iris flowers.
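<br><br>
As a rough illustration of the clustering task described above, here is a minimal k-means sketch in plain Python. Several things in it are assumptions and not part of the original experiment: the text does not name a clustering algorithm, so k-means is only one plausible choice; the farthest-first starting centroids are picked just to make the run deterministic; and the nine rows are typed in by hand in the style of the classic Iris data instead of being loaded from the training file.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Pick k starting centroids deterministically: the first point, then
    repeatedly the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Return the index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first_centroids(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        # Move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments no longer change: converged
            break
        labels = new_labels
    return labels, centroids


# Hand-typed illustrative rows (assumption, not read from the dataset file):
# (sepal length, sepal width, petal length, petal width) in cm.
# The species is deliberately NOT given to the algorithm.
flowers = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),  # setosa-like
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),  # versicolor-like
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),  # virginica-like
]

labels, centroids = kmeans(flowers, k=3)
for features, label in zip(flowers, labels):
    print(label, features)
```

The cluster numbers are arbitrary; only the grouping matters. Setosa measurements usually separate cleanly, while versicolor and virginica overlap, so an occasional flower landing in the "wrong" cluster there would not be surprising.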
<br><br>
**Dataset**:<br>
Train: https://raw.githubusercontent.com/laploy/ML.NET/master/Iris/iris-data-train.csv<br>
Test: https://raw.githubusercontent.com/laploy/ML.NET/master/Taxi-fare/taxi-fare-score.csv
<br><br>
Dataset description<br>
<br>
1. sepal length in cm<br>
2. sepal width in cm<br>
3. petal length in cm<br>
4. petal width in cm<br>
5. class:<br>
-- Iris Setosa<br>
-- Iris Versicolor<br>
-- Iris Virginica<br>
<br><br>
![Preview of the Iris data][1]
<br><br>
**Preview of Data**<br>
<br>
The Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis. The data set consists of 50 samples from each of three species of Iris:
<br><br>
There are 150 observations with 4 features each (sepal length, sepal width, petal length, petal width).<br>
There are no null values, so we don't have to worry about that.<br>
There are 50 observations of each species (setosa, versicolor, virginica).<br>
<br><br>
The finished model<br>
![Azure ML experiment diagram for the Iris model][2]<br>
Question and Data<br>
Question: Which group does this flower belong to?<br>
<br><br>
Evaluation Metrics<br>
![Azure ML evaluation metrics for the Iris model][3]

[1]: https://raw.githubusercontent.com/laploy/ML.NET/master/Iris/iris-data-preview.png
[2]: https://raw.githubusercontent.com/laploy/ML.NET/master/Iris/iris-azureML-diagram.JPG
[3]: https://raw.githubusercontent.com/laploy/ML.NET/master/Iris/iris-azureML-metrics.JPG
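
To make the "Understand the confusion matrix" step and the evaluation metrics a little more concrete, the sketch below builds a confusion matrix by hand and derives accuracy and per-class precision from it. The actual and predicted labels are hypothetical values made up for illustration; they are not the numbers shown in the metrics screenshot, and the helper names are my own.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Share of all samples that sit on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def precision_per_class(matrix, classes):
    """For each class: correct predictions / everything predicted as that class."""
    result = {}
    for j, name in enumerate(classes):
        predicted_as = sum(row[j] for row in matrix)  # column total
        result[name] = matrix[j][j] / predicted_as if predicted_as else 0.0
    return result


classes = ["setosa", "versicolor", "virginica"]

# Hypothetical labels (assumption): one virginica is mistaken for versicolor.
actual = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
predicted = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica", "versicolor", "virginica"]

matrix = confusion_matrix(actual, predicted, classes)
precision = precision_per_class(matrix, classes)

for name, row in zip(classes, matrix):
    print(f"{name:<12}", row)
print("accuracy:", round(accuracy(matrix), 3))
print("precision:", precision)
```

Reading the matrix row by row shows where each actual species ended up. Off-diagonal entries are the mistakes, and a column total larger than its diagonal entry is what pulls that class's precision below 1.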