Text Classification with SQL Server

By for April 6, 2018

Report Abuse
This solution uses SQL Server 2017 + ML Services with R or Python to train a machine learning model to categorize text.
> **Note:** You can read more about this solution and deployment guides in the [Text Classification solution](https://github.com/Microsoft/ml-server-text-classification) published on GitHub. > **STOP before you proceed** If you have not yet deployed a Data Science Virtual Machine on your Azure Subscription, you must first **[accept the Terms of Use](https://portal.azure.com/#blade/Microsoft_Azure_Marketplace/LegalTermsSkuProgrammaticAccessBlade/legalTermsSkuProgrammaticAccessData/%7B%22product%22%3A%7B%22publisherId%22%3A%22microsoft-ads%22%2C%22offerId%22%3A%22windows-data-science-vm%22%2C%22planId%22%3A%22windows2016%22%7D%7D)**. ## Overview This solution describes how to train a machine learning model using [SQL Server Machine Learning Services](https://docs.microsoft.com/en-us/sql/advanced-analytics/architecture-overview-machine-learning?view=sql-server-2017) to categorize incoming text. This solution uses a preprocessed version of the NewsGroups20, containing a Subject (extracted from the raw text data), a Text, and a Label (20 classes). Note this has a similar structure to a support ticket data set which would also have two data fields: Title and Problem description. Read more about this solution at the [Text Classification Website](https://microsoft.github.io/ml-server-text-classification/). ## SQL Server Machine Learning Services SQL Server Machine Learning Services brings the compute to the data by running R on the computer that hosts the database. It includes a database service that runs outside the SQL Server process and communicates securely with the Python runtime. This solution walks through the steps to create and preprocess data, train a multiclass Logistic Regression Model while featurizing the text variables separately on the fly. Data scientists who are testing and developing solutions can work from the convenience of their Python IDE on their client machine, while [pushing the compute to the SQL Server machine](https://msdn.microsoft.com/en-us/library/mt604885.aspx). The completed solutions are deployed to SQL Server 2017 by embedding calls to R in stored procedures. These solutions can then be further automated with SQL Server Integration Services and SQL Server agent. ## Pricing Your Azure subscription used for the deployment will incur consumption charges on the services used in this solution, approximately $4.12(USD)/hour for the default VM in the East US. ### Disclaimer * All prices shown are in US Dollar ($). This is a summary estimate, not a quote. For up to date pricing information please visit [https://azure.microsoft.com/pricing/calculator/](https://azure.microsoft.com/pricing/calculator/). >Please ensure that you stop your VM instance when not actively using the solution. Running the VM will incur higher costs. > >**Please delete the solution if you are not using it.** ## Disclaimer ©2017 Microsoft Corporation. All rights reserved. This information is provided "as-is" and may change without notice. Microsoft makes no warranties, express or implied, with respect to the information provided here. Third party data was used to generate the Solution. You are responsible for respecting the rights of others, including procuring and complying with relevant licenses in order to create similar datasets.