Fraud Detection with SQL Server

By for July 25, 2017

Report Abuse
Using SQL Server 2017 with R Services, build and deploy a machine learning model for online retailers to detect fraudulent purchase transactions.
> **Note:** You can read more about this solution and deployment guides in the [Fraud Detection solution]( published on GitHub. > **STOP before you proceed** If you have not yet deployed a Data Science Virtual Machine on your Azure Subscription, you must first **[accept the Terms of Use](**. ## Overview Fraud detection is one of the earliest industrial applications of data mining and machine learning. This solution shows how to build and deploy a machine learning model for online retailers to detect fraudulent purchase transactions. Read more about this solution at the [Fraud Detection Website]( ## Business Perspective This solution shows how to preprocess data, create new features, train R models, and perform predictions in-database. The final table in the SQL Server database provides a predicted value for each transaction. This predicted value, which can be interpreted as a probability of fraud, can help you determine whether you wish to try to interrupt the transaction. We provide a PowerBI dashboard which shows the predicted scores of the data in the Test set - this is data for which we know whether the transaction was fraudulent, but was not used to build the model itself. Note in the top table that we predicted fraud in 14,525 cases which were not fraudlent. This is something to keep in mind when deploying the model. Rather than reject a transaction that is predicted to be fraud, we might instead want to add a step to the purchase that would discourage an actual fraudulent transaction while still allowing a valid transaction to occur. ## Data Scientist Perspective SQL Server R Services brings the compute to the data by running R on the computer that hosts the database. It includes a database service that runs outside the SQL Server process and communicates securely with the R runtime. This solution walks through the steps to process data, train R models, and perform scoring on the SQL Server machine. The final scored database table in SQL Server gives a predicted score for each potential transaction. This data is then visualized in PowerBI. Data scientists who are testing and developing solutions can work from the convenience of their R IDE on their client machine, while [pushing the compute to the SQL Server machine]( The completed solutions are deployed to SQL Server 2016 by embedding calls to R in stored procedures. These solutions can then be further automated with SQL Server Integration Services and SQL Server agent. ## Pricing Your Azure subscription used for the deployment will incur consumption charges on the services used in this solution, approximately $2.06(USD)/hour for the default VM. >Please ensure that you stop your VM instance when not actively using the solution. Running the VM will incur higher costs. > >**Please delete the solution if you are not using it.** ## Disclaimer ©2017 Microsoft Corporation. All rights reserved. This information is provided "as-is" and may change without notice. Microsoft makes no warranties, express or implied, with respect to the information provided here. Third party data was used to generate the Solution. You are responsible for respecting the rights of others, including procuring and complying with relevant licenses in order to create similar datasets.