Online Payment Fraud Detection

December 19, 2018

Predict whether a payment is a potentially fraudulent transaction. This is an example of anomaly detection in a cluster of online payments.
The use case described in this experiment is the classification of online payment transactions as anomalies, by observing typical patterns of expenditure by customers over time, across different vendors and countries.

Financial institutions take a traditional approach to tackling this problem by implementing rules or logic statements to query transactions and to direct suspicious transactions through to human review. While there is some variation, it is notable that over 90 percent of online fraud detection platforms still use this method, including platforms used by banks and payment gateways. While this is effective to some degree, in cases where there is a sufficient gap between an order being received and goods being shipped, it is also incredibly costly and far slower than the alternatives. The "rules" in these platforms use a combination of data, horizon-scanning and gut feel, and the system is backed by manual reviews to confirm the experts' decisions.

Take the example of an abundance of, say, Russian credit cards available on the darknet due to a data breach in Russia. Businesses recognize the increased risk of Russian cards being used fraudulently and can simply add a rule to review any transaction from a Russian credit card. From then on, every attempted purchase made by such a card raises an alert and is declined or reviewed. However, this raises two significant issues. The first is that such a generalized rule may turn away millions of legitimate customers, ultimately losing the business money and jeopardizing customer relations. Secondly, while this can deter future threats after such fraud has been found, it fails to identify or predict potential threats that businesses are not yet aware of. These rules tend to produce binary results, deeming transactions as either good or bad and failing to consider anything in between.
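To make the crudeness of this approach concrete, here is a minimal sketch of a rules-based check. The rule set, field names and thresholds are invented for illustration; they are not taken from any real platform.

```python
# Hypothetical hard-coded rules, mimicking the rules-based platforms
# described above. Rule names and thresholds are invented for illustration.

HIGH_RISK_COUNTRIES = {"RU"}  # e.g. added after a breach floods the darknet with stolen cards


def review_transaction(tx: dict) -> str:
    """Return 'review' or 'accept' -- a binary outcome, with no notion
    of a fraud probability anywhere in between."""
    if tx["card_country"] in HIGH_RISK_COUNTRIES:
        return "review"   # flags every such card, legitimate customers included
    if tx["amount"] > 1000:
        return "review"   # large purchases go to manual review
    return "accept"
```

Note how the country rule flags every matching card regardless of the customer's history, which is exactly the "millions of legitimate customers turned away" problem described above.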
And until the rules are manually reviewed, the system will continue to block such transactions as those from Russian credit cards, even if the risk is no longer prominent.

**Machine learning**, by contrast, works on the basis of large historical datasets that have been created by collecting data across many clients and industries. Even companies that process only a relatively small number of transactions are able to take full advantage of the datasets for their vertical, allowing them to get accurate decisions on each transaction. This aggregation of data provides a highly accurate set of training data, and access to this information allows businesses to choose the right model to optimize precision and recall: out of all the transactions the model predicts to be fraudulent, what proportion actually are (precision), and out of all truly fraudulent transactions, what proportion does the model catch (recall)?

Within the datasets, features are constructed. These are data points such as the age and value of the customer account, as well as the origin of the credit card. There can be hundreds of features, and each contributes, to varying extents, towards the fraud probability. Note that the degree to which each feature contributes to the fraud score is not determined by a fraud analyst, but is learned by the model from the training set. So, returning to the Russian card example, if the use of Russian cards to commit fraud is proven to be high, the fraud weighting of a transaction that uses a Russian credit card will be equally high. If that fraud rate were later to diminish, the feature's contribution would decrease in parallel. Simply put, these models self-learn without explicit programming, unlike manually maintained rules.
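Precision and recall are straightforward to compute from a set of predictions. The following self-contained sketch uses hypothetical toy labels (1 = fraudulent, 0 = legitimate), purely to illustrate the two definitions:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = fraud, 0 = good)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted fraud, how much is real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real fraud, how much is caught
    return precision, recall
```

A rule that flags every transaction from a given country maximizes recall on that country's fraud while destroying precision; a trained model can trade the two off deliberately.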
# Dataset

The sample dataset used in this experiment contains the following columns:

- **Timestamp**, expressed as a number of days since 01/01/1900
- **Client**, name of the person performing the transaction
- **Description**, name of the merchant
- **Country**, from where the transaction has been processed
- **Amount**, in Euro currency, of the payment transaction

![Dataset][1]

To enhance the predictive power of the ML algorithms, an important step is **feature engineering**, where additional features are created from raw data based on domain knowledge. For example, if an account has not made a big purchase in the last month, a sudden thousand-euro transaction could be suspicious. The new features generated in this scenario include:

- **Aggregated variables**, such as the aggregated transaction amount per account, or the aggregated transaction count per account in the last 24 hours and last 30 days.
- **Mismatch variables**, such as a mismatch between shippingCountry and billingCountry, which potentially indicates abnormal behavior.
- **Risk tables**, where fraud risks are calculated using historical probability grouped by country, state, IP address, etc.

By analyzing the sample dataset, we can immediately observe:

![Frequency of payments in a similar amount range][2]
![Typical expense range by client][3]
![Distribution of payments by country][4]

# Transformation

As you may notice, the timestamp is expressed as the number of days since 01/01/1900. A simple SQL transformation can turn that number into a more readable date field. Add the **Apply SQL Transformation** module just after the dataset, and enter the SQL statement below.

![Apply SQL Transformation][5]

```sql
select
    *,
    date(
        datetime(t1.Timestamp * 86400, 'unixepoch', 'localtime'),
        '-70 years'
    ) as CreatedOn
from t1;
```

![Date transformation script][6]

The new column is named CreatedOn. I can remove the original Timestamp column from the dataset. Also, I don't need the merchant name in this experiment, so I can remove the Description column.
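As a sanity check on the transformation, the same conversion can be reproduced outside of SQL. This short Python sketch (the function name is mine, not part of the experiment) mirrors the statement above: multiply days by 86400 to get seconds, interpret them against the Unix epoch (1970), then shift back 70 years.

```python
from datetime import datetime, timedelta


def created_on(timestamp: float) -> str:
    """Mirror of the SQL transformation: treat Timestamp as days since
    01/01/1900 by shifting the Unix epoch (1970-01-01) back 70 years.
    (A Feb 29 result would need normalization; omitted in this sketch,
    as is the SQL's 'localtime' adjustment.)"""
    dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp * 86400)
    return dt.replace(year=dt.year - 70).date().isoformat()
```

For example, a Timestamp of 0 maps to 1900-01-01, confirming the origin of the day count.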
Add a **Select Columns in Dataset** module to the experiment flow.

![Select Columns in Dataset][7]

When selecting the columns to keep in the dataset being trained, ignore Timestamp and Description.

![Remove Timestamp and Description columns][8]

# Training

Azure Machine Learning Studio supports the following anomaly detection algorithms:

- One-Class Support Vector Machine
- PCA-Based Anomaly Detection

![Anomaly Detection initialize models][9]

**Principal Component Analysis**, frequently abbreviated to PCA, is an established technique in machine learning that can be applied to feature selection and classification. PCA is frequently used in exploratory data analysis because it reveals the inner structure of the data and explains its variance. PCA works by analyzing data that contains multiple variables, all possibly correlated, and determining the combinations of values that best capture the differences in outcomes. It then outputs these combinations as a new set of values called principal components.

In the case of anomaly detection, for each new input the anomaly detector first computes its projection on the eigenvectors, and then computes the normalized reconstruction error. This normalized error is the anomaly score: the higher the error, the more anomalous the instance.

You can use the **One-Class Support Vector Machine** module to create a one-class anomaly detection model. This module is particularly useful in scenarios where you have a lot of "normal" data and not many cases of the anomalies you are trying to detect. For example, if you need to detect fraudulent transactions, you might not have many examples of fraud that you could use to train a typical classification model, but you might have many examples of good transactions.
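To make the PCA scoring mechanism concrete, here is a minimal NumPy sketch of PCA-based anomaly scoring on synthetic data. It illustrates the technique described above (projection onto the principal components, then normalized reconstruction error); it is not Azure ML Studio's internal implementation, and all names and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "transactions": 5 correlated features driven by 2 latent factors + noise.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + rng.normal(scale=0.05, size=(200, 5))

mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                     # eigenvectors capturing most of the variance


def anomaly_score(x):
    """Normalized reconstruction error: higher means more anomalous."""
    xc = x - mean
    recon = components.T @ (components @ xc)   # project onto the components and back
    return float(np.linalg.norm(xc - recon) / (np.linalg.norm(xc) + 1e-12))


# A point that deviates along a direction the components discarded
# reconstructs poorly, so its score approaches 1.
outlier = mean + 10 * Vt[-1]
```

In-sample points score close to 0 because they lie near the plane spanned by the kept components; the synthetic outlier scores near 1.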
This experiment is designed for scenarios where it is easy to obtain training data from one class, such as valid transactions, but difficult to obtain sufficient samples of the targeted anomalies. Azure Machine Learning Studio provides a dedicated training module for anomaly detection experiments, called **Train Anomaly Detection Model**.

![Train Anomaly Detection Model][10]

# Scoring

After adding the **Score Model** module, a scored probability is calculated for each record. This represents the likelihood of a transaction being potentially fraudulent.

![Training experiment][11]
![Scored dataset][12]

By visualizing the scored dataset in the Score Model module, we can also observe the distribution of scored probabilities and the density of occurrence.

![Scored probabilities and density of occurrence][13]

# Web Service

As usual, the training experiment is then converted into a predictive experiment, and a web service is generated and hosted in the Azure cloud, for consumption by third-party applications over REST/HTTPS.

![Predictive experiment][14]

To consume the web service programmatically, this is the expected format of the **request message**. The response message will contain the scored probability for each record.

```json
{
  "Inputs": {
    "input1": {
      "ColumnNames": [ "Client", "Country", "Amount", "CreatedOn" ],
      "Values": [
        [ "value", "value", "0", "value" ],
        [ "value", "value", "0", "value" ]
      ]
    }
  },
  "GlobalParameters": {}
}
```

[1]:
[2]:
[3]:
[4]:
[5]:
[6]:
[7]:
[8]:
[9]:
[10]:
[11]:
[12]:
[13]:
[14]:
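A caller would build that request body and POST it to the service. Below is a sketch using only the Python standard library; `ENDPOINT` and `API_KEY` are placeholders (Azure ML Studio generates the real values for your workspace), and `build_request` and `score` are helper names of my own.

```python
import json
import urllib.request

# Placeholders: substitute the endpoint URL and API key that Azure ML Studio
# generates when the web service is published.
ENDPOINT = "https://example.azureml.net/execute?api-version=2.0"
API_KEY = "<your-api-key>"


def build_request(rows):
    """Build the request body in the format shown above."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": ["Client", "Country", "Amount", "CreatedOn"],
                "Values": [
                    [r["Client"], r["Country"], str(r["Amount"]), r["CreatedOn"]]
                    for r in rows
                ],
            }
        },
        "GlobalParameters": {},
    }


def score(rows):
    """POST the rows to the web service and return the parsed response,
    which contains the scored probability for each record."""
    body = json.dumps(build_request(rows)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Note that `Amount` is serialized as a string, matching the quoted `"0"` values in the sample request message above.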