Predict Fracking Success via Classification

February 23, 2016
Combining fracking job data with oil/gas production data, we build a classification model to predict the success of secondary completions.
![Frack Job ML Experiment SS][1]

[FracFocus][2], a national hydraulic fracturing chemical registry, is a site where oil/gas companies may voluntarily submit information about frack jobs (e.g., chemicals used, depth of job). This machine learning experiment sources data from a subset of the jobs run between 2010 and 2013. Specifically, we consider “secondary completions” in the state of Texas, i.e., frack jobs that represent a completion on an existing wellbore. Using [production output][3] from the Railroad Commission of Texas, we can compare the average amount of oil/gas produced before and after the fracking job.

To see a visualization of this data, download the [sample Power BI Desktop file][4] (note: you will need the [Power BI Desktop application][5] to open this file).

![Power BI Desktop SS][6]

For our machine learning experiment, the fracking and production data are joined into a single table.

![Source Data][7]

To build a predictive model, we consider operator, country, water volume, vertical depth, a subset of popular chemicals, and whether the job was for oil or gas. We also attempt to identify the likely supplier of the frack job (TopNamedSupplier) based on information in the frack job’s chemical report. Production Change, expressed as a percentage, represents the average change in oil/gas production; Large Production Increase is set to 1 for production changes of 100% or more.

This experiment attempts to predict the Large Production Increase value using the Two-Class Logistic Regression algorithm, a type of binary classification (note: the Production Change value could alternatively be used as the target of a regression model). Holding back 20% of the data for testing, we are able to make predictions with roughly 70% accuracy.

![Evaluate Results][8]

[1]:
[2]:
[3]:
[4]:
[5]:
[6]:
[7]:
[8]:
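The labeling rule, the 20% holdout, and a two-class logistic regression can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the experiment itself: the data is synthetic, the three features (`water`, `depth`, `is_oil`) are hypothetical stand-ins for the real columns, the percentage-change formula is one plausible reading of Production Change, and the classifier is a hand-rolled gradient-descent fit, not the Two-Class Logistic Regression module used above.

```python
import math
import random

def production_change_pct(avg_before, avg_after):
    """Assumed definition: percentage change in average production."""
    return (avg_after - avg_before) / avg_before * 100.0

def large_production_increase(change_pct, threshold=100.0):
    """Label is 1 when the production change is 100% or more, else 0."""
    return 1 if change_pct >= threshold else 0

def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Predicted probability of the positive class; w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit weights by batch gradient descent on the log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = score(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def accuracy(w, rows, labels):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((score(w, x) >= 0.5) == (y == 1) for x, y in zip(rows, labels))
    return hits / len(rows)

def split_holdout(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle indices, then hold back test_fraction of the rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Synthetic wells (hypothetical): features are already scaled to [0, 1].
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    water, depth, is_oil = rng.random(), rng.random(), rng.randint(0, 1)
    avg_before = rng.uniform(10.0, 100.0)
    # Made-up relationship: more water tends to mean a larger increase.
    avg_after = avg_before * max(0.0, 1.0 + 2.0 * water + rng.gauss(0.0, 0.5))
    X.append([water, depth, float(is_oil)])
    y.append(large_production_increase(production_change_pct(avg_before, avg_after)))

train_X, train_y, test_X, test_y = split_holdout(X, y, test_fraction=0.2)
weights = train_logistic(train_X, train_y)
acc = accuracy(weights, test_X, test_y)
print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the roughly 70% reported for the real experiment. Categorical inputs such as operator or TopNamedSupplier would need to be encoded (for example, one-hot) before they could be passed to a model like this.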