Data mining for production optimization in unconventional reservoirs brings together data from multiple sources with varying levels of aggregation, detail, and quality. The data sets to be analyzed typically include tens of variables. Many statistical and machine learning techniques can be used to analyze such data and summarize the results. Each method was developed to perform extremely well in certain scenarios but can be a poor choice in others, and the analyst may or may not be trained and experienced in using it. The question for both the analyst and the consumer of data mining analyses is, “What difference does the method make in the final interpreted result of an analysis?”

The objective of this study was to compare and review the relative utility of several univariate and multivariate statistical and machine learning methods in predicting the production quality of Permian Basin Wolfcamp Shale wells. The data set for the study was restricted to wells completed in and producing from the Wolfcamp. Data categories used in the study included the well location and assorted metrics capturing various aspects of the well architecture, well completion, stimulation, and production. All of this information was publicly available.

Data variables were scrutinized and corrected for inconsistent units and were sanity-checked for out-of-bounds values and other “bad data” problems. After the quality control effort was completed, the test data set was distributed among the statistical team for application of an agreed-upon set of statistical and machine learning methods. Methods included standard univariate and multivariate linear regression as well as advanced machine learning techniques such as Support Vector Machines, Random Forests, and Boosted Regression Trees.
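The quality control step described above can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the record fields (`lateral_length`, `proppant_lb`), the unit labels, and the plausibility bounds in `BOUNDS` are all hypothetical and chosen only to show the pattern of converting to a common unit and then flagging out-of-bounds values.

```python
from dataclasses import dataclass

M_PER_FT = 0.3048  # exact definition of the international foot


@dataclass
class WellRecord:
    # Hypothetical fields; a real public data set would have many more.
    well_id: str
    lateral_length: float
    lateral_length_unit: str  # "ft" or "m"
    proppant_lb: float


# Assumed plausibility limits, for illustration only.
BOUNDS = {
    "lateral_length_ft": (500.0, 20000.0),
    "proppant_lb": (0.0, 5.0e7),
}


def to_feet(value, unit):
    """Convert a length to feet; reject units we do not recognize."""
    if unit == "ft":
        return value
    if unit == "m":
        return value / M_PER_FT
    raise ValueError(f"unknown length unit: {unit!r}")


def quality_check(records):
    """Return (clean_rows, flagged), where flagged lists the failed checks."""
    clean, flagged = [], []
    for rec in records:
        row = {
            "well_id": rec.well_id,
            "lateral_length_ft": to_feet(rec.lateral_length, rec.lateral_length_unit),
            "proppant_lb": rec.proppant_lb,
        }
        # A value fails if it is outside its bounds (NaN also fails this test).
        problems = [
            name for name, (low, high) in BOUNDS.items()
            if not (low <= row[name] <= high)
        ]
        if problems:
            flagged.append((rec.well_id, problems))
        else:
            clean.append(row)
    return clean, flagged
```

Flagged wells are returned rather than silently dropped, so an analyst can decide whether each one reflects a reporting error or a genuinely unusual well.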

The strengths, limitations, implementation, and study results of each method tested are discussed and compared with those of the other methods. Consistent with other data mining studies, univariate linear methods are shown to be much less robust than multivariate non-linear methods, which tend to produce more reliable results. The practical importance is that when tens to hundreds of millions of dollars are at stake in the development of shale reservoirs, operators should be confident that their decisions are statistically sound. The work presented here shows that methods do matter, and that useful insights regarding complex geosystem behavior can be derived by geoscientists, engineers, and statisticians working together.
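A toy illustration of why the choice of method can matter is sketched below. It uses synthetic data, not the Wolfcamp data set, and it substitutes a simple k-nearest-neighbors average for the multivariate non-linear methods named in the study, purely to keep the example short and dependency-free. The response is assumed to depend on the interaction of two hypothetical predictors, which a straight-line fit on one predictor cannot capture.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def fit_univariate(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx


def knn_predict(train_X, train_y, query, k=5):
    """Average the responses of the k training points nearest to query."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def compare(seed=0, n_train=300, n_test=100):
    """Return held-out R^2 for (univariate linear, two-variable k-NN)."""
    rng = random.Random(seed)
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_train + n_test)]
    y = [a * b for a, b in X]  # assumed response: a pure two-variable interaction
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    # Univariate linear model that sees only the first predictor.
    slope, intercept = fit_univariate([p[0] for p in X_tr], y_tr)
    uni_pred = [slope * p[0] + intercept for p in X_te]

    # Multivariate non-linear stand-in that sees both predictors.
    knn_pred = [knn_predict(X_tr, y_tr, p) for p in X_te]
    return r_squared(y_te, uni_pred), r_squared(y_te, knn_pred)


if __name__ == "__main__":
    uni_r2, knn_r2 = compare()
    print(f"univariate linear R^2: {uni_r2:.3f}")
    print(f"two-variable k-NN R^2: {knn_r2:.3f}")
```

Because the synthetic response was constructed to favor the multivariate model, the sketch shows only how such a comparison can be set up on held-out data; it says nothing about the magnitude of the differences reported in the study.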
