Abstract
This study provides a structured and comprehensive overview of five popular outlier detection techniques (angle-based outlier detection (AOBD), distance-based outlier detection, density-based outlier detection, isolation forest, and one-class SVM) and identifies AOBD as the most suitable technique for cleaning outliers from production data. Outliers are observations that deviate from a well-defined notion of normal behavior. The presence of outliers can add significant uncertainty and non-uniqueness to rate transient analysis (RTA) of production data. Each outlier detection technique in this study measures "deviation" differently and therefore performs differently even when applied to the same dataset, which creates the need to test several methods and find the best one for cleaning production data.
We first generated a synthetic example by simulating production from a multi-fractured horizontal well using a reservoir simulator. We then added outliers to this data to approximate a realistic field case, so that every data point had a known outlier or inlier label. All five outlier detection techniques were then applied to this dataset to classify every point as either an outlier or an inlier. The most suitable technique for removing outliers from production data was selected as the one that correctly identified the largest fraction of the known outliers. We also demonstrated the application of these methods using a field example from a shale play.
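The labeling and scoring procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's workflow: a hyperbolic decline curve stands in for the reservoir simulator output, the outliers are injected by scaling randomly chosen rates, and the names `make_labeled_example` and `outlier_detection_rate` are hypothetical.

```python
import numpy as np

def make_labeled_example(n_points=200, n_outliers=20, seed=0):
    """Build a hypothetical rate-time series with known outlier labels.

    Assumption: a hyperbolic decline curve replaces the simulator output
    used in the study; all parameter values are illustrative only.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_points + 1, dtype=float)
    rate = 1000.0 / (1.0 + 0.05 * t) ** (1.0 / 1.2)
    is_outlier = np.zeros(n_points, dtype=bool)
    is_outlier[rng.choice(n_points, size=n_outliers, replace=False)] = True
    # Injected outliers: scale the selected rates down by a random factor.
    rate[is_outlier] *= rng.uniform(0.2, 0.6, size=n_outliers)
    return t, rate, is_outlier

def outlier_detection_rate(true_is_outlier, predicted_is_outlier):
    """Fraction of the known outliers that a detector flagged."""
    true = np.asarray(true_is_outlier, dtype=bool)
    pred = np.asarray(predicted_is_outlier, dtype=bool)
    n_true = int(true.sum())
    if n_true == 0:
        raise ValueError("no labeled outliers to score against")
    return float((true & pred).sum() / n_true)
```

Note that this score counts only detected outliers; it does not penalize inliers that a detector flags by mistake, so it may be worth tracking false positives alongside it.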
Our results show that the AOBD technique works best for outlier detection in production data. AOBD is a non-parametric approach that classifies a point as an outlier if the variance of the angles it forms with pairs of the remaining points in the dataset is much smaller than the corresponding variance for the other points. The algorithm is intuitive, and its implementation in any programming language is relatively straightforward. We conclude that the AOBD technique can be used to remove outliers from production data easily, effectively, and objectively, improving production forecasting and RTA workflows.
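The angle-variance idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation used in the study: it takes the plain variance of the distance-weighted angle term over all pairs of other points, assumes all points are distinct, and uses the hypothetical function name `angle_variance_scores`. A low score suggests an outlier.

```python
import numpy as np

def angle_variance_scores(points):
    """Return one angle-variance score per point (low = likely outlier).

    Simplified sketch: plain variance of <a, b> / (|a|^2 |b|^2) over all
    pairs of difference vectors a, b from the point to the other points.
    Assumes no duplicate points (a duplicate would divide by zero).
    """
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    pair_idx = np.triu_indices(n - 1, k=1)  # each unordered pair once
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]        # vectors to other points
        sq_norms = np.einsum("ij,ij->i", diffs, diffs)  # squared lengths
        weighted = (diffs @ diffs.T) / np.outer(sq_norms, sq_norms)
        scores[i] = np.var(weighted[pair_idx])
    return scores

# Usage: a 5 x 5 grid of points plus one isolated point far from the grid.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
pts = np.vstack([grid, [[50.0, 50.0]]])
scores = angle_variance_scores(pts)
print(int(np.argmin(scores)))  # index of the point with the lowest score
```

The loop costs on the order of n^3 operations, which is acceptable for a sketch but slow for long production histories. For rate-time data, the features would likely need to be scaled to comparable ranges first, since angles depend on the relative scale of the axes.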