The objective of this paper is to integrate the "Big Data" concept with petroleum engineering knowledge to predict the recovery factor of Deepwater Gulf of Mexico (dGOM) oilfields. The recovery factor depends on many geological and engineering parameters; as a result, there is no explicit method for calculating it accurately. This is particularly true for deepwater developments, because the parameters needed to estimate the recovery factor carry significant uncertainty and are usually extremely costly to obtain. Typically, the recovery factor of a field is estimated using analogs, material balance, decline curve analysis, or numerical simulation. These deterministic approaches require representative geological models. However, enough information is often unavailable to capture realistic reservoir flow behavior. In addition, different methods can yield very different recovery factor estimates for the same field. Reservoir engineers therefore face the challenging task of estimating the recovery factor by optimizing a large number of parameters with limited, sometimes inaccurate, information. This dilemma calls for an alternative approach that can handle noisy data.
Data mining and classification techniques identify hidden patterns in unstructured data and tend to be fairly robust to noise. Using a database of 395 dGOM oilfields with 84 attributes, a set of dimensionless numbers is calculated for 59 water-drive oilfields. These numbers reduce the dimensionality of the problem and scale the reservoir models so that fields can be compared with one another. Based on the distribution of the dimensionless numbers, K-means clustering followed by principal component analysis (PCA) is used to classify the oilfields into four categories. Partial least squares (PLS) regression is then used to relate the dimensionless numbers to the recovery factor from the sparse dGOM data; it yields good correlation coefficients for some of the clusters.
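The workflow above (dimensionless numbers, K-means into four clusters, PCA, then PLS regression per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not list which dimensionless groups were used, how they were scaled, or how many PLS components were kept, so the sketch uses synthetic random data standing in for log-scaled dimensionless groups (it is NOT dGOM data), standardizes each column, and fits two PLS components. The helper names `standardize`, `kmeans`, `pca_scores`, `pls1_fit`, and `r2` are hypothetical. PCA is used here only to project the clustered fields onto two axes for inspection; the abstract does not state exactly how PCA was applied. Only NumPy is assumed.

```python
import numpy as np

def standardize(X):
    """Scale each column (one dimensionless group) to zero mean, unit variance."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X - mu) / sd

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each field to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def pca_scores(X, n_components=2):
    """Project data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # explained-variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def pls1_fit(X, y, n_components=2):
    """PLS regression with a single response (NIPALS); returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)       # weight vector
        t = E @ w                    # latent score
        tt = t @ t
        p, q = E.T @ t / tt, f @ t / tt
        E, f = E - np.outer(t, p), f - q * t  # deflate X and y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # coefficients in original X space
    return lambda Xn: (Xn - x_mean) @ B + y_mean

def r2(y, yhat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in: 59 fields x 5 hypothetical dimensionless groups.
rng = np.random.default_rng(42)
n_fields, n_groups = 59, 5
X_raw = rng.normal(size=(n_fields, n_groups))
rf = 0.35 + 0.05 * X_raw[:, 0] - 0.03 * X_raw[:, 2] \
     + rng.normal(scale=0.02, size=n_fields)

X = standardize(X_raw)
labels, _ = kmeans(X, k=4)                           # four categories
scores, explained = pca_scores(X, n_components=2)    # 2-D view of clusters
for j in range(4):
    idx = labels == j
    if idx.sum() < 4:
        continue  # too few fields for a meaningful per-cluster fit
    model = pls1_fit(X[idx], rf[idx], n_components=2)
    print(f"cluster {j}: n={idx.sum()}, in-sample R^2={r2(rf[idx], model(X[idx])):.2f}")
```

PLS is a natural fit for this setting because it compresses correlated predictors into a few latent components, which keeps the regression stable when a cluster contains only a handful of fields. The printed R² values are in-sample and therefore optimistic; with sparse data, a cross-validated estimate would be a fairer measure of predictive skill.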
This paper shows that dimensionless numbers combined with data mining techniques provide a new, easy-to-implement method for predicting the recovery factor in large datasets, where other methods are limited by their high computational cost and time requirements.