The objective of this study is to develop explainable data-driven models for predicting the minimum miscibility pressure (MMP) using five methods: Recurrent Neural Network (RNN), XGBoost, GMDH, CatBoost and GP. These AI methods (three black-box and two white-box algorithms) were trained on a multi-dimensional dataset with over 700 rows of data. In addition, two robust correlations were developed that can be applied over a wide range of parameters. The dataset has 20 variables, and five subsets of them (labelled SET1 to SET5) were used as input parameters to develop the models. The subsets were selected using a feature importance analysis (similar to Gray's theorem). Among the input sets tested, the models trained with the SET1 and SET5 input parameters (including the mole fractions of different hydrocarbon and nonhydrocarbon components and the reservoir temperature) gave the most accurate MMP estimates (R² = 0.99). To further improve the explainability of the models, sensitivity and Shapley value analyses were conducted, and the impact of each individual feature on the output (MMP) was explained. Depending on the subset of parameters chosen, temperature, volatile/intermediate components, and nonhydrocarbon components are the most influential parameters. Moreover, the models developed in this work performed considerably better (25–40% more accurately) than three well-known empirical models from the literature. The results of the current study are repeatable, and the developed correlations can be readily applied in other scenarios within the range of parameters used to develop the models. The wide range of features in the dataset makes it suitable for studying the effects of different parameters on MMP under conditions representative of CO2-EOR and CCUS.
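To illustrate what a Shapley value analysis of an MMP model computes, here is a minimal sketch of exact Shapley values obtained by enumerating every feature coalition. It assumes that features missing from a coalition are set to a fixed baseline value, which is a common simplification. The three-input surrogate `toy_mmp`, its coefficients, and the input values are hypothetical and are not the models or data from this study. Exact enumeration only scales to a handful of features. With 20 variables, one would normally rely on a library approximation such as those provided by SHAP.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x.

    Features outside a coalition take their baseline value (an assumed,
    interventional-style simplification). Cost grows as 2**n evaluations.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Build an input that uses x for coalition members, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 (excluding feature i)
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_mmp(z: Sequence[float]) -> float:
    # Hypothetical surrogate (arbitrary units), NOT the paper's model.
    # z = [temperature, volatile fraction, nonhydrocarbon fraction]
    return 5.0 + 0.08 * z[0] - 10.0 * z[1] + 20.0 * z[2] + 0.5 * z[0] * z[2]


x = [100.0, 0.3, 0.1]          # hypothetical sample to explain
baseline = [60.0, 0.2, 0.05]   # hypothetical reference sample
phi = shapley_values(toy_mmp, x, baseline)
print(phi, sum(phi), toy_mmp(x) - toy_mmp(baseline))
```

The per-feature contributions sum to `toy_mmp(x) - toy_mmp(baseline)` (the efficiency property). This is what lets a Shapley analysis attribute a predicted MMP shift to individual inputs such as temperature or component mole fractions.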
