Precisely prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution to perform an accurate reservoir model for optimal future reservoir performance.
In this paper, the facies estimation has been done through Multinomial logistic regression (MLR) with respect to the well logs and core data in a well in upper sandstone formation of South Rumaila oil field. The entire independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability.
Firstly, Robust Sequential Imputation Algorithm has been considered to impute the missing data. This algorithm starts from a complete subset of the dataset and estimates sequentially the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values.
The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies & core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product.
Beta distribution of facies has been considered as prior knowledge and the resulted predicted probability (posterior) has been estimated from MLR based on Baye's theorem that represents the relationship between predicted probability (posterior) with the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly drawing datasets with replacement from the training data. Each sample has the same size of the original training set and it can be conducted N times to produce N bootstrap datasets to re-fit the model accordingly to decrease the squared difference between the estimated and observed categorical variables (facies) leading to decrease the degree of uncertainty.