Adding uncertainty to the estimation of log data using a gradient-boosted tree approach
Marianne Rauch, PhD, TGS; Keyla Gonzalez, TGS
Summary
Machine learning (ML) applications have infiltrated the geosciences, especially wireline log estimation, salt-body interpretation, and fault definition. Other applications, such as seismic processing, automated first-break picking, velocity analysis, and prestack inversion for rock properties, have also been tackled with this technology. Our industry has large datasets that are especially suitable for data-driven methods. However, ML algorithms have been labeled a “black box” because the actual process is not evident to the end user. Furthermore, the uncertainty of the results is always questioned.
We applied ML to estimate missing log data at basin scale on millions of wells, and our findings indicate that measuring the uncertainty of the results in a meaningful way is extremely difficult. Instead of spending effort on quantifying uncertainty, we spent more time and work on cleaning the input data to the ML, which proved to be the most effective way of improving the quality of the ML inference results. When the estimations are compared to blind data, the fit is very good and the uncertainties are small if the input data have been cleaned. If the input data are noisy, the comparisons are poor and the uncertainty is larger.
Method
We used ML algorithms to estimate missing well-log information by training a model to predict one curve from a combination of one or more other curves. We target infill predictions of five curves: gamma ray, deep resistivity, neutron porosity, compressional sonic, and density. For each target curve, separate models are trained from different combinations of available curves.
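As a minimal sketch of this per-target scheme, the loop below trains one model for each target curve from the remaining curves, using the LightGBM regressor that underlies the workflow described below. The curve mnemonics, the table layout, and the single feature combination are illustrative assumptions, not the production setup.

    # Sketch: one model per target curve, trained from the other curves.
    # `df` is assumed to be a table of depth-aligned samples per well.
    import lightgbm as lgb

    CURVES = ["GR", "RESD", "NPHI", "DTC", "RHOB"]  # gamma ray, deep resistivity,
                                                    # neutron porosity, sonic, density
    models = {}
    for target in CURVES:
        features = [c for c in CURVES if c != target]
        # In production, separate models would cover each combination of
        # curves actually available in a well; one combination shown here.
        train = df.dropna(subset=features + [target])
        model = lgb.LGBMRegressor(objective="regression")  # MSE objective
        model.fit(train[features], train[target])
        models[target] = model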
The underlying algorithm for curve prediction is gradient-boosted regression trees. Gradient-boosted trees belong to a class of tree-based methods with the capacity to model data-driven, piecewise target-feature interactions, as opposed to multivariate linear regression, which requires certain assumptions about the correlations between input and target curves. The tree structure is chosen because it is robust (invariant to input scaling) and scalable (additional trees can be used to model higher-order interactions between features). Tree models essentially map a set of feature curves to target curves using iterative cost minimization. Feature curves can be recorded logs or representative attributes extracted from logs. Features can be continuous or categorical.
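The contrast with linear regression can be seen in a small, self-contained sketch; the step-shaped synthetic relation below is our own illustration, not data from the study.

    # Sketch: boosted trees capture a piecewise target-feature relation that a
    # single linear fit averages away, with no input scaling required.
    import numpy as np
    import lightgbm as lgb
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 500).reshape(-1, 1)
    y = np.where(x[:, 0] < 5.0, 1.0, 3.0) + 0.1 * rng.normal(size=500)  # step + noise

    linear = LinearRegression().fit(x, y)
    trees = lgb.LGBMRegressor(n_estimators=100).fit(x, y)
    # The tree model reproduces the step; the linear model fits a tilted line
    # through it, illustrating why no linearity assumption is needed.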
The inputs for the ML model training are different combinations of feature and target curves. The cost function that is minimized is the mean squared error between the predicted and target curves, summed over all samples in the training dataset. During each round of training, a new tree function is built to reduce this cost. At every iteration, the resulting model is used to predict a validation set, and the model that exhibits the lowest validation-set error is deployed over the whole basin on all wells that have at least one of the five curves available (Figure 1).

Figure 1: Log prediction ML input and estimated logs.
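A minimal sketch of this round-by-round selection, assuming LightGBM's native training API with an early-stopping callback; the split into training and validation sets and all parameter values are illustrative assumptions.

    # Sketch: add one tree per round, score the validation set every round,
    # and keep the iteration with the lowest validation error.
    import lightgbm as lgb

    train_set = lgb.Dataset(X_train, label=y_train)   # arrays assumed prepared
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

    booster = lgb.train(
        params={"objective": "regression", "metric": "l2"},  # mean squared error
        train_set=train_set,
        num_boost_round=1000,                                # one new tree per round
        valid_sets=[valid_set],
        callbacks=[lgb.early_stopping(stopping_rounds=50)],
    )
    # booster.best_iteration is the round with the lowest validation error.
    y_hat = booster.predict(X_new, num_iteration=booster.best_iteration)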
A typical regression model is optimized by tracking the mean difference from the ground truth. In quantile regression, a specific quantile (percentile) is tracked instead of the mean or median of the ground truth. For example, tracking the 5% quantile covers 5% of the data, and tracking the 95% quantile covers 95% of the data. Theoretically, this provides lower and upper boundaries for the smallest and highest estimates in a regression task.
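The loss behind this idea is the standard quantile (pinball) loss, which we state here for completeness rather than quote from the study. For a chosen quantile q it can be written as

    \ell_q(y, \hat{y}) = \max\bigl( q\,(y - \hat{y}),\; (q - 1)\,(y - \hat{y}) \bigr)

With q = 0.05, over-prediction is penalized by a factor of 0.95 while under-prediction costs only 0.05, so the optimum settles near the 5th percentile of the target; q = 0.95 mirrors this for the upper boundary.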
For regression tasks, a prediction is rarely absolutely accurate; point estimates always carry some error. It is better to aim for a prediction interval around the estimate. This is done using the Light Gradient Boosting Machine (LightGBM), which helps to increase the efficiency of a model, reduces memory usage, and is one of the fastest and most accurate libraries for regression tasks. It adds utility to the model by implementing prediction intervals that show a range of possible values. Figure 2 shows an example of an estimated density curve. In addition to the actual measured curve (orange), three predicted curves are represented: P5 (blue), P50 (red), and P95 (black). The ML algorithm estimates density (red curve) effectively compared to the measurement (orange curve). If the measured and predicted curves are similar, the P5 and P95 curves should not deviate strongly from the P50 curve. This is not the case in our example, which indicates that the input data are too noisy and that the algorithm has issues dealing with them.
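A minimal sketch of producing such P5/P50/P95 curves with LightGBM's built-in quantile objective follows; the hyperparameters and variable names are illustrative assumptions.

    # Sketch: one LightGBM model per quantile; the P5-P95 spread is the interval.
    import lightgbm as lgb

    quantile_models = {}
    for name, alpha in {"P5": 0.05, "P50": 0.50, "P95": 0.95}.items():
        m = lgb.LGBMRegressor(objective="quantile", alpha=alpha)
        m.fit(X_train, y_train)               # training arrays assumed prepared
        quantile_models[name] = m

    p5 = quantile_models["P5"].predict(X_new)
    p50 = quantile_models["P50"].predict(X_new)   # point estimate (median)
    p95 = quantile_models["P95"].predict(X_new)
    interval = p95 - p5   # a wide interval flags noisy inputs, as in Figure 2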