Machine learning (ML) applications have spread rapidly through the geosciences, most visibly in wireline-log estimation, salt-body interpretation, and fault delineation. Other problems, such as seismic processing, automated first-break picking, velocity analysis, and pre-stack inversion for rock properties, have also been tackled with this technology. Our industry holds large datasets that are especially suitable for data-driven methods. However, ML algorithms are often labeled a "black box" because the actual process is not evident to the end user, and the uncertainty of the results is always questioned.
We applied ML to estimate missing log data at basin scale across millions of wells, and our findings indicate that measuring the uncertainty of the results in a meaningful way is extremely difficult. Rather than spending effort on quantifying uncertainty, we invested the time in cleaning the input data fed to the ML models, which proved to be the most effective way to improve the quality of the inference results. When the estimates are compared against blind wells, the fit is very good wherever the input data have been cleaned, and the uncertainties are small; where the input data are noisy, the comparisons are poor and the uncertainty is larger.
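To make the described workflow concrete, the following is a minimal sketch of the clean-then-train-then-blind-test loop. The paper does not disclose its model, features, or QC rules, so everything here is illustrative: it assumes a sonic log (DT) is estimated from gamma-ray, density, and neutron curves using scikit-learn's GradientBoostingRegressor as a stand-in algorithm, with hypothetical file names and QC thresholds.

```python
# Sketch of the workflow described above; all names and thresholds are assumed,
# not taken from the study.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

def clean_logs(df: pd.DataFrame) -> pd.DataFrame:
    """Basic QC: replace null flags, drop gaps, clip to physical ranges, despike."""
    df = df.replace(-999.25, np.nan).dropna()           # common LAS null value
    df = df[df["GR"].between(0, 300)                    # gamma ray, API units
            & df["RHOB"].between(1.0, 3.2)              # bulk density, g/cc
            & df["NPHI"].between(-0.05, 0.6)]           # neutron porosity, v/v
    # Despike the target: reject samples far from a rolling median.
    # (Real data would be despiked per well; a single well is assumed here.)
    med = df["DT"].rolling(21, center=True, min_periods=1).median()
    return df[(df["DT"] - med).abs() < 20]              # drop sonic spikes > 20 us/ft

# Train on QC'd wells, then evaluate on "blind" wells withheld from training.
train = clean_logs(pd.read_csv("train_wells.csv"))      # hypothetical inputs
blind = clean_logs(pd.read_csv("blind_wells.csv"))

features = ["GR", "RHOB", "NPHI"]
model = GradientBoostingRegressor(n_estimators=500, max_depth=4)
model.fit(train[features], train["DT"])

pred = model.predict(blind[features])
print(f"Blind-well R^2: {r2_score(blind['DT'], pred):.3f}")
```

In this sketch, the blind-well comparison is the same diagnostic used above: applying clean_logs to the training wells is what drives the blind-well fit, and skipping it would degrade the R^2 in the way the noisy-input case describes.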