Decades of subsurface exploration and characterization have led to the collation and storage of large volumes of well-related data. The amount of data gathered daily continues to grow rapidly as technology and recording methods improve. With the increasing adoption of machine-learning techniques in the subsurface domain, it is essential that the quality of the input data is carefully considered when working with these tools. If the input data are of poor quality, the impact on precision and accuracy of the prediction can be significant. Consequently, this can impact key decisions about the future of a well or a field.

This study focuses on well-log data, which can be highly multidimensional, diverse, and stored in a variety of file formats. Well-log data exhibits key characteristics of big data: volume, variety, velocity, veracity, and value. Well data can include numeric values, text values, waveform data, image arrays, maps, and volumes. All of which can be indexed by time or depth in a regular or irregular way. A significant portion of time can be spent gathering data and quality checking it prior to carrying out petrophysical interpretations and applying machine-learning models. Well-log data can be affected by numerous issues causing a degradation in data quality. These include missing data ranging from single data points to entire curves, noisy data from tool-related issues, borehole washout, processing issues, incorrect environmental corrections, and mislabeled data.

Having vast quantities of data does not mean it can all be passed into a machine-learning algorithm with the expectation that the resultant prediction is fit for purpose. It is essential that the most important and relevant data are passed into the model through appropriate feature selection techniques. Not only does this improve the quality of the prediction, but it also reduces computational time and can provide a better understanding of how the models reach their conclusion.

This paper reviews data quality issues typically faced by petrophysicists when working with well-log data and deploying machine-learning models. This is achieved by first providing an overview of machine learning and big data within the petrophysical domain, followed by a review of the common well-log data issues, their impact on machine-learning algorithms, and methods for mitigating their influence.

This content is only available via PDF.
You can access this article if you purchase or spend a download.