This paper presents an approach in which predictive data mining models are used to enhance and accelerate production from hydrocarbon reservoirs. The complete knowledge discovery process, combined with automation of the data flow, bears significant potential for improving current production practice. Predictive data mining models can perform different tasks. When they are applied automatically (i.e., scheduled to run after the acquisition of new data), they can significantly reduce the workload of knowledge workers. These models act as gatekeepers, monitoring the quality of the data as well as behavioral changes of wells or whole production systems.


Currently, the petroleum industry struggles to analyze and optimize production because of the exponentially increasing amount of data provided by production facilities. Delayed data delivery and data overload make it impossible to deliver the proper data to the engineers in a timely fashion. Paradoxically, in this high-tech age, the quality of the data and of the information derived from it is decreasing.

In addition, many of the physical processes encountered during production of a hydrocarbon reservoir may or may not have been successful or properly designed. Limitations have to be imposed to reduce the complexity of the system so that mathematical equations can be used to model the processes. These models operate in a certain way, and they generate a wealth of data whose interpretation suffers from two main problems:

  1. The data do not seem to fit simple patterns of expected behavior. This is to be expected because of the large number of interconnected phenomena that affect the data.

  2. Even with sophisticated computer-aided history matching, there is always the danger of missing an important component whose influence may be hidden or may not have a consistent impact over time.

The process of discovering significant new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques, has been labeled ‘data mining’ and is the subject of this paper.

Data mining does not ignore the underlying physics of phenomena and processes. Instead, it recognizes the complexity and multiplicity of influences, allows this complexity to be interpreted and, at the same time, helps to teach and reveal the underlying behavior.

Data mining technologies (Refs. 1–13) can fill the gap between high-frequency data and the petroleum engineer's desktop. Automated data mining tools can preprocess raw data, apply quality checks, and extract information from the vast amount of data gathered in an oil or gas field (Refs. 14–16). The engineer, usually overloaded with daily work, is supplied with preprocessed information that can be turned into domain knowledge with little effort. This paper covers the process of knowledge discovery, focusing on the data flow and explaining the main tools used to achieve a higher level of automation and optimization.
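The kind of automated quality check described above can be illustrated with a minimal sketch. It is not the method of this paper; the `Reading` record, the `quality_flags` function, the `max_jump` threshold, and the median-based baseline are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Reading:
    """Hypothetical production record; the field names are illustrative."""
    well: str
    rate: Optional[float]  # e.g. a daily rate; None marks a missing value


def quality_flags(readings: List[Reading], max_jump: float = 0.5) -> List[str]:
    """Return one flag per reading: 'ok', 'missing', 'negative' or 'jump'.

    A reading is flagged 'jump' when it deviates from the median of all
    valid readings by more than max_jump (a relative fraction). Both the
    rule and the default threshold are assumptions, not field-tested values.
    """
    valid = [r.rate for r in readings if r.rate is not None and r.rate >= 0]
    baseline = median(valid) if valid else None
    flags = []
    for r in readings:
        if r.rate is None:
            flags.append("missing")
        elif r.rate < 0:
            flags.append("negative")
        elif baseline and abs(r.rate - baseline) / baseline > max_jump:
            flags.append("jump")
        else:
            flags.append("ok")
    return flags


# Invented toy values, used only to exercise the function.
sample = [Reading("W-1", v) for v in (100.0, 102.0, None, -5.0, 98.0, 250.0, 101.0)]
flags = quality_flags(sample)
```

Scheduled after each data acquisition, a check of this kind would pass only the flagged records to the engineer for review. A real gatekeeper would likely use per-well baselines and physically motivated limits instead of a single global median.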

Data Mining and Knowledge Discovery

There are many ways to classify data mining tasks. For simplicity, the activity in a data mining project can be described by four broadly defined categories of tasks:

  • Classification

  • Estimation

  • Segmentation

  • Description


Putting labels on data records is a classification task. The data miner has to build a model that routes incoming records into the correctly labeled "bucket".
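Routing records into labeled buckets can be sketched with a nearest-centroid classifier. This is a deliberately simple stand-in chosen for illustration, not the model used in this paper; the two-component feature vectors, the labels "stable" and "declining", and the function names are all hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(records, labels):
    """Compute the mean feature vector (centroid) of each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(records, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def classify(record, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(record, centroids[label]))


def route(records, centroids):
    """Place each incoming record into the bucket of its predicted label."""
    buckets = defaultdict(list)
    for record in records:
        buckets[classify(record, centroids)].append(record)
    return dict(buckets)


# Invented, normalized toy features; they carry no physical meaning.
train = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1)]
train_labels = ["stable", "stable", "declining", "declining"]
centroids = fit_centroids(train, train_labels)
buckets = route([(0.0, 1.0), (1.0, 0.0)], centroids)
```

The model is built once from labeled records and then applied to each new record as it arrives, which is the pattern that makes scheduled, automatic application possible.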
