This paper introduces a rule induction algorithm intended to provide new insights into large petrophysical and geological databases, improve their reliability, and expedite their use. Very large petrophysical, geophysical, and geological databases contain multiple data types, which must be interpreted for application in subsurface modeling. This paper presents a significant advance in discovering complex, nontrivial data relationships in such databases.
Geoscientists are often challenged to predict subsurface lithologies and properties from multivariate relationships within large databases of core, wireline, and seismic data. Many data analysis techniques are used, including histograms, parametric and non-parametric regression, n-dimensional histograms, cluster analysis, discriminant analysis, and principal components analysis. This paper introduces a new algorithm that seeks to discover "rule-like" relationships within the data that can be used to make predictions. The method is loosely derived from the data mining technique of classification.
The concepts of data attribute distinguishability and importance are introduced to assess how much the data and the outcomes contribute to predictability. The new theory, implementation details, and an application are presented. Current petrophysical, seismic, and geostatistical analyses benefit from the rule induction algorithm presented, resulting in improved reservoir characterization and forecasting.
The field of data mining [2,7,8,11,17,18] has grown in recent years to deal with the large databases available in different industries, in particular the financial and medical fields. Data mining is the identification or discovery of patterns in data. There are several types of data mining, including classification, clustering (segmentation), association, and sequence discovery. The main focus of classification is supervised induction, that is, the inference of rules and relationships from large databases. The aim is to extract knowledge from data so that results not directly present in the training data set can be predicted. The training data help to distinguish predefined classes. Neural networks [3,9,13,14,27], decision trees [5,6,16,19,26,29], and "if-then-else" rules are classification techniques. A disadvantage of neural networks is that it is difficult to provide a good rationale for the predictions made; that is, the rules are not always clear.
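To make the idea of supervised induction concrete, the following is a minimal sketch of inducing "if-then" rules from labeled training data. It is not the algorithm introduced in this paper; it is a simple single-attribute scheme in which each attribute value predicts its most frequent class. The attribute names (`gamma_ray`, `porosity`), the class labels, the training records, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def induce_rules(records, attribute, target):
    """For each value of `attribute`, predict the most frequent `target` class."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[attribute]][rec[target]] += 1
    # Each entry reads: IF attribute == value THEN class.
    return {value: classes.most_common(1)[0][0] for value, classes in counts.items()}


def predict(rules, record, attribute, default=None):
    """Apply the induced rules to a record that was not in the training set."""
    return rules.get(record[attribute], default)


# Hypothetical, discretized training data (not taken from the paper).
training = [
    {"gamma_ray": "low", "porosity": "high", "lithology": "sandstone"},
    {"gamma_ray": "low", "porosity": "high", "lithology": "sandstone"},
    {"gamma_ray": "low", "porosity": "low", "lithology": "limestone"},
    {"gamma_ray": "high", "porosity": "low", "lithology": "shale"},
    {"gamma_ray": "high", "porosity": "low", "lithology": "shale"},
]

rules = induce_rules(training, "gamma_ray", "lithology")
for value, cls in rules.items():
    print(f"IF gamma_ray == {value} THEN lithology = {cls}")

new_sample = {"gamma_ray": "high", "porosity": "low"}
prediction = predict(rules, new_sample, "gamma_ray")
```

Unlike a neural network, the output here is a set of rules that can be read directly, which is the property the rule-based approach is meant to preserve.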
Data mining is an interdisciplinary field that brings together techniques from statistics, machine learning, artificial intelligence, pattern recognition, database, and visualization technologies. The methods used in data mining are not fundamentally different from older quantitative model-building techniques, but are natural extensions and generalizations of such methods. Various data mining techniques have been applied to petroleum characterization [1,4,12,15,28].
A rule-based algorithm is intended to provide understandable rule-like relationships in the data. A rule is a prevailing quality or state. Induction is an instance of reasoning from a part to a whole. Rules indicate the degree of association between variables, map data into predefined classes, and identify a finite set of categories or clusters to describe the data. The rules support specific tasks and are generated by repeated application of a certain technique, or more generally an algorithm, to the data. Rough sets [17,20–25,30,31] are specialized methods for inducing rules. The essential idea of rough sets is to express uncertain knowledge through an approximation space, which is constructed from sets that are themselves certain.
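The approximation-space idea can be illustrated with the standard rough set construction, in which an uncertain concept is bracketed by two certain sets: a lower approximation (objects that definitely belong) and an upper approximation (objects that possibly belong). The sketch below assumes discretized attributes; the sample names, attribute names, and the target set are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group objects that are indistinguishable on the chosen attributes."""
    classes = defaultdict(set)
    for name, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical samples described by two discretized log attributes.
samples = {
    "s1": {"gamma_ray": "low", "porosity": "high"},
    "s2": {"gamma_ray": "low", "porosity": "high"},
    "s3": {"gamma_ray": "low", "porosity": "low"},
    "s4": {"gamma_ray": "high", "porosity": "low"},
    "s5": {"gamma_ray": "high", "porosity": "low"},
}
sandstone = {"s1", "s2", "s4"}  # hypothetical samples labeled as sandstone

lower, upper = approximations(samples, ["gamma_ray", "porosity"], sandstone)
boundary = upper - lower  # samples the attributes cannot classify with certainty
print(sorted(lower), sorted(upper), sorted(boundary))
```

In this example, s4 and s5 share the same attribute values but only s4 is labeled sandstone, so both fall in the boundary region. Certain rules can be read from the lower approximation, while the boundary supports only possible rules.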