Abstract

Machine learning has gained substantial attention in recent years, and many industries, including oil and gas, have adopted it in their applications. To date, machine learning has been applied across several areas of petroleum engineering, from reservoir modeling and characterization to well placement optimization (Guyaguler et al., 2002) and equipment malfunction prediction (Bangert, 2012).

In petrophysics, machine learning has been used extensively as an alternative to conventional methods for classifying rock facies from available well data. This is largely because conventional methods, which involve visually examining cores and assigning facies manually, are tedious and time-consuming. However, the accuracy of most machine learning classification algorithms drops when the facies to be classified are not equally represented in the dataset, i.e., the problem of data imbalance.
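To make the imbalance problem concrete, the minimal sketch below uses hypothetical labels (not data from this study) for the four facies classes. A degenerate classifier that only ever predicts the two dominant facies still reaches a high overall accuracy while recalling none of the minority facies, which is one reason per-class metrics are useful alongside overall accuracy on imbalanced data. The helper names `overall_accuracy` and `per_class_recall` are illustrative assumptions.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted facies matches the true facies."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per facies label: correct predictions / true occurrences."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for labels that were never predicted correctly.
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical labels: facies 2 and 4 dominate, facies 1 and 3 are rare.
y_true = [2] * 45 + [4] * 45 + [1] * 5 + [3] * 5
# A degenerate model that predicts only the two dominant facies
# (true facies 1 -> predicted 2, true facies 3 -> predicted 4).
y_pred = [2] * 45 + [4] * 45 + [2] * 5 + [4] * 5

print(overall_accuracy(y_true, y_pred))  # 0.9
print(per_class_recall(y_true, y_pred))  # {1: 0.0, 2: 1.0, 3: 0.0, 4: 1.0}
```

The 90% overall accuracy hides the fact that the rare facies are never recovered.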

In this paper, we compare the performance of five machine learning classification algorithms on an imbalanced data set in which two facies types dominate. Model selection is carried out first; the algorithms are then compared using cross-validation; finally, the best-performing models are investigated further and compared in terms of prediction accuracy on the same data set. We conclude that, for an imbalanced dataset, simple support vector machines outperform the other four algorithms (the tree-based algorithms) and predict facies classes more efficiently.
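The abstract does not state how the cross-validation folds were built. One common choice for imbalanced data is stratified k-fold cross-validation, where every fold keeps roughly the class proportions of the full data set. The plain-Python sketch below assumes that scheme; `stratified_kfold_indices`, `cross_validate`, and the majority-class baseline `majority_model` are hypothetical helpers, and the features `X` are placeholders. In practice one would plug in the SVM and tree-based classifiers instead of the baseline, typically via a library such as scikit-learn's `StratifiedKFold`.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds that each preserve the class
    proportions of `labels` as closely as integer counts allow."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal this class's samples round-robin across the k folds.
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds


def cross_validate(model_fn, X, y, k=5, seed=0):
    """Mean held-out accuracy of `model_fn` over stratified folds.
    model_fn(X_train, y_train) must return a predict(x) callable."""
    scores = []
    for test_idx in stratified_kfold_indices(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = model_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_model(X_train, y_train):
    """Hypothetical baseline: always predict the most frequent training facies."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda x: most_common


# Hypothetical imbalanced labels: facies 2 and 4 dominate.
y = [2] * 50 + [4] * 30 + [1] * 10 + [3] * 10
X = list(range(len(y)))  # placeholder features; the baseline ignores them

print(cross_validate(majority_model, X, y, k=5))  # 0.5
```

The baseline's score equals the share of the majority facies, which gives a floor that any real classifier should clearly beat.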

Exploratory Data Analysis

The data set consists of seven well log measurements (referred to as features) from six wells in the Mabruk Field, southwest of Libya. Each set of measurements, taken at half-foot intervals, is associated with a facies label (referred to as the target class). The data set contains four distinct facies classes: Shallow Lagoon, Deep Lagoon, Shoal Lagoon and Shoal/Reef, labeled 1 to 4 respectively. The distribution and count of each class per well are shown in Figure 1a. Facies 2 and 4 clearly dominate the data set, which may affect the predictive performance of the models.
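A per-well class count like the one summarized in Figure 1a can be tallied with a few lines of standard-library Python. In this minimal sketch the facies labels and names follow the text above, but the well names (`Well-A`, `Well-B`) and the sample rows are hypothetical stand-ins for the actual Mabruk Field records, and the helper names are illustrative.

```python
from collections import Counter, defaultdict

# Facies labels and names as defined in the text.
FACIES_NAMES = {1: "Shallow Lagoon", 2: "Deep Lagoon", 3: "Shoal Lagoon", 4: "Shoal/Reef"}


def facies_counts_per_well(rows):
    """Return {well: Counter({facies_label: count})} from (well, facies) rows."""
    counts = defaultdict(Counter)
    for well, facies in rows:
        counts[well][facies] += 1
    return dict(counts)


def class_proportions(rows):
    """Fraction of all samples belonging to each facies label."""
    totals = Counter(facies for _, facies in rows)
    n = sum(totals.values())
    return {label: totals[label] / n for label in sorted(totals)}


# Hypothetical (well, facies) rows, one per half-foot sample.
rows = (
    [("Well-A", 2)] * 6 + [("Well-A", 4)] * 3 + [("Well-A", 1)] * 1
    + [("Well-B", 4)] * 5 + [("Well-B", 2)] * 4 + [("Well-B", 3)] * 1
)

print(facies_counts_per_well(rows))
for label, frac in class_proportions(rows).items():
    print(f"{label} {FACIES_NAMES[label]}: {frac:.0%}")
# 1 Shallow Lagoon: 5%
# 2 Deep Lagoon: 50%
# 3 Shoal Lagoon: 5%
# 4 Shoal/Reef: 40%
```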
