Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary tool for coal identification, adding well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. Machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forests (RF), and extreme gradient boosting (XGBoost). The precision, recall, and F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, the performance on the minority class (i.e., coals) is limited. To enhance the performance on coal prediction, two data manipulation techniques [naive random oversampling technique (NROS) and synthetic minority oversampling technique (SMOTE)] are separately coupled with machine learning algorithms.
Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single well experiments), both the training data and test data are in the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs. It is found that ROP is the most important feature. The second set of experiments (multiple well experiments) use the training data from multiple nearby wells, which can predict coal pay zones in a new well. The most important feature is gamma ray. After placing slotted casings, all wells have over 90% coal identification rates and three wells have over 99% coal identification rates. This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.