This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 203448, “Decision-Tree Regressions for Estimating Liquid Holdup in Two-Phase Gas/Liquid Flows,” by Meshal Almashan, SPE, Yoshiaki Narusue, and Hiroyuki Morikawa, University of Tokyo, prepared for the 2020 Abu Dhabi International Petroleum Exhibition and Conference, Abu Dhabi, held virtually 9–12 November. The paper has not been peer reviewed.

In the authors’ study, a machine-learning predictive model, boosted decision tree regression (BDTR), is trained, tested, and evaluated for predicting liquid holdup (HL) in multiphase flows in oil and gas wells. Results show that the proposed BDTR model outperforms both the best empirical correlations and the fuzzy-logic model used to estimate HL in gas/liquid multiphase flows. Because the BDTR model has an interpretable representation, the heuristic importance of each input feature used to build the model can be determined clearly.
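This summary does not describe the paper's BDTR configuration (tree depth, number of boosting rounds, learning rate, or software). To show how boosted decision tree regression and a split-gain feature importance fit together, here is a minimal pure-Python sketch. It assumes squared-error gradient boosting with depth-1 trees (stumps). The names `fit_boosted`, `predict`, and `feature_importance`, and the defaults `n_rounds=50` and `learning_rate=0.1`, are hypothetical choices, not the authors' settings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """A depth-1 regression tree: one split, two constant leaves."""
    feature: int      # index of the input feature used for the split
    threshold: float  # rows with x[feature] <= threshold go to the left leaf
    left: float       # leaf value (mean residual) for the left branch
    right: float      # leaf value (mean residual) for the right branch
    gain: float       # reduction in squared error achieved by this split

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Exhaustively pick the split that most reduces the squared error of r."""
    base = _sse(r)
    mean_r = sum(r) / len(r)
    best = Stump(0, float("inf"), mean_r, mean_r, 0.0)  # fallback: no useful split
    for j in range(len(X[0])):
        # Candidate thresholds are observed values; the largest is skipped
        # because it would leave the right branch empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            gain = base - _sse(left) - _sse(right)
            if gain > best.gain:
                best = Stump(j, t, sum(left) / len(left), sum(right) / len(right), gain)
    return best


Model = Tuple[float, float, List[Stump]]  # (initial value, learning rate, stumps)


def fit_boosted(X: List[List[float]], y: List[float],
                n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Gradient boosting with squared-error loss: each stump fits the residuals."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is proportional to the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump.predict(row) for pi, row in zip(pred, X)]
    return init, learning_rate, stumps


def predict(model: Model, x: Sequence[float]) -> float:
    """Initial value plus the damped sum of every stump's output."""
    init, lr, stumps = model
    return init + lr * sum(s.predict(x) for s in stumps)


def feature_importance(model: Model, n_features: int) -> List[float]:
    """Total split gain per feature, normalized to sum to 1 (a heuristic score)."""
    totals = [0.0] * n_features
    for s in model[2]:
        totals[s.feature] += s.gain
    total = sum(totals)
    return [t / total if total > 0 else 0.0 for t in totals]
```

Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble. Summing each split's error reduction per feature gives one common heuristic for feature importance. On a toy data set where the target depends on only one column, the normalized gains should concentrate on that column. Production libraries use deeper trees and many refinements, so treat this only as a conceptual sketch.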

Machine-learning approaches to predicting HL in multiphase flows have recently been studied as a way to improve prediction accuracy over existing empirical correlations. However, these approaches ignore the heuristic importance of the input parameters to the predicted HL values. Knowing that feature importance can provide better insight into issues associated with HL studies, such as the liquid-loading phenomenon. To the best of the authors’ knowledge, the present study is the first to show how decision-forest regression predictive models can predict HL accurately.
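This summary does not say how the heuristic feature importance was computed. One widely used, model-agnostic alternative is permutation importance: shuffle one input column at a time and measure how much the prediction error grows. The sketch below assumes mean squared error as the metric and a model exposed as a plain Python function. `permutation_importance`, `n_repeats`, and `seed` are hypothetical names, and this technique is swapped in for illustration rather than taken from the authors' method.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)  # fixed seed so the scores are reproducible
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only column j replaced by the shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores
```

A column the model ignores scores zero, while a column the model relies on heavily produces a large increase in error when shuffled.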

Data Acquisition

The performance and predictive power of a machine-learning model rely greatly on the quality and completeness of the data set used to build it. The data used to train and test the predictive model are experimental and were collected from the literature (111 data points), obtained with air/kerosene and air/water mixtures. In this study, the data set is divided into three subsets: training, validation, and testing.
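The split proportions used for the 111 points are not given in this summary. The following minimal sketch shows a reproducible three-way split. It assumes a random shuffle with a fixed seed and hypothetical fractions of 70% training and 15% validation, with the remainder held out for testing. `three_way_split` is an illustrative name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    records: Sequence[T],
    train_frac: float = 0.70,  # assumed fraction, not from the paper
    val_frac: float = 0.15,    # assumed fraction, not from the paper
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records and split them into disjoint train/validation/test subsets.

    Slice sizes are floored; whatever remains after the training and
    validation slices goes to the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 - train_frac:
        raise ValueError("fractions must leave a non-empty test set")
    items = list(records)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

With 111 records and these assumed fractions, the subsets hold 77, 16, and 18 points.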

The data sets consist of measured values of HL, superficial gas velocity (Vsg), superficial liquid velocity (Vsl), pressure, and temperature (T). Statistical measures of the data sets are shown in Table 1 of the complete paper.
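Table 1 of the complete paper is not reproduced here, so this summary does not reveal which statistics it reports. The sketch below assumes a per-column minimum, maximum, mean, and sample standard deviation, computed with Python's `statistics` module. It uses a hypothetical `HoldupSample` record that holds the five variables listed above. Units are whatever the source data use.

```python
import statistics
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class HoldupSample:
    """One experimental point (hypothetical record layout)."""
    vsg: float          # superficial gas velocity, Vsg
    vsl: float          # superficial liquid velocity, Vsl
    pressure: float
    temperature: float  # T
    hl: float           # measured liquid holdup, HL (the prediction target)


def describe(samples: List[HoldupSample]) -> Dict[str, Dict[str, float]]:
    """Min, max, mean, and sample standard deviation for every column."""
    summary = {}
    for f in fields(HoldupSample):
        values = [getattr(s, f.name) for s in samples]
        summary[f.name] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            # stdev needs at least two points; report 0.0 for a single sample.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```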
