Rate of penetration (ROP) is an important indicator of drilling efficiency and provides the basis for optimizing drilling parameters. ROP prediction methods fall into two main categories: physical models and machine learning models. Physical models are derived from classical drilling theory and experimental results; they have clear physical meaning, good interpretability, and good generalization. However, the assumptions and empirical factors introduced during modeling limit their prediction accuracy. Machine learning models, trained on large amounts of data, can effectively learn the intrinsic relationships in the data. However, they behave like black boxes, and their performance depends heavily on the quality of oilfield drilling data, so their interpretability and generalization ability are relatively low. This study focuses on establishing a more accurate ROP prediction model that remains clearly interpretable. To this end, two novel categories of hybrid modeling approaches were introduced for horizontal drilling in the Jimusar oil field, China: error compensation by machine learning, and weighted averaging of model outputs. In the first category, a physical model serves as the main submodel, and a machine learning model predicts and offsets the errors of the physical model. In this way, the physical model preserves its physical meaning and generalization, while the machine learning submodel compensates for the physical model's low accuracy and improves prediction accuracy. In the second category, physical and machine learning models are combined through ensemble learning, so that the deficiencies of each model are offset by the other models in the ensemble, much like a team effort.
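The two hybrid categories can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: `physical_rop` is a toy Bingham-style law with placeholder coefficients, the feature vector is reduced to (WOB, RPM), and a plain k-nearest-neighbors regressor stands in for whatever machine learning model the study actually selected. In error compensation, the ML model is trained on the physical model's residuals and its prediction is added back; in weighted averaging, the two outputs are blended with a weight `w`.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical feature vector: (weight on bit, rotary speed). A real ROP dataset
# would carry more drilling parameters and should be normalized before k-NN.
Features = Tuple[float, float]
Model = Callable[[Features], float]


def physical_rop(x: Features, k: float = 0.05, a: float = 1.0) -> float:
    """Toy Bingham-style physical model, ROP = k * WOB**a * RPM.
    k and a are illustrative placeholders, not calibrated field values."""
    wob, rpm = x
    return k * wob**a * rpm


def knn_predict(train_x: Sequence[Features], train_y: Sequence[float],
                x: Features, k: int = 3) -> float:
    """Stand-in ML regressor: mean target of the k nearest training points."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_x), train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def fit_error_compensation(physical: Model, train_x: Sequence[Features],
                           train_y: Sequence[float], k: int = 3) -> Model:
    """Category 1: the physical model is the main submodel; the ML model learns
    its residuals (observed - physical) and adds them back at prediction time."""
    residuals: List[float] = [y - physical(x) for x, y in zip(train_x, train_y)]

    def predict(x: Features) -> float:
        return physical(x) + knn_predict(train_x, residuals, x, k)

    return predict


def fit_weighted_average(physical: Model, train_x: Sequence[Features],
                         train_y: Sequence[float], w: float = 0.5,
                         k: int = 3) -> Model:
    """Category 2: blend physical and ML outputs with weight w (fixed here as an
    assumption; in practice it would be tuned on validation data)."""
    def predict(x: Features) -> float:
        ml = knn_predict(train_x, train_y, x, k)
        return w * physical(x) + (1.0 - w) * ml

    return predict


if __name__ == "__main__":
    # Synthetic data: the "field" ROP equals the physical model plus a constant bias.
    train_x = [(10.0, 60.0), (12.0, 80.0), (15.0, 100.0), (18.0, 120.0)]
    train_y = [physical_rop(x) + 2.0 for x in train_x]
    hybrid = fit_error_compensation(physical_rop, train_x, train_y)
    blended = fit_weighted_average(physical_rop, train_x, train_y, w=0.7)
    print(hybrid((13.0, 90.0)), blended((13.0, 90.0)))
```

Note the design difference: in error compensation the physical model's output is always kept in full and only corrected, which is why that variant retains its physical meaning, whereas the weighted average lets the ML output displace part of the physical prediction.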
The paper presents hybrid models built in four steps: data collection and preprocessing, optimal selection of the physical model, optimal selection of the machine learning model, and establishment of the hybrid model. The performances of the physical, machine learning, and hybrid models are compared with one another. In terms of prediction accuracy, model interpretability, modeling difficulty, and generalization, the hybrid model with error compensation by machine learning is the optimal method for ROP prediction. This study also demonstrates an optimal trade-off between high accuracy and good interpretability.
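The "optimal selection" steps amount to scoring candidate models on held-out data and keeping the best one. The sketch below assumes root-mean-square error (RMSE) on a validation set as the sole criterion; the study also weighs interpretability, modeling difficulty, and generalization, which a single error metric does not capture. The function names `rmse` and `select_best` are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical model signature: a feature tuple in, a predicted ROP out.
Model = Callable[[Tuple[float, ...]], float]


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predictions and observed ROP."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def select_best(candidates: Dict[str, Model],
                val_x: Sequence[Tuple[float, ...]],
                val_y: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the validation set and return the name of the
    lowest-RMSE model together with all scores for side-by-side comparison."""
    scores = {name: rmse([m(x) for x in val_x], val_y)
              for name, m in candidates.items()}
    best = min(scores, key=scores.__getitem__)
    return best, scores
```

The same routine could be applied twice, once over candidate physical models and once over candidate machine learning models, before the selected pair is combined into a hybrid.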
