The use of advanced analytics techniques has become pivotal for the Digital Transformation of the Oil and Gas Industry. Most of these models are used to predict and avoid off-spec behavior of both equipment and functional units of the plant. Predicting overshooting events in advance also allows plant operators to avoid production downtime.
From a Machine Learning perspective, predicting off-spec situations and peaks in a time signal is a complex task because such events are very rare. For the same reason, standard data science measures, such as Area Under the Curve (AUC), Recall and Precision, can be misleading performance indicators. For example, a model that almost never predicts an off-spec event can still score well on an aggregate measure simply because the classes are unbalanced, while a model tuned to catch the rare events can raise many false alarms. In this paper we present a business-oriented validation framework for big data analytics and machine learning models applied to an upstream production plant. The framework makes it possible to evaluate both the effort required of operators and the expected benefit that can be achieved.
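As a minimal illustration of how class imbalance can flatter an aggregate score, the sketch below uses plain accuracy on synthetic labels. Both the choice of accuracy and the 1% event rate are assumptions made for this example; they are not taken from the plant data.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels where 1 marks an off-spec event."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    recall = true_pos / positives if positives else 0.0
    return correct / len(y_true), recall


# Synthetic, illustrative data: 10 off-spec samples out of 1000 (1% event rate).
y_true = [1] * 10 + [0] * 990
# A trivial model that never raises an alarm.
y_pred = [0] * 1000

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

The trivial model is right on almost every sample yet intercepts no event, which is why a score of this kind says little about operational value.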
The validation metrics we define carry the classical Data Science measures over to the business domain. This makes it possible to adapt the model to the specific use case and end user while addressing the constraints of the specific upstream plant. The framework helps define the optimal trade-off between the effort required and the events that can be prevented, and provides statistics and KPIs to evaluate it. Normalized Recall (NR) takes into account both the percentage of events intercepted and the effort required, expressed as Attention Time (AT), i.e. the time during which the operator should pay attention to the equipment involved. Plant operators can thus gauge the results they can achieve relative to the maximum effort required. Moreover, to demonstrate the quality of the model, we define the lift in NR as the ratio between the model's NR and the NR that would be obtained by randomly distributing the same number of alarms.
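The text does not give a closed-form expression for NR, so the following is only a sketch under stated assumptions: each alarm opens a fixed-length attention window, an event counts as intercepted if it falls inside a window, AT is the fraction of time covered by windows, and NR is taken to be recall divided by AT. The function names, the window length and the toy timestamps are all hypothetical.

```python
import random


def attention_mask(alarm_times, n_steps, window):
    """Mark every time step that falls inside an attention window opened by an alarm."""
    mask = [False] * n_steps
    for t in alarm_times:
        for k in range(t, min(t + window, n_steps)):
            mask[k] = True
    return mask


def normalized_recall(alarm_times, event_times, n_steps, window):
    """Return (recall, attention_time, nr), ASSUMING nr = recall / attention_time."""
    mask = attention_mask(alarm_times, n_steps, window)
    attention_time = sum(mask) / n_steps  # fraction of time under attention
    if not event_times:
        return 0.0, attention_time, 0.0
    recall = sum(mask[e] for e in event_times) / len(event_times)
    nr = recall / attention_time if attention_time > 0 else 0.0
    return recall, attention_time, nr


def nr_lift(alarm_times, event_times, n_steps, window, n_trials=1000, seed=0):
    """Ratio of the model NR to the mean NR of the same number of randomly placed alarms."""
    rng = random.Random(seed)
    model_nr = normalized_recall(alarm_times, event_times, n_steps, window)[2]
    random_nrs = []
    for _ in range(n_trials):
        random_alarms = rng.sample(range(n_steps), len(alarm_times))
        random_nrs.append(
            normalized_recall(random_alarms, event_times, n_steps, window)[2]
        )
    baseline = sum(random_nrs) / n_trials
    return model_nr / baseline if baseline > 0 else float("inf")


# Toy, illustrative timeline: 1000 time steps, 4 events, 3 alarms, 10-step windows.
events = [105, 400, 750, 905]
alarms = [100, 395, 600]
recall, attention_time, nr = normalized_recall(alarms, events, n_steps=1000, window=10)
lift = nr_lift(alarms, events, n_steps=1000, window=10)
print(f"recall={recall:.2f}, AT={attention_time:.2%}, NR={nr:.2f}, lift={lift:.2f}")
```

Under these assumptions, randomly placed alarms intercept events roughly in proportion to the time they cover, so a lift well above 1 indicates that the model's alarms are concentrated near real events rather than spread by chance.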
We applied this framework to specific use cases, obtaining an expected recall of 40-50% with an expected effort of 5-10% of the time (over a period of more than 6 months). The actual effort is lower, since the operator is not required to be fully committed to the alarm. The framework demonstrates the real operating capability of the analytics deployed in the field, highlighting both the effort required of operators and the accuracy of the machine learning tools.