For years, many companies involved in drilling have searched for the ideal method to calculate the state of a drilling rig. Although companies cannot agree on a standard definition of "rig state," they do agree that, as drilling optimization advances and remote operations and automation become more widespread, rig state calculation is mandatory in one form or another. Within the service company, many methods exist for calculating rig state, but one new technology area, vision analytics, holds promise to deliver a more efficient and cost-effective option with higher accuracy. Currently, detection algorithms rely heavily on data collected by sensors installed on the rig. However, relying exclusively on sensor data is problematic because sensors are prone to failure and are expensive to install and maintain. A machine learning model that relies exclusively on videos collected on the rig floor to infer rig states makes it possible to move away from the existing methods as the industry moves toward a future of high-tech rigs. Videos, in contrast to sensor data, are relatively easy to collect from small, inexpensive cameras installed at strategic locations.
Consequently, this paper presents a machine learning pipeline that performs rig state determination from videos captured on the rig floor of an operating rig. The pipeline has two parts. First, the annotation pipeline matches each frame of the video dataset to a rig state; a convolutional neural network (CNN) is used to match the time of the video with the corresponding sensor data. Second, additional CNNs that capture both spatial and temporal information are trained to estimate rig state from videos. The models are trained on a dataset of 3 million frames on a cloud platform using graphics processing units (GPUs). The models used include a pretrained visual geometry group (VGG) network, a convolutional three-dimensional (C3D) model that uses three-dimensional (3D) convolutions, and a two-stream model that uses optical flow to capture temporal information. The initial results demonstrate that this pipeline is effective in detecting rig states using computer vision analytics.
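
As an illustration of the annotation step only, the following minimal sketch assigns a rig state to each video frame once frame timestamps have been aligned with the sensor record. It is not the implementation used in this work: the function name `label_frames`, the interval format, and the rig state names are assumptions made for the example, and the CNN-based time matching is assumed to have already produced the frame timestamps.

```python
from bisect import bisect_right


def label_frames(frame_times, state_intervals):
    """Assign each frame the rig state that is active at its timestamp.

    frame_times: timestamps in seconds, one per video frame (assumed to be
        already aligned with the sensor clock).
    state_intervals: list of (start_time, state) pairs sorted by start_time,
        derived from sensor data (hypothetical format).
    """
    starts = [start for start, _ in state_intervals]
    labels = []
    for t in frame_times:
        # Index of the last interval that starts at or before time t.
        i = bisect_right(starts, t) - 1
        labels.append(state_intervals[i][1] if i >= 0 else None)
    return labels


# Hypothetical sensor-derived rig states and frame timestamps.
sensor_states = [(0.0, "drilling"), (10.0, "connection"), (25.0, "tripping")]
frame_times = [0.0, 5.0, 10.0, 24.9, 30.0]
print(label_frames(frame_times, sensor_states))
```

Frames that precede the first sensor interval receive no label (`None`) in this sketch, so they can be excluded from training rather than assigned a guessed state.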