A real‐time deep‐learning model is proposed to classify the volume of cuttings leaving the shale shaker on an offshore drilling rig by analyzing the real‐time monitoring video stream. Unlike traditional video‐analytics methods, which are time‐consuming, the proposed model performs the classification in real time and achieves remarkable accuracy. Our approach is composed of three modules: (1) a multithreaded engine for decoding and encoding the real‐time video stream, which is provided by a modularized service named Rig‐Site Virtual Presence that aggregates, stores, transrates/transcodes, streams, and visualizes video data from the rig; (2) an automatic region‐of‐interest (ROI) selector, in which a deep‐learning‐based object‐detection approach helps the classification model locate the region containing the cuttings flow; and (3) a convolutional‐neural‐network (CNN)‐based classification model pretrained on videos collected from previous drilling operations. Normalization and principal‐component analysis (PCA) are applied to every video frame before it is fed into the classification model, which classifies each frame into one of four labels (Extra Heavy, Heavy, Light, and None) in real time.

The overall workflow has been tested on a video stream directed from an offshore drilling rig, with a bitrate of 137 Kbps, approximately 6 frames/sec (fps), and a frame size of 720 × 486. Training is conducted on an Nvidia GeForce 1070 graphics processing unit (GPU), while testing (classification inference) runs on an i5‐8500 central processing unit (CPU) alone. Because of the multithreaded processing and proper adaptation of the classification model, the entire workflow runs in real time: the system receives the live video stream and simultaneously displays the classification results with the encoded frames on the user‐side screen.
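The per‐frame preprocessing described above (normalization followed by PCA before inference) can be sketched as follows. This is a minimal illustration only: the function names, component count, and frame dimensions are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def fit_pca(frames, n_components=16):
    """Fit a PCA basis offline on a stack of training frames.

    frames: array of shape (n_frames, height, width), pixel values 0-255.
    Returns the per-pixel mean and the top principal axes.
    """
    # Flatten each frame to a vector and normalize to [0, 1].
    X = frames.reshape(len(frames), -1).astype(np.float64) / 255.0
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal axes (rows of vt).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def preprocess(frame, mean, components):
    """Normalize one incoming frame and project it onto the PCA basis."""
    x = frame.reshape(-1).astype(np.float64) / 255.0
    # The resulting low-dimensional feature vector is what would be
    # passed on toward the classification model.
    return (x - mean) @ components.T
```

In this sketch the PCA basis is fitted once, offline, on historical frames; at streaming time each decoded frame only needs one matrix multiply, which is what keeps the preprocessing compatible with real‐time, CPU‐only inference.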
We use the confusion matrix as the metric to evaluate model performance. Compared with results manually labeled by engineers, our model achieves highly accurate classification in real time without dropping frames.
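The evaluation metric above can be sketched as a confusion matrix over the four labels, comparing model predictions against the engineers' manual labels. The label names come from the abstract; the helper function and any data passed to it are illustrative, not the authors' code.

```python
import numpy as np

# Class indices follow the four labels used by the classification model.
LABELS = ["Extra Heavy", "Heavy", "Light", "None"]

def confusion_matrix(y_true, y_pred, n_classes=4):
    """cm[i, j] = number of frames with true label i predicted as label j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Per‐class accuracy is read off the diagonal of the matrix divided by the corresponding row sums, and overall accuracy is the trace divided by the total frame count.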