Summary
The formation of deposits is a very common issue in oil and gas pipeline transportation systems. Such sediments, mainly wax and paraffin for crude oil, or hydrates and water for gas, progressively reduce the free cross-sectional area of the pipe, leading in some cases to the complete occlusion of the conduit. The overall result is a decrease in the transportation performance, with negative economic, environmental, and safety consequences. To prevent this issue, the amount of inner deposits must be continuously and accurately monitored, such that the corresponding cleaning procedures can be performed when necessary. Currently, this monitoring is still dictated by best-practice rules pertaining to preventive or reactive approaches, yet the demand from the industry is for predictive solutions that can be deployed online for real-time monitoring applications. The paper moves in this direction by presenting a machine learning methodology that leverages pressure measurements to perform online monitoring of the inner deposits in crude oil trunklines. The key point is that the attenuation of pressure transients within the fluid is dependent on the free cross-sectional area of the pipe. Pressure signals, collected from two or more distinct locations along a pipeline, can therefore be exploited to estimate and track in real time the presence and thickness of the deposits. Several statistical indicators, derived from the attenuation of such pressure transients between adjacent acquisition points, are fed to a data-driven regression algorithm that automatically outputs a numeric indicator representing the amount of inner pipe debris. The procedure is applied to the pressure measurements collected for one and a half years at discrete points at a relative distance of 40 and 60 km along an oil pipeline in Italy (100 km length, 16-in. inner diameter pipes).
The availability of historical data from before and after pipe-cleaning campaigns further enriches the proposed data-driven approach. Experimental results demonstrate that the proposed predictive monitoring strategy is capable of tracking the conditions of the entire conduit and of individual pipeline sections, thus determining which portion of the line is subject to the highest occlusion levels. In addition, our methodology allows for real-time acquisition and processing of data, thus enabling online monitoring. Prediction accuracy is assessed by evaluating the typical metrics used in the statistical analysis of regression problems.
Introduction
Pipeline transportation systems represent the cheapest and safest solution to convey hydrocarbons, gases, and other fluids over long distances. After construction, pipe internals tend to naturally accumulate deposits, such as rust, dirt, mill scale, or paraffin wax (McAllister 2013). Those constituents need to be removed for a number of reasons: First, to avoid product contamination, which can have a negative economic impact on the business; second, to allow for a better use of corrosion inhibitors, whose action is less effective if the pipe bore is covered with mill scale or it is partially corroded; third, to improve flow rate and efficiency, which is maximized when the pipeline is completely clean (especially for pipelines having a length of several tens of kilometers); and lastly, in natural gas transportation lines, to facilitate pipeline drying, required to prevent internal corrosion and the formation of hydrates.
The internal cleaning of a pipeline can be performed with several techniques, applied individually or jointly (McAllister 2013; Olajire 2021). We mention here the injection of chemical solvents (e.g., flux); internal sandblasting, in which an abrasive material is used to scrape the inner surface of the pipe and to remove contaminants; purging with air or gas to prevent oxidation phenomena leading to corrosion; running a pipeline inspection gauge (PIG), which is a multipurpose maintenance tool capable of flushing debris out of the pipe by scraping its internals with metallic brushes or plastic disks. Among those cleaning solutions, the PIG becomes particularly advantageous from a product saving and environmental point of view, especially in multiproduct lines where fluids are conveyed in batches: For instance, at the end of a given oil transfer, one can clear out the residuals stuck inside the pipe bore with a PIG run, thus allowing for a faster and effective product switch. In fact, by separating two batches of different products with a PIG, it is possible to avoid flushing the line with water, solvents, or (in some cases) the following product, and to stave off effluent treatment or contaminated product recovery.
Regardless of the solution used, adequate strategies should be carefully arranged to prevent blockages in pipes and to guarantee the desired transportation efficiency. For these reasons, tracking the amount of inner deposits assumes particular relevance in pipeline transportation systems (Van der Geest et al. 2021), especially for crude oil lines where the high viscosity of the conveyed product facilitates the deposition of wax and asphaltenes. To this date, however, this monitoring operation is still performed by resorting to empirical, best-practice rules or by scheduling a periodic activity, an approach used in preventive or reactive maintenance: Such a policy is mainly adopted because we lack both a clear definition of clean/clogged pipe and a rigorous method for measuring the related accumulation of debris. A significant body of literature focuses on the prediction and monitoring of inner deposits in oil and gas pipelines. Several authors have proposed deterministic deposition models, yet they can only be applied in an offline context (Giacchetta et al. 2019; Leporini et al. 2019; Xie et al. 2018; Kamari et al. 2013, 2014; Shasha and Qiyu 2014; Wang et al. 2014b; Obaseki and Elijah 2021; Yao et al. 2021; Chen et al. 2021); multiple studies have instead been designed and tested on laboratory setups or make use of data from the literature, and so they lack a validation phase on real scenarios (Huang and Ma 2008; Huang et al. 2017; Guozhong and Gang 2010; Ito et al. 2021; Li et al. 2020; Modesty Kelechukwu et al. 2013; Wang and Huang 2014; Wang et al. 2014a, 2014b; Zougari 2010; Li et al. 2018; Theyab and Diaz 2016; Van Der Geest et al. 2018; Adeyanju and Oyekunle 2019; Chi et al. 2019; Sun et al. 2020; Obinichi et al. 2021; Li et al. 2021; Agarwal et al. 2021).
To the best of the authors’ knowledge, there are currently two research works that satisfy the requirement of real-time monitoring: The former, by Halstensen et al. (2013), is an online estimation method based on acoustic chemometrics, which has, however, been validated only on a very short pipe section (5.5 m in length); the latter, by Lock Sow Mei et al. (2015), is a technique based on electrical capacitance tomography, yet it has been designed and tested only in a laboratory. Lastly, certain authors have implemented data-driven solutions based on machine learning approaches (Xie and Xing 2017; Obanijesu and Omidiora 2008; Jalalnezhad and Kamali 2015; Sousa et al. 2021; Menad et al. 2021), but they each share one or more of the previously discussed drawbacks.
The literature review reported here highlights two main research gaps: First, there is a need for consistent and precise prediction methods to monitor the amount of inner deposits in pipelines, making use of flexible data-driven approaches to be validated on real data sets (Mwendapole Lonje and Liu 2021); second, modern systems demand online monitoring to perform real-time predictive maintenance (Alnaimat and Ziauddin 2020), yet this requirement is typically not satisfied by the current research. This work addresses both needs by presenting a machine learning methodology (based on extremely randomized trees) that makes use of pressure measurements, collected at two or more discrete points along a pipeline, to automatically provide as output a numeric indicator that quantifies the cleanliness level of the pipeline itself, thus offering a clear indication of its internal conditions. We demonstrate that our proposal can track the occlusion levels of an entire line and of individual pipe segments, therefore determining which portion of the conduit is most blocked by deposits and debris. In addition, the methodology presented here can operate with data collected in real time from transportation assets, and is thus capable of performing online monitoring and control tasks. Lastly, the validity of the proposed procedure has been assessed on one and a half years of data, collected from a 100-km crude oil pipeline located in Italy.
The remainder of the paper is structured as follows. We first outline the proposed prediction method. Then, we describe in detail its application on a real crude oil pipeline. Lastly, we draw the conclusions.
Overview of the Prediction Method
The approach presented here for monitoring deposits in crude oil transportation pipelines makes use of standard pressure measurements, collected by means of hydrophones at two distinct points along a pipeline; such instruments sense the pressure transients propagating within the fluid that is flowing inside the pipe. Acoustic signals can be generated by multiple sources, such as pumping equipment, valves, flow turbulence, any PIG traveling inside the line, spill operations, tremors, quakes, and landslides; in this particular case, the main emitters of interest are the pumps and the PIGs (Bernasconi and Giunta 2020).
The raw pressure measurements are suitably processed to compute the specific attenuation of acoustic waves propagating between the two measurement points. Subsequently, a set of statistical indicators is evaluated from this quantity, and the resulting feature set is used to train a machine learning algorithm based on extremely randomized trees (Geurts et al. 2006). The latter is designed to output a real number ranging between zero and unity, which describes the current state of pipeline internals (clean = 0, dirty = 1, or any intermediate stage). As a last step, an assessment phase follows, in which the accuracy of the data-driven prediction model is tested on unseen data. If the performance of the predictor satisfies the design requirements (e.g., accuracy metrics greater than a target threshold), the model can be deployed online to monitor in real time the inner conditions of the pipeline. In such a case, at each timestep, one has to perform the following operations:
Collect the instantaneous pressure measurements from the two locations;
Compute the specific attenuation between the two locations;
Evaluate the required set of features, collected in a feature vector;
Obtain the instantaneous prediction by providing the feature vector as input to the regression algorithm.
Application of the Model on a Real Transportation Scenario: Chivasso-Pollein Crude Oil Pipeline
This section demonstrates how the proposed model has been applied to data collected from a real pipeline transportation system. Two applications will be presented: The first consists of a global monitoring strategy to predict the occlusion levels of a conduit in its entirety, whereas the second is used to monitor the amount of deposits within individual line sections.
Experiment Setup
We have used the historical vibroacoustic measurements, collected by a proprietary digital integrity monitoring system (e-vpms® technology; Giunta and Bernasconi 2019; Giunta et al. 2016), installed on a crude oil transportation line that connects the Eni logistic terminals of Chivasso and Pollein, located in northern Italy (Bernasconi et al. 2014). The line has a length of approximately 100 km and is characterized by 16-in. inner diameter pipes. A schematic representation of the e-vpms® system is displayed in Fig. 1. A set of sensing stations is located at discrete points (named A, B, and C in Fig. 1) along the pipeline; each e-vpms® acquisition unit is equipped with a sensing group, recording the absolute pressure of the transported fluid in bars, and a dynamic hydrophone, which measures small-scale dynamic pressure variations (on the order of kilopascals). The collected measurements are time synchronized by means of GPS and subsequently sent to a central control unit. Pressure data have been collected at a sampling rate of 20 Hz from 1 June 2013 until 1 December 2014. In addition to the e-vpms® data set, we also have at our disposal historical pipe-cleaning operation logs, which textually outline the dates and times at which one or more PIG runs have been performed on the trunkline.
The satellite map of the conduit and the location of the recording stations (labeled with the letters A, B, and C) are displayed in Fig. 2, respectively, with a red line and yellow pins. The distances between each station and the pumping equipment located at Terminal A are reported in Table 1 (Giro et al. 2021).
Satellite map of Chivasso-Pollein pipeline routing (red curve) and location of the e-vpms® measurement stations (yellow pins).
Data Processing
The first step consists in transforming the unprocessed measurements into a format suitable for machine learning tasks. The two plots in Fig. 3, respectively, show the raw static (Fig. 3, top) and dynamic (Fig. 3, bottom) pressure signals, collected from the three different e-vpms® stations (A, B, and C, respectively identified with turquoise, purple, and green lines). Each pressure time series needs to be cleansed to remove undesired data points:
Presence of outliers due to sensor errors. Such outliers are caused by rare electromagnetic disturbances affecting the power unit of the measuring stations and result in faulty acquisitions having values outside of the dynamic range of the instrumentation. In this specific case, static pressure readings lower than 0.5 bar or higher than 80 bar are discarded; likewise, dynamic pressure values below −170 kPa or above 170 kPa are eliminated from the data set.
Unwanted pressure values corresponding to operational statuses of the line that do not contribute significantly to the occlusion levels of the pipes. More specifically, we assume that the formation of deposits within pipe segments mainly occurs when the oil is actively conveyed through the line. In fact, at the end of each batch, the pipeline is filled with a flux. Therefore, all the time intervals in which the pipeline is not operational (i.e., off) or is in a flow regulation state (e.g., pressure transients generated by pumping fluctuations) should be ruled out from the data set.
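As an illustration of the first cleansing rule, the range checks quoted above can be sketched in a few lines of Python. This is a minimal sketch, not the processing code used in the study: the function name is hypothetical, and discarding a sample when either of its two readings is out of range is an assumption made here for simplicity.

```python
import numpy as np

# Validity ranges quoted in the text for this specific installation.
STATIC_RANGE_BAR = (0.5, 80.0)
DYNAMIC_RANGE_KPA = (-170.0, 170.0)


def valid_sample_mask(static_bar, dynamic_kpa):
    """Return a boolean mask that is True where both readings are in range.

    Assumption: a sample is dropped if either its static or its dynamic
    reading falls outside the instrument range.
    """
    static_bar = np.asarray(static_bar, dtype=float)
    dynamic_kpa = np.asarray(dynamic_kpa, dtype=float)
    ok_static = (static_bar >= STATIC_RANGE_BAR[0]) & (static_bar <= STATIC_RANGE_BAR[1])
    ok_dynamic = (dynamic_kpa >= DYNAMIC_RANGE_KPA[0]) & (dynamic_kpa <= DYNAMIC_RANGE_KPA[1])
    return ok_static & ok_dynamic
```

The mask can then be used to index the time series, for example `dynamic_kpa[valid_sample_mask(static_bar, dynamic_kpa)]`.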
Raw static (top) and dynamic (bottom) pressure time series, as measured from Stations A, B, and C (respectively, colored with turquoise, purple, and green lines).
Even though each of these impairments can be addressed manually, removing the nonoperational and flow-regulation intervals (the second item above) by hand becomes impractical when processing data sets having billions of points. A possible solution to this problem consists in exploiting an automated detection procedure, such as the data-driven pump monitoring system described in Giro et al. (2021) and Giunta et al. (2020). We have therefore applied the clustering method outlined in Giro et al. (2021) to fit a Gaussian mixture model (GMM) to the available pressure data. The GMM automatically produces as output a set of categorical labels, which identify the time instants in which the system is either off or is performing flow regulations; therefore, the corresponding data points can be easily identified and removed. It should be stressed that the static pressure data have been used here only to aid and simplify this second cleansing step; having such measurements available is not mandatory for the purpose of monitoring inner deposits, and from this point onward, the discussion will focus solely on the analysis of the pressure transients (i.e., dynamic pressure data). Lastly, Fig. 4 represents the static (Fig. 4, top) and dynamic (Fig. 4, bottom) pressure measurements after having performed the processing steps previously described.
Processed static (top) and dynamic (bottom) pressure time series, as measured from Stations A, B, and C (respectively, colored with turquoise, purple, and green lines).
Attenuation Analysis
We demonstrate here that attenuation measurements (validated by PIG tracking) prove to be a valuable feature for assessing the inner status of the pipeline. For single-phase fluids, the specific attenuation of acoustic waves propagating within the pipe can be expressed as follows (Blackstock and Atchley 2001):
In this expression, the quantities involved are the internal radius of the pipe (m), the frequency (Hz), the dynamic viscosity of the fluid (Pa·s), the fluid density (kg/m3), and the measured sound speed within the fluid (m/s). Of all these parameters, particular attention should be given to the internal radius, as it experiences the most significant variations during cleanup campaigns. Before such operations, pipe sections are internally affected by deposits (especially wax), which reduce the effective internal diameter of the pipe in which the oil can flow. It follows that the specific attenuation of acoustic waves in a pipe segment is greater when the latter is partially clogged by wax, compared with a clean pipe (e.g., after cleaning operations).
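For reference, a commonly used wide-tube (viscous boundary layer) form that depends on exactly these five quantities is sketched below. The symbols are assumptions introduced here for illustration ($r$ radius, $f$ frequency, $\eta$ dynamic viscosity, $\rho$ density, $c$ sound speed), and the expression actually adopted from Blackstock and Atchley (2001) may differ in detail, for instance by including thermal-conduction terms.

```latex
% Assumed wide-tube viscous attenuation (in Np/m); not necessarily the
% exact expression used in the paper.
\alpha(f) \;=\; \frac{1}{r\,c}\,\sqrt{\frac{\pi f\,\eta}{\rho}}
\qquad\Longrightarrow\qquad
\frac{\alpha_{\text{clogged}}}{\alpha_{\text{clean}}}
\;=\; \frac{r_{\text{clean}}}{r_{\text{clogged}}}
\quad\text{(same } f,\ \eta,\ \rho,\ c\text{)}
```

Under this assumed form, the attenuation is inversely proportional to the free radius, which is consistent with the statement that a partially clogged segment attenuates acoustic waves more than a clean one.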
Instead of using the expression above to evaluate the specific attenuation, which requires real-time knowledge of the instantaneous parameters of the fluid (i.e., viscosity, density, and sound speed), we have developed a novel and simpler approach to derive the specific attenuation of acoustic waves, which only makes use of basic pressure transients. The grid plot of Fig. 5 illustrates this with an example: it displays the attenuation levels a few days before and after a PIG campaign performed on 13 May 2014, as recorded in the available maintenance logs. Starting from the time series of dynamic pressure (Fig. 5, charts in the first column), one can evaluate the power spectral density of such signals collected at two different stations (in this example, A and C: Fig. 5, plots in the second column) and subsequently derive the frequency-dependent specific attenuation values as the ratio between the two power spectral densities, divided by the distance (in km) between the two stations (Fig. 5, charts in the third column). Lastly, the average specific attenuation level can be obtained by integrating over the entire frequency range (in our case, between 0 and 10 Hz); in other words, the specific attenuation corresponds to a power ratio between two signals, scaled by a distance factor.
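The procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names and the segment length are hypothetical, the power ratio is expressed in decibels (an assumption, since the unit is not stated here), the spectral estimate is a plain averaged periodogram, and the band average stands in for the normalised integral over the band, with the zero-frequency bin excluded.

```python
import numpy as np


def averaged_psd(x, fs, nperseg):
    """Averaged periodogram over non-overlapping, Hann-windowed segments.

    The one-sided factor of 2 is omitted because it cancels in the ratio
    between two spectra.
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // nperseg
    window = np.hanning(nperseg)
    segments = x[: n_seg * nperseg].reshape(n_seg, nperseg)
    segments = segments - segments.mean(axis=1, keepdims=True)
    spectrum = np.fft.rfft(segments * window, axis=1)
    psd = (np.abs(spectrum) ** 2).mean(axis=0) / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd


def specific_attenuation(p_up, p_down, fs, distance_km, nperseg=1024, f_max=10.0):
    """Frequency-dependent and band-averaged attenuation per km (assumed in dB/km)."""
    freqs, psd_up = averaged_psd(p_up, fs, nperseg)
    _, psd_down = averaged_psd(p_down, fs, nperseg)
    band = (freqs > 0.0) & (freqs <= f_max)  # skip the DC bin
    alpha_f = 10.0 * np.log10(psd_up[band] / psd_down[band]) / distance_km
    return freqs[band], alpha_f, float(alpha_f.mean())
```

With the 20 Hz sampling rate used in this work, the band from 0 to 10 Hz spans the whole spectrum up to the Nyquist frequency, so `f_max=10.0` simply keeps every nonzero frequency bin.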
Dynamic pressure signals time series, power spectral density, and specific attenuation curves before (top row) and after (bottom row) a PIG campaign.
The procedure described above can be periodically executed to derive the temporal evolution of the specific attenuation for any line section. An example is depicted in Fig. 6, where the short-term attenuation value (gray line) is displayed for the A-C line segment. The long-term trend (magenta curve) has instead been derived from the short-term values by smoothing the latter curve with a noncausal, 1-week moving average. We can observe that the long-term attenuation curve is characterized by a slow and gradual increase over the course of several weeks or months, coupled with rapid decreases having a much shorter duration: The former phenomenon is mainly due to a progressive increase in the occlusion levels of the pipes; the latter correspond to the pigging campaigns performed on the pipeline (as reported in the maintenance logs), some of which have been highlighted in Fig. 6 using black vertical bars. It can be noted that every major drop in attenuation occurs right after a pigging operation has been executed. In such circumstances, cleaner pipe sections allow the pumping terminal located at Station A to operate with a lower service pressure while still delivering the reference value of about 3 bar at Station C (as can be inferred from the topmost plot of Fig. 4).
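The smoothing step can be illustrated with a centered (noncausal) moving average that tolerates missing samples. This is a sketch only: the function name is hypothetical, the window is assumed to be odd and no longer than the series, and missing attenuation values are assumed to be encoded as NaN.

```python
import numpy as np


def centered_moving_average(values, window):
    """Noncausal moving average; NaN samples are ignored within each window.

    Assumptions: `window` is odd (so the window is symmetric around each
    sample) and not longer than `values`. Near the edges, the average is
    taken over the samples that are actually available.
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    kernel = np.ones(window)
    sums = np.convolve(np.where(valid, values, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    out = np.full(values.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out
```

For example, if the short-term attenuation were available once per hour (a hypothetical rate), a 1-week window would correspond to `window = 7 * 24 + 1`.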
Short-term (gray line) and long-term (magenta line) specific attenuation for the A-C pipe segment. The main PIG campaigns (from an attenuation perspective) performed on the pipeline have been highlighted with black vertical bars.
Validation through PIG Detection and Tracking
We have observed that the attenuation represents a valuable indicator of a pipe’s occlusion level. The quality of such a feature can be further validated if one experimentally verifies the occurrence of each PIG campaign (as reported in the available operation logs), especially for the main operations (highlighted in Fig. 6 using black vertical bars). To do so, we have developed a software tool capable of detecting, in the observed dynamic pressure measurements, the acoustic noise generated by the traveling PIG (Bernasconi and Giunta 2020). An example of the output provided by such software is displayed in Fig. 7, where the positions of several PIGs inside the pipeline have been tracked during the second half of November 2014. The topmost chart of Fig. 7 represents a density plot of the normalized cross-correlation between the dynamic pressure transients recorded by the hydrophones at Stations A and C, as a function of time (horizontal axis) and of the distance from Station A (vertical axis). Darker regions of the image correspond to cross-correlation values closer to unity (maximum correlation); similarly, lighter areas present the lowest cross-correlation values, which tend to zero. For simplicity, we have converted the correlation delays (that would be displayed on the vertical axis) into distances, because the former are a linear function of the latter and of the sound velocity within the fluid conveyed in the pipe.
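The linear relation between correlation delay and distance can be sketched with the usual two-sensor geometry. Everything below is an assumption made for illustration and is not taken from the paper: a source located at a distance x from Station A on a line of length L reaches Station A after x/c and Station C after (L − x)/c, so the delay τ between the two arrivals (A minus C) satisfies x = (L + cτ)/2. The function names and the sign convention are hypothetical.

```python
import numpy as np


def estimate_delay_s(sig_a, sig_c, fs):
    """Delay (s) of the signal at Station A relative to Station C.

    The delay is taken at the peak of the full cross-correlation; it is
    negative when the source is closer to Station A.
    """
    sig_a = np.asarray(sig_a, dtype=float)
    sig_c = np.asarray(sig_c, dtype=float)
    sig_a = sig_a - sig_a.mean()
    sig_c = sig_c - sig_c.mean()
    xcorr = np.correlate(sig_a, sig_c, mode="full")
    lags = np.arange(-(len(sig_c) - 1), len(sig_a))
    return lags[np.argmax(xcorr)] / fs


def delay_to_distance_km(tau_s, line_length_km, sound_speed_km_s):
    """Map a correlation delay to a source distance from Station A.

    Assumed geometry: tau = x/c - (L - x)/c, hence x = (L + c * tau) / 2.
    """
    return 0.5 * (line_length_km + sound_speed_km_s * tau_s)
```

Under this convention, a source at Station A maps to a distance of zero and a source at Station C maps to the full line length. Because the sound speed varies with temperature and product composition, a fixed value of `sound_speed_km_s` makes the mapped distances drift slightly over time.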
Cross-correlation panel (top), describing the position of several PIGs along the pipeline as a function of time and distance from Station A, and corresponding PIG indicator (bottom), highlighting the start of PIG runs.
At first sight, two horizontal dark lines, located at the cross-correlation distances corresponding to the two ends of the line, can be noticed: The former is related to the pressure transients that originate from Station C and propagate to the opposite line end, while the latter corresponds to the normal propagation delay of the acoustic waves emitted by the pump located at Station A that reach Station C. Both curves present slight upward and downward curvatures, as the velocity of sound within the pipe is not constant over time: Temperature variations and changes in the composition of the flowing product are mainly responsible for such irregularities (Creek et al. 1999). In addition, several white rectangular areas are present in the image, corresponding to all the data points that have been ruled out as a result of the processing flow described in the Data Processing section; in those circumstances, the cross-correlation cannot be computed and is automatically set to a null value.
On 15 November at approximately 07:00, a pipeline cleaning operation is initiated, as the PIG departs from Station A to reach the end of the line (Station C) about 16 hours later. The event is observable in the cross-correlation map as a slanted dark line originating at Station A and gradually increasing in distance toward Station C as time increases; this phenomenon can be interpreted as the position of an acoustic source (i.e., the PIG) that is traveling inside the pipes. In the same image, two additional slanted lines can also be noticed; they are representative of further runs, respectively, starting on 23 November at 07:00 and on 27 November at 15:00.
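Another utility provided by the PIG detection tool is the possibility of computing the speed of the moving inspection gauge. Once two consecutive timesteps $t_1$ and $t_2$ of a particular run have been identified, the instantaneous PIG velocity is obtained as the ratio between the distance covered by the PIG and the elapsed time:
$$v_{\mathrm{PIG}} = \frac{x(t_2) - x(t_1)}{t_2 - t_1}$$
where $x(t)$ denotes the PIG position (distance from Station A) at time $t$.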
The bottom plot of Fig. 7 represents a PIG indicator; namely, it highlights the presence of a coherent cross-correlation peak along a tilted line in the top image of Fig. 7. As stated before, this line is the time-pipe coordinate mark of the traveling PIG, and the slope of the line gives the velocity of the gauge. The PIG indicator index is computed by summing the data along a range of “realistic” PIG displacement velocities, as in Hough/Radon transform processing (Illingworth and Kittler 1988).
From the range of realistic PIG velocities, one can obtain the corresponding range of slopes of the tilted lines along which the cross-correlation data are summed.
Design of the Inner Deposits Predictor
As stated at the beginning of the paper, the goal of this work consists in the development of an online data-driven procedure that, starting from the short-term attenuation time series (gray curve in Fig. 6), can predict the occlusion level of pipeline internals. We have observed a strong correlation with the specific attenuation values; however, we still need to provide a clear and unambiguous definition that numerically quantifies the concept of pipe occlusion. This operation becomes necessary to entirely formalize the problem within a proper machine learning context. For this purpose, the first step consists in defining which type of learning task needs to be solved. We have opted for supervised regression for two main reasons: First, a pipe gradually clogs up because of multiple factors that continuously evolve over the course of several months (e.g., buildup of wax deposits, debris, etc.); as a consequence, classification algorithms are not suitable for this kind of prediction, as they provide discrete outputs (e.g., binary labels, such as clean pipe and clogged pipe) and disregard any intermediate stage; lastly, by predicting the value of a continuous variable through regression, we can express such a variable as an occlusion indicator that can be easily understood by nonexperts in the field. For instance, an automated system can be set up such that if the amount of inner deposits is above a certain threshold, a cleanup campaign is consequently triggered in the pipeline.
Employing supervised learning techniques requires having labeled data available, which is rarely the case in pipeline transportation systems (Lygren et al. 2019). To overcome this issue, we have manually built the target function to be learned by the supervised regressor and expressed it as a numeric variable ranging between zero and unity. For a given pipeline segment between two stations $i$ and $j$, the target $y_{ij}$ is defined as:
$$y_{ij} = g\left(\bar{\alpha}_{ij}\right)$$
where $g(\cdot)$ is a mapping function of the long-term specific attenuation $\bar{\alpha}_{ij}$ between stations $i$ and $j$ (e.g., the magenta curve displayed in Fig. 6), which rescales the data to a range between zero and unity. The result of such an operation is displayed in Fig. 8, where the time series of $y_{ij}$ (corresponding to the A-C segment) is represented. To make the representation more straightforward, we have substituted the numeric values displayed on the vertical axis with their qualitative interpretation (i.e., 0 and 1, respectively, translate into clean and dirty). Lastly, the plotted curve presents some gaps, which correspond to missing attenuation samples; in all those circumstances, the machine learning algorithm can be neither trained nor tested.
The learning task is performed by an extremely randomized trees regressor (ERTR) (Geurts et al. 2006), which is a supervised meta-estimator trained to fit several decision tree regressors (DTRs) and to provide a numerical output that is the average of their predictions. In extremely randomized trees, each DTR is created by introducing randomness during its generation phase. ERTRs share the same working principle as standard DTRs, in which a model is fit to the training data based on a set of intuitive decision rules (e.g., if/else statements) (Quinlan 1986), which are directly derived from the input features; such a model can then predict a numerical quantity that is a nonlinear function of the input characteristics. In our case, we have designed an ERTR to provide automatically, as output, a continuous variable ranging between zero and unity.
Compared with a DTR, implementing an ERTR can be more advantageous for several reasons. First of all, ERTRs do not exhibit the high-variance issues affecting DTRs (Dietterich and Bae 1995), which are usually associated with overfitting a model to the data: As said, the variance of ERTRs is in fact reduced by averaging the estimates provided by several DTRs. Second, they do not favor features having high cardinality (namely, with several unique values) (Louppe 2014), which may become problematic when computing statistical indicators from continuous random variables (e.g., time series data). Lastly, they also provide information, by means of relative rank assessment, on which features contribute the most to the final prediction (Louppe 2014).
Training a supervised regression algorithm requires the evaluation of two quantities: an $N \times M$ matrix of features $\mathbf{X}$ and a target vector $\mathbf{y}$, where $N$ and $M$, respectively, correspond to the number of training examples and input characteristics. As previously discussed, $\mathbf{y}$ has been computed using the target definition given above; each row of $\mathbf{X}$ consists instead of a set of statistical indicators, computed over nine different rolling and causal windows, ranging from 8 hours up to 7 days. More precisely, we have chosen to evaluate the mean, minimum, and maximum values of the short-term attenuation over each window, thus resulting in a final set of 27 features. So, for each of the $N$ input examples, the ERTR is fed with a 27-element feature vector and a target scalar.
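The construction of the feature matrix can be sketched as follows. Only the number of windows (nine), their span (8 hours to 7 days), and the three statistics come from the text; the individual window lengths, the hourly sampling of the short-term attenuation, and the function name are hypothetical choices made for this illustration.

```python
import numpy as np

# Hypothetical window lengths, in samples of an hourly series (8 h ... 7 days).
WINDOWS = (8, 12, 24, 36, 48, 72, 96, 120, 168)


def causal_rolling_features(alpha, windows=WINDOWS):
    """Build the (N, 3 * len(windows)) matrix of mean/min/max features.

    Each window is causal: it ends at the current sample and only looks
    backward. Rows without enough history for a window hold NaN in the
    corresponding columns.
    """
    alpha = np.asarray(alpha, dtype=float)
    n = len(alpha)
    feats = np.full((n, 3 * len(windows)), np.nan)
    for j, w in enumerate(windows):
        for t in range(w - 1, n):
            chunk = alpha[t - w + 1 : t + 1]
            feats[t, 3 * j : 3 * j + 3] = (chunk.mean(), chunk.min(), chunk.max())
    return feats
```

With nine windows and three statistics per window, each row has 27 columns, matching the feature count stated above. The explicit double loop favors readability over speed; a vectorized rolling implementation would be preferable on long series.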
The ERTR model has been trained using data from line segment A-C from 1 June 2013 to 31 May 2014 inclusive, whereas the testing phase has been performed on the same line section from 1 June 2014 to 1 December 2014; this results in an even split between the training (orange line in Fig. 8) and the test (black line in Fig. 8) sets, because several months of 2013 are characterized by unavailable data points.
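A chronological train/test split with an extremely randomized trees regressor can be sketched with scikit-learn's `ExtraTreesRegressor`. This is an illustrative sketch, not the configuration used in the study: the hyperparameters, the function name, and the final clipping step are assumptions, and the inputs are assumed to be free of missing values.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fit_and_predict(X, y, split_index, n_estimators=100, random_state=0):
    """Train on samples before `split_index`, predict on the remainder.

    The split is chronological (no shuffling), mirroring a train period
    followed by a test period. Hyperparameters are illustrative only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    model = ExtraTreesRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X[:split_index], y[:split_index])
    # Averaged tree outputs already lie within the range of the training
    # targets; the clip only makes the [0, 1] bound explicit.
    y_hat = np.clip(model.predict(X[split_index:]), 0.0, 1.0)
    return model, y_hat
```

After fitting, `model.feature_importances_` gives the relative ranking of the input features mentioned earlier.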
Results and Discussion
The prediction accuracy of the model has been assessed by evaluating the root mean squared error (RMSE) between the estimated pigging probability $\hat{\mathbf{y}}$ and the true target vector $\mathbf{y}$, and by computing the coefficient of determination (denoted as $R^2$ score). The RMSE is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\left(\hat{\mathbf{y}} - \mathbf{y}\right)^{\top}\left(\hat{\mathbf{y}} - \mathbf{y}\right)}$$
where $N$ is the length of $\mathbf{y}$ and the superscript $\top$ denotes transposition. Because the target function has been transformed to be bounded between zero and unity, one can also derive the percentage prediction accuracy as $(1 - \mathrm{RMSE}) \times 100$.
The coefficient of determination, instead, is a widely used goodness-of-fit indicator in regression problems and measures how precisely unseen data points are going to be predicted by the model. Such a coefficient ranges between zero and unity: In the former case, the model performs poorly because it always predicts the expected value of $\mathbf{y}$ (denoted with $\bar{y}$); in the latter instance, the model perfectly explains the data. $R^2$ is expressed as:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $e_i^2 = (\hat{y}_i - y_i)^2$ corresponds to the squared estimation error between the $i$th prediction $\hat{y}_i$ and its target value $y_i$, while $N$ represents the number of samples in the validation set.
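The three figures of merit can be computed in a few lines. This is a minimal sketch with a hypothetical function name; it relies on the relation between RMSE and percentage accuracy implied by the results reported in this section (an RMSE of 0.0261 corresponds to an accuracy of 97.39%), which holds only because the target is bounded between zero and unity.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, percentage accuracy, R^2) for a target bounded in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    sum_sq_err = float(err @ err)
    rmse = np.sqrt(sum_sq_err / len(y_true))
    accuracy_pct = 100.0 * (1.0 - rmse)
    r2 = 1.0 - sum_sq_err / float(np.sum((y_true - y_true.mean()) ** 2))
    return rmse, accuracy_pct, r2
```

For instance, `regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])` gives an RMSE of about 0.0816 and an R² of 0.96.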
With regard to the test set considered (black line in Fig. 8), we have attained values of RMSE, prediction accuracy, and $R^2$ score equal to 0.0261, 97.39%, and 0.9906, respectively. As a reference, Fig. 9 graphically compares the true target values (black line) with the predictions (red curve): the performance is satisfactory, as shown by the good agreement between the two curves.
Fig. 9: True (black line) and predicted (red line) values of the target function for the line section used for training and testing.
Once the robustness of the model has been verified on data belonging to the same distribution as the training set, its generalization capabilities must be assessed on additional unseen data. For this purpose, we have used the dynamic pressure measurements collected at Station B to perform further testing of the model on two additional line sections, which correspond, respectively, to the pipeline lengths connecting Station A with Station B and Station B with Station C. Unlike the previous case, the entire sample set can now be used for testing, as the model does not require additional training. Fig. 10 graphically compares the true (black line) and predicted (red curves) values of the target functions for the two segments (top and bottom panels), while Table 2 summarizes the values of RMSE, prediction accuracy, and $R^2$ score for the three pipeline lengths. Once again, the results are quite satisfactory, as the attained accuracy level is greater than 97% in all three cases. In addition, testing on measurements collected along different pipeline subsections allows one to determine which portion of the conduit is subject to the highest occlusion levels: for instance, from July 2014 to September 2014, one of the two segments is more affected by internal deposits than the other, because its probability indicator is higher (as displayed in Fig. 10).
Fig. 10: True (black line) and predicted (red line) values of the target function for the two additional line segments (top and bottom panels).
Table 2: Performance metrics for the three line segments.
| Line Section | RMSE | Accuracy | $R^2$ Score |
|---|---|---|---|
| | 0.0261 | 97.39% | 0.9906 |
| | 0.0196 | 98.04% | 0.9944 |
| | 0.0197 | 98.03% | 0.9937 |
As a last consideration, the proposed model can potentially be deployed for real-time applications. The input features fed to the ERTR depend only on the past history with respect to the current sample, because they are obtained from moving statistics computed over causal windows. The instantaneous target, in contrast, is not immediately available: we recall that it is a function of the long-term attenuation, which has been expressed as a noncausal, 1-week moving average of its short-term counterpart. Its evaluation can be deferred by accepting that the prediction is temporarily unsupervised. This tradeoff is acceptable, as it simply introduces a delay of 3.5 days (half of the averaging window) in the evaluation of the accuracy metrics described at the beginning of this section; the predicted output, instead, is still provided instantaneously. Moreover, performance assessment becomes even less urgent once the model reliability has been validated on a sufficiently large data set (e.g., several months or years of historical data).
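The difference between the causal features and the noncausal target can be illustrated with the sketch below. It assumes hourly sampling and a moving average centered on the current sample, which is consistent with the 3.5-day delay quoted in the text but is not stated explicitly there; the function names and the toy series are hypothetical.

```python
SAMPLES_PER_DAY = 24                 # assumed hourly sampling
WEEK = 7 * SAMPLES_PER_DAY           # 1-week window, in samples


def causal_mean(series, t, window):
    """Moving average that only uses samples at or before index t."""
    start = max(0, t - window + 1)
    chunk = series[start:t + 1]
    return sum(chunk) / len(chunk)


def centered_mean(series, t, window):
    """Noncausal moving average centered on index t.

    Returns None when the samples up to t + window // 2 have not been
    acquired yet (or when t is too close to the start of the series).
    """
    half = window // 2
    if t - half < 0 or t + half >= len(series):
        return None
    chunk = series[t - half:t + half + 1]
    return sum(chunk) / len(chunk)


# Toy short-term attenuation history; the last element is "now".
history = list(range(400))
now = len(history) - 1

feature_now = causal_mean(history, now, WEEK)     # available immediately
target_now = centered_mean(history, now, WEEK)    # None: future data missing

# The most recent sample whose target can already be evaluated.
latest_supervised = now - WEEK // 2
target_delayed = centered_mean(history, latest_supervised, WEEK)
delay_days = (WEEK // 2) / SAMPLES_PER_DAY        # 3.5 days
```

In an online setting the prediction for the current sample would be issued right away from the causal features, while the accuracy metrics would be updated only once the corresponding target becomes available.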
Conclusion
This paper presents a data-driven methodology to automatically monitor the inner deposits in crude oil transportation pipelines. The proposed solution makes use of standard pressure measurements, collected at two different locations along the pipeline, which are processed and fed to a nonlinear, supervised regression algorithm (ERTR). The latter has been designed to output probability measures that numerically quantify the level of debris within the pipe. The model has been successfully applied to the vibroacoustic data collected from a crude oil pipeline; its performance has been assessed in terms of prediction accuracy and coefficient of determination, achieving scores greater than 97% and 0.99, respectively, for all three test sets considered over 18 months. The results obtained so far show that it is possible to predict and track the occlusion levels of the entire pipeline and of individual pipe sections; such capabilities are advantageous for planning predictive maintenance strategies, as cleanup campaigns can be triggered only when necessary and on the most clogged pipeline sections. In addition, the trained ERTR is potentially employable for real-time integrity assessment applications, thus enabling the opportunity for online monitoring.
Future work includes an additional validation phase on other oil and gas transportation systems, testing on multiphase fluids in the upstream scenario, and the definition of optimal threshold criteria that would trigger a new pipeline inspection and operative campaign. We also hypothesize that the attenuation of acoustic waves could potentially be measured from the outside, namely, on the pipe shell by using acceleration sensors or strain gauges. A major advantage of such an implementation is the elimination of internal sensing systems (e.g., temperature or flow rate sensors), which typically require an inspection chamber; removing them would give additional freedom and convenience in the arrangement of the monitoring setup. This aspect will, however, be studied and tested more thoroughly in future experiments.
Acknowledgment
This research was mainly carried out in the framework of the R&D–DIONISIO project funded by Eni S.p.A. The authors are grateful to Eni R&M Logistic Department and SolAres JV teams for technical support during the field tests.
Article History
Original SPE manuscript received for review 31 January 2022. Revised manuscript received for review 3 March 2022. Paper (SPE 209825) peer approved 8 April 2022.