Distribution pipelines are systems of main and service lines that transport the product to individual homes and businesses. They typically operate at lower pressure than transmission pipelines and are not linearly referenced in the database. At the same time, distribution pipelines have more leak records available, which makes them well suited to machine learning. This study applied machine learning methods, namely multiple linear regression (MLR) as the performance benchmark and decision-tree-based extreme gradient boosting regression (XGB), to predict the time to corrosion-related pipeline leaks from pipeline and GIS-related features. In total, over 30,000 data points were used, split into training and testing data sets for cross-validation. The quality of the machine learning predictions was evaluated using statistical measures such as the coefficient of determination (R²) and the root mean square error (RMSE). The results show that the machine learning algorithms capture non-linear relationships in the data set, which could support decision-making in conjunction with a probabilistic risk assessment model.
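As a minimal sketch of the evaluation described above, the two metrics and a random train/test split can be written in plain Python. The function names, the 20% test fraction, and the sample leak times are illustrative assumptions, not values or code from this study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split into (train, test) lists.

    The 20% default is an assumption; the study's split ratio is not stated here.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical observed and predicted leak times (years), for illustration only.
observed = [12.0, 20.0, 31.0, 45.0]
predicted = [14.0, 18.0, 30.0, 48.0]
print("RMSE:", rmse(observed, predicted))
print("R2:", r2_score(observed, predicted))
```

A lower RMSE and an R² closer to 1 on the held-out test set indicate a better fit, which is the basis on which a benchmark such as MLR can be compared with a non-linear model such as XGB.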
In the United States, fuel pipelines span more than 2.6 million miles¹. A major portion of these pipelines consists of gas distribution lines, which deliver the product from the pressure regulating station to the customer's home or facility. The Pipeline and Hazardous Materials Safety Administration (PHMSA) finalized rules for Distribution Integrity Management Program (DIMP) plans in 2009, requiring distribution pipeline operators to assess, report, and manage the risk associated with pipeline operation. Corrosion is one of the major threats to pipeline operation and integrity according to 49 CFR 192, 49 CFR 195, and ASME B31.8². A comprehensive understanding and assessment of corrosion risk is therefore indispensable for safer pipeline operation, which in turn demands more precise prediction and management of pipeline corrosion.