In the oil and gas industry, technical staff are involved in critical activities every day. Safety is therefore a key priority, all the more so as frontier, continuously evolving technologies become a fundamental part of the transformation of traditional industrial processes. While safety reports and investigations have long been adequately stored and continuously monitored by expert professionals, Artificial Intelligence applied to natural language now makes it possible to develop a decision support system capable of extracting insights, predicting the risk of future operations, performing scenario analysis and prescribing risk mitigation actions on massive amounts of data. In this work, we used an Open Innovation approach to develop a Safety Pre-Sense system, leveraging Machine Learning and Natural Language Processing techniques and incorporating multiple (and often unexpected) sources of information.

Starting from standard Natural Language Processing tasks, we leverage linguistic patterns to build binary Document-Term Matrices. Operating on these matrices, we implemented a Domain Keyword Extraction algorithm that extracts words (or multi-word expressions) with high specificity.
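
As a rough illustration of the idea, the sketch below builds a binary Document-Term Matrix and ranks terms by how much more often they appear in domain documents than in a general reference corpus. This is a minimal sketch under stated assumptions, not the algorithm used in this work: the whitespace tokenizer, the frequency-ratio score, the `smoothing` constant and the toy sentences are all hypothetical.

```python
def binary_dtm(docs):
    """Return (vocab, rows); rows[i][j] is 1 if term j occurs in document i."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    rows = [[1 if term in toks else 0 for term in vocab] for toks in tokenized]
    return vocab, rows


def doc_frequency(docs):
    """Share of documents containing each term, read off the binary matrix."""
    vocab, rows = binary_dtm(docs)
    n = len(rows)
    return {term: sum(row[j] for row in rows) / n for j, term in enumerate(vocab)}


def domain_keywords(domain_docs, reference_docs, top_k=3, smoothing=0.01):
    """Rank terms by domain frequency relative to reference frequency.

    Assumption: specificity is approximated by a smoothed frequency ratio.
    """
    dom = doc_frequency(domain_docs)
    ref = doc_frequency(reference_docs)
    scores = {t: f / (ref.get(t, 0.0) + smoothing) for t, f in dom.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy data, invented for illustration only.
domain_docs = [
    "gas leak near the valve",
    "valve failure caused the leak",
    "worker slipped near the scaffold",
]
reference_docs = [
    "the weather is nice",
    "the meeting is near the office",
    "the team met",
]
top_terms = domain_keywords(domain_docs, reference_docs, top_k=2)
print(top_terms)
```

Common words such as "the" score low because they are equally frequent in the reference corpus, while terms that occur only in the safety reports rise to the top.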

Our pipeline also provides a language-agnostic method to detect similarities between documents written in different languages and cluster them accordingly, yielding clear descriptors that help interpret their meaning. To do so, we map the texts into a high-dimensional vector space, where we apply cluster analysis to group semantically close documents into consistent, multilingual groups. We then extract, for each language, a list of domain keywords that characterize every cluster.
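
The grouping step can be sketched as follows, assuming each document has already been mapped to a vector by some multilingual embedding model. The three-dimensional vectors are invented stand-ins for such embeddings, and the greedy "leader" clustering with a cosine threshold is a simple substitute chosen for brevity, not the cluster analysis used in this work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_clustering(vectors, threshold=0.8):
    """Assign each vector to the most similar centroid, or open a new cluster."""
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No centroid is similar enough: start a new cluster.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the chosen cluster.
            n = sizes[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
    return labels


# Hypothetical embeddings: documents in different languages that describe
# the same kind of event are assumed to land close together.
embedded_docs = {
    "en: gas leak at the valve": [0.90, 0.10, 0.00],
    "it: perdita di gas dalla valvola": [0.85, 0.15, 0.05],
    "en: worker fell from scaffold": [0.10, 0.90, 0.10],
    "it: operaio caduto dal ponteggio": [0.05, 0.95, 0.00],
}
labels = leader_clustering(list(embedded_docs.values()))
for text, label in zip(embedded_docs, labels):
    print(label, text)
```

Because clustering operates only on the vectors, the English and Italian reports of the same event type end up in the same group; per-language keyword extraction can then be run on the members of each cluster.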

Next, we identify similarities in the data in a completely data-driven manner, with the objective of extracting correlations between event features (such as geographical location and cause or type of event). As a result, we extract new aggregations of complex items such as severe Accidents or Work Processes. We also demonstrate how Correspondence Analysis and Pattern Mining algorithms can extract correlations between topics and events, which we visualize in a dynamic Qlik dashboard.
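
To make the pattern mining idea concrete, the sketch below counts how often pairs of event features occur together and reports their support and lift. It is a minimal illustration under stated assumptions: the feature names, the toy events and the `min_support` threshold are hypothetical, and only pairs are mined, whereas a full pattern mining algorithm would also consider larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_lift(events, min_support=0.3):
    """Return {(a, b): (support, lift)} for feature pairs above min_support."""
    n = len(events)
    item_counts = Counter(item for ev in events for item in set(ev))
    pair_counts = Counter(
        pair for ev in events for pair in combinations(sorted(set(ev)), 2)
    )
    results = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Lift > 1 means the two features co-occur more often than
            # they would if they were independent.
            expected = (item_counts[a] / n) * (item_counts[b] / n)
            results[(a, b)] = (support, support / expected)
    return results


# Toy events, invented for illustration only.
events = [
    {"location=offshore", "cause=corrosion", "type=leak"},
    {"location=offshore", "cause=corrosion", "type=leak"},
    {"location=onshore", "cause=human_error", "type=fall"},
    {"location=onshore", "cause=corrosion", "type=leak"},
]
patterns = pair_lift(events)
for pair, (support, lift) in sorted(patterns.items()):
    print(pair, round(support, 2), round(lift, 2))
```

Pairs that fall below the support threshold are discarded, so only associations seen in a meaningful share of events are surfaced for inspection.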

Finally, we point to additional sources of information, both internal and external to our company, that can be used to enhance our analysis.
