Basin-scale geological studies rest on two key elements: understanding the temporal relationships between geological events, with the help of a PSE (Petroleum System Event) chart, and identifying the spatial extent of the petroleum system elements of a basin, expressed as Play Fairway maps or Common Risk Segment maps. In both cases, an important source of information is unstructured documents (papers from different publishers or company technical reports), which contain textual descriptions of the geological events of a basin as well as pictures in which Play Fairway maps or Common Risk Segment maps appear as images. The automatic construction of a PSE chart using powerful cognitive tools, such as Knowledge Graphs, has already been documented in a previous OMC paper [1]. This work focuses on an advanced Artificial Intelligence approach that processes the pictures contained in the papers and transforms them into a representation the explorationist can work with: fully georeferenced GIS (Geographic Information System) objects that can be displayed side by side with all the traditional information available in the GIS (e.g., basins, reservoirs, wells).
To achieve this goal, the pictures in the papers must be processed by a sequence of automatic algorithms: image classification algorithms, to distinguish maps from images with other content (e.g., well logs, seismic sections, SEM images); OCR (Optical Character Recognition) algorithms, to extract the textual content of the image; Artificial Intelligence algorithms, to determine whether the text extracted from the image can be associated with elements that can be georeferenced (e.g., names of wells with known coordinates together with the corresponding well symbols, or geographical coordinates (latitude/longitude) displayed on the border of the image or along a rectangle within it); and GIS processing algorithms, to transform the images, enriched with the geolocation information extracted in the previous steps, into georeferenced objects that can be displayed in a GIS application.
The paper will describe the challenges that had to be faced and the algorithms that have been applied.
To discriminate images representing maps from those representing other entities (well logs, seismic sections, etc.), deep learning models act as very efficient image classifiers. The challenge, however, was to build a large training set of images, each associated with its classification label. For this purpose, we leveraged the textual analysis capabilities of the Knowledge Graphs, which can associate each picture with its caption, to pre-classify the images based on their captions. This allowed us to build a large training set in a semi-automatic way, the only remaining step being a quality check of the labels derived from the captions.
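The caption-based pre-classification can be illustrated with a minimal sketch. It assumes simple keyword matching on the caption text; the keyword lists, the label names and the function `prelabel_from_caption` are hypothetical and are not taken from the system described here, which relies on Knowledge Graph text analysis.

```python
import re

# Hypothetical keyword lists, for illustration only.
CAPTION_KEYWORDS = {
    "map": ("map", "play fairway", "common risk segment"),
    "seismic_section": ("seismic section", "seismic line", "seismic profile"),
    "well_log": ("well log", "gamma ray", "composite log"),
    "sem_image": ("sem image", "scanning electron"),
}


def prelabel_from_caption(caption):
    """Return the one label whose keywords occur in the caption, else None.

    Captions that match no label, or more than one, are left unlabelled
    so that they can be routed to the manual quality check.
    """
    text = re.sub(r"\s+", " ", caption.lower())
    hits = [
        label
        for label, words in CAPTION_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` for ambiguous captions keeps the automatic labels conservative: only pictures with an unambiguous caption enter the training set without review.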
For images containing well names, the first difficulty is to determine whether the text extracted from the image by the OCR algorithm actually represents a well name. The text can be compared with an official database of worldwide wells, but the names might not correspond exactly, so fuzzy search techniques are needed to assess the similarity between names. Moreover, ambiguities and inconsistencies must be handled (the same well name might be used in multiple countries). The second difficulty is to associate each well name displayed in the image with the corresponding symbol (circle, star, square) used to represent the well location on the map. For this last task, a Machine Learning approach has been used to train a well marker detector.
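As an illustration of the fuzzy comparison between OCR text and database names, the sketch below uses the similarity ratio from Python's standard `difflib` module. The well names, coordinates, the `WELL_DB` table, the 0.8 threshold and the `country_hint` parameter are all assumptions made for the example; the fuzzy search technique actually used is not specified here.

```python
from difflib import SequenceMatcher

# Made-up stand-in for an official well database: (name, country, lon, lat).
WELL_DB = [
    ("Aurora-1", "Norway", 2.50, 60.10),
    ("Aurora-1", "Australia", 115.20, -20.40),
    ("Borealis 2", "Italy", 16.05, 40.42),
]


def normalise(name):
    """Lower-case and keep only letters and digits ('AURORA 1' -> 'aurora1')."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_well(ocr_text, country_hint=None, threshold=0.8):
    """Return database rows whose name is similar enough to the OCR text.

    Several rows may be returned when the same name exists in more than
    one country; an optional country hint narrows the candidates.
    """
    key = normalise(ocr_text)
    scored = []
    for row in WELL_DB:
        score = SequenceMatcher(None, key, normalise(row[0])).ratio()
        if score >= threshold:
            scored.append((score, row))
    if country_hint is not None:
        scored = [(s, r) for s, r in scored if r[1] == country_hint]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [row for _, row in scored]
```

Calling `match_well("AURORA 1")` returns both made-up "Aurora-1" rows, which shows why a second source of context (here a country hint, for instance the country of the basin discussed in the paper) is needed to resolve the ambiguity.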
For pictures containing geographical coordinates (latitude/longitude) displayed on the border of the image or along a rectangle within the image, a variety of image analysis algorithms have been used to handle several challenges. The first arises with images made up of multiple sub-images, where the key problem is to identify the rectangle around which the geographic coordinates are displayed. The second is to identify the tick marks on the coordinate axes correctly and to associate each of them with its coordinate label. This is a very difficult task, owing to slopes in the maps, intersections with text, and the need for fine tuning to determine the best neighbourhood of each coordinate label in which to search for tick marks.
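Once tick marks have been paired with their labels, the mapping from pixels to coordinates can be estimated. The sketch below is a simplified illustration, assuming a north-up map with no rotation and a linear relation between pixel position and degrees along each axis; it does not address the slopes or text intersections mentioned above. The functions `parse_coordinate` and `fit_axis` are hypothetical names introduced for the example.

```python
import re

# Matches labels such as 12°30'E, 45°N or 3°15'20"S.
_DMS = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*°\s*"
    r"(?:(\d+(?:\.\d+)?)\s*['′]\s*)?"
    r'(?:(\d+(?:\.\d+)?)\s*["″]\s*)?'
    r"([NSEW])\s*$",
    re.IGNORECASE,
)


def parse_coordinate(label):
    """Convert an OCR label to signed decimal degrees, or None if not a coordinate."""
    match = _DMS.match(label)
    if match is None:
        return None
    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemisphere.upper() in "SW" else value


def fit_axis(pixels, values):
    """Least-squares line value = scale * pixel + offset through the tick marks."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two tick marks")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("tick marks must have distinct pixel positions")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    scale = cov / var_p
    return scale, mean_v - scale * mean_p
```

For example, ticks at pixel columns 100, 300 and 500 labelled 10°E, 11°E and 12°E give a scale of 0.005 degrees per pixel; fitting the longitude and latitude axes separately yields the scale and origin needed to place the image in a GIS.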
The paper will also describe how automatic quality control of the georeferenced images is performed, to ensure that only high-quality results are imported into Eni’s GIS. Finally, it will present the results achieved, with a few examples of the generated georeferenced maps displayed side by side with traditional GIS objects (wells, reservoirs, basins).
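The quality control criteria are not detailed in this abstract. One plausible check, shown here purely as an assumption, is to measure how well the labelled control points agree with the fitted pixel-to-coordinate relation and to reject images whose residual is too large. The function names, the minimum of three control points and the 0.01 degree tolerance are all hypothetical.

```python
import math


def rms_residual(control_points, scale, offset):
    """RMS difference between labelled values and scale * pixel + offset."""
    errors = [(scale * pixel + offset) - value for pixel, value in control_points]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def passes_qc(control_points, scale, offset, tolerance_deg=0.01):
    """Accept only if enough control points agree with the fitted axis."""
    if len(control_points) < 3:
        # Two points always lie exactly on a line, so they cannot reveal an error.
        return False
    return rms_residual(control_points, scale, offset) <= tolerance_deg
```

A check of this kind would catch, for instance, an OCR misreading of one label (12 read as 21), because the misread point lies far from the line defined by the others.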