Petroleum biomarkers are complex carbon-based molecules derived from formerly living organisms and found in crude oils. Geochemists use these molecules to obtain information on the source rocks that generated the oils, such as lithology, depositional environment, organic matter, maturity and age. They are therefore of paramount importance for Petroleum System Modelling and, more generally, for exploration de-risking and sedimentary basin characterization.

Biomarker datasets are often very large, and their interpretation by geochemists can take several months to complete.

For this reason, we developed an innovative Machine Learning-based support tool to facilitate and speed up the whole process of biomarker examination and interpretation.

The core of the tool is an advanced clustering method that expresses biomarker data as a combination (mixing) of underlying components, each directly ascribable to a different source rock.

The non-negativity constraint is a key aspect: the objective is to express each data sample, i.e. a vector of mainly non-negative values such as biomarker concentrations and/or concentration ratios, as an additive combination of some of the underlying components, since subtracting components would have no physical interpretation. A sparsity constraint is added to favour solutions that represent the data as an additive combination of only a few source rock components. Both constraints greatly reduce the non-uniqueness of the solution and thereby enhance the interpretability of the results.
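The two constraints above can be illustrated with a minimal sketch. The abstract does not name the algorithm used in the tool; the example below assumes a non-negative matrix factorization with an L1 penalty on the mixing weights, which is one common way to combine non-negativity and sparsity. The function name `sparse_nmf`, the penalty weight `l1`, and the synthetic two-component data are all assumptions made for illustration.

```python
import numpy as np

def sparse_nmf(X, n_components, l1=0.05, n_iter=500, seed=0, eps=1e-9):
    """Approximate X (samples x parameters, X >= 0) as W @ H with W, H >= 0.

    W holds the mixing weights of each sample, H the component signatures.
    The L1 penalty on W favours samples explained by few components.
    Illustrative sketch only, not the tool's actual algorithm.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        # The l1 term in the denominator shrinks small weights toward zero.
        W *= (X @ H.T) / (W @ H @ H.T + l1 + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Fix the scale of each signature so the penalty on W is meaningful;
        # rescaling W by the same factor leaves the product W @ H unchanged.
        norms = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H /= norms
        W *= norms.T
    return W, H

# Synthetic data: 40 samples mixed from two hypothetical signatures.
rng = np.random.default_rng(1)
end_members = np.array([[5.0, 1.0, 0.2, 3.0, 0.5, 2.0],
                        [0.5, 4.0, 3.0, 0.3, 2.0, 1.0]])
mix = rng.dirichlet([0.3, 0.3], size=40)  # mostly near-pure samples
X = mix @ end_members

W, H = sparse_nmf(X, n_components=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

In this sketch each row of `H` plays the role of a source rock signature and each row of `W` gives the additive contribution of those signatures to one sample.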

The tool then groups the data into clusters, each with a specific geochemical signature given by a set of scores for the different biomarker parameters. Each sample is assigned to a specific cluster with a "purity" percentage indicator.
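The abstract does not define how the "purity" indicator is computed. As a hedged illustration, the sketch below assumes that each sample comes with non-negative mixing weights, assigns the sample to its dominant component, and reports that component's share of the total weight as a percentage. The function name `assign_clusters` and the example weights are hypothetical.

```python
import numpy as np

def assign_clusters(W, eps=1e-12):
    """Return (labels, purity) from non-negative mixing weights W.

    labels[i] is the index of the dominant component of sample i.
    purity[i] is that component's share of the sample's total weight, in %.
    Assumed definition for illustration; the tool's own may differ.
    """
    W = np.asarray(W, dtype=float)
    fractions = W / (W.sum(axis=1, keepdims=True) + eps)
    labels = fractions.argmax(axis=1)
    purity = 100.0 * fractions.max(axis=1)
    return labels, purity

# Three hypothetical samples: two nearly pure, one evenly mixed.
W_example = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.5, 0.5]])
labels, purity = assign_clusters(W_example)
print(labels)   # [0 1 0]; a tie goes to the first component
print(purity)   # approximately 90, 80 and 50
```

Under this assumed definition, a sample with purity near 100% is dominated by a single component, while a value near 50% in a two-component model indicates an even mixture.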

Geochemists can then easily use the high-purity samples to label the relevant samples as belonging to different source rocks. Moreover, the tool is able to estimate the amount of mixing between different source rocks through accurate deconvolution algorithms.
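The deconvolution algorithms themselves are not described in the abstract. One simple way to picture the mixing estimate, assuming the source rock signatures are already known, is a non-negative least-squares fit of a sample against those signatures. The sketch below uses a plain projected-gradient solver; the function name `unmix_sample` and the signature values are hypothetical.

```python
import numpy as np

def unmix_sample(signatures, sample, n_iter=2000):
    """Estimate non-negative contributions of each signature to a sample.

    Minimizes ||signatures.T @ x - sample||^2 subject to x >= 0 by
    projected gradient descent, then normalizes x to fractions.
    Illustrative only; not the deconvolution used in the tool.
    """
    A = np.asarray(signatures, dtype=float).T   # parameters x components
    b = np.asarray(sample, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(0.0, x - step * grad)    # project onto x >= 0
    return x / x.sum()

# Two hypothetical source rock signatures over six biomarker parameters.
signatures = np.array([[5.0, 1.0, 0.2, 3.0, 0.5, 2.0],
                       [0.5, 4.0, 3.0, 0.3, 2.0, 1.0]])
# A synthetic oil made of 70% of the first and 30% of the second.
mixed_oil = 0.7 * signatures[0] + 0.3 * signatures[1]

fractions = unmix_sample(signatures, mixed_oil)
print(fractions)  # close to [0.7, 0.3] for this noise-free example
```

With real data the fit would not be exact, and the residual of the fit would give a rough indication of how well the assumed signatures explain the sample.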

Two applications of the tool, drawn from real exploration case studies, are presented here. In both cases the tool was able to separate the samples into clusters that geochemists successfully recognized as lacustrine, marine and, in some cases, transitional, with less than 10% misclassification, while also isolating strongly biodegraded samples.

The tool also opens the door to the integration of other types of data (light hydrocarbons, diamondoids, etc.) for a complete ‘Big Data’ geochemical characterization of a sedimentary basin.
