Abstract

The volume of data in the oil and gas industry has increased dramatically over the last several years. Petroleum exploration and production is at a crossroads: it must save, protect, and use its data, and adapt to new technology, in order to stay competitive. Under these circumstances, it is very important to integrate related data obtained from different sources and to provide more comprehensive data for various analysis purposes. Improving the integrity and consistency of integrated data requires investigating the data sources in detail and understanding them in terms of data quality. Even in small-scale integration of known data sources, heterogeneity among the sources produces inconsistent data if they are integrated without preprocessing, let alone data quality problems. This paper proposes a methodology that improves data integration performance by combining the data management utilities of the Microsoft SQL Server database management system (DBMS), intended for small-scale data integration without the support of commercial integration software. By semi-automatically capturing metadata (data about data), the data sources are investigated in detail, data quality problems are partially cleansed, and the performance of data integration is improved.

Introduction

The volume of data in the oil and gas industry has increased dramatically over the last several years. In the increasingly competitive petroleum market, exploration and production is at a crossroads: it must save, protect, and use its data, and adapt to new technology. As operating companies focus on turning mountains of data into useful information, data integration becomes the most significant operation in the management of exploration and production (E&P) data. In the early 1990s, data management specialists predicted that a large corporate database, a national data repository (NDR) [1], or a national data center (NDC) would make all data accessible and compatible for everyone and would control the rising tide of data. By integrating data from different aspects and different disciplines, data integration paves the way to data sharing and result management across the whole industry.

As data integration becomes increasingly important for every industry in managing its data and improving its performance, it is applied through several different mechanisms and at several scales [2]. The small-scale, commonly employed approach is to set up data integration for a specific purpose, in which the data sources of interest are accessed directly and merged together. Because this type of integration is built for a specific purpose, it works well for that purpose, but it is expensive and hard to extend to other applications. A more promising approach is to provide frameworks that aid in integrating data from multiple sources. This approach provides well-defined interfaces through which applications access and manipulate data. Digital libraries are another type of data integration; they collect results from multiple data sources in response to a user's request. Digital libraries are widely applied in book searching, white book searching, and searching for other purposes. Data warehouses and database federations take advantage of database management system utilities and provide tools for managing large amounts of data efficiently.
