The phrase "unstructured data" usually refers to information that does not reside in a traditional row-column database. The larger part of enterprise data, nearly 80%, is unstructured and has been much less accessible. From email, text documents, study reports, presentations, and memos to audio, video, and more, unstructured data is a huge body of information. This paper proposes a work-in-progress model for unstructured data management.
In any E&P company, data lies in unstructured form in many places, including local drives, network drives, SharePoint sites, and emails. Data sensitivity plays an important role in classifying the data. Irrespective of its classification, the data still holds valuable information that can be used to predict business problems analytically. Because knowledge is shared across the business through email, attachments, flat files, and presentations, a robust system is required to manage the unstructured data. Decision making is one example: business decisions are often made over email or phone calls, so the emails of the business hold huge knowledge potential. This information needs to be extracted in a way that allows it to be used for analytical decision making in the future. Duplication is another important aspect of unstructured data management that needs to be tackled. A scan of the current system finds multiple copies of the same document in different places in the organization. The same data keeps circulating among business users, which causes the duplication. A system that controls the duplication of unstructured data in a meaningful way will benefit the organization.
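As an illustration of the duplication problem, the following is a minimal sketch that groups byte-for-byte identical files under a directory by their SHA-256 digest. It is not the proposed system: the function names `file_digest` and `find_duplicates` are hypothetical, and the sketch assumes that the documents are reachable as ordinary files. It detects only exact copies; documents that were edited or re-saved in another format would need a similarity-based technique instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[Path]]:
    """Group files under `root` whose contents are exactly identical.

    Hypothetical helper for illustration; only groups with more than
    one file are returned.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates` on a shared network drive, for example, would return one list per set of identical copies, which could then be reviewed or replaced by a single managed copy.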
With ongoing advancements in machine learning and natural language processing, combined with analytical tools, the time has come to extract value from unstructured data. The proposed method is to identify, gather, and classify the unstructured data; create and use a content management tool to organize and manage it; create a standard engine that deals with unstructured data without having to convert it to a structured format; and apply an analytical engine on top of this content to make predictions from the data. Whenever new data comes into the content management tool, it is ingested into the prediction analysis tool to assist the business in decision making.
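The flow described above (classify incoming content, store it, and pass it on to an analysis step) can be sketched roughly as follows. Everything here is an assumption made for illustration: the `CATEGORIES` mapping, the `classify` function, and the `ContentStore` class are hypothetical names, classification by file extension is only a stand-in for whatever classifier the model ends up using, and the subscribed callback stands in for the prediction analysis tool.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# Hypothetical mapping from file extension to content category.
CATEGORIES = {
    ".eml": "email", ".msg": "email",
    ".txt": "document", ".doc": "document", ".docx": "document", ".pdf": "document",
    ".ppt": "presentation", ".pptx": "presentation",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".avi": "video",
}


def classify(path: Path) -> str:
    """Classify a file by extension; unknown extensions map to 'other'."""
    return CATEGORIES.get(path.suffix.lower(), "other")


@dataclass
class ContentStore:
    """Toy content store that notifies subscribers about each new item."""
    items: dict[str, list[Path]] = field(default_factory=dict)
    listeners: list[Callable[[Path, str], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[Path, str], None]) -> None:
        """Register a callback, e.g. a stand-in for the prediction tool."""
        self.listeners.append(listener)

    def ingest(self, path: str) -> str:
        """Classify and store one item, then notify every subscriber."""
        p = Path(path)
        category = classify(p)
        self.items.setdefault(category, []).append(p)
        for listener in self.listeners:
            listener(p, category)
        return category
```

In this sketch the content is kept in its original form and only tagged with a category, which mirrors the idea of handling unstructured data without first converting it to a structured format.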