5. Search engines. Search engines have been around for a long time, and they can operate on unstructured data as well as structured data. The problem is that search engines still need data to have context in order for a search to produce sophisticated results. Operating on unstructured data, a search engine can produce some limited results, but sophisticated queries remain out of its reach. The missing ingredient is the context of the data, and that context is not present in unstructured data.
So the data warehouse has arrived at the point where it is possible to include big data in the realm of data warehousing. But to include big data, it is necessary to overcome a very basic problem: the data found in big data is devoid of context, and without context it is very difficult to do meaningful analysis on the data.
While data warehousing may well be extended to include big data, there will always be a gap between big data and its potential value unless the basic problem of achieving or creating context in an unstructured environment is solved.
Deriving context, then, is the next major issue facing data warehousing and big data. Without the ability to derive context for unstructured data, big data has only limited uses. So how exactly can the context of text be derived, especially when that context cannot be derived from the text itself?
Two Ways to Derive Context for Unstructured Data
In fact, there are two ways to derive context for unstructured data: “general context” and “specific context.” General context can be derived by merely declaring a document to be of a particular variety. A document may be about fishing. A document may be about legislation. A document may be about healthcare, and so forth. Once the general context of the document is declared, the text can be interpreted in accordance with that general category.
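The idea of general context can be illustrated with a small sketch. It assumes that each general category (fishing, legislation, healthcare) has its own glossary, and that a document carries a single declared category that governs how its words are read. The `GLOSSARIES` table, the `Document` class, and the `interpret` function are all hypothetical, invented for illustration. The lookup is a plain word match rather than a real text-analytics method, and it shows only how the same word, such as "bill," is interpreted differently once the document's category has been declared.

```python
from dataclasses import dataclass

# Hypothetical per-category glossaries; the categories follow the examples in the text.
GLOSSARIES: dict[str, dict[str, str]] = {
    "fishing": {"bass": "a species of fish", "line": "a fishing line"},
    "legislation": {"bill": "a proposed law", "house": "a legislative chamber"},
    "healthcare": {"bill": "an invoice for medical services",
                   "discharge": "release of a patient from care"},
}


@dataclass
class Document:
    text: str
    general_context: str  # declared once for the whole document, e.g. "fishing"


def interpret(doc: Document) -> dict[str, str]:
    """Map each recognized word to its meaning under the document's declared category."""
    glossary = GLOSSARIES.get(doc.general_context, {})  # unknown category: no interpretation
    meanings: dict[str, str] = {}
    for raw in doc.text.lower().split():
        word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if word in glossary:
            meanings[word] = glossary[word]
    return meanings


# The same word, "bill," read under two different general contexts.
print(interpret(Document("The bill passed the house.", "legislation")))
print(interpret(Document("The bill arrived after discharge.", "healthcare")))
```

Note that nothing in the text itself tells the sketch which meaning of "bill" applies. The meaning comes entirely from the category declared for the document, which is the sense in which general context is supplied from outside the text.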