Knowledge graphs are becoming an increasingly important tool for organizations that need to manage the vast amounts of data they collect, store, and analyze. An enterprise knowledge graph represents an organization’s content and data as a single model that integrates structured and unstructured sources and layers semantic meaning on top of them to make them “smart.”
Data Summit Connect 2020 featured a full day of pre-conference workshops, followed by a free 3-day series of data-focused webinars. As part of the virtual conference hosted by DBTA and Big Data Quarterly, Joe Hilger and Sara Nash presented a workshop, titled "Introduction to Knowledge Graphs."
Hilger, who is COO and co-founder of Enterprise Knowledge, LLC, and Nash, who is a technical analyst with the consultancy, covered what a knowledge graph is, how it is implemented, and how it can be used to increase the value of data.
The wide-ranging and interactive presentation covered how to build a business case for knowledge graphs and enterprise AI; the foundations and technical infrastructure that make knowledge graphs a reality, including commonly used terms and concepts such as triples, RDF, and virtual mapping; practical use cases for knowledge graphs; and where to begin in knowledge graph development: developing an ontology.
What is a Knowledge Graph?
A knowledge graph is a specialized graph or network of the things we want to describe and how they are related, said Nash. It is a semantic model since we want to capture and generate meaning with the model. According to Nash, a simple way to think of a knowledge graph is: ontology + data/content = knowledge graph. "That, I think, is a really helpful way to understand what a knowledge graph is."
Knowledge graphs are built on graph databases, which are very good at modeling relationships, said Hilger, noting that, ironically, "relational" databases are not as effective at modeling relationships.
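Nash's formula can be made concrete with a small sketch. The example below, written in Python with the rdflib library, was not part of the workshop and is purely illustrative; the class, property, and instance names are hypothetical. A few triples defining an ontology (the kinds of things and relationships to describe), combined with a few instance triples (the data or content), already form a tiny knowledge graph in which relationships can be traversed directly.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for this sketch
g = Graph()
g.bind("ex", EX)

# Ontology: the kinds of things and relationships we want to describe
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Project, RDF.type, RDFS.Class))
g.add((EX.worksOn, RDF.type, RDF.Property))
g.add((EX.worksOn, RDFS.domain, EX.Person))
g.add((EX.worksOn, RDFS.range, EX.Project))

# Data/content: instances described with the ontology's vocabulary
g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, RDFS.label, Literal("Alice")))
g.add((EX.roadUpgrade, RDF.type, EX.Project))
g.add((EX.roadUpgrade, RDFS.label, Literal("Road upgrade")))
g.add((EX.alice, EX.worksOn, EX.roadUpgrade))

# Ontology + data = knowledge graph: relationships can be traversed directly
for person, _, project in g.triples((None, EX.worksOn, None)):
    print(g.value(person, RDFS.label), "works on", g.value(project, RDFS.label))
```

Each statement is a triple of subject, predicate, and object, the building block Hilger and Nash referred to; a graph database stores and indexes these relationships natively rather than reconstructing them through joins.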
Where Knowledge Graphs Excel
"What knowledge graphs are best at is aggregating multiple different types of information—including datasets—categorizing them, identifying relationships, and then integrating them together, but not necessarily moving the information," Hilger explained. "This is something that is important and powerful: We are not moving information out of its core, original dataset. We are just describing how it comes together. So you can take information from a whole bunch of systems that you already have."
The way to approach it, said Hilger, is to figure out what your data sources are and define the methods of categorizing information, including synonyms for terms, to support efficient and effective reporting. The long-term power of knowledge graphs is that when people begin to query the information, they can ask for it in a way that aligns with how they think about and perceive those information assets.
The knowledge graph and ontology map information in a way that is much more closely aligned with how people think and ask questions, he said. In this way, a complex data lake or information set can be modeled to match the way people will actually query it. "That is what we are doing: We are taking data from multiple sources, categorizing it in a way that makes sense, organizing or modeling it in a way that aligns with the way the people think about the business or the organization, and then storing it in a way that will point to the original sources—but have it organized in a way that makes sense. This is what we are talking about when we are pulling together a knowledge graph and this is why you hear all the talk about it these days."

While some data can be pulled directly into the knowledge graph, in other cases virtual mapping may be used, leaving the original dataset in place. Particularly when data volumes are large, moving the data does not make sense; mapping and organizing it so that it can be pulled from the original set dynamically is extremely effective, said Hilger.
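Hilger did not tie this approach to a specific toolset, but the idea of categorizing records and pointing back to their original sources, rather than moving them, can be sketched as follows (again in Python with rdflib; the category term, properties, and source-system identifiers are all hypothetical).

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, SKOS

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# Categorization layer: a shared term with a synonym, as in a taxonomy
g.add((EX.Invoicing, RDFS.label, Literal("Invoicing")))
g.add((EX.Invoicing, SKOS.altLabel, Literal("Billing")))  # synonym

# Descriptions of records that stay in their original systems;
# only a category and a pointer back to the source record are stored here
g.add((EX.rec1, EX.category, EX.Invoicing))
g.add((EX.rec1, EX.sourceRecord, Literal("crm://accounts/4711")))
g.add((EX.rec2, EX.category, EX.Invoicing))
g.add((EX.rec2, EX.sourceRecord, Literal("erp://invoices/2020-118")))

# One query finds related records across systems, matching either the
# preferred term or its synonym, without copying the underlying data
q = """
SELECT ?source WHERE {
  ?term skos:altLabel|rdfs:label "Billing" .
  ?rec ex:category ?term ;
       ex:sourceRecord ?source .
}
"""
for row in g.query(q, initNs={"ex": EX, "skos": SKOS, "rdfs": RDFS}):
    print(row.source)
```

At scale, the same pattern is typically handled by the virtualization or federation features of a graph platform rather than hand-written triples, but the principle is the one Hilger described: the graph organizes and points to the data; it does not have to hold a copy of it.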
Knowledge Graph Use Cases
Hilger and Nash presented four case studies in which the use of knowledge graphs helped prominent organizations they worked with achieve their goals.
Use case #1: Recommendation Engine—In the first case, a global bank that focuses on providing development financing for projects in Latin America and the Caribbean needed a better way to disseminate information and expertise to its staff so it could work more efficiently without knowledge loss or duplication of work. Using knowledge graphs based on a linked data strategy enabled the bank to connect all of its knowledge assets to increase the relevancy and personalization of search, allowed employees to discover content across unstructured data sources, and further facilitated connections between people who share similar interests, expertise, or locations.
Use case #2: Natural Language Querying on Structured Data—In the second example that Hilger and Nash highlighted, a large supply chain organization needed to provide its business users with a way to obtain quick answers based on very large and varied datasets that were stored in a large RDBMS data warehouse with virtually no context available. "If you put all your data in a data lake without a strong metadata strategy, it can be very difficult to get data out," said Nash. The company implemented a knowledge graph, incorporating natural language querying on structured data using SPARQL to allow non-technical users to uncover answers to critical business questions.
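The workshop did not detail how the natural language layer was built; one minimal way to illustrate the pattern is to map recognized question templates to parameterized SPARQL queries over the structured data. Everything in the sketch below (the sample data, the question pattern, and the property names) is hypothetical.

```python
import re
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, XSD

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# A few rows of hypothetical warehouse data loaded as triples
for name, qty in [("Widget A", 120), ("Widget B", 0)]:
    item = EX[name.replace(" ", "_")]
    g.add((item, RDFS.label, Literal(name)))
    g.add((item, EX.unitsInStock, Literal(qty, datatype=XSD.integer)))

# A parameterized SPARQL template for one recognized question shape
TEMPLATE = """
SELECT ?qty WHERE {
  ?item rdfs:label ?name ;
        ex:unitsInStock ?qty .
  FILTER (lcase(str(?name)) = lcase(?target))
}
"""

def answer(question: str):
    # Recognize a single question pattern and bind its parameter
    m = re.match(r"how many (.+) are in stock\?", question.lower())
    if not m:
        return "Sorry, that question is not recognized yet."
    rows = g.query(TEMPLATE,
                   initNs={"ex": EX, "rdfs": RDFS},
                   initBindings={"target": Literal(m.group(1))})
    return [int(row.qty) for row in rows]

print(answer("How many Widget A are in stock?"))  # [120]
```

A production system would use a far richer language-understanding layer, but the division of labor is the same: the ontology supplies the business vocabulary, and SPARQL does the retrieval.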
Use case #3: Relationship Discovery through Unstructured Data—In the third case study that Hilger and Nash presented, a federally funded research and development center had an extensive project library with technical documents, certifications, and reports related to engineering projects, but the library offered little metadata and the information was difficult to search. Using a knowledge graph that connected documents and individuals and a semantic search platform, the center is now able to browse documents by person, project, and topic, and analyze relationships between people and projects directly.
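The center's platform was not described at the code level, but the kind of relationship discovery it enables can be illustrated with a short, hypothetical query: given documents linked to their authors and the projects they describe, find people connected through a shared project.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()

# Hypothetical links extracted from the project library:
# documents are authored by people and describe projects
g.add((EX.report1, EX.author, EX.carol))
g.add((EX.report1, EX.describes, EX.bridgeStudy))
g.add((EX.report2, EX.author, EX.dave))
g.add((EX.report2, EX.describes, EX.bridgeStudy))

# Relationship discovery: which people are connected through a shared project?
q = """
SELECT DISTINCT ?a ?b ?project WHERE {
  ?docA ex:author ?a ; ex:describes ?project .
  ?docB ex:author ?b ; ex:describes ?project .
  FILTER (?a != ?b)
}
"""
for row in g.query(q, initNs={"ex": EX}):
    print(row.a, "and", row.b, "are connected through", row.project)
```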
Use case #4: Data Management—Finally, data scientists and economists at a federal agency were having difficulty connecting siloed data sources in order to access, interpret, and track data and provide context. The use of a knowledge graph and advanced semantic metadata modeling allowed them to access data in a more intuitive way, according to Hilger and Nash. Data scientists and economists can now reach the agency's data resources through a single tool that makes data stored in multiple locations available without moving or copying it, and they spend less time tracking down or processing data for non-technical users, who can now directly access and explore the data for decision making.
For more information about Enterprise Knowledge, LLC, go to https://enterprise-knowledge.com.
Webcast replays of Data Summit Connect presentations are available on the DBTA website at www.dbta.com/DBTA-Downloads/WhitePapers.