How to Implement Knowledge Graphs and Large Language Models (LLMs) Together at the Enterprise Level
A survey of the current methods of integration
Large Language Models (LLMs) and Knowledge Graphs (KGs) are different ways of providing more people access to data. KGs use semantics to connect datasets via their meaning, i.e. the entities they represent. LLMs use vectors and deep neural networks to predict natural language. They are often both aimed at ‘unlocking’ data. For enterprises implementing KGs, the end goal is usually something like a data marketplace, a semantic layer, FAIR-ified data, or a more data-centric enterprise. These are all different solutions with the same end goal: making more data available to the right people faster. For enterprises implementing an LLM or some other similar GenAI solution, the goal is often similar: to provide employees or customers with a ‘digital assistant’ that can get the right information to the right people faster. The potential symbiosis is clear: some of the main weaknesses of LLMs, that they are black-box models and struggle with factual knowledge, are some of KGs’ greatest strengths. KGs are, essentially, collections of facts, and they are fully interpretable. But exactly how can and should KGs and LLMs be implemented together at an enterprise?
When I was searching for a job last year, I had to write a lot of cover letters. I used ChatGPT to help: I’d copy my existing cover letter into the prompt window, along with my resume and the description of the job I was applying for, and ask ChatGPT to do the rest. ChatGPT helped me gain momentum with some pretty solid first drafts, but unchecked, it also gave me years of experience I didn’t have and claimed I went to schools I never attended.
I bring up my cover letter because 1) I think it is a great example of the strengths and weaknesses of LLMs, and why KGs are an important part of their implementation and 2) this use case is not that different from what many large enterprises are using LLMs for currently: automated report generation. ChatGPT does a pretty good job of recreating a cover letter by changing the content to be more focused on a specific job description, as long as you explicitly include the existing cover letter and job description in the prompt. Ensuring the LLM has the right content is where a KG comes in. If you simply write, ‘write me a cover letter for a job I want,’ the results are going to be laughable. Additionally, the cover letter example is a great application of an LLM because it is about summarizing and restructuring language. Remember what the second L in LLM stands for? LLMs have, historically, focused on unstructured data (text) and that is where they excel, whereas KGs excel at integrating structured and unstructured data. You can use the LLM to write the cover letter but you should use a KG to make sure it has the right resume.
Note: I am not an AI expert but I also don’t really trust anyone who pretends to be. This space is changing so fast that it is impossible to keep up, let alone predict what the future of AI implementation at the enterprise level will look like. I describe some of the ways KGs and LLMs are being integrated currently, as I see it. This is not a comprehensive list and I am open to additions and suggestions.
The two ways KGs and LLMs are related
There are two ways KGs and LLMs are interacting right now: LLMs as tools to build KGs, and KGs as inputs into LLM or GenAI applications. Those of us working in the knowledge graph space are in the weird position of building things that are expected to improve AI applications, while AI simultaneously changes the way we build those things. We are expected to use AI as a tool to optimize our day-to-day work, while changing what we build so that it better serves AI applications. These two trends are related and often overlap, but I’ll discuss them one at a time below.
Using LLMs to assist in the KG creation and curation process
LLMs are valuable tools for building KGs. One way to leverage LLM technology in the KG curation process is by vectorizing (or embedding) your KG in a vector database. A vector database (or vector store) is a database built to store vectors, i.e. lists of numbers. Vectorization is one of, if not the, core technological components driving language models. These models, through incredible amounts of training data, learn to associate words with vectors. The vectors capture semantic and syntactic information about the word based on its context in the training data. By using an embedding service trained on these incredible amounts of data, we can leverage that semantic and syntactic information in our KG.
Note: vectorizing your KG is by no means the only way to use LLM tech in KG curation and construction. Also, none of these applications of LLMs are new to KG creation. NLP has been used for entity extraction for decades, for example; LLMs are just a new capability to assist the ontologist/taxonomist.
Some of the ways LLMs can help in the KG creation process are:
- Entity resolution: Entity resolution is the process of aligning records that refer to the same real-world entity. For example, acetaminophen, a common pain reliever used in the US and sold under the brand name Tylenol, is called paracetamol in the UK and sold under the brand name Panadol. These four names are nothing alike, but if you were to embed your KG into a vector database, the vectors would have the semantic understanding to know that these entities are closely related (see the sketch after this list).
- Tagging of unstructured data: Suppose you want to incorporate some unstructured data into your KG. You have a bunch of PDFs with vague file names but you know there is important information in those documents. You need to tag these documents with file type and topic. If your topical taxonomy and document type taxonomy have been embedded, all you need to do is vectorize the documents and the vector database will identify the most relevant entities from each taxonomy.
- Entity and class extraction: Create or enhance a controlled vocabulary like an ontology or a taxonomy based on a corpus of unstructured data. Entity extraction is similar to tagging, but the goal here is to enhance the ontology rather than to incorporate unstructured data into the KG. Suppose you have a geographic ontology and you want to populate it with instances of towns, cities, states, etc. You can use an LLM to extract entities from a corpus of text to populate the ontology. Likewise, you can use the LLM to extract classes and relationships between classes from the corpus. Suppose you forgot to include ‘capital’ in your ontology. The LLM might be able to extract this as a new class or a property of a city.
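To make the entity resolution and tagging ideas concrete, here is a minimal sketch in Python. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model (both assumptions on my part, not tied to any particular KG product), and it keeps the vectors in memory; a production pipeline would embed full entity descriptions and store them in a dedicated vector database.

```python
# A minimal sketch of embedding-based entity resolution and tagging.
# Assumes the sentence-transformers library and the all-MiniLM-L6-v2 model;
# a real pipeline would store these vectors in a vector database.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Entity labels plus short descriptions pulled from a hypothetical KG/taxonomy.
kg_entities = [
    "acetaminophen, a pain reliever sold in the US under the brand name Tylenol",
    "paracetamol, a pain reliever sold in the UK under the brand name Panadol",
    "ibuprofen, a nonsteroidal anti-inflammatory drug",
    "clinical trial protocol",
]
entity_vectors = model.encode(kg_entities, convert_to_tensor=True)

# Entity resolution: find the KG entity most similar to a new record.
# A general-purpose model may miss brand/generic links; a domain-specific
# embedding model usually does better for cases like Tylenol vs. acetaminophen.
record = "Panadol 500 mg tablets"
record_vector = model.encode(record, convert_to_tensor=True)
scores = util.cos_sim(record_vector, entity_vectors)[0]
best = int(scores.argmax())
print(f"'{record}' most closely matches: {kg_entities[best]} ({scores[best].item():.2f})")

# Tagging: the same similarity search suggests taxonomy terms for a document chunk.
doc_chunk = "This PDF describes the dosing schedule used in our phase II study."
doc_scores = util.cos_sim(model.encode(doc_chunk, convert_to_tensor=True), entity_vectors)[0]
top_two = doc_scores.argsort(descending=True)[:2]
print("Suggested tags:", [kg_entities[int(i)] for i in top_two])
```

Note that resolution and tagging are the same mechanism here: everything is a nearest-neighbor search over the embedded KG, whether the query is a record or a chunk of a document.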
Using KGs to power and govern GenAI pipelines
There are several reasons to use a KG to power and govern your GenAI pipelines and applications. According to Gartner, “Through 2025, at least 30% of GenAI projects will be abandoned after proof of concept (POC) due to poor data quality, inadequate risk controls, escalating costs or unclear business value.” KGs can help improve data quality, mitigate risks, and reduce costs.
Data governance, access control, and regulatory compliance
Only authorized people and applications should have access to certain data, and only for certain purposes. Usually, enterprises want certain types of people (or apps) to chat with certain types of data, in a well-governed way. How do you know which data should go into which GenAI pipeline? How can you ensure PII does not make its way into the digital assistant you want all of your employees to chat with? The answer is data governance (a minimal filtering sketch follows the points below). Some additional points:
- Policies and regulations can change, especially when it comes to AI. Even if your AI apps are compliant now, they might not be in the future. A good data governance foundation allows an enterprise to adapt to these changing regulations.
- Sometimes, the correct answer to a question is ‘I don’t know,’ or ‘you don’t have access to the information required to answer that question,’ or ‘it is illegal or unethical for me to answer that question.’ The quality of a response is not just a matter of truth or accuracy but also of regulatory compliance.
- Notable players implementing or enabling this solution (alphabetically): Semantic KG companies like Cambridge Semantics, data.world, PoolParty, metaphacts, and TopQuadrant but also data catalogs like Alation, Collibra, and Informatica (and many many more).
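As a loose illustration of how a KG can keep sensitive data out of a GenAI pipeline, the sketch below filters facts by a sensitivity label before they can be included in a prompt. The ex:sensitivity property, its values, and the tiny rdflib graph are all assumptions for the example; in practice you would lean on the access-control features of your graph platform or data catalog rather than application code.

```python
# A minimal sketch of filtering KG data by sensitivity before it reaches an LLM.
# The ex:sensitivity property and its values are assumptions for illustration;
# real access control would be enforced by the graph platform itself.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.employee123, EX.name, Literal("Jane Doe")))
g.add((EX.employee123, EX.salary, Literal(120000)))
g.add((EX.name, EX.sensitivity, Literal("public")))
g.add((EX.salary, EX.sensitivity, Literal("restricted")))

def facts_for_prompt(graph: Graph, allowed: set) -> list:
    """Return only facts whose predicate carries an allowed sensitivity label."""
    facts = []
    for s, p, o in graph:
        label = graph.value(p, EX.sensitivity)
        if label is not None and str(label) in allowed:
            facts.append(f"{s} {p} {o}")
    return facts

# Only facts labeled 'public' make it into the context sent to the assistant.
print(facts_for_prompt(g, allowed={"public"}))
```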
Accuracy and contextual understanding
KGs can also help improve overall data quality. If your documents are filled with contradictory and/or false statements, do not be surprised when your chatbot tells you inconsistent and false things. If your data is poorly structured, storing it in one place isn’t going to help. That is how the promise of data lakes became the scourge of data swamps. Likewise, if your data is poorly structured, vectorizing it isn’t going to solve your problems; it’s just going to create a new headache: a vectorized data swamp. If your data is well structured, however, KGs can provide LLMs with additional relevant resources to generate more personalized and accurate recommendations. The different ways of using KGs to improve the accuracy of an LLM generally fall under the category of natural language querying (NLQ): using natural language to interact with databases. The current ways NLQ is being implemented, as far as I know, are through RAG, prompt-to-query, and fine-tuning.
Retrieval-Augmented Generation (RAG): RAG means supplementing a prompt with additional relevant information outside of the training data to generate a more accurate response. While LLMs have been trained on vast amounts of data, they have not been trained on your data. Think of the cover letter example above. I could ask an LLM to ‘write a cover letter for Steve Hedden for a job in product management at TopQuadrant’ and it would return an answer but it would contain hallucinations. A smarter way of doing that would be for the model to take this prompt, retrieve the LinkedIn profile for Steve Hedden, retrieve the job description for the open position at TopQuadrant, and then write the cover letter. There are currently two prominent ways of doing this retrieval: by vectorizing the graph or by turning the prompt into a graph query (prompt-to-query).
- Vector-based retrieval: This method of retrieval requires that you vectorize your KG and store it in a vector store. If you then vectorize your natural language prompt, you can find the vectors in the vector store that are most similar to your prompt. Since these vectors correspond to entities in your graph, you can return the most ‘relevant’ entities in the graph given a natural language prompt. This is the exact same process described above under the tagging capability: we are essentially ‘tagging’ a prompt with relevant tags from our KG (see the sketch after these bullets).
- Prompt-to-query retrieval: Alternatively, you could use an LLM to generate a SPARQL or Cypher query and use that query to get the most relevant data from the graph. Note: you can use the prompt-to-query method to query the database directly, without using the results to supplement a prompt to an LLM. This would not be an application of RAG, since you are not ‘augmenting’ anything. This method is explained in more detail below.
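Here is a minimal sketch of the vector-based retrieval flow, assuming the same sentence-transformers model as earlier plus the openai Python client (the model name, the API key in the environment, and the verbalized KG snippets are all assumptions): embed the prompt, retrieve the most similar KG content, and augment the prompt with it.

```python
# A minimal sketch of vector-based RAG over KG content.
# Assumes sentence-transformers for retrieval and the openai client for generation;
# the snippets below are placeholders standing in for verbalized facts from a real KG.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Verbalized KG facts (in practice these live in a vector store, not a list).
kg_snippets = [
    "Steve Hedden's resume lists product management experience (placeholder fact).",
    "TopQuadrant has an open product management position (placeholder fact).",
    "Acetaminophen is sold under the brand name Tylenol in the US.",
]
snippet_vectors = embedder.encode(kg_snippets, convert_to_tensor=True)

prompt = "Write a short cover letter for Steve Hedden for the product management role at TopQuadrant."

# Retrieve: find the KG snippets most similar to the prompt.
prompt_vector = embedder.encode(prompt, convert_to_tensor=True)
scores = util.cos_sim(prompt_vector, snippet_vectors)[0]
top_indices = scores.argsort(descending=True)[:2]
context = "\n".join(kg_snippets[int(i)] for i in top_indices)

# Augment and generate: the prompt now carries the retrieved facts.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # model name is an assumption; use whatever your stack provides
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nTask: {prompt}"},
    ],
)
print(response.choices[0].message.content)
```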
Some additional pros, cons, and notes on RAG and the two retrieval methods:
- RAG requires, by definition, a knowledge base. A knowledge graph is a knowledge base, and so proponents of KGs are going to be proponents of RAG powered by graphs (sometimes called GraphRAG). But RAG can be implemented without a knowledge graph.
- RAG can supplement a prompt with the most relevant data from your KG based not only on the content of the prompt but also on its metadata. For example, we can customize the response based on who asked the question, what they have access to, and additional demographic information about them.
- As described above, one benefit of using the vector-based retrieval method is that if you have embedded your KG into a vector database for tagging and entity resolution, the hard part is already done. Finding the most relevant entities related to a prompt is no different than tagging a chunk of unstructured text with entities from a KG.
- RAG provides some level of explainability in the response. The user can now see the supplemental data that went into their prompt, along with, potentially, where the answer to their question lives in that data.
- I mentioned above that AI is affecting the way we build KGs while we are expected to build KGs that facilitate AI. The prompt-to-query approach is a perfect example of this. The schema of the KG will affect how well an LLM can query it. If the purpose of the KG is to feed an AI application, then the ‘best’ ontology is no longer a reflection of reality but a reflection of the way AI sees reality.
- In theory, more relevant information should reduce hallucinations, but that does not mean RAG eliminates hallucinations. We are still using a language model to generate a response, so there is still plenty of room for uncertainty and hallucinations. Even with my resume and job description, an LLM might still exaggerate my experience. For the prompt-to-query approach, we are using the LLM to generate both the KG query and the response, so there are actually two places for potential hallucinations.
- Likewise, RAG offers some level of explainability, but not entirely. For example, if we used vector-based retrieval, the model can tell us which entities it included because they were the most relevant, but it can’t explain why those were the most relevant. If using an auto-generated KG query, the auto-generated query ‘explains’ why certain data was returned by the graph, but the user will need to understand SPARQL or Cypher to fully understand why those data were returned.
- These two approaches are not mutually exclusive and many companies are pursuing both. For example, Neo4j has tutorials on implementing RAG with vector-based retrieval, and on prompt-to-query generation. Anecdotally, I am writing this just after attending a conference with a heavy focus on KG and LLM implementation in life sciences, and many of the life sciences companies I saw give presentations are doing some combination of vector-based and prompt-to-query RAG.
- Notable players implementing or enabling this solution (alphabetically): data.world, Microsoft, Neo4j, Ontotext, PoolParty, SciBite, Stardog, TopQuadrant (and many many more)
Prompt-to-query alone: Use an LLM to translate a natural language query into a formal query (like SPARQL or Cypher) for your KG. This is the same as the prompt-to-query retrieval approach to RAG described above, except that we don’t send the data to an LLM after it is retrieved. The idea here is that by using the LLM to generate the query and not to interpret the data, you reduce hallucinations. Though, as mentioned above, anything the LLM generates, including the query itself, can contain hallucinations. The argument for this approach is that it is easier for the user to detect hallucinations in the auto-generated query than in an auto-generated response. I am somewhat skeptical about that since, presumably, many users who use an LLM to generate a SPARQL query will not know SPARQL well enough to detect issues with the auto-generated query. A minimal sketch of this approach follows the bullet below.
- Anyone implementing a RAG solution using prompt-to-query retrieval can also implement prompt-to-query alone. These include: Neo4j, Ontotext, and Stardog.
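Below is a rough sketch of prompt-to-query on its own, using the openai client to generate SPARQL and rdflib to execute it against a small local graph (the ontology summary, the model name, and the example data are assumptions). A production setup would run the generated query against the enterprise triplestore and validate it before execution since, as noted above, the generated query itself can be wrong.

```python
# A minimal sketch of prompt-to-query: the LLM writes SPARQL, the graph answers.
# The ontology summary and example data are assumptions for illustration.
from openai import OpenAI
from rdflib import Graph

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ontology_summary = """
Prefix ex: <http://example.org/> .
Classes: ex:City, ex:Country. Properties: ex:capitalOf (City -> Country), ex:name.
"""

question = "What is the capital of France?"

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # model name is an assumption
    messages=[
        {"role": "system",
         "content": "Translate the user's question into a single SPARQL SELECT query. "
                    f"Use only this ontology:\n{ontology_summary}\n"
                    "Return only the query, with no explanation or markdown."},
        {"role": "user", "content": question},
    ],
)
sparql = completion.choices[0].message.content.strip()
if sparql.startswith("```"):  # crude cleanup; real pipelines should validate the query
    sparql = sparql.strip("`").removeprefix("sparql").strip()

# Execute the generated query against a small local graph (stand-in for a triplestore).
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:Paris a ex:City ; ex:name "Paris" ; ex:capitalOf ex:France .
ex:France a ex:Country ; ex:name "France" .
""", format="turtle")

print("Generated query:\n", sparql)
for row in g.query(sparql):
    print("Answer:", *row)
```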
KGs for fine-tuning LLMs: Use your KG to provide additional training to an off-the-shelf LLM. Rather than provide the KG data as part of the prompt at query time (RAG), you can use your KG to train the LLM itself. The benefit here is that you can keep all of your data local: you don’t need to send your prompts to OpenAI or anyone else. The downside is that the first L in LLM stands for large, and so downloading and fine-tuning one of them is resource-intensive. Additionally, while a model fine-tuned on your enterprise or industry-specific data is going to be more accurate, it will not eliminate hallucinations altogether. Some additional thoughts on this (a minimal data-preparation sketch follows the list):
- Once you use the graph to fine-tune the model, you also lose the ability to use the graph for access control.
- There are LLMs that have already been fine-tuned for different industries like MedLM for healthcare and SecLM for cybersecurity.
- Depending on the use case, a fine-tuned LLM might not be necessary. For example, if you are largely using the LLM to summarize news articles, the LLM might not need special training.
- Rather than fine-tuning the LLM with industry specific information, some are using LLMs fine-tuned to generate code (like Code Llama) as part of their prompt-to-query solution.
- Notable players implementing or enabling this solution (alphabetically): As far as I know, Stardog’s Voicebox is the only solution that uses a KG to fine-tune an LLM for the customer.
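For the fine-tuning route, one common data-preparation pattern is to verbalize KG triples into question/answer pairs and write them to a training file. The sketch below does this with rdflib; the predicates, the templates, and the chat-style JSONL format are assumptions, and the exact format you need depends on the fine-tuning framework or service you use.

```python
# A minimal sketch of turning KG triples into fine-tuning examples.
# The verbalization templates and chat-style JSONL format are assumptions;
# adapt the output to whatever fine-tuning framework or service you use.
import json
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Paris, EX.capitalOf, EX.France))
g.add((EX.Tylenol, EX.brandNameOf, EX.acetaminophen))

# Hypothetical templates mapping predicates to question/answer text.
templates = {
    EX.capitalOf: ("What is the capital of {o}?", "{s} is the capital of {o}."),
    EX.brandNameOf: ("What drug is sold under the brand name {s}?",
                     "{o} is sold under the brand name {s}."),
}

def local_name(term):
    """Use the last path segment of a URI as a human-readable label."""
    return str(term).rsplit("/", 1)[-1]

with open("kg_finetune.jsonl", "w") as f:
    for s, p, o in g:
        if p not in templates:
            continue
        question_tpl, answer_tpl = templates[p]
        example = {"messages": [
            {"role": "user", "content": question_tpl.format(s=local_name(s), o=local_name(o))},
            {"role": "assistant", "content": answer_tpl.format(s=local_name(s), o=local_name(o))},
        ]}
        f.write(json.dumps(example) + "\n")
```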
A note on the different ways of integrating KGs and LLMs I have listed here: These categories (RAG, prompt-to-query, and fine-tuning) are neither comprehensive nor mutually exclusive. There are other ways of implementing KGs and LLMs and there will be more in the future. Also, there is considerable overlap between these solutions and you can combine solutions. You can run a vector-based and prompt-to-query RAG hybrid solution on a fine-tuned model, for example.
Efficiency and scalability
Building many separate apps that do not connect is inefficient; it is what Dave McComb refers to as a software wasteland. It doesn’t matter that the apps are ‘powered by AI’. Siloed apps result in duplicative data and code and overall redundancies. KGs provide a foundation for eliminating these redundancies through the smooth flow of data throughout the enterprise.
Gartner’s claim above is that many GenAI projects will be abandoned due to escalating costs, but I don’t know whether a KG will significantly reduce those costs. I don’t know of any studies or cost-benefit analyses done to support that claim. Developing an LLM-powered ChatBot for an enterprise is expensive, but so is developing a KG.
Conclusion
I won’t pretend to know the ‘optimal’ solution and, like I said above, I think anyone who pretends to know the future of AI is full of it. I do believe that both KGs and LLMs are useful tools for anyone trying to make more data available to the right people faster, and that they each have their strengths and weaknesses. Use the LLM to write the cover letter (or regulatory report), but use the KG to make sure you give it the right resume (or studies or journal articles or whatever).
Generally speaking, I believe in using AI as much as possible to build, maintain, and extend knowledge graphs, and also that KGs are necessary for enterprises looking to adopt GenAI technologies. This is for several reasons: data governance, access control, and regulatory compliance; accuracy and contextual understanding; and efficiency and scalability.