
Introduction

Large language models (LLMs) have become increasingly popular in natural language processing thanks to their strong performance across a wide range of applications. Although LLMs show remarkable capabilities in zero-shot scenarios, where a model performs tasks it has not been explicitly trained on by drawing on knowledge acquired during pre-training, their performance on knowledge-intensive tasks such as question answering, text generation, and language translation remains unsatisfactory. This suggests that even the enormous parameter counts of LLMs are not sufficient to store all the world’s knowledge. Several researchers report that LLMs still suffer from issues like hallucinations and factual inaccuracy when answering questions. One emerging research direction is enhancing LLMs by augmenting them with external knowledge from knowledge graphs. This post comprehensively walks through diverse methods and strategies employed to enhance LLMs by incorporating knowledge graphs.

What is a Knowledge Graph?

A Knowledge Graph is a collection of interconnected triples, each consisting of a subject, a relation, and an object. These triples represent facts about entities (subjects and objects) and the relationships (relations) between them. The collection is depicted as a graph, where entities are represented by nodes and the relationships between them are represented by edges. For instance, “Inception” is connected to “Leonardo DiCaprio” with the relationship “acted in” and to “Christopher Nolan” with “directed by”.

This structure makes it easy to understand the connections between pieces of information and to use them for reasoning, searching, and knowledge discovery.

Applications

Aiding Question Answering Tasks

Knowledge graphs help LLMs (like GPT) improve their ability to answer questions. For example, researchers have developed a method called Retrieve-Rewrite-Answer. It employs a Knowledge Graph-to-text (KG-to-Text) method to generate semantically coherent textual descriptions from Knowledge Graph triples. By integrating this structured knowledge into LLMs, they become better at answering questions that require factual information from the knowledge graph. The proposed Retrieve-Rewrite-Answer framework has three steps:

Subgraph Extraction: It involves using annotations (which could be derived from SPARQL queries or other sources) to identify and retrieve specific triples from the Knowledge Graph that match given patterns. This typically involves specifying the desired subjects, predicates, and objects within the query’s WHERE clause to select the relevant parts of the graph for further processing or analysis.

Below is a simple example of subgraph extraction from a knowledge graph, using annotations derived from a SPARQL query:

SPARQL Query:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?capital ?language
WHERE {
    ?country rdf:type dbo:Country .
    ?country dbo:capital ?capital .
    ?country dbo:officialLanguage ?language .
    ?country rdfs:label "France"@en .
}

Query result:

Based on this query, the subgraph extracted from the knowledge graph contains triples like:

(France, dbo:capital, Paris)
(France, dbo:officialLanguage, French)
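
As a concrete illustration, the query above can also be run programmatically. The sketch below uses the SPARQLWrapper Python library against the public DBpedia endpoint; the endpoint choice is an assumption made here for illustration, and any SPARQL endpoint exposing these predicates would work.

from SPARQLWrapper import SPARQLWrapper, JSON

# Public DBpedia endpoint (assumed here purely for illustration)
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    SELECT ?capital ?language WHERE {
        ?country rdf:type dbo:Country .
        ?country dbo:capital ?capital .
        ?country dbo:officialLanguage ?language .
        ?country rdfs:label "France"@en .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding is one match for the annotated triple patterns
for row in results["results"]["bindings"]:
    print(row["capital"]["value"], row["language"]["value"])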

Retrieve: This step focuses on extracting relevant subgraphs from a knowledge graph based on the given question, for instance, “What is the zip code of the capital of China?” It employs three techniques, hop prediction, relation path prediction, and triple sampling, to identify and retrieve the most pertinent pieces of information from the Knowledge Graph.

Hop prediction

The first step in the retrieval process is to predict the number of hops required to answer a given question. This prediction helps determine the depth of exploration needed in the knowledge graph to find relevant information related to the question.

Given the question q, a pre-trained language model (PLM) is employed to encode q and obtain its vector representation q_v:

q_v = PLM(q)

The representation q_v is then fed into a linear classification layer to predict the probability distribution over hop counts.
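
As a rough sketch of how such a hop classifier might look, the snippet below encodes the question with a pre-trained BERT encoder and applies a linear layer over the [CLS] vector. The model name bert-base-uncased, the maximum of three hops, and the untrained classifier weights are all assumptions for illustration, not the paper’s exact configuration.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

MAX_HOPS = 3  # assumed upper bound on hop count
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, MAX_HOPS)  # untrained here

def predict_hops(question: str) -> int:
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        q_v = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] vector as q_v
    probs = torch.softmax(classifier(q_v), dim=-1)       # distribution over hops
    return int(probs.argmax()) + 1                       # hop counts start at 1

print(predict_hops("What is the zip code of the capital of China?"))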

Relation path prediction

Figure: relation path prediction (Wu et al., 2023, https://arxiv.org/abs/2309.11206)

Once the number of hops is predicted, the next step involves predicting the relation path that connects the entities in the question-related subgraph. This step aims to identify the sequence of relations that link the entities in the knowledge graph.
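
One simple way to realise this step, sketched below under the assumption that candidate paths are scored by semantic similarity to the question (the paper’s actual scoring model may differ), is to reuse the encoder from the previous snippet and pick the relation path whose verbalisation is closest to the question vector. The candidate paths are toy data.

import torch
import torch.nn.functional as F

def encode(text: str) -> torch.Tensor:
    # Reuses tokenizer/encoder from the hop-prediction sketch above
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return encoder(**inputs).last_hidden_state[:, 0].squeeze(0)

candidate_paths = [["capital", "zip code"], ["capital", "population"]]
q_v = encode("What is the zip code of the capital of China?")
scores = [F.cosine_similarity(q_v, encode(" ".join(p)), dim=0)
          for p in candidate_paths]
best_path = candidate_paths[int(torch.stack(scores).argmax())]
print(best_path)  # expected: ['capital', 'zip code']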

Triple sampling of the KG

Figure: triple sampling of the KG (Wu et al., 2023, https://arxiv.org/abs/2309.11206)

In this step, triples (subject, relation, object) are sampled from the knowledge graph based on the predicted relation path. These triples form the question-related subgraph that contains the necessary information to answer the question.
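
A minimal sketch of triple sampling, assuming a toy in-memory knowledge graph and the relation path predicted above:

# Toy KG: (subject, relation) -> list of objects; purely illustrative
kg = {
    ("China", "capital"): ["Beijing"],
    ("Beijing", "zip code"): ["100000"],
}

def sample_triples(entity, relation_path):
    triples, frontier = [], [entity]
    for rel in relation_path:          # follow the path hop by hop
        next_frontier = []
        for subj in frontier:
            for obj in kg.get((subj, rel), []):
                triples.append((subj, rel, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

print(sample_triples("China", ["capital", "zip code"]))
# [('China', 'capital', 'Beijing'), ('Beijing', 'zip code', '100000')]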

Rewrite: Once relevant subgraphs are retrieved, this step transforms the structured triples into free-form text. It uses a KG-to-Text model (Graph-Text Transformation by ChatGPT) to verbalise the triples, turning them into semantically coherent, natural language sentences. This process enriches the LLM’s input with contextually relevant, easily understandable knowledge.

Answer: In the final step, the generated free-form text is combined with the original question to enhance the LLM’s reasoning capabilities. This enriched input is fed into the LLM to generate more accurate and contextually rich answers, leveraging the external knowledge encoded in the natural language sentences.
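
To make the Rewrite and Answer steps concrete, here is a minimal sketch that verbalises the sampled triples with a fixed template and prepends the result to the question. The template wording is an assumption: in the actual framework the rewrite is performed by a KG-to-Text model such as ChatGPT, not a hand-written rule.

def rewrite(triples):
    # Template-based stand-in for the KG-to-Text model
    return " ".join(f"The {r} of {s} is {o}." for s, r, o in triples)

def build_prompt(question, triples):
    return f"{rewrite(triples)}\nQuestion: {question}\nAnswer:"

triples = [("China", "capital", "Beijing"), ("Beijing", "zip code", "100000")]
print(build_prompt("What is the zip code of the capital of China?", triples))
# The resulting prompt is what would be sent to the LLM for answering.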

The framework exhibits superior performance in terms of answer accuracy and the usefulness of knowledge statements compared with previous Knowledge Graph-augmented LLM approaches. This suggests that the KG-to-Text enhanced approach leads to better results on KGQA tasks.

Sentiment Analysis

Aspect-level sentiment analysis is the computational study of people’s opinions, positive or negative, towards specific aspects of an entity. Knowledge Graph Enhanced Aspect-Level Sentiment Analysis (KG-ALSA) is a method that leverages knowledge graphs and dynamic attention mechanisms. The dynamic attention lets the model concentrate on the parts of the sentence that contribute most to the expressed sentiment, improving the accuracy of sentiment classification, while a Diffusion Convolutional Gated Recurrent Unit (DCGRU) handles the complex relationships and interactions in the data. In short, the method combines BERT for context, knowledge graphs for understanding relationships, and dynamic attention for focusing on the important parts, making it better at identifying specific emotions in text across various datasets.

Figure: framework of the aspect-level sentiment analysis model (Sharma et al., 2023, https://arxiv.org/abs/2312.10048)
Figure: embedded structure of the knowledge graph (Sharma et al., 2023, https://arxiv.org/abs/2312.10048)
  • T(vi): Transforms input embeddings vi to intermediate representations.
  • hi: Hidden vector encodes semantic information from the input text.
  • St: Sentinel vector prevents the model from being misled by unreliable external knowledge.
  • αt_k and βt: Correlate synonyms tk and sentinel vector st with the current context.
  • Loop Attention Layer: Iteratively refines focus on the text for sentiment analysis (a toy sketch of the sentinel attention idea follows).
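
To give a feel for the sentinel mechanism, here is a toy sketch of sentinel-gated attention: the hidden state h_t attends over synonym embeddings t_k plus a sentinel vector s_t, so the model can fall back on its own representation when the external knowledge is unhelpful. The mixing rule and shapes are illustrative assumptions, not the paper’s exact formulation.

import torch
import torch.nn.functional as F

def sentinel_attention(h_t, t_k, s_t):
    candidates = torch.cat([t_k, s_t.unsqueeze(0)], dim=0)  # (K+1, d)
    weights = F.softmax(candidates @ h_t, dim=0)            # alpha_1..K, beta
    alpha, beta = weights[:-1], weights[-1]
    knowledge = (alpha.unsqueeze(1) * t_k).sum(dim=0)       # weighted synonyms
    return beta * h_t + knowledge  # beta gates how much to trust h_t alone

d, K = 8, 3
out = sentinel_attention(torch.randn(d), torch.randn(K, d), torch.randn(d))
print(out.shape)  # torch.Size([8])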

To illustrate, consider the review “The sushi rolls were exceptional, but the service was slow.”

  1. Aspect and Sentiment Extraction: The review mentions two aspects of the dining experience: “sushi rolls” and “service.” Sentiments expressed are positive for “sushi rolls” (exceptional) and negative for “service” (slow).
  2. Knowledge Graph Construction: KG-ALSA constructs a knowledge graph that captures the relationship between the aspects mentioned (“sushi rolls” and “service”) and their associated sentiments (“exceptional” for positive and “slow” for negative).
  3. Aspect-Level Sentiment Analysis: The sentiment towards “sushi rolls” is identified as positive, thanks to the adjective “exceptional.” The sentiment regarding “service” is determined to be negative, as indicated by the word “slow.”
  4. Dynamic Attention Mechanism: The model uses a dynamic attention mechanism to focus on the keywords that significantly contribute to the sentiment expressed. For this review, it focuses more on “exceptional” when analysing the sentiment about “sushi rolls” and on “slow” for assessing the sentiment about “service.”
  5. DCGRU Processing: The Diffusion Convolutional Gated Recurrent Unit (DCGRU) processes the complex interactions between the aspects and their sentiments. It ensures that the sentiment analysis accurately reflects the nuanced customer feedback by considering the interconnectedness of these aspects within the knowledge graph’s structure.
  6. Output: KG-ALSA provides an output that assigns a positive sentiment score to “sushi rolls” and a negative sentiment score to “service.” This detailed analysis helps in understanding the specific strengths and weaknesses of the restaurant as per the customer’s review.

Traditionally, creating these Knowledge Graphs involves identifying key entities or terms in a text and figuring out how they are related, often through techniques that look at how frequently words appear together or how close they are in the text. However, this traditional approach to combining Knowledge Graphs with language models requires constant updates and retraining, which can be cumbersome and inflexible when quick changes are needed. To address this, researchers have introduced automated knowledge graphs.

Automated Knowledge Graph

Researchers address the above issues with traditional Knowledge Graphs by introducing AutoKG, which leverages pretrained LLMs to extract keywords from a text corpus without the need for manual curation. It is designed to be computationally efficient and flexible, avoiding the need for ongoing neural network training or fine-tuning when the Knowledge Graph is updated. This makes it more adaptable to on-the-fly adjustments than traditional Knowledge Graphs, which may require significant effort to update or expand. Unlike traditional Knowledge Graphs, which are typically composed of complex entities and relationships (often represented as subject-predicate-object triples), AutoKG simplifies the structure: it treats keywords as individual nodes and evaluates how strongly they are connected by examining the text that links them. These connections are represented as edges, each carrying a single weight that indicates the strength of the association. AutoKG then applies a hybrid search method, combining one component that matches on the meaning of words with another that follows the connections between information points. This helps the language model find better and more relevant information, improving its answers.

Flowchart of the KG Construction Process: Chen et al., 2023. Article link: https://arxiv.org/abs/2311.14740

The Knowledge Graph construction diagram can be explained as follows:

  1. Embedding Text Block: Transform text blocks into a form that a computer can understand, known as embedding.
  2. Unsupervised Clustering: Group similar text blocks together without pre-labeled categories using methods like k-means or spectral clustering. K-means partitions the text blocks into k clusters based on their embedding vectors, minimising the sum of squared distances between data points and their cluster centroids; grouping text blocks this way identifies distinct groups of related content within the knowledge base. Spectral clustering further refines the k-means results by considering the relationships between data points in a high-dimensional space, which is particularly useful when the data has no clear separation boundaries in the original feature space. (Steps 1, 2, and 6 are sketched in code after this list.)
  3. Selecting Representative Text Blocks: For each group from clustering, select text blocks that best represent the group.
  4. Extracting Keywords: Use a language model to extract keywords from the selected text blocks. Refine the keyword list by splitting, deduplicating, filtering, and calibrating using the language model.
  5. Associating Keywords with Corpora: Use the embeddings to connect the keywords with the original text blocks. Employ a graph Laplace learning method to strengthen these associations.
  6. Knowledge Graph Construction: Create a knowledge graph where each node is a keyword. Draw edges between nodes to represent the strength of the association; the more text blocks two keywords share, the stronger their link in the graph.
  7. Understanding the Knowledge Graph: Each node (keyword) in the graph has weighted edges that represent the strength of its relationships with other keywords. Each keyword is also connected to several corpora (bodies of text).
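
Under simplifying assumptions, steps 1, 2, and 6 of this pipeline can be sketched as follows, using sentence-transformers for embeddings, scikit-learn k-means for clustering, and networkx for the keyword graph. The model name, the keyword map (which the LLM-based extraction of steps 3-5 is assumed to have produced), and the toy texts are all illustrative.

import networkx as nx
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

texts = [
    "AI systems increasingly rely on machine learning.",
    "Machine learning drives recent AI innovation.",
    "Voters head to the polls in the national election.",
]

# Step 1: embed text blocks
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(texts)

# Step 2: unsupervised clustering of the embeddings
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)

# Step 6: build the keyword graph; assume steps 3-5 yielded this keyword map
keywords_per_text = [{"AI", "machine learning"},
                     {"AI", "machine learning"},
                     {"election"}]
graph = nx.Graph()
for kws in keywords_per_text:
    for a in kws:
        for b in kws:
            if a < b:  # each shared text block strengthens the edge
                w = graph.get_edge_data(a, b, default={"weight": 0})["weight"]
                graph.add_edge(a, b, weight=w + 1)

print(labels, graph.edges(data=True))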

Let’s consider a scenario involving a collection of news articles on various topics such as technology, politics, and health. Here’s how AutoKG could simplify the creation and updating of a knowledge graph from this corpus without manual curation:

Extraction of Keywords: Using a pre-trained language model, AutoKG identifies and extracts significant keywords from the articles, such as “Artificial Intelligence,” “Election,” and “COVID-19.” These keywords represent the nodes of the Knowledge Graph.

Evaluation of Connections: AutoKG then assesses how these keywords are connected based on the context in which they appear across the articles. For example, it might find that “Artificial Intelligence” often appears in close proximity to “Machine Learning,” “Technology,” and “Innovation.” Similarly, “Election” might be frequently mentioned alongside “Politics,” “Democracy,” and “Voting.”

Graph Construction: In the Knowledge Graph, each keyword is a node, and the connections between them are represented by edges. The strength of the connection (edge weight) between any two keywords (nodes) is determined by how often and how closely they are associated in the text. For instance, if “Artificial Intelligence” and “Machine Learning” appear together more frequently and in more relevant contexts than “Artificial Intelligence” and “Innovation,” the edge weight between “Artificial Intelligence” and “Machine Learning” would be higher.
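
A toy illustration of this co-occurrence-based edge weighting, with the sentences and keyword list as assumed sample data:

from itertools import combinations
from collections import Counter

sentences = [
    "artificial intelligence and machine learning reshape technology",
    "machine learning powers artificial intelligence research",
    "artificial intelligence spurs innovation",
]
keywords = {"artificial intelligence", "machine learning",
            "innovation", "technology"}

# Edge weight grows with how often two keywords share a sentence
edge_weights = Counter()
for s in sentences:
    present = sorted(k for k in keywords if k in s)
    for a, b in combinations(present, 2):
        edge_weights[(a, b)] += 1

print(edge_weights.most_common())
# ('artificial intelligence', 'machine learning') gets the top weight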

Figure: the generated KG in graph format.

Conclusion

The integration of Knowledge Graphs with Large Language Models (LLMs) presents a promising direction for overcoming the inherent limitations of LLMs, such as factual inaccuracies and hallucinations. By augmenting LLMs with structured, symbolic knowledge from Knowledge Graphs, this approach enables more accurate, contextually relevant outputs across various tasks, including question-answering, sentiment analysis, and reasoning. The development of AutoKG further streamlines this process, offering a flexible, efficient means to enhance LLMs with external knowledge without the need for constant updates. This synergy between Knowledge Graphs and LLMs marks a significant advancement in natural language processing, paving the way for more intelligent, reliable systems.


By Asif Raza
