Integrating Domain-Specific Knowledge Graphs with Large Language Models for Enhanced Question-Answering
In a world where artificial intelligence (AI) is transforming industries, researchers at the Insight Research Ireland Centre for Data Analytics, University of Galway, are working to make AI smarter and more accurate when answering complex, specialised questions. Professor Paul Buitelaar, Ghanshyam Verma and Devishree Pillai from the Insight Centre Galway, in collaboration with Dr Bogdan E. Sacaleanu from Accenture Labs Dublin, are contributing to Natural Language Processing and Large Language Model (LLM)-based question answering by addressing one of the key limitations of current LLMs: they often fail to give precise answers in fields that require deep, domain-specific knowledge.
LLMs such as ChatGPT have become household names, capable of generating text and answering general questions. However, they often struggle with more technical queries, especially in fields such as healthcare. The problem lies in their general training: while they know a little about a lot of things, they lack the specialised knowledge needed to answer domain-specific questions accurately. This is where the Insight team’s work comes into play.
The Insight team developed an approach called SKnowGPT, designed to improve LLMs’ ability to handle domain-specific questions by integrating them with Knowledge Graphs (KGs). KGs are specialised databases that store information in a structured way, as entities linked by named relationships; a medical KG, for instance, connects diseases to their symptoms, treatments and tests. SKnowGPT does not just add knowledge from a KG; it also filters that knowledge carefully to remove irrelevant information, so that only the most useful facts are used to answer the question.
This filtering matters most in fields such as medicine, where accuracy is paramount and irrelevant information can confuse the LLM and lead to incorrect answers. SKnowGPT tackles this by selecting the knowledge most relevant to a given question while also pruning away unnecessary data that would distract the model. This dual-filtering process makes the LLM much more reliable when answering complex medical queries.
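To make the idea of dual filtering concrete, here is a minimal Python sketch, assuming a knowledge graph stored as (head, relation, tail) triples. It first retrieves facts about entities named in the question, then prunes candidates that share too few words with the question, and finally formats the surviving facts as prompt context for an LLM. The toy triples, the functions retrieve, prune and build_prompt, the parameters min_overlap and top_k, and the word-overlap score are all hypothetical illustrations; SKnowGPT's actual retrieval and pruning methods are not reproduced here and will differ.

```python
# Minimal sketch of "retrieve, then prune" knowledge selection from a KG.
# HYPOTHETICAL: the toy triples, entity matching and word-overlap score are
# illustrative stand-ins, not SKnowGPT's actual method.
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical miniature medical knowledge graph.
KG: List[Triple] = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "has_symptom", "cough"),
    ("influenza", "treated_by", "oseltamivir"),
    ("influenza", "diagnosed_by", "rapid antigen test"),
    ("migraine", "has_symptom", "headache"),
    ("migraine", "treated_by", "triptans"),
    ("fever", "can_indicate", "infection"),
]


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic words; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, kg: List[Triple]) -> List[Triple]:
    """Filter 1: keep triples whose head or tail entity appears in the question
    (a naive substring match)."""
    q = question.lower()
    return [t for t in kg if t[0] in q or t[2] in q]


def prune(question: str, triples: List[Triple],
          min_overlap: int = 2, top_k: int = 3) -> List[Triple]:
    """Filter 2: score each candidate by words shared with the question, drop
    those below min_overlap as distractions, and keep at most top_k."""
    q = tokens(question)
    scored = [(len(q & tokens(" ".join(t))), t) for t in triples]
    kept = [(s, t) for s, t in scored if s >= min_overlap]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep KG order
    return [t for _, t in kept[:top_k]]


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Serialise the surviving facts as a context block for an LLM prompt."""
    lines = [f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in facts]
    return "Known facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


question = "How is influenza treated?"
candidates = retrieve(question, KG)
facts = prune(question, candidates)
print(build_prompt(question, facts))
```

For the question "How is influenza treated?", retrieval narrows the seven toy facts to the four about influenza, and pruning keeps only the treatment fact, so the migraine, symptom and diagnosis facts never reach the model. A production system could swap the word-overlap score for a learned relevance model without changing this two-stage shape.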
The project also contributes a new KG, DisTreatKG, built by extending an existing KG (EMCKG). The extended KG lets the LLM draw on a broader range of diseases, symptoms and treatments, making it more effective at providing accurate answers.
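As a hedged illustration only, and not the procedure used to build DisTreatKG from EMCKG, the Python sketch below shows one simple way a triple-based KG can be extended: new disease facts are merged into an existing set of triples, case and whitespace are normalised, and duplicates are dropped. A per-disease count of treatments then shows the gain in coverage. The functions extend_kg and coverage and all example facts are hypothetical.

```python
# Hedged illustration of extending a triple-based KG with new facts.
# HYPOTHETICAL: not the procedure used to build DisTreatKG from EMCKG.
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def extend_kg(base: Iterable[Triple], additions: Iterable[Triple]) -> List[Triple]:
    """Return base plus additions, normalising case and whitespace and
    dropping duplicate triples while keeping first-seen order."""
    seen: Set[Triple] = set()
    merged: List[Triple] = []
    for h, r, t in list(base) + list(additions):
        key = (h.strip().lower(), r.strip().lower(), t.strip().lower())
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


def coverage(kg: Iterable[Triple], relation: str) -> Dict[str, int]:
    """Count triples per head entity for one relation (e.g. treatments per disease)."""
    counts: Dict[str, int] = {}
    for h, r, _ in kg:
        if r == relation:
            counts[h] = counts.get(h, 0) + 1
    return counts


# Hypothetical existing KG and new facts to merge in.
base_kg = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "treated_by", "oseltamivir"),
]
new_facts = [
    ("Influenza", "treated_by", "Oseltamivir"),  # duplicate after normalisation
    ("influenza", "treated_by", "zanamivir"),
    ("asthma", "has_symptom", "wheezing"),
    ("asthma", "treated_by", "inhaled corticosteroids"),
]
extended = extend_kg(base_kg, new_facts)
print(coverage(base_kg, "treated_by"))
print(coverage(extended, "treated_by"))
```

Here the merge adds three new facts while the case-variant duplicate is discarded, and the number of diseases with a known treatment rises from one to two. Normalising before deduplicating keeps the same fact, written in different ways by different sources, from appearing twice in the LLM's context.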
The research has far-reaching implications for domain-specific natural language understanding and question-answering systems. SKnowGPT not only outperforms existing methods such as MindMap, but also shows that smaller LLMs, like Mixtral, can perform as well as larger, resource-heavy models such as GPT-3.5. This makes the technology both accessible and scalable, opening the door to its use in industries where cost and efficiency are key concerns.
The Insight team’s work is a prime example of how academia and industry can collaborate to solve real-world problems. By releasing their code and datasets publicly, the researchers have ensured that others can build on their findings, accelerating progress in LLMs’ ability to handle complex, domain-specific questions.
This research not only highlights the Insight Centre’s commitment to cutting-edge innovation but also promises to revolutionise how AI is used in high-stakes environments where precision and reliability are non-negotiable. The impact of SKnowGPT and its KG-driven approach will likely extend far beyond healthcare, influencing fields like finance, law and any domain that requires expert-level answers.