Content area

Abstract

Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLM) and Knowledge Graphs (KG) to improve the accuracy and reliability of question-answering systems, on the example of a biomedical KG. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question answering systems. The source code for generating the results of this paper and for the user-interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui

Details

1009240
Identifier / keyword
Title
Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Oct 31, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2024-11-01
Milestone dates
2024-09-06 (Submission v1); 2024-10-31 (Submission v2)
Publication history
 
 
   First posting date
01 Nov 2024
ProQuest document ID
3123151613
Document URL
https://www.proquest.com/working-papers/combining-llms-knowledge-graphs-reduce/docview/3123151613/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2024. This work is published under http://creativecommons.org/publicdomain/zero/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-07-21
Database
ProQuest One Academic