The CORE team were at the University of Zadar in Croatia last week for the 27th International Conference on the Theory and Practice of Digital Libraries (TPDL), where they were presented with the Best Paper award for their submission entitled ‘CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering’.
David and Petr accepting the award at the conference dinner
The paper’s authors, David Pride, Matteo Cancellieri and Petr Knoth, are incredibly proud to have their work recognised in this way at this prestigious international conference.
CORE-GPT: Bridging the Trust Gap
The paper first presents an experimental study showing that about 80% of the references provided by ChatGPT3.5 and ChatGPT4.0 are non-existent or conflated. It then offers a practical solution to this problem, based on retrieval-augmented generation.
Fig. 1. Citations to answers given by LLMs. Each row represents 5 sources / citations for a single answer. Overall, 72.5% of citations provided by GPT3.5 were fictional. This figure was 71.2% for GPT4.
CORE-GPT combines the language generation capabilities of large language models (LLMs) with the 34 million research papers in CORE to provide evidence-backed answers to users’ queries, along with citations to the papers from which the answer content is composed. CORE-GPT’s performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE. Two evaluators assessed the quality of the answers and the relevance of the provided papers. The answers were judged in terms of Comprehensiveness, Trustworthiness and Utility, and the references were judged on their relevance to the user’s question. CORE-GPT scored highly across the majority of scientific domains, demonstrating the practical application of the system.
Fig. 2. Mean comprehensiveness, trust and utility scores for each domain ordered by mean comprehensiveness.
Fig. 3. Mean citation relevance scores for each domain (ordered by relevance score for first citation).
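For readers unfamiliar with retrieval-augmented generation, the general idea of retrieving papers first and then asking a language model to answer only from them can be sketched roughly as follows. This is an illustrative toy, not the CORE-GPT implementation: the three-paper corpus, the word-overlap ranking, the prompt wording and the function names (`retrieve`, `build_prompt`, `answer`) are all assumptions made for the example, and the `generate` argument stands in for whatever LLM call a real system would use.

```python
import re
from collections import Counter

# Toy corpus standing in for a large paper collection (invented for illustration).
PAPERS = [
    {"id": "p1", "title": "Coral bleaching and ocean warming",
     "text": "Rising sea temperatures cause coral bleaching events."},
    {"id": "p2", "title": "Urban air quality",
     "text": "Traffic emissions degrade air quality in cities."},
    {"id": "p3", "title": "Reef recovery",
     "text": "Coral reefs can recover from bleaching if temperatures fall."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(question, papers, k=2):
    """Rank papers by word overlap with the question.

    A stand-in for a real search index; papers with no overlap are dropped.
    """
    query = Counter(tokenize(question))
    scored = []
    for paper in papers:
        words = Counter(tokenize(paper["title"] + " " + paper["text"]))
        score = sum(min(query[w], words[w]) for w in query)
        if score > 0:
            scored.append((score, paper))
    # Stable sort: papers with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored[:k]]


def build_prompt(question, sources):
    """Ask the model to answer using only the retrieved sources."""
    lines = ["Answer using only the sources below and cite them by id."]
    for source in sources:
        lines.append(f"[{source['id']}] {source['title']}: {source['text']}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def answer(question, papers, generate):
    """Retrieve sources, then generate an answer grounded in them.

    `generate` is any callable mapping a prompt string to an answer string.
    If nothing relevant is retrieved, no answer is generated at all.
    """
    sources = retrieve(question, papers)
    if not sources:
        return {"answer": None, "citations": []}
    prompt = build_prompt(question, sources)
    return {"answer": generate(prompt), "citations": [s["id"] for s in sources]}
```

Because the citations come from the retrieval step and not from the model's output, every reference returned by this sketch points to a paper that actually exists in the corpus.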
The CORE team is now working on scaling the platform to make it available to the 30 million people who use CORE on a monthly basis. Future developments for the platform include making the service available as an endpoint for the CORE API, allowing answers and references to be retrieved programmatically. Further, based on the feedback received from some of our data providers, we are now looking for ways in which CORE-GPT could be adapted to local content and made available to CORE Members, delivering a question-answering system where the answers are drawn from the knowledge base of a specific institution.
Prof Petr Knoth, Head of CORE, said:
“CORE-GPT highlights the value of the open access repositories network as a trustable source of knowledge from research papers. In the age of generative AI, this creates both a huge opportunity for improving the way we do research as well as a critical defence mechanism against online misinformation.”
About TPDL
The highly regarded International Conference on Theory and Practice of Digital Libraries (TPDL) is an annual meeting for researchers working on Digital Libraries and related topics. The conference draws from a broad and multidisciplinary array of research areas including computer science, information science, librarianship, archival science and practice, museum studies and practice, technology, social sciences, cultural heritage and humanities, and scientific communities. Now in its twenty-seventh year, TPDL is an international reference forum focused on digital libraries and associated technical, practical, and social issues. This year its focus was on bridging the wide field of Research and Information Science with the related field of Digital Libraries.