SoFAIR study paper accepted to JCDL 2025

The Open University is the project coordinator for the two-year, CHIST-ERA-funded SoFAIR project, which aims to make research software a first-class FAIR (Findable, Accessible, Interoperable, and Reusable) research object.

We are excited to share that our paper, “Identifying and Classifying Software Mentions in Full-Text Scholarly Documents,” has been accepted for presentation at the Joint Conference on Digital Libraries (JCDL 2025). This work reports the first systematic evaluation of large language models (LLMs) for detecting and classifying software mentions in research papers. Using three benchmark datasets (SoftCite, SoMeSci, and the new SoFAIR corpus), the study compares different prompting and retrieval strategies and shows that LLM-based approaches substantially outperform previous rule-based and conventional NLP methods, particularly across a multi-disciplinary corpus. The work demonstrates the potential for LLMs to move software-mention detection from a research challenge toward a deployable capability that can extract software names, versions, and publishers.

The SoFAIR project builds on and extends the infrastructures operated by the consortium partners, namely CORE (Open University), HAL (CNRS), and Software Heritage (INRIA), to create an end-to-end system for managing the research software lifecycle. The project addresses the long-standing problem that most research software remains hidden or poorly linked. It does so by developing an automated, machine-assisted workflow capable of detecting software mentions in research papers, validating them with authors, assigning persistent identifiers (PIDs), and archiving the corresponding code in trusted repositories. The SoFAIR tools and workflow will be validated in three use cases: a life sciences demonstrator for Europe PMC, a multi-disciplinary demonstrator for institutional repositories (represented by HAL), and a digital humanities case study led by IBL-PAN (with links to DARIAH and EOSC).

By enhancing how research software is discovered, cited, and preserved, SoFAIR moves a step closer to making research software truly open, reproducible, and reusable.