
Earlier this year, we shared our excitement ahead of the Open Repositories 2025 (OR2025) conference in Chicago. With a packed programme and growing momentum around open science infrastructure, CORE brought a series of contributions focused on the responsible use of AI, metadata innovation, and national-level repository coordination.
Now that the dust has settled and the conference has wrapped up, we’re taking a closer look at the sessions our team presented, the partnerships we strengthened, and the contributions we brought to the open repositories community. From machine access to research content and metadata quality, to AI-powered SDG classification and reproducibility, here’s a summary of CORE’s presence at OR2025.
Day 1: Monday 16th June
SDG Classify: Automating the Classification of Research Outputs into UN SDGs
Authors: Suchetha Nambanoor Kunnath, KMi, The Open University; Matteo Cancellieri, KMi, The Open University; Petr Knoth, KMi, The Open University.
Presented by Petr Knoth
This presentation introduced SDG Classify, an AI-powered tool developed by CORE to automatically label research outputs with relevant UN Sustainable Development Goals (SDGs), helping universities understand how their research aligns with global impact agendas.
Using a multi-label, fine-tuned and scalable Sentence-BERT (SBERT) model, the system processes research content and suggests one or more SDG labels. The tool is integrated into the CORE Dashboard and also accessible via the CORE API, making it easy for institutions to monitor their research contributions.
By automating what would otherwise be a manual, time-consuming process, SDG Classify enables evidence-based impact tracking and helps institutions benchmark their contributions to the SDGs.
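CORE has not published the implementation details here, but the multi-label idea can be sketched as embedding similarity with a threshold. In this toy illustration the vectors are invented stand-ins; in practice they would come from the fine-tuned SBERT encoder, and the goal list would cover all 17 SDGs.

```python
import numpy as np

# Toy stand-in embeddings; a real system would encode SDG descriptions
# and paper text with a fine-tuned Sentence-BERT model.
SDG_EMBEDDINGS = {
    "SDG 3: Good Health and Well-being": np.array([0.9, 0.1, 0.0]),
    "SDG 7: Affordable and Clean Energy": np.array([0.1, 0.9, 0.1]),
    "SDG 13: Climate Action": np.array([0.0, 0.3, 0.9]),
}

def classify_sdgs(paper_embedding: np.ndarray, threshold: float = 0.5) -> list[str]:
    """Return every SDG whose cosine similarity exceeds the threshold
    (multi-label: zero, one, or several goals may match one paper)."""
    labels = []
    for label, sdg_vec in SDG_EMBEDDINGS.items():
        sim = paper_embedding @ sdg_vec / (
            np.linalg.norm(paper_embedding) * np.linalg.norm(sdg_vec)
        )
        if sim >= threshold:
            labels.append(label)
    return labels

# A paper on renewable energy and emissions embeds near SDG 7 and SDG 13.
paper_vec = np.array([0.05, 0.7, 0.7])
print(classify_sdgs(paper_vec))
```

Because each goal is scored independently, a single output can legitimately carry several SDG labels, which is what distinguishes this from ordinary single-label classification.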
Identifying and Extracting Data Access Statements from Full-text Academic Articles
Authors: Matteo Cancellieri, CORE, KMi, The Open University; David Pride, CORE, KMi, The Open University; Petr Knoth, CORE, KMi, The Open University.
Presented by Matteo Cancellieri
Transparency and reproducibility depend on access to data, but data availability statements are often inconsistent or buried in full-text content. This session introduced CORE’s new tool that automatically detects and extracts data access statements from articles across repositories.
Integrated into the CORE Dashboard, the tool allows repository managers to review, enrich, and export data statement metadata, improving the discoverability and integrity of research outputs.
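A heavily simplified sketch of the detection step: CORE’s actual tool is more sophisticated, but phrase cues like the ones below are a common starting point for spotting data access statements in full text. The patterns and example text are illustrative only.

```python
import re

# Illustrative phrase cues for data availability statements.
DAS_PATTERNS = [
    r"data (?:are|is) (?:openly )?available (?:at|from|in|upon)",
    r"data availability statement",
    r"supporting data can be (?:found|accessed)",
    r"data (?:are|is) available upon (?:reasonable )?request",
]
DAS_RE = re.compile("|".join(DAS_PATTERNS), re.IGNORECASE)

def find_data_statements(full_text: str) -> list[str]:
    """Return sentences that look like data access statements."""
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    return [s for s in sentences if DAS_RE.search(s)]

text = ("We evaluated the model on three corpora. "
        "The data are openly available at https://doi.org/10.5555/example. "
        "Code is provided in the appendix.")
print(find_data_statements(text))
```

Once such sentences are isolated, they can be attached to the record as structured metadata, which is what makes the review-and-export workflow in the Dashboard possible.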


Day 2: Tuesday, 17th June
Managing Machine Access to Open Repositories in the Age of Generative AI
Authors: Matteo Cancellieri, CORE, KMi, The Open University; Martin Klein, Pacific Northwest National Laboratory (PNNL); George Macgregor, The University of Glasgow; Kathleen Shearer, Confederation of Open Access Repositories (COAR); Allison Sherrick, Metropolitan New York Library Council; Petr Knoth, CORE, KMi, The Open University.
Chaired by Matteo Cancellieri
Panel Members:
- Martin Klein – Pacific Northwest National Laboratory
- Allison Sherrick – Metropolitan New York Library Council
- Scott Prater – University of Wisconsin
- Petr Knoth – CORE, Open University
In an era where generative AI can significantly benefit from open access repositories as a source of trustworthy training data, repositories are experiencing an increase in traffic from bots and crawlers. While such access aligns with the open science ethos, it also raises concerns around repository system strain, ethical use, and infrastructure fairness.
Petr Knoth, one of the four panellists, presented CORE’s research and community consultation on this issue, including survey data from CORE’s Board of Supporters. His overall message: repositories should not block machine access indiscriminately; instead, we need a range of smarter mechanisms and shared protocols to manage bot access responsibly.
Among the recommended measures, his presentation proposed the creation of a community-governed FAIRbots registry, a whitelist underpinned by a presumption of innocence for bots. Any bot could register freely, enabling scholarly communication systems to understand who runs a bot service and for what purpose. The goal? To maintain openness while protecting infrastructure from misuse.
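Since the registry is still a proposal, the mechanics can only be sketched hypothetically. The registry entries and field names below are invented for illustration; the point is the policy shape: registered bots get normal service, and unknown agents are slowed down rather than shut out.

```python
# Hypothetical sketch of a community-governed bot registry lookup.
# The schema (operator, purpose, contact) is an assumption for
# illustration; no FAIRbots registry format has been published yet.
REGISTRY = {
    "ExampleScholarBot/1.0": {
        "operator": "Example University Library",
        "purpose": "open access discovery indexing",
        "contact": "bots@example.edu",
    },
}

def access_policy(user_agent: str) -> str:
    """Presumption of innocence: registered bots get normal service,
    unknown agents are rate-limited rather than blocked outright."""
    if user_agent in REGISTRY:
        return "allow"
    return "rate-limit"  # slow down, don't shut out

print(access_policy("ExampleScholarBot/1.0"))
print(access_policy("UnknownCrawler/0.3"))
```

The design choice worth noting is the default: an unregistered crawler is degraded, not denied, which preserves open access while giving well-behaved bots an incentive to identify themselves.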
The presentations by Allison Sherrick and Scott Prater offered a repository manager’s perspective, elaborating on the severity of the problem, while Martin Klein presented the recent COAR survey and a study demonstrating how filtering bots can undermine equitable access to openly available content.

The USRN Discovery Pilot
Authors: Petr Knoth, CORE, KMi, The Open University; Paul Walk, Antleaf; Matteo Cancellieri, CORE, KMi, The Open University; Michael Upshall, CORE, KMi, The Open University; Halyna Torchylo, CORE, KMi, The Open University; Jennifer Beamer, SPARC; Kathleen Shearer, COAR; Heather Joseph, SPARC.
Presented by Petr Knoth
The USRN Discovery Pilot marks a major step toward national-level repository coordination in the United States. CORE worked with 20 institutions to assess repository readiness, metadata quality, and compliance with emerging US open access mandates.
In just one year, the pilot increased discoverable outputs by over 50% across participating repositories.
This work underscores the importance of shared infrastructure, best practice adoption, and collective action in transforming open access from a local operation to a truly interoperable, national-scale effort.
Additionally, the project delivered a suite of new tools, including a prototype of the “Fresh Finds” discovery tool, a rights retention checker, and the USRN Desirable Characteristics Toolkit.
The USRN Pilot project has demonstrated that significant impact can be achieved with modest resources, setting a best-practice example of how repository infrastructure can be supported at a national level through technology-driven interventions. CORE welcomes discussions with other repository networks, whether national or international, to build on and expand this work.

How Do You Describe Software in Record Metadata?
Authors: Matteo Cancellieri, CORE, The Open University; Petr Knoth, CORE, The Open University.
Presented by Matteo Cancellieri
In this session, Matteo Cancellieri explored the complexities of representing software mentions in metadata. Reviewing standards like Codemeta, Dublin Core, and RIOXX, the session addressed how repositories can clearly distinguish between citations, mentions, and relationships to software.
Rather than reinvent metadata standards, the session encouraged pragmatic adoption of existing frameworks, helping repositories improve their support for research software as a first-class scholarly output. This work is part of the SoFAIR project, coordinated by CORE at The Open University.
Day 3: Wednesday, 18th June
Interoperable Verification and Dissemination of Software Assets in Repositories Using COAR Notify
Authors: Matteo Cancellieri, CORE, The Open University; Martin Docekal, Brno University of Technology; David Pride, CORE, KMi, The Open University; Morane Gruenpeter, Software Heritage; David Douard, Software Heritage; Petr Knoth, CORE, The Open University.
Presented by Matteo Cancellieri
Research software often sits in the shadows of academic publishing. The SoFAIR project aims to change that by building a machine-assisted workflow that identifies, validates, and archives software mentioned in research papers.
At OR2025, the CORE team showcased how this process can be embedded into existing infrastructures using COAR Notify and the CORE Dashboard. By automating the detection of software mentions and routing them for author validation, SoFAIR ensures that critical software assets receive proper recognition and persistent identifiers.
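COAR Notify builds on W3C Linked Data Notifications with Activity Streams 2.0 payloads. The sketch below is loosely modelled on that shape; the field values, the `mention` property, and the exact pattern types used by SoFAIR are illustrative assumptions, so consult the COAR Notify specification for the real vocabulary.

```python
import json
import uuid

# Simplified, illustrative Activity Streams 2.0 style notification.
# Real COAR Notify payloads use defined patterns and properties; the
# "mention" field here is an invented placeholder for a detected
# software mention awaiting author validation.
notification = {
    "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://purl.org/coar/notify",
    ],
    "id": f"urn:uuid:{uuid.uuid4()}",
    "type": "Announce",
    "origin": {"id": "https://core.ac.uk", "type": "Service"},
    "target": {"id": "https://repository.example.org/inbox", "type": "Service"},
    "object": {
        "id": "https://repository.example.org/record/123",
        "type": "Document",
        "mention": "https://github.com/example/toolkit",  # illustrative
    },
}
print(json.dumps(notification, indent=2))
```

Delivering such a JSON-LD payload to a repository’s inbox is what lets the validation request surface inside existing repository workflows rather than in a separate system.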

What This Means for the Future
CORE’s presence at OR2025 highlighted the breadth of work we’re doing to support repositories, whether through machine learning, metadata enrichment, national infrastructure pilots, or open science advocacy. More than that, the conference reaffirmed a shared commitment across the community to build open systems that are not only scalable and sustainable, but also fair. Looking ahead, the strength and growth of these initiatives will be rooted in the relationships we forged and strengthened.
Collaborations with our partners, along with the many universities, libraries, and other organisations we met, are pivotal. It is through these partnerships that the ideas discussed at OR2025 will be translated into real-world impact, ensuring that the journey towards a more open and equitable research landscape continues to gain momentum.
You can explore the projects mentioned in this post and more by visiting blog.core.ac.uk, where we regularly share updates, tools, and calls for collaboration.