The Shape of Comprehensiveness: 13,000 Data Providers and Counting

In the world of science, it’s common to talk about big numbers. Citation counts, impact scores, and download figures often dominate the conversation, sometimes controversially. But numbers on their own rarely tell the full story unless they’re connected to a purpose.

As of 4th August 2025, CORE (Connecting Repositories) has surged past the 13,000 direct data provider mark, a milestone that underscores both our accelerated growth and our truly global coverage. Unlike some similar services, CORE’s figure counts direct data providers: intermediaries such as DOAJ are counted once rather than tallied journal by journal, which offers a clearer and more transparent measure of reach.

At CORE, we actively curate, maintain, and support these data providers, keeping them operational and fixing issues daily, not only for our users but for the benefit of the global repository community. Unlike services that restrict themselves to content with registered (often paid-for) DOIs, we prioritise comprehensiveness and equal visibility for research outputs from all parts of the world. We share this number because it says something vital about the shape of open research and who is included in it. A perfect example is the groundbreaking paper “Attention Is All You Need”, a seminal work that appears in CORE’s index via preprint repositories without a DOI, showing that critical research can be available outside traditional commercial publishing channels.

Comprehensiveness has always been at the centre of CORE’s work. Not because size alone is impressive, but because scale without diversity, depth, and intention is just surface-level visibility. From our earliest days, we’ve known that if open science is going to serve the world, it has to reflect the world, not just the parts of it that are well-resourced, DOI-registered, or aligned with commercial publishing pipelines.

While many bibliographic databases consider only content registered with DOI registration agencies such as Crossref (one of the largest in the world), CORE actively indexes a far wider range of research. As of 4th August 2025, CORE works at a global scale that surpasses the reported totals of other scholarly infrastructure services, including OpenAlex and the commercial databases Web of Science and Scopus. In fact, based on CORE’s internal analysis (2025), around half of the works in our index don’t have a DOI at all. This means that countless doctoral theses, working papers, local conference outputs, and grey literature from underrepresented regions find a home in CORE when they might be missing everywhere else. Sometimes this even includes seminal papers from key venues such as NeurIPS, a prestigious machine learning conference that doesn’t register DOIs.

This is not a minor detail. It’s central to how knowledge becomes visible or invisible. Infrastructures that exclude work without DOIs aren’t just incomplete; they’re making a choice about what kinds of research and researchers get to shape the global conversation.

CORE indexes metadata from multiple DOI registration agencies, alongside major sources such as DOAJ, arXiv, and Zenodo. These large sources capture a significant volume of scholarly works efficiently. But CORE also invests heavily in integrating the long tail of research: smaller, more specialised repositories, each with its own characteristics, because every piece of research matters. Behind the scenes, this requires sustained, systematic effort: regularly detecting new repositories from diverse sources across the internet, rather than relying solely on registries such as OpenDOAR, which are no longer maintained; diagnosing the common issues that prevent repository content from being discoverable; and communicating with repository managers through a mix of email notifications and dedicated Dashboard tools for our supporting repositories.

Through debugging and testing, CORE actively helps bring into public view vast amounts of scholarly information that might otherwise remain hidden. This isn’t just a theoretical benefit: our work with the UK’s USRN, for example, has unlocked significant content, and we’re now applying similar approaches to improve repository visibility in countries such as Uganda and Nigeria.

This dual approach, investing in large-scale harvesting while directly supporting smaller and often overlooked repositories, is a key driver of our growth in both data providers and comprehensiveness. Reaching 13,000 data providers is not simply about scale; it reflects the importance of a model of discoverability that is open, inclusive, and equitable. Unlike commercial services that silo content for their own gain, CORE works directly with repository communities, providing services through our Dashboard and giving back so that repositories themselves benefit from the process.

We do this not only for CORE, but for the scholarly community as a whole, because when repositories become more discoverable, everyone benefits, including the existing commercial research infrastructures.

As we continue to mark 15 years of CORE, this milestone is part of a larger reflection: progress in open science doesn’t come from indexing what’s easy. It comes from creating systems that recognise the value of every contribution and from committing to comprehensiveness not as a feature, but as a foundational principle.

To every data provider that’s helped shape CORE into what it is today, thank you. This milestone is yours, too.