There are many reasons why a repository may end up with multiple copies of an article. For example, holding both the author’s original manuscript and the final post-review copy is a common source of near-duplicate content. Another is when multiple co-authors deposit the same manuscript without being aware of each other. Detecting (near-)duplicates and distinguishing them from different versions of the same article is both challenging and time-consuming. We have seen that a typical repository will have hundreds of duplicate and near-duplicate records, which indicates the scale of this issue.
We recently surveyed our CORE members to ask what was most important to them and we received wide-ranging feedback. The CORE dashboard provides a range of tools for our data providers and their repository managers and users. Much of the feedback we received concerned additional or enhanced tools for managing repository content via the dashboard. For example, metadata validation and enrichment tools were regarded as highly important.
Interestingly, however, what was most important was making repository content machine-readable. This is closely linked to identifying funding information and rights-retention strategies. Ensuring content is machine-readable allows for the extraction of far richer information from full-text documents than that available in the metadata alone. In the U.S., the recent OSTP memo on ‘Ensuring Free, Immediate, and Equitable Access to Federally Funded Research’ includes machine-readability as a required component of the archiving and deposition of federally funded research.
We recently held the inaugural meeting of the CORE Board of Supporters where we were joined by 32 representatives from the organisations that have committed to supporting the ongoing sustainability of CORE by joining our membership program.
These amazing institutions are critical to the survival of CORE and we’re incredibly grateful for the support they provide us.
Current CORE members
We work with our members as part of our commitment to The Principles of Open Scholarly Infrastructure (POSI). By listening to our members, we can understand precisely what is most important to them. Prior to this kickoff meeting, we therefore sent a wide-ranging survey to gauge what really matters to our members’ repositories, their users and the staff that manage them.
We’re proud and excited to announce that the paper authored by our team, entitled ‘CORE: A Global Aggregation Service for Open Access Papers’, was accepted for publication and is now available as an open access article via Nature.com.
This paper is the culmination of work by the whole CORE team, with contributions from team members both past and present. It discusses how CORE has grown from a research project initiated by Dr. Petr Knoth in 2010 to the service it is today, serving over 30 million unique users each month. The paper also elaborates on the continuously growing CORE dataset and details the systematic challenges associated with gathering research papers from thousands of data providers worldwide at an unprecedented scale and the novel solutions developed to address these challenges.
We’re keen to update you with the latest developments as we continue to welcome more CORE Members and keep improving the tools and support for members while delivering on our mission to index all open access research worldwide. In March, we welcomed another six new institutions who have joined CORE as Supporting and Sustaining members: University of Exeter, Cardiff University, Manchester Metropolitan University, University of Hull, University of Nottingham and University of Strathclyde. A huge thank you goes out to all of these amazing folks!
Update 6th July 2023 – Our paper entitled “CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering” has been accepted to TPDL 2023 and will be published in the LNCS series by Springer.
The public release of ChatGPT in November last year captured the public’s imagination and turned this technology into front-page news overnight. Only this week we saw the release of its much more powerful sibling, GPT-4. In just a few short weeks there have already been some frankly startling demonstrations of the capabilities of these models, from writing poetry to code completion, amongst many others.
As part of our ongoing sustainability plan, in December 2022, we launched the CORE Membership programme for data providers. CORE is a not-for-profit service dedicated to the open access mission and one of the signatories of the Principles of Open Scholarly Infrastructure (POSI). Following the recently announced changes to our status, to remain free for public use, CORE is leveraging a membership model to help sustain its operations.
We are therefore delighted to announce today that, in the very short time since the membership programme launched, we have already welcomed ten institutions who have made a public and financial commitment to supporting Open Access infrastructure by becoming Supporting or Sustaining members of CORE.