Posted 1st April 2021. You’ll probably be glad to know that, thankfully, none of what follows is actually true. We do hope, however, that it brought even a weak smile to your REF-weary faces. We, like you, are at the mercy of far greater and more powerful forces, so when and if REF2026 comes around, we’ll be hanging on for the ride too! Happy holidays to those lucky enough to get some time off over the next couple of weeks.
Between October and December 2020, CORE broke records, partnered with arXiv.org and continued improving our REF2021 compliance monitoring service. The CORE team had a busy end to 2020!
Our team concentrated on multiple areas, including collaboration with the open access community and new feature development. Find out more on the Jisc Research Blog.
CORE Discovery helps users find freely accessible copies of research papers that might be behind a paywall on the publisher’s website. It is backed by our huge dataset of millions of full-text open access papers, as well as content from widely used external services beyond CORE. The tool not only provides state-of-the-art coverage of freely available content, it is also the only discovery service which:
- delivers state-of-the-art performance compared to other discovery tools in terms of both content coverage (finding a freely available copy when it is available) and precision (reliably delivering a free copy of the paper on success);
- is run by researchers for researchers (as opposed to companies);
- has the best grip on content from the global network of open repositories;
- can deliver other relevant freely available research papers to readers even in situations where a free version of the requested paper is not available anywhere on the web.
To satisfy the needs of CORE users, the world’s largest global aggregator of open access research papers now helps users access the articles they are interested in. Discovery tools can typically find free copies of papers for about 15%-30% of published documents (slide 11). This means that in more than 70% of cases they don’t bring the user anything useful. CORE Discovery can offer the user relevant documents even in situations where other discovery tools are not successful. What distinguishes CORE Discovery from other discovery services on the market is that it does not stop when an open access version is not available, but always aims to offer related open access articles to the end user.
One of our aims was to showcase the CORE testimonials more clearly, i.e. what others think of the project and how the community uses our products, mainly our API and Datasets, and to give credit to the universities and companies that are using our services.
Last month, CORE attended the Jisc ORCID hackday events in Birmingham and London. (ORCID is a non-profit organisation that aims to solve the author disambiguation problem by offering unique author identifiers.) Following the discussions sparked at the two events, we decided to test the CORE data against ORCID’s API, and we discovered some information that we think is of interest to the scholarly community.
Currently, CORE has 5.5 million unique Digital Object Identifiers (DOIs) linked to records in our database (both metadata-only and full-text records). Based on this number, we wanted to find out how many of these DOIs were connected to an ORCID iD. We therefore set up a script that called the ORCID API while respecting its rate limit; see the sketch below. In around 7 days we had collected the full results.
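For the curious, the script was essentially a loop over our DOIs with a pause between calls. The sketch below is only an illustration of that approach: the endpoint, the doi-self query field and the response fields are assumptions based on ORCID’s current public search API (our original script used the API version available at the time), and the delay value is a placeholder to be tuned to whatever rate limit ORCID publishes.

```python
import time
import requests

ORCID_SEARCH = "https://pub.orcid.org/v3.0/search/"  # assumption: ORCID's public search endpoint
DELAY = 0.5  # seconds between calls; tune to the published rate limit

def orcids_for_doi(doi):
    """Ask ORCID which iDs are linked to a work with the given DOI."""
    resp = requests.get(
        ORCID_SEARCH,
        params={"q": f'doi-self:"{doi}"'},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    # assumption: each search result carries an orcid-identifier with a "path" field
    return [r["orcid-identifier"]["path"] for r in resp.json().get("result") or []]

def collect(dois):
    """Walk a list of DOIs, pausing between requests to respect the rate limit."""
    found = {}
    for doi in dois:
        try:
            found[doi] = orcids_for_doi(doi)
        except requests.RequestException:
            found[doi] = None  # note the failure and keep going
        time.sleep(DELAY)
    return found
```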
Of the 5,523,577 articles with a DOI in the CORE collection, we discovered that 196,713 different authors had an ORCID iD and that 927,645 articles included at least one ORCID iD.
We found that 16% of the DOIs in CORE are connected to at least one author registered in ORCID. The following map shows the distribution of the discovered ORCID iDs across the world.
Why is this useful? It enables us to assess ORCID’s coverage across a large multidisciplinary dataset of open access papers. With some more digging into the data (which we haven’t done yet), it would also be possible to analyse the growth of ORCID over time. These data can be sliced and diced according to various criteria, such as geographical coverage or repository, to understand how ORCID coverage can be improved.
Based on our results, the UK has the highest number of ORCID iDs. However, this result is somewhat skewed by the fact that CORE has excellent content coverage across UK repositories.
We also tried to find authors with ORCID iDs who deposited content in one of the UK repositories. Our results indicate that 68,849 ORCID iDs were discovered from 254,467 unique DOIs. It was then very useful to look at the distribution of the top 15 UK repositories by number of ORCID iDs. This analysis can be extremely helpful in identifying repositories with low ORCID coverage and encouraging them to take appropriate action.
Repositories implementing RIOXX can already expose ORCID iDs through an attribute of the rioxxterms:author tag. While this opportunity exists, our quick survey showed that only a few repositories supporting RIOXX have implemented it. Thanks to John Salter, Software Developer at Leeds University, for his help in collecting the data and creating the chart. John is currently working on including ORCID iDs in the White Rose repository, which “forced” us 🙂 to use a log scale in the chart due to the widespread implementation of the id attribute in their metadata (>6k ORCID iDs vs fewer than 10 from the other repositories).
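As a rough illustration, extracting those iDs from harvested RIOXX metadata can be as simple as reading the id attribute of each rioxxterms:author element. This is only a sketch: the namespace URI and the sample record below are assumptions that should be checked against a repository’s actual OAI-PMH output, and the example iD is ORCID’s well-known test identity.

```python
import xml.etree.ElementTree as ET

# Assumed RIOXX v2.0 terms namespace; verify against the repository's OAI-PMH responses.
RIOXXTERMS = "http://www.rioxx.net/schema/v2.0/rioxxterms/"

def author_orcids(record_xml):
    """Pull ORCID iDs from the id attribute of rioxxterms:author elements."""
    root = ET.fromstring(record_xml)
    orcids = []
    for author in root.iter(f"{{{RIOXXTERMS}}}author"):
        identifier = author.get("id", "")
        if "orcid.org" in identifier:
            orcids.append(identifier.rsplit("/", 1)[-1])
    return orcids

# Made-up sample record for illustration only.
sample = (
    f'<record xmlns:rioxxterms="{RIOXXTERMS}">'
    '<rioxxterms:author id="https://orcid.org/0000-0002-1825-0097">Carberry, Josiah</rioxxterms:author>'
    "</record>"
)
print(author_orcids(sample))  # ['0000-0002-1825-0097']
```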
We have made the dataset available online on GitHub and it can be found here.
There are a few caveats in the data that must be taken into consideration. Our main challenge was that some of the aggregated DOIs were not valid or pointed to a journal instead of a paper. The ORCID API returned only partial matches and, in the case of journal DOIs, the returned ORCID iDs covered all the authors who had published in that journal rather than the authors of one specific paper.
In this preliminary study we realised that the information we extracted from the data was useful to us and, perhaps, could be useful to repository managers. Our plan is to design and implement new functionality in the CORE Repositories Dashboard. We are planning to submit a proposal for this to OR2017 and we would really appreciate your feedback. If you are a repository manager and you want to know more, email us at firstname.lastname@example.org.
At this year’s Open Repositories 2016, an international conference aimed at the scholarly communications community with a focus on repositories, open access, open data and open science, CORE had 6 items accepted: 1 paper, 1 workshop, 1 Repository Rave presentation, 1 poster and 2 showcases in the Developer Track and Ideas Challenge. The titles and summaries of our accepted proposals are:
Paper: Exploring Semantometrics: full text-based research evaluation for open repositories / Knoth, Petr; Herrmannova, Drahomira
Over recent years, there has been a growing interest in developing new scientometric measures that could go beyond the traditional citation-based bibliometric measures. This interest is motivated on one side by the wider availability, or even emergence, of new information evidencing research performance, such as article downloads, views and Twitter mentions, and on the other side by the continued frustrations and problems surrounding the application of citation-based metrics to evaluate research performance in practice. Semantometrics are a new class of research evaluation metrics which build on the premise that full text is needed to assess the value of a publication. This talk will present the results of an investigation into the properties of the semantometric contribution measure (Knoth & Herrmannova, 2014). We will provide a comparative evaluation of the contribution measure with traditional bibliometric measures based on citation counting. Our analysis also focuses on the potential application of semantometric measures in large databases of research papers.
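For readers who want a concrete feel for the idea, the snippet below is a loose illustration rather than the measure itself (the talk and the 2014 paper give the precise formulation): it scores a paper by the average semantic distance, here cosine distance over TF-IDF vectors of full texts, between the publications it cites and the publications that cite it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def contribution_like_score(cited_texts, citing_texts):
    """Average pairwise semantic distance between the full texts a paper cites
    and the full texts that cite it. Illustrative only; see Knoth & Herrmannova
    (2014) for the actual contribution measure."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(cited_texts + citing_texts)
    cited = matrix[: len(cited_texts)]
    citing = matrix[len(cited_texts):]
    return cosine_distances(cited, citing).mean()

# Toy example with placeholder full texts.
score = contribution_like_score(
    ["full text of a cited paper about repositories", "another cited paper about metadata"],
    ["full text of a citing paper about open access aggregation"],
)
print(round(score, 3))
```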
Workshop: Mining Repositories: How to assist the research and academic community in their text and data mining needs – a workshop / Pontika, Nancy; Knoth, Petr; van Dijke, Hege; Anastasiou, Lucas
Over the past five years there has been significant interest in text and data mining (TDM) practices from the European Union (EU). In scholarly communication, TDM is already a developed practice in some scientific fields, for example in the life sciences and computer science. Nonetheless, after a call that we sent out to the United Kingdom Council of Research Repositories (UKCoRR) mailing list, we discovered that there was only a limited number of TDM projects that had the repositories’ collections as their primary source of information. To address this challenge, the EU-funded project OpenMinTeD seeks to enable the creation of an infrastructure that fosters and facilitates the use of TDM technologies in the scientific publications field, targeting both domain users and TDM experts. In this context we propose a three-hour workshop, where we will introduce the topic of TDM to the repositories community, explore how the OpenMinTeD project aims to assist with the adoption of TDM practices and present existing TDM projects that were conducted using text and data from repositories.
Repository Rave presentation: Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service / Cancellieri, Matteo; Pontika, Nancy
The COnnecting REpositories (CORE) project aims to aggregate content from open access repositories and journals and distribute this content through one central endpoint, facilitating the open access dissemination of scientific research. In an effort to improve the quality and transparency of the aggregation process of the open access content and create a two-way collaboration between the CORE project and the providers of this content, CORE has created the Repositories Dashboard. The RIOXX metadata application profile aims to assist repository managers in tracking compliance with the Research Councils UK Policy on Open Access and Guidance. In this Repository Rave session we will present how CORE is implementing the RIOXX metadata in the CORE Dashboard.
Poster: Integration of IRUS-UK statistics in the CORE Repositories Dashboard / Pearce, Samuel; Pontika, Nancy
The COnnecting REpositories (CORE) project aims to aggregate content from open access repositories and journals, and distribute this content through one central point, facilitating the open access dissemination of scientific research. Institutional Repository Usage Statistics UK (IRUS-UK) is a Jisc-funded project that serves as a national repository usage statistics aggregation service, which aims to provide article download statistics from UK repositories. At CORE, we wanted to present the information about manuscript downloads to repository managers and, therefore, we have integrated IRUS-UK statistics into the CORE Repositories Dashboard. In this poster we will present a) the submission process of the IRUS-UK statistics and b) how CORE retrieves these statistics and displays them to the UK Higher Education Institutions (HEIs).
Developer Track and Ideas Challenge:
Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking
This new API (v2) should be more reliable and provide higher-quality metadata output than the old version.
Over the next few months, we aim to finalise API v2 and close access to v1. The scheduled date for the v1 switch-off is Monday, 25th April 2016.
We hope that most users have already had an opportunity to test v2 of the API, but if not, we suggest that you check out the documentation here.
For those who are wondering:
- API keys generated for API v1 will continue to work for v2. If you want to register or have forgotten your key, please fill out this form.
We hope to make this transition as smooth as possible, so if something is missing from our API v2 or you need assistance, please get in touch or write in the comments below. A minimal example of a v2 call follows below.
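To give a flavour of what a v2 request looks like, here is a short sketch of an article search. The base URL, the /articles/search path and the page/pageSize parameter names reflect our reading of the v2 documentation linked above; please double-check them there, and replace YOUR_API_KEY with your own key.

```python
import requests

API_BASE = "https://core.ac.uk/api-v2"  # see the v2 documentation for the authoritative base URL
API_KEY = "YOUR_API_KEY"                # keys issued for v1 keep working on v2

def search_articles(query, page=1, page_size=10):
    """Search CORE articles via API v2 and return the parsed JSON response."""
    resp = requests.get(
        f"{API_BASE}/articles/search/{query}",
        params={"apiKey": API_KEY, "page": page, "pageSize": page_size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Print the titles returned for a sample query.
for article in search_articles("open access repositories").get("data") or []:
    print(article.get("title"))
```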
As in any type of partnership, the harvesting process is a two-way relationship, where the content provider and the aggregator need to be able to communicate and have a mutual understanding. For successful harvesting, it is recommended that content providers apply the following best practices (some of the following recommendations relate to harvesting in general, while some are CORE-specific):
There are also three additional points that need to be considered with regard to the <dc:identifier> field: a) it should describe an absolute path to the file, b) it should contain an appropriate file name extension, for example “.pdf”, and c) the full-text items should be stored under the same repository domain. A quick way to sanity-check these points is sketched below.
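A repository manager could check their own records against those three points with a few lines of code. The helper below is a hypothetical sketch rather than a CORE validation tool; the allowed extensions and the example URL are made up for illustration.

```python
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".pdf", ".doc", ".docx")  # assumption: extend to whatever formats you expose

def identifier_ok(identifier, repository_domain):
    """Check a <dc:identifier> value against the three points above:
    (a) absolute URL, (b) file name extension, (c) same repository domain."""
    parsed = urlparse(identifier)
    is_absolute = bool(parsed.scheme) and bool(parsed.netloc)
    has_extension = parsed.path.lower().endswith(ALLOWED_EXTENSIONS)
    same_domain = parsed.netloc.endswith(repository_domain)
    return is_absolute and has_extension and same_domain

# Hypothetical example URL and domain.
print(identifier_ok("https://repository.example.ac.uk/eprint/123/1/paper.pdf",
                    "repository.example.ac.uk"))  # True
```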
Through the Dashboard, CORE provides repository managers with information about:
- the content harvested from the repository, enabling its management, such as by requesting metadata updates or managing take-down requests,
- the times and frequency of content harvesting, including all detected technical issues and suggestions for improving the efficiency of harvesting and the quality of metadata, including compliance with existing metadata guidelines,
- statistics regarding the repository content, such as the distribution of content according to subject fields and types of research outputs, and the comparison of these with the national average.
In the CORE Dashboard there is a designated page for every institution, where repository managers will be able to add all the information that corresponds to their own repository, such as the institution’s logo, the repository name and email address.
The Dashboard allows repository managers to create accounts for other colleagues as well.
With regards to managing the harvested outputs, the Dashboard enables repository managers to add or remove documents’ full-text in the CORE collection without having to contact the CORE team.
These actions can now be completed immediately, simply by clicking on the “Take Down” or “Take up” buttons. It is also possible to download a CSV file of all the records harvested by CORE from a repository.
CORE can also be notified about metadata changes through the Dashboard. Repository managers can click on the blue “Update Metadata” button and then the single item’s metadata will be updated without the need for a full repository harvest.