It is intended for (possibly computationally intensive) data analysis. Here you can find the dataset description and the download page. If you need fresh data and your requirements are not computationally intensive, you can also use our API.
* This post was authored by Nancy Pontika, Lucas Anastasiou and Petr Knoth.
The CORE team is thrilled to announce the release of a new version of our recommender: a plugin that can be installed in repositories and journal systems to suggest similar articles. This is a great opportunity to improve the functionality of repositories by unleashing the power of recommendation over the huge collection of open-access documents available in CORE, currently 37 million metadata records and more than 4 million full texts.
An investigation by Research Support staff at Brunel University London considers the role CORE might play in supporting funder compliance and the wider transition to open scholarship…
In 2001, the Budapest Open Access Initiative (BOAI) brilliantly and simply encapsulated the aspirational qualities of ‘openness’ that funders, scholars, institutions, services and publishers have since driven forward. This simplicity has been lost in the detail of implementing funder mandates amid copyright restrictions, resulting in significant administrative overheads for support staff whose primary role is to smoothly progress a cultural change. Although the momentum is undeniable, the transition to open scholarship is now fraught with complexity.
Last week, the CORE team attended the 11th Annual Conference on Open Repositories, an international conference addressed mainly to subject and institutional repository managers, focusing on open access, open data and open science tools, projects and services.
At the conference the team had six submissions:
- A workshop presentation on “How can repositories support the text-mining of their content and why?”, where Nancy Pontika explained how repository managers can support text-mining practices and Petr Knoth described the technical requirements that enable the text mining of repositories. In addition, the CORE team organised the workshop as part of its involvement with OpenMinTeD, an EU-funded project on text and data mining. The workshop has been described in two blog posts: one hosted on the OpenMinTeD blog (which includes all workshop presentations), and another composed by Rebecca Sutton Koeser, a workshop participant.
- A full presentation on “Exploring Semantometrics: full text-based research evaluation for open repositories” by Petr Knoth. The presentation explored semantometrics, a new class of research evaluation metrics, which builds on the premise that full text is needed to assess the value of a publication. (Presentation available here.)
- A 24×7 presentation on the “Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service”, where Matteo Cancellieri and Nancy Pontika described how the RIOXX metadata guidelines are now a new embedded feature in the CORE Repositories Dashboard. (Presentation slides here.)
- Two demo presentations during the Developer Track sessions. The first was on “Mining Open Access Publications in CORE”, where Matteo Cancellieri demonstrated the new CORE API, and the second was entitled “Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking”, where Petr Knoth used the traditional Oxford University vs Cambridge University contest to show how to freely gather and compare the research performance of universities. (The code for both demo presentations is on GitHub.)
- A poster on the “Integration of the IRUS-UK Statistics in the CORE Repositories Dashboard”, by Samuel Pearce and Nancy Pontika, which showed the process of embedding the existing IRUS-UK statistics service into the CORE Repositories Dashboard. We were also delighted that our poster won the best poster award (yay!). We would like to thank all the conference participants who stopped by our poster, picked up the CORE freebies and voted for us! (You can access the poster here.)
Because this conference has a clear focus on repository services, and CORE both uses and is used by many of them, CORE was also mentioned extensively in other presentations. For example: Richard Jones, in his presentation on Lantern, mentioned that the project is using the CORE API; Paul Walk described how CORE is using the RIOXX metadata application profile; the Repositories of the Future panel, organised by COAR, stressed the importance of aggregators in the repository environment, specifically naming CORE; and the “Ideas Challenge”, a thought-provoking brainstorming exercise bringing together programmers and repository managers to make the lives of academics easier, selected CORE as a runner-up for the proposed development of a cross-repository journal and topic browse interface. Finally, CORE was also featured in the Jisc poster on “Jisc’s Open Access Services”.
* Post updated on June 20th and June 23rd with links to presentations.
In this year’s Open Repositories 2016, an international conference addressed to the scholarly communications community with a focus on repositories, open access, open data and open science, CORE had six items accepted: one paper, one workshop, one Repository Rave presentation, one poster and two showcases in the Developer Track and Ideas Challenge. The titles and summaries of our accepted proposals are:
Paper: Exploring Semantometrics: full text-based research evaluation for open repositories / Knoth, Petr; Herrmannova, Drahomira
Over recent years, there has been a growing interest in developing new scientometric measures that go beyond the traditional citation-based bibliometric measures. This interest is motivated on the one hand by the wider availability, or even emergence, of new information evidencing research performance, such as article downloads, views and Twitter mentions, and on the other by the continued frustrations and problems surrounding the application of citation-based metrics to evaluate research performance in practice. Semantometrics are a new class of research evaluation metrics which build on the premise that full text is needed to assess the value of a publication. This talk will present the results of an investigation into the properties of the semantometric contribution measure (Knoth & Herrmannova, 2014). We will provide a comparative evaluation of the contribution measure against traditional bibliometric measures based on citation counting. Our analysis also focuses on the potential application of semantometric measures in large databases of research papers.
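As a rough sketch (in our own notation, not necessarily that of the paper), the contribution measure can be thought of as the average full-text semantic distance between the publications a paper p builds on (its references, the set A) and the publications that build on it (its citations, the set B):

```latex
\mathrm{contribution}(p) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} \mathrm{dist}(a, b)
```

Here dist is a semantic distance computed from full text (for example, cosine distance between document vectors), so a paper that bridges dissimilar prior and subsequent work scores highly. Consult Knoth & Herrmannova (2014) for the exact formulation.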
Back in March, we announced the beta release of the CORE API v2. This added new features such as searching via DOI and retrieving citations for full texts.
This new API should be more reliable and produce higher-quality metadata output than the old version.
Over the next few months, we aim to finalise the API v2 and finally close access to v1. The scheduled date of the v1 switch off is Monday, 25th April 2016.
We hope that most users have already had an opportunity to test v2 of the API but if not, we suggest that you check out the documentation here.
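To give a flavour of the v2 features, here is a minimal Python sketch of a DOI lookup. The base URL, the endpoint path and the `doi:"…"` field-query syntax are assumptions for illustration only, so confirm them against the API documentation before use:

```python
import json
import urllib.parse
import urllib.request

# Base URL and endpoint path are assumptions; check the official docs.
API_BASE = "https://core.ac.uk/api-v2"

def build_doi_search_url(doi, api_key):
    """Build a search URL that looks an article up by DOI,
    using an assumed field-query syntax: doi:"<the doi>"."""
    query = urllib.parse.quote('doi:"%s"' % doi)
    return "%s/articles/search/%s?apiKey=%s" % (API_BASE, query, api_key)

def search_by_doi(doi, api_key):
    """Perform the lookup and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_doi_search_url(doi, api_key)) as resp:
        return json.load(resp)
```

Calling `search_by_doi` with a real DOI and a valid API key would return the decoded JSON search result, assuming the endpoint matches the documented path.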
Every month Samuel Pearce, one of the CORE developers, collects the CORE statistics – perhaps a boring task, but a useful one for knowing where we stand as a service. A very brief report of the cumulative statistics for all the years CORE has operated as a project, 2011 – 2015, follows.
Users can retrieve from CORE:
- 25,363,829 metadata records and
- 2,954,141 open access full-text records,
from 689 repositories (institutional and subject) and 5,488 open access journals. In addition, 122 users have access to the CORE API.
In the playful Christmas spirit we attempted this time to have some fun with the statistics.
The CORE (COnnecting REpositories) project aims to aggregate open access research outputs from open repositories and open journals, and make them available for dissemination via its search engine. The project indexes metadata records and harvests the full-text of the outputs, provided that they are stored in a PDF format and are openly available. Currently CORE hosts around 24 million open access articles from 5,488 open access journals and 679 repositories.
As in any partnership, the harvesting process is a two-way relationship, where the content provider and the aggregator need to be able to communicate and have a mutual understanding. For successful harvesting, it is recommended that content providers apply the following best practices (some relate to harvesting generally, while some are CORE-specific):
In an effort to improve the quality and transparency of the harvesting of open access content and to create a two-way collaboration between the CORE project and the providers of this content, CORE is introducing the Repositories Dashboard. The aim of the Dashboard is to provide an online interface for repository providers, offering them valuable information about:
- the content harvested from the repository enabling its management, such as by requesting metadata updates or managing take-down requests,
- the times and frequency of content harvesting, including all detected technical issues and suggestions for improving the efficiency of harvesting and the quality of metadata, including compliance with existing metadata guidelines,
- statistics regarding the repository content, such as the distribution of content according to subject fields and types of research outputs, and the comparison of these with the national average.
The CORE Dashboard has a designated page for every institution, where repository managers will be able to add the information that corresponds to their own repository, such as the institution’s logo, the repository name and the contact email address.
We are very proud to announce that CORE has now released CORE API 2.0. The new API offers new opportunities for developers to make use of the CORE open access aggregator in their applications.
The main new features are:
- Support for looking up articles by a global identifier (DOI, OAI, arXiv, etc.) instead of just CORE ID.
- Access to new resource types, repositories and journals, and organisation of API methods according to the resource type.
- Access to the original metadata exactly as it was harvested from the repository of origin.
- Retrieval of the history of changes to the metadata as harvested by CORE.
- Retrieval of citations extracted by CORE from the full text.
- Support for batch requests for searching, recommending, accessing full texts, harvesting history, etc.
The goals of the new API also include improving scalability, cleaning up and unifying the API responses and making it easier for developers to start working with it.
The API is implemented and documented using Swagger, which has the advantage that anybody can start playing with the API directly from our online client. The documentation of the API v2.0 is available and the API is currently in beta. Those interested in registering for a new API key can do so by completing the online form.
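The batch-request feature mentioned above might be exercised along the lines of this Python sketch. The endpoint path and the JSON field names (`query`, `page`, `pageSize`) are assumptions for illustration, so verify them against the Swagger documentation before relying on them:

```python
import json
import urllib.request

# Base URL and JSON field names are assumptions; verify against the Swagger docs.
API_BASE = "https://core.ac.uk/api-v2"

def build_batch_search_body(queries, page_size=10):
    """Encode one search request per query string as a JSON array."""
    return json.dumps(
        [{"query": q, "page": 1, "pageSize": page_size} for q in queries]
    )

def batch_search(queries, api_key):
    """POST all queries in a single batch request (needs network access)."""
    req = urllib.request.Request(
        "%s/articles/search?apiKey=%s" % (API_BASE, api_key),
        data=build_batch_search_body(queries).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

One POST carrying several queries saves a round trip per query compared with calling the single-query search endpoint repeatedly, which is the point of the batch feature.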