I am currently working with Martin Klein, Matteo Cancellieri and Herbert Van de Sompel on a project funded by the European Open Science Cloud Pilot that aims to test and benchmark ResourceSync against OAI-PMH in a range of scenarios. The objective is to perform a quantitative evaluation that could then be used as evidence to convince data providers to adopt ResourceSync. During this work, we have encountered a problem related to the scalability of ResourceSync and developed a solution to it in the form of an On Demand Resource Dump. The aim of this blog post is to explain the problem, how we arrived to the solution and how the solution works.
An online editing and proofreading company, Scribendi, has recently put together a list of top 21 freely available online databases. It is a pleasure to see CORE listed as Number 1 resource in this list. CORE has been included in this list thanks to its large volume of open access and free of cost content, offering 66 million of bibliographic metadata records and 5 million of full-text research outputs. Our content originates from open access journals and repositories, both institutional and disciplinary and can be accessed via our search engine. In addition, we also offer an API and Datasets for programmable access to this content, enabling the development of new artificial intelligence-based applications for scientists and for carrying out text and data mining of scientific literature.
Last week, the CORE team attended the 11th Annual Conference on Open Repositories, an international conference addressed mainly to subject and institutional repository managers, focusing on open access, open data and open science tools, projects and services.
At the conference the team had six submissions:
- A workshop presentation on “How can repositories support the text-mining of their content and why?” where Nancy Pontika explained how repository managers should be supportive of text-mining practices and Petr Knoth described the technical requirements that can enable the text mining of repositories. In addition to that, the CORE team was the workshop organiser, as part of its involvement with the OpenMinTeD project, an EU-funded project on text and data mining. The workshop has been described in two blog posts, one hosted at the OpenMinTeD blog (which includes all workshop presentations), and another post composed by Rebecca Sutton Koeser, a workshop participant.
- A full presentation on “Exploring Semantometrics: full text-based research evaluation for open repositories” by Petr Knoth. The presentation explored semantometrics, a new class of research evaluation metrics, which builds on the premise that full text is needed to assess the value of a publication. (Presentation available here.)
- A 24×7 presentation on the “Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service”, where Matteo Cancellieri and Nancy Pontika described how the RIOXX metadata guidelines are now a new embedded feature in the CORE Repositories Dashboard. (Presentation slides here.)
- & 5. Two demo presentations during the Developer Track sessions. The first one was on “Mining Open Access Publications in CORE”, where Matteo Cancellieri demonstrated the new CORE API and the second was entitled “Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking” where Petr Knoth used the traditional Oxford University vs Cambridge University contest to show how to freely gather and compare the research performance of universities. (The code for both demo presentations is on Github.)
- A poster on the “Integration of the IRUS-UK Statistics in the CORE Repositories Dashboard”, by Samuel Pearce and Nancy Pontika, which showed the process of embedding the existing IRUS-UK statistics service to the CORE Repositories Dashboard. We were delighted also that our poster won the best poster award (yay!). We would like to thank all the conference participants who stopped by our poster, got the CORE freebies and voted for us! (You can access the poster here.)
Based on the fact that this conference has a clear focus on repository services and that the CORE service uses or is being used by these services, we were also extensively mentioned in other presentations as well. For example: Richard Jones in his presentation on Lantern mentioned that the project is using the CORE API; Paul Walk described how CORE is using the RIOXX metadata application profile; the Repositories of the Future panel, organised by COAR, stressed on the importance of the role of aggregators in the repository environment specifically naming CORE; and the “Ideas Challenge”, a thought-provoking and brainstorming group exercise consisting of programmers and repository managers that focused on how to make the lives of academics easier, proposed CORE as a runner up for the development of a cross-repository journal and topic browse interface. Finally, CORE was also presented in the Jisc poster on “Jisc’s Open Access Services”.