CORE Ambassador: Nick Sheppard

Nick has worked in scholarly communications for over 10 years, currently as Open Research Advisor at the University of Leeds. Previously he was Research Services Advisor at Leeds Beckett University. Nick is interested in effective dissemination of research through sustainable models of open access, including underlying data, and potential synergies with open education and Open Educational Resources (OER), particularly underlying technology, software and interoperability of systems.

Q: What does Open Access mean to you?
A:
We live in the age of information where the world’s knowledge should be immediately and easily accessible to the majority of humanity. Instead much primary research is restricted to those that can afford it, whether to read under traditional subscription models or, under an APC based model, to publish at all. Meanwhile fake news is propagated freely with potentially disastrous consequences for our democracy, our ecology and global equality. Sustainable and affordable open access to research is essential for a well informed global population, the first step to building a better society. 
With equity as the theme of this year’s Open Access week we will be exploring issues of equality including gender imbalance within the academy and how our University’s research can better benefit the Global South. Early plans include a gender analysis of Leeds research outputs and a Wikimedia editathon focussing on women scientists and encouraging researchers of all genders to properly cite Wikipedia with open access research. read more...

CORE Ambassador: Gloria Kadyamatimba

Gloria is a lecturer in the Centre for Language and Communication Studies, Institute of Lifelong Learning and Development Studies at Chinhoyi University of Technology in Zimbabwe. She has special responsibility for coordinating the Information Literacy Skills component of the Communication Skills module. She is a former Library Director at the same institution.

Q: What does Open Access mean to you?
A:
Open access means unlimited access to research materials and tools to publicise research and make it more visible to a wider audience. Open access means knowing the research others are carrying out and making one’s research known to others.
In the past the Library was at the forefront of celebrating OA week. The celebrations entailed seminars with speakers from the Library and other experts from around the country. read more...

CORE Ambassador: George Macgregor

George is an Institutional Repository Co-ordinator at the University of Strathclyde. His interests and expertise are in structured open data, especially within repositories and semantic web contexts, information retrieval, distributed digital repositories and human-computer interaction.

Q: What does Open Access mean to you?
A: Aside from the usual reasons why Open Access is important, I like to remember that Open Access is about resource discovery. It is about cracking open the sum total of human knowledge in a way that machines can understand and, by extension, providing it in a way which enables users to find scholarly content more easily and, of course, in an unrestricted way.
International Open Access Week is approaching soon but, to be honest, we don’t tend to have plans for Open Access week because at Strathclyde every week is Open Access week! I think there might be quite a few UK institutions that operate in a similar way. In the UK we are fortunate that there is a powerful regulatory aspect to the REF2021 Open Access Policy which ensures researchers take better notice of the open science agenda. read more...

CORE Ambassador: David Walters

David is the Open Access Officer at Brunel University London based within the Scholarly Communication & Rights Management team. He is an advocate of OA publishing, and of building services that realise the movement within local institutional communities. David has spoken at UKSG, NASIG, RLUK and Altmetric conferences about this topic in recent years. David is an ambassador for the CORE service.

Q: What does Open Access mean to you?
A: To us at Brunel, Open Access means many things – ideologically and practically. Most importantly, we consider Open Access to research output a critical, underpinning component on the journey toward an ‘Open Science’ world. Open Science encompasses many areas, aiming to enhance scientific and educational sectors.
As with many institutions, at Brunel we operate local OA services for our community, within an ever-growing landscape of technological and policy drivers. Open Access means creating an environment that supports policy drivers, whilst taking advantage of new technologies for our community as they emerge.
Much progress is being driven by these factors. However, it is as important to foster discussion and leadership amongst research communities. Open Access means researchers and students shaping and leading their subjects into new forms of science communication and practice.
At Brunel our role in supporting Open Access is to:
– Engage and inform our community about these issues as they evolve
– Build and tailor services to our community’s needs
– Recognise and celebrate ‘open’ activity by our researchers in all its forms read more...

CORE Ambassador: Milica Sevkusic

Milica has been a librarian at the Institute of Technical Sciences of the Serbian Academy of Sciences and Arts since 2007. Her educational background is in art history and her previous work experience includes heritage policies and documentation standards, heritage-related civil society projects and digitisation, traditional librarianship and bibliography. Currently, her professional interests focus on Open Science, library services aimed at supporting research activities, training on academic services and tools, information literacy and research ethics. Since November 2014, she has been serving as the EIFL Open Access country coordinator in Serbia. In this capacity, she designed and coordinated the project “Revisiting open access journal policies and practices in Serbia”, which was implemented with EIFL’s support in 2016–2017. She has also been involved with institutional repositories since 2013, when her affiliated institution implemented the first fully functional institutional repository in Serbia. She is now a member of the Repository Development Team at the University of Belgrade Computer Centre, which is currently the leading force in repository development in Serbia.
read more...

Increasing the Speed of Harvesting with On Demand Resource Dumps

 

I am currently working with Martin Klein, Matteo Cancellieri and Herbert Van de Sompel on a project funded by the European Open Science Cloud Pilot that aims to test and benchmark ResourceSync against OAI-PMH in a range of scenarios. The objective is to perform a quantitative evaluation that could then be used as evidence to convince data providers to adopt ResourceSync. During this work, we have encountered a problem related to the scalability of ResourceSync and developed a solution to it in the form of an On Demand Resource Dump. The aim of this blog post is to explain the problem, how we arrived at the solution and how the solution works.

The problem

One of the scenarios we have been exploring deals with a situation where the resources to be synchronised are metadata files of a small data size (typically from a few bytes to several kilobytes). Coincidentally, this scenario is very common for metadata in repositories of academic manuscripts, research data (e.g. descriptions of images), cultural heritage, etc.

The problem is that while most OAI-PMH implementations typically deliver 100-1000 records per HTTP request, ResourceSync is designed in a way that requires resolving each resource individually. We have identified, and confirmed by testing, that for repositories with large numbers of metadata items this can have a very significant impact on harvesting performance, as the overhead of the HTTP request is considerable compared to the size of the metadata record.

More specifically, we have run tests over a sample of 357 repositories. The results show that while the speed of OAI-PMH harvesting ranges from 30 to 520 metadata records per second, depending largely on the repository platform, ResourceSync harvests the same content at only around 4 metadata records per second when using existing ResourceSync client/server implementations and a sequential downloading strategy. We are preparing a paper on this, so I am not going to disclose the exact details of the analysis at this stage.
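To give a feel for what these rates mean in practice, the sketch below converts them into harvesting time. The rates are the ones quoted above; the repository size of 100,000 records is a hypothetical figure chosen purely for illustration, and the function name is my own.

```python
def harvest_seconds(records: int, records_per_second: float) -> float:
    """Time needed to harvest `records` items at a constant rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return records / records_per_second


# Hypothetical repository size, not a measured value.
REPOSITORY_SIZE = 100_000

# Rates (metadata records per second) as quoted in the text above.
RATES = [
    ("ResourceSync, sequential", 4),
    ("OAI-PMH, slowest observed", 30),
    ("OAI-PMH, fastest observed", 520),
]

for label, rate in RATES:
    minutes = harvest_seconds(REPOSITORY_SIZE, rate) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

Under these assumptions the sequential ResourceSync harvest takes roughly seven hours, against under an hour for even the slowest OAI-PMH endpoint.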

As ResourceSync has been created to overcome many of the problems of OAI-PMH, such as:

  • being too flexible in terms of support for incremental harvesting, resulting in inconsistent implementations of this feature across data providers,
  • some of its implementations being unstable and less suitable for exchanging large quantities of metadata and
  • being only designed for metadata transfer, omitting the much needed support for content exchange

it is important that ResourceSync performs well under all common scenarios, including the one we are dealing with.

Can Resource Dumps be the solution?

An obvious option for solving the problem, and one already offered by ResourceSync, is the Resource Dump. While a Resource Dump can speed up harvesting to levels far exceeding those of OAI-PMH, it creates considerable extra complexity on the side of the server. The key problem is that the data must be periodically packaged as a Resource Dump, which basically means running a batch process to produce a compressed (zip) file containing the resources.

The number of Resource Dumps a source needs to maintain is equal to the number of Capability Lists it maintains times the size of the Resource Dump Index. The minimum practical operational size of a Resource Dump Index is 2. This is to ensure we don’t remove a dump currently being downloaded by a client during the creation of a new dump. As we have observed that a typical repository may contain about 250 OAI-PMH sets (Capability Lists in the ResourceSync terminology), this implies significant data duplication and a requirement to periodically create Resource Dumps if a source chooses to use Resource Dumps as part of the harvesting process.
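The arithmetic behind this can be written out as a small sketch. The function name is my own; the figures (about 250 Capability Lists and a minimum index size of 2) are the ones given above.

```python
def dumps_to_maintain(capability_lists: int, dump_index_size: int = 2) -> int:
    """Number of Resource Dumps a source has to keep available.

    One dump per Capability List, multiplied by the size of the
    Resource Dump Index (2 is the practical minimum, so that a dump
    being downloaded is not removed while its replacement is built).
    """
    if capability_lists < 0 or dump_index_size < 1:
        raise ValueError("expected a non-negative list count and an index size >= 1")
    return capability_lists * dump_index_size


# A typical repository with about 250 sets (Capability Lists).
print(dumps_to_maintain(250))
```

For the typical repository described above this comes to 500 pre-built dumps, each of which has to be regenerated periodically.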

On Demand Resource Dumps

To deal with the problem, we suggest an extension of ResourceSync that will support the concept of an On Demand Resource Dump. An On Demand Resource Dump is a Resource Dump which is created, as the name suggests, whenever a client asks for it. More specifically, a client can scan through the list of resources presented in a Resource List or a Change List (without resolving them individually) and ask the source to package any set of the resources as a Resource Dump. This approach speeds up harvesting and saves processing on the side of both the source and the client. Our initial tests show that this enables ResourceSync to perform as well as OAI-PMH in the metadata-only harvesting scenario when requests are sent sequentially (the most extreme scenario for ResourceSync). However, as ResourceSync requests can be parallelised, as opposed to OAI-PMH (due to the reliance of OAI-PMH on the resumption token), this makes ResourceSync a clear winner.
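The client-side part of this idea can be sketched as follows: read the resource URIs out of a Resource List (a Sitemap-format XML document) without fetching any of them, then group the URIs into batches, each of which could become one On Demand Resource Dump request. This is a minimal sketch and not our harvester code; the sample document, the example.org URIs, the function names and the batch size are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List

# Namespace of the Sitemap format that ResourceSync documents build on.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical Resource List, used only so the sketch is runnable.
SAMPLE_RESOURCE_LIST = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-01-01T00:00:00Z"/>
  <url><loc>https://example.org/records/1</loc></url>
  <url><loc>https://example.org/records/2</loc></url>
  <url><loc>https://example.org/records/3</loc></url>
</urlset>
""".strip()


def resource_uris(resource_list_xml: str) -> List[str]:
    """Return the URI of every resource listed, without resolving any."""
    root = ET.fromstring(resource_list_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def batches(uris: List[str], size: int) -> Iterator[List[str]]:
    """Split the URIs into groups, one group per dump request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


for batch in batches(resource_uris(SAMPLE_RESOURCE_LIST), size=2):
    print(batch)
```

Because each batch is independent of the others, the resulting requests could be issued in parallel, which is where the advantage over resumption-token paging comes from.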

In the rest of this post, I will explain how this works and how it could be integrated with the ResourceSync specification.

There are basically 3 steps:

  1. defining that the server supports an on-demand Resource Dump,
  2. sending a POST request to the on-demand dump endpoint and
  3. receiving a response from the server that 100% conforms to the Resource Dump specification.

I will first introduce steps 2 and 3 and then I will come back to step 1.

Step 2: sending a POST request to the On Demand dump endpoint

We have defined an endpoint at https://core.ac.uk/datadump. You can POST a list of resource identifiers to it (the identifiers can be discovered in a Resource List). In the example below, I am using curl to send it a JSON list of resource identifiers which I want to get resolved. The approach is not limited to JSON resources: it can be used for any resource listed in a Resource List regardless of its type. Try it by executing the code below in your terminal.

curl -d '["https://core.ac.uk/api-v2/articles/get/42138752","https://core.ac.uk/api-v2/articles/get/32050"]' -H "Content-Type: application/json" https://core.ac.uk/datadump -X POST > on-demand-resource-dump.zip

read more...
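For readers who prefer a script to the command line, the same request can be sketched in Python using only the standard library. The endpoint, the JSON body and the output file name are taken from the curl command; the function names are my own, and I am assuming nothing about the service beyond what the command shows (including whether the endpoint is still available).

```python
import json
import shutil
import urllib.request
from typing import List

DUMP_ENDPOINT = "https://core.ac.uk/datadump"


def build_dump_request(
    resource_uris: List[str], endpoint: str = DUMP_ENDPOINT
) -> urllib.request.Request:
    """Build the POST request carrying a JSON array of resource URIs."""
    body = json.dumps(resource_uris).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_dump(
    resource_uris: List[str], out_path: str = "on-demand-resource-dump.zip"
) -> None:
    """Send the request and stream the returned dump to a local file."""
    request = build_dump_request(resource_uris)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)


request = build_dump_request([
    "https://core.ac.uk/api-v2/articles/get/42138752",
    "https://core.ac.uk/api-v2/articles/get/32050",
])
print(request.get_method(), request.full_url)
```

Calling `fetch_dump` with the same list performs the network request and writes the zip file; it is left uncalled here so that the sketch runs without network access.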

CORE visits Ethiopia and participates in an Open Science training session

In June 2017, EIFL invited the global open access full text aggregator CORE to take part in an Open Science train-the-trainer course for universities and research institutions in EIFL partner countries.

Watch the videos recorded during the workshop and read more

Solomon Mekonnen – Open Access Ethiopia 

Zaituni Kaijage – Open Access Tanzania

Dr Roshan Karn – Open Access Nepal

Dr Manisha Dhakal – Open Access Nepal

Simon Osei – Open Access Ghana

Gloria Kadyamatimba – Open Access Zimbabwe

It was a great experience travelling to Addis Ababa, and a big thank you to the workshop host, the Library of the University of Addis Ababa (Mesfin Gezehagn, Solomon Mekonnen and Girma Aweke), for their hospitality. It was also great to meet the trainers participating in the workshop, from Ghana (Lucy Adjoa Dzandu, Simon Kwame Osei, Benjamin Yao Folitse), Nepal (Dr Manisha Dhakal and Dr Roshan Kumar Karn), Tanzania (Zaituni Kokujona Kaijage, Paul Samwel Muneja, Bwire Wilson Bwire) and Zimbabwe (Gloria Kadyamatimba).

 

Implementing the CORE Recommender in Strathprints: a “whitehat” improvement to promote user interaction

by George Macgregor, Institutional Repository Coordinator, University of Strathclyde

This guest blog post briefly reviews why the CORE Recommender was quickly adopted on Strathprints and how it has become a central part of our quest to improve the interactive qualities of repositories.

Back in October 2016 my colleagues at the CORE Team released their Recommender plugin. The CORE Recommender plugin can be installed on repositories and journal systems to recommend similar scholarly content. On this very blog, Nancy Pontika, Lucas Anastasiou and Petr Knoth announced the release of the Recommender as a:

…great opportunity to improve the functionality of repositories by unleashing the power of recommendation over a huge collection of open-access documents, currently 37 million metadata records and more than 4 million full-text, available in CORE*.
(* Note from CORE Team: the up-to-date numbers are 80,097,014 metadata and 8,586,179 full-text records.)

When the CORE Recommender is deployed, a repository user viewing an article or abstract page within the repository will be presented with recommendations for other related research outputs, all mined from CORE. The Recommender sends data about the item the user is visiting to CORE. Such data include any identifiers and, where possible, accompanying metadata. The CORE response to the repository then delivers CORE’s content recommendations, and a list of suggested related outputs is presented to the user in the repository user interface. The algorithm used to compute these recommendations is described in the original CORE Recommender blog post but is ultimately based on content-based filtering, citation graph analysis and analysis of the semantic relatedness between the articles in the CORE aggregation. It is therefore unlike most standard recommender engines and is an innovative application of open science in repositories.
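To illustrate just one of those ingredients, content-based filtering, here is a toy sketch that ranks candidate items by the cosine similarity of their word counts to the item being viewed. It is emphatically not CORE's algorithm, which also draws on citation graph analysis and semantic relatedness; the function names, identifiers and sample texts are all invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def term_counts(text: str) -> Counter:
    """Lower-case bag-of-words representation of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(current: str, candidates: Dict[str, str], top_n: int = 2) -> List[str]:
    """Identifiers of the candidates most similar to the current item."""
    current_vec = term_counts(current)
    scored = [
        (cosine(current_vec, term_counts(text)), item_id)
        for item_id, text in candidates.items()
    ]
    # Highest score first; ties are broken by identifier for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:top_n] if score > 0]


# Invented sample data.
candidates = {
    "a": "open access repositories and metadata harvesting",
    "b": "protein folding dynamics",
}
print(recommend("harvesting metadata from open access repositories", candidates))
```

A production recommender would work over abstracts or full text with weighting such as TF-IDF rather than raw counts; the sketch only shows the basic shape of similarity-based ranking.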

Needless to say, we were among the first institutions to proudly implement the CORE Recommender on our EPrints repository. The implementation was on Strathprints, the University of Strathclyde’s institutional repository, and was rolled out as part of some wider work to improve repository visibility and web impact. The detail of this other work can be found in a poster presented at the 2017 Repository Fringe Conference and a recent blog post read more...

CORE listed Number 1 in the list of top 21 free online journal and research databases

Image from the Scribendi website, 101 Free Online Journal and Research Databases for Academics.

An online editing and proofreading company, Scribendi, has recently put together a list of top 21 freely available online databases. It is a pleasure to see CORE listed as the Number 1 resource in this list. CORE has been included thanks to its large volume of open access and free of cost content, offering 66 million bibliographic metadata records and 5 million full-text research outputs. Our content originates from open access journals and repositories, both institutional and disciplinary, and can be accessed via our read more...

CORE’s open access and text mining services – 2016 growth (or, how about them stats – 2016 edition)

The past year has been productive for the CORE team; the number of harvested repositories and our open access content, both in metadata and full-text, has massively increased. (You can see last year’s blog post with our 2015 achievements in numbers here.)

There was also progress with regard to our services: the number of our API users almost doubled in 2016, we now have about 200 registered CORE Dashboard users, and this past October we released a new version of our recommender and updated our dataset.

Around this time of the year, the joyful Christmas spirit of the CORE team increases along with our numbers. Thus, we decided to recalculate how far the CORE research outputs, if we had printed them, would be from reaching the moon (last year we made it 1/3 of the way).

We are thrilled to see that this year we got CORE even closer to the moon! We would also like to thank all our data providers, who have helped us reach this goal.

Fear not, we will never print all our research outputs; we believe that their mission is to be discoverable on the web as open access. Plus we love trees.

Merry Christmas from the CORE Team!

* Note: Special thanks to Matteo Cancellieri for creating the CORE graphics.