What does Google do?

This is the second post in a series about the issues CORE has encountered trying to harvest (and build services on) metadata and fulltext items from UK HE research repositories. The first post, “Finding fulltext”, looked at the problems of harvesting fulltext caused by variations in how links are made (or not) from metadata records to fulltext content.

In this post I want to consider the question of what services like CORE are permitted to do with repository content. A third post will then describe some of the solutions to the various challenges we see.


Finding fulltext

To provide search functions, similarity measures and other functionality, CORE harvests both metadata and fulltext items from repositories. This raises questions about whether we are allowed to harvest metadata or fulltext items and, if so, what we are allowed to do with them once we have harvested them. In the first phase of CORE we relied on OAI-PMH to harvest metadata, and then used links from the harvested records to try to discover the related fulltext item.
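To make the "links from the harvested records" step concrete, here is a minimal sketch of pulling candidate URLs out of an OAI-PMH response in the standard oai_dc format. It is an illustration only, not CORE's actual harvester: the sample XML is hand-written, the repository.example.ac.uk addresses are made up, and the function name candidate_links is hypothetical. It also assumes that URLs appear in dc:identifier fields, which is common but not guaranteed.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response fragment; not from a real repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>http://repository.example.ac.uk/123/</dc:identifier>
          <dc:identifier>http://repository.example.ac.uk/123/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def candidate_links(oai_xml):
    """Map each record's OAI identifier to the URLs in its dc:identifier fields."""
    root = ET.fromstring(oai_xml)
    links = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        # Keep only values that look like web addresses; dc:identifier may
        # also hold DOIs, ISSNs or local reference numbers.
        urls = [
            el.text.strip()
            for el in record.iterfind(".//dc:identifier", NS)
            if el.text and el.text.strip().startswith(("http://", "https://"))
        ]
        links[oai_id] = urls
    return links


if __name__ == "__main__":
    for oai_id, urls in candidate_links(SAMPLE).items():
        print(oai_id, urls)
```

In this made-up record the metadata carries two URLs, and nothing in the metadata itself says which one (if either) is the fulltext item.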

This is the first in a series of blog posts looking at these issues, the problems we’ve encountered and the solutions we have put in place (so far). In this post I’m going to focus on the question of finding fulltext items from the metadata. This wasn’t always straightforward. Not all repositories link from the metadata to fulltext items in the same way, and in many cases there is no direct link from the metadata to the fulltext item at all, only a link to the repository’s webpage for the record.
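One rough way to see the difference between a direct fulltext link and a link to a repository webpage is to look at the file extension in the URL path. The sketch below is a simple heuristic for illustration, not the approach CORE uses: the extension list, the function names and the example URLs are all assumptions, and a link without a recognised extension may still turn out to serve a fulltext file.

```python
from urllib.parse import urlparse

# Assumed heuristic: file extensions that suggest a direct fulltext link.
FULLTEXT_EXTENSIONS = (".pdf", ".doc", ".docx", ".ps")


def looks_like_fulltext(url):
    """Return True if the URL path ends in a known document extension."""
    path = urlparse(url).path.lower()
    return path.endswith(FULLTEXT_EXTENSIONS)


def split_links(urls):
    """Separate probable direct fulltext links from probable record webpages."""
    direct = [u for u in urls if looks_like_fulltext(u)]
    landing = [u for u in urls if not looks_like_fulltext(u)]
    return direct, landing


if __name__ == "__main__":
    # Made-up example URLs.
    direct, landing = split_links([
        "http://repository.example.ac.uk/123/",
        "http://repository.example.ac.uk/123/1/paper.pdf",
    ])
    print("direct:", direct)
    print("landing:", landing)
```

When a record yields only links of the second kind, the harvester has to fetch the repository webpage and look for the fulltext link there, which is where the variation between repositories becomes a problem.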