Project Plan

Aims, Objectives and Final Output(s) of the project

The CORE project aims to make it easier to access and navigate relevant scientific papers distributed across Open Access institutional repositories.

CORE will:

Release a new open metadata collection in the Linked Data format describing the semantic relations between resources stored across a selection of UK institutional repositories. The project will assign dereferenceable URIs to all resources in the collection and will make them publicly available.
Develop a web-service reusable by other Open Access repositories and a demonstrator tool for the Open Research Online (ORO) repository.
Develop good practice for the uptake of the provided repository and service in collaboration with the Directory of Open Access Repositories (OpenDOAR) and UKOLN.

The CORE objectives will be achieved through the development of a CORE architecture, which will consist of the following subsystems:

Content Harvester – system for harvesting and indexing metadata and full-text content from institutional repositories
Relation Analyzer – system for the discovery of semantic relations between full-text articles
RDF Publisher – system for publishing the results in RDF with its associated services (demonstrator)
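
As a rough illustration of what the RDF Publisher might emit, the sketch below serializes discovered relations between two resources as N-Triples. The base URI, the resource identifiers and the choice of the Dublin Core `relation` predicate are assumptions made for this example only; the plan does not specify the URI scheme or vocabulary CORE will use.

```python
from urllib.parse import quote

# Hypothetical base URI for CORE resources (not defined in the project plan).
BASE_URI = "http://example.org/core/resource/"
# Dublin Core Terms "relation" property, used here as a generic placeholder predicate.
RELATION = "http://purl.org/dc/terms/relation"


def relation_triple(source_id, target_id, base=BASE_URI, predicate=RELATION):
    """Return one N-Triples line stating that source is related to target."""
    subject = base + quote(source_id, safe="")
    obj = base + quote(target_id, safe="")
    return f"<{subject}> <{predicate}> <{obj}> ."


def to_ntriples(pairs):
    """Serialize an iterable of (source_id, target_id) pairs, one triple per line."""
    return "\n".join(relation_triple(a, b) for a, b in pairs)


if __name__ == "__main__":
    print(to_ntriples([("paper-1", "paper-2"), ("paper-1", "paper-3")]))
```

Each subject and object is a URI that a server could resolve to a description of the resource, which is what makes the collection dereferenceable.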

The functionality of the system is demonstrated in the following Figure:

In the first month of the project we developed a first version of the Harvesting system, which has already been tested on ORO, an EPrints-based repository.

Risk Analysis and Success Plan
CORE will face a number of challenges during its execution. The two main challenges are:

Noisy data – the harvested data are often noisy. It has been estimated that only about 10% of the records contain a PDF, and the content of a PDF is not necessarily a research paper.
Scalability – discovering semantic relations between papers requires, in the extreme case, checking all pairs of articles. This is fine for small repositories, but becomes prohibitively expensive for a large, distributed system. CORE will work towards the development of a scalable system that first recognizes which pairs of articles are likely to be related and only then performs a detailed analysis of those pairs.
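
One common way to avoid comparing all pairs is to build an inverted index over terms and treat only articles that share enough terms as candidates for detailed analysis. The sketch below illustrates that general idea; it is not the CORE Relation Analyzer itself, and the function name, whitespace tokenization and thresholds are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(docs, min_shared=2, max_postings=1000):
    """Return pairs of document ids that share at least `min_shared` terms.

    docs: dict mapping a document id to its text.
    max_postings: terms occurring in more documents than this are skipped,
    since very common terms would reintroduce the all-pairs cost.
    """
    # Inverted index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)

    # Count shared terms only for documents that co-occur under some term.
    shared = defaultdict(int)
    for ids in index.values():
        if len(ids) > max_postings:
            continue
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    return {pair for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    docs = {
        "a": "semantic web linked data",
        "b": "linked data repositories",
        "c": "protein folding dynamics",
    }
    print(candidate_pairs(docs))
```

Only the pairs returned here would be passed on to the more expensive relation analysis, so articles with no vocabulary in common are never compared.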

The success of the system would be marked by its integration into ORO and other institutional repositories.

IPR

The developed software will be licensed under the New BSD Open Source license. The exposed metadata will be made available in the Linked Data format under a Creative Commons or PDDL license and will be regularly updated.

Project Team Relationships and End User Engagement

Zdenek Zdrahal (Project Director) is a Senior Research Fellow at the Knowledge Media Institute. His research interests include knowledge modelling and management, reasoning, knowledge-based systems in learning, engineering design, and Web technology. He has been PI on a number of national and European projects, including TINY-IN, SILVER (EPSRC), Clockwork, Cipher, Eurogene, Tech-IT-Easy (EU funded).

Owen Stephens is the Project Manager for the CORE project. He joined the Open University in 2009. He is currently Project Manager for the JISC funded LUCERO project and was previously Project Manager for the JISC funded TELSTAR (Technology enhanced learning supporting students to achieve academic rigour) project delivered at the Open University. Owen also works as an independent consultant to the library sector. He has been on the management team of the library services of two leading UK universities, and has been responsible for a number of innovative projects at both institutional and national levels. Owen was Project Director for the EThOSNet project to launch a national e-theses service based at the British Library, and is the founder of the ‘Mashed Libraries’ events in the UK.

Petr Knoth is the main system architect and developer of CORE. Petr is a researcher in KMi focusing on various topics in natural language processing and information retrieval. His particular interests lie in methods that can automatically link related parts of documents in large digital libraries and semantically type the relationships based on discourse characteristics. He has been involved in four European Commission funded projects (KiWi, Eurogene, Tech-IT-EASY and DECIPHER) and has a number of publications at international conferences based on this work.

Annika Wolff has worked on several KMi projects. She was the main researcher on the MGT, TINY-IN and SILVER projects and has a number of published papers resulting from this work. Her research interests include knowledge modelling and narrative hypermedia, with a particular interest in using narrative to support inquiry-based learning from multimedia resources.

Project Timeline

Budget