The project team has submitted a paper describing CORE to the International Conference on Theory and Practice of Digital Libraries (TPDL 2011, http://www.tpdl2011.org/), to be held in September in Berlin. This conference is the main scientific forum on digital libraries in Europe. The paper has been accepted; the acceptance rate this year was 33%.
The first version of the CORE dataset was released yesterday and registered in the Linked Data cloud (http://ckan.net/package/core). The CORE project exposes data about similarities between papers in the Open Access domain. We are providing links to the OAI repository. The similarities are calculated using Natural Language Processing techniques based on the full text. This distinguishes CORE from other systems, such as Mendeley or MarcXimiL. The similarities are provided only for research articles with an accessible and machine-readable full text.
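As an illustration of what a full-text similarity score can look like, here is a minimal sketch. The post does not say which NLP technique CORE uses, so the choice of TF-IDF weighting with cosine similarity is an assumption, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms in every document.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for extracted full texts (assumption, not CORE data).
papers = [
    "Harvesting metadata from open access repositories",
    "Metadata harvesting for open access repositories with OAI-PMH",
    "Protein folding simulations on graphics hardware",
]
vectors = tfidf_vectors(papers)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this sketch the two repository papers share most of their vocabulary and score well above the unrelated third paper, which shares no terms with the first and therefore scores zero.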
To demonstrate the capabilities of CORE, we have developed a mobile application for the Android system that allows navigating and downloading content harvested from Open Access repositories. The application is freely available and downloadable from the Android Market. It works on both smartphones and tablet devices. This allows anybody to read their favourite research articles offline when travelling.
Less than a week remains until CORE is presented at the OAI7 Workshop in Geneva. This will provide networking and dissemination opportunities. CORE is now in a phase of heavy testing and bug fixing. We are working towards rolling it out to more repositories and are adding more data frequently. The poster presents the main objectives of the CORE system and the three applications developed on top of CORE, i.e. CORE Portal, CORE Mobile and the CORE Plugin. While the first two applications are already operational, the last one will be released by the end of July. The CORE poster can be downloaded here.
CORE has been accepted for a presentation at the CERN Workshop on Innovations in Scholarly Communication (OAI7). OAI7 (http://indico.cern.ch/conferenceDisplay.py?confId=103325) takes place in June in Geneva. OAI7 is one of the most important events in the Open Access publishing field. It is aimed at those involved in the development of Open Access (OA) repositories and those who can influence the direction of developments within their institution, their country or at an international level. This includes technical developers of OA bibliographic databases and connected services, research information policy developers at university or library level, funding bodies concerned with access to the results of their research, OA publishers, and influential researchers keen to lead OA developments in their own field.
In the last few weeks we have invested significant effort in the development of CORE administration tools. These tools will allow easier maintenance and analysis of the metadata and full-text content flow from Open Access repositories to CORE. They will also enable the inclusion of more Open Access repositories in the future. Our approach is that, after the end of the project, all regular maintenance tasks should be performed directly from the user interface without requiring the administrator to have any knowledge of the source code. We believe that this is an important step towards the sustainability of CORE.
The CORE project will release the provided metadata and software in the following way:
– The metadata will be released under the Creative Commons Attribution license (CC BY). This license lets others distribute, remix, tweak, and build upon our work, even commercially, as long as they credit CORE for the original creation. It is the most accommodating of the licenses offered and is recommended for maximum dissemination and use of licensed materials.
– The software will be offered under the New BSD License or similar.
– The provided service will be offered free of charge to everybody.
The CORE Harvester system is now online! Though the system requires more testing and improvements, it is now deployed in its beta version at http://core.kmi.open.ac.uk.
The CORE system currently relies on the following technologies (this blog post will be updated to keep the information current):
– OCLC OAIHarvester2 – a set of Java classes for OAI-PMH metadata harvesting
– J2EE and Spring libraries for the development of the web-based interface of the application
– Apache Lucene – for the indexing of the metadata and full-text documents
– Apache Tika – for the extraction of text from PDF documents
– Sesame – as a triple store for exposing the extracted triples
– MySQL – as a backend for Sesame and the Harvester application
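To make the harvesting step in the list above more concrete, the following is a rough sketch of an OAI-PMH `ListRecords` exchange. It is not CORE's code (CORE uses the Java-based OCLC OAIHarvester2); it is a Python illustration that only builds the request URL and parses a response, without any network access. The base URL, function names and sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty or missing token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hypothetical, heavily shortened response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example open access paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
titles, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned resumption token until the repository stops sending one, and then pass the collected metadata on to indexing.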