The University of London Computer Centre is now using the CORE Plugin for cross-repository recommendation of similar documents.
http://pubs.ulcc.ac.uk/80/
KMi and The European Library/Europeana jointly organised the 1st International Workshop on Mining Scientific Publications, held in conjunction with JCDL 2012, the most prestigious conference in the world of digital libraries. The workshop was attended by major players in the field, including the National Library of Medicine, the Library of Congress, CiteSeerX, Elsevier and the British Library. Although Barack didn't come in the end, the workshop was very successful; the only problem was a shortage of chairs in the room. We, the workshop organisers (Petr Knoth, KMi; Zdenek Zdrahal, KMi; and Andreas Juffinger, The European Library/Europeana), were encouraged by the community's positive response, which confirmed the importance of the issues researchers face when mining research publications to improve the way research is carried out and evaluated.
The CORE project has produced a number of tools that can be reused or adapted to solve specific problems. In this blog post, we explain how we envisage this happening and describe how our team can assist. Some of the answers were developed during the last Advisory Board meeting, which took place on Monday 25th July.
1) Development of subject-based repositories as aggregations of content from a set of existing Open Access repositories: the CORE harvesting software can easily be configured to harvest metadata and content from any set of OAI-PMH compliant repositories. Because CORE provides access to the full texts, we can apply different text-mining and classification methods to filter the content that is finally presented to the user.
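The harvesting step can be illustrated with a standard OAI-PMH ListRecords loop that follows resumption tokens until a repository reports no more pages. This is a minimal sketch, not the CORE harvester's own code: it assumes Dublin Core (`oai_dc`) metadata, and the function names (`list_records_url`, `parse_list_records`, `harvest`) are hypothetical. The HTTP fetch is passed in as a callable so the parsing logic can be exercised without network access.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core (oai_dc) schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        # dc:identifier often carries links to the landing page or the full text.
        links = [e.text for e in rec.iterfind(".//dc:identifier", NS) if e.text]
        records.append({"id": identifier, "title": title, "links": links})
    # An empty or missing resumptionToken marks the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)


def harvest(base_url, fetch):
    """Yield records from every page; `fetch` maps a URL to response text."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break
```

In real use, `fetch` could be a thin wrapper around `urllib.request.urlopen`. Full-text download and the text-mining filter would then operate on the `links` collected for each record.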
The development of the CORE system has been rapid, and we were overcoming issues on a daily basis. Only now, with the CORE system fully functional, can we evaluate its successes and comment on the issues we had to face.
Let us start with the challenges we faced and explain how we addressed them:
Overall, we are glad to say that we were able to recover from all the major issues we encountered. We found it extremely useful to develop and test the system on a daily basis using agile development methodologies. Evidence of the very active development and involvement of the CORE project team is that we already have 575 code revisions in our SVN repository since the project started.
The project team has submitted a paper describing CORE to the International Conference on Theory and Practice of Digital Libraries (TPDL 2011, http://www.tpdl2011.org/), to be held in September in Berlin. This conference is the main scientific forum on digital libraries in Europe. The paper has been accepted; the acceptance rate this year was 33%.
We provide an overview presentation of the CORE project.
The first version of the CORE dataset was released yesterday and registered in the Linked Data cloud (http://ckan.net/package/core). The CORE project exposes data about similarities between papers in the Open Access domain, and we also provide links to the OAI repository. The similarities are calculated from the full text using Natural Language Processing techniques, which distinguishes CORE from other systems such as Mendeley or MarcXimiL. Similarities are provided only for research articles with an accessible, machine-readable full text.
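The post does not say which NLP measure CORE uses to compare full texts. Purely as an illustrative assumption, the sketch below uses one common choice, TF-IDF weighting with cosine similarity, to rank the most similar documents for each paper. The function names are hypothetical, and a production system would add stemming, stop-word removal and an index instead of comparing every pair of documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a sparse TF-IDF vector (dict: term -> weight)."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for counts in tokenized:
        df.update(counts.keys())
    # Smoothed IDF: terms present in every document still get a weight > 0.
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1 for t, df_t in df.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, k=2):
    """For each document index, return the k most similar other documents."""
    vecs = tfidf_vectors(docs)
    result = {}
    for i, u in enumerate(vecs):
        scores = [(cosine(u, v), j) for j, v in enumerate(vecs) if j != i]
        scores.sort(reverse=True)
        result[i] = [(j, round(s, 3)) for s, j in scores[:k]]
    return result
```

Working from full texts rather than titles or abstracts gives the vectors far more vocabulary to compare, which is why the post restricts similarities to articles with a machine-readable full text.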
The CORE Harvester system is now online! Although the system still needs more testing and improvement, its beta version is now deployed at http://core.kmi.open.ac.uk.
Aims, Objectives and Final Output(s) of the project
The CORE project aims to facilitate access to, and navigation between, relevant scientific papers distributed across Open Access institutional repositories.
CORE will:
The CORE objectives will be achieved through the development of a CORE architecture, which will consist of the following subsystems:
The functionality of the system is illustrated in the following figure:
In the first month of the project, we developed a first version of the harvesting system, which has already been tested on ORO, a repository based on the EPrints system.