The last 10 years have seen a massive increase in the number of Open Access publications available in journals and institutional repositories. The open presence of large volumes of state-of-the-art knowledge online has the potential to provide huge savings and benefits in many fields. However, in order to fully leverage this knowledge, it is necessary to develop systems that (a) make it easy for users to discover, explore and access this knowledge at the level of individual resources, (b) make it possible to explore and analyse this knowledge at the level of collections of resources and (c) provide infrastructure and access to raw data in order to lower the barriers to the research and development of systems and services on top of this knowledge. The CORE system addresses these issues by providing the necessary infrastructure.
The University of London Computer Centre is now using the CORE Plugin for cross-repository recommendation of similar documents.
KMi and The European Library/Europeana jointly organised the 1st International Workshop on Mining Scientific Publications, associated with JCDL 2012, the most prestigious conference in the world of digital libraries. The workshop was attended by major players in the field, including the National Library of Medicine, the Library of Congress, CiteSeerX, Elsevier and the British Library. Although Barack in the end didn’t come, the workshop was very successful, the only problem being the lack of chairs in the room. We (the workshop organisers: Petr Knoth, KMi; Zdenek Zdrahal, KMi; and Andreas Juffinger, The European Library/Europeana) were motivated by the community's positive response, which underlined the importance of the issues researchers face when mining research publications to improve the way research is carried out and evaluated.
We have recently released a new mobile application for Apple devices. The application, entitled CORE Research Mobile, is freely available from the iTunes store.
The release of the application has also been independently announced by Gary Price on infodocket.com.
Over the last year, we have worked towards increasing the amount of metadata and full-text content in the aggregation and also on improving the updating frequency of the system. However, increasing the volume of content also created a higher demand on the efficiency of processing, maintaining and exposing the content. In the last three months, we have been optimising the CORE system to improve the parallelisation of the processes it performs, namely: downloading and parsing large metadata description files, downloading PDF files from multiple sources, converting PDF files to text, extracting citation information from full texts, recognising citation targets, discovering semantically related resources, and indexing. All these processes have been optimised to allow a relatively even distribution of load across many parallel threads. This task was, in our view, very important and required significant development effort, but was definitely worth it! In our hardware environment, a typical CORE repository processing activity will be distributed among 144 threads (24 processors, each with 6 cores). The optimised system enables us to continue adding more repositories to the CORE aggregation and will also help us to keep content in CORE fresh.
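To illustrate the idea of spreading per-document work evenly across a fixed pool of threads, here is a minimal Python sketch. It is not the CORE implementation: the function names (`process_record`, `process_repository`) and the stage labels are hypothetical placeholders, and only the 24 x 6 = 144 thread figure comes from the post above.

```python
from concurrent.futures import ThreadPoolExecutor

# Figures taken from the post: 24 processors, each with 6 cores.
PROCESSORS = 24
CORES_PER_PROCESSOR = 6
MAX_WORKERS = PROCESSORS * CORES_PER_PROCESSOR  # 144 threads

# Hypothetical stage names standing in for the real processing steps.
STAGES = ["download_pdf", "convert_to_text", "extract_citations", "index"]


def process_record(record_id: str) -> str:
    """Run every (placeholder) stage for one record and report what was done."""
    # A real stage would perform network or disk work here.
    return record_id + ":" + ">".join(STAGES)


def process_repository(record_ids, max_workers=MAX_WORKERS):
    """Process all records of one repository on a shared thread pool.

    The pool hands the next waiting record to whichever thread becomes free,
    which keeps the load roughly even across threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input ids.
        return list(pool.map(process_record, record_ids))
```

In Python, a thread pool like this mainly helps with I/O-bound stages such as downloading; CPU-heavy stages such as text conversion would more likely need separate processes.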
As part of the release of version 0.7, CORE received a brand new design (http://core.kmi.open.ac.uk). The new design should be more user-friendly. We have also added more information about CORE on the portal. We hope you will like it!
The CORE team is organising a workshop co-located with JCDL 2012, a major conference in the field of digital libraries. Our proposal to organise the 1st International Workshop on Mining Scientific Publications was accepted. The aim of the workshop is to bring together researchers, digital library developers and practitioners from government and industry to address the current challenges in the field of mining scientific publications and building the necessary infrastructure to support this. The topics of the workshop are directly related to the work carried out in both the ServiceCORE and DiggiCORE projects and are available on the workshop website.
The ServiceCORE team has now moved to an agile development lifecycle with a two-week release cycle. What is available in the new release that has just been published?
– A new advanced search facility.
– Search snippets on the results page. Snippets are created from the resource full text where available.
– Citation extraction (available for newly processed resources). The system displays references mined from the article full texts and provides direct links to them if they are held in our repository (for example, http://core.kmi.open.ac.uk/display/41214).
– A new document preview feature.
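As an illustration of the snippet feature listed above, the following Python sketch cuts a short window of text around the first occurrence of a search term. It is an assumption about how such a snippet could be built, not a description of how CORE does it; `make_snippet` and its `width` parameter are hypothetical.

```python
def make_snippet(full_text: str, query: str, width: int = 40) -> str:
    """Return a short excerpt of full_text centred on the first match of query.

    width is the number of characters kept on each side of the match.
    If the query does not occur, the start of the text is returned instead.
    """
    pos = full_text.lower().find(query.lower())  # case-insensitive search
    if pos == -1:
        return full_text[: 2 * width].strip()

    start = max(0, pos - width)
    end = min(len(full_text), pos + len(query) + width)

    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(full_text) else ""
    return prefix + full_text[start:end].strip() + suffix
```

A production search engine would normally highlight every matching term and break on word boundaries; this sketch only shows the windowing step.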
The ServiceCORE project is addressing a wide range of problems caused by the rapid increase of Open Access scientific papers stored across UK institutional repositories. These problems include:
- The difficulty of accessing real, full-text data from these distributed sources efficiently.
- The difficulty of generating data statistics (size, growth, subjects).
- The difficulty of searching, organising and navigating this distributed information.
- The difficulty of analysing the data.
- The difficulty of repurposing and reusing the data in other applications.
- The difficulty of building services on top of the UK Repository Infrastructure.
The ServiceCORE project is responding to these challenges by developing a nationwide aggregation service for content stored across UK Open Access repositories. The CORE system not only harmonises access to the UK repository content, but also processes the full-text content using text-mining methods to enrich the existing metadata. This includes, among other things, the extraction of citation information and the recommendation of similar content.
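The post does not say which method CORE uses to recommend similar content. Purely as an illustration of the general idea, the sketch below ranks documents by cosine similarity of simple word counts; the technique and all names here (`term_counts`, `cosine_similarity`, `most_similar`) are assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_counts(text: str) -> Counter:
    """Count lowercase alphanumeric tokens in text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_text: str, corpus: Dict[str, str],
                 top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n (document id, score) pairs most similar to query_text."""
    query = term_counts(query_text)
    scored = [(doc_id, cosine_similarity(query, term_counts(text)))
              for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_n]
```

Calling `most_similar` with a paper's full text and a dictionary of other papers' texts yields a ranked list of candidate recommendations. A real system would typically weight terms (for example with TF-IDF) and use an index rather than comparing against every document.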