The first version of the CORE dataset was released yesterday and registered in the Linked Data cloud (http://ckan.net/package/core). The CORE project exposes data about similarities between papers in the Open Access domain and provides links to the OAI repository. The similarities are calculated from the full text using Natural Language Processing techniques. This distinguishes CORE from other systems, such as Mendeley or MarcXimiL. Similarities are provided only for research articles with an accessible, machine-readable full text.
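To give a rough idea of what a full-text similarity between two papers can look like, here is a minimal sketch using cosine similarity over raw word counts. This technique is an assumption chosen for illustration, not necessarily the one CORE applies, and the function names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two texts with the same wording score close to 1, and texts with no words in common score 0. A real pipeline would typically add stop-word removal and term weighting on top of this.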
At the moment we expose more than 3 million RDF triples describing similarities calculated on a set of more than 50,000 full-text articles harvested from British Open Access repositories. In the future we want to harvest information and content from as many Open Access repositories as possible. There are currently more than 1,900 of them, and we are processing content from only 143 British repositories. We aim to process all full-text articles available online and to make information about record similarities available in a machine-readable format. As a result, the number of RDF triples in our store is likely to grow significantly. Have a look at the data description at http://core-project.kmi.open.ac.uk/node/13#overlay=node/13 or check some example queries at http://core.kmi.open.ac.uk:8081/COREWeb/example-queries to see what is available.
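For readers new to RDF, the following sketch shows how one similarity between two papers could be written as triples in N-Triples syntax. The `http://example.org/core/` namespace, the `source`, `target` and `score` properties, and the `similarity_triples` helper are all hypothetical; the vocabulary CORE actually uses is given in the data description linked above.

```python
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
EX = "http://example.org/core/"  # hypothetical namespace, not CORE's real one


def similarity_triples(sim_id, paper_a, paper_b, score):
    """Return N-Triples lines describing one similarity between two papers."""
    # The similarity gets its own node so that a score can be attached to it.
    node = f"<{EX}similarity/{sim_id}>"
    return [
        f"{node} <{EX}source> <{paper_a}> .",
        f"{node} <{EX}target> <{paper_b}> .",
        f'{node} <{EX}score> "{score}"^^<{XSD_DOUBLE}> .',
    ]


if __name__ == "__main__":
    for line in similarity_triples(
        1, "http://example.org/paper/1", "http://example.org/paper/2", 0.82
    ):
        print(line)
```

In this sketch each similarity costs three triples, which gives a feel for how the triple count grows as more articles are processed.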