CORE has released a BETA version of the CORE Discovery tool, which offers one-click access to free copies of research papers whenever you hit a paywall.
Our free CORE Discovery service provides you with:
Highest coverage of freely available content. Our tests have shown that CORE Discovery finds more free content than any other discovery system.
Free service for researchers by researchers. CORE Discovery is the only free content discovery extension developed by researchers for researchers. There is no major publisher or enterprise controlling and profiting from your usage data.
Best grip on open repository content. Because CORE is a leader in harvesting open access literature, CORE Discovery has the best coverage of content from open repositories, unlike other services that focus disproportionately on content indexed in major commercial databases.
Repository integration and discovering documents without a DOI. It is the only service offering seamless and free integration into repositories. CORE Discovery is also the only discovery system that can locate scientific content even for items whose DOI is unknown or which have no DOI at all.
The tool is available as:
A browser extension for researchers and anyone interested in reading scientific documents
A plugin for repositories, enriching metadata-only pages in repositories with links to freely available copies of the paper
An API for developers and third-party services
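For developers, a request to the Discovery service might look roughly like the sketch below. The endpoint URL, request shape and response field (`fullTextLink`) are assumptions for illustration only, not the documented API; consult the CORE documentation before building against it.

```python
import json
import urllib.request
from typing import Optional

# NOTE: this endpoint URL and the response fields are assumptions for
# illustration; check the official CORE Discovery documentation.
DISCOVERY_URL = "https://api.core.ac.uk/discovery/discover"

def extract_fulltext_link(response_json: str) -> Optional[str]:
    """Pull a full-text URL out of a (hypothetical) Discovery response."""
    data = json.loads(response_json)
    return data.get("fullTextLink")

def discover(doi: str) -> Optional[str]:
    """Ask the Discovery service for a free copy of the paper with this DOI."""
    req = urllib.request.Request(
        DISCOVERY_URL,
        data=json.dumps({"doi": doi}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_fulltext_link(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Parse a canned sample response rather than hitting the network.
    sample = '{"fullTextLink": "https://core.ac.uk/download/pdf/123456.pdf"}'
    print(extract_fulltext_link(sample))
```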
If you are interested in the CORE Discovery plugin, do get in touch.
CORE receives Vannevar Bush Best Paper Award
The CORE team has also won the Vannevar Bush Best Paper Award at JCDL 2019, one of the most highly recognised digital libraries conferences in the world, for our work on analysing how soon authors deposit into repositories, which was driven by CORE data. A blog post about this is already available. read more...
For yet another year (see previous years 2016, 2015) CORE has been really productive; the number of our content providers has increased and we now have more open access full-text and metadata records than ever.
Our services are also growing steadily and we would like to thank the community for using the CORE API and CORE Datasets.
CORE is continuously growing. This month we have reached 75 million metadata records and 6 million full-text scientific research articles harvested from both open access journals and repositories. This past February we reported 66 million metadata records and 5 million full-text articles, while at the end of December 2016 we had just over 4 million full-text articles. This shows our continuous commitment to bring our users the widest possible range of Open Access articles.
To celebrate this milestone, we gathered the knowledge of our data scientists, programmers, researchers, and designers to illustrate our portion of metadata and full text with a less traditional (sour apple) “pie chart”. read more...
It is a pleasure to see CORE listed as the Number 1 resource in this list. CORE has been included thanks to its large volume of open access and free of cost content, offering 66 million bibliographic metadata records and 5 million full-text research outputs. Our content originates from open access journals and repositories, both institutional and disciplinary, and can be accessed via our read more...
CORE is thrilled to announce that it currently provides 5 million open access full-text papers.
“In the last year, we have managed to scale up our harvesting process. This enabled us to significantly increase the amount of open access content we can offer to our users. With more and more open access content being made available by data providers, thanks to recent open access policies, CORE now also captures and provides access to a higher percentage of global research literature ”, says CORE’s founder, Dr Petr Knoth.
With 66 million metadata records and 5 million full-text articles, from 102 countries and in 52 different languages, CORE is now the world’s largest full-text open access aggregator. CORE embraces the vibrant collections of both institutional and disciplinary repositories, while its large volume of scholarly outputs ranges from scientific research papers to grey literature, and from Master’s to Doctoral theses. In addition, it is a metasearch engine for all the open access peer-reviewed scientific articles published in open access journals. read more...
The past year has been productive for the CORE team; the number of harvested repositories and our open access content, both in metadata and full-text, have massively increased. (You can see last year’s blog post with our 2015 achievements in numbers here.)
There was also progress with regards to our services; the number of our API users almost doubled in 2016, we now have about 200 registered CORE Dashboard users, and this past October we released a new version of our recommender and updated our dataset.
Around this time of the year, the joyful Christmas spirit of the CORE team increases along with our numbers. Thus, we decided to recalculate how far the CORE research outputs, if we had printed them, would be from reaching the moon (last year we made it a third of the way).
We are thrilled to see that this year we got CORE even closer to the moon! We would also like to thank all our data providers, who have helped us reach this goal.
Fear not, we will never print all our research outputs; we believe their mission is to be discoverable on the web as open access. Plus, we love trees.
Merry Christmas from the CORE Team!
* Note: Special thanks to Matteo Cancellieri for creating the CORE graphics.
This post was authored by Nancy Pontika, Lucas Anastasiou and Petr Knoth.
The CORE team is thrilled to announce the release of a new version of our recommender: a plugin that can be installed in repositories and journal systems to suggest similar articles. This is a great opportunity to improve the functionality of repositories by unleashing the power of recommendation over a huge collection of open access documents, currently 37 million metadata records and more than 4 million full-text documents available in CORE.
Recommender systems and the CORE Plug-In
Typically, a recommender tracks a user’s preferences when browsing a website and then filters the user’s choices, suggesting similar or related items. For example, if I am looking for computer components at Amazon, the service might send me emails suggesting various computer components. Amazon is one of the pioneers of recommenders in industry, being one of the first adopters of item-item collaborative filtering (a method first introduced in 2001 by Sarwar et al. in a highly influential paper of modern computer science).
Over the years, many recommendation methods and their variations have been proposed and evaluated by both academia and industry. From a user’s perspective, recommenders are either personalised (recommendations targeted to a particular user, based on knowledge of the user’s preferences or past activity) or non-personalised (recommending the same items to every user).
From a technological perspective, there are two important classes of recommender systems: collaborative filtering and content based filtering.
1. Collaborative filtering (CF):
Techniques in this category try to predict a user’s behaviour towards an item according to what other users have done in the past. They start by analysing a large amount of user interactions, ratings, visits and other sources of behaviour, and then build a model from these. They then predict a user’s behaviour according to what other similar users (neighbour users) have done in the past; this is user-based collaborative filtering.
The basic assumption of CF is that a user might like an unseen item if it is liked by other users similar to them. In a production system, the recommender output can then be described as, for example, ‘people similar to you also liked these items.’
These techniques are now widely used and have proven extremely effective at supporting exploratory browsing and hence boosting sales. However, in order to work effectively, they need to build a sufficiently fine-grained model providing specific recommendations and, thus, they require a large amount of user-generated data. One of the consequences of an insufficient amount of data is that CF cannot recommend items that no user has acted upon yet, the so-called cold items. Therefore, the strategy of many recommender systems is to expose these items to users in some way, for example by blending them discreetly into a home page, or by applying content-based filtering on them, thus decreasing the number of cold items in the database.
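As an illustration, user-based collaborative filtering can be sketched in a few lines. The users, items and interaction matrix below are toy data invented for the example; a production system would learn them from large-scale usage logs.

```python
from math import sqrt

# Toy user-item interaction matrix (1 = the user viewed/liked the item).
# Users and papers here are hypothetical.
ratings = {
    "alice": {"paper_a": 1, "paper_b": 1},
    "bob":   {"paper_a": 1, "paper_b": 1, "paper_c": 1},
    "carol": {"paper_c": 1, "paper_d": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(user, k=2):
    """Suggest unseen items that the user's nearest neighbours liked."""
    seen = ratings[user]
    neighbours = sorted(
        (u for u in ratings if u != user),
        key=lambda u: cosine(seen, ratings[u]),
        reverse=True,
    )[:k]
    scores = {}
    for n in neighbours:
        weight = cosine(seen, ratings[n])
        for item, r in ratings[n].items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + weight * r
    return sorted(scores, key=scores.get, reverse=True)
```

Here `recommend("alice")` ranks `paper_c` first, because bob, the neighbour most similar to alice, has interacted with it.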
While CF can achieve state-of-the-art quality recommendations, it requires some sort of a user profile to produce recommendations. It is therefore more challenging to apply it on websites that do not require a user sign-on, such as CORE.
2. Content-based filtering (CBF)
CBF attempts to find related items based on attributes (features) of each item. These attributes could be, for example, the item’s name, description, dimensions, price, location, and so on.
For example, if you are looking in an online store for a TV, the store can recommend other TVs with a similar price and screen size, from the same or a similar brand, also high-definition, and so on. One advantage of content-based recommendations is that they do not suffer from the cold-start problem described above. Another is that content-based filtering can easily be used for both personalised and non-personalised recommendations.
The CORE recommendation system
There is a plethora of recommenders out there serving a broad range of purposes. At CORE, a service that provides access to millions of research articles, we need to support users in finding articles relevant to what they read. As a result, we have developed the CORE Recommender. This recommender is deployed within the CORE system to suggest relevant documents to the ones currently visited.
In addition, we also have a recommender plugin that can be installed and integrated into a repository system, for example, EPrints. When a repository user views an article page within the repository, the plugin sends to CORE information about the visited item. This can include the item’s identifier and, when possible, its metadata. CORE then replies with a list of suggested articles for reading, which is embedded in the repository page. These suggestions are generated by the CORE recommendation algorithm.
How does the CORE recommender algorithm work?
Because the CORE corpus is a large database of mainly textual documents, we apply content-based filtering to produce the list of suggested items. In order to discover semantic relatedness between the articles in our collection, we represent this content in a vector space model, i.e. we transform each document into a term vector and find similar documents by finding similar vectors.
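A minimal sketch of this idea, using plain TF-IDF term vectors and cosine similarity over toy documents (CORE's production pipeline is more sophisticated, but the principle is the same):

```python
import math
from collections import Counter

# Toy corpus; real documents would be full abstracts or full texts.
docs = {
    "d1": "open access repositories aggregate research papers",
    "d2": "aggregating open access research literature from repositories",
    "d3": "deep learning for image recognition",
}

def tfidf_vectors(corpus):
    """Turn each document into a dict of term -> TF-IDF weight."""
    tokenised = {d: text.lower().split() for d, text in corpus.items()}
    n = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenised.values() for t in set(toks))
    vectors = {}
    for d, toks in tokenised.items():
        tf = Counter(toks)
        vectors[d] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    num = sum(u[t] * v.get(t, 0.0) for t in u)
    den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def most_similar(doc_id, corpus):
    """Return the document whose vector is closest to doc_id's vector."""
    vecs = tfidf_vectors(corpus)
    return max((d for d in corpus if d != doc_id),
               key=lambda d: cosine(vecs[doc_id], vecs[d]))
```

On the toy corpus, `most_similar("d1", docs)` returns `"d2"`, since the two share several weighted terms while `"d3"` shares none.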
The CORE Recommender is deployed in various locations, such as on the CORE Portal and in various institutional repositories and journals. From these places, the recommender algorithm receives information as input, such as the identifier, title, authors, abstract, year, source URL, etc. In addition, we try to enrich these attributes with additional available data, such as citation counts, number of downloads, whether the full-text is available in CORE, and other related information. All of these form the set of features that are used to find the closest documents in the CORE corpus.
Of course, not every attribute is equally important. In our internal ranking algorithm we boost some attributes positively or negatively, which means that we weigh some fields more or less in order to achieve better recommendations. In the case of the year attribute, we go even further and apply a decay function over it: recent articles and articles published a couple of years ago get the same boosting (offset), while we reduce the importance of older articles by 50% every N years (half-life). In this way recent articles retain their importance, while older articles contribute less to the recommendation results.
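The decay described above can be sketched as follows; the offset and half-life values here are illustrative placeholders, not CORE's production parameters.

```python
from datetime import date

# Illustrative parameters, not CORE's actual values.
OFFSET_YEARS = 2      # articles up to this age keep full weight
HALF_LIFE_YEARS = 5   # weight halves every N years beyond the offset

def recency_boost(publication_year, current_year=None):
    """Weight in (0, 1]: flat for recent articles, halving every half-life."""
    current_year = current_year or date.today().year
    age = max(0, current_year - publication_year)
    if age <= OFFSET_YEARS:
        return 1.0
    return 0.5 ** ((age - OFFSET_YEARS) / HALF_LIFE_YEARS)
```

With these parameters, a 2016 article scored in 2016 keeps a weight of 1.0, while a 2009 article (five years past the offset, i.e. one half-life) is weighted 0.5.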
Someone may ask:
how do you know which weight to put in each field you are using? How did you come up with the parameters used in the decay function? read more...
In 2002, the Budapest Open Access Initiative (BOAI) brilliantly and simply encapsulated the aspirational qualities of ‘openness’ that funders, scholars, institutions, services and publishers have since driven forward. This simplicity has been lost in the detail of implementing funder mandates over copyright restrictions, resulting in significant administrative overheads to support staff whose primary role is to smoothly progress a cultural change. Although the momentum is undeniable, the transition to open scholarship is now fraught with complexity.
We are now three months into HEFCE’s open access policy. The urgency surrounding compliance requirements has, in some ways, been a useful tool in embedding good practice in the minds of our research staff. The emphasis has also ensured the speedy public dissemination of accepted research papers. Like most institutions, we have found that technical restrictions have limited our implementation of the policy to an intense institutional focus on internal compliance through our local repository. As a result, such internally driven workflows do not reflect the breadth of engagement with open scholarship or fully realise author compliance with HEFCE’s policy.
Brunel University London assessed their entire research outputs portfolio against the data services of CORE. The assumption prior to undertaking this task was that the global force of the open access movement would highlight the emergent open cultures across disciplines – a view currently inaccessible to request driven institutional services – and this may reveal duplication of effort for both academic and support staff. We share some of our initial discoveries here.
The transition towards open scholarship vs. policy drivers
Research is by nature collaborative. Authors share a responsibility in disseminating their outputs as widely as possible. As an institutional service, our role is to provide expert advice and open dissemination options for our authors, tailored to meet the needs of local research communities. We recognise our role as one part of a much wider and far-reaching landscape.
To this end, our systems deployment centres around our research information management system, Symplectic Elements. The system records our institution’s portfolio of research output, identifying, collating and relating bibliographic and other records from the myriad of services across the scholarly landscape. It drives our international impact through public web profiles. Crucially, it enables academics to easily push research papers through to our institutional repository (Brunel University Research Archive) for open access dissemination.
There is a recognition in HEFCE’s open access policy of the wide, varying and longstanding use of external repositories, although responsibility for managing the REF submission process lies at the institutional level. It is no small point that many of these have a special subject-based significance, often conceptualised and implemented by scholars themselves. Despite this support, a lack of visibility or control over these systems has necessarily informed a ‘one-size-fits-all’, compliance-driven workflow, which has been imposed on academics regardless of their use of subject repositories, or the engagement of collaborators with their own local repositories.
External repository systems now represent an unacceptable risk of non-compliance to an institution. It is data we too often cannot see and obviously cannot control. This position does not consider the suitability of the platform or the choice of the author in how they disseminate their paper. We are no longer enhancing the natural, researcher-driven workflows that are continuing to emerge across a huge and growing array of platforms, services and tools.
There is now a danger of alienating through bureaucracy those authors already committed to the cause and readily engaged in open practice, whilst simultaneously creating a culture of anxiety. In this environment, the true value of open scholarship within the research lifecycle is potentially reduced to the language of compliance and REF eligibility. Indeed, during an intensive advocacy campaign leading up to the implementation of HEFCE’s OA policy, we have not found the rate of deposits of current research to be significantly increased. Instead, we witnessed a marked rise in the deposit of legacy publications, many outside of the current REF cycle, and invariably final published versions which we were unable to archive. This hints at an atmosphere of panic amongst some of the academic staff we aim to support.
CORE insights for Brunel
CORE represents one of the most highly regarded aggregators of repository content and is fast becoming an essential part of scholarly infrastructure. Leaving aside the fascinating project work in semantic analysis, we have long felt this resource may offer a true insight into the collaborative, open publishing practices of our authors and, with a renewed REF focus, a new window for administrators who support compliance. This view is perhaps shared by JISC, who are now actively supporting the project.
We consider our CRIS to be the authoritative record of current research at our institution. We ran our lists of article titles and DOIs through CORE to see if we could identify any publications available:
in any external repository harvested by CORE and in our institutional repository, which might suggest a level of duplicate effort for compliance.
only in any external repository harvested by CORE, which might indicate a truer figure of readily HEFCE compliant outputs.
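A sketch of how such a matching exercise might be scripted, assuming normalised DOIs as the join key. The helper functions and field handling are hypothetical illustrations, not the actual Brunel/CORE workflow.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common URL/prefix forms for comparison."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def classify(cris_dois, core_external_dois, core_institutional_dois):
    """Split a CRIS portfolio into the two categories described above:
    found both externally and in the institutional repository, or
    found only in an external repository harvested by CORE."""
    external = {normalise_doi(d) for d in core_external_dois}
    institutional = {normalise_doi(d) for d in core_institutional_dois}
    both, external_only = [], []
    for doi in cris_dois:
        d = normalise_doi(doi)
        if d in external and d in institutional:
            both.append(doi)
        elif d in external:
            external_only.append(doi)
    return both, external_only
```

Normalising both sides before comparison matters in practice, since CRIS and aggregator records often store the same DOI in different forms (with or without a URL prefix, in mixed case).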
We have found a huge increase in the identification of Brunel affiliated outputs available in external repository systems.
Fig 4. The distribution of Brunel publications across CORE (data from CORE and Symplectic Elements). Graph produced using Google Charts.
One finding was that only around 1 in 3 papers are to be found in our repository. The majority are spread across alternative resources. A significant proportion of our repository content is not currently being harvested by CORE as the on-acceptance mandate has led many academics to deposit non-PDF filetypes – an area the CORE team are working to address in future development.
We have only found evidence of a small number of publications (less than 1%) being deposited in both the Brunel University Research Archive and the same paper being available in other repositories harvested by CORE. This suggests that, so far, concerns about duplication of effort have not been realised. However, it should be reiterated that we have only recently entered the period of the REF policy and CORE only harvests publications where a full PDF text is available. After a year (or more), once embargoes begin to expire, this picture may begin to look quite different. For now, the distribution of individual publications appears to be quite evenly spread.
This is confirmation of accepted, and encouraged, academic practice. Namely that a huge number of our publications are widely discoverable in alternative, but suitable, repositories. This is perhaps the most unsurprising outcome given the scale of collaborative projects, funder and publisher policies, the regular migration of staff between institutions and the researcher driven use of subject based resources to disseminate research. The map below highlights the global distribution of Brunel’s research on CORE, beginning to present a wonderful picture of our research as a collaborative enterprise.
However, even this raises some questions. We would expect our partnerships in China, Australia, Japan, India and other parts of the world to have a far greater representation here, especially when international governmental and funder mandates are considered.
Some of the data in our CRIS already contains details of externally held files. These files are not necessarily held in Elements or an institutional repository; the records purely hold file metadata pointing to holdings at an external data source, for example arXiv or Europe PMC. We have found some discrepancies by comparing this data with that found in CORE. If CORE is to be an essential component in future research service infrastructure, deposits must be completely harvested so as to enhance the efforts undertaken by scholars and institutions. So too, the flow of scholarly metadata must improve. It is essential that our ambition to uniquely attribute authors to their scholarly outputs (through initiatives like ORCID) is fully realised to form the underpinning of this vital research infrastructure.
[Table: externally held files found in Elements vs. found in CORE, by source (e.g. Europe PubMed Central); figures not preserved.]
CORE provides an opportunity for a simple, retrospective measure of academia’s inexorable move towards ‘open’. The collective action of academics in disseminating their research cannot be overlooked in this transitional period. Policy compliance is an important factor in the movement, but so too are the traits required for twenty-first century research, namely discoverability, reach, impact and engagement.
Open research transcends borders and policies, and we can see this reflected from the available data in CORE. We see a global community working together in common cause to maximise the communication of their research.
As an institutional service, we are driven by the responsibility we feel for the researchers in our community who require our support, encouragement and guidance. Policy drivers must enhance, and not inhibit, the developing practice of scholars who are rightly taking ownership of dissemination as an integral part of the research lifecycle.
The CORE service might help contextualise the global realities of open access and academic practice as we transition toward 100% open scholarship. There is the potential for CORE to help us re-simplify the agenda, whilst making the process more than just a ‘tick-box’ exercise. In doing so, support staff can then be released from an excessive administrative burden and instead focus greater effort on promoting open scholarship within their institutions.
* Post updated on June 20th and June 23rd with links to presentations.
At this year’s Open Repositories 2016, an international conference aimed at the scholarly communications community with a focus on repositories, open access, open data and open science, CORE had 6 items accepted: 1 paper, 1 workshop, 1 Repository Rave presentation, 1 poster and 2 showcases in the Developer Track and Ideas Challenge. The titles and summaries of our accepted proposals are:
Paper: Exploring Semantometrics: full text-based research evaluation for open repositories / Knoth, Petr; Herrmannova, Drahomira
Over the recent years, there has been a growing interest in developing new scientometric measures that could go beyond the traditional citation-based bibliometric measures. This interest is motivated on one side by the wider availability or even emergence of new information evidencing research performance, such as article downloads, views and twitter mentions, and on the other side by the continued frustrations and problems surrounding the application of citation-based metrics to evaluate research performance in practice. Semantometrics are a new class of research evaluation metrics which build on the premise that full text is needed to assess the value of a publication. This talk will present the results of an investigation into the properties of the semantometric contribution measure (Knoth & Herrmannova, 2014). We will provide a comparative evaluation of the contribution measure with traditional bibliometric measures based on citation counting. Our analysis also focuses on the potential application of semantometric measures in large databases of research papers.
Workshop: Mining Repositories: How to assist the research and academic community in their text and data mining needs – a workshop / Pontika, Nancy; Knoth, Petr; van Dijke, Hege; Anastasiou, Lucas
Over the past five years there has been significant interest in text and data mining (TDM) practices from the European Union (EU). In scholarly communication, TDM is already a developed practice in some scientific fields, for example in the life sciences and computer science. Nonetheless, after a call that we sent out to the United Kingdom Council of Research Repositories (UKCoRR) listserv, we discovered that there was only a limited number of TDM projects that had the repositories’ collections as their primary source of information. To address this challenge, the EU-funded project OpenMinTeD looks to enable the creation of an infrastructure that fosters and facilitates the use of TDM technologies in the scientific publications field, targeting both domain users and TDM experts. In this context we propose a three-hour workshop, where we will introduce the topic of TDM to the repositories community, explore how the OpenMinTeD project aims to assist with the adoption of TDM practices, and present existing TDM projects that were conducted using text and data from repositories.
Repository Rave presentation: Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service / Cancellieri, Matteo; Pontika, Nancy
The COnnecting REpositories (CORE) project aims to aggregate content from open access repositories and journals and distribute this content via one central endpoint, facilitating the open access dissemination of scientific research. In an effort to improve the quality and transparency of the aggregation process of the open access content and create a two-way collaboration between the CORE project and the providers of this content, CORE has created the Repositories Dashboard. The RIOXX metadata application profile aims to assist repository managers in tracking compliance with the Research Councils UK Policy on Open Access and Guidance. In this Repository Rave session we will present how CORE is implementing the RIOXX metadata in the CORE Dashboard.
Poster: Integration of IRUS-UK statistics in the CORE Repositories Dashboard / Pearce, Samuel; Pontika, Nancy
The COnnecting REpositories (CORE) project aims to aggregate content from open access repositories and journals, and distribute this content via one central point, facilitating the open access dissemination of scientific research. Institutional Repository Usage Statistics UK (IRUS-UK) is a Jisc-funded project that serves as a national repository usage statistics aggregation service, which aims to provide article download statistics from UK repositories. At CORE, we wanted to present download information for manuscripts to repository managers and, therefore, we have integrated IRUS-UK statistics into the CORE Repositories Dashboard. In this poster we will present a) the submission process of the IRUS-UK statistics and b) how CORE retrieves these statistics and displays them to UK Higher Education Institutions (HEIs).