The last 10 years have seen a massive increase in the number of Open Access publications available in journals and institutional repositories. The open presence of large volumes of state-of-the-art knowledge online has the potential to provide huge savings and benefits in many fields. However, in order to fully leverage this knowledge, it is necessary to develop systems that (a) make it easy for users to discover, explore and access this knowledge at the level of individual resources, (b) allow users to explore and analyse this knowledge at the level of collections of resources and (c) provide infrastructure and access to raw data in order to lower the barriers to the research and development of systems and services on top of this knowledge. The CORE system aims to address these issues by providing the necessary infrastructure.
According to the level of abstraction at which a user communicates with an aggregation system, it is possible to identify the following types of access:
- Raw data access
- Transaction access
- Analytical access
With these access types in mind, we can think of the different kinds of users of aggregation systems and map them according to their major access type. The table below lists the main kinds of users and explains how aggregations can serve them. It shows that most user groups will expect to communicate with an aggregation system in a specific way. While developers are interested in accessing the raw data, for example through an API, individuals will primarily require access to the content at the level of individual items or relatively small sets of items, mostly expecting to communicate with a digital library (DL) using a set of search and exploration tools. A rather distinct group of users are eResearchers, whose work is largely motivated by information communicated at the transaction and analytical levels, but who in practice depend mostly on raw data access, typically realised using APIs and downloadable datasets.
Type of information access | What it provides | User groups |
Raw data access | Access to the raw metadata and content as downloadable files or through an API. The content and metadata might be cleaned, harmonised, preprocessed and enriched. | Developers, DLs, DL researchers, companies |
Transaction information access | Access to information primarily with the goal of finding and exploring content of interest, typically realised through a web portal and its search and exploratory tools. | Researchers, students, life-long learners |
Analytical information access | Access to statistical information at the collection or sub-collection level, often realised through tables or charts. | Funders, government, business intelligence |
The figure below depicts the inputs and outputs of an aggregation system showing the three access levels. Based on the access level requirements of the individual user groups, we can specify the services needed to support them. Various existing OA aggregation systems focus on providing access at one or more of these levels. While together they cover all three access levels, none of them supports all three on its own. The central question is whether it is sufficient to build an OA infrastructure as a set of complementary services, each supporting a specific access level and together supporting all of them. An alternative solution would be a single system providing support for all access levels.
One can argue that out of the three access levels, the most essential one is the raw data access level, as all the other levels can be developed on top of this one. This suggests that the overall OA infrastructure can be composed of many systems and services. So, why does the current infrastructure provide insufficient support for these access levels?
All the needed functionality can be built on top of the first access level, but the current support for this level is very limited. In fact, there is currently no aggregation of all OA materials that would provide harmonised, unrestricted and convenient access to OA metadata and content. Instead, we have many aggregations, each of which supports a specific access level or user group, and most of which rely on different datasets. As a result, analysts cannot draw firm conclusions about the OA data, and individuals cannot be reliably informed about what is in the data. Most importantly, it is very difficult for eResearchers and developers to provide better technology for the upper access levels when their own access to OA content is limited or at least complicated.
To exploit the opportunities OA content offers, OA technical infrastructure must support all the listed access levels users need. This can be realised by many systems and services, but it is essential that they operate over the same dataset.
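To make the distinction concrete, the three access levels can be sketched as three functions operating over one shared dataset. This is only a toy illustration in Python, not CORE's implementation or API: the records, field names and function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; the field names are illustrative and are not CORE's schema.
DATASET = [
    {"id": 1, "title": "Open Access text mining", "repository": "Repo A"},
    {"id": 2, "title": "Semantic search in digital libraries", "repository": "Repo B"},
    {"id": 3, "title": "Text mining of research papers", "repository": "Repo A"},
]


def raw_data_access():
    """Raw data access: hand over every record for further processing."""
    return [dict(record) for record in DATASET]


def transaction_access(query):
    """Transaction access: find individual items whose title contains the query."""
    needle = query.lower()
    return [r for r in DATASET if needle in r["title"].lower()]


def analytical_access():
    """Analytical access: collection-level statistics (records per repository)."""
    return Counter(r["repository"] for r in DATASET)


if __name__ == "__main__":
    print(len(raw_data_access()))
    print([r["id"] for r in transaction_access("text mining")])
    print(dict(analytical_access()))
```

Because all three functions read from the same `DATASET`, a search result and a collection statistic can never disagree about what the collection contains, which is the property argued for above.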
The CORE system provides a range of services for accessing and exposing the aggregated data. At the moment, the services are delivered through the following applications: CORE Portal, CORE Mobile, CORE Plugin, CORE API and Repository Analytics.
The CORE applications convey information to the user at all three levels of abstraction. The CORE API communicates information in the form of raw data that typically requires further processing before it can be used in a specific context. CORE Portal, CORE Mobile and CORE Plugin all make use of a user interface to convey information at the level of individual articles. Finally, Repository Analytics provides information at the level of the whole collection or sub-collections.
Since CORE supports all three types of access, it also provides certain functionality for all the user groups identified in the table above, on a single dataset and at the level of the content (not just metadata). We do not claim that CORE provides all the functionality these user groups need (CORE is still in its infancy, and we expect to improve the existing services and add new ones on a regular basis), but we do claim that this combination provides a healthy environment on top of which the overall OA technical infrastructure can be built. To give an example, it allows eResearchers to access the dataset and experiment with it, for instance to develop a method for improving a specific task at the transaction level (such as search ranking) or the analytical level (such as trends visualisation). The crucial aspect is that the method can be evaluated against existing services already offered by CORE (or anybody else) built on top of the CORE aggregated dataset, i.e. the researcher has the same level of access to the data as all the CORE services. The method can then also be implemented and provided as a service on top of this dataset. The value of such infrastructure lies in the ability to interact with the same data collection at any point in time at the three different levels.
A question one might ask is why an aggregation system like CORE should provide support for all three access levels when many might see the main job of an aggregator as just aggregating and providing access. As we previously explained, the whole OA technical infrastructure can consist of many services, provided that they are built on the same dataset. While CORE aims to support others in building their own applications, we also recognise the needs of different user groups (apart from researchers and developers) and want to support them. While this might seem like a dilution of effort, our experience indicates that about 90% of developers' time is spent on aggregating, cleaning and processing data and only the remaining 10% on providing services, such as CORE Portal or Repository Analytics, on top of this data. It is therefore not enough for research papers to be Open Access. The OA technical infrastructures and services should also be metaphorically “open access,” opening new ways for the development of innovative applications and allowing analytical access to the content, while at the same time providing all the basic functions users need, including searching and accessing research papers.