CORE’s unique position, with a global view of all open repositories, enables us to work closely with our member organisations to develop and deliver tools that benefit repositories and repository managers. CORE recently introduced the CORE Dashboard Versions and Duplicates module, which provides a simple interface for identifying versions and duplicates in a repository. The system identifies different versions of articles and enables them to be reviewed side by side. The different versions can then be marked using the widely used NISO Journal Article Versions (JAV) taxonomy. You can read a full overview of the new module in this recent blog post.
Figure 1: Screenshot from the deduplication module
CORE’s Michael Upshall recently met with Kirsten Vallee, Repository Services Manager at The University of Chicago, to discuss how their institution is benefitting from the new deduplication module.
Michael: Having duplicates in a repository can cause issues. Why is identifying these duplicates important for The University of Chicago?
Kirsten: “Duplicates in the repository matter because we do not want to replicate records, thereby replicating the statistics associated with them. In order to obtain the best measurements of the institutional repository’s broader impact, we need to have an accurate representation of its reach. This is hindered if an article is duplicated in the repository, which makes retrieving that data more complicated.”
Michael: Have you previously been checking for duplicate records or is this something that was not possible before?
Kirsten: “We do not have a duplicate check built into our repository, so CORE’s duplicate check is immensely helpful. While other repositories might already have duplicate checks, I don’t know about their usage of ML to find duplicates. Based on my experience, I would guess that a check isn’t as accurate if it relies primarily on exact matches. CORE’s duplicate check displays the ‘confidence’ it has that a record is duplicated in the repository, which is also helpful when making the final decision on whether a duplicate record needs to be deleted.”
Michael: Does the duplicate detection work as expected? Does it find the correct duplicate records?
Kirsten: “In terms of sensitivity, I have found CORE’s duplicate check to be sensitive to duplicates in the repository, while allowing the user to be the final decision maker on whether or not the item is indeed a duplicate. This level of sensitivity may be considered too high, but I’d say it is just the right amount.”
The Versions and Duplicates module is one of a range of tools and services delivered by CORE to support the open repositories network. We are continually working to improve what we do, and it is by working with, and listening to, our members that we can best deliver on our mission.
If your institution is not yet a Supporting or Sustaining member of CORE, please do consider joining the amazing institutions that have committed to the ongoing sustainability of CORE.