A Deeper Dive into our Proof of Concept

In our previous post we gave a high-level introduction to the Proof of Concept for our Blockchain for Peer Review initiative. We received many questions, and you were keen to understand exactly what we store on the blockchain. In this post Katalysis, a founding partner responsible for the technical implementation of the initiative, goes into more detail.

RECAP: THE WHO, WHAT AND WHY

Digital Science, Springer Nature, Taylor & Francis, Cambridge University Press, ORCID, Katalysis and the Wellcome Trust, as advisors, joined forces to form an industry-wide not-for-profit initiative with the intent to solve the challenges around the peer review process. It’s not news that peer review, the process of subjecting an author’s manuscript to the scrutiny of experts in the same field, today faces increasing challenges:

  • Increasing difficulty in finding and identifying suitable and available reviewers. This is due to the growth in research outputs, including rapid growth from emerging economies.
  • Lack of reviewer recognition.
  • Fraud and manipulation.
  • Lack of transportability of review.
  • Lack of transparency and decreasing trust in this process.

This initiative aims to open up the black box of the peer review process. To make the process more efficient, transparent and trustworthy, we aim to create a neutral infrastructure for sharing peer review data across the ecosystem while respecting the confidential nature of the process. The goal is a solution that benefits all participants equally, including publishers, funders and institutions.

HIGH-LEVEL ARCHITECTURE

Below you see a high-level overview of the architecture we built for the Proof of Concept of the Blockchain for Peer Review initiative.

We start by extracting data from the internal databases of the publisher’s Manuscript Management Systems and parsing it into various pieces of data. These processes all happen within the publisher’s firewall in order to comply with the General Data Protection Regulation (GDPR). In the diagram below, everything inside the orange dotted line runs within the publisher’s firewall.

The extracted data is distributed to several destinations: one is the blockchain and, in our Proof of Concept, the other is ORCID. The Peer Review Query system, which remains within the publisher’s firewall, allows publishers to give external parties access to parts of their peer review data that they may not want to store on the blockchain for various reasons. The User Management System represents the user-facing interface, for instance a web-based user interface, through which users interact with the various parts of the system to retrieve data.

High-level architecture for the Blockchain for Peer Review initiative

PARSER

As explained above, we developed a parser which extracts data from the Manuscript Management Systems. The parser splits the data into three parts:

  1. Personal Data: data that can identify reviewers on a personal level. This data, which falls under GDPR in Europe, never leaves the publisher’s systems unless there is a specific agreement with the reviewer, and it is never stored on the blockchain.
  2. Public Data: relationships between various entities that come from the internal systems, for instance the state a review is in, other relevant metadata and the permissions to access the internal data. We store this information to be able to prove that a certain relationship exists and that the reviewer has performed a specific review.
  3. Internal Data: data we normalise and store in an internal database. This can be sensitive and confidential data, for instance data that could give a competitive advantage. The publisher does not necessarily wish to share this data with others, or at least not without a permissioning layer. The intent is to develop a sophisticated, fine-grained permissioning system on the blockchain that grants permission to query this data.
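
The three-way split above can be sketched in a few lines of Python. This is only an illustration, not the parser used in the Proof of Concept: the field names and the `split_record` helper are hypothetical, since the export format of the Manuscript Management Systems is not described in this post.

```python
from dataclasses import dataclass

# Hypothetical field names, chosen only for illustration.
PERSONAL_FIELDS = {"reviewer_name", "reviewer_email"}
PUBLIC_FIELDS = {"manuscript_id", "journal", "timestamp", "review_state"}


@dataclass
class ParsedRecord:
    personal: dict  # stays inside the publisher's systems
    public: dict    # candidate data for the blockchain
    internal: dict  # normalised into the publisher's internal database


def split_record(raw: dict) -> ParsedRecord:
    """Route every field of a raw record into exactly one of the three parts."""
    personal = {k: v for k, v in raw.items() if k in PERSONAL_FIELDS}
    public = {k: v for k, v in raw.items() if k in PUBLIC_FIELDS}
    # Anything not explicitly classified is treated as internal.
    internal = {k: v for k, v in raw.items()
                if k not in PERSONAL_FIELDS and k not in PUBLIC_FIELDS}
    return ParsedRecord(personal, public, internal)
```

Treating unclassified fields as internal is a deliberate choice in this sketch: a field only becomes public when someone has explicitly listed it as such.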

Parser

WHAT WE STORE ON THE BLOCKCHAIN

Let us dive a bit deeper into what we store on the blockchain. The data extracted from the Manuscript Management Systems mostly concerns the review state, the manuscript, the journal and the reviewer. As discussed above, however, none of the data we store on the blockchain is personal information. What we store is:

  • The manuscript id
  • Journal
  • Timestamp
  • Review State
  • Publishers’ signed shared secret
  • Reviewers’ signed shared secret

The manuscript id, journal and timestamp, none of which is personally identifiable information, are extracted from the publisher’s data. The Review State in this Proof of Concept was determined by analysing sample data from 26 journal files extracted from Editorial Manager and identifying the overlaps. The review state indicates, for instance, whether the reviewer has been invited, has accepted the invitation, or is not responding to the invitation. These states are stored and linked to a manuscript and to an entry that the reviewer can claim with their public key, showing that she or he was part of the peer review process.
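
To make the shape of such an entry concrete, here is a minimal Python sketch of a record holding the six items listed above. The class and field names, the timestamp unit and the byte encoding of the secrets are assumptions made for illustration, and the enum contains only the three review states named in this post, not the full set used in the Proof of Concept.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewState(Enum):
    # Only the states mentioned in the text; the real set is larger.
    INVITED = "invited"
    ACCEPTED = "accepted"
    NOT_RESPONDING = "not_responding"


@dataclass(frozen=True)
class ReviewEntry:
    manuscript_id: str
    journal: str
    timestamp: int                   # assumed here: Unix time in seconds
    review_state: ReviewState
    publisher_signed_secret: bytes   # assumed encoding: raw bytes
    reviewer_signed_secret: bytes    # assumed encoding: raw bytes
```

The entry has no field for a reviewer name or e-mail address, which mirrors the rule that personal data is never stored on the blockchain.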

Review states in the Proof of Concept

The reviewer’s name is never stored as is on the blockchain, to avoid revealing anyone’s identity. Before storing data on the blockchain, we remove the reviewer from the data set and replace them with a random shared secret. This secret is encrypted with the reviewer’s public key, creating the “reviewer encrypted shared secret”. Because the secret differs per manuscript, it cannot be used to identify the reviewer. The secret is also signed with the publisher’s private key, creating the “publisher signed shared secret”. These shared secrets, which we store on the blockchain, are randomly generated secrets between the publisher and the reviewer. By presenting the shared secret, a reviewer can therefore prove that they are part of the relationship, in other words that they performed the review. Anyone who has been given the shared secret, or a proxy to it, can verify with the publisher’s public key that the reviewer was part of the review. The publisher, in turn, is authenticated by the fact that it can prove to external parties that it was present when the secret was generated.

The information stored on the blockchain itself is meaningless unless you can prove the secret.
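
The sign-and-verify step can be illustrated with a small sketch that uses only the Python standard library. It is not the Proof of Concept’s code. This post does not name the signature scheme, so a Lamport one-time signature built on SHA-256 stands in for the publisher’s key pair, and the step that encrypts the secret with the reviewer’s public key is left out. All function and variable names are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def keygen():
    """Return (private_key, public_key). A Lamport key must sign ONE message only."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, message: bytes) -> list:
    # Reveal one private value per bit of the message digest.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(part) == public[i][bit]
               for i, (part, bit) in enumerate(zip(signature, _bits(message))))


# Publisher side: a fresh random secret per manuscript, signed with the private key.
publisher_private, publisher_public = keygen()
shared_secret = secrets.token_bytes(32)
publisher_signed_secret = sign(publisher_private, shared_secret)  # stored on chain

# Verifier side: whoever is shown the secret checks it against the stored
# signature using only the publisher's public key.
claim_ok = verify(publisher_public, shared_secret, publisher_signed_secret)
forged_ok = verify(publisher_public, secrets.token_bytes(32), publisher_signed_secret)
```

In this sketch the stored signature on its own reveals nothing useful about the secret, but anyone holding the secret can check it against the publisher’s public key. Lamport keys are single-use, so a real deployment would rely on a reusable signature scheme; it is used here only to keep the example free of third-party dependencies.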

Katalysis Proof of Concept Architecture

TECHNOLOGIES USED

The technologies we used for the Proof of Concept:

  • The blockchain is EVM-compatible and runs on a private Ethermint network on top of Tendermint (Proof of Stake)
  • Processes are Swift binaries running on Ubuntu Linux (16.04+)
  • Processes run as microservices designed to be containerised

Our technology stack

NEXT PHASE: MINIMUM VIABLE PRODUCT

The next phase for us is to integrate the parser and the query server into a single node: a peer review node. This node will run on the publisher’s premises and allows the publisher to exchange data once permission has been given at both ends. Of course, when a third party covered by GDPR is involved, their consent is required as well. The peer review node aggregates data from multiple nodes, sends data to permissioned parties and confirms the veracity of review claims. Furthermore, it processes the data and writes it to the blockchain.
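
The permission rule described above can be written down as a one-line check. This is a sketch of the rule as stated in this post, not of the planned node software, and the function name and parameters are hypothetical.

```python
def may_exchange(sender_permits: bool, receiver_permits: bool,
                 third_party_consents: list) -> bool:
    """True only if both nodes permit the exchange and every involved
    third party has consented. An empty list means no third party is involved."""
    return sender_permits and receiver_permits and all(third_party_consents)
```

For example, `may_exchange(True, True, [])` allows the exchange, while `may_exchange(True, True, [True, False])` blocks it because one third party has not consented.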

High-level MVP

If you’re interested in why such a system improves the peer review process, we encourage you to follow the initiative on Twitter or subscribe to the newsletter.

View Katalysis’s original blog post here