How to Give Scholarly Publishing a Digital Facelift – Using Blockchain for Peer Review

Blockchain, you ask — isn’t that something to do with bitcoin and other cryptocurrencies?

While it was indeed first widely used for bitcoin, blockchain is a versatile technology; it is neither a cryptocurrency itself nor useful only for building one. At its core, blockchain is a technology for creating an immutable, trustworthy, distributed data structure. Its distributed nature is one of its essential features: rather than residing on some central server, a copy of a blockchain, with all of the information stored on it, exists on every participating node in its network, a node being a computer or server running the appropriate software. For our project the nodes will ultimately be the publishers and organizations collaborating with us, but the finer details of who gets to be a node and what they can do depend on the way a blockchain is designed. A cryptocurrency like bitcoin, for instance, has a “public” blockchain, which means that anyone who runs the bitcoin software can operate a node. Our blockchain is, by comparison, “private.” But whether private or public, a blockchain has no central data store.

Why is this relevant to our project?

The peer review process, as hardly needs telling, is imperfect, and one of its central difficulties is a lack of transparency, the natural result of many different publishers and journals running many different peer review systems. Our initiative aims to introduce transparency into peer review by making crucial data about the process available across a shared platform that can comply with the demands of confidentiality and privacy. We at Katalysis believe that a blockchain-based platform is desirable for this purpose because the data on it isn’t owned or controlled by one entity. This provides a guarantee to the collaborators we bring on board that they will be operating in an environment without a central gatekeeper (neither we the developers nor any one publisher) who controls access to the data and makes decisions about how it is added and maintained.

This raises two questions:

  1. How, in terms of technology, is data managed on a platform without a central administrator?
  2. And, separately, can a shared platform respect privacy and confidentiality?

I’ll address the second question first, and then come back to the first, technical question.

It’s crucial to emphasize that, because a blockchain is a shared platform, the data on it is inherently available to all of its nodes, unlike data in a central database, where an admin can easily lock down certain pieces of information. Nonetheless, we will use the pseudonymous nature of a blockchain to ensure that our system is both GDPR compliant (privacy) and compatible with different forms of review, be it open, blind, or double-blind (confidentiality). Our approach might be described as divide and conquer: different pieces of information, such as a manuscript, the reviewers (i.e. an ORCID iD), and the reviewee, are not linked on the blockchain itself. Additionally, they are pseudonymized, effectively meaning that each is addressed by a random string of numbers. These pieces of information only come together off the blockchain, when a user with the appropriate authority, who essentially knows what they’re looking for, combines them on their own end. For personal privacy, this means that any information that could be pieced together to infer the identity of an individual will be segregated. Furthermore, we will not store explicit personal identifying information on the blockchain at all (a blockchain is immutable, so this would conflict with GDPR’s Right to be Forgotten). Rather, we will outsource the handling of personal information to other platforms such as ORCID.
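To make the divide-and-conquer idea a little more concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions rather than a description of our implementation: the keyed hash (HMAC-SHA256), the record layout, the identifiers, and the names pseudonym, on_chain_records and off_chain_links are all hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable, opaque pseudonym for an identifier.

    Assumption: a keyed hash (HMAC-SHA256) stands in for whatever
    pseudonymization scheme a real system would use.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret, held off-chain by a user with the appropriate authority.
key = secrets.token_bytes(32)

# Hypothetical identifiers, for illustration only.
reviewer_id = "0000-0000-0000-0000"  # placeholder in the shape of an ORCID iD
manuscript_id = "manuscript-123"

# What would live on the shared ledger: separate, unlinked, pseudonymized records.
on_chain_records = [
    {"type": "reviewer", "ref": pseudonym(reviewer_id, key)},
    {"type": "manuscript", "ref": pseudonym(manuscript_id, key)},
]

# The link between the pieces is kept off-chain, on the authorized user's own end.
off_chain_links = {
    pseudonym(manuscript_id, key): [pseudonym(reviewer_id, key)],
}
```

Someone who sees only on_chain_records learns that a reviewer and a manuscript exist, but not who the reviewer is or which manuscript they reviewed. Only a party holding both the key and the off-chain links can bring the pieces back together.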

That interlude aside, there is still the nagging question of the mechanism for legitimately adding data to a platform without a gatekeeper, so to speak. A blockchain is considered trustworthy (or “trustless,” as it is often described) because changes cannot be made willy-nilly by any one node on the blockchain, but only via a platform’s pre-determined consensus algorithm. At the highest level, a consensus algorithm is just what it sounds like: a mechanism through which consensus is reached amongst nodes on the blockchain about what new information is going to be added to it. Or, in more technical terms, it determines which transactions (in a general, not necessarily financial sense) are going to be validated. This consensus process is at the core of what makes a blockchain-based system trustless. There is no single administrator whom we have to trust to enter data correctly and efficiently, refrain from censorship or alteration, and secure the integrity of the data from malicious actors. Rather, trust is distributed amongst all the nodes on the blockchain.
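As a rough illustration of what “consensus” means here, consider a toy voting rule in Python. The more-than-two-thirds threshold is my assumption, borrowed from Byzantine-fault-tolerant designs such as Tendermint; the function name reaches_consensus and the node names are hypothetical, and a real consensus protocol involves several rounds of messaging that this sketch leaves out.

```python
def reaches_consensus(votes: dict[str, bool], power: dict[str, int]) -> bool:
    """Return True if the approving nodes hold more than 2/3 of total voting power.

    Assumptions: `power` maps each known node to its voting power, and
    `votes` maps a node to True (approve) or False (reject). Nodes that
    did not vote, or that are unknown, contribute nothing.
    """
    total = sum(power.values())
    approving = sum(power.get(node, 0) for node, approved in votes.items() if approved)
    # Integer comparison avoids floating-point rounding: approving / total > 2/3
    return 3 * approving > 2 * total


# Four hypothetical nodes with equal voting power.
power = {"node_a": 1, "node_b": 1, "node_c": 1, "node_d": 1}

three_of_four = reaches_consensus(
    {"node_a": True, "node_b": True, "node_c": True, "node_d": False}, power
)
two_of_four = reaches_consensus({"node_a": True, "node_b": True}, power)
```

With three of four equal nodes approving, the transaction is accepted (three quarters exceeds two thirds); with only two, it is not. No single node can push a change through on its own.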

Our pilot specifically will use the Proof of Stake (PoS) consensus mechanism of Tendermint Core. A bit of context: with a system such as bitcoin, transactions are validated through a mechanism called Proof of Work (PoW). Nodes called “miners” must compete to validate transactions by solving a computationally intensive problem. The solution, which is easy to verify once found, is the proof of work that allows a miner to then add a new block to the blockchain and collect a financial reward. PoW’s main drawback, however, is that it is extremely energy intensive and not environmentally sustainable. PoS is an alternative model whereby the frequency with which a validator node gets to create blocks is proportional to its stake in the project, by whatever metric that is defined. Beyond being more sustainable, PoS is well suited to a private blockchain such as ours because, rather than seeking to incentivize individuals across the world to validate our transactions, we only want the validators to be the publishers involved in the project. A third advantage of PoS is that it fits the structure of our organization, because the power of validators is proportional to their stake in the project, not to the arbitrary factor of computing power.
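The idea that a validator creates blocks in proportion to its stake can be sketched with a simple weighted round-robin in Python. This is a simplified illustration in the spirit of stake-weighted proposer rotation, not Tendermint’s actual code; the function name proposer_schedule, the publisher names, and the stake values are hypothetical.

```python
from collections import Counter


def proposer_schedule(stakes: dict[str, int], rounds: int) -> list[str]:
    """Pick one block proposer per round, in proportion to stake.

    Each round every validator's priority grows by its stake; the
    validator with the highest priority proposes the block and then
    has the total stake subtracted from its priority.
    """
    priority = {name: 0 for name in stakes}
    total = sum(stakes.values())
    schedule = []
    for _ in range(rounds):
        for name, stake in stakes.items():
            priority[name] += stake
        # Ties are broken alphabetically so the schedule is deterministic.
        chosen = max(sorted(priority), key=lambda name: priority[name])
        priority[chosen] -= total
        schedule.append(chosen)
    return schedule


# Hypothetical stakes for three publishers.
stakes = {"publisher_a": 3, "publisher_b": 2, "publisher_c": 1}
schedule = proposer_schedule(stakes, rounds=6)
counts = Counter(schedule)
# Over 6 rounds, publisher_a proposes 3 blocks, publisher_b 2, and publisher_c 1.
```

Unlike PoW, nothing here depends on who has the most computing power: the schedule follows directly from the stakes the collaborators have agreed upon.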

Having described how changes are made, I ought to address a final issue: what it means for a blockchain to be “immutable,” as I mentioned early on. Immutability arises from the blockchain’s structure, a sequence of “blocks” of information linked by cryptographic hashes. This is a complicated way of saying that every block contains a value, the “hash,” that depends on the previous block in the chain. The result is that no individual can surreptitiously change information in one block of their copy of the chain: doing so changes that block’s hash, which breaks the link to every block after it and exposes the tampering. Legitimate changes are made by adding more data to the end of the chain, like a transaction that cancels out a previous transaction on a ledger (this is, in fact, precisely how transactions occur on a blockchain-based cryptocurrency system such as bitcoin). But the old data is still there, preserved in the digital equivalent of a paper trail. This gives a blockchain-based system the added feature of transparency, since it preserves a record of all the changes that have been made since its inception.
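The hash linking described above is simple enough to sketch in a few lines of Python. This is a toy model under my own assumptions (SHA-256 over a JSON encoding of each block, and the hypothetical names block_hash, append_block and is_valid), not the data format of our platform or of Tendermint.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list[dict], data: str) -> None:
    """Add a new block to the end of the chain, linked to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain: list[dict]) -> bool:
    """Check that every block's hash is correct and links to its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


chain: list[dict] = []
append_block(chain, "review submitted")
append_block(chain, "review accepted")
# A legitimate change is a new block appended at the end, not an edit in place.
append_block(chain, "correction: review withdrawn")
valid_before = is_valid(chain)

chain[1]["data"] = "review rejected"  # a surreptitious edit to an old block
valid_after = is_valid(chain)
```

Here valid_before is True and valid_after is False. Even if the tamperer recomputes the hash of the edited block, the next block still points at the old hash, so the check keeps failing until every later block has been rewritten as well.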

In sum, then, a blockchain-based system for peer review has a number of advantages due to its structure and distributed nature:

  1. It is trustless, because changes are made through consensus amongst collaborators, not one administrator.
  2. It is immutable, meaning that changes can only be made by adding new data through the consensus process.
  3. And finally, it is transparent, because it preserves a complete record of all the changes that have been made across its history.

For these reasons, we see it as a promising technology for our attempt to bring peer review a few steps into the future.

View the original blog post on the Katalysis website.