Lamport proposed Paxos Commit algorithm as an alternative to 2PC for achieving Transaction Commit. It’s a very natural application of the Paxos algorithm (if you have read this post, where I pointed out Paxos is effectively itself a read-modify-write transaction protocol.)
There’s one subtle difference between 2PC and Paxos Commit that is not addressed in Lamport’s paper or anywhere I can find. It’s about how it enforces constraints. Taking databases for example, values are sometimes associated with constraints (e.g. y can never to less than 0, x can never be greater than 1, etc.). When you are running transactions that touches values with constraints, the correctness of 2pc is straightforward, because every participant in a transaction has veto power. It makes sure all invariants are honored before moving to the commit phase.
But how does this work at all for Paxos Commit? Paxos Commit distributes the decision of a transaction ( commit or abort) to the participants (a.k.a RM/Resource Managers, Acceptors). When
y is involved in a Paxos Commit, the decision can be commit while
x can't commit without violating its own constraint. What's going on here? Did Lamport make a mistake?
Of course not. You can actually break Transaction Commit into two parts
- consensus of executing a transaction (all-or-nothing)
- correctly executing a transaction without violating constraints
2PC solves both problem at the same time. Paxos Commit only solves the first problem. i.e. with Paxos Commit, all participants will agree on whether to execute a transaction or not. But it doesn’t say about how to execute the transaction correctly to avoid violating constraints.
OK. But how do we solve #2 indenpendently? This is exactly what the Calvin paper is about. The gist of it is that if we can agree on #1 (either all or nobody will execute a transaction), we can solve #2. Dr. Abadi actually wrote a blog post about exactly how this would enable us to move away from 2pc. The main idea is that once all participants agreed on executing a transaction (#1), they can figure out all the dependencies and constraints independently. E.g. participants can query the host owning key
x to see if it executed transaction-id-42. It does require keeping extra metadata for transactions at least for a short period of time. Dr.Abadi himself confirmed this on HN as well.
Originally published at https://blog.the-pans.com on February 17, 2021.