Linear scalable read-write lock

The basic concept of a read-write lock is simple. It allows multiple readers to access the resource simultaneously, but at most one thread can have exclusive ownership of the lock (a.k.a write lock). It’s supposed to be an optimization, comparing to simple mutex (e.g. std::mutex). As in theory, multiple readers do not contend. Naively, read performance should scale linearly. However, this is not the case in practice for most read-write locks, due to cacheline pingpong-ing.

Let’s say N threads are running on N cores, all trying to get shared ownership of the mutex (read…