/* -*- mode: C; c-basic-offset: 4 -*- */

#ident "Copyright (c) 2007 Tokutek Inc. All rights reserved."

/* Readers/writers locks implementation
 *
 *****************************************
 *     Overview
 *****************************************
 *
 *  TokuDB employs readers/writers locks for the ephemeral locks (e.g.,
 *  on BRT nodes). Why not just use the pthread_rwlock API?
 *
 *   1) We need multiprocess rwlocks (not just multithreaded ones).
 *
 *   2) pthread rwlocks are very slow since they entail a system call
 *      (about 2000ns on a 2GHz T2500).
 *
 *      Related: we expect the common case to be that the lock is
 *      granted without waiting, so that path must avoid the system call.
 *
 *   3) We are willing to employ machine-specific instructions (such
 *      as atomic exchange and mfence, each of which runs in about
 *      10ns).
 *
 *   4) We want to guarantee nonstarvation (many rwlock
 *      implementations can starve the writers because another reader
 *      comes along before all the other readers have unlocked).
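 *
 *  A minimal sketch of the kind of primitive point 3 refers to: a spin
 *  lock built on an atomic exchange. It assumes C11 <stdatomic.h> rather
 *  than a hand-written xchgl(); spin_mutex, spin_init, spin_lock,
 *  spin_trylock and spin_unlock are hypothetical names, not TokuDB's.
```c
#include <assert.h>
#include <stdatomic.h>

// Hypothetical spin lock word: 0 means free, 1 means held.
typedef struct {
    atomic_long word;
} spin_mutex;

static void spin_init(spin_mutex *m) {
    atomic_init(&m->word, 0);
}

// Swap in 1; if the previous value was 0, the caller now owns the lock.
static void spin_lock(spin_mutex *m) {
    while (atomic_exchange(&m->word, 1) != 0) {
        // While held, spin on a plain load so the cache line is only
        // read, not written, until the holder releases it.
        while (atomic_load_explicit(&m->word, memory_order_relaxed) != 0) {
        }
    }
}

// Single attempt: returns 1 if the lock was acquired, 0 otherwise.
static int spin_trylock(spin_mutex *m) {
    return atomic_exchange(&m->word, 1) == 0;
}

// Store 0 with release ordering so that writes made while holding the
// lock are visible to the next thread that acquires it.
static void spin_unlock(spin_mutex *m) {
    assert(atomic_load_explicit(&m->word, memory_order_relaxed) == 1);
    atomic_store_explicit(&m->word, 0, memory_order_release);
}
```
 *  A sequentially consistent atomic_exchange typically compiles to a
 *  single xchg on x86, which is implicitly locked and acts as a full
 *  barrier, so the acquire path needs no separate mfence.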
 *
 *****************************************
 *     How it works
 *****************************************
 *
 *  We arrange that the rwlock object is in the address space of all
 *  the threads or processes that use it. For processes we use mmap().
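 *
 *  For the multiprocess case, a minimal sketch of placing an object in
 *  memory shared by a parent and a forked child, assuming POSIX mmap()
 *  with MAP_SHARED | MAP_ANONYMOUS (available on Linux and the BSDs).
 *  shared_counter, alloc_shared and child_increment_demo are
 *  hypothetical stand-ins for the rwlock object and its users.
 *  Unrelated processes would map a common file or a shm_open() object
 *  instead of an anonymous region.
```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the rwlock object.
typedef struct {
    atomic_long value;
} shared_counter;

// Anonymous MAP_SHARED mapping: after fork(), parent and child map the
// same pages, so both see the same object.
static shared_counter *alloc_shared(void) {
    void *p = mmap(NULL, sizeof(shared_counter), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    shared_counter *c = p;
    atomic_init(&c->value, 0);
    return c;
}

// The child increments the counter; the parent waits and returns the
// value it sees, which is 1 only if the memory really is shared.
static long child_increment_demo(void) {
    shared_counter *c = alloc_shared();
    assert(c != NULL);
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        atomic_fetch_add(&c->value, 1);
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    long seen = atomic_load(&c->value);
    munmap(c, sizeof(shared_counter));
    return seen;
}
```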
 *
 *  The rwlock struct comprises the following fields:
 *
 *   a long mutex field (which is accessed using xchgl() or other
 *   machine-specific instructions). This is a spin lock.
 *
 *   a read counter (how many readers currently have the lock?)
 *
 *   a write boolean (does a writer have the lock?)
 *
 *   a singly linked list of semaphores for waiting requesters. This
 *   list is sorted oldest requester first. Each list element
 *   contains a semaphore (which is provided by the requester) and a
 *   boolean indicating whether it is a reader or a writer.
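 *
 *  The fields above can be sketched as the following hypothetical C
 *  declarations, assuming C11 atomics for the spin-lock word and POSIX
 *  unnamed semaphores (sem_t). The names are illustrative, and the tail
 *  pointer is an addition (not in the description above) that makes
 *  appending to the singly linked list O(1).
```c
#include <assert.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// One waiting requester. The element (and its semaphore) lives in the
// requester's own storage; the lock only links it into the queue.
typedef struct rw_waiter {
    sem_t sem;               // requester blocks here until granted
    bool is_writer;          // true for a write request, false for read
    struct rw_waiter *next;  // next (younger) waiter, NULL at the tail
} rw_waiter;

typedef struct {
    atomic_long mutex;  // spin lock word: 0 free, 1 held
    int readers;        // number of readers currently holding the lock
    bool writer;        // does a writer currently hold the lock?
    rw_waiter *head;    // oldest waiter (served first)
    rw_waiter *tail;    // youngest waiter (new requests append here)
} rwlock;

static void rwlock_init(rwlock *rw) {
    atomic_init(&rw->mutex, 0);
    rw->readers = 0;
    rw->writer = false;
    rw->head = rw->tail = NULL;
}

// Append a waiter at the tail, keeping the list oldest-first.
static void rw_enqueue(rwlock *rw, rw_waiter *w) {
    w->next = NULL;
    if (rw->tail != NULL) rw->tail->next = w;
    else rw->head = w;
    rw->tail = w;
}

// Remove and return the oldest waiter; the list must be nonempty.
static rw_waiter *rw_dequeue(rwlock *rw) {
    rw_waiter *w = rw->head;
    assert(w != NULL);
    rw->head = w->next;
    if (rw->head == NULL) rw->tail = NULL;
    return w;
}
```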
 *
 *  To read-lock the rwlock:
 *
 *    1) Acquire the mutex.
 *
 *    2) If the linked list is not empty or the writer boolean is true
 *       then
 *
 *       a) initialize your semaphore (to 0),
 *       b) add your list element to the end of the list (with rw="read"),
 *       c) release the mutex,
 *       d) wait on the semaphore,
 *       e) when the semaphore is released, return success (the unlocker
 *          has already counted you as a reader).
 *
 *    3) Otherwise increment the reader count, release the mutex, and
 *       return success.
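 *
 *  A minimal sketch of the read-lock steps, assuming a hypothetical
 *  rwlock/rw_waiter layout (a C11 atomic spin word plus POSIX sem_t
 *  waiters); the types are repeated so the sketch compiles on its own.
 *  The caller supplies the rw_waiter storage, matching the
 *  requester-provided semaphores described above, and sem_init() is
 *  called with pshared=1 so the semaphore can work across processes
 *  when it lives in shared memory.
```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types, repeated so that this sketch compiles on its own.
typedef struct rw_waiter {
    sem_t sem;               // caller-provided; the requester waits here
    bool is_writer;          // false for a read request
    struct rw_waiter *next;  // next (younger) waiter
} rw_waiter;

typedef struct {
    atomic_long mutex;       // spin lock word: 0 free, 1 held
    int readers;             // readers currently holding the lock
    bool writer;             // does a writer hold the lock?
    rw_waiter *head;         // oldest waiter
    rw_waiter *tail;         // youngest waiter
} rwlock;

static void spin_acquire(atomic_long *m) {
    while (atomic_exchange(m, 1) != 0) {
        // spin until the holder stores 0
    }
}

static void spin_release(atomic_long *m) {
    atomic_store_explicit(m, 0, memory_order_release);
}

static void rwlock_init(rwlock *rw) {
    atomic_init(&rw->mutex, 0);
    rw->readers = 0;
    rw->writer = false;
    rw->head = rw->tail = NULL;
}

// Read lock, step by step as described above.
static void rwlock_read_lock(rwlock *rw, rw_waiter *w) {
    assert(w != NULL);
    spin_acquire(&rw->mutex);                    // 1) acquire the mutex
    if (rw->head != NULL || rw->writer) {        // 2) must wait
        sem_init(&w->sem, 1, 0);                 // a) pshared=1, count 0
        w->is_writer = false;                    // b) append as a reader
        w->next = NULL;
        if (rw->tail != NULL) rw->tail->next = w;
        else rw->head = w;
        rw->tail = w;
        spin_release(&rw->mutex);                // c) release the mutex
        while (sem_wait(&w->sem) == -1 && errno == EINTR) {
            // d) wait; retry if interrupted by a signal
        }
        sem_destroy(&w->sem);
        return;                                  // e) unlocker counted us
    }
    rw->readers++;                               // 3) granted at once
    spin_release(&rw->mutex);
}
```
 *  A woken reader does not touch the counter itself: the unlocker
 *  increments the reader count on its behalf before posting the
 *  semaphore, so no newcomer can slip in between the wakeup and the
 *  reader being counted.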
 *
 *  Write-locking the rwlock is almost the same:
 *
 *    1) Acquire the mutex.
 *    2) If the list is not empty, the reader count is nonzero, or the
 *       writer boolean is true, then
 *       a) initialize your semaphore (to 0),
 *       b) add your list element to the end of the list (with rw="write"),
 *       c) release the mutex,
 *       d) wait on the semaphore,
 *       e) when the semaphore is released, return success (the unlocker
 *          has already set writer=TRUE on your behalf).
 *    3) Otherwise set writer=TRUE, release the mutex, and return success.
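 *
 *  A minimal sketch of the write-lock steps under the same hypothetical
 *  layout (C11 atomic spin word, POSIX sem_t waiters, types repeated so
 *  it compiles on its own). Besides the queue and the reader count it
 *  also tests the writer boolean: without that test, a second writer
 *  arriving while the first still holds the lock (empty queue, zero
 *  readers) would be granted the lock too.
```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types, repeated so that this sketch compiles on its own.
typedef struct rw_waiter {
    sem_t sem;               // caller-provided; the requester waits here
    bool is_writer;          // true for a write request
    struct rw_waiter *next;  // next (younger) waiter
} rw_waiter;

typedef struct {
    atomic_long mutex;       // spin lock word: 0 free, 1 held
    int readers;             // readers currently holding the lock
    bool writer;             // does a writer hold the lock?
    rw_waiter *head;         // oldest waiter
    rw_waiter *tail;         // youngest waiter
} rwlock;

static void spin_acquire(atomic_long *m) {
    while (atomic_exchange(m, 1) != 0) {
        // spin until the holder stores 0
    }
}

static void spin_release(atomic_long *m) {
    atomic_store_explicit(m, 0, memory_order_release);
}

static void rwlock_init(rwlock *rw) {
    atomic_init(&rw->mutex, 0);
    rw->readers = 0;
    rw->writer = false;
    rw->head = rw->tail = NULL;
}

// Write lock, step by step as described above.
static void rwlock_write_lock(rwlock *rw, rw_waiter *w) {
    assert(w != NULL);
    spin_acquire(&rw->mutex);                          // 1) acquire the mutex
    if (rw->head != NULL || rw->readers != 0 || rw->writer) {  // 2) must wait
        sem_init(&w->sem, 1, 0);                       // a) pshared=1, count 0
        w->is_writer = true;                           // b) append as a writer
        w->next = NULL;
        if (rw->tail != NULL) rw->tail->next = w;
        else rw->head = w;
        rw->tail = w;
        spin_release(&rw->mutex);                      // c) release the mutex
        while (sem_wait(&w->sem) == -1 && errno == EINTR) {
            // d) wait; retry if interrupted by a signal
        }
        sem_destroy(&w->sem);
        return;                                        // e) unlocker set writer
    }
    rw->writer = true;                                 // 3) granted at once
    spin_release(&rw->mutex);
}
```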
 *
 *  To read-unlock the rwlock:
 *
 *    1) Acquire the mutex.
 *    2) Decrement the reader count.
 *    3) If the count is still positive or the list is empty, then
 *       release the mutex and return success.
 *    4) Otherwise (count==zero and the list is nonempty):
 *       a) If the first element of the list is a reader:
 *          i)  while the list is nonempty and its first element is a
 *              reader:
 *                x) pop the list
 *                y) increment the reader count
 *                z) increment the semaphore (releasing that waiter)
 *          ii) release the mutex and return success.
 *       b) Else (the first element is a writer):
 *          i)   pop the list
 *          ii)  set writer to TRUE
 *          iii) increment the semaphore
 *          iv)  release the mutex and return success.
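 *
 *  A minimal sketch of the read-unlock steps under the same hypothetical
 *  layout (C11 atomic spin word, POSIX sem_t waiters, types repeated so
 *  it compiles on its own). The mutex is released on every return path,
 *  and the reader loop stops when the queue empties or a writer reaches
 *  the head. rw_enqueue and rw_dequeue are hypothetical queue helpers.
```c
#include <assert.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical types, repeated so that this sketch compiles on its own.
typedef struct rw_waiter {
    sem_t sem;               // the requester waits here
    bool is_writer;          // true for a write request
    struct rw_waiter *next;  // next (younger) waiter
} rw_waiter;

typedef struct {
    atomic_long mutex;       // spin lock word: 0 free, 1 held
    int readers;             // readers currently holding the lock
    bool writer;             // does a writer hold the lock?
    rw_waiter *head, *tail;  // FIFO of waiters, oldest at head
} rwlock;

static void spin_acquire(atomic_long *m) {
    while (atomic_exchange(m, 1) != 0) { }  // spin until released
}

static void spin_release(atomic_long *m) {
    atomic_store_explicit(m, 0, memory_order_release);
}

static void rwlock_init(rwlock *rw) {
    atomic_init(&rw->mutex, 0);
    rw->readers = 0;
    rw->writer = false;
    rw->head = rw->tail = NULL;
}

// Append at the tail (what the lock path does before waiting).
static void rw_enqueue(rwlock *rw, rw_waiter *w) {
    w->next = NULL;
    if (rw->tail != NULL) rw->tail->next = w;
    else rw->head = w;
    rw->tail = w;
}

// Pop the oldest waiter; the list must be nonempty.
static rw_waiter *rw_dequeue(rwlock *rw) {
    rw_waiter *w = rw->head;
    rw->head = w->next;
    if (rw->head == NULL) rw->tail = NULL;
    return w;
}

// Read unlock, step by step as described above.
static void rwlock_read_unlock(rwlock *rw) {
    spin_acquire(&rw->mutex);                      // 1) acquire the mutex
    assert(rw->readers > 0 && !rw->writer);
    rw->readers--;                                 // 2) decrement
    if (rw->readers > 0 || rw->head == NULL) {     // 3) nothing to hand off
        spin_release(&rw->mutex);
        return;
    }
    if (!rw->head->is_writer) {                    // 4a) wake leading readers
        while (rw->head != NULL && !rw->head->is_writer) {
            rw_waiter *w = rw_dequeue(rw);         // x) pop the list
            rw->readers++;                         // y) count it as a reader
            sem_post(&w->sem);                     // z) wake that waiter
        }
    } else {                                       // 4b) wake one writer
        rw_waiter *w = rw_dequeue(rw);             // i) pop the list
        rw->writer = true;                         // ii) grant write access
        sem_post(&w->sem);                         // iii) wake that waiter
    }
    spin_release(&rw->mutex);                      // release, return success
}
```
 *  Waiters are posted while the spin lock is still held; that is safe
 *  here because a woken waiter does not need the mutex to finish.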
 */