| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | Generic Mutex Subsystem | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | started by Ingo Molnar <mingo@redhat.com> | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | updated by Davidlohr Bueso <davidlohr@hp.com> | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | What are mutexes? | 
					
						
							|  |  |  | ----------------- | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | In the Linux kernel, mutexes refer to a particular locking primitive | 
					
						
							|  |  |  | that enforces serialization on shared memory systems, and not only to | 
					
						
							|  |  |  | the generic term referring to 'mutual exclusion' found in academia | 
					
						
							|  |  |  | or similar theoretical text books. Mutexes are sleeping locks which | 
					
						
							|  |  |  | behave similarly to binary semaphores, and were introduced in 2006[1] | 
					
						
							|  |  |  | as an alternative to these. This new data structure provided a number | 
					
						
							|  |  |  | of advantages, including simpler interfaces, and at that time smaller | 
					
						
							|  |  |  | code (see Disadvantages). | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | [1] http://lwn.net/Articles/164802/ | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | Implementation | 
					
						
							|  |  |  | -------------- | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h | 
					
						
							|  |  |  | and implemented in kernel/locking/mutex.c. These locks use a three | 
					
						
							|  |  |  | state atomic counter (->count) to represent the different possible | 
					
						
							|  |  |  | transitions that can occur during the lifetime of a lock: | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | 	  1: unlocked | 
					
						
							|  |  |  | 	  0: locked, no waiters | 
					
						
							|  |  |  |    negative: locked, with potential waiters | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | In its most basic form it also includes a wait-queue and a spinlock | 
					
						
							|  |  |  | that serializes access to it. CONFIG_SMP systems can also include | 
					
						
							|  |  |  | a pointer to the lock task owner (->owner) as well as a spinner MCS | 
					
						
							|  |  |  | lock (->osq), both described below in (ii). | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | When acquiring a mutex, there are three possible paths that can be | 
					
						
							|  |  |  | taken, depending on the state of the lock: | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | (i) fastpath: tries to atomically acquire the lock by decrementing the | 
					
						
							|  |  |  |     counter. If it was already taken by another task it goes to the next | 
					
						
							|  |  |  |     possible path. This logic is architecture specific. On x86-64, the | 
					
						
							|  |  |  |     locking fastpath is 2 instructions: | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  |     0000000000000e10 <mutex_lock>: | 
					
						
							|  |  |  |     e21:   f0 ff 0b                lock decl (%rbx) | 
					
						
							|  |  |  |     e24:   79 08                   jns    e2e <mutex_lock+0x1e> | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							|  |  |  |    the unlocking fastpath is equally tight: | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  |     0000000000000bc0 <mutex_unlock>: | 
					
						
							|  |  |  |     bc8:   f0 ff 07                lock incl (%rdi) | 
					
						
							|  |  |  |     bcb:   7f 0a                   jg     bd7 <mutex_unlock+0x17> | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | (ii) midpath: aka optimistic spinning, tries to spin for acquisition | 
					
						
							|  |  |  |      while the lock owner is running and there are no other tasks ready | 
					
						
							|  |  |  |      to run that have higher priority (need_resched). The rationale is | 
					
						
							|  |  |  |      that if the lock owner is running, it is likely to release the lock | 
					
						
							|  |  |  |      soon. The mutex spinners are queued up using MCS lock so that only | 
					
						
							|  |  |  |      one spinner can compete for the mutex. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock | 
					
						
							|  |  |  |      with the desirable properties of being fair and with each cpu trying | 
					
						
							|  |  |  |      to acquire the lock spinning on a local variable. It avoids expensive | 
					
						
							|  |  |  |      cacheline bouncing that common test-and-set spinlock implementations | 
					
						
							|  |  |  |      incur. An MCS-like lock is specially tailored for optimistic spinning | 
					
						
							|  |  |  |      for sleeping lock implementation. An important feature of the customized | 
					
						
							|  |  |  |      MCS lock is that it has the extra property that spinners are able to exit | 
					
						
							|  |  |  |      the MCS spinlock queue when they need to reschedule. This further helps | 
					
						
							|  |  |  |      avoid situations where MCS spinners that need to reschedule would continue | 
					
						
							|  |  |  |      waiting to spin on mutex owner, only to go directly to slowpath upon | 
					
						
							|  |  |  |      obtaining the MCS lock. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | (iii) slowpath: last resort, if the lock is still unable to be acquired, | 
					
						
							|  |  |  |       the task is added to the wait-queue and sleeps until woken up by the | 
					
						
							|  |  |  |       unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | While formally kernel mutexes are sleepable locks, it is path (ii) that | 
					
						
							|  |  |  | makes them more practically a hybrid type. By simply not interrupting a | 
					
						
							|  |  |  | task and busy-waiting for a few cycles instead of immediately sleeping, | 
					
						
							|  |  |  | the performance of this lock has been seen to significantly improve a | 
					
						
							|  |  |  | number of workloads. Note that this technique is also used for rw-semaphores. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Semantics | 
					
						
							|  |  |  | --------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The mutex subsystem checks and enforces the following rules: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     - Only one task can hold the mutex at a time. | 
					
						
							|  |  |  |     - Only the owner can unlock the mutex. | 
					
						
							|  |  |  |     - Multiple unlocks are not permitted. | 
					
						
							|  |  |  |     - Recursive locking/unlocking is not permitted. | 
					
						
							|  |  |  |     - A mutex must only be initialized via the API (see below). | 
					
						
							|  |  |  |     - A task may not exit with a mutex held. | 
					
						
							|  |  |  |     - Memory areas where held locks reside must not be freed. | 
					
						
							|  |  |  |     - Held mutexes must not be reinitialized. | 
					
						
							|  |  |  |     - Mutexes may not be used in hardware or software interrupt | 
					
						
							|  |  |  |       contexts such as tasklets and timers. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. | 
					
						
							|  |  |  | In addition, the mutex debugging code also implements a number of other | 
					
						
							|  |  |  | features that make lock debugging easier and faster: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     - Uses symbolic names of mutexes, whenever they are printed | 
					
						
							|  |  |  |       in debug output. | 
					
						
							|  |  |  |     - Point-of-acquire tracking, symbolic lookup of function names, | 
					
						
							|  |  |  |       list of all locks held in the system, printout of them. | 
					
						
							|  |  |  |     - Owner tracking. | 
					
						
							|  |  |  |     - Detects self-recursing locks and prints out all relevant info. | 
					
						
							|  |  |  |     - Detects multi-task circular deadlocks and prints out all affected | 
					
						
							|  |  |  |       locks and tasks (and only those tasks). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Interfaces | 
					
						
							|  |  |  | ---------- | 
					
						
							|  |  |  | Statically define the mutex: | 
					
						
							|  |  |  |    DEFINE_MUTEX(name); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Dynamically initialize the mutex: | 
					
						
							|  |  |  |    mutex_init(mutex); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Acquire the mutex, uninterruptible: | 
					
						
							|  |  |  |    void mutex_lock(struct mutex *lock); | 
					
						
							|  |  |  |    void mutex_lock_nested(struct mutex *lock, unsigned int subclass); | 
					
						
							|  |  |  |    int  mutex_trylock(struct mutex *lock); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Acquire the mutex, interruptible: | 
					
						
							|  |  |  |    int mutex_lock_interruptible_nested(struct mutex *lock, | 
					
						
							|  |  |  | 				       unsigned int subclass); | 
					
						
							|  |  |  |    int mutex_lock_interruptible(struct mutex *lock); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Acquire the mutex, interruptible, if dec to 0: | 
					
						
							|  |  |  |    int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Unlock the mutex: | 
					
						
							|  |  |  |    void mutex_unlock(struct mutex *lock); | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Test if the mutex is taken: | 
					
						
							|  |  |  |    int mutex_is_locked(struct mutex *lock); | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							|  |  |  | Disadvantages | 
					
						
							|  |  |  | ------------- | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | Unlike its original design and purpose, 'struct mutex' is larger than | 
					
						
							|  |  |  | most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice | 
					
						
							|  |  |  | as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the | 
					
						
							|  |  |  | 'struct rw_semaphore' variant. Larger structure sizes mean more CPU | 
					
						
							|  |  |  | cache and memory footprint. | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | When to use mutexes | 
					
						
							|  |  |  | ------------------- | 
					
						
							| 
									
										
										
										
											2006-01-09 15:59:20 -08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2014-05-28 21:36:43 -07:00
										 |  |  | Unless the strict semantics of mutexes are unsuitable and/or the critical | 
					
						
							|  |  |  | region prevents the lock from being shared, always prefer them to any other | 
					
						
							|  |  |  | locking primitive. |