114 lines
		
	
	
	
		
			5 KiB
			
		
	
	
	
		
			Text
		
	
	
	
	
	
		
		
			
		
	
	
			114 lines
		
	
	
	
		
			5 KiB
			
		
	
	
	
		
			Text
		
	
	
	
	
	
| 
								 | 
							
									Locking scheme used for directory operations is based on two
							 | 
						||
| 
								 | 
							
								kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									For our purposes all operations fall in 5 classes:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								1) read access.  Locking rules: caller locks directory we are accessing.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								2) object creation.  Locking rules: same as above.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								3) object removal.  Locking rules: caller locks parent, finds victim,
							 | 
						||
| 
								 | 
							
								locks victim and calls the method.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								4) rename() that is _not_ cross-directory.  Locking rules: caller locks
							 | 
						||
| 
								 | 
							
								the parent, finds source and target, if target already exists - locks it
							 | 
						||
| 
								 | 
							
								and then calls the method.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								5) link creation.  Locking rules:
							 | 
						||
| 
								 | 
							
									* lock parent
							 | 
						||
| 
								 | 
							
									* check that source is not a directory
							 | 
						||
| 
								 | 
							
									* lock source
							 | 
						||
| 
								 | 
							
									* call the method.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								6) cross-directory rename.  The trickiest in the whole bunch.  Locking
							 | 
						||
| 
								 | 
							
								rules:
							 | 
						||
| 
								 | 
							
									* lock the filesystem
							 | 
						||
| 
								 | 
							
									* lock parents in "ancestors first" order.
							 | 
						||
| 
								 | 
							
									* find source and target.
							 | 
						||
| 
								 | 
							
									* if old parent is equal to or is a descendent of target
							 | 
						||
| 
								 | 
							
										fail with -ENOTEMPTY
							 | 
						||
| 
								 | 
							
									* if new parent is equal to or is a descendent of source
							 | 
						||
| 
								 | 
							
										fail with -ELOOP
							 | 
						||
| 
								 | 
							
									* if target exists - lock it.
							 | 
						||
| 
								 | 
							
									* call the method.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The rules above obviously guarantee that all directories that are going to be
							 | 
						||
| 
								 | 
							
								read, modified or removed by method will be locked by caller.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								If no directory is its own ancestor, the scheme above is deadlock-free.
							 | 
						||
| 
								 | 
							
								Proof:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									First of all, at any moment we have a partial ordering of the
							 | 
						||
| 
								 | 
							
								objects - A < B iff A is an ancestor of B.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									That ordering can change.  However, the following is true:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								(1) if object removal or non-cross-directory rename holds lock on A and
							 | 
						||
| 
								 | 
							
								    attempts to acquire lock on B, A will remain the parent of B until we
							 | 
						||
| 
								 | 
							
								    acquire the lock on B.  (Proof: only cross-directory rename can change
							 | 
						||
| 
								 | 
							
								    the parent of object and it would have to lock the parent).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								(2) if cross-directory rename holds the lock on filesystem, order will not
							 | 
						||
| 
								 | 
							
								    change until rename acquires all locks.  (Proof: other cross-directory
							 | 
						||
| 
								 | 
							
								    renames will be blocked on filesystem lock and we don't start changing
							 | 
						||
| 
								 | 
							
								    the order until we had acquired all locks).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								(3) any operation holds at most one lock on non-directory object and
							 | 
						||
| 
								 | 
							
								    that lock is acquired after all other locks.  (Proof: see descriptions
							 | 
						||
| 
								 | 
							
								    of operations).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									Now consider the minimal deadlock.  Each process is blocked on
							 | 
						||
| 
								 | 
							
								attempt to acquire some lock and already holds at least one lock.  Let's
							 | 
						||
| 
								 | 
							
								consider the set of contended locks.  First of all, filesystem lock is
							 | 
						||
| 
								 | 
							
								not contended, since any process blocked on it is not holding any locks.
							 | 
						||
| 
								 | 
							
								Thus all processes are blocked on ->i_sem.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									Non-directory objects are not contended due to (3).  Thus link
							 | 
						||
| 
								 | 
							
								creation can't be a part of deadlock - it can't be blocked on source
							 | 
						||
| 
								 | 
							
								and it means that it doesn't hold any locks.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									Any contended object is either held by cross-directory rename or
							 | 
						||
| 
								 | 
							
								has a child that is also contended.  Indeed, suppose that it is held by
							 | 
						||
| 
								 | 
							
								operation other than cross-directory rename.  Then the lock this operation
							 | 
						||
| 
								 | 
							
								is blocked on belongs to child of that object due to (1).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									It means that one of the operations is cross-directory rename.
							 | 
						||
| 
								 | 
							
								Otherwise the set of contended objects would be infinite - each of them
							 | 
						||
| 
								 | 
							
								would have a contended child and we had assumed that no object is its
							 | 
						||
| 
								 | 
							
								own descendent.  Moreover, there is exactly one cross-directory rename
							 | 
						||
| 
								 | 
							
								(see above).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									Consider the object blocking the cross-directory rename.  One
							 | 
						||
| 
								 | 
							
								of its descendents is locked by cross-directory rename (otherwise we
							 | 
						||
| 
								 | 
							
								would again have an infinite set of of contended objects).  But that
							 | 
						||
| 
								 | 
							
								means that cross-directory rename is taking locks out of order.  Due
							 | 
						||
| 
								 | 
							
								to (2) the order hadn't changed since we had acquired filesystem lock.
							 | 
						||
| 
								 | 
							
								But locking rules for cross-directory rename guarantee that we do not
							 | 
						||
| 
								 | 
							
								try to acquire lock on descendent before the lock on ancestor.
							 | 
						||
| 
								 | 
							
								Contradiction.  I.e.  deadlock is impossible.  Q.E.D.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									These operations are guaranteed to avoid loop creation.  Indeed,
							 | 
						||
| 
								 | 
							
								the only operation that could introduce loops is cross-directory rename.
							 | 
						||
| 
								 | 
							
								Since the only new (parent, child) pair added by rename() is (new parent,
							 | 
						||
| 
								 | 
							
								source), such loop would have to contain these objects and the rest of it
							 | 
						||
| 
								 | 
							
								would have to exist before rename().  I.e. at the moment of loop creation
							 | 
						||
| 
								 | 
							
								rename() responsible for that would be holding filesystem lock and new parent
							 | 
						||
| 
								 | 
							
								would have to be equal to or a descendent of source.  But that means that
							 | 
						||
| 
								 | 
							
								new parent had been equal to or a descendent of source since the moment when
							 | 
						||
| 
								 | 
							
								we had acquired filesystem lock and rename() would fail with -ELOOP in that
							 | 
						||
| 
								 | 
							
								case.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									While this locking scheme works for arbitrary DAGs, it relies on
							 | 
						||
| 
								 | 
							
								ability to check that directory is a descendent of another object.  Current
							 | 
						||
| 
								 | 
							
								implementation assumes that directory graph is a tree.  This assumption is
							 | 
						||
| 
								 | 
							
								also preserved by all operations (cross-directory rename on a tree that would
							 | 
						||
| 
								 | 
							
								not introduce a cycle will leave it a tree and link() fails for directories).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
									Notice that "directory" in the above == "anything that might have
							 | 
						||
| 
								 | 
							
								children", so if we are going to introduce hybrid objects we will need
							 | 
						||
| 
								 | 
							
								either to make sure that link(2) doesn't work for them or to make changes
							 | 
						||
| 
								 | 
							
								in is_subdir() that would make it work even in presence of such beasts.
							 |