|   | 	     ==================================================== | ||
|  | 	     IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT | ||
|  | 	     ==================================================== | ||
|  | 
 | ||
|  | By: David Howells <dhowells@redhat.com> | ||
|  | 
 | ||
|  | Contents: | ||
|  | 
 | ||
|  |  (*) Representation. | ||
|  | 
 | ||
|  |  (*) Object management state machine. | ||
|  | 
 | ||
|  |      - Provision of CPU time. | ||
|  |      - Locking simplification. | ||
|  | 
 | ||
|  |  (*) The set of states. | ||
|  | 
 | ||
|  |  (*) The set of events. | ||
|  | 
 | ||
|  | 
 | ||
|  | ============== | ||
|  | REPRESENTATION | ||
|  | ============== | ||
|  | 
 | ||
|  | FS-Cache maintains an in-kernel representation of each object that a netfs is | ||
|  | currently interested in.  Such objects are represented by the fscache_cookie | ||
|  | struct and are referred to as cookies. | ||
|  | 
 | ||
|  | FS-Cache also maintains a separate in-kernel representation of the objects that | ||
|  | a cache backend is currently actively caching.  Such objects are represented by | ||
|  | the fscache_object struct.  The cache backends allocate these upon request, and | ||
|  | are expected to embed them in their own representations.  These are referred to | ||
|  | as objects. | ||
|  | 
 | ||
|  | There is a 1:N relationship between cookies and objects.  A cookie may be | ||
|  | represented by multiple objects - an index may exist in more than one cache - | ||
|  | or even by no objects (it may not be cached). | ||
|  | 
 | ||
|  | Furthermore, both cookies and objects are hierarchical.  The two hierarchies | ||
|  | correspond, but the cookies tree is a superset of the union of the object trees | ||
|  | of multiple caches: | ||
|  | 
 | ||
|  | 	    NETFS INDEX TREE               :      CACHE 1     :      CACHE 2 | ||
|  | 	                                   :                  : | ||
|  | 	                                   :   +-----------+  : | ||
|  | 	                          +----------->|  IObject  |  : | ||
|  | 	      +-----------+       |        :   +-----------+  : | ||
|  | 	      |  ICookie  |-------+        :         |        : | ||
|  | 	      +-----------+       |        :         |        :   +-----------+ | ||
|  | 	            |             +------------------------------>|  IObject  | | ||
|  | 	            |                      :         |        :   +-----------+ | ||
|  | 	            |                      :         V        :         | | ||
|  | 	            |                      :   +-----------+  :         | | ||
|  | 	            V             +----------->|  IObject  |  :         | | ||
|  | 	      +-----------+       |        :   +-----------+  :         | | ||
|  | 	      |  ICookie  |-------+        :         |        :         V | ||
|  | 	      +-----------+       |        :         |        :   +-----------+ | ||
|  | 	            |             +------------------------------>|  IObject  | | ||
|  | 	      +-----+-----+                :         |        :   +-----------+ | ||
|  | 	      |           |                :         |        :         | | ||
|  | 	      V           |                :         V        :         | | ||
|  | 	+-----------+     |                :   +-----------+  :         | | ||
|  | 	|  ICookie  |------------------------->|  IObject  |  :         | | ||
|  | 	+-----------+     |                :   +-----------+  :         | | ||
|  | 	      |           V                :         |        :         V | ||
|  | 	      |     +-----------+          :         |        :   +-----------+ | ||
|  | 	      |     |  ICookie  |-------------------------------->|  IObject  | | ||
|  | 	      |     +-----------+          :         |        :   +-----------+ | ||
|  | 	      V           |                :         V        :         | | ||
|  | 	+-----------+     |                :   +-----------+  :         | | ||
|  | 	|  DCookie  |------------------------->|  DObject  |  :         | | ||
|  | 	+-----------+     |                :   +-----------+  :         | | ||
|  | 	                  |                :                  :         | | ||
|  | 	          +-------+-------+        :                  :         | | ||
|  | 	          |               |        :                  :         | | ||
|  | 	          V               V        :                  :         V | ||
|  | 	    +-----------+   +-----------+  :                  :   +-----------+ | ||
|  | 	    |  DCookie  |   |  DCookie  |------------------------>|  DObject  | | ||
|  | 	    +-----------+   +-----------+  :                  :   +-----------+ | ||
|  | 	                                   :                  : | ||
|  | 
 | ||
|  | In the above illustration, ICookie and IObject represent indices and DCookie | ||
|  | and DObject represent data storage objects.  Indices may have representation in | ||
|  | multiple caches, but currently, non-index objects may not.  Objects of any type | ||
|  | may also be entirely unrepresented. | ||
|  | 
 | ||
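The relationship between cookies and objects described above can be sketched
in user-space C.  This is only an illustration under stated assumptions: the
sketch_cookie and sketch_object structs, their fields and the helper functions
are hypothetical and much simpler than the real fscache_cookie and
fscache_object structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real definitions. */
struct sketch_cookie;

struct sketch_object {
	struct sketch_cookie *cookie;   /* netfs-side record this object backs */
	struct sketch_object *parent;   /* parent object in the same cache */
	struct sketch_object *next;     /* next object backing the same cookie */
	int cache_id;                   /* which cache holds this object */
};

struct sketch_cookie {
	struct sketch_cookie *parent;   /* parent cookie in the netfs index tree */
	struct sketch_object *backing;  /* zero or more objects, one per cache */
	int is_index;                   /* nonzero for an index, zero for data */
};

/* Attach an object to a cookie.  Returns 0 on success, or -1 if this would
 * give a non-index (data) cookie a second backing object. */
static int sketch_attach(struct sketch_cookie *cookie,
			 struct sketch_object *obj)
{
	if (!cookie->is_index && cookie->backing != NULL)
		return -1;
	obj->cookie = cookie;
	obj->next = cookie->backing;
	cookie->backing = obj;
	return 0;
}

/* Count the objects currently backing a cookie (may be zero). */
static int sketch_count_objects(const struct sketch_cookie *cookie)
{
	int n = 0;

	for (const struct sketch_object *o = cookie->backing; o; o = o->next)
		n++;
	return n;
}
```

An index cookie may accumulate one object per cache, a data cookie is limited
to one, and a cookie with an empty list is simply uncached.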
|  | As far as the netfs API goes, the netfs is only actually permitted to see | ||
|  | pointers to the cookies.  The cookies themselves and any objects attached to | ||
|  | those cookies are hidden from it. | ||
|  | 
 | ||
|  | 
 | ||
|  | =============================== | ||
|  | OBJECT MANAGEMENT STATE MACHINE | ||
|  | =============================== | ||
|  | 
 | ||
|  | Within FS-Cache, each active object is managed by its own individual state | ||
|  | machine.  The state for an object is kept in the fscache_object struct, in | ||
|  | object->state.  A cookie may point to a set of objects that are in different | ||
|  | states. | ||
|  | 
 | ||
|  | Each state has an action associated with it that is invoked when the machine | ||
|  | wakes up in that state.  There are four logical sets of states: | ||
|  | 
 | ||
|  |  (1) Preparation: states that wait for the parent objects to become ready.  The | ||
|  |      representations are hierarchical, and it is expected that an object must | ||
|  |      be created or accessed with respect to its parent object. | ||
|  | 
 | ||
|  |  (2) Initialisation: states that perform lookups in the cache and validate | ||
|  |      what's found and that create on disk any missing metadata. | ||
|  | 
 | ||
|  |  (3) Normal running: states that allow netfs operations on objects to proceed | ||
|  |      and that update the state of objects. | ||
|  | 
 | ||
|  |  (4) Termination: states that detach objects from their netfs cookies, that | ||
|  |      delete objects from disk, that handle disk and system errors and that free | ||
|  |      up in-memory resources. | ||
|  | 
 | ||
|  | 
 | ||
|  | In most cases, transitioning between states is in response to signalled events. | ||
|  | When a state has finished processing, it will usually set the mask of events in | ||
|  | which it is interested (object->event_mask) and relinquish the worker thread. | ||
|  | Then when an event is raised (by calling fscache_raise_event()), if the event | ||
|  | is not masked, the object will be queued for processing (by calling | ||
|  | fscache_enqueue_object()). | ||
|  | 
 | ||
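The interaction between the event mask and the raising of events might be
sketched roughly as follows.  The names here (sketch_object,
sketch_raise_event() and sketch_enqueue_object()) are hypothetical user-space
stand-ins, and the sketch ignores the atomic bit operations that real kernel
code would need.

```c
#include <assert.h>

/* Hypothetical event numbers and object record, for illustration only. */
enum sketch_event { SKETCH_EV_UPDATE, SKETCH_EV_CLEARED, SKETCH_EV_ERROR };

struct sketch_object {
	unsigned long events;      /* events raised but not yet processed */
	unsigned long event_mask;  /* events the current state is interested in */
	int queued;                /* nonzero once queued for a worker thread */
};

/* Stand-in for handing the object to a worker thread. */
static void sketch_enqueue_object(struct sketch_object *obj)
{
	obj->queued = 1;
}

/* Record an event.  The object is queued for processing only if the current
 * state has asked to hear about this event. */
static void sketch_raise_event(struct sketch_object *obj, enum sketch_event ev)
{
	unsigned long bit = 1UL << ev;

	obj->events |= bit;
	if (obj->event_mask & bit)
		sketch_enqueue_object(obj);
}
```

An event that the state is not interested in stays recorded in the events
field, so a later state can still notice it once its own mask includes it.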
|  | 
 | ||
|  | PROVISION OF CPU TIME | ||
|  | --------------------- | ||
|  | 
 | ||
|  | The work to be done by the various states is given CPU time by the threads of | ||
|  | the slow work facility (see Documentation/slow-work.txt).  This is used in | ||
|  | preference to the workqueue facility because: | ||
|  | 
 | ||
|  |  (1) Threads may be completely occupied for very long periods of time by a | ||
|  |      particular work item.  These state actions may be doing sequences of | ||
|  |      synchronous, journalled disk accesses (lookup, mkdir, create, setxattr, | ||
|  |      getxattr, truncate, unlink, rmdir, rename). | ||
|  | 
 | ||
|  |  (2) Threads may do little actual work, but may rather spend a lot of time | ||
|  |      sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded | ||
|  |      workqueues don't necessarily have the right numbers of threads. | ||
|  | 
 | ||
|  | 
 | ||
|  | LOCKING SIMPLIFICATION | ||
|  | ---------------------- | ||
|  | 
 | ||
|  | Only one worker thread may be operating on any particular object's state | ||
|  | machine at once.  This simplifies the locking, particularly with respect to | ||
|  | disconnecting the netfs's representation of a cache object (fscache_cookie) | ||
|  | from the cache backend's representation (fscache_object), which may be | ||
|  | requested from either end. | ||
|  | 
 | ||
|  | 
 | ||
|  | ================= | ||
|  | THE SET OF STATES | ||
|  | ================= | ||
|  | 
 | ||
|  | The object state machine has a set of states that it can be in.  There are | ||
|  | preparation states in which the object sets itself up and waits for its parent | ||
|  | object to transit to a state that allows access to its children: | ||
|  | 
 | ||
|  |  (1) State FSCACHE_OBJECT_INIT. | ||
|  | 
 | ||
|  |      Initialise the object and wait for the parent object to become active.  In | ||
|  |      the cache, it is expected that it will not be possible to look an object | ||
|  |      up from the parent object, until that parent object itself has been looked | ||
|  |      up. | ||
|  | 
 | ||
|  | There are initialisation states in which the object sets itself up and accesses | ||
|  | disk for the object metadata: | ||
|  | 
 | ||
|  |  (2) State FSCACHE_OBJECT_LOOKING_UP. | ||
|  | 
 | ||
|  |      Look up the object on disk, using the parent as a starting point. | ||
|  |      FS-Cache expects the cache backend to probe the cache to see whether this | ||
|  |      object is represented there, and if it is, to see if it's valid (coherency | ||
|  |      management). | ||
|  | 
 | ||
|  |      The cache should call fscache_object_lookup_negative() to indicate lookup | ||
|  |      failure for whatever reason, and should call fscache_obtained_object() to | ||
|  |      indicate success. | ||
|  | 
 | ||
|  |      At the completion of lookup, FS-Cache will let the netfs go ahead with | ||
|  |      read operations, no matter whether the file is yet cached.  If not yet | ||
|  |      cached, read operations will be immediately rejected with ENODATA until | ||
|  |      the first known page is uncached, since up to that point there can be no | ||
|  |      data to be read out of the cache for that file that isn't currently also | ||
|  |      held in the pagecache. | ||
|  | 
 | ||
|  |  (3) State FSCACHE_OBJECT_CREATING. | ||
|  | 
 | ||
|  |      Create an object on disk, using the parent as a starting point.  This | ||
|  |      happens if the lookup failed to find the object, or if the object's | ||
|  |      coherency data indicated what's on disk is out of date.  In this state, | ||
|  |      FS-Cache expects the cache to create the object on disk. | ||
|  | 
 | ||
|  |      The cache should call fscache_obtained_object() if creation completes | ||
|  |      successfully, fscache_object_lookup_negative() otherwise. | ||
|  | 
 | ||
|  |      At the completion of creation, FS-Cache will start processing write | ||
|  |      operations the netfs has queued for an object.  If creation failed, the | ||
|  |      write ops will be transparently discarded, and nothing recorded in the | ||
|  |      cache. | ||
|  | 
 | ||
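The way the outcome of lookup and creation drives the next state can be
sketched as a small transition function.  The three-way result and all the
names below are assumptions made for illustration: the text only says that the
backend reports success or failure, and that lookup or creation errors lead to
a dying state.

```c
#include <assert.h>

/* Hypothetical state and result names mirroring the text; not kernel code. */
enum sketch_state {
	SKETCH_LOOKING_UP,
	SKETCH_CREATING,
	SKETCH_AVAILABLE,
	SKETCH_LC_DYING,
};

enum sketch_result {
	SKETCH_OBTAINED,  /* backend reported success */
	SKETCH_NEGATIVE,  /* nothing usable was found on disk */
	SKETCH_FAILED,    /* disk or system error */
};

/* Work out the next state from the current state and the backend's result. */
static enum sketch_state sketch_next(enum sketch_state s, enum sketch_result r)
{
	switch (s) {
	case SKETCH_LOOKING_UP:
		if (r == SKETCH_OBTAINED)
			return SKETCH_AVAILABLE;
		if (r == SKETCH_NEGATIVE)
			return SKETCH_CREATING;   /* missing or stale: create it */
		return SKETCH_LC_DYING;
	case SKETCH_CREATING:
		/* Any result other than success ends the object's life. */
		return r == SKETCH_OBTAINED ? SKETCH_AVAILABLE : SKETCH_LC_DYING;
	default:
		return s;                         /* not handled by this sketch */
	}
}
```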
|  | There are some normal running states in which the object spends its time | ||
|  | servicing netfs requests: | ||
|  | 
 | ||
|  |  (4) State FSCACHE_OBJECT_AVAILABLE. | ||
|  | 
 | ||
|  |      A transient state in which pending operations are started, child objects | ||
|  |      are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary | ||
|  |      lookup data is freed. | ||
|  | 
 | ||
|  |  (5) State FSCACHE_OBJECT_ACTIVE. | ||
|  | 
 | ||
|  |      The normal running state.  In this state, requests the netfs makes will be | ||
|  |      passed on to the cache. | ||
|  | 
 | ||
|  |  (6) State FSCACHE_OBJECT_UPDATING. | ||
|  | 
 | ||
|  |      The state machine comes here to update the object in the cache from the | ||
|  |      netfs's records.  This involves updating the auxiliary data that is used | ||
|  |      to maintain coherency. | ||
|  | 
 | ||
|  | And there are terminal states in which an object cleans itself up, deallocates | ||
|  | memory and potentially deletes stuff from disk: | ||
|  | 
 | ||
|  |  (7) State FSCACHE_OBJECT_LC_DYING. | ||
|  | 
 | ||
|  |      The object comes here if it is dying because of a lookup or creation | ||
|  |      error.  This would be due to a disk error or system error of some sort. | ||
|  |      Temporary data is cleaned up, and the parent is released. | ||
|  | 
 | ||
|  |  (8) State FSCACHE_OBJECT_DYING. | ||
|  | 
 | ||
|  |      The object comes here if it is dying due to an error, because its parent | ||
|  |      cookie has been relinquished by the netfs or because the cache is being | ||
|  |      withdrawn. | ||
|  | 
 | ||
|  |      Any child objects waiting on this one are given CPU time so that they too | ||
|  |      can destroy themselves.  This object waits for all its children to go away | ||
|  |      before advancing to the next state. | ||
|  | 
 | ||
|  |  (9) State FSCACHE_OBJECT_ABORT_INIT. | ||
|  | 
 | ||
|  |      The object comes to this state if it was waiting on its parent in | ||
|  |      FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself | ||
|  |      so that the parent may proceed from the FSCACHE_OBJECT_DYING state. | ||
|  | 
 | ||
|  | (10) State FSCACHE_OBJECT_RELEASING. | ||
|  | (11) State FSCACHE_OBJECT_RECYCLING. | ||
|  | 
 | ||
|  |      The object comes to one of these two states when dying once it is rid of | ||
|  |      all its children, if it is dying because the netfs relinquished its | ||
|  |      cookie.  In the first state, the cached data is expected to persist, and | ||
|  |      in the second it will be deleted. | ||
|  | 
 | ||
|  | (12) State FSCACHE_OBJECT_WITHDRAWING. | ||
|  | 
 | ||
|  |      The object transits to this state if the cache decides it wants to | ||
|  |      withdraw the object from service, perhaps to make space, but also due to | ||
|  |      error or just because the whole cache is being withdrawn. | ||
|  | 
 | ||
|  | (13) State FSCACHE_OBJECT_DEAD. | ||
|  | 
 | ||
|  |      The object transits to this state when the in-memory object record is | ||
|  |      ready to be deleted.  The object processor shouldn't ever see an object in | ||
|  |      this state. | ||
|  | 
 | ||
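The grouping of the thirteen states into the four logical sets can be written
down compactly.  The state names are those listed above, but the numeric
values, the sketch_phase enum and sketch_phase_of() are assumptions made for
this sketch rather than anything FS-Cache is known to provide.

```c
#include <assert.h>

/* State names as listed in the text; the ordering here is illustrative. */
enum sketch_state {
	FSCACHE_OBJECT_INIT,
	FSCACHE_OBJECT_LOOKING_UP,
	FSCACHE_OBJECT_CREATING,
	FSCACHE_OBJECT_AVAILABLE,
	FSCACHE_OBJECT_ACTIVE,
	FSCACHE_OBJECT_UPDATING,
	FSCACHE_OBJECT_LC_DYING,
	FSCACHE_OBJECT_DYING,
	FSCACHE_OBJECT_ABORT_INIT,
	FSCACHE_OBJECT_RELEASING,
	FSCACHE_OBJECT_RECYCLING,
	FSCACHE_OBJECT_WITHDRAWING,
	FSCACHE_OBJECT_DEAD,
};

/* Hypothetical labels for the four logical sets of states. */
enum sketch_phase {
	SKETCH_PREPARATION,
	SKETCH_INITIALISATION,
	SKETCH_NORMAL_RUNNING,
	SKETCH_TERMINATION,
};

/* Map a state onto the logical set it belongs to. */
static enum sketch_phase sketch_phase_of(enum sketch_state s)
{
	switch (s) {
	case FSCACHE_OBJECT_INIT:
		return SKETCH_PREPARATION;
	case FSCACHE_OBJECT_LOOKING_UP:
	case FSCACHE_OBJECT_CREATING:
		return SKETCH_INITIALISATION;
	case FSCACHE_OBJECT_AVAILABLE:
	case FSCACHE_OBJECT_ACTIVE:
	case FSCACHE_OBJECT_UPDATING:
		return SKETCH_NORMAL_RUNNING;
	default:
		return SKETCH_TERMINATION;  /* all the dying and dead states */
	}
}
```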
|  | 
 | ||
|  | ================= | ||
|  | THE SET OF EVENTS | ||
|  | ================= | ||
|  | 
 | ||
|  | There are a number of events that can be raised to an object state machine: | ||
|  | 
 | ||
|  |  (*) FSCACHE_OBJECT_EV_UPDATE | ||
|  | 
 | ||
|  |      The netfs requested that an object be updated.  The state machine will ask | ||
|  |      the cache backend to update the object, and the cache backend will ask the | ||
|  |      netfs for details of the change through its cookie definition ops. | ||
|  | 
 | ||
|  |  (*) FSCACHE_OBJECT_EV_CLEARED | ||
|  | 
 | ||
|  |      This is signalled in two circumstances: | ||
|  | 
 | ||
|  |      (a) when an object's last child object is dropped and | ||
|  | 
 | ||
|  |      (b) when the last operation outstanding on an object is completed. | ||
|  | 
 | ||
|  |      This is used to proceed from the dying state. | ||
|  | 
 | ||
|  |  (*) FSCACHE_OBJECT_EV_ERROR | ||
|  | 
 | ||
|  |      This is signalled when an I/O error occurs during the processing of some | ||
|  |      object. | ||
|  | 
 | ||
|  |  (*) FSCACHE_OBJECT_EV_RELEASE | ||
|  |  (*) FSCACHE_OBJECT_EV_RETIRE | ||
|  | 
 | ||
|  |      These are signalled when the netfs relinquishes a cookie it was using. | ||
|  |      The event selected depends on whether the netfs asks for the backing | ||
|  |      object to be retired (deleted) or retained. | ||
|  | 
 | ||
|  |  (*) FSCACHE_OBJECT_EV_WITHDRAW | ||
|  | 
 | ||
|  |      This is signalled when the cache backend wants to withdraw an object. | ||
|  |      This means that the object will have to be detached from the netfs's | ||
|  |      cookie. | ||
|  | 
 | ||
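The two circumstances in which FSCACHE_OBJECT_EV_CLEARED is signalled can be
illustrated with a pair of counters.  This is a minimal sketch: the struct,
its fields and the helpers are hypothetical, and requiring both counters to
reach zero before a dying object advances is an assumption of the sketch, not
something the text states.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one object. */
struct sketch_object {
	int n_children;  /* child objects still attached */
	int n_ops;       /* operations still outstanding */
	int cleared;     /* set once a "cleared" event has been signalled */
};

/* Case (a): the last child object is dropped. */
static void sketch_drop_child(struct sketch_object *obj)
{
	if (--obj->n_children == 0)
		obj->cleared = 1;
}

/* Case (b): the last outstanding operation completes. */
static void sketch_op_done(struct sketch_object *obj)
{
	if (--obj->n_ops == 0)
		obj->cleared = 1;
}

/* On waking, a dying object re-checks whether anything still depends on it. */
static int sketch_may_leave_dying(const struct sketch_object *obj)
{
	return obj->n_children == 0 && obj->n_ops == 0;
}
```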
|  | Because the withdrawing and releasing/retiring events are all handled by the | ||
|  | object state machine, it doesn't matter if there's a collision with both ends | ||
|  | trying to sever the connection at the same time.  The state machine can just | ||
|  | pick which one it wants to honour, and honouring it also achieves the other. | ||
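
Resolving such a collision inside the state machine might look something like
the following.  The event and state names are hypothetical stand-ins for the
ones described above, and the priority order used here is purely an assumption
of the sketch.

```c
#include <assert.h>

/* Hypothetical event bits and states, named after those in the text. */
enum sketch_event {
	SKETCH_EV_RELEASE,
	SKETCH_EV_RETIRE,
	SKETCH_EV_WITHDRAW,
};

enum sketch_state {
	SKETCH_DYING,
	SKETCH_RELEASING,    /* cached data is expected to persist */
	SKETCH_RECYCLING,    /* cached data will be deleted */
	SKETCH_WITHDRAWING,  /* cache backend is pulling the object */
};

/* Pick the next state for a dying object from its pending events.  If both
 * ends asked to sever the connection, exactly one request is honoured. */
static enum sketch_state sketch_pick(unsigned long pending)
{
	if (pending & (1UL << SKETCH_EV_RETIRE))
		return SKETCH_RECYCLING;
	if (pending & (1UL << SKETCH_EV_RELEASE))
		return SKETCH_RELEASING;
	if (pending & (1UL << SKETCH_EV_WITHDRAW))
		return SKETCH_WITHDRAWING;
	return SKETCH_DYING;  /* nothing pending: keep waiting */
}
```

Since only one worker thread runs the state machine for an object at a time,
a single decision like this is enough; no extra lock is needed to arbitrate
between the netfs and the cache backend.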