Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point.  Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root.  However, remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry.  As the alternative
form of reference needs to be stable across renames, truncates, and
server reboots (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.
					
						
The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string.
This byte string will be called a "filehandle fragment" as it
corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".
					
						
Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree.  This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename, this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).
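
The invariant can be made concrete with a small userspace model.  This is not kernel code: `struct model_dentry`, `model_d_alloc` and `model_reaches_root` are hypothetical names.  They model only the parent reference that keeps every ancestor of a cached object in the dcache, and the parent walk that tells a connected dentry from a disconnected one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a dcache entry. */
struct model_dentry {
    struct model_dentry *parent; /* points to itself for a root */
    int refcount;                /* held by users and by each child */
};

/* Creating a child during a name lookup pins its parent, so an ancestor
 * cannot leave the cache while any descendant is still cached. */
void model_d_alloc(struct model_dentry *child, struct model_dentry *parent)
{
    child->parent = parent;
    child->refcount = 1;   /* the caller's reference */
    parent->refcount++;    /* the child's hold on its parent */
}

/* True if following parent pointers from @d reaches @root, i.e. @d lies
 * within the proper prefix that starts at @root. */
bool model_reaches_root(const struct model_dentry *d,
                        const struct model_dentry *root)
{
    while (d != root) {
        if (d->parent == d)  /* some other root: not connected */
            return false;
        d = d->parent;
    }
    return true;
}
```

An object reached by filename always passes `model_reaches_root`.  An object that was conjured up from a filehandle starts as its own root and does not pass.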
					
						
However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object.  This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

1/ The dcache must sometimes contain objects that are not part of the
   proper prefix, i.e. that are not connected to the root.
2/ The dcache must be prepared for a newly found (via ->lookup) directory
   to already have a (non-connected) dentry, and must be able to move
   that dentry into place (based on the parent and name in the
   ->lookup).  This is particularly needed for directories as
   it is a dcache invariant that directories only have one dentry.

To implement these features, the dcache has:

a/ A dentry flag DCACHE_DISCONNECTED which is set on
   any dentry that might not be part of the proper prefix.
   This is set when anonymous dentries are created, and cleared when a
   dentry is noticed to be a child of a dentry which is in the proper
   prefix.

b/ A per-superblock list "s_anon" of dentries which are the roots of
   subtrees that are not in the proper prefix.  These dentries, as
   well as the proper prefix, need to be released at unmount time.  As
   these dentries will not be hashed, they are linked together on the
   d_hash list_head.

c/ Helper routines to allocate anonymous dentries, and to help attach
   loose directory dentries at lookup time.  They are:

    d_obtain_alias(inode) will return a dentry for the given inode.
      If the inode already has a dentry, one of those is returned.
      If it doesn't, a new anonymous (IS_ROOT and DCACHE_DISCONNECTED)
      dentry is allocated and attached.
      In the case of a directory, care is taken that only one dentry
      can ever be attached.

    d_splice_alias(inode, dentry) will introduce a new dentry into the tree:
      either the passed-in dentry or a preexisting alias for the given inode
      (such as an anonymous one created by d_obtain_alias), if appropriate.
      It returns NULL when the passed-in dentry is used, following the calling
      convention of ->lookup.
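
The three mechanisms above can be tied together in a userspace model, sketched here under several simplifying assumptions.  All `m_`-prefixed names are hypothetical.  `M_DISCONNECTED` stands in for DCACHE_DISCONNECTED.  A singly linked `anon_next` field replaces the real d_hash linkage on s_anon.  Each inode has at most one alias, as for a directory.  The model covers only the case where an anonymous root is moved into place.  It ignores locking and reference counts, as well as a directory whose connected alias lives elsewhere in the tree, all of which the real helpers must handle.

```c
#include <stddef.h>
#include <stdlib.h>

#define M_DISCONNECTED 0x1u  /* stands in for DCACHE_DISCONNECTED */

struct m_inode;

struct m_dentry {
    struct m_dentry *parent;    /* points to itself when IS_ROOT */
    struct m_inode *inode;
    const char *name;
    unsigned int flags;
    struct m_dentry *anon_next; /* link on the superblock's anon list */
};

struct m_inode {
    struct m_dentry *alias;     /* a directory has at most one dentry */
};

struct m_super {
    struct m_dentry *anon;      /* roots of disconnected subtrees */
};

/* Like d_obtain_alias(): reuse an existing alias, or allocate an
 * anonymous, disconnected root and record it on the anon list. */
struct m_dentry *m_obtain_alias(struct m_super *sb, struct m_inode *inode)
{
    struct m_dentry *d;

    if (inode->alias)
        return inode->alias;
    d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->parent = d;              /* IS_ROOT */
    d->inode = inode;
    d->flags = M_DISCONNECTED;
    d->anon_next = sb->anon;    /* push onto the anon list */
    sb->anon = d;
    inode->alias = d;
    return d;
}

/* Unlink @d from the superblock's anon list, if it is there. */
static void m_anon_remove(struct m_super *sb, struct m_dentry *d)
{
    struct m_dentry **pp;

    for (pp = &sb->anon; *pp; pp = &(*pp)->anon_next) {
        if (*pp == d) {
            *pp = d->anon_next;
            d->anon_next = NULL;
            return;
        }
    }
}

/* Like d_splice_alias(): if the inode already has an anonymous root
 * dentry, move it to the parent and name that ->lookup was given and
 * return it; otherwise use the passed-in dentry (as d_add() would)
 * and return NULL. */
struct m_dentry *m_splice_alias(struct m_super *sb, struct m_inode *inode,
                                struct m_dentry *dentry)
{
    struct m_dentry *alias = inode ? inode->alias : NULL;

    if (alias && alias->parent == alias) {
        m_anon_remove(sb, alias);
        alias->parent = dentry->parent;
        alias->name = dentry->name;
        if (!(alias->parent->flags & M_DISCONNECTED))
            alias->flags &= ~M_DISCONNECTED; /* now inside the proper prefix */
        return alias;
    }
    dentry->inode = inode;
    if (inode)
        inode->alias = dentry;
    return NULL;
}
```

In this model, a directory first reached through a filehandle sits on the anon list with the flag set.  A later ordinary ->lookup of that directory by name splices the existing dentry into place instead of creating a second one, which preserves the one-dentry-per-directory invariant.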
					
						
Filesystem Issues
-----------------

For a filesystem to be exportable, it must:

   1/ provide the filehandle fragment routines described below.
   2/ make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.

      If inode is NULL, d_splice_alias(inode, dentry) is equivalent to

		d_add(dentry, inode), NULL

      Similarly, d_splice_alias(ERR_PTR(err), dentry) simply returns
      ERR_PTR(err).

      Typically the ->lookup routine will simply end with:

		return d_splice_alias(inode, dentry);
	}
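
The calling convention can be sketched in userspace.  `ERR_PTR`, `PTR_ERR` and `IS_ERR` below are small re-creations of the kernel's error-pointer helpers, not the kernel header itself.  `struct inode` and `struct dentry` are stand-ins that carry a single field.  `splice_alias_model` and `demo_lookup` are hypothetical names.  The model covers only the two documented equivalences and the NULL return and leaves out alias reuse.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Stand-in types: only the fields this sketch needs. */
struct inode { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };

/* Model of d_splice_alias() limited to the cases documented above. */
struct dentry *splice_alias_model(struct inode *inode, struct dentry *dentry)
{
    if (IS_ERR(inode))
        return (struct dentry *)inode; /* ERR_PTR(err) is passed through */
    dentry->d_inode = inode;           /* d_add(); a NULL inode gives a
                                          negative dentry */
    return NULL;                       /* NULL: the passed-in dentry was used */
}

/* Hypothetical ->lookup tail: @found is what the directory search
 * produced (NULL for "no such name"), @err is a failure code or 0. */
struct dentry *demo_lookup(struct dentry *dentry, struct inode *found, int err)
{
    struct inode *inode = err ? ERR_PTR(err) : found;

    return splice_alias_model(inode, dentry);
}
```

Because errors and "not found" both flow through the same call, the lookup routine needs no special-case return paths of its own.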
					
						
A filesystem implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block.  This field must point to a "struct export_operations"
struct which has the following members:
					
						
  encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object.  The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    number and generation number for the inode encoded, and if necessary the
    same information for the parent.

  fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_obtain_alias).

  fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with
    d_obtain_alias).  May fail if the filehandle fragment is too small.

  get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent.  Quite possibly the parent dentry will have been allocated
    by d_obtain_alias.  The default get_parent function just returns an
    error, so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

  get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry.  If no get_name function is
    supplied, a default implementation is provided which uses vfs_readdir
    to find potential names, and matches inode numbers to find the correct
    match.
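
The default get_name strategy described above can be sketched in userspace as a linear scan.  `struct dirent_model` and `get_name_by_scan` are hypothetical.  The directory is a plain array here, whereas the real default obtains its candidate names by reading the directory through vfs_readdir.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry, as a readdir-style scan would report it. */
struct dirent_model {
    const char *name;
    unsigned long ino;
};

/* Scan every entry of the parent directory and copy out the name whose
 * inode number equals the child's.  Returns 0 on success, or -1 if there
 * is no match or @name_len is too small to hold the name. */
int get_name_by_scan(const struct dirent_model *entries, size_t n,
                     unsigned long child_ino, char *name, size_t name_len)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].ino != child_ino)
            continue;
        if (strlen(entries[i].name) >= name_len)
            return -1;
        strcpy(name, entries[i].name);
        return 0;
    }
    return -1;
}
```

For example, scanning a parent that holds "docs" (inode 12) and "src" (inode 17) for child inode 17 yields the name "src".  That name, together with the parent from get_parent, is what lets a disconnected directory be attached to the tree.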
					
						
A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type".
The decoding routines (fh_to_dentry and fh_to_parent) should not depend
on the stated size that is passed to them.  This size may be larger than
the original filehandle generated by encode_fh, in which case it will
have been padded with NULs.  Rather, the encode_fh routine should choose
a "type" which indicates to the decoding routines how much of the
filehandle is valid, and how it should be interpreted.
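
The following userspace sketch illustrates type-driven encoding and decoding.  It follows the default layout described for encode_fh: a 32-bit inode number and generation, plus the parent's pair when needed.  `encode_fragment`, `decode_fragment` and `struct obj_id` are hypothetical.  The type values 1 and 2 are meant to mirror the kernel's FILEID_INO32_GEN and FILEID_INO32_GEN_PARENT, and 255 its invalid type, but treat those values as an assumption.  The decoder uses the stated length only as an upper bound and lets the type decide which words are meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes (assumed to mirror the kernel's values). */
enum { FH_INO32_GEN = 1, FH_INO32_GEN_PARENT = 2, FH_INVALID = 255 };

struct obj_id { uint32_t ino; uint32_t gen; };

/* Encode @obj, plus @parent if non-NULL, into @fh.  *max_words gives the
 * room available and returns the words needed.  Returns the type, or
 * FH_INVALID if there was not enough room. */
int encode_fragment(const struct obj_id *obj, const struct obj_id *parent,
                    uint32_t *fh, int *max_words)
{
    int need = parent ? 4 : 2;

    if (*max_words < need) {
        *max_words = need;         /* tell the caller how much is needed */
        return FH_INVALID;
    }
    fh[0] = obj->ino;
    fh[1] = obj->gen;
    if (parent) {
        fh[2] = parent->ino;
        fh[3] = parent->gen;
    }
    *max_words = need;
    return parent ? FH_INO32_GEN_PARENT : FH_INO32_GEN;
}

/* Decode the object (like fh_to_dentry) or its parent (like fh_to_parent).
 * The type alone selects the layout; @len_words only guards against a
 * truncated fragment, since a valid one may arrive padded.  A real
 * implementation would then find the inode and compare generations. */
int decode_fragment(const uint32_t *fh, int len_words, int type,
                    int want_parent, struct obj_id *out)
{
    int base;

    switch (type) {
    case FH_INO32_GEN:
        if (want_parent)
            return -1;             /* fragment too small: no parent stored */
        base = 0;
        break;
    case FH_INO32_GEN_PARENT:
        base = want_parent ? 2 : 0;
        break;
    default:
        return -1;                 /* unknown type */
    }
    if (len_words < base + 2)
        return -1;                 /* truncated, not merely padded */
    out->ino = fh[base];
    out->gen = fh[base + 1];
    return 0;
}
```

Suppose a parentless handle is written into a buffer that still holds stale words from an earlier, longer handle.  It still decodes correctly, because type 1 tells the decoder to look at only the first two words, whatever length it was handed.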