| 
									
										
										
										
											2010-10-20 00:17:59 -04:00
										 |  |  | Reference counting in pnfs: | 
					
						
							|  |  |  | ========================== | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The are several inter-related caches.  We have layouts which can | 
					
						
							|  |  |  | reference multiple devices, each of which can reference multiple data servers. | 
					
						
							|  |  |  | Each data server can be referenced by multiple devices.  Each device | 
					
						
							|  |  |  | can be referenced by multiple layouts.  To keep all of this straight, | 
					
						
							|  |  |  | we need to reference count. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | struct pnfs_layout_hdr | 
					
						
							|  |  |  | ---------------------- | 
					
						
							|  |  |  | The on-the-wire command LAYOUTGET corresponds to struct | 
					
						
							|  |  |  | pnfs_layout_segment, usually referred to by the variable name lseg. | 
					
						
							|  |  |  | Each nfs_inode may hold a pointer to a cache of of these layout | 
					
						
							|  |  |  | segments in nfsi->layout, of type struct pnfs_layout_hdr. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | We reference the header for the inode pointing to it, across each | 
					
						
							|  |  |  | outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN, | 
					
						
							|  |  |  | LAYOUTCOMMIT), and for each lseg held within. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Each header is also (when non-empty) put on a list associated with | 
					
						
							|  |  |  | struct nfs_client (cl_layouts).  Being put on this list does not bump | 
					
						
							|  |  |  | the reference count, as the layout is kept around by the lseg that | 
					
						
							|  |  |  | keeps it in the list. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | deviceid_cache | 
					
						
							|  |  |  | -------------- | 
					
						
							|  |  |  | lsegs reference device ids, which are resolved per nfs_client and | 
					
						
							|  |  |  | layout driver type.  The device ids are held in a RCU cache (struct | 
					
						
							|  |  |  | nfs4_deviceid_cache).  The cache itself is referenced across each | 
					
						
							|  |  |  | mount.  The entries (struct nfs4_deviceid) themselves are held across | 
					
						
							|  |  |  | the lifetime of each lseg referencing them. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | RCU is used because the deviceid is basically a write once, read many | 
					
						
							|  |  |  | data structure.  The hlist size of 32 buckets needs better | 
					
						
							|  |  |  | justification, but seems reasonable given that we can have multiple | 
					
						
							|  |  |  | deviceid's per filesystem, and multiple filesystems per nfs_client. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The hash code is copied from the nfsd code base.  A discussion of | 
					
						
							|  |  |  | hashing and variations of this algorithm can be found at: | 
					
						
							|  |  |  | http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | data server cache | 
					
						
							|  |  |  | ----------------- | 
					
						
							|  |  |  | file driver devices refer to data servers, which are kept in a module | 
					
						
							|  |  |  | level cache.  Its reference is held over the lifetime of the deviceid | 
					
						
							|  |  |  | pointing to it. | 
					
						
							| 
									
										
										
										
											2011-03-01 01:34:23 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | lseg | 
					
						
							|  |  |  | ---- | 
					
						
							|  |  |  | lseg maintains an extra reference corresponding to the NFS_LSEG_VALID | 
					
						
							|  |  |  | bit which holds it in the pnfs_layout_hdr's list.  When the final lseg | 
					
						
							|  |  |  | is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED | 
					
						
							|  |  |  | bit is set, preventing any new lsegs from being added. | 
					
						
							| 
									
										
										
										
											2012-03-19 20:47:58 -07:00
										 |  |  | 
 | 
					
						
							|  |  |  | layout drivers | 
					
						
							|  |  |  | -------------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | PNFS utilizes what is called layout drivers. The STD defines 3 basic | 
					
						
							|  |  |  | layout types: "files" "objects" and "blocks". For each of these types | 
					
						
							|  |  |  | there is a layout-driver with a common function-vectors table which | 
					
						
							|  |  |  | are called by the nfs-client pnfs-core to implement the different layout | 
					
						
							|  |  |  | types. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c | 
					
						
							|  |  |  | Objects-layout-deriver code is in: fs/nfs/objlayout/.. directory | 
					
						
							|  |  |  | Blocks-layout-deriver code is in: fs/nfs/blocklayout/.. directory | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | objects-layout setup | 
					
						
							|  |  |  | -------------------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | As part of the full STD implementation the objlayoutdriver.ko needs, at times, | 
					
						
							|  |  |  | to automatically login to yet undiscovered iscsi/osd devices. For this the | 
					
						
							|  |  |  | driver makes up-calles to a user-mode script called *osd_login* | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The path_name of the script to use is by default: | 
					
						
							|  |  |  | 	/sbin/osd_login. | 
					
						
							|  |  |  | This name can be overridden by the Kernel module parameter: | 
					
						
							|  |  |  | 	objlayoutdriver.osd_login_prog | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | If Kernel does not find the osd_login_prog path it will zero it out | 
					
						
							|  |  |  | and will not attempt farther logins. An admin can then write new value | 
					
						
							|  |  |  | to the objlayoutdriver.osd_login_prog Kernel parameter to re-enable it. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The /sbin/osd_login is part of the nfs-utils package, and should usually | 
					
						
							|  |  |  | be installed on distributions that support this Kernel version. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The API to the login script is as follows: | 
					
						
							|  |  |  | 	Usage: $0 -u <URI> -o <OSDNAME> -s <SYSTEMID> | 
					
						
							|  |  |  | 	Options: | 
					
						
							|  |  |  | 		-u		target uri e.g. iscsi://<ip>:<port> | 
					
						
							|  |  |  | 				(allways exists) | 
					
						
							|  |  |  | 				(More protocols can be defined in the future. | 
					
						
							|  |  |  | 				 The client does not interpret this string it is | 
					
						
							| 
									
										
										
										
											2012-04-10 00:22:13 +09:00
										 |  |  | 				 passed unchanged as received from the Server) | 
					
						
							| 
									
										
										
										
											2012-03-19 20:47:58 -07:00
										 |  |  | 		-o		osdname of the requested target OSD | 
					
						
							|  |  |  | 				(Might be empty) | 
					
						
							|  |  |  | 				(A string which denotes the OSD name, there is a | 
					
						
							|  |  |  | 				 limit of 64 chars on this string) | 
					
						
							|  |  |  | 		-s 		systemid of the requested target OSD | 
					
						
							|  |  |  | 				(Might be empty) | 
					
						
							|  |  |  | 				(This string, if not empty is always an hex | 
					
						
							|  |  |  | 				 representation of the 20 bytes osd_system_id) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | blocks-layout setup | 
					
						
							|  |  |  | ------------------- | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | TODO: Document the setup needs of the blocks layout driver |