| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  | =============================================================================== | 
					
						
							|  |  |  | WHAT IS EXOFS? | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | exofs is a file system that uses an OSD and exports the API of a normal Linux | 
					
						
							|  |  |  | file system. Users access exofs like any other local file system, and exofs | 
					
						
							|  |  |  | will in turn issue commands to the local OSD initiator. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | OSD is a new T10 command set that views storage devices not as a large/flat | 
					
						
							|  |  |  | array of sectors but as a container of objects, each having a length, quota, | 
					
						
							|  |  |  | time attributes and more. Each object is addressed by a 64bit ID, and is | 
					
						
							|  |  |  | contained in a 64bit ID partition. Each object has associated attributes | 
					
						
							|  |  |  | attached to it, which are integral part of the object and provide metadata about | 
					
						
							|  |  |  | the object. The standard defines some common obligatory attributes, but user | 
					
						
							|  |  |  | attributes can be added as needed. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | ENVIRONMENT | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | To use this file system, you need to have an object store to run it on.  You | 
					
						
							|  |  |  | may download a target from: | 
					
						
							|  |  |  | http://open-osd.org | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | See Documentation/scsi/osd.txt for how to setup a working osd environment. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | USAGE | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 1. Download and compile exofs and open-osd initiator: | 
					
						
							|  |  |  |   You need an external Kernel source tree or kernel headers from your | 
					
						
							|  |  |  |   distribution. (anything based on 2.6.26 or later). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |   a. download open-osd including exofs source using: | 
					
						
							|  |  |  |      [parent-directory]$ git clone git://git.open-osd.org/open-osd.git | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |   b. Build the library module like this: | 
					
						
							|  |  |  |      [parent-directory]$ make -C KSRC=$(KER_DIR) open-osd | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      This will build both the open-osd initiator as well as the exofs kernel | 
					
						
							|  |  |  |      module. Use whatever parameters you compiled your Kernel with and | 
					
						
							|  |  |  |      $(KER_DIR) above pointing to the Kernel you compile against. See the file | 
					
						
							|  |  |  |      open-osd/top-level-Makefile for an example. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 2. Get the OSD initiator and target set up properly, and login to the target. | 
					
						
							|  |  |  |   See Documentation/scsi/osd.txt for farther instructions. Also see ./do-osd | 
					
						
							|  |  |  |   for example script that does all these steps. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 3. Insmod the exofs.ko module: | 
					
						
							|  |  |  |    [exofs]$ insmod exofs.ko | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 4. Make sure the directory where you want to mount exists. If not, create it. | 
					
						
							|  |  |  |    (For example, mkdir /mnt/exofs) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 5. At first run you will need to invoke the mkfs.exofs application | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    As an example, this will create the file system on: | 
					
						
							|  |  |  |    /dev/osd0 partition ID 65536 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    mkfs.exofs --pid=65536 --format /dev/osd0 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  |    The --format is optional. If not specified, no OSD_FORMAT will be | 
					
						
							|  |  |  |    performed and a clean file system will be created in the specified pid, | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  |    in the available space of the target. (Use --format=size_in_meg to limit | 
					
						
							|  |  |  |    the total LUN space available) | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  |    If pid already exists, it will be deleted and a new one will be created in | 
					
						
							|  |  |  |    its place. Be careful. | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  | 
 | 
					
						
							|  |  |  |    An exofs lives inside a single OSD partition. You can create multiple exofs | 
					
						
							|  |  |  |    filesystems on the same device using multiple pids. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    (run mkfs.exofs without any parameters for usage help message) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 6. Mount the file system. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    For example, to mount /dev/osd0, partition ID 0x10000 on /mnt/exofs: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 	mount -t exofs -o pid=65536 /dev/osd0 /mnt/exofs/ | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 7. For reference (See do-exofs example script): | 
					
						
							|  |  |  | 	do-exofs start - an example of how to perform the above steps. | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  | 	do-exofs stop - an example of how to unmount the file system. | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  | 	do-exofs format - an example of how to format and mkfs a new exofs. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 8. Extra compilation flags (uncomment in fs/exofs/Kbuild): | 
					
						
							|  |  |  | 	CONFIG_EXOFS_DEBUG - for debug messages and extra checks. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | exofs mount options | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | Similar to any mount command: | 
					
						
							|  |  |  | 	mount -t exofs -o exofs_options /dev/osdX mount_exofs_directory | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Where: | 
					
						
							|  |  |  |     -t exofs: specifies the exofs file system | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     /dev/osdX: X is a decimal number. /dev/osdX was created after a successful | 
					
						
							|  |  |  |                login into an OSD target. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     mount_exofs_directory: The directory to mount the file system on | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     exofs specific options: Options are separated by commas (,) | 
					
						
							|  |  |  | 		pid=<integer> - The partition number to mount/create as | 
					
						
							|  |  |  |                                 container of the filesystem. | 
					
						
							| 
									
										
										
										
											2011-01-31 14:32:14 +02:00
										 |  |  |                                 This option is mandatory. integer can be | 
					
						
							|  |  |  |                                 Hex by pre-pending an 0x to the number. | 
					
						
							|  |  |  | 		osdname=<id>  - Mount by a device's osdname. | 
					
						
							|  |  |  |                                 osdname is usually a 36 character uuid of the | 
					
						
							|  |  |  |                                 form "d2683732-c906-4ee1-9dbd-c10c27bb40df". | 
					
						
							|  |  |  |                                 It is one of the device's uuid specified in the | 
					
						
							|  |  |  |                                 mkfs.exofs format command. | 
					
						
							|  |  |  |                                 If this option is specified then the /dev/osdX | 
					
						
							|  |  |  |                                 above can be empty and is ignored. | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  |                 to=<integer>  - Timeout in ticks for a single command. | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  |                                 default is (60 * HZ) [for debugging only] | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | DESIGN | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | * The file system control block (AKA on-disk superblock) resides in an object | 
					
						
							|  |  |  |   with a special ID (defined in common.h). | 
					
						
							|  |  |  |   Information included in the file system control block is used to fill the | 
					
						
							|  |  |  |   in-memory superblock structure at mount time. This object is created before | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  |   the file system is used by mkexofs.c. It contains information such as: | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  | 	- The file system's magic number | 
					
						
							|  |  |  | 	- The next inode number to be allocated | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | * Each file resides in its own object and contains the data (and it will be | 
					
						
							|  |  |  |   possible to extend the file over multiple objects, though this has not been | 
					
						
							|  |  |  |   implemented yet). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | * A directory is treated as a file, and essentially contains a list of <file | 
					
						
							|  |  |  |   name, inode #> pairs for files that are found in that directory. The object | 
					
						
							|  |  |  |   IDs correspond to the files' inode numbers and will be allocated according to | 
					
						
							|  |  |  |   a bitmap (stored in a separate object). Now they are allocated using a | 
					
						
							|  |  |  |   counter. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | * Each file's control block (AKA on-disk inode) is stored in its object's | 
					
						
							|  |  |  |   attributes. This applies to both regular files and other types (directories, | 
					
						
							|  |  |  |   device files, symlinks, etc.). | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  | * Credentials are generated per object (inode and superblock) when they are | 
					
						
							|  |  |  |   created in memory (read from disk or created). The credential works for all | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  |   operations and is used as long as the object remains in memory. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | * Async OSD operations are used whenever possible, but the target may execute | 
					
						
							|  |  |  |   them out of order. The operations that concern us are create, delete, | 
					
						
							|  |  |  |   readpage, writepage, update_inode, and truncate. The following pairs of | 
					
						
							|  |  |  |   operations should execute in the order written, and we need to prevent them | 
					
						
							|  |  |  |   from executing in reverse order: | 
					
						
							|  |  |  | 	- The following are handled with the OBJ_CREATED and OBJ_2BCREATED | 
					
						
							|  |  |  | 	  flags. OBJ_CREATED is set when we know the object exists on the OSD - | 
					
						
							| 
									
										
										
										
											2009-07-27 13:26:32 -03:00
										 |  |  | 	  in create's callback function, and when we successfully do a | 
					
						
							|  |  |  | 	  read_inode. | 
					
						
							| 
									
										
										
										
											2008-10-28 17:22:01 +02:00
										 |  |  | 	  OBJ_2BCREATED is set in the beginning of the create function, so we | 
					
						
							|  |  |  | 	  know that we should wait. | 
					
						
							|  |  |  | 		- create/delete: delete should wait until the object is created | 
					
						
							|  |  |  | 		  on the OSD. | 
					
						
							|  |  |  | 		- create/readpage: readpage should be able to return a page | 
					
						
							|  |  |  | 		  full of zeroes in this case. If there was a write already | 
					
						
							|  |  |  | 		  en-route (i.e. create, writepage, readpage) then the page | 
					
						
							|  |  |  | 		  would be locked, and so it would really be the same as | 
					
						
							|  |  |  | 		  create/writepage. | 
					
						
							|  |  |  | 		- create/writepage: if writepage is called for a sync write, it | 
					
						
							|  |  |  | 		  should wait until the object is created on the OSD. | 
					
						
							|  |  |  | 		  Otherwise, it should just return. | 
					
						
							|  |  |  | 		- create/truncate: truncate should wait until the object is | 
					
						
							|  |  |  | 		  created on the OSD. | 
					
						
							|  |  |  | 		- create/update_inode: update_inode should wait until the | 
					
						
							|  |  |  | 		  object is created on the OSD. | 
					
						
							|  |  |  | 	- Handled by VFS locks: | 
					
						
							|  |  |  | 		- readpage/delete: shouldn't happen because of page lock. | 
					
						
							|  |  |  | 		- writepage/delete: shouldn't happen because of page lock. | 
					
						
							|  |  |  | 		- readpage/writepage: shouldn't happen because of page lock. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | LICENSE/COPYRIGHT | 
					
						
							|  |  |  | =============================================================================== | 
					
						
							|  |  |  | The exofs file system is based on ext2 v0.5b (distributed with the Linux kernel | 
					
						
							|  |  |  | version 2.6.10).  All files include the original copyrights, and the license | 
					
						
							|  |  |  | is GPL version 2 (only version 2, as is true for the Linux kernel).  The | 
					
						
							|  |  |  | Linux kernel can be downloaded from www.kernel.org. |