Commit graph

142798 commits

Author SHA1 Message Date
David Howells
36c9559022 FS-Cache: Object management state machine
Implement the cache object management state machine.

The following documentation is added to illuminate the working of this state
machine.  It will also be added as:

	Documentation/filesystems/caching/object.txt

	     ====================================================
	     IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
	     ====================================================

==============
REPRESENTATION
==============

FS-Cache maintains an in-kernel representation of each object that a netfs is
currently interested in.  Such objects are represented by the fscache_cookie
struct and are referred to as cookies.

FS-Cache also maintains a separate in-kernel representation of the objects that
a cache backend is currently actively caching.  Such objects are represented by
the fscache_object struct.  The cache backends allocate these upon request, and
are expected to embed them in their own representations.  These are referred to
as objects.

There is a 1:N relationship between cookies and objects.  A cookie may be
represented by multiple objects - an index may exist in more than one cache -
or even by no objects (it may not be cached).

Furthermore, both cookies and objects are hierarchical.  The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees
of multiple caches:

	    NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
	                                   :                  :
	                                   :   +-----------+  :
	                          +----------->|  IObject  |  :
	      +-----------+       |        :   +-----------+  :
	      |  ICookie  |-------+        :         |        :
	      +-----------+       |        :         |        :   +-----------+
	            |             +------------------------------>|  IObject  |
	            |                      :         |        :   +-----------+
	            |                      :         V        :         |
	            |                      :   +-----------+  :         |
	            V             +----------->|  IObject  |  :         |
	      +-----------+       |        :   +-----------+  :         |
	      |  ICookie  |-------+        :         |        :         V
	      +-----------+       |        :         |        :   +-----------+
	            |             +------------------------------>|  IObject  |
	      +-----+-----+                :         |        :   +-----------+
	      |           |                :         |        :         |
	      V           |                :         V        :         |
	+-----------+     |                :   +-----------+  :         |
	|  ICookie  |------------------------->|  IObject  |  :         |
	+-----------+     |                :   +-----------+  :         |
	      |           V                :         |        :         V
	      |     +-----------+          :         |        :   +-----------+
	      |     |  ICookie  |-------------------------------->|  IObject  |
	      |     +-----------+          :         |        :   +-----------+
	      V           |                :         V        :         |
	+-----------+     |                :   +-----------+  :         |
	|  DCookie  |------------------------->|  DObject  |  :         |
	+-----------+     |                :   +-----------+  :         |
	                  |                :                  :         |
	          +-------+-------+        :                  :         |
	          |               |        :                  :         |
	          V               V        :                  :         V
	    +-----------+   +-----------+  :                  :   +-----------+
	    |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
	    +-----------+   +-----------+  :                  :   +-----------+
	                                   :                  :

In the above illustration, ICookie and IObject represent indices and DCookie
and DObject represent data storage objects.  Indices may have representation in
multiple caches, but currently, non-index objects may not.  Objects of any type
may also be entirely unrepresented.

As far as the netfs API goes, the netfs is only actually permitted to see
pointers to the cookies.  The cookies themselves and any objects attached to
those cookies are hidden from it.

===============================
OBJECT MANAGEMENT STATE MACHINE
===============================

Within FS-Cache, each active object is managed by its own individual state
machine.  The state for an object is kept in the fscache_object struct, in
object->state.  A cookie may point to a set of objects that are in different
states.

Each state has an action associated with it that is invoked when the machine
wakes up in that state.  There are four logical sets of states:

 (1) Preparation: states that wait for the parent objects to become ready.  The
     representations are hierarchical, and it is expected that an object must
     be created or accessed with respect to its parent object.

 (2) Initialisation: states that perform lookups in the cache and validate
     what's found and that create on disk any missing metadata.

 (3) Normal running: states that allow netfs operations on objects to proceed
     and that update the state of objects.

 (4) Termination: states that detach objects from their netfs cookies, that
     delete objects from disk, that handle disk and system errors and that free
     up in-memory resources.

In most cases, transitioning between states is in response to signalled events.
When a state has finished processing, it will usually set the mask of events in
which it is interested (object->event_mask) and relinquish the worker thread.
Then when an event is raised (by calling fscache_raise_event()), if the event
is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()).

PROVISION OF CPU TIME
---------------------

The work to be done by the various states is given CPU time by the threads of
the slow work facility (see Documentation/slow-work.txt).  This is used in
preference to the workqueue facility because:

 (1) Threads may be completely occupied for very long periods of time by a
     particular work item.  These state actions may be doing sequences of
     synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
     getxattr, truncate, unlink, rmdir, rename).

 (2) Threads may do little actual work, but may rather spend a lot of time
     sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded
     workqueues don't necessarily have the right numbers of threads.

LOCKING SIMPLIFICATION
----------------------

Because only one worker thread may be operating on any particular object's
state machine at once, this simplifies the locking, particularly with respect
to disconnecting the netfs's representation of a cache object (fscache_cookie)
from the cache backend's representation (fscache_object) - which may be
requested from either end.

=================
THE SET OF STATES
=================

The object state machine has a set of states that it can be in.  There are
preparation states in which the object sets itself up and waits for its parent
object to transit to a state that allows access to its children:

 (1) State FSCACHE_OBJECT_INIT.

     Initialise the object and wait for the parent object to become active.  In
     the cache, it is expected that it will not be possible to look an object
     up from the parent object, until that parent object itself has been looked
     up.

There are initialisation states in which the object sets itself up and accesses
disk for the object metadata:

 (2) State FSCACHE_OBJECT_LOOKING_UP.

     Look up the object on disk, using the parent as a starting point.
     FS-Cache expects the cache backend to probe the cache to see whether this
     object is represented there, and if it is, to see if it's valid (coherency
     management).

     The cache should call fscache_object_lookup_negative() to indicate lookup
     failure for whatever reason, and should call fscache_obtained_object() to
     indicate success.

     At the completion of lookup, FS-Cache will let the netfs go ahead with
     read operations, no matter whether the file is yet cached.  If not yet
     cached, read operations will be immediately rejected with ENODATA until
     the first known page is uncached - as to that point there can be no data
     to be read out of the cache for that file that isn't currently also held
     in the pagecache.

 (3) State FSCACHE_OBJECT_CREATING.

     Create an object on disk, using the parent as a starting point.  This
     happens if the lookup failed to find the object, or if the object's
     coherency data indicated what's on disk is out of date.  In this state,
     FS-Cache expects the cache to create

     The cache should call fscache_obtained_object() if creation completes
     successfully, fscache_object_lookup_negative() otherwise.

     At the completion of creation, FS-Cache will start processing write
     operations the netfs has queued for an object.  If creation failed, the
     write ops will be transparently discarded, and nothing recorded in the
     cache.

There are some normal running states in which the object spends its time
servicing netfs requests:

 (4) State FSCACHE_OBJECT_AVAILABLE.

     A transient state in which pending operations are started, child objects
     are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
     lookup data is freed.

 (5) State FSCACHE_OBJECT_ACTIVE.

     The normal running state.  In this state, requests the netfs makes will be
     passed on to the cache.

 (6) State FSCACHE_OBJECT_UPDATING.

     The state machine comes here to update the object in the cache from the
     netfs's records.  This involves updating the auxiliary data that is used
     to maintain coherency.

And there are terminal states in which an object cleans itself up, deallocates
memory and potentially deletes stuff from disk:

 (7) State FSCACHE_OBJECT_LC_DYING.

     The object comes here if it is dying because of a lookup or creation
     error.  This would be due to a disk error or system error of some sort.
     Temporary data is cleaned up, and the parent is released.

 (8) State FSCACHE_OBJECT_DYING.

     The object comes here if it is dying due to an error, because its parent
     cookie has been relinquished by the netfs or because the cache is being
     withdrawn.

     Any child objects waiting on this one are given CPU time so that they too
     can destroy themselves.  This object waits for all its children to go away
     before advancing to the next state.

 (9) State FSCACHE_OBJECT_ABORT_INIT.

     The object comes to this state if it was waiting on its parent in
     FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself
     so that the parent may proceed from the FSCACHE_OBJECT_DYING state.

(10) State FSCACHE_OBJECT_RELEASING.
(11) State FSCACHE_OBJECT_RECYCLING.

     The object comes to one of these two states when dying once it is rid of
     all its children, if it is dying because the netfs relinquished its
     cookie.  In the first state, the cached data is expected to persist, and
     in the second it will be deleted.

(12) State FSCACHE_OBJECT_WITHDRAWING.

     The object transits to this state if the cache decides it wants to
     withdraw the object from service, perhaps to make space, but also due to
     error or just because the whole cache is being withdrawn.

(13) State FSCACHE_OBJECT_DEAD.

     The object transits to this state when the in-memory object record is
     ready to be deleted.  The object processor shouldn't ever see an object in
     this state.

THE SET OF EVENTS
-----------------

There are a number of events that can be raised to an object state machine:

 (*) FSCACHE_OBJECT_EV_UPDATE

     The netfs requested that an object be updated.  The state machine will ask
     the cache backend to update the object, and the cache backend will ask the
     netfs for details of the change through its cookie definition ops.

 (*) FSCACHE_OBJECT_EV_CLEARED

     This is signalled in two circumstances:

     (a) when an object's last child object is dropped and

     (b) when the last operation outstanding on an object is completed.

     This is used to proceed from the dying state.

 (*) FSCACHE_OBJECT_EV_ERROR

     This is signalled when an I/O error occurs during the processing of some
     object.

 (*) FSCACHE_OBJECT_EV_RELEASE
 (*) FSCACHE_OBJECT_EV_RETIRE

     These are signalled when the netfs relinquishes a cookie it was using.
     The event selected depends on whether the netfs asks for the backing
     object to be retired (deleted) or retained.

 (*) FSCACHE_OBJECT_EV_WITHDRAW

     This is signalled when the cache backend wants to withdraw an object.
     This means that the object will have to be detached from the netfs's
     cookie.

Because the withdrawing releasing/retiring events are all handled by the object
state machine, it doesn't matter if there's a collision with both ends trying
to sever the connection at the same time.  The state machine can just pick
which one it wants to honour, and that effects the other.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:38 +01:00
David Howells
2868cbea72 FS-Cache: Bit waiting helpers
Add helpers for use with wait_on_bit().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:38 +01:00
David Howells
726dd7ff10 FS-Cache: Add netfs registration
Add functions to register and unregister a network filesystem or other client
of the FS-Cache service.  This allocates and releases the cookie representing
the top-level index for a netfs, and makes it available to the netfs.

If the FS-Cache facility is disabled, then the calls are optimised away at
compile time.

Note that whilst this patch may appear to work with FS-Cache enabled and a
netfs attempting to use it, it will leak the cookie it allocates for the netfs
as fscache_relinquish_cookie() is implemented in a later patch.  This will
cause the slab code to emit a warning when the module is removed.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:38 +01:00
David Howells
955d00917f FS-Cache: Provide a slab for cookie allocation
Provide a slab from which can be allocated the FS-Cache cookies that will be
presented to the netfs.

Also provide a slab constructor and a function to recursively discard a cookie
and its ancestor chain.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:38 +01:00
David Howells
4c515dd47a FS-Cache: Add cache management
Implement the entry points by which a cache backend may initialise, add,
declare an error upon and withdraw a cache.

Further, an object is created in sysfs under which each cache added will get
an object created:

	/sys/fs/fscache/<cachetag>/

All of this is described in Documentation/filesystems/caching/backend-api.txt
added by a previous patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:37 +01:00
David Howells
0e04d4cefc FS-Cache: Add cache tag handling
Implement two features of FS-Cache:

 (1) The ability to request and release cache tags - names by which a cache may
     be known to a netfs, and thus selected for use.

 (2) An internal function by which a cache is selected by consulting the netfs,
     if the netfs wishes to be consulted.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:37 +01:00
David Howells
a6891645cf FS-Cache: Root index definition
Add a description of the root index of the cache for later patches to make use
of.

The root index is owned by FS-Cache itself.  When a netfs requests caching
facilities, FS-Cache will, if one doesn't already exist, create an entry in
the root index with the key being the name of the netfs ("AFS" for example),
and the auxiliary data holding the index structure version supplied by the
netfs:

				     FSDEF
				       |
				 +-----------+
				 |           |
				NFS         AFS
			       [v=1]       [v=1]

If an entry with the appropriate name does already exist, the version is
compared.  If the version is different, the entire subtree from that entry
will be discarded and a new entry created.

The new entry will be an index, and a cookie referring to it will be passed to
the netfs.  This is then the root handle by which the netfs accesses the
cache.  It can create whatever objects it likes in that index, including
further indices.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:37 +01:00
David Howells
7394daa8c6 FS-Cache: Add use of /proc and presentation of statistics
Make FS-Cache create its /proc interface and present various statistical
information through it.  Also provide the functions for updating this
information.

These features are enabled by:

	CONFIG_FSCACHE_PROC
	CONFIG_FSCACHE_STATS
	CONFIG_FSCACHE_HISTOGRAM

The /proc directory for FS-Cache is also exported so that caching modules can
add their own statistics there too.

The FS-Cache module is loadable at this point, and the statistics files can be
examined by userspace:

	cat /proc/fs/fscache/stats
	cat /proc/fs/fscache/histogram

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:37 +01:00
David Howells
06b3db1b9b FS-Cache: Add main configuration option, module entry points and debugging
Add the main configuration option, allowing FS-Cache to be selected; the
module entry and exit functions and the debugging stuff used by these patches.

The two configuration options added are:

	CONFIG_FSCACHE
	CONFIG_FSCACHE_DEBUG

The first enables the facility, and the second makes the debugging statements
enableable through the "debug" module parameter.  The value of this parameter
is a bitmask as described in:

	Documentation/filesystems/caching/fscache.txt

The module can be loaded at this point, but all it will do at this point in
the patch series is to start up the slow work facility and shut it down again.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
David Howells
0dfc41d1ef FS-Cache: Add the FS-Cache cache backend API and documentation
Add the API for a generic facility (FS-Cache) by which caches may declare them
selves open for business, and may obtain work to be done from network
filesystems.  The header file is included by:

	#include <linux/fscache-cache.h>

Documentation for the API is also added to:

	Documentation/filesystems/caching/backend-api.txt

This API is not usable without the implementation of the utility functions
which will be added in further patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
David Howells
2d6fff6370 FS-Cache: Add the FS-Cache netfs API and documentation
Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS
or NFS) may call on local caching capabilities without having to know anything
about how the cache works, or even if there is a cache:

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

General documentation and documentation of the netfs specific API are provided
in addition to the header files.

As this patch stands, it is possible to build a filesystem against the facility
and attempt to use it.  All that will happen is that all requests will be
immediately denied as if no cache is present.

Further patches will implement the core of the facility.  The facility will
transfer requests from networking filesystems to appropriate caches if
possible, or else gracefully deny them.

If this facility is disabled in the kernel configuration, then all its
operations will trivially reduce to nothing during compilation.

WHY NOT I_MAPPING?
==================

I have added my own API to implement caching rather than using i_mapping to do
this for a number of reasons.  These have been discussed a lot on the LKML and
CacheFS mailing lists, but to summarise the basics:

 (1) Most filesystems don't do hole reportage.  Holes in files are treated as
     blocks of zeros and can't be distinguished otherwise, making it difficult
     to distinguish blocks that have been read from the network and cached from
     those that haven't.

 (2) The backing inode must be fully populated before being exposed to
     userspace through the main inode because the VM/VFS goes directly to the
     backing inode and does not interrogate the front inode's VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the cache at
     	 the same time.

     (c) A working set of files in total larger than the cache may not be
     	 cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and remotely grows larger than the
     	 cache is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to the
     network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file must
     be written back.

 (5) The pages belong to the backing filesystem, and all metadata associated
     with that page are relevant only to the backing filesystem, and not
     anything stacked atop it.

OVERVIEW
========

FS-Cache provides (or will provide) the following facilities:

 (1) Caches can be added / removed at any time, even whilst in use.

 (2) Adds a facility by which tags can be used to refer to caches, even if
     they're not available yet.

 (3) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (4) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (1)).

 (5) A netfs may annotate cache objects that belongs to it.  This permits the
     storage of coherency maintenance data.

 (6) Cache objects will be pinnable and space reservations will be possible.

 (7) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (8) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (9) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

(10) Indices can be used to group files together to reduce key size and to make
     group invalidation easier.  The use of indices may make lookup quicker,
     but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages.  The
     netfs indicates that page A is at index B of the data-file represented by
     cookie C, and that it should be read or written.  The cache backend may or
     may not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

(12) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

(13) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.

FS-Cache maintains a virtual index tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches.

                                           FSDEF
                                             |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2

In the example above, two netfs's can be seen to be backed: NFS and AFS.  These
have different index hierarchies:

 (*) The NFS primary index will probably contain per-server indices.  Each
     server index is indexed by NFS file handles to get data file objects.
     Each data file objects can have an array of pages, but may also have
     further child objects, such as extended attributes and directory entries.
     Extended attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each of volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.

The FS-Cache overview can be found in:

	Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.txt

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
David Howells
266cf658ef FS-Cache: Recruit a page flags for cache management
Recruit a page flag to aid in cache management.  The following extra flag is
defined:

 (1) PG_fscache (PG_private_2)

     The marked page is backed by a local cache and is pinning resources in the
     cache driver.

If PG_fscache is set, then things that checked for PG_private will now also
check for that.  This includes things like truncation and page invalidation.
The function page_has_private() had been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
David Howells
03fb3d2af9 FS-Cache: Release page->private after failed readahead
The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails.  If the filler function fails, then
the problematic page is left attached to the pagecache (with appropriate flags
set, one presumes) and the remaining to-be-attached pages are invalidated and
discarded.  This permits pages with caching references associated with them to
be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
David Howells
8f0aa2f25b Document the slow work thread pool
Document the slow work thread pool.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
David Howells
12e22c5e4b Make the slow work pool configurable
Make the slow work pool configurable through /proc/sys/kernel/slow-work.

 (*) /proc/sys/kernel/slow-work/min-threads

     The minimum number of threads that should be in the pool as long as it is
     in use.  This may be anywhere between 2 and max-threads.

 (*) /proc/sys/kernel/slow-work/max-threads

     The maximum number of threads that should in the pool.  This may be
     anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.

 (*) /proc/sys/kernel/slow-work/vslow-percentage

     The percentage of active threads in the pool that may be used to execute
     very slow work items.  This may be between 1 and 99.  The resultant number
     is bounded to between 1 and one fewer than the number of active threads.
     This ensures there is always at least one thread that can process very
     slow work items, and always at least one thread that won't.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
David Howells
109d9272c4 Make slow-work thread pool actually dynamic
Make the slow-work thread pool actually dynamic in the number of threads it
contains.  With this patch, it will both create additional threads when it has
extra work to do, and cull excess threads that aren't doing anything.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
David Howells
07fe7cb7c7 Create a dynamically sized pool of threads for doing very slow work items
Create a dynamically sized pool of threads for doing very slow work items, such
as invoking mkdir() or rmdir() - things that may take a long time and may
sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable
for workqueues.

The number of threads is always at least a settable minimum, but more are
started when there's more work to do, up to a limit.  Because of the nature of
the load, it's not suitable for a 1-thread-per-CPU type pool.  A system with
one CPU may well want several threads.

This is used by FS-Cache to do slow caching operations in the background, such
as looking up, creating or deleting cache objects.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:35 +01:00
FUJITA Tomonori
015640edb1 [SCSI] sg: fix q->queue_lock on scsi_error_handler path
sg_rq_end_io() is called via rq->end_io. In some rare cases,
sg_rq_end_io calls blk_put_request/blk_rq_unmap_user (when a program
issuing a command has gone before the command completion; e.g. by
interrupting a program issuing a command before the command
completes).

We can't call blk_put_request/blk_rq_unmap_user in interrupt so the
commit c96952ed70 uses
execute_in_process_context().

The problem is that scsi_error_handler() calls rq->end_io too. We
can't call blk_put_request/blk_rq_unmap_user too in this path (we hold
q->queue_lock).

To avoid the above problem, in these rare cases, this patch always
uses schedule_work() instead of execute_in_process_context().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:23:16 -05:00
Harvey Harrison
1beb6fa85c [SCSI] replace __inline with inline
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:23:16 -05:00
Adrian Bunk
5880f486ef [SCSI] a2091: make 2 functions static
a2091_{detect,release}() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:17:31 -05:00
Adrian Bunk
9387edbe60 [SCSI] a3000: make 2 functions static
a3000_{detect,release}() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:17:17 -05:00
Adrian Bunk
e0aae1a531 [SCSI] ses: #if 0 the unused ses_match_host()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:17:01 -05:00
Wei Yongjun
ebef264bd9 [SCSI] use kmem_cache_zalloc instead of kmem_cache_alloc/memset
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 10:16:31 -05:00
Ingo Molnar
484cad34dd Merge branch 'dma-debug' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-04-03 16:35:09 +02:00
H. Peter Anvin
95a38f3463 x86, setup: compile with -DDISABLE_BRANCH_PROFILING
Impact: code size reduction (possibly critical)

The x86 boot and decompression code has no use of the branch profiling
constructs, so disable them.  This would bloat the setup code by as
much as 14K, eating up a fairly large chunk of the 32K area we are
guaranteed to have.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-03 16:34:45 +02:00
FUJITA Tomonori
0fdf96b67a [SCSI] sg: fix iovec bugs introduced by the block layer conversion
- needs to use copy_from_user for iovec before passing it to
blk_rq_map_user_iov().

- before the block layer conversion, if ->dxfer_len and sum of iovec
disagrees, the shorter one wins. However, currently sg returns
-EINVAL. This restores the old behavior.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:25:23 -05:00
Jaswinder Singh Rajput
f894e74dc1 [SCSI] qlogicpti: use request_firmware
Firmware blob is little endian

Thanks to Stephen Rothwell for fixing typos

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:25:23 -05:00
Jaswinder Singh Rajput
989bb5f58c [SCSI] advansys: use request_firmware
Firmware blob looks like this...
        __le32 checksum
        unsigned char data[]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:25:23 -05:00
Jaswinder Singh Rajput
1bfa11db71 [SCSI] qla1280: use request_firmware
Firmware blob is little endian looks like this...
        unsigned char  Version1
        unsigned char  Version2
        unsigned char  Version3
        unsigned char  Padding
        unsigned short start_address
	unsigned short data

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:24:42 -05:00
Jean Delvare
fd6e1c14b7 [SCSI] libiscsi: fix iscsi pool error path
Le lundi 30 mars 2009, Chris Wright a écrit :
> q->queue could be ERR_PTR(-ENOMEM) which will break unwinding
> on error.  Make iscsi_pool_free more defensive.
>

Making the freeing of q->queue dependent on q->pool being set looks
really weird (although it is correct at the moment. But this seems
to be fixable in a much simpler way.

With the benefit that only the error case is slowed down. In both
cases we have a problem if q->queue contains an error value but it's
not -ENOMEM. Apparently this can't happen today, but it doesn't feel
right to assume this will always be true. Maybe it's the right time
to fix this as well.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:14 -05:00
Mike Christie
5b2639d59a [SCSI] cxgb3i: call ddp release function directly
cxgb3i_ddp_cleanup just calls ddp_release directly so there is
no reason for the wrapper. This patch just renames ddp_release
to cxgb3i_ddp_cleanup and removes the old wrapper function.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:13 -05:00
Karen Xie
0d0c27f2e8 [SCSI] cxgb3i: merge cxgb3i_ddp into cxgb3i module
- Merge cxgb3i_ddp.ko to cxgb3i.ko as there is no other users.
- Bump the driver version up to 1.0.2.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:13 -05:00
Karen Xie
2a90030fcb [SCSI] cxgb3i: close all tcp connections upon chip reset
Keep track of offloaded tcp connections per adapter. Close all of the
connections upon reset.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:12 -05:00
Mike Christie
ed6f7744f9 [SCSI] cxgb3i: re-read ddp settings information after chip reset
Orignally from Karen Xie, but merge conflicts/errors fixed up by
Mike Christie.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:12 -05:00
Karen Xie
9fa1926afb [SCSI] cxgb3i: re-initialize ddp settings after chip reset
Re-initialize the ddp settings after chip reset. It includes re-initialize
the related registers and the ddp map.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:12 -05:00
Karen Xie
515f1c885a [SCSI] cxgb3i: subscribe to error notification from cxgb3 driver
Add error notification handling function which is called during chip reset.

Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:11 -05:00
Leubner, Achim
d8e9650765 [SCSI] aacraid driver update
changes:

- set aac_cache=2 as default value to avoid performance problem
  (Novell bugzilla #469922)

- Dell/PERC controller boot problem fixed (RedHat bugzilla #457552)

- WWN flag added to fix SLES10 SP1/SP2 drive detection problems

- 64-bit support changes

- DECLARE_PCI_DEVICE_TABLE macro added

- controller type changes

Signed-off-by: Achim Leubner <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:11 -05:00
Alan Cox
d58069adc6 [SCSI] mptsas: remove unneeded check
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Eric Moore <eric.moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:10 -05:00
Alan Cox
e7fb6d2ee0 [SCSI] config: Make need for SCSI_CDROM clearer
Mention ATAPI. We could insert an essay about libata and ide-scsi etc but
the failure case is someone enables it which is just fine so keep it
simple.

(Revised text from suggestion by Matthew Wilcox)

Closes #7736

Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:10 -05:00
Ed Lin
05b4460bd4 [SCSI] stex: update version to 4.6.0000.3
Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:10 -05:00
Ed Lin
0f3f6ee68f [SCSI] stex: add new 6G controller support
This adds the support of a new SAS 6G controller (st_yel)

Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:09 -05:00
Ed Lin
591a3a5f60 [SCSI] stex: use config struct for parameters of different controllers
Use config struct (st_card_info) for parameters of different controllers

Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:09 -05:00
Ed Lin
99946f8141 [SCSI] stex: add MSI support
This adds the MSI support (default 0=off)

Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:09 -05:00
Ed Lin
f149816162 [SCSI] stex: small code fixes and changes
These are some small code fixes and changes, including:
- use 64 bit when possible
- remove some unnecessary code (in interrupt, queuecommand routine etc.)
- code change for reset handler

Signed-off-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:08 -05:00
Joe Eykholt
97c8389d54 [SCSI] fcoe, libfcoe: Add support for FIP. FCoE discovery and keep-alive.
FIP is the new standard way to discover Fibre-Channel Forwarders (FCFs)
by sending solicitations and listening for advertisements from FCFs.

It also provides for keep-alives and period advertisements so that both
parties know they have connectivity.  If the FCF loses connectivity to
the storage fabric, it can send a Link Reset to inform the E_node.

This version is also compatible with pre-FIP implementations, so no
configured selection between FIP mode and non-FIP mode is required.

We wait a couple seconds after sending the initial solicitation
and then send an old-style FLOGI.  If we receive any FIP frames,
we use FIP only mode.  If the old FLOGI receives a response,
we disable FIP mode.  After every reset or link up, this
determination is repeated.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:08 -05:00
Joe Eykholt
af5f428763 [SCSI] fcoe: Add a header file defining the FIP protocol for FCoE.
Adds include/scsi/fc/fc_fip.h for FIP protocol definitions.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:07 -05:00
Vasu Dev
a0a25da2a4 [SCSI] fcoe, libfc: fix double fcoe_softc memory alloc
The foce_softc mem was reserved by libfc_host_alloc as well as
by fcoe_host_alloc.

Removes one liner fcoe_host_alloc completely, instead directly calls
libfc_host_alloc to alloc scsi_host with libfc for just one fcoe_softc
as fcoe private data.

Moves libfc_host_alloc to libfc.h since it is a libfc API, placed
lport_priv API adjacent to libfc_host_alloc since this is related
to scsi_host priv data.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:07 -05:00
Vasu Dev
fdd78027fd [SCSI] fcoe: cleans up libfcoe.h and adds fcoe.h for fcoe module
Removes no where used several inline functions prefixed with skb_*
and be16_to_cpu.

Moves fcoe module specific func prototypes to fcoe.c from libfcoe.h,
moved only need for build.

Adds fcoe module header file fcoe.h and then moves fcoe module
specific fcoe_percpu_s and fcoe_softc to fcoe.h from libfcoe.h.

Moves all defines from fcoe.c to fcoe.h since now fcoe module
has its own header file fcoe.h.

[jejb: removed EXPORT_SYMBOL_GPL(fcoe_fc_crc) which caused a section mismatch]
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:06 -05:00
Vasu Dev
5e80f7f7c8 [SCSI] fcoe: moves common FCoE library API functions to libfcoe module
Moves these functions as-is from fcoe.c to libfcoe.c, since
they're are common routines:

	- fcoe_wwn_from_mac
	- fcoe_libfc_config

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:05 -05:00
Vasu Dev
9b34ecffd5 [SCSI] fcoe, libfc: add libfcoe module
Just sets up build environment for libfcoe module towards a
libfcoe library for libfc LLDs using FCoE as libfc transport.

Common library code to libfcoe is added in next patch.

Also, updated MODULE_LICENSE from "GPL" string to "GPL v2" for
libfc, libfcoe and fcoe modules to accurately match the licenses.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-03 09:23:04 -05:00