Commit graph

24,970 commits

Author SHA1 Message Date
Mikulas Patocka
e5d6a7dd5e HPFS: Remove mark_inode_dirty
Remove mark_inode_dirty

HPFS doesn't use kernel's dirty inode indicator anyway because
writing an inode requires directory's mutex.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09 09:04:23 -07:00
Mikulas Patocka
0fe105aa29 HPFS: Remove CR/LF conversion option
Remove CR/LF conversion option

It is unused anyway. It was used on 2.2 kernels or so.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09 09:04:23 -07:00
Mikulas Patocka
7d23ce36e3 HPFS: Remove remaining locks
Remove remaining locks

Because of a new global per-fs lock, no other locks are needed

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09 09:04:23 -07:00
Mikulas Patocka
7dd29d8d86 HPFS: Introduce a global mutex and lock it on every callback from VFS.
Introduce a global mutex and lock it on every callback from VFS.

Performance doesn't matter, reviewing the whole code for locking correctness
would be too complicated, so simply lock it all.

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09 09:04:23 -07:00
Mikulas Patocka
637b424bf8 HPFS: Make HPFS compile on preempt and SMP
Make HPFS compile on preempt and SMP

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09 09:04:23 -07:00
Steven Whitehouse
9eed04cd99 GFS2: Move final part of inode.c into super.c
Now inode.c is empty.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:38 +01:00
Steven Whitehouse
194c011fc4 GFS2: Move most of the remaining inode.c into ops_inode.c
This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and its easier when its all in
one place.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:45:14 +01:00
Steven Whitehouse
d4b2cf1b05 GFS2: Move gfs2_refresh_inode() and friends into glops.c
Eventually there will only be a single caller of this code, so lets
move it where it can be made static at some future date.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:49 +01:00
Steven Whitehouse
94fb763b1a GFS2: Remove gfs2_dinode_print() function
This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:44:29 +01:00
Steven Whitehouse
3d6ecb7d16 GFS2: When adding a new dir entry, inc link count if it is a subdir
This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically
update the parent directory's link count if required, that makes
the code shorter and simpler than before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:43:53 +01:00
Steven Whitehouse
855d23ce26 GFS2: Make gfs2_dir_del update link count when required
When we remove an entry from a directory, we can save ourselves
some trouble if we know the type of the entry in question, since
if it is itself a directory, we can update the link count of the
parent at the same time as removing the directory entry.

In addition this patch also merges the rmdir and unlink code which
was almost identical anyway. This eliminates the calls to remove
the . and .. directory entries on each rmdir (not needed since the
directory will be deallocated, anyway) which was the only thing preventing
passing the dentry to gfs2_dir_del(). The passing of the dentry
rather than just the name allows us to figure out the type of the entry
which is being removed, and thus adjust the link count when required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:42:37 +01:00
Steven Whitehouse
2baee03fb9 GFS2: Don't use gfs2_change_nlink in link syscall
There are three users of gfs2_change_nlink which add to the link
count. Two of these are about to be removed in later patches, so
this means that there will no callers, when that happens allowing
removal of that function, also in a later patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-09 16:35:25 +01:00
Theodore Ts'o
2cd05cc393 ext4: remove unneeded ext4_journal_get_undo_access
The block allocation code used to use jbd2_journal_get_undo_access as
a way to make changes that wouldn't show up until the commit took
place.  The new multi-block allocation code has a its own way of
preventing newly freed blocks from getting reused until the commit
takes place (it avoids updating the buddy bitmaps until the commit is
done), so we don't need to use jbd2_journal_get_undo_access(), which
has extra overhead compared to jbd2_journal_get_write_access().

There was one last vestigal use of ext4_journal_get_undo_access() in
ext4_add_groupblocks(); change it to use ext4_journal_get_write_access()
and then remove the ext4_journal_get_undo_access() support.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:58:45 -04:00
Amir Goldstein
2846e82004 ext4: move ext4_add_groupblocks() to mballoc.c
In preparation for the next patch, the function ext4_add_groupblocks()
is moved to mballoc.c, where it could use some static functions.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:46:41 -04:00
Amerigo Wang
66bb82798d ext4: remove redundant #ifdef in super.c
There is already an #ifdef CONFIG_QUOTA some lines above,
so this one is totally useless.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:30:41 -04:00
Tao Ma
55ff3840a2 ext4: remove redundant check for first_not_zeroed in ext4_register_li_request
We have checked first_not_zeroed == ngroups already above, so remove
this redundant check.

sbi->s_li_request = NULL above is also removed since it is NULL
already.

Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:28:41 -04:00
Tao Ma
00d098822f ext4: use s_inodes_per_block directly in __ext4_get_inode_loc
In __ext4_get_inode_loc, we calculate inodes_per_block every time by
EXT4_BLOCK_SIZE(sb) / EXT4_INODE_SIZE(sb).  AFAICS, this function is a
hot path for ext4, so we'd better use s_inodes_per_block directly
instead of calculating every time.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:26:41 -04:00
Tao Ma
e8bbe8c401 ext4: use EXT4FS_DEBUG instead of EXT4_DEBUG in fsync.c
We have EXT4FS_DEBUG for some old debug and CONFIG_EXT4_DEBUG
for the new mballoc debug, but there isn't any EXT4_DEBUG.

As CONFIG_EXT4_DEBUG seems to be only used in mballoc, use
EXT4FS_DEBUG in fsync.c.

[ It doesn't really matter; although I'm including this commit for
  consistency's sake.  The whole point of the #ifdef's is to disable
  the debugging code.  In general you're not going to want to enable
  all of the code protected by EXT4FS_DEBUG at the same time.  -- Ted ]

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-09 10:25:54 -04:00
Jens Axboe
bbdd304cf6 fs: fixup warning part_discard_alignment_show()
Stephen reports:

-----

After merging the block tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

fs/partitions/check.c: In function 'part_discard_alignment_show':
fs/partitions/check.c:263: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long long unsigned int'

Introduced by commit  ("block: Remove extra discard_alignment from
hd_struct")

-----

Fix it up by just removing the cast, we return an int already.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-09 08:28:13 +02:00
Theodore Ts'o
1be2add685 jbd2: only print the debugging information for tid wraparound once
If we somehow wrap, we don't want to keep printing the warning message
over and over again.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-08 19:37:54 -04:00
Jan Kara
229309caeb jbd2: Fix forever sleeping process in do_get_write_access()
In do_get_write_access() we wait on BH_Unshadow bit for buffer to get
from shadow state. The waking code in journal_commit_transaction() has
a bug because it does not issue a memory barrier after the buffer is
moved from the shadow state and before wake_up_bit() is called. Thus a
waitqueue check can happen before the buffer is actually moved from
the shadow state and waiting process may never be woken. Fix the
problem by issuing proper barrier.

Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-08 19:09:53 -04:00
Tao Ma
23ceb5b771 block: Remove extra discard_alignment from hd_struct.
Currently, hd_struct.discard_alignment is only used when we
show /sys/block/sdx/sdx/discard_alignment. So remove it and
calculate when it is asked to show.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06 19:30:02 -06:00
Linus Torvalds
c2bf807eb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: handle errors from coalesce_t2
  cifs: refactor mid finding loop in cifs_demultiplex_thread
  cifs: sanitize length checking in coalesce_t2 (try #3)
  cifs: check for bytes_remaining going to zero in CIFS_SessSetup
  cifs: change bleft in decode_unicode_ssetup back to signed type
2011-05-06 15:32:41 -07:00
Timo Warns
fa039d5f6b Validate size of EFI GUID partition entries.
Otherwise corrupted EFI partition tables can cause total confusion.

Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06 07:46:37 -07:00
David Sterba
182608c829 btrfs: remove old unused commented out code
Remove code which has been #if0-ed out for a very long time and does not
seem to be related to current codebase anymore.

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-06 12:34:10 +02:00
David Sterba
f2a97a9dbd btrfs: remove all unused functions
Remove static and global declarations and/or definitions. Reduces size
of btrfs.ko by ~3.4kB.

  text    data     bss     dec     hex filename
402081    7464     200  409745   64091 btrfs.ko.base
398620    7144     200  405964   631cc btrfs.ko.remove-all

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-06 12:34:03 +02:00
Steven Whitehouse
588da3b3be GFS2: Don't use a try lock when promoting to a higher mode
Previously we marked all locks being promoted to a higher mode
with the try flag to avoid any potential deadlocks issues. The
DLM is able to detect these and report them in way that GFS2 can
deal with them correctly. So we can just request the required mode
and wait for a response without needing to perform this check.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:36:38 +01:00
Steven Whitehouse
d192a8e5c6 GFS2: Double check link count under glock
To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
Also to ensure we never get any nasty surprises, this patch also
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-05 12:35:40 +01:00
Linus Torvalds
bd355f8ae6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: do not call __mark_dirty_inode under i_lock
  libceph: fix ceph_osdc_alloc_request error checks
  ceph: handle ceph_osdc_new_request failure in ceph_writepages_start
  libceph: fix ceph_msg_new error path
  ceph: use ihold() when i_lock is held
2011-05-04 14:22:20 -07:00
Sage Weil
fca65b4ad7 ceph: do not call __mark_dirty_inode under i_lock
The __mark_dirty_inode helper now takes i_lock as of 250df6ed.  Fix the
one ceph callers that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that the callers can do it outside of i_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-04 12:56:45 -07:00
David Sterba
621496f4fd btrfs: remove unused function prototypes
function prototypes without a body

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-05-04 14:01:26 +02:00
Linus Torvalds
cce2c56e76 logfs: initialize superblock entries earlier
In particular, s_freeing_list needs to be initialized early, since it is
used on some of the error paths when mounts fail.  The mapping inode,
for example, would be initialized and then free'd on an error path
before s_freeing_list was initialized, but the inode drop operation
needs the s_freeing_list to be set up.

Normally you'd never see this, because not only is logfs fairly rare,
but a successful mount will never have any issues.

Reported-by: werner <w.landgraf@ru.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-03 16:10:25 -07:00
Henry C Chang
8c71897be2 ceph: handle ceph_osdc_new_request failure in ceph_writepages_start
We should unlock the page and return -ENOMEM if ceph_osdc_new_request
failed.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03 09:28:12 -07:00
Sage Weil
3772d26d87 ceph: use ihold() when i_lock is held
See 0444d76ae6.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03 09:28:08 -07:00
Yongqiang Yang
667eff35a1 ext4: reimplement convert and split_unwritten
Reimplement ext4_ext_convert_to_initialized() and
ext4_split_unwritten_extents() using ext4_split_extent()

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03 12:25:07 -04:00
Yongqiang Yang
47ea3bb59c ext4: add ext4_split_extent_at() and ext4_split_extent()
Add two functions: ext4_split_extent_at(), which splits an extent into
two extents at given logical block, and ext4_split_extent() which
splits an extent into three extents.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03 12:23:07 -04:00
Yongqiang Yang
197217a5af ext4: add a function merging extents right and left
1) Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().

2) Add a new function ext4_ext_try_to_merge() which tries to merge
   an extent both left and right.

3) Use the new function in ext4_ext_convert_unwritten_endio() and
   ext4_ext_insert_extent().

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Tested-by: Allison Henderson <achender@linux.vnet.ibm.com>
2011-05-03 11:45:29 -04:00
Jan Kara
df5e622340 ext4: fix deadlock in ext4_symlink() in ENOSPC conditions
ext4_symlink() cannot call __page_symlink() with transaction open.
__page_symlink() calls ext4_write_begin() which can wait for
transaction commit if we are running out of space thus causing a
deadlock. Also error recovery in ext4_truncate_failed_write() does not
count with the transaction being already started (although I'm not
aware of any particular deadlock here).

Fix the problem by stopping a transaction before calling
__page_symlink() (we have to be careful and put inode to orphan list
so that it gets deleted in case of crash) and starting another one
after __page_symlink() returns for addition of symlink into a
directory.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 11:12:58 -04:00
Jan Kara
7ad8e4e6ae ext4: Fix fs corruption when make_indexed_dir() fails
When make_indexed_dir() fails (e.g. because of ENOSPC) after it has
allocated block for index tree root, we did not properly mark all
changed buffers dirty.  This lead to only some of these buffers being
written out and thus effectively corrupting the directory.

Fix the issue by marking all changed data dirty even in the error
failure case.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 11:05:55 -04:00
Theodore Ts'o
74e4e6db38 ext4: set extents flag when migrating file to use extents
Fix a typo that was introduced in commit 07a038245b (in 2.6.36) which
caused the extents flag not to be set at the conclusion of converting
an inode to use extents.

Reported-by: Peter Uchno <peter.uchno@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-03 09:34:42 -04:00
Steven Whitehouse
8f065d3650 GFS2: Improve bug trap code in ->releasepage()
If the buffer is dirty or pinned, then as well as printing a
warning, we should also refuse to release the page in
question.

Currently this can occur if there is a race between mmap()ed
writers and O_DIRECT on the same file. With the addition of
->launder_page() in the future, we should be able to close
this gap.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:49:19 +01:00
Steven Whitehouse
4f1de01821 GFS2: Fix ail list traversal
In the recent patches to update the AIL list code, I managed to
forget that the ail list lock got dropped, even though I
added a comment specifically to remind myself :(

Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:48:07 +01:00
Benjamin Marzinski
6905d9e4dd GFS2: make sure fallocate bytes is a multiple of blksize
The GFS2 fallocate code chooses a target size to for allocating chunks of
space.  Whenever it can't find any resource groups with enough space free, it
halves its target. Since this target is in bytes, eventually it will no longer
be a multiple of blksize.  As long as there is more space available in the
resource group than the target, this isn't a problem, since gfs2 will use the
actual space available, which is always a multiple of blksize.  However,
when gfs couldn't fallocate a bigger chunk than the target, it was using the
non-blksize aligned number. This caused a BUG in later code that required
blksize aligned offsets.  GFS2 now ensures that bytes is always a multiple of
blksize

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-05-03 11:47:42 +01:00
Jeff Layton
16541ba11c cifs: handle errors from coalesce_t2
cifs_demultiplex_thread calls coalesce_t2 to try and merge follow-on t2
responses into the original mid buffer. coalesce_t2 however can return
errors, but the caller doesn't handle that situation properly. Fix the
thread to treat such a case as it would a malformed packet. Mark the
mid as being malformed and issue the callback.

Cc: stable@kernel.org
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-03 03:42:15 +00:00
Jeff Layton
146f9f65bd cifs: refactor mid finding loop in cifs_demultiplex_thread
...to reduce the extreme indentation. This should introduce no
behavioral changes.

Cc: stable@kernel.org
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-05-03 03:42:07 +00:00
David Howells
3a852d3bd5 CRED: Fix load_flat_shared_library() to initialise bprm correctly
Fix binfmt_flag's load_flat_shared_library() to initialise bprm correctly.

Currently, prepare_binprm() is called with only .filename .file and .cred
fields set in bprm, but the .cred_prepared and .per_clear fields at least need
initialising.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-05-03 10:10:51 +10:00
Thomas Gleixner
99ee5315da timerfd: Allow timers to be cancelled when clock was set
Some applications must be aware of clock realtime being set
backward. A simple example is a clock applet which arms a timer for
the next minute display. If clock realtime is set backward then the
applet displays a stale time for the amount of time which the clock
was set backwards. Due to that applications poll the time because we
don't have an interface.

Extend the timerfd interface by adding a flag which puts the timer
onto a different internal realtime clock. All timers on this clock are
expired whenever the clock was set.

The timerfd core records the monotonic offset when the timer is
created. When the timer is armed, then the current offset is compared
to the previous recorded offset. When it has changed, then
timerfd_settime returns -ECANCELED. When a timer is read the offset is
compared and if it changed -ECANCELED returned to user space. Periodic
timers are not rearmed in the cancelation case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Chris Friesen <chris.friesen@genband.com>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Alexander Shishkin <virtuoso@slind.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:39:15 +02:00
Linus Torvalds
adadfe48df Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: seek journal heads to the latest bud in replay
  UBIFS: do not free write-buffers when in R/O mode
2011-05-02 12:17:29 -07:00
Artem Bityutskiy
52c6e6f990 UBIFS: seek journal heads to the latest bud in replay
This is the second fix of the following symptom:

UBIFS error (pid 34456): could not find an empty LEB

which sometimes happens after power cuts when we mount the file-system - UBIFS
refuses it with the above error message which comes from the
'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
with the UBIFS power cut emulation enabled.

Analysis of the problem.

Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
But the buds are replayed out-of-order, so the replay basically seeks journal
heads to the "random" bud belonging to this head, and not to the _last_ one.

The result of this is that the GC head may be seeked to a full LEB with no free
space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
fully or mostly dirty LEB to match the current GC head (because we need to
garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
also fail. As a result - recovery fails and mounting fails.

This patch teaches the replay to initialize the GC heads exactly to the latest
buds, i.e. the buds which have the largest sequence number in corresponding
log reference nodes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
2011-05-02 19:23:48 +03:00
Artem Bityutskiy
b50b9f4085 UBIFS: do not free write-buffers when in R/O mode
Currently UBIFS has a small optimization - it frees write-buffers when it is
re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
does not allocate write-buffers as well.

This optimization is nice but it leads to subtle problems and complications
in recovery, which I can reproduce using the integck test. The symptoms are
that after a power cut the file-system cannot be mounted if we first mount
it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:

UBIFS error (pid 34456): could not find an empty LEB

Analysis of the  problem.

When mounting R/W, the reply process sets journal heads to buds [1], but
when mounting R/O - it does not do this, because the write-buffers are not
allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
same file-system but for the following 2 cases:

1. mounting R/W after a power cut and recover
2. mounting R/O after a power cut, re-mounting R/W and run deferred recovery

In the former case, we have journal heads seeked to the a bud, in the latter
case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
try to recover the GC LEB by garbage-collecting to the GC head, but we just
try to find an empty LEB, and there may be no empty LEBs, so we just fail.
On the other hand, in the former case (mount R/W), we are able to make a GC LEB
(@c->gc_lnum) by garbage-collecting.

Thus, let's remove this small nice optimization and always allocate
write-buffers. This should not make too big difference - we have only 3
of them, each of max. write unit size, which is usually 2KiB. So this is
about 6KiB of RAM for the typical case, and only when mounted R/O.

[1]: Note, currently the replay process is setting (seeking) the journal heads
to _some_ buds, not necessarily to the buds which had been the journal heads
before the power cut happened. This will be fixed separately.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
2011-05-02 19:23:36 +03:00