Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all
cpus), pass a cpumask_t. Allow NULL for the common case (where we
don't care which CPU the function is run on): temporary cpumask_t's
are usually considered bad for stack space.
This deprecates stop_machine_run, to be removed soon when all the
callers are dead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Akinobu points out that if take_cpu_down() succeeds, the cpu must be offline.
Remove the cpu_online() check, and put a BUG_ON().
Quoting Akinobu Mita:
Actually the cpu_online() check was necessary before appling this
stop_machine: simplify patch.
With old __stop_machine_run(), __stop_machine_run() could succeed
(return !IS_ERR(p) value) even if take_cpu_down() returned non-zero value.
The return value of take_cpu_down() was obtained through kthread_stop()..
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Akinobu Mita" <akinobu.mita@gmail.com>
stop_machine creates a kthread which creates kernel threads. We can
create those threads directly and simplify things a little. Some care
must be taken with CPU hotunplug, which has special needs, but that code
seems more robust than it was in the past.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
-allow stop_mahcine_run() to call a function on all cpus. Calling
stop_machine_run() with a 'ALL_CPUS' invokes this new behavior.
stop_machine_run() proceeds as normal until the calling cpu has
invoked 'fn'. Then, we tell all the other cpus to call 'fn'.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: mingo@elte.hu
CC: akpm@osdl.org
This patch fixed the warning:
CC kernel/module.o
/home/wangcong/Projects/linux-2.6/kernel/module.c:332: warning:
‘lookup_symbol’ defined but not used
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Call the standard hook after setting up signal handlers.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds TIF_NOTIFY_RESUME support for sparc64.
When set, we call tracehook_notify_resume() on the way to user mode.
Signed-off-by: Roland McGrath <roland@redhat.com>
This changes sparc64 syscall tracing to use the new tracehook.h entry
points.
[ Add assembly changes to force an immediate -ENOSYS return from
the system call when syscall_trace() returns non-zero at syscall
entry. -DaveM ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For small file block allocations, mballoc uses per cpu prealloc
space. Use goal block when searching for the right prealloc
space. Also make sure ext4_da_writepages tries to write
all the pages for small files in single attempt
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The write_cache_pages() function uses the mapping->writeback_index as
the starting index to write out when range_cyclic is set. Properly
initialize writeback_index so that we start the writeout at index 0.
This was found when debugging the small file fragmentation on ext4.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix ext4_has_free_blocks() to return 0 when we don't have enough space.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previous delalloc writepages implementation started a new transaction
outside of a loop which called get_block() to do the block allocation.
Since we didn't know exactly how many blocks would need to be allocated,
the estimated journal credits required was very conservative and caused
many issues.
With the reworked delayed allocation, a new transaction is created for
each get_block(), thus we don't need to guess how many credits for the
multiple chunk of allocation. We start every transaction with enough
credits for inserting a single exent. When estimate the credits for
indirect blocks to allocate a chunk of blocks, we need to know the
number of data blocks to allocate. We use the total number of reserved
delalloc datablocks; if that is too big, for non-extent files, we need
to limit the number of blocks to EXT4_MAX_TRANS_BLOCKS.
Code cleanup from Aneesh.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
With the below changes we reserve credit needed to insert only one
extent resulting from a call to single get_block. This makes sure we
don't take too much journal credits during writeout. We also don't
limit the pages to write. That means we loop through the dirty pages
building largest possible contiguous block request. Then we issue a
single get_block request. We may get less block that we requested. If
so we would end up not mapping some of the buffer_heads. That means
those buffer_heads are still marked delay. Later in the writepage
callback via __mpage_writepage we redirty those pages.
We should also not limit/throttle wbc->nr_to_write in the filesystem
writepages callback. That cause wrong behaviour in
generic_sync_sb_inodes caused by wbc->nr_to_write being <= 0
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
DIO and fallocate credit calculation is different than writepage, as
they do start a new journal right for each call to ext4_get_blocks_wrap().
This patch uses the helper function in DIO and fallocate case, passing
a flag indicating that the modified data are contigous thus could account
less indirect/index blocks.
This patch also fixed the journal credit reservation for direct I/O
(DIO). Previously the estimated credits for DIO only was calculated for
non-extent files, which was not enough if the file is extent-based.
Also fixed was fallocate double-counting credits for modifying the the
superblock.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch modified the writepage/write_begin credit calculation for
extent files, to use the credits caculation helper function.
The current calculation of how many index/leaf blocks should be
accounted is too conservetive, it always considered the worse case,
where the tree level is 5, and in the case of multiple chunk
allocations, it always assumed no blocks were dirtied in common across
the allocations. This path uses the accurate depth of the inode with
some extras to calculate the index blocks, and also less conservative in
the case of multiple allocation accounting.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When considering how many journal credits are needed for modifying a
chunk of data, we need to account for the super block, inode block,
quota blocks and xattr block, indirect/index blocks, also, group bitmap
and group descriptor blocks for new allocation (including data and
indirect/index blocks). There are many places in ext4 do the calculation
on their own and often missed one or two meta blocks, and often they
assume single block allocation, and did not considering the multile
chunk of allocation case.
This patch is trying to cleanup current journal credit code, provides
some common helper funtion to calculate the journal credits, to be used
for writepage, writepages, DIO, fallocate, migration, defrag, and for
both nonextent and extent files.
This patch modified the writepage/write_begin credit caculation for
nonextent files, to use the new helper function. It also fixed the
problem that writepage on nonextent files did not consider the case
blocksize <pagesize, thus could possibelly need multiple block
allocation in a single transaction.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The find_group_flex() function starts with best_flex as the
parent_fbg_group, which happens to have 0 inodes free. Some of the
flex groups searched have free blocks and free inodes, but the
flex_freeb_ratio is < 10, so they're skipped. Then when a group is
compared to the current "best" flex group, it does not have more free
blocks than "best", so it is skipped as well.
This continues until no flex group with free inodes is found which has
a proper ratio or which has more free blocks than the "best" group,
and we're left with a "best" group that has 0 inodes free, and we
return -ENOSPC.
We fix this by changing the logic so that if the current "best" flex
group has no inodes free, and the current one does have room, it is
promoted to the next "best."
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When trying to resize an ext4 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext4_ext_truncate(), we should use the more generic
ext4_discard_reservations() call so we do the right thing when the
filesystem is mounted with the nomballoc option.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in an hash tree indexed directory.
Signed-off-by: Eugene Dashevsky <eugene@ibrix.com>
Signed-off-by: Mike Snitzer <msnitzer@ibrix.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Ext4 will release the reserved blocks for delayed allocations when
inode is truncated/unlinked. If there is no reserved block at all, we
shouldn't need to do so. But current code still tries to release the
reserved blocks regardless whether the counters's value is 0.
Continue to do that causes the later calculation to go wrong and a
kernel BUG_ON() caught that. This doesn't happen for extent-based
files, as the calculation for 0 reserved blocks was right for extent
based file.
This patch fixed the kernel BUG() due to above reason. It adds checks
for 0 to avoid unnecessary release and fix calculation for non-extent
files.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to call ext4_discard_reservation() earlier in ext4_truncate(),
to avoid a BUG() in ext4_mb_return_to_preallocation(), which is called
(ultimately) by ext4_free_blocks(). So we must ditch the blocks on
i_prealloc_list before we start freeing the data blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When using fallocate the buffer_heads are marked unwritten and unmapped.
We need to map them in the writepages after a get_block. Otherwise we
split the uninit extents, but never write the content to disk.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Ensure we call nfs_sb_deactive() after releasing the directory inode
nfs_remount oops when rebooting + possible fix
Simplify the code of include/linux/task_io_accounting.h.
It is also more reasonable to have all the task i/o-related statistics in a
single struct (task_io_accounting).
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mxl5007 was forcing for its compilation:
In file included from drivers/media/common/tuners/mxl5007t.c:25:drivers/media/common/tuners/mxl5007t.h:80:1: warning: "CONFIG_MEDIA_TUNER_MXL5007T" redefined
In file included from <command-line>:0:
./include/linux/autoconf.h:2782:1: warning: this is the location of the previous definition
Probably, some temporary hack for testing.
Cc: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
In order to avoid the "Busy inodes after unmount" error message, we need to
ensure that nfs_async_unlink_release() releases the super block after the
call to nfs_free_unlinkdata().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The offset field of the scatterlist entry *after* the last valid scatterlist
entry was used instead of the first scatterlist entry (as was the intention
of this code).
This worked fine until the kzalloc of the sglist was replaced with kmalloc
and sg_init_table only zeroed the exact needed length. Apparently kzalloc
zeroes a bit more than is strictly necessary so the offset field was
always 0 in the past.
But now the offset field was suddenly random and this led to broken captures.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The device is flagged present after it is registered. During that window calls
to open() that should work fail with -ENODEV. Reversing the order fixes
the race.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
There are two videomate boards supporded by em28xx. The names are almost
identical.
This patch renames one of such entries to something else.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>