Also employ the overflow handler to adjust the frequency, this results
in a stable frequency in about 40~50 samples, instead of that many ticks.
This also means we can start sampling at a sample period of 1 without
running head-first into the throttle.
It relies on sched_clock() to accurately measure the time difference
between the overflow NMIs.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sort includes, and reorder code so we can kill the forward declarations
No functional changes
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Although get_block() callback function can return extent of contiguous
blocks with bh->b_size, nilfs_get_block() function did not support
this feature.
This adds contiguous lookup feature to the block mapping codes of
nilfs, and allows the nilfs_get_blocks() function to return the extent
information by applying the feature.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This applies block_sync_page() function to the sync_page method of
page caches for meta data files, gc page caches, and btree node
buffers. This is a companion patch of ("nilfs2: enable sync_page
mothod") which applied the function for data pages.
This allows lock_page() for those meta data to unplug pending bio
requests.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Previously, default_backing_dev_info was used for the mapping of btree
node caches. This uses device dependent backing_dev_info to allow
detailed control of the device for the btree node pages.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This helps userland programs like the rmcp command to distinguish
error codes returned against a checkpoint removal request.
Previously -EPERM was returned, and not discriminable from real
permission errors. This also allows removal of the latest checkpoint
because the deletion leads to create a new checkpoint, and thus it's
harmless for the filesystem.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This adds a missing sync_page method which unplugs bio requests when
waiting for page locks. This will improve read performance of nilfs.
Here is a measurement result using dd command.
Without this patch:
# mount -t nilfs2 /dev/sde1 /test
# dd if=/test/aaa of=/dev/null bs=512k
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s
With this patch:
# mount -t nilfs2 /dev/sde1 /test
# dd if=/test/aaa of=/dev/null bs=512k
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This sets BIO_RW_UNPLUG flag on the last bio of each segment during
write. The last bio should be unplugged immediately because the
caller waits for the completion after the submission.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Nilfs has some ioctl commands to read out metadata from meta data
files:
- NILFS_IOCTL_GET_CPINFO for checkpoint file,
- NILFS_IOCTL_GET_SUINFO for segment usage file, and
- NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file,
respectively.
Every routine on these metadata files is implemented so that it allows
future expansion of on-disk format. But, the above ioctl commands do
not support expansion even though nilfs_argv structure can handle
arbitrary size for data exchanged via ioctl.
This allows future expansion of the following structures which give
basic format of the "get information" ioctls:
- struct nilfs_cpinfo
- struct nilfs_suinfo
- struct nilfs_vinfo
So, this introduces forward compatility of such ioctl commands.
In this patch, a sanity check in nilfs_ioctl_get_info() function is
changed to accept larger data structure [1], and metadata read
routines are rewritten so that they become compatible for larger
structures; the routines will just ignore the remaining fields which
the current version of nilfs doesn't know.
[1] The ioctl function already has another upper limit (PAGE_SIZE
against a structure, which appears in nilfs_ioctl_wrap_copy
function), and this will not cause security problem.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Hi,
I introduced "is_partially_uptodate" aops for NILFS2.
A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.
I did a performance test using the sysbench.
1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil
e-rw-ratio=1 run
-2.6.30-rc5
Test execution summary:
total time: 151.2907s
total number of events: 200000
total time taken by event execution: 2409.8387
per-request statistics:
min: 0.0000s
avg: 0.0120s
max: 0.9306s
approx. 95 percentile: 0.0439s
Threads fairness:
events (avg/stddev): 12500.0000/238.52
execution time (avg/stddev): 150.6149/0.01
-2.6.30-rc5-patched
Test execution summary:
total time: 140.8828s
total number of events: 200000
total time taken by event execution: 2240.8577
per-request statistics:
min: 0.0000s
avg: 0.0112s
max: 0.8750s
approx. 95 percentile: 0.0418s
Threads fairness:
events (avg/stddev): 12500.0000/218.43
execution time (avg/stddev): 140.0536/0.01
arch: ia64
pagesize: 16k
Thanks.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Previously, the bmap codes of nilfs used three types of function
tables. The abuse of indirect function calls decreased source
readability and suffered many indirect jumps which would confuse
branch prediction of processors.
This eliminates one type of the function tables,
nilfs_bmap_ptr_operations, which was used to dispatch low level
pointer operations of the nilfs bmap.
This adds a new integer variable "b_ptr_type" to nilfs_bmap struct,
and uses the value to select the pointer operations.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This will cut off 16 bytes from the nilfs_bmap struct which is
embedded in the on-memory inode of nilfs.
The b_high field was never used, and the b_low field stores a constant
value which can be determined by whether the inode uses btree for
block mapping or not.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This indirect function is set to NULL only for gc cache inodes, but
the gc cache inodes never call this function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Two get block function for btree nodes, nilfs_bmap_get_block() and
nilfs_bmap_get_new_block(), are called only from the btree codes.
This relocation will increase opportunities of compiler optimization.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
nilfs_bmap_delete_block() is a wrapper function calling
nilfs_btnode_delete(). This removes it for simplicity.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
nilfs_bmap_put_block() is a wrapper function calling brelse(). This
eliminates the wrapper for simplicity.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This will eliminate obsolete list operations of nilfs_segment_entry
structure which has been used to handle mutiple segment numbers.
The patch ("nilfs2: remove list of freeing segments") removed use of
the structure from the segment constructor code, and this patch
simplifies the remaining code by integrating it into recovery.c.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This will clean up the removal list of segments and the related
functions from segment.c and ioctl.c, which have hurt code
readability.
This elimination is applied by using nilfs_sufile_updatev() previously
introduced in the patch ("nilfs2: add sufile function that can modify
multiple segment usages").
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This is a preparation for the later cleanup patch ("nilfs2: remove
list of freeing segments").
This adds nilfs_sufile_updatev() to sufile, which can modify multiple
segment usages at a time.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This simplifies some low level functions of bmap.
Three bmap pointer operations, nilfs_bmap_start_v(),
nilfs_bmap_commit_v(), and nilfs_bmap_abort_v(), are unified into one
nilfs_bmap_start_v() function. And the related indirect function calls
are replaced with it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* Delete Makefile. It is only used for out-of-tree compilation
and was never needed. It slipped in by mistake.
* Remove from Kbuild all the out of tree stuff as promised.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
libosd has it's own sense decoding and printout. Don't
let scsi_lib duplicate that printout. (Which is done wrong
in regard to osd commands)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch was inspired by Al Viro, for simplifying and fixing the
retrieval of osd-devices by in-kernel users, eg: file systems.
In-Kernel users, now, go through the same path user-mode does by
opening a file on the osd char-device and though holding a reference
to both the device and the Module.
A file pointer was added to the osd_dev structure which is now
allocated for each user. The internal osd_dev is no longer exposed
outside of the uld. I wanted to do that for a long time so each
libosd user can have his own defaults on the device.
The API is left the same, so user code need not change.
It is no longer needed to open/close a file handle on the osd
char-device from user-mode, before mounting an exofs on it.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
libosd users that need to work with bios, must sometime use
the request_queue associated with the osd_dev. Make a wrapper for
that, and convert all in-tree users.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
For supporting of chained-bios we can not inspect the first
bio only, as before. Caller shall pass the total length of the
request, ie. sum_bytes(bio-chain).
Also since the bio might be a chain we don't set it's direction
on behalf of it's callers. The bio direction should be properly
set prior to this call. So fix a couple of write users that now
need to set the bio direction properly
[In this patch I change both library code and user sites at
exofs, to make it easy on integration. It should be submitted
via James's scsi-misc tree.]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
CC: Jeff Garzik <jeff@garzik.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
_osd_req_finalize_data_integrity was trying to deduce the number of
out_bytes from passed osd_request->out.bio. This is wrong when
the bio is chained. The caller of _osd_req_finalize_data_integrity
has more ready available information and should just pass it.
Also in the light of future support for CDB-continuation segment this is
a better solution.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
By popular demand, define usefull wrappers for osd_req_read/write
that recieve kernel pointers. All users had their own.
Also remove these from exofs
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Shorten out the Attributes names.
Align all results on column 24.
Print system ID in a new line.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some New revision 5 Attribute definitions.
Some missing definitions of Attributes and pages
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add all constant definitions of new OSD commands added in
revision 4 & 5. Mainly for creating snapshots and clones.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Commit 4eaa9819dc changed semantics of
private_value member of kcontrol. This resulted in inability to control
amplifier and subsequently in very low output volume.
Tested-by: Johannes Schauer <josch@pyneo.org>
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
There should be no functional changes caused by this patch.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
PowerMac bootup with CONFIG_IDE=y oopses in ide_pio_cycle_time():
because "ide: try to use PIO Mode 0 during probe if possible" causes
pmac_ide_set_pio_mode() to be called before drive->id has been set.
Bart points out other places which now need drive->id set earlier,
so follow his advice to allocate it in ide_port_alloc_devices()
(using kzalloc_node, without error message, as when allocating drive)
and memset it for reuse in ide_port_init_devices_data().
Fixed in passing: ide_host_alloc() was missing ide_port_free_devices()
from an error path.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Joao Ramos <joao.ramos@inov.pt>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Introduce per-conntrack locks and use them instead of the global protocol
locks to avoid contention. Especially tcp_lock shows up very high in
profiles on larger machines.
This will also allow to simplify the upcoming reliable event delivery patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
If userspace specifies a memory slot that is larger than 8 petabytes, it
could overflow the largepages variable.
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
If a slots guest physical address and host virtual address unequal (mod
large page size), then we would erronously try to back guest large pages
with host large pages. Detect this misalignment and diable large page
support for the trouble slot.
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Combined mode pci quirk hacks went away - so the table to keep in sync
no longer exists.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
We can't do this for the later ones as they have all sorts of magic boot
time stuff that needs reviewing and the like. However we can do it for the
older ones and it turns out we need to as some IBM docking stations have a
second PIIX series device in them and without this change you can't use it
very well
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Make the following EM related cleanups.
* Use msleep(1) instead of udelay(100) and reduce retry count to 5.
* s/MAX_SLOTS/EM_MAX_SLOTS/, s/MAX_RETRY/EM_MAX_RETRY/
* Make EM constants enums as suggested by Jeff.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
We very rarely (if ever) complete more than one command in the
sactive mask at the time, even for extremely high IO rates. So
looping over the entire range of possible tags is pointless,
instead use __ffs() to just find the completed tags directly.
Updated to clear the tag from the done_mask instead of shifting
done_mask down as suggested by From: Tejun Heo <htejun@gmail.com>
Verified with a user space tester to produce the same results.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
32-bit PIO seems to work fine on sata_sil hardware (tested on SiI3114) and is
listed as OK in the Silicon Image datasheets. Enable it.
Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ECC initialization takes too long. It writes zeroes by portions
of 4 byte, it takes more than 6 minutes on my machine to initialize
512Mb ECC DIMM module. Change portion to 128Kb - it significantly
reduces initialization time.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Handling of the trailing byte in ata_sff_data_xfer() is suboptimal bacause:
- it always initializes the padding buffer to 0 which is not really needed in
both the read and write cases;
- it has to use memcpy() to transfer a single byte from/to the padding buffer;
- it uses io{read|write}16() accessors which swap bytes on the big endian CPUs
and so have to additionally convert the data from/to the little endian format
instead of using io{read|write}16_rep() accessors which are not supposed to
change the byte ordering.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>