page_mapping() check this via VM_BUG_ON(PageSlab(page)) so we bug here
with the according debuging turned on.
Future TODO: replace this with a flush_dcache_page_for_pio() API
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: stable@kernel.org
cmd640_hardware_init() reads CFR but doesn't use the value read...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Now, with the introduction of the sff_set_devctl() method, we can
use it in sff_irq_on() method too -- that way its implementations
in 'pata_bf54x' and 'pata_scc' become virtually identical to
ata_sff_irq_on(). The sff_irq_on() method now becomes quite
superfluous, and the only reason not to remove it completely is
the existence of the 'pata_octeon_cf' driver which implements it
as an empty function. Just make the method optional then, with
ata_sff_irq_on() becoming generic taskfile-bound function, still
global for the 'pata_bf54x' driver to be able to call it from its
thaw() and postreset() methods.
While at it, make the sff_irq_on() method and ata_sff_irq_on() return
'void' as the result is always ignored anyway.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The set of libata's taskfile access methods is clearly incomplete as
it lacks a method to write to the device control register -- which
forces drivers like 'pata_bf54x' and 'pata_scc' to implement more
"high level" (and more weighty) methods like freeze() and postreset().
So, introduce the optional sff_set_devctl() method which the drivers
only have to implement if the standard iowrite8() can't be used (just
like the existing sff_check_altstatus() method) and make use of it
in the freeze() and postreset() method implementations (I could also
have used it in softreset() method but it also reads other taskfile
registers without using tf_read() making that quite pointless);
this makes freeze() method implementations in the 'pata_bf54x' and
'pata_scc' methods virtually identical to ata_sff_freeze(), so we
can get rid of them completely.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add "em_buffer" attribute for SATA AHCI hosts to provide a way for
userland to access AHCI EM (enclosure management) buffer directly if the
host supports EM.
AHCI driver should support SGPIO EM messages. However the SATA/AHCI
specs did not define the SGPIO message format filled in EM buffer.
Different HW vendors may have different definitions. The mainly purpose
of this attribute is to solve this issue by allowing HW vendors to
provide userland drivers and tools for their SGPIO initiators.
Signed-off-by: Harry Zhang <harry.zhang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Detect enclosure management message type automatically at driver
initialization, instead of using module parameter "ahci_em_messages".
Signed-off-by: Harry Zhang <harry.zhang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The device control register exists and its address is set by scc_setup_ports(),
hence the check is useless...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
... since, of course, it's not used outside this driver.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Use __ratelimit() instead of its own private rate limit implementation.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
There are some SATA devices which take relatively long to get out of
0xff status after reset. In libata, this timeout is determined by
ATA_TMOUT_FF_WAIT. Quantum GoVault is the worst requring about 2s for
reliable detection. However, because 2s 0xff timeout can introduce
rather long spurious delay during boot, libata has been compromising
at the next longest timeout of 800ms for HHD424020F7SV00 iVDR drive.
Now that parallel scan is in place for common drivers, libata can
afford 2s 0xff timeout. Use 2s 0xff timeout if parallel scan is
enabled.
Please note that the chance of spurious wait is pretty slim w/ working
SCR access so this will only affect SATA controllers w/o SCR access
which isn't too common these days.
Please read the following thread for more information on the GoVault
drive.
http://thread.gmane.org/gmane.linux.ide/14545/focus=14663
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
In 2009, While running "cache read" performance test of drives behind
SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
machines under test. This patch reduces the rate by a factor of about 70.
Low enough that we didn't care to further investigate the issue.
Performance impact with any sort of "normal" use was ~2%+ CPU and less
than 1% throughput degradation. Worst case impact (cached read) was
6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
switching enabled in the SATA driver.
The patch disables "Early ACK" in the 3726 port multiplier.
"Early ACK" is issued when device sends a FIS to the host (via PMP)
and the PMP sends an ACK immediately back to the device - well before
the host gets the response. Under worst case IOPs load (cached read
test) and more than 2 PMPs connected to a 4-port SATA controller,
I suspect the time to service all of the PMPs is exceeding the PMPs
ability to keep track of outstanding FIS it owes the Host. Reducing
the number of PMPs to 2 (or 1) reduces the frequency by several orders
of magnitude. Kudos to Gwendal for initial debugging of this issue.
[Any errors in the description are mine, not his.]
Patch is currently in production on Google servers.
Signed-off-by: Grant Grundler <grundler@google.com>
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Implicit slab.h inclusion via percpu.h is about to go away. Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
It turns out different generations of MCPs have differing quirks.
* MCP 65-73 : FPDMA AA broken, lies about PMP support, forgets to report NCQ
* MCP 77-79 : FPDMA AA broken, lies about PMP support
* MCP 89 : FPDMA AA broken
Instead of turngin off FPDMA AA on all NVIDIAs, implement
HFLAG_NO_FPDMA_AA, define additional board IDs and apply necessary
quirks.
This fixes bko#15481 and the list of quirks is verified by Peer Chen.
http://bugzilla.kernel.org/show_bug.cgi?id=15481
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
I've prepared a totally simple patch that, if I did it and measured it
correctly, reduces the text size as of the ppc-6xx-size command of
pata-mpc52xx by more than 10%, by reducing the rodata size from 0x4a4
to 0x17e bytes. This is simply done by changing the data types of the
ATA timing constants.
If you are interested at all, and it's worth the trouble, here the
details:
ppc-6xx-size:
text data bss dec hex filename
old: 6532 1068 0 7600 1db0 pata-mpc52xx.o
new: 5718 1068 0 6786 1a82 pata-mpc52xx.o
The (assembler) code itself doesn't really change very much. I double
checked the final results inside mpc52xx-ata-apply-timings() and they
match. The driver is still working fine of course.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ahci over time has grown a number of board IDs and it's a bit of mess
right now. Clean it up such that,
* board_id_* now live in a separate enum board_ids and numbers are
assigned automatically.
* Board IDs assigned to features are separated from the ones assigned
to specific implementations and both are ordered alphabetically.
* For NV MCPs, define per-generation alias board_ids and assign
matching aliases in the pci id table. This makes mcp_linux, 67-73
use board_ahci_mcp65 instead of board_ahci_yesncq. Both are
identical in content.
* Kill now unused board_ahci_nopmp and board_ahci_yesncq.
This patch doesn't cause any functional change but will make future
changes to board_ids and quirks much less painful.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
According to section 10.3.1 of the AHCI spec, PxCMD.ST must not be set
unless there's a device attached. Following this saves us a measurable
quantity of power and does not impair hotplug support. Based on a patch
by Kristen Carlson Accardi.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This can be used for AHCI-compatible interfaces implemented inside
System-On-Chip solutions, or AHCI devices connected via localbus.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch should contain no functional changes, just moves code
around.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Factor out some ahci_em_messages handling code from ahci_init_one().
We would like to reuse it for non-PCI devices.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Introduce ahci_pci_print_info() that now handles PCI stuff.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Move PCI stuff into ahci_pci_init_controller().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
To make the function bus-independand we have to get rid of
"struct pci_dev *", so let's pass just "struct devce *".
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Move PCI stuff into ahci_pci_reset_controller().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
To make the function generic we have to get rid of "struct pci_dev *",
so let's pass just a "struct devce *".
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Make ahci_save_initial_config() a bit more generic by introducing
force_port_map and mask_port_map arguments.
Move PCI stuff into ahci_pci_save_initial_config().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Currently the driver uses host->iomap to store all the iomapped BARs
of a PCI device (while AHCI devices actually use just a single memory
window).
We're going to teach AHCI to work with non-PCI buses, so there are two
options to make this work:
1. "fake" host->iomap array for non-PCI devices, and place the needed
address at iomap[AHCI_PCI_BAR];
2. Get rid of host->iomap usage, instead introduce a private mmio
field.
This patch implements the second option.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform. The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.
This fixes booting certain VMware configurations with CONFIG_MRST=y.
Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.
Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>
Fix xt_TEE build for the case of NF_CONNTRACK=m and
NETFILTER_XT_TARGET_TEE=y:
xt_TEE.c:(.text+0x6df5c): undefined reference to `nf_conntrack_untracked'
4x
Built with all 4 m/y combinations.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, k8: Fix build error when K8_NB is disabled
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
x86: Fix fake apicid to node mapping for numa emulation
Now that the rpc.gssd daemon can explicitly tell us that the key expired,
we should cache that information to avoid spamming gssd.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This improves the packing of the rpc_task, and ensures that on 64-bit
platforms the size reduces to 216 bytes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
iput() can potentially attempt to allocate memory, so we should avoid
calling it in a memory shrinker. Instead, rely on the fact that iput() will
call nfs_access_zap_cache().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both iput() and put_rpccred() might allocate memory under certain
circumstances, so make sure that we don't recurse and deadlock...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The 'cred_unused' list, that is traversed by rpcauth_cache_shrinker is
ordered by time. If we hit a credential that is under the 60 second garbage
collection moratorium, we should exit because we know at that point that
all successive credentials are subject to the same moratorium...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Under some circumstances, put_rpccred() can end up allocating memory, so
check the gfp_mask to prevent deadlocks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Again, we can deadlock if the memory reclaim triggers a writeback that
requires a rpcsec_gss credential lookup.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We do not want to have the state recovery thread kick off and wait for a
memory reclaim, since that may deadlock when the writebacks end up
waiting for the state recovery thread to complete.
The safe thing is therefore to use GFP_NOFS in all open, close,
delegation return, lock, etc. operations that may be called by the
state recovery thread.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is a BUG for anybody to call this function without setting
args->bc_xprt. Trying to return an error value is just wrong, since the
user cannot fix this: it is a programming error, not a user error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently RPC performance metrics that tabulate elapsed time use
jiffies time values. This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments). It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.
For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.
We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation. As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.
We could report RTT metrics in microseconds. In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.
For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools. At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
To report ktime statistics to user space in milliseconds, a new helper
is required.
When considering how to do this conversion, I didn't immediately see
why the extra step of converting ktime to a timeval was needed. To
make that more clear, introduce a couple of large comments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>