Miklos Szeredi found the bug:
"Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns
NLM_LCK_DENIED.
"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in will cause fcntl_setlk() to retry forever."
So, for example, opening a file on an nfs filesystem, changing
permissions to forbid further access, then trying to lock the file,
could result in an infinite loop.
And Trond Myklebust identified the culprit, from Marc Eshel and I:
7723ec9777 "locks: factor out
generic/filesystem switch from setlock code"
That commit claimed to just be reshuffling code, but actually introduced
a behavioral change by calling the lock method repeatedly as long as it
returned -EAGAIN.
We assumed this would be safe, since we assumed a lock of type SETLKW
would only return with either success or an error other than -EAGAIN.
However, nfs does can in fact return -EAGAIN in this situation, and
independently of whether that behavior is correct or not, we don't
actually need this change, and it seems far safer not to depend on such
assumptions about the filesystem's ->lock method.
Therefore, revert the problematic part of the original commit. This
leaves vfs_lock_file() and its other callers unchanged, while returning
fcntl_setlk and fcntl_setlk64 to their former behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The per node counters are used mainly for showing data through the sysfs API.
If that API is not compiled in then there is no point in keeping track of this
data. Disable counters for the number of slabs and the number of total slabs
if !SLUB_DEBUG. Incrementing the per node counters is also accessing a
potentially contended cacheline so this could actually be a performance
benefit to embedded systems.
SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which
is on by default).
Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab()
if the system is not compiled with NUMA support.
[penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
__free_slab does some diagnostics. The resetting of mapcount etc
in discard_slab() can interfere with debug processing. So move
the reset immediately before the page is freed.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Only output per cpu stats if the kernel is build for SMP.
Use a capital "C" as a leading character for the processor number
(same as the numa statistics that also use a capital letter "N").
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
count_partial() is used by both slabinfo and the sysfs proc support. Move
the function directly before the beginning of the sysfs code so that it can
be easily found. Rework the preprocessor conditional to take into account
that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.
Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
is no point of keeping statistics if no one can restrive them.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Move the definition of kmalloc_caches_dma() into a later #ifdef CONFIG_ZONE_DMA.
This saves one #ifdef and leaves us with a total of two #ifdefs for dma slab support.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
As spotted by kmemcheck, we need to initialize the per-CPU ->stat array before
using it.
[kmem_cache_cpu structures are usually allocated from arrays defined via
DEFINE_PER_CPU that are zeroed so we have not noticed this so far --cl].
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] make ali_atapi_dma static
[libata] sata_svw: fix reversed port count
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
[BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
[NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
[IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
[IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
[IPV6]: Use appropriate sock tclass setting for routing lookup.
[IPV6]: IPv6 extension header structures need to be packed.
[IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
[NET]: Return more appropriate error from eth_validate_addr().
[ISDN]: Do not validate ISDN net device address prior to interface-up
[NET]: Fix kernel-doc for skb_segment
[SOCK] sk_stamp: should be initialized to ktime_set(-1L, 0)
net: check for underlength tap writes
net: make struct tun_struct private to tun.c
[SCTP]: IPv4 vs IPv6 addresses mess in sctp_inet[6]addr_event.
[SCTP]: Fix compiler warning about const qualifiers
[SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK
[SCTP]: Add check for hmac_algo parameter in sctp_verify_param()
[NET_SCHED] cls_u32: refcounting fix for u32_delete()
[DCCP]: Fix skb->cb conflicts with IP
[AX25]: Potential ax25_uid_assoc-s leaks on module unload.
...
Correctly determine the address of an illegal instruction. The EPCR0 register
holds this value (masked by EPCR0_PC) if the validity bit is set (masked by
EPCR0_V). So the test as to whether the contents of the register are usable
should be involve checking the _V bit, not the _PC bits.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
revert "sched: fix fair sleepers" (e22ecef1d2),
because it is causing audio skipping, see:
http://bugzilla.kernel.org/show_bug.cgi?id=10428
the patch is correct and the real cause of the skipping is not
understood (tracing makes it go away), but time has run out so we'll
revert it and re-try in 2.6.26.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.
Can we please add a comment about why we do not need one?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These sockets now have a bit other names and are no longer global.
Shame on me, I haven't provided a good comment for this when
sending DCCP netnsization patches.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ebtables nflog watcher to the kernel in order to
allow ebtables log through the nfnetlink_log backend.
Signed-off-by: Peter Warasin <peter@endian.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Directly call IPv4 and IPv6 variants where the address family is
easily known.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Invalid states can cause out-of-bound memory accesses of the state table.
Also don't insist on having a new state contained in the netlink message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.
The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.
Also change NF_CT_ASSERT to use WARN_ON to get backtraces.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.
The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Locally generated ICMP packets have a reference to the conntrack entry
of the original packet manually attached by icmp_send(). Therefore the
check for locally originated untracked ICMP redirects can never be
true.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.
Return EAGAIN so userspace can retry the operation.
Fixes (with current iptables SVN version) netfilter bugzilla #104.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Some callers pass uninitialized structures, clear the address to make
sure later comparisions work properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>