A recent commit:
commit 449aad3e25
introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex.
__blkdev_get holds bd_mutex while calling md_open which takes
reconfig_mutex,
do_md_run is always called with reconfig_mutex held, and it now
takes bd_mutex in the call the revalidate_disk.
This potential deadlock was not caught by lockdep due to the
use of mutex_lock_interruptible_nexted which was introduced
by
commit d63a5a74de
do avoid a warning of an impossible deadlock.
It is quite possible to split reconfig_mutex in to two locks.
One protects the array data structures while it is being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.
In particular, the second lock prevents an open from completing
between the time when do_md_stop checks if there are any active opens,
and the time when the array is either set read-only, or when ->pers is
set to NULL. So we can be certain that no IO is in flight as the
array is being destroyed.
So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.
This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d
Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This adds support for Marvell's Cryptographic Engines and Security
Accelerator (CESA) which can be found on a few SoC.
Tested with dm-crypt.
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix the header files to define round_hint_to_min() and to define
mmap_min_addr_handler() in the !CONFIG_SECURITY case.
Built and tested with !CONFIG_SECURITY
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid redelivery of edge interrupt before next edge
KVM: MMU: limit rmap chain length
KVM: ia64: fix build failures due to ia64/unsigned long mismatches
KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
KVM: fix ack not being delivered when msi present
KVM: s390: fix wait_queue handling
KVM: VMX: Fix locking imbalance on emulation failure
KVM: VMX: Fix locking order in handle_invalid_guest_state
KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
KVM: SVM: force new asid on vcpu migration
KVM: x86: verify MTRR/PAT validity
KVM: PIT: fix kpit_elapsed division by zero
KVM: Fix KVM_GET_MSR_INDEX_LIST
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix/complete ftrace event records sampling
perf_counter, ftrace: Fix perf_counter integration
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
tracing/filters: Don't use pred on alloc failure
ring-buffer: Fix memleak in ring_buffer_free()
tracing: Fix recordmcount.pl to handle sections with only weak functions
ring-buffer: Fix advance of reader in rb_buffer_peek()
tracing: do not use functions starting with .L in recordmcount.pl
ring-buffer: do not disable ring buffer on oops_in_progress
ring-buffer: fix check of try_to_discard result
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix buffer overflow in efi_init()
x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
x86: Fix VMI && stack protector
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix typos in documentation
lockdep: Fix file mode of lock_stat
rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
For events that are rare, such as referral DNS lookups, it makes limited
sense to have a daemon constantly listening for upcalls on a channel. An
alternative in those cases might simply be to run the app that fills the
cache using call_usermodehelper_exec() and friends.
The following patch allows the cache_detail to specify alternative upcall
mechanisms for these particular cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While we do want to protect against multiple concurrent readers and writers
on each upcall/downcall pipe, we don't want to limit concurrent reading and
writing to separate caches.
This patch therefore replaces the static buffer 'write_buf', which can only
be used by one writer at a time, with use of the page cache as the
temporary buffer for downcalls. We still fall back to using the the old
global buffer if the downcall is larger than PAGE_CACHE_SIZE, since this is
apparently needed by the SPKM security context initialisation.
It then replaces the use of the global 'queue_io_mutex' with the
inode->i_mutex in cache_read() and cache_write().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Also ensure that we destroy those files before we destroy the cache_detail.
Otherwise, user processes might attempt to write into uninitialised caches.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In order to allow rpc_pipefs to create directories with different types of
subtrees, it is useful to allow the caller to customise the subtree filling
process.
In order to do so, we separate out the parts which are specific to making
an RPC client directory, and put them in a separate helper, then we convert
the process of filling the directory contents into a callback.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There is still a little wart or two there: Since we've already got a
vfsmount, we might as well pass that in to rpc_create_client_dir.
Another point is that if we open code __rpc_lookup_path() here, then we can
avoid looking up the entire parent directory path over and over again: it
doesn't change.
Also get rid of rpc_clnt->cl_pathname, since it has no users...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This reflects the fact that rpc_mkdir() as it stands today, can only create
a RPC client type directory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: provide documenting comments for the functions in
net/sunrpc/timer.c.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
After a bind completes, update the transport instance's address
strings so debugging messages display the current port the transport
is connected to.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL.
Currently there are no uses of RPC_DISPLAY_ALL outside the transport
modules themselves, so we can safely get rid of it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Give the "addr" and "port" field less ambiguous names.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Replace PROC macro with open coded C99 structure
initializers to improve readability.
The rpcbind v4 GETVERSADDR procedure is never sent by the current
implementation, so it is not copied to the new structures.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded decode logic for PMAP_GETPORT/RPCB_GETADDR with
an xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows. The new implementation also checks
that the incoming port number is reasonable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded decode logic for rpcbind UNSET results with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
protect against buffer overflows.
The new function is unused for the moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the open-coded encode logic for rpcbind arguments with an
xdr_stream-based implementation, similar to what NFSv4 uses, to
better protect against buffer overflows.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In write_failover_ip(), replace the sscanf() with a call to the common
sunrpc.ko presentation address parser.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Use shared rpc_set_port() function instead of nlm_clear_port().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Use the common routine now provided in sunrpc.ko for parsing mount
addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: In addition to using the new generic rpc_ntop() and
rpc_get_port() functions, have the RPC client compute the presentation
address buffer sizes dynamically using kstrdup().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the
redundant cases that convert a socket address to a universal
address. The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.
Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses. This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.
The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a set of functions in the kernel's RPC implementation for
converting between a socket address and either a standard
presentation address string or an RPC universal address.
The universal address functions will be used to encode and decode
RPCB_FOO and NFSv4 SETCLIENTID arguments. The other functions are
part of a previous promise to deliver shared functions that can be
used by upper-layer protocols to display and manipulate IP
addresses.
The kernel's current address printf formatters were designed
specifically for kernel to user-space APIs that require a particular
string format for socket addresses, thus are somewhat limited for the
purposes of sunrpc.ko. The formatter for IPv6 addresses, %pI6, does
not support short-handing or scope IDs. Also, these printf formatters
are unique per address family, so a separate formatter string is
required for printing AF_INET and AF_INET6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>