Commit d2e5f0c (ACPI / PCI: Rework the setup and cleanup of device
wakeup) moved the initial disabling of system wakeup for PCI devices
into a place where it can actually work and that exposed a hidden old
issue with crap^Wunusual system designs where the same power
resources are used for both wakeup power and device power control at
run time.
Namely, say there is one power resource such that the ACPI power
state D0 of a PCI device depends on that power resource (i.e. the
device is in D0 when that power resource is "on") and it is used
as a wakeup power resource for the same device. Then, calling
acpi_pci_sleep_wake(pci_dev, false) for the device in question will
cause the reference counter of that power resource to drop to 0,
which in turn will cause it to be turned off. As a result, the
device will go into D3cold at that point, although it should have
stayed in D0.
As it turns out, that happens to USB controllers on some laptops
and USB becomes unusable on those machines as a result, which is
a major regression from v3.8.
To fix this problem, (1) increment the reference counters of wakup
power resources during their initialization if they are "on"
initially, (2) prevent acpi_disable_wakeup_device_power() from
decrementing the reference counters of wakeup power resources that
were not enabled for wakeup power previously, and (3) prevent
acpi_enable_wakeup_device_power() from incrementing the reference
counters of wakeup power resources that already are enabled for
wakeup power.
In addition to that, if it is impossible to determine the initial
states of wakeup power resources, avoid enabling wakeup for devices
whose wakeup power depends on those power resources.
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Tested-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Trivial, but WRITE SAME is an SBC command so it seems strange for a
related function (defined in target_core_sbc.c) to be in the spc_
namespace.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The sock_diag_lock_handler() and sock_diag_unlock_handler() actually
make the code less readable. Get rid of them and make the lock usage
and access to sock_diag_handlers[] clear on the first sight.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userland can send a netlink message requesting SOCK_DIAG_BY_FAMILY
with a family greater or equal then AF_MAX -- the array size of
sock_diag_handlers[]. The current code does not test for this
condition therefore is vulnerable to an out-of-bound access opening
doors for a privilege escalation.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlx4_en driver allocates the number of objects for the CPU affinity
reverse-map based on the number of rx rings of the device. However,
mlx4_assign_eq() calls irq_cpu_rmap_add() as many times as IRQ's are
assigned to EQ's, which can be as large as mlx4_dev->caps.comp_pool. If
caps.comp_pool is larger than rx_ring_num we will eventually hit the
BUG_ON() in cpu_rmap_add().
Fix this problem by allocating space for the maximum number of CPU
affinity reverse-map objects we might want to add.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory to hold the network device tx_cq is not being allocated with
the correct size in mlx4_en_init_netdev(). It should use MAX_TX_RINGS
instead of MAX_RX_RINGS. This can cause problems if the number of tx
rings being used is greater than MAX_RX_RINGS.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable hard IRQ before kexec a new kernel image.
Not doing it can result in corrupted data in the memory segments
reserved for the new kernel.
Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull parisc updates from Helge Deller.
The bulk of this is optimized page coping/clearing and cache flushing
(virtual caches are lovely) by John David Anglin.
* 'parisc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (31 commits)
arch/parisc/include/asm: use ARRAY_SIZE macro in mmzone.h
parisc: remove empty lines and unnecessary #ifdef coding in include/asm/signal.h
parisc: sendfile and sendfile64 syscall cleanups
parisc: switch to available compat_sched_rr_get_interval implementation
parisc: fix fallocate syscall
parisc: fix error return codes for rt_sigaction and rt_sigprocmask
parisc: convert msgrcv and msgsnd syscalls to use compat layer
parisc: correctly wire up mq_* functions for CONFIG_COMPAT case
parisc: fix personality on 32bit kernel
parisc: wire up process_vm_readv, process_vm_writev, kcmp and finit_module syscalls
parisc: led driver requires CONFIG_VM_EVENT_COUNTERS
parisc: remove unused compat_rt_sigframe.h header
parisc/mm/fault.c: Port OOM changes to do_page_fault
parisc: space register variables need to be in native length (unsigned long)
parisc: fix ptrace breakage
parisc: always detect multiple physical ranges
parisc: ensure that mmapped shared pages are aligned at SHMLBA addresses
parisc: disable preemption while flushing D- or I-caches through TMPALIAS region
parisc: remove IRQF_DISABLED
parisc: fixes and cleanups in page cache flushing (4/4)
...
Right now it's safe only during initial mount *and* functions are asking
to be abused for dynamic adding of objects.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.
CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Jeff Moyer <jmoyer@redhat.com>
CC: stable@vger.kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The last caller was removed >2 years ago in commit 7b2a69ba7.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit
Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.
Return error through pointer. Change all callers to preserve this error code.
[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It's safe only under namespace_sem or vfsmount_lock; all places
in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull core locking changes from Ingo Molnar:
"The biggest change is the rwsem lock-steal improvements, both to the
assembly optimized and the spinlock based variants.
The other notable change is the clean up of the seqlock implementation
to be based on the seqcount infrastructure.
The rest is assorted smaller debuggability, cleanup and continued -rt
locking changes."
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rwsem-spinlock: Implement writer lock-stealing for better scalability
futex: Revert "futex: Mark get_robust_list as deprecated"
generic: Use raw local irq variant for generic cmpxchg
lockdep: Selftest: convert spinlock to raw spinlock
seqlock: Use seqcount infrastructure
seqlock: Remove unused functions
ntp: Make ntp_lock raw
intel_idle: Convert i7300_idle_lock to raw_spinlock
locking: Various static lock initializer fixes
lockdep: Print more info when MAX_LOCK_DEPTH is exceeded
rwsem: Implement writer lock-stealing for better scalability
lockdep: Silence warning if CONFIG_LOCKDEP isn't set
watchdog: Use local_clock for get_timestamp()
lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug()
locking/stat: Fix a typo
Pull x86 microcode loading update from Peter Anvin:
"This patchset lets us update the CPU microcode very, very early in
initialization if the BIOS fails to do so (never happens, right?)
This is handy for dealing with things like the Atom erratum where we
have to run without PSE because microcode loading happens too late.
As I mentioned in the x86/mm push request it depends on that
infrastructure but it is otherwise a standalone feature."
* 'x86/microcode' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Make early microcode loading a configuration feature
x86/mm/init.c: Copy ucode from initrd image to kernel memory
x86/head64.c: Early update ucode in 64-bit
x86/head_32.S: Early update ucode in 32-bit
x86/microcode_intel_early.c: Early update ucode on Intel's CPU
x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled()
x86/microcode_intel_lib.c: Early update ucode on Intel's CPU
x86/microcode_core_early.c: Define interfaces for early loading ucode
x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP
x86/common.c: Make have_cpuid_p() a global function
x86/microcode_intel.h: Define functions and macros for early loading ucode
x86, doc: Documentation for early microcode loading
With commit 8170e6bed4 ("x86, 64bit: Use a #PF handler to materialize
early mappings on demand") we started hitting an early bootup crash
where the Xen hypervisor would inform us that:
(XEN) d7:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffea000005b2d0:
(XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 7 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]----
.. that Xen was unable to context switch back to dom0.
Looking at the calling stack we find:
[<ffffffff8103feba>] xen_get_user_pgd+0x5a <--
[<ffffffff8103feba>] xen_get_user_pgd+0x5a
[<ffffffff81042d27>] xen_write_cr3+0x77
[<ffffffff81ad2d21>] init_mem_mapping+0x1f9
[<ffffffff81ac293f>] setup_arch+0x742
[<ffffffff81666d71>] printk+0x48
We are trying to figure out whether we need to up-date the user PGD as
well. Please keep in mind that under 64-bit PV guests we have a limited
amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel
and user-space. As such the Linux pvops'fied version of write_cr3
checks if it has to update the user-space cr3 as well.
That clearly is not needed during early bootup. The recent changes (see
above git commit) streamline the x86 page table allocation to be much
simpler (And also incidentally the #PF handler ends up in spirit being
similar to how the Xen toolstack sets up the initial page-tables).
The fix is to have an early-bootup version of cr3 that just loads the
kernel %cr3. The later version - which also handles user-page
modifications will be used after the initial page tables have been
setup.
[ hpa: removed a redundant #ifdef and made the new function __init.
Also note that x86-32 already has such an early xen_write_cr3. ]
Tested-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case of error, the function devm_regulator_get() returns
ERR_PTR() and never returns NULL. The NULL test in the return
value check should be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
randconfig build reports the following error which is caused by
CONFIG_PM_OPP being unset.
CC arch/arm/mach-imx/mach-imx6q.o
arch/arm/mach-imx/mach-imx6q.c: In function ‘imx6q_opp_init’:
arch/arm/mach-imx/mach-imx6q.c:248:2: error: implicit declaration of function ‘of_init_opp_table’ [-Werror=implicit-function-declaration]
Fix the error by giving a more correct condition for empty
of_init_opp_table() implementation.
Reported-by: Rob Herring <robherring2@gmail.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 92ef2a2 (ACPI: Change the ordering of PCI root bridge
driver registrarion), acpi_hest_init() is never called for acpi=off
(acpi_disabled), so hest_disable is not set, but hest_tab is NULL,
which causes apei_hest_parse() to crash when it is called from
aer_acpi_firmware_first().
Fix that by making apei_hest_parse() check if hest_tab is not NULL
in addition to checking hest_disable. Also remove the now useless
acpi_disabled check from apei_hest_parse().
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit def8203 ("microblaze idle: delete pm_idle") introduced the following
compile error:
arch/microblaze/kernel/process.c: In function 'cpu_idle':
arch/microblaze/kernel/process.c💯 error: 'idle' undeclared (first use in this function)
arch/microblaze/kernel/process.c💯 error: (Each undeclared identifier is reported only once
arch/microblaze/kernel/process.c💯 error: for each function it appears in.)
arch/microblaze/kernel/process.c:106: error: implicit declaration of function 'idle'
This patch fixes it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 26bab0c ("blackfin idle: delete pm_idle") introduced the following
compile error:
arch/blackfin/kernel/process.c: In function ‘cpu_idle’:
arch/blackfin/kernel/process.c:83: error: ‘idle’ undeclared (first use in this function)
arch/blackfin/kernel/process.c:83: error: (Each undeclared identifier is reported only once
arch/blackfin/kernel/process.c:83: error: for each function it appears in.)
arch/blackfin/kernel/process.c:88: error: implicit declaration of function ‘idle’
This patch fixes it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Update the ia64 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the ia64 arch_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The code requires the use of the proper per-exception-vector stub
functions (set up as the early_idt_handlers[] array - note the 's') that
make sure to set up the error vector number. This is true regardless of
whether CONFIG_EARLY_PRINTK is set or not.
Why? The stack offset for the comparison of __KERNEL_CS won't be right
otherwise, nor will the new check (from commit 8170e6bed4: "x86,
64bit: Use a #PF handler to materialize early mappings on demand") for
the page fault exception vector.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext4_has_free_clusters() should tell us whether there is enough free
clusters to allocate, however number of free clusters in the file system
is converted to blocks using EXT4_C2B() which is not only wrong use of
the macro (we should have used EXT4_NUM_B2C) but it's also completely
wrong concept since everything else is in cluster units.
Moreover when calculating number of root clusters we should be using
macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be
off by one. However r_blocks_count should always be a multiple of the
cluster ratio so doing a plain bit shift should be enough here. We
avoid using EXT4_B2C() because it's confusing.
As a result of the first problem number of free clusters is much bigger
than it should have been and ext4_has_free_clusters() would return 1 even
if there is really not enough free clusters available.
Fix this by removing the EXT4_C2B() conversion of free clusters and
using bit shift when calculating number of root clusters. This bug
affects number of xfstests tests covering file system ENOSPC situation
handling. With this patch most of the ENOSPC problems with bigalloc file
system disappear, especially the errors caused by delayed allocation not
having enough space when the actual allocation is finally requested.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
len is 0 means no extent needs to be removed, so return immediately.
Otherwise it could trigger the following BUG_ON() in
ext4_es_remove_extent()
end = lblk + len - 1;
BUG_ON(end < lblk);
This could be reproduced by a simple truncate(1) command by an
unprivileged user
truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile
The same is true for __es_insert_extent().
Patched kernel passed xfstests regression test.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
In fast open the sender unncessarily reduces the space available
for data in SYN by 12 bytes. This is because in the sender
incorrectly reserves space for TS option twice in tcp_send_syn_data():
tcp_mtu_to_mss() already accounts for TS option space. But it further
reserves MAX_TCP_OPTION_SPACE when computing the payload space.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mem cgroup socket limit is only used if the config option is
enabled. Found with sparse
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Jones reported a lockdep splat occurring in IP defrag code.
commit 6d7b857d54 (net: use lib/percpu_counter API for
fragmentation mem accounting) added a possible deadlock.
Because percpu_counter_sum_positive() needs to acquire
a lock that can be used from softirq, we need to disable BH
in sum_frag_mem_limit()
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>