On Versatile Express, the PCI Express buses are broken and unusable, so
we aren't going to support PCI/ISA IO cycles on this platform. Remove
the PCI/ISA IO inb et.al. support for this platform.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid potential build problems caused by lacking mach/irqs.h includes
on non-OF builds caused by an errant include in asm/prom.h. asm/prom.h
requires nothing from asm/irq.h, as Grant says:
On Mon, Feb 06, 2012 at 05:56:23AM +0000, Grant Likely wrote:
> On Sat, Feb 04, 2012 at 10:17:48PM +0000, Russell King wrote:
> > Finally, do we need asm/irq.h in our asm/prom.h ? That's causing
> > fragility between DT and non-DT builds, because people are finding
> > that their DT builds work without their mach/irqs.h includes but
> > fail when built with non-DT. The only thing which DT might need -
> > at the most - is NR_IRQS, but I'd hope with things like irq domains
> > it doesn't actually require it.
>
> I don't think so. There may be a file or two that break because they're
> not including everything they need, but I don't think anything in the
> header requires it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Print debug information on user faults for SIGBUS if user_debug = 16
in the kernel command line.
Reference: <1327333344-26340-1-git-send-email-javi.merino@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The irq_start and hwirq_base assignment code is fairly hairy and ended
up being difficult to read following a conflict resolution for 3.2.
This patch rearranges the code slightly to make it easier to read.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM unconditionally selects CONFIG_GENERIC_HARDIRQS, so the definition
of for_each_irq_desc will check that the desc is non-NULL anyway.
This patch removes a redundant check from the IRQ migration code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cortex-A7 implements an ARMv7-compatible PMU compliant with the PMUv2
architecture specification.
This patch adds support for the PMU to the ARM perf backend.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ensure that the software state for sched_clock() is updated at the
point of suspend so that we avoid losing ticks since the last update.
This prevents the platform dependent possibility that sched_clock()
may appear to go backwards across a suspend/resume cycle.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add the compiled ISA to oops dumps, along side the preempt/smp
configuration. This allows us to see immediately whether the kernel
was compiled for Thumb-2 or not.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This line is irritating and wrong when modules are not supported, so
don't show it then.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds some endianness-agnostic helpers to convert machine
instructions between canonical integer form and in-memory
representation.
A canonical integer form for representing instructions is also
formalised here.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that we can select a sched_clock at runtime, let's implement
it for the Integrator AP, default-select the one found in all
other board it for all plat-versatile boards and make the right
clock kick in at runtime.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The current user mapping for the vectors page is inserted as a `horrible
hack vma' into each task via arch_setup_additional_pages. This causes
problems with the MM subsystem and vm_normal_page, as described here:
https://lkml.org/lkml/2012/1/14/55
Following the suggestion from Hugh in the above thread, this patch uses
the gate_vma for the vectors user mapping, therefore consolidating
the horrible hack VMAs into one.
Acked-and-Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid namespace conflicts with drivers over the CP15 definitions by
moving CP15 related prototypes and definitions to a private header
file.
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com> [Tegra]
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Tested-by: H Hartley Sweeten <hsweeten@visionengravers.com> [EP93xx]
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rather than open-coding the jiffy-based wait, and polling for the
secondary CPU to come online, use a completion instead. This
removes the need to poll, instead we will be notified when the
secondary CPU has initialized.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Get rid of the NO_IRQ madness from Acorn expansion card handling code.
Thankfully, are relatively few users of this here, and so it's easy to
audit.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This register is only present on older platforms, and not on RiscPC,
so lets remove this unused support.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Use irq chip data to store the expansion card data pointer, rather
than converting from the interrupt number to a slot number. This
allows the interrupt chip methods to avoid knowing about interrupt
numbering.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
RiscPC is the only platform using the Acorn expansion card support, so
move it into its mach-* directory.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rather than including asm/irq.h into the keyboard driver, pass the
IRQ numbers via the platform device instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sigh, warnings are there for a reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: John Stultz <john.stultz@linaro.org>
Remove CONFIG_TR=y from the x86 defconfigs since
token ring support is antiquated and obsolete.
( I reviewed both x86 defconfigs - I didn't come up with
anything else that obviously should be removed. )
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Link: http://lkml.kernel.org/r/4F6D05CA.2050801@xenotime.net
[ Twiddled the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After printing out the first line of a stack backtrace,
print_context_stack() calls print_ftrace_graph_addr() to check
if it's making a graph of function calls, usually not the case.
But unfortunate ordering of assignments causes this to oops if
an earlier stack overflow corrupted threadinfo->task. Reorder
to avoid that irritation.
( The fact that there was a stack overflow may often be more
interesting than the stack that can now be shown; but
integrating that information with this stacktrace is awkward,
so leave it to overflow reporting. )
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20120323225648.15DD5A033B@akpm.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull sysctl updates from Eric Biederman:
- Rewrite of sysctl for speed and clarity.
Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
are no longer bottlenecks in the process of adding and removing
network devices.
sysctl is now focused on being a filesystem instead of system call
and the code can all be found in fs/proc/proc_sysctl.c. Hopefully
this means the code is now approachable.
Much thanks is owed to Lucian Grinjincu for keeping at this until
something was found that was usable.
- The recent proc_sys_poll oops found by the fuzzer during hibernation
is fixed.
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
sysctl: protect poll() in entries that may go away
sysctl: Don't call sysctl_follow_link unless we are a link.
sysctl: Comments to make the code clearer.
sysctl: Correct error return from get_subdir
sysctl: An easier to read version of find_subdir
sysctl: fix memset parameters in setup_sysctl_set()
sysctl: remove an unused variable
sysctl: Add register_sysctl for normal sysctl users
sysctl: Index sysctl directories with rbtrees.
sysctl: Make the header lists per directory.
sysctl: Move sysctl_check_dups into insert_header
sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
sysctl: Replace root_list with links between sysctl_table_sets.
sysctl: Add sysctl_print_dir and use it in get_subdir
sysctl: Stop requiring explicit management of sysctl directories
sysctl: Add a root pointer to ctl_table_set
sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
sysctl: Normalize the root_table data structure.
sysctl: Factor out insert_header and erase_header
...
Pull cpufreq updates for 3.4 from Dave Jones: new drivers and some fixes.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
provide disable_cpufreq() function to disable the API.
EXYNOS5250: Add support cpufreq for EXYNOS5250
EXYNOS4X12: Add support cpufreq for EXYNOS4X12
[CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling
[CPUFREQ] Add S3C2416/S3C2450 cpufreq driver
[CPUFREQ] Fix exposure of ARM_EXYNOS4210_CPUFREQ
[CPUFREQ] EXYNOS4210: update the name of EXYNOS clock register
[CPUFREQ] EXYNOS: Initialize locking_frequency with initial frequency
[CPUFREQ] s3c64xx: Fix mis-cherry pick of VDDINT
Fix up trivial conflicts in Kconfig and Makefile due to just changes
next to each other (OMAP2PLUS changes vs some new EXYNOS cpufreq
drivers).
Pull cpufreq fixes from Dave Jones:
"I meant to get some of these in for 3.3 final, but left things too
late, so I've got two trees this time."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
cpufreq: OMAP: specify range for voltage scaling
cpufreq: OMAP: scale voltage along with frequency
cpufreq: OMAP driver depends CPUfreq tables
Pull #3 ARM updates from Russell King:
"This adds gpio support to soc_common, allowing an amount of code to be
deleted from each PCMCIA socket driver for the PXA/SA11x0 SoCs."
* 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm:
PCMCIA: sa1111: rename sa1111 socket drivers to have sa1111_ prefix.
PCMCIA: make lubbock socket driver part of sa1111_cs
PCMCIA: add Kconfig control for building sa11xx_base.c
PCMCIA: sa1111: jornada720: no need to disable IRQs around sa1111_set_io
PCMCIA: sa1111: pass along sa1111_pcmcia_configure_socket() failure code
PCMCIA: soc_common: remove explicit wrprot initialization in socket drivers
PCMCIA: soc_common: remove soc_pcmcia_*_irqs functions
PCMCIA: sa11x0: h3600: convert to use new irq/gpio management
PCMCIA: sa11x0: simpad: convert to use new irq/gpio management
PCMCIA: sa11x0: shannon: convert to use new irq/gpio management
PCMCIA: sa11x0: nanoengine: convert reset handling to use GPIO subsystem
PCMCIA: sa11x0: nanoengine: convert to use new irq/gpio management
PCMCIA: sa11x0: cerf: convert reset handling to use GPIO subsystem
PCMCIA: sa11x0: cerf: convert to use new irq/gpio management
PCMCIA: sa11x0: assabet: convert to use new irq/gpio management
PCMCIA: sa1111: use new per-socket irq/gpio infrastructure
PCMCIA: pxa: convert PXA socket drivers to use new irq/gpio management
PCMCIA: soc_common: add GPIO support for card status signals
PCMCIA: soc_common: move common initialization into soc_common
Pull #2 ARM updates from Russell King:
"Further ARM AMBA primecell updates which aren't included directly in
the previous commit. I wanted to keep these separate as they're
touching stuff outside arch/arm/."
* 'amba' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7362/1: AMBA: Add module_amba_driver() helper macro for amba_driver
ARM: 7335/1: mach-u300: do away with MMC config files
ARM: 7280/1: mmc: mmci: Cache MMCICLOCK and MMCIPOWER register
ARM: 7309/1: realview: fix unconnected interrupts on EB11MP
ARM: 7230/1: mmc: mmci: Fix PIO read for small SDIO packets
ARM: 7227/1: mmc: mmci: Prepare for SDIO before setting up DMA job
ARM: 7223/1: mmc: mmci: Fixup use of runtime PM and use autosuspend
ARM: 7221/1: mmc: mmci: Change from using legacy suspend
ARM: 7219/1: mmc: mmci: Change vdd_handler to a generic ios_handler
ARM: 7218/1: mmc: mmci: Provide option to configure bus signal direction
ARM: 7217/1: mmc: mmci: Put power register deviations in variant data
ARM: 7216/1: mmc: mmci: Do not release spinlock in request_end
ARM: 7215/1: mmc: mmci: Increase max_segs from 16 to 128
Pull #1 ARM updates from Russell King:
"This one covers stuff which Arnd is waiting for me to push, as this is
shared between both our trees and probably other trees elsewhere.
Essentially, this contains:
- AMBA primecell device initializer updates - mostly shrinking the
size of the device declarations in platform code to something more
reasonable.
- Getting rid of the NO_IRQ crap from AMBA primecell stuff.
- Nicolas' idle cleanups. This in combination with the restart
cleanups from the last merge window results in a great many
mach/system.h files being deleted."
Yay: ~80 files, ~2000 lines deleted.
* 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm: (60 commits)
ARM: remove disable_fiq and arch_ret_to_user macros
ARM: make entry-macro.S depend on !MULTI_IRQ_HANDLER
ARM: rpc: make default fiq handler run-time installed
ARM: make arch_ret_to_user macro optional
ARM: amba: samsung: use common amba device initializers
ARM: amba: spear: use common amba device initializers
ARM: amba: nomadik: use common amba device initializers
ARM: amba: u300: use common amba device initializers
ARM: amba: lpc32xx: use common amba device initializers
ARM: amba: netx: use common amba device initializers
ARM: amba: bcmring: use common amba device initializers
ARM: amba: ep93xx: use common amba device initializers
ARM: amba: omap2: use common amba device initializers
ARM: amba: integrator: use common amba device initializers
ARM: amba: realview: get rid of private platform amba_device initializer
ARM: amba: versatile: get rid of private platform amba_device initializer
ARM: amba: vexpress: get rid of private platform amba_device initializer
ARM: amba: provide common initializers for static amba devices
ARM: amba: make use of -1 IRQs warn
ARM: amba: u300: get rid of NO_IRQ initializers
...
This series for the OpenRISC architecture consists of mostly trivial fixups.
The most interesting bits of the series are:
* A fix to the timer code whereby the shortest trigger period is set to
100 cycles; previously, it was possible to set this to 1 cycle, but by
the time the register was written, that time had already passed and the
timer interrupt would not go off until the cycle counter had gone a full
cycle.
* Allowing a device tree binary to be passed in to the kernel from u-boot.
The OpenRISC architecture has been recently merged into upstream u-boot,
so this change gets OpenRISC Linux into sync with that project.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAk9sO1EACgkQ70gcjN2673MC3wCgrCMVzL0htkO7o6oYAP2+oMeh
FKIAnRZ+8KFQ1/WM/jnPHLJxVClyJpBp
=Uk9/
-----END PGP SIGNATURE-----
Merge tag 'for-3.4' of git://openrisc.net/jonas/linux
Pull OpenRISC changes for 3.4 from Jonas Bonn:
"This series for the OpenRISC architecture consists of mostly trivial
fixups. The most interesting bits of the series are:
* A fix to the timer code whereby the shortest trigger period is set
to 100 cycles; previously, it was possible to set this to 1 cycle,
but by the time the register was written, that time had already
passed and the timer interrupt would not go off until the cycle
counter had gone a full cycle.
* Allowing a device tree binary to be passed in to the kernel from
u-boot. The OpenRISC architecture has been recently merged into
upstream u-boot, so this change gets OpenRISC Linux into sync with
that project."
* tag 'for-3.4' of git://openrisc.net/jonas/linux:
OpenRISC: Remove memory_start/end prototypes
openrisc: remove semicolon from KSTK_ defs
openrisc: sanitize use of orig_gpr11
openrisc: fix virt_addr_valid
OpenRISC: Export dump_stack()
OpenRISC: Select GENERIC_ATOMIC64
openrisc: Set shortest clock event to 100 ticks
openrisc: included linux/thread_info.h twice
OpenRISC: Use set_current_blocked() and block_sigmask()
OpenRISC: Don't mask signals if we fail to setup signal stack
OpenRISC: No need to reset handler if SA_ONESHOT
OpenRISC: Don't reimplement force_sigsegv()
openrisc: enable passing of flattened device tree pointer
arch/openrisc/mm/init.c: trivial: use BUG_ON
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJPa7EVAAoJEKurIx+X31iBADgP/1eHhyb4A2ULOk19MhrEAb3/
myoz76qVZvFC7i/5kmtkpAZqCYNnNfiBzr0vpd6MNq9mgFx5jna/IiaV8cZ08kcj
JPknyrAoK8IgGgVQmi+pSHoaiNRAjnvkLwBsadLSFGmcjpi1BYlzmQrFYvMh5GAN
J6j7+2Sv2YNec2QthZZ56L29QqQnbzg0DfS6ZTcOLSvitCnAsUb54Hoi/saK1c9r
loUdWeVen8FPBSTfyqFtJLWJj1Yo3KxVoUgME+f7TeBQNRfihzDhMLuK+IQ5Bsp5
/Z4+Pi4MxiTccYETWCLFWyS9cheK7tfR/SOpEd+XQYx7BQ2gLp/WMNd3q1iUTsg+
AXr3/XOtmKWKAlu6Wy/oFUOjokPVm/yl9MSB6Io1bmlYfaCmb8I+Eyr+fJzPKLyI
wVd+SQl02Z2PlC9dnr73Wu+bCrx0u9xQncucNEpZIpBZLGeuU0sPP0Y7nCGAOL5C
NL7yQmb45pvT7kvzBfrrFSqEfWccycIL+CJw9kb+k1aspTxT0YBZPbpNTKj83aXW
H4KAF9Av/ZHeacriDpxrywwMtTsTS/xLhxEdHB/XEzjkGLuB0R7ZlzvTSuLBbm/1
fu8/bty4ehOHRqACIuNNMGbdIZ5ruThylSC4Yx2a3JRIJISZS/Bo/0mU5zLRRaH+
XuEw3+mfhLq9oU8SdAug
=78St
-----END PGP SIGNATURE-----
Merge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull miscellaneous Itanium patches from Tony Luck.
The conflicts in arch/ia64/hp/sim/simserial.c were due to patches to
simserial that had alredy been included (with lots of further cleanups)
in the serial tree.
* tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
Documentation/kernel-parameters: remove inttest parameter
[IA64] Fix ISA IRQ trigger model and polarity setting
[IA64] Fix a couple of warnings for EXPORT_SYMBOL
[IA64] Check return from device_register() in cx_device_register()
[IA64] Fix warning from machine_kexec.c
[IA64] simserial, bail out when request_irq fails
[IA64] hpsim, initialize chip for assigned irqs
[IA64] simserial, include some headers
[IA64] hpsim, fix SAL handling in fw-emu
[IA64] genirq fixup for SGI/SN
[IA64] disable interrupts when exiting from ia64_mca_cmc_int_handler()
Pull additional x86 fixes from Peter Anvin:
- address a long-standing bug related to when a kernel-spawned process
gets a signal on an i386 kernel compiled without CONFIG_VM86.
- fix the newly introduced build warning in arch/x86/boot.
- fix a typo in the i386 system call table which affects building some
libcs.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-32: Fix endless loop when processing signals for kernel tasks
x86, boot: Correct CFLAGS for hostprogs
x86-32: Fix typo for mq_getsetattr in syscall table
Merge second batch of patches from Andrew Morton:
- various misc things
- core kernel changes to prctl, exit, exec, init, etc.
- kernel/watchdog.c updates
- get_maintainer
- MAINTAINERS
- the backlight driver queue
- core bitops code cleanups
- the led driver queue
- some core prio_tree work
- checkpatch udpates
- largeish crc32 update
- a new poll() feature for the v4l guys
- the rtc driver queue
- fatfs
- ptrace
- signals
- kmod/usermodehelper updates
- coredump
- procfs updates
* emailed from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
seq_file: add seq_set_overflow(), seq_overflow()
proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
procfs: speed up /proc/pid/stat, statm
procfs: add num_to_str() to speed up /proc/stat
proc: speed up /proc/stat handling
fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
coredump: remove VM_ALWAYSDUMP flag
kmod: make __request_module() killable
kmod: introduce call_modprobe() helper
usermodehelper: ____call_usermodehelper() doesn't need do_exit()
usermodehelper: kill umh_wait, renumber UMH_* constants
usermodehelper: implement UMH_KILLABLE
usermodehelper: introduce umh_complete(sub_info)
usermodehelper: use UMH_WAIT_PROC consistently
signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
Hexagon: use set_current_blocked() and block_sigmask()
...
It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size. Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.
Based on an idea from Eric Dumazet.
[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace. Keeping that network namespace from being freed
when the last user goes away. Leaving things like vlan devices in the
leaked network namespace.
If you use ip netns add for much real work this problem becomes apparent
pretty quickly. It light testing the problem hides because frequently
you simply don't notice the leak.
Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
This issue exists back to 3.0.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Process accounting applications as top, ps visit some files under
/proc/<pid>. With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
and /proc/<pid>/statm files.
This patch adds
- seq_put_decimal_ll() for signed values.
- allow delimiter == 0.
- convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.
Test result on a system with 2000+ procs.
Before patch:
[kamezawa@bluextal test]$ top -b -n 1 | wc -l
2223
[kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
real 0m0.675s
user 0m0.044s
sys 0m0.121s
[kamezawa@bluextal test]$ time ps -elf > /dev/null
real 0m0.236s
user 0m0.056s
sys 0m0.176s
After patch:
kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
real 0m0.657s
user 0m0.052s
sys 0m0.100s
[kamezawa@bluextal ~]$ time ps -elf > /dev/null
real 0m0.198s
user 0m0.050s
sys 0m0.145s
Considering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
== stat_check.py
num = 0
with open("/proc/stat") as f:
while num < 1000 :
data = f.read()
f.seek(0, 0)
num = num + 1
==
perf shows
20.39% stat_check.py [kernel.kallsyms] [k] format_decode
13.41% stat_check.py [kernel.kallsyms] [k] number
12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf
10.85% stat_check.py [kernel.kallsyms] [k] memcpy
4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
4.43% stat_check.py [kernel.kallsyms] [k] seq_printf
This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().
On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py
real 0m0.150s
user 0m0.026s
sys 0m0.121s
== After patch ==
[root@bluextal test]# time ./stat_check.py
real 0m0.055s
user 0m0.022s
sys 0m0.030s
[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :
# strace -T -o /tmp/STRACE cat /proc/stat | wc -c
5826
# grep "cpu " /tmp/STRACE
read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>
Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)
Fix this by :
1) Taking into account nr_irqs in the initial buffer sizing.
2) Using ksize() to allow better filling of initial buffer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
for 'VM_NODUMP' flag. The idea is is to add a new madvise() flag:
MADV_DONTDUMP, which can be set by applications to specifically request
memory regions which should not dump core.
The specific application I have in mind is qemu: we can add a flag there
that wouldn't dump all of guest memory when qemu dumps core. This flag
might also be useful for security sensitive apps that want to absolutely
make sure that parts of memory are not dumped. To clear the flag use:
MADV_DODUMP.
[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large. There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).
Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by
the kernel to mark vdso and vsyscall pages. However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.
The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.
The qemu code which implements this features is at:
http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.
I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.
This patch:
The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section. However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name(). Thus, freeing a valuable vma bit.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.
The task T uses "almost all" memory, then it does something which
triggers request_module(). Say, it can simply call sys_socket(). This
in turn needs more memory and leads to OOM. oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.
Make __request_module() killable. The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE. This memory is freed via
call_usermodehelper_freeinfo()->cleanup.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>