* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: remove restore in resume
agp: fix uninorth build
intel-agp: Set dma mask for i915
agp: kill phys_to_gart() and gart_to_phys()
intel-agp: fix sglist allocation to avoid vmalloc()
intel-agp: Move repeated sglist free into separate function
agp: Switch agp_{un,}map_page() to take struct page * argument
agp: tidy up handling of scratch pages w.r.t. DMA API
intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
agp: Add generic support for graphics dma remapping
agp: Switch mask_memory() method to take address argument again, not page
When
* there is almost out of ates
* one asks for more than one ate
* there are some available at the end of ate array
then the inner for loop will end without incrementing 'index'. This
means the outer loop will start at the same point finding it's available
and runs the inner loop again from the same index. This repeats forever.
Hence make sure we check we were at the end of ate array and return
an error in such case.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Found-by: Jeff Mahoney <jeffm@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
APERF/MPERF support for cpu_power.
APERF/MPERF is arch defined to be a relative scale of work capacity
per logical cpu, this is assumed to include SMT and Turbo mode.
APERF/MPERF are specified to both reset to 0 when either counter
wraps, which is highly inconvenient, since that'll give a blimp
when that happens. The manual specifies writing 0 to the counters
after each read, but that's 1) too expensive, and 2) destroys the
possibility of sharing these counters with other users, so we live
with the blimp - the other existing user does too.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move some of the aperf/mperf code out from the cpufreq driver
thingy so that other people can enjoy it too.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the APERFMPERF capacility into a X86_FEATURE flag so that it
can be used outside of the acpi cpufreq driver.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If we're looking to place a new task, we might as well find the
idlest position _now_, not 1 tick ago.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the idle balancer more agressive, to improve a
x264 encoding workload provided by Jason Garrett-Glaser:
NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 252.82 fps, 22096.60 kb/s
encoded 600 frames, 250.69 fps, 22096.60 kb/s
encoded 600 frames, 245.76 fps, 22096.60 kb/s
NO_NEXT_BUDDY LB_BIAS
encoded 600 frames, 344.44 fps, 22096.60 kb/s
encoded 600 frames, 346.66 fps, 22096.60 kb/s
encoded 600 frames, 352.59 fps, 22096.60 kb/s
NO_NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 425.75 fps, 22096.60 kb/s
encoded 600 frames, 425.45 fps, 22096.60 kb/s
encoded 600 frames, 422.49 fps, 22096.60 kb/s
Peter pointed out that this is better done via newidle_idx,
not via LB_BIAS, newidle balancing should look for where
there is load _now_, not where there was load 2 ticks ago.
Worst-case latencies are improved as well as no buddies
means less vruntime spread. (as per prior lkml discussions)
This change improves kbuild-peak parallelism as well.
Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When merging select_task_rq_fair() and sched_balance_self() we lost
the use of wake_idx, restore that and set them to 0 to make wake
balancing more aggressive.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem with wake_idle() is that is doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.
To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().
Modify sched_balance_self() to:
- update_shares() when walking up the domain tree,
(it only called it for the top domain, but it should
have done this anyway), which allows us to remove
this ugly bit from try_to_wake_up().
- do wake_affine() on the smallest domain that contains
both this (the waking) and the prev (the wakee) cpu for
WAKE invocations.
Then use the top-down balance steps it had to replace wake_idle().
This leads to the dissapearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch to using both register sets - side A and side B for display panning.
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
at_hdmac: Rework suspend_late()/resume_early()
PM: Reset transition_started at dpm_resume_noirq
PM: Update kerneldoc comments in drivers/base/power/main.c
PM: Add convenience macro to make switching to dev_pm_ops less error-prone
hp-wmi: Switch driver to dev_pm_ops
floppy: Switch driver to dev_pm_ops
PM: Trivial fixes
PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
PM/Hibernate: Rework shrinking of memory
PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
PM: Run-time PM platform device bus support
PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
Driver Core: Make PM operations a const pointer
PM: Remove platform device suspend_late()/resume_early() V2
USB: Rework musb suspend()/resume_early()
I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
I2C: Rework i2c-pxa suspend_late()/resume_early()
DMA: Rework txx9dmac suspend_late()/resume_early()
...
Fix trivial conflict in drivers/base/platform.c (due to same
constification patch being merged in both sides, along with some other
PM work in the PM branch)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (52 commits)
Input: bcm5974 - silence uninitialized variables warnings
Input: wistron_btns - add keymap for AOpen 1557
Input: psmouse - use boolean type
Input: i8042 - use platform_driver_probe
Input: i8042 - use boolean type where it makes sense
Input: i8042 - try disabling and re-enabling AUX port at close
Input: pxa27x_keypad - allow modifying keymap from userspace
Input: sunkbd - fix formatting
Input: i8042 - bypass AUX IRQ delivery test on laptops
Input: wacom_w8001 - simplify querying logic
Input: atkbd - allow setting force-release bitmap via sysfs
Input: w90p910_keypad - move a dereference below a NULL test
Input: add twl4030_keypad driver
Input: matrix-keypad - add function to build device keymap
Input: tosakbd - fix cleaning up KEY_STROBEs after error
Input: joydev - validate axis/button maps before clobbering current ones
Input: xpad - add USB ID for the drumkit controller from Rock Band
Input: w90p910_keypad - rename driver name to match platform
Input: add new driver for Sentelic Finger Sensing Pad
Input: psmouse - allow defining read-only attributes
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (209 commits)
[SCSI] fix oops during scsi scanning
[SCSI] libsrp: fix memory leak in srp_ring_free()
[SCSI] libiscsi, bnx2i: make bound ep check common
[SCSI] libiscsi: add completion function for drivers that do not need pdu processing
[SCSI] scsi_dh_rdac: changes for rdac debug logging
[SCSI] scsi_dh_rdac: changes to collect the rdac debug information during the initialization
[SCSI] scsi_dh_rdac: move the init code from rdac_activate to rdac_bus_attach
[SCSI] sg: fix oops in the error path in sg_build_indirect()
[SCSI] mptsas : Bump version to 3.04.12
[SCSI] mptsas : FW event thread and scsi mid layer deadlock in SYNCHRONIZE CACHE command
[SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
[SCSI] mptsas : PAE Kernel more than 4 GB kernel panic
[SCSI] mptsas : NULL pointer on big endian systems causing Expander not to tear off
[SCSI] mptsas : Sanity check for phyinfo is added
[SCSI] scsi_dh_rdac: Add support for Sun StorageTek ST2500, ST2510 and ST2530
[SCSI] pmcraid: PMC-Sierra MaxRAID driver to support 6Gb/s SAS RAID controller
[SCSI] qla2xxx: Update version number to 8.03.01-k6.
[SCSI] qla2xxx: Properly delete rports attached to a vport.
[SCSI] qla2xxx: Correct various NPIV issues.
[SCSI] qla2xxx: Correct qla2x00_eh_wait_on_command() to wait correctly.
...
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (257 commits)
[ARM] Update mach-types
ARM: 5636/1: Move vendor enum to AMBA include
ARM: Fix pfn_valid() for sparse memory
[ARM] orion5x: Add LaCie NAS 2Big Network support
[ARM] pxa/sharpsl_pm: zaurus c3000 aka spitz: fix resume
ARM: 5686/1: at91: Correct AC97 reset line in at91sam9263ek board
ARM: 5640/1: This patch modifies the support of AC97 on the at91sam9263 ek board
ARM: 5689/1: Update default config of HP Jornada 700-series machines
ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem
ARM: 5688/1: ks8695_serial: disable_irq() lockup
ARM: 5687/1: fix an oops with highmem
ARM: 5684/1: Add nuc960 platform to w90x900
ARM: 5683/1: Add nuc950 platform to w90x900
ARM: 5682/1: Add cpu.c and dev.c and modify some files of w90p910 platform
ARM: 5626/1: add suspend/resume functions to amba-pl011 serial driver
ARM: 5625/1: fix hard coded 4K resource size in amba bus detection
MMC: MMCI: convert realview MMC to use gpiolib
ARM: 5685/1: Make MMCI driver compile without gpiolib
ARM: implement highpte
ARM: Show FIQ in /proc/interrupts on CONFIG_FIQ
...
Fix up trivial conflict in arch/arm/kernel/signal.c.
It was due to the TIF_NOTIFY_RESUME addition in commit d0420c83f ("KEYS:
Extend TIF_NOTIFY_RESUME to (almost) all architectures") and follow-ups.
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
MAINTAINERS: update KVM entry
KVM: correct error-handling code
KVM: fix compile warnings on s390
KVM: VMX: Check cpl before emulating debug register access
KVM: fix misreporting of coalesced interrupts by kvm tracer
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
KVM: VMX: call vmx_load_host_state() only if msr is cached
KVM: VMX: Conditionally reload debug register 6
KVM: Use thread debug register storage instead of kvm specific data
KVM guest: do not batch pte updates from interrupt context
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
KVM: Document KVM_CAP_IRQCHIP
KVM: Protect update_cr8_intercept() when running without an apic
KVM: VMX: Fix EPT with WP bit change during paging
KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
KVM: x86 emulator: Add adc and sbb missing decoder flags
KVM: Add missing #include
...
console_print() is an old legacy interface mostly unused in the entire
kernel tree. It's best to clean up its existing use and let developers
use their own implementation of it as they feel fit.
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Early clock initialization sets up PLL/FLL values for optimal RF
behaviour. As this relationship is presently undocumented, we document
this in the script so the rationale is apparent.
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch prevents the USB1 interrupt from remaining asserted
immediately after re-boot or during a jump in to a secondary kernel.
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
After KYCR2 is set, udelay might become necessary if there are only a
small number of keys attached. This patch introduces an optional delay
through the platform data to address this problem.
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This tidies up the printks when running on SMP, and aids in debugging
when certain cores are unable to be scaled.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Setting monarch_cpu = -1 to let slaves frozen might not work, because
there might be slaves being late, not entered the rendezvous yet.
Such slaves might be caught in while (monarch_cpu == -1) loop.
Use kdump_in_progress instead of monarch_cpus to break INIT rendezvous
and let all slaves enter DIE_INIT_SLAVE_LEAVE smoothly.
And monarch no longer need to manage rendezvous if once kdump_in_progress
is set, catch the monarch in DIE_INIT_MONARCH_ENTER then.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
kdump_on_init
CPUs should be frozen if possible, otherwise it might hinder kdump.
So if there are CPUs not respond to IPI, try INIT to stop them.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Summary:
Asserting INIT might block kdump if the system is already going to
start kdump via panic.
Description:
INIT can interrupt anywhere in panic path, so it can interrupt in
middle of kdump kicked by panic. Therefore there is a race if kdump
is kicked concurrently, via Panic and via INIT.
INIT could fail to invoke kdump if the system is already going to
start kdump via panic. It could not restart kdump from INIT handler
if some of cpus are already playing dead with INIT masked. It also
means that INIT could block kdump's progress if no monarch is entered
in the INIT rendezvous.
Panic+INIT is a rare, but possible situation since it can be assumed
that the kernel or an internal agent decides to panic the unstable
system while another external agent decides to send an INIT to the
system at same time.
How to reproduce:
Assert INIT just after panic, before all other cpus have frozen
Expected results:
continue kdump invoked by panic, or restart kdump from INIT
Actual results:
might be hang, crashdump not retrieved
Proposed Fix:
This patch masks INIT first in panic path to take the initiative on
kdump, and reuse atomic value kdump_in_progress to make sure there is
only one initiator of kdump. All INITs asserted later should be used
only for freezing all other cpus.
This mask will be removed soon by rfi in relocate_kernel.S, before jump
into kdump kernel, after all cpus are frozen and no-op INIT handler is
registered. So if INIT was in the interval while it is masked, it will
pend on the system and will received just after the rfi, and handled by
the no-op handler.
If there was a MCA event while psr.mc is 1, in theory the event will
pend on the system and will received just after the rfi same as above.
MCA handler is unregistered here at the time, so received MCA will not
reach to OS_MCA and will result in warmboot by SAL.
Note that codes in this masked interval are relatively simpler than
that in MCA/INIT handler which also executed with the mask. So it can
be said that probability of error in this interval is supposed not so
higher than that in MCA/INIT handler.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Summary:
Asserting INIT on cpu going to be offline will result in unexpected
behavior. It will be a real problem in kdump cases where INIT might
be asserted to unstable APs going to be offline by returning to SAL.
Description:
Since psr.mc is cleared when bits in psr are set to SAL_PSR_BITS_TO_SET
in ia64_jump_to_sal(), there is a small window (~few msecs) that the
cpu can receive INIT even if the cpu enter there via INIT handler.
In this window we do restore of registers for SAL, so INIT asserted
here will not work properly.
It is hard to remove this window by masking INIT (i.e. setting psr.mc)
because we have to unmask it later in OS, because we have to use branch
instruction (br.ret, not rfi) to return SAL, due to OS_BOOT_RENDEZ to
SAL return convention.
I suppose this window will not be a real problem on cpu offline if we
can educate people not to push INIT button during hotplug operation.
However, only exception is a race in kdump and INIT. Now kdump returns
APs to SAL before processing dump, but the kernel might receive INIT at
that point in time. Such INIT might be asserted by kdump itself if an
AP doesn't react IPI soon and kdump decided to use INIT to stop the AP.
Or it might be asserted by operator or an external agent to start dump
on the unstable system.
Such panic+INIT or INIT+INIT cases should be rare, but it will be happy
if we can retrieve crashdump even in such cases.
How to reproduce:
panic+INIT or INIT+INIT, with kdump configured
Expected results:
crashdump is retrieved anyway
Actual results:
panic, hang etc. (unexpected)
Proposed fix
To avoid the window on the way to SAL, this patch stops returning APs
to SAL in case of kdump. In other words, this patch makes APs spin
in OS instead of spinning in SAL.
(* Note: What impact would be there? If a cpu is spinning in SAL,
the cpu is in BOOT_RENDEZ loop, as same as offlined cpu.
In theory if an INIT is asserted there, cpus in the BOOT_RENDEZ loop
should not invoke OS_INIT on it. So in either way, no matter where
the cpu is spinning actually in, once cpu starts spin and act as
"frozen," INIT on the cpu have no effects.
From another point of view, all debug information on the cpu should
have stored to memory before the cpu start to be frozen. So no more
action on the cpu is required.)
I confirmed that the kdump sometime hangs by concurrent INITs (another
INIT after an INIT), and it doesn't hang after applying this patch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Summary:
MCA on the beginning of kdump/kexec kernel will result in unexpected
behavior because MCA handler for previous kernel is invoked on the
kdump kernel.
Description:
Once a cpu is passed to new kernel, all resources in previous kernel
should not be used from the cpu. Even the resources for MCA handler
are no exception. So we cannot handle MCAs and its machine check
errors during kernel transition, until new handler for new kernel is
registered with new resources ready for handling the MCA.
How to reproduce:
Assert MCA while kdump kernel is booting, before new MCA handler for
kdump kernel is registered.
Expected(Desirable) results:
No recovery, cancel kdump and reboot the system.
Actual results:
MCA handler for previous kernel is invoked on the kdump kernel.
=> panic, hang etc. (unexpected)
Proposed fix:
To avoid entering MCA handler from early stage of new kernel,
unregister the entry point from SAL before leave from current
kernel. Then SAL will make all MCAs to warmboot safely, without
invoking OS_MCA.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
kdump/kexec kernel
Summary:
Asserting INIT on the beginning of kdump/kexec kernel will result
in unexpected behavior because INIT handler for previous kernel is
invoked on new kernel.
Description:
In panic situation, we can receive INIT while kernel transition,
i.e. from beginning of panic to bootstrap of kdump kernel.
Since we initialize registers on leave from current kernel, no
longer monarch/slave handlers of current kernel in virtual mode are
called safely. (In fact system goes hang as far as I confirmed)
How to Reproduce:
Start kdump
# echo c > /proc/sysrq-trigger
Then assert INIT while kdump kernel is booting, before new INIT
handler for kdump kernel is registered.
Expected(Desirable) result:
kdump kernel boots without any problem, crashdump retrieved
Actual result:
INIT handler for previous kernel is invoked on kdump kernel
=> panic, hang etc. (unexpected)
Proposed fix:
We can unregister these init handlers from SAL before jumping into
new kernel, however then the INIT will fallback to default behavior,
result in warmboot by SAL (according to the SAL specification) and
we cannot retrieve the crashdump.
Therefore this patch introduces a NOP init handler and register it
to SAL before leave from current kernel, to start kdump safely by
preventing INITs from entering virtual mode and resulting in warmboot.
On the other hand, in case of kexec that not for kdump, it also
has same problem with INIT while kernel transition.
This patch handles this case differently, because for kexec
unregistering handlers will be preferred than registering NOP
handler, since the situation "no handlers registered" is usual
state for kernel's entry.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Summary:
INIT asserted on kdump kernel invokes INIT handler not only on a
cpu that running on the kdump kernel, but also BSP of the panicked
kernel, because the (badly) frozen BSP can be thawed by INIT.
Description:
The kdump_cpu_freeze() is called on cpus except one that initiates
panic and/or kdump, to stop/offline the cpu (on ia64, it means we
pass control of cpus to SAL, or put them in spinloop). Note that
CPU0(BSP) always go to spinloop, so if panic was happened on an AP,
there are at least 2cpus (= the AP and BSP) which not back to SAL.
On the spinning cpus, interrupts are disabled (rsm psr.i), but INIT
is still interruptible because psr.mc for mask them is not set unless
kdump_cpu_freeze() is not called from MCA/INIT context.
Therefore, assume that a panic was happened on an AP, kdump was
invoked, new INIT handlers for kdump kernel was registered and then
an INIT is asserted. From the viewpoint of SAL, there are 2 online
cpus, so INIT will be delivered to both of them. It likely means
that not only the AP (= a cpu executing kdump) enters INIT handler
which is newly registered, but also BSP (= another cpu spinning in
panicked kernel) enters the same INIT handler. Of course setting of
registers in BSP are still old (for panicked kernel), so what happen
with running handler with wrong setting will be extremely unexpected.
I believe this is not desirable behavior.
How to Reproduce:
Start kdump on one of APs (e.g. cpu1)
# taskset 0x2 echo c > /proc/sysrq-trigger
Then assert INIT after kdump kernel is booted, after new INIT handler
for kdump kernel is registered.
Expected results:
An INIT handler is invoked only on the AP.
Actual results:
An INIT handler is invoked on the AP and BSP.
Sample of results:
I got following console log by asserting INIT after prompt "root:/>".
It seems that two monarchs appeared by one INIT, and one panicked at
last. And it also seems that the panicked one supposed there were
4 online cpus and no one did rendezvous:
:
[ 0 %]dropping to initramfs shell
exiting this shell will reboot your system
root:/> Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=0
ia64_init_handler: Promoting cpu 0 to monarch.
Delaying for 5 seconds...
All OS INIT slaves have reached rendezvous
Processes interrupted by INIT - 0 (cpu 0 task 0xa000000100af0000)
:
<<snip>>
:
Entered OS INIT handler. PSP=fff301a0 cpu=0 monarch=1
Delaying for 5 seconds...
mlogbuf_finish: printing switched to urgent mode, MCA/INIT might be dodgy or fail.
OS INIT slave did not rendezvous on cpu 1 2 3
INIT swapper 0[0]: bugcheck! 0 [1]
:
<<snip>>
:
Kernel panic - not syncing: Attempted to kill the idle task!
Proposed fix:
To avoid this problem, this patch inserts ia64_set_psr_mc() to mask
INIT on cpus going to be frozen. This masking have no effect if the
kdump_cpu_freeze() is called from INIT handler when kdump_on_init == 1,
because psr.mc is already turned on to 1 before entering OS_INIT.
I confirmed that weird log like above are disappeared after applying
this patch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Cc: kexec@lists.infradead.org
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c
when CONFIG_DEBUG_FS is disabled, introduced in commit
5be9ed251f.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (21 commits)
sparc64: Initial niagara2 perf counter support.
sparc64: Perf counter 'nop' event is not constant.
sparc64: Provide a way to specify a perf counter overflow IRQ enable bit.
sparc64: Provide hypervisor tracing bit support for perf counters.
sparc64: Initial hw perf counter support.
sparc64: Implement a real set_perf_counter_pending().
sparc64: Use nmi_enter() and nmi_exit(), as needed.
sparc64: Provide extern decls for sparc_??u_type strings.
sparc64: Make touch_nmi_watchdog() actually work.
sparc64: Kill unnecessary cast in profile_timer_exceptions_notify().
sparc64: Manage NMI watchdog enabling like x86.
sparc: add basic support for 'perf'
sparc: convert /proc/io_map, /proc/dvma_map to seq_file
sparc, leon: sparc-leon specific SRMMU initialization and bootup fixes.
sparc,leon: Added support for AMBAPP bus.
sparc,leon: Introduce the sparc-leon CPU type.
sparc,leon: Redefine MMU register access asi if CONFIG_LEON
sparc,leon: CONFIG_SPARC_LEON option and leon specific files.
sparc64: cheaper asm/uaccess.h inclusion
SPARC: fix duplicate declaration
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: split __phys_addr out into separate file
xen: use stronger barrier after unlocking lock
xen: only enable interrupts while actually blocking for spinlock
xen: make -fstack-protector work under Xen
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, percpu: Collect hot percpu variables into one cacheline
x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, highmem_32.c: Clean up comment
x86, pgtable.h: Clean up types
x86: Clean up dump_pagetable()
* 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Simplify the Makefile in a minor way through use of cc-ifversion
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: move clts into batch cpu state updates when preloading fpu
x86-64: move unlazy_fpu() into lazy cpu state part of context switch
x86-32: make sure clts is batched during context switch
x86: split out core __math_state_restore
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Decrease the level of some NUMA messages to KERN_DEBUG
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
x86: Fix code patching for paravirt-alternatives on 486
x86, msr: change msr-reg.o to obj-y, and export its symbols
x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
x86, mcheck: Use correct cpumask for shared bank4
x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
x86: Fix CPU llc_shared_map information for AMD Magny-Cours
x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
x86, msr: fix msr-reg.S compilation with gas 2.16.1
x86, msr: Export the register-setting MSR functions via /dev/*/msr
x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
x86, msr: CFI annotations, cleanups for msr-reg.S
x86, asm: Make _ASM_EXTABLE() usable from assembly code
x86, asm: Add 32-bit versions of the combined CFI macros
x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
x86, msr: Rewrite AMD rd/wrmsr variants
x86, msr: Add rd/wrmsr interfaces with preset registers
x86: add specific support for Intel Atom architecture
...
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Make memtype_seq_ops const
x86: uv: Clean up uv_ptc_init(), use proc_create()
x86: Use printk_once()
x86/cpu: Clean up various files a bit
x86: Remove duplicated #include
x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper
x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number
x86: Further clean up of mtrr/generic.c
x86: Clean up mtrr/main.c
x86: Clean up mtrr/state.c
x86: Clean up mtrr/mtrr.h
x86: Clean up mtrr/if.c
x86: Clean up mtrr/generic.c
x86: Clean up mtrr/cyrix.c
x86: Clean up mtrr/cleanup.c
x86: Clean up mtrr/centaur.c
x86: Clean up mtrr/amd.c:
x86: ds.c fix invalid assignment
* 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: remove all now-duplicate header files
x86: convert termios.h to the asm-generic version
x86: convert almost generic headers to asm-generic version
x86: convert trivial headers to asm-generic version
x86: add copies of some headers to convert to asm-generic
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=n
x86, apic: Slim down stack usage in early_init_lapic_mapping()
x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources()
x86, ioapic: Define IO_APIC_DEFAULT_PHYS_BASE constant
x86: Fix x86_model test in es7000_apic_is_cluster()
x86, apic: Move dmar_table_init() out of enable_IR()
x86, ioapic: Panic on irq-pin binding only if needed
x86/apic: Enable x2APIC without interrupt remapping under KVM
x86, apic: Drop redundant bit assignment
x86, ioapic: Throw BUG instead of NULL dereference
x86, ioapic: Introduce for_each_irq_pin() helper
x86: Remove superfluous NULL pointer check in destroy_irq()
x86/ioapic.c: unify ioapic_retrigger_irq()
x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop
x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments
x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop
x86/ioapic.c: simplify add_pin_to_irq_node()
x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop
x86/ioapic.c: move lost comment to what seems like appropriate place
x86/ioapic.c: remove redundant declaration of irq_pin_list
...