Commit graph

310671 commits

Author SHA1 Message Date
Linus Torvalds
513335f964 PARISC fixes on 20120607
This is a set of three bug fixes for minor build breakages that got introduced
 just before 3.5-rc1 was released.

Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6

Pull PARISC fixes from James Bottomley:
 "This is a set of three bug fixes for minor build breakages that got
  introduced just before 3.5-rc1 was released."

* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] fix code to find libgcc
  [PARISC] fix compile break in use of lib/strncopy_from_user.c
  [PARISC] fix missing TAINT_WARN problem
2012-06-07 09:06:54 -07:00
Linus Torvalds
0c30989cc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile fixes from Chris Metcalf:
 "These two minor bug fixes fix build failures from some changes that
  were merged in during the 3.5 merge window."

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: add #include to unbreak build after generic init_task conversion
  tile: remove cpu_idle_on_new_stack
2012-06-07 09:06:13 -07:00
Artem Bityutskiy
12027f1b3f UBI: correct ubi_wl_flush locking
Commit "62f38455 UBI: modify ubi_wl_flush function to clear work queue for a lnum"
takes the 'work_sem' semaphore in write mode for the entire loop, which is not
very good because it will block other workers for a potentially long time. We do
not need to have it in write mode - read mode is enough, and we do not need to
hold it over the entire loop. So this patch changes the locking: it takes
'work_sem' in read mode and pushes it down into the loop.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-07 15:22:21 +03:00
Artem Bityutskiy
818039c7d5 UBIFS: fix debugfs-less systems support
Commit "f70b7e5 UBIFS: remove Kconfig debugging option" broke UBIFS: it
refuses to initialize if debugfs (CONFIG_DEBUG_FS) is disabled. I incorrectly
assumed that the debugfs file creation functions would return success if debugfs
is disabled, but they actually return -ENODEV. This patch fixes the issue.

Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Paul Parsons <lost.distance@yahoo.com>
2012-06-07 10:43:54 +03:00
Artem Bityutskiy
e9b4cf2094 UBI: fix debugfs-less systems support
Commit "aa44d1d UBI: remove Kconfig debugging option" broke UBI: it
refuses to initialize if debugfs (CONFIG_DEBUG_FS) is disabled. I incorrectly
assumed that the debugfs file creation functions would return success if debugfs
is disabled, but they actually return -ENODEV. This patch fixes the issue.

Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Paul Parsons <lost.distance@yahoo.com>
2012-06-07 10:43:54 +03:00
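The two debugfs-less fixes above (818039c7d5 and e9b4cf2094) come down to not treating -ENODEV from a disabled debugfs as a fatal initialization error. The following is only a userspace sketch of that idea, not the actual patch: fake_debugfs_create(), subsystem_init_strict() and subsystem_init_tolerant() are hypothetical names, and the real code may decide differently how to detect a disabled debugfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a debugfs creation call: it reports
 * -ENODEV when debugfs is compiled out, as the commit message says. */
static int fake_debugfs_create(bool debugfs_enabled)
{
	return debugfs_enabled ? 0 : -ENODEV;
}

/* Broken pattern: any error from debugfs setup aborts initialization. */
static int subsystem_init_strict(bool debugfs_enabled)
{
	int err = fake_debugfs_create(debugfs_enabled);

	if (err)
		return err;
	return 0;
}

/* Tolerant pattern: a missing debugfs is not a reason to refuse to start. */
static int subsystem_init_tolerant(bool debugfs_enabled)
{
	int err = fake_debugfs_create(debugfs_enabled);

	if (err && err != -ENODEV)
		return err;
	return 0;
}
```

With debugfs disabled, the strict variant returns -ENODEV and the subsystem never comes up, while the tolerant variant returns 0.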
Adam Jackson
23e81d691a drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
Cougar/Panther Point redefine the bits in SDEIIR pretty completely.
This function is just debugging, but if we're debugging we probably want
to be told accurate things instead of lies.

I'm told Lynx Point changes this yet more, but I have no idea how...

Note from Eugeni's review:

"For the record and for future enabling efforts, for LPT, bits 28-31
and 1-14 are gone since CPT/PPT (e.g., those must be zero). And there
is the bit 15 as a new addition, but we are not using it yet and
probably won't be using in foreseeable future."

Signed-off-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35103
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-06 23:01:08 +02:00
Linus Torvalds
71fae7e714 hwmon updates for 3.5-rc2
Update e-mail address in MAINTAINERS

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Update e-mail address in MAINTAINERS"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  MAINTAINERS: Update my e-mail address
2012-06-06 11:56:45 -07:00
Linus Torvalds
ff39d0e8f0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI and Power Management changes from Len Brown.

This does an evil merge to fix up what I think is a mismerge by Len to
the gma500 driver, and restore it to the mainline state.

In that driver, both branches had commented out the call to
acpi_video_register(), and Len resolved the merge to that commented-out
version.

However, in mainline, further changes by Alan (commit d839ede47a:
"gma500: opregion and ACPI" to be exact) had re-enabled the ACPI video
registration, so the current state of the driver seems to want it.

Alan is apparently still feeling the effects of partying with the Queen,
so he didn't reply to my query, but I'll do the evil merge anyway.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  ACPI: fix acpi_bus.h build warnings when ACPI is not enabled
  drivers: acpi: Fix dependency for ACPI_HOTPLUG_CPU
  tools/power turbostat: fix IVB support
  tools/power turbostat: fix un-intended affinity of forked program
  ACPI video: use after input_unregister_device()
  gma500: don't register the ACPI video bus
  acpi_video: Intel video is not always i915
  acpi_video: fix leaking PCI references
  ACPI: Ignore invalid _PSS entries, but use valid ones
  ACPI battery: only refresh the sysfs files when pertinent information changes
2012-06-06 10:47:15 -07:00
Linus Torvalds
ae501be0f6 InfiniBand/RDMA fixes for 3.5-rc2, all in hardware drivers:
  - Fix crash in cxgb4
  - Fixes to new ocrdma driver
  - Regression fixes for mlx4

Merge tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 "All in hardware drivers:
   - Fix crash in cxgb4
   - Fixes to new ocrdma driver
   - Regression fixes for mlx4"

* tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Fix max_wqe capacity reported from query device
  mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
  IB/mlx4: Fix EQ deallocation in legacy mode
  RDMA/cxgb4: Fix crash when peer address is 0.0.0.0
  RDMA/ocrdma: Remove unnecessary version.h includes
  RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHED
  RDMA/ocrdma: Correct queue free count math
2012-06-06 10:45:21 -07:00
Roland Dreier
20952cdd8e Merge branches 'cxgb4', 'mlx4' and 'ocrdma' into for-linus
2012-06-06 10:08:11 -07:00
Sagi Grimberg
fc2d004419 IB/mlx4: Fix max_wqe capacity reported from query device
1. Limit the max number of WQEs per QP reported when querying the
   device, so that ib_create_qp() will not fail for a QP size that the
   device claimed to support due to additional headroom WQEs being
   allocated.

2. Limit qp resources accepted for ib_create_qp() to the limits
   reported in ib_query_device().  In kernel space, make sure that the
   limits returned to the caller following qp creation also lie within
   the reported device limits. For userspace, report as before, and do
   adjustment in libmlx4 (so as not to break ABI).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06 10:08:03 -07:00
Jack Morgenstein
edc4a67e15 mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
Commit 096335b3f9 ("mlx4_core: Allow dynamic MTU configuration for
IB ports") modifies the port VL setting.  This exposes a bug in
mlx4_common_set_port(), where the VL cap value passed in (inside the
command mailbox) is incorrectly zeroed-out:

mlx4_SET_PORT modifies the VL_cap field (byte 3 of the mailbox).
Since the SET_PORT command is paravirtualized on the master as well as
on the slaves, mlx4_SET_PORT_wrapper() is invoked on the master.  This
calls mlx4_common_set_port() where mailbox byte 3 gets overwritten by
code which should only set a single bit in that byte (for the reset
qkey counter flag) -- but instead overwrites the entire byte.

The result is that when running in SR-IOV mode, the VL_cap will be set
to zero -- fix this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06 10:07:54 -07:00
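The mlx4_core VL_cap fix above (edc4a67e15) describes a common bug class: code that should set one bit in a byte assigns the whole byte instead and wipes its other fields. A minimal sketch of the difference follows. The flag position RESET_QKEY_FLAG and both function names are assumptions made for illustration, not the driver's real layout or code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position for the "reset qkey counter" flag. */
#define RESET_QKEY_FLAG 0x40u

/* Buggy pattern: assigning the flag replaces the whole byte, so any
 * other field stored in it (such as a VL cap value) becomes zero. */
static uint8_t set_flag_by_overwrite(uint8_t byte3, int reset_qkey)
{
	(void)byte3;	/* the previous contents are ignored, which is the bug */
	return (uint8_t)(reset_qkey ? RESET_QKEY_FLAG : 0);
}

/* Fixed pattern: clear and set only the flag bit, keep everything else. */
static uint8_t set_flag_by_mask(uint8_t byte3, int reset_qkey)
{
	byte3 &= (uint8_t)~RESET_QKEY_FLAG;
	if (reset_qkey)
		byte3 |= RESET_QKEY_FLAG;
	return byte3;
}
```

Starting from a byte of 0x0f, the overwrite variant yields 0x40 (the low bits are lost), while the masked variant yields 0x4f.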
Linus Torvalds
374916ed16 md: 2 fixes for 3.5-rc
One sparse-warning fix, one bugfix for 3.4-stable

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull two md fixes from NeilBrown:
 "One sparse-warning fix, one bugfix for 3.4-stable"

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md: raid1/raid10: fix problem with merge_bvec_fn
  lib/raid6: fix sparse warnings in recovery functions
2012-06-06 09:49:28 -07:00
Linus Torvalds
9e68447f5b IOMMU fixes for v3.5-rc1
Two patches are in here which fix AMD IOMMU specific issues. One patch
 fixes a long-standing warning on resume because the amd_iommu_resume
 function enabled interrupts. The other patch fixes a deadlock in an
 error-path of the page-fault request handling code of the IOMMU driver.

Merge tag 'iommu-fixes-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Two patches are in here which fix AMD IOMMU specific issues.  One
  patch fixes a long-standing warning on resume because the
  amd_iommu_resume function enabled interrupts.  The other patch fixes a
  deadlock in an error-path of the page-fault request handling code of
  the IOMMU driver."

* tag 'iommu-fixes-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix deadlock in ppr-handling error path
  iommu/amd: Cache pdev pointer to root-bridge
2012-06-06 09:47:57 -07:00
Chris Metcalf
2ded5c2484 tile: add #include to unbreak build after generic init_task conversion
Some code was moved from init_task.c to setup.c but the appropriate
header needed to be moved as well.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-06-06 11:29:35 -04:00
Chris Metcalf
10db9e009a tile: remove cpu_idle_on_new_stack
This routine isn't used unless CONFIG_HOMECACHE is enabled, which
isn't even available as a public configuration option yet.
Since it no longer links correctly in 3.4, just remove it for now.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-06-06 11:29:31 -04:00
Andi Kleen
70ab7003de perf/x86: Don't assume there can be only 4 PEBS events
On Sandy Bridge in non-HT mode there are 8 counters available.
Since every counter can write a PEBS record, assuming there are
4 max is incorrect. Use the reported counter number -- with an
upper limit for a static array -- instead.

Also I made the warning messages a bit more informational.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338944211-28275-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:40 +02:00
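The PEBS change above (70ab7003de) replaces a hard-coded count of 4 with the reported counter number, capped by the size of a static array. A small sketch of that clamp follows. MAX_PEBS_EVENTS_SKETCH and pebs_slots_to_scan() are illustrative names, and the cap of 8 is an assumption taken from the Sandy Bridge figure in the message.

```c
#include <assert.h>

/* Assumed size of the static per-event array. */
#define MAX_PEBS_EVENTS_SKETCH 8

/* Use the counter count the hardware reports, but never index past
 * the end of the static array. */
static int pebs_slots_to_scan(int reported_counters)
{
	if (reported_counters < 0)
		return 0;
	if (reported_counters > MAX_PEBS_EVENTS_SKETCH)
		return MAX_PEBS_EVENTS_SKETCH;
	return reported_counters;
}
```

A machine reporting 4 or 8 counters is scanned in full; a hypothetical machine reporting 16 is limited to the first 8 slots.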
Vince Weaver
c48b60538c perf/x86: Use rdpmc() rather than rdmsr() when possible in the kernel
The rdpmc instruction is faster than the equivalent rdmsr call,
so use it when possible in the kernel.

The perfctr kernel patches did this, after extensive testing showed
rdpmc to always be faster (One can look in etc/costs in the perfctr-2.6
package to see a historical list of the overhead).

I have done some tests on a 3.2 kernel, the kernel module I used
was included in the first posting of this patch:

                   rdmsr           rdpmc
 Core2 T9900:      203.9 cycles     30.9 cycles
 AMD fam0fh:        56.2 cycles      9.8 cycles
 Atom 6/28/2:      129.7 cycles     50.6 cycles

The speedup of using rdpmc is large.

[ It's probably possible (and desirable) to do this without
  requiring a new field in the hw_perf_event structure, but
  the fixed events make this tricky. ]

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1203011724030.26934@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:35 +02:00
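The cycle counts in the rdpmc commit above (c48b60538c) can be turned into speedup factors by dividing the rdmsr cost by the rdpmc cost. The sketch below does only that arithmetic on the three rows of the table. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

struct msr_cost {
	const char *cpu;
	double rdmsr_cycles;
	double rdpmc_cycles;
};

/* Figures copied from the table in the commit message. */
static const struct msr_cost costs[] = {
	{ "Core2 T9900", 203.9, 30.9 },
	{ "AMD fam0fh",   56.2,  9.8 },
	{ "Atom 6/28/2", 129.7, 50.6 },
};

/* Speedup of rdpmc over rdmsr for one row. */
static double speedup(const struct msr_cost *c)
{
	return c->rdmsr_cycles / c->rdpmc_cycles;
}

/* Print one line per CPU with its speedup factor. */
static void print_speedups(void)
{
	for (size_t i = 0; i < sizeof(costs) / sizeof(costs[0]); i++)
		printf("%-12s %.1fx\n", costs[i].cpu, speedup(&costs[i]));
}
```

Rounded to one decimal, the ratios come out at roughly 6.6x for the Core2, 5.7x for the AMD part and 2.6x for the Atom.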
Andi Kleen
1ff4d58a19 x86: Add rdpmcl()
Add a version of rdpmc() that directly reads into a u64

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338944211-28275-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:27 +02:00
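The rdpmcl() commit above (1ff4d58a19) adds a variant that reads directly into a u64. The instruction returns the counter as two 32-bit halves, so such a helper has to combine them. The sketch below shows that step in portable userspace C. fake_rdpmc() is a hypothetical stand-in that returns a fixed pattern, since the real instruction cannot be run from an ordinary test program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the instruction: hands back a made-up
 * counter value as separate low and high 32-bit halves. */
static void fake_rdpmc(unsigned int counter, uint32_t *lo, uint32_t *hi)
{
	uint64_t raw = 0x0000123456789abcULL + counter;

	*lo = (uint32_t)raw;
	*hi = (uint32_t)(raw >> 32);
}

/* Read both halves and combine them into a single 64-bit value. */
static uint64_t read_counter64(unsigned int counter)
{
	uint32_t lo, hi;

	fake_rdpmc(counter, &lo, &hi);
	return ((uint64_t)hi << 32) | lo;
}
```

The cast to uint64_t before the shift matters: shifting a 32-bit value left by 32 is undefined in C.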
Peter Zijlstra
1c2ac3fde3 perf/x86: Fix wrmsrl() debug wrapper
Move the wrmsrl() debug wrapper to the common header now that all the
include games are gone. Also clean it up a bit to avoid multiple
evaluation of the argument.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-l4gkfnivwv4yi5mqxjlovymx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:22 +02:00
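The wrmsrl() wrapper cleanup above (1c2ac3fde3) mentions avoiding multiple evaluation of the argument. The sketch below shows the general macro pitfall and the usual remedy of copying each argument into a local first. All names (fake_trace, fake_write, WRITE_TRACED_NAIVE, WRITE_TRACED_ONCE) are hypothetical and only illustrate the pattern, not the kernel's wrapper.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t last_written;		/* stands in for the register */
static unsigned int trace_calls;	/* counts debug trace invocations */

static void fake_write(uint32_t msr, uint64_t val)
{
	(void)msr;
	last_written = val;
}

static void fake_trace(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	trace_calls++;
}

/* Naive wrapper: "val" is expanded twice, so an argument with side
 * effects (for example ++n) runs twice. */
#define WRITE_TRACED_NAIVE(msr, val)		\
do {						\
	fake_trace((msr), (val));		\
	fake_write((msr), (val));		\
} while (0)

/* Safer wrapper: evaluate each argument once into a local, then
 * reuse the local for both calls. */
#define WRITE_TRACED_ONCE(msr, val)		\
do {						\
	uint32_t msr_ = (msr);			\
	uint64_t val_ = (val);			\
	fake_trace(msr_, val_);			\
	fake_write(msr_, val_);			\
} while (0)
```

Calling WRITE_TRACED_NAIVE(0x10, ++n) with n starting at 0 leaves n at 2 and writes 2, whereas WRITE_TRACED_ONCE(0x10, ++n) leaves n at 1 and writes 1.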
Ingo Molnar
ff1f74fdcf Merge branch 'perf/urgent' into perf/core
Eliminate a conflict in a patch I am going to apply.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:57 +02:00
Peter Zijlstra
212d95dfdb perf/x86: Update SNB PEBS constraints
Afaict there's no need to (incompletely) iterate the
MEM_UOPS_RETIRED.* umask state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:43 +02:00
Peter Zijlstra
47a8863dbb perf/x86: Enable/Add IvyBridge hardware support
Implement rudimentary IVB perf support. The SDM states it's identical
to SNB with the exception of the exact event tables, but a quick look
suggests they're similar enough.

Also mark SNB-EP as broken for now.

Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:39 +02:00
Peter Zijlstra
0780c927a0 perf/x86: Implement cycles:p for SNB/IVB
Now that there's finally a chip with working PEBS (IvyBridge), we can
enable the hardware and implement cycles:p for SNB/IVB.

Cc: Stephane Eranian <eranian@google.com>
Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:34 +02:00
Peter Zijlstra
5a425294ee perf/x86: Fix Intel shared extra MSR allocation
Zheng Yan reported that event group validation can wreck event state
when Intel extra_reg allocation changes event state.

Validation shouldn't change any persistent state. Cloning events in
validate_{event,group}() isn't really pretty either, so add a few
special cases to avoid modifying the event state.

The code is restructured to minimize the special case impact.

Reported-by: Zheng Yan <zheng.z.yan@linux.intel.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338903031.28282.175.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:26 +02:00
Oleg Nesterov
778b032d96 uprobes: Kill uprobes_srcu/uprobe_srcu_id
Kill the no longer needed uprobes_srcu/uprobe_srcu_id code.

It doesn't really work anyway. synchronize_srcu() can only
synchronize with the code "inside" the
srcu_read_lock/srcu_read_unlock section, while
uprobe_pre_sstep_notifier() does srcu_read_lock() _after_ we
already hit the breakpoint.

I guess this probably works "in practice". synchronize_srcu() is
slow and it implies synchronize_sched(), and the probed task
enters the non-preemptible section at the start of exception
handler. Still this is not right at least in theory, and
task->uprobe_srcu_id blows task_struct.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529193008.GG8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:22 +02:00
Oleg Nesterov
56bb4cf647 uprobes: Teach handle_swbp() to rely on "is_swbp" rather than uprobes_srcu
Currently handle_swbp() assumes that it can't race with
unregister, so it roughly does:

	if (find_uprobe(vaddr))
		process_uprobe();
	else
		send_sig(SIGTRAP);

This relies on the not-really-working uprobes_srcu code we are
going to remove, see the next patch.

With this patch we rely on the result of
is_swbp_at_addr(bp_vaddr) if find_uprobe() fails.

If is_swbp == 1, then we hit the normal int3, we should send
SIGTRAP.

If is_swbp == 0, we raced with uprobe_unregister(), we simply
restart this insn again.

The "difficult" case is is_swbp == -EFAULT, when we can't read
this memory. In this case I think we should restart too, and
this is more correct compared to the current code which sends
SIGTRAP.

Ignoring ENOMEM/etc from get_user_pages(), this can only happen
if another thread unmaps this memory before find_active_uprobe()
takes mmap_sem. It would be better to pretend it was unmapped
before this insn was executed, restart, and get SIGSEGV.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192947.GF8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:22:03 +02:00
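The handle_swbp() change above (56bb4cf647) describes a three-way decision once find_uprobe() has failed. The sketch below restates the cases from the message as a pure function so they can be read side by side. The enum and function names are invented for the example, and the real handler does considerably more than pick an action.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum swbp_action {
	RUN_UPROBE_HANDLER,	/* a registered uprobe was found */
	SEND_SIGTRAP,		/* a normal int3, not ours */
	RESTART_INSN,		/* re-execute the instruction */
};

/* is_swbp follows the convention in the commit message:
 *   > 0   a breakpoint instruction is present at the address
 *   == 0  no breakpoint there any more (raced with unregister)
 *   < 0   the memory could not be read (for example -EFAULT) */
static enum swbp_action decide_swbp_action(bool uprobe_found, int is_swbp)
{
	if (uprobe_found)
		return RUN_UPROBE_HANDLER;
	if (is_swbp > 0)
		return SEND_SIGTRAP;
	return RESTART_INSN;
}
```

Restarting in the -EFAULT case means that, if the memory really was unmapped, the retried instruction faults and the task gets SIGSEGV instead of a misleading SIGTRAP.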
Oleg Nesterov
77fc4af1b5 uprobes: Change register_for_each_vma() to take mm->mmap_sem for writing
Change register_for_each_vma() to take mm->mmap_sem for writing.
This is a bit unfortunate but hopefully not too bad, this is the
slow path anyway.

This is needed to ensure that find_active_uprobe() can not race
with uprobe_register() which adds the new bp at the same
bp_vaddr, after find_uprobe() fails and before
is_swbp_at_addr_fast() checks the memory.

IOW, this is needed to ensure that if find_active_uprobe()
returns NULL but is_swbp == true, we can safely assume that it
was the "normal" int3 and we should send SIGTRAP.

There is another reason for this change. We are going to replace
uprobes_state->count with MMF_ flags set by register/unregister
and cleared by find_active_uprobe(), and set/clear shouldn't
race with each other.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192928.GE8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:21:48 +02:00
Oleg Nesterov
d790d34653 uprobes: Teach find_active_uprobe() to provide the "is_swbp" info
A separate patch to simplify the review, and for the
documentation.

The patch adds another "int *is_swbp" argument to
find_active_uprobe(), so far its only caller doesn't use this
info.

With this patch find_active_uprobe() additionally does:

	- if find_vma() + ->vm_start check fails, *is_swbp = -EFAULT

	- otherwise, if valid_vma() + find_uprobe() fails, it holds
	  the result of is_swbp_at_addr(), can be negative too. The
	  latter is only possible if we raced with another thread
	  which did munmap/etc after we hit this bp.

IOW, if find_active_uprobe(&is_swbp) returns NULL, the caller
can look at is_swbp to figure out whether the current insn is bp
or not, or detect the race with another thread if it is
negative.

Note: I think that performance-wise this change is fine. This
adds is_swbp_at_addr(), but only if we raced with
uprobe_unregister() or if we hit the "normal" int3 but this mm
has uprobes as well. And even in this case the slow
read_opcode() path is very unlikely, this insn recently
triggered do_int3(), __copy_from_user_inatomic() shouldn't fail
in the likely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192914.GD8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:15:24 +02:00
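The is_swbp out-parameter added by the commit above (d790d34653) can be pictured with a small model of the lookup. This is a sketch under stated assumptions, not the kernel function: the boolean vma_ok stands for the find_vma() and ->vm_start check, registered for the result of valid_vma() plus find_uprobe(), and opcode_check for whatever is_swbp_at_addr() would return. All three are passed in as plain values so the example stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct uprobe_sketch {
	int id;
};

/* Returns the uprobe when one is found. When it returns NULL,
 * *is_swbp tells the caller why: -EFAULT if there was no usable
 * mapping, otherwise the result of the opcode check. */
static struct uprobe_sketch *find_active_sketch(bool vma_ok,
						struct uprobe_sketch *registered,
						int opcode_check,
						int *is_swbp)
{
	if (!vma_ok) {
		*is_swbp = -EFAULT;
		return NULL;
	}
	if (registered)
		return registered;

	*is_swbp = opcode_check;
	return NULL;
}
```

In this model *is_swbp is only meaningful when the function returns NULL, which matches how the message says the caller should use it.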
Oleg Nesterov
3a9ea0520f uprobes: Introduce find_active_uprobe() helper
No functional changes. Move the "find uprobe" code from
handle_swbp() to the new helper, find_active_uprobe().

Note: with or without this change, the find-active-uprobe logic
is not exactly right. We can race with another thread which
unmaps the memory with the valid uprobe before we take
mm->mmap_sem. We can't find this uprobe simply because
find_vma() fails. In this case we wrongly assume that this trap
was not caused by uprobe and send the erroneous SIGTRAP. See the
next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192857.GC8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:15:17 +02:00
Oleg Nesterov
a3d7bb4793 uprobes: Change read_opcode() to use FOLL_FORCE
set_orig_insn()->read_opcode() should not fail if the probed
task did mprotect() after uprobe_register(), change it to use
FOLL_FORCE. Without FOLL_WRITE this doesn't have any "side"
effect but allows reading the !VM_READ memory.

There is another reason for this change, we are going to use
is_swbp_at_addr() from handle_swbp() which can race with another
thread doing mprotect().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192759.GB8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:14:49 +02:00
Oleg Nesterov
c00b275043 uprobes: Optimize is_swbp_at_addr() for current->mm
Change is_swbp_at_addr() to try to avoid the costly
read_opcode() if mm == current->mm, __copy_from_user_inatomic()
should succeed in the likely case.

Currently this optimization is not important, but we are going
to add more is_swbp_at_addr(current->mm) callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120529192744.GA8057@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:13:59 +02:00
Arun Sharma
db0dc75d64 perf/x86: Check user address explicitly in copy_from_user_nmi()
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-5-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:04 +02:00
Arun Sharma
bc6ca7b342 perf/x86: Check if user fp is valid
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-4-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:01 +02:00
Arun Sharma
0b0d9cf6ec perf: Limit callchains to 127
Stack depth of 255 seems excessive, given that copy_from_user_nmi()
could be slow.

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-3-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:00 +02:00
Arun Sharma
302fa4b58a perf/x86: Allow multiple stacks
Without this patch, applications with two different stack
regions (eg: native stack vs JIT stack) get truncated
callchains even when RBP chaining is present. GDB shows proper
stack traces and the frame pointer chaining is intact.

This patch disables the (fp < RSP) check, hoping that other checks
in the code save the day for us. In our limited testing, this
didn't seem to break anything.

In the long term, we could potentially have userspace advise
the kernel on the range of valid stack addresses, so we don't
spend a lot of time unwinding from bogus addresses.

Signed-off-by: Arun Sharma <asharma@fb.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:07:58 +02:00
Dimitri Sivanich
a841f8cef4 sched: Fix the relax_domain_level boot parameter
It does not get processed because sched_domain_level_max is 0 at the
time that setup_relax_domain_level() is run.

Simply accept the value as it is, as we don't know the value of
sched_domain_level_max until sched domain construction is completed.

Fix sched_relax_domain_level in cpuset.  The build_sched_domain() routine calls
the set_domain_attribute() routine prior to setting the sd->level, however,
the set_domain_attribute() routine relies on the sd->level to decide whether
idle load balancing will be off/on.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:07:41 +02:00
Eugeni Dodonov
67384fe3fd char/agp: add another Ironlake host bridge
This seems to come on Gigabyte H55M-S2V and was discovered through the
https://bugs.freedesktop.org/show_bug.cgi?id=50381 debugging.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50381
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-06 17:05:29 +02:00
Peter Zijlstra
8440ccb43f perf/x86: Update SNB PEBS constraints
Afaict there's no need to (incompletely) iterate the
MEM_UOPS_RETIRED.* umask state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:52 +02:00
Peter Zijlstra
b6db437ba8 perf/x86: Enable/Add IvyBridge hardware support
Implement rudimentary IVB perf support. The SDM states it's identical
to SNB with the exception of the exact event tables, but a quick look
suggests they're similar enough.

Also mark SNB-EP as broken for now.

Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:49 +02:00
Peter Zijlstra
cccb9ba9e4 perf/x86: Implement cycles:p for SNB/IVB
Now that there's finally a chip with working PEBS (IvyBridge), we can
enable the hardware and implement cycles:p for SNB/IVB.

Cc: Stephane Eranian <eranian@google.com>
Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:47 +02:00
Peter Zijlstra
b430f7c470 perf/x86: Fix Intel shared extra MSR allocation
Zheng Yan reported that event group validation can wreck event state
when Intel extra_reg allocation changes event state.

Validation shouldn't change any persistent state. Cloning events in
validate_{event,group}() isn't really pretty either, so add a few
special cases to avoid modifying the event state.

The code is restructured to minimize the special case impact.

Reported-by: Zheng Yan <zheng.z.yan@linux.intel.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338903031.28282.175.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:44 +02:00
Peter Zijlstra
d039ac6080 sched: Validate assumptions in sched_init_numa()
Add some code to validate the assumptions we're making and output
warnings if they do not hold.

If this triggers, we want to know about it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alex Shi <lkml.alex@gmail.com>
Link: http://lkml.kernel.org/n/tip-6uc3wk5s9udxtdl9cnku0vtt@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:52:30 +02:00
Peter Zijlstra
c3decf0dfb sched: Always initialize cpu-power
Often when we run into misshapen topologies the balance iteration
fails to update the cpu power properly and we'll end up in
divide-by-zero traps.

Always initialize the cpu-power to a semi-sane value so that we can
at least boot the machine, even if the load-balancer might not
function correctly.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-3lbhyj25sr169ha7z3qht5na@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:52:27 +02:00
Peter Zijlstra
c117487687 sched: Fix domain iteration
Weird topologies can lead to asymmetric domain setups. This needs
further consideration since these setups are typically non-minimal
too.

For now, make it work by adding an extra mask selecting which CPUs
are allowed to iterate up.

The topology that triggered it is the one from David Rientjes:

	10 20 20 30
	20 10 20 20
	20 20 10 20
	30 20 20 10

resulting in boxes that wouldn't even boot.
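
To make the asymmetry visible, the sketch below (Python, not kernel code; `spans` is a hypothetical helper) lists, for each distance level in the table above, which nodes each node can reach within that distance.

```python
# Node distance table from the report above (row i = distances from node i).
DISTANCE = [
    [10, 20, 20, 30],
    [20, 10, 20, 20],
    [20, 20, 10, 20],
    [30, 20, 20, 10],
]


def spans(distance):
    """Map each distinct distance level to the per-node set of nodes within it."""
    levels = sorted({d for row in distance for d in row})
    return {
        level: [
            frozenset(j for j, d in enumerate(row) if d <= level)
            for row in distance
        ]
        for level in levels
    }


if __name__ == "__main__":
    for level, per_node in spans(DISTANCE).items():
        for node, span in enumerate(per_node):
            print(f"level {level}: node {node} spans {sorted(span)}")
```

At distance 20, nodes 0 and 3 each span three nodes while nodes 1 and 2 already span all four, so nodes that share a domain do not agree on its extent.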

Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-3p86l9cuaqnxz7uxsojmz5rm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:52:26 +02:00
Peter Zijlstra
7f1b43936f sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
Roland Dreier reported spurious, hard to trigger lockdep warnings
within the scheduler - without any real lockup.

This bit gives us the right clue:

> [89945.640512]  [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
> [89945.640568]  [<ffffffff8104c546>] push_rt_task+0xc6/0x290

If you look at that code you'll find the double_lock_balance() in
question is the one in find_lock_lowest_rq() [yay for inlining].

Now find_lock_lowest_rq() has a bug: it fails to use
double_unlock_balance() in one exit path. If this results in a retry in
push_rt_task() we'll call double_lock_balance() again, at which point
we'll run into said lockdep confusion.

Reported-by: Roland Dreier <roland@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337282386.4281.77.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:52:26 +02:00
Alex Shi
10717dcde1 sched/numa: Load balance between remote nodes
Commit cb83b629b ("sched/numa: Rewrite the CONFIG_NUMA sched
domain support") removed the NODE sched domain and started checking
whether the node distance in the SLIT table is farther than
REMOTE_DISTANCE; if so, the domain loses its chance to load balance
at the exec/fork/wake_affine points.

But actually, even if the node distance is farther than
REMOTE_DISTANCE, modern CPUs also have QPI-like connections, which
ensure that memory access is not too slow between nodes. So the above
change in behavior on NUMA machines causes a performance regression on
various benchmarks: hackbench, tbench, netperf, oltp, etc.

This patch restores the old scheduler behavior on all my Intel
platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and thus fixes the
performance regressions. (All of them have just two kinds of
distance: 10 and 21.)

Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338965571-9812-1-git-send-email-alex.shi@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:52:25 +02:00
Kamalesh Babulal
ceb1cbac8e sched/x86: Calculate booted cores after construction of sibling_mask
Commit 316ad24830 ("sched/x86: Rewrite set_cpu_sibling_map()")
broke the booted_cores accounting.

The problem is that the booted_cores accounting needs all the
sibling links set up. So restore the second loop and add a comment as
to why it's needed.

On qemu booted with -smp sockets=1,cores=2,threads=2:
Before:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 1
 cpu cores       : 4
 cpu cores       : 3

With the patch:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
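
A quick way to check for this symptom on other machines is to compare every "cpu cores" line against the expected per-socket core count. The sketch below is in Python with hypothetical helper names; it only parses text in the format shown above and does not touch the kernel accounting itself.

```python
def cpu_cores_values(cpuinfo_text):
    """Return the integer from every 'cpu cores' line, in order."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu cores":
            values.append(int(value))
    return values


def consistent_with(cpuinfo_text, cores_per_socket):
    """True if at least one value was found and all match the expected count."""
    values = cpu_cores_values(cpuinfo_text)
    return bool(values) and all(v == cores_per_socket for v in values)


# The two listings from this message, for a guest started with cores=2.
BEFORE = "cpu cores       : 2\ncpu cores       : 1\ncpu cores       : 4\ncpu cores       : 3\n"
AFTER = "cpu cores       : 2\n" * 4

if __name__ == "__main__":
    print("before:", cpu_cores_values(BEFORE), consistent_with(BEFORE, 2))
    print("after: ", cpu_cores_values(AFTER), consistent_with(AFTER, 2))
```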

Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:37:59 +02:00
Tomoki Sekiyama
f6175f5bfb x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
In current Linux, the percpu variable `vector_irq' is not cleared on
offlined cpus while disabling devices' irqs. If a cpu that still has
the disabled irqs in vector_irq is hotplugged,
__setup_vector_irq() hits an invalid irq vector and may crash.

This bug can be reproduced as follows:

  # echo 0 > /sys/devices/system/cpu/cpu7/online
  # modprobe -r some_driver_using_interrupts      # vector_irq@cpu7 uncleared
  # echo 1 > /sys/devices/system/cpu/cpu7/online  # kernel may crash

This patch fixes this bug by clearing vector_irq in
__clear_irq_vector() even if the cpu is offlined.
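
The stale entry can be illustrated with a toy model. This is a Python sketch, not the kernel code: the dictionary layout, the `include_offline` switch, and the irq and vector numbers are assumptions chosen only to mirror the description above.

```python
UNUSED = -1


def clear_irq_vector(vector_irq, domain, online, vector, include_offline):
    """Clear `vector` on every CPU in `domain`; skip offline CPUs unless asked."""
    for cpu in domain:
        if include_offline or cpu in online:
            vector_irq[cpu][vector] = UNUSED


def stale_entries(vector_irq, cpu, live_irqs):
    """Irqs still recorded for `cpu` that no longer exist."""
    return [irq for irq in vector_irq[cpu].values()
            if irq != UNUSED and irq not in live_irqs]


def scenario(include_offline):
    # irq 42 uses vector 0x31 on CPUs 6 and 7; CPU 7 has been offlined.
    vector_irq = {6: {0x31: 42}, 7: {0x31: 42}}
    online = {6}
    # The driver is removed, so irq 42 goes away and its vector is cleared.
    clear_irq_vector(vector_irq, {6, 7}, online, 0x31, include_offline)
    # CPU 7 comes back online and its table is walked.
    return stale_entries(vector_irq, 7, live_irqs=set())


if __name__ == "__main__":
    print("online CPUs only:", scenario(include_offline=False))
    print("offline CPUs too:", scenario(include_offline=True))
```

Clearing only online CPUs leaves irq 42 recorded on CPU 7; clearing offline CPUs as well leaves nothing behind for the re-onlined CPU to trip over.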

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: ltc-kernel@ml.yrl.intra.hitachi.co.jp
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/4FC340BE.7080101@hitachi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:25 +02:00
Feng Tang
55c844a4dd x86/reboot: Fix a warning message triggered by stop_other_cpus()
When rebooting our 24-CPU Westmere servers with 3.4-rc6, we
always see this warning message:

Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
 <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
 [<ffffffff81036768>] update_process_times+0x60/0x70
 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
 [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
 <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
 [<ffffffff81018926>] machine_shutdown+0xa/0xc
 [<ffffffff8101897e>] native_machine_restart+0x20/0x32
 [<ffffffff810189b0>] machine_restart+0xa/0xc
 [<ffffffff8103b196>] kernel_restart+0x47/0x4c
 [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
 [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---

The root cause seems to be that
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to finish sending NMIs to all
the other 23 CPUs, and on a HZ=250/1000 system that is long
enough for a timer interrupt to happen, which will in turn
try to kick load balancing on a stopped CPU and cause this
warning in native_smp_send_reschedule().
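
The timing argument is simple arithmetic. In the Python sketch below, the 3 ms send time is an assumed stand-in for "several ms", and the function names are hypothetical.

```python
def tick_period_ms(hz):
    """Length of one timer tick in milliseconds."""
    return 1000.0 / hz


def expected_ticks(send_ms, hz):
    """Average number of timer ticks that land inside a window of send_ms."""
    return send_ms * hz / 1000.0


if __name__ == "__main__":
    send_ms = 3  # assumed value for "several ms"
    for hz in (250, 1000):
        print(f"HZ={hz}: tick every {tick_period_ms(hz)} ms, "
              f"about {expected_ticks(send_ms, hz)} ticks while sending")
```

Under that assumption a tick lands inside the window on most reboots at HZ=250 and on every reboot at HZ=1000, which fits a warning that shows up consistently.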

So disabling the local irq before stop_other_cpus() fixes this
problem (tested with 25 reboots, all ok), and it is fine as
nobody should care about the timer interrupt at this stage of a
reboot.

The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only sending NMI_VECTOR if the REBOOT_VECTOR
fails, and this patch is still needed to prevent the problem.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:23 +02:00