Commit graph

414655 commits

Author SHA1 Message Date
Andreas Rohner
70f2fe3a26 nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-15 14:19:42 +07:00
Ingo Molnar
e59da0aedb Merge branch 'clockevents/3.13-fixes' of git://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
Pull clock driver fix from Daniel Lezcano:

 " * Soren Brinkmann fixed the cadence_ttc driver where a call to
     clk_get_rate happens in an interrupt context. More precisely in an IPI
     when the broadcast timer is initialized for each cpu in the cpuidle
     driver. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-15 07:39:30 +01:00
Phil Pokorny
d303b1b5fb hwmon: (k10temp) Add support for Kaveri CPUs
Add new PCI ID to support new model "Kaveri" family.

Signed-off-by: Philip Pokorny <ppokorny@penguincomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:54 -08:00
Vivien Didelot
ffcc3b2aa1 hwmon: (sht15) add include guard
Add include guard to include/linux/platform_data/sht15.h to prevent
multiple inclusion.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:53 -08:00
Vivien Didelot
4cb6409bbe hwmon: (max197) add include guard
Add include guard to include/linux/platform_data/max197.h to prevent
multiple inclusion.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:53 -08:00
Guenter Roeck
f5776cc3b5 hwmon: (nct6775) Re-enable logical device mapping for NCT6791 during resume
After a suspend/resume cycle, the NCT6791 is back to its original BIOS
programming. In this state, HWMON IO access may be locked.
Re-enable it during resume.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:53 -08:00
Sachin Kamat
2ac1dfc52d hwmon: (s3c) Trivial cleanup in hwmon-s3c.h
Commit 436d42c61c ("ARM: samsung: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:52 -08:00
Guenter Roeck
bf6ea084eb hwmon: (coretemp) Do not return -EAGAIN for low temperatures
Some Intel CPUs do not set the 'valid' bit in IA32_THERM_STATUS if the
temperature is too low to be measured. This condition will not change until
the CPU is hot enough for its temperature to be measured. Returning an error
in such conditions is not very useful. Drop checking the valid bit and just
return the reported temperature instead.

Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:52 -08:00
Anthony Olech
c09088d934 hwmon: (da9052) Fix adc to voltage calculation
The ADC resolution of the PMIC is 10-bits, this means that the maximum
possible value is 1023 and not the 1024 as originally in the code.

Signed-off-by: Anthony Olech <anthony.olech.opensource@diasemi.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:31 -08:00
Guenter Roeck
9fb6c9c73b hwmon: (coretemp) Refine TjMax detection
Intel's turbostat code uses only 7 bits from MSR_IA32_TEMPERATURE_TARGET to
read TjMax, and also only accepts it if the reported temperature is at least
85 degrees C. Play safe and do the same.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:31 -08:00
Guenter Roeck
347c16cfde hwmon: (coretemp) Add PCI device ID for CE41x0 CPUs
Since we now have to use PCI IDs to detect CPU types anyway, use this mechanism
to detect CE41x0 CPUs. Advantage is that it only requires a single entry and
covers all variants of CE41x0, including those unknown to us.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:31 -08:00
Guenter Roeck
14513ee696 hwmon: (coretemp) Use PCI host bridge ID to identify CPU if necessary
Atom S12x0 CPUs are identified by the CPU host bridge ID. Add an override
table based on PCI IDs as well as code to detect it.

PCI access functions can now be called with PCI disabled, so unlike previous
attempts to use PCI IDs, the code no longer depends on it. If PCI is disabled,
the CPU will not be identified correctly. Since it is unlikely that anything
will work in this case, this is an acceptable limitation.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:30 -08:00
Jingoo Han
cd9bb0564c hwmon: remove DEFINE_PCI_DEVICE_TABLE macro
Don't use DEFINE_PCI_DEVICE_TABLE macro, because this macro
is not preferred.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 21:36:30 -08:00
Dave Airlie
703a8c2dfa Merge branch 'drm-nouveau-next' of git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
Single regression fix for nouveau

* 'drm-nouveau-next' of git://git.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau: fix null ptr dereferences on some boards
2014-01-15 15:01:11 +10:00
Ben Skeggs
fdd239ac99 drm/nouveau: fix null ptr dereferences on some boards
Regression from "device: populate master subdev pointer only when fully
constructed"

Reported-by: Bob Gleitsmann <rjgleits@bellsouth.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-15 14:24:05 +10:00
Jitendra Kalsaria
51bb352f15 qlge: Fix vlan netdev features.
vlan gets the same netdev features except vlan filter.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 19:02:35 -08:00
Hannes Frederic Sowa
95f4a45de1 net: avoid reference counter overflows on fib_rules in multicast forwarding
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d0 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:37:25 -08:00
Peter Korsgaard
7c4b5175f6 dm9601: add USB IDs for new dm96xx variants
A number of new dm96xx variants now exist.

Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:36:19 -08:00
Michael S. Tsirkin
a02bbb1ccf MAINTAINERS: add virtio-dev ML for virtio
Since virtio is an OASIS standard draft now, virtio implementation
discussions are taking place on the virtio-dev OASIS mailing list.
Update MAINTAINERS.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:09:38 -08:00
Christian Engelmayer
267d29a69c ieee802154: Fix memory leak in ieee802154_add_iface()
Fix a memory leak in the ieee802154_add_iface() error handling path.
Detected by Coverity: CID 710490.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:40:56 -08:00
Jean Delvare
3f9aec7610 hwmon: (coretemp) Fix truncated name of alarm attributes
When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.

Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-01-14 09:47:52 -08:00
Stephen Warren
2fac2b891f i2c: Re-instate body of i2c_parent_is_i2c_adapter()
The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.

Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.

One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.

Fixes: 3923172b3d ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case")
Cc: Phil Carmody <phil.carmody@partner.samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-01-14 17:11:13 +01:00
Eugene Crosser
1c59a861d6 s390/qdio: bridgeport support - CHSC part
Introduce function for the "Perform network-subchannel operation"
CHSC command with operation code "bridgeport information",
and bit definitions for "characteristics" pertaning to this command.

Signed-off-by: Eugene Crosser <eugene.crosser@ru.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-14 15:16:09 +01:00
Bjørn Mork
fdc3452cd2 net: usbnet: fix SG initialisation
Commit 60e453a940 ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.

Fixes: 60e453a940 ("USBNET: fix handling padding packet")
Reported-by: Thomas Kear <thomas@kear.co.nz>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:32:04 -08:00
Neal Cardwell
70315d22d3 inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.

This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.

Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.

This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:35:46 -08:00
NeilBrown
8313b8e57f md: fix problem when adding device to read-only array with bitmap.
If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4fa
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:08 +11:00
NeilBrown
e8b8491585 md/raid10: fix bug when raid10 recovery fails to recover a block.
commit e875ecea26
    md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.

So move the freeing of r10bio down where it is safe.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: e875ecea26
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:08 +11:00
NeilBrown
5af9bef72c md/raid5: fix a recently broken BUG_ON().
commit 6d183de407
    md/raid5: fix newly-broken locking in get_active_stripe.

simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.

When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero.  As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.

So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.

The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4

Fixes: 6d183de407
Cc: stable@vger.kernel.org (any release to which above was applied)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
41a336e011 md/raid1: fix request counting bug in new 'barrier' code.
The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.

allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.

However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.

So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.

But was introduced in 3.13-rc1.

Reported-by: Bruno Wolff III <bruno@wolff.to>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061
Fixes: 79ef3a8aa1
Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
b50c259e25 md/raid10: fix two bugs in handling of known-bad-blocks.
If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
   seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: 856e08e237
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
NeilBrown
1cc03eb932 md/raid5: Fix possible confusion when multiple write errors occur.
commit 5d8c71f9e5
    md: raid5 crash during degradation

Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.

commit 14a75d3e07
    md/raid5: preferentially read from replacement device if possible.

Fixed the same bug if a more effective way, so we can now revert
the original commit.

Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org (3.2+ - 3.2 will need a different fix though)
Fixes: 5d8c71f9e5
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:07 +11:00
Dave Airlie
abce1ec9b0 Revert "drm: copy mode type in drm_mode_connector_list_update()"
This reverts commit 3fbd6439e4.

This caused some strange booting lockup issues on an Intel G33
belonging to Daniel Vetter, very unusual, I was hoping Daniel
would track this down, but it looks like instead I'll have to hack
a different fix for -next.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-01-14 12:50:49 +10:00
Dave Airlie
c0eeb856e9 Merge tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Black screen fixes, one for hsw+bdw each and a regression fix for
locking+load detection.

* tag 'drm-intel-fixes-2014-01-13' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915/bdw: make sure south port interrupts are enabled properly v2
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  drm/i915: fix DDI PLLs HW state readout code
2014-01-14 12:44:48 +10:00
Dan Carpenter
9ef9730ba8 cxgb4: silence shift wrapping static checker warning
I don't know how large "tp->vlan_shift" is but static checkers worry
about shift wrapping bugs here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:15:58 -08:00
Paul Gortmaker
28aa39b853 s390: delete new instances of __cpuinit usage
The patch "s390/perf: add support for the CPU-Measurement Sampling
Facility" added a new instance of the __cpuinit macro usage.

We removed this a couple versions ago; we now want to remove
the compat no-op stubs.  Introducing new users is not what
we want to see at this point in time, as it will break once
the stubs are gone.

Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-13 16:50:25 +01:00
Heiko Carstens
075dfd8210 s390/compat: fix PSW32_USER_BITS definition
PSW32_USER_BITS should define the primary address space for user space
instead of the home address space.
Symptom of this bug is that gdb doesn't work in compat mode.

The bug was introduced with e258d719ff "s390/uaccess: always run the kernel
in home space" and f26946d7ec "s390/compat: make psw32_user_bits a constant
value again".

Cc: stable@vger.kernel.org # v3.13+
Reported-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-13 16:50:21 +01:00
Michael Schmitz
a0b7b24226 m68k/irq - Use polled IRQ flag for MFP timer cascaded interrupts
Some Atari hardware has no capacity to raise interrupts (e.g.
network or USB adapter hardware attached via ROM port). The driver
interrupt routine is called from a timer interrupt (timer D) in
these cases, using chained device specific pseudo interrupts
(IRQ_MFP_TIMER1 ff.)

These interrupts will more often than not, return IRQ_NONE as
there is not always work for the device handler when called.
Too many unhandled interrupts will result in the interrupt
being disabled by the stuck interrupt watchdog.

As preferred option to flag interrupts as needing exclusion
from the watchdog mechanism, tglx added the IRQ_IS_POLLED flag
for use in such a case. Currently, two interrupts need to use
this flag. Add more users as needed.

Signed-off-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2014-01-13 09:29:10 +01:00
Linus Torvalds
a6da83f982 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix from Ben Herrenschmidt:
 "Here's one regression fix for 3.13 that I would appreciate if you
  could still pull in.  It was an "interesting" one to debug, basically
  it's an old bug that got somewhat "exposed" by new code breaking the
  boot on PA Semi boards (yes, it does appear that some people are still
  using these!)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Check return value of instance-to-package OF call
2014-01-13 10:59:05 +07:00
Linus Torvalds
061f49ec2d Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Sorry, meant to push out this batch earlier this weekend"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
  ftrace/x86: Load ftrace_ops in parameter not the variable holding it
2014-01-13 07:28:49 +07:00
Benjamin Herrenschmidt
10348f5976 powerpc: Check return value of instance-to-package OF call
On PA-Semi firmware, the instance-to-package callback doesn't seem
to be implemented. We didn't check for error, however, thus
subsequently passed the -1 value returned into stdout_node to
thins like prom_getprop etc...

Thus caused the firmware to load values around 0 (physical) internally
as node structures. It somewhat "worked" as long as we had a NULL in the
right place (address 8) at the beginning of the kernel, we didn't "see"
the bug. But commit 5c0484e25e
"powerpc: Endian safe trampoline" changed the kernel entry point causing
that old bug to now cause a crash early during boot.

This fixes booting on PA-Semi board by properly checking the return
value from instance-to-package.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Olof Johansson <olof@lixom.net>
---
2014-01-13 09:49:17 +11:00
Taras Kondratiuk
b25f3e1c35 ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
Kexec disables outer cache before jumping to reboot code, but it doesn't
flush it explicitly. Flush is done implicitly inside of l2x0_disable().
But some SoC's override default .disable handler and don't flush cache.
This may lead to a corrupted memory during Kexec reboot on these
platforms.

This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable()
handlers to make it consistent with default l2x0_disable().

Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-12 14:15:27 +00:00
Linus Torvalds
7e22e91102 Linux 3.13-rc8 2014-01-12 17:04:18 +07:00
Steven Rostedt
3dc91d4338 SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
While running stress tests on adding and deleting ftrace instances I hit
this bug:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_inode_permission+0x85/0x160
  PGD 63681067 PUD 7ddbe067 PMD 0
  Oops: 0000 [#1] PREEMPT
  CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
  Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
  task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
  RIP: 0010:[<ffffffff812d8bc5>]  [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
  RSP: 0018:ffff88007ddb1c48  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
  RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
  RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
  R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
  R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
  FS:  00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
  CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
  Call Trace:
    security_inode_permission+0x1c/0x30
    __inode_permission+0x41/0xa0
    inode_permission+0x18/0x50
    link_path_walk+0x66/0x920
    path_openat+0xa6/0x6c0
    do_filp_open+0x43/0xa0
    do_sys_open+0x146/0x240
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b
  Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
  RIP  selinux_inode_permission+0x85/0x160
  CR2: 0000000000000020

Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.

in selinux_inode_permission():

	isec = inode->i_security;

	rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);

Note, the crash came from stressing the deletion and reading of debugfs
files.  I was not able to recreate this via normal files.  But I'm not
sure they are safe.  It may just be that the race window is much harder
to hit.

What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().

The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct.  Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here.  (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).

Note, this is a hack, but it fixes the problem at hand.  A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback.  But that is a major job to do, and requires a
lot of work.  For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.

Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12 16:53:13 +07:00
Hugh Dickins
eecc1e426d thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
We see General Protection Fault on RSI in copy_page_rep: that RSI is
what you get from a NULL struct page pointer.

  RIP: 0010:[<ffffffff81154955>]  [<ffffffff81154955>] copy_page_rep+0x5/0x10
  RSP: 0000:ffff880136e15c00  EFLAGS: 00010286
  RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200
  RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000
  RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c
  R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001
  R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0
  Call Trace:
    copy_user_huge_page+0x93/0xab
    do_huge_pmd_wp_page+0x710/0x815
    handle_mm_fault+0x15d8/0x1d70
    __do_page_fault+0x14d/0x840
    do_page_fault+0x2f/0x90
    page_fault+0x22/0x30

do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
since shrink_huge_zero_page() can free the huge_zero_page, and we have
no hold of our own on it here (except where the fourth test holds
page_table_lock and has checked pmd_same), it's possible for it to
answer yes the first time, but no to the second or third test.  Change
all those last three to tests for NULL page.

(Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
in https://lkml.org/lkml/2013/3/29/103.  I believe that one is due
to the source page being split, and a tail page freed, while copy
is in progress; and not a problem without DEBUG_PAGEALLOC, since
the pmd_same check will prevent a miscopy from being made visible.)

Fixes: 97ae17497e ("thp: implement refcounting for huge zero page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12 16:47:15 +07:00
Ming Lei
518d00b749 block: null_blk: fix queue leak inside removing device
When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.

This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-12 16:22:42 +07:00
John Stultz
7a06c41cbe sched_clock: Disable seqlock lockdep usage in sched_clock()
Unfortunately the seqlock lockdep enablement can't be used
in sched_clock(), since the lockdep infrastructure eventually
calls into sched_clock(), which causes a deadlock.

Thus, this patch changes all generic sched_clock() usage
to use the raw_* methods.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Krzysztof Hałasa <khalasa@piap.pl>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:14:00 +01:00
John Stultz
0c3351d451 seqlock: Use raw_ prefix instead of _no_lockdep
Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.

This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Krzysztof Hałasa <khalasa@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 10:13:59 +01:00
Rik van Riel
9722c2dac7 sched: Calculate effective load even if local weight is 0
Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.

Reported-and-tested-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
[ Wrote changelog]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12 09:22:15 +01:00
Linus Torvalds
26bef1318a x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-11 19:15:52 -08:00
Taras Kondratiuk
d6cd989477 ARM: 7939/1: traps: fix opcode endianness when read from user memory
Currently code has an inverted logic: opcode from user memory
is swapped to a proper endianness only in case of read error.
While normally opcode should be swapped only if it was read
correctly from user memory.

Reviewed-by: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-01-11 12:06:59 +00:00