Pull two Ceph fixes from Sage Weil:
"One of these is fixing a regression from the d_flags file type patch
that went into -rc1 that broke instantiation of inodes and dentries
(we were doing dentries first). The other is just an off-by-one
corner case"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: Avoid data inconsistency due to d-cache aliasing in readpage()
ceph: initialize inode before instantiating dentry
There's a possible deadlock if we flush the peers notifying work during setting
mtu:
[ 22.991149] ======================================================
[ 22.991173] [ INFO: possible circular locking dependency detected ]
[ 22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted
[ 22.991219] -------------------------------------------------------
[ 22.991243] ip/974 is trying to acquire lock:
[ 22.991261] ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [<ffffffff8108af95>] flush_work+0x5/0x2e0
[ 22.991307]
but task is already holding lock:
[ 22.991330] (rtnl_mutex){+.+.+.}, at: [<ffffffff81539deb>] rtnetlink_rcv+0x1b/0x40
[ 22.991367]
which lock already depends on the new lock.
[ 22.991398]
the existing dependency chain (in reverse order) is:
[ 22.991426]
-> #1 (rtnl_mutex){+.+.+.}:
[ 22.991449] [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[ 22.991477] [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[ 22.991501] [<ffffffff81673659>] mutex_lock_nested+0x89/0x4f0
[ 22.991529] [<ffffffff815392b7>] rtnl_lock+0x17/0x20
[ 22.991552] [<ffffffff815230b2>] netdev_notify_peers+0x12/0x30
[ 22.991579] [<ffffffffa0340212>] netvsc_send_garp+0x22/0x30 [hv_netvsc]
[ 22.991610] [<ffffffff8108d251>] process_one_work+0x211/0x6e0
[ 22.991637] [<ffffffff8108d83b>] worker_thread+0x11b/0x3a0
[ 22.991663] [<ffffffff81095e5d>] kthread+0xed/0x100
[ 22.991686] [<ffffffff81681c6c>] ret_from_fork+0x7c/0xb0
[ 22.991715]
-> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}:
[ 22.991715] [<ffffffff810de817>] check_prevs_add+0x967/0x970
[ 22.991715] [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[ 22.991715] [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[ 22.991715] [<ffffffff8108afde>] flush_work+0x4e/0x2e0
[ 22.991715] [<ffffffff8108e1b5>] __cancel_work_timer+0x95/0x130
[ 22.991715] [<ffffffff8108e303>] cancel_delayed_work_sync+0x13/0x20
[ 22.991715] [<ffffffffa03404e4>] netvsc_change_mtu+0x84/0x200 [hv_netvsc]
[ 22.991715] [<ffffffff815233d4>] dev_set_mtu+0x34/0x80
[ 22.991715] [<ffffffff8153bc2a>] do_setlink+0x23a/0xa00
[ 22.991715] [<ffffffff8153d054>] rtnl_newlink+0x394/0x5e0
[ 22.991715] [<ffffffff81539eac>] rtnetlink_rcv_msg+0x9c/0x260
[ 22.991715] [<ffffffff8155cdd9>] netlink_rcv_skb+0xa9/0xc0
[ 22.991715] [<ffffffff81539dfa>] rtnetlink_rcv+0x2a/0x40
[ 22.991715] [<ffffffff8155c41d>] netlink_unicast+0xdd/0x190
[ 22.991715] [<ffffffff8155c807>] netlink_sendmsg+0x337/0x750
[ 22.991715] [<ffffffff8150d219>] sock_sendmsg+0x99/0xd0
[ 22.991715] [<ffffffff8150d63e>] ___sys_sendmsg+0x39e/0x3b0
[ 22.991715] [<ffffffff8150eba2>] __sys_sendmsg+0x42/0x80
[ 22.991715] [<ffffffff8150ebf2>] SyS_sendmsg+0x12/0x20
[ 22.991715] [<ffffffff81681d19>] system_call_fastpath+0x16/0x1b
This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush
the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be
called from worker and also trying to hold the rtnl_lock. This will lead the
flush won't succeed forever. Solve this by not canceling and flushing the work,
this is safe because the transmission done by NETDEV_NOTIFY_PEERS was
synchronized with the netif_tx_disable() called by netvsc_change_mtu().
Reported-by: Yaju Cao <yacao@redhat.com>
Tested-by: Yaju Cao <yacao@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull powerpc fixes from Ben Herrenschmidt:
"Uli's patch fixes a regression in ptrace caused by a mis-merge of a
previous LE patch. The rest are all more endian fixes, all fairly
trivial, found during testing of 3.13-rc's"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/powernv: Fix OPAL LPC access in Little Endian
powerpc/powernv: Fix endian issue in opal_xscom_read
powerpc: Fix endian issues in crash dump code
powerpc/pseries: Fix endian issues in MSI code
powerpc/pseries: Fix PCIE link speed endian issue
powerpc/pseries: Fix endian issues in nvram code
powerpc/pseries: Fix endian issues in /proc/ppc64/lparcfg
powerpc: Fix topology core_id endian issue on LE builds
powerpc: Fix endian issue in setup-common.c
powerpc: PTRACE_PEEKUSR always returns FPR0
If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc because cpupower-set passes a NULL optarg to the atoi
call. This is because the getopt_long structure currently has all of
the options as having an optional_argument when they really have a
required argument. We change the structure to use required_argument to
match the short options and it resolves the issue.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Thomas Renninger <trenn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a driver workaround for a HW issue.
A race condition in the HW results in missing interrupts,
which can be avoided by a read/write with the ISR register.
All chips in the AR9002 series are affected by this bug - AR9003
and above do not have this problem.
Cc: stable@vger.kernel.org
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On Fedora systems, unloading rtl8192ce causes an oops. This patch fixes the
problem reported at https://bugzilla.redhat.com/show_bug.cgi?id=852761.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pick the MAC address of the first virtual interface as the new hardware MAC
address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579.
Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The GMCH_CTRL register (or MGCC in the spec) is at a different address
on Sandybridge, and the address to which we currently write to is
undefined. These stray writes appear to upset (hard hang) my Ivybridge
machine whilst it is in UEFI mode.
Note that the register is still marked as locked RO on Sandybridge, so
vgaarb is still dysfunctional.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We use this hook starting from ILK onwards, so change the prefix
accordingly. Also rename functions/struct names used from
haswell_update_wm that are relevant to ILK already.
No functional change.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No functional change.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With block processing of echoed output, observed output order is still
required. Push completed echoes and echo commands prior to output.
Introduce echo_mark echo buffer index, which tracks completed echo
commands; ie., those submitted via commit_echoes but which may not
have been committed. Ensure that completed echoes are output prior
to subsequent terminal writes in process_echoes().
Fixes newline/prompt output order in cooked mode shell.
Cc: <stable@vger.kernel.org> # 3.12.x : 39434ab n_tty: Fix missing newline echo
Reported-by: Karl Dahlke <eklhad@comcast.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Karl Dahlke <eklhad@comcast.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Newer Intel PCHs with LPSS have the same Designware controllers than
Haswell but ACPI IDs are different. Add these IDs to the driver list.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce device tree bindings for the MIPI pad calibration controller
found on Tegra SoCs. The controller can be used to perform calibration
of pads used for DSI and CSI peripherals.
Signed-off-by: Thierry Reding <treding@nvidia.com>
The display controller primary clock was recently renamed to "dc", so
update the example to reflect that.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Use the DRM panel framework to attach a panel to an output. If the panel
attached to a connector supports supports the backlight brightness
accessors, a property will be available to allow the brightness to be
modified from userspace.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Add a driver for simple panels. Such panels can have a regulator that
provides the supply voltage and a separate GPIO to enable the panel.
Optionally the panels can have a backlight associated with them so it
can be enabled or disabled according to the panel's power management
mode.
Support is added for two panels: An AU Optronics 10.1" WSVGA and a
Chunghwa Picture Tubes 10.1" WXGA panel.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Add a very simple framework to register and lookup panels. Panel drivers
can initialize a DRM panel and register it with the framework, allowing
them to be retrieved and used by display drivers. Currently only support
for DPMS and obtaining panel modes is provided. However it should be
sufficient to enable a large number of panels. The framework should also
be easily extensible to support more sophisticated kinds of panels such
as DSI.
The framework hasn't been tied into the DRM core, even though it should
be easily possible to do so if that's what we want. In the current
implementation, display drivers can simple make use of it to retrieve a
panel, obtain its modes and control its DPMS mode.
Note that this is currently only tested on systems that boot from a
device tree. No glue code has been written yet for systems that use
platform data, but it should be easy to add.
Signed-off-by: Thierry Reding <treding@nvidia.com>
MIPI DSI bus allows to model DSI hosts and DSI peripherals using the
Linux driver model. DSI hosts are registered by the DSI host drivers.
During registration DSI peripherals will be created from the children
of the DSI host's device tree node. Support for registration from
board-setup code will be added later when needed.
DSI hosts expose operations which can be used by DSI peripheral drivers
to access associated devices.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
This binding specifies a set of common properties for display panels. It
can be used as a basis by bindings for specific panels.
Bindings for three specific panels are provided to show how the simple
panel binding can be used.
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Document the device tree bindings for the MIPI DSI bus. The MIPI Display
Serial Interface specifies a serial bus and a protocol for communication
between a host and up to four peripherals.
Signed-off-by: Thierry Reding <treding@nvidia.com>
This branch includes all the changes to Tegra's powergate driver for 3.14.
These are separate out, since the Tegra DRM changes for 3.14 rely on the
new APIs introduced here.
A few cleanups and fixes are included, plus additions of Tegra124 SoC
support, and a new API for manipulating Tegra's IO rail deep power down
states.
This branch is based on tag tegra-for-3.14-dmas-resets-rework, in order
to avoid conflicts with the addition of common reset controller support
to the powergate driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSr3G5AAoJEMzrak5tbycxHaEP/2YP1KIUTINzF9JlTaFWLd38
dlqCOdDjcIow6QvaI0H2xdD7OhWqbiK2gAKz10PH0TYVr5AJZKyC7uxAw1gvV5bI
2hgP9ZBVQY8qfveZuyQhL9jCwAg52TfG2LBO4Iw3wng9tO00uo8knjoHURmiewuY
fLY8CQxYydwK8MmqabPI/L0gQhSlswyDByaeB0ixHJncSVT8KHmbSX8kx7FnYsI6
/r/D7IQtMyx52XJRbInN83UIdONKP9vkPAXmptKBU2SGwdXOzqPqfxFuNRKN85Gq
a4wlJUQAfBAndiemaZIH5O7gmmUht1lB0L9JWDFw9nBr6EHDVOpgRgi/OgVFaFE+
Qer3LdrpEvyXsRe3AjeQgta8716KaRMW3IbaqkdS0F+HwyFGL9/nGRttu9JLHSCR
oo7FVscaE7tC2d/LGNqI0y4pprgbrjROT70hWoNDF8oS5qDsI3rxEjB0FXyzE2by
Bd702Sc9heh60Hnov/vZy9WFcWETEKVEF5hgIL3lrjA6pMKdH7fEylOD+bxVAmE2
t5wePXc7OSdciaLN/D1GhtCD+R58u+idHadKnU9uRYB1wmeIAeJMi9D/ZnpKftX4
M5Ts7t8k1jxZlRXXgiara3SwJ6tIOlc5vEvhQWEeInzL0yishFDy2mmq1lMaV0mJ
cFgQUdhufycI0Yprfwjt
=jJOV
-----END PGP SIGNATURE-----
Merge tag 'tegra-for-3.14-powergate' into drm/for-next
ARM: tegra: powergate driver changes
This branch includes all the changes to Tegra's powergate driver for 3.14.
These are separate out, since the Tegra DRM changes for 3.14 rely on the
new APIs introduced here.
A few cleanups and fixes are included, plus additions of Tegra124 SoC
support, and a new API for manipulating Tegra's IO rail deep power down
states.
This branch is based on tag tegra-for-3.14-dmas-resets-rework, in order
to avoid conflicts with the addition of common reset controller support
to the powergate driver.
This series converts the Tegra DTs and drivers to use the common/
standard DMA and reset bindings, rather than custom bindings. It also
adds complete documentation for the Tegra clock bindings without
actually changing any binding definitions.
This conversion relies on a few sets of patches in branches from outside
the Tegra tree:
1) A patch to add an DMA channel request API which allows deferred probe
to be implemented.
2) A patch to implement a common part of the of_xlate function for DMA
controllers.
3) Some ASoC patches (which in turn rely on (1) above), which support
deferred probe during DMA channel allocation.
4) The Tegra clock driver changes for 3.14.
Consequently, this branch is based on a merge of all of those external
branches.
In turn, this branch is or will be pulled into a few places that either
rely on features introduced here, or would otherwise conflict with the
patches:
a) Tegra's own for-3.14/powergate and for-4.14/dt branches, to avoid
conflicts.
b) The DRM tree, which introduces new code that relies on the reset
controller framework introduced in this branch, and to avoid
conflicts.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJSr3AnAAoJEMzrak5tbycxfMwQAMeffTFreJqDiQ4Vj0XmuhSn
RFlXiZQsWtQ6gGgNfKyDsXzDMaz1KDAabcUYRcZwrluxuSCPBcK1JirCj5R8uRY7
LDZFX92CO8zRgiij0mhgokV4zzuEQ56q1uhPxqI3o+wG3v44jlMSMgFHQJUevdET
aKr2Pss8Hb00XDztnpxprs6FUoU/W99NRH0i/5znbBwuHqYFP37zlKe2MRwbqDwR
AMgkrnGoawe85Stz4p/iR9pCLpAMa0dH94V4JrAP4+IQrl0DEKWbrolpQHii4gzh
NCGazMELTqkaZaorC/n1SmczH1kTj4vcjbbmeB8dwS8Vqhr+uf7W1oLlJ46TUOsp
ESO0uD2GfpHKQQwLxEfgjfmwsIUMbdWHef8f2HUuvl6Js+LCpaPkxd52Pt/qL4sU
0sKqTbldRZXzGhvwa0/MK32WhmH4v31s7IZAg5A2YxqDR6yWryl7legWyvrI96C0
OlmVe1C/2NGk0QCfK0G/xTa9V7YzMfj8k4ICSZOgUoF4BeGGj6d3svWvLbKbbrU1
0fVvR7aCm78pRXixI6kURpj9D0mEfqus9Hx7VoWcL0TS4QH2dSYlGI+jDCiliQmj
+kWrZWHsASSvPmUZk4RBNaviCbnGU8/t5nNdJSdFIUM/PIswzZ4GaAu6gdVksIY8
hcx410PyAzTZL2lENamE
=8T7+
-----END PGP SIGNATURE-----
Merge tag 'tegra-for-3.14-dmas-resets-rework' into drm/for-next
ARM: tegra: implement common DMA and resets DT bindings
This series converts the Tegra DTs and drivers to use the common/
standard DMA and reset bindings, rather than custom bindings. It also
adds complete documentation for the Tegra clock bindings without
actually changing any binding definitions.
This conversion relies on a few sets of patches in branches from outside
the Tegra tree:
1) A patch to add an DMA channel request API which allows deferred probe
to be implemented.
2) A patch to implement a common part of the of_xlate function for DMA
controllers.
3) Some ASoC patches (which in turn rely on (1) above), which support
deferred probe during DMA channel allocation.
4) The Tegra clock driver changes for 3.14.
Consequently, this branch is based on a merge of all of those external
branches.
In turn, this branch is or will be pulled into a few places that either
rely on features introduced here, or would otherwise conflict with the
patches:
a) Tegra's own for-3.14/powergate and for-4.14/dt branches, to avoid
conflicts.
b) The DRM tree, which introduces new code that relies on the reset
controller framework introduced in this branch, and to avoid
conflicts.
When the process is sleeping at the SNDRV_PCM_STATE_PAUSED
state from the wait_for_avail function, the sleep process will be woken by
timeout(10 seconds). Even if the sleep process wake up by timeout, by this
patch, the process will continue with sleep and wait for the other state.
Signed-off-by: JongHo Kim <furmuwon@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If we are doing aysnc writeback of metadata, we can get write errors
but have nobody to report them to. At the moment, we simply attempt
to reissue the write from io completion in the hope that it's a
transient error.
When it's not a transient error, the buffer is stuck forever in
this loop, and we cannot break out of it. Eventually, unmount will
hang because the AIL cannot be emptied and everything goes downhill
from them.
To solve this problem, only retry the write IO once before aborting
it. We don't throw the buffer away because some transient errors can
last minutes (e.g. FC path failover) or even hours (thin
provisioned devices that have run out of backing space) before they
go away. Hence we really want to keep trying until we can't try any
more.
Because the buffer was not cleaned, however, it does not get removed
from the AIL and hence the next pass across the AIL will start IO on
it again. As such, we still get the "retry forever" semantics that
we currently have, but we allow other access to the buffer in the
mean time. Meanwhile the filesystem can continue to modify the
buffer and relog it, so the IO errors won't hang the log or the
filesystem.
Now when we are pushing the AIL, we can see all these "permanent IO
error" buffers and we can issue a warning about failures before we
retry the IO. We can also catch these buffers when unmounting an
issue a corruption warning, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
When swalloc is specified as a mount option, allocations are
supposed to be aligned to the stripe width rather than the stripe
unit of the underlying filesystem. However, it does not do this.
What the implementation does is round up the allocation size to a
stripe width, hence ensuring that all allocations span a full stripe
width. It does not, however, ensure that that allocation is aligned
to a stripe width, and hence the allocations can span multiple
underlying stripes and so still see RMW cycles for things like
direct IO on MD RAID.
So, if the swalloc mount option is set, change the allocation
alignment in xfs_bmap_btalloc() to use the stripe width rather than
the stripe unit.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem. Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.
Remove xfsbdstrat and opencode the error handling in the callers. All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The function xfs_bmap_isaeof() is used to indicate that an
allocation is occurring at or past the end of file, and as such
should be aligned to the underlying storage geometry if possible.
Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the
behaviour of this function for empty files - it turned off
allocation alignment for this case accidentally. Hence large initial
allocations from direct IO are not getting correctly aligned to the
underlying geometry, and that is cause write performance to drop in
alignment sensitive configurations.
Fix it by considering allocation into empty files as requiring
aligned allocation again.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit f9b395a8ef)
When I tried to send the patches to XFS Maintainers,
I got returned mail included delivery fail message for Dave's mail.
Maybe, Dave Chinner mail address is incorrect.
I try to fix it correctly.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit db10bddc7d)
xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
# mount -ouquota,gquota /dev/sda7 /xfs
# mkdir /xfs/test
# xfs_quota -xc 'off -g' /xfs <-- hangs up
# echo w > /proc/sysrq-trigger
# dmesg
SysRq : Show Blocked State
task PC stack pid father
xfs_quota D 0000000000000000 0 27574 2551 0x00000000
[snip]
Call Trace:
[<ffffffff81aaa21d>] schedule+0xad/0xc0
[<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
[<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
[<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
[<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
[<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
[<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
[<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
[<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
[<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
[<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
[<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
[<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
[<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
[<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
[<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
[<ffffffff812fba4b>] ? iput+0x5b/0x90
[<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
[<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b
It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s). Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.
This problem has been around since Linux 3.4, it was introduced by:
[ b84a3a9675 xfs: remove the per-filesystem list of dquots ]
Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.
In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.
Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit df8052e7da)
For CRC enabled v5 super block, change a file's ownership can simply
trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
project quota are enabled, i.e,
[ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
[ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
[ 305.383939] Call Trace:
[ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
[ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
[ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350
[ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110
[ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
[ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30
[ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
This fix adjust the assertion to check if the super block support both
quota inodes or not.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 5a01dd54f4)
After the previous fix, there still has another ASSERT failure if turning
off any type of quota while fsstress is running at the same time.
Backtrace in this case:
[ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[ 50.867924] ------------[ cut here ]------------
... <snip>
[ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[ 50.867999] invalid opcode: 0000 [#1] SMP
[ 50.869407] Call Trace:
[ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[ 50.870050] RSP <ffff88002941fd60>
[ 50.879251] ---[ end trace c93a2b342341c65b ]---
We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO. While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee
that the desired quota is still active.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 37eb9706eb)
Fix the leak of kernel memory in xfs_dir2_node_removename()
when xfs_dir2_leafn_remove() returns an error code.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit ef701600fd)
Spread spectrum seems to cause hangs when dynamic clock
switching is enabled. Disable it for now. This does not
affect performance or the amount of power saved. Tracked
down by Martin Andersson.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=69723
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch fixes the rates declared in the CPU DAI parameters:
- SNDRV_PCM_RATE_KNOT and the discrete rates SNDRV_PCM_RATE_xxx should
not be used with SNDRV_PCM_RATE_CONTINUOUS,
- SNDRV_PCM_RATE_CONTINUOUS asks for rate_min and rate_max,
- the device may do streaming down to 5512Hz.
Signed-off-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Mark Brown <broonie@linaro.org>
This patch touches the RT group scheduling case.
Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's
priority, while rt_rq passed to them may be not the top-level rt_rq.
This is wrong, because changing of priority on a child level does not
guarantee that the priority is the highest all over the rq. So, this
leak makes RT balancing unusable.
The short example: the task having the highest priority among all rq's
RT tasks (no one other task has the same priority) are waking on a
throttle rt_rq. The rq's cpupri is set to the task's priority
equivalent, but real rq->rt.highest_prio.curr is less.
The patch below fixes the problem.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.
One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.
This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.
4-core machine
3.13.0-rc3 3.13.0-rc3
vanilla fixsd-v3r3
Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%)
Mean 2 0.34 ( 0.00%) 0.10 ( 70.59%)
Mean 3 1.29 ( 0.00%) 0.93 ( 27.91%)
Mean 4 7.08 ( 0.00%) 0.77 ( 89.12%)
Mean 5 193.54 ( 0.00%) 2.14 ( 98.89%)
Mean 6 151.12 ( 0.00%) 2.06 ( 98.64%)
Mean 7 115.38 ( 0.00%) 2.04 ( 98.23%)
Mean 8 108.65 ( 0.00%) 1.92 ( 98.23%)
8-core machine
Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%)
Mean 2 0.40 ( 0.00%) 0.21 ( 47.50%)
Mean 3 23.73 ( 0.00%) 0.89 ( 96.25%)
Mean 4 12.79 ( 0.00%) 1.04 ( 91.87%)
Mean 5 13.08 ( 0.00%) 2.42 ( 81.50%)
Mean 6 23.21 ( 0.00%) 69.46 (-199.27%)
Mean 7 15.85 ( 0.00%) 101.72 (-541.77%)
Mean 8 109.37 ( 0.00%) 19.13 ( 82.51%)
Mean 12 124.84 ( 0.00%) 28.62 ( 77.07%)
Mean 16 113.50 ( 0.00%) 24.16 ( 78.71%)
It's eliminated for one machine and reduced for another.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: H Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit fdfbbd07e9 ("perf: Add generic transaction flags")
added support for PERF_SAMPLE_TRANSACTION but forgot to add documentation
for the sample type to include/uapi/linux/perf_event.h
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1312131548450.10372@pianoman.cluster.toy
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, only one PMU in a context gets disabled during unthrottling
and event_sched_{out,in}(), however, events in one context may belong to
different pmus, which results in PMUs being reprogrammed while they are
still enabled.
This means that mixed PMU use [which is rare in itself] resulted in
potentially completely unreliable results: corrupted events, bogus
results, etc.
This patch temporarily disables PMUs that correspond to
each event in the context while these events are being modified.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: http://lkml.kernel.org/r/1387196256-8030-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Kyung-Kwee Ryu <kyung-kwee.ryu@wolfsonmicro.com>
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org
Hugh reported this bug:
> CONFIG_MEMCG_SWAP is broken in 3.13-rc. Try something like this:
>
> mkdir -p /tmp/tmpfs /tmp/memcg
> mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs
> mount -t cgroup -o memory memcg /tmp/memcg
> mkdir /tmp/memcg/old
> echo 512M >/tmp/memcg/old/memory.limit_in_bytes
> echo $$ >/tmp/memcg/old/tasks
> cp /dev/zero /tmp/tmpfs/zero 2>/dev/null
> echo $$ >/tmp/memcg/tasks
> rmdir /tmp/memcg/old
> sleep 1 # let rmdir work complete
> mkdir /tmp/memcg/new
> umount /tmp/tmpfs
> dmesg | grep WARNING
> rmdir /tmp/memcg/new
> umount /tmp/memcg
>
> Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91
> res_counter_uncharge_locked+0x1f/0x2f()
>
> Breakage comes from 34c00c319c ("memcg: convert to use cgroup id").
>
> The lifetime of a cgroup id is different from the lifetime of the
> css id it replaced: memsw's css_get()s do nothing to hold on to the
> old cgroup id, it soon gets recycled to a new cgroup, which then
> mysteriously inherits the old's swap, without any charge for it.
Instead of removing cgroup id right after all the csses have been
offlined, we should do that after csses have been destroyed.
To make sure an invalid css pointer won't be returned after the css
is destroyed, make sure css_from_id() returns NULL in this case.
tj: Updated comment to note planned changes for cgrp->id.
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Apparently always enabling the sprite scaler magically made
sprites work on ILK in the past.
I think the real reason for the failure was missing sprite
watermark programming, and enabling the scaler effectively
disabled LP1+ watermarks, which was enough to keep things going.
Or it might be that the hardware more or less ignores watermarks
for scaled sprites since things seem to work even if I leave
sprite watermarks at 0 and disable all other planes except the
sprite.
In any case, we left the scaler always on but then failed to
check whether we might be exceeding the scaler's source size
limits. That caused the sprite to fail when a sufficiently
large unscaled image was being displayed.
Now that we're getting proper watermark programming for ILK, we
can keep the scaler disabled unless we need to do actual scaling.
This reverts commit 8aaa81a166.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As the watermark registers aren't double bufferd, clearing the
watermarks immediately after writing the sprite registers can be
hazardous.
Until we have something better, add a wait for vblank between the
two steps to make sure the sprite no longer needs the watermark
levels before we clear them.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When color keying is used, the primary may not be invisible even though
the sprite fully covers it. So check for color keying before deciding to
disable the primary plane.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We now have a very clear method of disabling LP1+ wartermarks,
and we can actually detect if we actually did disable them, or
if they were already disabled. Use that to clean up the
WaCxSRDisabledForSpriteScaling:ivb handling.
I was hoping to apply the workaround in a way that wouldn't
require a blocking wait, but sadly IVB really does appear to
require LP1+ watermarks to be off for an entire frame before
enabling sprite scaling. Simply disabling LP1+ watermarks
during the previous frame is not enough, no matter how early
in the frame we do it :(
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The new HSW watermark code can now handle ILK/SNB/IVB as well, so
switch them over. Kill the old code.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>