Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code. And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.
Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
to do better. If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.
In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has. For modern CPU's that don't destroy the FPU contents on save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.
One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.
https://launchpad.net/bugs/926292
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: <stable@vger.kernel.org>
statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.
If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.
https://launchpad.net/bugs/885744
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
This creates three helper functions that do the TS_USEDFPU accesses, and
makes everybody that used to do it by hand use those helpers instead.
In addition, there's a couple of helper functions for the "change both
CR0.TS and TS_USEDFPU at the same time" case, and the places that do
that together have been changed to use those. That means that we have
fewer random places that open-code this situation.
The intent is partly to clarify the code without actually changing any
semantics yet (since we clearly still have some hard to reproduce bug in
this area), but also to make it much easier to use another approach
entirely to caching the CR0.TS bit for software accesses.
Right now we use a bit in the thread-info 'status' variable (this patch
does not change that), but we might want to make it a full field of its
own or even make it a per-cpu variable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
it. By moving it into the callers, we always do the TS_USEDFPU next to
the CR0.TS accesses in the source code, and it's much easier to see how
the two go hand in hand.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5b1cbac377 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.
However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.
Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.
This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid. With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.
There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.
However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.
Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.
Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clearing a range's bits is different with setting them, since we don't
need to touch them when states do not contain bits we want.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
When I ran xfstests circularly on a auto-defragment btrfs, the deadlock
happened.
Steps to reproduce:
[tty0]
# export MOUNT_OPTIONS="-o autodefrag"
# export TEST_DEV=<partition1>
# export TEST_DIR=<mountpoint1>
# export SCRATCH_DEV=<partition2>
# export SCRATCH_MNT=<mountpoint2>
# while [ 1 ]
> do
> ./check 091 127 263
> sleep 1
> done
[tty1]
# while [ 1 ]
> do
> echo 3 > /proc/sys/vm/drop_caches
> done
Several hours later, the test processes will hang on, and the deadlock will
happen on page lock.
The reason is that:
Auto defrag task Flush thread Test task
btrfs_writepages()
add ordered extent
(including page 1, 2)
set page 1 writeback
set page 2 writeback
endio_fn()
end page 2 writeback
release page 2
lock page 1
alloc and lock page 2
page 2 is not uptodate
btrfs_readpage()
start ordered extent()
btrfs_writepages()
try to lock page 1
so deadlock happens.
Fix this bug by unlocking the page which is in writeback, and re-locking it
after the writeback end.
Signed-off-by: Miao Xie <miax@cn.fujitsu.com>
The bitmap introduced in the commit [527e4d73: ALSA: hda/realtek - Fix
missing volume controls with ALC260] is too narrow for some codecs,
which may have more NIDs than 0x20, thus it may overflow the bitmap
array on them.
Just double the number to cover all and also add a sanity-check code
to be safer.
Cc: <stable@kernel.org> [v3.2+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
arch/arm/mach-pxa/pxa25x.c and arch/arm/mach-pxa/saarb.c
included 'linux/gpio.h' twice, remove the duplicates.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
arch/arm/mach-mmp/: some files include some headers twice:
- arch/arm/mach-mmp/aspenite.c and
arch/arm/mach-mmp/tavorevb.c: 'linux/gpio.h'
- arch/arm/mach-mmp/pxa168.c: 'linux/platform_device.h'
Remove the duplicates.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
pl061 uses same routines for suspend/freeze/poweroff and resume/thaw/restore.
We are only saving and restoring register values on these routines.
During hibernation, in freeze() we take a snapshot of gpio registers. In thaw()
we don't actually need to restore these registers, as power was never shut down
till now. Similarly, in poweroff() we don't need to take snapshot of these
registers again, as it was done during freeze() and by now the image is already
saved on disk.
This patch passes poweroff() and thaw() routines as NULL to avoid this extra
work done.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch converts Microblaze to use the irq_domain remapper and get
away from hard coding the offset between hwirq number and the linux irq
number space. This also paves the way for multiple interrupt controllers.
v2: Don't enable SPARSE_IRQ and keep NR_IRQS set to 33
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: John Williams <john.williams@petalogix.com>
Cc: John Linn <john.linn@xilinx.com>
This patch converts a number of the powerpc drivers to use the common library
of irq_domain xlate functions, dropping a bunch of lines in the process.
v5: - Remove tsi108 changes from patch
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
Make all the irq_domain_ops structures in powerpc 'static const'
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
The c6x irq controllers don't need to define custom .xlate hooks
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
The C6X IRQ support was copied almost verbatim from the PowerPC virtual IRQ
code. The PowerPC code was used as the basis for generic irq_domain support,
so this patch mostly copies what what done to arch/powerpc by Grant Likely
in his irq_domain patch series.
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Make irq_domain_ops pointer a constant to make it safer for multiple
instances to share the same ops pointer and change the irq_domain code
so that it does not modify the ops.
v4: Fix mismatched type reference in powerpc code
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
Rather than having each interrupt controller driver creating its own barely
unique .xlate function for irq_domain, create a library of translators which
any driver can use directly.
v5: - Remove irq_domain_xlate_pci(). It was incorrect.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
irq_domain_add_simple() was a stop-gap measure until complete irq_domain
support was complete. This patch removes the irq_domain_add_simple()
interface.
This patch also drops the explicit irq_domain initialization performed
by the mach-versatile code because the versatile interrupt controller
already has irq_domain support built into it. This was a bug that was
hanging around quietly for a while, but with the full irq_domain which
actually verifies that irq_domain ranges are available it would cause
the registration to fail and the system wouldn't boot.
v4: Fixed number of irqs in mx5 gpio code
v2: Updated to pass in host_data pointer on irq_domain allocation.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Cc: Russell King <linux@arm.linux.org.uk>
Tested-by: Olof Johansson <olof@lixom.net>
This patch removes the simplistic implementation of irq_domains and enables
the powerpc infrastructure for all irq_domain users. The powerpc
infrastructure includes support for complex mappings between Linux and
hardware irq numbers, and can manage allocation of irq_descs.
This patch also converts the few users of irq_domain_add()/irq_domain_del()
to call irq_domain_add_legacy() instead.
v3: Fix bug that set up too many irqs in translation range.
v2: Fix removal of irq_alloc_descs() call in gic driver
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
TWL4030 does handle 3 different interrupts ranges: 8 for the core, 8 for
the power events and 18 for the GPIOs.
Change the total number of interrupts managed by TWL4030 from 8 to 34.
Signed-off-by: Benoit Cousson <b-cousson@ti.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
As the title says, this patch adds empty implementations for the address
translation functions so that they can be used when CONFIG_OF is disabled.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Add support for a legacy mapping where irq = (hwirq - first_hwirq + first_irq)
so that a controller driver can allocate a fixed range of irq_descs and use
a simple calculation to translate back and forth between linux and hw irq
numbers. This is needed to use an irq_domain with many of the ARM interrupt
controller drivers that manage their own irq_desc allocations. Ultimately
the goal is to migrate those drivers to use the linear revmap, but doing it
this way allows each driver to be converted separately which makes the
migration path easier.
This patch generalizes the IRQ_DOMAIN_MAP_LEGACY method to use
(first_irq-first_hwirq) as the offset between hwirq and linux irq number,
and adds checks to make sure that the hwirq number does not exceed range
assigned to the controller.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
Each revmap type has different arguments for setting up the revmap.
This patch splits up the generator functions so that each revmap type
can do its own setup and the user doesn't need to keep track of how
each revmap type handles the arguments.
This patch also adds a host_data argument to the generators. There are
cases where the host_data pointer will be needed before the function returns.
ie. the legacy map calls the .map callback for each irq before returning.
v2: - Add void *host_data argument to irq_domain_add_*() functions
- fixed failure to compile
- Moved IRQ_DOMAIN_MAP_* defines into irqdomain.c
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
No functional changes. Replaces non-exported references to 'host' with domain.
Does not change any symbol names referenced by other .c files.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
zero always means no irq when using irq domains. Get rid of the NO_IRQ
references.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
this function ins't needed anymore.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
with vblank_refcount = 1, there was the case that drm_vblank_put
is called by specific page flip function so this patch fixes the
issue.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
basically, all crtcs are possible to clone each other.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
if one process is terminated by ctrl-c while two processes are
using pageflip feature then for last pageflip event,
user can't get poll from kernel side so this patch fixes the problem.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyoungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch only moves the code. It doesn't make any changes, and the
code is still only compiled for powerpc. Follow-on patches will generalize
the code for other architectures.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
There is only one user, and it is trivial to open-code.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
perf on POWER stopped working after commit e050e3f0a7 (perf: Fix
broken interrupt rate throttling). That patch exposed a bug in
the POWER perf_events code.
Since the PMCs count upwards and take an exception when the top bit
is set, we want to write 0x80000000 - left in power_pmu_start. We were
instead programming in left which effectively disables the counter
until we eventually hit 0x80000000. This could take seconds or longer.
With the patch applied I get the expected number of samples:
SAMPLE events: 9948
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
Program Check exceptions are the result of WARNs, BUGs, some
type of breakpoints, kprobe, and other illegal instructions.
We want interrupts (and thus preemption) to remain disabled
while doing the initial stage of testing the reason and
branching off to a debugger or kprobe, so we are still on
the original CPU which makes debugging easier in various cases.
This is how the code was intended, hence the local_irq_enable()
right in the middle of program_check_exception().
However, the assembly exception prologue for that exception was
incorrectly marked as enabling interrupts, which defeats that
(and records a redundant enable with lockdep).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since we are heading towards removing the Legacy iSeries platform, start
by no longer building it for ppc64_defconfig.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Upstream changes to the way PHB resources are registered
broke the resource fixup for FSL boards.
We can no longer rely on the resource pointer array for the PHB's
pci_bus structure, so let's leave it alone and go straight for
the PHB resources instead. This also makes the code generally
more readable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A kernel oops/panic prints an instruction dump showing several
instructions before and after the instruction which caused the
oops/panic.
The code intended that the faulting instruction be enclosed in angle
brackets, however a bug caused the faulting instruction to be
interpreted by printk() as the message log level.
To fix this, the KERN_CONT log level is added before the actual text of
the printed message.
=== Before the patch ===
[ 1081.587266] Instruction dump:
[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
[ 1081.602500] 4e800020 3803ffd0 2b800009
<4>[ 1081.587266] Instruction dump:
<4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
<4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000
<98090000>[ 1081.602500] 4e800020 3803ffd0 2b800009
=== After the patch ===
[ 51.385216] Instruction dump:
[ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
[ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
<4>[ 51.385216] Instruction dump:
<4>[ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001
<4>[ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.
The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.
Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>