Conflicts:
arch/powerpc/kernel/time.c
Reason: The powerpc next tree contains two commits which conflict with
the timekeeping changes:
8fd63a9e powerpc: Rework VDSO gettimeofday to prevent time going backwards
c1aa687d powerpc: Clean up obsolete code relating to decrementer and timebase
John Stultz identified them and provided the conflict resolution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since the decrementer and timekeeping code was moved over to using
the generic clockevents and timekeeping infrastructure, several
variables and functions have been obsolete and effectively unused.
This deletes them.
In particular, wakeup_decrementer() is no longer needed since the
generic code reprograms the decrementer as part of the process of
resuming the timekeeping code, which happens during sysdev resume.
Thus the wakeup_decrementer calls in the suspend_enter methods for
52xx platforms have been removed. The call in the powermac cpu
frequency change code has been replaced by set_dec(1), which will
cause a timer interrupt as soon as interrupts are enabled, and the
generic code will then reprogram the decrementer with the correct
value.
This also simplifies the generic_suspend_en/disable_irqs functions
and makes them static since they are not referenced outside time.c.
The preempt_enable/disable calls are removed because the generic
code has disabled all but the boot cpu at the point where these
functions are called, so we can't be moved to another cpu.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently it is possible for userspace to see the result of
gettimeofday() going backwards by 1 microsecond, assuming that
userspace is using the gettimeofday() in the VDSO. The VDSO
gettimeofday() algorithm computes the time in "xsecs", which are
units of 2^-20 seconds, or approximately 0.954 microseconds,
using the algorithm
now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec
and then converts the time in xsecs to seconds and microseconds.
The kernel updates the tb_orig_stamp and stamp_xsec values every
tick in update_vsyscall(). If the length of the tick is not an
integer number of xsecs, then some precision is lost in converting
the current time to xsecs. For example, with CONFIG_HZ=1000, the
tick is 1ms long, which is 1048.576 xsecs. That means that
stamp_xsec will advance by either 1048 or 1049 on each tick.
With the right conditions, it is possible for userspace to get
(timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
slightly late in updating the vdso_datapage, and then for stamp_xsec
to advance by 1048 when the kernel does update it, and for userspace
to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
integer truncation. The result is that time appears to go backwards
by 1 microsecond.
To fix this we change the VDSO gettimeofday to use a new field in the
VDSO datapage which stores the nanoseconds part of the time as a
fractional number of seconds in a 0.32 binary fraction format.
(Or put another way, as a 32-bit number in units of 0.23283 ns.)
This is convenient because we can use the mulhwu instruction to
convert it to either microseconds or nanoseconds.
Since it turns out that computing the time of day using this new field
is simpler than either using stamp_xsec (as gettimeofday does) or
stamp_xtime.tv_nsec (as clock_gettime does), this converts both
gettimeofday and clock_gettime to use the new field. The existing
__do_get_tspec function is converted to use the new field and take
a parameter in r7 that indicates the desired resolution, 1,000,000
for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec
function is then unused and is deleted.
The new algorithm is
now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
+ (stamp_xtime_seconds << 32) + stamp_sec_fraction
with 'now' in units of 2^-32 seconds. That is then converted to
seconds and either microseconds or nanoseconds with
seconds = now >> 32
partseconds = ((now & 0xffffffff) * resolution) >> 32
The 32-bit VDSO code also makes a further simplification: it ignores
the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
fraction. Doing so gets rid of 4 multiply instructions. Assuming
a timebase frequency of 1GHz or less and an update interval of no
more than 10ms, the upper 32 bits of tb_to_xs will be at least
4503599, so the error from ignoring the low 32 bits will be at most
2.2ns, which is more than an order of magnitude less than the time
taken to do gettimeofday or clock_gettime on our fastest processors,
so there is no possibility of seeing inconsistent values due to this.
This also moves update_gtod() down next to its only caller, and makes
update_vsyscall use the time passed in via the wall_time argument rather
than accessing xtime directly. At present, wall_time always points to
xtime, but that could change in future.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Try to enable data division support for FCP devices and indicate in
the adapter status flag if it succeeded.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Per the da850/omap-l138 Beta EVM SOM schematic, the DEFDCDC2 and
DEFDCDC3 lines are tied high. This leads to a 3.3V IO and 1.2V CVDD
voltage.
Pass the right platform data to the TPS6507x driver so it can operate
on the DEFDCDC{2,3}_HIGH register to read and change voltage levels.
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
This patch simply declares the new sys_fanotify_mark syscall
int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask,
int dfd const char *pathname)
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch defines a new syscall fanotify_init() of the form:
int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)
This syscall is used to create and fanotify group. This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>
This is the soc_camera support developed by Sascha Hauer for the i.MX27. Alan
Carvalho de Assis modified the original driver to get it working on more recent
kernels. I modified it further to add support for i.MX25. This driver has been
tested on i.MX25 and i.MX27 based platforms.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
The GPIO registers need protection from concurrent access for operations that
are not atomic.
Cc: stable@kernel.org
Cc: Juergen Beisert <j.beisert@pengutronix.de>
Cc: Daniel Mack <daniel@caiaq.de>
Reported-by: rpkamiak@rockwellcollins.com
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Don't quote $nm in the script for checking the vdso for external
references. Doing so breaks multiword constructs, like using
CROSS_COMPILE='ccache '.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20100728134252.2e4c27cf.sfr@canb.auug.org.au>
Clean up and simplify set_64bit(). This code is quite old (1.3.11)
and contains a fair bit of auxilliary machinery that current versions
of gcc handle just fine automatically. Worse, the auxilliary
machinery can actually cause an unnecessary spill to memory.
Furthermore, the loading of the old value inside the loop in the
32-bit case is unnecessary: if the value doesn't match, the CMPXCHG8B
instruction will already have loaded the "new previous" value for us.
Clean up the comment, too, and remove page references to obsolete
versions of the Intel SDM.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@vger.kernel.org>
gcc 4.4.4 will complain if you use a .discard section for both text and
data ("causes a section type conflict"). Add support for ".discard.*"
sections, and use .discard.text for a dummy function in the x86
RESERVE_BRK() macro.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xchg() and cmpxchg() modify their memory operands, not merely read
them. For some versions of gcc the "memory" clobber has apparently
dealt with the situation, but not for all.
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Glauber Costa <glommer@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Palfrader <peter@palfrader.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Zachary Amsden <zamsden@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <4C4F7277.8050306@zytor.com>
I've been trying to avoid this for a long time ... but per-cpu space
has slowly been growing. Tejun has some patches in linux-next that
pre-reserve some space (PERCPU_DYNAMIC_EARLY_SIZE) for use before
slab comes online ... and this pushes ia64 above the 64K current
limit on static percpu space.
I could probably squeeze it back under (we are only over by 512 bytes).
But I don't think that I'll be able to squeeze it down enough to build
a comfortable breathing space - and I don't want to keep nibbling off
a dozen bytes here and there every time some generic code bumps us
back over the limit.
Next available supported page size is 256K ... so we have to quadruple
the available space - a bigger jump than I'd like. But perhaps it will
be enough to last a few more years before it needs to be increased again.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The etr events switch-to-local and sync-check disable the synchronous clock
and schedule a work queue that tries to get the clock back into sync.
If another switch-to-local or sync-check event occurs while the work queue
function etr_work_fn still runs the eacr.es bit and the clock_sync_word can
become inconsistent because check_sync_clock only uses the clock_sync_word
to determine if the clock is in sync or not. The second pass of the
etr_work_fn will reset the eacr.es bit but will leave the clock_sync_word
intact. Fix this race by moving the reset of the eacr.es bit into the
switch-to-local and sync-check functions and by checking the eacr.es bit
as well to decide if the clock needs to be synced.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case user space is single stepped (PER) the program check handler
claims too early that IRQs are enabled on the return path.
Subsequent checks will notice that the IRQ mask in the PSW and
what lockdep thinks the IRQ mask should be do not correlate and
therefore will print a warning to the console and disable lockdep.
Fix this by doing all the work within the correct context.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
functions.
We add the glue code that sets up a dma_ops structure with the
xen_swiotlb_* functions. The code turns on xen_swiotlb flag
when it detects it is running under Xen and it is either
in privileged mode or the iommu=soft flag was passed in.
It also disables the bare-metal SWIOTLB if the Xen-SWIOTLB has
been enabled.
Note: The Xen-SWIOTLB is only built when CONFIG_XEN is enabled.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Rather than trying to deal with aliases once they appear, just completely
inhibit them. Mostly the removal of aliases was managable, but it comes
unstuck in xen_create_contiguous_region() because it gets executed at
interrupt time (as a result of dma_alloc_coherent()), which causes all
sorts of confusion in the vmap code, as it was never intended to be run
in interrupt context.
This has the unfortunate side effect of removing all the unmap batching
the vmap code so carefully added, but that can't be helped.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add LPC32XX support in arch/arm/Kconfig and arch/arm/Makefile
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Platform support file for the PHY3250 mach id
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Misc support functions and prototypes used in the LPC32XX arch
and platforms
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
This patch exports the capability of the AMD IOMMU to force
cache coherency of DMA transactions through the IOMMU-API.
This is required to disable some nasty hacks in KVM when
this capability is not available.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
We should use perf_sample_data_init() to initialize struct
perf_sample_data. As explained in the description of commit dc1d628a
("perf: Provide generic perf_sample_data initialization"), it is
possible for userspace to get the kernel to dereference data.raw,
so if it is not initialized, that means that unprivileged userspace
can possibly oops the kernel. Using perf_sample_data_init makes sure
it gets initialized to NULL.
This conversion should have been included in commit dc1d628a, but it
got missed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This converts the most common of the x86 clocksources over to use
clocksource_register_hz/khz.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-11-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts the um arch to use read_persistent_clock().
This allows it to avoid accessing xtime and wall_to_monotonic
directly.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
LKML-Reference: <1279068988-21864-8-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
update_vsyscall() did not provide the wall_to_monotoinc offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This removes powerpc's direct xtime usage, allowing for further
generic timeekeping cleanups
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently powerpc's update_vsyscall calls an inline update_gtod.
However, both are straightforward, and there are no other users,
so this patch merges update_gtod into update_vsyscall.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Now that all arches have been converted over to use generic time via
clocksources or arch_gettimeoffset(), we can remove the GENERIC_TIME
config option and simplify the generic code.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-4-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Due to vtime calling vgettimeofday(), its possible that an application
could call time();create("stuff",O_RDRW); only to see the file's
creation timestamp to be before the value returned by time.
A similar way to reproduce the issue is to compare the vsyscall time()
with the syscall time(), and observe ordering issues.
The modified test case from Oleg Nesterov below can illustrate this:
int main(void)
{
time_t sec1,sec2;
do {
sec1 = time(&sec2);
sec2 = syscall(__NR_time, NULL);
} while (sec1 <= sec2);
printf("vtime: %d.000000\n", sec1);
printf("time: %d.000000\n", sec2);
return 0;
}
The proper fix is to make vtime use the same time value as
current_kernel_time() (which is exported via update_vsyscall) instead of
vgettime().
Thanks to Jiri Olsa for bringing up the issue and catching bugs in
earlier verisons of this fix.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Common drivers for the LPC32XX used on all platforms
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Clock driver for the LPC32XX architecture
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Debug (printascii) and IRQ handler macros for the LPC32XX
arch
Signed-off-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
x86 calls machine_shutdown() from the various machine_*() calls which
take the machine down ready for halting, restarting, etc, and uses
this to bring the system safely to a point where those actions can be
performed. Such actions are stopping the secondary CPUs.
So, change the ARM implementation of these to reflect what x86 does.
This solves kexec problems on ARM SMP platforms, where the secondary
CPUs were left running across the kexec call.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The TWD local timers are unable to wake up the CPU when it is placed
into a low power mode, eg. C3. Therefore, we need to adapt things
such that the TWD code can cope with this.
We do this by always providing a broadcast tick function, and marking
the fact that the TWD local timer will stop in low power modes. This
means that when the CPU is placed into a low power mode, the core
timer code marks this fact, and allows an IPI to be given to the core.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
All implementations of cpu_proc_fin() start by disabling interrupts
and then flush caches. Rather than have every processors proc_fin()
implementation do this, move it out into generic code - and move the
cache flush past setup_mm_for_reboot() (so it can benefit from having
caches still enabled.)
This allows cpu_proc_fin() to become independent of the L1/L2 cache
types, and eventually move the L2 cache flushing into the L2 support
code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Statuses 3 (0b00011) and 6 (0x00110) of DFSR are Access Flags faults on
ARMv6K and ARMv7. Let's patch fsr_info[] at runtime if we are on ARMv7
or later.
Unfortunately, we don't have runtime check for 'K' extension, so we
can't check for it.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On ARM one Linux PGD entry contains two hardware entries (see page
tables layout in pgtable.h). We normally guarantee that we always
fill both L1 entries. But create_mapping() doesn't follow the rule.
It can create inidividual L1 entries, so here we have to call
pmd_none() check in do_translation_fault() for the entry really
corresponded to address, not for the first of pair.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add one more parameter to hook_fault_code() to be able to set 'code'
field of struct fsr_info.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
POSIX specify to use signal SIGBUS with code BUS_ADRALN for invalid
address alignment.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
SPARSE_IRQ doesn't need to be a visible option, only those platforms
supporting that will select it.
Signed-off-by: Eric Miao <eric.miao@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The DMA coherent remap area is used to provide an uncached mapping
of memory for coherency with DMA engines. Currently, we look for
any free hole which our allocation will fit in with page alignment.
However, this can lead to fragmentation of the area, and allows small
allocations to cross L1 entry boundaries. This is undesirable as we
want to move towards allocating sections of memory.
Align allocations according to the size, limiting the alignment between
the page and section sizes.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>