Mapping from a struct timecounter to a time returned by functions like
ktime_get_real() is implemented. This is sufficient to use this code
in a network device driver which wants to support hardware time
stamping and transformation of hardware time stamps to system time.
The interface could have been made more versatile by not depending on
a time counter, but this wasn't done to avoid writing glue code
elsewhere.
The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far struct clocksource acted as the interface between time/timekeeping.c
and hardware. This patch generalizes the concept so that a similar
interface can also be used in other contexts. For that it introduces
new structures and related functions *without* touching the existing
struct clocksource.
The reasons for adding these new structures to clocksource.[ch] are
* the APIs are clearly related
* struct clocksource could be cleaned up to use the new structs
* avoids proliferation of files with similar names (timesource.h?
timecounter.h?)
As outlined in the discussion with John Stultz, this patch adds
* struct cyclecounter: stateless API to hardware which counts clock cycles
* struct timecounter: stateful utility code built on a cyclecounter which
provides a nanosecond counter
* only the function to read the nanosecond counter; deltas are used internally
and not exposed to users of timecounter
The code does no locking of the shared state. It must be called at least
as often as the cycle counter wraps around to detect these wrap arounds.
Both is the responsibility of the timecounter user.
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: fix build error
to fix:
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Base versions handle constant folding now. For headers exposed to
userspace, we must only expose the __ prefixed versions.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use cpu_to_le32 directly as it handles constant folding now, replace direct
uses of __constant_cpu_to_{endian} as well.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.
For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Kconfig symbols are not available in userspace, and are not stripped by
headers-install. Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.
Signed-off-by: Avi Kivity <avi@redhat.com>
s/HELD_OVER/ENABLED/g
so that its similar to the hard and soft-irq names.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
s/\(LOCKF\?_ENABLED_[^ ]*\)S\(_READ\)\?\>/\1\2/g
So that the USED_IN and ENABLED have the same names.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Here is another version, with the incremental patch rolled up, and
added reclaim context annotation to kswapd, and allocation tracing
to slab allocators (which may only ever reach the page allocator
in rare cases, so it is good to put annotations here too).
Haven't tested this version as such, but it should be getting closer
to merge worthy ;)
--
After noticing some code in mm/filemap.c accidentally perform a __GFP_FS
allocation when it should not have been, I thought it might be a good idea to
try to catch this kind of thing with lockdep.
I coded up a little idea that seems to work. Unfortunately the system has to
actually be in __GFP_FS page reclaim, then take the lock, before it will mark
it. But at least that might still be some orders of magnitude more common
(and more debuggable) than an actual deadlock condition, so we have some
improvement I hope (the concept is no less complete than discovery of a lock's
interrupt contexts).
I guess we could even do the same thing with __GFP_IO (normal reclaim), and
even GFP_NOIO locks too... but filesystems will have the most locks and fiddly
code paths, so let's start there and see how it goes.
It *seems* to work. I did a quick test.
=================================
[ INFO: inconsistent lock state ]
2.6.28-rc6-00007-ged31348-dirty #26
---------------------------------
inconsistent {in-reclaim-W} -> {ov-reclaim-W} usage.
modprobe/8526 [HC0[0]:SC0[0]:HE1:SE1] takes:
(testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
{in-reclaim-W} state was registered at:
[<ffffffff80267bdb>] __lock_acquire+0x75b/0x1a60
[<ffffffff80268f71>] lock_acquire+0x91/0xc0
[<ffffffff8070f0e1>] mutex_lock_nested+0xb1/0x310
[<ffffffffa002002b>] brd_init+0x2b/0x216 [brd]
[<ffffffff8020903b>] _stext+0x3b/0x170
[<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
[<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 3929
hardirqs last enabled at (3929): [<ffffffff8070f2b5>] mutex_lock_nested+0x285/0x310
hardirqs last disabled at (3928): [<ffffffff8070f089>] mutex_lock_nested+0x59/0x310
softirqs last enabled at (3732): [<ffffffff8061f623>] sk_filter+0x83/0xe0
softirqs last disabled at (3730): [<ffffffff8061f5b6>] sk_filter+0x16/0xe0
other info that might help us debug this:
1 lock held by modprobe/8526:
#0: (testlock){--..}, at: [<ffffffffa0020055>] brd_init+0x55/0x216 [brd]
stack backtrace:
Pid: 8526, comm: modprobe Not tainted 2.6.28-rc6-00007-ged31348-dirty #26
Call Trace:
[<ffffffff80265483>] print_usage_bug+0x193/0x1d0
[<ffffffff80266530>] mark_lock+0xaf0/0xca0
[<ffffffff80266735>] mark_held_locks+0x55/0xc0
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffff802667ca>] trace_reclaim_fs+0x2a/0x60
[<ffffffff80285005>] __alloc_pages_internal+0x475/0x580
[<ffffffff8070f29e>] ? mutex_lock_nested+0x26e/0x310
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffffa002006a>] brd_init+0x6a/0x216 [brd]
[<ffffffffa0020000>] ? brd_init+0x0/0x216 [brd]
[<ffffffff8020903b>] _stext+0x3b/0x170
[<ffffffff8070f8b9>] ? mutex_unlock+0x9/0x10
[<ffffffff8070f83d>] ? __mutex_unlock_slowpath+0x10d/0x180
[<ffffffff802669ec>] ? trace_hardirqs_on_caller+0x12c/0x190
[<ffffffff80272ebf>] sys_init_module+0xaf/0x1e0
[<ffffffff8020c3fb>] system_call_fastpath+0x16/0x1b
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This modifies the timer code in a way to allow lockdep to detect
deadlocks resulting from a lock being taken in the timer function
as well as around the del_timer_sync() call.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Add a more flexible BSS lookup function so that mac80211 or
other drivers can actually use this for getting the BSS to
connect to.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch introduces cfg80211_unlink_bss, a function to
allow a driver to remove a BSS from the internal list and
make it not show up in scan results any more -- this is
to be used when the driver detects that the BSS is no
longer available.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When cfg80211 users have their own allocated data in the per-BSS
private data, they will need to free this when the BSS struct is
destroyed. Add a free_priv method and fix one place where the BSS
was kfree'd rather than released properly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds basic scan capability to cfg80211/nl80211 and
changes mac80211 to use it. The BSS list that cfg80211 maintains
is made driver-accessible with a private area in each BSS struct,
but mac80211 doesn't yet use it. That's another large project.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The atomic requirement for the TSF callbacks
is outdated. get_tsf() is only called by
ieee80211_rx_bss_info() which is indirectly
called by the work queue ieee80211_sta_work().
In the same context are called several other
non-atomic functions, too.
And the atomic requirement causes problems
for drivers of USB wifi cards.
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ASoC: Only register AC97 bus if it's not done already
ALSA: hda - Add snd_hda_multi_out_dig_cleanup()
ALSA: hda - Add missing terminator in slave dig-out array
ALSA: hda - Change HP dv7 (103c:30f4) quirk from hp-m4 to hp-dv5 model
ALSA: hda - Register (new) devices at reconfig
ALSA: mtpav - Fix initial value for input hwport
ALSA: hda - add id for Intel IbexPeak integrated HDMI codec
ALSA: hda - compute checksum in HDMI audio infoframe
ALSA: hda - enable HDMI audio pin out at module loading time
ALSA: hda - allow multi-channel HDMI audio playback when ELD is not present
ASoC: Update SDP3430 machine driver for snd_soc_card
ALSA: hda - Add quirk for Asus z37e (1043:8284)
sound: Remove OSSlib stuff from linux/soundcard.h
ASoC: WM8990: Fix kcontrol's private value use in put callback
ASoC: TLV320AIC3X: Fix kcontrol's private value use in put callback
Convert the c/p state "power" tracer to use tracepoints. Avoids a
function call when the tracer is disabled.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Jaswinder Singh Rajput reported that commit 23a185ca8a caused the
context switch and migration software counters to report zero always.
With that commit, the software counters only count events that occur
between sched-in and sched-out for a task. This is necessary for the
counter enable/disable prctls and ioctls to work. However, the
context switch and migration counts are incremented after sched-out
for one task and before sched-in for the next. Since the increment
doesn't occur while a task is scheduled in (as far as the software
counters are concerned) it doesn't count towards any counter.
Thus the context switch and migration counters need to count events
that occur at any time, provided the counter is enabled, not just
those that occur while the task is scheduled in (from the perf_counter
subsystem's point of view). The problem though is that the software
counter code can't tell the difference between being enabled and being
scheduled in, and between being disabled and being scheduled out,
since we use the one pair of enable/disable entry points for both.
That is, the high-level disable operation simply arranges for the
counter to not be scheduled in any more, and the high-level enable
operation arranges for it to be scheduled in again.
One way to solve this would be to have sched_in/out operations in the
hw_perf_counter_ops struct as well as enable/disable. However, this
takes a simpler approach: it adds a 'prev_state' field to the
perf_counter struct that allows a counter's enable method to know
whether the counter was previously disabled or just inactive
(scheduled out), and therefore whether the enable method is being
called as a result of a high-level enable or a schedule-in operation.
This then allows the context switch, migration and page fault counters
to reset their hw.prev_count value in their enable functions only if
they are called as a result of a high-level enable operation.
Although page faults would normally only occur while the counter is
scheduled in, this changes the page fault counter code too in case
there are ever circumstances where page faults get counted against a
task while its counters are not scheduled in.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
wimax: fix oops in wimax_dev_get_by_genl_info() when looking up non-wimax iface
net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2
netxen: fix compile waring "label ‘set_32_bit_mask’ defined but not used" on IA64 platform
bnx2: Update version to 1.9.2 and copyright.
bnx2: Fix jumbo frames error handling.
bnx2: Update 5709 firmware.
bnx2: Update 5706/5708 firmware.
3c505: do not set pcb->data.raw beyond its size
Documentation/connector/cn_test.c: don't use gfp_any()
net: don't use in_atomic() in gfp_any()
IRDA: cnt is off by 1
netxen: remove pcie workaround
sun3: print when lance_open() fails
qlge: bugfix: Add missing rx buf clean index on early exit.
qlge: bugfix: Fix RX scaling values.
qlge: bugfix: Fix TSO breakage.
qlge: bugfix: Add missing dev_kfree_skb_any() call.
qlge: bugfix: Add missing put_page() call.
qlge: bugfix: Fix fatal error recovery hang.
qlge: bugfix: Use netif_receive_skb() and vlan_hwaccel_receive_skb().
...
The problem is that in_atomic() will return false inside spinlocks if
CONFIG_PREEMPT=n. This will lead to deadlockable GFP_KERNEL allocations
from spinlocked regions.
Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
will instead use GFP_ATOMIC from this callsite. Hence we won't get the
might_sleep() debugging warnings which would have informed us of the buggy
callsites.
Solve both these problems by switching to in_interrupt(). Now, if someone
runs a gfp_any() allocation from inside spinlock we will get the warning
if CONFIG_PREEMPT=y.
I reviewed all callsites and most of them were too complex for my little
brain and none of them documented their interface requirements. I have no
idea what this patch will do.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: avoid corruption in system time accounting
Martin Schwidefsky told me that there was an issue with NMIs and
system accounting. The problem is that the accounting code is
not reentrant, and if an NMI goes off after an interrupt it can
corrupt the accounting.
For now, the best we can do is to treat NMIs like SMIs and they
are not accounted for.
This patch changes nmi_enter to not call __irq_enter and to do
the preempt-count and tracing calls directly.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
To add a bit in the preempt_count to be set when in NMI context, we
found that some archs did not have enough bits to spare. This is
due to the hardirq_count being a mask that can hold NR_IRQS.
Some archs allow for over 16000 IRQs, and that would require a mask
of 14 bits. The sofitrq mask is 8 bits and the preempt disable mask
is also 8 bits. The PREEMP_ACTIVE bit is bit 30, and bit 31 would
make the preempt_count (which is type int) a negative number.
A negative preempt_count is a sign of failure.
Add them up 14+8+8+1+1 you get 32 bits. No room for the NMI bit.
But the hardirq_count is to track the number of nested IRQs, not
the number of total IRQs. This originally took the paranoid approach
of setting the max nesting to NR_IRQS. But when we have archs with
over 1000 IRQs, it is not practical to think they will ever all
nest on a single CPU. Not to mention that this would most definitely
cause a stack overflow.
This patch sets a max of 10 bits to be used for IRQ nesting.
I did a 'git grep HARDIRQ' to examine all users of HARDIRQ_BITS and
HARDIRQ_MASK, and found that making it a max of 10 would not hurt
anyone. I did find that the m68k expected it to be 8 bits, so
I allow for the archs to set the number to be less than 10.
I removed the setting of HARDIRQ_BITS from the archs that set it
to more than 10. This includes ALPHA, ia64 and avr32.
This will always allow room for the NMI bit, and if we need to allow
for NMI nesting, we have 4 bits to play with.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This patch allows LSM modules to determine whether current process is in an
execve operation or not so that they can behave differently while an execve
operation is in progress.
This patch is needed by TOMOYO. Please see another patch titled "LSM adapter
functions." for backgrounds.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Some of the kerneldoc comments in the dmaengine header describe
already removed structure members. Remove them.
Also add a short description for dma_device->device_is_tx_complete.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Based on discussions on linux-audit, as per Steve Grubb's request
http://lkml.org/lkml/2009/2/6/269, the following changes were made:
- forced audit result to be either 0 or 1.
- made template names const
- Added new stand-alone message type: AUDIT_INTEGRITY_RULE
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
With the new system call defines we get this on uml:
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x308): undefined reference to `sys_sigprocmask'
Reason for this is that uml passes the preprocessor option
-Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
call named sys_kernel_sigprocmask. However sys_sigprocmask is missing
because of this.
To avoid macro expansion for the system call name just concatenate the
name at first define instead of carrying it through severel levels.
This was pointed out by Al Viro.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I enabled all cgroup subsystems when compiling kernel, and then:
# mount -t cgroup -o net_cls xxx /mnt
# mkdir /mnt/0
This showed up immediately:
BUG: MAX_LOCKDEP_SUBCLASSES too low!
turning off the locking correctness validator.
It's caused by the cgroup hierarchy lock:
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->root == root)
mutex_lock_nested(&ss->hierarchy_mutex, i);
}
Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
MAX_LOCKDEP_SUBCLASSES is 8.
This patch uses different lockdep keys for different subsystems.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: fix TIMER_ABSTIME for process wide cpu timers
timers: split process wide cpu clocks/timers, fix
x86: clean up hpet timer reinit
timers: split process wide cpu clocks/timers, remove spurious warning
timers: split process wide cpu clocks/timers
signal: re-add dead task accumulation stats.
x86: fix hpet timer reinit for x86_64
sched: fix nohz load balancer on cpu offline
Ptrace_detach() races with __ptrace_unlink() if the traced task is
reaped while detaching. This might cause a double-free of the BTS
buffer.
Change the ptrace_detach() path to only do the memory accounting in
ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
which will be called from __ptrace_unlink().
The fix follows a proposal from Oleg Nesterov.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The POSIX timer interface allows for absolute time expiry values through the
TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
every time we start it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>