8 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  Gideon Israel Dsouza | 52f5684c8e | kernel: use macros from compiler.h instead of __attribute__((...)) To increase compiler portability there is <linux/compiler.h> which provides convenience macros for various gcc constructs. Eg: __weak for __attribute__((weak)). I've replaced all instances of gcc attributes with the right macro in the kernel subsystem. Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> | ||
|  Fernando Luis Vazquez Cao | 96b3d28bf4 | sched/clock: Prevent tracing recursion in sched_clock_cpu() Prevent tracing of preempt_disable/enable() in sched_clock_cpu().
When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are
traced and this causes trace_clock() users (and probably others) to
go into an infinite recursion. Systems with a stable sched_clock()
are not affected.
This problem is similar to that fixed by upstream commit  | ||
|  Peter Zijlstra | d375b4e0fa | sched/clock: Fixup early initialization The code would assume sched_clock_stable() and switch to !stable
later, this switch brings a discontinuity in time.
The discontinuity on switching from stable to unstable was always
present, but previously we would set stable/unstable before
initializing TSC and usually stick to the one we start out with.
So the static_key bits brought an extra switch where there previously
wasn't one.
Things are further complicated by the fact that we cannot use
static_key as early as we usually call set_sched_clock_stable().
Fix things by tracking the stable state in a regular variable and only
set the static_key to the right state on sched_clock_init(), which is
ran right after late_time_init->tsc_init().
Before this we would not be using the TSC anyway.
Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: dyoung@redhat.com
Fixes:  | ||
|  Peter Zijlstra | 6577e42a3e | sched/clock: Fix up clear_sched_clock_stable() The below tells us the static_key conversion has a problem; since the exact point of clearing that flag isn't too important, delay the flip and use a workqueue to process it. [ ] TSC synchronization [CPU#0 -> CPU#22]: [ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock. [ ] [ ] ====================================================== [ ] [ INFO: possible circular locking dependency detected ] [ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted [ ] ------------------------------------------------------- [ ] swapper/0/1 is trying to acquire lock: [ ] (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [ ] but task is already holding lock: [ ] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60 [ ] [ ] which lock already depends on the new lock. [ ] [ ] [ ] the existing dependency chain (in reverse order) is: [ ] [ ] -> #1 (cpu_hotplug.lock){+.+.+.}: [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60 [ ] [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130 [ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80 [ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0 [ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0 [ ] [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100 [ ] [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30 [ ] [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7 [ ] [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f [ ] [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c [ ] [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe [ ] [ ] -> #0 (jump_label_mutex){+.+...}: [ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0 [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [ ] other info that might help us debug this: [ ] [ ] Possible unsafe locking scenario: [ ] [ ] CPU0 CPU1 [ ] ---- ---- [ ] lock(cpu_hotplug.lock); [ ] lock(jump_label_mutex); [ ] lock(cpu_hotplug.lock); [ ] lock(jump_label_mutex); [ ] [ ] *** DEADLOCK *** [ ] [ ] 2 locks held by swapper/0/1: [ ] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20 [ ] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60 [ ] [ ] stack backtrace: [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ ] ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270 [ ] ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0 [ ] ffff880236858678 0000000000000001 0000000000000002 ffff880236858000 [ ] Call Trace: [ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a [ ] [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207 [ ] [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0 [ ] [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0 [ ] [<ffffffff810def00>] lock_acquire+0x90/0x130 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0 [ ] [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20 [ ] [<ffffffff8115a637>] jump_label_lock+0x17/0x20 [ ] [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] ------------[ cut here ]------------ [ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300() [ ] Modules linked in: [ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 [ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010 [ ] 0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000 [ ] ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000 [ ] ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28 [ ] Call Trace: [ ] [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a [ ] [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20 [ ] [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff8110ba96>] smp_call_function+0x46/0x80 [ ] [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30 [ ] [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0 [ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20 [ ] [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0 [ ] [<ffffffff8104f964>] text_poke_bp+0x64/0xd0 [ ] [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20 [ ] [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130 [ ] [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80 [ ] [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0 [ ] [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0 [ ] [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20 [ ] [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70 [ ] [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150 [ ] [<ffffffff81076612>] native_cpu_up+0x3a2/0x890 [ ] [<ffffffff810941cb>] _cpu_up+0xdb/0x160 [ ] [<ffffffff810942c9>] cpu_up+0x79/0x90 [ ] [<ffffffff81d0af6b>] smp_init+0x60/0x8c [ ] [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] [<ffffffff8164e32e>] kernel_init+0xe/0x130 [ ] [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0 [ ] [<ffffffff8164e320>] ? rest_init+0xd0/0xd0 [ ] ---[ end trace 6ff1df5620c49d26 ]--- [ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> | ||
|  Peter Zijlstra | 35af99e646 | sched/clock, x86: Use a static_key for sched_clock_stable In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.
Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.
                        MAINLINE   PRE       POST
    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     221876    215295
    (cold) local_clock: 301773     234692    220773
    (warm) sched_clock: 38375      25602     25659
    (warm) local_clock: 100371     33265     27242
    (warm) rdtsc:       27340      24214     24208
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     235941    237019
    (cold) local_clock: 396890     297017    294819
    (warm) sched_clock: 38194      25233     25609
    (warm) local_clock: 143452     71234     71232
    (warm) rdtsc:       27345      24245     24243
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org> | ||
|  Peter Zijlstra | ef08f0fff8 | sched/clock: Remove local_irq_disable() from the clocks Now that x86 no longer requires IRQs disabled for sched_clock() and
ia64 never had this requirement (it doesn't seem to do cpufreq at
all), we can remove the requirement of disabling IRQs.
                        MAINLINE   PRE        POST
    sched_clock_stable: 1          1          1
    (cold) sched_clock: 329841     257223     221876
    (cold) local_clock: 301773     309889     234692
    (warm) sched_clock: 38375      25280      25602
    (warm) local_clock: 100371     85268      33265
    (warm) rdtsc:       27340      24247      24214
    sched_clock_stable: 0          0          0
    (cold) sched_clock: 382634     301224     235941
    (cold) local_clock: 396890     399870     297017
    (warm) sched_clock: 38194      25630      25233
    (warm) local_clock: 143452     129629     71234
    (warm) rdtsc:       27345      24307      24245
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-36e5kohiasnr106d077mgubp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org> | ||
|  Thomas Gleixner | a1cbcaa9ea | sched_clock: Prevent 64bit inatomicity on 32bit systems The sched_clock_remote() implementation has the following inatomicity problem on 32bit systems when accessing the remote scd->clock, which is a 64bit value. CPU0 CPU1 sched_clock_local() sched_clock_remote(CPU0) ... remote_clock = scd[CPU0]->clock read_low32bit(scd[CPU0]->clock) cmpxchg64(scd->clock,...) read_high32bit(scd[CPU0]->clock) While the update of scd->clock is using an atomic64 mechanism, the readout on the remote cpu is not, which can cause completely bogus readouts. It is a quite rare problem, because it requires the update to hit the narrow race window between the low/high readout and the update must go across the 32bit boundary. The resulting misbehaviour is, that CPU1 will see the sched_clock on CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value to this bogus timestamp. This stays that way due to the clamping implementation for about 4 seconds until the synchronization with CLOCK_MONOTONIC undoes the problem. The issue is hard to observe, because it might only result in a less accurate SCHED_OTHER timeslicing behaviour. To create observable damage on realtime scheduling classes, it is necessary that the bogus update of CPU1 sched_clock happens in the context of an realtime thread, which then gets charged 4 seconds of RT runtime, which results in the RT throttler mechanism to trigger and prevent scheduling of RT tasks for a little less than 4 seconds. So this is quite unlikely as well. The issue was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely, but the following trace recorded with trace_clock=global, which uses sched_clock_local(), gave the final hint: <idle>-0 0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80 <idle>-0 0d..30 400269.477151: hrtimer_start: hrtimer=0xf7061e80 ... irq/20-S-587 1d..32 400273.772118: sched_wakeup: comm= ... target_cpu=0 <idle>-0 0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80 What happens is that CPU0 goes idle and invokes sched_clock_idle_sleep_event() which invokes sched_clock_local() and CPU1 runs a remote wakeup for CPU0 at the same time, which invokes sched_remote_clock(). The time jump gets propagated to CPU0 via sched_remote_clock() and stays stale on both cores for ~4 seconds. There are only two other possibilities, which could cause a stale sched clock: 1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic wrong value. 2) sched_clock() which reads the TSC returns a sporadic wrong value. #1 can be excluded because sched_clock would continue to increase for one jiffy and then go stale. #2 can be excluded because it would not make the clock jump forward. It would just result in a stale sched_clock for one jiffy. After quite some brain twisting and finding the same pattern on other traces, sched_clock_remote() remained the only place which could cause such a problem and as explained above it's indeed racy on 32bit systems. So while on 64bit systems the readout is atomic, we need to verify the remote readout on 32bit machines. We need to protect the local->clock readout in sched_clock_remote() on 32bit as well because an NMI could hit between the low and the high readout, call sched_clock_local() and modify local->clock. Thanks to Siegfried Wulsch for bearing with my debug requests and going through the tedious tasks of running a bunch of reproducer systems to generate the debug information which let me decode the issue. Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org | ||
|  Peter Zijlstra | 391e43da79 | sched: Move all scheduler bits into kernel/sched/ There's too many sched*.[ch] files in kernel/, give them their own directory. (No code changed, other than Makefile glue added.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> | 
		Renamed from kernel/sched_clock.c (Browse further)