linux-uconsole/kernel
Srivatsa S. Bhat 8f48f1a28e CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
commit d35be8bab9 upstream.

In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.

However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.

In order to achieve that, do the following:

1. Don't modify cpusets during suspend/resume. At all.
   In particular, don't move the tasks from one cpuset to another, and
   don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
   during the CPU hotplug operations that are carried out in the
   suspend/resume path.

2. However, cpusets and sched domains are related. We just want to avoid
   altering cpusets alone. So, to keep the sched domains updated, build
   a single sched domain (containing all active cpus) during each of the
   CPU hotplug operations carried out in s/r path, effectively ignoring
   the cpusets' cpus_allowed masks.

   (Since userspace is frozen while doing all this, it will go unnoticed.)

3. During the last CPU online operation during resume, build the sched
   domains by looking up the (unaltered) cpusets' cpus_allowed masks.
   That will bring back the system to the same original state as it was in
   before suspend.

Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13 05:28:15 +09:00
..
debug kgdb,debug_core: pass the breakpoint struct instead of address and memory 2012-04-13 08:14:07 -07:00
events perf_event: Switch to internal refcount, fix race with close() 2012-10-02 09:47:24 -07:00
gcov gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL 2011-06-15 20:04:01 -07:00
irq random: remove rand_initialize_irq() 2012-08-15 12:04:28 -07:00
power ftrace: Disable function tracing during suspend/resume and hibernation, again 2012-08-09 08:27:35 -07:00
time time: Move ktime_t overflow checking into timespec_valid_strict 2012-10-02 09:47:52 -07:00
trace tracing: change CPU ring buffer state from tracing_cpumask 2012-07-16 08:47:51 -07:00
.gitignore
acct.c
async.c Fix a dead loop in async_synchronize_full() 2012-10-02 09:47:41 -07:00
audit.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
audit_tree.c audit: fix refcounting in audit-tree 2012-09-14 10:00:38 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c audit: acquire creds selectively to reduce atomic op overhead 2011-04-27 15:11:03 +02:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Merge branch 'master' into next 2011-05-19 18:51:57 +10:00
cgroup.c cgroup: fix to allow mounting a hierarchy by name 2012-01-12 11:35:08 -08:00
cgroup_freezer.c cgroup_freezer: fix freezing groups with stopped tasks 2011-12-09 08:52:27 -08:00
compat.c compat: Fix RT signal mask corruption via sigprocmask 2012-05-21 09:40:04 -07:00
configs.c
cpu.c PM / Sleep: Fix race between CPU hotplug and freezer 2012-01-12 11:35:46 -08:00
cpuset.c CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume 2012-10-13 05:28:15 +09:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c cred: copy_process() should clear child->replacement_session_keyring 2012-04-13 08:14:08 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c sched: Fix ancient race in do_exit() 2012-10-02 09:47:55 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-08-01 12:27:20 -07:00
freezer.c Freezer: Use SMP barriers 2011-05-17 23:19:17 +02:00
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-08-09 08:27:54 -07:00
futex_compat.c futex: Do not leak robust list to unprivileged process 2012-04-22 16:21:45 -07:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-19 08:58:46 -07:00
hung_task.c hung_task: fix false positive during vfork 2012-01-06 14:14:13 -08:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump_label: jump_label_inc may return before the code is patched 2011-12-09 08:52:50 -08:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt
kexec.c PM: Remove sysdev suspend, resume and shutdown operations 2011-05-11 21:37:15 +02:00
kfifo.c
kmod.c kmod: prevent kmod_loop_msg overflow in __request_module() 2011-11-11 09:35:48 -08:00
kprobes.c kprobes: adjust "fix a memory leak in function pre_handler_kretprobe()" 2012-03-12 10:32:57 -07:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed 2011-05-28 17:02:57 +02:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep.c lockdep: Fix lock_is_held() on recursion 2011-06-07 12:25:50 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
Makefile cgroup: remove the ns_cgroup 2011-05-26 17:12:34 -07:00
module.c module: Remove module size limit 2012-04-02 09:27:20 -07:00
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c lockdep, mutex: provide mutex_lock_nest_lock 2011-05-25 08:39:17 -07:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c
nsproxy.c cgroup: remove the ns_cgroup 2011-05-26 17:12:34 -07:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep 2012-02-13 11:06:10 -08:00
params.c params.c: Use new strtobool function to process boolean inputs 2011-05-19 16:55:28 +09:30
pid.c next_pidmap: fix overflow condition 2011-04-18 10:35:30 -07:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pm_qos_params.c Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 2011-05-29 11:18:09 -07:00
posix-cpu-timers.c cputimer: Cure lock inversion 2011-10-25 07:10:14 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c cap_syslog: don't use WARN_ONCE for CAP_SYS_ADMIN deprecation warning 2012-02-03 09:18:57 -08:00
profile.c kernel/profile.c: remove some duplicate code from profile_hits() 2011-05-26 17:12:37 -07:00
ptrace.c ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread 2011-05-25 19:20:21 +02:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c rcu: Use WARN_ON_ONCE for DEBUG_OBJECTS_RCU_HEAD warnings 2011-05-05 23:16:57 -07:00
rcutiny.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutiny_plugin.h rcu: Converge TINY_RCU expedited and normal boosting 2011-05-05 23:16:58 -07:00
rcutorture.c rcu: mark rcutorture boosting callback as being on-stack 2011-05-05 23:16:57 -07:00
rcutree.c rcu: Fix day-one dyntick-idle stall-warning bug 2012-10-13 05:28:11 +09:00
rcutree.h rcu: Move RCU_BOOST #ifdefs to header file 2011-06-16 16:12:05 -07:00
rcutree_plugin.h softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
rcutree_trace.c rcu: use softirq instead of kthreads except when RCU_BOOST=y 2011-06-15 23:07:21 -07:00
relay.c relay: prevent integer overflow in relay_open() 2012-02-20 12:48:10 -08:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resource: ability to resize an allocated resource 2011-07-06 10:54:08 -07:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c
sched.c CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume 2012-10-13 05:28:15 +09:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched, autogroup: Stop going ahead if autogroup is disabled 2011-02-23 11:33:59 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Break out cpu_power from the sched_group structure 2011-07-20 18:32:40 +02:00
sched_features.h sched: Allow for overlapping sched_domain spans 2011-07-20 18:32:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW 2012-02-13 11:06:08 -08:00
sched_stats.h sched: More sched_domain iterations fixes 2011-05-28 17:02:54 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
seccomp.c
semaphore.c
signal.c ptrace: don't clear GROUP_STOP_SIGMASK on double-stop 2011-11-11 09:36:23 -08:00
smp.c generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts 2011-06-17 10:17:12 +02:00
softirq.c softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c x86, mtrr: lock stop machine during MTRR rendezvous sequence 2011-08-29 13:29:08 -07:00
sys.c kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() 2012-10-13 05:28:03 +09:00
sys_ni.c ipc: Add missing sys_ni entries for ipc/compat.c functions 2011-05-20 13:53:02 -07:00
sysctl.c sysctl: fix write access to dmesg_restrict/kptr_restrict 2012-04-13 08:14:07 -07:00
sysctl_binary.c binary_sysctl(): fix memory leak 2012-01-06 14:13:50 -08:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
taskstats.c Make TASKSTATS require root access 2011-12-21 12:57:40 -08:00
test_kprobes.c
time.c time: Change jiffies_to_clock_t() argument type to unsigned long 2011-11-11 09:35:52 -08:00
timeconst.pl
timer.c timers: Consider slack value in mod_timer() 2011-06-03 15:02:32 +02:00
tracepoint.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
tsacct.c taskstats: use real microsecond granularity for CPU times 2010-10-27 18:03:17 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
utsname.c ns proc: Add support for the uts namespace 2011-05-10 14:35:35 -07:00
utsname_sysctl.c
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c kernel/watchdog.c: Use proper ANSI C prototypes 2011-05-23 21:07:40 -07:00
workqueue.c workqueue: add missing smp_wmb() in process_one_work() 2012-10-13 05:28:03 +09:00
workqueue_sched.h