linux-uconsole/fs
Hidetoshi Seto 19eb722b76 sched, cputime: Introduce thread_group_times()
commit 0cf55e1ec0 upstream.

This is a real fix for the problem of decreasing utime/stime values
described in the thread:

   http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

 - {u,s}time in task_struct are incremented every time the thread
   is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal->{u,s}time,
   after being adjusted by task_times().

 - When all threads in a thread_group exit, the accumulated {u,s}time
   (and also c{u,s}time) in the signal struct are added to c{u,s}time
   in the signal struct of the group's parent.

So {u,s}time in the task struct are "raw" tick counts, while
{u,s}time and c{u,s}time in the signal struct are "adjusted" values.

The accounted values are used by:

 - task_times(), to get the cputime of a thread:
   This function returns adjusted values, derived from the raw
   {u,s}time and scaled by the sum_exec_runtime accounted by CFS.

 - thread_group_cputime(), to get the cputime of a thread group:
   This function returns the sum of {u,s}time of all living threads
   in the group, plus {u,s}time in the signal struct, which is the
   sum of the adjusted cputimes of all exited threads that belonged
   to the group.

The problem is the return value of thread_group_cputime(),
because it is a mixed sum of "raw" and "adjusted" values:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Suppose a thread has raw values greater than its adjusted values
(e.g. it was interrupted 50 times by 1000Hz ticks but only ran for
45ms). When it exits, the group's cputime decreases (e.g. by 5ms).
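
The decrease can be reproduced with a toy model. The sketch below only
illustrates the arithmetic of the example above and is not kernel code;
the function name group_utime_prepatch and the use of milliseconds are
assumptions made for the illustration.

```python
def group_utime_prepatch(live_raw_ms, exited_adjusted_ms):
    """Pre-patch group utime: raw values of the living threads plus the
    adjusted values already accumulated from exited threads."""
    return sum(live_raw_ms) + exited_adjusted_ms


raw_ms = 50       # 50 ticks at 1000Hz, counted as 1ms each
adjusted_ms = 45  # what the thread actually ran

# While the thread is alive it contributes its raw value.
before_exit = group_utime_prepatch([raw_ms], 0)

# After exit only its adjusted value remains in the signal struct.
after_exit = group_utime_prepatch([], adjusted_ms)

print(after_exit - before_exit)
```

The reported group time goes from 50ms to 45ms, which is the 5ms
decrease described above.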

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains expensive divisions, so applying it to
every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify thread exit (= __exit_signal()) not to use task_times().
   This means {u,s}time in the signal struct accumulates raw values
   instead of adjusted values.  As a result, thread_group_cputime()
   returns a pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime)
   that converts the "raw" values of thread_group_cputime() to
   "adjusted" values, using the same calculation as task_times().

 - Modify group exit (= wait_task_zombie()) to use the newly
   introduced thread_group_times().  This makes c{u,s}time in the
   signal struct hold adjusted values, as before this patch.

 - Replace some calls to thread_group_cputime() with
   thread_group_times().  The replacement is only applied where the
   "adjusted" cputime is conveyed to users and where task_times() is
   already used nearby.
   (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
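
The effect of these changes can be sketched with a small model: raw
values are summed over living and exited threads, and the adjustment is
applied once to the group totals. This is a simplified illustration,
not the kernel implementation; scale() and group_times_postpatch() are
hypothetical names, all values are in ticks, and any extra clamping the
kernel performs is left out.

```python
def scale(utime, stime, rtime):
    """Split rtime between user and system time in proportion to the
    raw tick counts (simplified model of the adjustment)."""
    total = utime + stime
    if total == 0:
        return rtime, 0
    adj_utime = rtime * utime // total
    return adj_utime, rtime - adj_utime


def group_times_postpatch(live, exited):
    """live: list of (utime, stime, rtime) tuples for running threads.
    exited: (utime, stime, rtime) raw totals of exited threads."""
    utime = sum(t[0] for t in live) + exited[0]
    stime = sum(t[1] for t in live) + exited[1]
    rtime = sum(t[2] for t in live) + exited[2]
    return scale(utime, stime, rtime)


thread = (50, 0, 45)  # 50 raw user ticks, 45 ticks of real runtime

before_exit = group_times_postpatch([thread], (0, 0, 0))
after_exit = group_times_postpatch([], thread)

print(before_exit, after_exit)
```

Because the exiting thread moves its raw values into the accumulated
totals unchanged, the group's adjusted result is the same before and
after the exit.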

This patch has a positive side effect:

 - Before this patch, if a group contained many short-lived threads
   (e.g. each running 0.9ms without being interrupted by a tick), the
   group's cputime could be invisible, because each thread's cputime
   was adjusted before being accumulated. Writing the adjustment
   function as adj(ticks, runtime):
     {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
   After this patch this no longer happens, because the adjustment
   is applied after accumulation.
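
The difference between adjusting before and after accumulation can be
checked numerically. The model below is an assumption-laden sketch: it
expands adj(ticks, runtime) into separate user and system tick counts,
measures runtime in microseconds, and assumes the adjustment truncates
runtime to whole ticks at 1000Hz. None of these names come from the
kernel source.

```python
US_PER_TICK = 1000  # one tick at 1000Hz, in microseconds (assumed)


def adj(utime_ticks, stime_ticks, runtime_us):
    """Simplified adjustment: truncate runtime to whole ticks, then
    split it in proportion to the raw user/system tick counts."""
    rtime = runtime_us // US_PER_TICK
    total = utime_ticks + stime_ticks
    if total == 0:
        return rtime, 0
    utime = rtime * utime_ticks // total
    return utime, rtime - utime


# 1000 threads, each running 0.9ms and never hit by a tick.
threads = [(0, 0, 900)] * 1000

# Before the patch: adjust each thread, then accumulate.
per_thread = [adj(*t) for t in threads]
adjust_then_sum = (sum(u for u, _ in per_thread),
                   sum(s for _, s in per_thread))

# After the patch: accumulate raw values, then adjust once.
sum_then_adjust = adj(sum(t[0] for t in threads),
                      sum(t[1] for t in threads),
                      sum(t[2] for t in threads))

print(adjust_then_sum, sum_then_adjust)
```

In this model the per-thread adjustment reports no time at all, while
adjusting the accumulated totals reports 900 ticks for the 900ms the
threads ran in total.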

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-13 13:20:14 -07:00
..
9p 9p: Skip check for mandatory locks when unlocking 2010-04-26 07:41:29 -07:00
adfs adfs: remove redundant test on unsigned 2009-09-24 07:21:05 -07:00
affs fix affs parse_options() 2010-02-09 04:50:48 -08:00
afs FS-Cache: Handle pages pending storage that get evicted under OOM conditions 2009-11-19 18:11:35 +00:00
autofs trivial: remove unnecessary semicolons 2009-09-21 15:14:58 +02:00
autofs4 autofs4 - fix missed case when changing to use struct path 2009-08-31 17:44:05 -10:00
befs befs: fix leak 2010-02-23 07:37:55 -08:00
bfs Fix failure exits in bfs_fill_super() 2010-02-09 04:50:46 -08:00
btrfs Btrfs: kfree correct pointer during mount option parsing 2010-08-13 13:20:12 -07:00
cachefiles CacheFiles: Fix error handling in cachefiles_determine_cache_security() 2010-05-26 14:29:20 -07:00
cifs CIFS: Fix compile error with __init in cifs_init_dns_resolver() definition 2010-08-10 10:20:46 -07:00
coda headers: remove sched.h from poll.h 2009-10-04 15:05:10 -07:00
configfs writeback: add name to backing_dev_info 2009-09-11 09:20:26 +02:00
cramfs
debugfs debugfs: fix create mutex racy fops and private data 2009-12-18 14:04:16 -08:00
devpts devpts_get_tty() should validate inode 2009-12-18 14:04:15 -08:00
dlm dlm: fix socket fd translation 2009-09-30 12:19:44 -05:00
ecryptfs fs/ecryptfs/file.c: introduce missing free 2010-08-13 13:19:38 -07:00
efs get rid of BKL in fs/efs 2009-06-17 00:36:36 -04:00
exofs exofs: confusion between kmap() and kmap_atomic() api 2010-07-05 11:10:47 -07:00
exportfs
ext2 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
ext3 ext3: journal all modifications in ext3_xattr_set_handle 2010-04-26 07:41:21 -07:00
ext4 ext4: fix freeze deadlock under IO 2010-08-13 13:19:51 -07:00
fat fat: fix buffer overflow in vfat_create_shortname() 2010-04-26 07:41:13 -07:00
freevxfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
fscache FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n 2009-11-20 21:50:44 +00:00
fuse mm: flush dcache before writing into page to avoid alias 2010-02-09 04:50:59 -08:00
gfs2 GFS2: rename causes kernel Oops 2010-08-10 10:20:45 -07:00
hfs hfs: fix a potential buffer overflow 2009-12-18 14:04:08 -08:00
hfsplus hfsplus: refuse to mount volumes larger than 2TB 2009-10-29 07:39:27 -07:00
hostfs hostfs: set maximum filesize in superblock for proper LFS support 2009-06-30 18:56:03 -07:00
hpfs headers: smp_lock.h redux 2009-07-12 12:22:34 -07:00
hppfs
hugetlbfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-09-24 08:32:11 -07:00
isofs fs: Make unload_nls() NULL pointer safe 2009-09-24 07:47:42 -04:00
jbd jbd: jbd-debug and jbd2-debug should be writable 2010-07-05 11:11:20 -07:00
jbd2 ext4, jbd2: Add barriers for file systems with exernal journals 2010-08-02 10:21:10 -07:00
jffs2 jffs2: Fix long-standing bug with symlink garbage collection. 2009-12-18 14:05:52 -08:00
jfs jfs: don't allow os2 xattr namespace overlap with others 2010-08-13 13:19:48 -07:00
lockd headers: utsname.h redux 2009-09-23 18:13:10 -07:00
minix V3 minixfs: add missing directory type checking 2009-09-23 07:39:57 -07:00
ncpfs const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
nfs NFS: kswapd must not block in nfs_release_page 2010-08-10 10:20:36 -07:00
nfs_common
nfsd NFSD: don't report compiled-out versions as present 2010-07-05 11:10:27 -07:00
nilfs2 nilfs2: fix sync silent failure 2010-05-26 14:29:21 -07:00
nls Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2009-09-30 09:31:14 -07:00
notify inotify: don't leak user struct on inotify release 2010-05-26 14:29:17 -07:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-09-24 08:32:11 -07:00
ocfs2 ocfs2_dlmfs: Fix math error when reading LVB. 2010-05-12 14:57:04 -07:00
omfs const: constify remaining file_operations 2009-10-01 16:11:11 -07:00
openpromfs
partitions fs/partition/msdos: fix unusable extended partition for > 512B sector 2010-04-01 15:58:44 -07:00
proc sched, cputime: Introduce thread_group_times() 2010-08-13 13:20:14 -07:00
qnx4 qnx4: remove write support 2009-09-23 07:39:30 -07:00
quota quota: Fix possible dq_flags corruption 2010-04-26 07:41:29 -07:00
ramfs truncate: use new helpers 2009-09-24 08:41:47 -04:00
reiserfs reiserfs: fix corruption during shrinking of xattrs 2010-05-12 14:57:01 -07:00
romfs fix leak in romfs_fill_super() 2010-02-09 04:50:47 -08:00
smbfs fs: Make unload_nls() NULL pointer safe 2009-09-24 07:47:42 -04:00
squashfs const: mark remaining super_operations const 2009-09-22 07:17:24 -07:00
sysfs sysfs: sysfs_sd_setattr set iattrs unconditionally 2010-02-23 07:37:56 -08:00
sysv get rid of BKL in fs/sysv 2009-06-17 00:36:37 -04:00
ubifs const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
udf udf: Try harder when looking for VAT inode 2010-01-06 15:05:00 -08:00
ufs ufs: sector_t cannot be negative 2009-06-18 13:03:46 -07:00
xfs xfs: prevent swapext from operating on write-only files 2010-08-10 10:20:44 -07:00
aio.c aio.c: move EXPORT* macros to line after function 2009-09-23 07:39:29 -07:00
anon_inodes.c headers: remove sched.h from poll.h 2009-10-04 15:05:10 -07:00
attr.c truncate: new helpers 2009-09-24 08:41:47 -04:00
bad_inode.c
binfmt_aout.c Split 'flush_old_exec' into two functions 2010-02-09 04:50:49 -08:00
binfmt_elf.c Split 'flush_old_exec' into two functions 2010-02-09 04:50:49 -08:00
binfmt_elf_fdpic.c Split 'flush_old_exec' into two functions 2010-02-09 04:50:49 -08:00
binfmt_em86.c
binfmt_flat.c Split 'flush_old_exec' into two functions 2010-02-09 04:50:49 -08:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c Split 'flush_old_exec' into two functions 2010-02-09 04:50:49 -08:00
bio-integrity.c block: fix bugs in bio-integrity mempool usage 2010-02-09 04:50:58 -08:00
bio.c block: fix bio_add_page for non trivial merge_bvec_fn case 2010-02-09 04:50:58 -08:00
block_dev.c blkdev: cgroup whitelist permission fix 2010-08-13 13:19:37 -07:00
buffer.c Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block 2009-09-25 09:27:30 -07:00
char_dev.c fs/char_dev.c: remove useless loop 2009-09-24 07:21:03 -07:00
compat.c revert "procfs: provide stack information for threads" and its fixup commits 2010-05-26 14:29:19 -07:00
compat_binfmt_elf.c
compat_ioctl.c fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctl 2009-11-12 07:25:57 -08:00
dcache.c sched: Pull up the might_sleep() check into cond_resched() 2009-07-18 15:51:44 +02:00
dcookies.c
direct-io.c
drop_caches.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
eventfd.c anonfd: split interface into file creation and install 2009-09-23 07:39:29 -07:00
eventpoll.c epoll: fix nested calls support 2009-06-18 13:03:41 -07:00
exec.c revert "procfs: provide stack information for threads" and its fixup commits 2010-05-26 14:29:19 -07:00
fcntl.c Fix race in tty_fasync() properly 2010-02-23 07:37:44 -08:00
fifo.c
file.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
file_table.c vfs: take f_lock on modifying f_mode after open time 2010-03-15 08:49:37 -07:00
filesystems.c
fs-writeback.c writeback: disable periodic old data writeback for !dirty_writeback_centisecs 2010-07-05 11:10:45 -07:00
fs_struct.c
generic_acl.c
inode.c vfs: optimize touch_time() too 2009-09-24 07:47:27 -04:00
internal.h fs: fix overflow in sys_mount() for in-kernel calls 2009-09-24 08:40:15 -04:00
ioctl.c __generic_block_fiemap(): fix for files bigger than 4GB 2009-11-12 07:26:01 -08:00
ioprio.c
Kconfig powerpc: Cleanup Kconfig selection of hugetlbfs support 2009-10-30 15:03:54 +11:00
Kconfig.binfmt
libfs.c wrong type for 'magic' argument in simple_fill_super() 2010-07-05 11:11:12 -07:00
locks.c const: make lock_manager_operations const 2009-09-22 07:17:25 -07:00
Makefile
mbcache.c
mpage.c
namei.c fix LOOKUP_FOLLOW on automount "symlinks" 2010-03-15 08:49:32 -07:00
namespace.c vfs: add NOFOLLOW flag to umount(2) 2010-07-05 11:11:15 -07:00
nfsctl.c
no-block.c
open.c fs: change sys_truncate length parameter type 2009-09-23 09:21:05 -07:00
pipe.c fs: pipe.c null pointer dereference 2009-10-22 08:11:44 +09:00
pnode.c
pnode.h
posix_acl.c
read_write.c vfs: remove redundant position check in do_sendfile 2009-09-24 07:47:34 -04:00
read_write.h
readdir.c
select.c headers: remove sched.h from poll.h 2009-10-04 15:05:10 -07:00
seq_file.c vfs: seq_file: add helpers for data filling 2009-09-24 07:47:35 -04:00
signalfd.c signalfd: fill in ssi_int for posix timers and message queues 2010-08-13 13:19:39 -07:00
splice.c splice: fix misuse of SPLICE_F_NONBLOCK 2010-08-13 13:19:35 -07:00
stack.c
stat.c Add unlocked version of inode_add_bytes() function 2010-01-06 15:05:01 -08:00
super.c vfs: get_sb_single() - do not pass options twice 2010-01-28 15:00:47 -08:00
sync.c fs/buffer.c: clean up EXPORT* macros 2009-09-23 07:39:29 -07:00
timerfd.c
utimes.c
xattr.c VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. 2009-09-10 10:11:22 +10:00
xattr_acl.c