linux-uconsole/include
Boqun Feng ec81048cc3 sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK().

For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1):

	io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
	    2462:       e8 00 00 00 00          callq  2467 <write_journal+0x47>
	    2467:       48 8d 85 80 fd ff ff    lea    -0x280(%rbp),%rax
	    246e:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
	    2475:       48 c7 c2 00 00 00 00    mov    $0x0,%rdx
		x->done = 0;
	    247c:       c7 85 90 fd ff ff 00    movl   $0x0,-0x270(%rbp)
	    2483:       00 00 00
		init_waitqueue_head(&x->wait);
	    2486:       48 8d 78 18             lea    0x18(%rax),%rdi
	    248a:       e8 00 00 00 00          callq  248f <write_journal+0x6f>
		if (commit_start + commit_sections <= ic->journal_sections) {
	    248f:       41 8b 87 a8 00 00 00    mov    0xa8(%r15),%eax
		io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
	    2496:       48 8d bd e8 f9 ff ff    lea    -0x618(%rbp),%rdi
	    249d:       48 8d b5 90 fd ff ff    lea    -0x270(%rbp),%rsi
	    24a4:       b9 17 00 00 00          mov    $0x17,%ecx
	    24a9:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
		if (commit_start + commit_sections <= ic->journal_sections) {
	    24ac:       41 39 c6                cmp    %eax,%r14d
		io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
	    24af:       48 8d bd 90 fd ff ff    lea    -0x270(%rbp),%rdi
	    24b6:       48 8d b5 e8 f9 ff ff    lea    -0x618(%rbp),%rsi
	    24bd:       b9 17 00 00 00          mov    $0x17,%ecx
	    24c2:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)

We can obviously see the temporary structure allocated, and the compiler
also does two meaningless memcpy with "rep movsq".

And according to:

	https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs

The return value of a statement expression is returned by value, so the
temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and
that's why the temporary structures are allocted.

To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
return a pointer and dereference it outside the block rather than return
the whole structure, in this way, we are able to teach the compiler not
to do the unnecessary stack allocation.

This could also reduce the stack size even if !LOCKDEP, for example in
write_journal(), compiled with gcc 7.1.1, the result of command:

	 objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86

before:

	0x0000246a write_journal [dm-integrity.o]:              696

after:

	0x00002b7a write_journal [dm-integrity.o]:              296

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:14:38 +02:00
..
acpi ACPI: NUMA: add missing include in acpi_numa.h 2017-07-24 22:27:43 +02:00
asm-generic futex: Remove duplicated code and fix undefined behaviour 2017-08-25 22:49:59 +02:00
clocksource
crypto
drm i915, amd and some core fixes + mediatek color support 2017-07-13 11:26:18 -07:00
dt-bindings Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
keys
kvm KVM: arm/arm64: PMU: Fix overflow interrupt injection 2017-07-25 14:18:01 +01:00
linux sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() 2017-08-29 15:14:38 +02:00
math-emu
media media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS 2017-07-26 06:14:33 -04:00
memory
misc
net net_sched: fix order of queue length updates in qdisc_replace() 2017-08-20 20:02:00 -07:00
pcmcia
ras
rdma IB/core: Avoid accessing non-allocated memory when inferring port type 2017-08-24 15:33:33 -04:00
rxrpc
scsi SCSI fixes on 20170823 2017-08-23 11:34:40 -07:00
soc
sound ASoC: fix pcm-creation regression 2017-07-17 15:50:32 +01:00
target iscsi-target: Fix iscsi_np reset hung task during parallel delete 2017-08-06 14:41:41 -07:00
trace ext4: remove unused metadata accounting variables 2017-07-30 22:30:11 -04:00
uapi drm/msm: Remove __user from __u64 data types 2017-08-01 19:11:48 -04:00
video
xen xen/balloon: don't online new memory initially 2017-07-23 08:13:18 +02:00