Commit graph

310675 commits

Author SHA1 Message Date
Ben Skeggs
875ac34aad drm/nouveau/fence: make ttm interfaces wrap ours, not the other way around
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:55:44 +10:00
Ben Skeggs
35bcf5d555 drm/nouveau: move flip-related channel setup to software engine
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:55:43 +10:00
Ben Skeggs
20abd1634a drm/nouveau: create real execution engine for software object class
Just a cleanup more or less, and to remove the need for special handling of
software objects.

This removes a heap of documentation on dma/graph object formats.  The info
is very out of date with our current understanding, and is far better
documented in rnndb in envytools git.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:55:41 +10:00
Chanwoo Choi
29f772d41c mfd: Fix build break of max77693 by adding REGMAP_I2C option
This patch add REGMAP_I2C config option to fix build break
of max77693 mfd driver because max77693 use regmap interface
for i2c communication.

drivers/mfd/max77693.c:103: error: variable 'max77693_regmap_config' has initializer but incomplete type
drivers/mfd/max77693.c:104: error: unknown field 'reg_bits' specified in initializer
drivers/mfd/max77693.c:104: warning: excess elements in struct initializer
drivers/mfd/max77693.c:104: warning: (near initialization for 'max77693_regmap_config')
drivers/mfd/max77693.c:105: error: unknown field 'val_bits' specified in initializer
drivers/mfd/max77693.c:105: warning: excess elements in struct initializer
drivers/mfd/max77693.c:105: warning: (near initialization for 'max77693_regmap_config')
drivers/mfd/max77693.c:106: error: unknown field 'max_register' specified in initializer
drivers/mfd/max77693.c:106: warning: excess elements in struct initializer
drivers/mfd/max77693.c:106: warning: (near initialization for 'max77693_regmap_config')
drivers/mfd/max77693.c: In function 'max77693_i2c_probe':
drivers/mfd/max77693.c:122: error: implicit declaration of function 'devm_regmap_init_i2c'
drivers/mfd/max77693.c:122: warning: assignment makes pointer from integer without a cast

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2012-05-24 08:53:22 +02:00
Ben Skeggs
2cda7f4c5e drm/nvd0/disp: remove unnecessary sync from flip_next
This shouldn't be necessary, I believe this is just a bit of missed debug
code that got left over somehow.

Causes flips to be always synced to vblank, regardless of swap interval,
which we don't want..

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:32:03 +10:00
Ben Skeggs
afada5e0bb drm/nv04/disp: disable vblank interrupts when disabling display
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:32:01 +10:00
Marcin Slusarz
695b95b810 drm/nouveau: base fence timeout on time of emission
Wait loop can be interrupted by signal, so if signals are raised
periodically (e.g. SIGALRM) this loop may never finish. Use
emission time as a base for fence timeout.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:59 +10:00
Ben Skeggs
d58086deaa drm/nv40-50/gr: restructure grctx/prog generation
The conditional definition of the generation helper functions apparently
confuses some IDEs....

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:58 +10:00
Ben Skeggs
a8f81837c5 drm/nv50/disp: fixup error paths in crtc object creation
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:56 +10:00
Marcin Slusarz
5ace2c9d6f drm/nouveau: cleanup after display init failure
Depending on exact point of failure, not cleaning would lead to
BUG_ONs/oopses in various distant places.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:54 +10:00
Marcin Slusarz
d37f60c87f drm/nv50: fix ramin heap size for kernel channel too
Port change from "drm/nouveau: Keep RAMIN heap within the channel"
to kernel channel, which has its own ramin heap initialisation.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Younes Manton <younes.m@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:52 +10:00
Ben Skeggs
d8b6624549 drm/nve0/graph: bump hub2gpc buffer size
Reported-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:50 +10:00
Ben Skeggs
6d59702775 drm/nouveau: use the same packet header macros as userspace
Cosmetic cleanup only.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:49 +10:00
Ben Skeggs
78339fb75c drm/nouveau/bios: allow loading alternate vbios image as firmware
Useful for debugging different VBIOS versions.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:47 +10:00
Ben Skeggs
c6b7e89582 drm/nve0/ttm: implement buffer moves with weirdo pcopy-on-pgraph methods
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:45 +10:00
Ben Skeggs
f1c65e7c7f drm/nv50-/fbcon: move 2d class to subchannel 3
Kepler GRAPH has (well, sorta) fixed subchannel<->class assignments, make
this match up to keep it happy without trapping.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:43 +10:00
Ben Skeggs
ab394543dd drm/nve0/gr: initial implementation
This may, perhaps, get re-merged with nvc0_graph.c at some point.  It's
still unclear as to how great an idea that'd be.  Stay tuned...

Completely dependent on firmware blobs from NVIDIA binary driver currently.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:41 +10:00
Ben Skeggs
5132f37700 drm/nve0/fifo: initial implementation
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:39 +10:00
Ben Skeggs
d0f3c7e41d drm/nouveau: give a slightly larger pci(e)gart aperture on all chipsets
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:38 +10:00
Ben Skeggs
78c2018658 drm/nouveau/pm: some more delays for ddr3 reclocking
These numbers from the binary driver's daemon scripts, and fix the transition
to perflvl 3 on my T510.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:36 +10:00
Ben Skeggs
9d6ba0b58c drm/nvc0/pm: very initial mclk freq change
Loads of magic missing, this will probably blow up if you try it.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:34 +10:00
Ben Skeggs
a94ba1fcac drm/nvd9/pm: oops, fix timing calc
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:32 +10:00
Ben Skeggs
6b91d6b056 drm/nvc0/pm: enable mpll src pll, and calc mpll coefficients
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:30 +10:00
Ben Skeggs
a1da205f42 drm/nvc0/pm: start filling in memory reclocking stubs 2012-05-24 16:31:29 +10:00
Ben Skeggs
19a1e47799 drm/nva3/pm: another few magic regs, and slightly better 0x004018 handling
Not entirely convinced 0x004018 transitions are correct yet, but, it's
an improvement.

The 750MHz value comes from fiddling with the binary driver + coolbits on
two different DDR3 NVA8 chipsets (T510 NVS3100M, and NVS300), not a clue
where this number comes from.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:27 +10:00
Ben Skeggs
2b20fd0ab4 drm/nva3/pm: initial attempt at handling 111100/111104
Probably not quite right, but this is enough now to make NVS300 reclock
between all 3 of its perflvls correctly.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:25 +10:00
Ben Skeggs
5f54d29ee9 drm/nva3/pm: make pll->pll mode work
This probably wants a cleanup, but I'm holding off until I know for sure
how the rest of the things that need doing fit together.

Tested on NVS300 by hacking up perflvl 1 to require PLL mode, and switching
between perflvl 3 and 1.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:23 +10:00
Ben Skeggs
001a3990f6 drm/nva3/pm: attempt to bash a few 0x100200 bits correctly
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:21 +10:00
Ben Skeggs
4719b55be5 drm/nva3/pm: begin to restructure memory clock changes + another magic
The binary driver appears to do various bits and pieces of the memory
clock frequency change at different times, depending on the particular
transition that's occuring.  I've attempted to replicate this here
for div->pll, pll->div and div->div transitions.

With some additional (patches upcoming) magic regs being bashed, this
allows me to correctly transition between all 3 perflvls on NVS300.

pll->pll transitions will *not* work correctly at the moment, pending
me tricking the binary driver into doing one and seeing how to correctly
handle it.

This patch also handles (hopefully) 0x1110e0, which appears to need
changing depending on whether in PLL or divider mode.. Maybe.  We'll
see.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:20 +10:00
Ben Skeggs
30e533900e drm/nva3/pm: more random unknown PFB regs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:18 +10:00
Ben Skeggs
27740383dd drm/nva3/pm: initial attempt at more magic PFB regs
The reg calculation may get moved elsewhere at some point, but lets
figure out what exactly we need to do first.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:16 +10:00
Ben Skeggs
65115bb05a drm/nva3/pm: hook up to ram reclocking helper
This gets us a start on memory timings.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:14 +10:00
Ben Skeggs
074e747a6d drm/nva3/pm: introduce more paranoia
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2012-05-24 16:31:12 +10:00
Eric Dumazet
1ca7ee3063 tcp: take care of overlaps in tcp_try_coalesce()
Sergio Correia reported following warning :

WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()

WARN(skb && !before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq),
     "cleanup rbuf bug: copied %X seq %X rcvnxt %X\n",
     tp->copied_seq, TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt);

It appears TCP coalescing, and more specifically commit b081f85c29
(net: implement tcp coalescing in tcp_queue_rcv()) should take care of
possible segment overlaps in receive queue. This was properly done in
the case of out_or_order_queue by the caller.

For example, segment at tail of queue have sequence 1000-2000, and we
add a segment with sequence 1500-2500.
This can happen in case of retransmits.

In this case, just don't do the coalescing.

Reported-by: Sergio Correia <lists@uece.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Sergio Correia <lists@uece.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
Yanmin Zhang
e49cc0da72 ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357]  [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024]  [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538]  [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [<c123925d>] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thank Eric for very detailed review/checking the issue.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Kun Jiang <kunx.jiang@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
Tim Bird
31fe62b958 mm: add a low limit to alloc_large_system_hash
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.

On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.

As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.

This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.

We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.

Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
David S. Miller
4efcac3a24 sparc: Optimize strncpy_from_user() zero byte search.
Compute a mask that will only have 0x80 in the bytes which
had a zero in them.  The formula is:

	~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)

In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
part, so we can reuse that in the above calculation.

Once we have this mask, we perform divide and conquer to find the
highest 0x80 location.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-23 19:20:20 -07:00
Oleg Nesterov
f23ca33546 keys: kill task_struct->replacement_session_keyring
Kill the no longer used task_struct->replacement_session_keyring, update
copy_creds() and exit_creds().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:11:41 -04:00
Oleg Nesterov
dea649b8ac keys: kill the dummy key_replace_session_keyring()
After the previouse change key_replace_session_keyring() becomes a nop.
Remove the dummy definition in key.h and update the callers in
arch/*/kernel/signal.c.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:11:31 -04:00
Oleg Nesterov
413cd3d9ab keys: change keyctl_session_to_parent() to use task_work_add()
Change keyctl_session_to_parent() to use task_work_add() and move
key_replace_session_keyring() logic into task_work->func().

Note that we do task_work_cancel() before task_work_add() to ensure that
only one work can be pending at any time.  This is important, we must not
allow user-space to abuse the parent's ->task_works list.

The callback, replace_session_keyring(), checks PF_EXITING.  I guess this
is not really needed but looks better.

As a side effect, this fixes the (unlikely) race.  The callers of
key_replace_session_keyring() and keyctl_session_to_parent() lack the
necessary barriers, the parent can miss the request.

Now we can remove task_struct->replacement_session_keyring and related
code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:11:23 -04:00
Oleg Nesterov
4d1d61a6b2 genirq: reimplement exit_irq_thread() hook via task_work_add()
exit_irq_thread() and task->irq_thread are needed to handle the unexpected
(and unlikely) exit of irq-thread.

We can use task_work instead and make this all private to
kernel/irq/manage.c, cleanup plus micro-optimization.

1. rename exit_irq_thread() to irq_thread_dtor(), make it
   static, and move it up before irq_thread().

2. change irq_thread() to do task_work_add(irq_thread_dtor)
   at the start and task_work_cancel() before return.

   tracehook_notify_resume() can never play with kthreads,
   only do_exit()->exit_task_work() can call the callback
   and this is what we want.

3. remove task_struct->irq_thread and the special hook
   in do_exit().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:11:12 -04:00
Oleg Nesterov
e73f8959af task_work_add: generic process-context callbacks
Provide a simple mechanism that allows running code in the (nonatomic)
context of the arbitrary task.

The caller does task_work_add(task, task_work) and this task executes
task_work->func() either from do_notify_resume() or from do_exit().  The
callback can rely on PF_EXITING to detect the latter case.

"struct task_work" can be embedded in another struct, still it has "void
*data" to handle the most common/simple case.

This allows us to kill the ->replacement_session_keyring hack, and
potentially this can have more users.

Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into
tracehook_notify_resume() and do_exit().  But at the same time we can
remove the "replacement_session_keyring != NULL" checks from
arch/*/signal.c and exit_creds().

Note: task_work_add/task_work_run abuses ->pi_lock.  This is only because
this lock is already used by lookup_pi_state() to synchronize with
do_exit() setting PF_EXITING.  Fortunately the scope of this lock in
task_work.c is really tiny, and the code is unlikely anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:21 -04:00
Al Viro
62366c88b2 avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
we need that not just on syscall returns but on irq ones as well...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:21 -04:00
Al Viro
617c62a9fc parisc: need to check NOTIFY_RESUME when exiting from syscall
... not just on return from interrupt

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:20 -04:00
Al Viro
a42c6ded82 move key_repace_session_keyring() into tracehook_notify_resume()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:20 -04:00
Al Viro
1227dd773d TIF_NOTIFY_RESUME is defined on all targets now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23 22:09:19 -04:00
Linus Torvalds
f9369910a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull first series of signal handling cleanups from Al Viro:
 "This is just the first part of the queue (about a half of it);
  assorted fixes all over the place in signal handling.

  This one ends with all sigsuspend() implementations switched to
  generic one (->saved_sigmask-based).

  With this, a bunch of assorted old buglets are fixed and most of the
  missing bits of NOTIFY_RESUME hookup are in place.  Two more fixes sit
  in arm and um trees respectively, and there's a couple of broken ones
  that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME
  only on one of two codepaths; fixes for that will happen in the next
  series"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits)
  unicore32: if there's no handler we need to restore sigmask, syscall or no syscall
  xtensa: add handling of TIF_NOTIFY_RESUME
  microblaze: drop 'oldset' argument of do_notify_resume()
  microblaze: handle TIF_NOTIFY_RESUME
  score: add handling of NOTIFY_RESUME to do_notify_resume()
  m68k: add TIF_NOTIFY_RESUME and handle it.
  sparc: kill ancient comment in sparc_sigaction()
  h8300: missing checks of __get_user()/__put_user() return values
  frv: missing checks of __get_user()/__put_user() return values
  cris: missing checks of __get_user()/__put_user() return values
  powerpc: missing checks of __get_user()/__put_user() return values
  sh: missing checks of __get_user()/__put_user() return values
  sparc: missing checks of __get_user()/__put_user() return values
  avr32: struct old_sigaction is never used
  m32r: struct old_sigaction is never used
  xtensa: xtensa_sigaction doesn't exist
  alpha: tidy signal delivery up
  score: don't open-code force_sigsegv()
  cris: don't open-code force_sigsegv()
  blackfin: don't open-code force_sigsegv()
  ...
2012-05-23 18:11:45 -07:00
Mel Gorman
05f144a0d5 mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages
Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled

    =============================================================================
    BUG numa_policy (Not tainted): Poison overwritten
    -----------------------------------------------------------------------------

    INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
    INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
     __slab_alloc+0x3d3/0x445
     kmem_cache_alloc+0x29d/0x2b0
     mpol_new+0xa3/0x140
     sys_mbind+0x142/0x620
     system_call_fastpath+0x16/0x1b
    INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
     __slab_free+0x2e/0x1de
     kmem_cache_free+0x25a/0x260
     __mpol_put+0x27/0x30
     remove_vma+0x68/0x90
     exit_mmap+0x118/0x140
     mmput+0x73/0x110
     exit_mm+0x108/0x130
     do_exit+0x162/0xb90
     do_group_exit+0x4f/0xc0
     sys_exit_group+0x17/0x20
     system_call_fastpath+0x16/0x1b
    INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
    INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0

This implied a reference counting bug and the problem happened during
mbind().

mbind() applies a new memory policy to a range and uses mbind_range() to
merge existing VMAs or split them as necessary.  In the event of splits,
mpol_dup() will allocate a new struct mempolicy and maintain existing
reference counts whose rules are documented in
Documentation/vm/numa_memory_policy.txt .

The problem occurs with shared memory policies.  The vm_op->set_policy
increments the reference count if necessary and split_vma() and
vma_merge() have already handled the existing reference counts.
However, policy_vma() screws it up by replacing an existing
vma->vm_policy with one that potentially has the wrong reference count
leading to a premature free.  This patch removes the damage caused by
policy_vma().

With this patch applied Dave's trinity tool runs an mbind test for 5
minutes without error.  /proc/slabinfo reported that there are no
numa_policy or shared_policy_node objects allocated after the test
completed and the shared memory region was deleted.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-23 17:57:33 -07:00
Arnaldo Carvalho de Melo
a1d44b9acd perf evlist: Explicititely initialize input_name
It was a global variable, so it was initialized, implicitely, to zero by
being placed in the bss.

Now it is just a local variable that is then passed to the __cmd_evlist
routine, so it must be explicitely set to NULL.

The problem manifested on a Fedora 17 system, using:

 gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC)

But not on several other systems, by luck.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-5e8wolcjs3rgd5i6yi995gfh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-23 21:47:51 -03:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00