We currently have an explicit "needs_post" vtable method which
returns a stack variable for whether we should later run
post-schedule. This leads to an awkward exchange of the
variable as it bubbles back up out of the context switch. Peter
Zijlstra observed that this information could be stored in the
run-queue itself instead of handled on the stack.
Therefore, we revert to the method of having context_switch
return void, and update an internal rq->post_schedule variable
when we require further processing.
In addition, we fix a race condition where we try to access
current->sched_class without holding the rq->lock. This is
technically racy, as the sched-class could change out from
under us. Instead, we reference the per-rq post_schedule
variable with the runqueue unlocked, but with preemption
disabled to see if we need to reacquire the rq->lock.
Finally, we clean the code up slightly by removing the #ifdef
CONFIG_SMP conditionals from the schedule() call, and implement
some inline helper functions instead.
This patch passes checkpatch, and rt-migrate.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The might_sleep() test inside cond_resched_lock() assumes the
spinlock is held and then preemption is disabled. This is true
with CONFIG_PREEMPT but the preempt_count() doesn't change
otherwise.
Check by starting from the appropriate preempt offset depending
on the config.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248458723-12146-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to be able to distinguish between no samples due to
inactivity and no samples due to task ended, Arjan asked for
PERF_EVENT_EXIT events. This is useful to the boot delay
instrumentation (bootchart) app.
This patch changes the PERF_EVENT_FORK to be emitted on every
clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
after the task's counters have been closed.
This task tracing is controlled through: attr.comm || attr.mmap
and through the new attr.task field.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ cleaned up perf_counter.h a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce blk_limits_io_min() and make blk_queue_io_min() call it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds new line discipline name an number to include/linux/tty.h. The
line discipline will be used by the Amstrad E3 (Delta) sound driver that will
come next in this series of patches.
Created against linux-2.6.31-rc3.
Applies to linux-omap-2.6 commit 7c5cb7862d32cb344be7831d466535d5255e35ac as
well.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
io context: fix ref counting
block: make the end_io functions be non-GPL exports
block: fix improper kobject release in blk_integrity_unregister
block: always assign default lock to queues
mg_disk: Add missing ready status check on mg_write()
mg_disk: fix issue with data integrity on error in mg_write()
mg_disk: fix reading invalid status when use polling driver
mg_disk: remove prohibited sleep operation
This helps CODECs with sparse register maps work better with the
register cache display interface.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
To fix the common case where ->enable() does not set up
mult, make sure mult_orig is saved in mult on disable.
Also add comments to explain why we do this.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: johnstul@us.ibm.com
Cc: lethal@linux-sh.org
Cc: akpm@linux-foundation.org
LKML-Reference: <20090618152432.10136.9932.sendpatchset@rx1.opensource.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the code allready uses flush_kernel_dcache_page(). This patch updates the
driver to the recent sg API changes which require that either SG_MITER_TO_SG
or SG_MITER_FROM_SG is set. SG_MITER_TO_SG calls flush_kernel_dcache_page()
in sg_mitter_stop()
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
sg_miter_start() is currently unaware of the direction of the copy
process (to or from the scatter list). It is important to know the
direction because the page has to be flushed in case the data written
is seen on a different mapping in user land on cache incoherent
architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Choose saner defaults for xfrm[4|6] gc_thresh values on init
Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024). Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.
For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache. Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.
For ipv6, its a bit trickier. there is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems
sane to select a simmilar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Header to define the standard registers for an
PL093 SSMC memory controller.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
I've been doing this for years, and akpm picked me up on it about 12
months ago. lguest partly serves as example code, so let's do it Right.
Also, remove two unused fields in struct vblk_info in the example launcher.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
I don't really notice it (except to begrudge the extra vertical
space), but Ingo does. And he pointed out that one excuse of lguest
is as a teaching tool, it should set a good example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Once a structure goes over PAGE_SIZE*2, we see occasional allocation
failures. Some people have chosen to switch over to things like vmalloc()
that will let them keep array-like access to such a large structures.
But, vmalloc() has plenty of downsides.
Here's an alternative. I think it's what Andrew was suggesting here:
http://lkml.org/lkml/2009/7/2/518
I call it a flexible array. It does all of its work in PAGE_SIZE bits, so
never does an order>0 allocation. The base level has
PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level.
So, with a 32-bit arch, you get about 4MB (4183112 bytes) of total
storage when the objects pack nicely into a page. It is half that on
64-bit because the pointers are twice the size. There's a table detailing
this in the code.
There are kerneldocs for the functions, but here's an
overview:
flex_array_alloc() - dynamically allocate a base structure
flex_array_free() - free the array and all of the
second-level pages
flex_array_free_parts() - free the second-level pages, but
not the base (for static bases)
flex_array_put() - copy into the array at the given index
flex_array_get() - copy out of the array at the given index
flex_array_prealloc() - preallocate the second-level pages
between the given indexes to
guarantee no allocs will occur at
put() time.
We could also potentially just pass the "element_size" into each of the
API functions instead of storing it internally. That would get us one
more base pointer on 32-bit.
I've been testing this by running it in userspace. The header and patch
that I've been using are here, as well as the little script I'm using to
generate the size table which goes in the kerneldocs.
http://sr71.net/~dave/linux/flexarray/
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found with make headers_check
/usr/include/linux/pps.h:52: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit ec64f51545 ("cgroup: fix
frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg)
doesn't return -EBUSY by temporary ref counts. That commit expects all
refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can
wait permanently. This patch tries to fix that and change followings.
- set CGRP_WAIT_ON_RMDIR flag before pre_destroy().
- clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case.
if there are sleeping ones, wakes them up.
- rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set.
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Balbir Sigh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug was introduced by commit cc31edceee
("cgroups: convert tasks file to use a seq_file with shared pid array").
We cache a pid array for all threads that are opening the same "tasks"
file, but the pids in the array are always from the namespace of the
last process that opened the file, so all other threads will read pids
from that namespace instead of their own namespaces.
To fix it, we maintain a list of pid arrays, which is keyed by pid_ns.
The list will be of length 1 at most time.
Reported-by: Paul Menage <menage@google.com>
Idea-by: Paul Menage <menage@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we now have handlers IWESSID for all modes, we can
combine them into one.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we now have IWAP handlers for all modes, we can
combine them into one.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Until now we implemented iwfreq for managed mode, we
needed to keep the implementations separate, but now
that we have all versions implemented we can combine
them and export just one handler.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'drm-radeon-kms' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (35 commits)
drm/radeon: set fb aperture sizes for framebuffer handoff.
drm/ttm: fix highuser vs dma32 confusion.
drm/radeon: Fix size used for benchmarking BO copies.
drm/radeon: Add radeon.test parameter for running BO GPU copy tests.
drm/radeon/kms: allow interruptible waits for objects.
drm/ttm: powerpc: Fix Highmem cache flushing.
x86: Export kmap_atomic_prot() needed for TTM.
drm/ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.
drm/ttm: Fix an oops and sync object leak.
drm/radeon/kms: vram sizing on certain r100 chips needs workaround.
drm/radeon: Pay more attention to object placement requested by userspace.
drm/radeon: Fall back to evicting BOs with memcpy if necessary.
drm/radeon: Don't unreserve twice on failure to validate.
drm/radeon/kms: fix bandwidth computation on avivo hardware
drm/radeon/kms: add initial colortiling support.
drm/radeon/kms: fix hotspot handling on pre-avivo chips
drm/radeon/kms: enable frac fb divs on rs600/rs690/rs740
drm/radeon/kms: add PLL flag to prefer frequencies <= the target freq
drm/radeon/kms: block RN50 from using 3D engine.
drm/radeon/kms: fix VRAM sizing like DDX does it.
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: accept late unlocking of HPA
libata: Updates and fixes for pata_at91 driver
ata_piix: Add new short cable ID
ata_piix: Add new laptop short cable IDs
ahci: add device IDs for Ibex Peak ahci controllers
libata: remove superfluous NULL pointer checks
libata: add missing NULL pointer check to ata_eh_reset()
pata_pcmcia: add CNF-CDROM-ID
We really don't want to mark the pty as a low-latency device, because as
Alan points out, the ->write method can be called from an IRQ (ppp?),
and that means we can't use ->low_latency=1 as we take mutexes in the
low_latency case.
So rather than using low_latency to force the written data to be pushed
to the ldisc handling at 'write()' time, just make the reader side (or
the poll function) do the flush when it checks whether there is data to
be had.
This also fixes the problem with lost data in an emacs compile buffer
(bugzilla 13815), and we can thus revert the low_latency pty hack
(commit 3a54297478: "pty: quickfix for the
pty ENXIO timing problems").
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[ Modified to do the tty_flush_to_ldisc() inside input_available_p() so
that it triggers for both read and poll() - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create bdgrab(). This function copies an existing reference to a
block_device. It is safe to call from any context.
Hibernation code wishes to copy a reference to the active swap device.
Right now it calls bdget() under a spinlock, but this is wrong because
bdget() can sleep. It doesn't need a full bdget() because we already
hold a reference to active swap devices (and the spinlock protects
against swapoff).
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This adds new set/get tiling interfaces where the pitch
and macro/micro tiling enables can be set. Along with
a flag to decide if this object should have a surface when mapped.
The only thing we need to allocate with a mapped surface should be
the frontbuffer. Note rotate scanout shouldn't require one, and
back/depth shouldn't either, though mesa needs some fixes.
It fixes the TTM interfaces along Thomas's suggestions, and I've tested
the surface stealing code with two X servers and not seen any lockdep issues.
I've stopped tiling the fbcon frontbuffer, as I don't see there being
any advantage other than testing, I've left the testing commands in there,
just flip the fb_tiled to true in radeon_fb.c
Open: Can we integrate endian swapping in with this?
Future features:
texture tiling - need to relocate texture registers TXOFFSET* with tiling info.
This also merges Michel's cleanup surfaces regs at init time patch
even though it makes sense on its own, this patch really relies on it.
Some PowerMac firmwares set up a tiling surface at the beginning of VRAM
which messes us up otherwise.
that patch is:
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On certain configurations, HPA isn't or can't be unlocked during
probing but it somehow ends up unlocked afterwards. In the following
thread, the problem can be reliably reproduced after resuming from
STR. The BIOS turns on HPA during boot but forgets to do it during
resume.
http://thread.gmane.org/gmane.linux.kernel/858310
This patch updates libata revalidation such that it considers native
n_sectors. If the device size has increased to match native
n_sectors, it's assumed that HPA has been unlocked involuntarily and
the device is recognized as the same one. This should be fairly safe
while nicely working around the problem.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christof Warlich <christof@warlich.name>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.
The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceeded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.
The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.
nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.
Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache.
All other errors get cached.
Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.
For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This converts swiotlb to use phys_to_dma and dma_to_phys instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().
swiotlb_phys_to_bus() and swiotlb_bus_to_phys() are not necessary so
this patch also removes them.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
is_buffer_dma_capable() was replaced with dma_capable().
is_buffer_dma_capable() tells if a buffer is dma-capable or
not. However, it doesn't take a pointer to struct device so it doesn't
work for POWERPC.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Even though reverse path filter was changed from simple boolean to
trinary control, the loose mode only works if both all and device are
configured because of this logic error.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: use GFP_NOFS under potential memory pressure
fsnotify: fix inotify tail drop check with path entries
inotify: check filename before dropping repeat events
fsnotify: use def_bool in kconfig instead of letting the user choose
inotify: fix error paths in inotify_update_watch
inotify: do not leak inode marks in inotify_add_watch
inotify: drop user watch count when a watch is removed
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
cnic: Fix ISCSI_KEVENT_IF_DOWN message handling.
net: irda: init spinlock after memcpy
ixgbe: fix for 82599 errata marking UDP checksum errors
r8169: WakeOnLan fix for the 8168
netxen: reset ring consumer during cleanup
net/bridge: use kobject_put to release kobject in br_add_if error path
smc91x.h: add config for Nomadik evaluation kit
NET: ROSE: Don't use static buffer.
eepro: Read buffer overflow
tokenring: Read buffer overflow
at1700: Read buffer overflow
fealnx: Write outside array bounds
ixgbe: remove unnecessary call to device_init_wakeup
ixgbe: Don't priority tag control frames in DCB mode
ixgbe: Enable FCoE offload when DCB is enabled for 82599
net: Rework mdio-ofgpio driver to use of_mdio infrastructure
register at91_ether using platform_driver_probe
skge: Enable WoL by default if supported
net: KS8851 needs to depend on MII
be2net: Bug fix in the non-lro path. Size of received packet was not updated in statistics properly.
...
When a station queries us for a PS-poll response, we wrongly
queue the frame on the virtual interface's queue rather than
the pending queue.
Additionally, fix a race condition where we could potentially
send multiple frames to the sleeping station due to using a
station flag rather than a packet flag. When converting to a
packet flag, we can also convert p54 and remove the filter
clearing we added for it.
(Also remove a now dead function)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Bob Copeland <me@bobcopeland.com>
Tested-by: Bob Copeland <me@bobcopeland.com>
Cc: Christian Lamparter <chunkeey@web.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>