This is a workaround for find_group_flex() which badly needs to be
replaced. One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances. This can lead to spurious ENOSPC failures
even when there are inodes still available.
Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1. If
find_group_other() succeeds when find_group_flex() has failed, log a
warning message.
A better block/inode allocator that will fix this problem for real has
been queued up for the next merge window.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts
[CIFS] improve posix semantics of file create
[CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
cifs: posix fill in inode needed by posix open
cifs: properly handle case where CIFSGetSrvInodeNumber fails
cifs: refactor new_inode() calls and inode initialization
[CIFS] Prevent OOPs when mounting with remote prefixpath.
[CIFS] ipv6_addr_equal for address comparison
At scan time we observed following scenario:
node A inserted
node B inserted
node C inserted -> sets overlapped flag on node B
node A is removed due to CRC failure -> overlapped flag on node B remains
while (tn->overlapped)
tn = tn_prev(tn);
==> crash, when tn_prev(B) is referenced.
When the ultimate node is removed at scan time and the overlapped flag
is set on the penultimate node, then nothing updates the overlapped
flag of that node. The overlapped iterators blindly expect that the
ultimate node does not have the overlapped flag set, which causes the
scan code to crash.
It would be a huge overhead to go through the node chain on node
removal and fix up the overlapped flags, so detecting such a case on
the fly in the overlapped iterators is a simpler and reliable
solution.
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Andrew was concerned about the unit of variables named or have suffix
size. Every usage in percpu allocator is in bytes but make it super
clear by adding comments.
While at it, make pcpu_depopulate_chunk() take int @off and @size like
everyone else.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Impact: Bug fix on UP
Checkin 6ec68bff3c:
x86, mce: reinitialize per cpu features on resume
introduced a call to mce_cpu_features() in the resume path, in order
for the MCE machinery to get properly reinitialized after a resume.
However, this function (and its successors) was flagged __cpuinit,
which becomes __init on UP configurations (on SMP suspend/resume
requires CPU hotplug and so this would not be seen.)
Remove the offending __cpuinit annotations for mce_cpu_features() and
its successor functions.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When two different users mount the same Windows 2003 Server share using CIFS,
the first session mounted can be invalidated. Some servers invalidate the first
smb session when a second similar user (e.g. two users who get mapped by server to "guest")
authenticates an smb session from the same client.
By making sure that we set the 2nd and subsequent vc numbers to nonzero values,
this ensures that we will not have this problem.
Fixes Samba bug 6004, problem description follows:
How to reproduce:
- configure an "open share" (full permissions to Guest user) on Windows 2003
Server (I couldn't reproduce the problem with Samba server or Windows older
than 2003)
- mount the share twice with different users who will be authenticated as guest.
noacl,noperm,user=john,dir_mode=0700,domain=DOMAIN,rw
noacl,noperm,user=jeff,dir_mode=0700,domain=DOMAIN,rw
Result:
- just the mount point mounted last is accessible:
Signed-off-by: Steve French <sfrench@us.ibm.com>
Samba server added support for a new posix open/create/mkdir operation
a year or so ago, and we added support to cifs for mkdir to use it,
but had not added the corresponding code to file create.
The following patch helps improve the performance of the cifs create
path (to Samba and servers which support the cifs posix protocol
extensions). Using Connectathon basic test1, with 2000 files, the
performance improved about 15%, and also helped reduce network traffic
(17% fewer SMBs sent over the wire) due to saving a network round trip
for the SetPathInfo on every file create.
It should also help the semantics (and probably the performance) of
write (e.g. when posix byte range locks are on the file) on file
handles opened with posix create, and adds support for a few flags
which would have to be ignored otherwise.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes kernel bug #10451http://bugzilla.kernel.org/show_bug.cgi?id=10451
Certain NAS appliances do not set the operating system or network operating system
fields in the session setup response on the wire. cifs was oopsing on the unexpected
zero length response fields (when trying to null terminate a zero length field).
This fixes the oops.
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
...if it does then we pass a pointer to an unintialized variable for
the inode number to cifs_new_inode. Have it pass a NULL pointer instead.
Also tweak the function prototypes to reduce the amount of casting.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Move new inode creation into a separate routine and refactor the
callers to take advantage of it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixes OOPs with message 'kernel BUG at fs/cifs/cifs_dfs_ref.c:274!'.
Checks if the prefixpath in an accesible while we are still in cifs_mount
and fails with reporting a error if we can't access the prefixpath
Should fix Samba bugs 6086 and 5861 and kernel bug 12192
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (26 commits)
drm/radeon: update sarea copies of last_ variables on resume.
drm/i915: Keep refs on the object over the lifetime of vmas for GTT mmap.
drm/i915: take struct mutex around fb unref
drm: Use spread spectrum when the bios tells us it's ok.
drm: Collapse identical i8xx_clock() and i9xx_clock().
drm: Bring PLL limits in sync with DDX values.
drm: Add locking around cursor gem operations.
drm: Propagate failure from setting crtc base.
drm: Check for a NULL encoder when reverting on error path
drm/i915: Cleanup the hws on ringbuffer constrution failure.
drm/i915: Don't add panel_fixed_mode to the probed modes list at LVDS init.
drm: Release user fbs in drm_release
drm/i915: Unpin the fb on error during construction.
drm/i915: Unpin the hws if we fail to kmap.
drm/i915: Unpin the ringbuffer if we fail to ioremap it.
drm/i915: unpin for an invalid memory domain.
drm/i915: Release and unlock on mmap_gtt error path.
drm/i915: Set framebuffer alignment based upon the fence constraints.
drm: Do not leak a new reference for flink() on an existing name
drm/i915: Fix potential AB-BA deadlock in i915_gem_execbuffer()
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: use the right protections for split-up pagetables
x86, vmi: TSC going backwards check in vmi clocksource
Intel 8257x Ethernet boards have a feature called Serial Over Lan.
This feature works by emulating a serial port, and it is detected by
kernel as a normal 8250 port. However, this emulation is not perfect, as
also noticed on changeset 7500b1f602.
Before this patch, the kernel were trying to check if the serial TX is
capable of work using IRQ's.
This were done with a code similar this:
serial_outp(up, UART_IER, UART_IER_THRI);
lsr = serial_in(up, UART_LSR);
iir = serial_in(up, UART_IIR);
serial_outp(up, UART_IER, 0);
if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT)
up->bugs |= UART_BUG_TXEN;
This works fine for other 8250 ports, but, on 8250-emulated SoL port, the
chip is a little lazy to down UART_IIR_NO_INT at UART_IIR register.
Due to that, UART_BUG_TXEN is sometimes enabled. However, as TX IRQ keeps
working, and the TX polling is now enabled, the driver miss-interprets the
IRQ received later, hanging up the machine until a key is pressed at the
serial console.
This is the 6 version of this patch. Previous versions were trying to
introduce a large enough delay between serial_outp and serial_in(up,
UART_IIR), but not taking forever. However, the needed delay couldn't be
safely determined.
At the experimental tests, a delay of 1us solves most of the cases, but
still hangs sometimes. Increasing the delay to 5us was better, but still
doesn't solve. A very high delay of 50 ms seemed to work every time.
However, poking around with delays and pray for it to be enough doesn't
seem to be a good approach, even for a quirk.
So, instead of playing with random large arbitrary delays, let's just
disable UART_BUG_TXEN for all SoL ports.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds more documentation of the lowlevel API to avoid future bugs.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: proper vcache flush on unmap_kernel_range()
flush_cache_vunmap() should be called before pages are unmapped. Add
a call to it in unmap_kernel_range().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed the old commit 8f5aa26c75
("cpusets: update_cpumask documentation fix") is not a complete fix,
resulting in inconsistent paragraphs. This patch fixes it and does other
fixes and updates:
- s/migrate_all_tasks()/migrate_live_tasks()/
- describe more cpuset control files
- s/cpumask_t/struct cpumask/
- document cpu hotplug and change of 'sched_relax_domain_level' may cause
domain rebuild
- document various ways to query and modify cpusets
- the equivalent of "mount -t cpuset" is "mount -t cgroup -o cpuset,noprefix"
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
BUILD_DOCSRC should be controlled by "config" instead of "menuconfig".
I have no idea how I managed to use "menuconfig" here.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "break" would just result in reusing a free'd pointer. I don't have
the cards myself to test it though. :/
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Caused by 736d54533a (sx.c: fix missed unlock_kernel() on error path in
sx_fw_ioctl()). You guys keep breaking things this way in every single
kernel release in at least couple of places... :-(
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It should be Documentation/build/kconfig.txt.
Introduced by commit 2af238e455
("kbuild: make *config usage docs").
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kzfree() is a wrapper for kfree() that additionally zeroes the underlying
memory before releasing it to the slab allocator.
Currently there is code which memset()s the memory region of an object
before releasing it back to the slab allocator to make sure
security-sensitive data are really zeroed out after use.
These callsites can then just use kzfree() which saves some code, makes
users greppable and allows for a stupid destructor that isn't necessarily
aware of the actual object size.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
oprofile for MN10300 seems to have been broken by the advent of the new
tracing framework.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: extend prefetch handling on 64-bit
Currently there's an extra is_prefetch() check done in do_sigbus(),
which we only do on 32 bits.
This is a last-ditch check before we terminate a task, so it's worth
giving prefetch instructions another chance - should none of our
existing quirks have caught a prefetch instruction related spurious
fault.
The only risk is if a prefetch causes a real sigbus, in that case
we'll not OOM but try another fault. But this code has been on
32-bit for a long time, so it should be fine in practice.
So do this on 64-bit too - and thus remove one more #ifdef.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Removal of an #ifdef in fault_in_kernel_space(), by making
use of the new TASK_SIZE_MAX symbol which is now available
on 32-bit too.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
define on 32-bit too. (mapped to TASK_SIZE)
This allows 32-bit code to make use of the (former-) TASK_SIZE64
symbol as well, in a clean way.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
do_page_fault() has this ugly #ifdef in its prototype:
#ifdef CONFIG_X86_64
asmlinkage
#endif
void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
Replace it with 'dotraplinkage' which maps to exactly the above
construct: nothing on 32-bit and asmlinkage on 64-bit.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add oops-recursion check to 32-bit
Unify the oops state-machine, to the 64-bit version. It is
slightly more careful in that it does a recursion check
in oops_begin(), and is thus more likely to show the relevant
oops.
It also means that 32-bit will print one more line at the
end of pagefault triggered oopses:
printk(KERN_EMERG "CR2: %016lx\n", address);
Which is generally good information to be seen in partial-dump
digital-camera jpegs ;-)
The downside is the somewhat more complex critical path. Both
variants have been tested well meanwhile by kernel developers
crashing their boxes so i dont think this is a practical worry.
This removes 3 ugly #ifdefs from no_context() and makes the
function a lot nicer read.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: refine/extend page fault related oops printing on 64-bit
- honor the pause_on_oops logic on 64-bit too
- print out NX fault warnings on 64-bit as well
- factor out the NX fault message to make it git-greppable and readable
Note that this means that we do the PF_INSTR check on 32-bit non-PAE
as well where it should not occur ... normally. Cannot hurt.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Remove an #ifdef from notify_page_fault(). The function still
compiles to nothing in the !CONFIG_KPROBES case.
Introduce kprobes_built_in() and kprobe_fault_handler() helpers
to allow this - they returns 0 if !CONFIG_KPROBES.
No code changed:
text data bss dec hex filename
4618 32 24 4674 1242 fault.o.before
4618 32 24 4674 1242 fault.o.after
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Remove an #ifdef from kmmio_fault() - we can do this by
providing default implementations for is_kmmio_active()
and kmmio_handler(). The compiler optimizes it all away
in the !CONFIG_MMIOTRACE case.
Also, while at it, clean up mmiotrace.h a bit:
- standard header guards
- standard vertical spaces for structure definitions
No code changed (both with mmiotrace on and off in the config):
text data bss dec hex filename
2947 12 12 2971 b9b fault.o.before
2947 12 12 2971 b9b fault.o.after
Cc: Pekka Paalanen <pq@iki.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: improve page fault handling robustness
The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
relatively recent addition to x86 CPUs, so the 32-bit do_fault()
implementation never had it. This flag gets set when the CPU
detects nonzero values in any reserved bits of the page directory
entries.
Extend the existing 64-bit check for PF_RSVD in do_page_fault()
to 32-bit too. If we detect such a fault then we print a more
informative oops and the pagetables.
This unifies the code some more, removes an ugly #ifdef and improves
the 32-bit page fault code robustness a bit. It slightly increases
the 32-bit kernel text size.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
in do_page_fault(), put it into the check_v8086_mode() helper
function and merge it with an existing #ifdef.
Also, simplify the code flow a tiny bit in the helper.
No code changed:
arch/x86/mm/fault.o:
text data bss dec hex filename
2711 12 12 2735 aaf fault.o.before
2711 12 12 2735 aaf fault.o.after
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: no functionality changed
Factor out the opcode checker into a helper inline.
The code got a tiny bit smaller:
text data bss dec hex filename
4632 32 24 4688 1250 fault.o.before
4618 32 24 4674 1242 fault.o.after
And it got cleaner / easier to review as well.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, no code changed
Clean up various small details, which can be correctness checked
automatically:
- tidy up the include file section
- eliminate unnecessary includes
- introduce show_signal_msg() to clean up code flow
- standardize the code flow
- standardize comments and other style details
- more cleanups, pointed out by checkpatch
No code changed on either 32-bit nor 64-bit:
arch/x86/mm/fault.o:
text data bss dec hex filename
4632 32 24 4688 1250 fault.o.before
4632 32 24 4688 1250 fault.o.after
the md5 changed due to a change in a single instruction:
2e8a8241e7f0d69706776a5a26c90bc0 fault.o.before.asm
c5c3d36e725586eb74f0e10692f0193e fault.o.after.asm
Because a __LINE__ reference in a WARN_ONCE() has changed.
On 32-bit a few stack offsets changed - no code size difference
nor any functionality difference.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: added precaution on failure detection
Break out of the modifying loop as soon as a failure is detected.
This is just an added precaution found by code review and was not
found by any bug chasing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent NMI lockup
If the page fault handler produces a WARN_ON in the modifying of
text, and the system is setup to have a high frequency of NMIs,
we can lock up the system on a failure to modify code.
The modifying of code with NMIs allows all NMIs to modify the code
if it is about to run. This prevents a modifier on one CPU from
modifying code running in NMI context on another CPU. The modifying
is done through stop_machine, so only NMIs must be considered.
But if the write causes the page fault handler to produce a warning,
the print can slow it down enough that as soon as it is done
it will take another NMI before going back to the process context.
The new NMI will perform the write again causing another print and
this will hang the box.
This patch turns off the writing as soon as a failure is detected
and does not wait for it to be turned off by the process context.
This will keep NMIs from getting stuck in this back and forth
of print outs.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: keep kernel text read only
Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.
But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.
The kernel text is converted to read/write, stop_machine is called to
modify the code, then the kernel text is converted back to read only.
The original version used SYSTEM_STATE to determine when it was OK
or not to change the code to rw or ro. Andrew Morton pointed out that
using SYSTEM_STATE is a bad idea since there is no guarantee to what
its state will actually be.
Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.
[ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
clean up vmi_read_cycles to use max()
Reported-b: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>