Commit graph

9203 commits

Author SHA1 Message Date
Michael Chan
4cf78e4fb6 [TG3]: add 5780 basic support
Add 5780 PCI IDs, chip IDs, and other basic support.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-25 12:29:19 -07:00
Linus Torvalds
6b6a93c687 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-07-24 20:39:30 -07:00
Linus Torvalds
e89227889c Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2005-07-24 20:38:44 -07:00
David S. Miller
db7d9a4eb7 [SPARC64]: Move syscall success and newchild state out of thread flags.
These two bits were accessed non-atomically from assembler
code.  So, in order to eliminate any potential races resulting
from that, move these pieces of state into two bytes elsewhere
in struct thread_info.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:36:26 -07:00
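
The idea in db7d9a4eb7 above, giving each piece of state its own byte so that a plain store cannot race with updates to the shared flags word, might be sketched as follows. The struct layout and all names here are hypothetical and are not taken from the sparc64 sources.

```c
#include <stdint.h>

/* Hypothetical layout for illustration; not the real sparc64 struct thread_info. */
struct thread_info_sketch {
	unsigned long flags;       /* shared word: meant to be touched only with atomic bit ops */
	uint8_t syscall_noerror;   /* own byte: a plain byte store cannot clobber flags */
	uint8_t new_child;         /* own byte, same reasoning */
};

/* A plain byte store; no read-modify-write of the shared flags word is involved. */
static void set_new_child(struct thread_info_sketch *ti, uint8_t value)
{
	ti->new_child = value;
}

/* Same pattern for the second piece of state. */
static void set_syscall_noerror(struct thread_info_sketch *ti, uint8_t value)
{
	ti->syscall_noerror = value;
}
```

Because each state lives in a separate byte, assembler code can update it with a single byte store instead of loading, modifying, and storing the whole flags word.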
David S. Miller
cdd5186f75 [SPARC64]: Privatize sun5_timer.
It is only used by some localized code in irq.c, and also
delete enable_prom_timer() as that is totally unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:36:13 -07:00
David S. Miller
c5019a578f [SPARC64]: Kill totally unused inline functions from asm/spitfire.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:35:56 -07:00
David S. Miller
620de54675 [SPARC64]: Simplify asm/rwsem.h slightly.
rwsem_atomic_update and rwsem_atomic_add can be implemented
directly using atomic_*() routines.

Also, rwsem_cmpxchgw() is totally unused, kill it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:35:42 -07:00
David S. Miller
6593eaed81 [SPARC64]: Non-atomic bitops do not need volatile operations
Noticed this while comparing sparc64's bitops.h to ppc64's.
We can cast the volatile memory argument to be non-volatile.

While we're here, __inline__ --> inline.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:35:28 -07:00
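
As a rough illustration of 6593eaed81 above, a non-atomic bit operation can accept a volatile pointer for API compatibility and cast the qualifier away internally, since no atomicity or ordering is promised anyway. The names below are made up for this sketch and are not the sparc64 bitops.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic set: the volatile qualifier is dropped before the plain read-modify-write. */
static inline void sketch_set_bit(unsigned int nr, volatile unsigned long *addr)
{
	unsigned long *word = (unsigned long *)addr + nr / SKETCH_BITS_PER_LONG;

	*word |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Non-atomic test of a single bit, using the same cast. */
static inline int sketch_test_bit(unsigned int nr, const volatile unsigned long *addr)
{
	const unsigned long *word = (const unsigned long *)addr + nr / SKETCH_BITS_PER_LONG;

	return (int)((*word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

Callers of such helpers are expected to provide their own locking; the cast only lets the compiler generate ordinary loads and stores.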
David S. Miller
48647feed9 [W1]: Do not use NFLOG netlink number.
Use the reserved but never used NETLINK_SKIP value instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:30:28 -07:00
Russell King
4e8fd22bd4 [PATCH] ARM SMP: Fix ARMv6 spinlock and semaphore implementations
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-24 12:13:40 +01:00
Linus Torvalds
2c2a68b847 Merge master.kernel.org:/home/rmk/linux-2.6-serial 2005-07-23 17:01:26 -07:00
Linus Torvalds
c94c0d201f Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-07-23 16:59:55 -07:00
Linus Torvalds
2847e3478c x86: use alternative instructions for fnsave/fxsave too
This one ends up using an inline asm format that claims to read memory
and then clobber it (rather than just write it directly), which made it
easier to use the existing "alternative_input()" infrastructure support.

Now the fxsave code matches the fxrstor.
2005-07-22 18:19:20 -04:00
Linus Torvalds
38afd6adf6 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-07-22 16:33:00 -07:00
David S. Miller
261688d01e [PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 14:43:52 -07:00
Linus Torvalds
8ed1383fb7 x86: make restore_fpu() use alternative assembler instructions
It's really just a single instruction, conditional on whether the CPU
supports FXSR or not, so implement it as such instead of making it a
function that queries FXSR dynamically.

This means that the instruction just gets automatically rewritten to the
correct one at boot-time.
2005-07-22 16:06:16 -04:00
David S. Miller
28e212fb36 [PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 11:47:25 -07:00
Rusty Russell
4acdbdbe50 [NETFILTER]: ip_conntrack_expect_related must not free expectation
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.

The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet.  If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.

This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use.  Reference counting
is needed for ctnetlink anyway, so introduce it now.

To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-21 13:14:46 -07:00
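
The ownership rule that 4acdbdbe50 above introduces (registration takes its own reference, so the caller's pointer stays valid until the caller drops it) could look roughly like this. It is a simplified userspace sketch with hypothetical names and a plain int counter, not the netfilter code.

```c
#include <stdlib.h>

struct expect_sketch {
	int refcount;   /* plain int for brevity; kernel code would use an atomic counter */
	int port;
};

static struct expect_sketch *registered;   /* stands in for the expectation list */

/* The creator starts out holding one reference. */
static struct expect_sketch *expect_alloc(int port)
{
	struct expect_sketch *exp = calloc(1, sizeof(*exp));

	if (exp) {
		exp->refcount = 1;
		exp->port = port;
	}
	return exp;
}

/* Drop one reference; returns 1 when this call freed the object. */
static int expect_put(struct expect_sketch *exp)
{
	if (--exp->refcount > 0)
		return 0;
	free(exp);
	return 1;
}

/* Registering takes the list's own reference; the caller's reference is untouched. */
static void expect_register(struct expect_sketch *exp)
{
	exp->refcount++;
	registered = exp;
}

/* Unregistering drops only the list's reference. */
static void expect_unregister(struct expect_sketch *exp)
{
	if (registered == exp) {
		registered = NULL;
		expect_put(exp);
	}
}
```

With this shape, a helper whose packet rewrite fails can call expect_unregister() and then expect_put() without ever touching freed memory, because its own reference keeps the object alive until the final put.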
David Woodhouse
39299d9d15 Merge with /shiny/git/linux-2.6/.git 2005-07-19 17:49:39 -04:00
Patrick McHardy
0303770deb [NET]: Make ipip/ip6_tunnel independent of XFRM
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:03:34 -07:00
Sridhar Samudrala
d1ad1ff299 [SCTP]: Fix potential null pointer dereference while handling an icmp error
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:44:10 -07:00
Victor Fusco
e2bf521d97 [NET]: Fix "nocast type" warnings in skbuff.h
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:36:38 -07:00
Patrick McHardy
23af27eb8f [PKT_SCHED]: Kill TCF_META_ID_TCCLASSID.
Thomas Graf states:

> I used to mark such ids as obsolete in the header but since
> skb is on diet anyway and there has been no official
> iproute2 release with the ematch bits included it might be
> a better idea to remove the ids from the header completely.
> Those that have picked up my patch on netdev shouldn't care
> about a ABI breakage, actually I doubt that someone is using
> it already.

So here's the patch to remove it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:34:35 -07:00
Andrey Panin
fbc0dc0df5 [PATCH] Serial: Add support for SIIG Quartet serial card
Add support for SIIG Quartet Serial card.  This card has an Oxford
Semiconductor 16954 quad UART which is clocked by a 10x faster
(18.432 MHz) quartz.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-18 11:38:09 +01:00
Sascha Hauer
772a9e631c [PATCH] ARM: 2687/1: i.MX framebuffer: make dmacr register platform configurable
Patch from Sascha Hauer

The dmacr needs different settings on some boards. This patch makes the
register configurable by the platform part.
Also we have imxfb_disable_controller(), so let's use it.

Signed-off-by: Steven Scholz
Signed-off-by: Sascha Hauer
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-17 20:15:36 +01:00
Linus Torvalds
f60f700876 Merge master.kernel.org:/home/rmk/linux-2.6-serial 2005-07-16 20:06:51 -07:00
Alexander Schulz
b7523418f6 [PATCH] ARM: 2815/1: Shark: new defconfig, fixes with __io and serial ports
Patch from Alexander Schulz

This patch brings a new default config file for the shark and
fixes a compilation issue with io addressing and a runtime
problem with the serial ports, where I corrected a wrong
regshift value.
These are all shark specific files so I hope it is ok to
put them in one patch.

Signed-off-by: Alexander Schulz <alex@shark-linux.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-16 17:17:18 +01:00
Olaf Hering
6d283d2716 [PATCH] Serial: Remove linux/version.h
Changing CONFIG_LOCALVERSION rebuilds too much, for no apparent reason.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-16 09:59:00 +01:00
Russell King
661f83a67c [PATCH] Serial: Move deprecation of register_serial forward to September
I think it's about time to make the build a little more vocal about the
expiry of these functions.  Due to recent discussions with problems in
the console initialisation vs power manglement, I'd like to move the
date forward to September.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-07-16 09:30:53 +01:00
NeilBrown
6a806c510d [PATCH] md/raid1: clear bitmap when fullsync completes
We need to be careful differentiating between a resync of a complete array,
in which we can clear the bitmap, and a resync of a degraded array, in
which we cannot.

This patch cleans all that up.

Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-15 09:54:51 -07:00
Sam Ravnborg
6d30e3a899 kbuild: Avoid inconsistent kallsyms data
Several reports of inconsistent kallsyms data have been caused by the aliased symbols
__sched_text_start and __down shifting places in the output of nm.
The root cause was that on the second pass ld aligned __sched_text_start to a 4 byte boundary,
which is the function alignment on i386.
sched.text and spinlock.text are now aligned to an 8 byte boundary to make sure they
are aligned to a function alignment on most (all?) archs.

Tested by: Paulo Marques <pmarques@grupopie.com>
Tested by: Alexander Stohr <Alexander.Stohr@gmx.de>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2005-07-14 20:15:44 +00:00
James Bottomley
e10fb91c4d [SCSI] fix function prototype warning
int_to_scsilun() takes a pointer to a struct scsi_lun in its
prototype, so add this structure to scsi_device.h to avoid declaration
inside function prototype warnings.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-07-14 11:54:17 -05:00
Paolo 'Blaisorblade' Giarrusso
2e5e55923e [PATCH] uml: consolidate modify_ldt
*) Reorganize the two cases of sys_modify_ldt to share all the reasonably
   common code.

*) Avoid memory allocation when unneeded (i.e.  when we are writing and the
   passed buffer size is known), thus not returning ENOMEM (which isn't
   allowed for this syscall, even if there is no strict "specification").

*) Add copy_{from,to}_user to modify_ldt for TT mode.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-14 09:00:24 -07:00
James.Smart@Emulex.Com
2f4701d827 [SCSI] add int_to_scsilun() function
One of the issues we had was reverting the midlayer's lun value
into the 8-byte lun value that we wanted to send to the device.
Historically, there's been some combination of byte swapping,
setting high/low, etc. There's also been no common thread between
how our driver did it and others.  I also got very confused as
to why byteswap routines were being used.

Anyway, this patch is a LLDD-callable function that reverts the
midlayer's lun value, stored in an int, to the 8-byte quantity
(note: this is not the real 8-byte quantity, just the same amount
that scsilun_to_int() was able to convert and store originally).

This also solves the dilemma of the thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112116767118981&w=2

A patch for the lpfc driver to use this function will be along
in a few days (batched with other patches).

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-07-14 11:21:27 -04:00
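
A conversion of the kind 2f4701d827 above describes might look like the sketch below. It assumes a 32-bit unsigned int and a layout in which each 16-bit LUN level is stored big-endian in consecutive byte pairs; the commit message does not spell out the layout, so treat that, and every name here, as an assumption.

```c
#include <stdint.h>
#include <string.h>

struct scsi_lun_sketch {
	uint8_t scsi_lun[8];
};

/* Pack an integer LUN into the 8-byte form, two bytes (one level) at a time. */
static void int_to_lun_sketch(unsigned int lun, struct scsi_lun_sketch *out)
{
	size_t i;

	memset(out->scsi_lun, 0, sizeof(out->scsi_lun));
	for (i = 0; i < sizeof(lun); i += 2) {
		out->scsi_lun[i] = (lun >> 8) & 0xFF;   /* high byte of this level */
		out->scsi_lun[i + 1] = lun & 0xFF;      /* low byte of this level */
		lun >>= 16;
	}
}

/* Inverse direction, covering only the bytes an unsigned int can hold. */
static unsigned int lun_to_int_sketch(const struct scsi_lun_sketch *in)
{
	unsigned int lun = 0;
	size_t i;

	for (i = 0; i < sizeof(lun); i += 2)
		lun |= (unsigned int)((in->scsi_lun[i] << 8) | in->scsi_lun[i + 1]) << (i * 8);
	return lun;
}
```

No byte swapping helpers are needed: shifting and masking give the same result on little-endian and big-endian hosts, which is one way to avoid the confusion the commit message mentions.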
Andrew Vasquez
ac96202ba0 [SCSI] qla2xxx: Add pci ids for new ISP types.
Add pci ids for new ISP types.

Move old definitions in local qla_def.h file to pci_ids.h as
well.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-07-14 10:54:20 -04:00
Robert Moore
f9f4601f33 ACPICA 20050708 from Bob Moore <robert.moore@intel.com>
The use of the CPU stack in the debug version of the
subsystem has been considerably reduced.  Previously, a
debug structure was declared in every function that used
the debug macros.  This structure has been removed in
favor of declaring the individual elements as parameters
to the debug functions.  This reduces the cumulative stack
use during nested execution of ACPI function calls at the
cost of a small increase in the code size of the debug
version of the subsystem.  With assistance from Alexey
Starikovskiy and Len Brown.

Added the ACPI_GET_FUNCTION_NAME macro to enable the
compiler-dependent headers to define a macro that will
return the current function name at runtime (such as
__FUNCTION__ or __func__, etc.) The function name is used
by the debug trace output.  If ACPI_GET_FUNCTION_NAME
is not defined in the compiler-dependent header, the
function name is saved on the CPU stack (one pointer per
function.) This mechanism is used because apparently there
exists no standard ANSI-C defined macro that returns
the function name.

Alexey Starikovskiy redesigned and reimplemented the
"Owner ID" mechanism used to track namespace objects
created/deleted by ACPI tables and control method
execution.  A bitmap is now used to allocate and free the
IDs, thus solving the wraparound problem present in the
previous implementation.  The size of the namespace node
descriptor was reduced by 2 bytes as a result.

Removed the UINT32_BIT and UINT16_BIT types that were used
for the bitfield flag definitions within the headers for
the predefined ACPI tables.  These have been replaced by
UINT8_BIT in order to increase the code portability of
the subsystem.  If the use of UINT8 remains a problem,
we may be forced to eliminate bitfields entirely because
of a lack of portability.

Alexey Starikovskiy enhanced the performance of
acpi_ut_update_object_reference.  This is a frequently used
function and this improvement increases the performance
of the entire subsystem.

Alexey Starikovskiy fixed several possible memory leaks
and the inverse - premature object deletion.

Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-14 00:42:23 -04:00
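
The bitmap-based "Owner ID" allocation mentioned in f9f4601f33 above can be illustrated with a small sketch. The 32-ID limit, the 1-based numbering, and the function names are assumptions made for this example, not details taken from ACPICA.

```c
#include <stdint.h>

static uint32_t owner_id_mask;   /* bit n set means ID n + 1 is in use */

/* Returns an ID in 1..32, or 0 when every ID is taken. */
static unsigned int owner_id_alloc(void)
{
	unsigned int i;

	for (i = 0; i < 32; i++) {
		if (!(owner_id_mask & (UINT32_C(1) << i))) {
			owner_id_mask |= UINT32_C(1) << i;
			return i + 1;
		}
	}
	return 0;
}

/* Releasing an ID clears its bit so the ID can be handed out again. */
static void owner_id_free(unsigned int id)
{
	if (id >= 1 && id <= 32)
		owner_id_mask &= ~(UINT32_C(1) << (id - 1));
}
```

Unlike an ever-incrementing counter, a bitmap cannot wrap around onto an ID that is still live, because an ID is only reused after it has been explicitly freed.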
Robert Moore
73459f73e5 ACPICA 20050617-0624 from Bob Moore <robert.moore@intel.com>
ACPICA 20050617:

Moved the object cache operations into the OS interface
layer (OSL) to allow the host OS to handle these operations
if desired (for example, the Linux OSL will invoke the
slab allocator).  This support is optional; the compile
time define ACPI_USE_LOCAL_CACHE may be used to utilize
the original cache code in the ACPI CA core.  The new OSL
interfaces are shown below.  See utalloc.c for an example
implementation, and acpiosxf.h for the exact interface
definitions.  Thanks to Alexey Starikovskiy.
	acpi_os_create_cache
	acpi_os_delete_cache
	acpi_os_purge_cache
	acpi_os_acquire_object
	acpi_os_release_object

Modified the interfaces to acpi_os_acquire_lock and
acpi_os_release_lock to return and restore a flags
parameter.  This fits better with many OS lock models.
Note: the current execution state (interrupt handler
or not) is no longer passed to these interfaces.  If
necessary, the OSL must determine this state by itself, a
simple and fast operation.  Thanks to Alexey Starikovskiy.

Fixed a problem in the ACPI table handling where a valid
XSDT was assumed present if the revision of the RSDP
was 2 or greater.  According to the ACPI specification,
the XSDT is optional in all cases, and the table manager
therefore now checks for both an RSDP >=2 and a valid
XSDT pointer.  Otherwise, the RSDT pointer is used.
Some ACPI 2.0 compliant BIOSs contain only the RSDT.

Fixed an interpreter problem with the Mid() operator in the
case of an input string where the resulting output string
is of zero length.  It now correctly returns a valid,
null terminated string object instead of a string object
with a null pointer.

Fixed a problem with the control method argument handling
to allow a store to an Arg object that already contains an
object of type Device.  The Device object is now correctly
overwritten.  Previously, an error was returned.

ACPICA 20050624:

Modified the new OSL cache interfaces to use ACPI_CACHE_T
as the type for the host-defined cache object.  This allows
the OSL implementation to define and type this object in
any manner desired, simplifying the OSL implementation.
For example, ACPI_CACHE_T is defined as kmem_cache_t for
Linux, and should be defined in the OS-specific header
file for other operating systems as required.

Changed the interface to AcpiOsAcquireObject to directly
return the requested object as the function return (instead
of ACPI_STATUS.) This change was made for performance
reasons, since this is the purpose of the interface in the
first place.  acpi_os_acquire_object is now similar to the
acpi_os_allocate interface.  Thanks to Alexey Starikovskiy.

Modified the initialization sequence in
acpi_initialize_subsystem to call the OSL interface
acpi_osl_initialize first, before any local initialization.
This change was required because the global initialization
now calls OSL interfaces.

Restructured the code base to split some files because
of size and/or because the code logically belonged in a
separate file.  New files are listed below.

  utilities/utcache.c	/* Local cache interfaces */
  utilities/utmutex.c	/* Local mutex support */
  utilities/utstate.c	/* State object support */
  parser/psloop.c	/* Main AML parse loop */

Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-13 23:45:36 -04:00
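
The XSDT fix described in 73459f73e5 above (use the XSDT only when the RSDP revision is 2 or greater and the XSDT pointer is valid, otherwise fall back to the RSDT) might be sketched like this. The struct is a reduced, hypothetical stand-in for the RSDP, and "valid" is assumed here to mean non-zero.

```c
#include <stdint.h>

/* Reduced stand-in for the RSDP; only the fields this sketch needs. */
struct rsdp_sketch {
	uint8_t revision;
	uint32_t rsdt_address;
	uint64_t xsdt_address;
};

/* Pick the root table address: XSDT only if both conditions hold. */
static uint64_t root_table_address(const struct rsdp_sketch *rsdp)
{
	if (rsdp->revision >= 2 && rsdp->xsdt_address != 0)
		return rsdp->xsdt_address;
	return rsdp->rsdt_address;
}
```

The second test is what matters for ACPI 2.0 BIOSes that provide only an RSDT: they report revision 2 but leave the XSDT pointer empty.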
Linus Torvalds
514fd7fd01 Merge master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6 2005-07-13 15:48:33 -07:00
Robert Moore
88ac00f5a8 ACPICA 20050526 from Bob Moore <robert.moore@intel.com>
Implemented support to execute Type 1 and Type 2 AML
opcodes appearing at the module level (not within a control
method.)  These opcodes are executed exactly once at the
time the table is loaded. This type of code was legal up
until the release of ACPI 2.0B (2002) and is now supported
within ACPI CA in order to provide backwards compatibility
with earlier BIOS implementations. This eliminates the
"Encountered executable code at module level" warning that
was previously generated upon detection of such code.

Fixed a problem in the interpreter where an AE_NOT_FOUND
exception could inadvertently be generated during the
lookup of namespace objects in the second pass parse of
ACPI tables and control methods. It appears that this
problem could occur during the resolution of forward
references to namespace objects.

Added the ACPI_MUTEX_DEBUG #ifdef to the
acpi_ut_release_mutex function, corresponding to the same
#ifdef in the acpi_ut_acquire_mutex function.  This allows
the deadlock detection debug code to be compiled out in
the normal case, improving mutex performance (and overall
subsystem performance) considerably.  As suggested by
Alexey Starikovskiy.

Implemented a handful of miscellaneous fixes for possible
memory leaks on error conditions and error handling
control paths. These fixes were suggested by FreeBSD and
the Coverity Prevent source code analysis tool.

Added a check for a null RSDT pointer in
acpi_get_firmware_table (tbxfroot.c) to prevent a fault
in this error case.

Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-13 16:46:34 -04:00
Robert Moore
6f42ccf2fc ACPICA from Bob Moore <robert.moore@intel.com>
Implemented support for PCI Express root bridges
-- added support for device PNP0A08 in the root
bridge search within acpi_ev_pci_config_region_setup().

The interpreter now automatically truncates incoming
64-bit constants to 32 bits if currently executing out
of a 32-bit ACPI table (Revision < 2). This also affects
the iASL compiler constant folding. (Note: as per below,
the iASL compiler no longer allows 64-bit constants within
32-bit tables.)

Fixed a problem where string and buffer objects with
"static" pointers (pointers to initialization data within
an ACPI table) were not handled consistently. The internal
object copy operation now always copies the data to a newly
allocated buffer, regardless of whether the source object
is static or not.

Fixed a problem with the FromBCD operator where an
implicit result conversion was improperly performed while
storing the result to the target operand. Since this is an
"explicit conversion" operator, the implicit conversion
should never be performed on the output.

Fixed a problem with the CopyObject operator where a copy
to an existing named object did not always completely
overwrite the existing object stored at name. Specifically,
a buffer-to-buffer copy did not delete the existing buffer.

Replaced "interrupt_level" with "interrupt_number" in all
GPE interfaces and structs for consistency.

Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-13 16:29:07 -04:00
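
The constant truncation rule from 6f42ccf2fc above (64-bit constants are cut to 32 bits when executing out of a table with Revision < 2) reduces to a single mask. The function name and signature below are invented for this sketch and do not reflect the interpreter's internals.

```c
#include <stdint.h>

/* Truncate an incoming constant when the owning table is a 32-bit (Revision < 2) table. */
static uint64_t fold_constant_sketch(uint64_t value, uint8_t table_revision)
{
	if (table_revision < 2)
		return value & UINT64_C(0xFFFFFFFF);
	return value;
}
```

For example, 0x1FFFFFFFF would become 0xFFFFFFFF in a Revision 1 table and stay unchanged in a Revision 2 table.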
Jeff Garzik
327309e899 Merge upstream 2.6.13-rc3 into ieee80211 branch of netdev-2.6. 2005-07-13 16:23:51 -04:00
Tony Luck
99ad25a313 Auto merge with /home/aegl/GIT/linus 2005-07-13 12:15:43 -07:00
David Gibson
96e2844999 [PATCH] ppc64: kill bitfields in ppc64 hash code
This patch removes the use of bitfield types from the ppc64 hash table
manipulation code.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:25:25 -07:00
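
Replacing bitfield types with explicit masks and shifts, as 96e2844999 above does for the ppc64 hash code, generally follows the pattern below. The field names, widths, and positions are hypothetical and do not match the real hash table entry layout.

```c
#include <stdint.h>

/* Hypothetical entry layout: bit 0 is a valid flag, bits 1-2 hold a 2-bit field. */
#define PTE_SKETCH_VALID     UINT64_C(0x1)
#define PTE_SKETCH_PP_SHIFT  1
#define PTE_SKETCH_PP_MASK   (UINT64_C(0x3) << PTE_SKETCH_PP_SHIFT)

/* Write the 2-bit field without disturbing any other bit. */
static uint64_t pte_sketch_set_pp(uint64_t pte, unsigned int pp)
{
	return (pte & ~PTE_SKETCH_PP_MASK) |
	       (((uint64_t)pp << PTE_SKETCH_PP_SHIFT) & PTE_SKETCH_PP_MASK);
}

/* Read the 2-bit field back. */
static unsigned int pte_sketch_get_pp(uint64_t pte)
{
	return (unsigned int)((pte & PTE_SKETCH_PP_MASK) >> PTE_SKETCH_PP_SHIFT);
}
```

With masks and shifts the bit positions are fixed by the source code rather than by the compiler's bitfield layout rules, and a whole entry can be read or written as one integer.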
Martin Schwidefsky
068e1b94bb [PATCH] s390: fadvise hint values.
Add special case for the POSIX_FADV_DONTNEED and POSIX_FADV_NOREUSE hint
values for s390-64.  The user space values in the s390-64 glibc headers for
these two defines have always been 6 and 7 instead of 4 and 5.  All 64 bit
applications therefore use the "wrong" values.  To get these applications
working without recompiling, the kernel needs to accept the "wrong" values.
Since the values for s390-31 are 4 and 5, the compat wrappers for fadvise64
and fadvise64_64 need to rewrite the values for 31 bit system calls.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:25:24 -07:00
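
The value rewriting that 068e1b94bb above assigns to the compat wrappers can be sketched as a small mapping: 31-bit callers pass 4 and 5, and the 64-bit kernel expects 6 and 7. The enum and function names are made up for this example.

```c
/* Hint values as described in the commit message; the names are hypothetical. */
enum {
	FADV31_DONTNEED = 4,
	FADV31_NOREUSE  = 5,
	FADV64_DONTNEED = 6,
	FADV64_NOREUSE  = 7
};

/* Translate a hint from a 31-bit caller into the value the 64-bit kernel accepts. */
static int compat_fadvise_hint_sketch(int advice)
{
	switch (advice) {
	case FADV31_DONTNEED:
		return FADV64_DONTNEED;
	case FADV31_NOREUSE:
		return FADV64_NOREUSE;
	default:
		return advice;   /* all other hints share the same value */
	}
}
```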
Guillaume Autran
ddca3b80ce [PATCH] ppc32: fix destroy_context() race condition
Fix for a race condition when a task gets preempted by another task while
executing the destroy_context(...) in a FEW_CONTEXTS environment.
mm->context == NO_CONTEXT but the context_map may indicate all contexts are
in use.

The solution to this problem is to disable kernel preemption while
destroying a MMU context.

Signed-off-by: Guillaume Autran <gautran@mrv.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:25:24 -07:00
Anton Altaparmakov
88bd5121d6 [PATCH] Fix soft lockup due to NTFS: VFS part and explanation
Something has changed in the core kernel such that we now get concurrent
inode write outs, one e.g. via pdflush and one via sys_sync or whatever.
This causes a nasty deadlock in ntfs.  The only clean solution
unfortunately requires a minor vfs api extension.

First the deadlock analysis:

Prerequisite knowledge: NTFS has a file $MFT (inode 0) loaded at mount
time.  The NTFS driver uses the page cache for storing the file contents as
usual.  More interestingly this file contains the table of on-disk inodes
as a sequence of MFT_RECORDs.  Thus NTFS driver accesses the on-disk inodes
by accessing the MFT_RECORDs in the page cache pages of the loaded inode
$MFT.

The situation: VFS inode X on a mounted ntfs volume is dirty.  For the same
inode X, the ntfs_inode is dirty and thus so is the corresponding on-disk
inode, which is, as explained above, in a dirty PAGE_CACHE_PAGE belonging to
the table of inodes ($MFT, inode 0).

What happens:

Process 1: sys_sync()/umount()/whatever...  calls __sync_single_inode() for
$MFT -> do_writepages() -> write_page for the dirty page containing the
on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
clears PageUptodate() on the page to prevent anyone else getting hold of it
whilst it does the write out (this is necessary as the on-disk inode needs
"fixups" applied before the write to disk which are removed again after the
write and PageUptodate is then set again).  It then analyses the page
looking for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this on-disk
inode.  This then calls ilookup5() to check if the corresponding VFS inode
is in icache().  This in turn calls ifind() which waits on the inode lock
via wait_on_inode whilst holding the global inode_lock.

Process 2: pdflush results in a call to __sync_single_inode for the same
VFS inode X on the ntfs volume.  This locks the inode (I_LOCK) then calls
write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
the page (in page cache of table of inodes $MFT, inode 0) containing the
on-disk inode.  This page has PageUptodate() clear because of Process 1
(see above) so read_cache_page() blocks when it tries to take the page lock
for the page so it can call ntfs_readpage().

Thus Process 1 is holding the page lock on the page containing the on-disk
inode X and it is waiting on the inode X to be unlocked in ifind() so it
can write the page out and then unlock the page.

And Process 2 is holding the inode lock on inode X and is waiting for the
page to be unlocked so it can call ntfs_readpage() or discover that
Process 1 set PageUptodate() again and use the page.

Thus we have a deadlock due to ifind() waiting on the inode lock.

The only sensible solution: NTFS does not care whether the VFS inode is
locked or not when it calls ilookup5() (it doesn't use the VFS inode at
all, it just uses it to find the corresponding ntfs_inode which is of
course attached to the VFS inode (both are one single struct); and it uses
the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
hence we want a modified ilookup5_nowait() which is the same as ilookup5()
but it does not wait on the inode lock.

Without such functionality I would have to keep my own ntfs_inode cache in
the NTFS driver just so I can find ntfs_inodes independent of their VFS
inodes which would be slow, memory and cpu cycle wasting, and incredibly
stupid given the icache already exists in the VFS.

Below is a patch that does the ilookup5_nowait() implementation in
fs/inode.c and exports it.

ilookup5_nowait.diff:

Introduce ilookup5_nowait() which is basically the same as ilookup5() but
it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
done in ifind()).

This is needed to avoid a nasty deadlock in NTFS.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:25:24 -07:00
Robert Love
5995f16b4a [PATCH] inotify: event ordering
This rearranges the event ordering for "open" to be consistent with the
ordering of the other events.

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:09:31 -07:00
Robert Love
0399cb08c5 [PATCH] inotify: move sysctl
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs".  Also some related cleanup.

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 11:09:31 -07:00
Linus Torvalds
6cd59f7a41 Merge /home/torvalds/linux-2.6-arm 2005-07-13 10:12:50 -07:00
David Woodhouse
30beab1491 Merge with /shiny/git/linux-2.6/.git 2005-07-13 15:25:59 +01:00