linux-pinenote

Author	SHA1	Message	Date
Alex Chiang	3ed4fd96b3	PCI: Introduce pci_rescan_bus() This API is used by the PCI core to rescan a bus and rediscover newly added devices. Over time, it is expected that the various PCI hotplug drivers will migrate to this interface and away from the old pci_do_scan_bus() interface. Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 14:57:44 -07:00
Kenji Kaneshige	79af72d716	PCI: pci_is_root_bus helper Introduce pci_is_root_bus helper function. This will help make code more consistent, as well as prevent incorrect assumptions (such as pci_bus->self == NULL on a root bus, which is not always true). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 14:56:36 -07:00
Yu Zhao	74bb1bcc7d	PCI: handle SR-IOV Virtual Function Migration Add or remove a Virtual Function after receiving a Migrate In or Out Request. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:28 -07:00
Yu Zhao	dd7cc44d0b	PCI: add SR-IOV API for Physical Function driver Add or remove the Virtual Function when the SR-IOV is enabled or disabled by the device driver. This can happen anytime rather than only at the device probe stage. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:26 -07:00
Yu Zhao	d1b054da8f	PCI: initialize and release SR-IOV capability If a device has the SR-IOV capability, initialize it (set the ARI Capable Hierarchy in the lowest numbered PF if necessary; calculate the System Page Size for the VF MMIO, probe the VF Offset, Stride and BARs). A lock for the VF bus allocation is also initialized if a PF is the lowest numbered PF. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:22 -07:00
David O'Shea	8293b0f629	PCI: Compaq Evo D510 SMBus quirk using USB instead of VGA On the Compaq Evo D510 SFF/CMT, a PCI quirk activated the SMBus device based on detection of the on-board VGA controller, but the on-board VGA is disabled if an AGP card is inserted, so look for one of the USB controllers instead. Signed-off-by: David O'Shea <dcoshea@hotmail.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:19 -07:00
Matthew Wilcox	1c8d7b0a56	PCI MSI: Add support for multiple MSI Add the new API pci_enable_msi_block() to allow drivers to request multiple MSI and reimplement pci_enable_msi in terms of pci_enable_msi_block. Ensure that the architecture back ends don't have to know about multiple MSI. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:14 -07:00
Matthew Wilcox	f2440d9acb	PCI MSI: Refactor interrupt masking code Since most of the callers already know whether they have an MSI or an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq() and msix_mask_irq(). The only callers which don't (mask_msi_irq() and unmask_msi_irq()) can share code in msi_set_mask_bit(). This then becomes the only caller of msix_flush_writes(), so we can inline it. The flushing read can be to any address that belongs to the device, so we can eliminate the calculation too. We can also get rid of maskbits_mask from struct msi_desc and simply recalculate it on the rare occasion that we need it. The single-bit 'masked' element is replaced by a copy of the 32-bit 'masked' register, so this patch does not affect the size of msi_desc. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:13 -07:00
Matthew Wilcox	264d9caaa1	PCI MSI: Use mask_pos instead of mask_base when appropriate MSI interrupts have a mask_pos where MSI-X have a mask_base. Use a transparent union to get rid of some ugly casts. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:13 -07:00
Matthew Wilcox	24d2755339	PCI MSI: Replace 'type' with 'is_msix' By changing from a 5-bit field to a 1-bit field, we free up some bits that can be used by a later patch. Also rearrange the fields for better packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes to 64 bytes). Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:12 -07:00
Yu Zhao	998dd7c719	PCI: fix incorrect mask of PM No_Soft_Reset bit Reviewed-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:48:08 -07:00
Kenji Kaneshige	d18690af62	PCI/ACPI: fix wrong assumption in acpi_find_root_bridge_handle Current acpi_find_root_bridge_handle() has a assumption that pci_bus->self is NULL on the root pci bus. But it might not be true on some platforms. Because of this wrong assumption, current acpi_find_root_bridge_handle() might cause endless loop. We must check pci_bus->parent instead. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:47:57 -07:00
Kenji Kaneshige	0747aaf42d	PCI/ACPI: fix wrong assumption in acpi_pci_get_bridge_handle Current acpi_pci_get_bridge_handle() has an assumption that pci_bus->self is NULL on the root pci bus. But it might not true on some platforms. Because of this wrong assumption, current acpi_pci_get_bridge_handle() might return improper ACPI handle. We must check pci_bus->parent instead. This bug is the root cause of the following kernel panic reported by James Bottomley. This problem was introduced by the commit `e8c331e963`. The immediate cause was acpi_pci_get_bridge_handle() returned NULL unexpectedly and it was passed as the second argument of acpi_walk_namespace(). pci_hotplug: PCI Hot Plug PCI Core version: 0.5 acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff8039646f>] acpi_ns_get_next_node+0xb/0x3c PGD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.28 #1 RIP: 0010:[<ffffffff8039646f>] [<ffffffff8039646f>] acpi_ns_get_next_node+0xb/0x3c RSP: 0018:ffff88007f87fd30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff8037d260 R09: ffff88007f87fdfc R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff80742040(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000010 CR3: 0000000000201000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 1, threadinfo ffff88007f87e000, task ffff88007f875040) Stack: 0000000000000000 ffffffff803964f5 ffff88007f81b728 0000000000001001 ffff88007f87fdfc ffffffff8037d260 0000000600000001 0000000000000000 ffffffff8037d260 0000000000000000 0000000000000001 ffff88007f87fdfc Call Trace: [<ffffffff803964f5>] acpi_ns_walk_namespace+0x55/0x138 [<ffffffff8037d260>] is_pci_dock_device+0x0/0x20 [<ffffffff8037d260>] is_pci_dock_device+0x0/0x20 [<ffffffff80394a9e>] acpi_walk_namespace+0x5f/0x83 [<ffffffff8037dd33>] detect_ejectable_slots+0x53/0x70 [<ffffffff8037de38>] add_bridge+0xe8/0x200 [<ffffffff80394aaa>] acpi_walk_namespace+0x6b/0x83 [<ffffffff803a4ad1>] acpi_pci_register_driver+0x48/0x61 [<ffffffff806fc5df>] acpiphp_init+0x0/0x58 [<ffffffff806fc732>] acpiphp_glue_init+0x4c/0x5a [<ffffffff806fc616>] acpiphp_init+0x37/0x58 [<ffffffff8020903b>] _stext+0x3b/0x180 [<ffffffff80312598>] create_proc_entry+0x58/0xa0 [<ffffffff802815d1>] register_irq_proc+0xc1/0xe0 [<ffffffff806db64b>] kernel_init+0x152/0x1ac [<ffffffff8023d970>] finish_task_switch+0x0/0x110 [<ffffffff8020ca7a>] child_rip+0xa/0x20 [<ffffffff8020c47c>] restore_args+0x0/0x30 [<ffffffff806db4f9>] kernel_init+0x0/0x1ac [<ffffffff8020ca70>] child_rip+0x0/0x20 Code: 89 c2 48 8b 00 48 85 c0 75 f5 48 8b 45 00 48 89 02 44 88 65 09 48 89 5d 00 31 c0 5b 5d 41 5c c3 53 48 85 d2 89 fb 48 89 d7 75 06 <48> 8b 56 10 eb 08 e8 73 f1 ff ff 48 89 c2 85 db 74 1a eb 13 0f RIP [<ffffffff8039646f>] acpi_ns_get_next_node+0xb/0x3c RSP <ffff88007f87fd30> CR2: 0000000000000010 ---[ end trace a7919e7f17c0a725 ]--- Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:47:56 -07:00
Rafael J. Wysocki	3a3c244c9a	PCI: PCIe portdrv: Implement pm object Implement pm object for the PCI Express port driver in order to use the new power management framework and reduce the code size. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-20 10:47:49 -07:00
David Brownell	a4b6d516a6	[MTD] partitioning utility predicates Move mtd_has_partitions() and mtd_has_cmdlinepart() inlines from a DaVinci-specific driver to the <linux/mtd/partitions.h> header. Use those to eliminate #ifdefs in two drivers which had their own definitions of mtd_has_partitions(). Quite a lot of other MTD drivers could benefit from using use one or both of these to remove #ifdeffery. Maybe some Janitors would like to help. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-03-20 13:16:44 +00:00
Ingo Molnar	7f00a2495b	Merge branches 'x86/cleanups', 'x86/mm', 'x86/setup' and 'linus' into x86/core	2009-03-20 10:34:22 +01:00
Ingo Molnar	22de89b371	Merge branches 'tracing/ftrace', 'tracing/kprobes', 'tracing/tasks' and 'linus' into tracing/core	2009-03-20 10:14:53 +01:00
Eric Dumazet	5e140dfc1f	net: reorder struct Qdisc for better SMP performance dev_queue_xmit() needs to dirty fields "state", "q", "bstats" and "qstats" On x86_64 arch, they currently span three cache lines, involving more cache line ping pongs than necessary, making longer holding of queue spinlock. We can reduce this to one cache line, by grouping all read-mostly fields at the beginning of structure. (Or should I say, all highly modified fields at the end :) ) Before patch : offsetof(struct Qdisc, state)=0x38 offsetof(struct Qdisc, q)=0x48 offsetof(struct Qdisc, bstats)=0x80 offsetof(struct Qdisc, qstats)=0x90 sizeof(struct Qdisc)=0xc8 After patch : offsetof(struct Qdisc, state)=0x80 offsetof(struct Qdisc, q)=0x88 offsetof(struct Qdisc, bstats)=0xa0 offsetof(struct Qdisc, qstats)=0xac sizeof(struct Qdisc)=0xc0 Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-20 01:33:32 -07:00
Stephen Hemminger	2e1ab634bf	rtnetlink: add new value for DHCP added routes To improve manageability, it would be good to be able to disambiguate routes added by administrator from those added by DHCP client. The only necessary kernel change is to add value to rtnetlink include file so iproute2 utility can use it. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-19 23:49:41 -07:00
Andrew Morton	ea7415512a	PCI: constify pci_bus_assign_resources() drivers/pci/hotplug/fakephp.c: In function 'pci_rescan_bus': drivers/pci/hotplug/fakephp.c:271: warning: passing argument 1 of 'pci_bus_assign_resources' discards qualifiers from pointer target type Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:35 -07:00
akpm@linux-foundation.org	c48f1670f4	PCI: constify pci_bus_add_devices() drivers/pci/hotplug/fakephp.c:283: warning: passing argument 1 of 'pci_bus_add_devices' discards qualifiers from pointer target type Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:35 -07:00
Kenji Kaneshige	9f5404d8ea	PCI/ACPI: rename pci_osc_control_set() - Rename pci_osc_control_set() to acpi_pci_osc_control_set() according to the other API names in drivers/acpi/pci_root.c. - Move _OSC related definitions to include/linux/acpi.h because _OSC related API is implemented in drivers/acpi/pci_root.c now. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Andrew Patterson <andrew.patterson@hp.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:33 -07:00
Kenji Kaneshige	63f10f0f6d	PCI/ACPI: move _OSC code to pci_root.c Move PCI _OSC management code from drivers/pci/pci-acpi.c to drivers/acpi/pci_root.c. The benefits are - We no longer need struct osc_data and its management code (contents are moved to struct acpi_pci_root). This simplify the code, and we no longer care about kmalloc() failure. - We can make pci_acpi_osc_support() be a static function, which is called only from drivers/acpi/pci_root.c. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Andrew Patterson <andrew.patterson@hp.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Acked-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:32 -07:00
Rafael J. Wysocki	b43d451385	PCI/PCIe portdrv: Fix allocation of interrupts If MSI-X interrupt mode is used by the PCI Express port driver, too many vectors are allocated and it is not ensured that the right vectors will be used for the right services. Namely, the PCI Express specification states that both PCI Express native PME and PCI Express hotplug will always use the same MSI or MSI-X message for signalling interrupts, which implies that the same vector will be used by both of them. Also, the VC service does not use interrupts at all. Moreover, is not clear which of the vectors allocated by pci_enable_msix() in the current code will be used for PME and hotplug and which of them will be used for AER if all of these services are configured. For these reasons, rework the allocation of interrupts for PCI Express ports so that if MSI-X are enabled, the right vectors will be used for the right purposes. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:25 -07:00
Rafael J. Wysocki	a52e2e3513	PCI/MSI: Introduce pci_msix_table_size() Introduce new function pci_msix_table_size() returning the size of the MSI-X table of given PCI device or 0 if the device doesn't support MSI-X. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:25 -07:00
Rafael J. Wysocki	22106368c9	PCI: PCIe portdrv: Remove struct pcie_port_service_id The PCI Express port driver uses 'struct pcie_port_service_id' for matching port service devices and drivers, but this structure contains fields that duplicate information from the port device itself (vendor, device, subvendor, subdevice) and fields that are not used by any existing port service driver (class, class_mask, drvier_data). Also, both existing port service drivers (AER and PCIe HP) don't even use the vendor and device fields for device matching. Therefore 'struct pcie_port_service_id' can be removed altogether and the only useful members of it (port_type, service) can be introduced directly into the port service device and port service driver structures. That simplifies the code quite a bit and reduces its size. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:23 -07:00
Rafael J. Wysocki	0516c8bcd2	PCI: PCIe portdrv: Simplily probe callback of service drivers The second argument of the ->probe() callback in struct pcie_port_service_driver is unnecessary and never used. Remove it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:23 -07:00
Rafael J. Wysocki	90e9cd50f7	PCI: PCIe portdrv: Aviod using service devices with wrong interrupts The PCI Express port driver should not attempt to register service devices that require the ability to generate interrupts if generating interrupts is not possible. Namely, if the port has no interrupt pin configured and we cannot set up MSI or MSI-X for it, there is no way it can generate interrupts and in such a case the port services that rely on interrupts (PME, PCIe HP, AER) should not be enabled for it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:21 -07:00
Rafael J. Wysocki	1bf83e558c	PCI: PCIe portdrv: Use driver data to simplify code PCI Express port driver extension, as defined by struct pcie_port_device_ext in portdrv.h, is allocated and initialized, but never used (it also is never freed). Extend it to hold the PCI Express port type as well as the port interrupt mode, change its name and use it to simplify the code in portdrv_core.c . Additionally, remove the redundant interrupt_mode member of struct pcie_device defined in include/linux/pcieport_if.h . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-03-19 19:29:20 -07:00
Russell King	7d83f8fca5	Merge branch 'master' of git://git.marvell.com/orion into devel Conflicts: arch/arm/mach-mx1/devices.c	2009-03-19 23:10:40 +00:00
Trond Myklebust	7fe5c398fc	NFS: Optimise NFS close() Close-to-open cache consistency rules really only require us to flush out writes on calls to close(), and require us to revalidate attributes on the very last close of the file. Currently we appear to be doing a lot of extra attribute revalidation and cache flushes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:35:50 -04:00
Trond Myklebust	7d1e8255cf	SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets This fixes a regression against FreeBSD servers as reported by Tomas Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers don't ever react to the client closing the socket, and so commit `e06799f958` (SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket) causes the setup to hang forever whenever the client attempts to close and then reconnect. We break the deadlock by adding a 'linger2' style timeout to the socket, after which, the client will abort the connection using a TCP 'RST'. The default timeout is set to 15 seconds. A subsequent patch will put it under user control by means of a systctl. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2009-03-19 15:17:34 -04:00
Yevgeny Petrilin	27bf91d6a0	mlx4_core: Add link type autosensing When a port's link is down (except to driver restart) and the port is configured for auto sensing, we try to sense port link type (Ethernet or InfiniBand) in order to determine how to initialize the port. If the port type needs to be changed, all mlx4 for the device interfaces are unregistered and then registered again with the new port types. Sensing is done with intervals of 3 seconds. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-03-18 19:45:11 -07:00
Chuck Lever	2795e53b4e	SUNRPC: Clean up static inline functions in svc_xprt.h Clean up: Enable the use of const arguments in higher level svc_ APIs by adding const to the arguments of the helper functions in svc_xprt.h Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 18:13:28 -04:00
Greg Banks	03cf6c9f49	knfsd: add file to export stats about nfsd pools Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void )1) merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_()" by Harshula Jayasuriya. merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:42 -04:00
Greg Banks	59a252ff8c	knfsd: avoid overloading the CPU scheduler with enormous load averages Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374 Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:41 -04:00
David Shaw	31dec2538e	Short write in nfsd becomes a full write to the client If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:40 -04:00
J. Bruce Fields	8b671b8070	nfsd4: remove use of mutex for file_hashtable As part of reducing the scope of the client_mutex, and in order to remove the need for mutexes from the callback code (so that callbacks can be done as asynchronous rpc calls), move manipulations of the file_hashtable under the recall_lock. Update the relevant comments while we're here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Alexandros Batsakis <batsakis@netapp.com> Reviewed-by: Benny Halevy <bhalevy@panasas.com>	2009-03-18 17:38:38 -04:00
J. Bruce Fields	6150ef0dc7	nfsd4: remove unused CHECK_FH flag All users now pass this, so it's meaningless. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:37 -04:00
J. Bruce Fields	203a8c8e66	nfsd4: separate delegreturn case from preprocess_stateid_op Delegreturn is enough a special case for preprocess_stateid_op to warrant just open-coding it in delegreturn. There should be no change in behavior here; we're just reshuffling code. Thanks to Yang Hongyang for catching a critical typo. Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:38:18 -04:00
Harvey Harrison	77f18f5e4e	nfs: replace uses of __constant_{endian} The base versions handle constant folding now, none of these headers are exported to userspace, so the __ prefixed versions are not necessary. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:30:51 -04:00
J. Bruce Fields	6c02eaa1d1	nfsd4: use helper for copying delegation filehandle Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:30:49 -04:00
J. Bruce Fields	a4773c08f2	nfsd4: use helper for copying filehandles for replay Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-03-18 17:30:49 -04:00
Florian Westphal	711d60a9e7	netfilter: remove nf_ct_l4proto_find_get/nf_ct_l4proto_put users have been moved to __nf_ct_l4proto_find. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-03-18 17:30:50 +01:00
Ingo Molnar	705bb9dc72	Merge branches 'x86/cleanups', 'x86/cpu', 'x86/debug', 'x86/mce2', 'x86/mm', 'x86/mtrr', 'x86/setup', 'x86/setup-memory', 'x86/urgent', 'x86/uv', 'x86/x2apic' and 'linus' into x86/core Conflicts: arch/parisc/kernel/irq.c	2009-03-18 13:19:49 +01:00
Ingo Molnar	84be58d460	dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG Impact: build fix Fix: arch/x86/kvm/x86.o: In function `dma_debug_add_bus': (.text+0x0): multiple definition of `dma_debug_add_bus' dma_debug_add_bus() should be a static inline function. Cc: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20090317120112.GP6159@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-18 11:53:48 +01:00
Ingo Molnar	95f3c4ebff	Merge branch 'dma-api/debug' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu	2009-03-18 10:37:48 +01:00
Ingo Molnar	04dfcfcb54	Merge branch 'linus' into core/iommu	2009-03-18 10:37:43 +01:00
Ingo Molnar	37ba317c9e	Merge branches 'sched/cleanups' and 'linus' into sched/core	2009-03-18 09:57:02 +01:00
Ingo Molnar	327019b01e	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-18 06:59:56 +01:00

... 16 17 18 19 20 ...

30950 commits