linux-pinenote

Author	SHA1	Message	Date
Herbert Xu	4cc7f68d65	net: Reexport sock_alloc_send_pskb The function sock_alloc_send_pskb is completely useless if not exported since most of the code in it won't be used as is. In fact, this code has already been duplicated in the tun driver. Now that we need accounting in the tun driver, we can in fact use this function as is. So this patch marks it for export again. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:55:54 -08:00
Herbert Xu	9a279bcbe3	net: Partially allow skb destructors to be used on receive path As it currently stands, skb destructors are forbidden on the receive path because the protocol end-points will overwrite any existing destructor with their own. This is the reason why we have to call skb_orphan in the loopback driver before we reinject the packet back into the stack, thus creating a period during which loopback traffic isn't charged to any socket. With virtualisation, we have a similar problem in that traffic is reinjected into the stack without being associated with any socket entity, thus providing no natural congestion push-back for those poor folks still stuck with UDP. Now had we been consistent in telling them that UDP simply has no congestion feedback, I could just fob them off. Unfortunately, we appear to have gone to some length in catering for this on the standard UDP path, with skb/socket accounting so that has created a very unhealthy dependency. Alas habits are difficult to break out of, so we may just have to allow skb destructors on the receive path. It turns out that making skb destructors useable on the receive path isn't as easy as it seems. For instance, simply adding skb_orphan to skb_set_owner_r isn't enough. This is because we assume all over the IP stack that skb->sk is an IP socket if present. The new transparent proxy code goes one step further and assumes that skb->sk is the receiving socket if present. Now all of this can be dealt with by adding simple checks such as only treating skb->sk as an IP socket if skb->sk->sk_family matches. However, it turns out that for bridging at least we don't need to do all of this work. This is of interest because most virtualisation setups use bridging so we don't actually go through the IP stack on the host (with the exception of our old nemesis the bridge netfilter, but that's easily taken care of). So this patch simply adds skb_orphan to the point just before we enter the IP stack, but after we've gone through the bridge on the receive path. It also adds an skb_orphan to the one place in netfilter that touches skb->sk/skb->destructor, that is, tproxy. One word of caution, because of the internal code structure, anyone wishing to deploy this must use skb_set_owner_w as opposed to skb_set_owner_r since many functions that create a new skb from an existing one will invoke skb_set_owner_w on the new skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:55:27 -08:00
David S. Miller	7870389478	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-02-04 16:52:41 -08:00
David S. Miller	005c79b3d4	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2009-02-04 16:51:58 -08:00
Andy Fleming	4d7902f22b	gianfar: Fix stashing support Stashing is only supported on the 85xx (e500-based) SoCs. The 83xx and 86xx chips don't have a proper cache for this. U-Boot has been updated to add stashing properties to the device tree nodes of gianfar devices on 85xx. So now we modify Linux to keep stashing off unless those properties are there. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:43:44 -08:00
Andy Fleming	0fd56bb5be	gianfar: Add support for skb recycling Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:43:16 -08:00
Andy Fleming	1577ecef76	netdev: Merge UCC and gianfar MDIO bus drivers The MDIO bus drivers for the UCC and gianfar ethernet controllers are essentially the same. There's no reason to duplicate that much code. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:42:35 -08:00
Andy Fleming	b98ac702f4	gianfar: Fix potential soft reset race SOFT_RESET must be asserted for at least 3 TX clocks in order for it to work properly. The syncs in the gfar_write() commands have been hiding this, but we need to guarantee it. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:38:05 -08:00
Andy Fleming	1fbe49328f	gianfar: Fix BD_LENGTH_MASK definition BD_LENGTH_MASK is supposed to catch the low 16-bits of the status field, not the low byte. The old way, we would never be able to clean up tx packets with sizes divisible by 256. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:37:40 -08:00
Alex Williamson	9c46f6d42f	virtio_net: Allow setting the MAC address of the NIC Many physical NICs let the OS re-program the "hardware" MAC address. Virtual NICs should allow this too. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:36:34 -08:00
Alex Williamson	0bde95690d	virtio_net: Add support for VLAN filtering in the hypervisor VLAN filtering allows the hypervisor to drop packets from VLANs that we're not a part of, further reducing the number of extraneous packets recieved. This makes use of the VLAN virtqueue command class. The CTRL_VLAN feature bit tells us whether the backend supports VLAN filtering. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:35:13 -08:00
Alex Williamson	f565a7c259	virtio_net: Add a MAC filter table Make use of the MAC control virtqueue class to support a MAC filter table. The filter table is managed by the hypervisor. We consider the table to be available if the CTRL_RX feature bit is set. We leave it to the hypervisor to manage the table and enable promiscuous or all-multi mode as necessary depending on the resources available to it. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:35:13 -08:00
Alex Williamson	2af7698e2d	virtio_net: Add a set_rx_mode interface Make use of the RX_MODE control virtqueue class to enable the set_rx_mode netdev interface. This allows us to selectively enable/disable promiscuous and allmulti mode so we don't see packets we don't want. For now, we automatically enable these as needed if additional unicast or multicast addresses are requested. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:35:12 -08:00
Alex Williamson	2a41f71d3b	virtio_net: Add a virtqueue for outbound control commands This will be used for RX mode, MAC filter table, VLAN filtering, etc... The control transaction consists of one or more "out" sg entries and one or more "in" sg entries. The first out entry contains a header defining the class and command. Additional out entries may provide data for the command. The last in entry provides a status response back from the command. Virtqueues typically run asynchronous, running a callback function when there's data in the channel. We can't readily make use of this in the command paths where we need to use this. Instead, we kick the virtqueue and spin. The kick causes an I/O write, triggering an immediate trap into the hypervisor. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:35:11 -08:00
Divy Le Ray	65ab8385b6	cxgb3: Fix lro switch The LRO switch is always set to 1 in the rx processing loop. It breaks the accelerated iSCSI receive traffic. Fix its computation. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 16:31:39 -08:00
Alex Chiang	4560839939	x86: fix grammar in user-visible BIOS warning Fix user-visible grammo. Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 01:14:38 +01:00
Pavel Emelyanov	a6a95406c6	x86: fix hpet timer reinit for x86_64 There's a small problem with hpet_rtc_reinit function - it checks for the: hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0 to continue increasing both the HPET_T1_CMP (register) and the hpet_t1_cmp (variable). But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp is 64-bit this condition will always be FALSE once the latter hits the 32-bit boundary, and we can have a situation, when we don't increase the HPET_T1_CMP register high enough. The result - timer stops ticking, since HPET_T1_CMP becomes less, than the COUNTER and never increased again. The solution is (based on Linus's suggestion) to not compare 64-bits (on 64-bit x86), but to do the comparison on 32-bit signed integers. Reported-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 01:04:16 +01:00
Takashi Iwai	f67d8176ba	ALSA: hda - Add quirk for FSC Amilo Xi2550 Added model=fujisu-pi2515 for FSC Amilo Xi2550 with ALC883 codec. Refernece: Novell bnc#450979 https://bugzilla.novell.com/show_bug.cgi?id=450979 Signed-off-by: Takashi Iwai <tiwai@suse.de>	2009-02-04 23:31:50 +01:00
Linus Torvalds	647802d6db	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: APIC: enable workaround on AMD Fam10h CPUs xen: disable interrupts before saving in percpu x86: add x86@kernel.org to MAINTAINERS x86: push old stack address on irqstack for unwinder irq, x86: fix lock status with numa_migrate_irq_desc x86: add cache descriptors for Intel Core i7 x86/Voyager: make it build and boot	2009-02-04 13:58:50 -08:00
Linus Torvalds	3e561f975c	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: add missing kernel-doc in sched.h	2009-02-04 13:58:37 -08:00
Linus Torvalds	9f96ae6ee0	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: do_each_pid_task() needs rcu lock	2009-02-04 13:58:24 -08:00
David S. Miller	dc4ff585ff	sparc64: Call dump_stack() in die_nmi(). Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-04 13:48:11 -08:00
Suresh Siddha	483b4ee60e	sched: fix nohz load balancer on cpu offline Christian Borntraeger reports: > After a logical cpu offline, even on a complete idle system, there > is one cpu with full ticks. It turns out that nohz.cpu_mask has the > the offlined cpu still set. > > In select_nohz_load_balancer() we check if the system is completely > idle to turn of load balancing. We compare cpu_online_map with > nohz.cpu_mask. Since cpu_online_map is updated on cpu unplug, > but nohz.cpu_mask is not, the check fails and the scheduler believes > that we need an "idle load balancer" even on a fully idle system. > Since the ilb cpu does not deactivate the timer tick this breaks NOHZ. Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask while a cpu is going offline. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 22:31:19 +01:00
Borislav Petkov	36723bfe81	x86/Kconfig.cpu: make Kconfig help readable in the console Impact: cleanup Some lines exceed the 80 char width making them unreadable. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 22:19:27 +01:00
Reinette Chatre	c4e061ace7	iwlwifi: save PCI state before suspend, restore after resume This is the right thing to do and fixes the following warning: [ 115.012278] ------------[ cut here ]------------ [ 115.012281] WARNING: at drivers/pci/pci-driver.c:370 pci_legacy_suspend+0x85/0xc2() [ 115.012285] Hardware name: Latitude D630 [ 115.012301] PCI PM: Device state not saved by iwl3945_pci_suspend+0x0/0x4c [iwl3945] [ 115.012304] Modules linked in: fuse nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 acpi_cpufreq kvm_intel kvm snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_seq_device snd_pcm_oss snd_mixer_oss ecb snd_pcm cryptomgr aead snd_timer crypto_blkcipher snd snd_page_alloc ohci1394 crypto_hash crypto_algapi ch341 ieee1394 usbserial thermal iwl3945 mac80211 led_class lib80211 tg3 processor i2c_i801 i2c_core sg cfg80211 libphy usbhid battery ac button sr_mod cdrom evdev dcdbas ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode] [ 115.012374] Pid: 4163, comm: pm-suspend Not tainted 2.6.29-rc3-00227-gf1dd849-dirty #67 [ 115.012377] Call Trace: [ 115.012382] [<ffffffff8023d04d>] warn_slowpath+0xb1/0xed [ 115.012387] [<ffffffff80450b5e>] ? _spin_unlock_irqrestore+0x5c/0x78 [ 115.012390] [<ffffffff80254f08>] ? up+0x34/0x39 [ 115.012394] [<ffffffff80362319>] ? acpi_ut_release_mutex+0x5d/0x61 [ 115.012397] [<ffffffff803584b2>] ? acpi_get_data+0x5e/0x70 [ 115.012400] [<ffffffff80363dd9>] ? acpi_bus_get_device+0x25/0x39 [ 115.012403] [<ffffffff80363e98>] ? acpi_bus_power_manageable+0x11/0x29 [ 115.012406] [<ffffffff803462f7>] ? acpi_pci_power_manageable+0x17/0x19 [ 115.012410] [<ffffffff8033ddfd>] ? pci_set_power_state+0xcc/0x101 [ 115.012418] [<ffffffffa01f28e9>] ? iwl3945_pci_suspend+0x0/0x4c [iwl3945] [ 115.012422] [<ffffffff803401e6>] pci_legacy_suspend+0x85/0xc2 [ 115.012425] [<ffffffff80340316>] pci_pm_suspend+0x34/0x86 [ 115.012429] [<ffffffff8039d7ce>] pm_op+0x52/0xe5 [ 115.012432] [<ffffffff8039dd78>] device_suspend+0x32a/0x451 [ 115.012436] [<ffffffff80269ec2>] suspend_devices_and_enter+0x3e/0x13a [ 115.012439] [<ffffffff8026a128>] enter_state+0x110/0x164 [ 115.012442] [<ffffffff8026a233>] state_store+0xb7/0xd7 [ 115.012446] [<ffffffff8032f95f>] kobj_attr_store+0x17/0x19 [ 115.012449] [<ffffffff80307d64>] sysfs_write_file+0xe4/0x119 [ 115.012453] [<ffffffff802baa7a>] vfs_write+0xae/0x137 [ 115.012456] [<ffffffff802babc7>] sys_write+0x47/0x70 [ 115.012459] [<ffffffff8020b73a>] system_call_fastpath+0x16/0x1b [ 115.012467] ---[ end trace 829828966f6f24dc ]--- Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Ming Lei <tom.leiming@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2009-02-04 16:11:42 -05:00
Reinette Chatre	5e46882e2e	iwlwifi: clean key table in iwl_clear_stations_table Cleans uCode key table bit map iwl_clear_stations_table since all stations are cleared also the key table must be. Since the keys are not removed properly on suspend by mac80211 this may result in exhausting key table on resume leading to memory corruption during removal Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2009-02-04 16:11:42 -05:00
Kyle McMartin	48ec4d9537	x86, 64-bit: print DMI info in the oops trace This patch echoes what we already do on 32-bit since `90f7d25c6b`, and prints the DMI product name in show_regs, so that system specific problems can be easily identified. Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 22:10:12 +01:00
Mark Fasheh	436443f0f7	Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()" This reverts commit `0e0333429a`. I committed this by accident - Joel and Louis are working with the lockdep maintainer to provide a better solution than just turning lockdep off. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: <Joel Becker <joel.becker@oracle.com>	2009-02-04 09:46:25 -08:00
Linus Torvalds	9dfea1b46d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: pcm_oss: AFMT_S24_LE is set twice in return value ALSA: ASoC: email - update email addresses. OMAP: ASoC: Fix spinlock misuse in omap-pcm.c ALSA: hda - No widget selection for volume knob widgets in proc output ALSA: hda - Add support of iMac 24 Aluminium ALSA: alsa: time reaches -1, tested 0 ALSA: hda - Add quirk for another HP dv5 model	2009-02-04 09:39:12 -08:00
Takashi Iwai	7df0eb424d	Merge branch 'fix/asoc' into for-linus	2009-02-04 18:19:11 +01:00
Takashi Iwai	af7af69039	Merge branch 'fix/hda' into for-linus	2009-02-04 18:19:07 +01:00
Roel Kluin	7924f0cadc	ALSA: pcm_oss: AFMT_S24_LE is set twice in return value AFMT_S24_LE is set twice in return value vi sound/core/oss/pcm_oss.c +640 #define AFMT_S24_LE 0x00008000 #define AFMT_S24_BE 0x00010000 Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2009-02-04 18:18:03 +01:00
Linus Torvalds	eda58a85ec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (40 commits) Blackfin arch: Remove outdated code Blackfin arch: Fix udelay implementation Blackfin arch: Update Copyright information Blackfin arch: Add BF561 PPI POLS, POLC Masks Blackfin arch: Update CM-BF527 kernel config Blackfin arch: define bfin_memmap as static since it is only used here Blackfin arch: cplb mananger: use a do...while loop rather than a for loop Blackfin arch: fix bug - traps test case 19 for exception 0x2d fails Blackfin arch: add platform device bfin_mii-bus and KSZ8893M switch driver platform resources to board files Blackfin arch: build jtag tty driver as a module by default Blackfin arch: fix 2 bugs related to debug Blackfin arch: Add ANOMALY_05000380 to BF54x to kill the compile warning Blackfin arch: Fix bug - 561 SMP kernel can't boot from jffs2 Blackfin arch: base SIC_IWR# programming on whether the MMR exists Blackfin arch: read SYSCR on newer parts that mirror the bits of SWRST in it Blackfin arch: fixup board init function name Blackfin arch: drop CONFIG_I2C_BOARDINFO ifdefs Blackfin arch: bfin_reset->_bfin_reset redirection no longer needed Blackfin arch: sync reboot handler with version in u-boot Blackfin arch: Faster Implementation of csum_tcpudp_nofold() ...	2009-02-04 07:56:25 -08:00
Linus Torvalds	024bb9617e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Kill bogus TPC/address truncation during 32-bit faults. sparc: fixup for sparseirq changes sparc64: Validate kernel generated fault addresses on sparc64. sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode. sparc64: Implement NMI watchdog on capable cpus. sparc: Probe PMU type and record in sparc_pmu_type. sparc64: Move generic PCR support code to seperate file.	2009-02-04 07:54:00 -08:00
Linus Torvalds	25431e900d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: sunrpc: fix rdma dependencies e1000: Fix PCI enable to honor the need_ioport flag sgi-xp: link XPNET's net_device_ops to its net_device structure pcnet_cs: Fix misuse of the equality operator. hso: add new device id's dca: redesign locks to fix deadlocks cassini/sungem: limit reaches -1, but 0 tested net: variables reach -1, but 0 tested qlge: bugfix: Add missing netif_napi_del call. qlge: bugfix: Add flash offset for second port. qlge: bugfix: Fix endian issue when reading flash. udp: increments sk_drops in __udp_queue_rcv_skb() net: Fix userland breakage wrt. linux/if_tunnel.h net: packet socket packet_lookup_frame fix	2009-02-04 07:52:21 -08:00
Linus Torvalds	0d7a063fa7	Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd * 'for-linus' of git://git.o-hand.com/linux-mfd: mfd: Remove non exported references from pcf50633	2009-02-04 07:40:54 -08:00
Chris Mason	9b0d3ace33	Btrfs: don't return congestion in write_cache_pages as often On fast devices that go from congested to uncongested very quickly, pdflush is waiting too often in congestion_wait, and the FS is backing off to easily in write_cache_pages. For now, fix this on the btrfs side by only checking congestion after some bios have already gone down. Longer term a real fix is needed for pdflush, but that is a larger project. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:33:00 -05:00
Chris Mason	7b78c170dc	Btrfs: Only prep for btree deletion balances when nodes are mostly empty Whenever an item deletion is done, we need to balance all the nodes in the tree to make sure we don't end up with an empty node if a pointer is deleted. This balance prep happens from the root of the tree down so we can drop our locks as we go. reada_for_balance was triggering read-ahead on neighboring nodes even when no balancing was required. This adds an extra check to avoid calling balance_level() and avoid reada_for_balance() when a balance won't be required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:12:46 -05:00
Chris Mason	12f4daccfc	Btrfs: fix btrfs_unlock_up_safe to walk the entire path btrfs_unlock_up_safe would break out at the first NULL node entry or unlocked node it found in the path. Some of the callers have missing nodes at the lower levels of the path, so this commit fixes things to check all the nodes in the path before returning. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:42 -05:00
Chris Mason	4d081c41a4	Btrfs: change btrfs_del_leaf to drop locks earlier btrfs_del_leaf does two things. First it removes the pointer in the parent, and then it frees the block that has the leaf. It has the parent node locked for both operations. But, it only needs the parent locked while it is deleting the pointer. After that it can safely free the block without the parent locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:28 -05:00
Chris Mason	06d9a8d7c2	Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode btrfs_truncate_inode_items is setup to stop doing btree searches when it has finished removing the items for the inode. It used to detect the end of the inode by looking for an objectid that didn't match the one we were searching for. But, this would result in an extra search through the btree, which adds extra balancing and cow costs to the operation. This commit adds a check to see if we found the inode item, which means we can stop searching early. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:30:58 -05:00
Chris Mason	f03d9301f1	Btrfs: Don't try to compress pages past i_size The compression code had some checks to make sure we were only compressing bytes inside of i_size, but it wasn't catching every case. To make things worse, some incorrect math about the number of bytes remaining would make it try to compress more pages than the file really had. The fix used here is to fall back to the non-compression code in this case, which does all the proper cleanup of delalloc and other accounting. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:31:06 -05:00
Josef Bacik	811449496b	Btrfs: join the transaction in __btrfs_setxattr With selinux on we end up calling __btrfs_setxattr when we create an inode, which calls btrfs_start_transaction(). The problem is we've already called that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a wait_current_trans(). If btrfs-transaction has started committing it will wait for all handles to finish, while the other process is waiting for the transaction to commit. This is fixed by using btrfs_join_transaction, which won't wait for the transaction to commit. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-02-04 09:18:33 -05:00
Chris Ball	8c087b5183	Btrfs: Handle SGID bit when creating inodes Before this patch, new files/dirs would ignore the SGID bit on their parent directory and always be owned by the creating user's uid/gid. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:29:54 -05:00
Chris Mason	bd56b30205	Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks Every transaction in btrfs creates a new snapshot, and then schedules the snapshot from the last transaction for deletion. Snapshot deletion works by walking down the btree and dropping the reference counts on each btree block during the walk. If if a given leaf or node has a reference count greater than one, the reference count is decremented and the subtree pointed to by that node is ignored. If the reference count is one, walking continues down into that node or leaf, and the references of everything it points to are decremented. The old code would try to work in small pieces, walking down the tree until it found the lowest leaf or node to free and then returning. This was very friendly to the rest of the FS because it didn't have a huge impact on other operations. But it wouldn't always keep up with the rate that new commits added new snapshots for deletion, and it wasn't very optimal for the extent allocation tree because it wasn't finding leaves that were close together on disk and processing them at the same time. This changes things to walk down to a level 1 node and then process it in bulk. All the leaf pointers are sorted and the leaves are dropped in order based on their extent number. The extent allocation tree and commit code are now fast enough for this kind of bulk processing to work without slowing the rest of the FS down. Overall it does less IO and is better able to keep up with snapshot deletions under high load. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:27:02 -05:00
Chris Mason	b4ce94de9b	Btrfs: Change btree locking to use explicit blocking points Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop, most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first. The basic idea is: btrfs_tree_lock() returns with the spin lock held btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates. btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again. btrfs_tree_unlock() can be called on either blocking or spinning locks, it does the right thing based on the blocking bit. ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:25:08 -05:00
Chris Mason	c487685d7c	Btrfs: hash_lock is no longer needed Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:24:25 -05:00
Chris Mason	3935127c50	Btrfs: disable leak debugging checks in extent_io.c extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:24:05 -05:00
Chris Mason	b7a9f29fcf	Btrfs: sort references by byte number during btrfs_inc_ref When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before actual updates are done. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:23:45 -05:00
Chris Mason	b51912c91f	Btrfs: async threads should try harder to find work Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-02-04 09:23:24 -05:00

... 36 37 38 39 40 ...

133426 commits