Commit graph

106172 commits

Author SHA1 Message Date
Matthew Wilcox
2351ec533e Remove asm/semaphore.h
All users have now been converted to linux/semaphore.h and we don't need
to keep these files around any longer.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-07-24 08:31:12 -04:00
Matthew Wilcox
6310e47271 Remove use of asm/semaphore.h
Change to use linux/semaphore.h

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-07-24 08:31:00 -04:00
Matthew Wilcox
0f17e4c796 Add missing semaphore.h includes
These files use semaphores but don't include semaphore.h

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2008-07-24 08:30:48 -04:00
Matthew Wilcox
78305de2f9 Remove mention of semaphores from kernel-locking
Since the consensus seems to be to eliminate semaphores where possible,
we shouldn't be educating people about how to use them as locks.  Use
mutexes instead.  Semaphores should be described in a separate document
if we end up keeping them.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-24 08:29:40 -04:00
Peter Zijlstra
58838cf3ca sched: clean up compiler warning
Reported-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 13:24:57 +02:00
Vegard Nossum
04bbe430f7 x86: fix header export, asm-x86/processor-flags.h, CONFIG_* leaks
Apparently,

commit 6330a30a76
Author: Vegard Nossum <vegard.nossum@gmail.com>
Date:   Wed May 28 09:46:19 2008 +0200

    x86: break mutual header inclusion

introduced some CONFIG names to processor-flags.h, which was exported in

commit 6093015db2
Author: Ingo Molnar <mingo@elte.hu>
Date:   Sun Mar 30 11:45:23 2008 +0200

    x86: cleanup replace most vm86 flags with flags from processor-flags.h, fix

Fix it by wrapping the CONFIG parts in __KERNEL__.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:49:53 +02:00
Hugh Dickins
9d25d4db81 x86: BUILD_IRQ say .text to avoid .data.percpu
When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
into .data.percpu.  Perhaps there are other ways of triggering that:
specify ".text" in the BUILD_IRQ() macro for safety.

I've been using -fno-unit-at-a-time (to lessen inlining, for easier
debugging) for a long time.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:42:57 +02:00
Artem Bityutskiy
eeb16e87b6 UBI: fix gcc warning
Fix the following warning:

drivers/mtd/ubi/vmt.c: In function 'ubi_rename_volumes':
drivers/mtd/ubi/vmt.c:642: warning: statement with no effect

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:36:10 +03:00
Artem Bityutskiy
9869cd801c UBI: remove pre-sqnum images support
Before UBI got into mainline, there was a slight flash format
change - we did not have sequence number support, then added it.

We have carried full support of those ancient images till this
moment. Now the support is removed, well, not fully removed.

Now UBI will support only _clean_ old images, which were cleanly
detached last time (just before kernel upgrade). This is most
likely the case.

But we will not support unclean ancient images. Surprisingly,
this allows us to remove a big chunk of legacy code.

And the same should be true for downgrading: clean images should
downgrade fine, but unclean ones will not.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:36:09 +03:00
Artem Bityutskiy
ebaaf1af3e UBI: fix kernel-doc errors and warnings
No functional changes, just tweak comments to make kernel-doc
work fine and stop complaining.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:36:09 +03:00
Artem Bityutskiy
9c9ec14770 UBI: fix checkpatch.pl errors and warnings
Just out or curiousity ran checkpatch.pl for whole UBI,
and discovered there are quite a few of stylistic issues.
Fix them.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:36:09 +03:00
Artem Bityutskiy
4d88de4beb UBI: bugfix - do not torture PEB needlessly
This is probably a copy-paste bug - we torture the old PEB
in the atomic LEB change function, but we should not do this.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:34:46 +03:00
Artem Bityutskiy
8c1e6ee10b UBI: rework scrubbing messages
If bit-flips happen often, UBI prints to many messages. Lessen
the amount by only printing the messages when the PEB has been
scrubbed. Also, print torturing messages.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:34:46 +03:00
Artem Bityutskiy
f40ac9cdf6 UBI: implement multiple volumes rename
Quite useful ioctl which allows to make atomic system upgrades.
The idea belongs to Richard Titmuss <richard_titmuss@logitech.com>

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:34:46 +03:00
Artem Bityutskiy
c8566350a3 UBI: fix and re-work debugging stuff
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:34:45 +03:00
Artem Bityutskiy
85c6e6e282 UBI: amend commentaries
Hch asked not to use "unit" for sub-systems, let it be so.
Also some other commentaries modifications.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
bb84c1a199 UBI: fix error message
The ubi_err() macro will add \n.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
a6ea440769 UBI: improve mkvol request validation
Check that volume name is not shorter than 'name_len'.

No need to copy the trailing zero byte because whole array
was zeroed earlier.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
a5bf619041 UBI: add ubi_sync() interface
To flush MTD device caches.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Bruce Leonard
73789a3d9f UBI: fix 64-bit calculations
Signed-off-by: Bruce Leonard <brucle@selinc.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
23add7455c UBI: fix LEB locking
leb_read_unlock() may be called simultaniously by several tasks.
The would race at the following code:

 up_read(&le->mutex);
 if (free)
         kfree(le);

And it is possible that one task frees 'le' before the other tasks
do 'up_read()'. Fix this by doing up_read and free inside the
'ubi->ltree' lock. Below it the oops we had because of this:

BUG: spinlock bad magic on CPU#0, integck/7504
BUG: unable to handle kernel paging request at 6b6b6c4f
IP: [<c0211221>] spin_bug+0x5c/0xdb
*pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ubifs ubi nandsim nand nand_ids nand_ecc video output

Pid: 7504, comm: integck Not tainted (2.6.26-rc3ubifs26 #8)
EIP: 0060:[<c0211221>] EFLAGS: 00010002 CPU: 0
EIP is at spin_bug+0x5c/0xdb
EAX: 00000032 EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: f7f7ce30
ESI: f76491dc EDI: c044f51f EBP: e8a736cc ESP: e8a736a8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process integck (pid: 7504, ti=e8a72000 task=f7f7ce30 task.ti=e8a72000)
Stack: c044f754 c044f51f 00000000 f7f7d024 00001d50 00000001 f76491dc 00000296       f6df50e0 e8a736d8 c02112f0 f76491dc e8a736e8 c039157a f7d9e830 f76491d8       e8a7370c c020b975 f76491dc 00000296 f76491f8 00000000 f76491d8 00000000 Call Trace:
[<c02112f0>] ? _raw_spin_unlock+0x50/0x7c
[<c039157a>] ? _spin_unlock_irqrestore+0x20/0x58
[<c020b975>] ? rwsem_wake+0x4b/0x122
[<c0390e0a>] ? call_rwsem_wake+0xa/0xc
[<c0139ee7>] ? up_read+0x28/0x31
[<f8873b3c>] ? leb_read_unlock+0x73/0x7b [ubi]
[<f88742a3>] ? ubi_eba_read_leb+0x195/0x2b0 [ubi]
[<f8872a04>] ? ubi_leb_read+0xaf/0xf8 [ubi]

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:56 +03:00
Artem Bityutskiy
472018f73e UBI: fix memory leak on error path
Normally UBI volumes are freed in the release function of
the struct device object. However, on error path they may
have to be freed before the struct device objects have been
initialized.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:55 +03:00
Artem Bityutskiy
505d1caa79 UBI: do not forget to free internal volumes
UBI forgets to free internal volumes when detaching MTD device.
Fix this.

Pointed-out-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:55 +03:00
Artem Bityutskiy
abc5e92262 UBI: fix memory leak
ubi_free_volume() function sets ubi->volumes[] to NULL, so
ubi_eba_close() is useless, it does not free what has to be freed.
So zap it and free vol->eba_tbl at the volume release function.

Pointed-out-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Kyungmin Park
cadb40ccc1 UBI: avoid unnecessary division operations
UBI already checks that @min io size is the power of 2 at io_init.
It is save to use bit operations then.

Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Kyungmin Park
a0fd1efd48 UBI: fix buffer padding
Instead of correctly pad the buffer wich we are writing to the
eraseblock during update, we used weird construct:

memset(buf + len, 0xFF, len - len);

Fix this.

Signed-off-by: Kyungmin Park <kmpark@infradead.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Artem Bityutskiy
beeea63603 UBI: add a comment
It is not clear why we schedule PEB for scrubbing in case of
-EBADMSG. Elaborate.

Requested-by: Kyungmin Park <kmpark@infradead.org>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Artem Bityutskiy
979c9296bd UBI: print error code
Print error code if checking failed which is very useful
to identify problems.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-07-24 13:32:54 +03:00
Jeremy Fitzhardinge
2dc1697eb3 xen: don't use sysret for sysexit32
When implementing sysexit32, don't let Xen use sysret to return to
userspace.  That results in usermode register state being trashed.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:28:12 +02:00
Jeremy Fitzhardinge
9e882c9282 x86: call early_cpu_init at the same point
Call early_cpu_init() at the same (early) point in setup_arch().
The x86_64 code was calling it relatively late, after when other arch
code need to do cpu-related setup which depends on it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-24 12:28:11 +02:00
Jarek Poplawski
f867e6af94 pkt_sched: sch_sfq: dump a real number of flows
Dump the "flows" number according to the number of active flows
instead of repeating the "limit".

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 21:34:27 -07:00
Linus Torvalds
338b9bb3ad Merge branch 'x86/auditsc' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'x86/auditsc' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  i386 syscall audit fast-path
  x86_64 ia32 syscall audit fast-path
  x86_64 syscall audit fast-path
  x86_64: remove bogus optimization in sysret_signal
2008-07-23 20:39:21 -07:00
Chas Williams
6f75a9b642 atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 20:29:21 -07:00
Linus Torvalds
7f9dce3837 Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: hrtick_enabled() should use cpu_active()
  sched, x86: clean up hrtick implementation
  sched: fix build error, provide partition_sched_domains() unconditionally
  sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
  cpu hotplug: Make cpu_active_map synchronization dependency clear
  cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
  sched: rework of "prioritize non-migratable tasks over migratable ones"
  sched: reduce stack size in isolated_cpu_setup()
  Revert parts of "ftrace: do not trace scheduler functions"

Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
2008-07-23 19:36:53 -07:00
Linus Torvalds
26dcce0fab Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
  cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
  NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
  NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
  NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
  cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
  cpumask: Provide a generic set of CPUMASK_ALLOC macros
  cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
  cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
  cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
  cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
  cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
  Revert "cpumask: introduce new APIs"
  cpumask: make for_each_cpu_mask a bit smaller
  net: Pass reference to cpumask variable in net/sunrpc/svc.c
  ...

Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23 18:37:44 -07:00
Linus Torvalds
d7b6de14a0 Merge branch 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softlockup: fix invalid proc_handler for softlockup_panic
  softlockup: fix watchdog task wakeup frequency
  softlockup: fix watchdog task wakeup frequency
  softlockup: show irqtrace
  softlockup: print a module list on being stuck
  softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
  softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
  softlockup: fix softlockup_thresh fix
  softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
  softlockup: allow panic on lockup
2008-07-23 18:34:13 -07:00
Linus Torvalds
30d38542ec Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (85 commits)
  [ARM] pxa: add base support for PXA930 Handheld Platform (aka SAAR)
  [ARM] pxa: add base support for PXA930 Evaluation Board (aka TavorEVB)
  [ARM] pxa: add base support for PXA930 (aka Tavor-P)
  [ARM] Update mach-types
  [ARM] pxa: make littleton to use the new smc91x platform data
  [ARM] pxa: make zylonite to use the new smc91x platform data
  [ARM] pxa: make mainstone to use the new smc91x platform data
  [ARM] pxa: make lubbock to use new smc91x platform data
  [NET] smc91x: prepare SMC_USE_PXA_DMA to be specified in platform data
  [NET] smc91x: prepare for SMC_IO_SHIFT to be a platform configurable variable
  [NET] smc91x: add SMC91X_NOWAIT flag to platform data
  [NET] smc91x: favor the use of SMC91X_USE_* instead of SMC_CAN_USE_*
  [NET] smc91x: remove "irq_flags" from "struct smc91x_platdata"
  [ARM] 5146/1: pxa2xx: convert all boards to call pxa2xx_transceiver_mode helper
  Support for LCD on e740 e750 e400 and e800 e-series PDAs
  E-series UDC support
  PXA UDC - allow use of inverted GPIO for pullup
  Add e350 support
  Fix broken e-series build
  E-series GPIO / IRQ definitions.
  ...
2008-07-23 18:24:08 -07:00
Andre Detsch
ad1ede1277 powerpc/spufs: better placement of spu affinity reference context
This patch adjusts the placement of a reference context from
a spu affinity chain. The reference context can now be placed
only on nodes that have enough spus not intended to be used by
another gang (already running on the node).

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
2008-07-24 11:01:54 +10:00
Roland McGrath
af0575bba0 i386 syscall audit fast-path
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does.  Avoiding the iret return path
when syscall audit is enabled helps performance a lot.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23 18:00:30 -07:00
Andre Detsch
0855b54322 powerpc/spufs: fix aff_mutex and cbe_spu_info[n].list_mutex deadlock
Currenlt,, it is possible to lock aff_mutex and
cbe_spu_info[n].list_mutex in different orders, allowing a deadlock to
occur. With this change, aff_mutex is not taken within a list_mutex
critical section anymore.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
2008-07-24 10:57:26 +10:00
Roland McGrath
5cbf1565f2 x86_64 ia32 syscall audit fast-path
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does.  Avoiding the iret return path
when syscall audit is enabled helps performance a lot.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23 17:55:22 -07:00
Roland McGrath
86a1c34a92 x86_64 syscall audit fast-path
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does.  Avoiding the iret return path
when syscall audit is enabled helps performance a lot.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23 17:47:32 -07:00
Roland McGrath
15e8f348db x86_64: remove bogus optimization in sysret_signal
This short-circuit path in sysret_signal looks wrong to me.
AFAICT, in practice the branch is never taken--and if it were,
it would go wrong.  To wit, try loading a module whose init
function does set_thread_flag(TIF_IRET), and see insmod crash
(presumably with a wrong user stack pointer).

This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
when we jump around the call to ptregscall_common and get to
int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
have been stored by FIXUP_TOP_OF_STACK.

I don't think it's normally possible to get to sysret_signal with no
_TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
already superfluous.  If it ever did happen, it is harmless to call
do_notify_resume with nothing for it to do.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23 17:43:36 -07:00
Patrick McHardy
70eed75d76 netfilter: make security table depend on NETFILTER_ADVANCED
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 16:42:42 -07:00
David S. Miller
4b53fb67e3 tcp: Clear probes_out more aggressively in tcp_ack().
This is based upon an excellent bug report from Eric Dumazet.

tcp_ack() should clear ->icsk_probes_out even if there are packets
outstanding.  Otherwise if we get a sequence of ACKs while we do have
packets outstanding over and over again, we'll never clear the
probes_out value and eventually think the connection is too sick and
we'll reset it.

This appears to be some "optimization" added to tcp_ack() in the 2.4.x
timeframe.  In 2.2.x, probes_out is pretty much always cleared by
tcp_ack().

Here is Eric's original report:

----------------------------------------
Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost.

In order to reproduce the problem, please try the following program on linux-2.6.25.*

Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000

iptables -N SLOWLO
iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT
iptables -A SLOWLO -j DROP

iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO

Then run the attached program and see the output :

# ./loop
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,1)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,3)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,5)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,7)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,9)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,200ms,11)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,201ms,13)
State      Recv-Q Send-Q                                  Local Address:Port                                    Peer Address:Port
ESTAB      0      40                                          127.0.0.1:54455                                      127.0.0.1:12000  timer:(persist,188ms,15)
write(): Connection timed out
wrote 890 bytes but was interrupted after 9 seconds
ESTAB      0      0                 127.0.0.1:12000            127.0.0.1:54455
Exiting read() because no data available (4000 ms timeout).
read 860 bytes

While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15)

tcpdump :

15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257
15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257
15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257
15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257
15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257
15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257
15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257
15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257
15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257
15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257
15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257
15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257
15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257
15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257
15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257
15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257
15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257
15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257
15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257
15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257
15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257
15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257
15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257
15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257
15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257
15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257
15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257
15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257
15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257
15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257
15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257
15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257
15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257
15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257
15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257
15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0

Source of program :

/*
 * small producer/consumer program.
 * setup a listener on 127.0.0.1:12000
 * Forks a child
 *   child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms
 * Father accepts connection, and read all data
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>
#include <sys/poll.h>

int port = 12000;
char buffer[4096];
int main(int argc, char *argv[])
{
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in socket_address;
        time_t t0, t1;
        int on = 1, sfd, res;
        unsigned long total = 0;
        socklen_t alen = sizeof(socket_address);
        pid_t pid;

        time(&t0);
        socket_address.sin_family = AF_INET;
        socket_address.sin_port = htons(port);
        socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        if (lfd == -1) {
                perror("socket()");
                return 1;
        }
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int));
        if (bind(lfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
                perror("bind");
                close(lfd);
                return 1;
        }
        if (listen(lfd, 1) == -1) {
                perror("listen()");
                close(lfd);
                return 1;
        }
        pid = fork();
        if (pid == 0) {
                int i, cfd = socket(AF_INET, SOCK_STREAM, 0);
                close(lfd);
                if (connect(cfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
                        perror("connect()");
                        return 1;
                        }
                for (i = 0 ; ;) {
                        res = write(cfd, "blablabla\n", 10);
                        if (res > 0) total += res;
                        else if (res == -1) {
                                perror("write()");
                                break;
                        } else break;
                        usleep(100000);
                        if (++i == 10) {
                                system("ss -on dst 127.0.0.1:12000");
                                i = 0;
                        }
                }
                time(&t1);
                fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0));
                system("ss -on | grep 127.0.0.1:12000");
                close(cfd);
                return 0;
        }
        sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen);
        if (sfd == -1) {
                perror("accept");
                return 1;
        }
        close(lfd);
        while (1) {
                struct pollfd pfd[1];
                pfd[0].fd = sfd;
                pfd[0].events = POLLIN;
                if (poll(pfd, 1, 4000) == 0) {
                        fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n");
                        break;
                }
                res = read(sfd, buffer, sizeof(buffer));
                if (res > 0) total += res;
                else if (res == 0) break;
                else perror("read()");
        }
        fprintf(stderr, "read %lu bytes\n", total);
        close(sfd);
        return 0;
}
----------------------------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 16:38:45 -07:00
David S. Miller
7ae93f51d7 sparc64: Fix cpufreq notifier registry.
Based upon a report by Daniel Smolik.

We do it too early, which triggers a BUG in
cpufreq_register_notifier().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 16:21:07 -07:00
Ingo Molnar
e8ebe3b893 e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call
Evgeniy Polyakov noticed that drivers/net/e1000e/netdev.c:e1000_netpoll()
was calling e1000_clean_tx_irq() without taking the TX lock.

David Miller suggested to remove the call altogether: since in this
callpah there's periodic calls to ->poll() anyway which will do
e1000_clean_tx_irq() and will garbage-collect any finished TX ring
descriptors.

This fix solved the e1000e+netconsole crashes i've been seeing:

=============================================================================
BUG skbuff_head_cache: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 15:30:52 -07:00
Roland Dreier
1fa6d8181b MAINTAINERS: Remove Glenn Streiff from NetEffect entry
Glenn is no longer at NetEffect.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-23 14:20:12 -07:00
Oliver Hartkopp
b4942af650 net: Update entry in af_family_clock_key_strings
In the merge phase of the CAN subsystem the 
af_family_clock_key_strings[] have been added to sock.c in commit 
443aef0edd 
(lockdep: fixup sk_callback_lock annotation). This trivial patch adds 
the missing name for address family 29 (AF_CAN).

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 14:06:04 -07:00
David S. Miller
5b3ab1dbd4 netdev: Remove warning from __netif_schedule().
It isn't helping anything and we aren't going to be able to change all
the drivers that do queue wakeups in strange situations.

Just letting a noop_qdisc get scheduled will work because when
qdisc_run() executes via net_tx_work() it will simply find no packets
pending when it makes the ->dequeue() call in qdisc_restart.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 14:01:29 -07:00