Commit graph

362558 commits

Author SHA1 Message Date
John W. Linville
b9d5319041 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-20 14:26:37 -04:00
Daniel Baluta
4c1d8d0617 net: fix psock_fanout selftest bind error message
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:42:41 -04:00
Takashi Iwai
eb49faa6a4 ALSA: hda - Fix abuse of snd_hda_lock_devices() for DSP loader
The current DSP loader code abuses snd_hda_lock_devices() for ensuring
the DSP loader not conflicting with the other normal operations.  But
this trick obviously doesn't work for the PM resume since the streams
are kept opened there where snd_hda_lock_devices() returns -EBUSY.
That means we need another lock mechanism instead of abuse.

This patch provides the new lock state to azx_dev.  Theoretically it's
possible that the DSP loader conflicts with the stream that has been
already assigned for another PCM.  If it's running, the DSP loader
should simply fail.  If not -- it's the case for PM resume --, we
should assign this stream temporarily to the DSP loader, and take it
back to the PCM after finishing DSP loading.  If the PCM is operated
during the DSP loading, it should get an error, too.

Reported-and-tested-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-03-20 18:36:06 +01:00
Eric Dumazet
70386d40e1 chelsio: add headroom in RX path
Drivers should reserve some headroom in skb used in receive path,
to avoid future head reallocation.

One possible way to do that is to use dev_alloc_skb() instead
of alloc_skb(), so that NET_SKB_PAD bytes are reserved.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:29:34 -04:00
Chris Metcalf
8fdc929f57 dynticks: avoid flow_cache_flush() interrupting every core
Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.

With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores.  When we are trying to
isolate a set of cpus from interrupts, this is important to do.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:28:39 -04:00
Yuval Mintz
7fa6f34081 bnx2x: AER revised
Revised bnx2x implementation of PCI Express Advanced Error Recovery -
stop and free driver resources according to the AER flow (instead of the
currently implemented `hope-for-the-best' release approach), and do not make
any assumptions on the HW state after slot reset.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:27:28 -04:00
Wei Yongjun
47a5247fdd net: fec: make local function fec_poll_controller() static
fec_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Wei Yongjun
e052a5893b net: ethernet: davinci_emac: make local function emac_poll_controller() static
emac_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
9fad0c941a net: mdio-octeon: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
f8e5fc8c20 net: mdio-gpio: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
95d158df44 net: au1k_ir: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Sachin Kamat
6d7496836d net: s6gmac: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Sachin Kamat
18e4a7374c net: ks8695net: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Jesper Derehag
2b5faa4c55 connector: Added coredumping event to the process connector
Process connector can now also detect coredumping events.

Main aim of patch is get notified at start of coredumping, instead of
having to wait for it to finish and then being notified through EXIT
event.

Could be used for instance by process-managers that want to get
notified as soon as possible about process failures, and not
necessarily beeing notified after coredump, which could be in the
order of minutes depending on size of coredump, piping and so on.

Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:23:21 -04:00
Claudiu Manoil
800c644bcd gianfar: Refactor config coalescing calls for all queues
The only place where gfar_configure_coalescing is called
with an actual bitmask (other than 0xff) is in gfar_poll
(on the hot path). So make gfar_configure_coalescing()
static for the buffer processing path, and export
gfar_configure_coalescing_all() for the remaining cases
that require to set coalescing for all the queues at once
(on the slow path).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:53 -04:00
Claudiu Manoil
5d9657d83a gianfar: Remove redundant programming of [rt]xic registers
For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx colescing registers [rt]xic
are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This
avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Claudiu Manoil
6be5ed3fef gianfar: Poll only active Rx queues
Split the napi budget fairly among the active queues only, instead
of dividing it by the total number of Rx queues assigned to the
given interrupt group.
Use the h/w indication field RXFi in rstat (receive status register)
to identify the active rx queues from the current interrupt group
(i.e. receive event occured on ring i, if ring i is part of the current
interrupt group). This indication field in rstat, RXFi i=0..7,
allows us to find out on which queues of the same interrupt group
do we have incomming traffic once we entered the polling routine for
the given interrupt group. After servicing the ring i, the corresponding
bit RXFi will be written with 1 to clear the active queue indication for
that ring.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Claudiu Manoil
c233cf4074 gianfar: Fix tx napi polling
There are 2 issues with the current napi poll routine, with regards
to tx ring cleanup:
1) for multi-queue devices (MQ_MG_MODE), should tx_bit_map != rx_bit_map,
which is possible (and supported in h/w) if the DT property "fsl,tx-bit-map"
holds a different value than rx_bit_map, the current polling routine will
service the wrong Tx queues in this case (i.e. the interrupt group will
receive interrupts from tx queues that it will not service)
2) Tx cleanup completion consumes napi budget, whereas the napi budget
should be reserved for Rx work only.

The patch fixes these issues and provides a clean napi polling routine.
Napi poll completion is reached when all the Rx queues have been
serviced and there is no Tx work to do.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Mike Snitzer
ea2dd8c1ed dm cache: policy ignore hints if generated by different version
When reading the dm cache metadata from disk, ignore the policy hints
unless they were generated by the same major version number of the same
policy module.

The hints are considered to be private data belonging to the specific
module that generated them and there is no requirement for them to make
sense to different versions of the policy that generated them.
Policy modules are all required to work fine if no previous hints are
supplied (or if existing hints are lost).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:28 +00:00
Mike Snitzer
4e7f506f64 dm cache: policy change version from string to integer set
Separate dm cache policy version string into 3 unsigned numbers
corresponding to major, minor and patchlevel and store them at the end
of the on-disk metadata so we know which version of the policy generated
the hints in case a future version wants to use them differently.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:27 +00:00
Joe Thornber
e2e74d617e dm cache: fix race in writethrough implementation
We have found a race in the optimisation used in the dm cache
writethrough implementation.  Currently, dm core sends the cache target
two bios, one for the origin device and one for the cache device and
these are processed in parallel.  This patch avoids the race by
changing the code back to a simpler (slower) implementation which
processes the two writes in series, one after the other, until we can
develop a complete fix for the problem.

When the cache is in writethrough mode it needs to send WRITE bios to
both the origin and cache devices.

Previously we've been implementing this by having dm core query the
cache target on every write to find out how many copies of the bio it
wants.  The cache will ask for two bios if the block is in the cache,
and one otherwise.

Then main problem with this is it's racey.  At the time this check is
made the bio hasn't yet been submitted and so isn't being taken into
account when quiescing a block for migration (promotion or demotion).
This means a single bio may be submitted when two were needed because
the block has since been promoted to the cache (catastrophic), or two
bios where only one is needed (harmless).

I really don't want to start entering bios into the quiescing system
(deferred_set) in the get_num_write_bios callback.  Instead this patch
simplifies things; only one bio is submitted by the core, this is
first written to the origin and then the cache device in series.
Obviously this will have a latency impact.

deferred_writethrough_bios is introduced to record bios that must be
later issued to the cache device from the worker thread.  This deferred
submission, after the origin bio completes, is required given that we're
in interrupt context (writethrough_endio).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:27 +00:00
Joe Thornber
79ed9caffc dm cache: metadata clear dirty bits on clean shutdown
When writing the dirty bitset to the metadata device on a clean
shutdown, clear the dirty bits.  Previously they were left indicating
the cache was dirty. This led to confusion about whether there really
was dirty data in the cache or not.  (This was a harmless bug.)

Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:27 +00:00
Heinz Mauelshagen
b978440b8d dm cache: avoid calling policy destructor twice on error
If the cache policy's config values are not able to be set we must
set the policy to NULL after destroying it in create_cache_policy()
so we don't attempt to destroy it a second time later.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:26 +00:00
Heinz Mauelshagen
617a0b89da dm cache: detect cache_create failure
Return error if cache_create() fails.

A missing return check made cache_ctr continue even after an error in
cache_create() resulting in the cache object being destroyed.  So a
simple failure like an odd number of cache policy config value arguments
would result in an oops.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:26 +00:00
Joe Thornber
414dd67d50 dm cache: avoid 64 bit division on 32 bit
Squash various 32bit link errors.

  >> on i386:
  >> drivers/built-in.o: In function `is_discarded_oblock':
  >> dm-cache-target.c:(.text+0x1ea28e): undefined reference to `__udivdi3'
  ...

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:25 +00:00
Mikulas Patocka
3b6b7813b1 dm verity: avoid deadlock
A deadlock was found in the prefetch code in the dm verity map
function.  This patch fixes this by transferring the prefetch
to a worker thread and skipping it completely if kmalloc fails.

If generic_make_request is called recursively, it queues the I/O
request on the current->bio_list without making the I/O request
and returns. The routine making the recursive call cannot wait
for the I/O to complete.

The deadlock occurs when one thread grabs the bufio_client
mutex and waits for an I/O to complete but the I/O is queued
on another thread's current->bio_list and is waiting to get
the mutex held by the first thread.

The fix recognises that prefetching is not essential.  If memory
can be allocated, it queues the prefetch request to the worker thread,
but if not, it does nothing.

Signed-off-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
2013-03-20 17:21:25 +00:00
Joe Thornber
58051b94e0 dm thin: fix non power of two discard granularity calc
Fix a discard granularity calculation to work for non power of 2 block sizes.

In order for thinp to passdown discard bios to the underlying data
device, the data device must have a discard granularity that is a
factor of the thinp block size.  Originally this check was done by
using bitops since the block_size was known to be a power of two.

Introduced by commit f13945d757
("dm thin: support a non power of 2 discard_granularity").

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:25 +00:00
Joe Thornber
f046f89a99 dm thin: fix discard corruption
Fix a bug in dm_btree_remove that could leave leaf values with incorrect
reference counts.  The effect of this was that removal of a shared block
could result in the space maps thinking the block was no longer used.
More concretely, if you have a thin device and a snapshot of it, sending
a discard to a shared region of the thin could corrupt the snapshot.

Thinp uses a 2-level nested btree to store it's mappings.  This first
level is indexed by thin device, and the second level by logical
block.

Often when we're removing an entry in this mapping tree we need to
rebalance nodes, which can involve shadowing them, possibly creating a
copy if the block is shared.  If we do create a copy then children of
that node need to have their reference counts incremented.  In this
way reference counts percolate down the tree as shared trees diverge.

The rebalance functions were incrementing the children at the
appropriate time, but they were always assuming the children were
internal nodes.  This meant the leaf values (in our case packed
block/flags entries) were not being incremented.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:24 +00:00
Daniel Borkmann
3e5289d5e3 filter: add ANC_PAY_OFFSET instruction for loading payload start offset
It is very useful to do dynamic truncation of packets. In particular,
we're interested to push the necessary header bytes to the user space and
cut off user payload that should probably not be transferred for some reasons
(e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET,
we can load it into the accumulator, and return it. E.g. in bpfc syntax ...

        ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
        ret a           ; { 0x16, 0, 0, 0x00000000 },

... as a filter will accomplish this without having to do a big hackery in
a BPF filter itself. Follow-up JIT implementations are welcome.

Thanks to Eric Dumazet for suggesting and discussing this during the
Netfilter Workshop in Copenhagen.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:15:45 -04:00
Daniel Borkmann
f77668dc25 net: flow_dissector: add __skb_get_poff to get a start offset to payload
__skb_get_poff() returns the offset to the payload as far as it could
be dissected. The main user is currently BPF, so that we can dynamically
truncate packets without needing to push actual payload to the user
space and instead can analyze headers only.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:15:45 -04:00
David S. Miller
61816596d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:46:26 -04:00
Willem de Bruijn
23a9072e3a net: fix psock_fanout selftest hash collision
Fix flaky results with PACKET_FANOUT_HASH depending on whether the
two flows hash into the same packet socket or not.

Also adds tests for PACKET_FANOUT_LB and PACKET_FANOUT_CPU and
replaces the counting method with a packet ring.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:33:18 -04:00
Fabio Estevam
da2191e314 net: fec: Define indexes as 'unsigned int'
Fix the following warnings that happen when building with W=1 option:

drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_free_buffers':
drivers/net/ethernet/freescale/fec.c:1337:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_alloc_buffers':
drivers/net/ethernet/freescale/fec.c:1361:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_init':
drivers/net/ethernet/freescale/fec.c:1631:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:28:59 -04:00
Kees Cook
896ee0eee6 net/irda: add missing error path release_sock call
This makes sure that release_sock is called for all error conditions in
irda_getsockopt.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:23:13 -04:00
Wei Yongjun
fa90b077d7 lpc_eth: fix error return code in lpc_eth_drv_probe()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:19:15 -04:00
Sergei Shtylyov
fc0c090040 sh_eth: check TSU registers ioremap() error
One must check the result of ioremap() -- in this case it prevents potential
kernel oops when initializing TSU registers further on...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:17:59 -04:00
Sergei Shtylyov
0582b7d15f sh_eth: fix bitbang memory leak
sh_mdio_init() allocates pointer to 'struct bb_info' but only stores it locally,
so that sh_mdio_release() can't free it on driver unload.  Add the pointer to
'struct bb_info' to 'struct sh_eth_private', so that sh_mdio_init() can save
'bitbang' variable for sh_mdio_release() to be able to free it later...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:17:59 -04:00
Martin Fuzzey
283951f95b ipconfig: Fix newline handling in log message.
When using ipconfig the logs currently look like:

Single name server:
[    3.467270] IP-Config: Complete:
[    3.470613]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.480670]      host=infigo-1, domain=, nis-domain=(none)
[    3.486166]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.492910]      nameserver0=172.16.42.1[    3.496853] ALSA device list:

Three name servers:
[    3.496949] IP-Config: Complete:
[    3.500293]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.510367]      host=infigo-1, domain=, nis-domain=(none)
[    3.515864]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.522635]      nameserver0=172.16.42.1, nameserver1=172.16.42.100
[    3.529149] , nameserver2=172.16.42.200

Fix newline handling for these cases

Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:15:58 -04:00
Daniel Borkmann
8ed781668d flow_keys: include thoff into flow_keys for later usage
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:14:36 -04:00
David S. Miller
e0ebcb80cb Merge branch 'l2tp'
Tom Parkin says:

====================
This l2tp bugfix patchset addresses a number of issues.

The first five patches in the series prevent l2tp sessions pinning an l2tp
tunnel open.  This occurs because the l2tp tunnel is torn down in the tunnel
socket destructor, but each session holds a tunnel socket reference which
prevents tunnels with sessions being deleted.  The solution I've implemented
here involves adding a .destroy hook to udp code, as discussed previously on
netdev[1].

The subsequent seven patches address futher bugs exposed by fixing the problem
above, or exposed through stress testing the implementation above.  Patch 11
(avoid deadlock in l2tp stats update) isn't directly related to tunnel/session
lifetimes, but it does prevent deadlocks on i386 kernels running on 64 bit
hardware.

This patchset has been tested on 32 and 64 bit preempt/non-preempt kernels,
using iproute2, openl2tp, and custom-made stress test code.

[1] http://comments.gmane.org/gmane.linux.network/259169
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:46 -04:00
Tom Parkin
f6e16b299b l2tp: unhash l2tp sessions on delete, not on free
If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:

 1. further packets arriving and getting queued while the pseudowire is being
    closed down
 2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close.  For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
7b7c0719cd l2tp: avoid deadlock in l2tp stats update
l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.

Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.

As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.

Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2
http://marc.info/?l=linux-netdev&m=134079868111131&w=2

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
cf2f5c886a l2tp: push all ppp pseudowire shutdown through .release handler
If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.

Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
4c6e2fd354 l2tp: purge session reorder queue on delete
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete.  Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
48f72f92b3 l2tp: add session reorder queue purge function to core
If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.

Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
02d13ed5f9 l2tp: don't BUG_ON sk_socket being NULL
It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
8abbbe8ff5 l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
2b551c6e7d l2tp: close sessions before initiating tunnel delete
When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted.  Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
936063175a l2tp: close sessions in ip socket destroy callback
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace.  We need to do the same thing for
IP-encapsulation sockets.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
e34f4c7050 l2tp: export l2tp_tunnel_closeall
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed.  We need to do something
similar for IP-encapsulation sockets.

Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00