* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
ucc_geth: Fix oops when using fixed-link support
dm9000: locking bugfix
net: update dnet.c for bus_id removal
dnet: DNET should depend on HAS_IOMEM
dca: add missing copyright/license headers
nl80211: Check that function pointer != NULL before using it
sungem: missing net_device_ops
be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle
be2net: replenish when posting to rx-queue is starved in out of mem conditions
bas_gigaset: correctly allocate USB interrupt transfer buffer
smsc911x: reset last known duplex and carrier on open
sh_eth: Fix mistake of the address of SH7763
sh_eth: Change handling of IRQ
netns: oops in ip[6]_frag_reasm incrementing stats
net: kfree(napi->skb) => kfree_skb
net: fix sctp breakage
ipv6: fix display of local and remote sit endpoints
net: Document /proc/sys/net/core/netdev_budget
tulip: fix crash on iface up with shirq debug
virtio_net: Make virtio_net support carrier detection
...
This patch adds nfnetlink_set_err() to propagate the error to netlink
broadcast listener in case of memory allocation errors in the
message building.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use net_device_ops for usbnet device, and export for use
by other derived drivers.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initial version of the DSA driver only supported a single switch
chip per network interface, while DSA-capable switch chips can be
interconnected to form a tree of switch chips. This patch adds support
for multiple switch chips on a network interface.
An example topology for a 16-port device with an embedded CPU is as
follows:
+-----+ +--------+ +--------+
| |eth0 10| switch |9 10| switch |
| CPU +----------+ +-------+ |
| | | chip 0 | | chip 1 |
+-----+ +---++---+ +---++---+
|| ||
|| ||
||1000baseT ||1000baseT
||ports 1-8 ||ports 9-16
This requires a couple of interdependent changes in the DSA layer:
- The dsa platform driver data needs to be extended: there is still
only one netdevice per DSA driver instance (eth0 in the example
above), but each of the switch chips in the tree needs its own
mii_bus device pointer, MII management bus address, and port name
array. (include/net/dsa.h) The existing in-tree dsa users need
some small changes to deal with this. (arch/arm)
- The DSA and Ethertype DSA tagging modules need to be extended to
use the DSA device ID field on receive and demultiplex the packet
accordingly, and fill in the DSA device ID field on transmit
according to which switch chip the packet is heading to.
(net/dsa/tag_{dsa,edsa}.c)
- The concept of "CPU port", which is the switch chip port that the
CPU is connected to (port 10 on switch chip 0 in the example), needs
to be extended with the concept of "upstream port", which is the
port on the switch chip that will bring us one hop closer to the CPU
(port 10 for both switch chips in the example above).
- The dsa platform data needs to specify which ports on which switch
chips are links to other switch chips, so that we can enable DSA
tagging mode on them. (For inter-switch links, we always use
non-EtherType DSA tagging, since it has lower overhead. The CPU
link uses dsa or edsa tagging depending on what the 'root' switch
chip supports.) This is done by specifying "dsa" for the given
port in the port array.
- The dsa platform data needs to be extended with information on via
which port to reach any given switch chip from any given switch chip.
This info is specified via the per-switch chip data struct ->rtable[]
array, which gives the nexthop ports for each of the other switches
in the tree.
For the example topology above, the dsa platform data would look
something like this:
static struct dsa_chip_data sw[2] = {
{
.mii_bus = &foo,
.sw_addr = 1,
.port_names[0] = "p1",
.port_names[1] = "p2",
.port_names[2] = "p3",
.port_names[3] = "p4",
.port_names[4] = "p5",
.port_names[5] = "p6",
.port_names[6] = "p7",
.port_names[7] = "p8",
.port_names[9] = "dsa",
.port_names[10] = "cpu",
.rtable = (s8 []){ -1, 9, },
}, {
.mii_bus = &foo,
.sw_addr = 2,
.port_names[0] = "p9",
.port_names[1] = "p10",
.port_names[2] = "p11",
.port_names[3] = "p12",
.port_names[4] = "p13",
.port_names[5] = "p14",
.port_names[6] = "p15",
.port_names[7] = "p16",
.port_names[10] = "dsa",
.rtable = (s8 []){ 10, -1, },
},
},
static struct dsa_platform_data pd = {
.netdev = &foo,
.nr_switches = 2,
.sw = sw,
};
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Gary Thomas <gary@mlbassoc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocols should be able to use constant value for the descriptor.
Minor whitespace cleanup as well
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove 2 TEST_FRAME hacks that are no longer needed. These allowed
sctp regression tests to compile before, but are no longer needed.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some minor changes to queue hashing:
1. Use const on accessor functions
2. Export skb_tx_hash for use in drivers (see ixgbe)
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In two dca files copyright and license headers are missing.
This patch adds them there.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reorder struct inet6_ifaddr to remove padding on 64 bit builds
remove 8 bytes of padding so inet6_ifaddr becomes 192 bytes & fits into
a smaller slab.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() needs to dirty fields "state", "q", "bstats" and "qstats"
On x86_64 arch, they currently span three cache lines, involving more
cache line ping pongs than necessary, making longer holding of queue spinlock.
We can reduce this to one cache line, by grouping all read-mostly fields
at the beginning of structure. (Or should I say, all highly modified fields
at the end :) )
Before patch :
offsetof(struct Qdisc, state)=0x38
offsetof(struct Qdisc, q)=0x48
offsetof(struct Qdisc, bstats)=0x80
offsetof(struct Qdisc, qstats)=0x90
sizeof(struct Qdisc)=0xc8
After patch :
offsetof(struct Qdisc, state)=0x80
offsetof(struct Qdisc, q)=0x88
offsetof(struct Qdisc, bstats)=0xa0
offsetof(struct Qdisc, qstats)=0xac
sizeof(struct Qdisc)=0xc0
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve manageability, it would be good to be able to disambiguate routes
added by administrator from those added by DHCP client. The only necessary
kernel change is to add value to rtnetlink include file so iproute2 utility
can use it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a port's link is down (except to driver restart) and the port is
configured for auto sensing, we try to sense port link type (Ethernet
or InfiniBand) in order to determine how to initialize the port. If
the port type needs to be changed, all mlx4 for the device interfaces
are unregistered and then registered again with the new port
types. Sensing is done with intervals of 3 seconds.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since creating a device node is normally an operation requiring special
privilege, Igor Zhbanov points out that it is surprising (to say the
least) that a client can, for example, create a device node on a
filesystem exported with root_squash.
So, make sure CAP_MKNOD is among the capabilities dropped when an nfsd
thread handles a request from a non-root user.
Reported-by: Igor Zhbanov <izh1979@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It depends on L3 support from 2.4 kernel (CONFIG_L3) that never got
merged into mainline. Since there's no way to use it on any of
supported machines (iPaq h3100 or h3600), better drop it for now.
It can be reimplemented later using ASoC infrastructure (there's
already a driver for uda1341 codec in mainline, so only CPU and machine
parts need to be written).
Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This allows us to send to userspace "regulatory" events.
For now we just send an event when we change regulatory domains.
We also notify userspace when devices are using their own custom
world roaming regulatory domains.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We do this so we can later inform userspace who set the
regulatory domain and provide details of the request.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is not used as we can always just assume the first
regulatory domain set will _always_ be a static regulatory
domain. REGDOM_SET_BY_CORE will be the first request from
cfg80211 for a regdomain and that then populates the first
regulatory request.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place. As it
stands we may pass freed skbs to netpoll_receive_skb.
This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll. This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry. I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.
This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out). This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (although I have only tested
this with two nodes, so I cannot tell what is the real scalability
limit of this solution in terms of cluster nodes).
Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:
(jhash(source IP) % total_nodes) & node_mask
For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):
iptables -I PREROUTING -t mangle -i eth1 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
-m mark ! --mark 0xffff -j DROP
And the following commands to make all nodes see the same packets:
ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
--destination-mac 01:00:5e:00:01:01 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
--destination-mac 01:00:5e:00:01:02 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
BTW, some final notes:
* This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
* This match supersedes the CLUSTERIP target.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 784544739a (netfilter: iptables:
lock free counters) broke a number of modules whose rule data referenced
itself. A reallocation would not reestablish the correct references, so
it is best to use a separate struct that does not fall under RCU.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all. This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock. This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now,
epoll remains the only user, but a future patch will use it to protect
f_flags as well.
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This patch skips the delivery of conntrack events if the packet
was drop due to a race condition in the conntrack insertion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.
This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The results is very unlikely change every so often so we
hardly need to divide again after doing that once for a
connection. Yet, if divide still becomes necessary we
detect that and do the right thing and again settle for
non-divide state. Takes the u16 space which was previously
taken by the plain xmit_size_goal.
This should take care part of the tso vs non-tso difference
we found earlier.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).
Bring xmit_goal_size parts into tcp.c
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wow, it was quite tricky to merge that stream of negations
but I think I finally got it right:
check & replace_ts_recent:
(s32)(rcv_tsval - ts_recent) >= 0 => 0
(s32)(ts_recent - rcv_tsval) <= 0 => 0
discard:
(s32)(ts_recent - rcv_tsval) > TCP_PAWS_WINDOW => 1
(s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW => 0
I toggled the return values of tcp_paws_check around since
the old encoding added yet-another negation making tracking
of truth-values really complicated.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
On x86_64, its rather unfortunate that "wait_queue_head_t wait"
field of "struct socket" spans two cache lines (assuming a 64
bytes cache line in current cpus)
offsetof(struct socket, wait)=0x30
sizeof(wait_queue_head_t)=0x18
This might explain why Kenny Chang noticed that his multicast workload
was performing bad with 64 bit kernels, since more cache lines ping pongs
were involved.
This litle patch moves "wait" field next "fasync_list" so that both
fields share a single cache line, to speedup sock_def_readable()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/home/rmk/linux-2.6-arm: (23 commits)
[ARM] Fix virtual to physical translation macro corner cases
[ARM] update mach-types
[ARM] 5421/1: ftrace: fix crash due to tracing of __naked functions
MX1 fix include
[ARM] 5419/1: ep93xx: fix build warnings about struct i2c_board_info
[ARM] 5418/1: restore lr before leaving mcount
ARM: OMAP: board-omap3beagle: set i2c-3 to 100kHz
ARM: OMAP: Allow I2C bus driver to be compiled as a module
ARM: OMAP: sched_clock() corrected
ARM: OMAP: Fix compile error if pm.h is included
[ARM] orion5x: pass dram mbus data to xor driver
[ARM] S3C64XX: Fix s3c64xx_setrate_clksrc
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/irq.c
[ARM] S3C64XX: sparse warnings in arch/arm/plat-s3c64xx/s3c6400-clock.c
[ARM] S3C64XX: Fix USB host clock mux list
[ARM] S3C64XX: Fix name of USB host clock.
[ARM] S3C64XX: Rename IRQ_UHOST to IRQ_USBH
[ARM] S3C64XX: Do gpiolib configuration earlier
[ARM] S3C64XX: Staticise s3c64xx_init_irq_eint()
[ARM] SMDK6410: Declare iodesc table static
...
As the PXA27x series allow 2 gpios to reset the ac97 bus,
allow through platform data configuration the definition of
the correct gpio which will reset the AC97 bus.
This comes from a silicon defect on the PXA27x series, where
the gpio must be manually controlled in warm reset cases.
Signed-off-by: Robert Jarzmik <rjarzmik@free.fr>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Stricter gfp_mask might be required for clone allocation.
For example, request-based dm may clone bio in interrupt context
so it has to use GFP_ATOMIC.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>