In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTL8153-AD supports a persistent system specific MAC address.
This means a device plugged into two different systems with host side
support will show different (but persistent) MAC addresses.
This information for the system's persistent MAC address is burned in when
the system HW is built and available under \_SB.AMAC in the DSDT at runtime.
This technology is currently implemented in the Dell TB15 and WD15 Type-C
docks. More information is available here:
http://www.dell.com/support/article/us/en/04/SLN301147
Signed-off-by: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.
SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).
Fixes: c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
pdata->chip_id = (u32)of_id->data;
^
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Kconfig currently controlling compilation of this code is:
init/Kconfig:config BPF_SYSCALL
init/Kconfig: bool "Enable bpf() system call"
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.
Note that MODULE_ALIAS is a no-op for non-modular code.
We replace module.h with init.h since the file does use __init.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
idx can be returned as -ENOSPC, so we should check for this first
before using it as an index into nn->vxlan_usecnt[] to avoid an
out of bounds array offset read.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable lan91x adapters in some ARM machines and models
when booted with an ACPI kernel.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vegard Nossum is reporting for a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:
Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
626238f0 960226a02 00000400 000000fe
62623910 600afca7 62623970 62623a48
62468038 00000018 00000000 00000000
Call Trace:
[<602b3e93>] rtmsg_fib+0xd3/0x190
[<602b6680>] fib_table_insert+0x260/0x500
[<602b0e5d>] inet_rtm_newroute+0x4d/0x60
[<60250def>] rtnetlink_rcv_msg+0x8f/0x270
[<60267079>] netlink_rcv_skb+0xc9/0xe0
[<60250d4b>] rtnetlink_rcv+0x3b/0x50
[<60265400>] netlink_unicast+0x1a0/0x2c0
[<60265e47>] netlink_sendmsg+0x3f7/0x470
[<6021dc9a>] sock_sendmsg+0x3a/0x90
[<6021e0d0>] ___sys_sendmsg+0x300/0x360
[<6021fa64>] __sys_sendmsg+0x54/0xa0
[<6021fac0>] SyS_sendmsg+0x10/0x20
[<6001ea68>] handle_syscall+0x88/0x90
[<600295fd>] userspace+0x3fd/0x500
[<6001ac55>] fork_handler+0x85/0x90
$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264
Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.
Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Fixes: 0eeb075fad ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
Cc: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As requested by Scott, removing him.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a combination if #if conditionals and goto labels to unwind
tunnel4_init seems unwieldy. This patch takes a simpler approach of
directly unregistering previously registered protocols when an error
occurs.
This fixes a number of problems with the current implementation
including the potential presence of labels when they are unused
and the potential absence of unregister code when it is needed.
Fixes: 8afe97e5d4 ("tunnels: support MPLS over IPv4 tunnels")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: implement rfc7496 in sctp
This patchset implements "Additional Policies for the Partially Reliable
Stream Control Transmission Protocol Extension" described on RFC7496.
The Partially Reliable SCTP (PR-SCTP) extension defined in [RFC3758]
provides a generic method for senders to abandon user messages. The
decision to abandon a user message is sender side only, and the exact
condition is called a "PR-SCTP policy". This patchset implements 3
policies:
1. Timed Reliability: This allows the sender to specify a timeout for
a user message after which the SCTP stack abandons the user message.
2. Limited Retransmission Policy: Allows limitation of the number of
retransmissions.
3. Priority Policy: Allows removal of lower-priority messages if space
for higher-priority messages is needed in the send buffer.
Patch 1-3 add some sockopts in sctp to set/get pr_sctp policy status.
Patch 4-6 implement these 3 policies one by one.
====================
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
prsctp PRIO policy is a policy to abandon lower priority chunks when
asoc doesn't have enough snd buffer, so that the current chunk with
higher priority can be queued successfully.
Similar to TTL/RTX policy, we will set the priority of the chunk to
prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
So if PRIO policy is enabled, msg->expire_at won't work.
asoc->sent_cnt_removable will record how many chunks can be checked to
remove. If priority policy is enabled, when the chunk is queued into
the out_queue, we will increase sent_cnt_removable. When the chunk is
moved to abandon_queue or dequeue and free, we will decrease
sent_cnt_removable.
In sctp_sendmsg, we will check if there is enough snd buffer for current
msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
free chunks from out_queue in right order until the abandon+free size >
msg_len - sctp_wfree. For the abandon size, we have to wait until it
sends FORWARD TSN, receives the sack and the chunks are really freed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
prsctp RTX policy is a policy to abandon chunks when they are
retransmitted beyond the max count.
This patch uses sent_count to count how many times one chunk has
been sent, and prsctp_param is the max rtx count, which is from
sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So similar
to TTL policy, if RTX policy is enabled, msg->expire_at won't
work.
Then in sctp_chunk_abandoned, this patch checks if chunk->sent_count
is bigger than chunk->prsctp_param to abandon this chunk.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
prsctp TTL policy is a policy to abandon chunks when they expire
at the specific time in local stack. It's similar with expires_at
in struct sctp_datamsg.
This patch uses sinfo->sinfo_timetolive to set the specific time for
TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
So if prsctp_enable or TTL policy is not enabled, msg->expires_at
still works as before.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
to dump the prsctp statistics info from the asoc. The prsctp statistics
includes abandoned_sent/unsent from the asoc. abandoned_sent is the
count of the packets we drop packets from retransmit/transmited queue,
and abandoned_unsent is the count of the packets we drop from out_queue
according to the policy.
Note: another option for prsctp statistics dump described in rfc is
SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
info from each stream. But by now, linux doesn't yet have per stream
statistics info, it needs rfc6525 to be implemented. As the prsctp
statistics for each stream has to be based on per stream statistics,
we will delay it until rfc6525 is done in linux.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
to set/get sctp Partially Reliable Policies' default params,
which includes 3 policies (ttl, rtx, prio) and their values.
Still, if we set policy params in sndinfo, we will use the params
of sndinfo against chunks, instead of the default params.
In this patch, we will use 5-8bit of sp/asoc->default_flags
to store prsctp policies, and reuse asoc->default_timetolive
to store their values. It means if we enable and set prsctp
policy, prior ttl timeout in sctp will not work any more.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
We will add prsctp_enable to both asoc and ep, and replace the places
where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
also modify it's value through sockopt SCTP_PR_SUPPORTED.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LPM can be enabled via a DDC write command at specific DDC ID.
As any other DDC value, this is up to the DDC config file to
include (or not) the low power mode configuration.
Signed-off-by: Loic Poulain <loic.poulain@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This reverts commit 4386f5662e ("net: ethernet: bcmgenet: use
phy_ethtool_{get|set}_link_ksettings")
This patch is wrong, the function phy_ethtool_{get|set}_link_ksettings
don't check if the device is running, but the driver bcmgenet need this
check.
The function {get|set}_settings need to access the mdio bus, and this
bus may only be used when the device is running. Otherwise, the clock
is disable and a mdio access will fail.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: dsa: b53: Add Broadcom NSP switch support
This patch series updates the B53 driver to support Broadcom's Northstar Plus
Soc integrated switch.
Unlike the version of the core present in BCM5301x/Northstar, we cannot read the
full chip id of the switch, so we need to get the information about our switch
id from Device Tree.
Other than that, this is a regular Broadcom Ethernet switch which is register
compatible for all practical purposes with the existing switch driver.
Since DSA requires a working CPU Ethernet MAC driver this depends on Jon
Mason's AMAC/BGMAC driver changes to support NSP. Board specific changes depend
on patches present in Broadcom's ARM SoC branches and will be posted in a short
while.
====================
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the SRAB, core driver and binding document to support the
BCM585xx/586xx/88312 integrated switch (Northstar Plus SoCs family).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Northstart Plus SoCs, we cannot detect the switch because only the
revision information is provied in the Management page, instead, rely on
Device Tree to tell us the chip id, and pass it down using the
b53_platform_data structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rc is not initialized so it can contain garbage if it is not
set by the call to bnxt_read_sfp_module_eeprom_info. Ensure
garbage is not returned by initializing rc to 0.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix possible NULL pointer dereference for vlan_insert_tag (two patches)
- Fix reference handling in some features, which may lead to reference
leaks or invalid memory access (four patches)
- Fix speedy join: DHCP packets handled by the gateway feature should
be sent with 4-address unicast instead of 3-address unicast to make
speedy join work. This fixes/speeds up DHCP assignment for clients
which join a mesh for the first time. (one patch)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJXf3SCAAoJEKEr45hCkp6hAaAQAJxKFavGbXHWvj1M1VxqVFkN
AlxP7JZ6OHgnWxBT3drk4ZRaxIA7v/2VkRYrCbxoYjIENiyrmNz+93SAzaBcTKxE
nnUntdDbQWYE3MOGC1lUBIoPgjvs4DQRejyq5dvG9CYEcK9hE4pDKV7FUfeBgmgL
dG5+9ht8JEjMYZq48FQp4SQwkQGpWRiS4fekZEUmcO1pIQpx0uOYTMfMZ/HpqpCN
im1QhUXlAGCBcOIJwztqVb/04LKcuTS8Du+b50BFF5uITmCZdK0NmG5yBH+1Nn8K
uKYanY3dHYUE4eGw3NAqnJ0uSiMQFlhk3gqKgHY8uu/KoMiqZ3tjBkNp+3fF3KqH
0AnXEPPsQPU8RJ5WAHH6TR/UNnoCrfqU6AjbIclHNq7l3WY6u0fD2uKHCGlaV13M
8XolPWECum8iLEptmYDlhYZrh5D9kteGDV7kt3XtQY8Hpv/UE1Jh1/iGrhNjtbdX
7P6NsZdi/cnkGPhIaRnoEQaWHZVmbO4Rl8Q2Yb3Ze2LEUuLdrkmBjTBKqiOFMnMe
7ltA3JL7ip/alRPeNsuiHOY28uNaog3YuEHg8QYiyTs449Os/TjWoh9pzD44dhkB
auIxmiy/IyVdYwlQwfBHDJupVK7WncUq+iF/rv3TfTmY25FO4FC+EV+PsBZdWsc+
co+amJR57ZOAygd0GgU2
=7Z04
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-for-davem-20160708' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are a couple batman-adv bugfix patches, all by Sven Eckelmann:
- Fix possible NULL pointer dereference for vlan_insert_tag (two patches)
- Fix reference handling in some features, which may lead to reference
leaks or invalid memory access (four patches)
- Fix speedy join: DHCP packets handled by the gateway feature should
be sent with 4-address unicast instead of 3-address unicast to make
speedy join work. This fixes/speeds up DHCP assignment for clients
which join a mesh for the first time. (one patch)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
register_vga_switcheroo() sets the PM ops from the hda structure which
is freed later in azx_free. Make sure that these ops are cleared.
Caught by KASAN, initially noticed due to a general protection fault.
Fixes: 246efa4a07 ("snd/hda: add runtime suspend/resume on optimus support (v4)")
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Currently the two are unioned together, but I don't think that's safe.
It looks like get_cached_acl could race with the last put in
posix_acl_release. get_cached_acl calls atomic_inc_not_zero on
a_refcount, but that field could have already been clobbered by
call_rcu, and may no longer be zero. Fix this by de-unioning the two
fields.
Fixes: b8a7a3a667 (posix_acl: Inode acl caching fixes)
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Variable "now" seems to be genuinely used unintialized
if branch
if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
is not taken and branch
if (unlikely(sighand == NULL)) {
is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Revert commit 3d4b7ae96d (ACPI 2.0 / AML: Improve module level
execution by moving the If/Else/While execution to per-table basis)
that enabled the execution of module-level AML after loading each
table (rather than after all AML tables have been loaded), but
overlooked locking issues resulting from that change.
Fixes: 3d4b7ae96d (ACPI 2.0 / AML: Improve module level execution by moving the If/Else/While execution to per-table basis)
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit 2f38b1b16d (ACPICA: Namespace: Fix deadlock triggered by
MLC support in dynamic table loading) that attempted to fix a deadlock
issue introduced by a previous commit, but it led to a lock ordering
inconsistency that caused further problems to appear.
Fixes: 2f38b1b16d (ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch corrects an off-by-one error in the DecodeQ931 function in
the nf_conntrack_h323 module. This error could result in reading off
the end of a Q.931 frame.
Signed-off-by: Toby DiPasquale <toby@cbcg.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
IPVS Updates for v4.8
please consider these enhancements to the IPVS. This alters the behaviour
of the "least connection" schedulers such that pre-established connections
are included in the active connection count. This avoids overloading
servers when a large number of new connections arrive in a short space of
time - e.g. when clients reconnect after a node or network failure.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We can pass the netns pointer as parameter to the functions that need to
gain access to it. From basechains, I didn't find any client for this
field anymore so let's remove this too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If we want to use ct packets expr, and add a rule like follows:
# nft add rule filter input ct packets gt 1 counter
We will find that no packets will hit it, because
nf_conntrack_acct is disabled by default. So It will
not work until we enable it manually via
"echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct".
This is not friendly, so like xt_connbytes do, if the user
want to use ct byte/packet expr, enable nf_conntrack_acct
automatically.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
physdev_mt() will check skb->nf_bridge first, which was alloced in
br_nf_pre_routing. So if we want to use --physdev-out and physdev-is-out,
we need to match it in FORWARD or POSTROUTING chain. physdev_mt_check()
only checked physdev-out and missed physdev-is-out. Fix it and update the
debug message to make it clearer.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Marcelo R Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It did use a fixed-size bucket list plus single lock to protect add/del.
Unlike the main conntrack table we only need to add and remove keys.
Convert it to rhashtable to get table autosizing and per-bucket locking.
The maximum number of entries is -- as before -- tied to the number of
conntracks so we do not need another upperlimit.
The change does not handle rhashtable_remove_fast error, only possible
"error" is -ENOENT, and that is something that can happen legitimetely,
e.g. because nat module was inserted at a later time and no src manip
took place yet.
Tested with http-client-benchmark + httpterm with DNAT and SNAT rules
in place.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
Second Round of IPVS Fixes for v4.7
The fix from Quentin Armitage allows the backup sync daemon to
be bound to a link-local mcast IPv6 address as is already the case
for IPv4.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nat extension structure is 32bytes in size on x86_64:
struct nf_conn_nat {
struct hlist_node bysource; /* 0 16 */
struct nf_conn * ct; /* 16 8 */
union nf_conntrack_nat_help help; /* 24 4 */
int masq_index; /* 28 4 */
/* size: 32, cachelines: 1, members: 4 */
/* last cacheline: 32 bytes */
};
The hlist is needed to quickly check for possible tuple collisions
when installing a new nat binding. Storing this in the extension
area has two drawbacks:
1. We need ct backpointer to get the conntrack struct from the extension.
2. When reallocation of extension area occurs we need to fixup the bysource
hash head via hlist_replace_rcu.
We can avoid both by placing the hlist_head in nf_conn and place nf_conn in
the bysource hash rather than the extenstion.
We can also remove the ->move support; no other extension needs it.
Moving the entire nat extension into nf_conn would be possible as well but
then we have to add yet another callback for deletion from the bysource
hash table rather than just using nat extension ->destroy hook for this.
nf_conn size doesn't increase due to aligment, followup patch replaces
hlist_node with single pointer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We don't need to acquire the bucket lock during early drop, we can
use lockless traveral just like ____nf_conntrack_find.
The timer deletion serves as synchronization point, if another cpu
attempts to evict same entry, only one will succeed with timer deletion.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
From: Liping Zhang <liping.zhang@spreadtrum.com>
Similar to ctnl_untimeout, when hash resize happened, we should try
to do unhelp from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing
ctnl_untimeout, and iterate on 3000# bucket.
Meanwhile, another user try to reduce hash size to 2048, then all nf_conn
are removed to the new hashtable. When this hash resize operation finished,
we still try to itreate ct begin from 3000# bucket, find nothing to do and
just return.
We may miss unlinking some timeout objects. And later we will end up with
invalid references to timeout object that are already gone.
So when we find that hash resize happened, try to unlink timeout objects
from the 0# bucket again.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack
hash table via /sys/module/nf_conntrack/parameters/hashsize, race will
happen, because reader can observe a newly allocated hash but the old size
(or vice versa). So oops will happen like follows:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack]
Call Trace:
[<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack]
[<ffffffff81261a1c>] seq_read+0x2cc/0x390
[<ffffffff812a8d62>] proc_reg_read+0x42/0x70
[<ffffffff8123bee7>] __vfs_read+0x37/0x130
[<ffffffff81347980>] ? security_file_permission+0xa0/0xc0
[<ffffffff8123cf75>] vfs_read+0x95/0x140
[<ffffffff8123e475>] SyS_read+0x55/0xc0
[<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4
It is very easy to reproduce this kernel crash.
1. open one shell and input the following cmds:
while : ; do
echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize
done
2. open more shells and input the following cmds:
while : ; do
cat /proc/net/nf_conntrack
done
3. just wait a monent, oops will happen soon.
The solution in this patch is based on Florian's Commit 5e3c61f981
("netfilter: conntrack: fix lookup race during hash resize"). And
add a wrapper function nf_conntrack_get_ht to get hash and hsize
suggested by Florian Westphal.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
the dwmac is set to sgmii.
Signed-off-by: Tien Hock Loh <thloh@altera.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af ("tmpfs: don't undo fallocate past its last page")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>