ipip packets are blocked in some public cloud environments, this patch
allows gue encapsulation with the tunneling method, which would make
tunneling working in those environments.
Signed-off-by: Jacky Hu <hengqing.hu@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
On top of this, a cleanup of kvm_para.h headers, which were exported by
some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcoM5VAAoJEL/70l94x66DU3EH/A8sYdsfeqALWElm2Sy9TYas
mntz+oTWsl3vDy8s8zp1ET2NpF7oBlBEMmCWhVEJaD+1qW3VpTRAseR3Zr9ML9xD
k+BQM8SKv47o86ZN+y4XALl30Ckb3DXh/X1xsrV5hF6J3ofC+Ce2tF560l8C9ygC
WyHDxwNHMWVA/6TyW3mhunzuVKgZ/JND9+0zlyY1LKmUQ0BQLle23gseIhhI0YDm
B4VGIYU2Mf8jCH5Ir3N/rQ8pLdo8U7f5P/MMfgXQafksvUHJBg6B6vOhLJh94dLh
J2wixYp1zlT0drBBkvJ0jPZ75skooWWj0o3otEA7GNk/hRj6MTllgfL5SajTHZg=
=/A7u
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"A collection of x86 and ARM bugfixes, and some improvements to
documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported
by some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
KVM: doc: Document the life cycle of a VM and its resources
KVM: selftests: complete IO before migrating guest state
KVM: selftests: disable stack protector for all KVM tests
KVM: selftests: explicitly disable PIE for tests
KVM: selftests: assert on exit reason in CR4/cpuid sync test
KVM: x86: update %rip after emulating IO
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
kvm: don't redefine flags as something else
kvm: mmu: Used range based flushing in slot_handle_level_range
KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
KVM: Reject device ioctls from processes other than the VM's creator
KVM: doc: Fix incorrect word ordering regarding supported use of APIs
KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
...
Here are some binder, habanalabs, and vboxguest driver fixes for
5.1-rc3.
The Binder fixes resolve some reported issues found by testing, first by
the selinux developers, and then earlier today by syzbot.
The habanalabs fixes are all minor, resolving a number of tiny things.
The vboxguest patches are a bit larger. They resolve the fact that
virtual box decided to change their api in their latest release in a way
that broke the existing kernel code, despite saying that they were never
going to do that. So this is a bit of a "new feature", but is good to
get merged so that 5.1 will work with the latest release. The changes
are not large and of course virtual box "swears" they will not break
this again, but no one is holding their breath here.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXJ50KA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykGsgCgtDaSHl+qjalyC3SegP9s6fUfoXwAoKuKS2Ti
ROSQqZKSRNWvAqCwWUT4
=K3ll
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some binder, habanalabs, and vboxguest driver fixes for
5.1-rc3.
The Binder fixes resolve some reported issues found by testing, first
by the selinux developers, and then earlier today by syzbot.
The habanalabs fixes are all minor, resolving a number of tiny things.
The vboxguest patches are a bit larger. They resolve the fact that
virtual box decided to change their api in their latest release in a
way that broke the existing kernel code, despite saying that they were
never going to do that. So this is a bit of a "new feature", but is
good to get merged so that 5.1 will work with the latest release. The
changes are not large and of course virtual box "swears" they will not
break this again, but no one is holding their breath here.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
virt: vbox: Implement passing requestor info to the host for VirtualBox 6.0.x
binder: fix race between munmap() and direct reclaim
binder: fix BUG_ON found by selinux-testsuite
habanalabs: cast to expected type
habanalabs: prevent host crash during suspend/resume
habanalabs: perform accounting for active CS
habanalabs: fix mapping with page size bigger than 4KB
habanalabs: complete user context cleanup before hard reset
habanalabs: fix bug when mapping very large memory area
habanalabs: fix MMU number of pages calculation
There is currently no support for the multicast/broadcast aspects
of VXLAN in ovs. In the datapath flow the tun_dst must specific.
But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
And the packet can forward through the fdb table of vxlan devcice. In
this mode the broadcast/multicast packet can be sent through the
following ways in ovs.
ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
options:key=1000 options:remote_ip=flow
ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff, \
action=output:vxlan
bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
src_vni 1000 vni 1000 self
bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
src_vni 1000 vni 1000 self
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fine-grain timeout support to conntrack action.
The new OVS_CT_ATTR_TIMEOUT attribute of the conntrack action
specifies a timeout to be associated with this connection.
If no timeout is specified, it acts as is, that is the default
timeout for the connection will be automatically applied.
Example usage:
$ nfct timeout add timeout_1 inet tcp syn_sent 100 established 200
$ ovs-ofctl add-flow br0 in_port=1,ip,tcp,action=ct(commit,timeout=timeout_1)
CC: Pravin Shelar <pshelar@ovn.org>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Drop license boilerplate (obsoleted by SPDX license IDs),
by Sven Eckelmann
- Drop documentation for sysfs and debugfs Documentation,
by Sven Eckelmann (2 patches)
- Mark sysfs as optional and deprecated, by Sven Eckelmann (3 patches)
- Update MAINTAINERS Tree, Chat and Bugtracker,
by Sven Eckelmann (3 patches)
- Rename batadv_dat_send_data, by Sven Eckelmann
- update DAT entries with incoming ARP replies, by Linus Luessing
- add multicast-to-unicast support for limited destinations,
by Linus Luessing
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlyc6uEWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSA9D/sEpVY0qOITIwzbttcyeDU5PPSD
OF4dVCf6Za6CqfnPRCdViKAGtC1FOz+X2BXtedrIxgsjSPFoRvRoi1XBdu4Bobv2
/4wx56rz3AeMoBZ1UyziUIS6Qam1x7vVYSRXk+QHqBYVc16YiIePpCqTuryrzuk4
4MMqXz+V0dqm7z7irRDe7W9/CdFRtZEDAS8o6cgw4IlL56Ul3Yz6xP6p3PRA+H6V
OWtVwmwcbX2KzZnrWDgql5NBhJ1bOfn2oDp1Y4RpLRmBp0iwg1qZdNZK2+MD2TTw
xxuz5lsZFhTBXNqGgeoGk87m2z0wNkvnj9UnkMPl3gb7j+FyyaAgvVY4M2s2qJv/
++wKDPPun/aGDOuo/rJdBTdlnToH17KS3jsDwhj4TooroI8uCCLWZQaYWkgjcugD
ZKsZlIqFrfH3rPAzOBwRZodoYkOPpz/+xHp3p/cg9ANifwqpxqq3PY35BoP4ZXRi
xUy79QgNIFxYXwrrqTrt3UrY8AGo1/OOHmA6nFQGZT79S648ZoG5vPDKFKRzTmcj
Mj2GXuBzMIkWayHgnH69Kv9vVZc7mZPi7lartsVq/aZtMCh3HbPNfKtNOYsu4QEq
6c2966jvFB+LdTibiJQWbe0s5Z96UaFQUxH5+gGdM5TS5TCIaG3udXoI1ou4YVJI
q6eOdAgblbD7oaNY4w==
=WB31
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20190328' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- Drop license boilerplate (obsoleted by SPDX license IDs),
by Sven Eckelmann
- Drop documentation for sysfs and debugfs Documentation,
by Sven Eckelmann (2 patches)
- Mark sysfs as optional and deprecated, by Sven Eckelmann (3 patches)
- Update MAINTAINERS Tree, Chat and Bugtracker,
by Sven Eckelmann (3 patches)
- Rename batadv_dat_send_data, by Sven Eckelmann
- update DAT entries with incoming ARP replies, by Linus Luessing
- add multicast-to-unicast support for limited destinations,
by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I do not see any consistency about headers_install of <linux/kvm_para.h>
and <asm/kvm_para.h>.
According to my analysis of Linux 5.1-rc1, there are 3 groups:
[1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
alpha, arm, hexagon, mips, powerpc, s390, sparc, x86
[2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not
arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc,
parisc, sh, unicore32, xtensa
[3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
csky, nds32, riscv
This does not match to the actual KVM support. At least, [2] is
half-baked.
Nor do arch maintainers look like they care about this. For example,
commit 0add53713b ("microblaze: Add missing kvm_para.h to Kbuild")
exported <asm/kvm_para.h> to user-space in order to fix an in-kernel
build error.
We have two ways to make this consistent:
[A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all
architectures, irrespective of the KVM support
[B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h>
to the KVM support
My first attempt was [A] because the code looks cleaner, but Paolo
suggested [B].
So, this commit goes with [B].
For most architectures, <asm/kvm_para.h> was moved to the kernel-space.
I changed include/uapi/linux/Kbuild so that it checks generated
asm/kvm_para.h as well as check-in ones.
After this commit, there will be two groups:
[1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported
arm, arm64, mips, powerpc, s390, x86
[2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported
alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze,
nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds a new action - 'check_pkt_len' which checks the
packet length and executes a set of actions if the packet
length is greater than the specified length or executes
another set of actions if the packet length is lesser or equal to.
This action takes below nlattrs
* OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
* OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions
to apply if the packet length is greater than the specified 'pkt_len'
* OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested
actions to apply if the packet length is lesser or equal to the
specified 'pkt_len'.
The main use case for adding this action is to solve the packet
drops because of MTU mismatch in OVN virtual networking solution.
When a VM (which belongs to a logical switch of OVN) sends a packet
destined to go via the gateway router and if the nic which provides
external connectivity, has a lesser MTU, OVS drops the packet
if the packet length is greater than this MTU.
With the help of this action, OVN will check the packet length
and if it is greater than the MTU size, it will generate an
ICMP packet (type 3, code 4) and includes the next hop mtu in it
so that the sender can fragment the packets.
Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
CC: Gregory Rose <gvrose8192@gmail.com>
CC: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for Fast Link Down as new PHY tunable.
Fast Link Down reduces the time until a link down event is reported
for 1000BaseT. According to the standard it's 750ms what is too long
for several use cases.
v2:
- add comment describing the constants
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
An FoU socket is currently bound to the wildcard-address. While this
works fine, there are several use-cases where the use of the
wildcard-address is not desirable. For example, I use FoU on some
multi-homed servers and would like to use FoU on only one of the
interfaces.
This commit adds support for binding FoU sockets to a given source
address/interface, as well as connecting the socket to a given
destination address/port. udp_tunnel already provides the required
infrastructure, so most of the code added is for exposing and setting
the different attributes (local address, peer address, etc.).
The lookups performed when we add, delete or get an FoU-socket has also
been updated to compare all the attributes a user can set. Since the
comparison now involves several elements, I have added a separate
comparison-function instead of open-coding.
In order to test the code and ensure that the new comparison code works
correctly, I started by creating a wildcard socket bound to port 1234 on
my machine. I then tried to create a non-wildcarded socket bound to the
same port, as well as fetching and deleting the socket (including source
address, peer address or interface index in the netlink request). Both
the create, fetch and delete request failed. Deleting/fetching the
socket was only successful when my netlink request attributes matched
those used to create the socket.
I then repeated the tests, but with a socket bound to a local ip
address, a socket bound to a local address + interface, and a bound
socket that was also «connected» to a peer. Add only worked when no
socket with the matching source address/interface (or wildcard) existed,
while fetch/delete was only successful when all attributes matched.
In addition to testing that the new code work, I also checked that the
current behavior is kept. If none of the new attributes are provided,
then an FoU-socket is configured as before (i.e., wildcarded). If any
of the new attributes are provided, the FoU-socket is configured as
expected.
v1->v2:
* Fixed building with IPv6 disabled (kbuild).
* Fixed a return type warning and make the ugly comparison function more
readable (kbuild).
* Describe more in detail what has been tested (thanks David Miller).
* Make peer port required if peer address is specified.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
VirtualBox 6.0.x has a new feature where the guest kernel driver passes
info about the origin of the request (e.g. userspace or kernelspace) to
the hypervisor.
If we do not pass this information then when running the 6.0.x userspace
guest-additions tools on a 6.0.x host, some requests will get denied
with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and
the mounting of shared folders marked to be auto-mounted.
This commit implements passing the requestor info to the host, fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-03-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) introduce bpf_tcp_check_syncookie() helper for XDP and tc, from Lorenz.
2) allow bpf_skb_ecn_set_ce() in tc, from Peter.
3) numerous bpf tc tunneling improvements, from Willem.
4) and other miscellaneous improvements from Adrian, Alan, Daniel, Ivan, Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch multicast packets with a limited number of destinations
(current default: 16) will be split and transmitted by the originator as
individual unicast transmissions.
Wifi broadcasts with their low bitrate are still a costly undertaking.
In a mesh network this cost multiplies with the overall size of the mesh
network. Therefore using multiple unicast transmissions instead of
broadcast flooding is almost always less burdensome for the mesh
network.
The maximum amount of unicast packets can be configured via the newly
introduced multicast_fanout parameter. If this limit is exceeded
distribution will fall back to classic broadcast flooding.
The multicast-to-unicast conversion is performed on the initial
multicast sender node and counts on a final destination node, mesh-wide
basis (and not next hop, neighbor node basis).
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
All files got a SPDX-License-Identifier with commit 7db7d9f369
("batman-adv: Add SPDX license identifier above copyright header"). All the
required information about the license conditions can be found in
LICENSES/.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Add documentation to the tcp_ca_state enum, since this enum is
exposed in uapi.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Sowmini Varadhan <sowmini05@gmail.com>
Acked-by: Sowmini Varadhan <sowmini05@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When pushing tunnel headers, annotate skbs in the same way as tunnel
devices.
For GSO packets, the network stack requires certain fields set to
segment packets with tunnel headers. gro_gse_segment depends on
transport and inner mac header, for instance.
Add an option to pass this information.
Remove the restriction on len_diff to network header length, which
is too short, e.g., for GRE protocols.
Changes
v1->v2:
- document new flags
- BPF_F_ADJ_ROOM_MASK moved
v2->v3:
- BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_skb_adjust_room adjusts gso_size of gso packets to account for the
pushed or popped header room.
This is not allowed with UDP, where gso_size delineates datagrams. Add
an option to avoid these updates and allow this call for datagrams.
It can also be used with TCP, when MSS is known to allow headroom,
e.g., through MSS clamping or route MTU.
Changes v1->v2:
- document flag BPF_F_ADJ_ROOM_FIXED_GSO
- do not expose BPF_F_ADJ_ROOM_MASK through uapi, as it may change.
Link: https://patchwork.ozlabs.org/patch/1052497/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_skb_adjust_room net allows inserting room in an skb.
Existing mode BPF_ADJ_ROOM_NET inserts room after the network header
by pulling the skb, moving the network header forward and zeroing the
new space.
Add new mode BPF_ADJUST_ROOM_MAC that inserts room after the mac
header. This allows inserting tunnel headers in front of the network
header without having to recreate the network header in the original
space, avoiding two copies.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Using bpf_skc_lookup_tcp it's possible to ascertain whether a packet
belongs to a known connection. However, there is one corner case: no
sockets are created if SYN cookies are active. This means that the final
ACK in the 3WHS is misclassified.
Using the helper, we can look up the listening socket via
bpf_skc_lookup_tcp and then check whether a packet is a valid SYN
cookie ACK.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Allow looking up a sock_common. This gives eBPF programs
access to timewait and request sockets.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In commit f2780d6d74 "tun: Add ioctl() SIOCGSKNS cmd to allow
obtaining net ns of tun device" it was missed that tun may change
its net ns, while net ns of socket remains the same as it was
created initially. SIOCGSKNS returns net ns of socket, so it is
not suitable for obtaining net ns of device.
We may have two tun devices with the same names in two net ns,
and in this case it's not possible to determ, which of them
fd refers to (TUNGETIFF will return the same name).
This patch adds new ioctl() cmd for obtaining net ns of a device.
Reported-by: Harald Albrecht <harald.albrecht@gmx.net>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for AES128-CCM based record encryption. AES128-CCM is
similar to AES128-GCM. Both of them have same salt/iv/mac size. The
notable difference between the two is that while invoking AES128-CCM
operation, the salt||nonce (which is passed as IV) has to be prefixed
with a hardcoded value '2'. Further, CCM implementation in kernel
requires IV passed in crypto_aead_request() to be full '16' bytes.
Therefore, the record structure 'struct tls_rec' has been modified to
reserve '16' bytes for IV. This works for both GCM and CCM based cipher.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a multicast stream uses either broadcast or replicast as
transmission method, based on the ratio between number of actual
destinations nodes and cluster size.
However, when an L2 interface (e.g., VXLAN) provides pseudo
broadcast support, this becomes very inefficient, as it blindly
replicates multicast packets to all cluster/subnet nodes,
irrespective of whether they host actual target sockets or not.
The TIPC multicast algorithm is able to distinguish real destination
nodes from other nodes, and hence provides a smarter and more
efficient method for transferring multicast messages than
pseudo broadcast can do.
Because of this, we now make it possible for users to force
the broadcast link to permanently switch to using replicast,
irrespective of which capabilities the bearer provides,
or pretend to provide.
Conversely, we also make it possible to force the broadcast link
to always use true broadcast. While maybe less useful in
deployed systems, this may at least be useful for testing the
broadcast algorithm in small clusters.
We retain the current AUTOSELECT ability, i.e., to let the broadcast link
automatically select which algorithm to use, and to switch back and forth
between broadcast and replicast as the ratio between destination
node number and cluster size changes. This remains the default method.
Furthermore, we make it possible to configure the threshold ratio for
such switches. The default ratio is now set to 10%, down from 25% in the
earlier implementation.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes
the common Kbuild.asm file. Factor out the duplicated include directives
to scripts/Makefile.asm-generic so that no architecture would opt out
of the mandatory-y mechanism.
um is not forced to include mandatory-y since it is a very exceptional
case which does not support UAPI.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlx+nn4ACgkQjrBW1T7s
sS2kwg//aJUCwLIhV91gXUFN2jHTCf0/+5fnigEk7JhAT5wmAykxLM8tprLlIlyp
HtwNQx54hq/6p010Ulo9K50VS6JRii+2lNSpC6IkqXXdHXXm0ViH+5I9Nru8SVJ+
avRCYWNjW9Gn1EtcB2yv6KP3XffgnQ6ZLIr4QJwglOxgAqUaWZ68woSUlrIR5yFj
j48wAxjsC3g2qwGLvXPeiwYZHwk6VnYmrZ3eWXPDthWRDC4zkjyBdchZZzFJagSC
6sX8T9s5ua5juZMokEJaWjuBQQyfg0NYu41hupSdVjV7/0D3E+5/DiReInvLmSup
63bZ85uKRqWTNgl4cmJ1W3aVe2RYYemMZCXVVYYvU+IKpvTSzzYY7us+FyMAIRUV
bT+XPGzTWcGrChzv9bHZcBrkL91XGqyxRJz56jLl6EhRtqxmzmywf6mO6pS2WK4N
r+aBDgXeJbG39KguCzwUgVX8hC6YlSxSP8Md+2sK+UoAdfTUvFtdCYnjhuACofCt
saRvDIPF8N9qn4Ch3InzCKkrUTL/H3BZKBl2jo6tYQ9smUsFZW7lQoip5Ui/0VS+
qksJ91djOc9facGoOorPazojY5fO5Lj3Hg+cGIoxUV0jPH483z7hWH0ALynb0f6z
EDsgNyEUpIO2nJMJJfm37ysbU/j1gOpzQdaAEaWeknwtfecFPzM=
=yOWp
-----END PGP SIGNATURE-----
Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd system call from Christian Brauner:
"This introduces the ability to use file descriptors from /proc/<pid>/
as stable handles on struct pid. Even if a pid is recycled the handle
will not change. For a start these fds can be used to send signals to
the processes they refer to.
With the ability to use /proc/<pid> fds as stable handles on struct
pid we can fix a long-standing issue where after a process has exited
its pid can be reused by another process. If a caller sends a signal
to a reused pid it will end up signaling the wrong process.
With this patchset we enable a variety of use cases. One obvious
example is that we can now safely delegate an important part of
process management - sending signals - to processes other than the
parent of a given process by sending file descriptors around via scm
rights and not fearing that the given process will have been recycled
in the meantime. It also allows for easy testing whether a given
process is still alive or not by sending signal 0 to a pidfd which is
quite handy.
There has been some interest in this feature e.g. from systems
management (systemd, glibc) and container managers. I have requested
and gotten comments from glibc to make sure that this syscall is
suitable for their needs as well. In the future I expect it to take on
most other pid-based signal syscalls. But such features are left for
the future once they are needed.
This has been sitting in linux-next for quite a while and has not
caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via
a pidfd.
Jon has written about a prior version of this patchset. It should
cover the basic functionality since not a lot has changed since then:
https://lwn.net/Articles/773459/
The commit message for the syscall itself is extensively documenting
the syscall, including it's functionality and extensibility"
* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add tests for pidfd_send_signal()
signal: add pidfd_send_signal() syscall
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.
2) Fix BTF to properly resolve forward-declared enums into their corresponding
full enum definition types during deduplication, from Andrii.
3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.
4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.
5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.
6) Various fixes in BPF helper function documentation in bpf.h UAPI header
used to bpf-helpers(7) man page, from Quentin.
7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation for the BPF spinlock-related helpers to the doc in
bpf.h. I added the constraints and restrictions coming with the use of
spinlocks for BPF: not all of it is directly related to the use of the
helper, but I thought it would be nice for users to find them in the man
page.
This list of restrictions is nearly a verbatim copy of the list in
Alexei's commit log for those helpers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Another round of minor fixes for the documentation of the BPF helpers
located in the UAPI bpf.h header file. Changes include:
- Moving around description of some helpers, to keep the descriptions in
the same order as helpers are declared (bpf_map_push_elem(), leftover
from commit 90b1023f68 ("bpf: fix documentation for eBPF helpers"),
bpf_rc_keydown(), and bpf_skb_ancestor_cgroup_id()).
- Fixing typos ("contex" -> "context").
- Harmonising return types ("void* " -> "void *", "uint64_t" -> "u64").
- Addition of the "bpf_" prefix to bpf_get_storage().
- Light additions of RST markup on some keywords.
- Empty line deletion between description and return value for
bpf_tcp_sock().
- Edit for the description for bpf_skb_ecn_set_ce() (capital letters,
acronym expansion, no effect if ECT not set, more details on return
value).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull networking fixes from David Miller:
"More fixes in the queue:
1) Netfilter nat can erroneously register the device notifier twice,
fix from Florian Westphal.
2) Use after free in nf_tables, from Pablo Neira Ayuso.
3) Parallel update of steering rule fix in mlx5 river, from Eli
Britstein.
4) RX processing panic in lan743x, fix from Bryan Whitehead.
5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch.
6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein.
7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing
modes, from Bryan Whitehead.
8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
pptp: dst_release sk_dst_cache in pptp_sock_destruct
MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list
l2tp: fix infoleak in l2tp_ip6_recvmsg()
net/tls: Inform user space about send buffer availability
net_sched: return correct value for *notify* functions
lan743x: Fix TX Stall Issue
net/mlx4_core: Fix qp mtt size calculation
net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling
net/mlx4_core: Fix reset flow when in command polling mode
mlxsw: minimal: Initialize base_mac
mlxsw: core: Prevent duplication during QSFP module initialization
net: dwmac-sun8i: fix a missing check of of_get_phy_mode
net: sh_eth: fix a missing check of of_get_phy_mode
net: 8390: fix potential NULL pointer dereferences
net: fujitsu: fix a potential NULL pointer dereference
net: qlogic: fix a potential NULL pointer dereference
isdn: hfcpci: fix potential NULL pointer dereference
Documentation: devicetree: add a new optional property for port mac address
net: rocker: fix a potential NULL pointer dereference
net: qlge: fix a potential NULL pointer dereference
...
Add a new helper "struct bpf_sock *bpf_get_listener_sock(struct bpf_sock *sk)"
which returns a bpf_sock in TCP_LISTEN state. It will trace back to
the listener sk from a request_sock if possible. It returns NULL
for all other cases.
No reference is taken because the helper ensures the sk is
in SOCK_RCU_FREE (where the TCP_LISTEN sock should be in).
Hence, bpf_sk_release() is unnecessary and the verifier does not
allow bpf_sk_release(listen_sk) to be called either.
The following is also allowed because the bpf_prog is run under
rcu_read_lock():
sk = bpf_sk_lookup_tcp();
/* if (!sk) { ... } */
listen_sk = bpf_get_listener_sock(sk);
/* if (!listen_sk) { ... } */
bpf_sk_release(sk);
src_port = listen_sk->src_port; /* Allowed */
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* Fix nfit-bus command submission regression
* Support retrieval of short-ARS results if the ARS state is "requires
continuation", and even if the "no_init_ars" module parameter is
specified.
* Allow busy-polling of the kernel ARS state by allowing root to reset
the exponential back-off timer.
* Filter potentially stale ARS results by tracking query-ARS relative to
the previous start-ARS.
* Enhance dax_device alignment checks
* Add support for the Hyper-V family of device-specific-methods (DSMs)
* Add several fixes and workarounds for Hyper-V compatibility.
* Fix support to cache the dirty-shutdown-count at init.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJchsnlAAoJEB7SkWpmfYgCzNcP+gIsRwy2nklV78CoIX9rMOF+
8AF/o3kb+XbAGziTkFRk5SXsAGLQl1PNTzfaJDiBfS0vX6c3ja4cDhX4wgLi/w1c
2opBA3Fd1qAk2XXkOtQQ+yqFFxXR3zFV+Iflue39XJhwgR0yoY10mKEIGkelMur5
kOZjfWA6qseVGlyujHoM4Ta19Le88S3Yx1Da2jYTxHOYYnpRqq/epuO3hqojL/CT
GIrJFD6ayyuGjeA4CD3LsDAGgISQbLSRg1DXbCebmIsuoQ9TZeD7egqxjWKutcSU
xQVbI7Dw/dUKnAdo4DO9x0kMKV7XWDfpPOk4eZc3TSY/2g7muatsc3ZhY347F4Ia
3R9ox23WS1hd41jGbYT1CeKPvTnqnIZ6zwCEIRTq+exXSEp+lzOdF3De6olMcgYR
b37aKCR3PNZwF+esQ1XLA7tHLbLfdjY/TcFQET9i4vUMOlC/NeVpJY6g4kgd/1Lu
MNHu5NxTGtuK6Hp8zJESQ4X+yhhIMPd5VwOeDOjTzuLwgMA2c4MtXeZTABfuBe1W
bN6Kthv8mk5+CnEvXZDAEdSRijTo8inTmYQuIyaLSkWFhvANHKZX4xTc9VVhTW2I
8IONqBD3ZtFW+z+qTRB3VbOp7qj3/gA7F+T5C+MV8gj/YAO6hCzGYq8kUtQ4FeuO
OZwXxuFYY+iOOF1XEM3f
=wt4z
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The bulk of this has been in -next since before the merge window
opened, with no known collisions / issues reported.
The only detail worth noting, outside the summary below, is that the
"libnvdimm-start-pad" topic has been truncated to just cleanups and
small fixes. The full topic branch would have doubled down on hacks
around the "section alignment" limitation of the core-mm, instead
effort is now being spent to address that root issue in the memory
hotplug implementation for v5.2.
- Fix nfit-bus command submission regression
- Support retrieval of short-ARS results if the ARS state is
"requires continuation", and even if the "no_init_ars" module
parameter is specified
- Allow busy-polling of the kernel ARS state by allowing root to
reset the exponential back-off timer
- Filter potentially stale ARS results by tracking query-ARS relative
to the previous start-ARS
- Enhance dax_device alignment checks
- Add support for the Hyper-V family of device-specific-methods
(DSMs)
- Add several fixes and workarounds for Hyper-V compatibility
- Fix support to cache the dirty-shutdown-count at init"
* tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (25 commits)
libnvdimm/namespace: Clean up holder_class_store()
libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warnings
acpi/nfit: Update NFIT flags error message
libnvdimm/btt: Fix LBA masking during 'free list' population
libnvdimm/btt: Remove unnecessary code in btt_freelist_init
libnvdimm/pfn: Remove dax_label_reserve
dax: Check the end of the block-device capacity with dax_direct_access()
nfit/ars: Avoid stale ARS results
nfit/ars: Allow root to busy-poll the ARS state machine
nfit/ars: Introduce scrub_flags
nfit/ars: Remove ars_start_flags
nfit/ars: Attempt short-ARS even in the no_init_ars case
nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
acpi/nfit: Require opt-in for read-only label configurations
libnvdimm/pmem: Honor force_raw for legacy pmem regions
libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init()
libnvdimm: Fix altmap reservation size calculation
libnvdimm, pfn: Fix over-trim in trim_pfn_device()
acpi/nfit: Fix bus command validation
libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family
...
- A new interface for UBI to deal better with read disturb
- Reject unsupported ioctl flags in UBIFS (xfstests found it)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlyHyMUWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wf9+EACFjPJaTJeLPHQofH3+u9O8gPzh
ptQFzkEcRrr7Y7WjXnYGhjw83Nx4o5iM17gfqq7zYfuCMxVbC8zm0WZ9Ujj3p7xV
p3IJ0bu/9sdIgdo+X9P8XJugAlWit1cW4mI8ZIAl2/CmYBzho8Zo55BNngNQ5G+Y
o3JujvP7TAHm9gbqIUMrGpweBHKX0GoooYZBTPdkLyKnFT0yxzOc/jdVILspIxi5
GtDl4738xV7Ts3Fwson1BVqDdwqLvd2j+LBWeRTSYXKyQLIizxRHtk1EZHZtBDZk
hWS/IW6HOzJJ5EQHn1EFAyQEGhfm4Yty+X0/BaPn8wvGE3Oud7bd9zgUCoBrhhTv
ztLPXY1U1LV8aTCmww6IOXwFj+6BGpj5fIu7my14aqGPKVV5M2kkf+prnLimb9QN
C3WxUz1Spz6CwrexoncvGm9ujoQbmwYLtKVNjRFIJ267OelaVD8icuAp1pZLSDom
1B6l39UQctrMiNqxuzJL+eq2raVZnnSQTlDqbUjFnUuU3LccRRNYgzhT1O6Ph50U
xqSO2k7Pf41/zZXhdB009HLecVL4gsZOunhGOE7Vv4kr7hin0AfrnoegdL37YG8W
GF6BNBgeegOxYDyvbOIWOxDqwyBWY2TPLJJ1IUE6j0lU6P1293IlsYHyTXUIK6bM
CQinKMNAXICWvXG0YQ==
=DlM9
-----END PGP SIGNATURE-----
Merge tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- A new interface for UBI to deal better with read disturb
- Reject unsupported ioctl flags in UBIFS (xfstests found it)
* tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: wl: Silence uninitialized variable warning
ubifs: Reject unsupported ioctl flags explicitly
ubi: Expose the bitrot interface
ubi: Introduce in_pq()
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXIdqOwAKCRDh3BK/laaZ
PFRlAP0RZr7vDfGcZTXGApcIr63YDjzi8Gg1/Jhd0jrzLbKcdAD+P0d6bupWWwOl
yGjVxY9LkXNJiTI2Q+Equ7AgMYvDcQk=
=Lvcr
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"Scalability and performance improvements, as well as minor bug fixes
and cleanups"
* tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
fuse: cache readdir calls if filesystem opts out of opendir
fuse: support clients that don't implement 'opendir'
fuse: lift bad inode checks into callers
fuse: multiplex cached/direct_io file operations
fuse add copy_file_range to direct io fops
fuse: use iov_iter based generic splice helpers
fuse: Switch to using async direct IO for FOPEN_DIRECT_IO
fuse: use atomic64_t for khctr
fuse: clean up aborted
fuse: Protect ff->reserved_req via corresponding fi->lock
fuse: Protect fi->nlookup with fi->lock
fuse: Introduce fi->lock to protect write related fields
fuse: Convert fc->attr_version into atomic64_t
fuse: Add fuse_inode argument to fuse_prepare_release()
fuse: Verify userspace asks to requeue interrupt that we really sent
fuse: Do some refactoring in fuse_dev_do_write()
fuse: Wake up req->waitq of only if not background
fuse: Optimize request_end() by not taking fiq->waitq.lock
fuse: Kill fasync only if interrupt is queued in queue_interrupt()
fuse: Remove stale comment in end_requests()
...
Merge several updates to the ARS implementation. Highlights include:
* Support retrieval of short-ARS results if the ARS state is "requires
continuation", and even if the "no_init_ars" module parameter is
specified.
* Allow busy-polling of the kernel ARS state by allowing root to reset
the exponential back-off timer.
* Filter potentially stale ARS results by tracking query-ARS relative to
the previous start-ARS.
Merge miscellaneous libnvdimm sub-system updates for v5.1. Highlights
include:
* Support for the Hyper-V family of device-specific-methods (DSMs)
* Several fixes and workarounds for Hyper-V compatibility.
* Fix for the support to cache the dirty-shutdown-count at init.
Referencing the __kernel_long_t type caused some user space applications
to stop compiling when they had not already included linux/posix_types.h,
e.g.
s/multicast.c -o ext/sockets/multicast.lo
In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468,
from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27:
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 'zm_startup_sockets':
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: '__kernel_long_t' undeclared (first use in this function)
776 | REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | CONST_PERSISTENT);
It is safe to include that header here, since it only contains kernel
internal types that do not conflict with other user space types.
It's still possible that some related build failures remain, but those
are likely to be for code that is not already y2038 safe.
Reported-by: Laura Abbott <labbott@redhat.com>
Fixes: a9beb86ae6 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Including:
- A big cleanup and optimization patch-set for the
Tegra GART driver
- Documentation updates and fixes for the IOMMU-API
- Support for page request in Intel VT-d scalable mode
- Intel VT-d dma_[un]map_resource() support
- Updates to the ATS enabling code for PCI (acked by Bjorn) and
Intel VT-d to align with the latest version of the ATS spec
- Relaxed IRQ source checking in the Intel VT-d driver for some
aliased devices, needed for future devices which send IRQ
messages from more than on request-ID
- IRQ remapping driver for Hyper-V
- Patches to make generic IOVA and IO-Page-Table code usable
outside of the IOMMU code
- Various other small fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAlyCNlIACgkQK/BELZcB
GuNDiRAAscgYj0BdqpZVUNHl4PySR12QJpS1myl/OC4HEbdB/EOh+bYT4Q1vptCU
GNK6Gt9SVfcbtWrLiGfcP9ODXmbqZ6AIOIbHKv9cvw1mnyYAtVvT/kck7B/W5jEr
/aP/5RTO7XcqscWO44zBkrtLFupegtpQFB0jXYTJYTrwQoNKRqCUqfetZGzMkXjL
x/h7kFTTIRcVP8RFcOeAMwC6EieaI8z8HN976Gu7xSV8g0VJqoNsBN8jbUuBh5AN
oPyd9nl1KBcIQEC1HsbN8I5wIhTh1sJ2UDqFHAgtlnO59zWHORuFUUt6SXbC9UqJ
okJTzFp9Dh2BqmFPXxBTxAf3j+eJP2XPpDI9Ask6SytEPhgw39fdlOOn2MWfSFoW
TaBJ4ww/r98GzVxCP7Up98xFZuHGDICL3/M7Mk3mRac/lgbNRbtfcBa5NV4fyQhY
184t656Zm/9gdWgGAvYQtApr6/iI+wRMLkIwuw63wqH09yfbDcpTOo6DEQE3B5KR
4H1qSIiVGVVZlWQateR6N32ZmY4dWzpnL2b8CfsdBytzHHFb/c3dPnZB8fxx9mwF
onyvjg9nkIiv7mdcN4Ox2WXrAExTeSftyPajN0WWawNJU3uPTBgNrqNHyWSkiaN4
dAvEepfGuFQGz2Fj03Pv7OqY8veyRezErVRLwiMJRNyy7pi6Wng=
=cKsD
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- A big cleanup and optimization patch-set for the Tegra GART driver
- Documentation updates and fixes for the IOMMU-API
- Support for page request in Intel VT-d scalable mode
- Intel VT-d dma_[un]map_resource() support
- Updates to the ATS enabling code for PCI (acked by Bjorn) and Intel
VT-d to align with the latest version of the ATS spec
- Relaxed IRQ source checking in the Intel VT-d driver for some aliased
devices, needed for future devices which send IRQ messages from more
than on request-ID
- IRQ remapping driver for Hyper-V
- Patches to make generic IOVA and IO-Page-Table code usable outside of
the IOMMU code
- Various other small fixes and cleanups
* tag 'iommu-updates-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
iommu/vt-d: Get domain ID before clear pasid entry
iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
iommu/vt-d: Set context field after value initialized
iommu/vt-d: Disable ATS support on untrusted devices
iommu/mediatek: Fix semicolon code style issue
MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope
iommu/hyper-v: Add Hyper-V stub IOMMU driver
x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available
PCI/ATS: Add inline to pci_prg_resp_pasid_required()
iommu/vt-d: Check identity map for hot-added devices
iommu: Fix IOMMU debugfs fallout
iommu: Document iommu_ops.is_attach_deferred()
iommu: Document iommu_ops.iotlb_sync_map()
iommu/vt-d: Enable ATS only if the device uses page aligned address.
PCI/ATS: Add pci_ats_page_aligned() interface
iommu/vt-d: Fix PRI/PASID dependency issue.
PCI/ATS: Add pci_prg_resp_pasid_required() interface.
iommu/vt-d: Allow interrupts from the entire bus for aliased devices
iommu/vt-d: Add helper to set an IRTE to verify only the bus number
iommu: Fix flush_tlb_all typo
...
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by the
riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception level
and to avoid the propagation of the UNKNOWN FAR value into the si_code
for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlyCl0cACgkQa9axLQDI
XvEyKxAAiogBZLbyhcy8bTUHVzVoJE0FyAkdO2wWnnaff2Ohkhy1Y/npv33IeK2q
RknxqDIx2DUUVPJNRZGoI/WwBtTZdKaAnW4rIKG84yC1eAkFcd96WQasaZzcp1qY
HmvbJiYXM0bh+0J7i3Wgry/QzOkrltJFJW2kp6Wd5aFE+R1WyWyxT6d+Fp0J3vlA
bT70jlpBK6LXEOmmBS+04Ml02+8MvaGxIl8EInBHSfDLRLErj5E8n41rRHKUiSWz
maWI+kVoLYwOE68xiZlDftUBEeQpUSWgg2nxeK+640QSl1wJmVcRcY9nm6TZeMG2
AiZTR9a7cP5rrdSN5suUmb7d4AMMVlVMisGDlwb+9oCxeTRDzg0uwACaVgHfPqQr
UeBdHbL9nStN7uBH23H8L9mKk+tqpFmk0sgzdrKejOwysAiqWV8aazb/Na3qnVRl
J1B5opxMnGOsjXmHvtG/tiZl281Uwz5ZmzfLmIY3gUZgUgdA3511Egp0ry5y1dzJ
SkYC4Hmzb2ybQvXGIDDa3OzCwXXiqyqKsO+O8Egg1k4OIwbp3w+NHE7gKeA+dMgD
gjN7zEalCUi46Q28xiCPEb+88BpQ18czIWGQLb9mAnmYeZPjqqenXKXuRHr4lgVe
jPURJ/vqvFEglZJN1RDuQHKzHEcm5f2XE566sMZYdSoeiUCb0QM=
=2U56
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Pseudo NMI support for arm64 using GICv3 interrupt priorities
- uaccess macros clean-up (unsafe user accessors also merged but
reverted, waiting for objtool support on arm64)
- ptrace regsets for Pointer Authentication (ARMv8.3) key management
- inX() ordering w.r.t. delay() on arm64 and riscv (acks in place by
the riscv maintainers)
- arm64/perf updates: PMU bindings converted to json-schema, unused
variable and misleading comment removed
- arm64/debug fixes to ensure checking of the triggering exception
level and to avoid the propagation of the UNKNOWN FAR value into the
si_code for debug signals
- Workaround for Fujitsu A64FX erratum 010001
- lib/raid6 ARM NEON optimisations
- NR_CPUS now defaults to 256 on arm64
- Minor clean-ups (documentation/comments, Kconfig warning, unused
asm-offsets, clang warnings)
- MAINTAINERS update for list information to the ARM64 ACPI entry
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
arm64: mmu: drop paging_init comments
arm64: debug: Ensure debug handlers check triggering exception level
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
Revert "arm64: uaccess: Implement unsafe accessors"
arm64: avoid clang warning about self-assignment
arm64: Kconfig.platforms: fix warning unmet direct dependencies
lib/raid6: arm: optimize away a mask operation in NEON recovery routine
lib/raid6: use vdupq_n_u8 to avoid endianness warnings
arm64: io: Hook up __io_par() for inX() ordering
riscv: io: Update __io_[p]ar() macros to take an argument
asm-generic/io: Pass result of I/O accessor to __io_[p]ar()
arm64: Add workaround for Fujitsu A64FX erratum 010001
arm64: Rename get_thread_info()
arm64: Remove documentation about TIF_USEDFPU
arm64: irqflags: Fix clang build warnings
arm64: Enable the support of pseudo-NMIs
arm64: Skip irqflags tracing for NMI in IRQs disabled context
arm64: Skip preemption when exiting an NMI
arm64: Handle serror in NMI context
irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
...
DM targets to properly advertise discard limits that blk_queue_split()
looks at when dtermining to split discard. Whereby allowing DM core's
own 'split_discard_bios' to be removed.
- Improve DM cache target to provide support for discard passdown to the
origin device.
- Introduce support to directly boot to a DM mapped device from init by
using dm-mod.create= module param. This eliminates the need for an
elaborate initramfs that is otherwise needed to create DM devices.
This feature's implementation has been worked on for quite some time
(got up to v12) and is of particular interest to Android and other
more embedded platforms (e.g. ARM).
- Rate limit errors from the DM integrity target that were identified as
the cause for recent NMI hangs due to console limitations.
- Add sanity checks for user input to thin-pool and external snapshot
creation.
- Remove some unused leftover kmem caches from when old .request_fn
request-based support was removed.
- Various small cleanups and fixes to targets (e.g. typos, needless
unlikely() annotations, use struct_size(), remove needless
.direct_access method from dm-snapshot)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJcgT+7AAoJEMUj8QotnQNaAsUIAIxsO5y6+7UruZzZxpyYBA34
yBLnZ9SICxESteu4R9lWT4LnFbrdwDDSSCeQ1dFt5/vx54T4qISN/O3lv9e//BeJ
BxFXtu7wB485l28uojBZeb+9APTaoihfEokcfDqZnaf26XtY0t/M+yRP7U86eGcC
zsX9fOEmJ3cpWtpai07tbHNDjIrr1kIWcFuU2+xGO/wn+Up8uLd85exi7e3cqDs6
VC+YJ/10/2keqFQvse3w3TBMjduwpb7SlDa2z/SorYaStVHzgwRSSjWYkSM/eDRA
OkSeRQ3Rnwc+Vad2R8J7unnZlMd4kALjGuzbyafWnitE+C+n0aJFDKqjIwNbKcw=
=GKp5
-----END PGP SIGNATURE-----
Merge tag 'for-5.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Update bio-based DM core to always call blk_queue_split() and update
DM targets to properly advertise discard limits that
blk_queue_split() looks at when dtermining to split discard. Whereby
allowing DM core's own 'split_discard_bios' to be removed.
- Improve DM cache target to provide support for discard passdown to
the origin device.
- Introduce support to directly boot to a DM mapped device from init by
using dm-mod.create= module param. This eliminates the need for an
elaborate initramfs that is otherwise needed to create DM devices.
This feature's implementation has been worked on for quite some time
(got up to v12) and is of particular interest to Android and other
more embedded platforms (e.g. ARM).
- Rate limit errors from the DM integrity target that were identified
as the cause for recent NMI hangs due to console limitations.
- Add sanity checks for user input to thin-pool and external snapshot
creation.
- Remove some unused leftover kmem caches from when old .request_fn
request-based support was removed.
- Various small cleanups and fixes to targets (e.g. typos, needless
unlikely() annotations, use struct_size(), remove needless
.direct_access method from dm-snapshot)
* tag 'for-5.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: limit the rate of error messages
dm snapshot: don't define direct_access if we don't support it
dm cache: add support for discard passdown to the origin device
dm writecache: fix typo in name for writeback_wq
dm: add support to directly boot to a mapped device
dm thin: add sanity checks to thin-pool and external snapshot creation
dm block manager: remove redundant unlikely annotation
dm verity fec: remove redundant unlikely annotation
dm integrity: remove redundant unlikely annotation
dm: always call blk_queue_split() in dm_process_bio()
dm: fix to_sector() for 32bit
dm switch: use struct_size() in kzalloc()
dm: remove unused _rq_tio_cache and _rq_cache
dm: eliminate 'split_discard_bios' flag from DM target interface
dm: update dm_process_bio() to split bio if in ->make_request_fn()
This has been a slightly more active cycle than normal with ongoing core
changes and quite a lot of collected driver updates.
- Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe
- A new data transfer mode for HFI1 giving higher performance
- Significant functional and bug fix update to the mlx5 On-Demand-Paging MR
feature
- A chip hang reset recovery system for hns
- Change mm->pinned_vm to an atomic64
- Update bnxt_re to support a new 57500 chip
- A sane netlink 'rdma link add' method for creating rxe devices and fixing
the various unregistration race conditions in rxe's unregister flow
- Allow lookup up objects by an ID over netlink
- Various reworking of the core to driver interface:
* Drivers should not assume umem SGLs are in PAGE_SIZE chunks
* ucontext is accessed via udata not other means
* Start to make the core code responsible for object memory
allocation
* Drivers should convert struct device to struct ib_device
via a helper
* Drivers have more tools to avoid use after unregister problems
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyAJYYACgkQOG33FX4g
mxrWwQ/+OyAx4Moru7Aix0C6GWxTJp/wKgw21CS3reZxgLai6x81xNYG/s2wCNjo
IccObVd7mvzyqPdxOeyHBsJBbQDqWvoD6O2duH8cqGMgBRgh3CSdUep2zLvPpSAx
2W1SvWYCLDnCuarboFrCA8c4AN3eCZiqD7z9lHyFQGjy3nTUWzk1uBaOP46uaiMv
w89N8EMdXJ/iY6ONzihvE05NEYbMA8fuvosKLLNdghRiHIjbMQU8SneY23pvyPDd
ZziPu9NcO3Hw9OVbkwtJp47U3KCBgvKHmnixyZKkikjiD+HVoABw2IMwcYwyBZwP
Bic/ddONJUvAxMHpKRnQaW7znAiHARk21nDG28UAI7FWXH/wMXgicMp6LRcNKqKF
vqXdxHTKJb0QUR4xrYI+eA8ihstss7UUpgSgByuANJ0X729xHiJtlEvPb1DPo1Dz
9CB4OHOVRl5O8sA5Jc6PSusZiKEpvWoyWbdmw0IiwDF5pe922VLl5Nv88ta+sJ38
v2Ll5AgYcluk7F3599Uh9D7gwp5hxW2Ph3bNYyg2j3HP4/dKsL9XvIJPXqEthgCr
3KQS9rOZfI/7URieT+H+Mlf+OWZhXsZilJG7No0fYgIVjgJ00h3SF1/299YIq6Qp
9W7ZXBfVSwLYA2AEVSvGFeZPUxgBwHrSZ62wya4uFeB1jyoodPk=
=p12E
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a slightly more active cycle than normal with ongoing
core changes and quite a lot of collected driver updates.
- Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe
- A new data transfer mode for HFI1 giving higher performance
- Significant functional and bug fix update to the mlx5
On-Demand-Paging MR feature
- A chip hang reset recovery system for hns
- Change mm->pinned_vm to an atomic64
- Update bnxt_re to support a new 57500 chip
- A sane netlink 'rdma link add' method for creating rxe devices and
fixing the various unregistration race conditions in rxe's
unregister flow
- Allow lookup up objects by an ID over netlink
- Various reworking of the core to driver interface:
- drivers should not assume umem SGLs are in PAGE_SIZE chunks
- ucontext is accessed via udata not other means
- start to make the core code responsible for object memory
allocation
- drivers should convert struct device to struct ib_device via a
helper
- drivers have more tools to avoid use after unregister problems"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
net/mlx5: ODP support for XRC transport is not enabled by default in FW
IB/hfi1: Close race condition on user context disable and close
RDMA/umem: Revert broken 'off by one' fix
RDMA/umem: minor bug fix in error handling path
RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
cxgb4: kfree mhp after the debug print
IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
IB/rdmavt: Fix loopback send with invalidate ordering
IB/iser: Fix dma_nents type definition
IB/mlx5: Set correct write permissions for implicit ODP MR
bnxt_re: Clean cq for kernel consumers only
RDMA/uverbs: Don't do double free of allocated PD
RDMA: Handle ucontext allocations by IB/core
RDMA/core: Fix a WARN() message
bnxt_re: fix the regression due to changes in alloc_pbl
IB/mlx4: Increase the timeout for CM cache
IB/core: Abort page fault handler silently during owning process exit
IB/mlx5: Validate correct PD before prefetch MR
IB/mlx5: Protect against prefetch of invalid MR
RDMA/uverbs: Store PR pointer before it is overwritten
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcgUDlAAoJEAhfPr2O5OEV2kIP/AiHMkMGi/fXmwzN0tFjYkim
39t6rodj6rT/oMib4XvW55GjQy5sdXwz+1jE+kZA5imbUvt6YzUXFBzIBOGOIF0n
1MukKa7M6ragnm2yR+42ucBr3jcuc91/keeVzWgP2cgeZeKUlBHme+rECYnwqDdT
9rcG4U2XL0Wolbm4lAispaWYIYoOURvPeryJ244vlPmch5/2nmXbG7AgNlfJsAw4
NFmdHBWxLeyB8F95ToikhuNlTWrsvdVHPHbDaDPwioSulZ1vw+lu4CHRd1uZo2iH
W0INE65ukgyenzTDbmnj5/oWCqV4KRTs8A2x6eimz+wG/60jWQjDiBLSzhxjBH7x
alrwhxnW3bD31ZUCkmaGd1+3txvLf+Lup9lLX3GCBKA45dW9pzVCLfxSfNaKKlTL
0xCYSMxl5xbl8TL6hHxK7/n+LsButgTRWIoJpqkM9uPrljwzznpgqJvARqSuHEKJ
3Tvnkc2DZsmlM8L02i929BsrsoTncm6wBBVlCJzhL0VNaOuL7yJVzXhrw7b/dZZw
IZu6cH5RrZhIQR4y1UPlaEZoidUGvR0+K997AsURIHJA0RolWE5eI2JHSE86EX8S
bzG5SChkQmbpYt5OXQvg5VxvqVElx/5/tamcHe/rKwaAwaG9aI9HICgP2e0Zaoce
YOMJUpcHtSY5Fedk8P1a
=tD1x
-----END PGP SIGNATURE-----
Merge tag 'media/v5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- remove sensor drivers that got converted from soc_camera
- remaining soc_camera drivers got moved to staging
- some documentation cleanups and improvements
- the imx staging driver now supports imx7
- the ov9640, mt9m001 and mt9m111 got converted from soc_camera
- the vim2m driver now does what a m2m convert driver expects to do
- epoll() fixes on media subsystems
- several drivers fixes, typos, cleanups and improvements
* tag 'media/v5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (346 commits)
media: dvb/earth-pt1: fix wrong initialization for demod blocks
media: vim2m: Address some coding style issues
media: vim2m: don't use BUG()
media: vim2m: speedup passthrough copy
media: vim2m: add an horizontal scaler
media: vim2m: don't accept YUYV anymore as output format
media: vim2m: add vertical linear scaler
media: vim2m: better handle cap/out buffers with different sizes
media: vim2m: use different framesizes for bayer formats
media: vim2m: add support for VIDIOC_ENUM_FRAMESIZES
media: vim2m: ensure that width is multiple of two
media: vim2m: improve debug messages
media: vim2m: add bayer capture formats
media: a few more typos at staging, pci, platform, radio and usb
media: Documentation: fix several typos
media: staging: fix several typos
media: include: fix several typos
media: common: fix several typos
media: v4l2-core: fix several typos
media: usb: fix several typos
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyAJvAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphb+EACFaKI2HIdjExQ5T7Cxebzwky+Qiro3FV55
ziW00FZrkJ5g0h4ItBzh/5SDlcNQYZDMlA3s4xzWIMadWl5PjMPq1uJul0cITbSl
WIJO5hpgNMXeUEhvcXUl6+f/WzpgYUxN40uW8N5V7EKlooaFVfudDqJGlvEv+UgB
g8NWQYThSG+/e7r9OGwK0xDRVKfpjxVvmqmnDH3DrxKaDgSOwTf4xn1u41wKwfQ3
3uPfQ+GBeTqt4a2AhOi7K6KQFNnj5Jz5CXYMiOZI2JGtLPcL6dmyBVD7K0a0HUr+
rs4ghNdd1+puvPGNK4TX8qV0uiNrMctoRNVA/JDd1ZTYEKTmNLxeFf+olfYHlwuK
K5FRs60/lgNzNkzcUpFvJHitPwYtxYJdB36PyswE1FZP1YviEeVoKNt9W8aIhEoA
549uj90brfA74eCINGhq98pJqj9CNyCPw3bfi76f5Ej2utwYDb9S5Cp2gfSa853X
qc/qNda9efEq7ikwCbPzhekRMXZo6TSXtaSmC2C+Vs5+mD1Scc4kdAvdCKGQrtr9
aoy0iQMYO2NDZ/G5fppvXtMVuEPAZWbsGftyOe15IlMysjRze2ycJV8cFahKEVM9
uBeXLyH1pqGU/j7ABP4+XRZ/sbHJTwjKJbnXhTgBsdU8XO/CR3U+kRQFTsidKMfH
Wlo3uH2h2A==
=p78E
-----END PGP SIGNATURE-----
Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block
Pull io_uring IO interface from Jens Axboe:
"Second attempt at adding the io_uring interface.
Since the first one, we've added basic unit testing of the three
system calls, that resides in liburing like the other unit tests that
we have so far. It'll take a while to get full coverage of it, but
we're working towards it. I've also added two basic test programs to
tools/io_uring. One uses the raw interface and has support for all the
various features that io_uring supports outside of standard IO, like
fixed files, fixed IO buffers, and polled IO. The other uses the
liburing API, and is a simplified version of cp(1).
This adds support for a new IO interface, io_uring.
io_uring allows an application to communicate with the kernel through
two rings, the submission queue (SQ) and completion queue (CQ) ring.
This allows for very efficient handling of IOs, see the v5 posting for
some basic numbers:
https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/
Outside of just efficiency, the interface is also flexible and
extendable, and allows for future use cases like the upcoming NVMe
key-value store API, networked IO, and so on. It also supports async
buffered IO, something that we've always failed to support in the
kernel.
Outside of basic IO features, it supports async polled IO as well.
This particular feature has already been tested at Facebook months ago
for flash storage boxes, with 25-33% improvements. It makes polled IO
actually useful for real world use cases, where even basic flash sees
a nice win in terms of efficiency, latency, and performance. These
boxes were IOPS bound before, now they are not.
This series adds three new system calls. One for setting up an
io_uring instance (io_uring_setup(2)), one for submitting/completing
IO (io_uring_enter(2)), and one for aux functions like registrating
file sets, buffers, etc (io_uring_register(2)). Through the help of
Arnd, I've coordinated the syscall numbers so merge on that front
should be painless.
Jon did a writeup of the interface a while back, which (except for
minor details that have been tweaked) is still accurate. Find that
here:
https://lwn.net/Articles/776703/
Huge thanks to Al Viro for helping getting the reference cycle code
correct, and to Jann Horn for his extensive reviews focused on both
security and bugs in general.
There's a userspace library that provides basic functionality for
applications that don't need or want to care about how to fiddle with
the rings directly. It has helpers to allow applications to easily set
up an io_uring instance, and submit/complete IO through it without
knowing about the intricacies of the rings. It also includes man pages
(thanks to Jeff Moyer), and will continue to grow support helper
functions and features as time progresses. Find it here:
git://git.kernel.dk/liburing
Fio has full support for the raw interface, both in the form of an IO
engine (io_uring), but also with a small test application (t/io_uring)
that can exercise and benchmark the interface"
* tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
io_uring: add a few test tools
io_uring: allow workqueue item to handle multiple buffered requests
io_uring: add support for IORING_OP_POLL
io_uring: add io_kiocb ref count
io_uring: add submission polling
io_uring: add file set registration
net: split out functions related to registering inflight socket files
io_uring: add support for pre-mapped user IO buffers
block: implement bio helper to add iter bvec pages to bio
io_uring: batch io_kiocb allocation
io_uring: use fget/fput_many() for file references
fs: add fget_many() and fput_many()
io_uring: support for IO polling
io_uring: add fsync support
Add io_uring IO interface
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
DRVVhYpocw==
=VBaU
-----END PGP SIGNATURE-----
Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:
- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)
- Series of bcache fixes (via Coly)
- Series of lightnvm fixes (via Mathias)
- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).
- BFQ series (Paolo)
- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)
- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)
- Partition scan loop fixes (Dongli)
- mtip32xx conversion from managed resource API (Christoph)
- cdrom registration race fix (Guenter)
- MD pull from Song, two minor fixes.
- Various documentation fixes (Marcos)
- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)
- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcfzdFAAoJEAx081l5xIa+QxsP/A6QP+gx4vQ8XXikaJMNz89e
59TAbXHW/2qFMHRtUesuB2bc1a2cw2ppFsrryG7c4HqjKDDHna7Yx2JzZYL0MmNh
SpJYL4yMuu/2TmyCouaAYzzP+5Supdosfif4LRn3269DH0i5MWXL+NVrbeB47blG
XwjQTu46yfn06IFAo5bI2jMqSuPCDd4Hzpyixpvmjt+r16XwoH5nGUrDCHG8t/jV
+PUZCYAjn71in6Z66MKZv/EVCVFfTnaVJ2KEgw7e+vWxnERkRh/xnRO6KIXMD5O1
vo2qc2vbxkGpjaE6pDzC/2e5pRJT8Ks0t50jYjbVF+6nHpP5XIPvAXH4R2QdTA7B
Jiu8N0oz6wj0H3AJ/V38rEHWW8zgOfXkhbRBfmfQ9NfgiEfwxqCVgspIOwei4oVw
hvMXYUBM1CU+JIfW6w7ZT4oHALUlnCpnr5DQRdCNRm8zjClyNfIAoJIJrOtqmX44
qjEzSgxb89ZtS7c0yislSBaovgAmcM3I+aq5I4xokdY0hFEZ6QomuKunyuQ8pBYa
3gsvMEReLxETffhhYpjBt5+b5IgB49nf3Y38CKFurv32Sp0p0YgK0qVo8qRQHclj
QIJ+3+zQMCX20swYpCWXhOPUIwtQppdKhWzg12my8rL2VgTlYhjlEbL4EL+Wk+hv
6Ipulthzn0RyrSK9Dojh
=GlRQ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2019-03-06' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is the main drm pull request for the 5.1 merge window.
The big changes I'd highlight are:
- nouveau has HMM support now, there is finally an in-tree user so we
can quieten down the rip it out people.
- i915 now enables fastboot by default on Skylake+
- Displayport Multistream support has been refactored and should
hopefully be more reliable.
Core:
- header cleanups aiming towards removing drmP.h
- dma-buf fence seqnos to 64-bits
- common helper for DP mst hotplug for radeon,i915,amdgpu + new
refcounting scheme
- MST i2c improvements
- drm_syncobj_cb removal
- ARM FB compression fourcc
- P010 + P016 fourcc
- allwinner tiled format modifier
- i2c over aux I2C_M_STOP support
- DRM_AUTH handling fixes
TTM:
- ref/unref renaming
New driver:
- ARM komeda display driver
scheduler:
- refactor mirror list handling
- rework hw fence processing
- 0 run queue entity fix
bridge:
- TI DS90C185 LVDS bridge
- thc631lvdm83d bridge improvements
- cadence + allwinner DSI ported to generic phy
panels:
- Sitronix ST7701 panel
- Kingdisplay KD097D04
- LeMaker BL035-RGB-002
- PDA 91-00156-A0
- Innolux EE101IA-01D
i915:
- Enable fastboot by default on SKL+/VLV/CHV
- Export RPCS configuration for ICL media driver
- Coffelake PCI ID
- CNL clocks setup fixes
- ACPI/PMIC support for MIPI/DSI
- Per-engine WA init for all engines
- Shrinker locking fixes
- Kerneldoc updates
- Lots of ring improvements and reset fixes
- Coffeelake GVT Support
- VFIO GVT EDID Region support
- runtime PM wakeref tracking
- ILK->IVB primary plane enable delays
- userptr mutex locking fixes
- DSI fixes
- LVDS/TV cleanups
- HW readout fixes
- LUT robustness fixes
- ICL display and watermark fixes
- gem mmap race fix
amdgpu:
- add scheduled dependencies interface
- DCC on scanout surfaces
- vega10/20 BACO support
- Multiple IH rings on soc15
- XGMI locking fixes
- DC i2c/aux cleanups
- runtime SMU debug interface
- Kexec improvmeents
- SR-IOV fixes
- DC freesync + ABM fixes
- GDS fixes
- GPUVM fixes
- vega20 PCIE DPM switching fixes
- Context priority handling fixes
radeon:
- fix missing break in evergreen parser
nouveau:
- SVM support via HMM
msm:
- QCOM Compressed modifier support
exynos:
- s5pv210 rotator support
imx:
- zpos property support
- pending update fixes
v3d:
- cache flush improvments
vc4:
- reflection support
- HDMI overscan support
tegra:
- CEC refactoring
- HDMI audio fixes
- Tegra186 prep work
- SOR crossbar device tree fixes
sun4i:
- implicit fencing support
- YUV and scalar support improvements
- A23 support
- tiling fixes
atmel-hlcdc:
- clipping and rotation property fixes
qxl:
- BO and PRIME improvements
- generic fbdev emulation
dw-hdmi:
- HDMI 2.0 2160p
- YUV420 ouput
rockchip:
- implicit fencing support
- reflection proerties
virtio-gpu:
- use generic fbdev emulation
tilcdc:
- cpufreq vs crtc init fix
rcar-du:
- R8A774C0 support
- D3/E3 RGB output routing fixes and DPAD0 support
- RA87744 LVDS support
bochs:
- atomic and generic fbdev emulation
- ID mismatch error on bochs load
meson:
- remove firmware fbs"
* tag 'drm-next-2019-03-06' of git://anongit.freedesktop.org/drm/drm: (1130 commits)
drm/amd/display: Use vrr friendly pageflip throttling in DC.
drm/imx: only send commit done event when all state has been applied
drm/imx: allow building under COMPILE_TEST
drm/imx: imx-tve: depend on COMMON_CLK
drm/imx: ipuv3-plane: add zpos property
drm/imx: ipuv3-plane: add function to query atomic update status
gpu: ipu-v3: prg: add function to get channel configure status
gpu: ipu-v3: pre: add double buffer status readback
drm/amdgpu: Bump amdgpu version for context priority override.
drm/amdgpu/powerplay: fix typo in BACO header guards
drm/amdgpu/powerplay: fix return codes in BACO code
drm/amdgpu: add missing license on baco files
drm/bochs: Fix the ID mismatch error
drm/nouveau/dmem: use dma addresses during migration copies
drm/nouveau/dmem: use physical vram addresses during migration copies
drm/nouveau/dmem: extend copy function to allow direct use of physical addresses
drm/nouveau/svm: new ioctl to migrate process memory to GPU memory
drm/nouveau/dmem: device memory helpers for SVM
drm/nouveau/svm: initial support for shared virtual memory
drm/nouveau: prepare for enabling svm with existing userspace interfaces
...
Merge more updates from Andrew Morton:
- some of the rest of MM
- various misc things
- dynamic-debug updates
- checkpatch
- some epoll speedups
- autofs
- rapidio
- lib/, lib/lzo/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
samples/mic/mpssd/mpssd.h: remove duplicate header
kernel/fork.c: remove duplicated include
include/linux/relay.h: fix percpu annotation in struct rchan
arch/nios2/mm/fault.c: remove duplicate include
unicore32: stop printing the virtual memory layout
MAINTAINERS: fix GTA02 entry and mark as orphan
mm: create the new vm_fault_t type
arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
arch: simplify several early memory allocations
openrisc: simplify pte_alloc_one_kernel()
sh: prefer memblock APIs returning virtual address
microblaze: prefer memblock API returning virtual address
powerpc: prefer memblock APIs returning virtual address
lib/lzo: separate lzo-rle from lzo
lib/lzo: implement run-length encoding
lib/lzo: fast 8-byte copy on arm64
lib/lzo: 64-bit CTZ on arm64
lib/lzo: tidy-up ifdefs
ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
ipc: annotate implicit fall through
...
Large enterprise clients often run applications out of networked file
systems where the IT mandated layout of project volumes can end up
leading to paths that are longer than 128 characters. Bumping this up
to the next order of two solves this problem in all but the most
egregious case while still fitting into a 512b slab.
[oleg@redhat.com: update comment, per Kees]
Link: http://lkml.kernel.org/r/20181112160956.GA28472@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ben Woodard <woodard@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>