linux-uconsole

Author	SHA1	Message	Date
Greg Kroah-Hartman	844ecc4634	This is the 4.19.64 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4 yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5 /n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy +x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ= =qIi2 -----END PGP SIGNATURE----- Merge 4.19.64 into android-4.19 Changes in 4.19.64 hv_sock: Add support for delayed close vsock: correct removal of socket from the list NFS: Fix dentry revalidation on NFSv4 lookup NFS: Refactor nfs_lookup_revalidate() NFSv4: Fix lookup revalidate of regular files usb: dwc2: Disable all EP's on disconnect usb: dwc2: Fix disable all EP's on disconnect arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ binder: fix possible UAF when freeing buffer ISDN: hfcsusb: checking idx of ep configuration media: au0828: fix null dereference in error path ath10k: Change the warning message string media: cpia2_usb: first wake up, then free in disconnect media: pvrusb2: use a different format for warnings NFS: Cleanup if nfs_match_client is interrupted media: radio-raremono: change devm_kalloc to kalloc iommu/vt-d: Don't queue_iova() if there is no flush queue iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA Bluetooth: hci_uart: check for missing tty operations vhost: introduce vhost_exceeds_weight() vhost_net: fix possible infinite loop vhost: vsock: add weight support vhost: scsi: add weight support sched/fair: Don't free p->numa_faults with concurrent readers sched/fair: Use RCU accessors consistently for ->numa_group /proc/<pid>/cmdline: remove all the special cases /proc/<pid>/cmdline: add back the setproctitle() special case drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Fix allyesconfig output. ceph: hold i_ceph_lock when removing caps for freeing inode block, scsi: Change the preempt-only flag into a counter scsi: core: Avoid that a kernel warning appears during system resume ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL Linux 4.19.64 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d	2019-08-04 09:37:11 +02:00
Jann Horn	48046e092a	sched/fair: Don't free p->numa_faults with concurrent readers commit `16d51a590a` upstream. When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: `82727018b0` ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-04 09:30:56 +02:00
Greg Kroah-Hartman	237d383597	This is the 4.19.54 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0Nx3oACgkQONu9yGCS aT7hJRAAzdu1EKr/VUiFJshvPL/+veH1MLdMgafW6gXcoQCQSdMuv1j3mv3axElC dz2e/nHXvIhMKMrhzgEvVwV8m8eKPwM4IZjTJ8ji8st9uy0R2UIdbPUHrLtRAQzI bK2BIk6697B9/9z2s8nDXhKex9EtZ2vj8DeQLT1AgQKMM0+abPA2YR9+cp9HEjQR Wa5NjajwWgpty1VVk+Q6GMD+GBnEILyqxtVGv30z/prWoVadFAXJEZrWsaXotUvh ySc2CS9Iu/muiIegTSStnNXaWaawZblA4JdDcdRJ235rPK8sbACYtDgZispYopi4 y/9kUtuZzWEv61aFZkXd13jKZ8pJsrLZLoc+yDqew0IA6M2Hz7d7Pq/UWQ0m4qnt rCcU1vTFooSp/IHOToXN0oQQrTqD2oK5zO4V9S5McwQniuO/RqUPJfX/sKjtrvNT SI/yLVKMXImAwVXxNEfO4bE+T9F/QYptOHdpfqJkPvpKLzjxnPBPswG8poIKwQzJ w3p1AoQ1ISuW6b94a/nI5nSC28gVslVg+oTjRjF7kfIcOtvglX79XPvfdM5bB2DC qD51A8veEGCtAu57/tSyisGOomqTqqqbaiYXwuhgTI1NSszbqKDLNvjsw1GKj0rK mSmDzVMgQhxaHOagOeDqLYlj1zgTamx5R6+dretf8AOXyEE2RSg= =+MJy -----END PGP SIGNATURE----- Merge 4.19.54 into android-4.19 Changes in 4.19.54 ax25: fix inconsistent lock state in ax25_destroy_timer be2net: Fix number of Rx queues used for flow hashing hv_netvsc: Set probe mode to sync ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero lapb: fixed leak of control-blocks. neigh: fix use-after-free read in pneigh_get_next net: dsa: rtl8366: Fix up VLAN filtering net: openvswitch: do not free vport if register_netdevice() is failed. nfc: Ensure presence of required attributes in the deactivate_target handler sctp: Free cookie before we memdup a new one sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg tipc: purge deferredq list for each grp member in tipc_group_delete vsock/virtio: set SOCK_DONE on peer shutdown net/mlx5: Avoid reloading already removed devices net: mvpp2: prs: Fix parser range for VID filtering net: mvpp2: prs: Use the correct helpers when removing all VID filters Staging: vc04_services: Fix a couple error codes perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints netfilter: nf_queue: fix reinject verdict handling ipvs: Fix use-after-free in ip_vs_in selftests: netfilter: missing error check when setting up veth interface clk: ti: clkctrl: Fix clkdm_clk handling powerpc/powernv: Return for invalid IMC domain usb: xhci: Fix a potential null pointer dereference in xhci_debugfs_create_endpoint() mISDN: make sure device name is NUL terminated x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/ring_buffer: Add ordering to rb->nest increment perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data gpio: fix gpio-adp5588 build errors net: stmmac: update rx tail pointer register to fix rx dma hang issue. net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE() ACPI/PCI: PM: Add missing wakeup.flags.valid checks drm/etnaviv: lock MMU while dumping core net: aquantia: tx clean budget logic error net: aquantia: fix LRO with FCS error i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr ALSA: hda - Force polling mode on CNL for fixing codec communication configfs: Fix use-after-free when accessing sd->s_dentry perf data: Fix 'strncat may truncate' build failure with recent gcc perf namespace: Protect reading thread's namespace perf record: Fix s390 missing module symbol and warning for non-root users ia64: fix build errors by exporting paddr_to_nid() xen/pvcalls: Remove set but not used variable xenbus: Avoid deadlock during suspend due to open transactions KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu arm64: fix syscall_fn_t type arm64: use the correct function type in SYSCALL_DEFINE0 arm64: use the correct function type for __arm64_sys_ni_syscall net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs net: phylink: ensure consistent phy interface mode net: phy: dp83867: Set up RGMII TX delay scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route() scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask scsi: scsi_dh_alua: Fix possible null-ptr-deref scsi: libsas: delete sas port if expander discover failed mlxsw: spectrum: Prevent force of 56G ocfs2: fix error path kobject memory leak coredump: fix race condition between collapse_huge_page() and core dumping Abort file_remove_privs() for non-reg. files Linux 4.19.54 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-06-22 08:47:30 +02:00
Andrea Arcangeli	465ce9a50f	coredump: fix race condition between collapse_huge_page() and core dumping commit `59ea6d06cf` upstream. When fixing the race conditions between the coredump and the mmap_sem holders outside the context of the process, we focused on mmget_not_zero()/get_task_mm() callers in `04f5866e41` ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping"), but those aren't the only cases where the mmap_sem can be taken outside of the context of the process as Michal Hocko noticed while backporting that commit to older -stable kernels. If mmgrab() is called in the context of the process, but then the mm_count reference is transferred outside the context of the process, that can also be a problem if the mmap_sem has to be taken for writing through that mm_count reference. khugepaged registration calls mmgrab() in the context of the process, but the mmap_sem for writing is taken later in the context of the khugepaged kernel thread. collapse_huge_page() after taking the mmap_sem for writing doesn't modify any vma, so it's not obvious that it could cause a problem to the coredump, but it happens to modify the pmd in a way that breaks an invariant that pmd_trans_huge_lock() relies upon. collapse_huge_page() needs the mmap_sem for writing just to block concurrent page faults that call pmd_trans_huge_lock(). Specifically the invariant that "!pmd_trans_huge()" cannot become a "pmd_trans_huge()" doesn't hold while collapse_huge_page() runs. The coredump will call __get_user_pages() without mmap_sem for reading, which eventually can invoke a lockless page fault which will need a functional pmd_trans_huge_lock(). So collapse_huge_page() needs to use mmget_still_valid() to check it's not running concurrently with the coredump... as long as the coredump can invoke page faults without holding the mmap_sem for reading. This has "Fixes: khugepaged" to facilitate backporting, but in my view it's more a bug in the coredump code that will eventually have to be rewritten to stop invoking page faults without the mmap_sem for reading. So the long term plan is still to drop all mmget_still_valid(). Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com Fixes: `ba76149f47` ("thp: khugepaged") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-22 08:15:21 +02:00
Greg Kroah-Hartman	dda1dd3ba3	This is the 4.19.39 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzNPTYACgkQONu9yGCS aT5+LA//TEZMvKwfq8ASsp17ncB9jNrV9G5gwwTv4Aa+Zv16LN5pLhlTwqJE1JCc wJzisFxCGNil0LTZYlpTwGnPvjNB4MPqyMEU5B7GkLsbRCjCJCZUpCEhrsEhR6D+ finLDnSAHh8+c/kvETIazvzTw9BP8YUDe7j933L54pXONfAHUvYDkz57ri5oEyIE 9g+Haot6gytgnPH1iU3A+uSjMll2naBHT1ga7F6fqU6EuhIogtEepG1Y6TWT6zfQ IDO4HZwMhdi75ITkv06lh/1zdQaMaGWE+R7svKfQ5UoH89eIUvRjKl8Pkdd5JPha 1LOz5HBBTxCZdzl70UF8n0ZzlNYP71rQ1CT4bo+U4kPNWNvKZb2MxJEOxB7/JqJo qNrv1lBE1ERBM5s8NlTSx1P+RSs3G6kLhDIgz+uB8ALl87zDkwUA7ynP13NFsP2c QY7E0eianWSYJxrqhkgWXQ/ToYRLsCBARvAHMZgDveVUNe83SyK1NFhZD77weDSC WtirUFzKpw0ZtVjs8Ox6zh4HRtAFXsyPvnlPbtrMdaU0GtEox2Skg7SKboEZfyUa 8nu98ZdEdzqAKKUOs2o/3vVFnq4NKmw1khpo1OEW5Ob76KdDHYGMiiGOYH1vEGY1 OxqNyNgw8/8OT6drCrAgx7+4X+BTXdiSXORUyp9AP7WpELwMy4Q= =35Eg -----END PGP SIGNATURE----- Merge 4.19.39 into android-4.19 Changes in 4.19.39 selinux: use kernel linux/socket.h for genheaders and mdp Revert "ACPICA: Clear status of GPEs before enabling them" mm: make page ref count overflow check tighter and more explicit mm: add 'try_get_page()' helper function mm: prevent get_user_pages() from overflowing page refcount fs: prevent page refcount overflow in pipe_buf_get ARM: dts: bcm283x: Fix hdmi hpd gpio pull s390: limit brk randomization to 32MB net: ieee802154: fix a potential NULL pointer dereference ieee802154: hwsim: propagate genlmsg_reply return code net: stmmac: don't set own bit too early for jumbo frames qlcnic: Avoid potential NULL pointer dereference xsk: fix umem memory leak on cleanup staging: axis-fifo: add CONFIG_OF dependency staging, mt7621-pci: fix build without pci support netfilter: nft_set_rbtree: check for inactive element after flag mismatch netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING netfilter: fix NETFILTER_XT_TARGET_TEE dependencies netfilter: ip6t_srh: fix NULL pointer dereferences s390/qeth: fix race when initializing the IP address table ARM: imx51: fix a leaked reference by adding missing of_node_put sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init() serial: ar933x_uart: Fix build failure with disabled console KVM: arm64: Reset the PMU in preemptible context KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots usb: dwc3: pci: add support for Comet Lake PCH ID usb: gadget: net2280: Fix overrun of OUT messages usb: gadget: net2280: Fix net2280_dequeue() usb: gadget: net2272: Fix net2272_dequeue() ARM: dts: pfla02: increase phy reset duration i2c: i801: Add support for Intel Comet Lake net: ks8851: Dequeue RX packets explicitly net: ks8851: Reassert reset pin if chip ID check fails net: ks8851: Delay requesting IRQ until opened net: ks8851: Set initial carrier state to down staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference staging: rtl8712: uninitialized memory in read_bbreg_hdl() staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc net: macb: Add null check for PCLK and HCLK net/sched: don't dereference a->goto_chain to read the chain index ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi drm/tegra: hub: Fix dereference before check NFS: Fix a typo in nfs_init_timeout_values() net: xilinx: fix possible object reference leak net: ibm: fix possible object reference leak net: ethernet: ti: fix possible object reference leak drm: Fix drm_release() and device unplug gpio: aspeed: fix a potential NULL pointer dereference drm/meson: Fix invalid pointer in meson_drv_unbind() drm/meson: Uninstall IRQ handler ARM: davinci: fix build failure with allnoconfig scsi: mpt3sas: Fix kernel panic during expander reset scsi: aacraid: Insure we don't access PCIe space during AER/EEH scsi: qla4xxx: fix a potential NULL pointer dereference usb: usb251xb: fix to avoid potential NULL pointer dereference leds: trigger: netdev: fix refcnt leak on interface rename x86/realmode: Don't leak the trampoline kernel address usb: u132-hcd: fix resource leak ceph: fix use-after-free on symlink traversal scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN x86/mm: Don't exceed the valid physical address space libata: fix using DMA buffers on stack gpio: of: Fix of_gpiochip_add() error path nvme-multipath: relax ANA state check perf machine: Update kernel map address and re-order properly kconfig/[mn]conf: handle backspace (^H) key iommu/amd: Reserve exclusion range in iova-domain ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK leds: pca9532: fix a potential NULL pointer dereference leds: trigger: netdev: use memcpy in device_name_store Linux 4.19.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-05-04 09:28:59 +02:00
Andrei Vagin	13a6a6dd3c	ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK [ Upstream commit `fcfc2aa018` ] There are a few system calls (pselect, ppoll, etc) which replace a task sigmask while they are running in a kernel-space When a task calls one of these syscalls, the kernel saves a current sigmask in task->saved_sigmask and sets a syscall sigmask. On syscall-exit-stop, ptrace traps a task before restoring the saved_sigmask, so PTRACE_GETSIGMASK returns the syscall sigmask and PTRACE_SETSIGMASK does nothing, because its sigmask is replaced by saved_sigmask, when the task returns to user-space. This patch fixes this problem. PTRACE_GETSIGMASK returns saved_sigmask if it's set. PTRACE_SETSIGMASK drops the TIF_RESTORE_SIGMASK flag. Link: http://lkml.kernel.org/r/20181120060616.6043-1-avagin@gmail.com Fixes: `29000caecb` ("ptrace: add ability to get/set signal-blocked mask") Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>	2019-05-04 09:20:22 +02:00
Greg Kroah-Hartman	9bf5904866	This is the 4.19.37 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzEBokACgkQONu9yGCS aT7G7w/8C93URGM67H7ynkCHTo8y3hkRE2rUJPckJNdS+IJKuecmOphak4tF0h07 qPWDPya70Q1S0cNu661TuVAGrhmE5jBx8/xfZaAOeaaU0xtZive+TfSHdAQQaHct tDk32O85N1aZ49rDEz9ibr7CGLVFDZtyhxV5gFMYQpjbqA7MzJC61zQg1jHyPSCz sKjQzW+uXMuSLru8jXHMvp41K5sFFp5gYdQbAVKlWtt79qPxWdxZPJbLbM0LBbtz XHt9E45Ink3ALF9P6tZ4e6gi4zzlNbh9yR92+X5NK5/8AP57yWba4W9JHWIfMBpC yyDYTOEAzdxqa2Jrgwr4WTdKH6U7FbQZFmWfTBB4VotbHLBWkVXj0OnF10qxP9eQ p5wGDTJAlWezhX1BTCfYroglDsvqhj+gHfwHzDRF1Del1dRgydRMQc0qLD1d9tul ovzwOkx1xyJrM2wq05I5gc0FoVyOL6/KCwqMrpVfKa3WKY7Uttjgf56bMqdIIkns i/6opzF+wtvwlLlCoXgYPXdm6kbWdgvS+skVHfWcHmZFMuGrFGGzJNwzXb7qnVjK T0hD1OestsfTyD/amnDNYkNeCkoOZqtHAi+xYOQR4kGY5cxP1lQJf85MgAy6RZSY h+rjys76Qf6+hTCtrowLr8SgksX4ACWxm+UarfAiiNnnDXwGfu8= =SrFV -----END PGP SIGNATURE----- Merge 4.19.37 into android-4.19 Changes in 4.19.37 bonding: fix event handling for stacked bonds failover: allow name change on IFF_UP slave interfaces net: atm: Fix potential Spectre v1 vulnerabilities net: bridge: fix per-port af_packet sockets net: bridge: multicast: use rcu to access port list from br_multicast_start_querier net: Fix missing meta data in skb with vlan packet net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv tcp: tcp_grow_window() needs to respect tcp_space() team: set slave to promisc if team is already in promisc mode tipc: missing entries in name table of publications vhost: reject zero size iova range ipv4: recompile ip options in ipv4_link_failure ipv4: ensure rcu_read_lock() in ipv4_link_failure() net: thunderx: raise XDP MTU to 1508 net: thunderx: don't allow jumbo frames with XDP net/mlx5: FPGA, tls, hold rcu read lock a bit longer net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded() net/mlx5: FPGA, tls, idr remove on flow delete route: Avoid crash from dereferencing NULL rt->from sch_cake: Use tc_skb_protocol() helper for getting packet protocol sch_cake: Make sure we can write the IP header before changing DSCP bits nfp: flower: replace CFI with vlan present nfp: flower: remove vlan CFI bit from push vlan action sch_cake: Simplify logic in cake_select_tin() net: IP defrag: encapsulate rbtree defrag code into callable functions net: IP6 defrag: use rbtrees for IPv6 defrag net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c CIFS: keep FileInfo handle live during oplock break cifs: Fix use-after-free in SMB2_write cifs: Fix use-after-free in SMB2_read cifs: fix handle leak in smb2_query_symlink() KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU KVM: x86: svm: make sure NMI is injected after nmi_singlestep Staging: iio: meter: fixed typo staging: iio: ad7192: Fix ad7193 channel address iio: gyro: mpu3050: fix chip ID reading iio/gyro/bmg160: Use millidegrees for temperature scale iio:chemical:bme680: Fix, report temperature in millidegrees iio:chemical:bme680: Fix SPI read interface iio: cros_ec: Fix the maths for gyro scale calculation iio: ad_sigma_delta: select channel when reading register iio: dac: mcp4725: add missing powerdown bits in store eeprom iio: Fix scan mask selection iio: adc: at91: disable adc channel interrupt in timeout case iio: core: fix a possible circular locking dependency io: accel: kxcjk1013: restore the range after resume. staging: most: core: use device description as name staging: comedi: vmk80xx: Fix use of uninitialized semaphore staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf staging: comedi: ni_usb6501: Fix use of uninitialized mutex staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf ALSA: hda/realtek - add two more pin configuration sets to quirk table ALSA: core: Fix card races between register and disconnect Input: elan_i2c - add hardware ID for multiple Lenovo laptops serial: sh-sci: Fix HSCIF RX sampling point adjustment serial: sh-sci: Fix HSCIF RX sampling point calculation vt: fix cursor when clearing the screen scsi: core: set result when the command cannot be dispatched Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO" Revert "svm: Fix AVIC incomplete IPI emulation" coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier crypto: x86/poly1305 - fix overflow during partial reduction drm/ttm: fix out-of-bounds read in ttm_put_pages() v2 arm64: futex: Restore oldval initialization to work around buggy compilers x86/kprobes: Verify stack frame on kretprobe kprobes: Mark ftrace mcount handler functions nokprobe kprobes: Fix error check when reusing optimized probes rt2x00: do not increment sequence number while re-transmitting mac80211: do not call driver wake_tx_queue op during reconfig drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming perf/x86/amd: Add event map for AMD Family 17h x86/cpu/bugs: Use __initconst for 'const' init data perf/x86: Fix incorrect PEBS_REGS x86/speculation: Prevent deadlock on ssb_state::lock timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() nfit/ars: Remove ars_start_flags nfit/ars: Introduce scrub_flags nfit/ars: Allow root to busy-poll the ARS state machine nfit/ars: Avoid stale ARS results mmc: sdhci: Fix data command CRC error handling mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR mmc: sdhci: Handle auto-command errors modpost: file2alias: go back to simple devtable lookup modpost: file2alias: check prototype of handler tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete tpm: Fix the type of the return value in calc_tpm2_event_size() Revert "kbuild: use -Oz instead of -Os when using clang" sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup device_cgroup: fix RCU imbalance in error case mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n ALSA: info: Fix racy addition/deletion of nodes percpu: stop printing kernel addresses tools include: Adopt linux/bits.h ASoC: rockchip: add missing INTERLEAVED PCM attribute i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()" kernel/sysctl.c: fix out-of-bounds access when setting file-max Linux 4.19.37 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-04-30 12:53:00 +02:00
Andrea Arcangeli	6ff17bc593	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping commit `04f5866e41` upstream. The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: `86039bd3b4`" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: `86039bd3b4` ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-04-27 09:36:37 +02:00
Greg Kroah-Hartman	d885da678e	This is the 4.19.34 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlynu40ACgkQONu9yGCS aT5X6g//Wkfm/+qSZ0GhLDQkPniiH1QkvzhOmVrrxu+KB0qsiwsEl8Srw33ZVkJK LT8+IPGiG9jEGu9dj+BYXTIfy9ZvfSsEL2N6GhYwDSXP0fok2rUaHbZvv1IB2g4W afhGdNwNAUCJ/j1UrUsi+SAFJ+xWbVxFpGstd0cqM9IbKdEV7RIukvuKckHiKOKR qI8FxC+G2PAr+BtnETfk5/suPDJ7B3ZicDoMhiWJGxJ6dfFTVmkSmasSoPDaMiHm 4S3hN2lu+WTeRpRPPB17Dlk4MmIp0k+bGYBKAlaxAMCc/RZxvbT2pRYaMQbId2/L mNUfSnOQFGEAhlAPfb7wdbObphnyT34GhlkWfZBTrnhPO0/FomLOvU6xVdcNuakX Tv2JKfDzb+2ttcMZ+0T84Ru9RztoswFATSw8uFMVxW8oTS6MVWnHu96Kxfl7QO3J PdlIGcyqxSuWNE8OX1QVtdSruGZfwUDNs94S4nQJtkB8BViRwhGJlqaXuy4d9Wp6 fGlI2W6qhjyosi2wBSMTjh/ytk/jq0vfs+z2XjR2gAYssvB/SOLR/AlSVguWsDnf WaoFBkXvCbuPvPlo0TrLpl5RW5WlOtLXHE3Vr3dKp458wLwpf/OZBGoZiknp7DrF PzBZs2ie5tmyqTxbAygl7WkbQPJ682pd5R4nf5CY+zvUaOMZv1g= =Iuup -----END PGP SIGNATURE----- Merge 4.19.34 into android-4.19 Changes in 4.19.34 arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals ext4: cleanup bh release code in ext4_ind_remove_space() tty/serial: atmel: Add is_half_duplex helper tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped CIFS: fix POSIX lock leak and invalid ptr deref h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- f2fs: fix to adapt small inline xattr space in __find_inline_xattr() f2fs: fix to avoid deadlock in f2fs_read_inline_dir() tracing: kdb: Fix ftdump to not sleep net/mlx5: Avoid panic when setting vport rate net/mlx5: Avoid panic when setting vport mac, getting vport config gpio: gpio-omap: fix level interrupt idling include/linux/relay.h: fix percpu annotation in struct rchan sysctl: handle overflow for file-max net: stmmac: Avoid sometimes uninitialized Clang warnings enic: fix build warning without CONFIG_CPUMASK_OFFSTACK libbpf: force fixdep compilation at the start of the build scsi: hisi_sas: Set PHY linkrate when disconnected scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver x86/hyperv: Fix kernel panic when kexec on HyperV perf c2c: Fix c2c report for empty numa node mm/sparse: fix a bad comparison mm/cma.c: cma_declare_contiguous: correct err handling mm/page_ext.c: fix an imbalance with kmemleak mm, swap: bounds check swap_info array accesses to avoid NULL derefs mm,oom: don't kill global init via memory.oom.group memcg: killed threads should not invoke memcg OOM killer mm, mempolicy: fix uninit memory access mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! mm/slab.c: kmemleak no scan alien caches ocfs2: fix a panic problem caused by o2cb_ctl f2fs: do not use mutex lock in atomic context fs/file.c: initialize init_files.resize_wait page_poison: play nicely with KASAN cifs: use correct format characters dm thin: add sanity checks to thin-pool and external snapshot creation f2fs: fix to check inline_xattr_size boundary correctly cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED cifs: Fix NULL pointer dereference of devname netfilter: nf_tables: check the result of dereferencing base_chain->stats netfilter: conntrack: tcp: only close if RST matches exact sequence jbd2: fix invalid descriptor block checksum fs: fix guard_bio_eod to check for real EOD errors tools lib traceevent: Fix buffer overflow in arg_eval PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() wil6210: check null pointer in _wil_cfg80211_merge_extra_ies mt76: fix a leaked reference by adding a missing of_node_put crypto: crypto4xx - add missing of_node_put after of_device_is_available crypto: cavium/zip - fix collision with generic cra_driver_name usb: chipidea: Grab the (legacy) USB PHY by phandle first powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc coresight: etm4x: Add support to enable ETMv4.2 serial: 8250_pxa: honor the port number from devicetree ARM: 8840/1: use a raw_spinlock_t in unwind iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback btrfs: qgroup: Make qgroup async transaction commit more aggressive mmc: omap: fix the maximum timeout setting net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat e1000e: Fix -Wformat-truncation warnings mlxsw: spectrum: Avoid -Wformat-truncation warnings platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN platform/mellanox: mlxreg-hotplug: Fix KASAN warning loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() IB/mlx4: Increase the timeout for CM cache clk: fractional-divider: check parent rate only if flag is set perf annotate: Fix getting source line failure ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of() cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies efi: cper: Fix possible out-of-bounds access s390/ism: ignore some errors during deregistration scsi: megaraid_sas: return error when create DMA pool failed scsi: fcoe: make use of fip_mode enum complete drm/amd/display: Clear stream->mode_changed after commit perf test: Fix failure of 'evsel-tp-sched' test on s390 mwifiex: don't advertise IBSS features without FW support perf report: Don't shadow inlined symbol with different addr range SoC: imx-sgtl5000: add missing put_device() media: ov7740: fix runtime pm initialization media: sh_veu: Correct return type for mem2mem buffer helpers media: s5p-jpeg: Correct return type for mem2mem buffer helpers media: rockchip/rga: Correct return type for mem2mem buffer helpers media: s5p-g2d: Correct return type for mem2mem buffer helpers media: mx2_emmaprp: Correct return type for mem2mem buffer helpers media: mtk-jpeg: Correct return type for mem2mem buffer helpers mt76: usb: do not run mt76u_queues_deinit twice xen/gntdev: Do not destroy context while dma-bufs are in use vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1 HID: intel-ish-hid: avoid binding wrong ishtp_cl_device cgroup, rstat: Don't flush subtree root unless necessary jbd2: fix race when writing superblock leds: lp55xx: fix null deref on firmware load failure perf report: Add s390 diagnosic sampling descriptor size iwlwifi: pcie: fix emergency path ACPI / video: Refactor and fix dmi_is_desktop() selftests: skip seccomp get_metadata test if not real root kprobes: Prohibit probing on bsearch() kprobes: Prohibit probing on RCU debug routine netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm ARM: 8833/1: Ensure that NEON code always compiles with Clang ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins ALSA: PCM: check if ops are defined before suspending PCM ath10k: fix shadow register implementation for WCN3990 usb: f_fs: Avoid crash due to out-of-scope stack ptr access sched/topology: Fix percpu data types in struct sd_data & struct s_data bcache: fix input overflow to cache set sysfs file io_error_halflife bcache: fix input overflow to sequential_cutoff bcache: fix potential div-zero error of writeback_rate_i_term_inverse bcache: improve sysfs_strtoul_clamp() genirq: Avoid summation loops for /proc/stat net: marvell: mvpp2: fix stuck in-band SGMII negotiation iw_cxgb4: fix srqidx leak during connection abort net: phy: consider latched link-down status in polling mode fbdev: fbmem: fix memory access if logo is bigger than the screen cdrom: Fix race condition in cdrom_sysctl_register drm: rcar-du: add missing of_node_put drm/amd/display: Don't re-program planes for DPMS changes drm/amd/display: Disconnect mpcc when changing tg perf/aux: Make perf_event accessible to setup_aux() e1000e: fix cyclic resets at link up with active tx e1000e: Exclude device from suspend direct complete optimization platform/x86: intel_pmc_core: Fix PCH IP sts reading i2c: of: Try to find an I2C adapter matching the parent staging: spi: mt7621: Add return code check on device_reset() iwlwifi: mvm: fix RFH config command with >=10 CPUs ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK efi/memattr: Don't bail on zero VA if it equals the region's PA sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() drm/vkms: Bugfix extra vblank frame ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted soc: qcom: gsbi: Fix error handling in gsbi_probe() mt7601u: bump supported EEPROM version ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of ARM: avoid Cortex-A9 livelock on tight dmb loops block, bfq: fix in-service-queue check for queue merging bpf: fix missing prototype warnings selftests/bpf: skip verifier tests for unsupported program types powerpc/64s: Clear on-stack exception marker upon exception return cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state tty: increase the default flip buffer limit to 2640K powerpc/pseries: Perform full re-add of CPU for topology update post-migration drm/amd/display: Enable vblank interrupt during CRC capture ALSA: dice: add support for Solid State Logic Duende Classic/Mini usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded platform/x86: intel-hid: Missing power button release on some Dell models perf script python: Use PyBytes for attr in trace-event-python perf script python: Add trace_context extension module to sys.modules media: mt9m111: set initial frame size other than 0x0 hwrng: virtio - Avoid repeated init of completion soc/tegra: fuse: Fix illegal free of IO base address HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit f2fs: UBSAN: set boolean value iostat_enable correctly hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable cpu/hotplug: Mute hotplug lockdep during init dmaengine: imx-dma: fix warning comparison of distinct pointer types dmaengine: qcom_hidma: assign channel cookie correctly dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_ netfilter: physdev: relax br_netfilter dependency media: rcar-vin: Allow independent VIN link enablement media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins drm: Auto-set allow_fb_modifiers when given modifiers at plane init drm/nouveau: Stop using drm_crtc_force_disable x86/build: Specify elf_i386 linker emulation explicitly for i386 objects selinux: do not override context on context mounts brcmfmac: Use firmware_request_nowarn for the clm_blob wlcore: Fix memory leak in case wl12xx_fetch_firmware failure x86/build: Mark per-CPU symbols as absolute explicitly for LLD drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup clk: meson: clean-up clock registration clk: rockchip: fix frac settings of GPLL clock for rk3328 dmaengine: tegra: avoid overflow of byte tracking Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers net: stmmac: Avoid one more sometimes uninitialized Clang warning ACPI / video: Extend chassis-type detection with a "Lunch Box" check bcache: fix potential div-zero error of writeback_rate_p_term_inverse kprobes/x86: Blacklist non-attachable interrupt functions Linux 4.19.34 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-04-05 22:43:09 +02:00
Luc Van Oostenryck	845d4849b6	sched/topology: Fix percpu data types in struct sd_data & struct s_data [ Upstream commit `99687cdbb3` ] The percpu members of struct sd_data and s_data are declared as: struct ... ** __percpu member; So their type is: __percpu pointer to pointer to struct ... But looking at how they're used, their type should be: pointer to __percpu pointer to struct ... and they should thus be declared as: struct ... * __percpu member; So fix the placement of '__percpu' in the definition of these structures. This addresses a bunch of Sparse's warnings like: warning: incorrect type in initializer (different address spaces) expected void const [noderef] <asn:3> __vpp_verify got struct sched_domain ** Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190118144936.79158-1-luc.vanoostenryck@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-04-05 22:33:09 +02:00
Johannes Weiner	4c9c09affa	UPSTREAM: sched: loadavg: make calc_load_n() public It's going to be used in a later patch. Keep the churn separate. Link: http://lkml.kernel.org/r/20180828172258.3185-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Daniel Drake <drake@endlessm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `5c54f5b9ed`) Bug: 127712811 Test: lmkd in PSI mode Change-Id: I50e0cb0dbf20ced329a484493f82ff69ca1ae97a Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-03-21 16:25:27 -07:00
Johannes Weiner	2ba18b41d3	BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD There are several definitions of those functions/macros in places that mess with fixed-point load averages. Provide an official version. [akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c] Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Daniel Drake <drake@endlessm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit `8508cf3ffa`) Conflicts: block/blk-iolatency.c (1. manual merge to replace stat->rqs.mean with stat.mean) Bug: 127712811 Test: lmkd in PSI mode Change-Id: I716b4874491cff75a2355c6d95c64cf02d05e7ee Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-03-21 16:25:26 -07:00
Greg Kroah-Hartman	c28f73fe42	This is the 4.19.20 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbC5gACgkQONu9yGCS aT4DYQ//Uqm/Q63KQuExgd7W+61FoP4NFHlYXZ31B5Rkydryyk2K5P2ONSdVd5n9 k3wjzRrxvlPvjOwbh9PHv+pLkGxBDqpT1X8IAXPe36bYUkvXoH71BE4YSRPRUJAf sdzw/vs7WE/Kx41iT3SXiQih8ok0y3LoACBKmUsEXoLI1cJZCUnnSFpP++QNe1Iz B/y04BigL8R7OWR/jow6OPWe9uXOI8iEe9QKVX26g4oaakzly4vkp6OwROSwM31q 0wut8jF/AtDcZpZXjJLjDCj10k5DRN8jwGcLD7iZeIKqexOabjUrsvfIHfbpUtXr e7pJw2aUM8BFb8Ba2lsB7gkqvdHQohqVKQE4Qy59aPyesm2G5miH4gAbncoixjCa u3eQV5ACpFLksUFR4RAMKq+10k7swsutyyJr5vG4qdbRpcTCNJirEwAGGqgI6IEP SDqtw6u8gMP8+SicwA9p71Wwntcq9RR6fx0gX/3wi2DQp6F8Txem00SqaciE7uQ1 uIOUrhcpWzIq4m58SGhgTSQcBkm5qBD5S154/xRKIo0mvME+NwBub/x3fIsixN/u AzWQmQPXBajHbYXbKGC7t2jNHkU5d9FedZ4iDmJk/+ZZsWyFByY1bH1cg4Qnq89e tDxL114YmSujbZD/mFlbGWcqdmGNT355BmyetKDx6w0rNiU/RBU= =oprJ -----END PGP SIGNATURE----- Merge 4.19.20 into android-4.19 Changes in 4.19.20 Fix "net: ipv4: do not handle duplicate fragments as overlapping" drm/msm/gpu: fix building without debugfs ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation ipvlan, l3mdev: fix broken l3s mode wrt local routes l2tp: copy 4 more bytes to linear part if necessary l2tp: fix reading optional fields of L2TPv3 net: ip_gre: always reports o_key to userspace net: ip_gre: use erspan key field for tunnel lookup net/mlx4_core: Add masking for a few queries on HCA caps netrom: switch to sock timer API net/rose: fix NULL ax25_cb kernel panic net: set default network namespace in init_dummy_netdev() ravb: expand rx descriptor data to accommodate hw checksum sctp: improve the events for sctp stream reset tun: move the call to tun_set_real_num_queues ucc_geth: Reset BQL queue when stopping device vhost: fix OOB in get_rx_bufs() net: ip6_gre: always reports o_key to userspace sctp: improve the events for sctp stream adding net/mlx5e: Allow MAC invalidation while spoofchk is ON ip6mr: Fix notifiers call on mroute_clean_tables() Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager" sctp: set chunk transport correctly when it's a new asoc sctp: set flow sport from saddr only when it's 0 virtio_net: Don't enable NAPI when interface is down virtio_net: Don't call free_old_xmit_skbs for xdp_frames virtio_net: Fix not restoring real_num_rx_queues virtio_net: Fix out of bounds access of sq virtio_net: Don't process redirected XDP frames when XDP is disabled virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs virtio_net: Differentiate sk_buff and xdp_frame on freeing CIFS: Do not count -ENODATA as failure for query directory CIFS: Fix trace command logging for SMB2 reads and writes CIFS: Do not consider -ENODATA as stat failure for reads fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions() selftests/seccomp: Enhance per-arch ptrace syscall skip tests NFS: Fix up return value on fatal errors in nfs_page_async_flush() ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment arm64: kaslr: ensure randomized quantities are clean also when kaslr is off arm64: Do not issue IPIs for user executable ptes arm64: hyp-stub: Forbid kprobing of the hyp-stub arm64: hibernate: Clean the __hyp_text to PoC after resume gpio: altera-a10sr: Set proper output level for direction_output gpiolib: fix line event timestamps for nested irqs gpio: pcf857x: Fix interrupts on multiple instances gpio: sprd: Fix the incorrect data register gpio: sprd: Fix incorrect irq type setting for the async EIC gfs2: Revert "Fix loop in gfs2_rbm_find" mmc: bcm2835: Fix DMA channel leak on probe error mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay ALSA: usb-audio: Add Opus #3 to quirks for native DSD support ALSA: hda/realtek - Fixed hp_pin no value IB/hfi1: Remove overly conservative VM_EXEC flag check platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes mmc: sdhci-iproc: handle mmc_of_parse() errors during probe Btrfs: fix deadlock when allocating tree block during leaf/node split btrfs: On error always free subvol_name in btrfs_mount kernel/exit.c: release ptraced tasks before zap_pid_ns_processes mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT oom, oom_reaper: do not enqueue same task twice mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages mm, oom: fix use-after-free in oom_kill_process mm: hwpoison: use do_send_sig_info() instead of force_sig() mm: migrate: don't rely on __PageMovable() of newpage after unlocking it of: Convert to using %pOFn instead of device_node.name of: overlay: add tests to validate kfrees from overlay removal of: overlay: add missing of_node_get() in __of_attach_node_sysfs of: overlay: use prop add changeset entry for property in new nodes of: overlay: do not duplicate properties from overlay for new nodes md/raid5: fix 'out of memory' during raid cache recovery cifs: Always resolve hostname before reconnecting Linux 4.19.20 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-02-07 08:40:17 +01:00
Tetsuo Handa	7e70ddc332	oom, oom_reaper: do not enqueue same task twice commit `9bcdeb51bd` upstream. Arkadiusz reported that enabling memcg's group oom killing causes strange memcg statistics where there is no task in a memcg despite the number of tasks in that memcg is not 0. It turned out that there is a bug in wake_oom_reaper() which allows enqueuing same task twice which makes impossible to decrease the number of tasks in that memcg due to a refcount leak. This bug existed since the OOM reaper became invokable from task_will_free_mem(current) path in out_of_memory() in Linux 4.7, T1@P1 \|T2@P1 \|T3@P1 \|OOM reaper ----------+----------+----------+------------ # Processing an OOM victim in a different memcg domain. try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(&oom_lock) out_of_memory() oom_kill_process(P1) do_send_sig_info(SIGKILL, @P1) mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T2@P1) wake_oom_reaper(T2@P1) # T2@P1 is enqueued. mutex_unlock(&oom_lock) out_of_memory() mark_oom_victim(T1@P1) wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL. mutex_unlock(&oom_lock) # Completed processing an OOM victim in a different memcg domain. spin_lock(&oom_reaper_lock) # T1P1 is dequeued. spin_unlock(&oom_reaper_lock) but memcg's group oom killing made it easier to trigger this bug by calling wake_oom_reaper() on the same task from one out_of_memory() request. Fix this bug using an approach used by commit `855b018325` ("oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a side effect of this patch, this patch also avoids enqueuing multiple threads sharing memory via task_will_free_mem(current) path. Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp Fixes: `af8e15cc85` ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Aleksa Sarai <asarai@suse.de> Cc: Jay Kamat <jgkamat@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-02-06 17:30:14 +01:00
Quentin Perret	3ed3d89a25	Revert "FROMLIST: sched: Introduce a sysctl for Energy Aware Scheduling" This reverts commit `b78eec5fb7`. It has not been accepted upstream and is of no particular interest in Android since EAS is always enabled anyway. Bug: 120440300 Change-Id: I7d1f2ad206c05041d386fc99f18e29c4fa56cbc8 Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-01-04 09:17:55 +00:00
Quentin Perret	81dd278ee4	ANDROID: sched: Align EAS with upstream The core EAS patches have now been accepted upstream. The patches used in Android are based on a slightly earlier version of the series. In order to reduce the delta with mainline and ease backports, align the EAS code paths with their upstream version. This basically applies the output of git range-diff on the appropriate commits, and fixes a conflict in schedutil regarding the integration of schedtune. Bug: 120440300 Change-Id: I208ebeb4207e3f4f4bbb5103c606b293a464c20f Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2019-01-04 09:17:54 +00:00
Greg Kroah-Hartman	c454ec1e21	This is the 4.19.7 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwIG48ACgkQONu9yGCS aT7g6Q//RkJ8ZWaRkykcCGaWIvwI6QF1tmKalIEWmToPdndDuQdUDGzWVwfE9G7P yLcnp3GMlXo4F82BBwG8lFSAm9zaeqaLabnJnXbCc5mZ3xi/2aNqIGHzBY1isNZl 0fTzzcelnAKzjp0Aa/egRLOeraSLgVt/Cp7Ha3FXMP6RNxUMzs1pbQ2IFZ3m+P4G CAD3Iye6geOaZTu/kXiiooUEUGFQFbV4c3AZ4VW7dZDdrG+ekwtF4YHtkEPseWJQ Ugtrbr6S0IxYQ91o1Pk77kg4uwUFYo12jrk8Ni4gaPZE6mQCa08tr2Alg2oZkJGw PdXnt2ASYGRWFYK2JAuTvKzhHrTEJYhiC323dKYCAx7BgfFaqdo5F20oNzYxXFBB gGA3AzDDtLUD3OOO+lxrDxXMhpwXUx92WXsoJVsaSafdqIDAueq14sH19wqm0gUJ D1fC2dWTsFrPZKjkU8Z6rJAyO1XZED55h7v1YlqAt2ibjCeDKpjnW3yvUt8Ivpqc nlnmp8v/Yl2cdY55XtlgUadpknSc2jApFMwhSWetxAaqDCvha2dLQ28YMyPRJzat ZHOkizM/VUntXvlUzFvVTsqLQiX0sfLG6MKcUkzWehPomNKT+B8XL1wtzytv9QXb jOY8nRD5PiQo2p35cqdDCskBwqzEwY+WxDe7ji0yHZysBZLxoxQ= =OiCf -----END PGP SIGNATURE----- Merge 4.19.7 into android-4.19 Changes in 4.19.7 mm/huge_memory: rename freeze_page() to unmap_page() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/khugepaged: collapse_shmem() stop if punched or truncated mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: collapse_shmem() do not crash on Compound lan743x: Enable driver to work with LAN7431 lan743x: fix return value for lan743x_tx_napi_poll net: don't keep lonely packets forever in the gro hash net: gemini: Fix copy/paste error net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue packet: copy user buffers before orphan or clone rapidio/rionet: do not free skb before reading its length s390/qeth: fix length check in SNMP processing usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 net: thunderx: set xdp_prog to NULL if bpf_prog_add fails net: skb_scrub_packet(): Scrub offload_fwd_mark virtio-net: disable guest csum during XDP set virtio-net: fail XDP set if guest csum is negotiated net/dim: Update DIM start sample after each DIM iteration tcp: defer SACK compression after DupThresh net: phy: add workaround for issue where PHY driver doesn't bind to the device tipc: fix lockdep warning during node delete x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation x86/speculation: Apply IBPB more strictly to avoid cross-process data leak x86/speculation: Propagate information about RSB filling mitigation to sysfs x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support x86/retpoline: Remove minimal retpoline support x86/speculation: Update the TIF_SSBD comment x86/speculation: Clean up spectre_v2_parse_cmdline() x86/speculation: Remove unnecessary ret variable in cpu_show_common() x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common() x86/speculation: Disable STIBP when enhanced IBRS is in use x86/speculation: Rename SSBD update functions x86/speculation: Reorganize speculation control MSRs update sched/smt: Make sched_smt_present track topology x86/Kconfig: Select SCHED_SMT if SMP enabled sched/smt: Expose sched_smt_present static key x86/speculation: Rework SMT state change x86/l1tf: Show actual SMT state x86/speculation: Reorder the spec_v2 code x86/speculation: Mark string arrays const correctly x86/speculataion: Mark command line parser data __initdata x86/speculation: Unify conditional spectre v2 print functions x86/speculation: Add command line control for indirect branch speculation x86/speculation: Prepare for per task indirect branch speculation control x86/process: Consolidate and simplify switch_to_xtra() code x86/speculation: Avoid __switch_to_xtra() calls x86/speculation: Prepare for conditional IBPB in switch_mm() ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS x86/speculation: Split out TIF update x86/speculation: Prevent stale SPEC_CTRL msr content x86/speculation: Prepare arch_smt_update() for PRCTL mode x86/speculation: Add prctl() control for indirect branch speculation x86/speculation: Enable prctl mode for spectre_v2_user x86/speculation: Add seccomp Spectre v2 user space protection mode x86/speculation: Provide IBPB always command line options userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas kvm: mmu: Fix race in emulated page table writes kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall KVM: LAPIC: Fix pv ipis use-before-initialization KVM: X86: Fix scan ioapic use-before-initialization KVM: VMX: re-add ple_gap module parameter xtensa: enable coprocessors that are being flushed xtensa: fix coprocessor context offset definitions xtensa: fix coprocessor part of ptrace_{get,set}xregs udf: Allow mounting volumes with incorrect identification strings btrfs: Always try all copies when reading extent buffers Btrfs: ensure path name is null terminated at btrfs_control_ioctl Btrfs: fix rare chances for data loss when doing a fast fsync Btrfs: fix race between enabling quotas and subvolume creation btrfs: relocation: set trans to be NULL after ending transaction PCI: layerscape: Fix wrong invocation of outbound window disable accessor PCI: dwc: Fix MSI-X EP framework address calculation bug PCI: Fix incorrect value returned from pcie_get_speed_cap() arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou. x86/MCE/AMD: Fix the thresholding machinery initialization order x86/fpu: Disable bottom halves while loading FPU registers perf/x86/intel: Move branch tracing setup to the Intel-specific source file perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() perf/x86/intel: Disallow precise_ip on BTS events fs: fix lost error code in dio_complete ALSA: wss: Fix invalid snd_free_pages() at error path ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write ALSA: control: Fix race between adding and removing a user element ALSA: sparc: Fix invalid snd_free_pages() at error path ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist ALSA: hda/realtek - Support ALC300 ALSA: hda/realtek - fix headset mic detection for MSI MS-B171 ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop function_graph: Create function_graph_enter() to consolidate architecture code ARM: function_graph: Simplify with function_graph_enter() microblaze: function_graph: Simplify with function_graph_enter() x86/function_graph: Simplify with function_graph_enter() nds32: function_graph: Simplify with function_graph_enter() powerpc/function_graph: Simplify with function_graph_enter() sh/function_graph: Simplify with function_graph_enter() sparc/function_graph: Simplify with function_graph_enter() parisc: function_graph: Simplify with function_graph_enter() riscv/function_graph: Simplify with function_graph_enter() s390/function_graph: Simplify with function_graph_enter() arm64: function_graph: Simplify with function_graph_enter() MIPS: function_graph: Simplify with function_graph_enter() function_graph: Make ftrace_push_return_trace() static function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack function_graph: Have profiler use curr_ret_stack and not depth function_graph: Move return callback before update of curr_ret_stack function_graph: Reverse the order of pushing the ret_stack and the callback binder: fix race that allows malicious free of live buffer ext2: initialize opts.s_mount_opt as zero before using it ext2: fix potential use after free ASoC: intel: cht_bsw_max98090_ti: Add quirk for boards using pmc_plt_clk_0 ASoC: pcm186x: Fix device reset-registers trigger value ARM: dts: rockchip: Remove @0 from the veyron memory node dmaengine: at_hdmac: fix memory leak in at_dma_xlate() dmaengine: at_hdmac: fix module unloading staging: most: use format specifier "%s" in snprintf staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc' staging: mt7621-pinctrl: fix uninitialized variable ngroups staging: rtl8723bs: Fix incorrect sense of ether_addr_equal staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station USB: usb-storage: Add new IDs to ums-realtek usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid" iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers iio:st_magn: Fix enable device after trigger lib/test_kmod.c: fix rmmod double free mm: cleancache: fix corruption on missed inode invalidation mm: use swp_offset as key in shmem_replace_page() Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl() misc: mic/scif: fix copy-paste error in scif_create_remote_lookup Linux 4.19.7 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-12-06 10:34:09 +01:00
Thomas Gleixner	f55e301ec4	x86/speculation: Rework SMT state change commit `a74cfffb03` upstream arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-12-05 19:32:02 +01:00
Thomas Gleixner	340693ee91	sched/smt: Expose sched_smt_present static key commit `321a874a7e` upstream Make the scheduler's 'sched_smt_present' static key globaly available, so it can be used in the x86 speculation control code. Provide a query function and a stub for the CONFIG_SMP=n case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-12-05 19:32:02 +01:00
Jin Qian	3fd220679c	ANDROID: taskstats: track fsync syscalls This adds a counter to the taskstats extended accounting fields, which tracks the number of times fsync is called, and then plumbs it through to the uid_sys_stats driver. Bug: 120442023 Change-Id: I6c138de5b2332eea70f57e098134d1d141247b3f Signed-off-by: Jin Qian <jinqian@google.com> [AmitP: Refactored changes to align with changes from upstream commit `9a07000400` ("sched/headers: Move CONFIG_TASK_XACCT bits from <linux/sched.h> to <linux/sched/xacct.h>")] Signed-off-by: Amit Pundir <amit.pundir@linaro.org> [tkjos: Needed for storaged fsync accounting ("storaged --uid" and "storaged --task").] [astrachan: This is modifying a userspace interface and should probably be reworked] Signed-off-by: Alistair Strachan <astrachan@google.com>	2018-12-05 09:48:12 -08:00
Brendan Jackman	88d05f9257	FROMLIST: sched/fair: Use wake_q length as a hint for wake_wide This patch adds a parameter to select_task_rq, sibling_count_hint allowing the caller, where it has this information, to inform the sched_class the number of tasks that are being woken up as part of the same event. The wake_q mechanism is one case where this information is available. select_task_rq_fair can then use the information to detect that it needs to widen the search space for task placement in order to avoid overloading the last-level cache domain's CPUs. * * * The reason I am investigating this change is the following use case on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which all repeatedly do X amount of work then pthread_barrier_wait (i.e. sleep until the last task finishes its X and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU finish faster, and then those CPUs pull over the tasks that are still running: v CPU v ->time-> ------------- 0 (big) 11111 /333 ------------- 1 (big) 22222 /444\| ------------- 2 (LITTLE) 333333/ ------------- 3 (LITTLE) 444444/ ------------- Now when task 4 hits the barrier (at \|) and wakes the others up, there are 4 tasks with prev_cpu=<big> and 0 tasks with prev_cpu=<little>. want_affine therefore means that we'll only look in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled on the bigs until the next load balance, something like this: v CPU v ->time-> ------------------------ 0 (big) 11111 /333 31313\33333 ------------------------ 1 (big) 22222 /444\|424\4444444 ------------------------ 2 (LITTLE) 333333/ \222222 ------------------------ 3 (LITTLE) 444444/ \1111 ------------------------ ^^^ underutilization So, I'm trying to get want_affine = 0 for these tasks. I don't _think_ any incarnation of the wakee_flips mechanism can help us here because which task is waker and which tasks are wakees generally changes with each iteration. However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the nice property that we know exactly how many tasks are being woken, so we can cheat. It might be a disadvantage that we "widen" _every_ task that's woken in an event, while select_idle_sibling would work fine for the first sd_llc_size - 1 tasks. IIUC, if wake_affine() behaves correctly this trick wouldn't be necessary on SMP systems, so it might be best guarded by the presence of SD_ASYM_CPUCAPACITY? * * * Final note.. In order to observe "perfect" behaviour for this use case, I also had to disable the TTWU_QUEUE sched feature. Suppose during the wakeup above we are working through the work queue and have placed tasks 3 and 2, and are about to place task 1: v CPU v ->time-> -------------- 0 (big) 11111 /333 3 -------------- 1 (big) 22222 /444\|4 -------------- 2 (LITTLE) 333333/ 2 -------------- 3 (LITTLE) 444444/ <- Task 1 should go here -------------- If TTWU_QUEUE is enabled, we will not yet have enqueued task 2 (having instead sent a reschedule IPI) or attached its load to CPU 2. So we are likely to also place task 1 on cpu 2. Disabling TTWU_QUEUE means that we enqueue task 2 before placing task 1, solving this issue. TTWU_QUEUE is there to minimise rq lock contention, and I guess that this contention is less of an issue on big.LITTLE systems since they have relatively few CPUs, which suggests the trade-off makes sense here. Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Matt Fleming <matt@codeblueprint.co.uk> ( - Applied from https://patchwork.kernel.org/patch/9895261/ - Fixed trivial conflict in kernel/sched/core.c - Fixed select_task_rq_idle, now in kernel/sched/idle.c - Fixed trivial conflict in select_task_rq_fair ) Signed-off-by: Quentin Perret <quentin.perret@arm.com> Change-Id: I3cfc4bf48c3d7feef969db4d22449f4fbb4f795d	2018-10-26 12:15:52 +01:00
Chris Redpath	af362094df	ANDROID: sched: Unconditionally honor sync flag for energy-aware wakeups Since we don't do energy-aware wakeups when we are overutilized, always honoring sync wakeups in this state does not prevent wake-wide mechanics overruling the flag as normal. This patch is based upon previous work to build EAS for android products. sync-hint code taken from commit 4a5e890ec60d "sched/fair: add tunable to force selection at cpu granularity" written by Juri Lelli <juri.lelli@arm.com> Signed-off-by: Chris Redpath <chris.redpath@arm.com> (cherry-picked from commit f1ec666a62dec1083ed52fe1ddef093b84373aaf) [ Moved the feature to find_energy_efficient_cpu() ] Signed-off-by: Quentin Perret <quentin.perret@arm.com> Change-Id: I4b3d79141fc8e53dc51cd63ac11096c2e3cb10f5	2018-10-26 12:14:32 +01:00
Quentin Perret	2e88529cf8	ANDROID: sched: Introduce sysctl_sched_cstate_aware Introduce a new sysctl for this option, 'sched_cstate_aware'. When this is enabled, the scheduler can make use of the idle state indexes in order to break the tie between potential CPU candidates. This patch is based on 7f6fb825d6bc ("ANDROID: sched: EAS: take cstate into account when selecting idle core") from android-4.14. All the credits goes to the authors. Change-Id: Ia076cf32faff91e90905291fa6f7924dc3dd6458 Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-10-26 11:58:00 +01:00
Quentin Perret	b78eec5fb7	FROMLIST: sched: Introduce a sysctl for Energy Aware Scheduling In its current state, Energy Aware Scheduling (EAS) starts automatically on asymmetric platforms having an Energy Model (EM). However, there are users who want to have an EM (for thermal management for example), but don't want EAS with it. In order to let users disable EAS explicitly, introduce a new sysctl called 'sched_energy_aware'. It is enabled by default so that EAS can start automatically on platforms where it makes sense. Flipping it to 0 rebuilds the scheduling domains and disables EAS. Change-Id: I55764e70bf5e90795d2269ec9135ae6e82794a2b Signed-off-by: Quentin Perret <quentin.perret@arm.com> Message-Id: <20181016101513.26919-11-quentin.perret@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-10-26 11:47:12 +01:00
Quentin Perret	1e6b1214f1	FROMLIST: sched/topology: Make Energy Aware Scheduling depend on schedutil Energy Aware Scheduling (EAS) is designed with the assumption that frequencies of CPUs follow their utilization value. When using a CPUFreq governor other than schedutil, the chances of this assumption being true are small, if any. When schedutil is being used, EAS' predictions are at least consistent with the frequency requests. Although those requests have no guarantees to be honored by the hardware, they should at least guide DVFS in the right direction and provide some hope in regards to the EAS model being accurate. To make sure EAS is only used in a sane configuration, create a strong dependency on schedutil being used. Since having sugov compiled-in does not provide that guarantee, make CPUFreq call a scheduler function on governor changes hence letting it rebuild the scheduling domains, check the governors of the online CPUs, and enable/disable EAS accordingly. Change-Id: I872949134f97d2772fc681b7393eaed7f0e224f2 Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Quentin Perret <quentin.perret@arm.com> Message-Id: <20181016101513.26919-9-quentin.perret@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-10-26 11:47:11 +01:00
Quentin Perret	f3e45428b7	FROMLIST: sched/cpufreq: Prepare schedutil for Energy Aware Scheduling Schedutil requests frequency by aggregating utilization signals from the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of them. Since Energy Aware Scheduling (EAS) needs to be able to predict the frequency requests, it needs to forecast the decisions made by the governor. In order to prepare the introduction of EAS, introduce schedutil_freq_util() to centralize the aforementioned signal aggregation and make it available to both schedutil and EAS. Since frequency selection and energy estimation still need to deal with RT and DL signals slightly differently, schedutil_freq_util() is called with a different 'type' parameter in those two contexts, and returns an aggregated utilization signal accordingly. While at it, introduce the map_util_freq() function which is designed to make schedutil's 25% margin usable easily for both sugov and EAS. As EAS will be able to predict schedutil's frequency requests more accurately than any other governor by design, it'd be sensible to make sure EAS cannot be used without schedutil. This will be done later, once EAS has actually been introduced. Change-Id: Idbeeb00926045507b73f9cba37630b38ae0816c0 Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Quentin Perret <quentin.perret@arm.com> Message-Id: <20181016101513.26919-3-quentin.perret@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-10-26 11:47:10 +01:00
Quentin Perret	30f6e1b5c9	FROMLIST: sched: Relocate arch_scale_cpu_capacity By default, arch_scale_cpu_capacity() is only visible from within the kernel/sched folder. Relocate it to include/linux/sched/topology.h to make it visible to other clients needing to know about the capacity of CPUs, such as the Energy Model framework. Change-Id: I144c7299e122201dbcadc431d55d0a6d24d90005 Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Quentin Perret <quentin.perret@arm.com> Message-Id: <20181016101513.26919-2-quentin.perret@arm.com> Signed-off-by: Quentin Perret <quentin.perret@arm.com>	2018-10-26 11:47:10 +01:00
Morten Rasmussen	f2d1d0fea1	UPSTREAM: sched/topology: Add SD_ASYM_CPUCAPACITY flag detection The SD_ASYM_CPUCAPACITY sched_domain flag is supposed to mark the sched_domain in the hierarchy where all CPU capacities are visible for any CPU's point of view on asymmetric CPU capacity systems. The scheduler can then take to take capacity asymmetry into account when balancing at this level. It also serves as an indicator for how wide task placement heuristics have to search to consider all available CPU capacities as asymmetric systems might often appear symmetric at smallest level(s) of the sched_domain hierarchy. The flag has been around for while but so far only been set by out-of-tree code in Android kernels. One solution is to let each architecture provide the flag through a custom sched_domain topology array and associated mask and flag functions. However, SD_ASYM_CPUCAPACITY is special in the sense that it depends on the capacity and presence of all CPUs in the system, i.e. when hotplugging all CPUs out except those with one particular CPU capacity the flag should disappear even if the sched_domains don't collapse. Similarly, the flag is affected by cpusets where load-balancing is turned off. Detecting when the flags should be set therefore depends not only on topology information but also the cpuset configuration and hotplug state. The arch code doesn't have easy access to the cpuset configuration. Instead, this patch implements the flag detection in generic code where cpusets and hotplug state is already taken care of. All the arch is responsible for is to implement arch_scale_cpu_capacity() and force a full rebuild of the sched_domain hierarchy if capacities are updated, e.g. later in the boot process when cpufreq has initialized. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1532093554-30504-2-git-send-email-morten.rasmussen@arm.com [ Fixed 'CPU' capitalization. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit `05484e0984`) Signed-off-by: Quentin Perret <quentin.perret@arm.com> Change-Id: I1d5f695a95f8d023f1ecf14ecb71a558ceb67ed6	2018-10-26 11:44:43 +01:00
Linus Torvalds	cd9b44f907	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - the rest of MM - procfs updates - various misc things - more y2038 fixes - get_maintainer updates - lib/ updates - checkpatch updates - various epoll updates - autofs updates - hfsplus - some reiserfs work - fatfs updates - signal.c cleanups - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits) ipc/util.c: update return value of ipc_getref from int to bool ipc/util.c: further variable name cleanups ipc: simplify ipc initialization ipc: get rid of ids->tables_initialized hack lib/rhashtable: guarantee initial hashtable allocation lib/rhashtable: simplify bucket_table_alloc() ipc: drop ipc_lock() ipc/util.c: correct comment in ipc_obtain_object_check ipc: rename ipcctl_pre_down_nolock() ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid() ipc: reorganize initialization of kern_ipc_perm.seq ipc: compute kern_ipc_perm.id under the ipc lock init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp adfs: use timespec64 for time conversion kernel/sysctl.c: fix typos in comments drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md fork: don't copy inconsistent signal handler state to child signal: make get_signal() return bool signal: make sigkill_pending() return bool ...	2018-08-22 12:34:08 -07:00
Christian Brauner	52cba1a274	signal: make force_sigsegv() void Patch series "signal: refactor some functions", v3. This series refactors a bunch of functions in signal.c to simplify parts of the code. The greatest single change is declaring the static do_sigpending() helper as void which makes it possible to remove a bunch of unnecessary checks in the syscalls later on. This patch (of 17): force_sigsegv() returned 0 unconditionally so it doesn't make sense to have it return at all. In addition, there are no callers that check force_sigsegv()'s return value. Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Morris <james.morris@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:50 -07:00
Dmitry Vyukov	a2e5144538	kernel/hung_task.c: allow to set checking interval separately from timeout Currently task hung checking interval is equal to timeout, as the result hung is detected anywhere between timeout and 2*timeout. This is fine for most interactive environments, but this hurts automated testing setups (syzbot). In an automated setup we need to strictly order CPU lockup < RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall is not detected as task hung and task hung is not detected as silent machine loss. The large variance in task hung detection timeout requires setting silent machine loss timeout to a very large value (e.g. if task hung is 3 mins, then silent loss need to be set to ~7 mins). The additional 3 minutes significantly reduce testing efficiency because usually we crash kernel within a minute, and this can add hours to bug localization process as it needs to do dozens of tests. Allow setting checking interval separately from timeout. This allows to set timeout to, say, 3 minutes, but checking interval to 10 secs. The interval is controlled via a new hung_task_check_interval_secs sysctl, similar to the existing hung_task_timeout_secs sysctl. The default value of 0 results in the current behavior: checking interval is equal to timeout. [akpm@linux-foundation.org: update hung_task_timeout_max's comment] Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:47 -07:00
Sebastian Andrzej Siewior	fc37191272	userns: use refcount_t for reference counting instead atomic_t refcount_t type and corresponding API should be used instead of atomic_t wh en the variable is used as a reference counter. This avoids accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/20180703200141.28415-6-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-22 10:52:46 -07:00
Linus Torvalds	0214f46b3a	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...	2018-08-21 13:47:29 -07:00
Shakeel Butt	d46eb14b73	fs: fsnotify: account fsnotify metadata to kmemcg Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:30 -07:00
Eric W. Biederman	c3ad2c3b02	signal: Don't restart fork when signals come in. Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn> report that a periodic signal received during fork can cause fork to continually restart preventing an application from making progress. The code was being overly pessimistic. Fork needs to guarantee that a signal sent to multiple processes is logically delivered before the fork and just to the forking process or logically delivered after the fork to both the forking process and it's newly spawned child. For signals like periodic timers that are always delivered to a single process fork can safely complete and let them appear to logically delivered after the fork(). While examining this issue I also discovered that fork today will miss signals delivered to multiple processes during the fork and handled by another thread. Similarly the current code will also miss blocked signals that are delivered to multiple process, as those signals will not appear pending during fork. Add a list of each thread that is currently forking, and keep on that list a signal set that records all of the signals sent to multiple processes. When fork completes initialize the new processes shared_pending signal set with it. The calculate_sigpending function will see those signals and set TIF_SIGPENDING causing the new task to take the slow path to userspace to handle those signals. Making it appear as if those signals were received immediately after the fork. It is not possible to send real time signals to multiple processes and exceptions don't go to multiple processes, which means that that are no signals sent to multiple processes that require siginfo. This means it is safe to not bother collecting siginfo on signals sent during fork. The sigaction of a child of fork is initially the same as the sigaction of the parent process. So a signal the parent ignores the child will also initially ignore. Therefore it is safe to ignore signals sent to multiple processes and ignored by the forking process. Signals sent to only a single process or only a single thread and delivered during fork are treated as if they are received after the fork, and generally not dealt with. They won't cause any problems. V2: Added removal from the multiprocess list on failure. V3: Use -ERESTARTNOINTR directly V4: - Don't queue both SIGCONT and SIGSTOP - Initialize signal_struct.multiprocess in init_task - Move setting of shared_pending to before the new task is visible to signals. This prevents signals from comming in before shared_pending.signal is set to delayed.signal and being lost. V5: - rework list add and delete to account for idle threads v6: - Use sigdelsetmask when removing stop signals Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447 Reported-by: Wen Yang <wen.yang99@zte.com.cn> and Reported-by: majiang <ma.jiang@zte.com.cn> Fixes: `4a2c7a7837` ("[PATCH] make fork() atomic wrt pgrp/session signals") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-09 13:07:01 -05:00
Eric W. Biederman	924de3b8c9	fork: Have new threads join on-going signal group stops There are only two signals that are delivered to every member of a signal group: SIGSTOP and SIGKILL. Signal delivery requires every signal appear to be delivered either before or after a clone syscall. SIGKILL terminates the clone so does not need to be considered. Which leaves only SIGSTOP that needs to be considered when creating new threads. Today in the event of a group stop TIF_SIGPENDING will get set and the fork will restart ensuring the fork syscall participates in the group stop. A fork (especially of a process with a lot of memory) is one of the most expensive system so we really only want to restart a fork when necessary. It is easy so check to see if a SIGSTOP is ongoing and have the new thread join it immediate after the clone completes. Making it appear the clone completed happened just before the SIGSTOP. The calculate_sigpending function will see the bits set in jobctl and set TIF_SIGPENDING to ensure the new task takes the slow path to userspace. V2: The call to task_join_group_stop was moved before the new task is added to the thread group list. This should not matter as sighand->siglock is held over both the addition of the threads, the call to task_join_group_stop and do_signal_stop. But the change is trivial and it is one less thing to worry about when reading the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-03 20:20:14 -05:00
Eric W. Biederman	088fe47ce9	signal: Add calculate_sigpending() Add a function calculate_sigpending to test to see if any signals are pending for a new task immediately following fork. Signals have to happen either before or after fork. Today our practice is to push all of the signals to before the fork, but that has the downside that frequent or periodic signals can make fork take much much longer than normal or prevent fork from completing entirely. So we need move signals that we can after the fork to prevent that. This updates the code to set TIF_SIGPENDING on a new task if there are signals or other activities that have moved so that they appear to happen after the fork. As the code today restarts if it sees any such activity this won't immediately have an effect, as there will be no reason for it to set TIF_SIGPENDING immediately after the fork. Adding calculate_sigpending means the code in fork can safely be changed to not always restart if a signal is pending. The new calculate_sigpending function sets sigpending if there are pending bits in jobctl, pending signals, the freezer needs to freeze the new task or the live kernel patching framework need the new thread to take the slow path to userspace. I have verified that setting TIF_SIGPENDING does make a new process take the slow path to userspace before it executes it's first userspace instruction. I have looked at the callers of signal_wake_up and the code paths setting TIF_SIGPENDING and I don't see anything else that needs to be handled. The code probably doesn't need to set TIF_SIGPENDING for the kernel live patching as it uses a separate thread flag as well. But at this point it seems safer reuse the recalc_sigpending logic and get the kernel live patching folks to sort out their story later. V2: I have moved the test into schedule_tail where siglock can be grabbed and recalc_sigpending can be reused directly. Further as the last action of setting up a new task this guarantees that TIF_SIGPENDING will be properly set in the new process. The helper calculate_sigpending takes the siglock and uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending clear TIF_SIGPENDING if it is unnecessary. This allows reusing the existing code and keeps maintenance of the conditions simple. Oleg Nesterov <oleg@redhat.com> suggested the movement and pointed out the need to take siglock if this code was going to be called while the new task is discoverable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-08-03 20:10:31 -05:00
Ingo Molnar	4765096f4f	Merge branch 'sched/urgent' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-07-25 11:29:58 +02:00
Al Viro	f88a333b44	alpha: fix osf_wait4() breakage kernel_wait4() expects a userland address for status - it's only rusage that goes as a kernel one (and needs a copyout afterwards) [ Also, fix the prototype of kernel_wait4() to have that __user annotation - Linus ] Fixes: `92ebce5ac5` ("osf_wait4: switch to kernel_wait4()") Cc: stable@kernel.org # v4.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-07-22 11:51:30 -07:00
Eric W. Biederman	24122c7f49	signal: Pass pid and pid type into send_sigqueue Make the code more maintainable by performing more of the signal related work in send_sigqueue. A quick inspection of do_timer_create will show that this code path does not lookup a thread group by a thread's pid. Making it safe to find the task pointed to by it_pid with "pid_task(it_pid, type)"; This supports the changes needed in fork to tell if a signal was sent to a single process or a group of processes. Having the pid to task transition in signal.c will also make it easier to sort out races with de_thread and and the thread group leader exiting when it comes time to address that. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Eric W. Biederman	6883f81aac	pid: Implement PIDTYPE_TGID Everywhere except in the pid array we distinguish between a tasks pid and a tasks tgid (thread group id). Even in the enumeration we want that distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid we almost have an implementation of PIDTYPE_TGID in struct signal_struct. Add PIDTYPE_TGID as a first class member of the pid_type enumeration and into the pids array. Then remove the __PIDTYPE_TGID special case and the leader_pid in signal_struct. The net size increase is just an extra pointer added to struct pid and an extra pair of pointers of an hlist_node added to task_struct. The effect on code maintenance is the removal of a number of special cases today and the potential to remove many more special cases as PIDTYPE_TGID gets used to it's fullest. The long term potential is allowing zombie thread group leaders to exit, which will remove a lot more special cases in the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Eric W. Biederman	2c4704756c	pids: Move the pgrp and session pid pointers from task_struct to signal_struct To access these fields the code always has to go to group leader so going to signal struct is no loss and is actually a fundamental simplification. This saves a little bit of memory by only allocating the pid pointer array once instead of once for every thread, and even better this removes a few potential races caused by the fact that group_leader can be changed by de_thread, while signal_struct can not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Eric W. Biederman	7a36094d61	pids: Compute task_tgid using signal->leader_pid The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Eric W. Biederman	1fb53567a3	pids: Move task_pid_type into sched/signal.h The function is general and inline so there is no need to hide it inside of exit.c Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Vincent Guittot	5fd778915a	sched/sysctl: Remove unused sched_time_avg_ms sysctl /proc/sys/kernel/sched_time_avg_ms entry is not used anywhere, remove it. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: claudio@evidence.eu.com Cc: daniel.lezcano@linaro.org Cc: dietmar.eggemann@arm.com Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: luca.abeni@santannapisa.it Cc: patrick.bellasi@arm.com Cc: quentin.perret@arm.com Cc: rjw@rjwysocki.net Cc: valentin.schneider@arm.com Cc: viresh.kumar@linaro.org Link: http://lkml.kernel.org/r/1530200714-4504-12-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-07-16 00:16:29 +02:00
Omar Sandoval	93781325da	lockdep: fix fs_reclaim annotation While revisiting my Btrfs swapfile series [1], I introduced a situation in which reclaim would lock i_rwsem, and even though the swapon() path clearly made GFP_KERNEL allocations while holding i_rwsem, I got no complaints from lockdep. It turns out that the rework of the fs_reclaim annotation was broken: if the current task has PF_MEMALLOC set, we don't acquire the dummy fs_reclaim lock, but when reclaiming we always check this _after_ we've just set the PF_MEMALLOC flag. In most cases, we can fix this by moving the fs_reclaim_{acquire,release}() outside of the memalloc_noreclaim_{save,restore}(), althought kswapd is slightly different. After applying this, I got the expected lockdep splats. 1: https://lwn.net/Articles/625412/ Link: http://lkml.kernel.org/r/9f8aa70652a98e98d7c4de0fc96a4addcee13efe.1523778026.git.osandov@fb.com Fixes: `d92a8cfcb3` ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-06-07 17:34:35 -07:00
Linus Torvalds	eeee3149aa	There's been a fair amount of work in the docs tree this time around, including: - Extensive RST conversions and organizational work in the memory-management docs thanks to Mike Rapoport. - An update of Documentation/features from Andrea Parri and a script to keep it updated. - Various LICENSES updates from Thomas, along with a script to check SPDX tags. - Work to fix dangling references to documentation files; this involved a fair number of one-liner comment changes outside of Documentation/ ...and the usual list of documentation improvements, typo fixes, etc. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbFTkKAAoJEI3ONVYwIuV6t24P/0K9qltHkLwsBo2fbGu/emem mb1QrZCFZGebKVrCIvET3YcT0q0xPW+ZldwMQYEUeCcu/vD3cGHGXlDbVJCa1fFD 2OS10W/sEObPnREtlHO/zAzpapKP9DO1/f6NhO55iBJLGOCgoLL5xvSqgsI8MTGd vcJDXLitkh4CJEcfNLkQt8dEZzq9Tb6wdSFIvZBBXRNon2ItVN92D5xoQ0wtB+qt KmcGYofajK9bjtZpnC4iNg3i+zdwkd80bGTEN9f0hJTRZK5emCILk8fip8CMhRuB iwmcqb2RnMLydNLyK9RSs6OS5z3G4fYu9llRtLlZBAupcjRVpalWaBGxLOVO6jBG mvkqdKPMtxV4c7NvwKwFQL9dcjtxsxO4RDRYVWN82dS1L6WKKk8UvTuJUBLH0YA5 af7ZKn7mJVhJ1cxPblaEBOBM3oQuk57LLkjmcpMOXyJ/IOkTIuV1Ezht+XzFyFQv VWSyekiKo+8D6WHACPTaWiizjW15e8CyP+WIhKzJyn7VQQrZwhsOS+R//ITsuvQ0 vRdZ20lwUeBhR+mnXd5NfIo2w7G+OiqiREVAgxjgRrS0PnkzWG7lzzcSVU8HTfT4 S7VXqval2a9Xg+N8aU2JUe49W858J8hKvIa98hBxGoZa84wxOGtEo7pIKhnMwMSe Uhkh/1/bQMxsK3fBEF74 =I6FG -----END PGP SIGNATURE----- Merge tag 'docs-4.18' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "There's been a fair amount of work in the docs tree this time around, including: - Extensive RST conversions and organizational work in the memory-management docs thanks to Mike Rapoport. - An update of Documentation/features from Andrea Parri and a script to keep it updated. - Various LICENSES updates from Thomas, along with a script to check SPDX tags. - Work to fix dangling references to documentation files; this involved a fair number of one-liner comment changes outside of Documentation/ ... and the usual list of documentation improvements, typo fixes, etc" * tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits) Documentation: document hung_task_panic kernel parameter docs/admin-guide/mm: add high level concepts overview docs/vm: move ksm and transhuge from "user" to "internals" section. docs: Use the kerneldoc comments for memalloc_no*() doc: document scope NOFS, NOIO APIs docs: update kernel versions and dates in tables docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge docs/vm: transhuge: minor updates docs/vm: transhuge: change sections order Documentation: arm: clean up Marvell Berlin family info Documentation: gpio: driver: Fix a typo and some odd grammar docs: ranoops.rst: fix location of ramoops.txt scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode docs: uio-howto.rst: use a code block to solve a warning mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback w1: w1_io.c: fix a kernel-doc warning Documentation/process/posting: wrap text at 80 cols docs: admin-guide: add cgroup-v2 documentation Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'" Documentation: refcount-vs-atomic: Update reference to LKMM doc. ...	2018-06-04 12:34:27 -07:00
Michal Hocko	46ca359955	doc: document scope NOFS, NOIO APIs Although the api is documented in the source code Ted has pointed out that there is no mention in the core-api Documentation and there are people looking there to find answers how to use a specific API. Requested-by: "Theodore Y. Ts'o" <tytso@mit.edu> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2018-05-29 06:45:42 -06:00
Peter Zijlstra	b5bf9a90bb	sched/core: Introduce set_special_state() Gaurav reported a perceived problem with TASK_PARKED, which turned out to be a broken wait-loop pattern in __kthread_parkme(), but the reported issue can (and does) in fact happen for states that do not do condition based sleeps. When the 'current->state = TASK_RUNNING' store of a previous (concurrent) try_to_wake_up() collides with the setting of a 'special' sleep state, we can loose the sleep state. Normal condition based wait-loops are immune to this problem, but for sleep states that are not condition based are subject to this problem. There already is a fix for TASK_DEAD. Abstract that and also apply it to TASK_STOPPED and TASK_TRACED, both of which are also without condition based wait-loop. Reported-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-05-04 07:54:54 +02:00
Jonathan Corbet	24844fd339	Merge branch 'mm-rst' into docs-next Mike Rapoport says: These patches convert files in Documentation/vm to ReST format, add an initial index and link it to the top level documentation. There are no contents changes in the documentation, except few spelling fixes. The relatively large diffstat stems from the indentation and paragraph wrapping changes. I've tried to keep the formatting as consistent as possible, but I could miss some places that needed markup and add some markup where it was not necessary. [jc: significant conflicts in vm/hmm.rst]	2018-04-16 14:25:08 -06:00

1 2 3 4 5

243 commits