Commit graph

31,949 commits

Author SHA1 Message Date
Linus Torvalds
6d277aca48 Power management updates for 5.6-rc1
- Update the ACPI processor driver in order to export
    acpi_processor_evaluate_cst() to the code outside of it, add
    ACPI support to the intel_idle driver based on that and clean
    up that driver somewhat (Rafael Wysocki).
 
  - Add an admin guide document for the intel_idle driver (Rafael
    Wysocki).
 
  - Clean up cpuidle core and drivers, enable compilation testing
    for some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
    Wysocki, Yangtao Li).
 
  - Fix reference counting of OPP (operating performance points) table
    structures (Viresh Kumar).
 
  - Add support for CPR (Core Power Reduction) to the AVS (Adaptive
    Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
    YueHaibing).
 
  - Add support for TigerLake Mobile and JasperLake to the Intel RAPL
    power capping driver (Zhang Rui).
 
  - Update cpufreq drivers:
 
    * Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
 
    * Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
 
    * Fix cpufreq policy reference counting issues in s3c and
      brcmstb-avs (chenqiwu).
 
    * Fix ACPI table reference counting issue and HiSilicon quirk
      handling in the CPPC driver (Hanjun Guo).
 
    * Clean up spelling mistake in intel_pstate (Harry Pan).
 
    * Convert the kirkwood and tegra186 drivers to using
      devm_platform_ioremap_resource() (Yangtao Li).
 
  - Update devfreq core:
 
    * Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
 
    * Clean up the handing of transition statistics and allow them
      to be reset by writing 0 to the 'trans_stat' devfreq device
      attribute in sysfs (Kamil Konieczny).
 
    * Add 'devfreq_summary' to debugfs (Chanwoo Choi).
 
    * Clean up kerneldoc comments and Kconfig indentation (Krzysztof
      Kozlowski, Randy Dunlap).
 
  - Update devfreq drivers:
 
    * Add dynamic scaling for the imx8m DDR controller and clean up
      imx8m-ddrc (Leonard Crestez, YueHaibing).
 
    * Fix DT node reference counting and nitialization error code path
      in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
      for it (Chanwoo Choi, Yangtao Li).
 
    * Fix DT node reference counting in rockchip-dfi and make it use
      devm_platform_ioremap_resource() (Yangtao Li).
 
    * Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
 
    * Fix initialization error code paths in exynos-bus (Yangtao Li).
 
    * Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
      Kozlowski).
 
  - Add tracepoints for tracking usage_count updates unrelated to
    status changes in PM-runtime (Michał Mirosław).
 
  - Add sysfs attribute to control the "sync on suspend" behavior
    during system-wide suspend (Jonas Meurer).
 
  - Switch system-wide suspend tests over to 64-bit time (Alexandre
    Belloni).
 
  - Make wakeup sources statistics in debugfs cover deleted ones which
    used to be the case some time ago (zhuguangqing).
 
  - Clean up computations carried out during hibernation, update
    messages related to hibernation and fix a spelling mistake in one
    of them (Wen Yang, Luigi Semenzato, Colin Ian King).
 
  - Add mailmap entry for maintainer e-mail address that has not been
    functional for several years (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl4u2fESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxvlkP/j5vDzyNUNJjnD6+897c8W+z5dwdiQfU
 QNtoopFXgw/fpOhGXRdj2mA4e6RtpU9aCCiHR6/qdh3/1qSnR5Y9R/51/gmdkwhY
 YakSxmgpgGrOJru94ApI1o/35eWwN/GxjajbfNY5ScrPQl/L0DF3iJWRsAOR5534
 p9e2gQqKecoE+MEn5JcGAXApA5xBLXuUmtWPUn5UGyhaz+jdmsf1zkDEOEvxREay
 hLGH1y6BY8HS/jytyNzISs9iDeBvg2fHmG8SskDiXVMke5sHBTU9MilgpnCFfQ0l
 OF/eNnTXTU7mAJhlnjBUt2rIe5peGSuhgg+Ur7s86xYqbj2SfsVM4UHjU0A6t9Jm
 sauWQh/Nbzw6XaCNzYKxP+dREAg0g/aq7xFqQi3bWx7YvzLk/hvNWi2+bv3adzx7
 Z3fvOki4xMXzLLrh0f1ipC8BKTsdioDZPAy06B80a0luv6ROdr6bPL7did14mWt2
 eCuPuZyXKhdV+PkjZHF+c4XT7N9NfGtE0WUQf54Q4VT00hDagGDliwXpm4ht1pjJ
 iO7uUJevXKSxMaV2xPZ+nWZaOeCVrMMTA1Ec1ELgC1n8WROZJ+SfhehgMQGp7BHS
 Hz4QO1HjTsCDnT+OU7JFeCRrkyXIlh75MOndWOOH6eTEXCAI9PihstB+UGXeNsK0
 BesNQz1sYY1O
 =g48u
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add ACPI support to the intel_idle driver along with an admin
  guide document for it, add support for CPR (Core Power Reduction) to
  the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support
  in a few places, add some new sysfs attributes, debugfs files and
  tracepoints, fix bugs and clean up a bunch of things all over.

  Specifics:

   - Update the ACPI processor driver in order to export
     acpi_processor_evaluate_cst() to the code outside of it, add ACPI
     support to the intel_idle driver based on that and clean up that
     driver somewhat (Rafael Wysocki).

   - Add an admin guide document for the intel_idle driver (Rafael
     Wysocki).

   - Clean up cpuidle core and drivers, enable compilation testing for
     some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael
     Wysocki, Yangtao Li).

   - Fix reference counting of OPP (operating performance points) table
     structures (Viresh Kumar).

   - Add support for CPR (Core Power Reduction) to the AVS (Adaptive
     Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King,
     YueHaibing).

   - Add support for TigerLake Mobile and JasperLake to the Intel RAPL
     power capping driver (Zhang Rui).

   - Update cpufreq drivers:
      - Add i.MX8MP support to imx-cpufreq-dt (Anson Huang).
      - Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva).
      - Fix cpufreq policy reference counting issues in s3c and
        brcmstb-avs (chenqiwu).
      - Fix ACPI table reference counting issue and HiSilicon quirk
        handling in the CPPC driver (Hanjun Guo).
      - Clean up spelling mistake in intel_pstate (Harry Pan).
      - Convert the kirkwood and tegra186 drivers to using
        devm_platform_ioremap_resource() (Yangtao Li).

   - Update devfreq core:
      - Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi).
      - Clean up the handing of transition statistics and allow them to
        be reset by writing 0 to the 'trans_stat' devfreq device
        attribute in sysfs (Kamil Konieczny).
      - Add 'devfreq_summary' to debugfs (Chanwoo Choi).
      - Clean up kerneldoc comments and Kconfig indentation (Krzysztof
        Kozlowski, Randy Dunlap).

   - Update devfreq drivers:
      - Add dynamic scaling for the imx8m DDR controller and clean up
        imx8m-ddrc (Leonard Crestez, YueHaibing).
      - Fix DT node reference counting and nitialization error code path
        in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency
        for it (Chanwoo Choi, Yangtao Li).
      - Fix DT node reference counting in rockchip-dfi and make it use
        devm_platform_ioremap_resource() (Yangtao Li).
      - Fix excessive stack usage in exynos-ppmu (Arnd Bergmann).
      - Fix initialization error code paths in exynos-bus (Yangtao Li).
      - Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof
        Kozlowski).

   - Add tracepoints for tracking usage_count updates unrelated to
     status changes in PM-runtime (Michał Mirosław).

   - Add sysfs attribute to control the "sync on suspend" behavior
     during system-wide suspend (Jonas Meurer).

   - Switch system-wide suspend tests over to 64-bit time (Alexandre
     Belloni).

   - Make wakeup sources statistics in debugfs cover deleted ones which
     used to be the case some time ago (zhuguangqing).

   - Clean up computations carried out during hibernation, update
     messages related to hibernation and fix a spelling mistake in one
     of them (Wen Yang, Luigi Semenzato, Colin Ian King).

   - Add mailmap entry for maintainer e-mail address that has not been
     functional for several years (Rafael Wysocki)"

* tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
  cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
  intel_idle: Clean up irtl_2_usec()
  intel_idle: Move 3 functions closer to their callers
  intel_idle: Annotate initialization code and data structures
  intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
  intel_idle: Rearrange intel_idle_cpuidle_driver_init()
  intel_idle: Clean up NULL pointer check in intel_idle_init()
  intel_idle: Fold intel_idle_probe() into intel_idle_init()
  intel_idle: Eliminate __setup_broadcast_timer()
  cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
  cpuidle: sysfs: fix warnings when compiling with W=1
  cpuidle: coupled: fix warnings when compiling with W=1
  cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
  PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
  PM / devfreq: Add debugfs support with devfreq_summary file
  Documentation: admin-guide: PM: Add intel_idle document
  cpuidle: arm: Enable compile testing for some of drivers
  PM-runtime: add tracepoints for usage_count changes
  cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
  PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
  ...
2020-01-27 11:23:54 -08:00
Linus Torvalds
0238d3c753 arm64 updates for 5.6
- New architecture features
 	* Support for Armv8.5 E0PD, which benefits KASLR in the same way as
 	  KPTI but without the overhead. This allows KPTI to be disabled on
 	  CPUs that are not affected by Meltdown, even is KASLR is enabled.
 
 	* Initial support for the Armv8.5 RNG instructions, which claim to
 	  provide access to a high bandwidth, cryptographically secure hardware
 	  random number generator. As well as exposing these to userspace, we
 	  also use them as part of the KASLR seed and to seed the crng once
 	  all CPUs have come online.
 
 	* Advertise a bunch of new instructions to userspace, including support
 	  for Data Gathering Hint, Matrix Multiply and 16-bit floating point.
 
 - Kexec
 	* Cleanups in preparation for relocating with the MMU enabled
 	* Support for loading crash dump kernels with kexec_file_load()
 
 - Perf and PMU drivers
 	* Cleanups and non-critical fixes for a couple of system PMU drivers
 
 - FPU-less (aka broken) CPU support
 	* Considerable fixes to support CPUs without the FP/SIMD extensions,
 	  including their presence in heterogeneous systems. Good luck finding
 	  a 64-bit userspace that handles this.
 
 - Modern assembly function annotations
 	* Start migrating our use of ENTRY() and ENDPROC() over to the
 	  new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended to
 	  aid debuggers
 
 - Kbuild
 	* Cleanup detection of LSE support in the assembler by introducing
 	  'as-instr'
 
 	* Remove compressed Image files when building clean targets
 
 - IP checksumming
 	* Implement optimised IPv4 checksumming routine when hardware offload
 	  is not in use. An IPv6 version is in the works, pending testing.
 
 - Hardware errata
 	* Work around Cortex-A55 erratum #1530923
 
 - Shadow call stack
 	* Work around some issues with Clang's integrated assembler not liking
 	  our perfectly reasonable assembly code
 
 	* Avoid allocating the X18 register, so that it can be used to hold the
 	  shadow call stack pointer in future
 
 - ACPI
 	* Fix ID count checking in IORT code. This may regress broken firmware
 	  that happened to work with the old implementation, in which case we'll
 	  have to revert it and try something else
 
 	* Fix DAIF corruption on return from GHES handler with pseudo-NMIs
 
 - Miscellaneous
 	* Whitelist some CPUs that are unaffected by Spectre-v2
 
 	* Reduce frequency of ASID rollover when KPTI is compiled in but
 	  inactive
 
 	* Reserve a couple of arch-specific PROT flags that are already used by
 	  Sparc and PowerPC and are planned for later use with BTI on arm64
 
 	* Preparatory cleanup of our entry assembly code in preparation for
 	  moving more of it into C later on
 
 	* Refactoring and cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl4oY+IQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNNfRB/4p3vax0hqaOnLRvmJPRXF31B8oPlivnr2u
 6HCA9LkdU5IlrgaTNOJ/sQEqJAPOPCU7v49Ol0iYw0iKL1suUE7Ikui5VB6Uybqt
 YbfF5UNzfXAMs2A86TF/hzqhxw+W+lpnZX8NVTuQeAODfHEGUB1HhTLfRi9INsER
 wKEAuoZyuSUibxTFvji+DAq7nVRniXX7CM7tE385pxDisCMuu/7E5wOl+3EZYXWz
 DTGzTbHXuVFL+UFCANFEUlAtmr3dQvPFIqAwVl/CxjRJjJ7a+/G3cYLsHFPrQCjj
 qYX4kfhAeeBtqmHL7YFNWFwFs5WaT5UcQquFO665/+uCTWSJpORY
 =AIh/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The changes are a real mixed bag this time around.

  The only scary looking one from the diffstat is the uapi change to
  asm-generic/mman-common.h, but this has been acked by Arnd and is
  actually just adding a pair of comments in an attempt to prevent
  allocation of some PROT values which tend to get used for
  arch-specific purposes. We'll be using them for Branch Target
  Identification (a CFI-like hardening feature), which is currently
  under review on the mailing list.

  New architecture features:

   - Support for Armv8.5 E0PD, which benefits KASLR in the same way as
     KPTI but without the overhead. This allows KPTI to be disabled on
     CPUs that are not affected by Meltdown, even is KASLR is enabled.

   - Initial support for the Armv8.5 RNG instructions, which claim to
     provide access to a high bandwidth, cryptographically secure
     hardware random number generator. As well as exposing these to
     userspace, we also use them as part of the KASLR seed and to seed
     the crng once all CPUs have come online.

   - Advertise a bunch of new instructions to userspace, including
     support for Data Gathering Hint, Matrix Multiply and 16-bit
     floating point.

  Kexec:

   - Cleanups in preparation for relocating with the MMU enabled

   - Support for loading crash dump kernels with kexec_file_load()

  Perf and PMU drivers:

   - Cleanups and non-critical fixes for a couple of system PMU drivers

  FPU-less (aka broken) CPU support:

   - Considerable fixes to support CPUs without the FP/SIMD extensions,
     including their presence in heterogeneous systems. Good luck
     finding a 64-bit userspace that handles this.

  Modern assembly function annotations:

   - Start migrating our use of ENTRY() and ENDPROC() over to the
     new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended
     to aid debuggers

  Kbuild:

   - Cleanup detection of LSE support in the assembler by introducing
     'as-instr'

   - Remove compressed Image files when building clean targets

  IP checksumming:

   - Implement optimised IPv4 checksumming routine when hardware offload
     is not in use. An IPv6 version is in the works, pending testing.

  Hardware errata:

   - Work around Cortex-A55 erratum #1530923

  Shadow call stack:

   - Work around some issues with Clang's integrated assembler not
     liking our perfectly reasonable assembly code

   - Avoid allocating the X18 register, so that it can be used to hold
     the shadow call stack pointer in future

  ACPI:

   - Fix ID count checking in IORT code. This may regress broken
     firmware that happened to work with the old implementation, in
     which case we'll have to revert it and try something else

   - Fix DAIF corruption on return from GHES handler with pseudo-NMIs

  Miscellaneous:

   - Whitelist some CPUs that are unaffected by Spectre-v2

   - Reduce frequency of ASID rollover when KPTI is compiled in but
     inactive

   - Reserve a couple of arch-specific PROT flags that are already used
     by Sparc and PowerPC and are planned for later use with BTI on
     arm64

   - Preparatory cleanup of our entry assembly code in preparation for
     moving more of it into C later on

   - Refactoring and cleanup"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits)
  arm64: acpi: fix DAIF manipulation with pNMI
  arm64: kconfig: Fix alignment of E0PD help text
  arm64: Use v8.5-RNG entropy for KASLR seed
  arm64: Implement archrandom.h for ARMv8.5-RNG
  arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
  arm64: entry: Avoid empty alternatives entries
  arm64: Kconfig: select HAVE_FUTEX_CMPXCHG
  arm64: csum: Fix pathological zero-length calls
  arm64: entry: cleanup sp_el0 manipulation
  arm64: entry: cleanup el0 svc handler naming
  arm64: entry: mark all entry code as notrace
  arm64: assembler: remove smp_dmb macro
  arm64: assembler: remove inherit_daif macro
  ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
  mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use
  arm64: Use macros instead of hard-coded constants for MAIR_EL1
  arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list
  arm64: kernel: avoid x18 in __cpu_soft_restart
  arm64: kvm: stop treating register x18 as caller save
  arm64/lib: copy_page: avoid x18 register in assembler code
  ...
2020-01-27 08:58:19 -08:00
Rafael J. Wysocki
245224d1cb Merge branches 'pm-cpufreq' and 'pm-sleep'
* pm-cpufreq:
  cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
  cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
  cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
  cpufreq: s3c: fix unbalances of cpufreq policy refcount
  cpufreq: imx-cpufreq-dt: Add i.MX8MP support
  cpufreq: Use imx-cpufreq-dt for i.MX8MP's speed grading
  cpufreq: tegra186: convert to devm_platform_ioremap_resource
  cpufreq: kirkwood: convert to devm_platform_ioremap_resource
  cpufreq: CPPC: put ACPI table after using it
  cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched

* pm-sleep:
  PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
  PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
  PM: hibernate: Add more logging on hibernation failure
  PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
  PM: wakeup: Show statistics for deleted wakeup sources again
  PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()
2020-01-27 11:29:09 +01:00
Linus Torvalds
34597c85be Various tracing fixes:
- Fix a function comparison warning for a xen trace event macro
  - Fix a double perf_event linking to a trace_uprobe_filter for multiple events
  - Fix suspicious RCU warnings in trace event code for using
     list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
  - Fix a bug in the histogram code when using the same variable
  - Fix a NULL pointer dereference when tracefs lockdown enabled and calling
     trace_set_default_clock()
 
 This v2 version contains:
 
  - A fix to a bug found with the double perf_event linking patch
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXinakBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhNZAQCi86p9eW3f3w7hM2hZcirC+mQKVZgp
 2rO4zIAK5V6G7gEAh6I7VZa50a6AE647ZjryE7ufTRUhmSFMWoG0kcJ7OAk=
 =/J9n
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a function comparison warning for a xen trace event macro

   - Fix a double perf_event linking to a trace_uprobe_filter for
     multiple events

   - Fix suspicious RCU warnings in trace event code for using
     list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.

   - Fix a bug in the histogram code when using the same variable

   - Fix a NULL pointer dereference when tracefs lockdown enabled and
     calling trace_set_default_clock()

   - A fix to a bug found with the double perf_event linking patch"

* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
  tracing: Do not set trace clock if tracefs lockdown is in effect
  tracing: Fix histogram code when expression has same var as value
  tracing: trigger: Replace unneeded RCU-list traversals
  tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
  tracing: xen: Ordered comparison of function pointers
2020-01-23 11:23:37 -08:00
Linus Torvalds
3a83c8c81c Power management fix for 5.5-rc8
Prevent the kernel from crashing during resume from hibernation
 if free pages contain leftover data from the restore kernel and
 init_on_free is set (Alexander Potapenko).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl4psjQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxpOcP/1UTGUr+VdGfBBjG5WlgCY0Jrd50Y78b
 RoVNDR/NvSVSuIs44AgrnfyQiz2Y8jG6qY1iSAbIXmFl37/4+kGhkNmd6pV/xFUc
 TdZZotFwFlRQjmeQxxH0kNXuAY6nJ2RwELWrjXqM8PuNjNKIEpfS+0fSaWexHqIm
 MDArxcDHkvZU5SnnRQM+LkT/EmbEheB7tgm7vGGqMLsSKc0gUsBmVCURe/lLAH5o
 EUKX4FI2jCy+LlmSdZ3EDjf1cstm3YXLiegTLSq1Jh3mFHXkFTwJMmidiz21qXJh
 Hc4r3iG0NZ37J8HXpwuq++KlhvNbhHJz+ZgC1IYls16RNYh5mUzxtMHWdSyyqlrW
 +z8gBVUyeJUYos5Kjb/NKSt43gnz7Uhy0UVbQXD66hgajXe71CQZQq/D5CMeTdJL
 jWNaeGYnhskz3IW2vnrs9Ucf6RHHWezXk51kVsyJXadiLhTdOv7DKahDKVwC/Hvf
 kyN1W0F5PZpF50yYmnhJgqDfxkGBNKpwXxTAGk6X0WQFaWeh/2FkX045UdJzBbHu
 fa1taTM/5RfPlbWq0wLPeHHSP4M2I0ndeWXZk88vUwwMfm9Wo+FNBhgs/EZfeuKD
 16sVMsX0r7R1bG+hEj+mvNeLWqfgz7MpGludCkV1dHIDkn2esxx6JqaWxR/UnLHb
 D3fZM/cKGdpY
 =kDFG
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Prevent the kernel from crashing during resume from hibernation if
  free pages contain leftover data from the restore kernel and
  init_on_free is set (Alexander Potapenko)"

* tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: fix crashes with init_on_free=1
2020-01-23 11:10:21 -08:00
Rafael J. Wysocki
322e929d19 Merge back new material related to system-wide PM for v5.6. 2020-01-23 16:00:56 +01:00
Masami Hiramatsu
b61387cb73 tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
Commit 99c9a923e9 ("tracing/uprobe: Fix double perf_event
linking on multiprobe uprobe") moved trace_uprobe_filter on
trace_probe_event. However, since it introduced a flexible
data structure with char array and type casting, the
alignment of trace_uprobe_filter can be broken.

This changes the type of the array to trace_uprobe_filter
data strucure to fix it.

Link: http://lore.kernel.org/r/20200120124022.GA14897@hirez.programming.kicks-ass.net
Link: http://lkml.kernel.org/r/157966340499.5107.10978352478952144902.stgit@devnote2

Fixes: 99c9a923e9 ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-22 07:09:20 -05:00
Masami Ichikawa
bf24daac8f tracing: Do not set trace clock if tracefs lockdown is in effect
When trace_clock option is not set and unstable clcok detected,
tracing_set_default_clock() sets trace_clock(ThinkPad A285 is one of
case). In that case, if lockdown is in effect, null pointer
dereference error happens in ring_buffer_set_clock().

Link: http://lkml.kernel.org/r/20200116131236.3866925-1-masami256@gmail.com

Cc: stable@vger.kernel.org
Fixes: 17911ff38a ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1788488
Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-20 16:18:14 -05:00
Steven Rostedt (VMware)
8bcebc77e8 tracing: Fix histogram code when expression has same var as value
While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:

 # echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
 # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
 # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

Would not display any histograms in the sched_switch histogram side.

But if I were to swap the location of

  "delta=common_timestamp-$start" with "start2=$start"

Such that the last line had:

 # echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

The histogram works as expected.

What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:

  delta=common_timestamp-$start

The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.

When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.

From Tom Zanussi:

 "Just to clarify some more about what the problem was is that without
  your patch, we would have two separate references to the same variable,
  and during resolve_var_refs(), they'd both want to be resolved
  separately, so in this case, since the first reference to start wasn't
  part of an expression, it wouldn't get the read-once flag set, so would
  be read normally, and then the second reference would do the read-once
  read and also be read but using read-once.  So everything worked and
  you didn't see a problem:

   from: start2=$start,delta=common_timestamp-$start

  In the second case, when you switched them around, the first reference
  would be resolved by doing the read-once, and following that the second
  reference would try to resolve and see that the variable had already
  been read, so failed as unset, which caused it to short-circuit out and
  not do the trigger action to generate the synthetic event:

   to: delta=common_timestamp-$start,start2=$start

  With your patch, we only have the single resolution which happens
  correctly the one time it's resolved, so this can't happen."

Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-20 16:11:47 -05:00
Linus Torvalds
11a8272947 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix non-blocking connect() in x25, from Martin Schiller.

 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.

 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.

 4) Limit size of TSO packets properly in lan78xx driver, from Eric
    Dumazet.

 5) r8152 probe needs an endpoint sanity check, from Johan Hovold.

 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
    John Fastabend.

 7) hns3 needs short frames padded on transmit, from Yunsheng Lin.

 8) Fix netfilter ICMP header corruption, from Eyal Birger.

 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.

10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.

11) Fix memory leak in act_ctinfo, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  cxgb4: reject overlapped queues in TC-MQPRIO offload
  cxgb4: fix Tx multi channel port rate limit
  net: sched: act_ctinfo: fix memory leak
  bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
  bnxt_en: Fix ipv6 RFS filter matching logic.
  bnxt_en: Fix NTUPLE firmware command failures.
  net: systemport: Fixed queue mapping in internal ring map
  net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
  net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
  net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
  net: hns: fix soft lockup when there is not enough memory
  net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
  net/sched: act_ife: initalize ife->metalist earlier
  netfilter: nat: fix ICMP header corruption on ICMP errors
  net: wan: lapbether.c: Use built-in RCU list checking
  netfilter: nf_tables: fix flowtable list del corruption
  netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
  netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
  netfilter: nft_tunnel: ERSPAN_VERSION must not be null
  netfilter: nft_tunnel: fix null-attribute check
  ...
2020-01-19 12:03:53 -08:00
Linus Torvalds
7ff15cd045 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Three fixes: fix link failure on Alpha, fix a Sparse warning and
  annotate/robustify a lockless access in the NOHZ code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/sched: Annotate lockless access to last_jiffies_update
  lib/vdso: Make __cvdso_clock_getres() static
  time/posix-stubs: Provide compat itimer supoprt for alpha
2020-01-18 13:00:59 -08:00
Linus Torvalds
9e79c52332 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu/SMT fix from Ingo Molnar:
 "Fix a build bug on CONFIG_HOTPLUG_SMT=y && !CONFIG_SYSFS kernels"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/SMT: Fix x86 link error without CONFIG_SYSFS
2020-01-18 12:57:41 -08:00
Linus Torvalds
b07b9e8d63 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes, three Intel uncore driver fixes, plus an AUX events fix
  uncovered by the perf fuzzer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Remove PCIe3 unit for SNR
  perf/x86/intel/uncore: Fix missing marker for snr_uncore_imc_freerunning_events
  perf/x86/intel/uncore: Add PCI ID of IMC for Xeon E3 V5 Family
  perf: Correctly handle failed perf_get_aux_event()
  perf hists: Fix variable name's inconsistency in hists__for_each() macro
  perf map: Set kmap->kmaps backpointer for main kernel map chunks
  perf report: Fix incorrectly added dimensions as switch perf data file
  tools lib traceevent: Fix memory leakage in filter_event
2020-01-18 12:55:19 -08:00
Linus Torvalds
124b5547ec Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Three fixes:

    - Fix an rwsem spin-on-owner crash, introduced in v5.4

    - Fix a lockdep bug when running out of stack_trace entries,
      introduced in v5.4

    - Docbook fix"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
  futex: Fix kernel-doc notation warning
  locking/lockdep: Fix buffer overrun problem in stack_trace[]
2020-01-18 12:53:28 -08:00
Linus Torvalds
ba0f472203 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
 "Two rseq bugfixes:

   - CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end
     up corrupting the TLS of the parent. Technically a change in the
     ABI but the previous behavior couldn't resonably have been relied
     on by applications so this looks like a valid exception to the ABI
     rule.

   - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the
     handling of other flags. This is not thought to impact any
     applications either"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Unregister rseq for clone CLONE_VM
  rseq: Reject unknown flags on rseq unregister
2020-01-18 12:29:13 -08:00
Linus Torvalds
8cac89909a for-linus-2020-01-18
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXiL/qwAKCRCRxhvAZXjc
 oln5AP9ITypHs2iNWl1Cbte++y2iflWevDyPUrmagegqpKwbJAD9EypY0RVDor8T
 LXWK4WaNgB0K0MK/gSPRAlgx9ejNwA4=
 =6xXo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "Here is an urgent fix for ptrace_may_access() permission checking.

  Commit 69f594a389 ("ptrace: do not audit capability check when
  outputing /proc/pid/stat") introduced the ability to opt out of audit
  messages for accesses to various proc files since they are not
  violations of policy.

  While doing so it switched the check from ns_capable() to
  has_ns_capability{_noaudit}(). That means it switched from checking
  the subjective credentials (ktask->cred) of the task to using the
  objective credentials (ktask->real_cred). This is appears to be wrong.
  ptrace_has_cap() is currently only used in ptrace_may_access() And is
  used to check whether the calling task (subject) has the
  CAP_SYS_PTRACE capability in the provided user namespace to operate on
  the target task (object). According to the cred.h comments this means
  the subjective credentials of the calling task need to be used.

  With this fix we switch ptrace_has_cap() to use security_capable() and
  thus back to using the subjective credentials.

  As one example where this might be particularly problematic, Jann
  pointed out that in combination with the upcoming IORING_OP_OPENAT{2}
  feature, this bug might allow unprivileged users to bypass the
  capability checks while asynchronously opening files like /proc/*/mem,
  because the capability checks for this would be performed against
  kernel credentials.

  To illustrate on the former point about this being exploitable: When
  io_uring creates a new context it records the subjective credentials
  of the caller. Later on, when it starts to do work it creates a kernel
  thread and registers a callback. The callback runs with kernel creds
  for ktask->real_cred and ktask->cred.

  To prevent this from becoming a full-blown 0-day io_uring will call
  override_cred() and override ktask->cred with the subjective
  credentials of the creator of the io_uring instance. With
  ptrace_has_cap() currently looking at ktask->real_cred this override
  will be ineffective and the caller will be able to open arbitray proc
  files as mentioned above.

  Luckily, this is currently not exploitable but would be so once
  IORING_OP_OPENAT{2} land in v5.6. Let's fix it now.

  To minimize potential regressions I successfully ran the criu
  testsuite. criu makes heavy use of ptrace() and extensively hits
  ptrace_may_access() codepaths and has a good change of detecting any
  regressions.

  Additionally, I succesfully ran the ptrace and seccomp kernel tests"

* tag 'for-linus-2020-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
2020-01-18 12:23:31 -08:00
Christian Brauner
6b3ad6649a
ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
Commit 69f594a389 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to various
proc files since they are not violations of policy.  While doing so it
somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. This
is wrong since. ptrace_has_cap() is currently only used in
ptrace_may_access() And is used to check whether the calling task (subject)
has the CAP_SYS_PTRACE capability in the provided user namespace to operate
on the target task (object). According to the cred.h comments this would
mean the subjective credentials of the calling task need to be used.
This switches ptrace_has_cap() to use security_capable(). Because we only
call ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling task's creds under rcu_read_lock() there's
no need to go through another series of dereferences and rcu locking done
in ns_capable{_noaudit}().

As one example where this might be particularly problematic, Jann pointed
out that in combination with the upcoming IORING_OP_OPENAT feature, this
bug might allow unprivileged users to bypass the capability checks while
asynchronously opening files like /proc/*/mem, because the capability
checks for this would be performed against kernel credentials.

To illustrate on the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials of the
caller. Later on, when it starts to do work it creates a kernel thread and
registers a callback. The callback runs with kernel creds for
ktask->real_cred and ktask->cred. To prevent this from becoming a
full-blown 0-day io_uring will call override_cred() and override
ktask->cred with the subjective credentials of the creator of the io_uring
instance. With ptrace_has_cap() currently looking at ktask->real_cred this
override will be ineffective and the caller will be able to open arbitray
proc files as mentioned above.
Luckily, this is currently not exploitable but will turn into a 0-day once
IORING_OP_OPENAT{2} land in v5.6. Fix it now!

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Jann Horn <jannh@google.com>
Fixes: 69f594a389 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-18 13:51:39 +01:00
Mark Rutland
da9ec3d3dd perf: Correctly handle failed perf_get_aux_event()
Vince reports a worrying issue:

| so I was tracking down some odd behavior in the perf_fuzzer which turns
| out to be because perf_even_open() sometimes returns 0 (indicating a file
| descriptor of 0) even though as far as I can tell stdin is still open.

... and further the cause:

| error is triggered if aux_sample_size has non-zero value.
|
| seems to be this line in kernel/events/core.c:
|
| if (perf_need_aux_event(event) && !perf_get_aux_event(event, group_leader))
|                goto err_locked;
|
| (note, err is never set)

This seems to be a thinko in commit:

  ab43762ef0 ("perf: Allow normal events to output AUX data")

... and we should probably return -EINVAL here, as this should only
happen when the new event is mis-configured or does not have a
compatible aux_event group leader.

Fixes: ab43762ef0 ("perf: Allow normal events to output AUX data")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
2020-01-17 11:32:44 +01:00
Waiman Long
39e7234f00 locking/rwsem: Fix kernel crash when spinning on RWSEM_OWNER_UNKNOWN
The commit 91d2a812df ("locking/rwsem: Make handoff writer
optimistically spin on owner") will allow a recently woken up waiting
writer to spin on the owner. Unfortunately, if the owner happens to be
RWSEM_OWNER_UNKNOWN, the code will incorrectly spin on it leading to a
kernel crash. This is fixed by passing the proper non-spinnable bits
to rwsem_spin_on_owner() so that RWSEM_OWNER_UNKNOWN will be treated
as a non-spinnable target.

Fixes: 91d2a812df ("locking/rwsem: Make handoff writer optimistically spin on owner")

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200115154336.8679-1-longman@redhat.com
2020-01-17 10:19:27 +01:00
Alexander Potapenko
18451f9f9e PM: hibernate: fix crashes with init_on_free=1
Upon resuming from hibernation, free pages may contain stale data from
the kernel that initiated the resume. This breaks the invariant
inflicted by init_on_free=1 that freed pages must be zeroed.

To deal with this problem, make clear_free_pages() also clear the free
pages when init_on_free is enabled.

Fixes: 6471384af2 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-16 23:51:45 +01:00
Jonas Meurer
c052bf82c6 PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
The sysfs attribute `/sys/power/sync_on_suspend` controls, whether or not
filesystems are synced by the kernel before system suspend.

Congruously, the behaviour of build-time switch CONFIG_SUSPEND_SKIP_SYNC
is slightly changed: It now defines the run-tim default for the new sysfs
attribute `/sys/power/sync_on_suspend`.

The run-time attribute is added because the existing corresponding
build-time Kconfig flag for (`CONFIG_SUSPEND_SKIP_SYNC`) is not flexible
enough. E.g. Linux distributions that provide pre-compiled kernels
usually want to stick with the default (sync filesystems before suspend)
but under special conditions this needs to be changed.

One example for such a special condition is user-space handling of
suspending block devices (e.g. using `cryptsetup luksSuspend` or `dmsetup
suspend`) before system suspend. The Kernel trying to sync filesystems
after the underlying block device already got suspended obviously leads
to dead-locks. Be aware that you have to take care of the filesystem sync
yourself before suspending the system in those scenarios.

Signed-off-by: Jonas Meurer <jonas@freesources.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-16 21:47:03 +01:00
David S. Miller
3981f955eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-01-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 13 files changed, 95 insertions(+), 43 deletions(-).

The main changes are:

1) Fix refcount leak for TCP time wait and request sockets for socket lookup
   related BPF helpers, from Lorenz Bauer.

2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.

3) Batch of several sockmap and related TLS fixes found while operating
   more complex BPF programs with Cilium and OpenSSL, from John Fastabend.

4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
   to avoid purging data upon teardown, from Lingpeng Chen.

5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
   dump a BPF map's value with BTF, from Martin KaFai Lau.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-16 10:04:40 +01:00
Daniel Borkmann
0af2ffc93a bpf: Fix incorrect verifier simulation of ARSH under ALU32
Anatoly has been fuzzing with kBdysch harness and reported a hang in one
of the outcomes:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (85) call bpf_get_socket_cookie#46
  1: R0_w=invP(id=0) R10=fp0
  1: (57) r0 &= 808464432
  2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  2: (14) w0 -= 810299440
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  3: (c4) w0 s>>= 1
  4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  4: (76) if w0 s>= 0x30303030 goto pc+216
  221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  221: (95) exit
  processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

  # ./bpftool p d x i 12
  0: (85) call bpf_get_socket_cookie#7800896
  1: (bf) r6 = r0
  2: (57) r6 &= 808464432
  3: (14) w6 -= 810299440
  4: (c4) w6 s>>= 1
  5: (76) if w6 s>= 0x30303030 goto pc+216
  6: (05) goto pc-1
  7: (05) goto pc-1
  8: (05) goto pc-1
  [...]
  220: (05) goto pc-1
  221: (05) goto pc-1
  222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed ("bpf: Fix
precision tracking for unbounded scalars"), that is, the fall-through
branch in the instruction 5 is considered to be never taken given the
conclusion from the min/max bounds tracking in w6, and therefore the
dead-code sanitation rewrites it as goto pc-1. However, real-life input
disagrees with verification analysis since a soft-lockup was observed.

The bug sits in the analysis of the ARSH. The definition is that we shift
the target register value right by K bits through shifting in copies of
its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the
register into 32 bit mode, same happens after simulating the operation.
However, for the case of simulating the actual ARSH, we don't take the
mode into account and act as if it's always 64 bit, but location of sign
bit is different:

  dst_reg->smin_value >>= umin_val;
  dst_reg->smax_value >>= umin_val;
  dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
for example return 0xffff. With the above ARSH simulation, we'd see the
following results:

  [...]
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (57) r0 &= 808464432
    -> R0_runtime = 0x3030
  3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  3: (14) w0 -= 810299440
    -> R0_runtime = 0xcfb40000
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                              (0xffffffff)
  4: (c4) w0 s>>= 1
    -> R0_runtime = 0xe7da0000
  5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                              (0x67c00000)           (0x7ffbfff8)
  [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that
is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
retained in 32 bit mode. In insn4, the umax was 0xffffffff, and changed into
0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000'
and means here that the simulation didn't retain the sign bit. With above
logic, the updates happen on the 64 bit min/max bounds and given we coerced
the register, the sign bits of the bounds are cleared as well, meaning, we
need to force the simulation into s32 space for 32 bit alu mode.

Verification after the fix below. We're first analyzing the fall-through branch
on 32 bit signed >= test eventually leading to rejection of the program in this
specific case:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r2 = 808464432
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (bf) r6 = r0
  3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
  3: (57) r6 &= 808464432
  4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  4: (14) w6 -= 810299440
  5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  5: (c4) w6 s>>= 1
  6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                                              (0x67c00000)          (0xfffbfff8)
  6: (76) if w6 s>= 0x30303030 goto pc+216
  7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
  7: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32 ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
2020-01-15 13:39:59 -08:00
Eric Dumazet
de95a991bb tick/sched: Annotate lockless access to last_jiffies_update
syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64():

BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64

write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1:
 tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73
 tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138
 tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292
 __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
 __hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576
 hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
 smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline]
 kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436
 check_access kernel/kcsan/core.c:466 [inline]
 __tsan_read1 kernel/kcsan/core.c:593 [inline]
 __tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593
 kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79
 kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170
 insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline]
 debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256
 full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225
 __vfs_write+0x67/0xc0 fs/read_write.c:494
 vfs_write fs/read_write.c:558 [inline]
 vfs_write+0x18a/0x390 fs/read_write.c:542
 ksys_write+0xd5/0x1b0 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x4c/0x60 fs/read_write.c:620
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0:
 tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62
 tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline]
 tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline]
 tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274
 irq_enter+0x4f/0x60 kernel/softirq.c:354
 entering_irq arch/x86/include/asm/apic.h:517 [inline]
 entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
 smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
 rest_init+0xec/0xf6 init/main.c:452
 arch_call_rest_init+0x17/0x37
 start_kernel+0x838/0x85e init/main.c:786
 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Use READ_ONCE() and WRITE_ONCE() to annotate this expected race.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191205045619.204946-1-edumazet@google.com
2020-01-15 10:54:12 +01:00
Masami Hiramatsu
aeed8aa387 tracing: trigger: Replace unneeded RCU-list traversals
With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
when I ran ftracetest trigger testcases.

-----
  # dmesg -c > /dev/null
  # ./ftracetest test.d/trigger
  ...
  # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
  kernel/trace/trace_events_hist.c:6070
  kernel/trace/trace_events_hist.c:1760
  kernel/trace/trace_events_hist.c:5911
  kernel/trace/trace_events_trigger.c:504
  kernel/trace/trace_events_hist.c:1810
  kernel/trace/trace_events_hist.c:3158
  kernel/trace/trace_events_hist.c:3105
  kernel/trace/trace_events_hist.c:5518
  kernel/trace/trace_events_hist.c:5998
  kernel/trace/trace_events_hist.c:6019
  kernel/trace/trace_events_hist.c:6044
  kernel/trace/trace_events_trigger.c:1500
  kernel/trace/trace_events_trigger.c:1540
  kernel/trace/trace_events_trigger.c:539
  kernel/trace/trace_events_trigger.c:584
-----

I investigated those warnings and found that the RCU-list
traversals in event trigger and hist didn't need to use
RCU version because those were called only under event_mutex.

I also checked other RCU-list traversals related to event
trigger list, and found that most of them were called from
event_hist_trigger_func() or hist_unregister_trigger() or
register/unregister functions except for a few cases.

Replace these unneeded RCU-list traversals with normal list
traversal macro and lockdep_assert_held() to check the
event_mutex is held.

Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 30350d65ac ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14 17:12:04 -05:00
Masami Hiramatsu
99c9a923e9 tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
Fix double perf_event linking to trace_uprobe_filter on
multiple uprobe event by moving trace_uprobe_filter under
trace_probe_event.

In uprobe perf event, trace_uprobe_filter data structure is
managing target mm filters (in perf_event) related to each
uprobe event.

Since commit 60d53e2c3b ("tracing/probe: Split trace_event
related data from trace_probe") left the trace_uprobe_filter
data structure in trace_uprobe, if a trace_probe_event has
multiple trace_uprobe (multi-probe event), a perf_event is
added to different trace_uprobe_filter on each trace_uprobe.
This leads a linked list corruption.

To fix this issue, move trace_uprobe_filter to trace_probe_event
and link it once on each event instead of each probe.

Link: http://lkml.kernel.org/r/157862073931.1800.3800576241181489174.stgit@devnote2

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: =?utf-8?q?Toke_H=C3=B8iland-J?= =?utf-8?b?w7hyZ2Vuc2Vu?= <thoiland@redhat.com>
Cc: Jean-Tsung Hsiao <jhsiao@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 60d53e2c3b ("tracing/probe: Split trace_event related data from trace_probe")
Link: https://lkml.kernel.org/r/20200108171611.GA8472@kernel.org
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-14 15:57:59 -05:00
Linus Torvalds
e033e7d4a8 Merge branch 'dhowells' (patches from DavidH)
Merge misc fixes from David Howells.

Two afs fixes and a key refcounting fix.

* dhowells:
  afs: Fix afs_lookup() to not clobber the version on a new dentry
  afs: Fix use-after-loss-of-ref
  keys: Fix request_key() cache
2020-01-14 09:56:31 -08:00
David Howells
8379bb84be keys: Fix request_key() cache
When the key cached by request_key() and co.  is cleaned up on exit(),
the code looks in the wrong task_struct, and so clears the wrong cache.
This leads to anomalies in key refcounting when doing, say, a kernel
build on an afs volume, that then trigger kasan to report a
use-after-free when the key is viewed in /proc/keys.

Fix this by making exit_creds() look in the passed-in task_struct rather
than in current (the task_struct cleanup code is deferred by RCU and
potentially run in another task).

Fixes: 7743c48e54 ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
Linus Torvalds
606e9ad200 clone3-tls-v5.5-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXhhtDQAKCRCRxhvAZXjc
 orQ3AQD7H2ovZbPIpWbwOnRIExBF4O8gPDfFc/J/RweZx40v/AD/QwfFnq0TpmUc
 UfS4zzLxJ4K+L4RYWId5v8MFHGIu8QQ=
 =LmmJ
 -----END PGP SIGNATURE-----

Merge tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "This contains a series of patches to fix CLONE_SETTLS when used with
  clone3().

  The clone3() syscall passes the tls argument through struct clone_args
  instead of a register. This means, all architectures that do not
  implement copy_thread_tls() but still support CLONE_SETTLS via
  copy_thread() expecting the tls to be located in a register argument
  based on clone() are currently unfortunately broken. Their tls value
  will be garbage.

  The patch series fixes this on all architectures that currently define
  __ARCH_WANT_SYS_CLONE3. It also adds a compile-time check to ensure
  that any architecture that enables clone3() in the future is forced to
  also implement copy_thread_tls().

  My ultimate goal is to get rid of the copy_thread()/copy_thread_tls()
  split and just have copy_thread_tls() at some point in the not too
  distant future (Maybe even renaming copy_thread_tls() back to simply
  copy_thread() once the old function is ripped from all arches). This
  is dependent now on all arches supporting clone3().

  While all relevant arches do that now there are still four missing:
  ia64, m68k, sh and sparc. They have the system call reserved, but not
  implemented. Once they all implement clone3() we can get rid of
  ARCH_WANT_SYS_CLONE3 and HAVE_COPY_THREAD_TLS.

  This series also includes a minor fix for the arm64 uapi headers which
  caused __NR_clone3 to be missing from the exported user headers.

  Unfortunately the series came in a little late especially given that
  it touches a range of architectures. Due to the holidays not all arch
  maintainers responded in time probably due to their backlog. Will and
  Arnd have thankfully acked the arm specific changes.

  Given that the changes are straightforward and rather minimal combined
  with the fact the that clone3() with CLONE_SETTLS is broken I decided
  to send them post rc3 nonetheless"

* tag 'clone3-tls-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  um: Implement copy_thread_tls
  clone3: ensure copy_thread_tls is implemented
  xtensa: Implement copy_thread_tls
  riscv: Implement copy_thread_tls
  parisc: Implement copy_thread_tls
  arm: Implement copy_thread_tls
  arm64: Implement copy_thread_tls
  arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers
2020-01-11 15:33:48 -08:00
Colin Ian King
5c0e9de065 PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
There is a spelling mistake in a pr_info message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-10 12:15:30 +01:00
Linus Torvalds
a5f48c7878 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Missing netns pointer init in arp_tables, from Florian Westphal.

 2) Fix normal tcp SACK being treated as D-SACK, from Pengcheng Yang.

 3) Fix divide by zero in sch_cake, from Wen Yang.

 4) Len passed to skb_put_padto() is wrong in qrtr code, from Carl
    Huang.

 5) cmd->obj.chunk is leaked in sctp code error paths, from Xin Long.

 6) cgroup bpf programs can be released out of order, fix from Roman
    Gushchin.

 7) Make sure stmmac debugfs entry name is changed when device name
    changes, from Jiping Ma.

 8) Fix memory leak in vlan_dev_set_egress_priority(), from Eric
    Dumazet.

 9) SKB leak in lan78xx usb driver, also from Eric Dumazet.

10) Ridiculous TCA_FQ_QUANTUM values configured can cause loops in fq
    packet scheduler, reject them. From Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  tipc: fix wrong connect() return code
  tipc: fix link overflow issue at socket shutdown
  netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present
  netfilter: conntrack: dccp, sctp: handle null timeout argument
  atm: eni: fix uninitialized variable warning
  macvlan: do not assume mac_header is set in macvlan_broadcast()
  net: sch_prio: When ungrafting, replace with FIFO
  mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO
  MAINTAINERS: Remove myself as co-maintainer for qcom-ethqos
  gtp: fix bad unlock balance in gtp_encap_enable_socket
  pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
  tipc: remove meaningless assignment in Makefile
  tipc: do not add socket.o to tipc-y twice
  net: stmmac: dwmac-sun8i: Allow all RGMII modes
  net: stmmac: dwmac-sunxi: Allow all RGMII modes
  net: usb: lan78xx: fix possible skb leak
  net: stmmac: Fixed link does not need MDIO Bus
  vlan: vlan_changelink() should propagate errors
  vlan: fix memory leak in vlan_dev_set_egress_priority
  stmmac: debugfs entry name is not be changed when udev rename device name.
  ...
2020-01-09 10:34:07 -08:00
Arnd Bergmann
f35deaff1b time/posix-stubs: Provide compat itimer supoprt for alpha
Using compat_sys_getitimer and compat_sys_setitimer on alpha
causes a link failure in the Alpha tinyconfig and other configurations
that turn off CONFIG_POSIX_TIMERS.

Use the same #ifdef check for the stub version as well.

Fixes: 4c22ea2b91 ("y2038: use compat_{get,set}_itimer on alpha")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20191207191043.656328-1-arnd@arndb.de
2020-01-09 18:20:23 +01:00
Arnd Bergmann
dc8d37ed30 cpu/SMT: Fix x86 link error without CONFIG_SYSFS
When CONFIG_SYSFS is disabled, but CONFIG_HOTPLUG_SMT is enabled,
the kernel fails to link:

arch/x86/power/cpu.o: In function `hibernate_resume_nonboot_cpu_disable':
(.text+0x38d): undefined reference to `cpuhp_smt_enable'
arch/x86/power/hibernate.o: In function `arch_resume_nosmt':
hibernate.c:(.text+0x291): undefined reference to `cpuhp_smt_enable'
hibernate.c:(.text+0x29c): undefined reference to `cpuhp_smt_disable'

Move the exported functions out of the #ifdef section into its
own with the correct conditions.

The patch that caused this is marked for stable backports, so
this one may need to be backported as well.

Fixes: ec527c3180 ("x86/power: Fix 'nosmt' vs hibernation triple fault during resume")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191210195614.786555-1-arnd@arndb.de
2020-01-09 17:31:45 +01:00
Randy Dunlap
51bfb1d11d futex: Fix kernel-doc notation warning
Fix a kernel-doc warning in kernel/futex.c by adding notation
for @ret.

../kernel/futex.c:1187: warning: Function parameter or member 'ret' not described in 'wait_for_owner_exiting'

Fixes: 3ef240eaff ("futex: Prevent exit livelock")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/223be78c-f3c8-52df-836d-c5fb8e7907e9@infradead.org
2020-01-09 13:23:40 +01:00
Pavel Tatashin
de68e4daea kexec: add machine_kexec_post_load()
It is the same as machine_kexec_prepare(), but is called after segments are
loaded. This way, can do processing work with already loaded relocation
segments. One such example is arm64: it has to have segments loaded in
order to create a page table, but it cannot do it during kexec time,
because at that time allocations won't be possible anymore.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 16:32:55 +00:00
Pavel Tatashin
d42cc530b1 kexec: quiet down kexec reboot
Here is a regular kexec command sequence and output:
=====
$ kexec --reuse-cmdline -i --load Image
$ kexec -e
[  161.342002] kexec_core: Starting new kernel

Welcome to Buildroot
buildroot login:
=====

Even when "quiet" kernel parameter is specified, "kexec_core: Starting
new kernel" is printed.

This message has  KERN_EMERG level, but there is no emergency, it is a
normal kexec operation, so quiet it down to appropriate KERN_NOTICE.

Machines that have slow console baud rate benefit from less output.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08 16:32:55 +00:00
Amanieu d'Antras
dd499f7a7e
clone3: ensure copy_thread_tls is implemented
copy_thread implementations handle CLONE_SETTLS by reading the TLS
value from the registers containing the syscall arguments for
clone. This doesn't work with clone3 since the TLS value is passed
in clone_args instead.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: <stable@vger.kernel.org> # 5.3.x
Link: https://lore.kernel.org/r/20200102172413.654385-8-amanieu@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-07 13:31:27 +01:00
Luigi Semenzato
7a7b99bf80 PM: hibernate: Add more logging on hibernation failure
Hibernation fails when the kernel cannot allocate enough memory
to copy all pages of RAM in use.

Ensure that the failure reason is clearly logged, and clearly
attributable to the hibernation module.

Signed-off-by: Luigi Semenzato <semenzato@google.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-07 13:31:12 +01:00
Wen Yang
809ed78a83 PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
do_div() does a 64-by-32 division. Use div64_u64() instead of
do_div() if the divisor is u64, to avoid truncation to 32-bit.

This change also cleans up code a tad.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-07 12:42:56 +01:00
Linus Torvalds
ae6088216c Various tracing fixes:
- kbuild found missing define of MCOUNT_INSN_SIZE for various build configs
  - Initialize variable to zero as gcc thinks it is used undefined
     (it really isn't but the code is subtle enough that this doesn't hurt)
  - Convert from do_div() to div64_ull() to prevent potential divide by zero
  - Unregister a trace point on error path in sched_wakeup tracer
  - Use signed offset for archs that can have stext not be first
  - A simple indentation fix (whitespace error)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXhOj6xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qukzAQCMNfkAbMFA+C1uORMhr/jWhi4eshWN
 4jZ2u5X8zGuuXQD+PaQU4n8d0K4uCPF+lFD16DfFxXvCOXHfN3/zXmxGvw8=
 =djaW
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - kbuild found missing define of MCOUNT_INSN_SIZE for various build
     configs

   - Initialize variable to zero as gcc thinks it is used undefined (it
     really isn't but the code is subtle enough that this doesn't hurt)

   - Convert from do_div() to div64_ull() to prevent potential divide by
     zero

   - Unregister a trace point on error path in sched_wakeup tracer

   - Use signed offset for archs that can have stext not be first

   - A simple indentation fix (whitespace error)"

* tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix indentation issue
  kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
  tracing: Change offset type to s32 in preempt/irq tracepoints
  ftrace: Avoid potential division by zero in function profiler
  tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
  tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
  tracing: Initialize val to zero in parse_entry of inject code
2020-01-06 15:38:38 -08:00
Daniel Borkmann
6d4f151acf bpf: Fix passing modified ctx to ld/abs/ind instruction
Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b5 ("bpf/verifier: rework value tracking").

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
2020-01-06 14:19:47 -08:00
Roman Gushchin
e10360f815 bpf: cgroup: prevent out-of-order release of cgroup bpf
Before commit 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.

But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to an use-after-free
bug.

To reproduce the issue the following script can be used:

  #!/bin/bash

  CGROOT=/sys/fs/cgroup

  mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
  sleep 1

  ./test_cgrp2_attach ${CGROOT}/A egress &
  A_PID=$!
  ./test_cgrp2_attach ${CGROOT}/B egress &
  B_PID=$!

  echo $$ > ${CGROOT}/A/C/cgroup.procs
  iperf -s &
  S_PID=$!
  iperf -c localhost -t 100 &
  C_PID=$!

  sleep 1

  echo $$ > ${CGROOT}/B/cgroup.procs
  echo ${S_PID} > ${CGROOT}/B/cgroup.procs
  echo ${C_PID} > ${CGROOT}/B/cgroup.procs

  sleep 1

  rmdir ${CGROOT}/A/C
  rmdir ${CGROOT}/A

  sleep 1

  kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}

On the unpatched kernel the following stacktrace can be obtained:

[   33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[   33.620677] #PF: supervisor read access in kernel mode
[   33.621293] #PF: error_code(0x0000) - not-present page
[   33.622754] Oops: 0000 [#1] SMP NOPTI
[   33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[   33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[   33.635809] Call Trace:
[   33.636118]  ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[   33.636728]  ? __switch_to_asm+0x40/0x70
[   33.637196]  ip_finish_output+0x68/0xa0
[   33.637654]  ip_output+0x76/0xf0
[   33.638046]  ? __ip_finish_output+0x1c0/0x1c0
[   33.638576]  __ip_queue_xmit+0x157/0x410
[   33.639049]  __tcp_transmit_skb+0x535/0xaf0
[   33.639557]  tcp_write_xmit+0x378/0x1190
[   33.640049]  ? _copy_from_iter_full+0x8d/0x260
[   33.640592]  tcp_sendmsg_locked+0x2a2/0xdc0
[   33.641098]  ? sock_has_perm+0x10/0xa0
[   33.641574]  tcp_sendmsg+0x28/0x40
[   33.641985]  sock_sendmsg+0x57/0x60
[   33.642411]  sock_write_iter+0x97/0x100
[   33.642876]  new_sync_write+0x1b6/0x1d0
[   33.643339]  vfs_write+0xb6/0x1a0
[   33.643752]  ksys_write+0xa7/0xe0
[   33.644156]  do_syscall_64+0x5b/0x1b0
[   33.644605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.

This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.

Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.

Fixes: 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-06 14:00:30 -08:00
Shakeel Butt
84029fd04c memcg: account security cred as well to kmemcg
The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage on our production and on further
inspection, we found a buggy application leaking processes.  Though that
buggy application was contained within its memcg but we observe much
more system memory overhead, couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/20191205223721.40034-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Colin Ian King
72879ee0c5 tracing: Fix indentation issue
There is a declaration that is indented one level too deeply, remove
the extraneous tab.

Link: http://lkml.kernel.org/r/20191221154825.33073-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-03 15:20:46 -05:00
Linus Torvalds
d9c82fd8c8 for-linus-2020-01-03
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXg9C5wAKCRCRxhvAZXjc
 oiZXAPsGFXyDCWlKnShBpKufdFh6XugADlyZK0Si2ISWQoJJsgD/Ri1g3zg6V7YC
 HBG0sz8+vSk/Ys55yDQz+K1d1MTkdQ4=
 =8uQe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "Here are two fixes:

   - Panic earlier when global init exits to generate useable coredumps.

     Currently, when global init and all threads in its thread-group
     have exited we panic via:

       do_exit()
       -> exit_notify()
          -> forget_original_parent()
             -> find_child_reaper()

     This makes it hard to extract a useable coredump for global init
     from a kernel crashdump because by the time we panic exit_mm() will
     have already released global init's mm. We now panic slightly
     earlier. This has been a problem in certain environments such as
     Android.

   - Fix a race in assigning and reading taskstats for thread-groups
     with more than one thread.

     This patch has been waiting for quite a while since people
     disagreed on what the correct fix was at first"

* tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  exit: panic before exit_mm() on global init exit
  taskstats: fix data-race
2020-01-03 11:17:14 -08:00
Kaitao Cheng
50f9ad607e kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
In the function, if register_trace_sched_migrate_task() returns error,
sched_switch/sched_wakeup_new/sched_wakeup won't unregister. That is
why fail_deprobe_sched_switch was added.

Link: http://lkml.kernel.org/r/20191231133530.2794-1-pilgrimtao@gmail.com

Cc: stable@vger.kernel.org
Fixes: 478142c39c ("tracing: do not grab lock in wakeup latency function tracing")
Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-03 11:43:03 -05:00
Wen Yang
e31f7939c1 ftrace: Avoid potential division by zero in function profiler
The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

Cc: stable@vger.kernel.org
Fixes: e330b3bcd8 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc5 ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 22:14:57 -05:00
Steven Rostedt (VMware)
b8299d362d tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
On some archs with some configurations, MCOUNT_INSN_SIZE is not defined, and
this makes the stack tracer fail to compile. Just define it to zero in this
case.

Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

Cc: stable@vger.kernel.org
Fixes: 4df297129f ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 22:04:07 -05:00
Steven Rostedt (VMware)
d2ccbccb54 tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
In order to handle direct calls along side of function graph tracer, a check
is made to see if the address being traced by the function graph tracer is a
direct call or not. To get the address used by direct callers, the return
address is subtracted by MCOUNT_INSN_SIZE.

For some archs with certain configurations, MCOUNT_INSN_SIZE is undefined
here. But these should not be using direct calls anyway. Just define
MCOUNT_INSN_SIZE to zero in this case.

Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ff205766db ("ftrace: Fix function_graph tracer interaction with BPF trampoline")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02 21:56:44 -05:00
Linus Torvalds
bf6dd9a58e Fixes for seccomp_notify_ioctl uapi sanity
- Fix samples and selftests to zero passed-in buffer (Sargun Dhillon)
 - Enforce zeroed buffer checking (Sargun Dhillon)
 - Verify buffer sanity check in selftest (Sargun Dhillon)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4OX5wWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJtJZD/4iLG7mOUQNXdcPidjcIMO/tjST
 UzW+9Cb3buePgmCHO9v1TKGL29fVwP5TkuxdrBYDGrJ4rEYANSDX0aNmpHsO8/8M
 2/B/Lo/f9cxFgoKI4QLY2XZ1YR+zkH980mtIG7ZcpYjsNl5AwmT27m2lo6iE7J+x
 7rsaTRPFmUfgbblB6Z5gNwwATudrWJgq066lY2fg3GADP81s6lGQB+ul8rtu84ME
 mTvtb3w6piJb3E+DeYY8p4ykyiewDuYqZWDY+dvWi3kRDjNWX+yFJaPW0YNhM+yh
 HaMXnbuh6gDyCbeUHorC9ypQhJJKzEWCUW8e60BND+fOFCdKMa1AdCtlXWHjrXDQ
 x9hUgQ3UhEedYtQeYtYuoltf0W8Ft4wAapxKJJRegYPQ0RPOgcfdAg4UquusCaLo
 fWK2Hy4XFrxOwISqsFUczUVkBcXl+w0GGH59pSyTImgoQPlTpbVP6f7Axbl+qpKo
 pqOe4bO8curLGlZpdBN6syR5Ik0bizQK0kDZeo+wPmEClp/1zJWMJ4MTP4T80rxY
 74DiQyfNH2iHfsOkdfHCsJC3jM8nmdKk5wMqtrAiIoT8/vdTBgumHrnmkORWFf8c
 R/NHCCLVs9q9sKV0s+VUR3OM2RjqpG1Wo/EBjTlbDQnibC5qdha8X2uVJWIHiF61
 ZgwZ9BoKV/+mKSqTAQ==
 =WgBI
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
 "Fixes for seccomp_notify_ioctl uapi sanity from Sargun Dhillon.

  The bulk of this is fixing the surrounding samples and selftests so
  that seccomp can correctly validate the seccomp_notify_ioctl buffer as
  being initially zeroed.

  Summary:

   - Fix samples and selftests to zero passed-in buffer

   - Enforce zeroed buffer checking

   - Verify buffer sanity check in selftest"

* tag 'seccomp-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: Catch garbage on SECCOMP_IOCTL_NOTIF_RECV
  seccomp: Check that seccomp_notif is zeroed out by the user
  selftests/seccomp: Zero out seccomp_notif
  samples/seccomp: Zero out members based on seccomp_notif_sizes
2020-01-02 16:42:10 -08:00