Commit graph

593831 commits

Author SHA1 Message Date
Dan Carpenter
810810ffb2 qede: uninitialized variable in qede_start_xmit()
"data_split" was never set to false.  It's just uninitialized.

Fixes: 2950219d87 ('qede: Add basic network device support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08 23:31:53 -04:00
Linus Torvalds
44549e8f5e Linux 4.6-rc7 2016-05-08 14:38:32 -07:00
Ingo Molnar
4abac0d0bb MAINTAINERS: Add mmiotrace entry
The Nouveau maintainers would like to follow and review mmiotrace
changes as well, so create a separate entry for that code. The high
level bits are living in the tracing code, the low level bits in the
x86 code.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Pekka Paalanen <ppaalanen@gmail.com>
Acked-by: karol herbst <karolherbst@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-08 10:33:18 +02:00
Al Viro
99d825822e get_rock_ridge_filename(): handle malformed NM entries
Payloads of NM entries are not supposed to contain NUL.  When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries).  We do stop when the amount collected so far + the
claimed amount in the current NM entry exceed 254.  So far, so good,
but what we return as the total length is the sum of *claimed*
sizes, not the actual amount collected.  And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk and
we are stop once we'd encountered 32 CEs, but you can get about 8Kb
easily.  And that's what will be passed to readdir callback as the
name length.  8Kb __copy_to_user() from a buffer allocated by
__get_free_page()

Cc: stable@vger.kernel.org # 0.98pl6+ (yes, really)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-07 22:52:39 -04:00
Dan Carpenter
8c1f454625 netxen: netxen_rom_fast_read() doesn't return -1
The error handling is broken here.  netxen_rom_fast_read() returns zero
on success and -EIO on error.  It never returns -1.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07 15:15:32 -04:00
Dan Carpenter
1c755ffa4f netxen: reversed condition in netxen_nic_set_link_parameters()
My static checker complains that we are using "autoneg" without
initializing it.  The problem is the ->phy_read() condition is reversed
so we only set this on error instead of success.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07 15:15:32 -04:00
Dan Carpenter
545fea5491 netxen: fix error handling in netxen_get_flash_block()
My static checker complained that "v" can be used unintialized if
netxen_rom_fast_read() returns -EIO.  That function never actually
returns -1.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07 15:15:32 -04:00
Hariprasad Shenai
218d48e701 cxgb4: Reset dcb state machine and tx queue prio only if dcb is enabled
When cxgb4 is enabled with CONFIG_CHELSIO_T4_DCB set, VI enable command
gets called with DCB enabled. But when we have a back to back setup with
DCB enabled on one side and non-DCB on the Peer side. Firmware doesn't
send any DCB_L2_CFG, and DCB priority is never set for Tx queue.
But driver resets the queue priority and state machine whenever there
is a link down, this patch fixes it by adding a check to reset only if
cxgb4_dcb_enabled() returns true.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-07 15:12:54 -04:00
Linus Torvalds
32cf95db22 Char/Misc driver fixes for 4.6-rc7
Here are 3 small fixes for some driver problems that were reported.
 Full details in the shortlog below.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlcuKUQACgkQMUfUDdst+yk8qgCguPDODcYzOWiH1+RtIXTH5kXG
 /1EAoIx7+uhzX9pt9E635NsrcNJqefWx
 =9Uhc
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fixes from Gfreg KH:
 "Here are three small fixes for some driver problems that were
  reported.  Full details in the shortlog below.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  nvmem: mxs-ocotp: fix buffer overflow in read
  Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read()
  misc: mic: Fix for double fetch security bug in VOP driver
2016-05-07 10:53:32 -07:00
Linus Torvalds
630aac5ab6 Staging/IIO driver fixes for 4.6-rc7
Well, it's really just IIO drivers here, some small fixes that resolve
 some "crash on boot" errors that have shown up in the -rc series, and
 other bugfixes that are required.
 
 All have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlcuKBgACgkQMUfUDdst+ynALgCgh4QWOZ/vnLQvx/r1ZOIW1xqm
 QdAAn1eJ1/KzwbM+WmmfXu1dKykfs7YS
 =tGM3
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull IIO driver fixes from Grek KH:
 "It's really just IIO drivers here, some small fixes that resolve some
  'crash on boot' errors that have shown up in the -rc series, and other
  bugfixes that are required.

  All have been in linux-next with no reported problems"

* tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: imu: mpu6050: Fix name/chip_id when using ACPI
  iio: imu: mpu6050: fix possible NULL dereferences
  iio:adc:at91-sama5d2: Repair crash on module removal
  iio: ak8975: fix maybe-uninitialized warning
  iio: ak8975: Fix NULL pointer exception on early interrupt
2016-05-07 10:50:48 -07:00
Linus Torvalds
3f8f0cf2ed USB fixes for 4.6-rc7
Here are some last-remaining fixes for USB drivers to resolve issues
 that have shown up in testing.  And 2 new device ids as well.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlcuJqEACgkQMUfUDdst+yl07wCeMMXyn3ZgOgpxiAAFUjBiHN5P
 86gAn2Kh/eihIFYwPRrHypbE67RO+yTx
 =QYaR
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some last-remaining fixes for USB drivers to resolve issues
  that have shown up in testing.  And two new device ids as well.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Revert "USB / PM: Allow USB devices to remain runtime-suspended when sleeping"
  usb: musb: jz4740: fix error check of usb_get_phy()
  Revert "usb: musb: musb_host: Enable HCD_BH flag to handle urb return in bottom half"
  usb: musb: gadget: nuke endpoint before setting its descriptor to NULL
  USB: serial: cp210x: add Straizona Focusers device ids
  USB: serial: cp210x: add ID for Link ECU
2016-05-07 10:47:03 -07:00
Linus Torvalds
9125aeb3e2 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "These are a number of updates to fix a few problems found in the ARM
  nommu code over the last couple of years, caused mostly by changes on
  the mmu side"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8573/1: domain: move {set,get}_domain under config guard
  ARM: 8572/1: nommu: change memory reserve for the vectors
  ARM: 8571/1: nommu: fix PMSAv7 setup
2016-05-07 08:27:35 -07:00
Linus Torvalds
67601c3b64 media fixes for v4.6-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXLdt0AAoJEAhfPr2O5OEVMzYQAKGk2BqfPQO5+vV0S4FHG/5I
 kBe2y5MhedikMTm3yp77jLqkUPO3zbuZ3BZW4IJbOhl0STYzT51C573Z+6utfSvr
 b6zMvTfXPIe+iFoqO7fsO/zJuJtlFjm4wzO/S5v5mYrks5z3oLwN7v34QtkKfVM1
 3pfwlXyIthE4Cff0j18SsyUpkjN3sF8lDJOFSjH2C1a9yfUpOAX9iPXax88S9fuc
 cNyp6msrsa8qoYW4Z3IcMKzw7gCWXberaLYPv0Jy9zmk5OKLgJoUS3oKo0B29HSu
 xxIhW60uZDv9+nQF623EUdIF7hRzSjTnnKK+qQFD+JX8c5qmIoPrQQW90WefFO2e
 suBRvmg0xC6j0rQRjGbSdTm6mFjPeO6YHWcO3JpoQ7FvdYlRIq29H0PRYzr4bADD
 IiJlxqMrBRX2l5T1fKw/Fx9fag4asl/svdYjw7fVYdrn9Q6dRDR/lj4dUsC7n0+J
 lgKs9Lvs4ZE0n0TlAQEkgeL3Xe2RNuiO/90m3WEMFvT1YFX0BxW/w2RcCuAilt78
 z24ljeeBrQWaR8wnsENoE+emwr1aa5hkTl5qgudiXBWO7obUiCbYkaFpFGU1DfEe
 VE1JHBG82vbqTyYbLjLVpuLpp61qpXY5WngYjzG9gMetBAOaWkiPSIfq69h5c2JK
 K+JEoMfJdrIpsqJ7k4E+
 =0D5s
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

  - deadlock fixes on driver probe at exynos4-is and s43-camif drivers

  - a build breakage if media controller is enabled and USB or PCI is
   built as module.

* tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] media-device: fix builds when USB or PCI is compiled as module
  [media] media: s3c-camif: fix deadlock on driver probe()
  [media] media: exynos4-is: fix deadlock on driver probe
2016-05-07 08:17:45 -07:00
Linus Torvalds
35cd3f4563 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "An ahci driver addition and updates to ahci port enable handling for
  some platform devices"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: add AMD Seattle platform driver
  ARM: dts: apq8064: add ahci ports-implemented mask
  ata: ahci-platform: Add ports-implemented DT bindings.
  libahci: save port map for forced port map
2016-05-07 08:13:42 -07:00
Linus Torvalds
b4184cbff3 Late 4.6-rc fixes
- Fix for max sector calculation in iSER
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXK3h4AAoJELgmozMOVy/doWMQAJ/O1kMh9ojnlLaFRjm889aV
 vNT/NTV0ilmbjvzgmhzYpbPOYf42hCHOZlI1vCCdTK0FafVlUh67m7oUCdW3OSm2
 b9OVtYsvqk/R6rlSqh8anmqwgAIv1OvmdQ26PBhOpkuokDgZArP4jaH06Tacqjws
 2cxZVuX8SA3f97tquOm1Shj0gqdA8dk9O+/6L3KzV2B6KUaBdYdb3ywvBQAVjIde
 qQJ4WiFl6QEBEQ18Seo6//cTiOez4bNYbR5inkgohciFfO8UIVDZDRLIXBQCE2Ca
 0QyoQ0OmAlSlAzXRCjTe+ZxAXjnmR/ZqRPgQAu9M4klMJF9ri+7U8uwerLQcMQQZ
 D1cQWx14P5BKMGKQ6kOh85+cQsn0qxOHNDRmEC7RIFda3rpsnEhtllg5+A94ZGMz
 FFOZ8uHYuOGxU2cbQUv9bjxDe2ukMjvYRdGbTTnWv+Ifi9S4fuIV1yw6erZ2XTRd
 BZWCMmmjvJWCfiZ1k1ZT1HtHW+dEvilcsFWbkZ4aGjFVF9sFvuWbGynWU57E9JWs
 2SC1CNp40OFCneqyPIVVFOu2kGNvQEp9UNy4espDT4hUP9J9trwkZdZLyE8HFufB
 Rj4BXUpzMt0+RLWQC3tGRDQDtgO4b6SJG1QtwTj7Jb/ttuzCYaVQ463KNo1KeS2y
 7Li0afdLKQ2LslVlKJK1
 =sU6v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fix from Doug Ledford:
 "Fix for max sector calculation in iSER"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/iser: Fix max_sectors calculation
2016-05-07 08:10:08 -07:00
Thomas Gleixner
56402d63ee x86/topology: Handle CPUID bogosity gracefully
Joseph reported that a XEN guest dies with a division by 0 in the package
topology setup code. This happens if cpu_info.x86_max_cores is zero.

Handle that case and emit a warning. This does not fix the underlying XEN bug,
but makes the code more robust.

Reported-and-tested-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1605062046270.3540@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-07 10:06:55 +02:00
Rafael J. Wysocki
536bd00cdb sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage
The following commit:

  34e2c555f3 ("cpufreq: Add mechanism for registering utilization update callbacks")

overlooked the fact that update_load_avg(), where CFS invokes cpufreq
utilization update callbacks, becomes an empty stub on UP kernels.

In consequence, if !CONFIG_SMP, cpufreq governors are never invoked
from CFS and they do not have a chance to evaluate CPU performace
levels and update them often enough.

Needless to say, things don't work as expected then.

Fix the problem by making the !CONFIG_SMP stub of update_load_avg()
invoke cpufreq update callbacks too.

Reported-by: Steve Muckle <steve.muckle@linaro.org>
Tested-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Steve Muckle <steve.muckle@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Fixes: 34e2c555f3 (cpufreq: Add mechanism for registering utilization update callbacks)
Link: http://lkml.kernel.org/r/6282396.VVEdgVYxO3@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:45:34 +02:00
Kees Cook
3a94707d7a x86/KASLR: Build identity mappings on demand
Currently KASLR only supports relocation in a small physical range (from
16M to 1G), due to using the initial kernel page table identity mapping.
To support ranges above this, we need to have an identity mapping for the
desired memory range before we can decompress (and later run) the kernel.

32-bit kernels already have the needed identity mapping. This patch adds
identity mappings for the needed memory ranges on 64-bit kernels. This
happens in two possible boot paths:

If loaded via startup_32(), we need to set up the needed identity map.

If loaded from a 64-bit bootloader, the bootloader will have already
set up an identity mapping, and we'll start via the compressed kernel's
startup_64(). In this case, the bootloader's page tables need to be
avoided while selecting the new uncompressed kernel location. If not,
the decompressor could overwrite them during decompression.

To accomplish this, we could walk the pagetable and find every page
that is used, and add them to mem_avoid, but this needs extra code and
will require increasing the size of the mem_avoid array.

Instead, we can create a new set of page tables for our own identity
mapping instead. The pages for the new page table will come from the
_pagetable section of the compressed kernel, which means they are
already contained by in mem_avoid array. To do this, we reuse the code
from the uncompressed kernel's identity mapping routines.

The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
init_size, as now the compressed kernel's _rodata to _end will contribute
to init_size.

To handle the possible mappings, we need to increase the existing page
table buffer size:

When booting via startup_64(), we need to cover the old VO, params,
cmdline and uncompressed kernel. In an extreme case we could have them
all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
And we'll need 2 for first 2M for VGA RAM. One more is needed for level4.
This gets us to 19 pages total.

When booting via startup_32(), KASLR could move the uncompressed kernel
above 4G, so we need to create extra identity mappings, which should only
need (2+2) pages at most when it is beyond the 512G boundary. So 19
pages is sufficient for this case as well.

The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
names to maintain logical consistency with the existing BOOT_HEAP_SIZE
and BOOT_STACK_SIZE defines.

This patch is based on earlier patches from Yinghai Lu and Baoquan He.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:39 +02:00
Yinghai Lu
cf4fb15b31 x86/boot: Split out kernel_ident_mapping_init()
In order to support on-demand page table creation when moving the
kernel for KASLR, we need to use kernel_ident_mapping_init() in the
decompression code.

This splits it out into its own file for use outside of init_64.c.
Additionally, checking for __pa/__va defines is added since they
need to be overridden in the decompression code.

[kees: rewrote changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:39 +02:00
Kees Cook
8665e6ff21 x86/boot: Clean up indenting for asm/boot.h
Before adding more defines to asm/boot.h, this cleans up the existing
indenting for readability.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1462572095-11754-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:39 +02:00
Kees Cook
ed09acde44 x86/KASLR: Improve comments around the mem_avoid[] logic
This attempts to improve the comments that describe how the memory
range used for decompression is avoided. Additionally uses an enum
instead of raw numbers for the mem_avoid[] indexing.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:38 +02:00
Borislav Petkov
549f90db68 x86/boot: Simplify pointer casting in choose_random_location()
Pass them down as 'unsigned long' directly and get rid of more casting and
assignments.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: linux-tip-commits@vger.kernel.org
Cc: luto@kernel.org
Cc: vgoyal@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/20160506115015.GI24044@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:38:38 +02:00
Peter Jones
6c5450ef66 efivarfs: Make efivarfs_file_ioctl() static
There are no callers except through the file_operations struct below
this, so it should be static like everything else here.

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Julia Lawall
1cfd63166c efi: Merge boolean flag arguments
The parameters atomic and duplicates of efivar_init always have opposite
values.  Drop the parameter atomic, replace the uses of !atomic with
duplicates, and update the call sites accordingly.

The code using duplicates is slightly reorganized with an 'else', to avoid
duplicating the lock code.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Saurabh Sengar <saurabh.truth@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Matt Fleming
fb7a84cac0 efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
Dan Carpenter reports that passing the address of the pointer to the
kmalloc()'d memory for 'capsule' is dangerous:

 "drivers/firmware/efi/capsule.c:109 efi_capsule_supported()
  warn: did you mean to pass the address of 'capsule'

   108
   109          status = efi.query_capsule_caps(&capsule, 1, &max_size, reset);
                                                ^^^^^^^^
  If we modify capsule inside this function call then at the end of the
  function we aren't freeing the original pointer that we allocated."

Ard Biesheuvel noted that we don't even need to call kmalloc() since the
object we allocate isn't very big and doesn't need to persist after the
function returns.

Place 'capsule' on the stack instead.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joeyli <jlee@suse.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Jeremy Compostella
2e121d711a efibc: Fix excessive stack footprint warning
GCC complains about a newly added file for the EFI Bootloader Control:

  drivers/firmware/efi/efibc.c: In function 'efibc_set_variable':
  drivers/firmware/efi/efibc.c:53:1: error: the frame size of 2272 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

The problem is the declaration of a local variable of type struct
efivar_entry, which is by itself larger than the warning limit of 1024
bytes.

Use dynamic memory allocation instead of stack memory for the entry
object.

This patch also fixes a potential buffer overflow.

Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
[ Updated changelog to include GCC error ]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Matt Fleming
62075e5818 efi/capsule: Make efi_capsule_pending() lockless
Taking a mutex in the reboot path is bogus because we cannot sleep
with interrupts disabled, such as when rebooting due to panic(),

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
  in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched
  Call Trace:
    dump_stack+0x63/0x89
    ___might_sleep+0xd8/0x120
    __might_sleep+0x49/0x80
    mutex_lock+0x20/0x50
    efi_capsule_pending+0x1d/0x60
    native_machine_emergency_restart+0x59/0x280
    machine_emergency_restart+0x19/0x20
    emergency_restart+0x18/0x20
    panic+0x1ba/0x217

In this case all other CPUs will have been stopped by the time we
execute the platform reboot code, so 'capsule_pending' cannot change
under our feet. We wouldn't care even if it could since we cannot wait
for it complete.

Also, instead of relying on the external 'system_state' variable just
use a reboot notifier, so we can set 'stop_capsules' while holding
'capsule_mutex', thereby avoiding a race where system_state is updated
while we're in the middle of efi_capsule_update_locked() (since CPUs
won't have been stopped at that point).

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joeyli <jlee@suse.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Ingo Molnar
35dc9ec107 Merge branch 'linus' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:00:07 +02:00
Ingo Molnar
ea7c285189 perf/core improvements and fixes:
User visible:
 
 - Fix ordering of kernel/user entries in 'caller' mode, where the kernel and
   user parts were being correctly inverted but kept in place wrt each other,
   i.e. 'callee' (k1, k2, u3, u4) became 'caller' (k2, k1, u4, u3) when it
   should be 'caller' (u4, u3, k2, k1) (Chris Phlipot)
 
 - In 'perf trace' don't print the raw arg syscall args for a syscall that has
   no arguments, like gettid(). This was happening because just checking if
   the syscall args list is NULL may mean that there are no args (e.g.: gettid)
   or that there is no tracepoint info (e.g.: clone) (Arnaldo Carvalho de Melo)
 
 - Add extra output of counter values with 'perf stat -vv' (Andi Kleen)
 
 Infrastructure:
 
 - Expose callchain db export via the python API (Chris Phlipot)
 
 Code reorganization:
 
 - Move some more syscall arg beautifiers from the 'perf trace' main file to
   separate files in tools/perf/trace/beauty/, to reduce the main file line
   count (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXLMC4AAoJENZQFvNTUqpAJvYQALplVTE2Yf7Umjw51jDhQPVi
 5AAWlEfjMkiW9a+8oWwFzmwNtTbe+NpTe9NsyD0zRtRNMJ7oEZOHBiQhTwYp2hBv
 jYp9/Wtdb7OGA/p+3RCbsTj6jcgd8B2KF+hhqES8QKHtTXi87V5tiRsDjj6pzUzG
 Bwq+21AsalZu7JMcnxCKhOeq9JjgxQ4/W9nhcHPmGz83yTEdiQJpCqqiaNUNLvM8
 j8vttXXoCAATNldUV2mkeggEK2OL9Bn2B4Cy1pmS/63oRhKXdBHbTpfI80/TzdBb
 cetw2l9K/gOyT56TwlijAo3982Yc9SFUCw/PYwYkvmX2niqmoocNWV5xIJh8AzOT
 CHi2UvDwOUuD4p73xHeJAvIGpZaeri2RjxlR7gmjqVmezdTb9/WqxW1ZEIFVGwbG
 Hhmi+mKX3YAseoQJ8k3gl8/Cbjat8PPt8xluG46X5Z1YrI8FKW9Kvk0CkpVqzemJ
 pUervZ0TD+gbaMxeGsCvtr8n5FSAUBTeQ8YZuVCDqHMxwFssvz/mphKo7moJHts6
 v/MIYkkq9DhBr9WLzEEkKH4XXKBB42leB2zJsJ6MT/enCi+qXSTUZXO6TyJR3D7l
 0k3xpwWSF14/kFt7gptjJC5IHY81T+7A1nCXeIL4pBD11KLj+bN8kWV8naWjYqGh
 BJj3F4o8b+iDxG5AuCOq
 =OKHa
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20160506' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Fix ordering of kernel/user entries in 'caller' mode, where the kernel and
  user parts were being correctly inverted but kept in place wrt each other,
  i.e. 'callee' (k1, k2, u3, u4) became 'caller' (k2, k1, u4, u3) when it
  should be 'caller' (u4, u3, k2, k1) (Chris Phlipot)

- In 'perf trace' don't print the raw arg syscall args for a syscall that has
  no arguments, like gettid(). This was happening because just checking if
  the syscall args list is NULL may mean that there are no args (e.g.: gettid)
  or that there is no tracepoint info (e.g.: clone) (Arnaldo Carvalho de Melo)

- Add extra output of counter values with 'perf stat -vv' (Andi Kleen)

Infrastructure changes:

- Expose callchain db export via the python API (Chris Phlipot)

Code reorganization:

- Move some more syscall arg beautifiers from the 'perf trace' main file to
  separate files in tools/perf/trace/beauty/, to reduce the main file line
  count (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 06:49:28 +02:00
Jiri Pirko
5113bfdbc6 mlxsw: spectrum: Fix ordering in mlxsw_sp_fini
Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 19:00:00 -04:00
Ido Schimmel
2889286585 mlxsw: spectrum: Add missing rollback in flood configuration
When we fail to set the flooding configuration for the broadcast and
unregistered multicast traffic, we should revert the flooding
configuration of the unknown unicast traffic.

Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:43 -04:00
Ido Schimmel
51554db2d2 mlxsw: spectrum: Fix rollback order in LAG join failure
Make the leave procedure in the error path symmetric to the join
procedure and first remove the port from the collector before
potentially destroying the LAG.

Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:42 -04:00
Jarno Rajahalme
229740c631 udp_offload: Set encapsulation before inner completes.
UDP tunnel segmentation code relies on the inner offsets being set for
an UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them.  Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done.  This causes the inner
offsets having invalid values after udp_gro_complete() returns, which
in turn will make it impossible to properly segment the packet in case
it needs to be forwarded, which would be visible to the user either as
invalid packets being sent or as packet loss.

This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Jarno Rajahalme
43b8448cd7 udp_tunnel: Remove redundant udp_tunnel_gro_complete().
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Marc Angel
17af2bce88 macvtap: add namespace support to the sysfs device class
When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.

To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.

Signed-off-by: Marc Angel <marc@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:09 -04:00
Rafael J. Wysocki
bf7cdff194 cpufreq: schedutil: Make it depend on CONFIG_SMP
Make the schedutil cpufreq governor depend on CONFIG_SMP, because
the scheduler-provided utilization numbers used by it are only
available with CONFIG_SMP set.

Fixes: 9bdcb44e39 (cpufreq: schedutil: New governor based on scheduler utilization data)
Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-06 22:33:33 +02:00
Rafael J. Wysocki
fd7c3c29f9 Merge back new cpuidle material for v4.7. 2016-05-06 22:08:46 +02:00
Linus Torvalds
0783783104 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull writeback fix from Jens Axboe:
 "Just a single fix for domain aware writeback, fixing a regression that
  can cause balance_dirty_pages() to keep looping while not getting any
  work done"

* 'for-linus' of git://git.kernel.dk/linux-block:
  writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-06 13:08:35 -07:00
Rafael J. Wysocki
dab2e29402 Merge back new device properties material for v4.7. 2016-05-06 22:07:33 +02:00
Rafael J. Wysocki
c8541203a6 Merge back new material for v4.7. 2016-05-06 22:05:16 +02:00
Eric Dumazet
47dcc20a39 ipv4: tcp: ip_send_unicast_reply() is not BH safe
I forgot that ip_send_unicast_reply() is not BH safe (yet).

Disabling preemption before calling it was not a good move.

Fixes: c10d9310ed ("tcp: do not assume TCP code is non preemptible")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andres Lagar-Cavilla  <andreslc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:02:43 -04:00
David S. Miller
4b307a8edb Merge branch 'bpf-direct-pkt-access'
Alexei Starovoitov says:

====================
bpf: introduce direct packet access

This set of patches introduce 'direct packet access' from
cls_bpf and act_bpf programs (which are root only).

Current bpf programs use LD_ABS, LD_INS instructions which have
to do 'if (off < skb_headlen)' for every packet access.
It's ok for socket filters, but too slow for XDP, since single
LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost
of length check over multiple packet accesses via direct access
to skb->data, data_end pointers.

The existing packet parser typically look like:
  if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
     return 0;
  if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
      load_byte(skb, ETH_HLEN) != 0x45)
     return 0;
  ...
with 'direct packet access' the bpf program becomes:
   void *data = (void *)(long)skb->data;
   void *data_end = (void *)(long)skb->data_end;
   struct eth_hdr *eth = data;
   struct iphdr *iph = data + sizeof(*eth);

   if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
      return 0;
   if (eth->h_proto != htons(ETH_P_IP))
      return 0;
   if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
      return 0;
   ...
which is more natural to write and significantly faster.
See patch 6 for performance tests:
21Mpps(old) vs 24Mpps(new) with just 5 loads.
For more complex parsers the performance gain is higher.

The other approach implemented in [1] was adding two new instructions
to interpreter and JITs and was too hard to use from llvm side.
The approach presented here doesn't need any instruction changes,
but the verifier has to work harder to check safety of the packet access.

Patch 1 prepares the code and Patch 2 adds new checks for direct
packet access and all of them are gated with 'env->allow_ptr_leaks'
which is true for root only.
Patch 3 improves search pruning for large programs.
Patch 4 wires in verifier's changes with net/core/filter side.
Patch 5 updates docs
Patches 6 and 7 add tests.

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:55 -04:00
Alexei Starovoitov
883e44e4de samples/bpf: add verifier tests
add few tests for "pointer to packet" logic of the verifier

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
65d472fb00 samples/bpf: add 'pointer to packet' tests
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9

parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.

parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.

simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs  = 21.4Mpps

Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.

These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
f9c8d19d6c bpf: add documentation for 'direct packet access'
explain how verifier checks safety of packet access
and update email addresses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
db58ba4592 bpf: wire in data and data_end for cls_act_bpf
allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
The bpf helpers that change skb->data need to update data_end pointer as well.
The verifier checks that programs always reload data, data_end pointers
after calls to such bpf helpers.
We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
735b433397 bpf: improve verifier state equivalence
since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach
verifier its recognition of constants in conditional branches
without affecting safety.
Ex:
if (reg == 123) {
  .. here verifier was marking reg->type as CONST_IMM
     instead keep reg as UNKNOWN_VALUE
}

Two verifier states with UNKNOWN_VALUE are equivalent, whereas
CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range
verification and other cases.
So help search pruning by marking registers as UNKNOWN_VALUE
where possible instead of CONST_IMM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
969bf05eb3 bpf: direct packet access
Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.

Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}

This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.

For XDP the direct store instructions can be allowed as well.

The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
Alexei Starovoitov
1a0dc1ac1d bpf: cleanup verifier code
cleanup verifier code and prepare it for addition of "pointer to packet" logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
Rafael J. Wysocki
da43af961b Merge cpufreq fixes going into v4.6.
* pm-cpufreq-fixes:
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
2016-05-06 22:01:14 +02:00