Commit graph

595222 commits

Author SHA1 Message Date
Matt Fleming
62075e5818 efi/capsule: Make efi_capsule_pending() lockless
Taking a mutex in the reboot path is bogus because we cannot sleep
with interrupts disabled, such as when rebooting due to panic(),

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
  in_atomic(): 0, irqs_disabled(): 1, pid: 7, name: rcu_sched
  Call Trace:
    dump_stack+0x63/0x89
    ___might_sleep+0xd8/0x120
    __might_sleep+0x49/0x80
    mutex_lock+0x20/0x50
    efi_capsule_pending+0x1d/0x60
    native_machine_emergency_restart+0x59/0x280
    machine_emergency_restart+0x19/0x20
    emergency_restart+0x18/0x20
    panic+0x1ba/0x217

In this case all other CPUs will have been stopped by the time we
execute the platform reboot code, so 'capsule_pending' cannot change
under our feet. We wouldn't care even if it could since we cannot wait
for it complete.

Also, instead of relying on the external 'system_state' variable just
use a reboot notifier, so we can set 'stop_capsules' while holding
'capsule_mutex', thereby avoiding a race where system_state is updated
while we're in the middle of efi_capsule_update_locked() (since CPUs
won't have been stopped at that point).

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joeyli <jlee@suse.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462570771-13324-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:06:13 +02:00
Ingo Molnar
35dc9ec107 Merge branch 'linus' into efi/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 07:00:07 +02:00
Ingo Molnar
ea7c285189 perf/core improvements and fixes:
User visible:
 
 - Fix ordering of kernel/user entries in 'caller' mode, where the kernel and
   user parts were being correctly inverted but kept in place wrt each other,
   i.e. 'callee' (k1, k2, u3, u4) became 'caller' (k2, k1, u4, u3) when it
   should be 'caller' (u4, u3, k2, k1) (Chris Phlipot)
 
 - In 'perf trace' don't print the raw arg syscall args for a syscall that has
   no arguments, like gettid(). This was happening because just checking if
   the syscall args list is NULL may mean that there are no args (e.g.: gettid)
   or that there is no tracepoint info (e.g.: clone) (Arnaldo Carvalho de Melo)
 
 - Add extra output of counter values with 'perf stat -vv' (Andi Kleen)
 
 Infrastructure:
 
 - Expose callchain db export via the python API (Chris Phlipot)
 
 Code reorganization:
 
 - Move some more syscall arg beautifiers from the 'perf trace' main file to
   separate files in tools/perf/trace/beauty/, to reduce the main file line
   count (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXLMC4AAoJENZQFvNTUqpAJvYQALplVTE2Yf7Umjw51jDhQPVi
 5AAWlEfjMkiW9a+8oWwFzmwNtTbe+NpTe9NsyD0zRtRNMJ7oEZOHBiQhTwYp2hBv
 jYp9/Wtdb7OGA/p+3RCbsTj6jcgd8B2KF+hhqES8QKHtTXi87V5tiRsDjj6pzUzG
 Bwq+21AsalZu7JMcnxCKhOeq9JjgxQ4/W9nhcHPmGz83yTEdiQJpCqqiaNUNLvM8
 j8vttXXoCAATNldUV2mkeggEK2OL9Bn2B4Cy1pmS/63oRhKXdBHbTpfI80/TzdBb
 cetw2l9K/gOyT56TwlijAo3982Yc9SFUCw/PYwYkvmX2niqmoocNWV5xIJh8AzOT
 CHi2UvDwOUuD4p73xHeJAvIGpZaeri2RjxlR7gmjqVmezdTb9/WqxW1ZEIFVGwbG
 Hhmi+mKX3YAseoQJ8k3gl8/Cbjat8PPt8xluG46X5Z1YrI8FKW9Kvk0CkpVqzemJ
 pUervZ0TD+gbaMxeGsCvtr8n5FSAUBTeQ8YZuVCDqHMxwFssvz/mphKo7moJHts6
 v/MIYkkq9DhBr9WLzEEkKH4XXKBB42leB2zJsJ6MT/enCi+qXSTUZXO6TyJR3D7l
 0k3xpwWSF14/kFt7gptjJC5IHY81T+7A1nCXeIL4pBD11KLj+bN8kWV8naWjYqGh
 BJj3F4o8b+iDxG5AuCOq
 =OKHa
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20160506' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Fix ordering of kernel/user entries in 'caller' mode, where the kernel and
  user parts were being correctly inverted but kept in place wrt each other,
  i.e. 'callee' (k1, k2, u3, u4) became 'caller' (k2, k1, u4, u3) when it
  should be 'caller' (u4, u3, k2, k1) (Chris Phlipot)

- In 'perf trace' don't print the raw arg syscall args for a syscall that has
  no arguments, like gettid(). This was happening because just checking if
  the syscall args list is NULL may mean that there are no args (e.g.: gettid)
  or that there is no tracepoint info (e.g.: clone) (Arnaldo Carvalho de Melo)

- Add extra output of counter values with 'perf stat -vv' (Andi Kleen)

Infrastructure changes:

- Expose callchain db export via the python API (Chris Phlipot)

Code reorganization:

- Move some more syscall arg beautifiers from the 'perf trace' main file to
  separate files in tools/perf/trace/beauty/, to reduce the main file line
  count (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-07 06:49:28 +02:00
Antti Palosaari
9c574ad4d3 [media] af9035: correct eeprom offsets
Used memory mapped eeprom offsets were off-by 8 bytes.

Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-05-06 23:51:56 -03:00
Satoshi Nagahama
ab4d14528f [media] em28xx: add support for PLEX PX-BCUD (ISDB-S)
PX-BCUD has the following components:
   USB interface: Empia EM28178
   Demodulator: Toshiba TC90532 (works by code for TC90522)
   Tuner: Next version of Sharp QM1D1C0042

em28xx_dvb_init(): add init code for PLEX PX-BCUD with calling
px_bcud_init() that does things like pin configuration.

qm1d1c0042_init(): support the next version of QM1D1C0042, change to
choose an appropriate array of initial registers by reading chip id.

[mchehab@osg.samsung.com: fold a fixup patch and fix checkpatch.pl
 errors/warnings, where applicable]
Signed-off-by: Satoshi Nagahama <sattnag@aim.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-05-06 23:51:47 -03:00
Jiri Pirko
5113bfdbc6 mlxsw: spectrum: Fix ordering in mlxsw_sp_fini
Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 19:00:00 -04:00
Ido Schimmel
2889286585 mlxsw: spectrum: Add missing rollback in flood configuration
When we fail to set the flooding configuration for the broadcast and
unregistered multicast traffic, we should revert the flooding
configuration of the unknown unicast traffic.

Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:43 -04:00
Ido Schimmel
51554db2d2 mlxsw: spectrum: Fix rollback order in LAG join failure
Make the leave procedure in the error path symmetric to the join
procedure and first remove the port from the collector before
potentially destroying the LAG.

Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:42 -04:00
Jarno Rajahalme
229740c631 udp_offload: Set encapsulation before inner completes.
UDP tunnel segmentation code relies on the inner offsets being set for
an UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them.  Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done.  This causes the inner
offsets having invalid values after udp_gro_complete() returns, which
in turn will make it impossible to properly segment the packet in case
it needs to be forwarded, which would be visible to the user either as
invalid packets being sent or as packet loss.

This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Jarno Rajahalme
43b8448cd7 udp_tunnel: Remove redundant udp_tunnel_gro_complete().
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Marc Angel
17af2bce88 macvtap: add namespace support to the sysfs device class
When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.

To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.

Signed-off-by: Marc Angel <marc@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:09 -04:00
Rafael J. Wysocki
bf7cdff194 cpufreq: schedutil: Make it depend on CONFIG_SMP
Make the schedutil cpufreq governor depend on CONFIG_SMP, because
the scheduler-provided utilization numbers used by it are only
available with CONFIG_SMP set.

Fixes: 9bdcb44e39 (cpufreq: schedutil: New governor based on scheduler utilization data)
Reported-by: Steve Muckle <steve.muckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-06 22:33:33 +02:00
Rafael J. Wysocki
fd7c3c29f9 Merge back new cpuidle material for v4.7. 2016-05-06 22:08:46 +02:00
Linus Torvalds
0783783104 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull writeback fix from Jens Axboe:
 "Just a single fix for domain aware writeback, fixing a regression that
  can cause balance_dirty_pages() to keep looping while not getting any
  work done"

* 'for-linus' of git://git.kernel.dk/linux-block:
  writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-06 13:08:35 -07:00
Rafael J. Wysocki
dab2e29402 Merge back new device properties material for v4.7. 2016-05-06 22:07:33 +02:00
Rafael J. Wysocki
c8541203a6 Merge back new material for v4.7. 2016-05-06 22:05:16 +02:00
Eric Dumazet
47dcc20a39 ipv4: tcp: ip_send_unicast_reply() is not BH safe
I forgot that ip_send_unicast_reply() is not BH safe (yet).

Disabling preemption before calling it was not a good move.

Fixes: c10d9310ed ("tcp: do not assume TCP code is non preemptible")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andres Lagar-Cavilla  <andreslc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:02:43 -04:00
David S. Miller
4b307a8edb Merge branch 'bpf-direct-pkt-access'
Alexei Starovoitov says:

====================
bpf: introduce direct packet access

This set of patches introduce 'direct packet access' from
cls_bpf and act_bpf programs (which are root only).

Current bpf programs use LD_ABS, LD_INS instructions which have
to do 'if (off < skb_headlen)' for every packet access.
It's ok for socket filters, but too slow for XDP, since single
LD_ABS insn consumes 3% of cpu. Therefore we have to amortize the cost
of length check over multiple packet accesses via direct access
to skb->data, data_end pointers.

The existing packet parser typically look like:
  if (load_half(skb, offsetof(struct ethhdr, h_proto)) != ETH_P_IP)
     return 0;
  if (load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)) != IPPROTO_UDP ||
      load_byte(skb, ETH_HLEN) != 0x45)
     return 0;
  ...
with 'direct packet access' the bpf program becomes:
   void *data = (void *)(long)skb->data;
   void *data_end = (void *)(long)skb->data_end;
   struct eth_hdr *eth = data;
   struct iphdr *iph = data + sizeof(*eth);

   if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
      return 0;
   if (eth->h_proto != htons(ETH_P_IP))
      return 0;
   if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
      return 0;
   ...
which is more natural to write and significantly faster.
See patch 6 for performance tests:
21Mpps(old) vs 24Mpps(new) with just 5 loads.
For more complex parsers the performance gain is higher.

The other approach implemented in [1] was adding two new instructions
to interpreter and JITs and was too hard to use from llvm side.
The approach presented here doesn't need any instruction changes,
but the verifier has to work harder to check safety of the packet access.

Patch 1 prepares the code and Patch 2 adds new checks for direct
packet access and all of them are gated with 'env->allow_ptr_leaks'
which is true for root only.
Patch 3 improves search pruning for large programs.
Patch 4 wires in verifier's changes with net/core/filter side.
Patch 5 updates docs
Patches 6 and 7 add tests.

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:55 -04:00
Alexei Starovoitov
883e44e4de samples/bpf: add verifier tests
add few tests for "pointer to packet" logic of the verifier

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
65d472fb00 samples/bpf: add 'pointer to packet' tests
parse_simple.c - packet parser exapmle with single length check that
filters out udp packets for port 9

parse_varlen.c - variable length parser that understand multiple vlan headers,
ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
The packet is parsed layer by layer with multitple length checks.

parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
Same functionality as parse_simple.

simple = 24.1Mpps per core
varlen = 22.7Mpps
ldabs  = 21.4Mpps

Parser with LD_ABS instructions is slower than full direct access parser
which does more packet accesses and checks.

These examples demonstrate the choice bpf program authors can make between
flexibility of the parser vs speed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
f9c8d19d6c bpf: add documentation for 'direct packet access'
explain how verifier checks safety of packet access
and update email addresses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
db58ba4592 bpf: wire in data and data_end for cls_act_bpf
allow cls_bpf and act_bpf programs access skb->data and skb->data_end pointers.
The bpf helpers that change skb->data need to update data_end pointer as well.
The verifier checks that programs always reload data, data_end pointers
after calls to such bpf helpers.
We cannot add 'data_end' pointer to struct qdisc_skb_cb directly,
since it's embedded as-is by infiniband ipoib, so wrapper struct is needed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
735b433397 bpf: improve verifier state equivalence
since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach
verifier its recognition of constants in conditional branches
without affecting safety.
Ex:
if (reg == 123) {
  .. here verifier was marking reg->type as CONST_IMM
     instead keep reg as UNKNOWN_VALUE
}

Two verifier states with UNKNOWN_VALUE are equivalent, whereas
CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range
verification and other cases.
So help search pruning by marking registers as UNKNOWN_VALUE
where possible instead of CONST_IMM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Alexei Starovoitov
969bf05eb3 bpf: direct packet access
Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.

Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}

This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.

For XDP the direct store instructions can be allowed as well.

The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
Alexei Starovoitov
1a0dc1ac1d bpf: cleanup verifier code
cleanup verifier code and prepare it for addition of "pointer to packet" logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
Rafael J. Wysocki
da43af961b Merge cpufreq fixes going into v4.6.
* pm-cpufreq-fixes:
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
2016-05-06 22:01:14 +02:00
Linus Torvalds
3f86ba5d0c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This contains two fixes: a boot fix for older SGI/UV systems, and an
  APIC calibration fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
  x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init
2016-05-06 12:59:27 -07:00
David S. Miller
95aef7cecb Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-05-05

This series contains updates to i40e and i40evf.

The theme behind this series is code reduction, yeah!  Jesse provides
most of the changes starting with a refactor of the interpretation of
a tunnel which lets us start using the hardware's parsing.  Removed
the packet split receive routine and ancillary code in preparation
for the Rx-refactor.  The refactor of the receive routine,
aligns the receive routine with the one in ixgbe which was highly
optimized.  The hardware supports a 16 byte descriptor for receive,
but the driver was never using it in production.  There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole lot
of complexity while getting rid of the code.  Fixed a bug where while
changing the number of descriptors using ethtool, the driver did not
test the limits of the system memory before permanently assuming it
would be able to get receive buffer memory.

Mitch fixes a memory leak of one page each time the driver is opened by
allocating the correct number of receive buffers and do not fiddle with
next_to_use in the VF driver.

Arnd Bergmann fixed a indentation issue by adding the appropriate
curly braces in i40e_vc_config_promiscuous_mode_msg().

Julia Lawall fixed an issue found by Coccinelle, where i40e_client_ops
structure can be const since it is never modified.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:55:30 -04:00
Tedd Ho-Jeong An
a0af53b511 Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b]
This patch adds support for Intel Bluetooth device 8265 also known
as Windstorm Peak (WsP).

T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#=  6 Spd=12   MxCh= 0
D:  Ver= 2.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=8087 ProdID=0a2b Rev= 0.10
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=1ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-05-06 21:52:35 +02:00
David Ahern
b3b4663c97 net: vrf: Create FIB tables on link create
Tables have to exist for VRFs to function. Ensure they exist
when VRF device is created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:51:47 -04:00
Sudarsana Reddy Kalluru
8e0ddc040a qede: prevent chip hang when increasing channels
qede requires qed to provide enough resources to accommodate 16 combined
channels, but that upper-bound isn't actually being enforced by it.
Instead, qed inform back to qede how many channels can be opened based on
available resources - but that calculation doesn't really take into account
the resources requested by qede; Instead it considers other FW/HW available
resources.

As a result, if a user would increase the number of channels to more than
16 [e.g., using ethtool] the chip would hang.

This change increments the resources requested by qede to 64 combined
channels instead of 16; This value is an upper bound on the possible
available channels [due to other FW/HW resources].

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:50:33 -04:00
David Ahern
1d2f7b2d95 net: ipv6: tcp reset, icmp need to consider L3 domain
Responses for packets to unused ports are getting lost with L3 domains.

IPv4 has ip_send_unicast_reply for sending TCP responses which accounts
for L3 domains; update the IPv6 counterpart tcp_v6_send_response.
For icmp the L3 master check needs to be moved up in icmp6_send
to properly respond to UDP packets to a port with no listener.

Fixes: ca254490c8 ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:49:07 -04:00
Jon Maxwell
f37bd0cced cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure
We recently had a system crash in the cnic module. Vmcore analysis confirmed
that "ip link up" was executed which failed due to an allocation failure
because of memory fragmentation. Futher analysis revealed that the cnic irq
vector was still allocated after the "ip link up" that failed. When
"ip link down" was executed it called free_msi_irqs() which crashed the system
because the cnic irq was still inuse.

PANIC: "kernel BUG at drivers/pci/msi.c:411!"

The code execution was:

cnic_netdev_event()
if (event == NETDEV_UP) {
.
.
       ▹       if (!cnic_start_hw(dev))
cnic_start_hw()
calls cnic_cm_open() which failed with -ENOMEM
cnic_start_hw() then took the err1 path:

err1:↩
       cp->free_resc(dev);↩ <---- frees resources but not irq vector
       pci_dev_put(dev->pcidev);↩
       return err;↩
}↩

This returns control back to cnic_netdev_event() but now the cnic irq vector
is still allocated even although cnic_cm_open() failed. The next
"ip link down" while trigger the crash.

The cnic_start_hw() routine is not handling the allocation failure correctly.
Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating
that the hardware has been started in cnic_start_hw(). If it has then call
cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise
just maintain the previous behaviour and free cnic resources.

I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()s return
code.

# ip link set dev enpX down
# ip link set dev enpX up <--- hit's allocation failure
# ip link set dev enpX down <--- crashes here

With this patch I confirmed there was no crash in the reproducer.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:44:54 -04:00
Linus Torvalds
01ec716761 Power management and ACPI fixes for v4.6-rc7
- Fix for a recent regression in the intel_pstate driver causing
    it to fail to restore the HWP (HW-managed P-states) configuration
    of the boot CPU after suspend-to-RAM (Rafael Wysocki).
 
  - Fix for two recent regressions in the intel_pstate driver, one
    that can trigger a divide by zero if the driver is accessed via
    sysfs before it manages to take the first sample and one causing
    it to fail to update a structure field used in a trace point, so
    the information coming from it is less useful (Rafael Wysocki).
 
  - Fix for a problem in the sti-cpufreq driver introduced during
    the 4.5 cycle that causes it to break CPU PM in multi-platform
    kernels by registering cpufreq-dt (which subsequently doesn't
    work) unconditionally and preventing the driver that would
    actually work from registering (Sudeep Holla).
 
  - Stable-candidate fix for an ARM64 cpuidle issue causing idle
    state usage counters to be incorrectly updated for idle states
    that were not entered due to errors (James Morse).
 
  - Fix for a recently introduced issue in the OPP (Operating
    Performance Points) framework causing it to print bogus error
    messages for missing optional regulators (Viresh Kumar).
 
  - Fix for a recently introduced issue in the generic device
    properties framework that may cause it to attempt to dereferece
    and invalid pointer in some cases (Heikki Krogerus).
 
  - Fix for a deadlock in the ACPICA core that may be triggered
    by device (eg. Thunderbolt) hotplug (Prarit Bhargava).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXLIwPAAoJEILEb/54YlRxT+wP/ROEo/r5IaRZ2k8cphWjsiKk
 k9eDuWBL2KZ29ikghXs/vVY2fMbtQkaDT5h57imsUEKoEzI3MlYA3OkQyffFOcsY
 dz/9EnG6K9Efi6VS1dS1tNCgl45aIeHLCqlVPOBCZ9TwSoAERdNJGqItJdS2YKIA
 +C1LGrWl4UiJ95AOof9PHfKfnWxrnRbpIsB2PbxD0Swe5vfskrHoRWGOAMLJIwpF
 7NvEJ15fryDIvlMR/ggNrg2L2piOu1fJl2kVZYWZTb/u+qAO3utxTQN4y++zTSNb
 LAN78Hq/nJu156SSioO9fLa0wPaU+k2OChfWXtlMsTDK+L5EQz4G3pJwi5FA8QTD
 nfeZNC9VgqfP4LtqWw05h/AOw4A0XUeuwB8Edbc+WG5twzULqDhS57jew4A4xX8d
 jOsvK5syygnR+/rExWc0NWSmCH0g1u6mCUWXQuocfSb/oOEcUGq5RSixRNRfmJUq
 9XNF3hbp7W/Vnp9GWT30Md+CenrEtQXFK8ZQtg0ckBl+b5bEqKYs6FXGqCkUmjZy
 Qgt5sqxgdLWtslS3vSu1/mdryeaLmXNO6c6wueSPMmLyYODEoIHSSka9N9O0Inwv
 d106p7gUy3/ETamC3lbnyHkUrAru74Qh8rErKpqaRLkKfcIq7YCB073fxbqlamzz
 X4n8a1H37LefLqmKwIbF
 =pU+A
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "Fixes for problems introduced or discovered recently (intel_pstate,
  sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework,
  generic device properties framework) and one fix for a hotplug-related
  deadlock in ACPICA that's been there forever, but is nasty enough.

  Specifics:

   - Fix for a recent regression in the intel_pstate driver causing it
     to fail to restore the HWP (HW-managed P-states) configuration of
     the boot CPU after suspend-to-RAM (Rafael Wysocki).

   - Fix for two recent regressions in the intel_pstate driver, one that
     can trigger a divide by zero if the driver is accessed via sysfs
     before it manages to take the first sample and one causing it to
     fail to update a structure field used in a trace point, so the
     information coming from it is less useful (Rafael Wysocki).

   - Fix for a problem in the sti-cpufreq driver introduced during the
     4.5 cycle that causes it to break CPU PM in multi-platform kernels
     by registering cpufreq-dt (which subsequently doesn't work)
     unconditionally and preventing the driver that would actually work
     from registering (Sudeep Holla).

   - Stable-candidate fix for an ARM64 cpuidle issue causing idle state
     usage counters to be incorrectly updated for idle states that were
     not entered due to errors (James Morse).

   - Fix for a recently introduced issue in the OPP (Operating
     Performance Points) framework causing it to print bogus error
     messages for missing optional regulators (Viresh Kumar).

   - Fix for a recently introduced issue in the generic device
     properties framework that may cause it to attempt to dereferece and
     invalid pointer in some cases (Heikki Krogerus).

   - Fix for a deadlock in the ACPICA core that may be triggered by
     device (eg Thunderbolt) hotplug (Prarit Bhargava)"

* tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / OPP: Remove useless check
  ACPICA: Dispatcher: Update thread ID for recursive method calls
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
  device property: Avoid potential dereferences of invalid pointers
2016-05-06 11:58:45 -07:00
Linus Torvalds
17d25a337b Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "This contains a single fix that fixes a nohz tick stopping bug when
  mixed-poliocy SCHED_FIFO and SCHED_RR tasks are present on a runqueue"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
2016-05-06 11:53:27 -07:00
Olli Salonen
7977a15ede [media] az6027: Add support for Elgato EyeTV Sat v3
Another version of Elgato EyeTV Sat USB DVB-S2 adapter needs just
a USB ID addition.

Signed-off-by: Christian Knippel <namerp@gmail.com>
Reported-by: Olli Salonen <olli.salonen@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-05-06 15:53:23 -03:00
Kevin Fitch
7e31223ff0 [media] i2c: saa7115: Support CJC7113 detection
It's been reported that CJC7113 devices are returning
all 1s when reading register 0:

  "1111111111111111" found @ 0x4a (stk1160)

This new device is apparently compatible with SA7113, so let's
add a quirk to allow its autodetection. Given there isn't
any known differences with SAA7113, this commit does not
introduces a new saa711x_model value.

Reported-by: Philippe Desrochers <desrochers.philippe@gmail.com>
Signed-off-by: Kevin Fitch <kfitch42@gmail.com>
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-05-06 15:52:51 -03:00
Javier González
116f7d4a21 lightnvm: reserved space calculation incorrect
The nvm_dev->max_pages_per_blk variable was removed in favor of the new
nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
used in rrpc_capacity, reporting the reserved capacity to zero. Replace
with ->sec_per_blk to calculate the reserved area again.

Signed-off-by: Javier González <javier@cnexlabs.com>
Updated patch description. Was "lightnvm: eliminate redundant variable"
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
6d5be9590b lightnvm: rename nr_pages to nr_ppas on nvm_rq
The number of ppas contained on a request is not necessarily the number
of pages that it maps to neither on the target nor on the device side.
In order to avoid confusion, rename nr_pages to nr_ppas since it is what
the variable actually contains.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
df414b33bb lightnvm: add is_cached entry to struct ppa_addr
A target requires a method to identify PPAs that are either cached in
memory or on disk. This can efficiently be maintained within the PPA.
The target host-side translation table can then lookup a PPA and know
from the PPA if it is cached or on disk. In the case it is cached, it is
the responsibility of the target to maintain this cache.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
04a8aa173b lightnvm: expose gennvm_mark_blk to targets
Targets can update a block state when having a reference to an
in-memory virtual block. In the case that a target does not keep the
block metadata in memory, it does not have a way to update this
structure.

Therefore, expose gennvm_mark_blk() through the media managers
->mark_blk() callback and let targets update the state structure through
this callback.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
976bdfcae3 lightnvm: remove mgt targets on mgt removal
Targets associated with a device manager are not freed on device
removal. They have to be manually removed before shutdown. Make sure
any outstanding targets are freed upon shutdown.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Arnd Bergmann
45bbd0529e lightnvm: pass dma address to hardware rather than pointer
A recent change to lightnvm added code to pass a kernel pointer
to the hardware, which gcc complained about:

drivers/nvme/host/lightnvm.c: In function 'nvme_nvm_rqtocmd':
drivers/nvme/host/lightnvm.c:472:32: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  c->ph_rw.metadata = cpu_to_le64(rqd->meta_list);

It looks like this has no way of working anyway, so this changes
the code to pass the dma_address instead. This was most likely
what was intended here. Neither of the two are currently ever
written to, so the effect is the same for now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a34b1eb78e21 ("lightnvm: enable metadata to be sent to device")
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
cca87bc9d3 lightnvm: do not assume sequential lun alloc.
When doing GC, rrpc calculates the physical LUN to which the rrpc block
belongs too. This calculation is based on the assumption that LUNs are
assigned sequentially to the LUN list. Use the reference to the LUN
instead. This saves us the calculation and allows us to align LUNs in a
different manner to, for example, take advantage of devide parallelism.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Sagi Grimberg
b86d8d363e nvme/lightnvm: Log using the ctrl named device
Align with the rest of the nvme subsystem.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
75b8564932 lightnvm: rename dma helper functions
Until now, the dma pool have been exclusively used to allocate the ppa
list being sent to the device. In pblk (upcoming), we use these pools to
allocate metadata too. Thus, we generalize the names of some variables
on the dma helper functions to make the code more readable.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
003fad376b lightnvm: enable metadata to be sent to device
Enable metadata buffer to be sent to the device through the metadata
field on the physical rw nvme command. The size of the metadata buffer
must follow dev->oob_size * # of PPAs.

Signed-off-by: Javier González <javier@cnexlabs.com>
Updated description.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Javier González
57682b4915 lightnvm: do not free unused metadata on rrpc
rrpc does not save any metadata on a given request. Thus, do not attempt
to free the metadata dma region.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
293a6e8e27 lightnvm: fix out of bound ppa lun id on bb tbl
The ppa configured for retrieving the bad block table uses the internal
lun id to setup the get bad block ppa. This increases monotonically
with the number luns available. When configuring a ppa, the channel and
lun must be specified separately, leading to an out of bound memory
access in gennvm_block_bb when lun id goes beyond the luns available
within a channel.

Additional, remove out of bound check in gennvm_block_bb(), as it was a
buggy to begin with.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00
Matias Bjørling
00ee6cc3b7 lightnvm: refactor set_bb_tbl for accepting ppa list
The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
nr_pages internally. Instead, make these two variables explicit.
This allows a user to call it without initializing a struct nvm_rq
first.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-06 12:51:10 -06:00