In order to open up the possibility to internally transform a BPF program
into an alternative and possibly non-trivial reversible representation, we
need to keep the original BPF program around, so that it can be passed back
to user space w/o the need of a complex decoder.
The reason for that use case resides in commit a8fc927780 ("sk-filter:
Add ability to get socket filter program (v2)"), that is, the ability
to retrieve the currently attached BPF filter from a given socket used
mainly by the checkpoint-restore project, for example.
Therefore, we add two helpers sk_{store,release}_orig_filter for taking
care of that. In the sk_unattached_filter_create() case, there's no such
possibility/requirement to retrieve a loaded BPF program. Therefore, we
can spare us the work in that case.
This approach will simplify and slightly speed up both, sk_get_filter()
and sock_diag_put_filterinfo() handlers as we won't need to successively
decode filters anymore through sk_decode_filter(). As we still need
sk_decode_filter() later on, we're keeping it around.
Joint work with Alexei Starovoitov.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a jited flag into sk_filter struct in order to indicate
whether a filter is currently jited or not. The size of sk_filter is
not being expanded as the 32 bit 'len' member allows upper bits to be
reused since a filter can currently only grow as large as BPF_MAXINSNS.
Therefore, there's enough room also for other in future needed flags to
reuse 'len' field if necessary. The jited flag also allows for having
alternative interpreter functions running as currently, we can only
detect jit compiled filters by testing fp->bpf_func to not equal the
address of sk_run_filter().
Joint work with Alexei Starovoitov.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai <tiwai@suse.de> says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules. Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)
Note: this code should be cleaned up, so we don't have such maps in
three places!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
vmwgfx render-node support and drm + ttm changes it depends upon.
Pull request of 2014-03-28
* tag 'vmwgfx-next-2014-03-28' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Bump driver minor and date
drm/vmwgfx: Enable render nodes
drm/vmwgfx: Tighten the security around buffer maps
drm/ttm: Add a ttm_ref_object_exists function
drm/vmwgfx: Tighten security around surface sharing v2
drm/vmwgfx: Allow prime fds in the surface reference ioctls
drm/vmwgfx: Drop authentication requirement on UNREF ioctls
drm/vmwgfx: Reinstate and tighten security around legacy master model
drm/vmwgfx: Use a per-device semaphore for reservation protection
drm: Add a function to get the ioctl flags
drm: Protect the master management with a drm_device::master_mutex v3
drm: Remove the minor master list
drm: Improve on minor type helpers v3
drm: Make control nodes master-less v3
drm: Break out ioctl permission check to a separate function v2
drm: Have the crtc code only reference master from legacy nodes v2
Pull vfs fixes from Al Viro:
"Switch mnt_hash to hlist, turning the races between __lookup_mnt() and
hash modifications into false negatives from __lookup_mnt() (instead
of hangs)"
On the false negatives from __lookup_mnt():
"The *only* thing we care about is not getting stuck in __lookup_mnt().
If it misses an entry because something in front of it just got moved
around, etc, we are fine. We'll notice that mount_lock mismatch and
that'll be it"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch mnt_hash to hlist
don't bother with propagate_mnt() unless the target is shared
keep shadowed vfsmounts together
resizable namespace.c hashes
I am the new kernel tree Documentation maintainer (except for parts that
are handled by other people, of course).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull input updates from Dmitry Torokhov:
"Some more updates for the input subsystem.
You will get a fix for race in mousedev that has been causing quite a
few oopses lately and a small fixup for force feedback support in
evdev"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: mousedev - fix race when creating mixed device
Input: don't modify the id of ioctl-provided ff effect on upload failure
It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent. This is common in
many distros and thus normal configuration of many containers. The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel. If it gets any other error else it refuses to let the login
proceed.
Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace. So many forms of containers have never
worked if audit was enabled in the kernel.
BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM). Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.
With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out. Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!
This fix is kinda hacky. We return ECONNREFUSED for all non-init
relevant namespaces. That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'. They don't really work, since audit
isn't logging things. But it's what most users want.
In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world. This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...
Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Adam Richter <adam_richter2004@yahoo.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating. Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.
[fix for dumb braino folded]
Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The driver is only supported on DT enabled platforms. Convert the
driver to DT so that it can probe properly.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The driver is only supported on DT enabled platforms. Convert the
driver to DT so that it can probe properly.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The driver is only supported on DT enabled platforms. Convert the
driver to DT so that it can probe properly.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Use the regmap APIs for this driver instead of custom pm8xxx
APIs. This breaks this driver's dependency on the pm8xxx APIs and
allows us to easily port it to other bus protocols in the future.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Simplify the error paths and reduce the lines of code in this
driver by using the devm_* APIs.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The gpio configuration in this driver doesn't work because the
gpio.h include doesn't exist. Remove the configuration as it
isn't strictly necessary, allowing us to actually compile this
driver. If it's needed in the future, it should be done via a
pinctrl driver.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This patch adds a new driver for keypad for Cirrus Logic CLPS711X CPUs.
Target CPU contain keyboard interface which can scan 8 column lines,
so we can read row GPIOs to read status and determine asserted state.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The new symbols provide the same API as the 64-bit variants, so they
should have the same symbol version name. This can't break
userspace, since these symbols are new for 32-bit Linux.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/0a869bce03d25619565b1eee7d69a4fd15fd203a.1396124118.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The core implementation of cs_change didn't follow the documentation
which says that cs_change in the middle of the transfer means to briefly
deassert chip select, instead it followed buggy drivers which change the
polarity of chip select. Use a delay of 10us between deassert and
reassert simply from pulling numbers out of a hat.
Reported-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Paul Durrant says:
====================
xen-netback: fix rx slot estimation
Sander Eikelenboom reported an issue with ring overflow in netback in
3.14-rc3. This turns outo be be because of a bug in the ring slot estimation
code. This patch series fixes the slot estimation, fixes the BUG_ON() that
was supposed to catch the issue that Sander ran into and also makes a small
fix to start_new_rx_buffer().
v3:
- Added a cap of MAX_SKB_FRAGS to estimate in patch #2
v2:
- Added BUG_ON() to patch #1
- Added more explanation to patch #3
====================
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption
that meta_slots_used == ring slots used. This is not necessarily the case
for GSO packets, because the non-prefix GSO protocol consumes one more ring
slot than meta-slot for the 'extra_info'. This patch changes the test to
actually check ring slots.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
The worse-case estimate for skb ring slot usage in xenvif_rx_action()
fails to take fragment page_offset into account. The page_offset does,
however, affect the number of times the fragmentation code calls
start_new_rx_buffer() (i.e. consume another slot) and the worse-case
should assume that will always return true. This patch adds the page_offset
into the DIV_ROUND_UP for each frag.
Unfortunately some frontends aggressively limit the number of requests
they post into the shared ring so to avoid an estimate that is 'too'
pessimal it is capped at MAX_SKB_FRAGS.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/marvell/mvneta.c
The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
The at8031 can work on polling mode and interrupt mode.
Add ack_interrupt and config intr funcs to enable
interrupt mode for it.
Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: "(foo*)" should be "(foo *)"
ERROR: "foo * bar" should be "foo *bar"
Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: open brace '{' following enum go on the same line
ERROR: open brace '{' following struct go on the same line
ERROR: trailing statements should be on next line
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WARNING: please, no space before tabs
WARNING: please, no spaces at the start of a line
ERROR: spaces required around that ':' (ctx:VxW)
ERROR: spaces required around that '>' (ctx:VxV)
ERROR: spaces required around that '>=' (ctx:VxV)
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fix from Ingo Molnar:
"A late breaking fix from John. (The bug fixed has a hard lockup
potential, but that was not observed, warnings were)"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Revert to calling clock_was_set_delayed() while in irq context
Eric W. Biederman says:
====================
netpoll: Cleanups and fixes
This should be a small set of safe cleanups and fixes to netpoll.
The fixes are vlan headers are now always inserted when needed, and
napi polling is always avoided when network devices are closed.
There are a bunch of little cleanups removing unnecessary code, fixing
function naming, not taking unnecessary locks and removing general
silliness.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ceph fix from Sage Weil:
"This drops a bad assert that a few users have been hitting but we've
only recently been able to track down"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: drop an unsafe assertion
Stop taking the transmit lock when a network device has specified
NETIF_F_LLTX.
If no locks needed to trasnmit a packet this is the ideal scenario for
netpoll as all packets can be trasnmitted immediately.
Even if some locks are needed in ndo_start_xmit skipping any unnecessary
serialization is desirable for netpoll as it makes it more likely a
debugging packet may be trasnmitted immediately instead of being
deferred until later.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the assumption that the skbs that make it to
netpoll_send_skb_on_dev are allocated with find_skb, such that
skb->users == 1 and nothing is attached that would prevent the skbs from
being freed from hard irq context.
Remove this assumption by replacing __kfree_skb on error paths with
dev_kfree_skb_irq (in hard irq context) and kfree_skb (in process
context).
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netpoll_rx_enable and netpoll_rx_disable functions have always
controlled polling the network drivers transmit and receive queues.
Rename them to netpoll_poll_enable and netpoll_poll_disable to make
their functionality clear.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today netpoll_rx_enable and netpoll_rx_disable are called from
dev_close and and __dev_close, and not from dev_close_many.
Move the calls into __dev_close_many so that we have a single call
site to maintain, and so that dev_close_many gains this protection as
well. Which importantly makes batched network device deletes safe.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>