The Kconfig for this file is:
drivers/acpi/Kconfig:config ACPI_PCI_SLOT
drivers/acpi/Kconfig: bool "PCI slot detection driver"
...and so it is not built as a module. Hence including module.h
and everything that comes with it just for the no-op MODULE_LICENSE
and friends is rather heavy handed.
The license/author info is found at the top of the file, so we
just remove the MODULE_LICENSE etc and the include of module.h
We delete the DRIVER_VERSION macro as well since it is unused.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Kconfig currently controlling compilation of these files are:
drivers/acpi/Kconfig:menuconfig PMIC_OPREGION
drivers/acpi/Kconfig: bool "PMIC (Power Management Integrated Circuit) operation region support"
drivers/acpi/Kconfig:config BXT_WC_PMIC_OPREGION
drivers/acpi/Kconfig: bool "ACPI operation region support for BXT WhiskeyCove PMIC"
drivers/acpi/Kconfig:config XPOWER_PMIC_OPREGION
drivers/acpi/Kconfig: bool "ACPI operation region support for XPower AXP288 PMIC"
...meaning they currently are not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the code there is no doubt it is builtin-only.
We delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.
One file was using module_init. Since module_init translates to
device_initcall in the non-modular case, the init ordering remains
unchanged with this commit.
In one case we replace the module.h with export.h since that file
is exporting some symbols, but does not use __init. The other two
are using __init and so module.h gets replaced with init.h there.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One regression in the Device Tree handling for OMAP NAND handling of the ELM
node. TI migrated to using the property name "ti,elm-id", but forgot to keep
compatibility with the old "elm_id" property.
Also, might as well send out this MAINTAINERS fixup now.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXiYHwAAoJEFySrpd9RFgtJK0P/0xH8ChIrWio8zakcndyjIb+
LdHXlkrQfXs/6vzVAaZLeVI/KnElUL4jIVr2Xg4QYYLdyg/VzOyOGMpb2hdNvYZo
RSJf2wI+k0vcP68CQFROl+Sj2FOpWjDRB92zxyikk1D++O6jOLQWK4oUBhNgximG
qmPBl7mzhjAPrFOu1DJVIcaXxC2t5JQffAUCy0rrGBmhfiZgKxlwDnS7raumj6eq
8xBil5UoFDfIWqneh5kKphexm3t0gSdibi4V2W6EKvRK2WAhcunfBLEld7qo0Zy1
lgdaoLgEsgqjA58oQ/4MdVMZDPfin4JlKsdUcWRVXpGl5nxIB6iAJzyTHPHgltL3
aLJFjP0oT9emUI4T4cAzWRYa9M2RKOIjwfNrrjWYjkb3NOa4OIg+9xWgy8CkkeJG
BTGndVCBjXLZ1k6enQUKZ8Wf+c8BRZlVFTsvxFx89VOie3+NwfUK6Cv6mOXUdCk8
TyxYF/8R2fazP46fSCv9tW2A0FakHsNqqVm9kUDEV+c/juLtzJCHTwwRUjFJxopv
2oyHqeAUjNx65usp+vTw96oHp3BXef8Cw/9PIck3R6E6LVaZuXKlMBADP6/DLYmS
XoufM25SuPg6d0WcSzcaket60tP8wNPhsn4MB0W0rHGnMaoKY4svbew0IGSIwJPt
uWaPMn/FOVWTxcID1ln1
=tSga
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20160715' of git://git.infradead.org/linux-mtd
Pull MTD fix from Brian Norris:
"Late MTD fix for v4.7:
One regression in the Device Tree handling for OMAP NAND handling of
the ELM node. TI migrated to using the property name "ti,elm-id", but
forgot to keep compatibility with the old "elm_id" property.
Also, might as well send out this MAINTAINERS fixup now"
* tag 'for-linus-20160715' of git://git.infradead.org/linux-mtd:
mtd: nand: omap2: Add check for old elm binding
MAINTAINERS: Add file patterns for mtd device tree bindings
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
There was a check on CAP_NET_ADMIN in cpmac_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
There was a check on CAP_NET_ADMIN in au1000_set_settings, but this
check is already done in dev_ethtool, so no need to repeat it before
calling the generic function.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phy in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building under W=1, the lack of lkdtm.h in lkdtm_usercopy.c and
lkdtm_rodata.c was discovered. This fixes the issue and consolidates
the common header and the pr_fmt macro for simplicity and regularity
across each test source file.
Signed-off-by: Kees Cook <keescook@chromium.org>
A conversion of the lkdtm core module added an "#ifdef CONFIG_KPROBES" check,
but a number of functions then become unused:
drivers/misc/lkdtm_core.c:340:16: error: 'lkdtm_debugfs_entry' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:122:12: error: 'jp_generic_ide_ioctl' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:114:12: error: 'jp_scsi_dispatch_cmd' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:106:12: error: 'jp_hrtimer_start' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:97:22: error: 'jp_shrink_inactive_list' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:89:13: error: 'jp_ll_rw_block' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:83:13: error: 'jp_tasklet_action' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:75:20: error: 'jp_handle_irq_event' defined but not used [-Werror=unused-function]
drivers/misc/lkdtm_core.c:68:21: error: 'jp_do_irq' defined but not used [-Werror=unused-function]
This adds the same #ifdef everywhere. There is probably a better way to do the
same thing, but for now this avoids the new warnings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c479e3fd88 ("lkdtm: use struct arrays instead of enums")
[kees: moved some code around to better consolidate the #ifdefs]
Signed-off-by: Kees Cook <keescook@chromium.org>
Nothing is decrementing the index "i" while we are cleaning up the
fragments we could not successful transmit.
Fixes: 9cde94506e ("bgmac: implement scatter/gather support")
Reported-by: coverity (CID 1352048)
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent change to tracepoint napi:napi_poll changed the order of
the parameters that perf scripts sees, the printk was correct. The
problem was that the new parameters (work and budget) were pushed
in front of dev_name.
The new parameters obviously need to be appended to keep backward
compatible.
Fixes: 1db19db7f5 ("net: tracepoint napi:napi_poll add work and budget")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default value of idle interval is 2 mins, but for most time when
screen shutdown, there are still operations during the 2 mins interval,
and gc's sleep time is about 30 secs to 60 secs, so there is almost no
chance for GC thread to do garbage collecting.
Set default value of idle interval value from 2 mins to 5 secs for
fixing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch reverts 19a5f5e2ef (f2fs: drop any block plugging),
and adds blk_plug in write paths additionally.
The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Datas in file can be operated by GC and DIO simultaneously, so we will
face race case as below:
For write case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- dio_bio_submit
update user data to old block address
For read case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_balance_fs
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- write_checkpoint
- do_checkpoint
- clear_prefree_segments
- f2fs_issue_discard
discard old block adress
- dio_bio_submit
update user buffer from obsolete block address
In order to fix this, for one file, we should let DIO and GC getting exclusion
against with each other.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In 1TB storage, we need to admit 22841 prefree segments, which can consume
too much segments.
This patch sets 8GB in max. prefree segments in that case.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is to fix wrong error pointer handling flow reported by Dan.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull input fixes from Dmitry Torokhov:
"A few last-minute updates for the input subsystem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ts4800-ts - add missing of_node_put after calling of_parse_phandle
Input: synaptics-rmi4 - use of_get_child_by_name() to fix refcount
Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
Input: xpad - validate USB endpoint count during probe
Input: add SW_PEN_INSERTED define
Add support for axis inversion / swapping using the new
touchscreen_parse_properties() and touchscreen_set_mt_pos()
functionality.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Use the touchscreen_parse_properties() and touchscreen_report_pos() to
perform coordinates transformation, instead of DIY code, which results in a
nice cleanup.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Add support for inverting / swapping axes using the new
touchscreen_parse_properties() and touchscreen_report_pos()
functionality.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Extend touchscreen_parse_properties() with support for the
touchscreen-inverted-x/y and touchscreen-swapped-x-y properties and
add touchscreen_set_mt_pos() and touchscreen_report_pos() helper
functions for storing coordinates into a input_mt_pos struct, or
directly reporting them, taking these properties into account.
This commit also modifies the existing callers of
touchscreen_parse_properties() to pass in NULL for the new third
argument, keeping the existing behavior.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jiri Pirko says:
====================
mlxsw: Couple of fixes
Couple of fixes for mlxsw driver from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets entering the switch are mapped to a Switch Priority (SP)
according to their PCP value (untagged frames are mapped to SP 0).
The packets are classified to a priority group (PG) buffer in the port's
headroom according to their SP.
The switch maintains another mapping (SP to IEEE priority), which is
used to generate PFC frames for lossless PGs. This mapping is
initialized to IEEE = SP % 8.
Therefore, when mapping SP 'x' to PG 'y' we create a situation in which
an IEEE priority is mapped to two different PGs:
IEEE 'x' ---> SP 'x' ---> PG 'y'
IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default)
Which is invalid, as a flow can use only one PG buffer.
Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer.
Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of supported traffic classes that can have ETS and PFC
simultaneously enabled is not subject to user configuration, so make
sure we always initialize them to the correct values following a set
operation.
Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't have PAUSE frames and PFC both enabled on the same port, but
the fact that ieee_setpfc() was called doesn't necessarily mean PFC is
enabled.
Only emit errors when PAUSE frames and PFC are enabled simultaneously.
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device supports link autonegotiation, so let the user know about it
by indicating support via ethtool ops.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting a new speed we need to disable and enable the port for the
changes to take effect. We currently only do that if the operational
state of the port is up. However, setting a new speed following link
training failure will require us to explicitly set the port down and then
up.
Instead, disable and enable the port based on its administrative state.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch switch to use skb array instead of sk_receive_queue to
avoid spinlock contentions. Tests shows about 21% improvements for
guest rx pps:
Before: 1472731 pkts/s
After: 1786289 pkts/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We decide the rxq through calculating its hash which is not necessary
if we only have one rx queue. So this patch skip this and just return
queue 0. Test shows 22% improving on guest rx pps.
Before: 1201504 pkts/s
After: 1472731 pkts/s
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull workqueue fix from Tejun Heo:
"The optimization for setting unbound worker affinity masks collided
with recent scheduler changes triggering warning messages.
This late pull request fixes the bug by removing the optimization"
* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix setting affinity of unbound worker threads
Update AnalyzeSuspend to v4.2:
- kprobe support for function tracing
- config file support in lieu of command line options
- advanced callgraph support for function debug
- dev mode for monitoring common sources of delay, e.g. msleep, udelay
- many bug fixes and formatting upgrades
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Without this check, the following XFS_I invocations would return bad
pointers when used on non-XFS inodes (perhaps pointers into preceding
allocator chunks).
This could be used by an attacker to trick xfs_swap_extents into
performing locking operations on attacker-chosen structures in kernel
memory, potentially leading to code execution in the kernel. (I have
not investigated how likely this is to be usable for an attack in
practice.)
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-07-14
This series contains fixes to i40e and ixgbe.
Alex fixes issues found in i40e_rx_checksum() which was broken, where the
checksum was being returned valid when it was not.
Kiran fixes a bug which was found when we abruptly remove a cable which
caused a panic. Set the VSI broadcast promiscuous mode during VSI add
sequence and prevents adding MAC filter if specified MAC address is
broadcast.
Paolo Abeni fixes a bug by returning the actual work done, capped to
weight - 1, since the core doesn't allow to return the full budget when
the driver modifies the NAPI status.
Guilherme Piccoli fixes an issue where the q_vector initialization
routine sets the affinity _mask of a q_vector based on v_idx value.
This means a loop iterates on v_idx, which is an incremental value, and
the cpumask is created based on this value. This is a problem in
systems with multiple logical CPUs per core (like in SMT scenarios).
Changed the way q_vector's affinity_mask is created to resolve the issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool -i provides a driver version that is hard coded.
Export the same value via "modinfo".
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
BPF event output helper improvements
This set adds improvements to the BPF event output helper to
support non-linear data sampling, here specifically, for skb
context. For details please see individual patches. The set
is based against net-next tree.
v1 -> v2:
- Integrated and adapted Peter's diff into patch 1, updated
the remaining ones accordingly. Thanks Peter!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.
We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the bpf_perf_event_output() helper as a preparation into
two parts. The new bpf_perf_event_output() will prepare the raw
record itself and test for unknown flags from BPF trace context,
where the __bpf_perf_event_output() does the core work. The
latter will be reused later on from bpf_event_output() directly.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for non-linear data on raw records. It
extends raw records to have one or multiple fragments that will
be written linearly into the ring slot, where each fragment can
optionally have a custom callback handler to walk and extract
complex, possibly non-linear data.
If a callback handler is provided for a fragment, then the new
__output_custom() will be used instead of __output_copy() for
the perf_output_sample() part. perf_prepare_sample() does all
the size calculation only once, so perf_output_sample() doesn't
need to redo the same work anymore, meaning real_size and padding
will be cached in the raw record. The raw record becomes 32 bytes
in size without holes; to not increase it further and to avoid
doing unnecessary recalculations in fast-path, we can reuse
next pointer of the last fragment, idea here is borrowed from
ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
path for PERF_SAMPLE_RAW minimal.
This facility is needed for BPF's event output helper as a first
user that will, in a follow-up, add an additional perf_raw_frag
to its perf_raw_record in order to be able to more efficiently
dump skb context after a linear head meta data related to it.
skbs can be non-linear and thus need a custom output function to
dump buffers. Currently, the skb data needs to be copied twice;
with the help of __output_custom() this work only needs to be
done once. Future users could be things like XDP/BPF programs
that work on different context though and would thus also have
a different callback function.
The few users of raw records are adapted to initialize their frag
data from the raw record itself, no change in behavior for them.
The code is based upon a PoC diff provided by Peter Zijlstra [1].
[1] http://thread.gmane.org/gmane.linux.network/421294
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>