We currently have two ways to account traffic in netfilter:
- iptables chain and rule counters:
# iptables -L -n -v
Chain INPUT (policy DROP 3 packets, 867 bytes)
pkts bytes target prot opt in out source destination
8 1104 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
- use flow-based accounting provided by ctnetlink:
# conntrack -L
tcp 6 431999 ESTABLISHED src=192.168.1.130 dst=212.106.219.168 sport=58152 dport=80 packets=47 bytes=7654 src=212.106.219.168 dst=192.168.1.130 sport=80 dport=58152 packets=49 bytes=66340 [ASSURED] mark=0 use=1
While trying to display real-time accounting statistics, we require
to pool the kernel periodically to obtain this information. This is
OK if the number of flows is relatively low. However, in case that
the number of flows is huge, we can spend a considerable amount of
cycles to iterate over the list of flows that have been obtained.
Moreover, if we want to obtain the sum of the flow accounting results
that match some criteria, we have to iterate over the whole list of
existing flows, look for matchings and update the counters.
This patch adds the extended accounting infrastructure for
nfnetlink which aims to allow displaying real-time traffic accounting
without the need of complicated and resource-consuming implementation
in user-space. Basically, this new infrastructure allows you to create
accounting objects. One accounting object is composed of packet and
byte counters.
In order to manipulate create accounting objects, you require the
new libnetfilter_acct library. It contains several examples of use:
libnetfilter_acct/examples# ./nfacct-add http-traffic
libnetfilter_acct/examples# ./nfacct-get
http-traffic = { pkts = 000000000000, bytes = 000000000000 };
Then, you can use one of this accounting objects in several iptables
rules using the new nfacct match (which comes in a follow-up patch):
# iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
# iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
The idea is simple: if one packet matches the rule, the nfacct match
updates the counters.
Thanks to Patrick McHardy, Eric Dumazet, Changli Gao for reviewing and
providing feedback for this contribution.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: fix incorrect VRAM size check in vmw_kms_fb_create()
drm/radeon/kms: bail on BTC parts if MC ucode is missing
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches.
Theorical limit on number of flows is 2^32
Fix some buggy RPS/RFS macros as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
CC: Xi Wang <xi.wang@gmail.com>
CC: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6373a9a286 (netem: use vmalloc for distribution table) added a
regression, since vfree() is called while holding a spinlock and BH
being disabled.
Fix this by doing the pointers swap in critical section, and freeing
after spinlock release.
Also add __GFP_NOWARN to the kmalloc() try, since we fallback to
vmalloc().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes one scheduling while atomic error:
[ 385.565186] ctnetlink v0.93: registering with nfnetlink.
[ 385.565349] BUG: scheduling while atomic: lt-expect_creat/16163/0x00000200
It can be triggered with utils/expect_create included in
libnetfilter_conntrack if the FTP helper is not loaded.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This fixes one bogus error that is returned to user-space:
libnetfilter_conntrack/utils# ./expect_get
TEST: get expectation (-1)(Unknown error 18446744073709551504)
This patch includes the correct handling for EAGAIN (nfnetlink
uses this error value to restart the operation after module
auto-loading).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The get and zero operations have to be done in an atomic context,
otherwise counters added between them will be lost.
This problem was spotted by Changli Gao while discussing the
nfacct infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The merge of m68knommu left the linker scripts a little disorganized.
Some consistent naming and squashing two of scripts that just include
others can simplify things a lot.
So merge the two simple including scripts, and rename the nommu script
to be consistent with the existing m68k linker scripts.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
There is a race on reading the ColdFire slice timer current count and the
total clock count so far. Interrupts are off, and we may have just missed
getting a new timer wrap event interrupt. Check for this and adjust the
cycle count and current read count accordingly.
Also the slice timer counts down from the terminal count. So in read_clk()
we need take the current clock count away from the terminal count.
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Disbale the CPU cache really early in the ColdFire startup code. We set
up some variables for RAM sizing and we want to make they stick in RAM.
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The traditional 68000 processors and the newer reduced instruction set
ColdFire processors do not support the 32*32->64 multiply or the 64/32->32
divide instructions. This is not a difference based on the presence of
a hardware MMU or not.
Create a new config symbol to mark that a CPU type doesn't support the
longer multiply/divide instructions. Use this then as a basis for using
the fast 64bit based divide (in div64.h) and for linking in the extra
libgcc functions that may be required (mulsi3, divsi3, etc).
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
We have two implementations of the IP checksuming code for the m68k arch.
One uses the more advanced instructions available in 68020 and above
processors, the other uses the simpler instructions available on the
original 68000 processors and the modern ColdFire processors.
This simpler code is pretty much the same as the generic lib implementation
of the IP csum functions. So lets just switch over to using that. That
means we can completely remove the checksum_no.c file, and only have the
local fast code used for the more complex 68k CPU family members.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
There is no reason we can't make the saved fp registers the same for all
m68k types and ColdFire. There is a little wasted space, but the code
consistency and cleanliness is a big win.
sigcontext.h is an exported header, but currently there is no in-mainline
users of the !__uClinux__ and __mcoldfire__ case that this change effects.
Even better this change actually makes this structure consistent with
the out-of-mainline ColdFire/MMU code.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Commit 61619b1207 ("m68k: merge mmu and
non-mmu include/asm/entry.h files") made the trap entry code basically
the same for mmu and non-mmu builds. This means we no longer need code
to mark the stack frame as "system-call" type or other in the non-mmu
trap handling entry points. This is done in the SAVE_ALL_INT macro now.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The non-MMU builds of m68k allow a fixed kernel boot command line to
be configured at configure time. Allow this MMU builds as well.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Output a table of the kernel memory regions at boot time.
This is taken directly from the ARM architecture code that does this.
The table looks like this:
Virtual kernel memory layout:
vector : 0x00000000 - 0x00000400 ( 0 KiB)
kmap : 0xd0000000 - 0xe0000000 ( 256 MiB)
vmalloc : 0xc0000000 - 0xcfffffff ( 255 MiB)
lowmem : 0x00000000 - 0x02000000 ( 32 MiB)
.init : 0x00128000 - 0x00134000 ( 48 KiB)
.text : 0x00020000 - 0x00118d54 ( 996 KiB)
.data : 0x00118d60 - 0x00126000 ( 53 KiB)
.bss : 0x00134000 - 0x001413e0 ( 53 KiB)
This has been very useful while debugging the ColdFire virtual memory
support code. But in general I think it is nice to know extacly where
the kernel has layed everything out on boot.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The mach_gettod function pointer is only called from the time_no.c
code. So move its actual definition to there too. It is currently in
setup_no.c for no particularly good reason.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
The selection of the CONFIG_GENERIC_ATOMIC64 option is not specific to the
MMU being present and enabled. It is a property of certain CPU families.
So select it based on those CPU types being selected.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Currently on m68k we have a comeplete thread_info structure stored inside
of the thread_struct, and we also have it in the initial part of the kernel
stack. Mostly the code currently uses the one inside of the thread_struct,
only using the "task" pointer from the stack based one.
This is wasteful and confusing, we should only have the single instance of
thread_info inside the stack page. And this is the norm for all other
architectures.
This change makes m68k handle thread_info consistently on both MMU enabled
and non-MMU setups.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
We have a duplicate name and definition for the offset of the thread.info
struct within the task struct in our asm-offsets.c code. Remove one of them,
and consolidate to use a single define, TASK_INFO.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
The init_task code can be the same for both mmu and non-mmu targets.
None of the alignment carried out in the the current init_task code
is necessary. The linker script takes care of aligning the init_thread
structure to a THREAD SIZE boundary, and that is all we need.
So use the init_task.c code for all target types, that makes m68k
code consistent with what most other architectures do.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
gpiolib provides __gpio_to_irq() to map gpiolib gpios to interrupts - hook
that up on m68k.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Updated to merge the valid bits of the two m68k patches.
This converts the m86k clocksources to use clocksource_register_hz/khz
This is untested, so any assistance in testing would be appreciated!
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] omap3isp: Fix crash caused by subdevs now having a pointer to devnodes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: call d_instantiate after all ops are setup
Btrfs: fix worker lock misuse in find_worker
The defconfig for the Integrator only include the serial drivers
for the PL010 as found in the Integrator/AP, to make sure we
don't loose the serial console we simply select both PL010 and
PL011 drivers from the Integrator Kconfig entries so they are
always included when applicable.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We don't need to hardcode the peripheral IDs for the Integrator/CP,
the numbers found in the hardware are correct anyway.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: xt_connbytes: handle negation correctly
net: relax rcvbuf limits
rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()
net: introduce DST_NOPEER dst flag
mqprio: Avoid panic if no options are provided
bridge: provide a mtu() method for fake_dst_ops
Add a req_running field to the pl330_thread to track which request (if
any) has been submitted to the DMA. This mechanism replaces the old
one in which we tried to guess the same by looking at the PC of the
DMA, which could prevent the driver from sending more requests if it
didn't guess correctly.
Reference: <1323631637-9610-1-git-send-email-javi.merino@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
Tested-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add default value for CONFIG_ARCH_NR_GPIO to Kconfig and remove the
definition in gpio.h. We can't remove gpio.h yet as asm/gpio.h still
includes it.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add default value for CONFIG_ARCH_NR_GPIO to Kconfig and remove the
definition in gpio.h.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Change ARCH_NR_GPIO into a Kconfig variable as suggested by Russel King.
This makes ARCH_NR_GPIO single zImage friendly. The default value for
tegra is defined as well.
Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This removes the hardcoded shift value and lets the clockevent core
come up with suitable mult and div factors. Tested on the
Integrator/CP.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This break-out from Colin Cross' cpufreq-aware TWD patch
will handle the case when our localtimer's clock changes with
the cpu clock. A cpufreq transtion notifier will be registered
only if the platform has supplied a specified clock to the TWD.
After a cpufreq transition, update the clockevent's frequency
by fetching the new clock rate from the clock framework and
reprogram the next clock event.
The necessary changes in the clockevents framework was done by
Thomas Gleixner in kernel v3.0.
ChangeLog v1->v2:
- Replace IS_ERR_OR_NULL() with IS_ERR() in twd_clk check.
- Update code to use the already existing per-cpu array of TWD
clockevents instead of adding cruft.
[Broke out, ifdef:ed CPUfreq stuff for non-cpufreq configs]
[Rebased to newer TWD base with per-CPU clock array]
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This break-out from Colin Cross' cpufreq-aware TWD patch will
optionally retrieve the clock rate of the TWD from an external
clock. A variant of this patch has been proposed by Rob Herring
as well.
The basic idea is to avoid recalibrating the rate of the clock
at boot if the platform already know what rate the clock to the
TWD block has.
ChangeLog v1->v2: added clk_[prepare|unprepare] calls.
[Broke out of larger SMP TWD patch]
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This break-out from Colin Cross' cpufreq-aware TWD patch will
just modernize the clock event registration code to use
clockevents_config_and_register().
[Broke out of larger SMP TWD patch]
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync(). Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.
The older_than_this checks that are now more strictly enforced since
writeback: avoid livelocking WB_SYNC_ALL writeback
by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once. But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened. I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.
Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty. This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
If the writeback code writes back an inode because it has expired we currently
use the non-blockin ->write_inode path. This means any inode that is pinned
is skipped. With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it. The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds. This means under certain scenarious time based
metadata writeback never happens.
Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>