All users of the ore will need to check if current code
supports the given layout. For example RAID5/6 is not
currently supported.
So move all the checks from exofs/super.c to a new
ore_verify_layout() to be used by ore users.
Note that any new layout should be passed through the
ore_verify_layout() because the ore engine will prepare
and verify some internal members of ore_layout, and
assumes it's called.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Users like the objlayout-driver would like to only pass
a partial device table that covers the IO in question.
For example exofs divides the file into raid-group-sized
chunks and only serves group_width number of devices at
a time.
The partiality is communicated by setting
ore_components->first_dev, and the array covers all logical
devices from oc->first_dev up to (oc->first_dev + oc->numdevs).
The ore_comp_dev() API receives a logical device index
and returns the actual present device in the table.
An out-of-range dev_index will BUG.
Logical device index is the theoretical device index as if
all the devices of a file are present, i.e.:
total_devs = group_width * mirror_p1 * group_count
0 <= dev_index < total_devs
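A minimal userspace sketch of the lookup described above. The struct
and function names (comps, comp_dev, total_devs) are hypothetical
stand-ins, not the kernel definitions, and assert() stands in for BUG:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; field names follow the text. */
struct dev {
	int id;
};

struct comps {
	unsigned first_dev;	/* logical index of devs[0] */
	unsigned numdevs;	/* number of entries actually present */
	struct dev **devs;	/* partial device table */
};

/* Total number of logical devices of a file, as defined in the text. */
static unsigned total_devs(unsigned group_width, unsigned mirror_p1,
			   unsigned group_count)
{
	return group_width * mirror_p1 * group_count;
}

/*
 * Map a logical device index to the entry present in the partial table.
 * An out-of-range index trips the assert (the text says the real helper BUGs).
 */
static struct dev *comp_dev(const struct comps *oc, unsigned dev_index)
{
	assert(dev_index >= oc->first_dev &&
	       dev_index < oc->first_dev + oc->numdevs);
	return oc->devs[dev_index - oc->first_dev];
}
```

With first_dev = 8 and numdevs = 2, logical indexes 8 and 9 map to
table slots 0 and 1; any other index is rejected.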
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Memory conditions and max_bio constraints might cause us to
not comply with the full length of the requested IO. Instead of
failing the complete IO we can issue a shorter read/write and
report how much was actually executed in the ios->length
member.
All users must check ios->length at IO_done or upon return of
ore_read/write and re-issue the remainder of the bytes, because
a short IO no longer returns an error like it did before.
This is part of the effort to support the pnfs-obj layout driver.
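A rough userspace sketch of the caller-side loop this implies. The
names io_state, do_io and io_all are hypothetical, and the max_io cap
is only an assumed stand-in for whatever shortens the IO:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical I/O state: 'length' is what was actually executed. */
struct io_state {
	size_t offset;
	size_t length;
};

/* Stand-in for a read/write that may be cut short (capped at max_io). */
static int do_io(struct io_state *ios, size_t offset, size_t want,
		 size_t max_io)
{
	ios->offset = offset;
	ios->length = want < max_io ? want : max_io;
	return 0;	/* a short I/O is not an error */
}

/*
 * Keep issuing I/O until every byte is covered.
 * Returns the number of submissions, or -1 on a real error.
 */
static int io_all(size_t offset, size_t length, size_t max_io)
{
	int rounds = 0;

	assert(max_io > 0);
	while (length > 0) {
		struct io_state ios;

		if (do_io(&ios, offset, length, max_io) != 0)
			return -1;
		/* Advance by what was actually done, not by what was asked. */
		offset += ios.length;
		length -= ios.length;
		rounds++;
	}
	return rounds;
}
```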
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
If at read/write_done the actual IO was shorter than requested,
as reported in the returned ios->length, it is not an error. The remainder
of the pages should just be unlocked but not marked uptodate or
end_page_writeback. They will be re-issued later by the VFS.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Move the check and preparation of the ios->kern_buff case to
later inside _write_mirror().
Since read was never used with ios->kern_buff its support is removed
instead of fixed.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Now that each ore_io_state covers only a single raid group,
a single striping_info calculation is needed. Embed one inside
ore_io_state to cache the calculation results and eliminate
an extra call.
Also the outer _prepare_for_striping is removed since it does nothing.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Usually a single IO is confined to one group of devices
(group_width) and at the boundary of a raid group it can
spill into a second group. Current code would allocate a
full device_table size array at each io_state so it can
comply with requests that span two groups. Needless to say,
that is very wasteful, especially when device_table count
can get very large (hundreds even thousands), while a
group_width is usually 8 or 10.
* Change ore API to trim on IO that spans two raid groups.
The user passes offset+length to ore_get_rw_state, the
ore might trim on that length if spanning a group boundary.
The user must check ios->length or ios->nrpages to see
how much IO will be performed. It is the responsibility
of the user to re-issue the remainder of the IO.
* Modify exofs to copy spilled pages on to the next IO.
This means one last kick is needed after all coalescing
of pages is done.
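A small sketch of the trimming arithmetic, under the assumption that
one raid group covers a fixed run of group_bytes file bytes before the
layout moves on to the next group. Both trim_to_group and group_bytes
are hypothetical names, not the ore API:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim 'length' so that [offset, offset + length) does not cross a
 * group boundary. group_bytes is an assumed parameter: the number of
 * file bytes served by one raid group.
 */
static uint64_t trim_to_group(uint64_t offset, uint64_t length,
			      uint64_t group_bytes)
{
	uint64_t left;

	assert(group_bytes > 0);
	/* Bytes remaining until the end of the group holding 'offset'. */
	left = group_bytes - offset % group_bytes;
	return length < left ? length : left;
}
```

A request starting 4 bytes before a boundary is trimmed to 4 bytes;
the caller is expected to re-issue the rest starting at the boundary.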
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
The balloon driver's "current_pages" is very different from
totalram_pages. Self-ballooning needs to be driven by
the latter. Also, Committed_AS doesn't account for pages
used by the kernel so:
1) Add totalreserve_pages to Committed_AS for the normal target.
2) Enforce a floor for when there are little or no user-space threads
using memory (e.g. single-user mode) to avoid OOMs. The floor
function includes a "min_usable_mb" tuneable in case we discover
later that the floor function is still too aggressive in some
workloads, though likely it will not be needed.
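A simplified sketch of the two points above. It assumes a constant
floor of totalreserve_pages plus a minimum usable amount, whereas the
patch itself uses a piecewise linear floor; the function and parameter
names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/*
 * Compute a self-ballooning target in pages.
 * 1) the normal goal is Committed_AS plus the reserved pages;
 * 2) the result never drops below a floor (simplified here to a
 *    constant: reserved pages plus min_usable_pages).
 */
static unsigned long selfballoon_target(unsigned long committed_as,
					unsigned long totalreserve_pages,
					unsigned long min_usable_pages)
{
	unsigned long goal_pages, floor_pages;

	/* Guard the additions below against wrap-around. */
	assert(committed_as <= ULONG_MAX - totalreserve_pages);
	assert(min_usable_pages <= ULONG_MAX - totalreserve_pages);

	goal_pages = committed_as + totalreserve_pages;
	floor_pages = totalreserve_pages + min_usable_pages;
	return goal_pages > floor_pages ? goal_pages : floor_pages;
}
```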
Changes since version 4:
- change floor calculation so that it is not as aggressive; this version
uses a piecewise linear function similar to minimum_target in the 2.6.18
balloon driver, but modified to add to totalreserve_pages instead of
subtracting from max_pfn; the 2.6.18 version causes OOMs on recent kernels
because the kernel has expanded over time
- change safety_margin to min_usable_mb and comment on its use
- since committed_as does NOT include kernel space (and other reserved
pages), totalreserve_pages is now added to committed_as. The result is
less aggressive self-ballooning, but theoretically more appropriate.
Changes since version 3:
- missing include causes compile problem when CONFIG_FRONTSWAP is disabled
- add comments after includes
Changes since version 2:
- missing include causes compile problem only on 32-bit
Changes since version 1:
- tuneable safety margin added
[v5: avi.miller@oracle.com: still too aggressive, seeing some OOMs]
[v4: konrad.wilk@oracle.com: fix compile when CONFIG_FRONTSWAP is disabled]
[v3: guru.anbalagane@oracle.com: fix 32-bit compile]
[v2: konrad.wilk@oracle.com: make safety margin tuneable]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
[v1: Altered description and added an extra include]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
On some build configurations PER_CLEAR_ON_SETID symbol was not
found when compiling smack_lsm.c. This patch fixes the issue by
explicitly doing #include <linux/personality.h>.
Signed-off-by: Jarkko Sakkinen <jarkko.j.sakkinen@gmail.com>
Signed-off-by: Casey Schaufler <cschaufler@cschaufler-intel.(none)>
The readlink function doesn't guarantee that a '\0' will be put at the
end of the provided buffer if there is no space left.
No need to do "buf[len] = '\0';" since the buffer is allocated with
zalloc().
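A minimal sketch of the pattern, using calloc() as a stand-in for
zalloc(). The helper name read_link_target is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a symlink target into a freshly zeroed buffer of 'size' bytes.
 * Returns NULL on error; the caller frees the result.
 */
static char *read_link_target(const char *path, size_t size)
{
	char *buf;
	ssize_t len;

	assert(size > 0);
	buf = calloc(1, size);
	if (buf == NULL)
		return NULL;

	/*
	 * readlink() does not append '\0'. Passing size - 1 leaves the
	 * last byte untouched, and since the buffer starts zeroed the
	 * result is always terminated, even when the target is truncated.
	 */
	len = readlink(path, buf, size - 1);
	if (len < 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```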
Link: http://lkml.kernel.org/r/4E986ABF.9040706@intra2net.com
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just press 'S' on any assembly line and the source code will be hidden
while the current line remains selected. Press 'S' again to show the
source code again.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-efmxm5etouebb7es0kkyqqwa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's becoming common to allow the user to filter out parts of the data
structure being browsed, like already done in the hists browser and in
the annotate browser in the next commit, so provide it directly in the
ui_browser class list_head helpers.
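A rough sketch of what filtered list traversal can look like, using a
plain singly linked list instead of list_head. All names here (node,
filter_fn, next_visible, count_visible, hide_negative) are
hypothetical and not the ui_browser API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

/* Returns true when the entry should be hidden from the browser. */
typedef bool (*filter_fn)(const struct node *entry);

/* First entry at or after 'pos' that is not filtered out. */
static struct node *next_visible(struct node *pos, filter_fn filter)
{
	/* A NULL filter means every entry is visible. */
	while (pos && filter && filter(pos))
		pos = pos->next;
	return pos;
}

/* Count the entries a browser would actually display. */
static size_t count_visible(struct node *head, filter_fn filter)
{
	size_t n = 0;
	struct node *p;

	for (p = next_visible(head, filter); p;
	     p = next_visible(p->next, filter))
		n++;
	return n;
}

/* Example filter: hide entries with a negative value. */
static bool hide_negative(const struct node *entry)
{
	return entry->value < 0;
}
```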
More work required to move the equivalent routines found now in the
hists browser to the rb_tree helpers.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-jk7danyt1d9ji4e3o2xuthpn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We lost that functionality on ed7e566, restore it.
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-z8eb8af2x46x42lgpn1ustid@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With underlying dynamic data structures we need to invalidate pointers
to them after a timer, as that entry may have vanished (decayed in top,
for instance).
I forgot about browser_ui->top. Fix it by resetting it to null after a
timer. The seek operation from SEEK_SET will then set it to a valid
entry because it starts from rb_first(&hists->entries).
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-2ssjm0ouh9tsz4dwkcu7c40n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The trace_pipe_raw handler holds a cached page from the time the file
is opened to the time it is closed. The cached page is used to handle
the case of the user space buffer being smaller than what was read from
the ring buffer. The left over buffer is held in the cache so that the
next read will continue where the data left off.
After EOF is returned (no more data in the buffer), the index of
the cached page is set to zero. If a user app reads the page again
after EOF, the check in the buffer will see that the cached page
is less than page size and will return the cached page again. This
will cause reading the trace_pipe_raw again after EOF to return
duplicate data, making the output look like the time went backwards
but instead data is just repeated.
The fix is to not reset the index right after all data is read
from the cache, but to reset it after all data is read and more
data exists in the ring buffer.
Cc: stable <stable@kernel.org>
Reported-by: Jeremy Eder <jeder@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
tracing_enabled option is deprecated.
To start/stop tracing, write to /sys/kernel/debug/tracing/tracing_on
without tracing_enabled. This patch is based on Linux 3.1.0-rc1
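For illustration, a small userspace helper that writes "1" or "0" to
the tracing_on file. The helper name set_tracing is hypothetical; the
path is passed in by the caller so nothing beyond the file named above
is assumed:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Start (on != 0) or stop (on == 0) tracing by writing to the given
 * tracing_on file. Returns 0 on success, -1 on failure.
 */
static int set_tracing(const char *tracing_on_path, int on)
{
	FILE *f;
	int ok;

	assert(tracing_on_path != NULL);
	f = fopen(tracing_on_path, "w");
	if (f == NULL)
		return -1;
	ok = fputs(on ? "1" : "0", f) >= 0;
	/* fclose() flushes the write, so its result matters too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Usage would be set_tracing("/sys/kernel/debug/tracing/tracing_on", 1),
which normally requires root and a mounted debugfs.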
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Link: http://lkml.kernel.org/r/1313127022-23830-1-git-send-email-leemgs1@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm
1 lock held by qemu-dm/3256:
#0: (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5
Pid: 3256, comm: qemu-dm Tainted: G W 3.1.0-rc8+ #5
Call Trace:
[<ffffffff81054594>] __might_sleep+0x131/0x135
[<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45
[<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1
[<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb
[<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a
[<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5
[<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5
[<ffffffff81007d62>] ? check_events+0x12/0x20
[<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7
[<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff8109168b>] ? lock_release+0x21c/0x229
[<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32
[<ffffffff81143452>] sys_ioctl+0x47/0x6a
[<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b
gntdev_put_map tries to acquire a mutex when freeing pages back to the
xenballoon pool, so it cannot be called with a spinlock held. In
gntdev_release, the spinlock is not needed as we are freeing the
structure later; in the ioctl, only the list manipulation needs to be
under the lock.
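The resulting pattern, sketched in userspace with a flag standing in
for the spinlock. The types and helpers (priv, map, lock, unlock,
put_map, unmap) are hypothetical simplifications, not the gntdev code:

```c
#include <assert.h>
#include <stddef.h>

struct map {
	struct map *next;
	int index;
	int freed;
};

struct priv {
	int lock_held;		/* stand-in for the spinlock state */
	struct map *maps;
};

static void lock(struct priv *p)
{
	assert(!p->lock_held);
	p->lock_held = 1;
}

static void unlock(struct priv *p)
{
	assert(p->lock_held);
	p->lock_held = 0;
}

/* Stand-in for a release path that may sleep: never call it locked. */
static void put_map(struct priv *p, struct map *m)
{
	assert(!p->lock_held);
	m->freed = 1;
}

/* Only the list manipulation runs under the lock. */
static int unmap(struct priv *p, int index)
{
	struct map **pp, *m = NULL;

	lock(p);
	for (pp = &p->maps; *pp; pp = &(*pp)->next) {
		if ((*pp)->index == index) {
			m = *pp;
			*pp = m->next;	/* unlink while locked */
			m->next = NULL;
			break;
		}
	}
	unlock(p);

	if (m == NULL)
		return -1;
	put_map(p, m);		/* may sleep, so done after unlock */
	return 0;
}
```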
Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenstore daemon does not have to run in the xen initial domain;
however, Linux currently uses xen_initial_domain to test if a loopback
event channel should be used instead of the event channel provided in
Xen's start_info structure. Instead, if the event channel passed in the
start_info structure is not valid, assume that this domain will run
xenstored locally and set up the event channel.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The xenbus event channel established in xenbus_init is intended to be a
loopback channel, but the remote domain was hardcoded to 0; this will
cause the channel to be unusable when xenstore is not being run in
domain 0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: revert to using a kthread for AIL pushing
xfs: force the log if we encounter pinned buffers in .iop_pushbuf
xfs: do not update xa_last_pushed_lsn for locked items
SFI tables reside in RAM and should not be modified once they are
written. Current code sets pentry->irq to zero, which causes
subsequent reads to fail with an invalid SFI table checksum. This
breaks kexec as the second kernel fails to validate the SFI tables.
To fix this, use a temporary variable for the irq number.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ftmac100 allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.
If the frame is under 64 bytes, the page is freed, so increase truesize
only for bigger frames.
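A simplified sketch of the accounting rule, with a stand-in struct
instead of struct sk_buff. The names skb_sketch, add_rx_frag and
SKETCH_PAGE_SIZE are hypothetical, and the 4096-byte page is an
assumption:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Stand-in for the sk_buff fields that matter here. */
struct skb_sketch {
	unsigned len;		/* total bytes of packet data */
	unsigned data_len;	/* bytes held in fragments */
	unsigned truesize;	/* memory actually consumed */
};

/*
 * Attach a receive fragment. 'frag_len' is the packet data in the
 * fragment; 'frag_truesize' is the memory backing it (a whole page
 * when the driver allocates one page per fragment).
 */
static void add_rx_frag(struct skb_sketch *skb, unsigned frag_len,
			unsigned frag_truesize)
{
	assert(frag_len <= frag_truesize);
	skb->len += frag_len;
	skb->data_len += frag_len;
	/* Charge the allocation size, not the bytes received. */
	skb->truesize += frag_truesize;
}
```

A 100-byte fragment backed by a full page adds 100 to len but
SKETCH_PAGE_SIZE to truesize.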
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Po-Yu Chuang <ratbert@faraday-tech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
by the caller to properly account frag sizes in skb->truesize
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ftgmac100 allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.
If the frame is under 64 bytes, the page is freed and truesize is adjusted.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Po-Yu Chuang <ratbert@faraday-tech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The w83627ehf driver is improperly reporting thermal diode sensors as
type 2, instead of 3. This caused "sensors" and possibly other
monitoring tools to report these sensors as "transistor" instead of
"thermal diode".
Furthermore, diode subtype selection (CPU vs. external) is only
supported by the original W83627EHF/EHG. All later models only support
CPU diode type, and some (NCT6776F) don't even have the register in
question so we should avoid reading from it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
sky2 allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000e allocates a page per skb fragment. We must account
PAGE_SIZE increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ixgbe allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000 allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000 allocates a full page per skb fragment. We must account PAGE_SIZE
increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2 allocates a full page per fragment. We must account PAGE_SIZE
increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix skb truesize underestimations of this driver.
Each frag truesize is exactly rx_frag_size bytes. (2048 bytes per
default)
A driver should not use "sizeof(struct sk_buff)" at all.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.
Considering that skb_shared_info is larger than sk_buff, it's time to
take it into account for better memory accounting.
This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.
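A userspace sketch of the shape of such a macro: the payload size plus
the cache-line-aligned sizes of both structs. The struct sizes and the
64-byte cache line are assumptions for illustration, and the _SKETCH
names are hypothetical stand-ins for the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_BYTES_SKETCH 64u	/* assumed cache line size */

/* Round x up to a multiple of the cache line size. */
#define DATA_ALIGN_SKETCH(x) \
	(((size_t)(x) + (CACHE_BYTES_SKETCH - 1)) & \
	 ~(size_t)(CACHE_BYTES_SKETCH - 1))

/* Stand-in structs; the sizes are made up for the example. */
struct sk_buff_sketch {
	char pad[232];
};

struct shinfo_sketch {
	char pad[320];
};

/* Payload plus the aligned overhead of both bookkeeping structs. */
#define TRUESIZE_SKETCH(x) \
	((size_t)(x) + \
	 DATA_ALIGN_SKETCH(sizeof(struct sk_buff_sketch)) + \
	 DATA_ALIGN_SKETCH(sizeof(struct shinfo_sketch)))
```

With these assumed sizes, a 1000-byte payload is accounted as
1000 + 256 + 320 bytes.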
At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.
Unless SLAB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.
Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But it's a necessary step for a
more accurate memory accounting.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gpio_base was set to 0 if no system platform data or open firmware
platform data was provided. This led to conflicts, if any other gpiochip
with a gpiobase of 0 was instantiated already. Setting it to -1 will
automatically use the first one available.
Signed-off-by: Hartmut Knaack <knaack.h@gmx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
With commit f64ad1a0e2, "gpio/omap: cleanup _set_gpio_wakeup(), remove
ifdefs", access to build time conditionally omitted 'suspend_wakeup'
member of the 'gpio_bank' structure has been placed unconditionally in
function _set_gpio_wakeup(), which is always built. This resulted in the
driver compilation broken for certain OMAP1, i.e., non-OMAP16xx,
configurations.
Really required or not in previously excluded cases, define this
structure member unconditionally as a fix.
Tested with a custom OMAP1510 only configuration.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This driver provides two functions in one configuration:
mass storage and an ACM (serial port) link.
Heavily based on multi.c and cdc2.c
Signed-off-by: Klaus Schwarzkopf <schwarzkopf@sensortherm.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
| drivers/usb/gadget/mv_udc_core.c: In function 'handle_setup_packet':
| drivers/usb/gadget/mv_udc_core.c:1556:6: warning: 'status' may be \
used uninitialized in this function
Signed-off-by: Felipe Balbi <balbi@ti.com>
This patch adds support for clock gating when vbus detection is
possible. Clock and phy will be on only when the usb gadget is used (vbus valid).
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Use the DMA API for status_req's dma address; it is needed by the dtd.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
The controller sometimes fails to prime when running the iperf test.
Add a delay to wait for the controller to release the dtd dma before we
free it. With the delay the issue is gone.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This patch corrects the ep0 state, so that the unexpected
ep0 packet warning can be removed.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
1: Add parameter check.
2: For controller endpoint, we need to flush in and out directions.
3: Delete redundant code to make it more readable.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
According to the comment right above the code, we should use
USB_ENDPOINT_XFER_BULK instead.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>