- ARCv2 uses a seperate BCR for {I,D}CCM base address:
ARCompact encoded both base/size in same BCR
- Size encoding in common BCR is different for ARCompact/ARCv2
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
It is unlikely that designs running Linux will not have multiplier.
Further the current support is not complete as tool don't generate a
multilib w/o multiplier.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Add USB ID for cp2104/5 devices on GE B650v3 and B850v3 boards.
Signed-off-by: Ken Lin <ken.lin@advantech.com.tw>
Signed-off-by: Akshay Bhat <akshay.bhat@timesys.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Cleanup.
kmem_cache_destroy has support NULL argument checking,
so drop the double null testing before calling it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We were getting build warning about:
fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used
uninitialized in this function
It is not a valid warning as used_bg is never used uninitilized since
locked is initially false so we can never be in the section where
'used_bg' is used. But gcc is not able to understand that and we can
initialize it while declaring to silence the warning.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David Sterba <dsterba@suse.com>
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The kernel provides a swap() that does the same thing as this code.
Signed-off-by: Dave Jones <dsj@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While running btrfs_mksubvol(), d_really_is_positive() is called twice.
First in btrfs_mksubvol() and second inside btrfs_may_create(). So I
remove the first one.
Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Simplify expression in btrfs_calc_trans_metadata_size().
Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com>
Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.com>
We will sometimes start background flushing the various enospc related things
(delayed nodes, delalloc, etc) if we are getting close to reserving all of our
available space. We don't want to do this however when we are actually using
this space as it causes unneeded thrashing. We currently try to do this by
checking bytes_used >= thresh, but bytes_used is only part of the equation, we
need to use bytes_reserved as well as this represents space that is very likely
to become bytes_used in the future.
My tracing tool will keep count of the number of times we kick off the async
flusher, the following are counts for the entire run of generic/027
No Patch Patch
avg: 5385 5009
median: 5500 4916
We skewed lower than the average with my patch and higher than the average with
the patch, overall it cuts the flushing from anywhere from 5-10%, which in the
case of actual ENOSPC is quite helpful. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A non-atomic PCM stream may take snd_pcm_link_rwsem rw semaphore twice
in the same code path, e.g. one in snd_pcm_action_nonatomic() and
another in snd_pcm_stream_lock(). Usually this is OK, but when a
write lock is issued between these two read locks, the problem
happens: the write lock is blocked due to the first reade lock, and
the second read lock is also blocked by the write lock. This
eventually deadlocks.
The reason is the way rwsem manages waiters; it's queued like FIFO, so
even if the writer itself doesn't take the lock yet, it blocks all the
waiters (including reads) queued after it.
As a workaround, in this patch, we replace the standard down_write()
with an spinning loop. This is far from optimal, but it's good
enough, as the spinning time is supposed to be relatively short for
normal PCM operations, and the code paths requiring the write lock
aren't called so often.
Reported-by: Vinod Koul <vinod.koul@intel.com>
Tested-by: Ramesh Babu <ramesh.babu@intel.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
There are a few places where we add to trans->bytes_reserved but don't have the
corresponding trace point. With these added my tool no longer sees transaction
leaks.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
truncate_space_check is using btrfs_csum_bytes_to_leaves() but forgetting to
multiply by nodesize so we get an actual byte count. We need a tracepoint here
so that we have the matching reserve for the release that will come later. Also
add a comment to make clear what the intent of truncate_space_check is.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I'm writing a tool to visualize the enospc system in order to help debug enospc
bugs and I found weird data and ran it down to when we update the global block
rsv. We add all of the remaining free space to the block rsv, do a trace event,
then remove the extra and do another trace event. This makes my visualization
look silly and is unintuitive code as well. Fix this stuff to only add the
amount we are missing, or free the amount we are missing. This is less clean to
read but more explicit in what it is doing, as well as only emitting events for
values that make sense. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For a non-existent device, old code bypasses adding it in dev's reada
queue.
And to solve problem of unfinished waitting in raid5/6,
commit 5fbc7c59fd ("Btrfs: fix unfinished readahead thread for
raid5/6 degraded mounting")
adding an exception for the first stripe, in short, the first
stripe will always be processed whether the device exists or not.
Actually we have a better way for the above request: just bypass
creation of the reada_extent for non-existent device, it will make
code simple and effective.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reada background works is not designed to finish all jobs
completely, it will break in following case:
1: When a device reaches workload limit (MAX_IN_FLIGHT)
2: Total reads reach max limit (10000)
3: All devices don't have queued more jobs, often happened in DUP case
And if all background works exit with remaining jobs,
btrfs_reada_wait() will wait indefinetelly.
Above problem is rarely happened in old code, because:
1: Every work queues 2x new works
So many works reduced chances of undone jobs.
2: One work will continue 10000 times loop in case of no-jobs
It reduced no-thread window time.
But after we fixed above case, the "undone reada extents" frequently
happened.
Fix:
Check to ensure we have at least one thread if there are undone jobs
in btrfs_reada_wait().
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reada creates 2 works for each level of tree recursively.
In case of a tree having many levels, the number of created works
is 2^level_of_tree.
Actually we don't need so many works in parallel, this patch limits
max works to BTRFS_MAX_MIRRORS * 2.
The per-fs works_counter will be also used for btrfs_reada_wait() to
check is there are background workers.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
No need to decrease dev->reada_in_flight in __readahead_hook()'s
internal and reada_extent_put().
reada_extent_put() have no chance to decrease dev->reada_in_flight
in free operation, because reada_extent have additional refcnt when
scheduled to a dev.
We can put inc and dec operation for dev->reada_in_flight to one
place instead to make logic simple and safe, and move useless
reada_extent->scheduled_for to a bool flag instead.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove one copy of loop to fix the typo of iterate zones.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current code set nritems to 0 to make for_loop useless to bypass it,
and set generation's value which is not necessary.
Jump into cleanup directly is better choise.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
What __readahead_hook() need exactly is fs_info, no need to convert
fs_info to root in caller and convert back in __readahead_hook()
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
reada_start_machine_dev() already have reada_extent pointer, pass
it into __readahead_hook() directly instead of search radix_tree
will make code run faster.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can't release reada_extent earlier than __readahead_hook(), because
__readahead_hook() still need to use it, it is necessary to hode a refcnt
to avoid it be freed.
Actually it is not a problem after my patch named:
Avoid many times of empty loop
It make reada_extent in above line include at least one reada_extctl,
which keeps additional one refcnt for reada_extent.
But we still need this patch to make the code in pretty logic.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
level is not used in severial functions, remove them from arguments,
and remove relative code for get its value.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When failed adding all dev_zones for a reada_extent, the extent
will have no chance to be selected to run, and keep in memory
for ever.
We should bypass this extent to avoid above case.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If some device is not reachable, we should bypass and continus addingb
next, instead of break on bad device.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move is_need_to_readahead contition earlier to avoid useless loop
to get relative data for readahead.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:
BUG: unable to handle kernel paging request at ffff880840000ff8
IP: vmalloc_fault+0x1be/0x300
PGD c7f03a067 PUD 0
Oops: 0000 [#1] SM
Call Trace:
__do_page_fault+0x285/0x3e0
do_page_fault+0x2f/0x80
? put_prev_entity+0x35/0x7a0
page_fault+0x28/0x30
? memcpy_erms+0x6/0x10
? schedule+0x35/0x80
? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
? schedule_timeout+0x183/0x240
btt_log_read+0x63/0x140 [nd_btt]
:
? __symbol_put+0x60/0x60
? kernel_read+0x50/0x80
SyS_finit_module+0xb9/0xf0
entry_SYSCALL_64_fastpath+0x1a/0xa4
Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.
vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes. pgd_ctor() sets the kernel's PGD entries to
user's during fork(). When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it. If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.
Following changes are made to vmalloc_fault().
64-bit:
- No change for the PGD sync operation as it handles large
pages already.
- Add pud_huge() and pmd_huge() to the validation code to
handle large pages.
- Change pud_page_vaddr() to pud_pfn() since an ioremap range
is not directly mapped (while the if-statement still works
with a bogus addr).
- Change pmd_page() to pmd_pfn() since an ioremap range is not
backed by struct page (while the if-statement still works
with a bogus addr).
32-bit:
- No change for the sync operation since the index3 PGD entry
covers the entire vmalloc range, which is always valid.
(A separate change to sync PGD entry is necessary if this
memory layout is changed regardless of the page size.)
- Add pmd_huge() to the validation code to handle large pages.
This is for completeness since vmalloc_fault() won't happen
in ioremap'd ranges as its PGD entry is always valid.
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For error handling, dma_alloc_coherent's return value
needs to be checked, not argument.
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham says:
====================
net: thunderx: Miscellaneous fixes
This patch series fixes couple of issues w.r.t multiqset mode
and receive packet statastics.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Counting rx packets for every CQE_RX in CQ irq handler is incorrect.
Synchronization is missing when multiple queues are receiving packets
simultaneously. Like transmit packet stats use HW stats here.
Also removed unused 'cqe_type' parameter in nicvf_rcv_pkt_handler().
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For secondary Qsets 'hw_tso' is not getting set as probe() returns
much earlier. Fixed it by moving silicon revision check.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a interface is assigned morethan 8 queues and the logical interface
is toggled i.e down & up, additional queues or qsets are not initialized
as secondary qset count is being set to zero while tearing down.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit 0071f56e46 ("dsa: Register netdev before phy"), we are now trying
to free a network device that has been previously registered, and in case of
errors, this will make us hit the BUG_ON(dev->reg_state != NETREG_UNREGISTERED)
condition.
Fix this by adding a missing unregister_netdev() before free_netdev().
Fixes: 0071f56e46 ("dsa: Register netdev before phy")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert 811a4e6fce ("PCI: Add helpers to manage pci_dev->irq and
pci_dev->irq_managed").
This is part of reverting 991de2e590 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
introduced.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
The rework of the driver missed to move the call to set_handle_irq() into
asm9260_of_init(). As a consequence no interrupt entry point is installed and
no interrupts are delivered
Solution is:
Install the interrupt entry handler.
Fixes: 7e4ac676ee ("irqchip/mxs: Add Alphascale ASM9260 support")
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Link: https://lkml.kernel.org/r/1454061473-24957-1-git-send-email-linux@rempel-privat.de
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
For the Marvell 88E1510, marvell_of_reg_init was called too late, in the
config_aneg function.
Since commit 113c74d83e ("net: phy: turn carrier off on phy attach"),
this lead to the link not coming up at boot anymore, due to the phy
state machine being stuck at waiting for interrupts (off by default on
the 88E1510).
For seven other Marvell PHYs, marvell_of_reg_init was not called at all.
Add a generic marvell_config_init function, which in turn calls
marvell_of_reg_init.
PHYs, which already have a specific config_init function with a call to
marvell_of_reg_init, are left untouched. The generic marvell_config_init
function is called for all the others, to get consistent behavior across
all Marvell PHYs.
Fixes: 113c74d83e ("net: phy: turn carrier off on phy attach")
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
Fixes: 5156dca34a ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The example in the DT binding documentation uses the preliminary DT
bindings for the r8a7795 MSTP clocks, which never went upstream.
Update the example to use the DT bindings for the upstream Clock Pulse
Generator / Module Standby and Software Reset hardware block.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw fixes
Just a couple of fixes from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VLAN device leaves a bridge its STP state is set to DISABLED,
which causes the hardware to discard any packets coming through the port
with this VLAN.
Fix that by setting STP state to FORWARDING when the device leaves its
bridge and allow traffic to be directed to CPU.
Fixes: 26f0e7fb15 ("mlxsw: spectrum: Add support for VLAN devices bridging")
Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MLXSW_PORT_MAX_PORTS represents the maximum number of local ports, which
is 65 for both ASICs (SwitchX-2 and Spectrum) supported by this driver.
Fixes: 93c1edb27f ("mlxsw: Introduce Mellanox switch driver core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous commit (33f72e6) added notification via netlink for tunnels
when created/modified/deleted. If the notification returned an error,
this error was returned from the tunnel function. If there were no
listeners, the error code ESRCH was returned, even though having no
listeners is not an error. Other calls to this and other similar
notification functions either ignore the error code, or filter ESRCH.
This patch checks for ESRCH and does not flag this as an error.
Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
"A collection of fixes from the past few weeks that should go into 4.5.
This contains:
- Overflow fix for sysfs discard show function from Alan.
- A stacking limit init fix for max_dev_sectors, so we don't end up
artificially capping some use cases. From Keith.
- Have blk-mq proper end unstarted requests on a dying queue, instead
of pushing that to the driver. From Keith.
- NVMe:
- Update to Kconfig description for NVME_SCSI, since it was
vague and having it on is important for some SUSE distros.
From Christoph.
- Set of fixes from Keith, around surprise removal. Also kills
the no-merge flag, so it supports merging.
- Set of fixes for lightnvm from Matias, Javier, and Wenwei.
- Fix null_blk oops when asked for lightnvm, but not available. From
Matias.
- Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
if interrupted by a signal.
- Two floppy fixes from Jiri, fixing signal handling and blocking
open.
- A use-after-free fix for O_DIRECT, from Mike Krinkin.
- A block module ref count fix from Roman Pen.
- An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.
- Smaller reallo fix for xen-blkfront from Bob Liu.
- Removal of an unused struct member in the deadline IO scheduler,
from Tahsin.
- Also from Tahsin, properly initialize inode struct members
associated with cgroup writeback, if enabled.
- From Tejun, ensure that we keep the superblock pinned during cgroup
writeback"
* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
blk: fix overflow in queue_discard_max_hw_show
writeback: initialize inode members that track writeback history
writeback: keep superblock pinned during cgroup writeback association switches
bio: return EINTR if copying to user space got interrupted
NVMe: Rate limit nvme IO warnings
NVMe: Poll device while still active during remove
NVMe: Requeue requests on suspended queues
NVMe: Allow request merges
NVMe: Fix io incapable return values
blk-mq: End unstarted requests on dying queue
block: Initialize max_dev_sectors to 0
null_blk: oops when initializing without lightnvm
block: fix module reference leak on put_disk() call for cgroups throttle
nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
kernel/fs: fix I/O wait not accounted for RW O_DSYNC
floppy: refactor open() flags handling
lightnvm: allow to force mm initialization
lightnvm: check overflow and correct mlc pairs
lightnvm: fix request intersection locking in rrpc
lightnvm: warn if irqs are disabled in lock laddr
...