Commit graph

590050 commits

Author SHA1 Message Date
David Sterba
5c5c0df05d btrfs: rename btrfs_find_device_by_user_input
For clarity how we are going to find the device, let's call it a device
specifier, devspec for short. Also rename the arguments that are a
leftover from previous function purpose.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
418775a22b btrfs: use existing device constraints table btrfs_raid_array
We should avoid duplicating the device constraints, let's use the
btrfs_raid_array in btrfs_check_raid_min_devices.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
621292bae6 btrfs: introduce raid-type to error-code table, for minimum device constraint
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
3cc31a0d5b btrfs: pass number of devices to btrfs_check_raid_min_devices
Before this patch, btrfs_check_raid_min_devices would do an off-by-one
check of the constraints and not the miminmum check, as its name
suggests. This is not a problem if the only caller is device remove, but
would be confusing for others.

Add an argument with the exact number and let the caller(s) decide if
this needs any adjustments, like when device replace is running.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
David Sterba
f47ab2588e btrfs: rename __check_raid_min_devices
Underscores are for special functions, use the full prefix for better
stacktrace recognition.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
02feae3c55 btrfs: optimize check for stale device
Optimize check for stale device to only be checked when there is device
added or changed. If there is no update to the device, there is no need
to call btrfs_free_stale_device().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
6b526ed70c btrfs: introduce device delete by devid
This introduces new ioctl BTRFS_IOC_RM_DEV_V2, which uses enhanced struct
btrfs_ioctl_vol_args_v2 to carry devid as an user argument.

The patch won't delete the old ioctl interface and so kernel remains
backward compatible with user land progs.

Test case/script:
echo "0 $(blockdev --getsz /dev/sdf) linear /dev/sdf 0" | dmsetup create bad_disk
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdd /dev/sde /dev/mapper/bad_disk
mount /dev/sdd /btrfs
dmsetup suspend bad_disk
echo "0 $(blockdev --getsz /dev/sdf) error /dev/sdf 0" | dmsetup load bad_disk
dmsetup resume bad_disk
echo "bad disk failed. now deleting/replacing"
btrfs dev del  3  /btrfs
echo $?
btrfs fi show /btrfs
umount /btrfs
btrfs-show-super /dev/sdd | egrep num_device
dmsetup remove bad_disk
wipefs -a /dev/sdf

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Martin <m_btrfs@ml1.co.uk>
[ adjust messages, s/disk/device/ ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
42b6742715 btrfs: make use of btrfs_scratch_superblocks() in btrfs_rm_device()
With the previous patches now the btrfs_scratch_superblocks() is ready to
be used in btrfs_rm_device() so use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ use GFP_KERNEL ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
b3d1b1532f btrfs: enhance btrfs_find_device_by_user_input() to check device path
The operation of device replace and device delete follows same steps upto
some depth with in btrfs kernel, however they don't share codes. This
enhancement will help replace and delete to share codes.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
24fc572fe4 btrfs: make use of btrfs_find_device_by_user_input()
btrfs_rm_device() has a section of the code which can be replaced
btrfs_find_device_by_user_input()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
24e0474b59 btrfs: create helper btrfs_find_device_by_user_input()
The patch renames btrfs_dev_replace_find_srcdev() to
btrfs_find_device_by_user_input() and moves it to volumes.c, so that
delete device can use it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
bd45ffbcb1 btrfs: clean up and optimize __check_raid_min_device()
__check_raid_min_device() which was pealed from btrfs_rm_device()
maintianed its original code to show the block move. This patch cleans up
__check_raid_min_device().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
f1fa7f2642 btrfs: create helper function __check_raid_min_devices()
move a section of btrfs_rm_device() code to check for min number of the
devices into the function __check_raid_min_devices()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:13 +02:00
Anand Jain
6cf86a006b btrfs: create a helper function to read the disk super
A part of code from btrfs_scan_one_device() is moved to a new function
btrfs_read_disk_super(), so that former function looks cleaner. (In this
process it also moves the code which ensures null terminating label). So
this creates easy opportunity to merge various duplicate codes on read
disk super. Earlier attempt to merge duplicate codes highlighted that
there were some issues for which there are duplicate codes (to read disk
super), however it was not clear what was the issue. So until we figure
that out, its better to keep them in a separate functions.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ use GFP_KERNEL, PAGE_CACHE_ removal related fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:59:04 +02:00
Liu Bo
cf25ce518e Btrfs: do not create empty block group if we have allocated data
Now we force to create empty block group to keep data profile alive,
however, in the below example, we eventually get an empty block group
while we're trying to get more space for other types (metadata/system),

- Before,
block group "A": size=2G, used=1.2G
block group "B": size=2G, used=512M

- After "btrfs balance start -dusage=50 mount_point",
block group "A": size=2G, used=(1.2+0.5)G
block group "C": size=2G, used=0

Since there is no data in block group C, it won't be deleted
automatically and we have to get the unused 2G until the next mount.

Balance itself just moves data and doesn't remove data, so it's safe
to not create such a empty block group if we already have data
 allocated in other block groups.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:41:47 +02:00
Chandan Rajendra
a2af23b7d7 Btrfs: __btrfs_buffered_write: Pass valid file offset when releasing delalloc space
The delalloc reserved space is calculated in terms of number of bytes
used by an integral number of blocks. This is done by rounding down the
value of 'pos' to the nearest multiple of sectorsize.

The file offset value held by 'pos' variable may not be aligned to
sectorsize and hence when passing it as an argument to
btrfs_delalloc_release_space(), we may end up releasing larger delalloc
space than we originally had reserved.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:41:47 +02:00
Liu Bo
894b36e35a Btrfs: cleanup error handling in extent_write_cached_pages
Now that we bail out immediately if ->writepage() returns an error,
we don't need an extra error to retain the error code.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:41:47 +02:00
Liu Bo
a91326679f Btrfs: make mapping->writeback_index point to the last written page
If sequential writer is writing in the middle of the page and it just redirties
the last written page by continuing from it.

In the above case this can end up with seeking back to that firstly redirtied
page after writing all the pages at the end of file because btrfs updates
mapping->writeback_index to 1 past the current one.

For non-cow filesystems, the cost is only about extra seek, while for cow
filesystems such as btrfs, it means unnecessary fragments.

To avoid it, we just need to continue writeback from the last written page.

This also updates btrfs to behave like what write_cache_pages() does, ie, bail
 out immediately if there is an error in writepage().

<Ref: https://www.spinics.net/lists/linux-btrfs/msg52628.html>

Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:41:47 +02:00
Luke Dashjr
4c63c2454e btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl
32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.

Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:40:27 +02:00
Luis de Bethencourt
180e4d4700 btrfs: fix typos in comments
Correct a typo in the chunk_mutex name to make it grepable.

Since it is better to fix several typos at once, fixing the 2 more in the
same file.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Geert Uytterhoeven
6719afdcf8 Btrfs: Refactor btrfs_lock_cluster() to kill compiler warning
fs/btrfs/extent-tree.c: In function ‘btrfs_lock_cluster’:
fs/btrfs/extent-tree.c:6399: warning: ‘used_bg’ may be used uninitialized in this function

  - Replace "again: ... goto again;" by standard C "while (1) { ... }",
  - Move block not processed during the first iteration of the loop to the
    end of the loop, which allows to kill the "locked" variable,

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-and-Tested-by: Miao Xie <miaox@cn.fujitsu.com>
[ the compilation warning has been fixed by other patch, now we want to
  clean up the function ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Anand Jain
0713d90c75 btrfs: remove save_error_info()
Actually save_error_info() sets the FS state to error and nothing else.
Further the word save doesn't induce caffeine when compared to the word
set in what actually it does.

So to make it better understandable move save_error_info() code to its
only consumer itself.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Satoru Takeuchi
13f48dc909 btrfs: Simplify conditions about compress while mapping btrfs flags to inode flags
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Anand Jain
c5f4ccb2f7 btrfs: move error handling code together in ctree.h
Looks like we added the incompatible defines in between the error
handling defines in the file ctree.h. Now group them back.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Anand Jain
2351d743f6 btrfs: remove unused function btrfs_assert()
Apparently looks like ASSERT does the same intended job,
as intended btrfs_assert().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Anand Jain
34d9700702 btrfs: rename btrfs_std_error to btrfs_handle_fs_error
btrfs_std_error() handles errors, puts FS into readonly mode
(as of now). So its good idea to rename it to btrfs_handle_fs_error().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ edit changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28 10:36:54 +02:00
Kan Liang
cf3beb7c90 perf/x86/intel: Fix incorrect lbr_sel_mask value
This patch fixes a bug which was introduced by:

 b16a5b52eb ("perf/x86: Add option to disable reading branch flags/cycles")

In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
set in lbr_select.

This patch corrects the LBR_SEL_MASK by including all valid bits in
LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
LBR_SELECT. It does not operate in suppress mode, so it needs to be
specially handled in intel_pmu_setup_hw_lbr_filter.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:43 +02:00
Alexander Shishkin
1c5ac21a0e perf/x86/intel/pt: Don't die on VMXON
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:

  $ perf record -e intel_pt// kvm -nographic

on such a machine will kill it.

To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:42 +02:00
Peter Zijlstra
79c9ce57eb perf/core: Fix perf_event_open() vs. execve() race
Jann reported that the ptrace_may_access() check in
find_lively_task_by_vpid() is racy against exec().

Specifically:

  perf_event_open()		execve()

  ptrace_may_access()
				commit_creds()
  ...				if (get_dumpable() != SUID_DUMP_USER)
				  perf_event_exit_task();
  perf_install_in_context()

would result in installing a counter across the creds boundary.

Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
This should be fine as perf_event_exit_task() is already called with
cred_guard_mutex held, so all perf locks already nest inside it.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:41 +02:00
Peter Zijlstra
2548d546d4 nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
Chris Metcalf reported a that sched_can_stop_tick() sometimes fails to
re-enable the tick.

His observed problem is that rq->cfs.nr_running can be 1 even though
there are multiple runnable CFS tasks. This happens in the cgroup
case, in which case cfs.nr_running is the number of runnable entities
for that level.

If there is a single runnable cgroup (which can have an arbitrary
number of runnable child entries itself) rq->cfs.nr_running will be 1.

However, looking at that function I think there's more problems with it.

It seems to assume that if there's FIFO tasks, those will run. This is
incorrect. The FIFO task can have a lower prio than an RR task, in which
case the RR task will run.

So the whole fifo_nr_running test seems misplaced, it should go after
the rr_nr_running tests. That is, only if !rr_nr_running, can we use
fifo_nr_running like this.

Reported-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Fixes: 76d92ac305 ("sched: Migrate sched to use new tick dependency mask model")
Link: http://lkml.kernel.org/r/20160421160315.GK24771@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:28:55 +02:00
Adam Borowski
0a25556f84 perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.

Found via KASAN:

  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
  index 9 is out of range for type 'u64 [9]'
  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
  load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:20:25 +02:00
Ilya Dryomov
d3767f0fae rbd: report unsupported features to syslog
... instead of just returning an error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:43 +02:00
Ilya Dryomov
811c668877 rbd: fix rbd map vs notify races
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:22 +02:00
Keith Busch
1bdb897039 x86/apic: Handle zero vector gracefully in clear_vector_irq()
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().

The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().

We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.

Remove the BUG_ON and return if vector is zero,

[ tglx: Massaged changelog ]

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-28 09:53:06 +02:00
Leo Yan
15333e3af1 thermal: use %d to print S32 parameters
Power allocator's parameters are S32 type, so use %d to print them.

Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:51 -07:00
Leo Yan
5fdfc48bb0 thermal: hisilicon: increase temperature resolution
When calculate temperature, old code firstly do division and then
convert to "millicelsius" unit. This will lose resolution and only can
read back temperature with "Celsius" unit.

So firstly scale step value to "millicelsius" and then do division, so
finally we can increase resolution for temperature value. Also refine
the calculation from temperature value to step value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:01 -07:00
Ashutosh Dixit
82dc4afd7e misc: mic: Fix for double fetch security bug in VOP driver
The MIC VOP driver does two successive reads from user space to read a
variable length data structure. Kernel memory corruption can result if
the data structure changes between the two reads. This patch disallows
the chance of this happening.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=116651
Reported by: Pengfei Wang <wpengfeinudt@gmail.com>
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-27 15:30:58 -07:00
Sascha Hauer
5616f36713 ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
The secondary CPU starts up in ARM mode. When the kernel is compiled in
thumb2 mode we have to explicitly compile the secondary startup
trampoline in ARM mode, otherwise the CPU will go to Nirvana.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2016-04-27 15:19:21 -07:00
Heikki Krogerus
0224a4a30b device property: Avoid potential dereferences of invalid pointers
Since fwnode may hold ERR_PTR(-ENODEV) or it may be NULL,
the fwnode type checks is_of_node(), is_acpi_node() and is
is_pset_node() need to consider it. Using IS_ERR_OR_NULL()
to check it.

Fixes: 0d67e0fa16 (device property: fix for a case of use-after-free)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-27 23:40:02 +02:00
David S. Miller
49fa523046 sparc64: Fix bootup regressions on some Kconfig combinations.
The system call tracing bug fix mentioned in the Fixes tag
below increased the amount of assembler code in the sequence
of assembler files included by head_64.S

This caused to total set of code to exceed 0x4000 bytes in
size, which overflows the expression in head_64.S that works
to place swapper_tsb at address 0x408000.

When this is violated, the TSB is not properly aligned, and
also the trap table is not aligned properly either.  All of
this together results in failed boots.

So, do two things:

1) Simplify some code by using ba,a instead of ba/nop to get
   those bytes back.

2) Add a linker script assertion to make sure that if this
   happens again the build will fail.

Fixes: 1a40b95374 ("sparc: Fix system call tracing register handling.")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Joerg Abraham <joerg.abraham@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 17:27:37 -04:00
David S. Miller
97d601d5de Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes for net.

Only use MSIX on VF, and fix rx page buffers on architectures with
PAGE_SIZE >= 64K.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:46 -04:00
Michael Chan
89d0a06c51 bnxt_en: Divide a page into 32K buffers for the aggregation ring if necessary.
If PAGE_SIZE is bigger than BNXT_RX_PAGE_SIZE, that means the native CPU
page is bigger than the maximum length of the RX BD.  Divide the page
into multiple 32K buffers for the aggregation ring.

Add an offset field in the bnxt_sw_rx_agg_bd struct to keep track of the
page offset of each buffer.  Since each page can be referenced by multiple
buffer entries, call get_page() as needed to get the proper reference
count.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Michael Chan
2839f28bd5 bnxt_en: Limit RX BD pages to be no bigger than 32K.
The RX BD length field of this device is 16-bit, so the largest buffer
size is 65535.  For LRO and GRO, we allocate native CPU pages for the
aggregation ring buffers.  It won't work if the native CPU page size is
64K or bigger.

We fix this by defining BNXT_RX_PAGE_SIZE to be native CPU page size
up to 32K.  Replace PAGE_SIZE with BNXT_RX_PAGE_SIZE in all appropriate
places related to the rx aggregation ring logic.

The next patch will add additional logic to divide the page into 32K
chunks for aggrgation ring buffers if PAGE_SIZE is bigger than
BNXT_RX_PAGE_SIZE.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Michael Chan
1fa72e29e1 bnxt_en: Don't fallback to INTA on VF.
Only MSI-X can be used on a VF.  The driver should fail initialization
if it cannot successfully enable MSI-X.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 16:18:45 -04:00
Linus Torvalds
b75a2bf899 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will soon be at least one execution of the work
  item - where "after" means that the execution instance would be able
  to see all the changes that the queuer has made prior to the queueing
  attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v3.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
2016-04-27 12:03:59 -07:00
Linus Torvalds
763cfc86ee Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two patches to fix a deadlock which can be easily triggered if memcg
  charge moving is used.

  This bug was introduced while converting threadgroup locking to a
  global percpu_rwsem and is caused by cgroup controller task migration
  path depending on the ability to create new kthreads.  cpuset had a
  similar issue which was fixed by performing heavy-lifting operations
  asynchronous to task migration.  The two patches fix the same issue in
  memcg in a similar way.  The first patch makes the mechanism generic
  and the second relocates memcg charge moving outside the migration
  path.

  Given that we don't want to perform heavy operations while
  writelocking threadgroup lock anyway, moving them out of the way is a
  desirable solution.  One thing to note is that the problem was
  difficult to debug because lockdep couldn't figure out the deadlock
  condition.  Looking into how to improve that"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  memcg: relocate charge moving from ->attach to ->post_attach
  cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
2016-04-27 11:41:14 -07:00
Linus Torvalds
3118e5f966 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
  patches"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
  i2c: cpm: Fix build break due to incompatible pointer types
  i2c: ismt: Add Intel DNV PCI ID
  i2c: xlp9xx: add support for Broadcom Vulcan
  i2c: rk3x: add support for rk3228
2016-04-27 11:34:45 -07:00
Linus Torvalds
24131a61ec ARC fixes for 4.6-rc6
- LOCKDEP now words for ARCv2 builds
  - Enabling DT reserved-memory binding to work (for forthcoming HDMI driver)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs
 xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv
 wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc
 4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ
 xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU
 cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls
 I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S
 a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD
 Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH
 scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ
 0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl
 E2sBDIJl2Xu2E7amTJss
 =rlDa
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - lockdep now works for ARCv2 builds

 - enable DT reserved-memory binding (for forthcoming HDMI driver)

* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: add support for reserved memory defined by device tree
  ARC: support generic per-device coherent dma mem
  Documentation: dt: arc: fix spelling mistakes
  ARCv2: Enable LOCKDEP
2016-04-27 09:46:21 -07:00
Linus Torvalds
508fea71c6 nios2 fix for v4.6
nios2: memset: use the right constraint modifier for the %4 output operand
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIH+FAAoJEFWoEK+e3syCfRoQAMPXIiWR/V/dLn3OX8f8CeA6
 I9duVqMrKrVh/a+bwxzmVJkumm0xzqYnOyhOpX5fZd3Nx44Q4NynJakwgWpMDVAI
 +xXxNtHZUhjcRC4EuqJW677plR0Uq8bWY2UibpARPHfB9d0arJOCuL11vdGCAjkg
 lWeVzUFg7iB9n0tRFwvsN29EcRZDo7+WbJh3cGIfTYNbcihJfiAAlmNyXS7XFiBY
 DqSYyTXIc8scH1q66gArTnyDryvM7cEZ+zyYoX9v8/E/+xPLLLhogthtqf3u4opn
 J/70k1LBgzgHCYrlEG8vvd1kCr114PLo7RlgkwJqdAtVhtMAtcGZFfkVQkSo7R3h
 gHQHf8f0exg3JKp0VesB443FyaIvpCNkth3eGdNMWunhCPB4bXE5W+hg5J5gZBNi
 1Ft9VB9Ug/8sh9Es4muinNX1kR5Fc8IWQqIa2U/OCt4O2wR1aFanJvaRqeCozbES
 SpbRAOoXtzOZ0xRPZGPQpqP6ggfizq9Zil2ZTeXbNPBRybFvmEpgewdJYqvrNwZj
 pbgB1+7zVcsfGTiMhJ+d1rLUX/oeMuUWT+eHY1jM1k+gTQVUo6k8KNVoaiNKa9z5
 cZh+XqgWKE6Qv4DVRnj+ouvDuglhwqIAyI3oXElghrYmCWxjvVKCXowQhnSq727M
 th2iyocnFMjIVWd5u3b7
 =PRJ4
 -----END PGP SIGNATURE-----

Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull arch/nios2 fix from Ley Foon Tan:
 "memset: use the right constraint modifier for the %4 output operand"

* tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: memset: use the right constraint modifier for the %4 output operand
2016-04-27 09:33:24 -07:00
Flora Cui
afc4542105 drm/amdgpu: disable vm interrupts with vm_fault_stop=2
V2: disable all vm interrupts in late_init()

Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27 12:27:10 -04:00