Commit graph

28,880 commits

Author SHA1 Message Date
Liu Bo
6bbe3a9c80 Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents
nocow_only is now an obsolete argument.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Liu Bo
2e90cf858f Btrfs: cleanup fs_info->hashers
fs_info->hashers is now an obsolete one.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Liu Bo
ab26e9d6c8 Btrfs: cleanup for duplicated code in find_free_extent
There is already an 'add free space' phrase in front of this one, we
needn't to redo it.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-04 09:39:57 -04:00
Josef Bacik
60376ce4a8 Btrfs: fix race in sync and freeze again
I screwed this up, there is a race between checking if there is a running
transaction and actually starting a transaction in sync where we could race
with a freezer and get ourselves into trouble.  To fix this we need to make
a new join type to only do the try lock on the freeze stuff.  If it fails
we'll return EPERM and just return from sync.  This fixes a hang Liu Bo
reported when running xfstest 68 in a loop.  Thanks,

Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:56 -04:00
David Sterba
b3ae244e71 btrfs: return EPERM upon rmdir on a subvolume
A subvolume cannot be deleted via rmdir, but the error code ENOTEMPTY
is confusing. Return EPERM instead, as this is not permitted.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-04 09:39:56 -04:00
Wei Yongjun
ebb3dad435 Btrfs: using for_each_set_bit_from to simplify the code
Using for_each_set_bit_from() to simplify the code.

spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
2012-10-04 09:39:55 -04:00
Anand Jain
1bcea35597 Btrfs: write_buf is now callable outside send.c
Developing service cmds needs it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
2012-10-04 09:39:55 -04:00
Tsutomu Itoh
b4f359ab06 Btrfs: remove unnecessary code in btree_get_extent()
Unnecessary lookup_extent_mapping() is removed because an error is
returned to the caller.
This patch was made based on the advice from Stefan Behrens, thanks.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-04 09:39:54 -04:00
Tsutomu Itoh
0433f20d43 Btrfs: cleanup of error processing in btree_get_extent()
This patch simplifies a little complex error processing in
btree_get_extent().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-04 09:39:54 -04:00
Dan Carpenter
74b217d0d3 ore: signedness bug in _sp2d_min_pg()
This for loop doesn't work correctly when "p" is unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-10-03 13:51:51 -07:00
Alex Elder
6285bc2312 ceph: avoid 32-bit page index overflow
A pgoff_t is defined (by default) to have type (unsigned long).  On
architectures such as i686 that's a 32-bit type.  The ceph address
space code was attempting to produce 64 bit offsets by shifting a
page's index by PAGE_CACHE_SHIFT, but the result was not what was
desired because the shift occurred before the result got promoted
to 64 bits.

Fix this by converting all uses of page->index used in this way to
use the page_offset() macro, which ensures the 64-bit result has the
intended value.

This fixes http://tracker.newdream.net/issues/3112

Reported-by:  Mohamed Pakkeer <pakkeer.mohideen@realimage.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-03 10:51:18 -05:00
Sage Weil
457712a0bc ceph: return EIO on invalid layout on GET_DATALOC ioctl
If the user calls GET_DATALOC on a file with an invalid (e.g.,
zeroed) layout, return EIO to userland.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-03 10:51:17 -05:00
Linus Torvalds
eb0ad9c06d JFS TRIM support and some minor fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQbEYbAAoJEDaohF61QIxkd4QP/Akm+fi7I8vHFc7jdckxmt+X
 BS0i1SpMvxH29JhSx2ZaamNdgSW8hc3pVP2XkoICUMIDOBcx564y1mRRnOXZ1mdw
 NAwPmx4gP5OMtHOOJS0X2z7cIRnyFvhtpPtrbU2P5aOmdpEPpLMKilFoOOrWfsn0
 bhNtFQOIZnYetG9isdNxi/P3NWctCDkL2x+PvKjUp081zDkJvSWjn6RB5QGaGKVF
 BOXXSK/1LYvRSJehYXekLObhlCCHUV8jMDsZ8dzJxiHaOdSOKPbjCGG1D8CJIYV2
 7lOaZvjs7muQR4QME2CBQ6VDPW2KUPapAKRuBCglMXwP+Ym+aU/MpXYCE3nydgeQ
 kVR5jvwQcB2hRj8ALyCL79i6jM1DoN3rbaU2FHdU9enHmmEuG4424D5pwHHyUlRJ
 zb52HPGpilAHIyXHAhAl2EOvlFA60Sx1dUKiAkgc+mFKcsZaGUJw3XWSp99trSa2
 f7ZHzrmQaq0tcoYDiH00wZZfCHPlwuXxmFkRrC721xjYNOaMZKAn8n7Xi42Ap30Z
 7IzWhwNOOZuxx9CmEZDXY+6UAx28aRXsuiKwOeUVLyXcAdOx/9DxjoUjUbgNZqf0
 wu6z3kXqkD3ZnkOiKcV1lE1oBtdgZtY2S92s6SBdduxojZqC9U8RYU9ogssePo5R
 pK69xihOXuAetSGmopPO
 =9O7B
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy

Pull JFS update from Dave Kleikamp:
 "JFS TRIM support and some minor fixes"

* tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy:
  jfs: Fix do_div precision in commit b40c2e66
  JFS: use list_move instead of list_del/list_add
  jfs: Remove obsolete email address
  fs/jfs: TRIM support for JFS Filesystem
2012-10-03 08:48:21 -07:00
Linus Torvalds
88265322c1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - Integrity: add local fs integrity verification to detect offline
     attacks
   - Integrity: add digital signature verification
   - Simple stacking of Yama with other LSMs (per LSS discussions)
   - IBM vTPM support on ppc64
   - Add new driver for Infineon I2C TIS TPM
   - Smack: add rule revocation for subject labels"

Fixed conflicts with the user namespace support in kernel/auditsc.c and
security/integrity/ima/ima_policy.c.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits)
  Documentation: Update git repository URL for Smack userland tools
  ima: change flags container data type
  Smack: setprocattr memory leak fix
  Smack: implement revoking all rules for a subject label
  Smack: remove task_wait() hook.
  ima: audit log hashes
  ima: generic IMA action flag handling
  ima: rename ima_must_appraise_or_measure
  audit: export audit_log_task_info
  tpm: fix tpm_acpi sparse warning on different address spaces
  samples/seccomp: fix 31 bit build on s390
  ima: digital signature verification support
  ima: add support for different security.ima data types
  ima: add ima_inode_setxattr/removexattr function and calls
  ima: add inode_post_setattr call
  ima: replace iint spinblock with rwlock/read_lock
  ima: allocating iint improvements
  ima: add appraise action keywords and default rules
  ima: integrity appraisal extension
  vfs: move ima_file_free before releasing the file
  ...
2012-10-02 21:38:48 -07:00
Linus Torvalds
782c3fb22b No big changes for 3.7 in UBIFS:
* Error reporting and debug printing improvements
 * Power cut emulation fixes
 * Minor cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQaumRAAoJECmIfjd9wqK016UQAImRrzhPoegy2Pr7j/CpzUST
 wJru3QlgdHgjukWer/s4CGTTjmTLMgLAllUnJLO3yn+Po4o67RJxI4SjhMf5lwDG
 TSmB/ttQpRmYxaaS2EOC6Ow+jdMM2MhAsPrD8nAfLvOm/XduuSyquPY/xFa6U4rJ
 oE4w23b5t2BZI/UJmsCaYs6Rj1h8w1aZUukii+J9BN8DGygn/vJuI9EDNKdtQmxe
 vmoT2uWKPyL859kY/+lONH048NWMkEB3BhNmuAsFqGY/tdQRdmeyhgWEKNjbmc/M
 DXZe7ovy5umUyw6iDpfhEhmOBdV9xorDySsmKNP1/q60rUGD6R/tynC4y1q1IueB
 c6fbry6xpwKIccMS/w7x9cbZxMnG++iQoj71pr2FV9Bg5Xd7XduM/XjzI5rMHuqv
 EnmcwT/LIu5Lw1vMU8pfW2yIdTkotNR/2aca1bs0f4WtS3AhngdqmNbFmddcpY37
 qvTYDrSMOW/IaegiDQzucVXNhIs8ufIULZzmj9CtmgeyvNVJboTJRik+HJDWFqeC
 04TaM8k+Yjp/isbzTmWEyBG3G06cNpMwLtT5OKcpRVQ1ZYu62AbnR+Dbep6n0qt+
 gBORW8TTY32GudeSiHLKCrOsrkx3zNSli6T4Sw9YD5e29Dee62KRTgjKVplezQfB
 Djh70vUyX2tHdYlW88gf
 =gM4M
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs

Pull ubifs changes from Artem Bityutskiy:
 "No big changes for 3.7 in UBIFS:
   - Error reporting and debug printing improvements
   - Power cut emulation fixes
   - Minor cleanups"

Fix trivial conflict in fs/ubifs/debug.c due to the user namespace
changes.

* tag 'upstream-3.7-rc1' of git://git.infradead.org/linux-ubifs:
  UBIFS: print less
  UBIFS: use pr_ helper instead of printk
  UBIFS: comply with coding style
  UBIFS: use __aligned() attribute
  UBIFS: remove __DATE__ and __TIME__
  UBIFS: fix power cut emulation for mtdram
  UBIFS: improve scanning debug output
  UBIFS: always print full error reports
  UBIFS: print PID in debug messages
2012-10-02 20:47:48 -07:00
Linus Torvalds
60c7b4df82 xfs: update for 3.7-rc1
Several enhancements and cleanups:
 
 - make inode32 and inode64 remountable options
 - SEEK_HOLE/SEEK_DATA enhancements
 - cleanup struct declarations in xfs_mount.h
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQayQwAAoJENaLyazVq6ZO2H0P/RiWpYlVLP0ZNbxUOvL33WFy
 xOHwNkhRG23Z0sgDfrUJjDibzOuSmf/XothsChGPE4Mm8JLCjZKzxR77ySP+pbC5
 sylrciXl5E2tY3wtW8oh2kQORLRneqA6qXNCrOtYklqwSmB0qx7bf9OdjL/ALiuC
 SvTAn/PhyVZqGJgVZyEAc63CsbSzK2XmLgyfeQeA7JbSptbg5fDkFYrbNE+zr+yW
 S1umMsuZW+zwDjzRq0lGkLcYOH3fD0JpMrfqvtQsk3+SIDlq1HSaObD63dhOJVUV
 LUpOqgXCPEC3nOL3Dh+i8gv5OTRLJW0P2zYuAY++kqYnuRsYU18tcfdd0loTGeHV
 LulurSIMBJV19Qx1K3C3E8KPwbwNwNlvmpEbgXy00Qet7LTPbofcUXDi3mcV6ozQ
 SMGQh42VGV4S8wEjAp93wLva7xLf6enV/X6Stuzy/ec8HN8K2tYMyYyGvSxZbjmq
 9JTxCulDvbr7EnSDqCkH1inlQ4cXvBSzOi2trDpQbKq0cuYbqlzgL2sloBv+y+CB
 4fTWEeeisQ05VhHU8Yp3bsVtiyUjGGcbOgvqM+NHmmh1HuHUmiBnXh7k3RfaLfBi
 E4RprFT0Yrpo16mt3Fn0GjkhY87DYHBGIehFOlu13zKPc5q+IBTwIjSugxoa23EL
 uXTtFFHMxR4pUV/An5Wf
 =G6gd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "Several enhancements and cleanups:

   - make inode32 and inode64 remountable options
   - SEEK_HOLE/SEEK_DATA enhancements
   - cleanup struct declarations in xfs_mount.h"

* tag 'for-linus-v3.7-rc1' of git://oss.sgi.com/xfs/xfs:
  xfs: Make inode32 a remountable option
  xfs: add inode64->inode32 transition into xfs_set_inode32()
  xfs: Fix mp->m_maxagi update during inode64 remount
  xfs: reduce code duplication handling inode32/64 options
  xfs: make inode64 as the default allocation mode
  xfs: Fix m_agirotor reset during AG selection
  Make inode64 a remountable option
  xfs: stop the sync worker before xfs_unmountfs
  xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents
  xfs: xfs_seek_data() refinement with unwritten extents check up from page cache
  xfs: Introduce a helper routine to probe data or hole offset from page cache
  xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole()
  xfs: fix race while discarding buffers [V4]
  xfs: check for possible overflow in xfs_ioc_trim
  xfs: unlock the AGI buffer when looping in xfs_dialloc
  xfs: kill struct declarations in xfs_mount.h
  xfs: fix uninitialised variable in xfs_rtbuf_get()
2012-10-02 20:42:58 -07:00
Linus Torvalds
aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00
Catalin Marinas
8f9c0119d7 compat: fs: Generic compat_sys_sendfile implementation
This function is used by sparc, powerpc and arm64 for compat support.
The patch adds a generic implementation which calls do_sendfile()
directly and avoids set_fs().

The sparc architecture has wrappers for the sign extensions while
powerpc relies on the compiler to do the this. The patch adds wrappers
for powerpc to handle the u32->int type conversion.

compat_sys_sendfile64() can be replaced by a sys_sendfile() call since
compat_loff_t has the same size as off_t on a 64-bit system.

On powerpc, the patch also changes the 64-bit sendfile call from
sys_sendile64 to sys_sendfile.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Kirill A. Shutemov
8c0a853770 fs: push rcu_barrier() from deactivate_locked_super() to filesystems
There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Al Viro
99621b44aa btrfs: reada_extent doesn't need kref for refcount
All increments and decrements are under the same spinlock - have to be,
since they need to protect the radix_tree it's found in.  Just use
int, no need to wank with kref...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Alex Kelly
10c28d937e coredump: move core dump functionality into its own file
This prepares for making core dump functionality optional.

The variable "suid_dumpable" and associated functions are left in fs/exec.c
because they're used elsewhere, such as in ptrace.

Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Linus Torvalds
fc47912d9c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 "A few drivers were updated with device tree bindings and others got a
  few small cleanups and fixes."

Fix trivial conflict in drivers/input/keyboard/omap-keypad.c due to
changes clashing with a whitespace cleanup.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (28 commits)
  Input: wacom - mark Intuos5 pad as in-prox when touching buttons
  Input: synaptics - adjust threshold for treating position values as negative
  Input: hgpk - use %*ph to dump small buffer
  Input: gpio_keys_polled - fix dt pdata->nbuttons
  Input: Add KD[GS]KBDIACRUC ioctls to the compatible list
  Input: omap-keypad - fixed formatting
  Input: tegra - move platform data header
  Input: wacom - add support for EMR on Cintiq 24HD touch
  Input: s3c2410_ts - make s3c_ts_pmops const
  Input: samsung-keypad - use of_get_child_count() helper
  Input: samsung-keypad - use of_match_ptr()
  Input: uinput - fix formatting
  Input: uinput - specify exact bit sizes on userspace APIs
  Input: uinput - mark failed submission requests as free
  Input: uinput - fix race that can block nonblocking read
  Input: uinput - return -EINVAL when read buffer size is too small
  Input: uinput - take event lock when fetching events from buffer
  Input: get rid of MATCH_BIT() macro
  Input: rotary-encoder - add DT bindings
  Input: rotary-encoder - constify platform data pointers
  ...
2012-10-02 17:16:10 -07:00
Linus Torvalds
aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Linus Torvalds
437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds
c0e8a139a5 Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - xattr support added.  The implementation is shared with tmpfs.  The
   usage is restricted and intended to be used to manage per-cgroup
   metadata by system software.  tmpfs changes are routed through this
   branch with Hugh's permission.

 - cgroup subsystem ID handling simplified.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
  cgroup: Assign subsystem IDs during compile time
  cgroup: Do not depend on a given order when populating the subsys array
  cgroup: Wrap subsystem selection macro
  cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
  cgroup: net_prio: Do not define task_netpioidx() when not selected
  cgroup: net_cls: Do not define task_cls_classid() when not selected
  cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
  cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
  xattr: mark variable as uninitialized to make both gcc and smatch happy
  fs: add missing documentation to simple_xattr functions
  cgroup: add documentation on extended attributes usage
  cgroup: rename subsys_bits to subsys_mask
  cgroup: add xattr support
  cgroup: revise how we re-populate root directory
  xattr: extract simple_xattr code from tmpfs
2012-10-02 10:50:47 -07:00
Linus Torvalds
033d9959ed Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]_work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
2012-10-02 09:54:49 -07:00
Linus Torvalds
797b9e5ae9 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French:
 "This patchset is the final section of the SMB2.1 support merge for
  cifs.ko.  It also includes improvements to the cifs socket handling
  from Jeff, and also fixes a few cifs bug fixes.  It adds SMB2 support
  for file and inode operations as well as moves some existing cifs code
  to use ops server struct of protocol specific callbacks.

  Most of this code is SMB2 specific.  When enabled SMB2.1 does pass
  various functional tests including most of the connectathon test
  suite, For SMB2.1, Connectathon test 4 and some related tests fail due
  to not updating mode bits remotely (cifsacl support where mode bits
  are approximated with the cifs acl is not enable for smb2), and test8
  (symlink) support is not completed for SMB2 yet (note that we will
  likely have a "Unix Extensions" eventually, at least for Samba, so in
  the long run posix locks won't have to be emulated when mounting Linux
  to Linux, but for most NAS and for Windows mounts posix lock emulation
  will still used for SMB2 in a similar fashion as we do for cifs).

  SMB2.1 dialect is supported.  Although additional fixes to enable smb2
  (the original smb2.02) dialect and to add various optional features of
  the smb3 dialect are expected to be added in the future as testing
  progresses, currently mounting with the "vers=2.1" is supported (in
  order to mount using SMB2.1 to servers like Samba 4, and Windows 7,
  Windows 2008R2)."

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: (82 commits)
  [CIFS] Fix indentation of fs/cifs/Kconfig entries
  [CIFS] Fix SMB2 negotiation support to select only one dialect (based on vers=)
  cifs: obtain file access during backup intent lookup (resend)
  CIFS: Fix possible freed pointer dereference in CIFS_SessSetup
  CIFS: Fix possible freed pointer dereference in SMB2_sess_setup
  CIFS: Make ops->close return void
  cifs: change DOS/NT/POSIX mapping of ERRnoresource
  cifs: remove support for deprecated "forcedirectio" and "strictcache" mount options
  cifs: remove support for CIFS_IOC_CHECKUMOUNT ioctl
  CIFS: Fix possible memory leaks in SMB2 code
  CIFS: Fix endian conversion of IndexNumber
  Trivial endian fixes
  MARK SMB2 support EXPERIMENTAL
  Update cifs version number
  cifs: add FL_CLOSE to fl_flags mask in cifs_read_flock
  cifs: Mangle string used for unc in /proc/mounts
  cifs: cleanups for cifs_mkdir_qinfo
  CIFS: Fix fast lease break after open problem
  CIFS: Add SMB2.1 lease break support
  CIFS: Fix cache coherency for read oplock case
  ...
2012-10-01 15:27:35 -07:00
Sage Weil
6816282dab ceph: propagate layout error on osd request creation
If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller.  We switch up the return to have an error
code instead of NULL implying -ENOMEM.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-10-01 17:20:00 -05:00
Dmitry Torokhov
dde3ada3d0 Merge branch 'next' into for-linus
Prepare first set of updates for 3.7 merge window.
2012-10-01 14:20:58 -07:00
Linus Torvalds
40689ac479 dlm for 3.7
There are two main patches in this set, both related to the userland
 dlm_controld daemon.  The first fixes a deadlock between dlm_controld and
 the dlm_send workqueue when both access configfs data simultaneously.  The
 second reworks some code to get around a long standing, but intentional,
 unlock balance warning.  The userland daemon no longer takes a lock that
 is later released from the kernel.  The other commits are minor fixes and
 changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQaeyUAAoJEDgbc8f8gGmqokAQALIt28LyaIAh9bfc0AWyLldM
 t81sT2gDR/uF0r2sLHdevunCecUbxR8CN5i+pxbTf1OsDkrhYQDx7GtQvTUefc2M
 /+JA/8UosZkrsTC0LvayWbsbC15c8epwxmERel/feinZ7dmgqNGBLaMPGJzZMv5/
 KvHPRMg/JSJOpnwC64xZ097RsNkX5P8lwnK7U5ivs4vLCYR3ac3+V2+t/hxAMONE
 dwt7EZ/ADCBMUkQDTgisJA/Xr4yub+tTxUMshwkX+ve8nzSdj8dPzMFfky0bSady
 +x/r3GSVScQ//qqp8a01Jb2ZeFhKMfpTcPWve4chsngszsSzElJeMbw7numVuU98
 7hCUbH5MeSuQ2Nx3qYfi1LhrV0687lbkNLlqET+3PX/r6m6T4SqucOmPv2SeSet9
 rLGYNM1hJxx1rPASVtEsE/Xhr1agI58EO9tNb9p79wS4X5fcGqUsQXv4gRnu2/LN
 GRUTADC7nFDXy6BItYDwsuQfuJjMaDIsZ49dEUXYTlzav/zYK4IpU3fGuG3z/UDZ
 pQl+g0aK7rsq5hk7jxGfdHDBvhM+kcRMeQuTJqxJuJvLs0pmLWvFJXbyn+xh7y9Y
 MZYSSI4L3j1s4u6SkGUsMRezfcfJEYQdOzlIs7HEaLG++yBgcScF70ZgJByxzmfr
 eb9tpzwQvGLY4CvUVbEi
 =Ghb3
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "There are two main patches in this set, both related to the userland
  dlm_controld daemon.

  The first fixes a deadlock between dlm_controld and the dlm_send
  workqueue when both access configfs data simultaneously.

  The second reworks some code to get around a long standing, but
  intentional, unlock balance warning.  The userland daemon no longer
  takes a lock that is later released from the kernel.

  The other commits are minor fixes and changes."

* tag 'dlm-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: check the maximum size of a request from user
  dlm: cleanup send_to_sock routine
  dlm: convert add_sock routine return value type to void
  dlm: remove redundant variable assignments
  dlm: fix unlock balance warnings
  dlm: fix uninitialized spinlock
  dlm: fix deadlock between dlm_send and dlm_controld
2012-10-01 13:51:58 -07:00
Wei Yongjun
b905a7f8b7 ceph: convert to use le32_add_cpu()
Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01 14:30:54 -05:00
Yan, Zheng
3e8f43a089 ceph: Fix oops when handling mdsmap that decreases max_mds
When i >= newmap->m_max_mds, ceph_mdsmap_get_addr(newmap, i) return
NULL. Passing NULL to memcmp() triggers oops.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2012-10-01 14:30:54 -05:00
Alex Elder
c98f533c94 ceph: let path portion of mount "device" be optional
A recent change to /sbin/mountall causes any trailing '/' character
in the "device" (or fs_spec) field in /etc/fstab to be stripped.  As
a result, an entry for a ceph mount that intends to mount the root
of the name space ends up with now path portion, and the ceph mount
option processing code rejects this.

That is, an entry in /etc/fstab like:
    cephserver:port:/ /mnt ceph defaults 0 0
provides to the ceph code just "cephserver:port:" as the "device,"
and that gets rejected.

Although this is a bug in /sbin/mountall, we can have the ceph mount
code support an empty/nonexistent path, interpreting it to mean the
root of the name space.

RFC 5952 offers recommendations for how to express IPv6 addresses,
and recommends the usage found in RFC 3986 (which specifies the
format for URI's) for representing both IPv4 and IPv6 addresses that
include port numbers.  (See in particular the definition of
"authority" found in the Appendix of RFC 3986.)

According to those standards, no host specification will ever
contain a '/' character.  As a result, it is sufficient to scan a
provided "device" from an /etc/fstab entry for the first '/'
character, and if it's found, treat that as the beginning of the
path.  If no '/' character is present, we can treat the entire
string as the monitor host specification(s), and assume the path
to be the root of the name space.  We'll still require a ':' to
separate the host portion from the (possibly empty) path portion.

This means that we can more formally define how ceph will interpret
the "device" it's provided when processing a mount request:

    "device" will look like:
        <server_spec>[,<server_spec>...]:[<path>]
    where
        <server_spec> is <ip>[:<port>]
        <path> is optional, but if present must begin with '/'

This addresses http://tracker.newdream.net/issues/2919

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-10-01 14:30:49 -05:00
Linus Torvalds
3498d13b80 TTY merge for 3.7-rc1
As we skipped the merge window for 3.6-rc1 for the tty tree, everything
 is now settled down and working properly, so we are ready for 3.7-rc1.
 Here's the patchset, it's big, but the large changes are removing a
 firmware file and adding a staging tty driver (it depended on the tty
 core changes, so it's going through this tree instead of the staging
 tree.)
 
 All of these patches have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlBp36oACgkQMUfUDdst+yk4WgCdEy13hot8fI2Lqnc7W0LKu7GX
 4p8AoLTjzrXhLosxdijskDQ9X1OtjrxU
 =S5Ng
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY changes from Greg Kroah-Hartman:
 "As we skipped the merge window for 3.6-rc1 for the tty tree,
  everything is now settled down and working properly, so we are ready
  for 3.7-rc1.  Here's the patchset, it's big, but the large changes are
  removing a firmware file and adding a staging tty driver (it depended
  on the tty core changes, so it's going through this tree instead of
  the staging tree.)

  All of these patches have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fix up more-or-less trivial conflicts in
 - drivers/char/pcmcia/synclink_cs.c:
    tty NULL dereference fix vs tty_port_cts_enabled() helper function
 - drivers/staging/{Kconfig,Makefile}:
    add-add conflict (dgrp driver added close to other staging drivers)
 - drivers/staging/ipack/devices/ipoctal.c:
    "split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"

* tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
  tty/serial: Add kgdb_nmi driver
  tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
  tty/serial/amba-pl011: Implement poll_init callback
  tty/serial/core: Introduce poll_init callback
  kdb: Turn KGDB_KDB=n stubs into static inlines
  kdb: Implement disable_nmi command
  kernel/debug: Mask KGDB NMI upon entry
  serial: pl011: handle corruption at high clock speeds
  serial: sccnxp: Make 'default' choice in switch last
  serial: sccnxp: Remove mask termios caps for SW flow control
  serial: sccnxp: Report actual baudrate back to core
  serial: samsung: Add poll_get_char & poll_put_char
  Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
  Powerpc 8xx CPM_UART maxidl should not depend on fifo size
  Powerpc 8xx CPM_UART too many interrupts
  Powerpc 8xx CPM_UART desynchronisation
  serial: set correct baud_base for EXSYS EX-41092 Dual 16950
  serial: omap: fix the reciever line error case
  8250: blacklist Winbond CIR port
  8250_pnp: do pnp probe before legacy probe
  ...
2012-10-01 12:26:52 -07:00
Miao Xie
90abccf2c6 Revert "Btrfs: do not do filemap_write_and_wait_range in fsync"
This reverts commit 0885ef5b56

After applying the above patch, the performance slowed down because the dirty
page flush can only be done by one task, so revert it.

The following is the test result of sysbench:
	Before		After
	24MB/s		39MB/s

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-01 15:19:22 -04:00
Josef Bacik
698d0082c4 Btrfs: remove bytes argument from do_chunk_alloc
Everybody is just making stuff up, and it's just used to see if we really do
need to alloc a chunk, and since we do this when we already know we really
do it's just a waste of space.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:21 -04:00
Josef Bacik
ea658badc4 Btrfs: delay block group item insertion
So we have lots of places where we try to preallocate chunks in order to
make sure we have enough space as we make our allocations.  This has
historically meant that we're constantly tweaking when we should allocate a
new chunk, and historically we have gotten this horribly wrong so we way
over allocate either metadata or data.  To try and keep this from happening
we are going to make it so that the block group item insertion is done out
of band at the end of a transaction.  This will allow us to create chunks
even if we are trying to make an allocation for the extent tree.  With this
patch my enospc tests run faster (didn't expect this) and more efficiently
use the disk space (this is what I wanted).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:21 -04:00
Kent Overstreet
be3940c0a9 btrfs: Kill some bi_idx references
For immutable bio vecs, I've been auditing and removing bi_idx
references. These were harmless, but removing them will make auditing
easier.

scrub_bio_end_io_worker() was open coding a bio_reset() - but this
doesn't appear to have been needed for anything as right after it does a
bio_put(), and perusing the code it doesn't appear anything else was
holding a reference to the bio.

The other use end_bio_extent_readpage() was just for a pr_debug() -
changed it to something that might be a bit more useful.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Stefan Behrens <sbehrens@giantdisaster.de>
2012-10-01 15:19:21 -04:00
Miao Xie
962197babe Btrfs: fix unnecessary warning when the fragments make the space alloc fail
When we wrote some data by compress mode into a btrfs filesystem which was full
of the fragments, the kernel will report:
	BTRFS warning (device xxx): Aborting unused transaction.

The reason is:
We can not find a long enough free space to store the compressed data because
of the fragmentary free space, and the compressed data can not be splited,
so the kernel outputed the above message.

In fact, btrfs can deal with this problem very well: it fall back to
uncompressed IO, split the uncompressed data into small ones, and then
store them into to the fragmentary free space. So we shouldn't output the
above warning message.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-01 15:19:20 -04:00
Josef Bacik
69ffb54347 Btrfs: create a pinned em when writing to a prealloc range in DIO
Wade Cline reported a problem where he was getting garbage and warnings when
writing to a preallocated range via O_DIRECT.  This is because we weren't
creating our normal pinned extent_map for the range we were writing to,
which was causing all sorts of issues.  This patch fixes the problem and
makes his testcase much happier.  Thanks,

Reported-by: Wade Cline <clinew@linux.vnet.ibm.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:20 -04:00
Josef Bacik
6df7881a84 Btrfs: move the sb_end_intwrite until after the throttle logic
Sage reported the following lockdep backtrace

=====================================
[ BUG: bad unlock balance detected! ]
3.6.0-rc2-ceph-00171-gc7ed62d #1 Not tainted
-------------------------------------
btrfs-cleaner/7607 is trying to release lock (sb_internal) at:
[<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
but there are no more locks to release!

other info that might help us debug this:
1 lock held by btrfs-cleaner/7607:
 #0:  (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b405>] cleaner_kthread+0x95/0x120 [btrfs]

stack backtrace:
Pid: 7607, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00171-gc7ed62d #1
Call Trace:
 [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
 [<ffffffff810afa9e>] print_unlock_inbalance_bug+0xfe/0x110
 [<ffffffff810b289e>] lock_release_non_nested+0x1ee/0x310
 [<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160
 [<ffffffffa004106c>] ? put_transaction+0x8c/0x130 [btrfs]
 [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
 [<ffffffff810b2a95>] lock_release+0xd5/0x220
 [<ffffffff81173071>] ? kmem_cache_free+0x151/0x160
 [<ffffffff8117d9ed>] __sb_end_write+0x7d/0x90
 [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
 [<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60
 [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffffa0042758>] __btrfs_end_transaction+0x368/0x3c0 [btrfs]
 [<ffffffffa0042808>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs]
 [<ffffffffa00318f0>] btrfs_drop_snapshot+0x410/0x600 [btrfs]
 [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
 [<ffffffffa00430ef>] btrfs_clean_old_snapshots+0xaf/0x150 [btrfs]
 [<ffffffffa003b405>] ? cleaner_kthread+0x95/0x120 [btrfs]
 [<ffffffffa003b419>] cleaner_kthread+0xa9/0x120 [btrfs]
 [<ffffffffa003b370>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
 [<ffffffff810791ee>] kthread+0xae/0xc0
 [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
 [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
 [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
 [<ffffffff8163e740>] ? gs_change+0x13/0x13

This is because the throttle stuff can commit the transaction, which expects to
be the one stopping the intwrite stuff, but we've already done it in the
__btrfs_end_transaction.  Moving the sb_end_intewrite after this logic makes the
lockdep go away.  Thanks,

Tested-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:19 -04:00
Liu Bo
425d17a290 Btrfs: use larger limit for translation of logical to inode
This is the change of the kernel side.

Translation of logical to inode used to have an upper limit 4k on
inode container's size, but the limit is not large enough for a data
with a great many of refs, so when resolving logical address,
we can end up with
"ioctl ret=0, bytes_left=0, bytes_missing=19944, cnt=510, missed=2493"

This changes to regard 64k as the upper limit and use vmalloc instead of
kmalloc to get memory more easily.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-01 15:19:19 -04:00
Liu Bo
df031f0752 Btrfs: use helper for logical resolve
We already have a helper, iterate_inodes_from_logical(), for logical resolve,
so just use it.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-01 15:19:18 -04:00
Liu Bo
69917e4312 Btrfs: fix a bug in parsing return value in logical resolve
In logical resolve, we parse extent_from_logical()'s 'ret' as a kind of flag.

It is possible to lose our errors because
(-EXXXX & BTRFS_EXTENT_FLAG_TREE_BLOCK) is true.

I'm not sure if it is on purpose, it just looks too hacky if it is.
I'd rather use a real flag and a 'ret' to catch errors.

Acked-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Liu Bo <liub.liubo@gmail.com>
2012-10-01 15:19:18 -04:00
liubo
0647d6bd16 Btrfs: cleanup for unused ref cache stuff
As ref cache has been removed from btrfs, there is no user on
its lock and its check.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-01 15:19:17 -04:00
Miao Xie
8407aa4643 Btrfs: fix corrupted metadata in the snapshot
When we delete a inode, we will remove all the delayed items including delayed
inode update, and then truncate all the relative metadata. If there is lots of
metadata, we will end the current transaction, and start a new transaction to
truncate the left metadata. In this way, we will leave a inode item that its
link counter is > 0, and also may leave some directory index items in fs/file tree
after the current transaction ends. In other words, the metadata in this fs/file tree
is inconsistent. If we create a snapshot for this tree now, we will find a inode with
corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
because its link counter is not 0.

We fix this problem by updating the inode item before the current transaction ends.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-01 15:19:17 -04:00
David Sterba
837e197283 btrfs: polish names of kmem caches
Usecase:

  watch 'grep btrfs < /proc/slabinfo'

easy to watch all caches in one go.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-01 15:19:16 -04:00
Josef Bacik
a80c8dcf7e Btrfs: fix our overcommit math
I noticed I was seeing large lags when running my torrent test in a vm on my
laptop.  While trying to make it lag less I noticed that our overcommit math
was taking into account the number of bytes we wanted to reclaim, not the
number of bytes we actually wanted to allocate, which means we wouldn't
overcommit as often.  This patch fixes the overcommit math and makes
shrink_delalloc() use that logic so that it will stop looping faster.  We
still have pretty high spikes of latency, but the test now takes 3 minutes
less time (about 5% faster).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:16 -04:00
Josef Bacik
dea31f5233 Btrfs: wait on async pages when shrinking delalloc
Mitch reported a problem where you could get an ENOSPC error when untarring
a kernel git tree onto a 16gb file system with compress-force=zlib.  This is
because compression is a huge pain, it will return from ->writepages()
without having actually created any ordered extents.  To get around this we
check to see if the async submit counter is up, and if it is wait until it
drops to 0 before doing our normal ordered wait dance.  With this patch I
can now untar a kernel git tree onto a 16gb file system without getting
ENOSPC errors.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-01 15:19:15 -04:00
Liu Bo
9e8a4a8b0b Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag
We're going to use this flag EXTENT_DEFRAG to indicate which range
belongs to defragment so that we can implement snapshow-aware defrag:

We set the EXTENT_DEFRAG flag when dirtying the extents that need
defragmented, so later on writeback thread can differentiate between
normal writeback and writeback started by defragmentation.

Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-01 15:19:15 -04:00