linux-uconsole/fs
Filipe Manana 50d700408a Btrfs: fix race leading to fs corruption after transaction abort
commit cb2d3daddb upstream.

When one transaction is finishing its commit, it is possible for another
transaction to start and enter its initial commit phase as well. If the
first ends up getting aborted, we have a small time window where the second
transaction commit does not notice that the previous transaction aborted
and ends up committing, writing a superblock that points to btrees that
reference extent buffers (nodes and leafs) that were not persisted to disk.
The consequence is that after mounting the filesystem again, we will be
unable to load some btree nodes/leafs, because the content on disk is
either garbage (or just zeroes) or corresponds to the old content of a
previously COWed or deleted node/leaf, resulting in the well-known error
messages "parent transid verify failed on ...".
The following sequence diagram illustrates how this can happen.

        CPU 1                                           CPU 2

 <at transaction N>

 btrfs_commit_transaction()
   (...)
   --> sets transaction state to
       TRANS_STATE_UNBLOCKED
   --> sets fs_info->running_transaction
       to NULL

                                                    (...)
                                                    btrfs_start_transaction()
                                                      start_transaction()
                                                        wait_current_trans()
                                                          --> returns immediately
                                                              because
                                                              fs_info->running_transaction
                                                              is NULL
                                                        join_transaction()
                                                          --> creates transaction N + 1
                                                          --> sets
                                                              fs_info->running_transaction
                                                              to transaction N + 1
                                                          --> adds transaction N + 1 to
                                                              the fs_info->trans_list list
                                                        --> returns transaction handle
                                                            pointing to the new
                                                            transaction N + 1
                                                    (...)

                                                    btrfs_sync_file()
                                                      btrfs_start_transaction()
                                                        --> returns handle to
                                                            transaction N + 1
                                                      (...)

   btrfs_write_and_wait_transaction()
     --> writeback of some extent
         buffer fails, returns an
         error
   btrfs_handle_fs_error()
     --> sets BTRFS_FS_STATE_ERROR in
         fs_info->fs_state
   --> jumps to label "scrub_continue"
   cleanup_transaction()
     btrfs_abort_transaction(N)
       --> sets BTRFS_FS_STATE_TRANS_ABORTED
           flag in fs_info->fs_state
       --> sets aborted field in the
           transaction and transaction
           handle structures, for
           transaction N only
     --> removes transaction from the
         list fs_info->trans_list
                                                      btrfs_commit_transaction(N + 1)
                                                        --> transaction N + 1 was not
                                                            aborted, so it proceeds
                                                        (...)
                                                        --> sets the transaction's state
                                                            to TRANS_STATE_COMMIT_START
                                                        --> does not find the previous
                                                            transaction (N) in the
                                                            fs_info->trans_list, so it
                                                            doesn't know that transaction
                                                            was aborted, and the commit
                                                            of transaction N + 1 proceeds
                                                        (...)
                                                        --> sets transaction N + 1 state
                                                            to TRANS_STATE_UNBLOCKED
                                                        btrfs_write_and_wait_transaction()
                                                          --> succeeds writing all extent
                                                              buffers created in the
                                                              transaction N + 1
                                                        write_all_supers()
                                                           --> succeeds
                                                           --> we now have a superblock on
                                                               disk that points to trees
                                                               that refer to at least one
                                                               extent buffer that was
                                                               never persisted

So fix this by updating the transaction commit path: if, after setting the
transaction state to TRANS_STATE_COMMIT_START, we do not find any previous
transaction in fs_info->trans_list, check whether the flag
BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state. If the flag is
set, just fail the transaction commit with -EROFS, as we do in other
places. The exact error code for the previous transaction abort was already
logged and reported.
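
The race and the check described above can be sketched as a small userspace
model. This is not the kernel patch: every name below (model_fs, model_trans,
model_abort, model_commit_start, model_race, FS_STATE_TRANS_ABORTED) is
hypothetical and chosen only to mirror the terms used in this message. The
model also ignores locking and the wait for the previous commit to finish.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the BTRFS_FS_STATE_TRANS_ABORTED bit. */
enum { FS_STATE_TRANS_ABORTED = 1u << 0 };

/* Hypothetical stand-in for fs_info: only the global state flags. */
struct model_fs {
    unsigned int fs_state;
};

/* Hypothetical stand-in for a transaction on fs_info->trans_list. */
struct model_trans {
    unsigned long transid;
    int aborted;                /* 0, or the negative errno of the abort */
    struct model_trans *prev;   /* older transaction still on the list */
};

/*
 * Models the abort path from the diagram: set the global flag, mark only
 * this transaction as aborted, then unlink it so that the newer
 * transaction can no longer reach it.
 */
static void model_abort(struct model_fs *fs, struct model_trans *t,
                        struct model_trans *newer, int err)
{
    fs->fs_state |= FS_STATE_TRANS_ABORTED;
    t->aborted = err;
    if (newer)
        newer->prev = NULL;
}

/*
 * Models the decision taken once the commit reaches the COMMIT_START
 * state. With check_fs_flag == false this is the old behaviour; with
 * true it adds the global flag check described above.
 */
static int model_commit_start(const struct model_fs *fs,
                              const struct model_trans *cur,
                              bool check_fs_flag)
{
    if (cur->aborted)
        return cur->aborted;

    /* Previous transaction still listed: its aborted field is visible. */
    if (cur->prev)
        return cur->prev->aborted;

    /* Previous transaction already unlinked: only the flag remains. */
    if (check_fs_flag && (fs->fs_state & FS_STATE_TRANS_ABORTED))
        return -EROFS;

    return 0;
}

/* Replays the diagram: N aborts and is unlinked, then N + 1 commits. */
static int model_race(bool with_fix)
{
    struct model_fs fs = { 0 };
    struct model_trans n = { .transid = 1 };
    struct model_trans n1 = { .transid = 2, .prev = &n };

    model_abort(&fs, &n, &n1, -EIO);
    return model_commit_start(&fs, &n1, with_fix);
}
```

In this model, model_race(false) returns 0, meaning the commit of N + 1
would go ahead, while model_race(true) returns -EROFS. The -EIO passed to
model_abort is an arbitrary illustrative error code.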

Fixes: 49b25e0540 ("btrfs: enhance transaction abort infrastructure")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06 19:06:53 +02:00
9p 9p: pass the correct prototype to read_cache_page 2019-07-31 07:27:08 +02:00
adfs fs/adfs: super: fix use-after-free bug 2019-08-06 19:06:49 +02:00
affs
afs afs: Fix uninitialised spinlock afs_volume::cb_break_lock 2019-07-21 09:03:06 +02:00
autofs autofs: fix error return in autofs_fill_super() 2019-03-13 14:02:32 -07:00
befs
bfs
btrfs Btrfs: fix race leading to fs corruption after transaction abort 2019-08-06 19:06:53 +02:00
cachefiles
ceph ceph: return -ERANGE if virtual xattr value didn't fit in buffer 2019-08-06 19:06:50 +02:00
cifs cifs: Fix a race condition with cifs_echo_request 2019-08-06 19:06:50 +02:00
coda coda: add error handling for fget 2019-08-06 19:06:51 +02:00
configfs configfs: Fix use-after-free when accessing sd->s_dentry 2019-06-22 08:15:17 +02:00
cramfs
crypto fscrypt: clean up some BUG_ON()s in block encryption/decryption 2019-07-26 09:14:02 +02:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-03-23 20:09:59 +01:00
dlm dlm: check if workqueues are NULL before flushing/destroying 2019-07-31 07:27:07 +02:00
ecryptfs eCryptfs: fix a couple type promotion bugs 2019-07-26 09:14:29 +02:00
efivarfs
efs
exofs
exportfs
ext2 ext2: Fix underflow in ext2_max_size() 2019-03-23 20:10:03 +01:00
ext4 ext4: allow directory holes 2019-07-28 08:29:30 +02:00
f2fs f2fs: avoid out-of-range memory access 2019-07-31 07:27:07 +02:00
fat fs/fat/file.c: issue flush after the writeback of FAT 2019-06-15 11:53:59 +02:00
freevxfs
fscache
fuse fuse: retrieve: cap requested size to negotiated max_write 2019-06-15 11:54:07 +02:00
gfs2 gfs2: Fix occasional glock use-after-free 2019-05-31 06:46:07 -07:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
isofs
jbd2 jbd2: introduce jbd2_inode dirty range scoping 2019-07-28 08:29:29 +02:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
jfs
kernfs kernfs: fix barrier usage in __kernfs_new_node() 2019-05-16 19:41:18 +02:00
lockd Revert "lockd: Show pid of lockd for remote locks" 2019-06-09 09:17:22 +02:00
minix
nfs NFS: Cleanup if nfs_match_client is interrupted 2019-08-04 09:30:54 +02:00
nfs_common
nfsd nfsd: Fix overflow causing non-working mounts on 1 TB machines 2019-07-10 09:53:47 +02:00
nilfs2
nls
notify memcg, fsnotify: no oom-kill for remote memcg charging 2019-07-31 07:27:08 +02:00
ntfs
ocfs2 ocfs2: fix error path kobject memory leak 2019-06-22 08:15:21 +02:00
omfs
openpromfs
orangefs
overlayfs ovl: fix bogus -Wmaybe-unitialized warning 2019-06-25 11:35:52 +08:00
proc /proc/<pid>/cmdline: add back the setproctitle() special case 2019-08-04 09:30:56 +02:00
pstore pstore/ram: Run without kernel crash dump region 2019-06-11 12:20:52 +02:00
qnx4
qnx6
quota quota: fix a problem about transfer quota 2019-07-14 08:11:15 +02:00
ramfs
reiserfs
romfs
squashfs
sysfs
sysv
tracefs
ubifs ubifs: Handle re-linking of inodes correctly while recovery 2018-12-29 13:37:55 +01:00
udf udf: Fix incorrect final NOT_ALLOCATED (hole) extent length 2019-07-14 08:11:16 +02:00
ufs ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour 2019-05-25 18:23:46 +02:00
xfs xfs: abort unaligned nowait directio early 2019-07-26 09:14:29 +02:00
aio.c Fix aio_poll() races 2019-05-02 09:58:59 +02:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make load_flat_shared_library() work 2019-07-03 13:14:44 +02:00
binfmt_misc.c
binfmt_script.c Revert "exec: load_script: don't blindly truncate shebang string" 2019-02-15 09:09:54 +01:00
block_dev.c block: fix the return errno for direct IO 2019-04-17 08:38:52 +02:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-04-05 22:33:00 +02:00
char_dev.c chardev: add additional check for minor range overlap 2019-05-31 06:46:27 -07:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c
coredump.c
d_path.c
dax.c mm: page_mkclean vs MADV_DONTNEED race 2019-06-15 11:54:01 +02:00
dcache.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
dcookies.c
direct-io.c direct-io: allow direct writes to empty inodes 2019-03-05 17:58:50 +01:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-03-13 14:02:32 -07:00
eventfd.c
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-02-12 19:47:19 +01:00
exec.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:30:56 +02:00
fcntl.c
fhandle.c
file.c fs/file.c: initialize init_files.resize_wait 2019-04-05 22:32:59 +02:00
file_table.c
filesystems.c
fs-writeback.c blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration 2019-07-26 09:14:08 +02:00
fs_pin.c
fs_struct.c
inode.c Abort file_remove_privs() for non-reg. files 2019-06-22 08:15:21 +02:00
internal.h acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
ioctl.c
iomap.c iomap: fix a use after free in iomap_dio_rw 2019-03-13 14:02:29 -07:00
Kconfig
Kconfig.binfmt
libfs.c
locks.c
Makefile
mbcache.c
mount.h
mpage.c
namei.c
namespace.c
no-block.c
nsfs.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
open.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:27:11 +02:00
pipe.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-05-08 07:21:51 +02:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
stack.c
stat.c
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c fs/userfaultfd.c: disable irqs for fault_pending and event locks 2019-07-10 09:53:42 +02:00
utimes.c
xattr.c