linux-uconsole/fs
Dave Chinner c3a62b651d iomap: readpages doesn't zero page tail beyond EOF
[ Upstream commit 8c110d43c6 ]

When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.

However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.

Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.

The second bug is determining if the current page spans EOF, and so
whether we need split it into two half, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.

Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.

This patch results in correct EOF detection through readpages:

xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864

As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.

The EOF detection crossing really needs to the the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:

xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296

Heere we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.

This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-13 08:51:32 +01:00
..
9p 9p: avoid attaching writeback_fid on mmap with type PRIVATE 2019-10-11 18:21:13 +02:00
adfs fs/adfs: super: fix use-after-free bug 2019-08-06 19:06:49 +02:00
affs
afs afs: Fix leak in afs_lookup_cell_rcu() 2019-09-10 10:33:52 +01:00
autofs autofs: fix a leak in autofs_expire_indirect() 2019-12-13 08:51:01 +01:00
befs
bfs bfs: add sanity check at bfs_fill_super() 2018-12-01 09:37:27 +01:00
btrfs btrfs: only track ref_heads in delayed_ref_updates 2019-12-05 09:20:23 +01:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-12-17 09:24:40 +01:00
ceph ceph: return -EINVAL if given fsc mount option on kernel w/o support 2019-12-05 09:19:45 +01:00
cifs fs/cifs: fix uninitialised variable warnings 2019-12-01 09:16:58 +01:00
coda coda: add error handling for fget 2019-08-06 19:06:51 +02:00
configfs configfs: fix a deadlock in configfs_symlink() 2019-11-12 19:20:47 +01:00
cramfs Cramfs: fix abad comparison when wrap-arounds occur 2018-11-13 11:08:55 -08:00
crypto fscrypt: clean up some BUG_ON()s in block encryption/decryption 2019-07-26 09:14:02 +02:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-03-23 20:09:59 +01:00
dlm dlm: fix missing idr_destroy for recover_idr 2019-12-13 08:51:20 +01:00
ecryptfs ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either 2019-11-20 18:45:18 +01:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs exofs_mount(): fix leaks on failure exits 2019-12-05 09:20:32 +01:00
exportfs exportfs_decode_fh(): negative pinned may become positive without the parent locked 2019-12-13 08:51:02 +01:00
ext2 ext2: Fix underflow in ext2_max_size() 2019-03-23 20:10:03 +01:00
ext4 ext4: add more paranoia checking in ext4_expand_extra_isize handling 2019-12-05 09:21:32 +01:00
f2fs f2fs: fix to dirty inode synchronously 2019-12-05 09:20:51 +01:00
fat fat: work around race with userspace's read via blockdev while mounting 2019-10-07 18:57:14 +02:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-12-17 09:24:40 +01:00
fuse fuse: use READ_ONCE on congestion_threshold and max_background 2019-11-20 18:47:53 +01:00
gfs2 gfs2: take jdata unstuff into account in do_grow 2019-12-05 09:20:36 +01:00
hfs fs/hfs/extent.c: fix array out of bounds read of array extent 2019-12-01 09:17:10 +01:00
hfsplus hfsplus: update timestamps on truncate() 2019-12-01 09:17:09 +01:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
isofs isofs: reject hardware sector size > 2048 bytes 2018-08-21 11:37:41 +02:00
jbd2 jbd2: introduce jbd2_inode dirty range scoping 2019-07-28 08:29:29 +02:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
jfs Just one jfs patch for 4.19 2018-08-15 22:47:23 -07:00
kernfs kernfs: Fix range checks in kernfs_get_target_path 2019-11-20 18:46:47 +01:00
lockd Revert "lockd: Show pid of lockd for remote locks" 2019-06-09 09:17:22 +02:00
minix
nfs NFSv4.x: fix lock recovery during delegation recall 2019-11-24 08:20:24 +01:00
nfs_common
nfsd nfsd: Fix overflow causing non-working mounts on 1 TB machines 2019-07-10 09:53:47 +02:00
nilfs2 nilfs2: convert to SPDX license tags 2018-09-04 16:45:02 -07:00
nls
notify memcg, fsnotify: no oom-kill for remote memcg charging 2019-07-31 07:27:08 +02:00
ntfs ntfs: mft: remove VLA usage 2018-08-17 16:20:27 -07:00
ocfs2 ocfs2: clear journal dirty flag after shutdown journal 2019-12-05 09:20:56 +01:00
omfs
openpromfs
orangefs orangefs: rate limit the client not running info message 2019-11-24 08:20:57 +01:00
overlayfs ovl: filter of trusted xattr results in audit 2019-10-05 13:10:08 +02:00
proc proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted() 2019-11-24 08:20:45 +01:00
pstore pstore: fs superblock limits 2019-10-07 18:56:59 +02:00
qnx4
qnx6
quota quota: fix a problem about transfer quota 2019-07-14 08:11:15 +02:00
ramfs
reiserfs reiserfs: propagate errors from fill_with_dentries() properly 2018-11-27 16:12:59 +01:00
romfs
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-12-17 09:24:30 +01:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: Fix default compression selection in ubifs 2019-12-05 09:20:09 +01:00
udf udf: Fix crash during mount 2019-11-20 18:46:04 +01:00
ufs ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour 2019-05-25 18:23:46 +02:00
xfs xfs: extent shifting doesn't fully invalidate page cache 2019-12-13 08:51:29 +01:00
aio.c Fix aio_poll() races 2019-05-02 09:58:59 +02:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c
binfmt_elf.c binfmt_elf: Do not move brk for INTERP-less ET_EXEC 2019-10-05 13:10:06 +02:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make load_flat_shared_library() work 2019-07-03 13:14:44 +02:00
binfmt_misc.c
binfmt_script.c exec: load_script: Do not exec truncated interpreter path 2019-11-06 13:05:37 +01:00
block_dev.c block: fix the return errno for direct IO 2019-04-17 08:38:52 +02:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-04-05 22:33:00 +02:00
char_dev.c chardev: add additional check for minor range overlap 2019-05-31 06:46:27 -07:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c media: dvb: fix compat ioctl translation 2019-11-20 18:46:33 +01:00
coredump.c
d_path.c
dax.c dax: dax_layout_busy_page() should not unmap cow pages 2019-08-16 10:12:52 +02:00
dcache.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
dcookies.c
direct-io.c direct-io: allow direct writes to empty inodes 2019-03-05 17:58:50 +01:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-03-13 14:02:32 -07:00
eventfd.c
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-02-12 19:47:19 +01:00
exec.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:30:56 +02:00
fcntl.c signal: Don't send signals to tasks that don't exist 2018-08-15 23:03:20 -05:00
fhandle.c
file.c fs/file.c: initialize init_files.resize_wait 2019-04-05 22:32:59 +02:00
file_table.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
filesystems.c
fs-writeback.c cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead 2019-11-12 19:21:20 +01:00
fs_pin.c
fs_struct.c
inode.c Abort file_remove_privs() for non-reg. files 2019-06-22 08:15:21 +02:00
internal.h acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
ioctl.c vfs: fix FIGETBSZ ioctl on an overlayfs file 2018-11-21 09:19:14 +01:00
iomap.c iomap: readpages doesn't zero page tail beyond EOF 2019-12-13 08:51:32 +01:00
Kconfig
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c Fix the locking in dcache_readdir() and friends 2019-10-17 13:45:35 -07:00
locks.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
Makefile
mbcache.c
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c Revert "vfs: Allow userns root to call mknod on owned filesystems." 2018-12-29 13:37:54 +01:00
namespace.c mnt: fix __detach_mounts infinite loop 2018-11-21 09:19:22 +01:00
no-block.c
nsfs.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
open.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:27:11 +02:00
pipe.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: avoid problematic remapping requests into partial EOF block 2019-12-01 09:17:04 +01:00
readdir.c
select.c
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-25 10:47:43 +02:00
signalfd.c
splice.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
stack.c
stat.c
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-11 18:21:39 +02:00
super.c Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
sync.c
timerfd.c Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:56:23 -07:00
userfaultfd.c userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx 2019-08-29 08:28:52 +02:00
utimes.c
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00