linux-uconsole/fs
Eric Dumazet 8d2228dd95 tcp: allow splice() to build full TSO packets
[ This combines upstream commit
  2f53384424 and the follow-on bug fix
  commit 35f9c09fe9 ]

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-27 09:51:18 -07:00
..
9p fs/9p: Use protocol-defined value for lock/getlock 'type' field. 2011-10-03 11:40:22 -07:00
adfs Fix common misspellings 2011-03-31 11:26:23 -03:00
affs affs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:53 -04:00
afs afs: Remote abort can cause BUG in rxrpc code 2012-03-23 11:20:51 -07:00
autofs4 autofs: work around unhappy compat problem on x86-64 2012-03-12 10:32:38 -07:00
befs befs: Validate length of long symbolic links. 2011-08-29 13:29:06 -07:00
bfs bfs: remove unnecessary dentry_unhash on dir rename 2011-05-28 01:02:50 -04:00
btrfs btrfs: btrfs_root_readonly() broken on big-endian 2012-04-27 09:51:17 -07:00
cachefiles Fix common misspellings 2011-03-31 11:26:23 -03:00
ceph ceph analog of cifs build_path_from_dentry() race fix 2011-07-16 23:43:58 -04:00
cifs cifs: fix issue mounting of DFS ROOT when redirecting from one domain controller to the next 2012-04-02 09:27:17 -07:00
coda coda_ioctl_permission() is safe in RCU mode 2011-06-20 10:44:19 -04:00
configfs configfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
cramfs cramfs: get_cramfs_inode() returns ERR_PTR() on failure 2011-07-17 23:22:02 -04:00
debugfs debugfs: move to new strtobool 2011-05-19 16:55:28 +09:30
devpts fs/devpts/inode.c: correctly check d_alloc_name() return code in devpts_pty_new() 2011-03-22 17:44:17 -07:00
dlm Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-05-26 13:19:00 -07:00
ecryptfs eCryptfs: Copy up lower inode attrs after setting lower xattr 2012-02-29 16:33:41 -08:00
efs
exofs fix exofs ->get_parent() 2011-07-17 23:20:29 -04:00
exportfs
ext2 ext2: remove unnecessary dentry_unhash on rmdir/rename_dir 2011-05-26 07:26:56 -04:00
ext3 ext3: Don't warn from writepage when readonly inode is spotted after error 2012-01-12 11:35:06 -08:00
ext4 ext4: fix endianness breakage in ext4_split_extent_at() 2012-04-27 09:51:09 -07:00
fat fat: Fix corrupt inode flags when remove ATTR_SYS flag 2011-05-31 19:42:24 +09:00
freevxfs treewide: fix a few typos in comments 2011-05-10 10:16:21 +02:00
fscache FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop 2011-07-21 10:59:16 -07:00
fuse fuse: fix fuse_retrieve 2011-12-21 12:57:44 -08:00
gfs2 GFS2: Fix mount hang caused by certain access pattern to sysfs files 2011-08-04 21:58:42 -07:00
hfs hfs: fix hfs_find_init() sb->ext_tree NULL ptr oops 2011-12-21 12:57:41 -08:00
hfsplus hfsplus: Fix kfree of wrong pointers in hfsplus_fill_super() error path 2011-10-25 07:10:17 +02:00
hostfs hostfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:52 -04:00
hpfs hpfs: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
hppfs hppfs: missing include 2011-11-11 09:35:59 -08:00
hugetlbfs hugetlbfs: avoid taking i_mutex from hugetlbfs_read() 2012-04-02 09:27:11 -07:00
isofs isofs: fix bh leak in isofs_fill_super() error case 2011-06-18 07:25:42 -07:00
jbd jbd/jbd2: validate sb->s_first in journal_get_superblock() 2011-12-21 12:57:40 -08:00
jbd2 jbd2: use GFP_NOFS for blkdev_issue_flush 2012-04-27 09:51:07 -07:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-28 13:03:41 -07:00
jfs jfs: agstart field must be 64 bits 2011-06-20 17:53:24 -05:00
lockd lockd: fix the endianness bug 2012-04-27 09:51:18 -07:00
logfs logfs doesn't need ->permission() at all 2011-06-20 10:44:26 -04:00
minix minix: remove unnecessary dentry_unhash on rmdir, dir rename 2011-05-28 01:02:54 -04:00
ncpfs ncpfs: fix rename over directory with dangling references 2011-05-28 01:02:53 -04:00
nfs NFSv4: Return the delegation if the server returns NFS4ERR_OPENMODE 2012-04-02 09:27:13 -07:00
nfs_common Fix common misspellings 2011-03-31 11:26:23 -03:00
nfsd nfsd: fix compose_entry_fh() failure exits 2012-04-27 09:51:17 -07:00
nilfs2 nilfs2: fix NULL pointer dereference in nilfs_load_super_block() 2012-03-23 11:20:50 -07:00
nls
notify fsnotify: don't BUG in fsnotify_destroy_mark() 2012-01-25 17:24:49 -08:00
ntfs Fix common misspellings 2011-03-31 11:26:23 -03:00
ocfs2 ocfs2: ->e_leaf_clusters endianness breakage 2012-04-27 09:51:18 -07:00
omfs Remove unneeded version.h includes from fs/ 2011-06-24 08:34:22 -07:00
openpromfs
partitions block: Fix NULL pointer dereference in sd_revalidate_disk 2012-03-19 08:57:58 -07:00
proc proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate(). 2012-04-02 09:27:18 -07:00
pstore pstore: fix pstore filesystem mount/remount issue 2011-05-16 11:05:00 -07:00
qnx4
quota VFS: Fix the remaining automounter semantics regressions 2011-11-11 09:36:22 -08:00
ramfs ramfs: fix memleak on no-mmu arch 2011-04-14 16:06:56 -07:00
reiserfs reiserfs: Force inode evictions before umount to avoid crash 2012-01-12 11:35:05 -08:00
romfs romfs: fix romfs_get_unmapped_area() argument check 2011-06-27 18:00:12 -07:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2011-05-29 11:19:45 -07:00
sysfs sysfs: Fix memory leak in sysfs_sd_setsecdata(). 2012-04-02 09:26:53 -07:00
sysv sysv: remove unnecessary dentry_unhash from rmdir, dir rename 2011-05-28 01:02:50 -04:00
ubifs UBIFS: make debugging messages light again 2012-01-25 17:25:06 -08:00
udf udf: Fix deadlock in udf_release_file() 2012-04-02 09:27:19 -07:00
ufs ufs should use d_splice_alias() 2011-07-17 23:21:35 -04:00
xfs xfs: Fix oops on IO error during xlog_recover_process_iunlinks() 2012-04-02 09:27:20 -07:00
aio.c aio: fix the "too late munmap()" race 2012-03-19 08:57:43 -07:00
anon_inodes.c
attr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00
bad_inode.c bad_inode_permission() is safe from RCU mode 2011-06-20 10:44:00 -04:00
binfmt_aout.c
binfmt_elf.c regset: Prevent null pointer reference on readonly regsets 2012-03-12 10:32:41 -07:00
binfmt_elf_fdpic.c FDPIC: Fix memory leak 2011-07-06 12:15:16 -07:00
binfmt_em86.c
binfmt_flat.c CRED: Fix load_flat_shared_library() to initialise bprm correctly 2011-05-03 10:10:51 +10:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Require subsystems to explicitly allocate bio_set integrity mempool 2011-03-17 11:11:05 +01:00
bio.c block: improve the bio_add_page() and bio_add_pc_page() descriptions 2011-05-28 14:44:46 +02:00
block_dev.c block: Fix NULL pointer dereference in sd_revalidate_disk 2012-03-19 08:57:58 -07:00
buffer.c vfs: Fix data corruption after failed write in __block_write_begin() 2011-06-16 11:44:46 -04:00
char_dev.c
compat.c exec: unify do_execve/compat_do_execve code 2011-04-09 15:53:56 +02:00
compat_binfmt_elf.c
compat_ioctl.c
dcache.c vfs: fix d_ancestor() case in d_materialize_unique 2012-04-02 09:27:19 -07:00
dcookies.c oprofile, dcookies: Fix possible circular locking dependency 2011-05-31 16:33:35 +02:00
direct-io.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
drop_caches.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
eventfd.c
eventpoll.c Don't limit non-nested epoll paths 2012-04-27 09:51:09 -07:00
exec.c exec: do not call request_module() twice from search_binary_handler() 2011-10-16 14:14:54 -07:00
fcntl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
fhandle.c fs/fhandle.c: add <linux/personality.h> for ia64 2011-04-14 16:06:56 -07:00
fifo.c Filesystem: fifo: Fixed coding style issue. 2011-03-21 00:16:09 -04:00
file.c vfs: avoid large kmalloc()s for the fdtable 2011-04-28 11:28:20 -07:00
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-16 13:26:17 -07:00
filesystems.c fs: synchronize_rcu when unregister_filesystem success not failure 2011-04-17 10:42:01 -07:00
fs-writeback.c writeback: update dirtied_when for synced inode to prevent livelock 2011-10-03 11:40:44 -07:00
fs_struct.c
generic_acl.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
inode.c mm: fix assertion mapping->nrpages == 0 in end_writeback() 2011-06-27 18:00:13 -07:00
internal.h fs: move i_wb_list out from under inode_lock 2011-03-24 21:17:51 -04:00
ioctl.c vfs: cleanup do_vfs_ioctl() 2011-03-21 00:16:08 -04:00
ioprio.c
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-05-26 09:52:14 -07:00
Kconfig.binfmt
libfs.c fs/libfs.c: fix simple_attr_write() on 32bit machines 2011-07-19 22:09:30 -07:00
locks.c fs: fix lock initialization 2011-07-06 10:41:13 -07:00
Makefile Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2011-03-16 19:01:29 -07:00
mbcache.c vmscan: change shrinker API by passing shrink_control struct 2011-05-25 08:39:26 -07:00
mpage.c mm/fs: add hooks to support cleancache 2011-05-26 10:01:43 -06:00
namei.c vfs: fix double put after complete_walk() 2012-03-19 08:57:44 -07:00
namespace.c fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API 2011-12-21 12:57:36 -08:00
nfsctl.c
no-block.c
open.c fs: Use BUG_ON(!mnt) at dentry_open(). 2011-03-21 01:10:41 -04:00
pipe.c
pnode.c
pnode.h
posix_acl.c
read_write.c
read_write.h
readdir.c
select.c select: remove unused MAX_SELECT_SECONDS 2011-03-21 00:16:08 -04:00
seq_file.c fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API 2011-12-21 12:57:36 -08:00
signalfd.c epoll: ep_unregister_pollwait() can use the freed pwq->whead 2012-02-29 16:34:35 -08:00
splice.c tcp: allow splice() to build full TSO packets 2012-04-27 09:51:18 -07:00
stack.c
stat.c readlinkat: ensure we return ENOENT for the empty pathname for normal lookups 2011-11-11 09:36:19 -08:00
statfs.c VFS: fix statfs() automounter semantics regression 2011-11-11 09:37:08 -08:00
super.c more conservative S_NOSEC handling 2011-06-03 18:24:58 -04:00
sync.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
timerfd.c timerfd: Fix wakeup of processes when timer is cancelled on clock change 2011-06-14 11:46:14 +02:00
utimes.c userns: rename is_owner_or_cap to inode_owner_or_capable 2011-03-23 19:47:13 -07:00
xattr.c Cache xattr security drop check for write v2 2011-05-28 12:02:09 -04:00
xattr_acl.c