comment
parent
b8845055a7
commit
e5bcbe3f6b
1 changed files with 56 additions and 0 deletions
@@ -0,0 +1,56 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2025-06-03T16:37:31Z"
content="""
A config setting may be unnecessary. If git-annex tried to use
`copy_file_range` itself, that would fail with EOPNOTSUPP or EXDEV
when not supported. Then git-annex could use `cp --reflink=always`
as a fallback.
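
A rough sketch of that try-then-fall-back logic, in Python rather than
git-annex's Haskell. The function names and the injectable `fast` and
`fallback` parameters are made up for illustration, and treating ENOSYS as
"not supported" is an added assumption:

```python
import errno
import os
import subprocess

# Errors taken to mean "copy_file_range cannot be used for this copy".
# ENOSYS is an added assumption, for kernels without the syscall.
UNSUPPORTED = {errno.EOPNOTSUPP, errno.EXDEV, errno.ENOSYS}


def range_copy(src, dst):
    # Copy all of src to dst with copy_file_range (Linux, Python 3.8+).
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        remaining = os.fstat(fin.fileno()).st_size
        while remaining > 0:
            n = os.copy_file_range(fin.fileno(), fout.fileno(), remaining)
            if n == 0:  # source turned out shorter than expected
                break
            remaining -= n


def reflink_copy(src, dst):
    # The fallback named above; it fails unless the filesystem can reflink.
    subprocess.run(["cp", "--reflink=always", "--", src, dst], check=True)


def copy_file(src, dst, fast=range_copy, fallback=reflink_copy):
    # Hypothetical helper: returns which method ended up doing the copy.
    try:
        fast(src, dst)
        return "copy_file_range"
    except OSError as e:
        if e.errno not in UNSUPPORTED:
            raise  # a real I/O error, not a lack of support
    fallback(src, dst)
    return "fallback"
```

Any error other than the "not supported" ones is re-raised rather than
hidden behind the fallback.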

However, `copy_file_range` is not necessarily inexpensive. Depending on the
filesystem it can still need to read and write the whole file. And, rather
than a single syscall copying the whole file, git-annex would need to call
it repeatedly in chunks in order to display a progress bar. But, making a
lot of syscalls against an NFS filesystem would be its own overhead.

So there seems to be a tradeoff between progress display and efficiency on
NFS. And if the goal is to maximize speed for NFS with server-side copy,
maybe progress bars are not important enough to have in that case?

Also, it seems likely to me that you would want to turn off annex.verify
along with using `copy_file_range`, and annex.verify is already a manual
config setting. So a second config setting would be no big deal.

----

As to other filesystems, I found this comment with an overview as of 2022:
<https://github.com/openzfs/zfs/discussions/4237#discussioncomment-3579635>

On btrfs, `copy_file_range` does reflinking, so there is no benefit to
using it over what git-annex does now.

Testing on ext4, `cp --reflink=auto` used `copy_file_range` in a copy on
the same filesystem (it tried it cross-filesystem, but it failed and had to
fall back to a regular copy). So does `cp` with no options. On an SSD,
with big enough files (4 GB or so), I did see noticeable performance
improvements.

If git-annex did `copy_file_range` in chunks on ext4, it could read each
chunk after it was written to the destination file, and get it from the
page cache. But that would still copy the content of the file into user
space. So the savings from using `copy_file_range` with annex.verify set
on ext4 seem like they would only be in avoiding the userspace to kernel
transfer, with the kernel to userspace transfer still needed.
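
To make the chunked approach concrete, here is a minimal Python sketch (not
git-annex code). The function name, the 32 MiB chunk size, the choice of
SHA-256 and the injectable `copy_range` parameter are all assumptions made
for illustration:

```python
import hashlib
import os


def copy_and_hash(src, dst, chunk=32 * 1024 * 1024, progress=None,
                  copy_range=None):
    # copy_range is injectable (hypothetical parameter) so the loop can be
    # tried where the syscall is unavailable; default is os.copy_file_range.
    if copy_range is None:
        copy_range = os.copy_file_range
    h = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "w+b") as fout:
        total = os.fstat(fin.fileno()).st_size
        done = 0
        while done < total:
            # One syscall per chunk; the data stays inside the kernel.
            n = copy_range(fin.fileno(), fout.fileno(),
                           min(chunk, total - done))
            if n == 0:
                break
            # Read the chunk back from the destination to hash it. This is
            # the kernel to userspace transfer that is still needed.
            got = 0
            while got < n:
                data = os.pread(fout.fileno(), n - got, done + got)
                if not data:
                    raise OSError("short read from destination")
                h.update(data)
                got += len(data)
            done += n
            if progress is not None:
                progress(done, total)
    return h.hexdigest()
```

The chunk size sets how often the progress callback fires, and also how many
syscalls are made, which is the cost that matters on NFS.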

The linked comment also notes that, on NFS, `copy_file_range` can do a CoW
copy when the underlying filesystem supports it. So with NFS on btrfs or
zfs, a single `copy_file_range` call could result in no more work than a
reflink, which is optimally efficient. If git-annex did `copy_file_range`
on each chunk in order to display a progress bar, that would be a lot of
syscalls in flight over the network, so noticeably slower.

All of this is making me lean toward a config setting that enables
`copy_file_range`, without progress bars, and that is intended to be
used with annex.verify disabled in order to get optimal performance.
"""]]