Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-01-29 10:18:30 -04:00
commit 276540cc31
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 50 additions and 0 deletions


@ -0,0 +1,11 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="chunks and checksums"
date="2020-01-28T18:20:34Z"
content="""
\"verify with checksums parts of the file and re-download only those parts/chunks, that are bad.\" -- if I understand correctly, git-annex doesn't checksum [[chunks|chunking]], but can tell incompletely downloaded chunks based on size.
My original use case (registering the presence of a chunked file in a remote without downloading it) might be implementable with [[todo/setpresentkey_option_to_record_chunked_state/]]. The checksums of the chunks would not be used though.
"""]]


@ -0,0 +1,15 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Use of the RAM disk"
date="2020-01-28T14:03:15Z"
content="""
What benefit would that give?
When the transfer is complete, the file will be moved over to `.git/annex/objects`. On the same file system, that's a simple operation; across file systems, that's effectively a copy.
In both cases, the file gets written to disk once. In the original case, it's up to the operating system when to start writing the data to disk (that is, unless the file is flushed by git-annex, which I have no reason to assume it does). With a RAM disk in between, the file would be copied only once it has been transferred completely (and then needs to be moved once more so that it doesn't show up as an incomplete file at its final location). With the original setup, if the operating system has RAM to spare, it can do roughly that already (not start writing until the file is closed). When it's under memory pressure, it will flush the file out as soon as possible.
Is there any performance issue you see that'd be solved by using the RAM disk? If so, that might be indicative of something git-annex can do without starting to mount things around (e.g. remove any syncs / flushes that sneaked into the tempfile saving process, or use fallocate to tell the OS the size to come).
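
A rough sketch of that last idea, in Python rather than anything git-annex actually does: preallocate the temp file to its known final size, write without any fsync, and rename it into place. The helper name `write_preallocated` and its parameters are made up for illustration, and `os.posix_fallocate` is only available on Unix-like systems.

```python
import os
import tempfile


def write_preallocated(tmp_path, final_path, chunks, total_size):
    # Hypothetical helper for illustration; not git-annex code.
    with open(tmp_path, 'wb') as f:
        if total_size > 0 and hasattr(os, 'posix_fallocate'):
            try:
                # Tell the OS the size to come, so it can reserve the space.
                os.posix_fallocate(f.fileno(), 0, total_size)
            except OSError:
                pass  # treated as a hint only; carry on without it
        for chunk in chunks:
            f.write(chunk)
        # Deliberately no os.fsync() here: when to write the data back
        # to disk is left to the operating system.
    # On the same file system this is a plain rename, not a copy.
    os.replace(tmp_path, final_path)
    return os.path.getsize(final_path)
```

Whether this helps at all would have to be measured; it only shows where the fallocate call and the missing fsync would sit.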
"""]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="use of RAM disk"
date="2020-01-28T17:23:28Z"
content="""
You're right in general. There may be cases, though, where a temp file doesn't just get moved into [[`.git/annex/objects`|internals]]: e.g. when [[chunking]] is used along with parallel downloads, chunks might go into separate temp files before being merged. I was also thinking of use cases from [[todo/let_external_remotes_declare_support_for_named_pipes]], like [[todo/git-annex-cat]], where key contents are processed but not saved.
"""]]


@ -0,0 +1,15 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Re: use of RAM disk"
date="2020-01-29T07:49:08Z"
content="""
The chunks case should fold into the original one if git-annex merges the chunks using [ioctl_ficlonerange](https://manpages.debian.org/buster/manpages-dev/ioctl_ficlonerange.2.en.html), but admittedly that a) is not portable (but neither is mounting a RAM disk) and b) will only work on some file systems.
I don't understand the applications in named pipes well enough to comment there (will have to read up a bit).
But more generally, my gut feeling is that if everything is properly advertised (possibly by an fcntl, but [RWH_WRITE_LIFE_SHORT](https://manpages.debian.org/buster/manpages-dev/fcntl.2.en.html) doesn't quite seem to be it) and no fsyncs are sent (like [eatmydata](https://www.flamingspork.com/projects/libeatmydata/) does), any file should behave like that until a file system action is performed that forces it to be committed to disk -- or the kernel decides that it'd better use that RAM for something else, but that's what it can probably do best.
I'm not sure the approach of screening (and possibly patching) data producers to not fsync (on some systems, closing might be an issue too, and that's where it gets more complex) is better than putting things on a RAM disk; I just think it's an alternative worth exploring.
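
To make the chunk-merging idea from the first paragraph a bit more concrete, here is a minimal Python sketch (not git-annex code; `merge_chunks` is a made-up name). It tries FICLONERANGE for each chunk and falls back to an ordinary copy when the ioctl is refused, for example on file systems without reflink support or when offsets are not block-aligned. The ioctl number is the Linux value on x86 and ARM and is an assumption elsewhere.

```python
import fcntl
import os
import struct

# _IOW(0x94, 13, struct file_clone_range); Linux on x86/ARM (assumption elsewhere)
FICLONERANGE = 0x4020940D


def merge_chunks(chunk_paths, dest_path):
    # Hypothetical helper: concatenate chunk files into dest_path.
    offset = 0
    with open(dest_path, 'wb') as dest:
        for path in chunk_paths:
            size = os.path.getsize(path)
            with open(path, 'rb') as src:
                # struct file_clone_range:
                #   s64 src_fd, u64 src_offset, u64 src_length, u64 dest_offset
                arg = struct.pack('qQQQ', src.fileno(), 0, size, offset)
                try:
                    # Share the chunk's extents instead of copying its data.
                    fcntl.ioctl(dest.fileno(), FICLONERANGE, arg)
                except OSError:
                    # Not supported or not aligned: fall back to a real copy.
                    os.pwrite(dest.fileno(), src.read(), offset)
            offset += size
    return offset
```

On most setups the fallback path is what will actually run, which is point b) above.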
"""]]


@ -0,0 +1 @@
In [[git-annex-setpresentkey]], could an option be added to record in [[`aaa/bbb/*.log.cnk`|internals]] that the key contents are present in chunked state, with a given number of chunks of a given size?