From f2c13f73c9cc89148f733ce82707057f811bc5c2 Mon Sep 17 00:00:00 2001 From: Ilya_Shlyakhter Date: Mon, 27 Jan 2020 20:28:46 +0000 Subject: [PATCH 1/5] added todo request: setpresentkey option to record chunked state --- doc/todo/setpresentkey_option_to_record_chunked_state.mdwn | 1 + 1 file changed, 1 insertion(+) create mode 100644 doc/todo/setpresentkey_option_to_record_chunked_state.mdwn diff --git a/doc/todo/setpresentkey_option_to_record_chunked_state.mdwn b/doc/todo/setpresentkey_option_to_record_chunked_state.mdwn new file mode 100644 index 0000000000..1a4860c99e --- /dev/null +++ b/doc/todo/setpresentkey_option_to_record_chunked_state.mdwn @@ -0,0 +1 @@ +In [[git-annex-setpresentkey]], could an option be added to record in [[`aaa/bbb/*.log.cnk`|internals]] that the key contents is present in chunked state, with a given number of chunks of a given size? From 5f7be428580eaf3ff54ab2bd13829c94034ff202 Mon Sep 17 00:00:00 2001 From: "https://christian.amsuess.com/chrysn" Date: Tue, 28 Jan 2020 14:03:18 +0000 Subject: [PATCH 2/5] Added a comment: Use of the RAM disk --- ...nt_1_28a5627604d2d4b25c51779a7216931d._comment | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_1_28a5627604d2d4b25c51779a7216931d._comment diff --git a/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_1_28a5627604d2d4b25c51779a7216931d._comment b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_1_28a5627604d2d4b25c51779a7216931d._comment new file mode 100644 index 0000000000..0e044d5008 --- /dev/null +++ b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_1_28a5627604d2d4b25c51779a7216931d._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="https://christian.amsuess.com/chrysn" + nickname="chrysn" + avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc" + subject="Use of the RAM disk" + date="2020-01-28T14:03:15Z" + content=""" 
+What benefit would that give? + +When the transfer is complete, the file will be moved over to `.git/annex/objects`. On the same file system, that's a simple operation; across file systems, that's effectively a copy. + +In both cases, the file gets written to disk once. In the original case, it's up to the operating system when to start writing the data to disk (that is, unless the file is flushed by git-annex, which I don't have reason to assume it does). With a RAM disk in between, the file would be copied only when it's transferred completely (and then needs to be moved once more to not show up as an incomplete file at its final location). With the original setup, if the operating system has RAM to spare, it can do roughly that already (not start writing until the file is closed). When it's under pressure, it will flush the file out as soon as possible. + +Is there any performance issue you see that'd be solved using the RAM disk? If so, that might be indicative of something git-annex can do without starting to mount around (e.g. remove any syncs / flushes that sneaked into the tempfile saving process, or use fallocate to tell the OS of the size to come). 
+"""]] From 486a0b236d99f73a23f3ced2bd75980ff213c278 Mon Sep 17 00:00:00 2001 From: Ilya_Shlyakhter Date: Tue, 28 Jan 2020 17:23:32 +0000 Subject: [PATCH 3/5] Added a comment: use of RAM disk --- .../comment_2_1df752ac3b9cb2cc0e4a7dd4af71897f._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_2_1df752ac3b9cb2cc0e4a7dd4af71897f._comment diff --git a/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_2_1df752ac3b9cb2cc0e4a7dd4af71897f._comment b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_2_1df752ac3b9cb2cc0e4a7dd4af71897f._comment new file mode 100644 index 0000000000..5840e4f35d --- /dev/null +++ b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_2_1df752ac3b9cb2cc0e4a7dd4af71897f._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Ilya_Shlyakhter" + avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0" + subject="use of RAM disk" + date="2020-01-28T17:23:28Z" + content=""" +You're right in general. There may be cases though, where a temp file doesn't just get moved into [[`.git/annex/objects`|internals]]: e.g. when [[chunking]] is used along with parallel downloads, chunks might go into separate temp files before being merged. I was also thinking of use cases from [[todo/let_external_remotes_declare_support_for_named_pipes]], like [[todo/git-annex-cat]], where key contents is processed but not saved. 
+"""]] From 8789697805e27900cd4571584802fbf26cfb8b56 Mon Sep 17 00:00:00 2001 From: Ilya_Shlyakhter Date: Tue, 28 Jan 2020 18:20:36 +0000 Subject: [PATCH 4/5] Added a comment: chunks and checksums --- ...omment_5_561b9bb28c5d375334ce915da75d5ce6._comment | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 doc/todo/key_checksum_from_chunk_checksums/comment_5_561b9bb28c5d375334ce915da75d5ce6._comment diff --git a/doc/todo/key_checksum_from_chunk_checksums/comment_5_561b9bb28c5d375334ce915da75d5ce6._comment b/doc/todo/key_checksum_from_chunk_checksums/comment_5_561b9bb28c5d375334ce915da75d5ce6._comment new file mode 100644 index 0000000000..afd7e7e4ef --- /dev/null +++ b/doc/todo/key_checksum_from_chunk_checksums/comment_5_561b9bb28c5d375334ce915da75d5ce6._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="Ilya_Shlyakhter" + avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0" + subject="chunks and checksums" + date="2020-01-28T18:20:34Z" + content=""" +\"verify with checksums parts of the file and re-download only those parts/chunks, that are bad.\" -- if I understand correctly, git-annex doesn't checksum [[chunks|chunking]], but can tell incompletely downloaded chunks based on size. + +My original use case (registering the presence of a chunked file in a remote without downloading it) might be implementable with [[todo/setpresentkey_option_to_record_chunked_state/]]. The checksums of the chunks would not be used though. 
+ +"""]] From d2bc23487af5d9f29826299e300abe2c5c63a4a8 Mon Sep 17 00:00:00 2001 From: "https://christian.amsuess.com/chrysn" Date: Wed, 29 Jan 2020 07:49:12 +0000 Subject: [PATCH 5/5] Added a comment: Re: use of RAM disk --- ...nt_3_12a1b6f9fd616f5c498d5aff1cf1bcb6._comment | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_3_12a1b6f9fd616f5c498d5aff1cf1bcb6._comment diff --git a/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_3_12a1b6f9fd616f5c498d5aff1cf1bcb6._comment b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_3_12a1b6f9fd616f5c498d5aff1cf1bcb6._comment new file mode 100644 index 0000000000..a7c70bfbb7 --- /dev/null +++ b/doc/todo/option_to_put_temp_files_on_a_RAM_disk/comment_3_12a1b6f9fd616f5c498d5aff1cf1bcb6._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="https://christian.amsuess.com/chrysn" + nickname="chrysn" + avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc" + subject="Re: use of RAM disk" + date="2020-01-29T07:49:08Z" + content=""" +The chunks case should fold into the original one if git-annex merges the chunks using [ioctl_ficlonerange](https://manpages.debian.org/buster/manpages-dev/ioctl_ficlonerange.2.en.html), but admittedly that is a) not portable (but neither is mounting a RAM-disk) and b) will only work on some file systems. + +I don't understand the applications in named pipes well enough to comment there (will have to read up a bit). 
+ +But more generally, my gut feeling is that if all is properly advertised (possibly by an fcntl, but [RWH_WRITE_LIFE_SHORT](https://manpages.debian.org/buster/manpages-dev/fcntl.2.en.html) doesn't quite seem to be it) and no fsyncs are sent (like [eatmydata](https://www.flamingspork.com/projects/libeatmydata/) does), any file should behave like that until a file system action is performed that forces it to be committed to disk -- or the kernel decides that it'd better use that RAM for something else, but that's what it can probably do best. + +I'm not sure the approach of screening (and possibly patching) data producers to not fsync (on some systems, closing might be an issue too, and that's where it gets more complex) is better than putting things on a RAM disk; I just think it's an alternative worth exploring. +"""]]