From 0550974185cda9452dc03da91588fdf2db8952f5 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 21 May 2025 14:14:37 -0400
Subject: [PATCH] comment

---
 ..._74a9fa412d7eb07818c2c031019dd03a._comment | 30 +++++++++++++++++++
 ..._c9e1698e33432ae7cb61cd13706717b3._comment | 17 +++++++++++
 2 files changed, 47 insertions(+)
 create mode 100644 doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment
 create mode 100644 doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment

diff --git a/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment b/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment
new file mode 100644
index 0000000000..27aca24f16
--- /dev/null
+++ b/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2025-05-21T17:06:51Z"
+ content="""
+As to the ordering, at first I thought it would make sense for it to
+pick the most full repository that still has space for a file.
+
+But: Suppose that the files being processed alternate between large and
+small. The fullest tape is too full for any of the large files, but it can
+hold all the small files. The second fullest tape has plenty of room.
+In this case, it would constantly switch back and forth between the two tapes.
+
+sizebalanced picks the least full repository. That's not what we want
+either, clearly, since it alternates between repositories frequently when
+they're near the same size.
+
+The optimal solution is for git-annex to remember what repository was used
+to store the last file, and just use that repository again. Unless it's
+full, in which case it can pick any repository that still has space. And
+then it will continue to use that new repository for subsequent files.
+
+That memory would necessarily be local to a repository in front of these
+tape remotes (eg, a cluster gateway). If there were multiple repositories
+that were all writing to the same tape remotes, they would each have their
+own memory, and chaos would ensue.
+
+Needing a memory makes me a bit dubious about putting this in a preferred
+content expression. But in your specific case, I guess it would work.
+"""]]
diff --git a/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment b/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment
new file mode 100644
index 0000000000..3fdf6d9e38
--- /dev/null
+++ b/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment
@@ -0,0 +1,17 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2025-05-21T18:05:27Z"
+ content="""
+Another approach would be to configure `remote.<name>.annex-cost-command`
+with a command that gives a low cost to the tape in the drive, and a high
+cost to other tapes.
+
+But git-annex only checks the cost once at startup. It would need to check
+it again after each file. Which could be a new configuration setting. You
+would need to make the cost command efficient enough that running it once per
+file is not too slow.
+
+With this approach, the standard archive group preferred content
+would probably suffice.
+"""]]
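
The "remember the last repository" rule from comment 3 can be sketched roughly as follows. This is only an illustration in Python, not git-annex code: `Repo`, `StickyPicker`, and `place_all` are hypothetical names, sizes are plain integers, and "pick any repository that still has space" is assumed here to mean "take the first one in the list that fits".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    """Hypothetical stand-in for a tape remote with a fixed capacity."""
    name: str
    capacity: int
    used: int = 0

    def has_space_for(self, size: int) -> bool:
        return self.capacity - self.used >= size


class StickyPicker:
    """Keeps using the last repository until a file no longer fits in it."""

    def __init__(self, repos):
        self.repos = list(repos)
        self.last: Optional[Repo] = None  # the "memory" from comment 3

    def pick(self, size: int) -> Optional[Repo]:
        # Reuse the remembered repository while it still has room.
        if self.last is not None and self.last.has_space_for(size):
            chosen = self.last
        else:
            # Otherwise take any repository with space (assumed: the first).
            chosen = next((r for r in self.repos if r.has_space_for(size)), None)
        if chosen is not None:
            chosen.used += size
            self.last = chosen  # later files stick to the new choice
        return chosen


def place_all(repos, sizes):
    """Return the name of the repository chosen for each file size, in order."""
    picker = StickyPicker(repos)
    names = []
    for size in sizes:
        chosen = picker.pick(size)
        names.append(chosen.name if chosen else None)
    return names
```

With a small `tape1` and a large `tape2`, alternating large and small files move to `tape2` once and then stay there, rather than switching back to `tape1` for each small file. As the comment notes, the memory lives in one picker, so two processes with separate pickers would not coordinate.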