From 5c4ce1353e59086c91fc17c74c43b891092d7707 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 1 Dec 2023 14:42:55 -0400
Subject: [PATCH] comment

---
 ..._93b85fbe5c36e986cf7c1fc87070c04c._comment | 10 ++++
 ..._22ff867952875856b20339a8829c5944._comment | 22 +++++++++
 ..._3323eff3d94d366595bf2b7e78c01dce._comment |  7 +++
 doc/todo/distributed_migration.mdwn           | 47 +++++++++++++++++++
 4 files changed, 86 insertions(+)
 create mode 100644 doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment
 create mode 100644 doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment
 create mode 100644 doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment
 create mode 100644 doc/todo/distributed_migration.mdwn

diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment
new file mode 100644
index 0000000000..b42947b7b8
--- /dev/null
+++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2023-12-01T18:42:07Z"
+ content="""
+I've spent a while thinking about this and came up with the ideas at
+[[todo/distributed_migration]].
+
+I think that probably would handle your use case.
+"""]]
diff --git a/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment b/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment
new file mode 100644
index 0000000000..47ce3cdfbb
--- /dev/null
+++ b/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: simpler proposal"""
+ date="2023-12-01T18:00:20Z"
+ content="""
+About the idea of recording a checksum of the content of a URL or WORM key,
+without migrating to a SHA key, that does seem worth considering. (And
+maybe was the original idea of this todo really..)
+
+If that were implemented, it would be necessary for more than one checksum
+to be able to be recorded for a given URL key. Because different
+clones might get different content from the URL and each add its checksum.
+
+So, this would not be as strong an assurance as using a SHA key that you're
+referring to a specific piece of data. It would be useful to protect
+against bit rot, but not as a way to pin a file to a particular version.
+Which is often something one does want to do in a git repository!
+
+I do think that implementing that would be a lot simpler. And it would
+only affect performance when verifying the content of URL or WORM keys,
+when it would need to look up the checksum in the git-annex branch.
+"""]]
diff --git a/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment b/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment
new file mode 100644
index 0000000000..154fa5a8b5
--- /dev/null
+++ b/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 11"""
+ date="2023-12-01T18:41:30Z"
+ content="""
+See [[distributed_migration]]...
+"""]]
diff --git a/doc/todo/distributed_migration.mdwn b/doc/todo/distributed_migration.mdwn
new file mode 100644
index 0000000000..ad11ada8f9
--- /dev/null
+++ b/doc/todo/distributed_migration.mdwn
@@ -0,0 +1,47 @@
+Currently `git-annex migrate` only hard links the objects in the local
+repo. This leaves other clones without the new keys' objects unless
+they re-download them, or unless the same migrate command is
+re-run, in the same tree, on each clone.
+
+It would be good to support distributed migration, so that whatever
+migration is done in one repo is reflected in the other repos.
+
+This needs some way to store, in the git repo, a mapping between the old
+key and the new key it has been migrated to. (I investigated
+how much space that would need in the git repo, in
+[this comment](https://git-annex.branchable.com/todo/alternate_keys_for_same_content/#comment-917eba0b2d1637236c5d900ecb5d8da0).)
+The mapping might be communicated via the git branch but be locally stored
+in a sqlite database to make querying it fast.
+
+Once that mapping is available, one simple way to use it would be a
+git-annex command that updates the local repo to reflect migrations that
+have happened elsewhere. It would not touch the HEAD branch, but would
+just hardlink object files from the old to new key, and update the location
+log for the new key to indicate the content is present in the repo.
+This command could be something like `git-annex migrate --update`.
+
+That wouldn't be entirely sufficient though, because special remotes from
+pre-migration will be populated with the old keys. A similar command could
+upload the new content to special remotes, but that would double the data
+stored in a special remote (or drop the old keys from them),
+and use a lot of bandwidth. Probably not a good idea.
+
+Alternatively, the old key could be left on a special remote, but update
+the location log for the special remote to say it has the new key,
+and have git-annex request the old key when it wants to get (or checkpresent)
+the content from the special remote. This would need the mapping to be
+cheap enough to query that it won't significantly slow down accessing a
+special remote.
+
+Rather than a dedicated command that users need to remember to run,
+distributed migration could be done automatically when merging a git-annex
+branch that adds migration information. Just hardlink object files and
+update the location log for the local repo and for available special
+remotes.
+
+It would be possible to avoid updating the location log, but then all
+location log queries would have to check the migration mapping. It would be
+hard to make that fast enough. Consider `git-annex find --in foo`, which
+queries the location log for each file.
+
+--[[Joey]]
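
Note on the mapping lookup described in distributed_migration.mdwn: the idea of caching the old-to-new key mapping in a local sqlite database, and asking a special remote for the old key when the new one is wanted, might look roughly like the sketch below. This is only an illustration in Python, not git-annex code. The table name `migrations`, the function names, and the example key strings are all hypothetical, and it assumes one new key may have several old keys while ignoring chained migrations.

```python
import sqlite3


def open_mapping(path=":memory:"):
    """Open (or create) a local cache of old-key -> new-key migrations.

    Hypothetical schema: one row per (new_key, old_key) pair. The primary
    key starts with new_key, so lookups by new key use the index.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS migrations ("
        " new_key TEXT NOT NULL,"
        " old_key TEXT NOT NULL,"
        " PRIMARY KEY (new_key, old_key))"
    )
    return db


def record_migration(db, old_key, new_key):
    """Record that old_key was migrated to new_key (idempotent)."""
    db.execute(
        "INSERT OR IGNORE INTO migrations (new_key, old_key) VALUES (?, ?)",
        (new_key, old_key),
    )
    db.commit()


def candidate_keys(db, wanted_key):
    """Keys to try on a special remote that was populated before migration.

    The wanted key comes first, followed by any old keys recorded as
    having been migrated to it. An unmapped key is returned alone.
    """
    rows = db.execute(
        "SELECT old_key FROM migrations WHERE new_key = ? ORDER BY old_key",
        (wanted_key,),
    ).fetchall()
    return [wanted_key] + [row[0] for row in rows]
```

A caller would try each key from `candidate_keys` in turn when getting or checking content on the remote. Whether an indexed lookup like this is cheap enough for queries such as `git-annex find --in foo` is the open question the todo raises; the sketch does not answer it.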