Merge branch 'master' of ssh://git-annex.branchable.com
commit ce9f909ee9
3 changed files with 81 additions and 0 deletions
@@ -0,0 +1,12 @@
[[!comment format=mdwn
 username="Atemu"
 avatar="http://cdn.libravatar.org/avatar/86b8c2d893dfdf2146e1bbb8ac4165fb"
 subject="comment 2"
 date="2023-12-01T10:21:09Z"
 content="""
I've had an idea on this: why not update remote UUIDs only on (manual) sync/fetch?

This would be in line with how git otherwise interacts with regular remotes, too: it always requires an explicit fetch to update its information about them.

To me it just violates the principle of least surprise to have git-annex try to reach remotes when running something as simple as `info`.
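
Roughly the workflow I'm imagining (the remote name is made up, and the offline `info` behaviour is the proposal, not what git-annex currently does):

    # sketch of the proposal, not current behaviour
    git annex sync origin   # an explicit sync/fetch would be the only point where remote UUIDs get probed
    git annex info          # would then rely purely on cached UUIDs and never touch the network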
"""]]

@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="nobodyinperson"
 avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
 subject="Another possibility to make --fast faster?"
 date="2023-12-01T11:50:25Z"
 content="""
How about having `git annex info --fast` skip this lookup step for remotes whose UUID it doesn't know yet?

`git annex info` can already be quite slow in the other steps it takes in large repos (counting files, disk space, etc.), so it is not much of a surprise that it hangs for a while by default. But it would be logical if `--fast` made it actually fast by staying completely offline (right?) and skipping the slow local counting steps.
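
Roughly what I have in mind (`--fast` already exists; the behaviour sketched for it here is the proposal, not what it currently does):

    # sketch of the proposal, not current behaviour
    git annex info          # full version: counts files and disk usage, may probe remotes with unknown UUIDs
    git annex info --fast   # proposed: skip the UUID lookup for such remotes and the slow counting, stay offline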
"""]]

@@ -0,0 +1,58 @@
[[!comment format=mdwn
 username="unqueued"
 avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d"
 subject="comment 4"
 date="2023-12-01T02:09:07Z"
 content="""
@joey

It isn't a huge problem, but I keep coming back to it. The only workflow I still use where this comes up is my file-sharing assets repo. I just ended up leaving it as MD5E, because much of it is downstream from gdrive shares, and I almost never have all of the content in one place at a time.

This is one of the scripts I sometimes use, although I wrote it a while ago, before I found out about git-annex-filter-branch:
<https://gist.github.com/unqueued/06b5a5c14daa8224a659c5610dce3132>

But I mostly rely on splitting off subset repos with no history, processing them in some way, and then re-absorbing them back into a larger repo.

I actually started a repo that would track new builds of the Microsoft Dev VMs: <https://github.com/unqueued/official-microsoft-vms-annex>

But for my bigger repos, I almost never have all of the data in the same place at the same time.

@nobodyinperson

> Hi! If I understand you correctly, your problem is that you often migrate keys to another backend, and there are situations involving merges of repos far away from each other in history that cause merge conflicts, which results in the dead old pre-migration key being reintroduced?

Well, there aren't any conflicts; the old keys just get silently reintroduced, which isn't the end of the world, especially if they get marked as dead. But they clutter the git-annex branch, and over time, with large repos, it may become a problem. There isn't any direct relationship between the previous key and the migrated key.

So, if I have my `linux_isos` repo and I run git-annex-migrate on it, but say only the isos for the year 2021 are in my specific repo at that moment, then those symlinks will be updated and the new SHA256 log files will be added to my git-annex branch.

And if you sync with another repo that also has the same files under the old backend, they will still be in the repo, just inaccessible.
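
To make that concrete, this is roughly what the migration step looks like (the path and the key below are made up, and this assumes the default backend has been switched to SHA256E):

    # migrate whatever is present locally to the configured default backend
    git config annex.backend SHA256E
    git annex migrate linux_isos/2021/
    # a superseded pre-migration key can then be marked dead, so it is at least
    # not treated as live when a later sync brings its logs back (key is made up)
    git annex dead --key MD5-s1048576--0123456789abcdef0123456789abcdef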

And I feel like there's enough information to efficiently track the lifecycle of a key.

I'm exhuming my old scripts and cleaning them up, but essentially, you can get everything you need to assemble an MD5E annex from a Google Drive share by running `rclone lsjson -R --hash rclone-drive-remote:`

And to get the keys, you could pipe its output into something like this:

`perl -MJSON::PP -e 'BEGIN { $/ = undef; $j = decode_json(<>); } foreach $e (@{$j}) { next if $e->{\"IsDir\"} || !exists $e->{\"Hashes\"}; print \"MD5-s\" . $e->{\"Size\"} . \"--\" . $e->{\"Hashes\"}->{\"MD5\"} . \"\t\" . $e->{\"Path\"} . \"\n\"; }'`
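
And if I remember the plumbing right, you can feed that straight into git-annex-fromkey to build the tree without having any of the content present. A rough bash sketch (`keys.tsv` is just wherever you saved the one-liner's output):

    # create annex pointers for keys whose content isn't here yet;
    # keys.tsv holds the one-liner's KEY<TAB>PATH lines
    while IFS=$'\t' read -r key path; do
        mkdir -p "$(dirname "$path")"
        git annex fromkey --force "$key" "$path"
    done < keys.tsv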

That's just part of a project I have with a Makefile that indexes, assembles and then optionally re-serves an rclone gdrive remote. I will try to post it later tonight. It was just a project I made for fun.

And there are plenty of other places where you can get enough info to assemble a repo ahead of time, and essentially turn it into a big queue.

You can find all sorts of interesting things to annex.

<https://old.reddit.com/r/opendirectories> sometimes has interesting stuff.

Here are some public Google Drive shares:

* [Bibliotheca Anonoma](https://drive.google.com/drive/folders/0B7WYx7u6HJh_Z3FjU2F0NFNyQWs)
* [Esoteric Library](https://drive.google.com/drive/folders/0B0UEkmH7vYJZRWxfSmdRbFJGNWc)
* [EBookDroid - Google Drive](https://drive.google.com/drive/folders/0B6y-A-HTzyBiYnpIRHMzR1pueFU)
* [The 510 Archives - Google Drive](https://drive.google.com/drive/folders/0ByCvxnHNk90SMzIxZWIwYWYtYzljNy00ZGU2LWI3ODctYzRjMmE0MGY3NTA1)
* [Some ebooks](https://drive.google.com/drive/folders/1SReXFt16DYpTdFsSsT5Nzkj33VAYOQLa)
"""]]