Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-07-13 19:58:36 -04:00
commit f22f0dd4d8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 241 additions and 0 deletions


@ -0,0 +1,32 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Plumbing commands vs Porcelain commands"
date="2023-07-13T00:26:20Z"
content="""
@joey (re [comment 11](http://git-annex.branchable.com/bugs/Changing_sync_to_include_content_breaks_UX/#comment-e4557f089ec22304ff506f09f791bb81)): thank you for replying. I apologise if my tone sounded accusatory; in hindsight writing late at night after a stressful year probably wasn't the best idea.
I don't know what your intentions are here, which is part of the problem. From the outside, it feels like the git-annex assistant functionality is now the dominant use case which is guiding the development changes. (And that's understandable if it's most/all of your funding to work on the project.) That seems to be leading towards the \"plumbing\" low level commands being transitioned into higher level (\"porcelain\" in git terminology) commands, which do multiple things; and access to the low level functionality being decreased or removed.
Yes, git has done a few \"UX breaking\" transitions in the past. But in all the cases I can think of the change in functionality was accompanied with obvious documentation on why the change was being made, and how to adapt. For instance the [git safe directory behaviour change (CVE-2022-24765)](https://github.blog/2022-04-12-git-security-vulnerability-announced/) prompted a lot of documentation about why it was changed (eg, [git config manpage](https://git-scm.com/docs/git-config#Documentation/git-config.txt-safedirectory)), and specifically how users could restore the existing functionality where that was appropriate (and, eg, it gave an exact command to run in the warning output). I've been running into that problem repeatedly for a year (including supporting other users), but at least knowing why, and that the recommended \"fix command\" is the ideal solution, makes it quick to explain/fix/carry on.
In the case of this git-annex change to `sync` (from a \"just plumbing\" metadata sync command, to a \"porcelain\" git-annex-assistant-like command) there doesn't seem to be any clear documentation on why it is changing, or on the recommended process to preserve (or recreate) the existing default functionality (metadata sync only), particularly in a backwards compatible way.
The reference to the [six year old commit](http://source.git-annex.branchable.com/?p=source.git;a=commitdiff;h=b77903af48e650dbb777f29e98d0c7b388353ebd) which you see as \"starting this change\" adds some more context, but almost all of the \"documentation\" about this change appears to be scattered between some source code commits, some short changelog entries, and a bug that started out being about something else. As someone who has followed your personal blog and the git-annex dev blog for years, the change of \"sync\" from a low level plumbing command (with optional \"please do more\") to a high level porcelain command (with optional \"please skip most of the things you want to do\") still seemed to appear very suddenly, without any foreshadowing that it was going to change. And I remain unclear on the intended timeline for the (fairly fundamental, IMHO) change in default behaviour.
For my purposes I think from here on I'll be doing *all* of the following:
1. On any system where I install git-annex, running `git config --global annex.synccontent false` to keep my per-user, per-machine default equivalent to the current behaviour
2. On any git-annex wrapper scripts I have that run `sync`, explicitly calling `git annex sync --no-content` instead of a bare `git-annex sync` as before
3. On any new git-annex repo that I set up, running `git annex config --set annex.synccontent false`
4. On the existing git-annex repos, as I interact with them again and remember, run `git annex config --set annex.synccontent false` to maintain the existing default
5. On new/existing git-annex repos, consider also running `git annex wanted . present`, since that seems to effectively match my current policy (ie, what is there is what is supposed to be there, because I put it there).
(I appreciate that mostly any one of those \"should be sufficient\". But with a lot of existing git annex repos, spread across lots of machines/drives, some of which are offline, the risk of overlooking one or more of them later on is non-trivial. So belt and braces here.)
Ewen
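A rough sketch of how steps 1, 3, 4 and 5 from the list above could be scripted across several repositories. It assumes Python 3 and git-annex on the PATH; `plan` and `apply` are made-up helper names, and the repository paths passed in are placeholders. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Commands copied from the numbered list above.
PER_USER = [['git', 'config', '--global', 'annex.synccontent', 'false']]
PER_REPO = [['git', 'annex', 'config', '--set', 'annex.synccontent', 'false']]
OPTIONAL = [['git', 'annex', 'wanted', '.', 'present']]


def plan(repos, include_wanted=False):
    # Return (directory, command) pairs; a directory of None means
    # the working directory does not matter for that command.
    steps = [(None, cmd) for cmd in PER_USER]
    for repo in repos:
        for cmd in PER_REPO + (OPTIONAL if include_wanted else []):
            steps.append((repo, cmd))
    return steps


def apply(repos, include_wanted=False, dry_run=True):
    # Print the planned commands, or run them when dry_run is False.
    for cwd, cmd in plan(repos, include_wanted):
        if dry_run:
            print(cwd or '(anywhere)', ':', ' '.join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `apply(['/path/to/repo'])` only lists what would be run; passing `dry_run=False` executes it. Step 2 (calling `git annex sync --no-content` from wrapper scripts) is left out, since that is a change to the scripts themselves.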
"""]]


@ -0,0 +1,62 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="comment 2"
date="2023-07-13T00:54:34Z"
content="""
Yes, it's also somewhat strange to me that I ended up with two variations of this filename (by UTF-8 encoding) on disk.
The filesystem is the default modern macOS one (APFS, case preserving but case insensitive). From a quick experiment, it looks like the encoding of \"a with circle above\" is treated like one of the \"case preserving but case insensitive\" things of the APFS file system. Ie, I can create either variation, but whichever variation is created first is treated the same as the other variation when it comes to opening/updating the file. Which I guess makes sense, as they're two different UTF-8 byte sequences for ultimately the same glyph: the precomposed U+00E5 (`\303\245`) versus a plain `a` followed by the combining ring U+030A (`a\314\212`).
FWIW, I also noticed while investigating this, this morning, that the encoding on disk in my annex had switched back to consistently encoding in both my linked copy (2022-11-03) and the annex archive directory (2022-11-04). So with the two versions known to the annex, it feels like I might be seeing a \"first to be created\" race in the UTF-8 encoding used when sync runs.
Also FTR, in addition to the possibility that the podcast RSS changed encoding between the two runs, it's also possible that this got canonicalised by, eg, shell expansion, around 2022-11-03 / 2022-11-04; it definitely looks like I did a bunch of automated \"git mv ...\" to fix up filenames around that point. (And then possibly the next git annex podcast fetch re-learnt the other name.)
Since I seem to have stabilised on one encoding on disk right now, I'm going to try to make git-annex forget about the other encoding of the name, to tidy up this particular confusion for now.
But I agree it would be helpful to have a command variant that can accept the octal-encoded byte sequences (now) output by `git annex list`. Both for cases like this, and in general for round tripping output back to input (something I do in some cases to handle scripted checks on annexed files against other things).
Thanks for the reply,
Ewen
```
ewen@basadi:/tmp$ mkdir encoding
ewen@basadi:/tmp$ cd encoding
ewen@basadi:/tmp/encoding$ df -Pm .
Filesystem 1M-blocks Used Available Capacity Mounted on
/dev/disk1s2 1908108 991091 900870 53% /System/Volumes/Data
ewen@basadi:/tmp/encoding$ mount | grep /System/Volumes/Data
/dev/disk1s2 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
ewen@basadi:/tmp/encoding$ touch A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:/tmp/encoding$ LANG=C ls -lB
total 0
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:40 A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:/tmp/encoding$ touch A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:/tmp/encoding$ LANG=C ls -lB
total 0
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:41 A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:/tmp/encoding$ touch A_Conversation_with_Artist_Simon_Stålenhag.mp3-new
ewen@basadi:/tmp/encoding$ LANG=C ls -lB
total 0
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:41 A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3-new
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:41 A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:/tmp/encoding$ touch A_Conversation_with_Artist_Simon_Stålenhag.mp3-new
ewen@basadi:/tmp/encoding$ LANG=C ls -lB
total 0
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:43 A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3-new
-rw-r--r-- 1 ewen wheel 0 Jul 13 12:41 A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:/tmp/encoding$
```
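For reference, a small sketch of why the two names in the listing above differ at the byte level but read as the same text. It assumes Python 3; the byte sequences are copied from the `ls` output, and `same_after_nfc` is a made-up helper name. Whether APFS compares names in exactly this way is only inferred from the experiment above, not something I have verified.

```python
import unicodedata

# Byte sequences copied from the ls output above (octal escapes as printed).
precomposed = b'St\303\245lenhag'.decode('utf-8')  # one code point, U+00E5
decomposed = b'Sta\314\212lenhag'.decode('utf-8')  # plain a followed by U+030A


def same_after_nfc(a, b):
    # Two names count as the same if they match once both are normalised to NFC.
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)


print(precomposed == decomposed)                # False: the raw strings differ
print(same_after_nfc(precomposed, decomposed))  # True: same text after normalising
print(len(precomposed), len(decomposed))        # 9 10
```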
```
ewen@basadi:~/Music/podcasts$ date
Thu 13 Jul 2023 12:44:54 NZST
ewen@basadi:~/Music/podcasts$ LANG=C ls -lB A_Conv*
-r--r--r-- 2 ewen staff 42854272 Nov 3 2022 A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts$ LANG=C ls -lB archive/Pop_Culture_Detective__Audio_Files/A_Conv*
lrwxr-xr-x 1 ewen staff 206 Nov 4 2022 archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3 -> ../../.git/annex/objects/Xm/3v/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3
ewen@basadi:~/Music/podcasts$
```
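On round tripping the escaped names: a rough sketch of decoding a quoted, octal-escaped name back into the real filename, so it can be passed to other commands. It assumes Python 3 and that the escaping follows git's C-style path quoting (a name wrapped in double quotes, with `\NNN` octal bytes), which I have not checked against the git-annex source; `unquote_name` is a made-up helper name.

```python
import re

QUOTE = chr(34)  # the double-quote character that wraps escaped names

# Single-character escapes; assumed to follow git's C-style path quoting.
SIMPLE = {b'n': b'\n', b't': b'\t', b'\\': b'\\', QUOTE.encode(): QUOTE.encode()}


def unquote_name(field):
    # Names without special bytes are printed bare, so pass them through.
    if len(field) < 2 or field[0] != QUOTE or field[-1] != QUOTE:
        return field
    raw = field[1:-1].encode('utf-8')

    def decode(match):
        token = match.group(1)
        if len(token) == 3:
            return bytes([int(token, 8)])  # three octal digits become one byte
        return SIMPLE.get(token, token)    # single-character escapes

    return re.sub(rb'\\([0-3][0-7]{2}|.)', decode, raw).decode('utf-8')


listed = QUOTE + r'A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3' + QUOTE
print(unquote_name(listed).encode('utf-8'))
```

The result is the name in whichever encoding was listed, so the other variant still needs its own lookup.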
"""]]


@ -0,0 +1,147 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="git mv to temporary name, commit, git mv back, commit, can resolve duplicates"
date="2023-07-13T01:12:34Z"
content="""
Interestingly, just renaming the file on disk (`git mv`) is sufficient to make the second (duplicate) entry go away, as the second one gets flagged as \"deleted\". And if I commit both changes, then it seems to be persistent. Ie, *after* the commit, I can `git mv` the file back to the original on-disk name, and commit that, and `git annex list` only shows the one name. That seems to survive `git annex sync --no-content` and even another run of my podcast fetching. So I think that dance solves my immediate \"cannot reference by name\" problem -- ie, move the one on disk aside, commit, move back, commit.
(I still have a problem with my auto-cleanup automation for this repository -- `git annex drop ...` if it's no longer linked into the \"podcasts to play\" repo -- but I'm fairly sure I can fix the detection of that somehow. And the few special cases that no longer auto-drop by \"name from `git annex list`\" I can drop by hand via wildcards or tab-completion.)
Other than the feature request (some way to feed the escaped output back in as input) I think this bug is resolved. Thanks for your comments.
Ewen
```
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list
here
|bethel
||nas01
|||web
||||bittorrent
|||||
XXXX_ \"A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3\"
XXXX_ \"A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
_XXX_ Demystifying_Fair_Use__Copyright__and_Content_ID.mp3
XXXX_ Indiana_Jones_and_The_Case_of_The_Ancient_Aliens.mp3
_XXX_ Solarpunk_and_How_We_Escape_Dystopia.mp3
_XXX_ The_Case_of_Boba_Fett_and_the_Hollywood_Western.mp3
XXXX_ The_Case_of_The_Cursed_Jungle_Cruise.mp3
_XXX_ The_Case_of_The_Falcon_and_The_Winter_Soldier.mp3
XXXX_ The_Continuing_Case_of_Ted_Lasso.mp3
_XXX_ The_Curious_Case_of_Ted_Lasso.mp3
XXXX_ The_Multiversal_Case_of_Everything_Everywhere.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B *Conversation_*
A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git mv A_Conversation_with_Artist_Simon_Stålenhag.mp3 keep-A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B *Conversation_*
keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list
here
|bethel
||nas01
|||web
||||bittorrent
|||||
_XXX_ Demystifying_Fair_Use__Copyright__and_Content_ID.mp3
XXXX_ Indiana_Jones_and_The_Case_of_The_Ancient_Aliens.mp3
_XXX_ Solarpunk_and_How_We_Escape_Dystopia.mp3
_XXX_ The_Case_of_Boba_Fett_and_the_Hollywood_Western.mp3
XXXX_ The_Case_of_The_Cursed_Jungle_Cruise.mp3
_XXX_ The_Case_of_The_Falcon_and_The_Winter_Soldier.mp3
XXXX_ The_Continuing_Case_of_Ted_Lasso.mp3
_XXX_ The_Curious_Case_of_Ted_Lasso.mp3
XXXX_ The_Multiversal_Case_of_Everything_Everywhere.mp3
XXXX_ \"keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git status
On branch master
Changes to be committed:
(use \"git restore --staged <file>...\" to unstage)
renamed: \"A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\" -> \"keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
Changes not staged for commit:
(use \"git add/rm <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
deleted: \"A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3\"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git commit -am 'Consolidate on one UTF-8 encoding of A Conversation with Simon Stalenhag'
[master 05d2c2c50] Consolidate on one UTF-8 encoding of A Conversation with Simon Stalenhag
2 files changed, 1 deletion(-)
delete mode 120000 \"archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
rename \"archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3\" => \"archive/Pop_Culture_Detective__Audio_Files/keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\" (100%)
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git status
On branch master
nothing to commit, working tree clean
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B *Conversation_*
keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list
here
|bethel
||nas01
|||web
||||bittorrent
|||||
_XXX_ Demystifying_Fair_Use__Copyright__and_Content_ID.mp3
XXXX_ Indiana_Jones_and_The_Case_of_The_Ancient_Aliens.mp3
_XXX_ Solarpunk_and_How_We_Escape_Dystopia.mp3
_XXX_ The_Case_of_Boba_Fett_and_the_Hollywood_Western.mp3
XXXX_ The_Case_of_The_Cursed_Jungle_Cruise.mp3
_XXX_ The_Case_of_The_Falcon_and_The_Winter_Soldier.mp3
XXXX_ The_Continuing_Case_of_Ted_Lasso.mp3
_XXX_ The_Curious_Case_of_Ted_Lasso.mp3
XXXX_ The_Multiversal_Case_of_Everything_Everywhere.mp3
XXXX_ \"keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git mv keep-A_Conversation_with_Artist_Simon_Stålenhag.mp3 A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B *Conversation*
A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list
here
|bethel
||nas01
|||web
||||bittorrent
|||||
XXXX_ \"A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
_XXX_ Demystifying_Fair_Use__Copyright__and_Content_ID.mp3
XXXX_ Indiana_Jones_and_The_Case_of_The_Ancient_Aliens.mp3
_XXX_ Solarpunk_and_How_We_Escape_Dystopia.mp3
_XXX_ The_Case_of_Boba_Fett_and_the_Hollywood_Western.mp3
XXXX_ The_Case_of_The_Cursed_Jungle_Cruise.mp3
_XXX_ The_Case_of_The_Falcon_and_The_Winter_Soldier.mp3
XXXX_ The_Continuing_Case_of_Ted_Lasso.mp3
_XXX_ The_Curious_Case_of_Ted_Lasso.mp3
XXXX_ The_Multiversal_Case_of_Everything_Everywhere.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git status
On branch master
Changes to be committed:
(use \"git restore --staged <file>...\" to unstage)
renamed: \"keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\" -> \"A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git commit -m 'Restore canonical name for A Conversation with Simon Stalenhag podcast'
[master 8c7249dae] Restore canonical name for A Conversation with Simon Stalenhag podcast
1 file changed, 0 insertions(+), 0 deletions(-)
rename \"archive/Pop_Culture_Detective__Audio_Files/keep-A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\" => \"archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\" (100%)
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B *Conversation*
A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B ../../*Conversation*Simon*
../../A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list
here
|bethel
||nas01
|||web
||||bittorrent
|||||
XXXX_ \"A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3\"
_XXX_ Demystifying_Fair_Use__Copyright__and_Content_ID.mp3
XXXX_ Indiana_Jones_and_The_Case_of_The_Ancient_Aliens.mp3
_XXX_ Solarpunk_and_How_We_Escape_Dystopia.mp3
_XXX_ The_Case_of_Boba_Fett_and_the_Hollywood_Western.mp3
XXXX_ The_Case_of_The_Cursed_Jungle_Cruise.mp3
_XXX_ The_Case_of_The_Falcon_and_The_Winter_Soldier.mp3
XXXX_ The_Continuing_Case_of_Ted_Lasso.mp3
_XXX_ The_Curious_Case_of_Ted_Lasso.mp3
XXXX_ The_Multiversal_Case_of_Everything_Everywhere.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$
```
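The dance in the transcript above (move aside, commit, move back, commit) could be sketched as a small script. This assumes Python 3, that it is run in the directory holding the file, and that the name is a bare filename; `dance_commands` and `run_dance` are made-up helper names and the commit messages are placeholders. It defaults to a dry run.

```python
import subprocess


def dance_commands(name, temp_prefix='keep-'):
    # Build the four git commands used in the transcript above.
    temp = temp_prefix + name
    return [
        ['git', 'mv', name, temp],
        # -a also stages the duplicate entry that git now reports as deleted
        ['git', 'commit', '-a', '-m', 'Move ' + name + ' aside'],
        ['git', 'mv', temp, name],
        ['git', 'commit', '-m', 'Restore ' + name],
    ]


def run_dance(name, repo_dir='.', dry_run=True):
    # Print the commands, or run them in repo_dir when dry_run is False.
    for cmd in dance_commands(name):
        if dry_run:
            print(' '.join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)
```

Since `commit -a` picks up every tracked change, this is best run on a clean working tree.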
"""]]