Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-07-12 12:57:46 -04:00
commit 155113d3c8
GPG key ID: DB12DB0FF05F8F38
16 changed files with 548 additions and 0 deletions

@ -0,0 +1,66 @@
### Please describe the problem.
After upgrading to git-annex 10.20230626, running `git annex sync` reports:
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
which appears to be a barely documented change plan (at least I cannot find it in [the git-annex dev blog](https://git-annex.branchable.com/devblog/), only in the [latest change log](https://hackage.haskell.org/package/git-annex-10.20230626/changelog)).
From the little that is said in [the 10.20230626 changelog](https://hackage.haskell.org/package/git-annex-10.20230626/changelog), it appears the intention is to -- **after 10 years** -- fairly quickly switch from `git annex sync` just syncing metadata (allowing git annexes to easily hold partial subsets of content), to doing a full content sync bidirectionally (apparently not allowing git annexes to hold partial subsets of content without explicit countermeasures for this behaviour breaking change).
I can understand why users might want a `git annex sync` that syncs content. And even maybe why it might want to be the default for those users who expect, eg, "Dropbox like behaviour".
But **changing the git annex sync default after 10 years** seems extremely user hostile.
Especially so when changing it from something that does not copy much data (default `git annex sync --no-content`) to something which (a) potentially copies a lot of data (over what might be a slow/expensive link), and (b) will potentially fill up drives due to repopulating entire large annexes which have historically relied on having only a subset of the content available locally, if the change in behaviour (after 10 years) is not caught in time.
The idea that users should go around every single git annex (I have dozens, with copies of those on dozens of machines and a bunch of offline drives), and make sure each one has `annex.synccontent` set, or every script that runs `git annex sync` has `git annex sync --no-content` on it, just to *restore the default of 10 years* is a pretty rough transition, and not a great user experience.
I would really strongly suggest that you do not change the behaviour of `sync` in this way *after 10 years*. And if you want a full sync option for user friendliness, then create, eg, `git annex fullsync` which is an alias for `git annex sync --content`.
If you no longer want to support the user model of having "incomplete annexes" (ie, all copies of a git annex must contain local copies of all data except changes made since the last sync), then the deprecation should be explicitly documented with advance warning.
At minimum this significant behaviour breaking change needs to be communicated *way* better than a random change log entry, and suddenly appearing in `git annex` output as a warning the world is going to break. And it shouldn't be necessary to, eg, trawl through the git source history to try to find any context for a major planned change.
Some slight saving graces:
* Fortunately `git annex sync --no-content` seems to be accepted at least back to git-annex 6.x, so at least it can be added into scripts without having to also check which git annex version is running; it's just a "no op" option in everything prior to git-annex 10.20230626.
* It looks like [`git annex config --set annex.synccontent false`](https://git-annex.branchable.com/git-annex-config/) might be carried with the repository (across syncs), reducing the number of places the new "override, back to the old default behaviour" setting has to be set (but it has to be set on every existing and new git annex, just to restore the 10 year historical behaviour)
* The [git annex source commit](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=f93a7fce1d5272c3282ce234053d26b10dd44198) has a tiny bit more context about there needing to be a "Debian Stable release" before the default changes, which doesn't seem to be documented anywhere else; if true, then since [Debian Bookworm just released, with git-annex 10.20230126](https://packages.debian.org/bookworm/git-annex) then the change in behaviour is at least 2-3 years away, at Debian's normal stable release schedule. But this doesn't seem to be documented anywhere else.
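
For scripts, the first point above can be sketched as a small wrapper that always passes `--no-content`. This is a minimal sketch in Python, not a recommended implementation: the function names `sync_argv` and `metadata_only_sync` are hypothetical, and it relies on the observation above that older git-annex versions accept the flag as a no-op.

```python
import subprocess


def sync_argv(extra_args=()):
    """Build the argument list for a metadata-only sync.

    Always includes --no-content so the behaviour does not depend on
    whatever default a given git-annex version uses.
    """
    return ["git", "annex", "sync", "--no-content", *extra_args]


def metadata_only_sync(repo_dir, extra_args=()):
    """Run the sync inside repo_dir; raises CalledProcessError on failure."""
    return subprocess.run(sync_argv(extra_args), cwd=repo_dir, check=True)
```

A script would call `metadata_only_sync("/path/to/annex")`, or pass `extra_args=["--no-commit"]` to keep committing manual.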
If the plan is that this change in default behaviour will be, eg, in 2H 2025, then I'd suggest (a) putting that planned date in the warning being issued on every run, and (b) putting that date in the documentation for [git annex sync](https://git-annex.branchable.com/git-annex-sync/) which currently just says "will change in a future version of git-annex" (which is very vague, and could be next month or a decade away). However as stated above, I'd really really strongly suggest just creating a new command, like `fullsync` for the new default behaviour, and *not* breaking backwards compatibility in behaviour.
### What steps will reproduce the problem?
Upgrade to git-annex 10.20230626, run `git annex sync` in a git annex repository, without having set `annex.synccontent` (in the git-annex config, or the git config).
Try to find documentation on this pending change; find nothing other than the changelog note foreshadowing a major change in behaviour after 10 years, and some comments in the source code.
### What version of git-annex are you using? On what operating system?
git-annex 10.20230626, on macOS, installed from Homebrew.
```
ewen@basadi:~$ git annex version
git-annex version: 10.20230626
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.4 http-client-0.7.13.1 persistent-sqlite-2.13.1.1 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
ewen@basadi:~$
```
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Absolutely. I've been using git-annex since nearly the beginning; and use it extensively to maintain partial copies of large annexes on laptops/desktops (and other space constrained systems). Hence this foreshadowed sudden change in behaviour being extremely surprising, and somewhat alarming.
Thanks for writing git-annex. It's the reason I've been sponsoring you on Patreon for years.
Please don't break backwards compatibility. Even in the user experience :-)

@ -0,0 +1,51 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 10"
date="2023-07-12T16:25:48Z"
content="""
@nobodyinperson, @ewen: Thank you for a balanced discussion! I'm not as elegant with words
but here is my take on your ideas.
@nobodyinperson:
> How about:
> - `git annex assist` (exists, works):
Yes, I understand the need for `assist` -- not in any way opposed to it. However, I think both
`assist` and `sync` ought to learn a new option called `--edit` (or, `--edit-message` but that's wordy)
to be used in a similar fashion to `git commit -m msg --edit`, ie. it would open the user's preferred editor
to edit the commit message.
> - `git annex metasync` (new proposal):
You mention that this command would be \"Option-wise as 'dumb' as git annex assist\" but I disagree: as a true
replacement for `sync` it definitely should have at least the `--no-commit` flag as an additional option so that those
used to a `git annex add; git commit -m foo [--edit]; git annex sync --no-content --no-commit` workflow could
migrate to it. Also, no auto-adding (please!) if this is supposed to be a replacement for `sync --no-content`.
> - `git annex sync` (exists, has many options and configs):
You mention that this command ought to be \"deprecated eventually (in many years), with a big warning upon execution now already\".
Yes, I agree but only if we get a `metasync` with a `--no-commit` flag for reasons above.
---
@ewen:
> Firstly, if `git annex assist` has been added to create the magical \"Dropbox like experience\", I don't see the need for changing the default behaviour of `git annex sync` at all. (For the record I'm glad `git annex assist` exists; I don't need it, or want it, but it clearly solves a bunch of UI problems for some users, and that's great. This is entirely about backwards incompatible UX changes in existing commands.)
I think you have a point there that the situation pre-10.20230626 would be preferable.
But I can live with Yann's (@nobodyinperson) suggestions if there is some amount of compromising.
> Secondly, your (@nobodyinperson) suggestion for \"improving\" the current situation seems to be:
> 1. Entirely remove `git annex sync` (entirely breaking backwards compatibility) and/or have it permanently display a warning that cannot be removed (annoying, incompatible with anything looking at output); and
> 2. Add a new (ie, not in any older version) `git annex metasync` which has some of the existing functionality, and some random other (unwanted by me at least) additional functionality (auto-adding any files in a directory where git annex is run, which I definitely do not want)
Yes, a replacement for `sync` should not in any case auto-add files. We want more control, not less.
(continued...)
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="comment 1"
date="2023-07-12T06:49:46Z"
content="""
Simultaneously, a new command `git annex assist` has been added that does basically a single step of the assistant, so roughly `git annex add;git add -A;git annex sync --content`. That goes into the direction of your `fullsync` suggestion.
You can read more about the reasoning for `git annex sync` syncing content by default [here](https://git-annex.branchable.com/todo/Having___39__git_annex_sync__39___optionally_add/#comment-fc813a22c713490156234567ed211277), where I suggested adding a config to `git annex sync` that would optionally add new files as well. That turned out to launch an avalanche of new commands and changes (git annex pull, git annex push, git annex assist, sync syncing content, etc.). 😅
IIUC, Joey's reasoning is that `git annex sync` was incomplete/inconsistent from the start and did too many configurable things. `git annex assist` syncs the entire repo state - as the assistant. Preferred content expressions can be used to specify what files a repo wants. If you set that to `git annex wanted REPO present`, a content sync won't affect it anyway. I still think that a configurable default preferred content expression for new repos is very important, now that syncing content by default might become the default: https://git-annex.branchable.com/todo/Setting_default_preferred_content_expressions/
"""]]

@ -0,0 +1,21 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="I have to agree that this change to `sync` is annoying"
date="2023-07-12T07:18:54Z"
content="""
Frankly, as a fellow user of \"incomplete annexes\" I also find it rather jarring that having gotten used to git committing my own changes
on the command line and not relying on mere `sync` to do that (or the assistant -- I haven't found a use
for the assistant yet) but then also preferring to use invocations of `git annex sync --no-commit` (which I've abbreviated to
`git annex-sync` by way of a git alias) rather freely to record my git-level changes across remotes,
I would from here on out be forced to add the `-g` flag to all my invocations just to make sure that I don't get
an annoying warning or my then current repo doesn't cause possibly huge content to flow across the remotes because I haven't
remembered to set `annex.synccontent` appropriately. I know the ideal is to have preferred content settings
for all annexes but not every git-annex user has an innate sense of all the intricacies involved in doing that
and for them to be forced to do so is a bit too much, in my humble opinion.
@Joey, I have a suggestion: instead of breaking the UX of `sync`, why not do that on a lesser used command `mirror` and change it
to sync git-level data in addition to files (ie. I suggest that `mirror` could be a better/shorter name for OP's `fullsync`).
Then add a flag called `--only-content` to `mirror` to restore its old behavior. I bet that would have a smaller impact on git-annex's
users than making us count our blessings with `-g`s and all.
"""]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 3"
date="2023-07-12T07:33:16Z"
content="""
> IIUC, Joey's reasoning is that git annex sync was incomplete/inconsistent from the start and did too many configurable things.
That's not a reason to break backwards compatibility for such a prominent command as `sync`. Just no.
> git annex assist syncs the entire repo state - as the assistant.
I don't want to use the assistant -- or its CLI alter ego assist -- I want my own manicured commit messages, puh-lease! The assistant
is a whole other use case, just not mine at the moment.
Just give us our old `sync` back. Rename it to `synk` (or, `sc`/`sk` -- as in \"sync for [k]urmudgeons\") if you want but let us have it. :)
"""]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="comment 4"
date="2023-07-12T09:21:55Z"
content="""
Just for the record:
- `git annex assist -m MESSAGE` works, you can make own messages.
- `git annex assist` has nothing to do with the assistant and can be used without. It's just `git annex sync --content` plus adding new files. A very important command for newcomers to 'just sync it'.
To get 'the old sync' back, you have several options:
- `git config --global annex.synccontent false` for your entire local machine, overrides the next option
- `git annex config --set annex.synccontent false` for all participants of a git annex repo (yes, it syncs between machines), still overrideable by users' local git configs above
- properly set up preferred content expressions (effectively), e.g. `git annex wanted . present`.
- `git annex sync --no-content`
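
The precedence among these options can be modelled in a few lines. This is only a sketch of the ordering described in the list above, written in Python: `effective_synccontent` is a hypothetical function, not git-annex code, and treating the command-line flag as the highest priority is an assumption. Preferred content expressions are not modelled.

```python
from typing import Optional


def effective_synccontent(
    cli_flag: Optional[bool] = None,      # --content / --no-content (assumed to win)
    git_config: Optional[bool] = None,    # git config annex.synccontent
    annex_config: Optional[bool] = None,  # git annex config --set annex.synccontent
    default: bool = False,                # historical default: metadata only
) -> bool:
    """Return whether a sync would transfer content under this model.

    The first setting that is explicitly set (not None) decides.
    """
    for value in (cli_flag, git_config, annex_config):
        if value is not None:
            return value
    return default
```

For example, a repository-wide `annex_config=True` is overridden by a local `git_config=False`, which matches the "overrides the next option" note above.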
If you don't use preferred content expressions to decide which repo should get which files and you want only partial checkouts with some files, then I indeed don't see why you would ever need `git annex sync --content` as you probably manually `get` the files you want. In that case one could argue that `sync` is a misleading word, you are actually only looking for a metadata sync (`msync`, `metasync`, ...?) strictly without content.
[I was also surprised](https://git-annex.branchable.com/todo/Having___39__git_annex_sync__39___optionally_add/#comment-c8c3138128a684080e2aaafc48aedfcf) that joey opted for introducing `assist` instead of teaching `sync` how to (optionally) add new files as well and keeping the rest as it was. But I can understand the direction of his reasoning.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="History of change"
date="2023-07-12T10:27:43Z"
content="""
Thanks for the [pointer to the earlier bug discussing changing sync entirely](https://git-annex.branchable.com/todo/Having___39__git_annex_sync__39___optionally_add) @nobodyinperson; it's at least useful to have a reference for where all this breaking change suddenly came from given it's otherwise undocumented. I've added a back pointer comment to that bug pointing at this one, as they're clearly related discussion (and it wasn't obvious the other \"sync\" bug was at all related to the change).
I still strongly disagree with changing \"sync\" to do something pretty different from what it's done for 10 years being a good idea. But would certainly support there being *another* command that did a \"Dropbox like full sync\".
(Personally I'd really really prefer that \"sync\" stayed doing the same thing it's done for a decade: meta data sync, only. If we're going to have to run a new different command just to retain core \"meta data sync\" behaviour, I guess \"git annex sync --no-content\" is not much worse than any other command we'd have to remember to suddenly use instead starting version N + 1.)
Ewen
"""]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Preferred content settings not really an alternative"
date="2023-07-12T10:53:41Z"
content="""
To echo @jkniiv's \"preferred content settings for all annexes\", as a git annex user for at least 8 years (the annex I found this issue in dates back to 2015), I've still never figured out how \"preferred content settings\" are supposed to help in git annex -- let alone for my use cases.
As an example, the git annex I spotted this new warning in is one I use for downloading/collecting podcasts. The downloads happen on my desktop, or sometimes a laptop. All content gets *pushed* to two of my NASes as soon as possible (for an \"archival copy\" -- immediately after download if I'm at home), and then once I've listened to the podcast I drop it from my desktop/laptop, so it's gone from the \"to listen\" queue (and historically, to recover space on smaller laptop drives). I don't want the files suddenly appearing again, unless I need to refer to it for some reason, then I do \"git annex get ...\".
There's around 300GB of podcasts in the annex (around 7000 files) on my NASes; but the checked out version on my desktop is 9GB (about 200 files). 300GB is more space than some of my laptop internal drives have in total; and definitely more free space than any of them had, which is one of the reasons for using git annex for this purpose.
The \"preferred content policy\" is \"if I just downloaded it I want it until I drop it, then I don't want it unless I explicitly request it again, but the two NASes should always have copies\". From [setting git annex preferred content policy](https://git-annex.branchable.com/preferred_content/), and the [preferred content policy syntax](https://git-annex.branchable.com/git-annex-preferred-content/), AFAICT \"present\" is the closest thing to the \"content policy\" I want on my desktop/laptops, ie \"trust the user, I know what I'm doing, don't automatically move files\". But up to version \"10.20230626 + N\" that was also the default without a content policy, as file content didn't automagically move around unless some other policy was explicitly set.
Ewen
"""]]

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="There is already something like fullsync: `git annex assist`, suggestion: new `metasync` command"
date="2023-07-12T11:15:26Z"
content="""
> But would certainly support there being another command that did a \"Dropbox like full sync\".
You might have missed it above, but `git annex assist` is **exactly** that new command that does a 'Dropbox-like sync' (adding new files, adding changes, metadata sync, content sync) and as OP suggested with `fullsync`. It is there, it works, it's wonderful. I think we can put that part of the discussion to rest.
> Personally I'd really really prefer that \"sync\" stayed doing the same thing it's done for a decade: meta data sync, only.
Actually, `git annex sync` isn't always **just** metadata sync. If you're in a repo where someone (anyone!) decided to run `git annex config --set annex.synccontent true`, there flies your Terabytes of content around when you do your `git annex sync`, assuming it's metadata-only. At first I was a fan of teaching `git annex sync` everything with flags and configs, but I think joey is right that this is inconsistent and leads to surprises.
If we're at it, we could get rid of the ambiguous `sync` command altogether (this is what joey also proposed for the very long run) and introduce a metadata-only sync command (`metasync`?) that **can't** sync content. Not with a setting, not with a flag, it just can't. Then that's perfect for those who only want to sync metadata and `get` files they want manually and without preferred content expressions (I suppose that's how you do it?).
How about:
- `git annex assist` (exists, works):
    - syncs everything: metadata&content, leaves the repo completely clean and all participating remotes updated to the same state
    - for those who don't want to run the assistant, but still have 'one command to sync it all' whenever they feel like it or want to specify a custom commit message.
    - Preferred content expressions manage what files land where.
    - Very important for newcomers as this will effectively be the only command they'll need to know to get a workflow started.
- `git annex metasync` (new proposal):
    - basically `git annex sync --no-content` that **can't** ever sync content. No surprises through configs.
    - Leaves the repo in a clean state and all connected remotes at the same state of metadata (the git stuff). Also adds new files! People who want more control should use the manual commands `git annex add|push|pull|etc.`.
    - Option-wise as 'dumb' as `git annex assist`
    - for those who never want automatic content syncing but `get|copy|move|...` stuff manually and don't want to fiddle with preferred content expressions
    - current users of 'old' `git annex sync` without content syncing will eventually switch to this.
- `git annex sync` (exists, has many options and configs):
    - roll back to non-content-syncing as before, for those that don't want to change their workflows yet
    - deprecated eventually (in many years), with a big warning upon execution now already.
"""]]

@ -0,0 +1,43 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Alternative proposal for better backwards compatibility"
date="2023-07-12T12:48:03Z"
content="""
Firstly, if `git annex assist` has been added to create the magical \"Dropbox like experience\", I don't see the need for changing the default behaviour of `git annex sync` at all. (For the record I'm glad `git annex assist` exists; I don't need it, or want it, but it clearly solves a bunch of UI problems for some users, and that's great. This is entirely about backwards incompatible UX changes in existing commands.)
Secondly, your (@nobodyinperson) suggestion for \"improving\" the current situation seems to be:
1. Entirely remove `git annex sync` (entirely breaking backwards compatibility) and/or have it permanently display a warning that cannot be removed (annoying, incompatible with anything looking at output); and
2. Add a new (ie, not in any older version) `git annex metasync` which has some of the existing functionality, and some random other (unwanted by me at least) additional functionality (auto-adding any files in a directory where git annex is run, which I definitely do not want)
Which, to me, is *even more* breaking the user experience of using `git annex` for the last 8 years, than is currently happening. AFAICT there would then be *no way* at all to get the `git annex` \"just sync metadata without other unwanted behaviour\" that existed before. *And* it'd then require git annex version detection in scripts to tell if `git annex metadata --no-add-really-just-copy-metadata-honest` was available or not, or if one had to fall back to an old version of the command, which is a really user-hostile way to \"retain\" existing behaviour. (FTR in my case at least there is *zero* chance that `git annex sync` would suddenly encounter `annex.synccontent=true`, as it's just me, and I haven't set that; I don't need a \"guarantees\" it won't copy content option, except as a defense against the git annex defaults suddenly changing after 8 years.)
My suggestion for preserving backwards compatibility would be:
1. Move the behaviour change out of `git annex sync` and into `git annex init` (rationale: it's run once per annex/machine pair, not many times per day/week, *and* it makes the change \"new annexes only\" which is much closer to \"opt in\")
2. From git annex version N + 1, have git annex automatically set `annex.synccontent=true` at `git annex init` of a *new* repo (ie, not just naming the local copy of the annex), and for version N+1 to N+3 (or higher) have `git annex init` issue a warning it has defaulted sync content on now (unlike earlier versions), and describe how to find out how to turn it off (unless a global git setting is making `annex.synccontent=true` the user preferred default anyway, in which case the user doesn't need a warning of a setting matching their preferred option).
3. For any annex where `annex.synccontent` is *not* set, assume it's an older annex, and use the backwards compatible, historical, default (false) *without* issuing any warnings about \"this is going to change\" (and never change that behaviour for historical annexes)
4. Retain `git annex sync` forever, with it obeying `annex.synccontent` (default: false, but set to true in newly created annexes by default)
5. Remove the newly added warning in `git annex sync` entirely, and just keep the functionality of the version before 10.20230626
This would:
1. Allow changing the `git annex sync` default from `metadata only` to `full content` in a \"new usage, opt in only\" way
2. Provide a clear path to opt out, and a clear path to opt in existing repos to match
3. Avoid needing to issue warnings to users on every run of a very core command until they set config
4. Avoid breaking existing usage, and use cases
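
The core of this proposal can be sketched as two small functions. This is a model of the suggestion only, written in Python with hypothetical names (`init_new_repo`, `sync_transfers_content`); it does not describe how git-annex behaves, and the `config` dictionary stands in for whatever settings are already in effect for the repository.

```python
SYNCCONTENT = "annex.synccontent"


def init_new_repo(config: dict) -> dict:
    """Proposed `init` for a brand-new annex.

    Opts the new repository in to content syncing, unless the user has
    already expressed a preference (for example via a global git setting).
    """
    new_config = dict(config)
    new_config.setdefault(SYNCCONTENT, True)
    return new_config


def sync_transfers_content(config: dict) -> bool:
    """Proposed `sync`: an unset key means an older annex, so keep the
    historical metadata-only behaviour and issue no warning."""
    return config.get(SYNCCONTENT, False)
```

Under this model an existing annex with no setting keeps syncing metadata only, while a freshly initialised one syncs content unless told otherwise.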
The git annex \"meta data sync\" dance (which effectively allows `git push` into a repo that has a working directory attached, something that is normally otherwise difficult) was great, and the thing which made git annex attractive to me over other options. It'd be a shame to render that great functionality unreachable without unwanted \"some users wanted this, so all users must have this\" functionality.
Ewen
PS: `git annex push` / `git annex pull` do not appear to be meta data only tools to replace sync; they appear to be very explicitly targeted at copying content, and to be one remote at a time as well (whereas sync is \"all known remotes\"). Historically it looks like my wrapper scripts use `git annex get ...` to retrieve content, and `git annex copy --to=REMOTE ...` to put content onto a specific remote (ie, a `put`, when there is no `put`).
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="comment 9"
date="2023-07-12T13:10:26Z"
content="""
Fine with me to keep `git annex sync` at its old behaviour. I'm happy about `assist|push|pull` and would also be happy with a `metasync`.
These are all decisions that joey will have to make himself. He has written this amazing piece of software that I'll be forever grateful for (Thanks joey!!). If he decides that `git annex sync` syncs content, so be it.
Your idea with moving the content-syncing into `init` sounds reasonable from a backwards-compatibility point of view. But I doubt joey will want to make `init` (or any other command) accumulate patchy stuff.
"""]]

@ -0,0 +1,157 @@
### Please describe the problem.
FTR, I can work around this specific problem (somehow, not sure how yet); I also didn't intend to report two bugs in 10.20230626 today, but I found them both in a specific repo while trying to work around new error output from 10.20230626, and both of them seemed to justify being reported. (To avoid future confusion, [other bug, relating to change in meaning of sync](https://git-annex.branchable.com/bugs/Changing_sync_to_include_content_breaks_UX/).)
In short, I've ended up with the "same" podcast known in git-annex by two different encodings of the filename, both of which appear in `git annex list`, but only one of which appears in the checked out annex file system.
While `git annex list` can *show* these uniquely, there doesn't appear to be a way to identify the relevant file to operate on uniquely to, eg, `git annex drop`, or `git annex whereis`, or even `git annex list` being more specific. They do not appear to accept the "encoded format" that is output by `git annex list`, which makes round-tripping printed filenames difficult. And since the files just differ by encoding, I'm not even sure if there is *any* way to specify one of them.
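
One way to round-trip such names outside git-annex is to decode the quoted form back into the raw filename. The following is a minimal sketch in Python, assuming the output uses git's C-style quoting (surrounding double quotes, backslash-octal escapes for non-ASCII bytes). `unquote_git_path` is a hypothetical helper and the sample name in the usage line is made up. It only recovers the name; whether git-annex commands then accept it is a separate question.

```python
import re

# Single-character escapes handled here; other escapes are left untouched.
_SIMPLE_ESCAPES = {b"n": b"\n", b"t": b"\t", b'"': b'"', b"\\": b"\\"}


def unquote_git_path(quoted: str) -> str:
    """Decode a double-quoted, octal-escaped path into a normal string.

    Names that are not wrapped in double quotes are returned unchanged.
    """
    if len(quoted) < 2 or not (quoted.startswith('"') and quoted.endswith('"')):
        return quoted
    body = quoted[1:-1].encode("utf-8")

    def replace(match):
        escape = match.group(1)
        if escape in _SIMPLE_ESCAPES:
            return _SIMPLE_ESCAPES[escape]
        return bytes([int(escape, 8)])  # three octal digits -> one byte

    raw = re.sub(rb'\\([0-7]{3}|[nt"\\])', replace, body)
    return raw.decode("utf-8")


print(unquote_git_path(r'"caf\303\251.mp3"'))
```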
I found this while trying to debug what had happened with a podcast downloaded from:
https://popculturedetective.agency/feed/podcast/
with git-annex; specifically the episode with the title starting "A Conversation with Artist Simon ...", where the artist's surname has a Latin accented character in it.
My git-annex list is showing two references to that podcast, with subtly different names:
```
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list | grep "A_Conv"
XXXX_ "A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3"
XXXX_ "A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$
```
one of which matches the file now checked out on disk in that directory:
```
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ LANG=C ls -B | grep A_Conv
A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$
```
and the other does not. As best I can tell `\303\245` is the UTF-8 for U+00E5 ("a" with small ring above), and `\314\212` is the UTF-8 for U+030A (combining character, small ring above; after an "a"). So both are somewhat legitimate ways to encode that particular accented "a".
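
The two byte sequences can be checked with a short Python sketch using the standard `unicodedata` module. `octal_escape` is a hypothetical helper that mimics the backslash-octal display above; the two strings are the precomposed (NFC) and decomposed (NFD) forms of the same letter.

```python
import unicodedata

composed = "\u00e5"     # U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"  # "a" followed by U+030A COMBINING RING ABOVE


def octal_escape(text: str) -> str:
    """Show non-ASCII UTF-8 bytes as backslash-octal escapes."""
    return "".join(
        chr(byte) if byte < 0x80 else "\\%03o" % byte
        for byte in text.encode("utf-8")
    )


print(octal_escape(composed))    # \303\245
print(octal_escape(decomposed))  # a\314\212

# The strings differ byte for byte, but normalise to each other.
print(composed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

So the two names are distinct as byte strings while being canonically equivalent under Unicode normalisation.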
The podcast feed (now?) has the `61 cc 8a` variant in the title of the podcast episode (ie, a, plus combining ring; equivalent to `a\314\212` as git annex now encodes it).
Digging back through the git history, it appears I had the `archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3` variant by 2022-11-04, and downloaded the `archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3` version on 2022-11-03, the day before. (My commit comment from that date implies I was fixing libsyn filenames, I think to remove a URL suffix on them and/or avoid duplicate downloads; so I may also have upgraded git annex around that timeframe.)
I also have a file hard linked to this podcast (from when it was first downloaded, 2022-11-03) which has the other encoding, implying that at one point git-annex put the other variation into the checked out files (since I hard link all "newly downloaded" files into another directory, targeted at the content file inside the annex, to make them easier to play back).
```
ewen@basadi:~/Music/podcasts$ ls -ilB A_Conversation_with_Artist_Simon_Stålenhag.mp3
19692405 -r--r--r-- 2 ewen staff 42854272 3 Nov 2022 A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:~/Music/podcasts$ ls -ilBL archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Stålenhag.mp3
19692405 -r--r--r-- 2 ewen staff 42854272 3 Nov 2022 archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Stålenhag.mp3
ewen@basadi:~/Music/podcasts$ ls -ilB archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Stålenhag.mp3
19757493 lrwxr-xr-x 1 ewen staff 206 4 Nov 2022 archive/Pop_Culture_Detective__Audio_Files/A_Conversation_with_Artist_Simon_Stålenhag.mp3 -> ../../.git/annex/objects/Xm/3v/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3
ewen@basadi:~/Music/podcasts$ ls -il .git/annex/objects/Xm/3v/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3
19692405 -r--r--r-- 2 ewen staff 42854272 3 Nov 2022 .git/annex/objects/Xm/3v/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3/SHA256E-s42854272--5b156789d2152e69dd0738bb75d42ddd9172a891e9646cc53d86963bd6014dc2.mp3
ewen@basadi:~/Music/podcasts$
```
Given the podcast feed itself (still) contains the UTF-8 (hex) `61 cc 8a`, which is the version I seem to have from first download, it feels like git annex might have changed to canonicalising the UTF-8 in a way that it didn't previously *and* handled having files with the "old" encoding by (a) changing to the new encoding (eg, in the checked out file) *and* (b) retaining the old *and* new encodings in the list of files. (And it seems like this would have happened in a 2022 release of git-annex.)
What seems to have changed with 10.20230626 is (from the [10.20230626 changelog](https://hackage.haskell.org/package/git-annex-10.20230626/changelog)):
```
* Many commands now quote filenames that contain unusual characters the
same way that git does, to avoid exposing control characters to the
terminal.
```
which makes sense as far as it goes, so now the two different encodings known to git annex are visible in the "list".
But the format in the output for filenames containing UTF-8 is not accepted by, eg, "git annex drop", which threw error messages, which I noticed today.
In particular (a) commands *receiving* file names do not seem to understand these escaped versions, which makes round tripping of filenames (eg "git annex list" filtered and then handed back to "git annex drop") more difficult than all previous versions of git annex, (b) the UTF-8 characters are output as octal escapes of each individual byte, which makes matching them against filenames in the file system more difficult (although it seems usually they should match `LANG=C ls -B` output -- I just got unlucky here with git annex learning about two variations on the name...).
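As a possible workaround for the round-tripping problem, the quoted names can be decoded back into raw bytes outside git-annex. The following is only a sketch in Python, assuming the output follows git's C-style quoting (double quotes around the name, backslash-octal for bytes, and the usual single-character escapes) as the changelog entry suggests. The function name `unquote_path` and the escape table are my own illustration, not part of git or git-annex.

```python
import re

# Single-character escapes used by C-style quoting (assumed, per git's convention).
SIMPLE_ESCAPES = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
                  '"': 34, "\\": 92}
OCTAL = re.compile(r"[0-7]{1,3}")

def unquote_path(shown: str) -> bytes:
    """Turn a displayed (possibly quoted) filename back into its raw bytes."""
    if len(shown) < 2 or not (shown.startswith('"') and shown.endswith('"')):
        return shown.encode("utf-8")  # unquoted names are shown literally
    body = shown[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
            continue
        octal = OCTAL.match(body, i + 1)
        if octal:
            # Up to three octal digits encode one byte.
            out.append(int(octal.group(), 8) & 0xFF)
            i = octal.end()
        elif i + 1 < len(body) and body[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            raise ValueError(f"unsupported escape at offset {i}")
    return bytes(out)

shown = r'"A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3"'
raw = unquote_path(shown)
print(raw.decode("utf-8"))  # the name with "a" plus combining ring
```

On a POSIX system the resulting bytes could then be passed as an argument to a command (for example via `subprocess.run` with a bytes argument), which keeps the two encodings distinct. Whether the filesystem itself preserves that distinction is a separate question.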
### What steps will reproduce the problem?
Something like:
```
git annex importfeed --template='archive/${feedtitle}/${itemtitle}${extension}' https://popculturedetective.agency/feed/podcast/
git annex list archive/Pop_Culture_Detective__Audio_Files
LANG=C ls -B archive/Pop_Culture_Detective__Audio_Files/*
git annex list "A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3"
```
but it may require first importing the feed with git-annex older than 10.20230626 and then again with git annex 10.20230626 (I'm not entirely clear how I ended up in this exact state; but that specific feed definitely was imported with an older git-annex first, I'm just not 100% certain how old).
### What version of git-annex are you using? On what operating system?
Now git-annex 10.20230626, on macOS, installed from Home Brew; before git-annex 10.2022xxxx, on macOS, installed from Home Brew (around early November 2022):
```
ewen@basadi:~$ git annex version
git-annex version: 10.20230626
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.4 http-client-0.7.13.1 persistent-sqlite-2.13.1.1 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
ewen@basadi:~$
```
### Please provide any additional information below.
Additional example of the "cannot use output name as input" problem:
```
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list | grep A_Conv
XXXX_ "A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3"
XXXX_ "A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3"
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list "A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3"
here
|bethel
||nas01
|||web
||||bittorrent
|||||
error: pathspec 'A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3' did not match any file(s) known to git
Did you forget to 'git add'?
list: 1 failed
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex list "A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3"
here
|bethel
||nas01
|||web
||||bittorrent
|||||
error: pathspec 'A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3' did not match any file(s) known to git
Did you forget to 'git add'?
list: 1 failed
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$
```
And proof that the actual downloaded file origin is the same:
```
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$ git annex whereis | sed -n '/A_Conv/,/ok/p'
whereis "A_Conversation_with_Artist_Simon_Sta\314\212lenhag.mp3" (4 copies)
00000000-0000-0000-0000-000000000001 -- web
4e10813c-063b-4cd9-9680-db6685b1c5e8 -- bethel_data_drive [bethel]
680fc999-1dc8-465d-b5ae-5defdb18d019 -- basadi (Mac Mini 2020) [here]
9f693b73-3283-45fa-83d3-251d57da7cd3 -- Synology DS216+ [nas01]
web: https://popculturedetective.agency/podcast-download/18821/a-conversation-with-artist-simon-sta%cc%8alenhag.mp3
ok
whereis "A_Conversation_with_Artist_Simon_St\303\245lenhag.mp3" (4 copies)
00000000-0000-0000-0000-000000000001 -- web
4e10813c-063b-4cd9-9680-db6685b1c5e8 -- bethel_data_drive [bethel]
680fc999-1dc8-465d-b5ae-5defdb18d019 -- basadi (Mac Mini 2020) [here]
9f693b73-3283-45fa-83d3-251d57da7cd3 -- Synology DS216+ [nas01]
web: https://popculturedetective.agency/podcast-download/18821/a-conversation-with-artist-simon-sta%cc%8alenhag.mp3
ok
ewen@basadi:~/Music/podcasts/archive/Pop_Culture_Detective__Audio_Files$
```
(where I had to use that command variant, as there doesn't seem to be any input method to specify those two encodings separately :-/ )
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Thanks again for writing git-annex. It's been mostly trouble free for 10 years as a "podcatcher". Today seems to be something of a "find all the surprises" day, having upgraded git-annex (from Home Brew) earlier this week.


@ -0,0 +1,18 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Breaking change to "sync""
date="2023-07-12T10:21:03Z"
content="""
Via a comment on [my bug about the new `sync` warning suddenly appearing in 10.20230626](https://git-annex.branchable.com/bugs/Changing_sync_to_include_content_breaks_UX/?updated) I see that this fairly hidden discussion seems to have been the rationale for completely changing what \"sync\" does \"because some users expect it to do everything\".
At the beginning of the design I might have agreed with you about \"what the sync command should do\" (and having another, eg, \"metasync\" command for the smaller version). But changing a fundamental command, a decade later, to do something different, based entirely on a \"wouldn't it have been nice (for some use cases) if...\" seems quite a stretch.
As per the other bug, if you want to implement new behaviour for such a fundamental command (\"git annex sync\" is something one runs pretty constantly if using git annex actively from the command line) then it'd be best to implement it in another command name, instead of dramatically changing an existing one. (The other thread has a few more suggestions; I personally still like \"fullsync\" for the expanded version.)
Either way, such a major -- behaviour breaking -- change needs to be much better documented than \"discussion in a bug that a few people saw\", and a note in passing in a changelog (which people have to find by noticing strange new output in probably scripted git annex usage).
Ewen
PS: I too think it's a terrible idea to have git annex default to auto-adding any files it can find. Maybe that makes some \"Dropbox-like\" use cases nicer, but it also entirely breaks other long standing use cases.
"""]]


@ -0,0 +1,24 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="Clarification"
date="2023-07-12T11:29:41Z"
content="""
@ewen for the record:
I never suggested that `git annex sync` auto-adds new files by default, see [my comment above](http://git-annex.branchable.com/todo/Having___39__git_annex_sync__39___optionally_add/#comment-37c3eb24df11d85c07b78c0297447c45):
> Exactly, I also would never want git annex sync to do the adding by default - strictly as an opt-in configuration.
As `git annex sync` already had so many options for configuring its behaviour, I thought having one more that runs `git annex add` in the beginning wouldn't hurt. I never suggested changing `git annex sync`'s default behaviour. Purely opt-in. Same for the existing options like `annex.synccontent` etc.
Changing `git annex sync`'s default behaviour of now syncing content was joey's idea, not mine.
> if you want to implement new behaviour for such a fundamental command
There is already `git annex assist` to do exactly that. See above or the changelogs.
Just to clarify. Please read the discussions you're referring to thoroughly before claiming things between the lines. 🙂
Now back to constructive discussion.
"""]]


@ -0,0 +1,22 @@
Hi joey,
(This might be considered a bug but it is not a dealbreaker so I put it as a todo)
When multiple configured git remotes have the same `annex-uuid` (e.g. there are multiple URLs to the same repo, which can be the case if there's a fast local url and a slow publicly accessible url), `git annex sync|assist|push --jobs=cpus` (nightly build) pushes simultaneously via all paths. Depending on who wins the race, slower pushes fail with something like:
```
remote: error: cannot lock ref 'refs/heads/synced/master': is at ce08fb81df1df82357dfd58aad9d1f65e6d2e58a but expected e8a067f219eaa5947c3553e7d06c23d5f830a6de
To ssh://URL
! [remote rejected] master -> synced/master (failed to update ref)
```
Immediately syncing again afterwards skips the pushing as everything is up to date.
This is just a cosmetic inconvenience and doesn't stop syncing from working. However, I guess nobody likes having red error messages thrown at them (can cause a certain spike in heart rate if one is on a tight schedule 😉). git annex knows that the remotes are effectively the same repo, so it could just push to them in sequence (sorted by lowest cost) and stop after the first successful push.
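To make the suggestion concrete, here is a rough Python sketch of the idea (git-annex itself is written in Haskell, so this is purely illustrative). `Remote`, `push_once_per_repo` and the `push` callback are hypothetical names, and the cost values in the usage example are made up.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional

class Remote(NamedTuple):
    name: str
    uuid: str    # annex-uuid of the repository behind this remote
    cost: float  # lower cost is tried first

def push_once_per_repo(
    remotes: Iterable[Remote],
    push: Callable[[Remote], bool],
) -> Dict[str, Optional[str]]:
    """Push to each distinct repository once, trying its cheapest remote first.

    Returns a mapping from uuid to the name of the remote that succeeded,
    or None if every remote for that uuid failed.
    """
    by_uuid: Dict[str, List[Remote]] = defaultdict(list)
    for remote in remotes:
        by_uuid[remote.uuid].append(remote)

    result: Dict[str, Optional[str]] = {}
    for uuid, group in by_uuid.items():
        result[uuid] = None
        # Remotes sharing a uuid are tried in sequence, never in parallel.
        for remote in sorted(group, key=lambda r: r.cost):
            if push(remote):
                result[uuid] = remote.name
                break  # stop after the first successful push
    return result

remotes = [
    Remote("public", "uuid-1", 200.0),
    Remote("lan", "uuid-1", 100.0),
    Remote("other", "uuid-2", 150.0),
]
attempted: List[str] = []
outcome = push_once_per_repo(remotes, lambda r: attempted.append(r.name) or True)
print(outcome)    # {'uuid-1': 'lan', 'uuid-2': 'other'}
print(attempted)  # ['lan', 'other']
```

Different repositories could still be pushed to in parallel; only the remotes that share a uuid would need to be serialised.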
Cheers,
Yann
P.S.: Thanks for git-annex, it is just a joy to use 🙂


@ -0,0 +1,20 @@
[[!comment format=mdwn
username="jstritch"
avatar="http://cdn.libravatar.org/avatar/56756b34ff409071074762951d270002"
subject="comment 3"
date="2023-07-11T18:56:39Z"
content="""
The --json-progress flag should produce progress. I do not understand why it would be a problem to intermix types of 'action' in the output. If file A is uploaded and file B is downloaded and checksummed, that's (at least) three different progress objects to produce. This is exactly the type of information I want to see on the screen instead of wondering if it's stuck. I would also apply this pattern to sync and friends.
Upload A 35%
Upload A 100%
Download B 100%
Checksum B 50%
Checksum B 100%
Go ahead and make per-file json error results available and describe them in the documentation.
"""]]