Suggest using "git annex fullsync" to trigger a full sync, not changing "git annex sync" behaviour

This commit is contained in:
ewen 2023-07-12 04:16:48 +00:00 committed by admin
parent 538d602c88
commit 6ed3ed7e57

View file

@ -0,0 +1,66 @@
### Please describe the problem.
After upgrading to git-annex 10.20230626, running `git annex sync` reports:
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
which appears to be a barely documented change plan (at least I cannot find it in [the git-annex dev blog](https://git-annex.branchable.com/devblog/), only in the [latest change log]((https://hackage.haskell.org/package/git-annex-10.20230626/changelog)).
From the little that is said in [the 10.20230626 changelog](https://hackage.haskell.org/package/git-annex-10.20230626/changelog), it appears the intention is to -- **after 10 years** -- fairly quickly switch from `git annex sync` just syncing metadata (allowing git annexes to easily hold partial subsets of content), to doing a full content sync bidirectionally (apparently not allowing git annexes to hold partial subsets of content without explicit countermeasures for this behaviour breaking change).
I can understand why users might want a `git annex sync` that syncs content. And even maybe why it might want to be the default for those users who expect, eg, "Dropbox like behaviour".
But **changing the git annex sync default after 10 years** seems extremely user hostile.
Especially so when changing it from something that does not copy much data (default `git annex sync --no-content`) to something which (a) potentially copies a lot of data (over what might be a slow/expensive link), and (b) will potentially fill up drives due to repopulating entire large annexes which have historically relied on having only a subset of the content available locally, if the change in behaviour (after 10 years) is not caught in time.
The idea that users should go around every single git annex (I have dozens, with copies of those on dozens of machines and a bunch of offline drives), and make sure each one has `annex.synccontent` set, or every script that runs `git annex sync` has `git annex sync --no-content` on it, just to *restore the default of 10 years* is a pretty rough transition, and not a great user experience.
I would really strongly suggest that you do not change the behaviour of `sync` in this way *after 10 years*. And if you want a full sync option for user friendliness, then create, eg, `git annex fullsync` which is an alias for `git annex sync --content`.
If you no longer want to support the user model of having "incomplete annexes" (ie, all copies of a git annex must contain local copies of all data except changes made since the last sync), then the deprecation should be explicitly documented with advanced warning.
At minimum this signficant behaviour breaking change needs to be communicated *way* better than a random change log entry, and suddenly appearing in `git annex` output as a warning the world is going to break. And it shouldn't be necessary to, eg, trawl through the git source history to try to find any context for a major planned change.
Some slight saving graces:
* Fortunately `git annex sync --no-content` seems to be accepted at least back to git-annex 6.x, so at least it can be added into scripts without having to also check which git annex version is running; it's just a "no op" option in everything prior to git-annex 10.20030626.
* It looks like [`git annex config --set annex.synccontent false`](https://git-annex.branchable.com/git-annex-config/) might be carried with the repository (across syncs), reducing the number of places the new "override, back to the old default behaviour" setting has to be set (but it has to be set on every existing and new git annex, just to restore the 10 year historical behaviour)
* The [git annex source commit](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=f93a7fce1d5272c3282ce234053d26b10dd44198) has a tiny bit more context about there needing to be a "Debian Stable release" before the default changes, which doesn't seem to be documented anywhere else; if true, then since [Debian Bookworm just released, with git-annex 10.20230126](https://packages.debian.org/bookworm/git-annex) then the change in behaviour is at least 2-3 years away, at Debian's normal stable release schedule. But this doesn't seem to be documented anywhere else.
If the plan is that this change in default behaviour will be, eg, in 2H 2025, then I'd suggest (a) putting that planned date in the warning being issued on every run, and (b) putting that date in the documentation for [git annex sync](https://git-annex.branchable.com/git-annex-sync/) which currently just says "will change in a future version of git-annex" (which is very vague, and could be next month or a decade away). However as stated above, I'd really really strongly suggest just creating a new command, like `fullsync` for the new default behaviour, and *not* breaking backwards compatibility in behaviour.
### What steps will reproduce the problem?
Upgrade to git-annex 10.20230626, run `git annex sync` in a git annex repository, without having set `annex.synccontent` (i the git-annex config, or the git config).
Try to find documentation on this pending change; find nothing other than the changelog note foreshadowing a major change in behaviour after 10 years, and some comments in the source code.
### What version of git-annex are you using? On what operating system?
git-annex 10.20230626, on macOS, installed from Home Brew.
```
ewen@basadi:~$ git annex version
git-annex version: 10.20230626
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.4 http-client-0.7.13.1 persistent-sqlite-2.13.1.1 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
ewen@basadi:~$
```
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Absolutely. I've been using git-annex since nearly the beginning; and use it extensively to maintain partial copies of large annexes on laptops/desktops (and other space constrained systems). Hence this foreshadowed sudden change in behaviour being extremely surprising, and somewhat alarming.
Thanks for writing git-annex. It's the reason I've been sponsoring you on Patreon for years.
Please don't break backwards compatibility. Even in the user experience :-)