Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2023-07-05 11:54:43 -04:00
commit 5aad0cea83
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
9 changed files with 185 additions and 2 deletions

@ -0,0 +1,34 @@
### Please describe the problem.
It seems that `git annex assist` doesn't actually operate in parallel, despite `--jobs=cpus` or `git config annex.jobs cpus`.
### What steps will reproduce the problem?
- make an annex repo
- add more than one remote to it
- add some files
- run `git annex assist --jobs` (operates sequentially)
- run `git annex sync --jobs` (operates in parallel)
- set `git config annex.jobs cpus`
- run `git annex assist` (operates sequentially)
- run `git annex sync` (operates in parallel)
### What version of git-annex are you using? On what operating system?
[[!format sh """
git annex version
git-annex version: 10.20230627-g06f734555
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
git-annex is **marvellous** and a game-changer for all my workflows 👍

@ -0,0 +1,24 @@
### Please describe the problem.
I am familiarizing myself with adjusted branch mode and might be doing something wrong. But in this http://www.oneukrainian.com/tmp/case-20230630.tgz case I observe that `annex sync` simply updates `master` to some prior state, which could silently cause data loss for me if I don't spot it:
```
tar -xzf case-20230630.tgz
cd case
content.html@ datasets.datalad.org/ subfolder/
( source ~/git-annexes/10.20230626+git13-g029d12815c.env; git annex version | head -n 1; git describe master; git checkout 'adjusted/master(unlocked)'; git annex sync ; git describe master; )
git-annex version: 10.20230626+git13-g029d12815c-1~ndall+1
0.0.0-2-gf34191a
Switched to branch 'adjusted/master(unlocked)'
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
ok
0.0.0-1-gde710c5
```
PS: The investigation of adjusted/unlocked mode came up in a ReproNim context, where people wanted a "hard copy" of the fmriprep results without symlinks to simplify navigating the results in a browser. Otherwise, because the browser resolves symlinks, navigation is hard and requires a workaround such as starting a web server, [as we documented in the dbic handbook](https://dbic-handbook.readthedocs.io/en/latest/datalad.html#how-to-view-mriqcfmriprepetc-dataladified-results-in-a-browser).
[[!meta author=yoh]]
[[!tag projects/repronim]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="WiFi without broadcasting"
date="2023-06-30T07:56:03Z"
content="""
Turns out both WiFi networks here disallow broadcasting, so the webapp's local pairing doesn't work. I just tried it with two laptops; the pairing request is never shown.
So the server solution with SSH keys it is, then!
"""]]

@ -20,7 +20,17 @@ or as a shim that runs an external git diff driver.
If some of your annexed files are textual in form, and can be usefully
diffed with diff(1), you can configure git to use this command to diff
them, by configuring `.gitattributes` to contain e.g. `*.txt diff=annextextdiff`
and setting `git config diff.annextextdiff.command "git annex diffdriver --text"`.
The following can thus be used to configure `git diff` (only in your local
repository) to operate on the contents of annexed files:
```sh
echo '* diff=annextextdiff' >> .git/info/attributes
git config diff.annextextdiff.command "git annex diffdriver --text"
```
Note, however, that this will change the diff mechanism for *all* tracked files,
so `git diff` might look a little different than normal.
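
To avoid changing the diff mechanism for every tracked file, the attribute can instead be limited to selected patterns. The following is a minimal bash sketch under that assumption; the helper name `use_annex_textdiff_for` is hypothetical and not part of git-annex. It appends one line per pattern to the same local `.git/info/attributes` file used above:

```bash
# Hypothetical helper (not part of git-annex): route only the given
# gitattributes patterns through the annextextdiff driver, locally.
use_annex_textdiff_for() {
  local attrs pattern
  # --git-path resolves the right file, also in linked worktrees.
  attrs=$(git rev-parse --git-path info/attributes) || return
  mkdir -p "$(dirname "$attrs")"
  for pattern in "$@"; do
    printf '%s diff=annextextdiff\n' "$pattern" >> "$attrs"
  done
  git config diff.annextextdiff.command "git annex diffdriver --text"
}
```

For example, `use_annex_textdiff_for '*.txt' '*.csv'` leaves other files, such as `*.jpg`, on git's normal diff; `git check-attr diff -- photo.jpg` then reports `unspecified`.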
If your annexed files are not textual in form, you will need an external
diff driver program that is able to diff the file format(s) you use.

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="dud225@35a1ee469f82f3a7eb1f2dce4ad453f5e47bdfd3"
nickname="dud225"
avatar="http://cdn.libravatar.org/avatar/5147563e50c475918474594d93be95c2"
subject="Disabling a special remote"
date="2023-07-04T20:54:19Z"
content="""
Is there any way to enable a special remote on demand?
My configuration consists of local standard remotes (directories located on the same computer) and my phone, as described in the Android tutorial [1]. I've noticed that git-annex automatically ignores standard remotes that are missing (the aforementioned directories actually map to external drives that may or may not be connected), which is perfect. For the phone special remote, however, it always seems to try to connect, so the command \"git annex sync\" constantly outputs error messages:
list phone error: device 'xxxx' not found
git-annex: Unable to list contents of phone: adb find failed
failed
[1] https://git-annex.branchable.com/tips/android_sync_with_adb/
"""]]

@ -0,0 +1,46 @@
First mentioned [here](https://git-annex.branchable.com/todo/new_command_for_syncing_content_only/#comment-0814894ed5e40c9d0f7aef694cce53c1)
## TL;DR: Feature Proposal: A matching option to operate only on files within a git revision range
## Motivation
I use git-annex for several repositories where content is added automatically, e.g.:
- my phone (pictures I take, selected pictures others send me, etc.)
- my research data repo (new files are added as data comes in)
Those repos don't necessarily have the assistant running, as I want to control when and what is being added. I also have `annex.synccontent=false` set because full availability of all files doesn't make sense. I imagine I am not the only one using git-annex like this.
When I want to access content from such repos on another machine, it is usually just the most recent content. Example: I take a picture on my phone and would like to have it *now* on my desktop. The workflow is:
[[!format bash """
yann@phone> git annex assist # takes ages for some reason, but only when --content functionality is active
yann@desktop> git annex assist # doesn't pull all content from server because I have annex.synccontent=false set (too much space otherwise)
# selectively get files I want (tedious, manual picking)
yann@desktop> git annex get file1 file2 file3 ...
"""]]
## Workaround
One can script the following to only sync the recently touched files:
[[!format bash """
# Specifying a git rev range by having `git diff` figure out the details
yann@desktop> git diff --name-only HEAD~20 | xargs -d'\n' git annex get
# Specifying a time range
yann@desktop> git log -p --since="1 week ago" | grep -e '^[+-]\{3\}' | cut -c5- | grep -vx '/dev/null' | cut -c3- | sort -u | xargs -d'\n' git annex get
"""]]
This kinda works but...
- is fragile shell scripting
- doesn't like deleted files that much (they get handed to git-annex, which complains that those don't exist)
- needs two very separate solutions for the same thing: operating only on recent files
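
Within the same scripting approach, the deleted-files problem can at least be reduced. A minimal bash sketch, assuming GNU `xargs`, that it is run from the top of the work tree, and that no file name contains a newline; the function names are hypothetical:

```bash
# Hypothetical helpers (not git-annex commands). Assumptions: run from the
# top of the work tree, and no file name contains a newline.

# Paths changed between REV and the working tree; --diff-filter=d
# (lowercase d) excludes deleted paths.
recent_files_rev() {
  git -c core.quotePath=false diff --name-only --diff-filter=d "$1" --
}

# Paths touched by commits since DATE (any `git log --since` string),
# keeping only those still present. `-L` also keeps locked annexed files
# whose content is not present locally, since those are dangling symlinks.
recent_files_since() {
  git -c core.quotePath=false log --since="$1" --name-only --pretty=format: |
    sort -u |
    while IFS= read -r f; do
      if [ -e "$f" ] || [ -L "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
}
```

Usage would then be `recent_files_rev HEAD~20 | xargs -r -d'\n' git annex get` or `recent_files_since '1 week ago' | xargs -r -d'\n' git annex get`. The GNU `-r` flag keeps `xargs` from running `git annex get` with no arguments when nothing matched, since with no paths it would act on the whole current directory. It is still shell scripting, so the other two points remain.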
## Proposal
How about a `--recent` or `--since` or `--revs` etc. option to which you can pass either a commit (range) or a `git log --since`-compatible string like `1 week ago`, which would cause git-annex to only consider those files for `get`ing or `drop`ing or `sync`ing or whatnot?
I observed that `git annex sync --content` is often very much slower than `git annex sync --no-content` (even excluding the time the actual transfers naturally take), apparently because it needs to check a whole lot of files to see whether they need syncing. If that is the case, then a `--since` option could improve speed, as only a very small number of files would need to be checked.
Also, it would be awesome if one could say `git annex assist|sync --since=yesterday` and end up with a perfectly synced repo and the files touched yesterday available. This is a situation I find myself in on a daily basis.

@ -0,0 +1,30 @@
Thank you, Joey, for `git annex diffdriver --text`; that is a big step towards easier diffing of annexed files. The following is now a copy-paste solution to 'make git diff work with git annex':
```
echo '* diff=annextextdiff' >> .git/info/attributes && git config diff.annextextdiff.command "git annex diffdriver --text"
```
This, however, makes `git diff` use git-annex's diffing mechanism for *all* files, including normal git-tracked files. There is probably no gitattributes way of applying a diff command only to annexed files, right?
## Customizing `diff` options
Apparently, `git annex diffdriver --text` uses the system's `diff` command and doesn't pass it any specific options (nor lets the user do so). The points below could be addressed by something like `git annex diffdriver --text --diffopts='--color=auto'`, so that the user can customize the `diff` invocation. Alternatively, you could introduce an environment variable like `GIT_ANNEX_DIFFDRIVER_DIFF_OPTIONS` or the shorter `GIT_ANNEX_DIFF_FLAGS`, which could also be used to temporarily diff *that one file* with specific options.
## Coloring the output
How about passing the (sanitized: `false`→`never` and `true`→`auto`) `git config color.diff` setting, falling back to `color.ui`, as the `diff --color=...` option? The experience would then match the user's configured expectations.
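
The proposed mapping is small enough to sketch. This is a minimal bash sketch, not git-annex code; the function name `diff_color_option` is hypothetical, and using its result with `diff --color=...` assumes GNU diffutils 3.4 or later:

```bash
# Hypothetical helper: turn git's color setting into a value for GNU
# `diff --color=...`, using color.diff first and falling back to color.ui.
diff_color_option() {
  local v
  v=$(git config --get color.diff || git config --get color.ui || echo auto)
  # git accepts booleans in any letter case (True, FALSE, ...).
  v=$(printf '%s' "$v" | tr '[:upper:]' '[:lower:]')
  case "$v" in
    false|no|off|0|never) echo never ;;
    always) echo always ;;
    *) echo auto ;;  # true/yes/on/1/auto, and git's default when unset
  esac
}
```

A driver could then run something like `diff --color="$(diff_color_option)" -u old new`.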
## Handling all files as text
`diff` detects some files as binary, although it can make sense to text-diff them (e.g. PDFs), just to get an impression of the changes. `diff` (and `git diff` as well, AFAIK with no good workaround short of an external driver) then just displays the unhelpful message that the 'binary files differ' (you don't say...🙄).
How about having `git annex diffdriver --text` always use `diff --text`? That would deviate from the usual `git diff` behaviour, but I argue that:
- a) People *explicitly* configure (a subset of) files in `.gitattributes` to be diffed with `git annex diffdriver --text`.
- b) If they didn't care about the diff between files, they wouldn't configure it.
- c) Tracking binary(-like) files is something git-annex is explicitly designed for. It makes sense that git just skips over those, but git-annex could add value here.
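
Until something like that exists, a user-side external driver can force text diffing. A minimal bash sketch, assuming GNU diff; the name `annex_text_diff` is hypothetical. It follows git's external diff convention, where the command receives seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode):

```bash
# Hypothetical external diff driver that always diffs as text.
# git calls it with: path old-file old-hex old-mode new-file new-hex new-mode
annex_text_diff() {
  local path=$1 old=$2 new=$5 rc=0
  # --text: compare line by line even if the files look binary (e.g. PDFs).
  diff --text -u --label "a/$path" --label "b/$path" -- "$old" "$new" || rc=$?
  # diff exits 1 when the files differ. By default git treats any non-zero
  # status from an external diff command as fatal, so map 1 to 0 and pass
  # real errors (status 2) through.
  if [ "$rc" -le 1 ]; then
    return 0
  fi
  return "$rc"
}
```

To use it, one could save it in a script ending with `annex_text_diff "$@"` and point a `diff.<driver>.command` at `git annex diffdriver -- /path/to/that-script --`, the shim form of `git annex diffdriver` that runs an external git diff driver.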
Thanks again a ton for git-annex, the Tübix2023-Workshop was well appreciated and lots of fun. 👍
Yann

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="Thanks for git annex satisfy, numcopies question"
date="2023-06-30T05:37:06Z"
content="""
Marvellous! I was wondering: if `git annex sync` doesn't satisfy numcopies, what does? Is numcopies just for drop/move? `sync`/`satisfy` won't override the preferred content to satisfy numcopies, right? Then my mental model “git annex ensures there are always at least `numcopies` copies of your files” wasn't really true. It just “won't by itself reduce the number of copies of a file below numcopies/mincopies”, right? If that's true, would a command or option for sync/satisfy that enforces numcopies and potentially overrides the preferred content make sense?
"""]]

@ -1,6 +1,8 @@
Talks and screencasts about git-annex.
These videos are also available in a public git-annex repository
- [📹 Yann Büchau's (German) talk](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6) about Git Annex at the [Tübix2023 Linux Day](https://tuebix.org) in Tübingen, Germany:
The below videos are also available in a public git-annex repository
`git clone https://downloads.kitenet.net/.git/`
[[!inline pages="./videos/* and !./videos/*/* and !*/Discussion" show="2"]]