Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-07-05 11:54:43 -04:00
commit 5aad0cea83
GPG key ID: DB12DB0FF05F8F38
9 changed files with 185 additions and 2 deletions

@ -0,0 +1,46 @@
First mentioned [here](https://git-annex.branchable.com/todo/new_command_for_syncing_content_only/#comment-0814894ed5e40c9d0f7aef694cce53c1)
## TL;DR: Feature Proposal: A matching option to operate only on files within a git revision range
## Motivation
I use git-annex for several repositories where content is added automatically, e.g.:
- my phone (pictures I take, selected pictures others send me, etc.)
- my research data repo (new files are added as data comes in)
Those repos don't necessarily have the assistant running, because I want to control when and what gets added. I also have `annex.synccontent=false` set because full availability of all files doesn't make sense. I imagine I am not the only one using git annex like this.
When I want to access content from such repos from another machine, it is usually just the most recent content. Example: I take a picture on my phone and would like to have it *now* on my desktop. The current workflow is:
[[!format bash """
yann@phone> git annex assist # takes ages for some reason, but only when --content functionality is active
yann@desktop> git annex assist # doesn't pull all content from server because I have annex.synccontent=false set (too much space otherwise)
# selectively get files I want (tedious, manual picking)
yann@desktop> git annex get file1 file2 file3 ...
"""]]
## Workaround
One can script the following to only sync the recently touched files:
[[!format bash """
# Specifying a git rev range by having `git diff` figure out the details
yann@desktop> git diff --name-only HEAD~20 | xargs -d'\n' git annex get
# Specifying a time range
yann@desktop> git log -p --since="1 week ago" | grep -e '^[+-]\{3\}' | cut -c5- | grep -vx '/dev/null' | cut -c3- | sort -u | xargs -d'\n' git annex get
"""]]
This kinda works but...
- is fragile shell scripting
- doesn't like deleted files that much (they get handed to git-annex, which complains that those don't exist)
- is two very separate solutions for the same thing: only operating on recent files
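The deleted-files complaint at least can be avoided with plain git options. Below is a minimal sketch, not a tested recipe: the helper names `changed_in_range` and `changed_since` are made up for illustration, and GNU `xargs` is assumed (as in the workaround above). `--diff-filter=d` leaves out paths that a diff reports as deleted, and the existence check catches files that were touched in one commit and deleted in a later one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of git-annex). Run them from the top of the
# working tree, because git prints paths relative to the repository root.

# Paths that differ between the given revision(s) and the working tree,
# NUL-separated so unusual file names survive. Deletions are left out.
changed_in_range() {
    git diff --name-only --diff-filter=d -z "$@"
}

# Paths touched by commits since the given date, one per line, de-duplicated.
# Paths that no longer exist in the working tree are dropped.
changed_since() {
    git -c core.quotePath=false log --since="$1" --diff-filter=d \
        --name-only --format= |
        sort -u |
        while IFS= read -r path; do
            # -L also accepts dangling symlinks (annexed files without content)
            if [ -e "$path" ] || [ -L "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
}

# Possible usage (GNU xargs; -r skips the command when the list is empty,
# so git-annex is never run without any file arguments):
#   changed_in_range HEAD~20 | xargs -0 -r git annex get
#   changed_since "1 week ago" | xargs -d'\n' -r git annex get
```

This still leaves two separate code paths for revision ranges and time ranges, which is the part a built-in option would solve.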
## Proposal
How about a `--recent` or `--since` or `--revs` etc. option that takes either a commit (range) or a `git log --since`-compatible string like `1 week ago`, and causes git-annex to only consider the files touched there for `get`ing or `drop`ing or `sync`ing or whatnot?
I observed that `git annex sync --content` is often much slower than `git annex sync --no-content` (not counting the time the actual transfers take, naturally), apparently because it needs to check a whole lot of files for whether they need syncing. If that is the case, then a `--since` option could result in a speed improvement, as only a small number of files would need to be checked.
Also, it would be awesome if one could say `git annex assist|sync --since=yesterday` and one would end up with a perfectly synced repo and the files touched yesterday being available. This is a situation I find myself needing on a daily basis.

@ -0,0 +1,30 @@
Thank you, Joey, for `git annex diffdriver --text`; that is a big step towards easier diffing of annexed files. The following is now a copy-paste solution to 'make git diff work with git annex':
```
echo '* diff=annextextdiff' >> .git/info/attributes && git config diff.annextextdiff.command "git annex diffdriver --text"
```
This, however, makes `git diff` use git-annex's diffing mechanism for *all* files, including normal git-tracked files. There probably is no gitattributes way of applying a diff command only to annexed files, right?
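One partial approach, assuming the annexed files can be told apart by path or extension, is to attach the driver to those patterns instead of `*`. gitattributes matches on paths only, so this is an approximation and not a true 'annexed files only' rule. The helper name `use_annex_textdiff_for` and the patterns in the usage note are purely illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach the annextextdiff driver to the given path
# patterns only, instead of to '*'. Run inside the repository.
use_annex_textdiff_for() {
    local attrs pattern
    attrs="$(git rev-parse --git-dir)/info/attributes" || return 1
    mkdir -p "$(dirname "$attrs")"
    for pattern in "$@"; do
        printf '%s diff=annextextdiff\n' "$pattern" >> "$attrs"
    done
    git config diff.annextextdiff.command "git annex diffdriver --text"
}
```

For example, `use_annex_textdiff_for '*.pdf' 'data/**'` would cover PDFs anywhere and everything below `data/`, and `git check-attr diff -- some/path` shows which driver git picks for a given path.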
## Customizing `diff` options
Apparently, `git annex diffdriver --text` uses the system's `diff` command and neither passes it any specific options nor lets the user do so. The points below could be worked around by having something like `git annex diffdriver --text --diffopts='--color=auto'` so that the user can customize the `diff` invocation. Alternatively, you could introduce an environment variable like `GIT_ANNEX_DIFFDRIVER_DIFF_OPTIONS` or, shorter, `GIT_ANNEX_DIFF_FLAGS`, which could also be used to temporarily diff *that one file* with specific options.
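Until something like that exists, a small wrapper following git's external diff driver calling convention (seven arguments: path, old-file, old-hex, old-mode, new-file, new-hex, new-mode) can illustrate the idea. This is only a sketch under assumptions: the function name `diff_wrapper` and the variable `DIFF_WRAPPER_OPTS` are invented here and are not git-annex features, and GNU `diff` is assumed for `--label`.

```shell
#!/usr/bin/env bash
# Hypothetical external diff driver wrapper; not shipped with git-annex.
# git calls an external diff driver as:
#   driver path old-file old-hex old-mode new-file new-hex new-mode
# Extra options for diff(1) come from DIFF_WRAPPER_OPTS, a variable name
# made up for this sketch.
diff_wrapper() {
    local path=$1 old_file=$2 new_file=$5
    local -a opts=()
    # Split the variable on whitespace into separate options.
    read -r -a opts <<< "${DIFF_WRAPPER_OPTS:-}"
    diff "${opts[@]}" -u --label "a/$path" --label "b/$path" \
        -- "$old_file" "$new_file"
    local status=$?
    # diff exits 1 when the files differ; git treats a non-zero exit from
    # the driver as a failure, so map 0 and 1 to success and pass anything
    # higher (trouble) through.
    [ "$status" -le 1 ] && return 0
    return "$status"
}
```

Saved as an executable script that ends with `diff_wrapper "$@"`, this could be configured as a `diff.<name>.command` for regular files, and `DIFF_WRAPPER_OPTS='--text --color=auto' git diff` would then change the options for a single invocation. The git-annex man page describes `git annex diffdriver -- cmd` as a shim in front of such a driver, which might make the same wrapper usable for annexed files; that combination is untested here.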
## Coloring the output
How about passing the (sanitized, `false`→`never` and `true`→`auto`) `git config color.diff` (and fallback `color.ui`) setting as the `diff --color=...` option? The experience would then match the user's configured expectation.
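The mapping could look roughly like this. It is a sketch with a made-up function name (`git_diff_color_mode`), assuming GNU `diff`, whose `--color` option accepts `never`, `auto` and `always`; it says nothing about how git-annex would implement it internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate git's color setting into a value for
# GNU diff's --color=WHEN option (never, auto or always).
git_diff_color_mode() {
    local value
    # color.diff wins; fall back to color.ui, then to git's default (auto).
    value=$(git config --get color.diff) ||
        value=$(git config --get color.ui) ||
        value=auto
    # git accepts boolean spellings in any letter case.
    value=$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        false|no|off|0|never) echo never ;;
        always)               echo always ;;
        # true, yes, on, 1, auto and anything unrecognized
        *)                    echo auto ;;
    esac
}
```

A driver could then call `diff --color="$(git_diff_color_mode)" old new`.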
## Handling all files as text
`diff` detects some files as binary, although it can make sense to text-diff them (e.g. PDFs), just to get an impression of the changes. It then just displays the unhelpful message that the 'binary files differ' (you don't say...🙄). `git diff` does the same, and AFAIK there is no good workaround short of an external driver again.
How about having `git annex diffdriver --text` always use `diff --text`? That would deviate from the usual `git diff` behaviour, but I argue that:
- a) People *explicitly* configure (a subset of) files in `.gitattributes` to be diffed with `git annex diffdriver --text`.
- b) If they didn't care about the diff between files, they wouldn't configure it.
- c) Tracking binary(-like) files is something git-annex is explicitly designed for. It makes sense that git just skips over those, but git-annex could add value here.
Thanks again a ton for git-annex, the Tübix2023-Workshop was well appreciated and lots of fun. 👍
Yann

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="Thanks for git annex satisfy, numcopies question"
date="2023-06-30T05:37:06Z"
content="""
Marvellous! I was wondering, if `git annex sync` doesn't satisfy numcopies, what does? Is numcopies just for drop/move? `sync`/`satisfy` won't override the preferred content to satisfy numcopies, right? Then my mental model “git annex ensures there's always at least `numcopies` copies of your files” wasn't really true. It just “won't by itself reduce the number of copies of a file below numcopies/mincopies”, right? If that's true, would a command or an option to sync/satisfy make sense that enforces numcopies and potentially overrides the preferred content?
"""]]