Merge branch 'master' of ssh://git-annex.branchable.com
commit 5aad0cea83
9 changed files with 185 additions and 2 deletions

@@ -0,0 +1,46 @@
First mentioned [here](https://git-annex.branchable.com/todo/new_command_for_syncing_content_only/#comment-0814894ed5e40c9d0f7aef694cce53c1)

## TL;DR: Feature Proposal: A matching option to operate only on files within a git revision range

## Motivation

I use git-annex for several repositories where content is added automatically, e.g.:

- my phone (pictures I take, selected pictures others send me, etc.)
- my research data repo (new files are added as data comes in)

Those repos don't necessarily have the assistant running, as I want to control when and what gets added. I also have `annex.synccontent=false` set, because having every file available everywhere doesn't make sense. I imagine I am not the only one using git annex like this.

Most of the time, when I want to access content from such repos on another machine, I only need the most recent content. Example: I take a picture on my phone and would like to have it *now* on my desktop. The workflow is:

[[!format bash """
yann@phone> git annex assist # takes ages for some reason, but only when --content functionality is active
yann@desktop> git annex assist # doesn't pull all content from server because I have annex.synccontent=false set (too much space otherwise)
# selectively get files I want (tedious, manual picking)
yann@desktop> git annex get file1 file2 file3 ...
"""]]

## Workaround

One can script the following to get only the recently touched files:

[[!format bash """
# Specifying a git rev range by having `git diff` figure out the details
yann@desktop> git diff --name-only HEAD~20 | xargs -d'\n' git annex get
# Specifying a time range
yann@desktop> git log -p --since="1 week ago" | grep -e '^[+-]\{3\}' | cut -c5- | grep -vx '/dev/null' | cut -c3- | sort -u | xargs -d'\n' git annex get
"""]]

This kinda works, but it...

- is fragile shell scripting
- doesn't handle deleted files well (they get handed to git-annex, which complains that they don't exist)
- is two very separate solutions for the same thing: operating only on recent files
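Both workarounds could be folded into one script by resolving either kind of argument to a single base commit and letting `git diff` list the changed files. The following is a minimal sketch, assuming GNU `xargs` (as the original commands already do). The script name `annex-get-recent` and its helper functions are hypothetical. It addresses the deleted-files and two-solutions points, but it is of course still shell scripting.

```shell
#!/bin/sh
# annex-get-recent (hypothetical name): approximate the proposed --since
# option with plain git plumbing, then hand the files to `git annex get`.
# Usage: annex-get-recent SINCE
#   SINCE is a revision (e.g. HEAD~20) or a date accepted by
#   `git rev-list --before` (e.g. "1 week ago").

# Print the base to diff against: the commit itself if SINCE names one,
# otherwise the newest commit older than the date, or the empty tree if
# all of history is newer (then every file counts as recent).
annex_recent_base() {
  if git rev-parse --verify -q "$1^{commit}"; then
    return 0
  fi
  base=$(git rev-list -1 --before="$1" HEAD)
  if [ -n "$base" ]; then
    printf '%s\n' "$base"
  else
    git hash-object -t tree /dev/null
  fi
}

# Print the paths changed between the base and HEAD, NUL-separated so
# that git's quoting of unusual file names does not get in the way.
# Deletions are filtered out, so git-annex never sees missing files.
annex_recent_files() {
  git diff --name-only -z --diff-filter=d "$(annex_recent_base "$1")" HEAD
}

# Act only when called with one argument, so the functions can be sourced.
if [ "$#" -eq 1 ]; then
  # git diff prints paths relative to the top level, so run from there.
  cd "$(git rev-parse --show-toplevel)" || exit 1
  annex_recent_files "$1" | xargs -0 -r git annex get
fi
```

For example, `annex-get-recent HEAD~20` or `annex-get-recent "1 week ago"`. One difference from the `git log --since` pipeline is that a date is resolved to a single base commit. As a result, only files whose content differs between that commit and `HEAD` are fetched, and a file that was changed and then reverted within the window is skipped.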

## Proposal

How about a `--recent`, `--since` or `--revs` option (or similar) that accepts either a commit (or commit range) or a `git log --since`-compatible string like `1 week ago`, and makes git-annex consider only those files when `get`ting, `drop`ping, `sync`ing or whatnot?

I observed that `git annex sync --content` is often much slower than `git annex sync --no-content`, even discounting the time the actual transfers take, apparently because it has to check a large number of files to see whether they need syncing. If that is the case, a `--since` option could also speed things up, as only a very small number of files would need to be checked.

Also, it would be awesome if one could say `git annex assist|sync --since=yesterday` and end up with a perfectly synced repo in which the files touched yesterday are available. This is something I find myself needing on a daily basis.

doc/todo/Improving_diffdriver_--text.mdwn
@@ -0,0 +1,30 @@
Thank you, Joey, for `git annex diffdriver --text`; that is a big step towards easier diffing of annexed files. The following is now a copy-paste solution to "make git diff work with git annex":

```
echo '* diff=annextextdiff' >> .git/info/attributes && git config diff.annextextdiff.command "git annex diffdriver --text"
```

However, this makes `git diff` use git-annex's diffing mechanism for *all* files, including normal git-tracked files. There probably is no gitattributes way of applying a diff command only to annexed files, right?

## Customizing `diff` options

Apparently, `git annex diffdriver --text` uses the system's `diff` command and neither passes it any specific options nor lets the user do so. The points below could be addressed by something like `git annex diffdriver --text --diffopts='--color=auto'`, so that the user can customize the `diff` invocation. Alternatively, you could introduce an environment variable like `GIT_ANNEX_DIFFDRIVER_DIFF_OPTIONS` (or, shorter, `GIT_ANNEX_DIFF_FLAGS`), which could also be used to temporarily diff *that one file* with specific options.
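Until such an option exists, something similar might be possible on the user side with the non-`--text` form `git annex diffdriver -- cmd --opts --`. As we understand it, that form passes git's external-diff arguments through to `cmd`, with annexed paths replaced by the paths of the annexed content. The following is a minimal sketch under that assumption, also assuming GNU diffutils. The script name `annex-diff-flags` is hypothetical, and git-annex itself does not read `GIT_ANNEX_DIFF_FLAGS`; the script does.

```shell
#!/bin/sh
# annex-diff-flags (hypothetical name): an external diff command meant to be
# run as `git annex diffdriver -- annex-diff-flags --`. Assumption: git-annex
# passes on git's seven external-diff arguments, with annexed paths replaced
# by the paths of the annexed content:
#   path old-file old-hex old-mode new-file new-hex new-mode
# Options for diff(1) come from GIT_ANNEX_DIFF_FLAGS, the variable proposed
# above; git-annex itself does not read it.

annex_diff_flags() {
  path=$1 old=$2 new=$5
  # Fall back to a plain unified diff when no flags are set.
  flags=${GIT_ANNEX_DIFF_FLAGS:--u}
  # $flags is deliberately unquoted so it splits into separate options.
  diff $flags --label "a/$path" --label "b/$path" -- "$old" "$new"
  # diff exits 1 when the files differ; only report real errors (>1) to git.
  [ "$?" -le 1 ]
}

# Run only when called with git's seven arguments, so it can be sourced.
if [ "$#" -eq 7 ]; then
  annex_diff_flags "$@"
fi
```

To try it, put the script on `PATH` and point the driver from the setup above at it, e.g. `git config diff.annextextdiff.command "git annex diffdriver -- annex-diff-flags --"`. Then `GIT_ANNEX_DIFF_FLAGS='-u --text --color=always' git diff some-file` would diff that one (hypothetical) file with custom options. The exit-status mapping is there because git may treat a non-zero status from an external diff command as a failure.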

## Coloring the output

How about passing git's `color.diff` setting (falling back to `color.ui`), sanitized with `false`→`never` and `true`→`auto`, as the `diff --color=...` option? The experience would then match the user's configured expectations.
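The suggested sanitizing step is small enough to sketch. The helper names below are hypothetical, and the sketch assumes GNU diff's `--color=WHEN` option. It reads `color.diff`, falls back to `color.ui`, and maps the raw value to `never`, `always` or `auto`.

```shell
#!/bin/sh
# Hypothetical helpers: map git's color.diff setting (falling back to
# color.ui) to a WHEN value for GNU diff's --color=WHEN option, following
# the false->never / true->auto sanitizing suggested above.

# Print git's raw setting, or nothing if neither key is set.
annex_diff_color_setting() {
  git config --get color.diff || git config --get color.ui || true
}

# Translate a raw git color setting into never, always or auto.
annex_diff_color_mode() {
  # git accepts these spellings case-insensitively, so normalize first.
  setting=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $setting in
    false|no|off|0|never) echo never ;;
    always) echo always ;;
    # true/yes/on/1/auto, or unset: git's default for color.ui is auto.
    *) echo auto ;;
  esac
}
```

A diff wrapper could then run `diff --color="$(annex_diff_color_mode "$(annex_diff_color_setting)")" ...`. There is one caveat. With `auto`, diff only colors when its own output is a terminal, which is usually not the case when `git diff` pipes through a pager. `git config --get-colorbool color.diff` applies the `color.ui` fallback together with git's own handling of `auto`, so it may be a closer match to what git itself would do.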

## Handling all files as text

`diff` detects some files as binary, although it can make sense to text-diff them (e.g. PDFs), just to get an impression of the changes. `diff` (and `git diff` as well, AFAIK with no good workaround short of another external driver) then just displays the unhelpful message that the "binary files differ" (you don't say...🙄).

How about having `git annex diffdriver --text` always use `diff --text`? That would deviate from the usual `git diff` behaviour, but I argue that:

- a) People *explicitly* configure (a subset of) files in `.gitattributes` to be diffed with `git annex diffdriver --text`.
- b) If they didn't care about the diff between files, they wouldn't configure it.
- c) Tracking binary(-like) files is something git-annex is explicitly designed for. It makes sense that git just skips over those, but git-annex could add value here.

Thanks again a ton for git-annex; the Tübix2023 workshop was much appreciated and lots of fun. 👍

Yann

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="nobodyinperson"
avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
subject="Thanks for git annex satisfy, numcopies question"
date="2023-06-30T05:37:06Z"
content="""
Marvellous! I was wondering: if `git annex sync` doesn't satisfy numcopies, what does? Is numcopies just for drop/move? `sync`/`satisfy` won't override the preferred content to satisfy numcopies, right? Then my mental model, "git annex ensures there are always at least `numcopies` copies of your files", wasn't really true. It just "won't by itself reduce the number of copies of a file below numcopies/mincopies", right? If that's true, would a command or option for sync/satisfy make sense that enforces numcopies and potentially overrides the preferred content?
"""]]