Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-06-26 10:52:57 -04:00
commit dc90d818b1
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 46 additions and 0 deletions

@@ -0,0 +1,19 @@
## The Problem
Apparently, `.gitattributes`-based configuration (e.g. of `numcopies`, `largefiles`, or `addunlocked`, which is not even implemented because of the inefficiency) is slow, as every file needs to be queried individually for its attributes (`git check-attr` under the hood, I guess).
## The Motivation
From a user's perspective, `.gitattributes`-based configuration has several benefits over the `git annex config --set annex....` approach:
- `.gitattributes` can differ between branches
- `.gitattributes` lists file name matches in a much more readable way, while e.g. `git annex config --set annex.largefiles 'include=*.txt and include=*.md and include=*.bla and mimetype shenanigans and largerthan and whatnot...'` gets confusing quickly.
- `.gitattributes` nests well in subdirs, enabling quite concise and fine-grained control (e.g. all files in THAT folder should be annexed, and if I delete the folder at some point, never mind, my `git config --get annex.largefiles` won't stay cluttered with that path config).
Furthermore, DataLad [relies on `.gitattributes` configuration](https://git-annex.branchable.com/todo/annex.addunlocked_in_gitattributes/#comment-431d5040eac3b9a01d97724e25194f17) to specify the backend and, e.g., for the `text2git` procedure.
## Suggestion
Couldn't the [separate-git-tree-for-diffing technique you used lately to speed up repeated imports](https://git-annex.branchable.com/devblog/day_649-650__speeding_up_repeated_imports/) be used to cache `.gitattributes` for all (or relevant) files in a git tree (e.g. have the same paths in that tree, but with the attributes as file contents)? Querying the attributes would then be a matter of querying this tree, and updating them would just require re-querying the touched paths.
One problem I see with this though is that it wouldn't be possible to cache the user's `.git/info/attributes` settings, which can change independently.
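
To make the idea a bit more concrete, here is a minimal sketch of the caching logic only. It is not how git-annex works: a plain Python dict stands in for the proposed git tree, and `AttrCache`, `query`, and `invalidate` are hypothetical names made up for illustration. The `query` callable represents the slow per-path lookup (something like a wrapper around `git check-attr`). One detail the sketch tries to capture is that touching a `.gitattributes` file has to invalidate every cached path below its directory, not just one entry. It does not address the `.git/info/attributes` problem mentioned above.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable


class AttrCache:
    """Hypothetical in-memory stand-in for the proposed attribute tree."""

    def __init__(self, query: Callable[[str], str]) -> None:
        self._query = query              # slow per-path lookup (assumed)
        self._cache: Dict[str, str] = {}
        self.misses = 0                  # how often the slow lookup ran

    def get(self, path: str) -> str:
        """Return the cached attributes for path, querying only on a miss."""
        if path not in self._cache:
            self.misses += 1
            self._cache[path] = self._query(path)
        return self._cache[path]

    def invalidate(self, touched: Iterable[str]) -> None:
        """Forget cached entries affected by the touched paths."""
        for t in touched:
            p = PurePosixPath(t)
            if p.name == ".gitattributes":
                # A changed .gitattributes affects everything below its directory.
                parent = str(p.parent)
                prefix = "" if parent == "." else parent + "/"
                for cached in [c for c in self._cache if c.startswith(prefix)]:
                    del self._cache[cached]
            else:
                self._cache.pop(t, None)
```

With this shape, repeated `get` calls for unchanged paths never reach the slow lookup, and a commit that only touches `docs/.gitattributes` forces re-querying only for paths under `docs/`.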

@@ -0,0 +1,27 @@
Hi joey,
Currently, getting a useful diff between annexed file versions is quite involved [(setting up git-annex diffdriver)](https://git-annex.branchable.com/forum/git-like_git-annex_diff/).
It would be very nice if showing changes between annexed files was a little more straightforward, ideally without any user config needed. UI suggestions:
- `git annex diff`: would behave exactly like `git diff`, but operating on both unannexed and annexed contents
    - ideally re-implementing all its options (e.g. `--word-diff`, `--word-diff-regex`, etc.)
    - would need a diff implementation in Haskell (surely there is one)
    - sounds complicated to do TBH
- Teaching `git diff` to use the annexed content instead of the pointer links/files
    - software like [`nbstripout`](https://github.com/kynan/nbstripout) passes the git-tracked contents through a filter before diffing. It sounds like git-annex could do the same to add straightforward `git diff` support without user configuration.
    - git-annex already has a `* filter=annex` attribute in place; for text diffing there apparently needs to be a `* diff=annex` attribute and a `[diff "annex"] textconv=git-annex-output-content-instead-of-pointer` config.
    - even if the above works, I don't know how to temporarily switch this off without commenting out the `textconv`, e.g. with `git config --edit`. Sometimes you just want to see the actual hashes of the old and new file.
Maybe `git annex diffdriver` kind of does part of this, but I don't really understand what it actually does.
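
As a rough illustration of the `textconv` idea from the list above, here is a small sketch of what such a helper could look like. Everything about it is an assumption: the script name `annex-textconv.py` is made up, the pointer format (`/annex/objects/` followed by the key on the first line) is what I understand pointer files to contain, and the lookup relies on the `git annex contentlocation` plumbing command returning a path that is usable from the current working directory. It is not an existing git-annex feature.

```python
#!/usr/bin/env python3
# Hypothetical textconv helper. Assumed wiring (names follow the post above,
# the script name is made up):
#   .gitattributes:  * diff=annex
#   git config:      [diff "annex"] textconv = annex-textconv.py
import subprocess
import sys
from typing import Callable, Optional

POINTER_PREFIX = b"/annex/objects/"


def pointer_key(data: bytes) -> Optional[str]:
    """Return the key if data looks like an annex pointer, else None."""
    if not data.startswith(POINTER_PREFIX):
        return None
    first_line = data.split(b"\n", 1)[0]
    key = first_line[len(POINTER_PREFIX):].decode("utf-8", "replace")
    return key or None


def locate_with_git_annex(key: str) -> Optional[str]:
    """Ask `git annex contentlocation` for the content path (None if absent)."""
    result = subprocess.run(
        ["git", "annex", "contentlocation", key],
        capture_output=True, text=True,
    )
    path = result.stdout.strip()
    return path if result.returncode == 0 and path else None


def convert(data: bytes,
            locate: Callable[[str], Optional[str]] = locate_with_git_annex) -> bytes:
    """Replace pointer data with the annexed content when it is available."""
    key = pointer_key(data)
    if key is None:
        return data          # not a pointer: pass the file through unchanged
    path = locate(key)
    if path is None:
        return data          # content not present locally: show the pointer
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) == 2:
    # git invokes a textconv command with the path of the file to convert.
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.write(convert(f.read()))
```

Regarding switching this off temporarily: `git diff --no-textconv` is an existing git option that disables text conversion for a single invocation, so the pointers (and thus the hashes) would be shown again without editing the config.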
Here are other posts related to diffing:
- https://git-annex.branchable.com/forum/enabling_git-annex-diffdriver_for_gitk/
- https://git-annex.branchable.com/todo/--get_option_for_diffdriver/
What do you think?
Cheers, Yann
PS: Thank you very much for git-annex, it's awesome! I'm giving a git-annex workshop next weekend [@Tuebix](https://cfp.tuebix.org/tuebix-2023/talk/review/GWRP3UKE3VFKVDG8RNQ8ZZPCZPNZYYWM), really looking forward to it.