From c7309b30e9ba0a632358a917afae0b92cee7c034 Mon Sep 17 00:00:00 2001
From: nobodyinperson
Date: Mon, 26 Jun 2023 11:43:20 +0000
Subject: [PATCH 1/2]

---
 ...ffing_with___39__git_annex_diff__39__.mdwn | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 doc/todo/Straight-forward_diffing_with___39__git_annex_diff__39__.mdwn

diff --git a/doc/todo/Straight-forward_diffing_with___39__git_annex_diff__39__.mdwn b/doc/todo/Straight-forward_diffing_with___39__git_annex_diff__39__.mdwn
new file mode 100644
index 0000000000..5b2eb9e2ef
--- /dev/null
+++ b/doc/todo/Straight-forward_diffing_with___39__git_annex_diff__39__.mdwn
@@ -0,0 +1,27 @@
+Hi joey,
+
+Currently, getting a useful diff between annexed file versions is quite involved [(setting up git-annex diffdriver)](https://git-annex.branchable.com/forum/git-like_git-annex_diff/).
+
+It would be very nice if showing changes between annexed files were a little more straightforward, ideally without any user configuration needed. UI suggestions:
+
+- `git annex diff`: would behave exactly like `git diff`, but operate on both unannexed and annexed content
+    - ideally re-implementing all its options (e.g. `--word-diff`, `--word-diff-regex`, etc.)
+    - would need a diff implementation in Haskell (surely there is one)
+    - sounds complicated to do TBH
+- Teaching `git diff` to use the annexed content instead of the pointer links/files
+    - software like [`nbstripout`](https://github.com/kynan/nbstripout) passes the git-tracked contents through a filter before diffing. git-annex could presumably do the same to add straightforward `git diff` support without user configuration.
+    - git-annex already has a `* filter=annex` attribute in place; for text diffing, there apparently also needs to be a `* diff=annex` attribute and a `[diff "annex"] textconv=git-annex-output-content-instead-of-pointer` config.
+    - even if the above works, I don't know how to temporarily switch this off without commenting out the `textconv`, e.g. with `git config --edit`. Sometimes you just want to see the actual hashes of the old and new file.
+
+Maybe `git annex diffdriver` kind of does part of this, but I don't really understand what it actually does.
+
+Here are other posts related to diffing:
+
+- https://git-annex.branchable.com/forum/enabling_git-annex-diffdriver_for_gitk/
+- https://git-annex.branchable.com/todo/--get_option_for_diffdriver/
+
+What do you think?
+
+Cheers, Yann
+
+PS: Thank you very much for git-annex, it's awesome! I'm giving a git-annex workshop next weekend [@Tuebix](https://cfp.tuebix.org/tuebix-2023/talk/review/GWRP3UKE3VFKVDG8RNQ8ZZPCZPNZYYWM) and I'm really looking forward to it.

From 9112f0c184ca409536614d92c0fdb97061f79501 Mon Sep 17 00:00:00 2001
From: nobodyinperson
Date: Mon, 26 Jun 2023 14:08:32 +0000
Subject: [PATCH 2/2]

---
 ...Adressing_.gitattributes_inefficiency.mdwn | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 doc/todo/Adressing_.gitattributes_inefficiency.mdwn

diff --git a/doc/todo/Adressing_.gitattributes_inefficiency.mdwn b/doc/todo/Adressing_.gitattributes_inefficiency.mdwn
new file mode 100644
index 0000000000..72a394bf49
--- /dev/null
+++ b/doc/todo/Adressing_.gitattributes_inefficiency.mdwn
@@ -0,0 +1,19 @@
+## The Problem
+
+Apparently, `.gitattributes`-based configuration (e.g. of `numcopies`, `largefiles`, or `addunlocked`, the latter not even implemented because of this inefficiency) is slow, as every file needs to be queried individually for its attributes (`git check-attr` under the hood, I guess).
+
+## The Motivation
+
+From a user's perspective, `.gitattributes`-based configuration has several benefits over the `git annex config --set annex....` approach:
+
+- `.gitattributes` can differ between branches
+- `.gitattributes` lists file name patterns in a much more readable way, while e.g. `git annex config --set annex.largefiles 'include=*.txt and include=*.md and include=*.bla and mimetype shenanigans and largerthan and whatnot...'` gets confusing quickly.
+- `.gitattributes` nests well in subdirectories, enabling quite concise and fine-grained control (e.g. all files in THAT folder should be annexed, and if I delete the folder at some point, the rule goes away with it: my `git annex config --get annex.largefiles` won't stay cluttered with that path's config)
+
+Furthermore, DataLad [relies on `.gitattributes` configuration](https://git-annex.branchable.com/todo/annex.addunlocked_in_gitattributes/#comment-431d5040eac3b9a01d97724e25194f17) to specify the backend and e.g. the `text2git` procedure.
+
+## Suggestion
+
+Couldn't the [separate-git-tree-for-diffing technique you used recently to speed up repeated imports](https://git-annex.branchable.com/devblog/day_649-650__speeding_up_repeated_imports/) be used to cache `.gitattributes` for all (or relevant) files in a git tree (e.g. the same paths as in the worktree, but with the attributes as file contents)? Querying the attributes would then be a matter of querying this tree, and updating them would just require re-querying the touched paths.
+
+One problem I see with this, though, is that it wouldn't be possible to cache the user's `.git/info/attributes` settings, which can change independently.
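The batch-then-refresh idea from the second page can be illustrated with a minimal Python sketch. It assumes attributes are read through `git check-attr -z --stdin`, which reads NUL-separated paths from stdin and prints `<path> NUL <attribute> NUL <info> NUL` records, so one process can answer for many files. Whether git-annex itself queries attributes this way is not established here. `parse_check_attr_z`, `query_git` and `AttrCache` are hypothetical names, and an in-memory dict stands in for the proposed cache tree.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# Example attribute names; git-annex documents annex.largefiles and
# annex.numcopies as gitattributes, but this selection is arbitrary.
ATTRS = ["annex.largefiles", "annex.numcopies"]

# path -> {attribute name: value as printed by git check-attr}
Attrs = Dict[str, Dict[str, str]]


def parse_check_attr_z(output: bytes) -> Attrs:
    """Parse `git check-attr -z` output: <path> NUL <attr> NUL <info> NUL ..."""
    fields = output.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # the output ends with NUL, leaving one empty field
    result: Attrs = {}
    # Consume the fields in (path, attribute, info) triples.
    for i in range(0, len(fields) - 2, 3):
        path, attr, info = (f.decode() for f in fields[i:i + 3])
        result.setdefault(path, {})[attr] = info
    return result


def query_git(paths: List[str], attrs: List[str] = ATTRS) -> Attrs:
    """Ask a single `git check-attr` process about all paths at once."""
    if not paths:
        return {}
    stdin = b"".join(p.encode() + b"\0" for p in paths)
    proc = subprocess.run(
        ["git", "check-attr", "-z", "--stdin", *attrs],
        input=stdin,
        capture_output=True,
        check=True,
    )
    return parse_check_attr_z(proc.stdout)


class AttrCache:
    """Hypothetical cache: fill it with one batched query, then refresh
    only the paths reported as touched (a dict stands in for a git tree)."""

    def __init__(self, query: Callable[[List[str]], Attrs] = query_git):
        self._query = query
        self._attrs: Attrs = {}

    def fill(self, paths: Iterable[str]) -> None:
        self._attrs.update(self._query(list(paths)))

    def refresh(self, touched: Iterable[str]) -> None:
        # Re-query just the touched paths instead of the whole tree.
        self.fill(touched)

    def get(self, path: str, attr: str) -> str:
        # git check-attr reports "unspecified" for attributes that are not set.
        return self._attrs.get(path, {}).get(attr, "unspecified")
```

One caveat the sketch ignores: editing any `.gitattributes` file (or `.git/info/attributes`, as the page notes) can change the attributes of every path below it. In that case, "touched paths" would have to include all of those paths, not just the edited file.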