Joey Hess 2018-11-15 13:36:20 -04:00
parent 71cc9cfaa2
commit 05bfce7ca8
GPG key ID: DB12DB0FF05F8F38
2 changed files with 53 additions and 0 deletions


@@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 13"""
date="2018-11-15T17:06:39Z"
content="""
I don't think this bug can be tracked down without
rebuilding git-annex one way or another.

I could use the technique in
<https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell>
to instrument every place in git-annex that could deadlock.
This would be a lot of work, probably hundreds of lines of code changes.

It seems much easier to either find a way to reproduce the problem
on a machine other than the one where it occurs, or to bisect it.
"""]]


@@ -0,0 +1,37 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-11-15T16:50:16Z"
content="""
git diff is running the smudge --clean filter, and there are two different
cases when it comes to git-annex's filter.

If the user is for some reason using git to diff two files that are both
located outside the repository, the thing to do is clearly to pass them
both through the clean filter unchanged. That way the actual text of the
files will be diffed, not the pointer file the clean filter would usually
provide.

OTOH, the user may be diffing an annexed file located in the git repo
with a file outside the repo. If the clean filter generates a pointer for
the file inside the repo, and not for the one outside, then the diff between
the pointer and the actual content of the other file will not be useful.

Unfortunately, the interface doesn't let git-annex distinguish between
these scenarios, since the filter is run on one file at a time.
It also doesn't make much sense for the clean filter to actually ingest a
file located outside the repo into the annex.

So it seems the best that can be done is for the clean filter to
pass through the content of files located outside the repo. That at least
avoids error messages and unwanted ingestion, though it does not generate a
useful diff in the second case. This is the approach I've implemented.

Another way to look at it is that git diff displays a diff of the data that
actually gets checked into git. Diffing two worktree files that are both
annexed results in a not very useful diff between two pointer files. So
it's consistent that diffing a worktree file against an out-of-worktree
file results in a not very useful diff between a pointer file and the latter
file's content. This is also consistent with how git diff works
with git-annex symlinks.
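
To make the pass-through rule concrete, here is a rough Python sketch of the
decision. It is only an illustration, not git-annex's actual (Haskell)
implementation: `clean`, `is_inside` and `repo_root` are made-up names, the
key format is simplified, and ingestion into the annex is left out entirely.

```python
import hashlib
import os


def is_inside(path, repo_root):
    # True when path resolves to a location at or under repo_root.
    path = os.path.realpath(path)
    root = os.path.realpath(repo_root)
    return os.path.commonpath([path, root]) == root


def clean(path, content, repo_root):
    # Hypothetical stand-in for the clean filter: bytes in, bytes out.
    if not is_inside(path, repo_root):
        # Outside the repo: pass the content through unchanged, ingest nothing.
        return content
    # Inside the repo: emit a pointer in place of the content.
    # (A real filter would also ingest the content; omitted in this sketch.)
    digest = hashlib.sha256(content).hexdigest()
    key = "SHA256-s%d--%s" % (len(content), digest)  # simplified key format
    return ("/annex/objects/" + key + "\n").encode()
```

With this rule, diffing two outside files compares their real content, while
diffing an annexed file against an outside file compares a pointer with
content, which is the not very useful second case described above.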
"""]]