finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.
What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.
So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.
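
A minimal sketch of the kind of index entry involved, assuming the update is fed to `git update-index --index-info` (the command the todo in the diff below suggests). That command reads lines of the form `<mode> SP <sha1> TAB <path>`, and entries supplied this way carry no stat or inode data, which is why the next git status has to re-check the file through the clean filter. The helper name `index_info_line` and the example values are hypothetical, not taken from git-annex.

```python
import re


def index_info_line(mode: str, sha: str, path: str) -> str:
    """Format one entry in the '<mode> SP <sha1> TAB <path>' form
    accepted by `git update-index --index-info`.

    Hypothetical helper for illustration; not git-annex's own code.
    """
    if not re.fullmatch(r"[0-7]{6}", mode):
        raise ValueError(f"unexpected mode: {mode!r}")
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"expected a 40-character hex SHA-1, got {sha!r}")
    if "\n" in path or "\t" in path:
        # Without -z, such paths would need quoting; keep the sketch simple.
        raise ValueError("path contains a tab or newline")
    return f"{mode} {sha}\t{path}\n"


# Example values only: the SHA below is a placeholder, not a real blob.
example = index_info_line("100644", "0" * 40, "big.iso")
```

The resulting lines would be written to the standard input of `git update-index --index-info`, one per affected file.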
This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.
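
The batching idea can be sketched as follows. This is only an illustration of a size-limited queue, not git-annex's implementation (which is written in Haskell); the class name `UpdateQueue`, its parameters, and the flush callback are all assumptions made for the example. The `max_size` argument plays the role that annex.queuesize plays in the text above.

```python
from typing import Callable, List


class UpdateQueue:
    """Collect pending index updates and flush them in batches.

    Hypothetical sketch: `flush_fn` stands in for whatever actually
    runs the index update on a batch of entries.
    """

    def __init__(self, max_size: int, flush_fn: Callable[[List[str]], None]) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.flush_fn = flush_fn
        self.pending: List[str] = []

    def add(self, entry: str) -> None:
        """Queue one entry, flushing once the queue reaches max_size."""
        self.pending.append(entry)
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        """Hand all pending entries to flush_fn; do nothing if empty."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.flush_fn(batch)


# Usage: seven files with a queue size of three give three index updates.
batches: List[List[str]] = []
queue = UpdateQueue(3, batches.append)
for i in range(7):
    queue.add(f"file{i}")
queue.flush()  # flush the remainder at the end of the run
```

A larger queue size means fewer index updates over a long run, at the cost of files showing as modified for longer before their entries are flushed.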
This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.
This commit was supported by the NSF-funded DataLad project.
parent 06fd4657db
commit 48e9e12961
6 changed files with 76 additions and 102 deletions
@ -1,41 +0,0 @
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2018-08-09T20:18:46Z"
content="""
One of v6's big problems is that dropping or getting an annexed file
updates the file in the working tree, which makes git status think
the file is modified, even though the clean filter will output
the same pointer as before. Runing `git add` to clear it up is quite
expensive since the large file content has to be read.
Maybe a long-running filter process could avoid this problem.

----

If git can be coaxed somehow into re-running the smudge filter,
git-annex could provide the new worktree content to git via it,
and let git update the working tree.

Git would make a copy, which git-annex currently does, so the only
added overhead would be sending the file content to git down the pipe.
(Well and it won't use reflink for the copy on COW filesystems.)

annex.thin is a complication, but it could be handled by hard linking the
work tree file that git writes back into the annex, overwriting the file that
was there. (This approach could also make git checkout of a branch honor
annex.thin.)

How to make git re-run the smudge filter? It needs to want to update the
working tree. One way is to touch the worktree files and then run
`git checkout`. Although this risks losing modifications the user made
to the files so would need to be done with care.

That seems like it would defer working tree updates until the git-annex
get command was done processing all files. Sometimes I want to use a
file while the same get command is still running for other files.
It might work to use the "delay" capability of the filter process
interface. Get git to re-smudge all affected files, and when it
asks for content for each, send "delayed". Then as git-annex gets
each file, respond to git's "list_available_blobs" with a single blob,
which git should request and use to update the working tree.
"""]]

@ -12,39 +12,39 @@ git-annex should use smudge/clean filters.
# because it doesn't know it has that name
# git commit clears up this mess

* Dropping a smudged file causes git status (and git annex status)
to show it as modified, because the timestamp has changed.
Getting a smudged file can also cause this.
Upgrading a direct mode repo also leaves files in this state.
User can use `git add` to clear it up, but better to avoid this,
by updating stat info in the index.
* If an annexed file's content is not present, and its pointer file
is copied to a new name and added, it does not get added as an
associated file. (If the content is present, it does get added.)

May need to use libgit2 to do this efficiently, cannot find
any plumbing except git-update-index, which is very inneficient for
smudged files; updating a file feeds its whole content through the clean
filter again.
* If an unlocked file's content is not present, and a new file with
identical content is added with `git add`, the unlocked file is
populated, but git-annex is unable to update the index, so git status
will say that it has been modified.

Part of the problem is that the clean filter needs to consume the whole
of stdin. (And git has to write the whole file content to stdout from the
file it mmaps). A more efficient smudge/clean interface that let the filter
read the file itself would let git-annex short-circuit when the file it's
cleaning is one it already knows about. I've proposed extending git with
such an interface:
<http://git.661346.n2.nabble.com/proposal-for-extending-smudge-clean-filters-with-raw-file-access-td7656150.html>
* If an annexed file is deleted in the index, and another annexed file
uses the same key, and git annex get/drop is run, the index update
that's done to prevent status showing the file as modified adds
the deleted file back to the index.

And developed a patch set: [[git-patches]]
* Also, if the user is getting files, and modifying files at the same
time, and they stage their modifications, the modification may get
unstaged in a race when a file is got and the updated worktree file
staged in the index.

I don't know if this is worth worrying about,
because there's also of course a race where the modification to the
worktree file may get reverted when git-annex updates the content. Those
races are much smaller, but do exist.

> Thanks to [[!commit a96972015dd76271b46432151e15d5d38d7151ff]],
> the clean filter is now very quick, so, this can be fixed by running
> git update-index with files affected by get/drop.
>
> In case a file's content quickly changes after get/drop, git
> update-index would add the new content. To avoid this, use
> `git update-index --index-info`. The next run of `git status`
> then runs the clean filter, and will detect if the file has gotten
> modified after the get/drop. TODO
* get/drop operations on unlocked files lead to an update of the index.
Only one process can update the index at one time, so eg, git annex get
at the same time as a git commit may display a ugly warning
(or the git commit could fail to start if run at just the right time).

* Use git's new `filter.<driver>.process` interface, which will
Two git-annex get processes can also try to update the index at the
same time and encounter this problem (git annex get -J is ok).

* Potentially: Use git's new `filter.<driver>.process` interface, which will
let only 1 git-annex process be started by git when processing
multiple files, and so should be faster.

@ -66,12 +66,8 @@ git-annex should use smudge/clean filters.
git-annex adjust and git-annex sync could both use that internally
when checking out the adjusted branch, and merging a branch into HEAD.

Since this approach modifies work tree files, it again causes git status
to think files are modified. So, the above todo item about that needs to
be sorted out first; it would not do for git annex adjust to cause
the whole work tree to be considered to be modified!

My enhanced smudge/clean patch set also fixes this problem.
(My enhanced smudge/clean patch set also fixed this problem, in a much
nicer way...)

* When git runs the smudge filter, it buffers all its output in ram before
writing it to a file. So, checking out a branch with a large v6 unlocked files