Merge branch 'master' of ssh://git-annex.branchable.com
commit 2622fd3192
3 changed files with 47 additions and 0 deletions
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="209.250.56.55"
 subject="comment 1"
 date="2014-07-04T18:24:51Z"
 content="""
The diff-filter=T comes from the pass Command.Add runs to find unlocked files. It has finished adding all the files by that point, so it must be either that or the git-annex branch commit that is running out of memory, I think.
"""]]
@@ -0,0 +1,12 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="209.250.56.55"
 subject="comment 2"
 date="2014-07-04T18:36:49Z"
 content="""
It does not seem to be the diff-filter=T command that is the problem. It's not outputting a lot of files, and it should stream over them even if it did.

The last xargs appears to be at or near the problem. It is called by Annex.Content.saveState, which first does an Annex.Queue.flush, followed by an Annex.Branch.commit. I suspect the problem is the latter; at this point there are 2 million files in .git/annex/journal waiting to be committed to the git-annex branch.

In the same big repo, I can reproduce the problem by adding one more file with `git annex add $newfile`.
"""]]
@@ -0,0 +1,27 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="209.250.56.55"
 subject="comment 3"
 date="2014-07-04T19:26:00Z"
 content="""
Looking at the code, it's pretty clear why this is using a lot of memory:

<pre>
fs <- getJournalFiles jl
liftIO $ do
	h <- hashObjectStart g
	Git.UpdateIndex.streamUpdateIndex g
		[genstream dir h fs]
	hashObjectStop h
return $ liftIO $ mapM_ (removeFile . (dir </>)) fs
</pre>

So the big list in `fs` has to be retained in memory after the files are streamed to update-index, in order for them to be deleted!
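The shape of the problem can be sketched in miniature: an IO action that first streams a list and then returns a second action closing over the same list forces GHC to keep the entire list alive in between. This is a hypothetical reduction, not git-annex code; `streamOut` and `delete` stand in for the update-index stream and `removeFile`.

```haskell
-- Hypothetical reduction of the pattern above: the returned cleanup
-- action closes over the same list that was just streamed, so the
-- whole list must stay on the heap until the cleanup finally runs.
processJournal :: [FilePath] -> IO (IO ())
processJournal fs = do
    mapM_ streamOut fs          -- first traversal: feed update-index
    return (mapM_ delete fs)    -- second use of fs: forces retention
  where
    streamOut f = putStrLn ("streaming " ++ f)  -- stand-in for streamUpdateIndex
    delete f = putStrLn ("deleting " ++ f)      -- stand-in for removeFile

main :: IO ()
main = do
    cleanup <- processJournal ["f1", "f2"]
    putStrLn "branch commit happens here"
    cleanup  -- only after this can the list be garbage collected
```

With 2 million FilePaths in `fs`, that retained list alone is a substantial amount of memory.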
Fixing this is a bit tricky. New journal files can appear while this is going on, so it can't just run getJournalFiles a second time to get the list of files to clean up.

Maybe delete each file after it's been sent to git-update-index? But git-update-index is going to want to read the file, and we don't really know when it will choose to do so; it could potentially wait a while after we've sent it the filename.

Also, per [[!commit 750c4ac6c282d14d19f79e0711f858367da145e4]], we cannot delete the journal files until *after* the commit, or another git-annex process would see inconsistent data!

I actually think I am going to need to use a temp file to hold the list of files.
"""]]