move bug out of forum and close

Joey Hess 2014-03-06 17:18:35 -04:00
parent 941ea993dc
commit aa131c870a
10 changed files with 2 additions and 0 deletions

@ -0,0 +1,7 @@
I have a large direct-mode repository whose files I'm trying to copy to a non-direct-mode repository. Both repositories live on an HDD attached to an rpi.
When I do `git annex copy --to pi dirs/to/copy`, the copy starts out OK, but eventually many files fail to copy. The only diagnostic I get is "failed". Judging from the backscroll, I don't see a strong pattern to the files which fail to copy; they're kind of interspersed amongst files which were successfully copied. If I try to copy one of these failed files explicitly (`git annex copy --to pi file/which/failed`), this succeeds. I have plenty of free space on the disk.
Is there a way to get more diagnostics out of git annex so I can see why these files are failing to copy?
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmUJBh1lYmvfCCiGr3yrdx-QhuLCSRnU5c"
nickname="Justin"
subject="comment 1"
date="2014-03-05T16:11:27Z"
content="""
I tried `git annex sync --content`, and it failed to copy some files with:

    git: createProcess: resource exhausted (Resource temporarily unavailable)

So this sounds like fork is failing; I'm probably exhausting my poor pi's RAM. Maybe the same thing is happening for `git annex copy`. I'll run strace to see.
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 2"
date="2014-03-05T20:31:57Z"
content="""
How many files copied are we talking about before it begins to fail?
You can try passing --debug, which will make git-annex show every external command it runs, which includes `cp` for a copy to another repo on the same machine.
Might also check memory usage in top during the run.
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmUJBh1lYmvfCCiGr3yrdx-QhuLCSRnU5c"
nickname="Justin"
subject="comment 3"
date="2014-03-06T18:21:53Z"
content="""
> How many files copied are we talking about before it begins to fail?

Tens of thousands of files processed, but many of them were already on the other remote so didn't invoke `cp` (or anything else). ~3300 invocations of `cp`.
I saved a log of `ps aux`, and, while the memory used by git annex remains relatively constant, I do observe /tons/ of zombie processes. 3300, actually.
I didn't check all of them, but all of the zombie pids I checked appear to have corresponded to this command:

    /home/pi/git-annex.linux/shimmed/git/git --git-dir=/home/pi/hdd/annex/.git --work-tree=/home/pi/hdd/annex cat-file --batch

Perhaps git annex is forgetting to reap these processes?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 4"
date="2014-03-06T18:32:33Z"
content="""
Old versions of git-annex have known bugs involving zombies. What version?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmUJBh1lYmvfCCiGr3yrdx-QhuLCSRnU5c"
nickname="Justin"
subject="comment 5"
date="2014-03-06T18:35:00Z"
content="""
5.20140221-g1a47f5f -- I just downloaded it a week or two ago.
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 6"
date="2014-03-06T18:38:43Z"
content="""
Hmm, that version should only start `git cat-file --batch` a maximum of 10 times (if it is crashing for some reason), and appears to wait on the process if it does crash. And if not, it should only start one.
I think you need to post some `git-annex --debug` output, to show when it's running this command.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 7"
date="2014-03-06T18:40:26Z"
content="""
Actually, NM, I have reproduced the bug.
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 8"
date="2014-03-06T20:21:16Z"
content="""
Analysis: Remote.Git's onLocal calls Annex.new to make a new AnnexState for the local remote. This state is not cached, and is regenerated for each file. Since it runs an Annex.Branch check of the location log on the remote, it needs to start catFile, and since the state is not reused, a new CatFileHandle is allocated each time. I'm not sure, but there may have been a recent-ish change that caused the location log to get checked, and so catFile to be run; the general inefficiency of making a new AnnexState each time is not new.
Fixing this by caching the AnnexState will not only fix the resource leak, but should speed up local to local copies significantly!
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 9"
date="2014-03-06T21:17:14Z"
content="""
Fixed in git. Also reduced the non-data-transfer work done by `git-annex copy` by around 8%.
I'm going to move this thread to [[bugs]] so I can close it. ;)
"""]]