Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2017-02-17 12:31:47 -04:00
commit e93d4bfa85
No known key found for this signature in database
GPG key ID: C910D9222512E3C7
4 changed files with 173 additions and 0 deletions

@ -0,0 +1,45 @@
[[!comment format=mdwn
username="lasitus"
avatar="http://cdn.libravatar.org/avatar/dfe778f28027aeb75876172022aa5de3"
subject="comment 7"
date="2017-02-17T03:23:46Z"
content="""
Ok, I have a script that reproduces the error. It generates a repository and 30 GB of random binary files in many folders 2 layers deep. Just put it in an empty folder and run it with python. No remotes are necessary. This was run on Windows 10 in a git bash window.
```
#!/usr/bin/env python
import logging
import os
import subprocess
import uuid

logging.basicConfig(level=logging.DEBUG)

# Create a fresh repository below the current directory.
repositoryPath = os.path.abspath(\"./bigRepoTest\")
os.makedirs(repositoryPath)
# Argument lists (rather than command strings) also work outside Windows.
subprocess.call([\"git\", \"init\"], cwd=repositoryPath)
subprocess.call([\"git\", \"annex\", \"init\", \"pc\"], cwd=repositoryPath)


def makeRandomDirectories(level1FolderCount, level2FolderCount, fileCount):
    # Two levels of uuid-named folders, each leaf holding random 500000-byte files.
    for directoryIndex in range(0, level1FolderCount):
        logging.info(\"Adding top level folder \" + str(directoryIndex + 1) + \" of \" + str(level1FolderCount))
        newDirectory = os.path.join(repositoryPath, str(uuid.uuid1()))
        os.makedirs(newDirectory)
        for nestedIndex in range(0, level2FolderCount):
            newNestedDirectory = os.path.join(newDirectory, str(uuid.uuid1()))
            os.makedirs(newNestedDirectory)
            for fileIndex in range(0, fileCount):
                newFile = os.path.join(newNestedDirectory, str(uuid.uuid1()) + \".bin\")
                with open(newFile, 'wb') as fileOut:
                    fileOut.write(os.urandom(500000))


# First batch of files is created before the assistant starts.
makeRandomDirectories(32, 1000, 1)
with open(os.path.join(repositoryPath, \"assistant.log\"), 'w') as output:
    subprocess.Popen([\"git\", \"annex\", \"assistant\", \"--debug\"], cwd=repositoryPath, stdout=output, stderr=output)
# Second batch is created while the assistant is running.
makeRandomDirectories(32, 1000, 1)
subprocess.call([\"tail\", \"-f\", \"daemon.log\"], cwd=os.path.join(repositoryPath, \".git\", \"annex\"))
```
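
For scale, the numbers in the script can be turned into a rough size estimate before running anything. This is only arithmetic on the parameters used above (two calls of makeRandomDirectories(32, 1000, 1), 500000 bytes per file); the function name estimate_bytes is made up for this illustration.

```python
def estimate_bytes(level1, level2, files_per_dir, bytes_per_file=500000, calls=2):
    '''Total bytes written by `calls` runs of makeRandomDirectories.'''
    return level1 * level2 * files_per_dir * bytes_per_file * calls


total = estimate_bytes(32, 1000, 1)
# Number of files across both batches, and their combined size in decimal GB.
print('files: %d' % (32 * 1000 * 1 * 2))
print('size: %.1f GB' % (total / 1e9))
```

That comes to 64000 files and about 32 GB, so make sure the disk has room for it.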
"""]]

@ -0,0 +1,81 @@
### Please describe the problem.
In v6 mode, the result of 'git annex add' depends on having a good sleep before running it.
Without the sleep, git annex first manages to stage the file to be committed into git, but then also modifies it to be added into the annex (this is not shown in the output below -- just inspect the repository obtained without any sleep).
I guess this relates to http://git-annex.branchable.com/bugs/Too_difficult_if_not_impossible_to_explicitly_add__47__keep_file_under_git___40__not_annex__41___in_v6_without_employing_.gitattributes/
### What steps will reproduce the problem?
Run http://www.onerussian.com/tmp/ga-3.sh twice: once with 0 seconds of sleep, and then with 1 (about 0.3 might work as well)
### What version of git-annex are you using? On what operating system?
6.20170209+gitg16be7b5cc-1~ndall+1
### Please provide any additional information below.
If we just proceed through the script (init, add, status) without any delay, git annex status reports the file as modified (M):
[[!format sh """
$> ./ga-3.sh 0
+ s=0
++ mktemp -d
+ d=/home/yoh/.tmp/tmp.d6g0E7scxt
+ echo 'directory: /home/yoh/.tmp/tmp.d6g0E7scxt'
directory: /home/yoh/.tmp/tmp.d6g0E7scxt
+ cd /home/yoh/.tmp/tmp.d6g0E7scxt
+ git init
Initialized empty Git repository in /tmp/tmp.d6g0E7scxt/.git/
+ git annex init --version=6
init ok
(recording state in git...)
+ sed -i -e 's,pre-commit ,pre-commit --debug ,g' .git/hooks/pre-commit
+ echo 'I: creating a file'
I: creating a file
+ echo whatever
+ sleep 0
+ git -c annex.largefiles=nothing annex --debug add file5
[2017-02-17 10:19:48.91932971] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file5"]
add file5 (non-large file; adding content to git repository) ok
[2017-02-17 10:19:48.923428344] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file5"]
(recording state in git...)
[2017-02-17 10:19:48.927922289] feed: xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"]
[2017-02-17 10:19:48.956812867] process done ExitSuccess
+ git annex status
M file5
"""]]
And if we wait just a bit before running add, the file is reported as added (A):
[[!format sh """
hopa:~/.tmp
$> ./ga-3.sh 1
+ s=1
++ mktemp -d
+ d=/home/yoh/.tmp/tmp.4I7ym6dSx2
+ echo 'directory: /home/yoh/.tmp/tmp.4I7ym6dSx2'
directory: /home/yoh/.tmp/tmp.4I7ym6dSx2
+ cd /home/yoh/.tmp/tmp.4I7ym6dSx2
+ git init
Initialized empty Git repository in /tmp/tmp.4I7ym6dSx2/.git/
+ git annex init --version=6
init ok
(recording state in git...)
+ sed -i -e 's,pre-commit ,pre-commit --debug ,g' .git/hooks/pre-commit
+ echo 'I: creating a file'
I: creating a file
+ echo whatever
+ sleep 1
+ git -c annex.largefiles=nothing annex --debug add file5
[2017-02-17 10:19:52.529445464] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file5"]
add file5 (non-large file; adding content to git repository) ok
[2017-02-17 10:19:52.533532166] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file5"]
(recording state in git...)
[2017-02-17 10:19:52.537789158] feed: xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"]
[2017-02-17 10:19:52.567222419] process done ExitSuccess
+ git annex status
A file5
"""]]
[[!meta author=yoh]]
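
In case it helps to reproduce this without the shell script, here is a rough Python equivalent of the steps visible in the traces above. It is a sketch under assumptions: the redirect of `echo whatever` into file5 is assumed (set -x does not show redirections), the pre-commit hook edit is left out since it only adds debug output, and `build_steps`, `parse_status`, `run_once` and `main` are names made up here.

```python
import os
import shutil
import subprocess
import tempfile
import time


def build_steps(file_name='file5'):
    """Commands mirroring the trace: init, v6 annex init, add to git, status."""
    return [
        ['git', 'init'],
        ['git', 'annex', 'init', '--version=6'],
        ['git', '-c', 'annex.largefiles=nothing', 'annex', 'add', file_name],
        ['git', 'annex', 'status'],
    ]


def parse_status(output):
    """Map file name -> status letter from lines such as 'M file5'."""
    result = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            result[parts[1]] = parts[0]
    return result


def run_once(sleep_seconds, file_name='file5'):
    """Create the file, sleep, add it, and return its reported status letter."""
    repo = tempfile.mkdtemp()
    steps = build_steps(file_name)
    for cmd in steps[:2]:
        subprocess.check_call(cmd, cwd=repo)
    # Assumed equivalent of: echo whatever > file5
    with open(os.path.join(repo, file_name), 'w') as handle:
        handle.write('whatever\n')
    time.sleep(sleep_seconds)
    subprocess.check_call(steps[2], cwd=repo)
    out = subprocess.check_output(steps[3], cwd=repo, universal_newlines=True)
    return parse_status(out).get(file_name)


def main():
    if shutil.which('git') is None or shutil.which('git-annex') is None:
        print('git and git-annex are needed on PATH; nothing was run')
        return
    try:
        for delay in (0, 1):
            print('sleep %s -> status %s' % (delay, run_once(delay)))
    except (subprocess.CalledProcessError, OSError) as err:
        print('a step failed:', err)


if __name__ == '__main__':
    main()
```

With the behaviour described in this report, the run with sleep 0 would be expected to give M and the run with sleep 1 to give A, but that depends on the git-annex version and on timing.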

@ -0,0 +1,27 @@
Revisiting an issue I reported a couple of months ago but never figured out. I am trying to use git annex assistant on two separate machines to automatically mirror files between them. But after I start the second assistant and add new files to the annex, I find that git fsck reports dangling blobs. Is there a conflict between the two assistants?
On the server:

    $ mkdir ~/annex
    $ cd ~/annex
    $ git init
    $ git annex init u --version=6
    $ echo This is test file 1. >testfile1.txt
    $ git annex add testfile1.txt
    $ git annex sync
    $ git remote add ml2 ssh://laptop/Users/username/annex
    $ git annex assistant

After all that, I do this on the laptop:

    $ cd ~/
    $ git clone ssh://server/home/username/annex
    $ cd annex
    $ git annex init ml2 --version=6
    $ git annex sync
    $ git annex assistant

At this point git fsck is happy. But when I add files to the annex on either machine and run git fsck, I get messages like:

    Checking object directories: 100% (256/256), done.
    dangling blob 31a30177d1e37faf8eac96524302a61713d3d522
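
To see what those dangling objects actually are, something like the following could be used. It is only a sketch: `parse_dangling` and `describe_dangling` are hypothetical helper names, and it relies on plain `git fsck` and `git cat-file -s`, nothing specific to git-annex. As far as the git fsck documentation goes, "dangling" means the object is present in the object database but not referenced by anything.

```python
import re
import subprocess

# Matches lines such as: dangling blob 31a30177d1e37faf8eac96524302a61713d3d522
DANGLING_RE = re.compile(r'^dangling (\w+) ([0-9a-f]{40,64})$')


def parse_dangling(fsck_output):
    """Return (type, sha) pairs for every 'dangling <type> <sha>' line."""
    found = []
    for line in fsck_output.splitlines():
        match = DANGLING_RE.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def describe_dangling(repo_path):
    """Run git fsck in repo_path and print the size of each dangling object."""
    out = subprocess.check_output(['git', 'fsck'], cwd=repo_path,
                                  universal_newlines=True,
                                  stderr=subprocess.STDOUT)
    for obj_type, sha in parse_dangling(out):
        size = subprocess.check_output(['git', 'cat-file', '-s', sha],
                                       cwd=repo_path,
                                       universal_newlines=True).strip()
        print(obj_type, sha, size, 'bytes')


# Parsing the output quoted in this post, without touching any repository.
sample = '''Checking object directories: 100% (256/256), done.
dangling blob 31a30177d1e37faf8eac96524302a61713d3d522
'''
print(parse_dangling(sample))
```

Calling `describe_dangling` with the path of `~/annex` on either machine would then list each dangling object with its size, which may hint at whether they are leftovers of files the assistant staged.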

@ -0,0 +1,20 @@
ATM I am experiencing sporadic failures of the batched git annex addurl call: once in a while it reports failure (success: False), but then succeeds on a retry:
[[!format sh """
(Pdb) p url
'http://openneuro.s3.amazonaws.com/ds000001/ds000001_R1.1.0/uncompressed/sub016/BOLD/task001_run003/QA/QA_report.pdf?versionId=null'
(Pdb) p out_json
{u'note': u'from datalad', u'command': u'addurl', u'file': u'ds000001_R1.1.0/uncompressed/sub016/BOLD/task001_run003/QA/QA_report.pdf', u'success': False}
(Pdb) up
> /home/yoh/proj/datalad/datalad/datalad/support/gitrepo.py(210)newfunc()
-> return func(self, file_new, *args, **kwargs)
(Pdb) func(self, file_new, *args, **kwargs)
{u'note': u'from datalad', u'file': u'ds000001_R1.1.0/uncompressed/sub016/BOLD/task001_run003/QA/QA_report.pdf', u'command': u'addurl', u'key': u'MD5E-s1191419--cb4efab8104b5117f64b58ee6d6a79ba.pdf', u'success': True}
"""]]
Besides blindly re-running it, e.g. 3 times, and only then declaring total failure, I wondered if the json output could provide more information (if any is known) about the failure. E.g. if a custom remote crashed/errored (I guess that is the case here, given "from datalad"), what was the stderr/exit code of that process, or its ERROR msg? If it was wget, what was its stderr?
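
The retry workaround mentioned above could look roughly like this. It is a sketch, not datalad's actual code: `call_with_retries` is a made-up helper, and `func` stands for whatever performs one batched addurl request and returns the parsed JSON record with its `success` field.

```python
import time


def call_with_retries(func, attempts=3, delay=0.0):
    """Call func() until its result has success=True, at most `attempts` times.

    Returns (last_result, failures), where failures lists every failed record
    so that they can still be logged or inspected afterwards.
    """
    failures = []
    result = None
    for attempt in range(1, attempts + 1):
        result = func()
        if result.get('success'):
            return result, failures
        failures.append(result)
        if attempt < attempts and delay:
            time.sleep(delay)
    return result, failures


# Stand-in for the real batched addurl call: fails once, then succeeds.
responses = iter([
    {'command': 'addurl', 'file': 'QA_report.pdf', 'success': False},
    {'command': 'addurl', 'file': 'QA_report.pdf', 'success': True},
])
result, failures = call_with_retries(lambda: next(responses))
print(result['success'], len(failures))
```

Keeping the failed records around at least preserves whatever little the JSON says about each failure, which is where extra fields such as an error message or exit code would be useful.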
[[!meta name=yoh]]