move old closed bugs to datalad's bugs
These were not tagged datalad, but they did pay to get them fixed, and it impacted them as seen in the bug reports..
This commit is contained in:
parent
aab79dded8
commit
acdd5fbab6
53 changed files with 0 additions and 0 deletions
|
@ -1,138 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
edit: I said "lustre" but it is isilon which is under. My wrong/outdated/mixed up knowledge of the system
|
||||
|
||||
Finally we got what should be a normal "POSIX" NFS mount (no ACL) on a local HPC. I immediately ventured to test git-annex (and datalad) on that file system. git annex 8.20210803-g99bb214 and git 2.30.2 both coming from conda.
|
||||
|
||||
with basic operations -- it felt like it works ok.
|
||||
but then when I ran tests while pointing TMPDIR to this location, e.g. `TMPDIR=$PWD git annex test 2>&1 | tee git-annex-noretry+pidlock1-1.log` - `git-annex` is far from a happy camper.
|
||||
|
||||
Note: `tmp` path below is actually a `/dartfs/rc/lab/D/DBIC/DBIC/archive/tmp`
|
||||
|
||||
<details>
|
||||
<summary>Not sure if related or not but git-annex upon init seems to not detect the need for pidlock </summary>
|
||||
|
||||
```shell
|
||||
[d31548v@discovery7 tmp]$ ( PS4='[$(date)] > '; set -x; mkdir test; cd test; git init; git annex --debug init; cat .git/config; )
|
||||
[Wed Aug 11 16:14:03 EDT 2021] > mkdir test
|
||||
[Wed Aug 11 16:14:03 EDT 2021] > cd test
|
||||
[Wed Aug 11 16:14:04 EDT 2021] > git init
|
||||
Initialized empty Git repository in /dartfs/rc/lab/D/DBIC/DBIC/archive/tmp/test/.git/
|
||||
[Wed Aug 11 16:14:04 EDT 2021] > git annex --debug init
|
||||
init [2021-08-11 16:14:04.936362] (Utility.Process) process [10184] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2021-08-11 16:14:04.982236] (Utility.Process) process [10184] done ExitFailure 1
|
||||
[2021-08-11 16:14:04.984294] (Utility.Process) process [10185] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--verify","-q","refs/remotes/origin/git-annex"]
|
||||
[2021-08-11 16:14:05.019706] (Utility.Process) process [10185] done ExitFailure 1
|
||||
[2021-08-11 16:14:05.029391] (Utility.Process) process [10187] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
|
||||
[2021-08-11 16:14:05.118706] (Utility.Process) process [10187] done ExitSuccess
|
||||
[2021-08-11 16:14:05.121008] (Utility.Process) process [10189] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","4b825dc642cb6eb9a060e54bf8d69288fbee4904","--no-gpg-sign"]
|
||||
[2021-08-11 16:14:05.197601] (Utility.Process) process [10189] done ExitSuccess
|
||||
[2021-08-11 16:14:05.218776] (Utility.Process) process [10190] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","4570bef1d852f540b8bb1bcf7aa778cf48d8743a"]
|
||||
[2021-08-11 16:14:05.342053] (Utility.Process) process [10190] done ExitSuccess
|
||||
[2021-08-11 16:14:05.380591] (Utility.Process) process [10191] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","annex.uuid","4f1069df-9aa9-4e72-92dc-b1f5ac6946d5"]
|
||||
[2021-08-11 16:14:05.428772] (Utility.Process) process [10191] done ExitSuccess
|
||||
[2021-08-11 16:14:05.432081] (Utility.Process) process [10193] read: git ["config","--null","--list"]
|
||||
[2021-08-11 16:14:05.502382] (Utility.Process) process [10193] done ExitSuccess
|
||||
[2021-08-11 16:14:05.635239] (Utility.Process) process [10202] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
|
||||
[2021-08-11 16:14:05.698023] (Utility.Process) process [10202] done ExitSuccess
|
||||
[2021-08-11 16:14:05.699988] (Utility.Process) process [10203] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2021-08-11 16:14:05.734066] (Utility.Process) process [10203] done ExitSuccess
|
||||
[2021-08-11 16:14:05.741208] (Utility.Process) process [10204] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..4570bef1d852f540b8bb1bcf7aa778cf48d8743a","--pretty=%H","-n1"]
|
||||
[2021-08-11 16:14:05.794383] (Utility.Process) process [10204] done ExitSuccess
|
||||
[2021-08-11 16:14:05.827348] (Utility.Process) process [10207] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
|
||||
[2021-08-11 16:14:05.832331] (Utility.Process) process [10208] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
|
||||
[2021-08-11 16:14:05.912843] (Utility.Process) process [10212] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","annex.version","8"]
|
||||
[2021-08-11 16:14:06.014334] (Utility.Process) process [10212] done ExitSuccess
|
||||
[2021-08-11 16:14:06.017332] (Utility.Process) process [10213] read: git ["config","--null","--list"]
|
||||
[2021-08-11 16:14:06.09542] (Utility.Process) process [10213] done ExitSuccess
|
||||
[2021-08-11 16:14:06.097328] (Utility.Process) process [10215] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","status","--porcelain"]
|
||||
[2021-08-11 16:14:06.199386] (Utility.Process) process [10215] done ExitSuccess
|
||||
[2021-08-11 16:14:06.201524] (Utility.Process) process [10217] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","filter.annex.smudge","git-annex smudge -- %f"]
|
||||
[2021-08-11 16:14:06.259174] (Utility.Process) process [10217] done ExitSuccess
|
||||
[2021-08-11 16:14:06.262019] (Utility.Process) process [10218] read: git ["config","--null","--list"]
|
||||
[2021-08-11 16:14:06.332769] (Utility.Process) process [10218] done ExitSuccess
|
||||
[2021-08-11 16:14:06.334786] (Utility.Process) process [10221] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","filter.annex.clean","git-annex smudge --clean -- %f"]
|
||||
[2021-08-11 16:14:06.396427] (Utility.Process) process [10221] done ExitSuccess
|
||||
[2021-08-11 16:14:06.39987] (Utility.Process) process [10224] read: git ["config","--null","--list"]
|
||||
[2021-08-11 16:14:06.454542] (Utility.Process) process [10224] done ExitSuccess
|
||||
[2021-08-11 16:14:06.871684] (Utility.Process) process [10228] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
|
||||
[2021-08-11 16:14:06.923312] (Utility.Process) process [10228] done ExitSuccess
|
||||
[2021-08-11 16:14:06.975588] (Utility.Process) process [10231] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","refs/heads/master"]
|
||||
[2021-08-11 16:14:07.015031] (Utility.Process) process [10231] done ExitFailure 1
|
||||
[2021-08-11 16:14:07.017599] (Utility.Process) process [10233] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
|
||||
[2021-08-11 16:14:07.056731] (Utility.Process) process [10233] done ExitSuccess
|
||||
[2021-08-11 16:14:07.059379] (Utility.Process) process [10234] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/master"]
|
||||
[2021-08-11 16:14:07.100443] (Utility.Process) process [10234] done ExitFailure 1
|
||||
[2021-08-11 16:14:07.102482] (Utility.Process) process [10236] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","checkout","-q","-B","master"]
|
||||
[2021-08-11 16:14:07.157509] (Utility.Process) process [10236] done ExitSuccess
|
||||
[2021-08-11 16:14:07.208637] (Utility.Process) process [10238] read: uname ["-n"]
|
||||
[2021-08-11 16:14:07.209729] (Utility.Process) process [10238] done ExitSuccess
|
||||
[2021-08-11 16:14:07.292876] (Annex.Branch) read uuid.log
|
||||
[2021-08-11 16:14:07.392149] (Annex.Branch) set uuid.log
|
||||
[2021-08-11 16:14:07.393272] (Annex.Branch) read remote.log
|
||||
ok
|
||||
[2021-08-11 16:14:07.536729] (Utility.Process) process [10244] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","hash-object","-w","--stdin-paths","--no-filters"]
|
||||
[2021-08-11 16:14:07.54018] (Utility.Process) process [10245] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-index","-z","--index-info"]
|
||||
[2021-08-11 16:14:07.760348] (Utility.Process) process [10245] done ExitSuccess
|
||||
[2021-08-11 16:14:07.76284] (Utility.Process) process [10246] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2021-08-11 16:14:07.800127] (Utility.Process) process [10246] done ExitSuccess
|
||||
(recording state in git...)
|
||||
[2021-08-11 16:14:07.805187] (Utility.Process) process [10247] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
|
||||
[2021-08-11 16:14:07.948696] (Utility.Process) process [10247] done ExitSuccess
|
||||
[2021-08-11 16:14:07.950889] (Utility.Process) process [10248] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","2ebb8bbc83b47982a995f66975d4265986f005b8","--no-gpg-sign","-p","refs/heads/git-annex"]
|
||||
[2021-08-11 16:14:08.022394] (Utility.Process) process [10248] done ExitSuccess
|
||||
[2021-08-11 16:14:08.024889] (Utility.Process) process [10252] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","83fb8ae08e2e0ea030f3ce75aa840d27838859b6"]
|
||||
[2021-08-11 16:14:08.172372] (Utility.Process) process [10252] done ExitSuccess
|
||||
[2021-08-11 16:14:08.349523] (Utility.Process) process [10207] done ExitSuccess
|
||||
[2021-08-11 16:14:08.35006] (Utility.Process) process [10208] done ExitSuccess
|
||||
[2021-08-11 16:14:08.352912] (Utility.Process) process [10244] done ExitSuccess
|
||||
[Wed Aug 11 16:14:08 EDT 2021] > cat .git/config
|
||||
[core]
|
||||
repositoryformatversion = 0
|
||||
filemode = true
|
||||
bare = false
|
||||
logallrefupdates = true
|
||||
[annex]
|
||||
uuid = 4f1069df-9aa9-4e72-92dc-b1f5ac6946d5
|
||||
version = 8
|
||||
[filter "annex"]
|
||||
smudge = git-annex smudge -- %f
|
||||
clean = git-annex smudge --clean -- %f
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
|
||||
so I did set it to `1` globally (in `~/.gitconfig`). Only later discovered that it should be `true` not `1` but it nohow seems affected git-annex operation in the tests.
|
||||
|
||||
Full log from tests run is available at [http://www.onerussian.com/tmp/git-annex-noretry+pidlock1-1.log](http://www.onerussian.com/tmp/git-annex-noretry+pidlock1-1.log)
|
||||
|
||||
and a common pattern something like
|
||||
|
||||
```
|
||||
fsck downloaded object: OK (0.02s)
|
||||
retrieveKeyFile resume from 0: FAIL
|
||||
Exception: .git/annex/tmp/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key: openFile: permission denied (Permission denied)
|
||||
fsck downloaded object: OK (0.02s)
|
||||
retrieveKeyFile resume from 33%: FAIL
|
||||
Exception: .git/annex/objects/1X/90/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
|
||||
fsck downloaded object: OK (0.03s)
|
||||
retrieveKeyFile resume from end: cp: cannot stat '.git/annex/objects/1X/90/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key': No such file or directory
|
||||
FAIL (0.28s)
|
||||
./Command/TestRemote.hs:292:
|
||||
failed
|
||||
```
|
||||
|
||||
I thought that may be it is due to running into the path limit, but I could touch a much (twice) longer path no problem and longest filename of 257 characters can be created.
|
||||
|
||||
details of the mount with minimal sensoring with XXXs:
|
||||
|
||||
```
|
||||
/ifs/rc/dartfs/rc on /dartfs/rc type nfs4 (rw,nosuid,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=XXX.XXX.XXX.XXX,local_lock=none,addr=10.1.16.105)
|
||||
```
|
||||
|
||||
any ideas Joey? or how to troubleshoot on this beast further?
|
||||
|
||||
[[!meta title="357 out of 984 tests fail on NFS isilon mount"]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,20 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 10"
|
||||
date="2021-08-27T01:38:54Z"
|
||||
content="""
|
||||
ha -- I have mentioned that there is `--no-preserve` so I have tried it with -a to exclude xattr and it seems to do the trick as well
|
||||
|
||||
```
|
||||
(datalad) [d31548v@discovery7 _test_]$ rm -f bar baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -a --no-preserve=xattr bar baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
|
||||
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:36 baz
|
||||
```
|
||||
|
||||
so may be just worth explicitly excluding xattr instead of explicitly including some of the others? ;)
|
||||
"""]]
|
|
@ -1,44 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 11"""
|
||||
date="2021-08-27T16:41:45Z"
|
||||
content="""
|
||||
I've made the change to use that option. I think this should clear the test
|
||||
suite.
|
||||
|
||||
I see that Isilon is a NAS, so some proprietary and broken NFS server
|
||||
apparently. Kind of explains why cp -a would do this, because I found
|
||||
threads about cp -a preserving NFS xattrs, and people wanted it to,
|
||||
presumably for other reasons on less broken NFS servers.
|
||||
|
||||
This does raise the question of what might happen if someone copies
|
||||
a file themselves with cp -a on this NFS and then git-annex adds the copy.
|
||||
Probably git-annex would then be unable to remove the write bit
|
||||
from the annex object file.
|
||||
|
||||
I also worried about ACLS, but it seems like ACLs do not have this
|
||||
problem, because chmod a-w causes the write ACL that was set to be
|
||||
effectively unset:
|
||||
|
||||
joey@darkstar:~>chmod -w foo
|
||||
joey@darkstar:~>setfacl -m g:nogroup:rw foo
|
||||
joey@darkstar:~>ls -l foo
|
||||
-r--rw-r--+ 1 joey joey 0 Aug 27 12:53 foo
|
||||
nobody@darkstar:/home/joey$ echo test >> foo
|
||||
joey@darkstar:~>chmod a-w foo
|
||||
joey@darkstar:~>ls -l foo
|
||||
-r--rw-r--+ 1 joey joey 0 Aug 27 12:53 foo
|
||||
nobody@darkstar:/home/joey$ echo test >> foo
|
||||
bash: foo: Permission denied
|
||||
joey@darkstar:~>getfacl foo
|
||||
# file: foo
|
||||
# owner: joey
|
||||
# group: joey
|
||||
user::r--
|
||||
group::r--
|
||||
group:nogroup:rw- #effective:r--
|
||||
mask::r--
|
||||
other::r--
|
||||
|
||||
I've verified that git-annex add also clears the write ACL.
|
||||
"""]]
|
|
@ -1,12 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 12"""
|
||||
date="2021-08-27T17:13:54Z"
|
||||
content="""
|
||||
I've also made git-annex add check, after removing write bits,
|
||||
if the file still has write bits set. It will refuse to add a file
|
||||
when it can't lock it down.
|
||||
|
||||
That should avoid the NFS xattr problem in a situation where
|
||||
cp -a was used to make a copy that then gets added to the annex.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 13"
|
||||
date="2021-08-27T18:46:39Z"
|
||||
content="""
|
||||
Great, thank you, I will test after the next daily build. What would be the easiest way to establish such an xattr enabled NFS setup, so we could tune up [our test on NFS scenarios](https://github.com/datalad/git-annex/blob/master/.github/workflows/build-ubuntu.yaml#L221) to catch any possible future regression?
|
||||
"""]]
|
|
@ -1,18 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 14"""
|
||||
date="2021-08-30T15:59:29Z"
|
||||
content="""
|
||||
It seems likely to me that this NFS server is just broken. Why would a
|
||||
well-designed NFS server behave this way, rather than letting a chmod
|
||||
override the xattr that was set earlier?
|
||||
|
||||
I considered filing a bug report on coreutils or something about cp
|
||||
preserving the NFS xattr leading to this problem, but since it seems likely
|
||||
to be the NFS server on the NAS that is at fault, didn't think that would
|
||||
be productive.
|
||||
|
||||
If you did manage to reproduce that behavior with a regular
|
||||
linux NFS server, it seems like it would be a grounds for a bug on some
|
||||
part of linux..
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="just an update on my testing"
|
||||
date="2021-08-30T22:17:53Z"
|
||||
content="""
|
||||
FWIW, I've tested a debianstadalone build of `8.20210803+git153-g63c2cfd47-1~ndall+1` on that system, and besides 3 fails due to no `--kill` in elderly system wide gpgconf (apparently debian bundle does not come with one) -- it is all clean! (although took nearly a full work day to complete on that filesystem: `3 out of 984 tests failed (18157.38s)`)
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="note on `sec=` on the mount"
|
||||
date="2021-08-16T21:38:13Z"
|
||||
content="""
|
||||
Just now, while comparing with operation on another mount point (on NFS v3, not 4 though), noted that this mount point is `sec=krb5` (and NFS 3 one is `sec=sys`)... so might be what is contributing here
|
||||
"""]]
|
|
@ -1,42 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2021-08-24T16:01:30Z"
|
||||
content="""
|
||||
|
||||
The "retrieveKeyFile resume from 0%" test
|
||||
starts by writing an empty file with default permissions.
|
||||
It then does the download storing to that file.
|
||||
|
||||
The permission denied is happening when trying to open that file
|
||||
later, to write to it. So this seems like a umask or some other
|
||||
kind of permissions problem. Maybe filesystem related in some way,
|
||||
if the filesystem is somehow not honoring permissions, but that would
|
||||
not be my first or second guess.
|
||||
|
||||
The other 2 failures shown do chain on from that
|
||||
initial failure, since that test does some things that the
|
||||
later tests depend on. Looking through the log, there are several other
|
||||
test failures that also seem due to permissions, that all look like this,
|
||||
or probably chain from this:
|
||||
|
||||
add: FAIL (4.08s)
|
||||
./Test/Framework.hs:324:
|
||||
unable to modify sha1foo
|
||||
|
||||
And that error message happens when it's checking that the file
|
||||
permissions include the write bit.
|
||||
|
||||
So, your next step should be something like:
|
||||
|
||||
umask
|
||||
touch foo
|
||||
ls -l foo
|
||||
echo foo >> foo
|
||||
|
||||
I tried with `umask 0222` (new files lack any write bit) but it failed a lot
|
||||
earlier, while setting up the test suite. Which seems ok itself given how
|
||||
restricted and weird that umask is. If the test suite overrode the umask, it
|
||||
would not notice if some actual reasonable umask caused problems. But I'm
|
||||
getting ahead of confirmation that it is a umask problem.
|
||||
"""]]
|
|
@ -1,19 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 3"
|
||||
date="2021-08-24T18:06:10Z"
|
||||
content="""
|
||||
```
|
||||
[d31548v@discovery7 _test_]$ umask
|
||||
0022
|
||||
[d31548v@discovery7 _test_]$ touch foo
|
||||
[d31548v@discovery7 _test_]$ ls -l foo
|
||||
-rw-r--r-- 1 d31548v rc-DBIC-bids 0 Aug 24 13:53 foo
|
||||
[d31548v@discovery7 _test_]$ echo foo >> foo
|
||||
|
||||
```
|
||||
|
||||
I blindly think it is somehow due to that `sec=krb5` nfs mount option. didn't try to reproduce the setup \"locally\" (would need all the krb config I guess) to see if I am right on that.
|
||||
|
||||
"""]]
|
|
@ -1,15 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-08-24T18:51:41Z"
|
||||
content="""
|
||||
Somehow the write bit is not being set. Since the umask is ok, the next
|
||||
most likely thing seems to be NFS either not setting it, or temporarily
|
||||
forgetting it's set, or the file somehow being created owned by a different
|
||||
user.
|
||||
|
||||
Could run: `git-annex test --keep-failures` and then when it gets
|
||||
to one of the "unable to modify foo" failures, it will print out the path
|
||||
to the repository that it failed in, and you can "ls -l" the path,
|
||||
which should include the "foo" file.
|
||||
"""]]
|
|
@ -1,22 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 5"
|
||||
date="2021-08-25T22:05:06Z"
|
||||
content="""
|
||||
```
|
||||
[d31548v@discovery7 _test_]$ grep -A1 -m1 'unable to modify foo' git-annex-8.20210803-g99bb214-20210925-1.log
|
||||
unable to modify foo
|
||||
export_import_subdir: ** Preserving repo for failure analysis in .t/tmprepo5
|
||||
|
||||
[d31548v@discovery7 _test_]$ ls -la .t/tmprepo5
|
||||
total 239
|
||||
drwxr-xr-x 4 d31548v rc-DBIC-bids 112 Aug 25 10:59 .
|
||||
drwxr-xr-x 205 d31548v rc-DBIC-bids 5525 Aug 25 11:31 ..
|
||||
-rw-r--r-- 1 d31548v rc-DBIC-bids 19 Aug 25 10:58 bar.c
|
||||
drwxr-xr-x 2 d31548v rc-DBIC-bids 0 Aug 25 10:59 dir
|
||||
-r--r--r-- 1 d31548v rc-DBIC-bids 20 Aug 25 10:58 foo
|
||||
drwxr-xr-x 8 d31548v rc-DBIC-bids 296 Aug 25 10:59 .git
|
||||
-r--r--r-- 1 d31548v rc-DBIC-bids 25 Aug 25 10:58 sha1foo
|
||||
```
|
||||
"""]]
|
|
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 6"""
|
||||
date="2021-08-26T15:06:26Z"
|
||||
content="""
|
||||
So confirmed to be actual missing write bits.
|
||||
|
||||
I think that the next step is going to be to dig through a `strace -f -s 1024`,
|
||||
of the "resume from 0" file that has the permission denied problem,
|
||||
and of the ".t/tmprepo5/foo" file that is somehow not writable.
|
||||
|
||||
Then I can compare with a strace where it does not fail, to see
|
||||
if it's doing something different.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 7"
|
||||
date="2021-08-26T16:49:56Z"
|
||||
content="""
|
||||
details on what I have ran and logs went via email in response to your last email response ;)
|
||||
"""]]
|
|
@ -1,66 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 8"""
|
||||
date="2021-08-26T20:18:05Z"
|
||||
content="""
|
||||
From the strace of `git-annex test -p export_import_subdir`,
|
||||
I have found what I think is the smoking gun:
|
||||
|
||||
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
|
||||
21179 chmod(".git/annex/othertmp/foo.0/foo", 0100644) = 0
|
||||
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
|
||||
21179 rename(".git/annex/othertmp/foo.0/foo", "./foo") = 0
|
||||
|
||||
This seems like proof that NFS is losing the file permissions. chmod 644
|
||||
followed immediately by stat showing the mode is still 444, with nothing
|
||||
happening in between according to strace.
|
||||
|
||||
The full set of operations on that temp file is:
|
||||
|
||||
21179 unlink(".git/annex/othertmp/foo.0/foo") = -1 ENOENT (No such file or directory)
|
||||
21179 vfork( <unfinished ...>
|
||||
21217 execve("/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/bin/cp", ["cp", "--reflink=auto", "-a", ".git/annex/objects/Kj/0x/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77", ".git/annex/othertmp/foo.0/foo"], 0x7fff09e296f8 /* 77 vars */ <unfinished ...>
|
||||
21179 <... vfork resumed>) = 21217
|
||||
21217 stat(".git/annex/othertmp/foo.0/foo", 0x7ffc13f0d430) = -1 ENOENT (No such file or directory)
|
||||
21217 newfstatat(AT_FDCWD, ".git/annex/othertmp/foo.0/foo", 0x7ffc13f0d1b0, 0) = -1 ENOENT (No such file or directory)
|
||||
21217 openat(AT_FDCWD, ".git/annex/othertmp/foo.0/foo", O_WRONLY|O_CREAT|O_EXCL, 0400) = 4
|
||||
21217 fstat(4, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
|
||||
21217 ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
|
||||
21217 write(4, "annexed file content", 20) = 20
|
||||
21217 utimensat(4, NULL, [{tv_sec=1629996169, tv_nsec=810295000} /* 2021-08-26T12:42:49.810295000-0400 */, {tv_sec=1629996169, tv_nsec=810295000} /* 2021-08-26T12:42:49.810295000-0400 */], 0 <unfinished ...>
|
||||
21217 <... utimensat resumed>) = 0
|
||||
21217 fchmod(4, 0600) = 0
|
||||
21217 fsetxattr(4, "system.nfs4_acl", "\0\0\0\3\0\0\0\0\0\0\0\0\0\26\1\211\0\0\0\6OWNER@\0\0\0\0\0\0\0\0\0\0\0\22\0\211\0\0\0\6GROUP@\0\0\0\0\0\0\0\0\0\0\0\22\0\211\0\0\0\tEVERYONE@\0\0", 80, 0) = 0
|
||||
21217 fchmod(4, 0400) = 0
|
||||
21217 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\4\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = -1 EOPNOTSUPP (Operation not supported)
|
||||
21217 fchmod(4, 0100444) = 0
|
||||
21217 close(4) = 0
|
||||
21179 stat(".git/annex/othertmp/foo.0/foo", <unfinished ...>
|
||||
21179 <... stat resumed>{st_mode=S_IFREG|0444, st_size=20, ...}) = 0
|
||||
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
|
||||
21179 chmod(".git/annex/othertmp/foo.0/foo", 0100644) = 0
|
||||
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
|
||||
21179 rename(".git/annex/othertmp/foo.0/foo", "./foo") = 0
|
||||
|
||||
So, cp -a copies the object file to the temp file, leaving it 444 at the end,
|
||||
as it should be since the object file had that mode. And the following
|
||||
chmod 644 has no effect.
|
||||
|
||||
That `nfs4_acl` seems like it contains effectively the permissions of the file,
|
||||
which were 444 at that point. Could that xattr override the later chmod?
|
||||
|
||||
It would be worth trying this:
|
||||
|
||||
# use runshell to get the same cp that git-annex uses
|
||||
/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
|
||||
echo bar > bar
|
||||
chmod 444 bar
|
||||
cp --reflink=auto -a bar baz
|
||||
chmod 644 baz
|
||||
ls -l baz
|
||||
|
||||
If preserving the `nfs4_acl` is indeed the problem, then a fix in git-annex
|
||||
might be possible. (Even though this feels like a NFS bug)
|
||||
Replace `cp -a` with `cp -dR --preserve=mode,ownership,timestamps`
|
||||
so omitting the xattr preservation.
|
||||
"""]]
|
|
@ -1,67 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 9"
|
||||
date="2021-08-27T01:34:18Z"
|
||||
content="""
|
||||
not sure if related, but I noticed you said to use `/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell` which corresponds to the git-annex I have available upon login, but the tests were ran using git annex installed in a conda env `8.20200502-g55acb2e52` `~/.conda/envs/datalad/bin/git-annex`, which is a \"proper\" (not standalone) build and lacks `runshell`, and I believe would just use system wide `cp`.
|
||||
|
||||
# in that /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
|
||||
|
||||
```
|
||||
(datalad) [d31548v@discovery7 _test_]$ /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
|
||||
$ pwd
|
||||
/dartfs/rc/lab/D/DBIC/DBIC/inbox/BIDS/_test_
|
||||
$ echo bar > bar
|
||||
chmod 444 bar
|
||||
cp --reflink=auto -a bar baz
|
||||
chmod 644 baz
|
||||
ls -l baz$ $ $ $
|
||||
-r--r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:27 baz
|
||||
|
||||
```
|
||||
|
||||
now outside using default cp -- the same
|
||||
|
||||
```shell
|
||||
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -a bar baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
|
||||
-r--r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:29 baz
|
||||
```
|
||||
|
||||
and indeed we do get a different result, with `w`, if running under a regular `/tmp`:
|
||||
|
||||
```
|
||||
(datalad) [d31548v@discovery7 tmp]$ pwd
|
||||
/tmp
|
||||
(datalad) [d31548v@discovery7 tmp]$ echo bar > bar
|
||||
(datalad) [d31548v@discovery7 tmp]$ chmod 444 bar
|
||||
(datalad) [d31548v@discovery7 tmp]$ cp --reflink=auto -a bar baz
|
||||
(datalad) [d31548v@discovery7 tmp]$ chmod 644 baz
|
||||
(datalad) [d31548v@discovery7 tmp]$ ls -l baz
|
||||
-rw-r--r-- 1 d31548v rc-users 4 Aug 26 21:29 baz
|
||||
```
|
||||
|
||||
Now, using the alternative to `-a` setting -- retains the `w` in both `runshell` and outside:
|
||||
|
||||
```
|
||||
(datalad) [d31548v@discovery7 _test_]$ /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
|
||||
$ echo bar > bar
|
||||
chmod 444 bar
|
||||
cp --reflink=auto -dR --preserve=mode,ownership,timestamps bar baz
|
||||
chmod 644 baz
|
||||
ls -l baz$ $ $ $
|
||||
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:32 baz
|
||||
$
|
||||
(datalad) [d31548v@discovery7 _test_]$ rm -f bar baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
|
||||
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -dR --preserve=mode,ownership,timestamps bar baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
|
||||
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
|
||||
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:32 baz
|
||||
```
|
||||
"""]]
|
|
@ -1,19 +0,0 @@
|
|||
### Please describe the problem.
|
||||
|
||||
With recent RFing of scanning for unlocked/annexed files (I guess), a sweep of datalad tests on OSX started to take about 3h 30min instead of prior 1h 46min. So pretty much twice. Besides possibly affecting user experience, I am afraid that would cause too much ripples though our CI setup which might not run out of time
|
||||
|
||||
Logs etc are at https://github.com/datalad/git-annex/actions/workflows/build-macos.yaml
|
||||
|
||||
The first red is ok, just a fluke but then they all fail due to change in output log string (for which there is a fix but somehow behavior on osx seems different, yet to check).
|
||||
|
||||
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
Currently 8.20210428+git282-gd39dfed2a and first got slow with
|
||||
8.20210428+git228-g13a6bfff4 and was ok with 8.20210428+git202-g9a5981a15
|
||||
|
||||
[[!meta title="performance edge case when adding large numbers of identical files"]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="slow down is OSX specific"
|
||||
date="2021-06-08T14:28:18Z"
|
||||
content="""
|
||||
Thank you Joey for all the OPTs.
|
||||
Looking at it again: the significant impact seems to be OSX builds specific: Linux and Windows builds did get just a bit slower (if at all). So may be situation is not that dire ;-) Two possibilities:
|
||||
|
||||
- Even though coincidences like this are unlikely, it is possible that OSX VMs on Github actions got slower?
|
||||
- may be the RFing is somehow effecting OSX specifically?
|
||||
|
||||
I will try to do some timings locally. Also I should finally get those logs (and built packages) into some more convenient form using [tinuous](https://github.com/con/tinuous/) John has been developing -- would allow for easy greps instead of all these jumping through CIs etc.
|
||||
"""]]
|
|
@ -1,28 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 11"""
|
||||
date="2021-06-08T15:41:23Z"
|
||||
content="""
|
||||
I can't think of anything OSX specific in the recent changes.
|
||||
|
||||
I have added debugging of when reconcileStaged wakes up and possibly
|
||||
wastes some time, eg:
|
||||
|
||||
joey@darkstar:~/tmp/big> git config annex.debug true
|
||||
joey@darkstar:~/tmp/big> git config annex.debugfilter Database.Keys
|
||||
joey@darkstar:~/tmp/big> git-annex init
|
||||
init
|
||||
[2021-06-08 11:52:11.854202926] (Database.Keys) reconcileStaged start
|
||||
(scanning for annexed files...)
|
||||
[2021-06-08 11:52:44.092620256] (Database.Keys) reconcileStaged end
|
||||
ok
|
||||
|
||||
All the new work happens in between those two debugs[1], so you could check if
|
||||
the time sink is there or elsewhere.
|
||||
|
||||
(Note that the last release takes 2 seconds longer for that init than
|
||||
it does now..)
|
||||
|
||||
[1] With the exception of a single call to `git write-tree`, but that
|
||||
should be very fast.
|
||||
"""]]
|
|
@ -1,153 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="all recent builds/logs are fetched to smaug"
|
||||
date="2021-06-08T16:50:11Z"
|
||||
content="""
|
||||
Finished fetching all the builds (logs + artifacts, i.e. built installers and packages) to smaug. FWIW `joey` user on smaug should be able to access them at /mnt/datasets/datalad/ci/git-annex as well (it is all git/git-annex with submodules happen you want to get a \"clone\" ;))
|
||||
|
||||
<details>
|
||||
<summary>grep confirms timing for datalad testson OSX</summary>
|
||||
|
||||
```shell
|
||||
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*macOS*/*_test-datalad\ \(master\).txt
|
||||
2021-05-01T05:47:43.6623300Z Ran 1269 tests in 10381.041s
|
||||
2021-05-02T04:27:47.4468150Z Ran 1269 tests in 5835.199s
|
||||
2021-05-03T04:25:06.4741780Z Ran 1269 tests in 5817.953s
|
||||
2021-05-04T04:26:25.2725400Z Ran 1269 tests in 5925.938s
|
||||
2021-05-05T04:15:45.7241110Z Ran 1270 tests in 6003.093s
|
||||
2021-05-06T04:14:18.9203140Z Ran 1270 tests in 5821.084s
|
||||
2021-05-07T04:19:12.5257430Z Ran 1270 tests in 5887.912s
|
||||
2021-05-08T04:17:21.4790320Z Ran 1270 tests in 5775.691s
|
||||
2021-05-09T04:20:53.2085730Z Ran 1270 tests in 5961.437s
|
||||
2021-05-10T04:21:52.2492300Z Ran 1270 tests in 5698.905s
|
||||
2021-05-11T04:28:39.1451570Z Ran 1270 tests in 5899.918s
|
||||
2021-05-12T04:28:26.6181110Z Ran 1270 tests in 5964.851s
|
||||
2021-05-13T04:30:47.2978230Z Ran 1277 tests in 5938.297s
|
||||
2021-05-14T04:37:51.3172970Z Ran 1277 tests in 5921.571s
|
||||
2021-05-15T04:31:29.6466840Z Ran 1277 tests in 6060.762s
|
||||
2021-05-16T04:33:26.5552670Z Ran 1277 tests in 5789.405s
|
||||
2021-05-17T04:36:52.4460760Z Ran 1277 tests in 5855.382s
|
||||
2021-05-18T04:37:24.5613580Z Ran 1278 tests in 6007.685s
|
||||
2021-05-19T04:34:37.3253830Z Ran 1279 tests in 5940.204s
|
||||
2021-05-20T04:36:10.3184010Z Ran 1280 tests in 5941.388s
|
||||
2021-05-21T04:41:10.2664330Z Ran 1280 tests in 6218.775s
|
||||
2021-05-22T04:40:39.2754540Z Ran 1289 tests in 5884.267s
|
||||
2021-05-23T04:37:51.5005200Z Ran 1289 tests in 5750.672s
|
||||
2021-05-24T04:38:24.5894000Z Ran 1289 tests in 5911.655s
|
||||
2021-05-25T04:51:35.2189100Z Ran 1289 tests in 6266.836s
|
||||
2021-05-26T06:56:33.6660300Z Ran 1293 tests in 12584.992s
|
||||
2021-05-27T07:08:27.4015580Z Ran 1293 tests in 11901.552s
|
||||
2021-05-28T07:39:17.3481450Z Ran 1292 tests in 12094.343s
|
||||
2021-05-29T08:05:14.3586440Z Ran 1294 tests in 12205.434s
|
||||
2021-05-30T08:09:33.0780990Z Ran 1294 tests in 12028.089s
|
||||
2021-05-31T08:16:36.3910830Z Ran 1294 tests in 12329.455s
|
||||
2021-06-01T08:42:24.3167200Z Ran 1294 tests in 12115.378s
|
||||
2021-06-02T08:42:23.1985610Z Ran 1294 tests in 12432.309s
|
||||
2021-06-03T08:00:08.2576030Z Ran 1294 tests in 11556.974s
|
||||
2021-06-04T07:25:39.1674160Z Ran 1294 tests in 11946.195s
|
||||
2021-06-05T07:14:00.5262620Z Ran 1294 tests in 12456.432s
|
||||
2021-06-06T06:53:14.7001370Z Ran 1294 tests in 11677.647s
|
||||
2021-06-07T07:12:57.5076610Z Ran 1294 tests in 12042.332s
|
||||
2021-06-08T06:48:01.3977250Z Ran 1294 tests in 12126.463
|
||||
```
|
||||
|
||||
Interestingly the top one is in 10k, but prior ones were ok
|
||||
|
||||
```shell
|
||||
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/04/*/cron-*/*/*macOS*/*_test-datalad\ \(master\).txt
|
||||
2021-04-27T03:58:30.4829510Z Ran 1265 tests in 6312.819s
|
||||
2021-04-28T03:44:23.1655040Z Ran 1265 tests in 5622.562s
|
||||
2021-04-29T03:50:14.0670840Z Ran 1269 tests in 6196.774s
|
||||
2021-04-30T04:21:45.1727310Z Ran 1269 tests in 5879.034s
|
||||
```
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>and absence of the drastic effect confirmed for linux and windows</summary>
|
||||
|
||||
```shell
|
||||
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*buntu*/*_test-datalad\ \(master\).txt
|
||||
2021-05-01T04:16:34.1539917Z Ran 1269 tests in 3999.905s
|
||||
2021-05-02T04:17:04.5604428Z Ran 1269 tests in 3745.951s
|
||||
2021-05-03T04:22:46.1508135Z Ran 1269 tests in 4133.703s
|
||||
2021-05-04T04:22:19.3612212Z Ran 1269 tests in 4341.995s
|
||||
2021-05-05T04:23:04.3327570Z Ran 1270 tests in 4383.268s
|
||||
2021-05-06T04:07:33.2967674Z Ran 1270 tests in 3457.688s
|
||||
2021-05-07T04:18:26.7140893Z Ran 1270 tests in 3778.162s
|
||||
2021-05-08T04:13:13.9041377Z Ran 1270 tests in 3728.503s
|
||||
2021-05-09T04:16:59.6844666Z Ran 1270 tests in 3898.714s
|
||||
2021-05-11T04:21:45.5447035Z Ran 1270 tests in 4032.457s
|
||||
2021-05-12T04:20:48.6349049Z Ran 1270 tests in 3940.342s
|
||||
2021-05-13T04:26:53.8376010Z Ran 1277 tests in 4130.863s
|
||||
2021-05-14T04:37:41.5689917Z Ran 1277 tests in 4476.880s
|
||||
2021-05-15T04:20:12.0126613Z Ran 1277 tests in 4007.927s
|
||||
2021-05-16T04:30:50.0329536Z Ran 1277 tests in 4077.010s
|
||||
2021-05-17T04:25:03.3302428Z Ran 1277 tests in 4034.791s
|
||||
2021-05-18T04:33:23.5513354Z Ran 1278 tests in 4422.760s
|
||||
2021-05-19T04:19:47.5664435Z Ran 1279 tests in 3910.031s
|
||||
2021-05-20T04:13:45.4617209Z Ran 1280 tests in 3508.342s
|
||||
2021-05-21T04:23:56.1103373Z Ran 1280 tests in 3946.563s
|
||||
2021-05-22T04:30:52.1649178Z Ran 1289 tests in 4406.863s
|
||||
2021-05-23T04:35:53.5526207Z Ran 1289 tests in 3818.054s
|
||||
2021-05-24T04:33:44.9770098Z Ran 1289 tests in 4069.751s
|
||||
2021-05-25T04:38:28.6369074Z Ran 1289 tests in 3809.039s
|
||||
2021-05-26T05:16:56.6913239Z Ran 1293 tests in 4340.973s
|
||||
2021-05-27T05:21:33.5230620Z Ran 1293 tests in 4303.062s
|
||||
2021-05-28T05:44:02.6763227Z Ran 1292 tests in 4013.921s
|
||||
2021-05-29T06:05:40.8639499Z Ran 1294 tests in 3581.055s
|
||||
2021-05-30T06:17:23.4265104Z Ran 1294 tests in 4257.736s
|
||||
2021-05-31T06:36:09.6446865Z Ran 1294 tests in 4782.606s
|
||||
2021-06-01T06:57:26.6049829Z Ran 1294 tests in 4391.030s
|
||||
2021-06-02T06:59:17.8345547Z Ran 1294 tests in 4737.580s
|
||||
2021-06-03T06:10:51.5533496Z Ran 1294 tests in 3453.557s
|
||||
2021-06-04T05:44:15.7873867Z Ran 1294 tests in 4113.770s
|
||||
2021-06-05T05:25:15.5789949Z Ran 1294 tests in 4383.014s
|
||||
2021-06-06T05:20:48.7371175Z Ran 1294 tests in 4386.210s
|
||||
2021-06-07T05:33:24.8643085Z Ran 1294 tests in 4303.855s
|
||||
2021-06-08T05:03:38.3420188Z Ran 1294 tests in 4371.387s
|
||||
```
|
||||
|
||||
```shell
|
||||
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*indows*/*_test-datalad\ \(master\).txt
|
||||
2021-05-01T05:29:21.7469495Z Ran 1250 tests in 4904.324s
|
||||
2021-05-02T05:33:35.7321469Z Ran 1250 tests in 4835.792s
|
||||
2021-05-03T05:32:36.9342468Z Ran 1250 tests in 5040.210s
|
||||
2021-05-04T05:18:53.5090364Z Ran 1250 tests in 4508.693s
|
||||
2021-05-05T05:29:53.0946195Z Ran 1251 tests in 4989.032s
|
||||
2021-05-06T05:26:05.6627667Z Ran 1251 tests in 4766.005s
|
||||
2021-05-07T05:19:54.3788683Z Ran 1251 tests in 4305.963s
|
||||
2021-05-08T05:32:49.9787816Z Ran 1251 tests in 4999.028s
|
||||
2021-05-09T05:38:03.4560444Z Ran 1251 tests in 5482.509s
|
||||
2021-05-10T05:34:58.3502335Z Ran 1251 tests in 5067.805s
|
||||
2021-05-11T05:29:07.7916431Z Ran 1251 tests in 4797.435s
|
||||
2021-05-12T05:23:10.5488731Z Ran 1251 tests in 4276.426s
|
||||
2021-05-13T05:25:09.1884039Z Ran 1258 tests in 4275.409s
|
||||
2021-05-14T05:39:20.0295201Z Ran 1258 tests in 4999.023s
|
||||
2021-05-15T05:31:48.8021685Z Ran 1258 tests in 4736.174s
|
||||
2021-05-16T05:33:12.0136713Z Ran 1258 tests in 4310.326s
|
||||
2021-05-17T05:30:27.7704400Z Ran 1258 tests in 4561.060s
|
||||
2021-05-18T05:31:34.3746293Z Ran 1259 tests in 4426.723s
|
||||
2021-05-19T05:31:11.8604863Z Ran 1260 tests in 4903.506s
|
||||
2021-05-20T05:22:56.6087010Z Ran 1261 tests in 4299.536s
|
||||
2021-05-21T05:28:25.7511300Z Ran 1261 tests in 4498.286s
|
||||
2021-05-22T05:44:54.2975226Z Ran 1270 tests in 5376.292s
|
||||
2021-05-23T05:35:29.5504579Z Ran 1270 tests in 4289.031s
|
||||
2021-05-24T05:33:40.4688984Z Ran 1270 tests in 4202.429s
|
||||
2021-05-25T05:45:30.0310577Z Ran 1270 tests in 4882.292s
|
||||
2021-05-26T06:03:08.4980635Z Ran 1274 tests in 4441.783s
|
||||
2021-05-27T06:05:25.0347805Z Ran 1274 tests in 4381.865s
|
||||
2021-05-28T06:35:10.3614775Z Ran 1273 tests in 4529.720s
|
||||
2021-05-29T07:10:28.5980900Z Ran 1275 tests in 4639.335s
|
||||
2021-05-30T07:01:37.6908114Z Ran 1275 tests in 5123.743s
|
||||
2021-05-31T07:01:41.6742606Z Ran 1275 tests in 4672.112s
|
||||
2021-06-01T07:36:38.0359361Z Ran 1275 tests in 5183.674s
|
||||
2021-06-02T07:21:46.3022239Z Ran 1275 tests in 4610.975s
|
||||
2021-06-03T07:20:14.4589015Z Ran 1275 tests in 5795.353s
|
||||
2021-06-04T06:44:13.4142157Z Ran 1275 tests in 4574.473s
|
||||
2021-06-08T06:10:00.4654735Z Ran 1275 tests in 5406.636s
|
||||
```
|
||||
</details>
|
||||
|
||||
now will try to do timing on a local OSX to see if it is indeed just some coincidence with OSX VM getting slower or it is \"real\".
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 13"
|
||||
date="2021-06-08T19:08:01Z"
|
||||
content="""
|
||||
One other test for “some coincidence with OSX VM getting slower“ is to try the current github CI test with an older git-annex.
|
||||
"""]]
|
|
@ -1,28 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="getting closer..."
|
||||
date="2021-06-08T19:21:59Z"
|
||||
content="""
|
||||
I think I have localized the slowdown to a single particular test in datalad which operates on a very heavy tree with tiny files.
|
||||
Good and bad runs:
|
||||
|
||||
```
|
||||
*$> grep -h -A3 'datalad.support.tests.test_annexrepo.test_files_split(' builds/2021/05/*/cron-*/44196064/Build\ git-annex\ on\ macOS-29{2,3}-failed/1_test-datalad\ \(master\).txt
|
||||
2021-05-25T04:34:39.7723910Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.gitrepo.GitRepo'>,) ... ok
|
||||
2021-05-25T04:39:31.3031220Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,) ... ok
|
||||
2021-05-25T04:39:31.3032670Z datalad.support.tests.test_annexrepo.test_get_size_from_key ... ok
|
||||
2021-05-25T04:39:31.3043440Z datalad.support.tests.test_annexrepo.test_done_deprecation ... ok
|
||||
2021-05-25T04:39:31.3104830Z datalad.support.tests.test_ansi_colors.test_color_enabled ... ok
|
||||
--
|
||||
2021-05-26T05:01:12.6881120Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.gitrepo.GitRepo'>,) ... ok
|
||||
2021-05-26T06:47:04.8547640Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,) ... ok
|
||||
2021-05-26T06:47:04.8549600Z datalad.support.tests.test_annexrepo.test_get_size_from_key ... ok
|
||||
2021-05-26T06:47:04.8559760Z datalad.support.tests.test_annexrepo.test_done_deprecation ... ok
|
||||
2021-05-26T06:47:04.8636720Z datalad.support.tests.test_ansi_colors.test_color_enabled ... ok
|
||||
|
||||
```
|
||||
you can see from timestamps (a guess github prepends time stamp AFTER getting full line) that there is over 1h30m spent there on `test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,)`. [Here is the actual test etc for posterity](https://github.com/datalad/datalad/blob/master/datalad/support/tests/test_annexrepo.py#L2354). Yet to pin point more specifically on what is going on but most likely some interplay with command line length invocation limits (specific to OSX) etc.
|
||||
|
||||
So good news is that it is not some widely spread drastic slow-down effect as far as I see it.
|
||||
"""]]
|
|
@ -1,10 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 15"
|
||||
date="2021-06-08T20:23:09Z"
|
||||
content="""
|
||||
ok -- I think it (or at least a part of it, datalad test is still running) boils down to `git-annex add` now doing some `O(n-staged * n-cmdline-paths)` lookup/operation (instead of before just `O(n-cmdline-paths)`) whenever we have a series of `annex add --json cmdline-paths ...`. This is reflected by the fact that if before we had about `~30 sec` per each invocation of `annex add`, now we have `30 284 528 720`.
|
||||
|
||||
In any case in datalad we should finally switch to use `annex add --batch`, filed [an issue](https://github.com/datalad/datalad/issues/5721), but I guess may be it could also be addressed on git-annex side since sounds like some suboptimal data structure is used for some paths' matching.
|
||||
"""]]
|
|
@ -1,33 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 16"""
|
||||
date="2021-06-08T20:56:50Z"
|
||||
content="""
|
||||
This is starting to make some sense. If you're running git-annex add
|
||||
N times adding M files each time, then each run will now diff the
|
||||
changes in the index made by the previous run.
|
||||
|
||||
And the first part of diffing the index is generating a tree from all the
|
||||
files in it, which is to some extent `O(N*M)` (though IDK, git may have
|
||||
optimisations involving subtrees or such). So the combined N git-annex add
|
||||
runs come to `O(N*N*M)`
|
||||
|
||||
On linux, `git write-tree` with 100,000 files in the index runs in under 1
|
||||
second, so athe `N*M` is not too bad. And then there's the overhead of
|
||||
git-annex processing the resulting diff, which takes more time but is what
|
||||
I've been optimising.
|
||||
|
||||
Perhaps on OSX something is making the write-tree significantly slower.
|
||||
Or something is making it run the command more with fewer files per run.
|
||||
Although IIRC OSX has close to the same maximum command line length as
|
||||
linux.
|
||||
|
||||
Or maybe the index diffing is diffing from the wrong start point.
|
||||
One way I can think of where this would happen is if
|
||||
it somehow misdetects the index as being locked.
|
||||
|
||||
A --debug trace of the one of the later git-annex add runs in that
|
||||
test case would probably shed some useful light.
|
||||
|
||||
Yes, --batch should avoid the problem...
|
||||
"""]]
|
|
@ -1,93 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 17"""
|
||||
date="2021-06-09T18:53:14Z"
|
||||
content="""
|
||||
Trying to repro on linux, batches of 1000 files each time:
|
||||
|
||||
joey@darkstar:~/tmp/bench6>/usr/bin/time git-annex add 1??? --quiet
|
||||
7.17user 5.02system 0:11.33elapsed 107%CPU (0avgtext+0avgdata 69480maxresident)k
|
||||
6880inputs+40344outputs (4major+404989minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 2??? --quiet
|
||||
7.75user 5.63system 0:12.62elapsed 106%CPU (0avgtext+0avgdata 70296maxresident)k
|
||||
3640inputs+42656outputs (9major+414106minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 3??? --quiet
|
||||
8.04user 5.74system 0:12.92elapsed 106%CPU (0avgtext+0avgdata 70396maxresident)k
|
||||
1456inputs+44200outputs (8major+414767minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 4??? --quiet
|
||||
8.66user 5.57system 0:13.45elapsed 105%CPU (0avgtext+0avgdata 69620maxresident)k
|
||||
768inputs+45768outputs (11major+416364minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 5??? --quiet
|
||||
8.83user 5.58system 0:13.92elapsed 103%CPU (0avgtext+0avgdata 69956maxresident)k
|
||||
5448inputs+47128outputs (45major+415820minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 6??? --quiet
|
||||
9.00user 5.49system 0:13.60elapsed 106%CPU (0avgtext+0avgdata 70340maxresident)k
|
||||
20864inputs+48768outputs (270major+418220minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 7??? --quiet
|
||||
9.23user 5.56system 0:13.90elapsed 106%CPU (0avgtext+0avgdata 70352maxresident)k
|
||||
12736inputs+50024outputs (492major+417882minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 8??? --quiet
|
||||
9.27user 5.62system 0:13.89elapsed 107%CPU (0avgtext+0avgdata 70236maxresident)k
|
||||
11320inputs+51816outputs (411major+419672minor)pagefaults 0swaps
|
||||
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 9??? --quiet
|
||||
9.33user 5.80system 0:14.04elapsed 107%CPU (0avgtext+0avgdata 70380maxresident)k
|
||||
10952inputs+53128outputs (281major+419771minor)pagefaults 0swaps
|
||||
|
||||
There's some growth here, but it seems linear to the number of new files in the git index.
|
||||
|
||||
Doing the same with the last git-annex release:
|
||||
|
||||
0:10.97elapsed, 0:11.24elapsed, 0:11.65elapsed, 0:11.90elapsed,
|
||||
0:12.25elapsed, 0:12.86elapsed, 0:12.74elapsed, 0:12.84elapsed,
|
||||
0:13.26elapsed
|
||||
|
||||
So close to the same, the slowdown feels minimal.
|
||||
|
||||
Then on OSX:
|
||||
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 1??? --quiet
|
||||
18.26 real 6.03 user 8.35 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 2??? --quiet
|
||||
26.68 real 6.63 user 9.08 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 3??? --quiet
|
||||
32.13 real 6.75 user 9.09 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 4??? --quiet
|
||||
34.49 real 7.11 user 9.40 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 5??? --quiet
|
||||
34.61 real 7.08 user 9.18 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 6??? --quiet
|
||||
36.66 real 7.20 user 9.31 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 7??? --quiet
|
||||
38.29 real 7.35 user 9.42 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 8??? --quiet
|
||||
35.85 real 7.39 user 9.57 sys
|
||||
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 9??? --quiet
|
||||
36.20 real 7.56 user 9.59 sys
|
||||
|
||||
Well, this does not seem exponential but it's certainly growing
|
||||
faster than linux.
|
||||
|
||||
Then tried on OSX with 100 chunks of 100 files each:
|
||||
|
||||
datalads-imac:bench joey$ for x in $(seq 10 99); do echo $x; PATH=$HOME:$PATH /usr/bin/time ~/git-annex add $x?? --quiet; done
|
||||
...
|
||||
23
|
||||
3.91 real 0.83 user 1.17 sys
|
||||
24
|
||||
3.31 real 0.83 user 1.25 sys
|
||||
25
|
||||
3.32 real 0.83 user 1.26 sys
|
||||
...
|
||||
80
|
||||
5.68 real 1.05 user 1.60 sys
|
||||
81
|
||||
6.18 real 1.06 user 1.60 sys
|
||||
82
|
||||
7.06 real 1.06 user 1.59 sys
|
||||
|
||||
There's more overhead because git-annex has to start up and diff
|
||||
the index trees, which takes longer than processing the 100 files
|
||||
in the chunk. But this is not taking hours to run, so either the
|
||||
test case involves a *lot* more than 10k files, or there is
|
||||
something else involved.
|
||||
"""]]
|
|
@ -1,12 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="OSX mystery resolved. add --batch is effective mitigation"
|
||||
date="2021-06-08T21:56:53Z"
|
||||
content="""
|
||||
> Perhaps on OSX something is making the write-tree significantly slower. Or something is making it run the command more with fewer files per run.
|
||||
|
||||
The latter would be my guess... We seems to get 2 vs 5 splits on Linux vs OSX... our `datalad.utils.CMD_MAX_ARG` (logic is [here](https://github.com/datalad/datalad/blob/HEAD/datalad/utils.py#L103)) gets set to 1048576 on linux and 524288 on OSX. So \"matches\" and \"OSX mystery resolved\"! ;)
|
||||
|
||||
Meanwhile confirming that using `add --batch` mitigates it. With [ad-hoc patch to add add --batched to datalad](https://github.com/datalad/datalad/pull/5722) I get 187.8053s run for our test using `annex add --batch` and bleeding edge annex 57b567ac8 and 183.1654s (so about the same) using annex 9a5981a15 from 20210525 (which takes over hour with splitting); and then with our old splitting and that old git-annex 9a5981a15 I get 188.3987s run.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 18"
|
||||
date="2021-06-08T21:58:20Z"
|
||||
content="""
|
||||
clarification: the \"(which takes over hour with splitting)\" was intended for the \"bleeding edge\" one, not the older one. That one is ok even with splitting.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 19"
|
||||
date="2021-06-08T22:02:33Z"
|
||||
content="""
|
||||
and we seems to not run that test on windows (marked as `@known_failure_windows` ;)) so we do not see that \"regression\" on windows runs (was yet another part of mystery to me why OSX only ;))
|
||||
"""]]
|
|
@ -1,10 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 1"
|
||||
date="2021-06-05T13:50:42Z"
|
||||
content="""
|
||||
Oh, fresh runs datalad tests fail because I guess you implemented that \"don't display message unless takes awhile\" I spotted in one of the comments but didn't have a chance to follow up - I guess from now on we weren't be able to test us passing through messages from git annex (unless there is some setting where were could disable that feature during tests).
|
||||
|
||||
That is unrelated to this issue of slow down though
|
||||
"""]]
|
|
@ -1,104 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="more "mystery resolved" -- identical (empty) keys"
|
||||
date="2021-06-09T21:00:34Z"
|
||||
content="""
|
||||
thank you Joey for timing it up! May be relevant: in our [test scenario](https://github.com/datalad/datalad/blob/master/datalad/support/tests/test_annexrepo.py#L2354) we have 100 directories with 100 files in each one of those and each directory or file name is 100 chars long (not sure if this is relevant either). So doing 5 subsequent adds (on OSX) should result in ~20k paths specified on the command line for each invocation.
|
||||
|
||||
<details>
|
||||
<summary>so I did this little replication script which would do similar drill (just not long filenames for this one)</summary>
|
||||
|
||||
```shell
|
||||
#!/bin/bash
|
||||
|
||||
export PS4='> '
|
||||
set -eu
|
||||
cd \"$(mktemp -d ${TMPDIR:-/tmp}/ann-XXXXXXX)\"
|
||||
|
||||
echo \"Populating the tree\"
|
||||
for d in {0..99}; do
|
||||
mkdir $d
|
||||
for f in {0..99}; do
|
||||
echo \"$d$f\" >> $d/$f
|
||||
done
|
||||
done
|
||||
|
||||
git init
|
||||
git annex init
|
||||
/usr/bin/time git annex add --json {,1}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {2,3}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {4,5}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {6,7}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {8,9}?/* >/dev/null
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>with \"older\" 8.20210429-g9a5981a15 on OSX - stable 30 sec per batch (matches what I observed from running our tests)</summary>
|
||||
|
||||
```shell
|
||||
29.83 real 9.52 user 17.43 sys
|
||||
30.49 real 10.02 user 17.34 sys
|
||||
30.67 real 10.37 user 17.36 sys
|
||||
31.00 real 10.57 user 17.39 sys
|
||||
30.78 real 10.77 user 17.23 sys
|
||||
```
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>and the newer 8.20210429-g57b567ac8 -- I got the same-ish nice timing without significant growth! -- damn it</summary>
|
||||
|
||||
```shell
|
||||
31.26 real 10.08 user 18.14 sys
|
||||
31.97 real 10.99 user 18.69 sys
|
||||
31.77 real 11.23 user 18.24 sys
|
||||
32.08 real 11.26 user 18.06 sys
|
||||
32.53 real 11.45 user 18.27 sys
|
||||
```
|
||||
</details>
|
||||
|
||||
so I looked into our test generation again and realized -- we are not populating unique files. They are all empty!
|
||||
|
||||
<details>
|
||||
<summary>and now confirming with this slightly adjusted script which just touches them:</summary>
|
||||
|
||||
```shell
|
||||
#!/bin/bash
|
||||
|
||||
export PS4='> '
|
||||
set -eu
|
||||
cd \"$(mktemp -d ${TMPDIR:-/tmp}/ann-XXXXXXX)\"
|
||||
|
||||
echo \"Populating the tree\"
|
||||
for d in {0..99}; do
|
||||
mkdir $d
|
||||
for f in {0..99}; do
|
||||
touch $d/$f
|
||||
done
|
||||
done
|
||||
|
||||
git init
|
||||
git annex init
|
||||
/usr/bin/time git annex add --json {,1}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {2,3}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {4,5}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {6,7}?/* >/dev/null
|
||||
/usr/bin/time git annex add --json {8,9}?/* >/dev/null
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
the case I had observed
|
||||
|
||||
```
|
||||
(base) yoh@dataladmac2 ~ % bash macos-slow-annex-add-empty.sh
|
||||
...
|
||||
21.65 real 8.47 user 10.89 sys
|
||||
139.96 real 61.51 user 78.18 sys
|
||||
... waiting for the rest -- too eager to post as is for now
|
||||
```
|
||||
|
||||
so -- sorry I have not spotted that peculiarity right from the beginning!
|
||||
"""]]
|
|
@ -1,16 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 22"
|
||||
date="2021-06-09T22:28:06Z"
|
||||
content="""
|
||||
FWIW the complete timing was
|
||||
```
|
||||
21.65 real 8.47 user 10.89 sys
|
||||
139.96 real 61.51 user 78.18 sys
|
||||
253.47 real 112.01 user 142.78 sys
|
||||
370.43 real 161.94 user 211.15 sys
|
||||
481.03 real 210.78 user 274.31 sys
|
||||
```
|
||||
so goes pretty close with what I observed with 4 splits (`30 284 528 720`).
|
||||
"""]]
|
|
@ -1,31 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 23"""
|
||||
date="2021-06-14T14:09:07Z"
|
||||
content="""
|
||||
The file contents all being the same is the crucial thing. On linux,
|
||||
adding 1000 dup files at a time (all in same directory), I get:
|
||||
|
||||
run 1: 0:08
|
||||
run 2: 0:42
|
||||
run 3: 1:14
|
||||
run 4: 1:46
|
||||
|
||||
After run 4, adding 1000 files with all different content takes
|
||||
0:11, so not appreciably slowed down; it only affects adding dups,
|
||||
and only when there are a *lot* of them.
|
||||
|
||||
This feels like quite an edge case, and also not
|
||||
really a new problem, since unlocked files would have already
|
||||
had the same problem before recent changes.
|
||||
|
||||
I thought this might be an innefficiency in sqlite's index, similar to how
|
||||
hash tables can scale poorly when a lot of things end up in the same
|
||||
bucket. But disabling the index did not improve performance.
|
||||
|
||||
Aha -- the slowdown is caused by `git-annex add` looking to see what other
|
||||
annexed files use the same content, so that it can populate any unlocked
|
||||
files that didn't have the content present before. With all these locked
|
||||
files now recorded in the db, it has to check each file in turn, and
|
||||
there's the `O(N^2)`
|
||||
"""]]
|
|
@ -1,17 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 24"""
|
||||
date="2021-06-14T15:36:30Z"
|
||||
content="""
|
||||
If the database recorded when files were unlocked or not, that could be
|
||||
avoided, but tracking that would add a lot of complexity for what is just
|
||||
an edge case. And probably slow things down generally by some amount due to
|
||||
the db being larger.
|
||||
|
||||
It seems almost cheating, but it could remember the last few keys it's added,
|
||||
and avoid trying to populate unlocked files when adding those keys again.
|
||||
This would slow down the usual case by some tiny amount (eg an IORef access)
|
||||
but avoid `O(N^2)` in this edge case. Though it wouldn't fix all edge cases,
|
||||
eg when the files it's adding rotate through X different contents, and X is
|
||||
larger than the number of keys it remembers.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 25"""
|
||||
date="2021-06-14T15:52:37Z"
|
||||
content="""
|
||||
Found an optimisation that sped it up 50%, but this edge case is still
|
||||
`O(N^2)`, so *shrug*.
|
||||
"""]]
|
|
@ -1,16 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="git-annex-add slowdown"
|
||||
date="2021-06-14T16:00:44Z"
|
||||
content="""
|
||||
> git-annex add looking to see what other annexed files use the same content, so that it can populate any unlocked files that didn't have the content present before
|
||||
|
||||
Could this be skipped if `annex.supportunlocked=false`?
|
||||
|
||||
>The file contents all being the same is the crucial thing
|
||||
|
||||
One not-quite-edge case where that happens is when empty files are used as placedholders for outputs of failed steps throughout a large workflow.
|
||||
|
||||
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 27"
|
||||
date="2021-06-14T16:23:41Z"
|
||||
content="""
|
||||
FWIW, without having a clear idea on code logic/data structures, it smells a bit like somewhere may be it could be `any` vs current `all` operation/check (in case of `add` at least) if needed just to test if there is already a path with a given key, and assuming that DB doesn't require a full rewrite of all `path` entries if a new path added for the key.
|
||||
"""]]
|
|
@ -1,9 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 28"""
|
||||
date="2021-06-14T16:31:26Z"
|
||||
content="""
|
||||
@Ilya sure can be skipped when annex.supportunlocked=false.
|
||||
I've implemented that. (And also for several other cases that have similar
|
||||
behavior, like dropping a key.)
|
||||
"""]]
|
|
@ -1,13 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 29"""
|
||||
date="2021-06-14T16:32:38Z"
|
||||
content="""
|
||||
@yarik no, it needs to check all files. Consider what happens when `foo` is
|
||||
an annexed link to key K, which is not present, and you copy
|
||||
`.git/annex/objects/../K` from some other repo and run `git annex add K`
|
||||
(or reinject, or get, it's all the same) -- `foo` now points to the content
|
||||
of K. The same needs to also hold true for unlocked files, and so it has to
|
||||
check if `foo` is an unlocked pointer to K and populate the file with the
|
||||
content. Repeat for all the other 9999 files..
|
||||
"""]]
|
|
@ -1,12 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2021-06-07T15:41:43Z"
|
||||
content="""
|
||||
I assume these tests are creating lots of clones of repositories. Are they
|
||||
also doing lots of merges or switching between branches?
|
||||
|
||||
The init scan time has already been optimised as much as seems feasible.
|
||||
The update scan time can still be optimised, see
|
||||
[[speed_up_keys_db_update_with_git_streaming]].
|
||||
"""]]
|
|
@ -1,10 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 30"
|
||||
date="2021-06-14T17:36:16Z"
|
||||
content="""
|
||||
> The same needs to also hold true for unlocked files, and so it has to check if foo is an unlocked pointer to K and populate the file with the content.
|
||||
|
||||
but that should only needs to be done iff K became present/known whenever it was not present before. If K is already known (e.g. was just added for another file, or may be was added in previous \"commit\") no such checks are needed since those files could be expected to already be populated. Right?
|
||||
"""]]
|
|
@ -1,7 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""Re: comment 30"""
|
||||
date="2021-06-14T18:36:11Z"
|
||||
content="""
|
||||
I discussed that approach in comment #24.
|
||||
"""]]
|
|
@ -1,35 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 32"""
|
||||
date="2021-06-14T20:26:52Z"
|
||||
content="""
|
||||
Some thoughts leading to a workable plan:
|
||||
|
||||
It's easy to detect this edge case because getAssociatedFiles will be
|
||||
returning a long list of files. So it could detect say 10 files in the list
|
||||
and start doing something other than the usual, without bothering the usual
|
||||
case with any extra work.
|
||||
|
||||
Git starts to get slow anyway in the 1 million to 10 million file range. So
|
||||
we can assume less than that many files are being added. And there need to
|
||||
be a fairly large number of duplicates of a key for speed to become a problem
|
||||
when adding that key. Around 1000 based on above benchmarks, but 100 would
|
||||
be safer.
|
||||
|
||||
If it's adding 10 million files, there can be at most 10000 keys
|
||||
that have `>=` 1000 duplicates (10 million / 1000).
|
||||
No problem to remember 10000 keys; a key is less than 128 bytes long, so
|
||||
that would take 1250 kb, plus the overhead of the Map. Might as well
|
||||
remember 12 mb worth of keys, to catch 100 duplicates.
|
||||
|
||||
It would be even better to use a bloom filter, which could remember many
|
||||
more, and I thought I had a way, but the false positive case seems the
|
||||
wrong way around. If the bloom filter remembers keys that have already had
|
||||
their associated files populated, then a false positive would prevent doing
|
||||
that for a key that it's not been done for.
|
||||
|
||||
It would make sense to do this not only in populateUnlockedFiles but in
|
||||
Annex.Content.moveAnnex and Annex.Content.removeAnnex. Although removeAnnex
|
||||
would need a different bloom filter, since a file might have been populated
|
||||
and then somehow get removed in the same git-annex call.
|
||||
"""]]
|
|
@ -1,23 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 33"""
|
||||
date="2021-06-15T13:01:04Z"
|
||||
content="""
|
||||
Oh, there's a much better solution: If the annex object file already exists
|
||||
when ingesting a new file, skip populating other associated files. They
|
||||
will have already been populated. moveAnnex has to check if the annex object
|
||||
file already exists anyway, so this will have zero overhead.
|
||||
|
||||
(Maybe that's what yarik was getting at in comment #30)
|
||||
|
||||
Implemented that, and here's the results, re-running my prior benchmark:
|
||||
|
||||
run 1: 0:03.14
|
||||
run 2: 0:03.24
|
||||
run 3: 0.03.35
|
||||
run 4: 0.03.45
|
||||
run 9: 0:03.65
|
||||
|
||||
That also shows the actual overhead of the diffing of the index,
|
||||
as its size grows, is quite small.
|
||||
"""]]
|
|
@ -1,8 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 34"
|
||||
date="2021-06-15T15:38:16Z"
|
||||
content="""
|
||||
FWIW: Yes! ;-)
|
||||
"""]]
|
|
@ -1,9 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="thanks"
|
||||
date="2021-06-16T19:54:01Z"
|
||||
content="""
|
||||
Thanks @joeyh for sorting this out ;)
|
||||
|
||||
"""]]
|
|
@ -1,28 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 36"
|
||||
date="2021-06-16T21:32:58Z"
|
||||
content="""
|
||||
yes, awesome, thank you Joey!
|
||||
|
||||
FWIW, here is the little trail of our tests (non-adjusted in released version) timings
|
||||
|
||||
```
|
||||
$> grep -h Ran 0[56]/cron-*/build-macos.yaml-*/*_test-datalad\ \(release\).txt
|
||||
2021-05-01T04:28:55.4957910Z Ran 1257 tests in 6033.095s
|
||||
2021-05-02T04:24:43.3005800Z Ran 1257 tests in 5722.774s
|
||||
2021-05-03T04:24:54.5178860Z Ran 1257 tests in 5846.700s
|
||||
...
|
||||
2021-05-24T04:39:07.5375680Z Ran 1257 tests in 5710.372s
|
||||
2021-05-25T04:54:01.2232460Z Ran 1257 tests in 6459.339s
|
||||
2021-05-26T06:41:04.2956530Z Ran 1257 tests in 11655.761s
|
||||
2021-05-27T07:07:28.4722960Z Ran 1257 tests in 11954.671s
|
||||
...
|
||||
2021-06-13T05:59:36.9626910Z Ran 1257 tests in 11987.233s
|
||||
2021-06-14T06:05:36.1668770Z Ran 1257 tests in 12135.290s
|
||||
2021-06-15T04:45:33.3489450Z Ran 1257 tests in 7046.095s
|
||||
2021-06-16T04:26:46.1256550Z Ran 1257 tests in 5706.314s
|
||||
```
|
||||
so we are indeed back to normal (if not better ;)) !
|
||||
"""]]
|
|
@ -1,13 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2021-06-07T15:49:51Z"
|
||||
content="""
|
||||
Also worth saying: When I have a choice between a bug fix and a performance
|
||||
change, I kind of have to pick the bug fix.
|
||||
[[bugs/indeterminite_preferred_content_state_for_duplicated_file]] was a
|
||||
longstanding bug that could cause *very* expensive misbehavior. The only
|
||||
way to fix it that would not have an on performance would be to
|
||||
remove include= and exclude= from the preferred content expression syntax,
|
||||
which would prevent a lot of current uses of preferred content.
|
||||
"""]]
|
|
@ -1,11 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="deferring the scan"
|
||||
date="2021-06-07T17:41:44Z"
|
||||
content="""
|
||||
>The only way to fix it that would not have an on performance would be to remove include= and exclude= from the preferred content expression syntax
|
||||
|
||||
What about deferring the scan until the first command that uses `include=/exclude=` gets run, if `annex.supportunlocked=false`?
|
||||
Also, could the scan be limited to the files that match one of the include/exclude globs?
|
||||
"""]]
|
|
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-06-07T20:22:24Z"
|
||||
content="""
|
||||
Turns out `git-annex init` got a lot slower than it had to, it was doing
|
||||
the same kind of scan twice. I think that probably explains much of the
|
||||
slowdown you saw. (Although it still needs to do somewhat more work than
|
||||
before, but also does it more efficiently than before.)
|
||||
|
||||
Also I've optimised the update scan's use of git, around a 20% speedup,
|
||||
although I don't know if your test case speed was impacted much by that
|
||||
scan anyway.
|
||||
"""]]
|
|
@ -1,9 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2021-06-07T20:42:34Z"
|
||||
content="""
|
||||
Looking at the CI log, I don't see any past runs that took 1h 46min.
|
||||
A month ago they were taking 2h 6min. Let's see if the changes I'm pushing
|
||||
now drop it back to that.
|
||||
"""]]
|
|
@ -1,16 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="clarification"
|
||||
date="2021-06-07T21:20:35Z"
|
||||
content="""
|
||||
For completeness
|
||||
|
||||
> Looking at the CI log, I don't see any past runs that took 1h 46min. A month ago they were taking 2h 6min.
|
||||
|
||||
I was looking at OSX ones. And the timing was for the one right before slow down: It was [this one](https://github.com/datalad/git-annex/actions/runs/873390021), but even [previous one](https://github.com/datalad/git-annex/actions/runs/869970007) was 1h40min
|
||||
|
||||
> Let's see if the changes I'm pushing now drop it back to that.
|
||||
|
||||
🤞
|
||||
"""]]
|
|
@ -1,16 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 8"
|
||||
date="2021-06-07T21:39:05Z"
|
||||
content="""
|
||||
FWIW:
|
||||
|
||||
> I assume these tests are creating lots of clones of repositories.
|
||||
|
||||
lots of `git annex init` in new git repos. Also a good number of clones AFAIK. FWIW: we used to do more clones instead of just recreating new repos per test, but then we switched to just creating a new repo. We still have a good number though of cloning.
|
||||
|
||||
> Are they also doing lots of merges or switching between branches?
|
||||
|
||||
Not that many AFAIK.
|
||||
"""]]
|
|
@ -1,33 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 9"""
|
||||
date="2021-06-08T13:45:18Z"
|
||||
content="""
|
||||
I do not see 1h40min anywhere on the page you linked. It says
|
||||
2h 6m at the top. Oh I see, you are clicking around to get to
|
||||
https://github.com/datalad/git-annex/runs/2661363385?check_suite_focus=true,
|
||||
and that number is in there next to "Run datalad tests".
|
||||
|
||||
My improvements yesterday did not improve your test suite time any. But
|
||||
they certainly sped it up a *lot* in my benchmarks. So I think what you
|
||||
think is taking more time, eg the scanning at init, does not really have
|
||||
much to do with whatever is really taking more time. And if your test suite
|
||||
is mostly not cloning repos but is initializing new empty repos, then the
|
||||
scanning at init was and still is effectively a noop, it will not have
|
||||
gotten more expensive.
|
||||
|
||||
It might be that the incremental updating git-annex now does when it sees
|
||||
changes to the index is making your test suite run a bit longer. But twice
|
||||
as long? That is not a very expensive process.
|
||||
|
||||
Also notice that the git-annex test part of the CI job used to take
|
||||
15m54s and is now taking 17m29s. That's doing some cloning and some adding
|
||||
of files etc, and it didn't double in run time.
|
||||
|
||||
I did find another nice optimisation this morning
|
||||
[[!commit c831a562f550e6ff93e081f555eced3a8a0d524f]], so we can see if that
|
||||
improves things in the next run.
|
||||
|
||||
I suspect you're going to have to look at specific parts of your test suite
|
||||
and/or the CI system to identify what's slower though.
|
||||
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue