move old closed bugs to datalad's bugs

These were not tagged datalad, but they did pay to get them fixed, and
it impacted them, as seen in the bug reports.
This commit is contained in:
Joey Hess 2023-01-05 14:57:27 -04:00
parent aab79dded8
commit acdd5fbab6
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
53 changed files with 0 additions and 0 deletions

View file

@ -0,0 +1,138 @@
### Please describe the problem.
edit: I said "lustre" but it is actually Isilon underneath; my knowledge of the system was wrong/outdated/mixed up.
Finally we got what should be a normal "POSIX" NFS mount (no ACL) on a local HPC. I immediately ventured to test git-annex (and datalad) on that file system. git-annex 8.20210803-g99bb214 and git 2.30.2, both coming from conda.
With basic operations it felt like it works ok,
but then when I ran the tests while pointing TMPDIR to this location, e.g. `TMPDIR=$PWD git annex test 2>&1 | tee git-annex-noretry+pidlock1-1.log`, `git-annex` is far from a happy camper.
Note: the `tmp` path below is actually `/dartfs/rc/lab/D/DBIC/DBIC/archive/tmp`
<details>
<summary>Not sure if it is related, but git-annex upon init seems not to detect the need for pidlock</summary>
```shell
[d31548v@discovery7 tmp]$ ( PS4='[$(date)] > '; set -x; mkdir test; cd test; git init; git annex --debug init; cat .git/config; )
[Wed Aug 11 16:14:03 EDT 2021] > mkdir test
[Wed Aug 11 16:14:03 EDT 2021] > cd test
[Wed Aug 11 16:14:04 EDT 2021] > git init
Initialized empty Git repository in /dartfs/rc/lab/D/DBIC/DBIC/archive/tmp/test/.git/
[Wed Aug 11 16:14:04 EDT 2021] > git annex --debug init
init [2021-08-11 16:14:04.936362] (Utility.Process) process [10184] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2021-08-11 16:14:04.982236] (Utility.Process) process [10184] done ExitFailure 1
[2021-08-11 16:14:04.984294] (Utility.Process) process [10185] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--verify","-q","refs/remotes/origin/git-annex"]
[2021-08-11 16:14:05.019706] (Utility.Process) process [10185] done ExitFailure 1
[2021-08-11 16:14:05.029391] (Utility.Process) process [10187] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
[2021-08-11 16:14:05.118706] (Utility.Process) process [10187] done ExitSuccess
[2021-08-11 16:14:05.121008] (Utility.Process) process [10189] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","4b825dc642cb6eb9a060e54bf8d69288fbee4904","--no-gpg-sign"]
[2021-08-11 16:14:05.197601] (Utility.Process) process [10189] done ExitSuccess
[2021-08-11 16:14:05.218776] (Utility.Process) process [10190] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","4570bef1d852f540b8bb1bcf7aa778cf48d8743a"]
[2021-08-11 16:14:05.342053] (Utility.Process) process [10190] done ExitSuccess
[2021-08-11 16:14:05.380591] (Utility.Process) process [10191] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","annex.uuid","4f1069df-9aa9-4e72-92dc-b1f5ac6946d5"]
[2021-08-11 16:14:05.428772] (Utility.Process) process [10191] done ExitSuccess
[2021-08-11 16:14:05.432081] (Utility.Process) process [10193] read: git ["config","--null","--list"]
[2021-08-11 16:14:05.502382] (Utility.Process) process [10193] done ExitSuccess
[2021-08-11 16:14:05.635239] (Utility.Process) process [10202] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2021-08-11 16:14:05.698023] (Utility.Process) process [10202] done ExitSuccess
[2021-08-11 16:14:05.699988] (Utility.Process) process [10203] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2021-08-11 16:14:05.734066] (Utility.Process) process [10203] done ExitSuccess
[2021-08-11 16:14:05.741208] (Utility.Process) process [10204] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..4570bef1d852f540b8bb1bcf7aa778cf48d8743a","--pretty=%H","-n1"]
[2021-08-11 16:14:05.794383] (Utility.Process) process [10204] done ExitSuccess
[2021-08-11 16:14:05.827348] (Utility.Process) process [10207] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
[2021-08-11 16:14:05.832331] (Utility.Process) process [10208] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2021-08-11 16:14:05.912843] (Utility.Process) process [10212] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","annex.version","8"]
[2021-08-11 16:14:06.014334] (Utility.Process) process [10212] done ExitSuccess
[2021-08-11 16:14:06.017332] (Utility.Process) process [10213] read: git ["config","--null","--list"]
[2021-08-11 16:14:06.09542] (Utility.Process) process [10213] done ExitSuccess
[2021-08-11 16:14:06.097328] (Utility.Process) process [10215] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","status","--porcelain"]
[2021-08-11 16:14:06.199386] (Utility.Process) process [10215] done ExitSuccess
[2021-08-11 16:14:06.201524] (Utility.Process) process [10217] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","filter.annex.smudge","git-annex smudge -- %f"]
[2021-08-11 16:14:06.259174] (Utility.Process) process [10217] done ExitSuccess
[2021-08-11 16:14:06.262019] (Utility.Process) process [10218] read: git ["config","--null","--list"]
[2021-08-11 16:14:06.332769] (Utility.Process) process [10218] done ExitSuccess
[2021-08-11 16:14:06.334786] (Utility.Process) process [10221] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","config","filter.annex.clean","git-annex smudge --clean -- %f"]
[2021-08-11 16:14:06.396427] (Utility.Process) process [10221] done ExitSuccess
[2021-08-11 16:14:06.39987] (Utility.Process) process [10224] read: git ["config","--null","--list"]
[2021-08-11 16:14:06.454542] (Utility.Process) process [10224] done ExitSuccess
[2021-08-11 16:14:06.871684] (Utility.Process) process [10228] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
[2021-08-11 16:14:06.923312] (Utility.Process) process [10228] done ExitSuccess
[2021-08-11 16:14:06.975588] (Utility.Process) process [10231] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","refs/heads/master"]
[2021-08-11 16:14:07.015031] (Utility.Process) process [10231] done ExitFailure 1
[2021-08-11 16:14:07.017599] (Utility.Process) process [10233] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
[2021-08-11 16:14:07.056731] (Utility.Process) process [10233] done ExitSuccess
[2021-08-11 16:14:07.059379] (Utility.Process) process [10234] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/master"]
[2021-08-11 16:14:07.100443] (Utility.Process) process [10234] done ExitFailure 1
[2021-08-11 16:14:07.102482] (Utility.Process) process [10236] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","checkout","-q","-B","master"]
[2021-08-11 16:14:07.157509] (Utility.Process) process [10236] done ExitSuccess
[2021-08-11 16:14:07.208637] (Utility.Process) process [10238] read: uname ["-n"]
[2021-08-11 16:14:07.209729] (Utility.Process) process [10238] done ExitSuccess
[2021-08-11 16:14:07.292876] (Annex.Branch) read uuid.log
[2021-08-11 16:14:07.392149] (Annex.Branch) set uuid.log
[2021-08-11 16:14:07.393272] (Annex.Branch) read remote.log
ok
[2021-08-11 16:14:07.536729] (Utility.Process) process [10244] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","hash-object","-w","--stdin-paths","--no-filters"]
[2021-08-11 16:14:07.54018] (Utility.Process) process [10245] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-index","-z","--index-info"]
[2021-08-11 16:14:07.760348] (Utility.Process) process [10245] done ExitSuccess
[2021-08-11 16:14:07.76284] (Utility.Process) process [10246] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2021-08-11 16:14:07.800127] (Utility.Process) process [10246] done ExitSuccess
(recording state in git...)
[2021-08-11 16:14:07.805187] (Utility.Process) process [10247] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
[2021-08-11 16:14:07.948696] (Utility.Process) process [10247] done ExitSuccess
[2021-08-11 16:14:07.950889] (Utility.Process) process [10248] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","2ebb8bbc83b47982a995f66975d4265986f005b8","--no-gpg-sign","-p","refs/heads/git-annex"]
[2021-08-11 16:14:08.022394] (Utility.Process) process [10248] done ExitSuccess
[2021-08-11 16:14:08.024889] (Utility.Process) process [10252] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","83fb8ae08e2e0ea030f3ce75aa840d27838859b6"]
[2021-08-11 16:14:08.172372] (Utility.Process) process [10252] done ExitSuccess
[2021-08-11 16:14:08.349523] (Utility.Process) process [10207] done ExitSuccess
[2021-08-11 16:14:08.35006] (Utility.Process) process [10208] done ExitSuccess
[2021-08-11 16:14:08.352912] (Utility.Process) process [10244] done ExitSuccess
[Wed Aug 11 16:14:08 EDT 2021] > cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[annex]
uuid = 4f1069df-9aa9-4e72-92dc-b1f5ac6946d5
version = 8
[filter "annex"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
```
</details>
So I set it to `1` globally (in `~/.gitconfig`). Only later did I discover that it should be `true`, not `1`, but that does not seem to have affected git-annex operation in the tests.
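(For reference, assuming the setting in question is `annex.pidlock`, enabling it globally amounts to:)
```shell
# write annex.pidlock=true into ~/.gitconfig
git config --global annex.pidlock true
```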
The full log from the test run is available at [http://www.onerussian.com/tmp/git-annex-noretry+pidlock1-1.log](http://www.onerussian.com/tmp/git-annex-noretry+pidlock1-1.log)
and a common failure pattern looks something like:
```
fsck downloaded object: OK (0.02s)
retrieveKeyFile resume from 0: FAIL
Exception: .git/annex/tmp/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key: openFile: permission denied (Permission denied)
fsck downloaded object: OK (0.02s)
retrieveKeyFile resume from 33%: FAIL
Exception: .git/annex/objects/1X/90/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key: openBinaryFile: does not exist (No such file or directory)
fsck downloaded object: OK (0.03s)
retrieveKeyFile resume from end: cp: cannot stat '.git/annex/objects/1X/90/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key/SHA256E-s1048577--2c30628691d7a7e5b16237fd74efb4bf365db003e19e6fbe78d96c15396819e5.this-is-a-test-key': No such file or directory
FAIL (0.28s)
./Command/TestRemote.hs:292:
failed
```
I thought that maybe it was due to running into a path length limit, but I could touch a much (twice) longer path with no problem, and a filename as long as 257 characters can be created.
Details of the mount, with minimal censoring using XXXs:
```
/ifs/rc/dartfs/rc on /dartfs/rc type nfs4 (rw,nosuid,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=XXX.XXX.XXX.XXX,local_lock=none,addr=10.1.16.105)
```
Any ideas, Joey? Or how to troubleshoot this beast further?
[[!meta title="357 out of 984 tests fail on NFS isilon mount"]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 10"
date="2021-08-27T01:38:54Z"
content="""
ha -- I had mentioned that there is `--no-preserve`, so I tried it with `-a` to exclude xattr, and it seems to do the trick as well
```
(datalad) [d31548v@discovery7 _test_]$ rm -f bar baz
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -a --no-preserve=xattr bar baz
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:36 baz
```
So maybe it is just worth explicitly excluding xattr instead of explicitly including some of the others? ;)
"""]]

View file

@ -0,0 +1,44 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2021-08-27T16:41:45Z"
content="""
I've made the change to use that option. I think this should clear the test
suite.
I see that Isilon is a NAS, so some proprietary and broken NFS server
apparently. Kind of explains why cp -a would do this, because I found
threads about cp -a preserving NFS xattrs, and people wanted it to,
presumably for other reasons on less broken NFS servers.
This does raise the question of what might happen if someone copies
a file themselves with cp -a on this NFS and then git-annex adds the copy.
Probably git-annex would then be unable to remove the write bit
from the annex object file.
I also worried about ACLs, but it seems like ACLs do not have this
problem, because chmod a-w causes the write ACL that was set to be
effectively unset:
```
joey@darkstar:~>chmod -w foo
joey@darkstar:~>setfacl -m g:nogroup:rw foo
joey@darkstar:~>ls -l foo
-r--rw-r--+ 1 joey joey 0 Aug 27 12:53 foo
nobody@darkstar:/home/joey$ echo test >> foo
joey@darkstar:~>chmod a-w foo
joey@darkstar:~>ls -l foo
-r--rw-r--+ 1 joey joey 0 Aug 27 12:53 foo
nobody@darkstar:/home/joey$ echo test >> foo
bash: foo: Permission denied
joey@darkstar:~>getfacl foo
# file: foo
# owner: joey
# group: joey
user::r--
group::r--
group:nogroup:rw- #effective:r--
mask::r--
other::r--
```
I've verified that git-annex add also clears the write ACL.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2021-08-27T17:13:54Z"
content="""
I've also made git-annex add check, after removing write bits,
if the file still has write bits set. It will refuse to add a file
when it can't lock it down.
That should avoid the NFS xattr problem in a situation where
cp -a was used to make a copy that then gets added to the annex.
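Conceptually the new check amounts to something like the following shell sketch (illustrative only, not git-annex's actual code; `file` is a placeholder):
```shell
chmod a-w file                 # attempt to lock the file down
mode=$(stat -c '%a' file)      # re-read the permission bits
case "$mode" in
    # any octal digit with the write bit (2) still set means lockdown failed
    *[2367]*) echo "write bit still set after chmod; refusing to add" >&2; exit 1 ;;
esac
```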
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 13"
date="2021-08-27T18:46:39Z"
content="""
Great, thank you, I will test after the next daily build. What would be the easiest way to establish such an xattr enabled NFS setup, so we could tune up [our test on NFS scenarios](https://github.com/datalad/git-annex/blob/master/.github/workflows/build-ubuntu.yaml#L221) to catch any possible future regression?
"""]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 14"""
date="2021-08-30T15:59:29Z"
content="""
It seems likely to me that this NFS server is just broken. Why would a
well-designed NFS server behave this way, rather than letting a chmod
override the xattr that was set earlier?
I considered filing a bug report on coreutils or something about cp
preserving the NFS xattr leading to this problem, but since it seems likely
to be the NFS server on the NAS that is at fault, didn't think that would
be productive.
If you did manage to reproduce that behavior with a regular
linux NFS server, it seems like it would be a grounds for a bug on some
part of linux..
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="just an update on my testing"
date="2021-08-30T22:17:53Z"
content="""
FWIW, I've tested a debian standalone build of `8.20210803+git153-g63c2cfd47-1~ndall+1` on that system, and besides 3 fails due to no `--kill` in the elderly system-wide gpgconf (apparently the debian bundle does not come with one) -- it is all clean! (Although it took nearly a full work day to complete on that filesystem: `3 out of 984 tests failed (18157.38s)`.)
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="note on `sec=` on the mount"
date="2021-08-16T21:38:13Z"
content="""
Just now, while comparing with operation on another mount point (on NFS v3, not 4 though), I noted that this mount point is `sec=krb5` (and the NFS 3 one is `sec=sys`)... so that might be what is contributing here.
"""]]

View file

@ -0,0 +1,42 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-08-24T16:01:30Z"
content="""
The "retrieveKeyFile resume from 0%" test
starts by writing an empty file with default permissions.
It then does the download storing to that file.
The permission denied is happening when trying to open that file
later, to write to it. So this seems like a umask or some other
kind of permissions problem. Maybe filesystem related in some way,
if the filesystem is somehow not honoring permissions, but that would
not be my first or second guess.
The other 2 failures shown do chain on from that
initial failure, since that test does some things that the
later tests depend on. Looking through the log, there are several other
test failures that also seem due to permissions, that all look like this,
or probably chain from this:
```
add: FAIL (4.08s)
./Test/Framework.hs:324:
unable to modify sha1foo
```
And that error message happens when it's checking that the file
permissions include the write bit.
So, your next step should be something like:
```
umask
touch foo
ls -l foo
echo foo >> foo
```
I tried with `umask 0222` (new files lack any write bit) but it failed a lot
earlier, while setting up the test suite. Which seems ok itself given how
restricted and weird that umask is. If the test suite overrode the umask, it
would not notice if some actual reasonable umask caused problems. But I'm
getting ahead of confirmation that it is a umask problem.
"""]]

View file

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2021-08-24T18:06:10Z"
content="""
```
[d31548v@discovery7 _test_]$ umask
0022
[d31548v@discovery7 _test_]$ touch foo
[d31548v@discovery7 _test_]$ ls -l foo
-rw-r--r-- 1 d31548v rc-DBIC-bids 0 Aug 24 13:53 foo
[d31548v@discovery7 _test_]$ echo foo >> foo
```
My blind guess is that it is somehow due to that `sec=krb5` NFS mount option. I didn't try to reproduce the setup \"locally\" (it would need all the krb config, I guess) to see if I am right about that.
"""]]

View file

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-08-24T18:51:41Z"
content="""
Somehow the write bit is not being set. Since the umask is ok, the next
most likely thing seems to be NFS either not setting it, or temporarily
forgetting it's set, or the file somehow being created owned by a different
user.
Could run: `git-annex test --keep-failures` and then when it gets
to one of the "unable to modify foo" failures, it will print out the path
to the repository that it failed in, and you can "ls -l" the path,
which should include the "foo" file.
"""]]

View file

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2021-08-25T22:05:06Z"
content="""
```
[d31548v@discovery7 _test_]$ grep -A1 -m1 'unable to modify foo' git-annex-8.20210803-g99bb214-20210925-1.log
unable to modify foo
export_import_subdir: ** Preserving repo for failure analysis in .t/tmprepo5
[d31548v@discovery7 _test_]$ ls -la .t/tmprepo5
total 239
drwxr-xr-x 4 d31548v rc-DBIC-bids 112 Aug 25 10:59 .
drwxr-xr-x 205 d31548v rc-DBIC-bids 5525 Aug 25 11:31 ..
-rw-r--r-- 1 d31548v rc-DBIC-bids 19 Aug 25 10:58 bar.c
drwxr-xr-x 2 d31548v rc-DBIC-bids 0 Aug 25 10:59 dir
-r--r--r-- 1 d31548v rc-DBIC-bids 20 Aug 25 10:58 foo
drwxr-xr-x 8 d31548v rc-DBIC-bids 296 Aug 25 10:59 .git
-r--r--r-- 1 d31548v rc-DBIC-bids 25 Aug 25 10:58 sha1foo
```
"""]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2021-08-26T15:06:26Z"
content="""
So confirmed to be actual missing write bits.
I think that the next step is going to be to dig through a `strace -f -s 1024`,
of the "resume from 0" file that has the permission denied problem,
and of the ".t/tmprepo5/foo" file that is somehow not writable.
Then I can compare with a strace where it does not fail, to see
if it's doing something different.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2021-08-26T16:49:56Z"
content="""
Details on what I ran, and the logs, went via email in response to your last email reply ;)
"""]]

View file

@ -0,0 +1,66 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2021-08-26T20:18:05Z"
content="""
From the strace of `git-annex test -p export_import_subdir`,
I have found what I think is the smoking gun:
```
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
21179 chmod(".git/annex/othertmp/foo.0/foo", 0100644) = 0
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
21179 rename(".git/annex/othertmp/foo.0/foo", "./foo") = 0
```
This seems like proof that NFS is losing the file permissions. chmod 644
followed immediately by stat showing the mode is still 444, with nothing
happening in between according to strace.
The full set of operations on that temp file is:
```
21179 unlink(".git/annex/othertmp/foo.0/foo") = -1 ENOENT (No such file or directory)
21179 vfork( <unfinished ...>
21217 execve("/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/bin/cp", ["cp", "--reflink=auto", "-a", ".git/annex/objects/Kj/0x/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77", ".git/annex/othertmp/foo.0/foo"], 0x7fff09e296f8 /* 77 vars */ <unfinished ...>
21179 <... vfork resumed>) = 21217
21217 stat(".git/annex/othertmp/foo.0/foo", 0x7ffc13f0d430) = -1 ENOENT (No such file or directory)
21217 newfstatat(AT_FDCWD, ".git/annex/othertmp/foo.0/foo", 0x7ffc13f0d1b0, 0) = -1 ENOENT (No such file or directory)
21217 openat(AT_FDCWD, ".git/annex/othertmp/foo.0/foo", O_WRONLY|O_CREAT|O_EXCL, 0400) = 4
21217 fstat(4, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
21217 ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
21217 write(4, "annexed file content", 20) = 20
21217 utimensat(4, NULL, [{tv_sec=1629996169, tv_nsec=810295000} /* 2021-08-26T12:42:49.810295000-0400 */, {tv_sec=1629996169, tv_nsec=810295000} /* 2021-08-26T12:42:49.810295000-0400 */], 0 <unfinished ...>
21217 <... utimensat resumed>) = 0
21217 fchmod(4, 0600) = 0
21217 fsetxattr(4, "system.nfs4_acl", "\0\0\0\3\0\0\0\0\0\0\0\0\0\26\1\211\0\0\0\6OWNER@\0\0\0\0\0\0\0\0\0\0\0\22\0\211\0\0\0\6GROUP@\0\0\0\0\0\0\0\0\0\0\0\22\0\211\0\0\0\tEVERYONE@\0\0", 80, 0) = 0
21217 fchmod(4, 0400) = 0
21217 fsetxattr(4, "system.posix_acl_access", "\2\0\0\0\1\0\4\0\377\377\377\377\4\0\4\0\377\377\377\377 \0\4\0\377\377\377\377", 28, 0) = -1 EOPNOTSUPP (Operation not supported)
21217 fchmod(4, 0100444) = 0
21217 close(4) = 0
21179 stat(".git/annex/othertmp/foo.0/foo", <unfinished ...>
21179 <... stat resumed>{st_mode=S_IFREG|0444, st_size=20, ...}) = 0
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
21179 chmod(".git/annex/othertmp/foo.0/foo", 0100644) = 0
21179 stat(".git/annex/othertmp/foo.0/foo", {st_mode=S_IFREG|0444, st_size=20, ...}) = 0
21179 rename(".git/annex/othertmp/foo.0/foo", "./foo") = 0
```
So, cp -a copies the object file to the temp file, leaving it 444 at the end,
as it should be since the object file had that mode. And the following
chmod 644 has no effect.
That `nfs4_acl` seems like it contains effectively the permissions of the file,
which were 444 at that point. Could that xattr override the later chmod?
It would be worth trying this:
```
# use runshell to get the same cp that git-annex uses
/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
echo bar > bar
chmod 444 bar
cp --reflink=auto -a bar baz
chmod 644 baz
ls -l baz
```
If preserving the `nfs4_acl` is indeed the problem, then a fix in git-annex
might be possible (even though this feels like an NFS bug):
replace `cp -a` with `cp -dR --preserve=mode,ownership,timestamps`,
thus omitting the xattr preservation.
"""]]

View file

@ -0,0 +1,67 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 9"
date="2021-08-27T01:34:18Z"
content="""
Not sure if related, but I noticed you said to use `/dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell`, which corresponds to the git-annex I have available upon login, but the tests were run using the git-annex installed in a conda env, `8.20200502-g55acb2e52` at `~/.conda/envs/datalad/bin/git-annex`, which is a \"proper\" (not standalone) build, lacks `runshell`, and I believe would just use the system-wide `cp`.
# in that /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
```
(datalad) [d31548v@discovery7 _test_]$ /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
$ pwd
/dartfs/rc/lab/D/DBIC/DBIC/inbox/BIDS/_test_
$ echo bar > bar
chmod 444 bar
cp --reflink=auto -a bar baz
chmod 644 baz
ls -l baz$ $ $ $
-r--r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:27 baz
```
Now outside, using the default cp -- the same result:
```shell
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -a bar baz
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
-r--r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:29 baz
```
and indeed we do get a different result, with `w`, if running under a regular `/tmp`:
```
(datalad) [d31548v@discovery7 tmp]$ pwd
/tmp
(datalad) [d31548v@discovery7 tmp]$ echo bar > bar
(datalad) [d31548v@discovery7 tmp]$ chmod 444 bar
(datalad) [d31548v@discovery7 tmp]$ cp --reflink=auto -a bar baz
(datalad) [d31548v@discovery7 tmp]$ chmod 644 baz
(datalad) [d31548v@discovery7 tmp]$ ls -l baz
-rw-r--r-- 1 d31548v rc-users 4 Aug 26 21:29 baz
```
Now, using the alternative to the `-a` setting -- it retains the `w` bit both in `runshell` and outside:
```
(datalad) [d31548v@discovery7 _test_]$ /dartfs-hpc/admin/opt/oldopt/git-annex/8.20200522/runshell
$ echo bar > bar
chmod 444 bar
cp --reflink=auto -dR --preserve=mode,ownership,timestamps bar baz
chmod 644 baz
ls -l baz$ $ $ $
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:32 baz
$
(datalad) [d31548v@discovery7 _test_]$ rm -f bar baz
(datalad) [d31548v@discovery7 _test_]$ echo bar > bar
(datalad) [d31548v@discovery7 _test_]$ chmod 444 bar
(datalad) [d31548v@discovery7 _test_]$ cp --reflink=auto -dR --preserve=mode,ownership,timestamps bar baz
(datalad) [d31548v@discovery7 _test_]$ chmod 644 baz
(datalad) [d31548v@discovery7 _test_]$ ls -l baz
-rw-r--r-- 1 d31548v rc-DBIC-bids 4 Aug 26 21:32 baz
```
"""]]

View file

@ -0,0 +1,19 @@
### Please describe the problem.
With the recent RFing of scanning for unlocked/annexed files (I guess), a sweep of datalad tests on OSX started to take about 3h 30min instead of the prior 1h 46min -- so pretty much twice as long. Besides possibly affecting user experience, I am afraid that would cause too many ripples through our CI setup, which might now run out of time.
Logs etc. are at https://github.com/datalad/git-annex/actions/workflows/build-macos.yaml
The first red is ok, just a fluke, but then they all fail due to a change in an output log string (for which there is a fix, but somehow the behavior on OSX seems different; yet to check).
### What version of git-annex are you using? On what operating system?
Currently 8.20210428+git282-gd39dfed2a; it first got slow with
8.20210428+git228-g13a6bfff4, and was ok with 8.20210428+git202-g9a5981a15.
[[!meta title="performance edge case when adding large numbers of identical files"]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="slow down is OSX specific"
date="2021-06-08T14:28:18Z"
content="""
Thank you Joey for all the OPTs.
Looking at it again: the significant impact seems to be specific to the OSX builds; Linux and Windows builds got just a bit slower (if at all). So maybe the situation is not that dire ;-) Two possibilities:
- Even though coincidences like this are unlikely, it is possible that OSX VMs on GitHub Actions got slower?
- Maybe the RFing is somehow affecting OSX specifically?
I will try to do some timings locally. Also, I should finally get those logs (and built packages) into some more convenient form using [tinuous](https://github.com/con/tinuous/), which John has been developing -- that would allow for easy greps instead of all this jumping through CIs etc.
"""]]

View file

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2021-06-08T15:41:23Z"
content="""
I can't think of anything OSX specific in the recent changes.
I have added debugging of when reconcileStaged wakes up and possibly
wastes some time, eg:
```
joey@darkstar:~/tmp/big> git config annex.debug true
joey@darkstar:~/tmp/big> git config annex.debugfilter Database.Keys
joey@darkstar:~/tmp/big> git-annex init
init
[2021-06-08 11:52:11.854202926] (Database.Keys) reconcileStaged start
(scanning for annexed files...)
[2021-06-08 11:52:44.092620256] (Database.Keys) reconcileStaged end
ok
```
All the new work happens in between those two debugs[1], so you could check if
the time sink is there or elsewhere.
(Note that the last release takes 2 seconds longer for that init than
it does now..)
[1] With the exception of a single call to `git write-tree`, but that
should be very fast.
"""]]

View file

@ -0,0 +1,153 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="all recent builds/logs are fetched to smaug"
date="2021-06-08T16:50:11Z"
content="""
Finished fetching all the builds (logs + artifacts, i.e. built installers and packages) to smaug. FWIW the `joey` user on smaug should be able to access them at /mnt/datasets/datalad/ci/git-annex as well (it is all git/git-annex with submodules, should you happen to want to get a \"clone\" ;))
<details>
<summary>grep confirms the timing for datalad tests on OSX</summary>
```shell
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*macOS*/*_test-datalad\ \(master\).txt
2021-05-01T05:47:43.6623300Z Ran 1269 tests in 10381.041s
2021-05-02T04:27:47.4468150Z Ran 1269 tests in 5835.199s
2021-05-03T04:25:06.4741780Z Ran 1269 tests in 5817.953s
2021-05-04T04:26:25.2725400Z Ran 1269 tests in 5925.938s
2021-05-05T04:15:45.7241110Z Ran 1270 tests in 6003.093s
2021-05-06T04:14:18.9203140Z Ran 1270 tests in 5821.084s
2021-05-07T04:19:12.5257430Z Ran 1270 tests in 5887.912s
2021-05-08T04:17:21.4790320Z Ran 1270 tests in 5775.691s
2021-05-09T04:20:53.2085730Z Ran 1270 tests in 5961.437s
2021-05-10T04:21:52.2492300Z Ran 1270 tests in 5698.905s
2021-05-11T04:28:39.1451570Z Ran 1270 tests in 5899.918s
2021-05-12T04:28:26.6181110Z Ran 1270 tests in 5964.851s
2021-05-13T04:30:47.2978230Z Ran 1277 tests in 5938.297s
2021-05-14T04:37:51.3172970Z Ran 1277 tests in 5921.571s
2021-05-15T04:31:29.6466840Z Ran 1277 tests in 6060.762s
2021-05-16T04:33:26.5552670Z Ran 1277 tests in 5789.405s
2021-05-17T04:36:52.4460760Z Ran 1277 tests in 5855.382s
2021-05-18T04:37:24.5613580Z Ran 1278 tests in 6007.685s
2021-05-19T04:34:37.3253830Z Ran 1279 tests in 5940.204s
2021-05-20T04:36:10.3184010Z Ran 1280 tests in 5941.388s
2021-05-21T04:41:10.2664330Z Ran 1280 tests in 6218.775s
2021-05-22T04:40:39.2754540Z Ran 1289 tests in 5884.267s
2021-05-23T04:37:51.5005200Z Ran 1289 tests in 5750.672s
2021-05-24T04:38:24.5894000Z Ran 1289 tests in 5911.655s
2021-05-25T04:51:35.2189100Z Ran 1289 tests in 6266.836s
2021-05-26T06:56:33.6660300Z Ran 1293 tests in 12584.992s
2021-05-27T07:08:27.4015580Z Ran 1293 tests in 11901.552s
2021-05-28T07:39:17.3481450Z Ran 1292 tests in 12094.343s
2021-05-29T08:05:14.3586440Z Ran 1294 tests in 12205.434s
2021-05-30T08:09:33.0780990Z Ran 1294 tests in 12028.089s
2021-05-31T08:16:36.3910830Z Ran 1294 tests in 12329.455s
2021-06-01T08:42:24.3167200Z Ran 1294 tests in 12115.378s
2021-06-02T08:42:23.1985610Z Ran 1294 tests in 12432.309s
2021-06-03T08:00:08.2576030Z Ran 1294 tests in 11556.974s
2021-06-04T07:25:39.1674160Z Ran 1294 tests in 11946.195s
2021-06-05T07:14:00.5262620Z Ran 1294 tests in 12456.432s
2021-06-06T06:53:14.7001370Z Ran 1294 tests in 11677.647s
2021-06-07T07:12:57.5076610Z Ran 1294 tests in 12042.332s
2021-06-08T06:48:01.3977250Z Ran 1294 tests in 12126.463
```
Interestingly the top one is in the 10k range, but the prior ones were ok:
```shell
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/04/*/cron-*/*/*macOS*/*_test-datalad\ \(master\).txt
2021-04-27T03:58:30.4829510Z Ran 1265 tests in 6312.819s
2021-04-28T03:44:23.1655040Z Ran 1265 tests in 5622.562s
2021-04-29T03:50:14.0670840Z Ran 1269 tests in 6196.774s
2021-04-30T04:21:45.1727310Z Ran 1269 tests in 5879.034s
```
</details>
<details>
<summary>and the absence of the drastic effect is confirmed for Linux and Windows</summary>
```shell
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*buntu*/*_test-datalad\ \(master\).txt
2021-05-01T04:16:34.1539917Z Ran 1269 tests in 3999.905s
2021-05-02T04:17:04.5604428Z Ran 1269 tests in 3745.951s
2021-05-03T04:22:46.1508135Z Ran 1269 tests in 4133.703s
2021-05-04T04:22:19.3612212Z Ran 1269 tests in 4341.995s
2021-05-05T04:23:04.3327570Z Ran 1270 tests in 4383.268s
2021-05-06T04:07:33.2967674Z Ran 1270 tests in 3457.688s
2021-05-07T04:18:26.7140893Z Ran 1270 tests in 3778.162s
2021-05-08T04:13:13.9041377Z Ran 1270 tests in 3728.503s
2021-05-09T04:16:59.6844666Z Ran 1270 tests in 3898.714s
2021-05-11T04:21:45.5447035Z Ran 1270 tests in 4032.457s
2021-05-12T04:20:48.6349049Z Ran 1270 tests in 3940.342s
2021-05-13T04:26:53.8376010Z Ran 1277 tests in 4130.863s
2021-05-14T04:37:41.5689917Z Ran 1277 tests in 4476.880s
2021-05-15T04:20:12.0126613Z Ran 1277 tests in 4007.927s
2021-05-16T04:30:50.0329536Z Ran 1277 tests in 4077.010s
2021-05-17T04:25:03.3302428Z Ran 1277 tests in 4034.791s
2021-05-18T04:33:23.5513354Z Ran 1278 tests in 4422.760s
2021-05-19T04:19:47.5664435Z Ran 1279 tests in 3910.031s
2021-05-20T04:13:45.4617209Z Ran 1280 tests in 3508.342s
2021-05-21T04:23:56.1103373Z Ran 1280 tests in 3946.563s
2021-05-22T04:30:52.1649178Z Ran 1289 tests in 4406.863s
2021-05-23T04:35:53.5526207Z Ran 1289 tests in 3818.054s
2021-05-24T04:33:44.9770098Z Ran 1289 tests in 4069.751s
2021-05-25T04:38:28.6369074Z Ran 1289 tests in 3809.039s
2021-05-26T05:16:56.6913239Z Ran 1293 tests in 4340.973s
2021-05-27T05:21:33.5230620Z Ran 1293 tests in 4303.062s
2021-05-28T05:44:02.6763227Z Ran 1292 tests in 4013.921s
2021-05-29T06:05:40.8639499Z Ran 1294 tests in 3581.055s
2021-05-30T06:17:23.4265104Z Ran 1294 tests in 4257.736s
2021-05-31T06:36:09.6446865Z Ran 1294 tests in 4782.606s
2021-06-01T06:57:26.6049829Z Ran 1294 tests in 4391.030s
2021-06-02T06:59:17.8345547Z Ran 1294 tests in 4737.580s
2021-06-03T06:10:51.5533496Z Ran 1294 tests in 3453.557s
2021-06-04T05:44:15.7873867Z Ran 1294 tests in 4113.770s
2021-06-05T05:25:15.5789949Z Ran 1294 tests in 4383.014s
2021-06-06T05:20:48.7371175Z Ran 1294 tests in 4386.210s
2021-06-07T05:33:24.8643085Z Ran 1294 tests in 4303.855s
2021-06-08T05:03:38.3420188Z Ran 1294 tests in 4371.387s
```
```shell
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/git-annex$ grep -h -i 'Ran .*tests in' builds/2021/0[56]/*/cron-*/*/*indows*/*_test-datalad\ \(master\).txt
2021-05-01T05:29:21.7469495Z Ran 1250 tests in 4904.324s
2021-05-02T05:33:35.7321469Z Ran 1250 tests in 4835.792s
2021-05-03T05:32:36.9342468Z Ran 1250 tests in 5040.210s
2021-05-04T05:18:53.5090364Z Ran 1250 tests in 4508.693s
2021-05-05T05:29:53.0946195Z Ran 1251 tests in 4989.032s
2021-05-06T05:26:05.6627667Z Ran 1251 tests in 4766.005s
2021-05-07T05:19:54.3788683Z Ran 1251 tests in 4305.963s
2021-05-08T05:32:49.9787816Z Ran 1251 tests in 4999.028s
2021-05-09T05:38:03.4560444Z Ran 1251 tests in 5482.509s
2021-05-10T05:34:58.3502335Z Ran 1251 tests in 5067.805s
2021-05-11T05:29:07.7916431Z Ran 1251 tests in 4797.435s
2021-05-12T05:23:10.5488731Z Ran 1251 tests in 4276.426s
2021-05-13T05:25:09.1884039Z Ran 1258 tests in 4275.409s
2021-05-14T05:39:20.0295201Z Ran 1258 tests in 4999.023s
2021-05-15T05:31:48.8021685Z Ran 1258 tests in 4736.174s
2021-05-16T05:33:12.0136713Z Ran 1258 tests in 4310.326s
2021-05-17T05:30:27.7704400Z Ran 1258 tests in 4561.060s
2021-05-18T05:31:34.3746293Z Ran 1259 tests in 4426.723s
2021-05-19T05:31:11.8604863Z Ran 1260 tests in 4903.506s
2021-05-20T05:22:56.6087010Z Ran 1261 tests in 4299.536s
2021-05-21T05:28:25.7511300Z Ran 1261 tests in 4498.286s
2021-05-22T05:44:54.2975226Z Ran 1270 tests in 5376.292s
2021-05-23T05:35:29.5504579Z Ran 1270 tests in 4289.031s
2021-05-24T05:33:40.4688984Z Ran 1270 tests in 4202.429s
2021-05-25T05:45:30.0310577Z Ran 1270 tests in 4882.292s
2021-05-26T06:03:08.4980635Z Ran 1274 tests in 4441.783s
2021-05-27T06:05:25.0347805Z Ran 1274 tests in 4381.865s
2021-05-28T06:35:10.3614775Z Ran 1273 tests in 4529.720s
2021-05-29T07:10:28.5980900Z Ran 1275 tests in 4639.335s
2021-05-30T07:01:37.6908114Z Ran 1275 tests in 5123.743s
2021-05-31T07:01:41.6742606Z Ran 1275 tests in 4672.112s
2021-06-01T07:36:38.0359361Z Ran 1275 tests in 5183.674s
2021-06-02T07:21:46.3022239Z Ran 1275 tests in 4610.975s
2021-06-03T07:20:14.4589015Z Ran 1275 tests in 5795.353s
2021-06-04T06:44:13.4142157Z Ran 1275 tests in 4574.473s
2021-06-08T06:10:00.4654735Z Ran 1275 tests in 5406.636s
```
</details>
Now I will try to do timing on a local OSX to see whether it is indeed just some coincidence with the OSX VMs getting slower or whether it is \"real\".
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 13"
date="2021-06-08T19:08:01Z"
content="""
One other test for “some coincidence with OSX VM getting slower“ is to try the current github CI test with an older git-annex.
"""]]

View file

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="getting closer..."
date="2021-06-08T19:21:59Z"
content="""
I think I have localized the slowdown to a single particular test in datalad which operates on a very heavy tree with tiny files.
Good and bad runs:
```
*$> grep -h -A3 'datalad.support.tests.test_annexrepo.test_files_split(' builds/2021/05/*/cron-*/44196064/Build\ git-annex\ on\ macOS-29{2,3}-failed/1_test-datalad\ \(master\).txt
2021-05-25T04:34:39.7723910Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.gitrepo.GitRepo'>,) ... ok
2021-05-25T04:39:31.3031220Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,) ... ok
2021-05-25T04:39:31.3032670Z datalad.support.tests.test_annexrepo.test_get_size_from_key ... ok
2021-05-25T04:39:31.3043440Z datalad.support.tests.test_annexrepo.test_done_deprecation ... ok
2021-05-25T04:39:31.3104830Z datalad.support.tests.test_ansi_colors.test_color_enabled ... ok
--
2021-05-26T05:01:12.6881120Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.gitrepo.GitRepo'>,) ... ok
2021-05-26T06:47:04.8547640Z datalad.support.tests.test_annexrepo.test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,) ... ok
2021-05-26T06:47:04.8549600Z datalad.support.tests.test_annexrepo.test_get_size_from_key ... ok
2021-05-26T06:47:04.8559760Z datalad.support.tests.test_annexrepo.test_done_deprecation ... ok
2021-05-26T06:47:04.8636720Z datalad.support.tests.test_ansi_colors.test_color_enabled ... ok
```
You can see from the timestamps (I guess GitHub prepends the timestamp AFTER getting the full line) that over 1h30m is spent there on `test_files_split(<class 'datalad.support.annexrepo.AnnexRepo'>,)`. [Here is the actual test etc. for posterity](https://github.com/datalad/datalad/blob/master/datalad/support/tests/test_annexrepo.py#L2354). Yet to pinpoint more specifically what is going on, but most likely some interplay with command-line length invocation limits (specific to OSX) etc.
So the good news is that it is not some widely spread, drastic slow-down effect, as far as I can see.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 15"
date="2021-06-08T20:23:09Z"
content="""
ok -- I think it (or at least a part of it; the datalad test is still running) boils down to `git-annex add` now doing some `O(n-staged * n-cmdline-paths)` lookup/operation (instead of just `O(n-cmdline-paths)` before) whenever we have a series of `annex add --json cmdline-paths ...` invocations. This is reflected by the fact that where before each invocation of `annex add` took about `~30 sec`, the successive invocations now take `30 284 528 720` seconds.
In any case, in datalad we should finally switch to using `annex add --batch`; I filed [an issue](https://github.com/datalad/datalad/issues/5721), but I guess maybe it could also be addressed on the git-annex side, since it sounds like some suboptimal data structure is used for some paths' matching.
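For example, a batched add along these lines (a sketch; the actual datalad integration differs) feeds paths on stdin instead of the command line:
```shell
# one path per line on stdin; --json emits one result object per file
printf '%s\n' */* | git annex add --batch --json
```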
"""]]

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""comment 16"""
date="2021-06-08T20:56:50Z"
content="""
This is starting to make some sense. If you're running git-annex add
N times adding M files each time, then each run will now diff the
changes in the index made by the previous run.
And the first part of diffing the index is generating a tree from all the
files in it, which is to some extent `O(N*M)` (though IDK, git may have
optimisations involving subtrees or such). So the combined N git-annex add
runs come to `O(N*N*M)`
On linux, `git write-tree` with 100,000 files in the index runs in under 1
second, so the `N*M` is not too bad. And then there's the overhead of
git-annex processing the resulting diff, which takes more time but is what
I've been optimising.
Perhaps on OSX something is making the write-tree significantly slower.
Or something is making it run the command more with fewer files per run.
Although IIRC OSX has close to the same maximum command line length as
linux.
Or maybe the index diffing is diffing from the wrong start point.
One way I can think of where this would happen is if
it somehow misdetects the index as being locked.
A --debug trace of one of the later git-annex add runs in that
test case would probably shed some useful light.
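For instance, capturing such a trace could be as simple as this (paths illustrative):
```shell
# git-annex debug output goes to stderr; keep it for later inspection
git annex --debug add 2??? --quiet 2> add-debug.log
```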
Yes, --batch should avoid the problem...
"""]]

View file

@ -0,0 +1,93 @@
[[!comment format=mdwn
username="joey"
subject="""comment 17"""
date="2021-06-09T18:53:14Z"
content="""
Trying to repro on linux, batches of 1000 files each time:
```
joey@darkstar:~/tmp/bench6>/usr/bin/time git-annex add 1??? --quiet
7.17user 5.02system 0:11.33elapsed 107%CPU (0avgtext+0avgdata 69480maxresident)k
6880inputs+40344outputs (4major+404989minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 2??? --quiet
7.75user 5.63system 0:12.62elapsed 106%CPU (0avgtext+0avgdata 70296maxresident)k
3640inputs+42656outputs (9major+414106minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 3??? --quiet
8.04user 5.74system 0:12.92elapsed 106%CPU (0avgtext+0avgdata 70396maxresident)k
1456inputs+44200outputs (8major+414767minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 4??? --quiet
8.66user 5.57system 0:13.45elapsed 105%CPU (0avgtext+0avgdata 69620maxresident)k
768inputs+45768outputs (11major+416364minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 5??? --quiet
8.83user 5.58system 0:13.92elapsed 103%CPU (0avgtext+0avgdata 69956maxresident)k
5448inputs+47128outputs (45major+415820minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 6??? --quiet
9.00user 5.49system 0:13.60elapsed 106%CPU (0avgtext+0avgdata 70340maxresident)k
20864inputs+48768outputs (270major+418220minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 7??? --quiet
9.23user 5.56system 0:13.90elapsed 106%CPU (0avgtext+0avgdata 70352maxresident)k
12736inputs+50024outputs (492major+417882minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 8??? --quiet
9.27user 5.62system 0:13.89elapsed 107%CPU (0avgtext+0avgdata 70236maxresident)k
11320inputs+51816outputs (411major+419672minor)pagefaults 0swaps
joey@darkstar:~/tmp/bench>/usr/bin/time git-annex add 9??? --quiet
9.33user 5.80system 0:14.04elapsed 107%CPU (0avgtext+0avgdata 70380maxresident)k
10952inputs+53128outputs (281major+419771minor)pagefaults 0swaps
```
There's some growth here, but it seems linear to the number of new files in the git index.
Doing the same with the last git-annex release:
```
0:10.97elapsed, 0:11.24elapsed, 0:11.65elapsed, 0:11.90elapsed,
0:12.25elapsed, 0:12.86elapsed, 0:12.74elapsed, 0:12.84elapsed,
0:13.26elapsed
```
So close to the same, the slowdown feels minimal.
Then on OSX:
```
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 1??? --quiet
18.26 real 6.03 user 8.35 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 2??? --quiet
26.68 real 6.63 user 9.08 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 3??? --quiet
32.13 real 6.75 user 9.09 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 4??? --quiet
34.49 real 7.11 user 9.40 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 5??? --quiet
34.61 real 7.08 user 9.18 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 6??? --quiet
36.66 real 7.20 user 9.31 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 7??? --quiet
38.29 real 7.35 user 9.42 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 8??? --quiet
35.85 real 7.39 user 9.57 sys
datalads-imac:bench joey$ PATH=$HOME:$PATH /usr/bin/time ~/git-annex add 9??? --quiet
36.20 real 7.56 user 9.59 sys
```
Well, this does not seem exponential but it's certainly growing
faster than linux.
Then tried on OSX with 100 chunks of 100 files each:
```
datalads-imac:bench joey$ for x in $(seq 10 99); do echo $x; PATH=$HOME:$PATH /usr/bin/time ~/git-annex add $x?? --quiet; done
...
23
3.91 real 0.83 user 1.17 sys
24
3.31 real 0.83 user 1.25 sys
25
3.32 real 0.83 user 1.26 sys
...
80
5.68 real 1.05 user 1.60 sys
81
6.18 real 1.06 user 1.60 sys
82
7.06 real 1.06 user 1.59 sys
```
There's more overhead because git-annex has to start up and diff
the index trees, which takes longer than processing the 100 files
in the chunk. But this is not taking hours to run, so either the
test case involves a *lot* more than 10k files, or there is
something else involved.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="OSX mystery resolved. add --batch is effective mitigation"
date="2021-06-08T21:56:53Z"
content="""
> Perhaps on OSX something is making the write-tree significantly slower. Or something is making it run the command more with fewer files per run.
The latter would be my guess... We seem to get 2 vs 5 splits on Linux vs OSX... our `datalad.utils.CMD_MAX_ARG` (the logic is [here](https://github.com/datalad/datalad/blob/HEAD/datalad/utils.py#L103)) gets set to 1048576 on Linux and 524288 on OSX. So it \"matches\" and \"OSX mystery resolved\"! ;)
Meanwhile, I am confirming that using `add --batch` mitigates it. With an [ad-hoc patch to add `add --batch` to datalad](https://github.com/datalad/datalad/pull/5722) I get a 187.8053s run for our test using `annex add --batch` and bleeding-edge annex 57b567ac8, and 183.1654s (so about the same) using annex 9a5981a15 from 20210525 (which takes over an hour with splitting); and then with our old splitting and that old git-annex 9a5981a15 I get a 188.3987s run.
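(For comparison, the kernel limit those values derive from can be checked directly on both platforms, assuming `getconf` is available:)
```shell
# maximum combined size of command-line arguments and environment, in bytes
getconf ARG_MAX
```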
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 18"
date="2021-06-08T21:58:20Z"
content="""
Clarification: the \"(which takes over an hour with splitting)\" was intended for the \"bleeding edge\" one, not the older one. That one is ok even with splitting.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 19"
date="2021-06-08T22:02:33Z"
content="""
And it seems we do not run that test on Windows (marked as `@known_failure_windows` ;)), so we do not see that \"regression\" on Windows runs (that was yet another part of the mystery of why it is OSX only ;))
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 1"
date="2021-06-05T13:50:42Z"
content="""
Oh, fresh runs of the datalad tests fail because I guess you implemented that \"don't display message unless it takes a while\" behavior I spotted in one of the comments but didn't have a chance to follow up on -- I guess from now on we won't be able to test our passing through of messages from git-annex (unless there is some setting with which we could disable that feature during tests).
That is unrelated to this issue of slow-down though.
"""]]

View file

@ -0,0 +1,104 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="more &quot;mystery resolved&quot; -- identical (empty) keys"
date="2021-06-09T21:00:34Z"
content="""
thank you Joey for timing it up! May be relevant: in our [test scenario](https://github.com/datalad/datalad/blob/master/datalad/support/tests/test_annexrepo.py#L2354) we have 100 directories with 100 files in each one of those and each directory or file name is 100 chars long (not sure if this is relevant either). So doing 5 subsequent adds (on OSX) should result in ~20k paths specified on the command line for each invocation.
<details>
<summary>so I did this little replication script which would do similar drill (just not long filenames for this one)</summary>
```shell
#!/bin/bash
export PS4='> '
set -eu
cd \"$(mktemp -d ${TMPDIR:-/tmp}/ann-XXXXXXX)\"
echo \"Populating the tree\"
for d in {0..99}; do
mkdir $d
for f in {0..99}; do
echo \"$d$f\" >> $d/$f
done
done
git init
git annex init
/usr/bin/time git annex add --json {,1}?/* >/dev/null
/usr/bin/time git annex add --json {2,3}?/* >/dev/null
/usr/bin/time git annex add --json {4,5}?/* >/dev/null
/usr/bin/time git annex add --json {6,7}?/* >/dev/null
/usr/bin/time git annex add --json {8,9}?/* >/dev/null
```
</details>
<details>
<summary>with \"older\" 8.20210429-g9a5981a15 on OSX - stable 30 sec per batch (matches what I observed from running our tests)</summary>
```shell
29.83 real 9.52 user 17.43 sys
30.49 real 10.02 user 17.34 sys
30.67 real 10.37 user 17.36 sys
31.00 real 10.57 user 17.39 sys
30.78 real 10.77 user 17.23 sys
```
</details>
<details>
<summary>and the newer 8.20210429-g57b567ac8 -- I got the same-ish nice timing without significant growth! -- damn it</summary>
```shell
31.26 real 10.08 user 18.14 sys
31.97 real 10.99 user 18.69 sys
31.77 real 11.23 user 18.24 sys
32.08 real 11.26 user 18.06 sys
32.53 real 11.45 user 18.27 sys
```
</details>
so I looked into our test generation again and realized -- we are not populating unique files. They are all empty!
<details>
<summary>and now confirming with this slightly adjusted script which just touches them:</summary>
```shell
#!/bin/bash
export PS4='> '
set -eu
cd \"$(mktemp -d ${TMPDIR:-/tmp}/ann-XXXXXXX)\"
echo \"Populating the tree\"
for d in {0..99}; do
mkdir $d
for f in {0..99}; do
touch $d/$f
done
done
git init
git annex init
/usr/bin/time git annex add --json {,1}?/* >/dev/null
/usr/bin/time git annex add --json {2,3}?/* >/dev/null
/usr/bin/time git annex add --json {4,5}?/* >/dev/null
/usr/bin/time git annex add --json {6,7}?/* >/dev/null
/usr/bin/time git annex add --json {8,9}?/* >/dev/null
```
</details>
the case I had observed
```
(base) yoh@dataladmac2 ~ % bash macos-slow-annex-add-empty.sh
...
21.65 real 8.47 user 10.89 sys
139.96 real 61.51 user 78.18 sys
... waiting for the rest -- too eager to post as is for now
```
so -- sorry I have not spotted that peculiarity right from the beginning!
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 22"
date="2021-06-09T22:28:06Z"
content="""
FWIW the complete timing was
```
21.65 real 8.47 user 10.89 sys
139.96 real 61.51 user 78.18 sys
253.47 real 112.01 user 142.78 sys
370.43 real 161.94 user 211.15 sys
481.03 real 210.78 user 274.31 sys
```
so it matches pretty closely what I observed with the 4 splits (`30 284 528 720`).
"""]]

View file

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 23"""
date="2021-06-14T14:09:07Z"
content="""
The file contents all being the same is the crucial thing. On linux,
adding 1000 dup files at a time (all in same directory), I get:
```
run 1: 0:08
run 2: 0:42
run 3: 1:14
run 4: 1:46
```
After run 4, adding 1000 files with all different content takes
0:11, so not appreciably slowed down; it only affects adding dups,
and only when there are a *lot* of them.
This feels like quite an edge case, and also not
really a new problem, since unlocked files would have already
had the same problem before recent changes.
I thought this might be an inefficiency in sqlite's index, similar to how
hash tables can scale poorly when a lot of things end up in the same
bucket. But disabling the index did not improve performance.
Aha -- the slowdown is caused by `git-annex add` looking to see what other
annexed files use the same content, so that it can populate any unlocked
files that didn't have the content present before. With all these locked
files now recorded in the db, it has to check each file in turn, and
that is where the `O(N^2)` comes from.
"""]]

View file

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 24"""
date="2021-06-14T15:36:30Z"
content="""
If the database recorded when files were unlocked or not, that could be
avoided, but tracking that would add a lot of complexity for what is just
an edge case. And probably slow things down generally by some amount due to
the db being larger.
It seems almost cheating, but it could remember the last few keys it's added,
and avoid trying to populate unlocked files when adding those keys again.
This would slow down the usual case by some tiny amount (eg an IORef access)
but avoid `O(N^2)` in this edge case. Though it wouldn't fix all edge cases,
eg when the files it's adding rotate through X different contents, and X is
larger than the number of keys it remembers.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 25"""
date="2021-06-14T15:52:37Z"
content="""
Found an optimisation that sped it up 50%, but this edge case is still
`O(N^2)`, so *shrug*.
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="git-annex-add slowdown"
date="2021-06-14T16:00:44Z"
content="""
> git-annex add looking to see what other annexed files use the same content, so that it can populate any unlocked files that didn't have the content present before
Could this be skipped if `annex.supportunlocked=false`?
>The file contents all being the same is the crucial thing
One not-quite-edge case where that happens is when empty files are used as placeholders for outputs of failed steps throughout a large workflow.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 27"
date="2021-06-14T16:23:41Z"
content="""
FWIW, without having a clear idea of the code logic/data structures, it smells a bit like somewhere this could be an `any` rather than the current `all` operation/check (in the case of `add` at least), if all that's needed is to test whether there is already a path with a given key -- and assuming the DB doesn't require a full rewrite of all `path` entries when a new path is added for the key.
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 28"""
date="2021-06-14T16:31:26Z"
content="""
@Ilya sure, it can be skipped when annex.supportunlocked=false.
I've implemented that. (And also for several other cases that have similar
behavior, like dropping a key.)
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 29"""
date="2021-06-14T16:32:38Z"
content="""
@yarik no, it needs to check all files. Consider what happens when `foo` is
an annexed link to key K, which is not present, and you copy
`.git/annex/objects/../K` from some other repo and run `git annex add K`
(or reinject, or get, it's all the same) -- `foo` now points to the content
of K. The same also needs to hold true for unlocked files, and so it has to
check if `foo` is an unlocked pointer to K and populate the file with the
content. Repeat for all the other 9999 files...
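
Roughly, in sketch form (assumed names, and a much-simplified notion of a
pointer file -- not the real format or API):

```haskell
-- Every associated file has to be visited, because any of them may be an
-- unlocked pointer that can now be filled in with the newly present content.
import Data.List (isInfixOf)
import System.Directory (copyFile)

type Key = String

-- hypothetical check: does this file just contain a pointer to the key,
-- rather than the key's actual content?
isPointerTo :: Key -> FilePath -> IO Bool
isPointerTo k f = do
  c <- readFile f
  return (("annex/objects/" ++ k) `isInfixOf` c)

populateAssociated :: Key -> FilePath -> [FilePath] -> IO ()
populateAssociated k objfile fs = mapM_ populate fs
  where
    populate f = do
      isPtr <- isPointerTo k f
      if isPtr
        then copyFile objfile f   -- fill the pointer file in with the content
        else return ()            -- already has the content, or not a pointer
```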
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-06-07T15:41:43Z"
content="""
I assume these tests are creating lots of clones of repositories. Are they
also doing lots of merges or switching between branches?
The init scan time has already been optimised as much as seems feasible.
The update scan time can still be optimised, see
[[speed_up_keys_db_update_with_git_streaming]].
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 30"
date="2021-06-14T17:36:16Z"
content="""
> The same needs to also hold true for unlocked files, and so it has to check if foo is an unlocked pointer to K and populate the file with the content.
but that should only need to be done if K became present/known when it was not present before. If K is already known (e.g. it was just added for another file, or maybe was added in a previous \"commit\"), no such checks are needed, since those files could be expected to already be populated. Right?
"""]]

View file

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""Re: comment 30"""
date="2021-06-14T18:36:11Z"
content="""
I discussed that approach in comment #24.
"""]]

View file

@ -0,0 +1,35 @@
[[!comment format=mdwn
username="joey"
subject="""comment 32"""
date="2021-06-14T20:26:52Z"
content="""
Some thoughts leading to a workable plan:
It's easy to detect this edge case because getAssociatedFiles will be
returning a long list of files. So it could detect, say, 10 files in the list
and start doing something other than the usual, without bothering the usual
case with any extra work.
Git starts to get slow anyway in the 1 million to 10 million file range. So
we can assume fewer than that many files are being added. And there needs to
be a fairly large number of duplicates of a key for speed to become a problem
when adding that key. Around 1000 based on the above benchmarks, but 100 would
be safer.
If it's adding 10 million files, there can be at most 10000 keys
that have `>=` 1000 duplicates (10 million / 1000).
No problem to remember 10000 keys; a key is less than 128 bytes long, so
that would take 1250 kb, plus the overhead of the Map. Might as well
remember 12 mb worth of keys, to catch 100 duplicates.
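
A quick back-of-the-envelope check of those numbers (assuming ~128 bytes per
key, as above):

```haskell
-- sanity-check the sizing arithmetic above
main :: IO ()
main = do
  let totalFiles = 10 * 1000 * 1000 :: Int      -- assume at most this many adds
      keyBytes   = 128 :: Int                   -- rough upper bound per key
      keysAt d   = totalFiles `div` d           -- max keys with >= d duplicates
      mbAt d     = fromIntegral (keysAt d * keyBytes) / (1024 * 1024) :: Double
  putStrLn ("keys with >= 1000 dups: " ++ show (keysAt 1000)
            ++ " (~" ++ show (mbAt 1000) ++ " MB)")
  putStrLn ("keys with >= 100 dups: " ++ show (keysAt 100)
            ++ " (~" ++ show (mbAt 100) ++ " MB)")
```
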
It would be even better to use a bloom filter, which could remember many
more, and I thought I had a way, but the false positive case seems the
wrong way around. If the bloom filter remembers keys that have already had
their associated files populated, then a false positive would prevent doing
that for a key that it's not been done for.
It would make sense to do this not only in populateUnlockedFiles but in
Annex.Content.moveAnnex and Annex.Content.removeAnnex. Although removeAnnex
would need a different bloom filter, since a file might have been populated
and then somehow get removed in the same git-annex call.
"""]]

View file

@ -0,0 +1,23 @@
[[!comment format=mdwn
username="joey"
subject="""comment 33"""
date="2021-06-15T13:01:04Z"
content="""
Oh, there's a much better solution: If the annex object file already exists
when ingesting a new file, skip populating other associated files. They
will have already been populated. moveAnnex has to check if the annex object
file already exists anyway, so this will have zero overhead.
(Maybe that's what yarik was getting at in comment #30.)
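
In sketch form (assumed names and simplified behaviour, not the actual
moveAnnex code):

```haskell
-- The "object file already exists" check is needed anyway, so skipping the
-- populate scan for duplicate content costs nothing extra.
import System.Directory (doesFileExist, removeFile, renameFile)

type Key = String

ingest :: Key -> FilePath -> FilePath -> (Key -> IO ()) -> IO ()
ingest key src objfile populateUnlockedFiles = do
  present <- doesFileExist objfile
  if present
    then removeFile src           -- duplicate content; anything unlocked was
                                  -- already populated when it first arrived
    else do
      renameFile src objfile      -- first time this content is seen
      populateUnlockedFiles key   -- only now is the per-file scan needed
```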
Implemented that, and here's the results, re-running my prior benchmark:

    run 1: 0:03.14
    run 2: 0:03.24
    run 3: 0:03.35
    run 4: 0:03.45
    run 9: 0:03.65

That also shows that the actual overhead of diffing the index,
as its size grows, is quite small.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 34"
date="2021-06-15T15:38:16Z"
content="""
FWIW: Yes! ;-)
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="thanks"
date="2021-06-16T19:54:01Z"
content="""
Thanks @joeyh for sorting this out ;)
"""]]

View file

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 36"
date="2021-06-16T21:32:58Z"
content="""
yes, awesome, thank you Joey!
FWIW, here is the little trail of our tests' timings (non-adjusted, in the released version)
```
$> grep -h Ran 0[56]/cron-*/build-macos.yaml-*/*_test-datalad\ \(release\).txt
2021-05-01T04:28:55.4957910Z Ran 1257 tests in 6033.095s
2021-05-02T04:24:43.3005800Z Ran 1257 tests in 5722.774s
2021-05-03T04:24:54.5178860Z Ran 1257 tests in 5846.700s
...
2021-05-24T04:39:07.5375680Z Ran 1257 tests in 5710.372s
2021-05-25T04:54:01.2232460Z Ran 1257 tests in 6459.339s
2021-05-26T06:41:04.2956530Z Ran 1257 tests in 11655.761s
2021-05-27T07:07:28.4722960Z Ran 1257 tests in 11954.671s
...
2021-06-13T05:59:36.9626910Z Ran 1257 tests in 11987.233s
2021-06-14T06:05:36.1668770Z Ran 1257 tests in 12135.290s
2021-06-15T04:45:33.3489450Z Ran 1257 tests in 7046.095s
2021-06-16T04:26:46.1256550Z Ran 1257 tests in 5706.314s
```
so we are indeed back to normal (if not better ;))!
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-06-07T15:49:51Z"
content="""
Also worth saying: When I have a choice between a bug fix and a performance
change, I kind of have to pick the bug fix.
[[bugs/indeterminite_preferred_content_state_for_duplicated_file]] was a
longstanding bug that could cause *very* expensive misbehavior. The only
way to fix it that would not have an impact on performance would be to
remove include= and exclude= from the preferred content expression syntax,
which would prevent a lot of current uses of preferred content.
"""]]

View file

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="deferring the scan"
date="2021-06-07T17:41:44Z"
content="""
>The only way to fix it that would not have an impact on performance would be to remove include= and exclude= from the preferred content expression syntax
What about deferring the scan until the first command that uses `include=/exclude=` gets run, if `annex.supportunlocked=false`?
Also, could the scan be limited to the files that match one of the include/exclude globs?
"""]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-06-07T20:22:24Z"
content="""
Turns out `git-annex init` got a lot slower than it had to, because it was doing
the same kind of scan twice. I think that probably explains much of the
slowdown you saw. (Although it still needs to do somewhat more work than
before, it also does it more efficiently than before.)
Also I've optimised the update scan's use of git, around a 20% speedup,
although I don't know if your test case speed was impacted much by that
scan anyway.
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-06-07T20:42:34Z"
content="""
Looking at the CI log, I don't see any past runs that took 1h 46min.
A month ago they were taking 2h 6min. Let's see if the changes I'm pushing
now drop it back to that.
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="clarification"
date="2021-06-07T21:20:35Z"
content="""
For completeness:
> Looking at the CI log, I don't see any past runs that took 1h 46min. A month ago they were taking 2h 6min.
I was looking at the OSX ones. And the timing was for the one right before the slowdown: it was [this one](https://github.com/datalad/git-annex/actions/runs/873390021), but even the [previous one](https://github.com/datalad/git-annex/actions/runs/869970007) was 1h40min.
> Let's see if the changes I'm pushing now drop it back to that.
🤞
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2021-06-07T21:39:05Z"
content="""
FWIW:
> I assume these tests are creating lots of clones of repositories.
lots of `git annex init` in new git repos, and also a good number of clones AFAIK. FWIW, we used to do more clones rather than recreating new repos per test, but then we switched to just creating a new repo. We still have a good amount of cloning though.
> Are they also doing lots of merges or switching between branches?
Not that many AFAIK.
"""]]

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2021-06-08T13:45:18Z"
content="""
I do not see 1h40min anywhere on the page you linked. It says
2h 6m at the top. Oh I see, you are clicking around to get to
https://github.com/datalad/git-annex/runs/2661363385?check_suite_focus=true,
and that number is in there next to "Run datalad tests".
My improvements yesterday did not improve your test suite time any. But
they certainly sped it up a *lot* in my benchmarks. So I think what you
think is taking more time, eg the scanning at init, does not really have
much to do with whatever is really taking more time. And if your test suite
is mostly not cloning repos but is initializing new empty repos, then the
scanning at init was and still is effectively a noop, it will not have
gotten more expensive.
It might be that the incremental updating git-annex now does when it sees
changes to the index is making your test suite run a bit longer. But twice
as long? That is not a very expensive process.
Also notice that the git-annex test part of the CI job used to take
15m54s and is now taking 17m29s. That's doing some cloning and some adding
of files etc, and it didn't double in run time.
I did find another nice optimisation this morning
[[!commit c831a562f550e6ff93e081f555eced3a8a0d524f]], so we can see if that
improves things in the next run.
I suspect you're going to have to look at specific parts of your test suite
and/or the CI system to identify what's slower though.
"""]]