fix exponential blowup when adding lots of identical files

This was an old problem when the files were being added unlocked,
so the changelog mentions that being fixed. However, recently it's also
affected locked files.

The fix for locked files is kind of stupidly simple. moveAnnex already
handles populating unlocked files, and only does it when the object file
was not already present. So remove the redundant populateUnlockedFiles
call. (That call was added all the way back in cfaac52b88, and has always
been unnecessary.)

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2021-06-15 09:32:12 -04:00
parent e147ae07f4
commit 3af4c9a29a
4 changed files with 40 additions and 20 deletions

@@ -7,4 +7,17 @@ Oh, there's a much better solution: If the annex object file already exists
when ingesting a new file, skip populating other associated files. They
will have already been populated. moveAnnex has to check if the annex object
file already exists anyway, so this will have zero overhead.
(Maybe that's what yarik was getting at in comment #30)
Implemented that, and here's the results, re-running my prior benchmark:
run 1: 0:03.14
run 2: 0:03.24
run 3: 0:03.35
run 4: 0:03.45
run 9: 0:03.65
That also shows the actual overhead of the diffing of the index,
as its size grows, is quite small.
"""]]
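
To illustrate why dropping the redundant call matters, here is a toy model in Python. It is only a sketch under stated assumptions, not git-annex's Haskell code: the class `ToyAnnex`, the flag `redundant_populate`, and the write counter are all hypothetical, and the methods only borrow the names moveAnnex and populateUnlockedFiles from the commit message. The model assumes that populating costs one write per associated file of the key.

```python
# Toy model only: names are illustrative and do not mirror git-annex's real API.

class ToyAnnex:
    def __init__(self):
        self.objects = set()      # keys whose object file is already present
        self.associated = {}      # key -> list of associated work tree files
        self.populate_writes = 0  # how many work tree files were (re)written

    def populate_unlocked_files(self, key):
        # Assumed cost: one write per file associated with the key.
        self.populate_writes += len(self.associated.get(key, []))

    def move_annex(self, key):
        # Populate only when the object file was not already present.
        if key in self.objects:
            return
        self.objects.add(key)
        self.populate_unlocked_files(key)

    def ingest(self, key, path, redundant_populate):
        self.associated.setdefault(key, []).append(path)
        self.move_annex(key)
        if redundant_populate:
            # The extra call that the commit removes.
            self.populate_unlocked_files(key)


def total_writes(n, redundant_populate):
    """Add n identical files (same key) and return the total populate writes."""
    annex = ToyAnnex()
    for i in range(n):
        annex.ingest("KEY", f"file{i}", redundant_populate)
    return annex.populate_writes
```

In this model, `total_writes(100, False)` performs a single write, while `total_writes(100, True)` performs 5051, because every added file triggers a rewrite of all files already associated with the same key. The real cost profile in git-annex may differ; the sketch only shows the shape of the problem.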