git-annex/doc/bugs/significant_performance_regression_impacting_datal.mdwn
Joey Hess 3af4c9a29a
fix exponential blowup when adding lots of identical files
This was an old problem when the files were being added unlocked,
so the changelog mentions that being fixed. However, recently it's also
affected locked files.

The fix for locked files is kind of stupidly simple. moveAnnex already
handles populating unlocked files, and only does it when the object file
was not already present. So remove the redundant populateUnlockedFiles
call. (That call was added all the way back in
cfaac52b88, and has always been
unnecessary.)
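
A toy model can make the mechanism above concrete. This is a minimal Python sketch, not git-annex's Haskell code. The names `ToyAnnex`, `add` and `count_writes` are hypothetical, and the model assumes that populating the unlocked files for a key rewrites every work-tree file associated with that key. It counts those writes when many identical files (so one shared key) are added, with and without an extra unconditional populate call like the one the commit removes. In this simplified model the redundant call makes the total grow quadratically with the number of files, while the guarded path in `move_annex` populates only once.

```python
# Toy model only (hypothetical): it mimics the shape of moveAnnex /
# populateUnlockedFiles as described in the commit message, not their code.

class ToyAnnex:
    def __init__(self):
        self.objects = set()      # keys whose object file is already present
        self.associated = {}      # key -> list of work-tree files using it
        self.populate_writes = 0  # total file populations performed

    def populate_unlocked_files(self, key):
        # Assumption: rewrites every file currently associated with the key.
        self.populate_writes += len(self.associated[key])

    def move_annex(self, key):
        # Populate only when the object file was not already present.
        if key not in self.objects:
            self.objects.add(key)
            self.populate_unlocked_files(key)

    def add(self, path, key, redundant_populate):
        self.associated.setdefault(key, []).append(path)
        self.move_annex(key)
        if redundant_populate:
            # The extra, unconditional call that the fix removes.
            self.populate_unlocked_files(key)


def count_writes(n, redundant_populate):
    """Add n identical files (one shared key) and return populate writes."""
    annex = ToyAnnex()
    for i in range(n):
        annex.add(f"file{i}", "same-key", redundant_populate)
    return annex.populate_writes


print(count_writes(1000, False))  # 1
print(count_writes(1000, True))   # 500501
```

With the redundant call, the k-th add rewrites all k files sharing the key, so the work grows far faster than the number of files added; dropping the call leaves only the single population done inside `move_annex`.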

Sponsored-by: Dartmouth College's Datalad project
2021-06-15 09:45:55 -04:00


### Please describe the problem.
With the recent refactoring of scanning for unlocked/annexed files (I guess), a sweep of datalad tests on OSX started to take about 3h 30min instead of the prior 1h 46min, so pretty much twice as long. Besides possibly affecting user experience, I am afraid this would cause too many ripples through our CI setup, which might now run out of time.
Logs etc. are at https://github.com/datalad/git-annex/actions/workflows/build-macos.yaml
The first red run is OK, just a fluke, but after that they all fail due to a change in an output log string (there is a fix for that, but somehow the behavior on OSX seems different; yet to check).
### What version of git-annex are you using? On what operating system?
Currently 8.20210428+git282-gd39dfed2a, on OSX. It first got slow with
8.20210428+git228-g13a6bfff4 and was OK with 8.20210428+git202-g9a5981a15.
[[!meta title="performance edge case when adding large numbers of identical files"]]
> [[fixed|done]] --[[Joey]]