Avoid repeated checking that files passed on the command line exist.

git annex add, git annex lock, etc. make multiple seek passes,
and each seek pass checked that files existed. That was unnecessary,
redundant work.

Fixed by adding a new WorkTreeItem type, making seek actions use it,
and checking that the files exist when constructing it.
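
A minimal sketch of the idea, not the actual git-annex code: the
`WorkTreeItem` name comes from this commit, but the constructor
`workTreeItems`, the `seekPass` helper, and the check counter are
hypothetical, and missing paths are simply dropped here because the
source does not show how they are handled.

```haskell
module Main where

import Control.Monad (filterM, forM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.Directory (doesPathExist)

-- Hypothetical stand-in for the WorkTreeItem type named in the commit;
-- the real definition in git-annex may differ.
newtype WorkTreeItem = WorkTreeItem FilePath
  deriving (Show, Eq)

-- Build items from raw command-line parameters, checking existence
-- exactly once per parameter. The IORef counts the checks so the
-- effect is observable. Assumption: missing paths are dropped.
workTreeItems :: IORef Int -> [FilePath] -> IO [WorkTreeItem]
workTreeItems counter params = do
  existing <- filterM check params
  return (map WorkTreeItem existing)
  where
    check p = do
      modifyIORef' counter (+ 1)
      doesPathExist p

-- A seek pass that accepts already-validated items and therefore
-- performs no existence check of its own.
seekPass :: String -> [WorkTreeItem] -> IO ()
seekPass name items =
  forM_ items $ \(WorkTreeItem f) -> putStrLn (name ++ ": " ++ f)

main :: IO ()
main = do
  counter <- newIORef 0
  items <- workTreeItems counter [".", "no-such-file"]
  -- Three passes share the same validated items.
  forM_ ["pass 1", "pass 2", "pass 3"] $ \name -> seekPass name items
  n <- readIORef counter
  putStrLn ("existence checks: " ++ show n)
```

With two parameters and three passes, the counter ends at two instead
of six, since the check lives in the constructor and not in each pass.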

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2017-10-16 14:10:03 -04:00
parent a461cf2ce6
commit 85ed38a574
GPG key ID: DB12DB0FF05F8F38
25 changed files with 128 additions and 71 deletions

@@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2017-10-16T16:58:46Z"
content="""
I was worried there could be further races in the seeking
done by withFilesOldUnlocked and withFilesMaybeModified if those
run while files are still being ingested by actions run earlier
in the `git annex add`. Seems this is not a problem though --
withFilesOldUnlocked looks for typeChanged files, but the files
that were just/are currently being added were not in git before,
so are not typeChanged.
withFilesMaybeModified looks for modified files, and again these
files were/are just being added for the first time, so it won't stumble
over them.
So, I don't think a synchronization point is needed. In fact,
all three seeks could actually be run more concurrently than they are now
without stepping on one another's toes.
"""]]

@@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2017-10-16T17:06:43Z"
content="""
That leaves only the inefficiency of checkFileOrDirectoryExists being
run three times per parameter passed to `git annex add`.
There are some other commands that also run checkFileOrDirectoryExists
multiple times, `git annex lock` being one.
So, I factored that out into a separate pass that's only done once.
"""]]