move --from, copy --from: 10 times faster scanning remote on local disk

Rather than going through the location log to see which files are present
on the remote, the scan now simply looks at the disk contents directly.
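
The idea can be sketched in isolation as follows. This is a minimal model, not git-annex's actual code: the `Remote` record, the list-based `LocationLog`, and `inLocationLog` are hypothetical stand-ins, and the UUID comparison done by the real check is left out. It only illustrates "ask the remote directly when that is cheap, and fall back to the location log otherwise or when the direct check fails".

```haskell
type Key = String

-- Hypothetical stand-in for a remote (not git-annex's Remote type).
data Remote = Remote
    { remoteName  :: String
    , hasKeyCheap :: Bool                          -- is a direct check cheap?
    , hasKey      :: Key -> IO (Either String Bool) -- direct check; Left = could not check
    }

-- Hypothetical stand-in for the location log: key -> names of remotes that have it.
type LocationLog = [(Key, [String])]

-- The slower lookup: consult the location log.
inLocationLog :: LocationLog -> Remote -> Key -> Bool
inLocationLog locs r k = maybe False (remoteName r `elem`) (lookup k locs)

-- Prefer the direct check when it is cheap; fall back to the log if the
-- remote is not cheap to query or the direct check returns an error.
fromOk :: LocationLog -> Remote -> Key -> IO Bool
fromOk locs r k
    | hasKeyCheap r = either (const expensive) return =<< hasKey r k
    | otherwise     = expensive
  where
    expensive = return (inLocationLog locs r k)
```

Note that a `Left` from the direct check is treated as "unknown" rather than "absent", so a remote that cannot be read still gets the location-log answer.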

In my benchmark, scanning 834 files from an annex on my phone's SSD sped up
from 11.39 seconds to 1.31 seconds. (No files were actually moved.)

I also benchmarked 8139 files from an annex on spinning storage; the scan
sped up from 103.17 to 13.39 seconds.

Note that benchmarking with an encrypted annex on flash actually showed a
minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. It
seems the overhead of the crypto needed to derive the filenames to check
directly can be higher than the overhead of looking up data in the location
log. (Which says good things about how well the location log and git have
been optimised!) It *may* make sense to make encrypted local remotes not
have hasKeyCheap set; further benchmarking is called for.
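
For reference, the ratios behind these timings can be recomputed with a few lines of Haskell. The labels and the `speedup` helper are just illustrative names; the numbers are the ones quoted above. By this arithmetic the two unencrypted runs come out at roughly 8.7x and 7.7x, and the encrypted run at about 0.96x (a slowdown of around 4%).

```haskell
import Text.Printf (printf)

-- Ratio of the old scan time to the new one; values above 1 mean faster.
speedup :: Double -> Double -> Double
speedup before after = before / after

-- (label, seconds before, seconds after), taken from the commit message.
benchmarks :: [(String, Double, Double)]
benchmarks =
    [ ("834 files, SSD",            11.39,  1.31)
    , ("8139 files, spinning disk", 103.17, 13.39)
    , ("encrypted annex, flash",    13.93,  14.50)
    ]

-- One human-readable line per benchmark.
report :: (String, Double, Double) -> String
report (label, before, after) =
    printf "%s: %.2fs -> %.2fs (%.2fx)" label before after (speedup before after)

-- All report lines, ready for e.g. mapM_ putStrLn reportAll.
reportAll :: [String]
reportAll = map report benchmarks
```
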
Joey Hess 2012-02-26 14:59:12 -04:00
parent 00d814aecc
commit 2fd294d06f
2 changed files with 13 additions and 4 deletions

@@ -120,10 +120,15 @@ fromStart src move file key
             showMoveAction move file
             next $ fromPerform src move key
 fromOk :: Remote -> Key -> Annex Bool
-fromOk src key = do
-    u <- getUUID
-    remotes <- Remote.keyPossibilities key
-    return $ u /= Remote.uuid src && any (== src) remotes
+fromOk src key
+    | Remote.hasKeyCheap src =
+        either (const expensive) return =<< Remote.hasKey src key
+    | otherwise = expensive
+    where
+        expensive = do
+            u <- getUUID
+            remotes <- Remote.keyPossibilities key
+            return $ u /= Remote.uuid src && any (== src) remotes
 fromPerform :: Remote -> Bool -> Key -> CommandPerform
 fromPerform src move key = moveLock move key $ do
     ishere <- inAnnex key

debian/changelog

@@ -36,6 +36,10 @@ git-annex (3.20120124) UNRELEASED; urgency=low
     less frequently, when a merge or sync is done.
   * configure: Check if ssh connection caching is supported by the installed
     version of ssh and default annex.sshcaching accordingly.
+  * move --from, copy --from: Now 10 times faster when scanning to find
+    files in a remote on a local disk; rather than go through the location log
+    to see which files are present on the remote, it simply looks at the
+    disk contents directly.

  -- Joey Hess <joeyh@debian.org>  Tue, 24 Jan 2012 16:21:55 -0400