Significantly sped up processing of large numbers of directories passed to a single git-annex command.
parent dffa212e02
commit 705112903e
3 changed files with 10 additions and 2 deletions
@@ -170,17 +170,19 @@ prop_relPathDirToFile_regressionTest = same_dir_shortcurcuits_at_difference
     == joinPath ["..", "..", "..", "..", ".git", "annex", "objects", "18", "gk", "SHA256-foo", "SHA256-foo"]
 
 {- Given an original list of paths, and an expanded list derived from it,
- - generates a list of lists, where each sublist corresponds to one of the
+ - partitions the expanded list, so that sublist corresponds to one of the
  - original paths. When the original path is a directory, any items
  - in the expanded list that are contained in that directory will appear in
  - its segment.
+ -
+ - The expanded list must have the same ordering as the original list.
  -}
 segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
 segmentPaths [] new = [new]
 segmentPaths [_] new = [new] -- optimisation
 segmentPaths (l:ls) new = found : segmentPaths ls rest
   where
-    (found, rest)=partition (l `dirContains`) new
+    (found, rest) = break (\p -> not (l `dirContains` p)) new
 
 {- This assumes that it's cheaper to call segmentPaths on the result,
  - than it would be to run the action separately with each path. In
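To see the changed line in isolation, here is a minimal, self-contained sketch of the `break`-based version. The `dirContains` below is a simplified stand-in written for this example (a plain prefix test); the real git-annex function is not shown in this diff and may normalise paths differently.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, simplified stand-in for git-annex's dirContains:
-- a directory contains itself and anything beneath it.
-- No path normalisation is attempted here.
dirContains :: FilePath -> FilePath -> Bool
dirContains dir p = dir == p || (dir ++ "/") `isPrefixOf` p

-- Same shape as the new code in the diff above.
segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
segmentPaths [] new = [new]
segmentPaths [_] new = [new] -- optimisation
segmentPaths (l:ls) new = found : segmentPaths ls rest
  where
    -- break stops at the first path that is not inside l, so only the
    -- leading run of the list is examined for each original path.
    (found, rest) = break (\p -> not (l `dirContains` p)) new
```

With this stand-in, `segmentPaths ["a", "b"] ["a/1", "a/2", "b/1"]` evaluates to `[["a/1", "a/2"], ["b/1"]]`. Because `break` only looks at the leading run, an expanded list that is not in the same order as the original paths will have items land in the wrong segment, which is why the new comment states the ordering requirement.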
debian/changelog (vendored, 2 changes)
@@ -22,6 +22,8 @@ git-annex (5.20150328) UNRELEASED; urgency=medium
     corresponding to duplicated files they process.
   * fsck: Added --distributed and --expire options,
     for distributed fsck.
+  * Significantly sped up processing of large numbers of directories
+    passed to a single git-annex command.
 
  -- Joey Hess <id@joeyh.name>  Fri, 27 Mar 2015 16:04:43 -0400
 
@@ -9,3 +9,7 @@ Feeding git-annex a long list off directories, eg with xargs can have
 git-ls-files results. There is probably an exponential blowup in the time
 relative to the number of parameters. Some of the stuff being done to
 preserve original ordering etc is likely at fault.
+
+> I think I've managed to speed this up something like
+> 1000x or some such. segmentPaths on an utterly insane list of 6 million
+> files now runs in about 10 seconds. --[[Joey]]
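As a rough illustration of where the speedup may come from, the sketch below counts how many times the containment test would run under each strategy. Everything in it is an assumption made for the example: `dirContains` is a simplified prefix test, and `partitionCalls`, `breakCalls` and `sample` are hypothetical helpers, not git-annex code. It models the simple case where the expanded list is ordered and every directory holds the same number of files.

```haskell
import Data.List (isPrefixOf, partition)

-- Hypothetical, simplified stand-in for git-annex's dirContains.
dirContains :: FilePath -> FilePath -> Bool
dirContains dir p = dir == p || (dir ++ "/") `isPrefixOf` p

-- Containment tests made by the old partition-based code:
-- every remaining path is tested for every original path
-- except the last, which is handled by the [_] shortcut.
partitionCalls :: [FilePath] -> [FilePath] -> Int
partitionCalls (l:ls@(_:_)) new = length new + partitionCalls ls rest
  where
    (_, rest) = partition (l `dirContains`) new
partitionCalls _ _ = 0

-- Containment tests made by the new break-based code:
-- the matching prefix, plus one test on the element that ends it.
breakCalls :: [FilePath] -> [FilePath] -> Int
breakCalls (l:ls@(_:_)) new =
    length found + min 1 (length rest) + breakCalls ls rest
  where
    (found, rest) = break (\p -> not (l `dirContains` p)) new
breakCalls _ _ = 0

-- m directories with k files each, listed in directory order.
sample :: Int -> Int -> ([FilePath], [FilePath])
sample m k = (dirs, files)
  where
    dirs = ["d" ++ show i | i <- [1 .. m]]
    files = [d ++ "/f" ++ show j | d <- dirs, j <- [1 .. k]]
```

For `sample 100 10`, `uncurry partitionCalls` gives 50490 tests and `uncurry breakCalls` gives 1089. Under this model the old cost grows with the product of the number of directories and the number of files, while the new cost grows roughly with their sum. The model says nothing about the actual timings reported above.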