diff --git a/Utility/Path.hs b/Utility/Path.hs
index 755436448e..2675aa0f9e 100644
--- a/Utility/Path.hs
+++ b/Utility/Path.hs
@@ -170,17 +170,26 @@ prop_relPathDirToFile_regressionTest = same_dir_shortcurcuits_at_difference
 		== joinPath ["..", "..", "..", "..", ".git", "annex", "objects", "18", "gk", "SHA256-foo", "SHA256-foo"]
 
 {- Given an original list of paths, and an expanded list derived from it,
- - generates a list of lists, where each sublist corresponds to one of the
- - original paths. When the original path is a directory, any items
- - in the expanded list that are contained in that directory will appear in
- - its segment.
+ - which may be arbitrarily reordered, generates a list of lists, where
+ - each sublist corresponds to one of the original paths.
+ -
+ - When the original path is a directory, any items in the expanded list
+ - that are contained in that directory will appear in its segment.
+ -
+ - The order of the original list of paths is attempted to be preserved in
+ - the order of the returned segments. However, doing so has a O^NM
+ - growth factor. So, if the original list has more than 100 paths on it,
+ - we stop preserving ordering at that point. Presumably a user passing
+ - that many paths in doesn't care too much about order of the later ones.
  -}
 segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
 segmentPaths [] new = [new]
 segmentPaths [_] new = [new] -- optimisation
 segmentPaths (l:ls) new = found : segmentPaths ls rest
   where
-	(found, rest)=partition (l `dirContains`) new
+	(found, rest) = if length ls < 100
+		then partition (l `dirContains`) new
+		else break (\p -> not (l `dirContains` p)) new
 
 {- This assumes that it's cheaper to call segmentPaths on the result,
  - than it would be to run the action separately with each path. In
diff --git a/debian/changelog b/debian/changelog
index ed152c8ef9..7188f28603 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -23,6 +23,8 @@ git-annex (5.20150328) UNRELEASED; urgency=medium
   * fsck: Added --distributed and --expire options, for distributed fsck.
   * Fix truncation of parameters that could occur when using xargs git-annex.
+  * Significantly sped up processing of large numbers of directories
+    passed to a single git-annex command.
 
  -- Joey Hess Fri, 27 Mar 2015 16:04:43 -0400
 
diff --git a/doc/bugs/feeding_git_annex_with_xargs_can_fail.mdwn b/doc/bugs/feeding_git_annex_with_xargs_can_fail.mdwn
index c973308c65..c4199d3b9c 100644
--- a/doc/bugs/feeding_git_annex_with_xargs_can_fail.mdwn
+++ b/doc/bugs/feeding_git_annex_with_xargs_can_fail.mdwn
@@ -11,3 +11,5 @@ Feeding git-annex a long list off directories, eg with xargs can have
 git-ls-files results. There is probably an exponential blowup in the time
 relative to the number of parameters. Some of the stuff being done to
 preserve original ordering etc is likely at fault.
+
+> [[fixed|done]] --[[Joey]]
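
A note on the `segmentPaths` hunk in `Utility/Path.hs`: the two branches differ in how much of `new` they look at. `partition` tests every element, while `break` with a negated predicate stops at the first element that is not contained in `l`. The sketch below is a minimal standalone illustration, not git-annex code. `contains` is a hypothetical, simplified stand-in for `dirContains` (plain prefix matching on `/`-separated paths, no normalisation), and `splitAnywhere` and `splitLeading` are names invented here for the two branches.

```haskell
import Data.List (isPrefixOf, partition)

-- Hypothetical, simplified stand-in for dirContains (assumption:
-- plain prefix matching, no path normalisation).
contains :: FilePath -> FilePath -> Bool
contains dir p = dir == p || (dir ++ "/") `isPrefixOf` p

-- Like the `partition` branch: tests every element of the list, so
-- matches are collected no matter where they sit.
splitAnywhere :: FilePath -> [FilePath] -> ([FilePath], [FilePath])
splitAnywhere dir = partition (dir `contains`)

-- Like the `break` branch: takes only the leading run of matches and
-- stops at the first non-match (equivalent to `span (dir `contains`)`).
splitLeading :: FilePath -> [FilePath] -> ([FilePath], [FilePath])
splitLeading dir = break (\p -> not (dir `contains` p))
```

With `expanded = ["b/1", "a/1", "b/2"]`, `splitAnywhere "a" expanded` pulls `"a/1"` out of the middle of the list, whereas `splitLeading "a" expanded` returns an empty first component because the list does not start with a match. In the patched code, `partition` is chosen when fewer than 100 original paths remain after the current one (`length ls < 100`) and `break` otherwise, which bounds how often the full scan is paid for.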