Revert "Significantly sped up processing of large numbers of directories passed to a single git-annex command."

This reverts commit 705112903e.

Whoops, git ls-files does not always output in the input ordering.
That's why all this work is needed. Urk.
Joey Hess 2015-04-02 01:23:43 -04:00
parent 8aa6b5f2a6
commit f79502d377
3 changed files with 3 additions and 11 deletions

@@ -170,19 +170,17 @@ prop_relPathDirToFile_regressionTest = same_dir_shortcurcuits_at_difference
     == joinPath ["..", "..", "..", "..", ".git", "annex", "objects", "18", "gk", "SHA256-foo", "SHA256-foo"]

 {- Given an original list of paths, and an expanded list derived from it,
- - partitions the expanded list, so that sublist corresponds to one of the
+ - generates a list of lists, where each sublist corresponds to one of the
  - original paths. When the original path is a directory, any items
  - in the expanded list that are contained in that directory will appear in
  - its segment.
- -
- - The expanded list must have the same ordering as the original list.
  -}
 segmentPaths :: [FilePath] -> [FilePath] -> [[FilePath]]
 segmentPaths [] new = [new]
 segmentPaths [_] new = [new] -- optimisation
 segmentPaths (l:ls) new = found : segmentPaths ls rest
   where
-    (found, rest) = break (\p -> not (l `dirContains` p)) new
+    (found, rest)=partition (l `dirContains`) new

 {- This assumes that it's cheaper to call segmentPaths on the result,
  - than it would be to run the action separately with each path. In
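
A minimal sketch of why the `break` variant depends on ordering while the restored `partition` variant does not. It is not the git-annex code itself: `dirContains'` is a simplified, hypothetical stand-in for `dirContains` (plain prefix matching, no path normalisation), the names `segmentBreak` and `segmentPartition` are made up for illustration, and the out-of-order `expanded` list is an invented example rather than real `git ls-files` output.

```haskell
import Data.List (isPrefixOf, partition)

-- Hypothetical, simplified stand-in for git-annex's dirContains:
-- a path is "contained" if it equals the directory or lies below it.
dirContains' :: FilePath -> FilePath -> Bool
dirContains' dir p = dir == p || (dir ++ "/") `isPrefixOf` p

-- The reverted variant: takes only the leading run of matching items,
-- so it is correct only when the expanded list follows the input order.
segmentBreak :: [FilePath] -> [FilePath] -> [[FilePath]]
segmentBreak [] new = [new]
segmentBreak [_] new = [new]
segmentBreak (l:ls) new = found : segmentBreak ls rest
  where
    (found, rest) = break (\p -> not (l `dirContains'` p)) new

-- The restored variant: scans the whole remaining list for each
-- original path, so ordering does not matter (at a higher cost).
segmentPartition :: [FilePath] -> [FilePath] -> [[FilePath]]
segmentPartition [] new = [new]
segmentPartition [_] new = [new]
segmentPartition (l:ls) new = found : segmentPartition ls rest
  where
    (found, rest) = partition (l `dirContains'`) new

main :: IO ()
main = do
  let original = ["b", "a"]
      expanded = ["a/1", "b/2"] -- invented listing, not in input order
  print (segmentBreak original expanded)     -- [[],["a/1","b/2"]]
  print (segmentPartition original expanded) -- [["b/2"],["a/1"]]
```

With the out-of-order listing, the `break` variant leaves the segment for `b` empty and attributes `b/2` to `a`, while the `partition` variant assigns each file to its own directory.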

debian/changelog

@@ -22,8 +22,6 @@ git-annex (5.20150328) UNRELEASED; urgency=medium
     corresponding to duplicated files they process.
   * fsck: Added --distributed and --expire options,
     for distributed fsck.
-  * Significantly sped up processing of large numbers of directories
-    passed to a single git-annex command.
   * Fix truncation of parameters that could occur when using xargs git-annex.

  -- Joey Hess <id@joeyh.name>  Fri, 27 Mar 2015 16:04:43 -0400

@@ -5,13 +5,9 @@ Feeding git-annex a long list off directories, eg with xargs can have
 ls-files command is longer than the git-annex command often, so it gets
 truncated and some files are not processed.

-> [[fixed|done]] --[[Joey]]
+> fixed --[[Joey]]

 * It can take a really long time for git-annex to chew through the
 git-ls-files results. There is probably an exponential blowup in the time
 relative to the number of parameters. Some of the stuff being done to
 preserve original ordering etc is likely at fault.
-
-> I think I've managed to speed this up something like
-> 1000x or some such. segmentPaths on an utterly insane list of 6 million
-> files now runs in about 10 seconds. --[[Joey]]