From 4868b64868747455a9c5d512650f9e7074e6009e Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Sun, 27 Mar 2011 18:34:30 -0400
Subject: [PATCH] Provide a less expensive version of `git annex copy --to`, enabled via --fast. This assumes that location tracking information is correct, rather than contacting the remote for every file.

---
 Command/Move.hs | 10 ++++++++--
 debian/changelog | 3 +++
 ...batch_check_on_remote_when_using_copy.mdwn | 19 +++++++++++++++++++
 doc/git-annex.mdwn | 19 ++++++++++++-------
 4 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Command/Move.hs b/Command/Move.hs
index 907bbf00ef..3ac5a7ab2c 100644
--- a/Command/Move.hs
+++ b/Command/Move.hs
@@ -84,8 +84,14 @@ toStart dest move file = isAnnexed file $ \(key, _) -> do
 			return $ Just $ toPerform dest move key
 toPerform :: Remote.Remote Annex -> Bool -> Key -> CommandPerform
 toPerform dest move key = do
-	-- checking the remote is expensive, so not done in the start step
-	isthere <- Remote.hasKey dest key
+	-- Checking the remote is expensive, so not done in the start step.
+	-- In fast mode, location tracking is assumed to be correct,
+	-- and an explicit check is not done, when copying. When moving,
+	-- it has to be done, to avoid inadvertent data loss.
+	fast <- Annex.getState Annex.fast
+	isthere <- if fast && not move
+		then return $ Right True
+		else Remote.hasKey dest key
 	case isthere of
 		Left err -> do
 			showNote $ show err
diff --git a/debian/changelog b/debian/changelog
index e995009db0..2f532784d4 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -3,6 +3,9 @@ git-annex (0.20110326) UNRELEASED; urgency=low
   * annex.diskreserve can be given in arbitrary units (ie "0.5 gigabytes")
   * Generalized remotes handling, laying groundwork for remotes that are
     not regular git remotes.
+  * Provide a less expensive version of `git annex copy --to`, enabled
+    via --fast. This assumes that location tracking information is correct,
+    rather than contacting the remote for every file.
 
  -- Joey Hess Sat, 26 Mar 2011 14:36:16 -0400
 
diff --git a/doc/forum/batch_check_on_remote_when_using_copy.mdwn b/doc/forum/batch_check_on_remote_when_using_copy.mdwn
index 0f20ab6454..b08c33b8ba 100644
--- a/doc/forum/batch_check_on_remote_when_using_copy.mdwn
+++ b/doc/forum/batch_check_on_remote_when_using_copy.mdwn
@@ -6,3 +6,22 @@ Once all checks are done, one single transfer session should be started. Creatin
 
 
 -- RichiH
+
+> (Use of SHA is irrelevant here, copy does not checksum anything.)
+>
+> I think what you're seeing is
+> that `git annex copy --to remote` is slow, going to the remote repository
+> every time to see if it has the file, while `git annex copy --from remote`
+> is fast, since it looks at what files are locally present.
+>
+> That is something I mean to improve. At least `git annex copy --fast --to remote`
+> could easily do a fast copy of all files that are known to be missing from
+> the remote repository. When local and remote git repos are not 100% in sync,
+> relying on that data could miss some files that the remote doesn't have anymore,
+> but local doesn't know it dropped. That's why it's a candidate for `--fast`.
+>
+> I've just implemented that.
+>
+> While I do hope to improve ssh usage so that it sshs once, and feeds
+> `git-annex-shell` a series of commands to run, that is a much longer-term
+> thing. --[[Joey]]
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 32f190e75d..8afe93c109 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -84,20 +84,22 @@ Many git-annex commands will stage changes for later `git commit` by you.
   it is safe to do so, typically because of the setting of annex.numcopies.
 
 * move [path ...]
+
+  When used with the --from option, moves the content of annexed files
+  from the specified repository to the current one.
 
   When used with the --to option, moves the content of annexed files
   from the current repository to the specified one.
 
-  When used with the --from option, moves the content of annexed files
-  from the specified repository to the current one.
-
 * copy [path ...]
 
+  When used with the --from option, copies the content of annexed files
+  from the specified repository to the current one.
+
   When used with the --to option, copies the content of annexed files
   from the current repository to the specified one.
 
-  When used with the --from option, copies the content of annexed files
-  from the specified repository to the current one.
+  To avoid contacting the remote to check if it has every file, specify --fast
 
 * unlock [path ...]
 
@@ -137,11 +139,15 @@ Many git-annex commands will stage changes for later `git commit` by you.
 
   With parameters, only the specified files are checked.
 
+  To avoid expensive checksum calculations, specify --fast
+
 * unused
 
   Checks the annex for data that is not used by any files currently
   in the annex, and prints a numbered list of the data.
 
+  To only show unused temp files, specify --fast
+
 * dropunused [number ...]
 
   Drops the data corresponding to the numbers, as listed by the last
@@ -286,8 +292,7 @@ Many git-annex commands will stage changes for later `git commit` by you.
 * --fast
 
   Enables less expensive, but also less thorough versions of some commands.
-  What is avoided depends on the command. A fast fsck avoids calculating
-  checksums; a fast unused only shows temp files and not other unused files.
+  What is avoided depends on the command.
 
 * --quiet
 
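
Note on the Command/Move.hs hunk: the new branch in `toPerform` can be read in isolation as the small decision below. This is only a sketch for reviewers, not git-annex code. `isThere` and `remoteCheck` are hypothetical names, the expensive `Remote.hasKey dest key` call is modelled as a plain `IO` action passed in as an argument, and the `Either String Bool` result type is an assumption based on the `Left err` / `Right True` cases visible in the hunk. How the caller consumes `Right True` lies outside the hunk and is not modelled here.

```haskell
module Main where

-- Hypothetical stand-in for the check added to toPerform.
-- The expensive remote query is passed in as an action, so this
-- sketch needs nothing from git-annex itself.
isThere
  :: Bool                      -- fast mode enabled (--fast)
  -> Bool                      -- True when moving, False when copying
  -> IO (Either String Bool)   -- expensive check against the remote
  -> IO (Either String Bool)
isThere fast move checkRemote
  -- Fast copy: skip the remote check and return Right True, as the patch does.
  | fast && not move = return (Right True)
  -- Moves, and non-fast copies, always run the explicit check.
  | otherwise = checkRemote

main :: IO ()
main = do
  -- Assumed dummy check that announces itself, to show when it runs.
  let remoteCheck = putStrLn "contacting remote..." >> return (Right False)
  copyFast <- isThere True False remoteCheck   -- remote is not contacted
  print copyFast
  moveFast <- isThere True True remoteCheck    -- remote is contacted
  print moveFast
```

Passing the check in as an action keeps the point of the patch visible: in the fast copy case the action is never run, while a move runs it regardless of `--fast`.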