Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2012-01-27 16:36:47 -04:00
commit f2817f13ac
3 changed files with 29 additions and 1 deletion


@@ -11,6 +11,8 @@ import qualified Data.ByteString.Lazy.Char8 as L
 import System.IO.Error
 import qualified Data.Map as M
 import System.Process
+import System.Posix.Env (getEnvironment)
+import System.Path (brackettmpdir)
 
 import Common.Annex
 import Types.Remote
@@ -83,10 +85,21 @@ bupParams :: String -> BupRepo -> [CommandParam] -> [CommandParam]
 bupParams command buprepo params =
     Param command : [Param "-r", Param buprepo] ++ params
 
+isLocal :: BupRepo -> Bool
+isLocal buprepo = not (elem ':' buprepo)
+
 bup :: String -> BupRepo -> [CommandParam] -> Annex Bool
 bup command buprepo params = do
     showOutput -- make way for bup output
-    liftIO $ boolSystem "bup" $ bupParams command buprepo params
+    liftIO action
+    where
+        action | isLocal buprepo = runBup lparams buprepo
+               | otherwise = brackettmpdir "bupXXXXXX" $ runBup rparams
+        lparams = Param command : params
+        rparams = bupParams command buprepo params
+        runBup params bupdir = do
+            env <- getEnvironment
+            boolSystemEnv "bup" params (Just (("BUP_DIR", bupdir) : env))
 
 pipeBup :: [CommandParam] -> Maybe Handle -> Maybe Handle -> IO Bool
 pipeBup params inh outh = do
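
The hunk above decides between a local and a remote bup repository and runs `bup` with `BUP_DIR` set in its environment. The following is a minimal standalone sketch of that idea, not the project's code: it uses `System.Environment.getEnvironment` from `base` and `System.Process` instead of the `System.Posix.Env`, `boolSystemEnv`, and `brackettmpdir` helpers seen in the diff, and the names `isLocalRepo`, `withOverride`, and `runWithEnv` are hypothetical. Unlike the diff, which simply prepends the new pair, this sketch also drops any inherited value of the variable; that is an assumption made here for clarity.

```haskell
import System.Environment (getEnvironment)
import System.Exit (ExitCode (..))
import System.Process (CreateProcess (env), createProcess, proc, waitForProcess)

-- | Same test as isLocal in the diff: a spec without ':' is a local path.
isLocalRepo :: String -> Bool
isLocalRepo = notElem ':'

-- | Assumption (not in the diff): remove any inherited value of the key
-- before prepending the override, so the variable appears only once.
withOverride :: (String, String) -> [(String, String)] -> [(String, String)]
withOverride (k, v) environ = (k, v) : filter ((/= k) . fst) environ

-- | Run a command with one overridden environment variable.
-- Returns True when the command exits with status 0.
runWithEnv :: FilePath -> [String] -> (String, String) -> IO Bool
runWithEnv cmd args override = do
    environ <- getEnvironment
    let cp = (proc cmd args) { env = Just (withOverride override environ) }
    (_, _, _, ph) <- createProcess cp
    code <- waitForProcess ph
    return (code == ExitSuccess)
```

Under these assumptions, a call such as `runWithEnv "bup" ["index", "."] ("BUP_DIR", "/path/to/repo")` would correspond roughly to the local branch of `action` in the diff.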


@@ -0,0 +1,4 @@
It seems that git-annex copies every individual file in a separate transaction. This is quite costly for mass transfers: each file involves a separate rsync invocation and the creation of a new commit. Even with a meager thousand files or so in the annex, I have to wait fifteen minutes to copy the contents to another disk, simply because every individual file involves some disk thrashing. It also seems suspicious that the git-annex branch would get a thousand commits of history from the simple procedure of copying everything to a new repository. Surely it would be better to copy everything first and then create a single commit that registers the changes to the files' availability?
(I'm also not quite clear on why rsync is being used when both repositories are local. It seems to be just overhead.)
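
The batching suggested in the post can be sketched as follows. This is only an illustration of "transfer everything, then record once"; it makes no claim about how git-annex actually journals or commits, and `Key`, `transferBatch`, and both callback parameters are hypothetical names.

```haskell
import Control.Monad (filterM, unless)

-- | Hypothetical stand-in for whatever identifies an annexed file.
type Key = String

-- | Run every transfer first, then record all successful keys with a
-- single call to the recording action (one "commit" instead of many).
transferBatch
    :: Monad m
    => (Key -> m Bool)   -- ^ transfer one key, reporting success
    -> ([Key] -> m ())   -- ^ record availability changes in one step
    -> [Key]
    -> m [Key]
transferBatch transfer record keys = do
    done <- filterM transfer keys
    -- Skip the recording step entirely when nothing was transferred.
    unless (null done) (record done)
    return done
```

The per-file behaviour described in the post would correspond to calling the recording action once inside the loop for each key; here it is called at most once per batch, whatever the number of files.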


@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawk6QAwUsFHpr3Km1yQbg8hf3S7RDYf7hX4"
nickname="Lauri"
subject="comment 5"
date="2012-01-26T22:13:18Z"
content="""
I also encountered Adam's bug. The problem seems to be that communication with the git process is done with `Char8` bytestrings. So, when `L.unpack` is called, all filenames that git outputs (with `ls-files` or `ls-tree`) are interpreted as Latin-1, which wreaks havoc if they are really in UTF-8.
I suspect that it would be enough to just switch to standard `String`s (or `Data.Text.Text`) instead of bytestrings for textual data, and to `Word8`-bytestrings for pure binary data. GHC should nowadays handle locale-dependent encoding of `String`s transparently.
"""]]
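
The effect described in the comment can be reproduced in isolation. This is a minimal sketch, assuming the `bytestring` and `text` packages are available; `utf8Name`, `viaChar8`, and `viaUtf8` are illustrative names, and decoding through `Data.Text.Lazy.Encoding` is only one possible remedy, not necessarily the one git-annex adopted.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- | The filename "fée" as UTF-8 bytes: 'é' is encoded as 0xC3 0xA9.
utf8Name :: BL.ByteString
utf8Name = BL.pack [0x66, 0xC3, 0xA9, 0x65]

-- | Char8 unpacking maps every byte to one Char, so the two bytes of
-- 'é' become two separate Latin-1 characters.
viaChar8 :: String
viaChar8 = L.unpack utf8Name

-- | Decoding the same bytes as UTF-8 yields the intended three characters.
viaUtf8 :: String
viaUtf8 = TL.unpack (TLE.decodeUtf8 utf8Name)
```

Here `viaChar8` has four characters while `viaUtf8` has three, which is the kind of mismatch that corrupts filenames when they are later written back out.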