diff --git a/doc/bugs/copy_doesn__39__t_scale.mdwn b/doc/bugs/copy_doesn__39__t_scale.mdwn
new file mode 100644
index 0000000000..1a83ae548b
--- /dev/null
+++ b/doc/bugs/copy_doesn__39__t_scale.mdwn
@@ -0,0 +1,4 @@
+It seems that git-annex copies every individual file in a separate transaction. This is quite costly for mass transfers: each file involves a separate rsync invocation and the creation of a new commit. Even with a meager thousand files or so in the annex, I have to wait fifteen minutes to copy the contents to another disk, simply because every individual file involves some disk thrashing. Also, it seems suspicious that the git-annex branch would get a thousand commits of history from the simple procedure of copying everything to a new repository. Surely it would be better to first copy everything and then create only a single commit that registers the changes to the files' availability?
+
+(I'm also not quite clear on why rsync is being used when both repositories are local. It seems to be just overhead.)
+