Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2012-05-10 14:26:55 -04:00
commit 7943318ec6
4 changed files with 34 additions and 0 deletions

@@ -0,0 +1,9 @@
Hi,
May I know which processes git-annex uses to move/copy/get file content between repositories?
Does it use the same processes that git uses, like send-pack and receive-pack?
I want to use Aspera to move file contents between repositories.
Is it enough to customize git's send-pack and receive-pack to do Aspera file transfers, or does git-annex use some other transfer mechanism?
Many Thanks,
Royal Pinto

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.153.2.186"
subject="rsync over ssh"
date="2012-05-10T18:18:01Z"
content="""
It uses rsync over ssh, not git's send-pack/receive-pack. Some other protocols, such as S3, are used for special remotes.
"""]]

@@ -0,0 +1,5 @@
http://git-annex.branchable.com/walkthrough/ says "Git wants to first stage the entire contents of the file in its index. That can be slow for big files (sorta why git-annex exists in the first place)."

What is git doing that git-annex isn't, other than copying the file to .git/objects rather than just moving it to .git/annex/objects, prepending it with "blob"+length, and compressing it? If git were changed to store the "blob"+length as part of the object filename rather than as part of the object file content, had a config option to use uncompressed objects for large files (and not try to pack them when creating pack files), and were used on a filesystem such as zfs or btrfs which does COW, so the copy would be as fast as a move, then what speed advantage would git-annex still have over git? I realize git-annex has more features than just big-file handling, and has the WORM backend for even faster handling, but I'm just talking about the case with the default SHA backend.

Have such changes been proposed for git? It seems that for anybody already familiar with the git codebase, adding the config option for uncompressed objects and moving the storage location for "blob"+length would be easy changes to make, and I see no downside to them. It wouldn't break backwards compatibility, because the object filename being hash."blob".length rather than just hash would indicate that the new object format is in use, and a ".raw" filename extension could be used for uncompressed objects (or, more sensibly, in the new format, no additional extension for uncompressed and ".compressed" for compressed).

This would also eliminate the need for a git-annex object store separate from the git object store, and the complexities involved with having them separate, and the need for symlinks, and the complexities they cause. I don't think that relying on COW for speed is unreasonable once btrfs becomes the default in major Linux distros (the BSDs already have zfs and hammerfs); right now part of what git-annex is doing is just working around the functional deficiency of non-COW filesystems.

P.S. I recommend a "plain" option for the page type when submitting comments on your wiki, so I don't have to put HTML line break markup at the end of my lines.
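
To make the comparison concrete, here is a small sketch of the two paths (GNU userland and a plain file named `bigfile` assumed; the symlink target shown is only illustrative). Git's object id is a SHA-1 over a `blob <size>` header plus the entire contents, and `git add` writes a zlib-compressed copy into .git/objects; git-annex with a SHA backend hashes the content and then moves it, uncompressed, into .git/annex/objects behind a symlink.

    # Plain git: the object id is SHA-1 of "blob <size>\0" + contents,
    # and `git add` stores a zlib-compressed copy under .git/objects.
    (printf 'blob %s\0' "$(wc -c < bigfile)"; cat bigfile) | sha1sum
    git hash-object bigfile   # prints the same id, computed by git itself

    # git-annex with a SHA backend (in an annex-enabled repo): hash the
    # content, *move* it uncompressed under .git/annex/objects, and check
    # in a symlink pointing at it, so no second full copy is written.
    git annex add bigfile
    readlink bigfile          # e.g. .git/annex/objects/.../SHA256-s<size>--<sha256>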

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.153.2.245"
subject="comment 1"
date="2012-05-08T18:22:12Z"
content="""
git's code base makes lots of assumptions that hardcode the size of the hash, etc. (grep its source for the magic numbers 40 and 42...). I'd like to see git get parameterised hashes. SHA1 insecurity may eventually push it in that direction. However, when I asked the git developers about this at the GitTogether last year, there were several ideas floated that would avoid parameterisation, and a lot of good thoughts about problems parameterised hashes would cause.
Moving data into git proper would still leave the problem, unique to large data, of not being able to store it all on every clone. Which means a git-annex-like thing is still needed to track where the data resides and move it around.
(BTW, in markdown, you separate paragraphs with blank lines. Like in email.)
"""]]