git-annex/doc/forum/Making_git-annex_less_necessary.mdwn

6 lines
2.2 KiB
Text
Raw Normal View History

2012-05-07 21:57:58 +00:00
http://git-annex.branchable.com/walkthrough/ says "Git wants to first stage the entire contents of the file in its index. That can be slow for big files (sorta why git-annex exists in the first place)."<br/>
What is git doing that git-annex isn't, other than copying the file to .git/objects rather than just moving it to .git/annex/objects, prepending it with "blob"+length, and compressing it? If git were changed to store the "blob"+length as part of the object filename rather than as part of the object file content, have a config option to use uncompressed objects for large files (and not try to pack them when creating pack files), and were used on a filesystem such as zfs or btrfs which does COW so the copy would be as fast as a move, then what speed advantage would git-annex still have over git? I realize git-annex has more features than just big file handling, and has the worm backend for even faster handling, but I'm just talking about the case with the default sha backend.<br/>
Have such changes been proposed for git? It seems that for anybody already familiar with the git codebase, adding the config option for uncompressed objects and moving the storage location for "blob"+length would be easy changes to make, and I see no downside to them. It wouldn't break backwards compatibility because the object filename being hash."blob".length rather than just hash would indicate that the new object format is in use, and a ".raw" filename extension could be used for uncompressed objects (or more sensibly, in the new format, no additional extension for uncompressed, and ".compressed" for compressed).<br/>
This would also eliminate the need for a git-annex object store separate from the git object store, and the complexities involved with having them separate, and the need for symlinks, and the complexities they cause. I don't think that relying on COW for speed is unreasonable once btrfs becomes the default in major Linux distros (the bsds already have zfs and hammerfs); right now part of what git-annex is doing is just working around the functional deficiency of non-COW filesystems.<br/>
P.S. I recommend a "plain" option for the page type when submitting comments on your wiki, so I don't have to put HTML line break markup at the end of my lines.