Merge branch 'master' of ssh://git-annex.branchable.com
This commit is contained in:
commit
c962221337
2 changed files with 22 additions and 0 deletions
|
@ -0,0 +1,12 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="https://www.google.com/accounts/o8/id?id=AItOawlsXhOlsW6RaGR83VNSMxPh159l5dFau70"
|
||||||
|
nickname="Yung-Chin"
|
||||||
|
subject="re scaling issue"
|
||||||
|
date="2013-05-03T14:57:51Z"
|
||||||
|
content="""
|
||||||
|
Tobias/joey,
|
||||||
|
|
||||||
|
I think there are at least two scaling issues that may be causing you trouble. One is that bup writes pack+idx files rather than bare objects, and if you send 1 file per call to bup-split, you end up with a pair of pack and idx files for each such call. When you later try to retrieve a blob, bup currently just calls git, and git will have to traverse all these tiny idx files looking for the right hash (bare objects you could at least find by name). You can probably ameliorate the pain by calling git repack (look at the -a and --max-pack-size switches) on your bup repository. The other is the \"thousands of branches\" issue, and I think \"git pack-refs --all\" (that's again on your bup repository) might help a little bit.
|
||||||
|
|
||||||
|
It would certainly help performance if you could store blob/tree ids in git-annex instead of branch names. For small files, all bup would need to store is a blob, but currently you end up storing a blob, a tree, and a commit (and looking-up all of those, plus the ref too, on calling bup-join). (you might want to patch bup-split, so it would allow you to ask it for \"--blob-or-tree\", because currently if you say you pass it -b for blob-ids, then for bigger files you get a series of IDs, whereas you'd be much better off with a tree-id there)
|
||||||
|
"""]]
|
|
@ -0,0 +1,10 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="https://www.google.com/accounts/o8/id?id=AItOawlsXhOlsW6RaGR83VNSMxPh159l5dFau70"
|
||||||
|
nickname="Yung-Chin"
|
||||||
|
subject="comment 9"
|
||||||
|
date="2013-05-03T16:34:05Z"
|
||||||
|
content="""
|
||||||
|
Thinking about this some more, a very elegant way to make a bup remote could actually be to just pass the whole .git/annex tree into bup-index/save (you could avoid sending some files by only bup-indexing select subtrees, or by using --exclude-*'s, but you'd run bup-save over the whole .git/annex tree). You could then use bup-restore to retrieve files or whole subtrees, and you'd refer to the files you're retrieving by their actual pathname under which they live in .git/annex (if that doesn't make sense it's because I've misunderstood how git-annex is organised!), so something like \"bup restore branch_name/latest/.git/annex/aa/bb/sha-of-some-sort\" would work - that's cute, right? And you'd only have 1 branch.
|
||||||
|
|
||||||
|
However... somebody who is good with lazy-evaluation would need to rework bup.vfs: currently, if you'd call bup-restore on a path like that, it would instantiate a _lot_ of vfs-nodes you don't need - to begin with, it would make a node for every commit you ever made (on any branch!) - on a big repository you'd wait ages for it to just find the commit objects...
|
||||||
|
"""]]
|
Loading…
Add table
Reference in a new issue