git-annex/doc/todo/branching.mdwn

The use of `.git-annex` to store logs means that if a repo has branches
and the user switches between them, git-annex will see different logs in
the different branches, and so may miss info about what remotes have which
files (though it can re-learn).

An alternative would be to store the log data directly in the git repo,
as `pristine-tar` does. The problem with that approach is that git won't merge
conflicting changes to log files if they are not in the currently checked
out branch.

It would be possible to use a branch with a tree like this, to avoid
conflicts:

key/uuid/time/status

As long as new files are only added, and old timestamped files deleted,
there would be no conflicts.
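
To illustrate why this layout avoids conflicts, here is a minimal sketch that
models a tree as a set of paths. The function names, the integer timestamps,
and the `1`/`0` status values are assumptions made for the example, not
anything git-annex defines.

```python
def log_path(key, uuid, time, status):
    """Build a path in the key/uuid/time/status layout.

    Assumption for this sketch: status is "1" (present) or "0" (absent).
    """
    return f"{key}/{uuid}/{time}/{status}"


def merge_trees(base, ours, theirs):
    """Three-way merge of trees modelled as sets of paths.

    Paths are only ever added or deleted, never modified in place, so
    every change is a pure add or a pure delete and nothing can conflict.
    """
    added = (ours - base) | (theirs - base)
    deleted = (base - ours) | (base - theirs)
    return (base | added) - deleted


def latest_status(tree, key):
    """Return {uuid: status}, using the newest timestamp seen per uuid."""
    newest = {}
    for path in tree:
        k, uuid, time, status = path.rsplit("/", 3)
        if k != key:
            continue
        t = int(time)
        if uuid not in newest or t > newest[uuid][0]:
            newest[uuid] = (t, status)
    return {uuid: status for uuid, (_, status) in newest.items()}


# One side adds a record for remote B; the other replaces A's old record.
base = {log_path("k1", "A", 100, "1")}
ours = base | {log_path("k1", "B", 200, "1")}
theirs = {log_path("k1", "A", 300, "0")}
merged = merge_trees(base, ours, theirs)
```

Both sides changed the logs for the same key, yet the merge is just a union of
the additions minus the deletions.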

A related problem though is the size of the tree objects git needs to
commit. Having the logs in a separate branch doesn't help with that.
As more keys are added, the tree object size will increase, and git will
take longer and longer to commit, and use more space. One way to deal with
this is simply by splitting the logs among subdirectories. Git then can
reuse trees for most directories. (Check: Does it still have to build
dup trees in memory?)
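
A rough sketch of the splitting idea follows. It assumes a fan-out based on
the first hex digits of an MD5 of the key, with two levels of two characters
each. That scheme, and the helper names, are only illustrative choices. The
second function counts directory entries as a proxy for tree object size.

```python
import hashlib


def fanout_path(key, levels=2, width=2):
    """Place a key's log under hashed subdirectories, e.g. ab/cd/<key>.

    Assumption for this sketch: keys contain no "/" characters.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [key])


def largest_tree(paths):
    """Number of entries in the largest directory among the given paths."""
    children = {}
    for path in paths:
        parts = path.split("/")
        for depth in range(len(parts)):
            parent = "/".join(parts[:depth])
            children.setdefault(parent, set()).add(parts[depth])
    return max(len(entries) for entries in children.values())


keys = [f"key{i}" for i in range(10000)]
flat_size = largest_tree(keys)
split_size = largest_tree([fanout_path(k) for k in keys])
```

With a flat layout the single top-level tree lists every key, so it has to be
rewritten in full on each commit. With the fan-out, a new key only changes the
trees along one path, and the others can be reused.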

Another approach would be to have git-annex *delete* old logs. Keep logs
for the currently available files, or something like that. If other log
info is needed, look back through history to find the first occurrence of a
log. Maybe even look at other branches -- so if the logs were on master,
a new empty branch could be made and git-annex would still know where to
get keys in that branch. 

Would have to be careful about conflicts when deleting and bringing back
files with the same name. And would need to avoid expensive searching through
all history to try to find an old log file.
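
The lookup could be sketched roughly as below. History is modelled as a list
of commits, newest first, each one a dict from log path to log content. This
model, the `find_log` name, and the `max_depth` cutoff are assumptions for
illustration. A real implementation would walk git history instead.

```python
def find_log(history, path, max_depth=None):
    """Walk back through history until a commit still containing the log is found.

    history:   list of commits, newest first; each is a dict {path: content}.
    max_depth: optional limit on how many commits to examine, so that a
               missing log does not trigger a search of all history.
    Returns (depth, content), or None if nothing was found within the limit.
    """
    for depth, tree in enumerate(history):
        if max_depth is not None and depth >= max_depth:
            break
        if path in tree:
            return depth, tree[path]
    return None


# The newest commit has already deleted the log; an older one still has it.
history = [
    {},
    {"k1.log": "1 A\n1 B"},
    {"k1.log": "1 A"},
]
found = find_log(history, "k1.log")
bounded = find_log(history, "k1.log", max_depth=1)
```

Bounding the depth trades completeness for speed: a log deleted long ago may
not be found, but the search cost stays predictable.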

add doc wiki 2010-10-19 18:37:19 +00:00