git-annex branch size when storing migration information
Sponsored-by: Jack Hill on Patreon
This commit is contained in:
parent
d37219e3e5
commit
1d020df896
1 changed files with 50 additions and 0 deletions
|
@ -0,0 +1,50 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""git-annex branch size when storing migration information"""
|
||||
date="2023-12-01T16:10:11Z"
|
||||
content="""
|
||||
I did a small experiment to gauge how much the git repo size would grow if
|
||||
migration were recorded in log files in the git-annex branch.
|
||||
|
||||
In my experiment, I started with 1000 files using sha256. The size of the
|
||||
git objects (after repack by git gc --aggressive) was 0.5 mb. I then
|
||||
migrated them to sha512, which increased the size of git objects to 1.1 mb
|
||||
(after repacking).
|
||||
|
||||
Then I recorded in the git-annex branch additional log files for each of
|
||||
the sha512 keys that contained the corresponding sha256 key. That grew the
|
||||
git objects to 1.4 mb after repacking.
|
||||
|
||||
This was a little disappointing. I'd hoped that repacking would avoid
|
||||
duplication of the sha256 keys, which are both in the log files I wrote
|
||||
and are used as filenames. But the data I wrote to the logs is only 75 kb
|
||||
total, and git grew 4x that.
|
||||
|
||||
I tried the same thing except instead of separate log files I added to git
|
||||
one log file that contained pairs of sha256 and sha512 keys. That log file
|
||||
was 213 kb and adding it to the git repo grew it by 102 kb. So there was
|
||||
some compression there, but less than I would have hoped, and not much
|
||||
better than just gzip -9 of the log file (113 kb). Of course putting all
|
||||
the migration information in a single file like this would add a lot of
|
||||
complexity to accessing it.
|
||||
|
||||
So adding this information to the git-annex branch would involve at best
|
||||
around a 16% overhead, which is a surprising amount.
|
||||
|
||||
(It would be possible to make `git-annex forget --drop-dead` remove the
|
||||
information about old migrated keys if they later get marked as dead, and
|
||||
so regain the space.)
|
||||
|
||||
This is also rather redundant information to store in git, since most
|
||||
of the time when file foo has been migrated, the old key can be determined
|
||||
by looking at `git log foo`. Not always of course because foo might have
|
||||
been renamed after migration, for example.
|
||||
|
||||
Another way to store migration information in the git-annex branch would to
|
||||
be graft in the pre-migration tree and the post-migration tree. Diffing
|
||||
those two trees would show what migrated, and most of the time this would
|
||||
use almost no additional space in git, because the user will have committed
|
||||
both those trees anyway, or something very close to them. But it would be
|
||||
more expensive to extract the migration information then, and this would
|
||||
need a local cache of migrations to be built up from examining those diffs..
|
||||
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue