update for filter-branch

Joey Hess 2021-05-17 15:03:47 -04:00
parent c525d18cf7
commit 7d57866c3e
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 76 additions and 5 deletions


@@ -1,13 +1,78 @@
[[!meta title="Splitting a git-annex repository"]]
I have a [git annex](https://git-annex.branchable.com/) repo for all my media
that has grown to 57866 files, and git operations are getting slow, especially
on external spinning hard drives, so I decided to split it into separate
repositories. I did this with some help from `#git-annex`.

Here is how to split out a repository that contains a subset of the files
in the larger repository. The larger repository is left as-is, but similar
methods can be used to remove the files from it. Alternatively, it can be
deleted once it has been split up into several smaller repositories.

(This is the reverse of [[migrating two seperate disconnected directories
to git annex]].)

Suppose the old big repo is at `~/oldrepo`, and you want to split out the
photos, which are all located inside `~/oldrepo/photos`.

First, let's create a new empty repo.

```
mkdir ~/photos
cd ~/photos
git init
```

Now, populate the new repo with the files we want from the old repo. We
can use `git filter-branch` to create a git branch that contains only the
history of the files in `photos`. That command has a *lot* of options and
ways to use it, but here is one simple way:

```
cd ~/oldrepo
# create a branch containing only the history of the files in photos/
git branch split-master master
git filter-branch --prune-empty --subdirectory-filter photos split-master
# send the filtered branch to the new repo, then delete it here
git push ~/photos split-master
git branch -D split-master
# replace the new repo's master branch with the filtered branch
cd ~/photos
git reset --hard split-master
git branch -d split-master
```

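
If you want to split out several subdirectories this way, the steps above can be wrapped in a shell function. This is only a sketch under some assumptions: `split_subdir_history` is a hypothetical name, both repository paths are absolute (the function `cd`s between them), and it uses plain git only, so the git-annex steps below still have to be run separately. Recent versions of git make `git filter-branch` print a warning and pause for about ten seconds unless `FILTER_BRANCH_SQUELCH_WARNING=1` is set, so the sketch sets it.

```shell
# Hypothetical helper wrapping the filter-branch steps (plain git only).
# Usage: split_subdir_history OLDREPO NEWREPO SUBDIR [BRANCH]
# OLDREPO and NEWREPO should be absolute paths.
split_subdir_history() (
	set -e
	old=$1 new=$2 subdir=$3 branch=${4:-master}
	tmp_branch="split-$branch"
	cd "$old"
	# copy the branch, then rewrite the copy so it only holds SUBDIR's
	# history, with SUBDIR's contents moved to the top level
	git branch "$tmp_branch" "$branch"
	# skip filter-branch's warning and 10 second pause
	FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
		--subdirectory-filter "$subdir" "$tmp_branch"
	# send it to the new repo and delete it here; note that filter-branch
	# also leaves a backup ref under refs/original/ in OLDREPO
	git push "$new" "$tmp_branch"
	git branch -D "$tmp_branch"
	# point the new repo's current branch at the filtered history
	cd "$new"
	git reset --hard "$tmp_branch"
	git branch -d "$tmp_branch"
)
```

For this tip, that would be `split_subdir_history ~/oldrepo ~/photos photos`.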
Next, the git-annex branch needs to be filtered to include only
the files in `photos`, and that filtered branch sent to the new repository.
That can be done with the [[git-annex-filter-branch]](1) command.

```
cd ~/oldrepo
annexrev=$(git annex filter-branch photos --include-all-key-information --include-all-repo-config --include-global-config)
git push ~/photos "$annexrev:refs/heads/git-annex"
```

Next, initialize git-annex on the new repository. This uses
the same annex.uuid as was in the old repository. That's ok, because
the repository that's been split off will never have the old repository
as a remote.

```
cd ~/photos
git annex reinit $(git config --file ~/oldrepo/.git/config annex.uuid)
```

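
Before copying any content, it may be worth checking that the new repository really did pick up the old annex.uuid and the filtered git-annex branch. Here is a minimal sketch that uses plain git only; `check_split_repo` is a hypothetical helper, not part of git-annex.

```shell
# Hypothetical helper: sanity-check a split repo using plain git only.
# Usage: check_split_repo OLDREPO NEWREPO
check_split_repo() {
	old_uuid=$(git -C "$1" config annex.uuid) || { echo "no annex.uuid in $1" >&2; return 1; }
	new_uuid=$(git -C "$2" config annex.uuid) || { echo "no annex.uuid in $2" >&2; return 1; }
	# the split repo is expected to reuse the old repo's uuid
	if [ "$old_uuid" != "$new_uuid" ]; then
		echo "annex.uuid differs: $old_uuid vs $new_uuid" >&2
		return 1
	fi
	# the filtered git-annex branch should have been pushed to refs/heads/git-annex
	if ! git -C "$2" rev-parse --verify -q refs/heads/git-annex >/dev/null; then
		echo "no git-annex branch in $2" >&2
		return 1
	fi
	echo "ok: $2 shares annex.uuid $new_uuid and has a git-annex branch"
}
```

For example: `check_split_repo ~/oldrepo ~/photos`.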
Finally, the annexed file contents need to be copied to the new repository:

```
cd ~/photos
# Hard-link all the annexed data from the old repo
# (this needs both repos to be on the same filesystem)
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
# Remove the hard links to content that the new repo does not use
git annex unused --quiet
git annex drop --unused --force
# Fix up annex links to content and make sure it's all ok
git annex fsck
```

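
The `cp -rl` step only works when both repositories are on the same mount; otherwise creating the hard links fails with an "Invalid cross-device link" error. A hypothetical `same_mount` helper can check that first. It's a heuristic sketch that compares the mount point POSIX `df -P` reports for each path, so mount points containing spaces are not handled.

```shell
# Hypothetical helper: succeed only if both paths are on the same mount,
# which `cp -l` needs. Compares the mount point (last column) that
# `df -P` reports; mount points containing spaces are not handled.
same_mount() (
	a=$(df -P "$1" 2>/dev/null | awk 'NR==2 {print $NF}')
	b=$(df -P "$2" 2>/dev/null | awk 'NR==2 {print $NF}')
	# an empty result means df could not look up that path
	[ -n "$a" ] && [ "$a" = "$b" ]
)
```

For example: `same_mount ~/oldrepo ~/photos && cp -rl ~/oldrepo/.git/annex/objects ~/photos/.git/annex/`.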
# Alternative older method
Here is another way to do it. Suppose the old big repo is at `~/oldrepo`:
```
# Create a new repo for photos only


@@ -3,8 +3,14 @@
subject="""comment 1"""
date="2017-05-11T16:28:32Z"
content="""
2021 update: The new [[git-annex-filter-branch]] command
can be used to produce a filtered version of the git-annex branch that only
includes information for the files you want. I have updated the tip to
show how to do it that way, and kept the old way as an alternative.

The old, alternative way is a simple way to split a repository, but the
resulting split git repository will be larger than is really necessary.
(The new method avoids this problem.)
When you `dropunused` all the hard links that are not present in the
repository, git-annex will commit a log to the git-annex branch saying "I