From 7d57866c3e1cddea2ce8b74c457a46416b6704d4 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Mon, 17 May 2021 15:03:47 -0400
Subject: [PATCH] update for filter-branch

---
 doc/tips/splitting_a_repository.mdwn          | 71 ++++++++++++++++++-
 ..._654527ef2350fe871e2d7ff6addc6713._comment | 10 ++-
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/doc/tips/splitting_a_repository.mdwn b/doc/tips/splitting_a_repository.mdwn
index 89080f6203..cd94785760 100644
--- a/doc/tips/splitting_a_repository.mdwn
+++ b/doc/tips/splitting_a_repository.mdwn
@@ -1,13 +1,78 @@
 [[!meta title="Splitting a git-annex repository"]]
 
-Note: this is the reverse of [[migrating two seperate disconnected directories to git annex]].
-
 I have a [git annex](https://git-annex.branchable.com/) repo for all my media
 that has grown to 57866 files and git operations are getting slow, especially
 on external spinning hard drives, so I decided to split it into separate
 repositories.
 
-This is how I did it, with some help from `#git-annex`. Suppose the old big repo is at `~/oldrepo`:
+Here is how to split out a repository that contains a subset of the files
+in the larger repository. The larger repository is left as-is, but similar
+methods can be used to remove the files from it. Or, it can be deleted
+once it gets split up into several smaller repositories.
+
+(This is the reverse of [[migrating two seperate disconnected directories
+to git annex]].)
+
+Suppose the old big repo is at `~/oldrepo`, and you want to split out
+photos from it, and those are all located inside `~/oldrepo/photos`.
+
+First, let's create a new empty repo.
+
+    mkdir ~/photos
+    cd ~/photos
+    git init
+
+Now to populate the new repo with the files we want from the old repo. We
+can use `git filter-branch` to create a git branch that contains only the
+history of the files in `photos`. That command has a *lot* of options and
+ways to use it, but here is one simple way:
+
+    cd ~/oldrepo
+
+    # filter a branch to contain only the files wanted by the new repository
+    git branch split-master master
+    git filter-branch --prune-empty --subdirectory-filter photos split-master
+
+    # replace the new repo's master branch with the filtered branch
+    git push ~/photos split-master
+    git branch -D split-master
+    cd ~/photos
+    git reset --hard split-master
+    git branch -d split-master
+
+Next, the git-annex branch needs to be filtered to include only
+the files in `photos`, and that filtered branch sent to the new repository.
+That can be done with the [[git-annex-filter-branch]](1) command.
+
+    cd ~/oldrepo
+    annexrev=$(git annex filter-branch photos --include-all-key-information --include-all-repo-config --include-global-config)
+    git push ~/photos $annexrev:refs/heads/git-annex
+
+Next, initialize git-annex on the new repository. This uses
+the same annex.uuid as was in the old repository. That's ok, because
+the repository that's been split off will never have the old repository
+as a remote.
+
+    cd ~/photos
+    git annex reinit $(git config --file ~/oldrepo/.git/config annex.uuid)
+
+Finally, the annexed file contents need to be copied to the new repository:
+
+    cd ~/photos
+
+    # Hardlink all the annexed data from the old repo
+    cp -rl ~/oldrepo/.git/annex/objects .git/annex/
+
+    # Remove unneeded hard links
+    git annex unused --quiet
+    git annex drop --unused --force
+
+    # Fix up annex links to content and make sure it's all ok.
+    git annex fsck
+
+# alternative older method
+
+Here is another way to do it. Suppose the old big repo is at `~/oldrepo`:
 
 ```
 # Create a new repo for photos only
diff --git a/doc/tips/splitting_a_repository/comment_1_654527ef2350fe871e2d7ff6addc6713._comment b/doc/tips/splitting_a_repository/comment_1_654527ef2350fe871e2d7ff6addc6713._comment
index 3c04c20610..8ba3b7981c 100644
--- a/doc/tips/splitting_a_repository/comment_1_654527ef2350fe871e2d7ff6addc6713._comment
+++ b/doc/tips/splitting_a_repository/comment_1_654527ef2350fe871e2d7ff6addc6713._comment
@@ -3,8 +3,14 @@
  subject="""comment 1"""
  date="2017-05-11T16:28:32Z"
  content="""
-This is a simple way to split a repository, but the resulting split git
-repository will be larger than is really necessary.
+2021 update: The new [[git-annex-filter-branch]] command
+can be used to produce a filtered version of the git-annex branch that only
+includes information for the files you want. I have updated the tip to
+show how to do it that way, and kept the old way as an alternative.
+
+The old, alternative way is a simple way to split a repository, but the
+resulting split git repository will be larger than is really necessary.
+(The new method avoids this problem.)
 
 When you `dropunused` all the hard links that are not present in the
 repository, git-annex will commit a log to the git-annex branch saying "I