0df94132d9
arising from a conversation at FOSSY
126 lines
4.1 KiB
Markdown
126 lines
4.1 KiB
Markdown
[[!meta title="Splitting a git-annex repository"]]
|
|
|
|
I have a [git annex](https://git-annex.branchable.com/) repo for all my media
|
|
that has grown to 57866 files and git operations are getting slow, especially
|
|
on external spinning hard drives, so I decided to split it into separate
|
|
repositories.
|
|
|
|
Here is how to split out a repository that contains a subset of the files
|
|
in the larger repository. The larger repository is left as-is, but similar
|
|
methods can be used to remove the files from it. Or, it can be deleted
|
|
once it gets split up into several smaller repositories.
|
|
|
|
(This is the reverse of [[migrating two seperate disconnected directories
|
|
to git annex]].)
|
|
|
|
Suppose the old big repo is at `~/oldrepo`, and you want to split out
|
|
photos from it, and those are all located inside `~/oldrepo/photos`.
|
|
|
|
First, let's create a new empty repo.
|
|
|
|
mkdir ~/photos
|
|
cd photos
|
|
git init
|
|
|
|
Now to populate the new repo with the files we want from the old repo. We
|
|
can use `git filter-branch` to create a git branch that contains only the
|
|
history of the files in `photos`. That command has a *lot* of options and
|
|
ways to use it, but here is one simple way:
|
|
|
|
cd ~/oldrepo
|
|
|
|
# filter a branch to with only the files wanted by the new repository
|
|
git branch split-master master
|
|
git filter-branch --prune-empty --subdirectory-filter photos split-master
|
|
|
|
# replace the new repo's master branch with the filtered branch
|
|
git push ~/photos split-master
|
|
git branch -D split-master
|
|
cd ~/photos
|
|
git reset --hard split-master
|
|
git branch -d split-master
|
|
|
|
Next, the git-annex branch needs to be filtered to include only
|
|
the files in `photos`, and that filtered branch sent to the new repository.
|
|
That can be done with the [[git-annex-filter-branch]](1) command.
|
|
|
|
cd ~/oldrepo
|
|
annexrev=$(git annex filter-branch photos --include-all-key-information --include-all-repo-config --include-global-config)
|
|
git push ~/photos $annexrev:refs/heads/git-annex
|
|
|
|
Next, initialize git-annex on the new repository. This uses
|
|
the same annex.uuid as was in the old repository. That's ok, because
|
|
the repository that's been split off will never have the old repository
|
|
as a remote.
|
|
|
|
cd ~/photos
|
|
git annex reinit $(git config --file ../tofilter/.git/config annex.uuid)
|
|
|
|
Finally the annexed file contents need to be copied to the new repository:
|
|
|
|
cd ~/photos
|
|
|
|
# Hardlink all the annexed data from the old repo
|
|
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
|
|
|
|
# Remove unneeded hard links
|
|
git annex unused --quiet
|
|
git annex drop --unused --force
|
|
|
|
# Fix up annex links to content and make sure it's all ok.
|
|
git annex fsck
|
|
|
|
Warning: This method of copying the annexed file contents and dropping
|
|
the unused ones causes the git-annex branch to log information.
|
|
|
|
# alternative older method
|
|
|
|
Here is another way to do it. Suppose the old big repo is at `~/oldrepo`:
|
|
|
|
```
|
|
# Create a new repo for photos only
|
|
mkdir ~/photos
|
|
cd photos
|
|
git init
|
|
git annex init laptop
|
|
|
|
# Hardlink all the annexed data from the old repo
|
|
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
|
|
|
|
# Regenerate the git annex metadata
|
|
git annex fsck --fast
|
|
|
|
# Also split the repo on the usb key
|
|
cd /media/usbkey
|
|
git clone ~/photos
|
|
cd photos
|
|
git annex init usbkey
|
|
cp -rl ../oldrepo/.git/annex/objects .git/annex/
|
|
git annex fsck --fast
|
|
|
|
# Connect the annexes as remotes of each other
|
|
git remote add laptop ~/photos
|
|
cd ~/photos
|
|
git remote add usbkey /media/usbkey
|
|
```
|
|
|
|
At this point, I went through all repos doing standard cleanup:
|
|
|
|
```
|
|
# Remove unneeded hard links
|
|
git annex unused
|
|
git annex dropunused --force 1-12345
|
|
|
|
# Sync
|
|
git annex sync
|
|
```
|
|
|
|
To make sure nothing is missing, I used `git annex find --not --in=here`
|
|
to see if, for example, the usbkey that should have everything could be missing
|
|
some thing.
|
|
|
|
Update: Antoine Beaupré pointed me to
|
|
[this tip about Repositories with large number of files](http://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/)
|
|
which I will try next time one of my repositories grows enough to hit a performance issue.
|
|
|
|
> This document was originally written by [Enrico Zini](http://www.enricozini.org/blog/2017/debian/splitting-a-git-annex-repository/) and added to this wiki by [[anarcat]].
|