New migrate subcommand can be used to switch files to a different backend, safely and with no duplication of content.

Joey Hess 2011-01-08 15:54:14 -04:00
parent 32b0e10390
commit a78b0555e1
8 changed files with 112 additions and 17 deletions

@ -144,6 +144,14 @@ Many git-annex commands will stage changes for later `git commit` by you.
With no parameters, defaults to finding all files in the current directory
and its subdirectories.
* migrate [path ...]
Changes the specified annexed files to store their content in the
default backend (or the one specified with --backend).
Note that the content is not removed from the backend it was previously in.
Use `git annex unused` to find and remove such content.
* unannex [path ...]
Use this to undo an accidental add command. This is not the command you

@ -277,25 +277,32 @@ add something like this to `.gitattributes`:
* annex.backend=SHA1
## migrating between backends
## migrating data to a new backend
Perhaps you had been using the WORM backend, but now have configured
git-annex to use SHA1 for new files. Your old files are still in WORM. How
to migrate that content? A quick and dirty way is to use the unannex
subcommand, which removes a file from git-annex's control, followed by
a re-add of the file, to put it in the new backend.
Maybe you started out using the WORM backend, and have now configured
git-annex to use SHA1. But files you added to the annex before still
use the WORM backend. There is a simple command that can migrate that
data:
# git annex unannex my_cool_big_file
unannex my_cool_big_file ok
# git annex add my_cool_big_file
add my_cool_big_file (checksum ...) ok
# git annex migrate my_cool_big_file
migrate my_cool_big_file (checksum...) ok
You can only migrate files whose content is currently available. Other
files will be skipped.
After migrating a file to a new backend, the old content in the old backend
will still be present. That is necessary because multiple files
can point to the same content. The `git annex unused` subcommand can be
used to clear up that detritus later. Note that hard links are used,
to avoid wasting disk space.
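To see why hard links avoid wasting disk space, here is a minimal sketch using only common Unix tools. It does not run git-annex or reproduce its on-disk layout; the names `old_key` and `new_key` are hypothetical stand-ins for the same content stored under two backends.

```shell
# Illustration only: ordinary files in a temp directory, not a real annex.
tmp=$(mktemp -d)
printf 'big file content\n' > "$tmp/old_key"

# A hard link adds a second name for the same data; nothing is copied.
ln "$tmp/old_key" "$tmp/new_key"

# The second field of `ls -l` is the link count, now 2.
links=$(ls -l "$tmp/old_key" | awk '{print $2}')
echo "link count: $links"

# Removing the old name leaves the content reachable via the new name.
rm "$tmp/old_key"
remaining=$(cat "$tmp/new_key")
echo "$remaining"

rm -rf "$tmp"
```

This is the same property that lets the old content be removed later without affecting the migrated file.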
## unused data
It's possible for data to accumulate in the annex that no files point to
nymore. One way it can happen is if you `git rm` a file without
anymore. One way it can happen is if you `git rm` a file without
first calling `git annex drop`. And, when you modify an annexed file, the old
content of the file remains in the annex.
content of the file remains in the annex. Another way is when migrating
between backends.
This might be historical data you want to preserve, so git-annex defaults to
preserving it. So from time to time, you may want to check for such data and
@ -318,6 +325,10 @@ data anymore, you can easily remove it:
# git annex dropunused 1
dropunused 1 ok
Hint: To drop a lot of unused data, use a command like this:
# git annex dropunused `seq 1 1000`
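The backticks make the shell run `seq` first and pass its output as separate arguments. A small sketch that only builds and prints the resulting command line, without running git-annex (the range 1 to 5 is an arbitrary example):

```shell
# Preview the expansion; $(...) is equivalent to the backtick form.
# Left unquoted on purpose so each number becomes its own argument.
cmd=$(echo git annex dropunused $(seq 1 5))
echo "$cmd"
```

Previewing like this can be a cheap sanity check on the range before dropping anything.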
## fsck: verifying your data
You can use the fsck subcommand to check for problems in your data.