reusing repository uuid cannot result in data loss AFAIK

Avoiding such problems is one reason why git-annex does active
verification of other copies of a file when dropping.

You could argue that reusing the uuid of a trusted repository leads to
data loss, but that data loss is not really caused by reusing the uuid;
it is caused by deleting a trusted repository. Using trusted
repositories without a great deal of care is a good way to blow off your
foot, and deleting them is only the most obvious way to do so.
Added some sections about that.

If reusing a repository uuid could result in data loss then I'd be on
board with making reinit run a fast fsck to update the location log, but
since it can't, I feel that is not worth forcing. Not a bad idea to run
fsck afterwards. Updated language about that.

This commit was sponsored by Jake Vosloo on Patreon.
Joey Hess 2017-01-30 12:54:32 -04:00
parent 280442ca2c
commit 809ddd9df8
2 changed files with 53 additions and 11 deletions

@@ -14,10 +14,11 @@ UUID -- for example, if a repository got deleted, and you're
 setting it back up.
 
 Use this with caution; it can be confusing to have two existing
-repositories with the same UUID. It can lead to data loss and other
-weird phenomenons. Make sure you run `git annex fsck` after changing
-the UUID of a repository to make sure location tracking information is
-recorded correctly.
+repositories with the same UUID.
+
+Make sure you run `git annex fsck` after changing the UUID of a
+repository to make sure location tracking information is recorded
+correctly.
 
 Like `git annex init`, this attempts to enable any special remotes
 that are configured with autoenable=true.

@@ -66,9 +66,16 @@ To quote the [[git-annex-reinit]] manpage:
 [[git-annex-reinit]] can be used to reuse UUIDs for deleted
 repositories. But what happens if you reuse the UUID of an *existing*
 repository, or a repository that hasn't been properly emptied before
-being declared dead? This can lead to data loss because, in that case,
-git-annex may think some files are still present in the revived
-repository (while they may not actually be).
+being declared dead? This can lead to git-annex getting confused
+because, in that case, git-annex may think some files are still
+present in the revived repository (while they may not actually be).
+
+This should never result in data loss, because git-annex does not
+trust its records about the contents of a repository, and checks
+that it really contains files before dropping them from other
+repositories. (The one exception to this rule is trusted repositories,
+whose contents are never checked. See the next two sections for more
+about problems with trusted repositories.)
 
 Proper pattern
 --------------
@@ -89,11 +96,45 @@ Fixes
 
 An improvement to git-annex here would be to allow
 [[reinit to work without arguments|todo/reinit_should_work_without_arguments]]
-to at least not encourage UUID reuse. reinit could also recommend
-running fsck explicitely. It could even trigger an fsck directly.
-
-The [[git-annex-reinit]] manpage has always suggested running `fsck`,
-but the wording has been changed on 2017-01-17.
+to at least not encourage UUID reuse.
+
+# **Deleting data from trusted repositories**
+
+When you use [[git-annex-trust]] on a repository, you disable
+some very important sanity checks that make sure that git-annex
+never loses the content of files. So trusting a repository
+is a good way to shoot yourself in the foot and lose data. Like the
+man page says, "Use with care."
+
+When you have made git-annex trust a repository, you can lose data
+by dropping files from that repository. For example, suppose file `foo` is
+present in the trusted repository, and also in a second repository.
+Now suppose you run `git annex drop foo` in both repositories.
+Normally, git-annex will not let both copies of the file be removed,
+but if the trusted repository is able to verify that the second
+repository has a copy, it will delete its copy. Then the drop in the second
+repository will *trust* the trusted repository still has its copy,
+and so the last copy of the file gets deleted.
+
+Proper pattern
+--------------
+
+Either avoid using trusted repositories, or avoid dropping content
+from them, or make sure you `git annex sync` just right, so
+other reposities know that data has been removed from a trusted repository.
+
+# **Deleting trusted repositories**
+
+Another way trusted repositories are unsafe is that even after they're
+deleted, git-annex will trust that they contained the files they
+used to contain.
+
+Proper pattern
+--------------
+
+Always use [[git-annex-dead]] to tell git-annex when a repository has
+been deleted, especially if it was trusted.
 
 Other cases
 ===========
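
The double-drop scenario described in the new "Deleting data from trusted repositories" section can be sketched with a toy model. This is not git-annex code and does not reflect its internals: the `Repo` class, the `drop` and `simulate` functions, the shared `stale_log` dictionary, and the assumption that one other copy is required are all invented here for illustration. The log is deliberately never updated, to stand in for two repositories that have not synced between the two drops.

```python
"""Toy model of drop verification. Hypothetical; not git-annex's implementation."""


class Repo:
    def __init__(self, name, trusted=False):
        self.name = name
        self.trusted = trusted
        self.contents = set()  # keys that are really present in this repo


def drop(repo, key, others, location_log, numcopies=1):
    """Drop `key` from `repo` only if enough other copies can be counted.

    `location_log` maps a repo name to the keys the log claims it holds.
    Untrusted repos are checked directly (active verification); trusted
    repos are believed on the strength of the log alone.
    """
    copies = 0
    for other in others:
        if other.trusted:
            # No verification: the (possibly stale) record is taken as truth.
            if key in location_log[other.name]:
                copies += 1
        elif key in other.contents:
            # Verified: the other repo really has the content right now.
            copies += 1
    if copies < numcopies:
        return False  # refuse; this might be the last copy
    repo.contents.discard(key)
    return True


def simulate(trusted):
    """Run `drop foo` in repo T, then in repo S, without syncing in between."""
    t = Repo("T", trusted=trusted)
    s = Repo("S")
    t.contents.add("foo")
    s.contents.add("foo")
    # Assumption of this sketch: S never learns that T dropped foo.
    stale_log = {"T": {"foo"}, "S": {"foo"}}
    first = drop(t, "foo", [s], stale_log)
    second = drop(s, "foo", [t], stale_log)
    remaining = sum("foo" in r.contents for r in (t, s))
    return first, second, remaining


if __name__ == "__main__":
    print(simulate(trusted=True))   # (True, True, 0): last copy is gone
    print(simulate(trusted=False))  # (True, False, 1): second drop refused
```

In this model the only difference between the two runs is whether T is trusted. When it is not, the second drop checks T's real contents and refuses, which is the behaviour the commit message relies on when it says reusing a UUID alone cannot cause data loss.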