reusing repository uuid cannot result in data loss AFAIK
Avoiding such problems is one reason why git-annex does active verification of other copies of a file when dropping.

You could argue that reusing the uuid of a trusted repository leads to data loss, but that data loss doesn't really involve reusing the uuid, but instead is caused by deleting a trusted repository. Using trusted repositories without a great deal of care is a good way to blow off your foot, of which deleting them is only the most obvious; added some sections about that.

If reusing a repository uuid could result in data loss then I'd be on board with making reinit run a fast fsck to update the location log, but since it can't, I feel that is not worth forcing. Not a bad idea to run fsck afterwards. Updated language about that.

This commit was sponsored by Jake Vosloo on Patreon.
parent 280442ca2c
commit 809ddd9df8
2 changed files with 53 additions and 11 deletions
@@ -14,10 +14,11 @@ UUID -- for example, if a repository got deleted, and you're
 setting it back up.
 
 Use this with caution; it can be confusing to have two existing
-repositories with the same UUID. It can lead to data loss and other
-weird phenomenons. Make sure you run `git annex fsck` after changing
-the UUID of a repository to make sure location tracking information is
-recorded correctly.
+repositories with the same UUID.
+
+Make sure you run `git annex fsck` after changing the UUID of a
+repository to make sure location tracking information is recorded
+correctly.
 
 Like `git annex init`, this attempts to enable any special remotes
 that are configured with autoenable=true.
@@ -66,9 +66,16 @@ To quote the [[git-annex-reinit]] manpage:
 [[git-annex-reinit]] can be used to reuse UUIDs for deleted
 repositories. But what happens if you reuse the UUID of an *existing*
 repository, or a repository that hasn't been properly emptied before
-being declared dead? This can lead to data loss because, in that case,
-git-annex may think some files are still present in the revived
-repository (while they may not actually be).
+being declared dead? This can lead to git-annex getting confused
+because, in that case, git-annex may think some files are still
+present in the revived repository (while they may not actually be).
 
+This should never result in data loss, because git-annex does not
+trust its records about the contents of a repository, and checks
+that it really contains files before dropping them from other
+repositories. (The one exception to this rule is trusted repositories,
+whose contents are never checked. See the next two sections for more
+about problems with trusted repositories.)
+
 Proper pattern
 --------------
@@ -89,11 +96,45 @@ Fixes
 
 An improvement to git-annex here would be to allow
 [[reinit to work without arguments|todo/reinit_should_work_without_arguments]]
-to at least not encourage UUID reuse. reinit could also recommend
-running fsck explicitely. It could even trigger an fsck directly.
+to at least not encourage UUID reuse.
 
-The [[git-annex-reinit]] manpage has always suggested running `fsck`,
-but the wording has been changed on 2017-01-17.
+# **Deleting data from trusted repositories**
+
+When you use [[git-annex-trust]] on a repository, you disable
+some very important sanity checks that make sure that git-annex
+never loses the content of files. So trusting a repository
+is a good way to shoot yourself in the foot and lose data. Like the
+man page says, "Use with care."
+
+When you have made git-annex trust a repository, you can lose data
+by dropping files from that repository. For example, suppose file `foo` is
+present in the trusted repository, and also in a second repository.
+
+Now suppose you run `git annex drop foo` in both repositories.
+Normally, git-annex will not let both copies of the file be removed,
+but if the trusted repository is able to verify that the second
+repository has a copy, it will delete its copy. Then the drop in the second
+repository will *trust* the trusted repository still has its copy,
+and so the last copy of the file gets deleted.
+
+Proper pattern
+--------------
+
+Either avoid using trusted repositories, or avoid dropping content
+from them, or make sure you `git annex sync` just right, so
+other reposities know that data has been removed from a trusted repository.
+
+# **Deleting trusted repositories**
+
+Another way trusted repositories are unsafe is that even after they're
+deleted, git-annex will trust that they contained the files they
+used to contain.
+
+Proper pattern
+--------------
+
+Always use [[git-annex-dead]] to tell git-annex when a repository has
+been deleted, especially if it was trusted.
 
 Other cases
 ===========
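To make the reasoning in the added documentation concrete, here is a toy Python model of the drop-safety check. It is a sketch under stated assumptions, not git-annex's implementation (git-annex is written in Haskell and its real checks are more involved): numcopies is 1; the location log is a plain dict that drops never update, modelling repositories that have not synced yet; untrusted repositories listed in the log are actively checked; trusted ones are believed without checking. The names `Repo`, `countable_copies` and `drop` are hypothetical.

```python
"""Toy model of the drop-safety reasoning above (hypothetical names,
not git-annex's actual code). Assumes numcopies = 1."""
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str
    trusted: bool = False
    contents: set = field(default_factory=set)  # keys really present


def countable_copies(key, dropping, repos, log):
    """Count copies outside `dropping` that a drop may rely on.

    `log` maps a key to the repo names the dropping side believes have it.
    """
    count = 0
    for repo in repos:
        if repo is dropping or repo.name not in log.get(key, set()):
            continue
        if repo.trusted:
            count += 1  # trusted: believe the log, never look at the content
        elif key in repo.contents:
            count += 1  # untrusted: actively verify the copy is really there
    return count


def drop(key, repo, repos, log, numcopies=1):
    """Remove `key` from `repo` only if enough other copies are counted.

    `log` is deliberately left unchanged, as if nothing has synced yet.
    """
    if countable_copies(key, repo, repos, log) < numcopies:
        return False
    repo.contents.discard(key)
    return True


# Reused UUID: the log says "b" has foo, but "b" was set up again empty.
a = Repo("a", contents={"foo"})
b = Repo("b")
log = {"foo": {"a", "b"}}
print(drop("foo", a, [a, b], log))  # False: "b" is checked and found empty
print(a.contents)  # {'foo'}: nothing lost

# Trusted repository: "t" drops after verifying "b"; then "b" drops,
# trusting its unsynced log entry that says "t" still has foo.
t = Repo("t", trusted=True, contents={"foo"})
b = Repo("b", contents={"foo"})
log = {"foo": {"t", "b"}}
print(drop("foo", t, [t, b], log))  # True
print(drop("foo", b, [t, b], log))  # True
print(t.contents | b.contents)  # set(): the last copy is gone
```

In this model the stale log entry left by a reused UUID is harmless because the copy is verified before it is counted, while the trusted repository's entry is counted unverified. Removing "t" from the log that "b" sees after the first drop (roughly what syncing at the right moment would do) makes the second drop refuse.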