reusing repository uuid cannot result in data loss AFAIK
Avoiding such problems is one reason why git-annex does active verification of other copies of a file when dropping.

You could argue that reusing the uuid of a trusted repository leads to data loss, but that data loss doesn't really involve reusing the uuid, but instead is caused by deleting a trusted repository. Using trusted repositories without a great deal of care is a good way to blow off your foot, of which deleting them is only the most obvious; added some sections about that.

If reusing a repository uuid could result in data loss then I'd be on board with making reinit run a fast fsck to update the location log, but since it can't, I feel that is not worth forcing.

Not a bad idea to run fsck afterwards. Updated language about that.

This commit was sponsored by Jake Vosloo on Patreon.
parent 280442ca2c
commit 809ddd9df8
2 changed files with 53 additions and 11 deletions

@@ -14,10 +14,11 @@ UUID -- for example, if a repository got deleted, and you're
 setting it back up.
 
 Use this with caution; it can be confusing to have two existing
-repositories with the same UUID. It can lead to data loss and other
-weird phenomenons. Make sure you run `git annex fsck` after changing
-the UUID of a repository to make sure location tracking information is
-recorded correctly.
+repositories with the same UUID.
+
+Make sure you run `git annex fsck` after changing the UUID of a
+repository to make sure location tracking information is recorded
+correctly.
 
 Like `git annex init`, this attempts to enable any special remotes
 that are configured with autoenable=true.

@@ -66,9 +66,16 @@ To quote the [[git-annex-reinit]] manpage:
 [[git-annex-reinit]] can be used to reuse UUIDs for deleted
 repositories. But what happens if you reuse the UUID of an *existing*
 repository, or a repository that hasn't been properly emptied before
-being declared dead? This can lead to data loss because, in that case,
-git-annex may think some files are still present in the revived
-repository (while they may not actually be).
+being declared dead? This can lead to git-annex getting confused
+because, in that case, git-annex may think some files are still
+present in the revived repository (while they may not actually be).
+
+This should never result in data loss, because git-annex does not
+trust its records about the contents of a repository, and checks
+that it really contains files before dropping them from other
+repositories. (The one exception to this rule is trusted repositories,
+whose contents are never checked. See the next two sections for more
+about problems with trusted repositories.)
 
 Proper pattern
 --------------

@@ -89,11 +96,45 @@ Fixes
 
 An improvement to git-annex here would be to allow
 [[reinit to work without arguments|todo/reinit_should_work_without_arguments]]
-to at least not encourage UUID reuse. reinit could also recommend
-running fsck explicitely. It could even trigger an fsck directly.
+to at least not encourage UUID reuse.
 
-The [[git-annex-reinit]] manpage has always suggested running `fsck`,
-but the wording has been changed on 2017-01-17.
+# **Deleting data from trusted repositories**
+
+When you use [[git-annex-trust]] on a repository, you disable
+some very important sanity checks that make sure that git-annex
+never loses the content of files. So trusting a repository
+is a good way to shoot yourself in the foot and lose data. Like the
+man page says, "Use with care."
+
+When you have made git-annex trust a repository, you can lose data
+by dropping files from that repository. For example, suppose file `foo` is
+present in the trusted repository, and also in a second repository.
+
+Now suppose you run `git annex drop foo` in both repositories.
+Normally, git-annex will not let both copies of the file be removed,
+but if the trusted repository is able to verify that the second
+repository has a copy, it will delete its copy. Then the drop in the second
+repository will *trust* the trusted repository still has its copy,
+and so the last copy of the file gets deleted.
+
+Proper pattern
+--------------
+
+Either avoid using trusted repositories, or avoid dropping content
+from them, or make sure you `git annex sync` just right, so
+other reposities know that data has been removed from a trusted repository.
+
+# **Deleting trusted repositories**
+
+Another way trusted repositories are unsafe is that even after they're
+deleted, git-annex will trust that they contained the files they
+used to contain.
+
+Proper pattern
+--------------
+
+Always use [[git-annex-dead]] to tell git-annex when a repository has
+been deleted, especially if it was trusted.
 
 Other cases
 ===========
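
The double-drop scenario described under "Deleting data from trusted repositories" in this diff can be illustrated with a toy model. This is only a sketch, not git-annex's implementation: `Repo`, `drop`, `scenario`, and the `log` dictionary are hypothetical names invented for the illustration, and the missing `git annex sync` is modelled simply by never updating `log`.

```python
# Toy model only; NOT how git-annex is implemented. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Repo:
    name: str
    trusted: bool = False
    files: set = field(default_factory=set)  # content actually present


def drop(repo, key, others, log, numcopies=1):
    """Drop `key` from `repo` if enough other copies are accounted for.

    `log` maps a repo name to the keys it is *recorded* as holding. It is
    deliberately never updated here, which models a location log that has
    not been synced after an earlier drop.
    """
    copies = 0
    for other in others:
        if other.trusted:
            # Trusted repository: the record is believed without checking.
            if key in log.get(other.name, set()):
                copies += 1
        elif key in other.files:
            # Otherwise the content is actively verified before counting it.
            copies += 1
    if copies < numcopies:
        return False
    repo.files.discard(key)
    return True


def scenario(trust_first):
    a = Repo("a", trusted=trust_first, files={"foo"})
    b = Repo("b", files={"foo"})
    log = {"a": {"foo"}, "b": {"foo"}}
    drop(a, "foo", [b], log)  # b is verified directly, so a may drop
    drop(b, "foo", [a], log)  # if a is trusted, the stale record is believed
    return a.files | b.files  # copies that survive both drops
```

In this model, `scenario(True)` leaves no copy of `foo` anywhere, while `scenario(False)` keeps the copy in `b`, because the second drop checks `a` directly and is refused.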