Avoiding such problems is one reason why git-annex does active verification of other copies of a file when dropping. You could argue that reusing the uuid of a trusted repository leads to data loss, but that data loss doesn't really involve reusing the uuid, but instead is caused by deleting a trusted repository. Using trusted repositories without a great deal of care is a good way to blow off your foot, of which deleting them is only the most obvious; added some sections about that. If reusing a repository uuid could result in data loss then I'd be on board with making reinit run a fast fsck to update the location log, but since it can't, I feel that is not worth forcing. Not a bad idea to run fsck afterwards. Updated language about that. This commit was sponsored by Jake Vosloo on Patreon.
This page tries to collect the Really Bad Ideas people have had with
git-annex in the past that can lead to catastrophic data loss, abusive
disk usage, improper swearing and other unfortunate experiences.

This could also be called the "git annex worst practices", but is
different from [[what git annex is not|not]] in that it covers normal
use cases of git-annex, just implemented in the wrong way. Ideally,
git-annex makes it as hard as possible to do those things, but
sometimes it can't be helped: people figure out the worst
possible ways of doing things.

[[!toc]]

---

# **Symlinking the `.git/annex` directory**

Symlinking the `.git/annex` directory, in the hope of saving
disk space, is a horrible idea. The general antipattern is:

    git clone repoA repoB
    mv repoB/.git/annex repoB/.git/annex.bak
    ln -s repoA/.git/annex repoB/.git/annex

This is bad because git-annex will believe it has two copies of the
files and will then let you drop the single real copy, leading
to data loss.

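If you suspect a repository was set up this way, a quick check is to see
whether its `.git/annex` is a symbolic link. The sketch below assumes a
POSIX shell; the helper names `annex_is_symlinked` and `check_repos` are
made up for this page and are not part of git-annex.

```shell
# Hypothetical helper (not part of git-annex): succeeds if the given
# repository's .git/annex is a symbolic link rather than a real directory.
annex_is_symlinked() {
    [ -L "$1/.git/annex" ]
}

# Print a warning for every repository path given as an argument whose
# .git/annex turns out to be a symlink.
check_repos() {
    for repo in "$@"; do
        if annex_is_symlinked "$repo"; then
            echo "WARNING: $repo/.git/annex is a symlink"
        fi
    done
}
```

For example, `check_repos repoA repoB` would only complain about `repoB`
after the antipattern above has been applied to it.
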
Proper pattern
--------------

The proper way of doing this is through git-annex's hardlink support,
by cloning the repository with the `--shared` option:

    git clone --shared repoA repoB

This will set up repoB as an "untrusted" repository and use hardlinks
to copy files between the two repos, using space only once. This
works, of course, only on filesystems that support hardlinks, but
that's usually the case for filesystems that support symlinks.

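Spelled out a bit more, the whole sequence might look like the sketch
below. The function name `clone_shared` and its arguments are placeholders
invented for this example. It relies on `git annex init` setting
`annex.hardlink` when it finds a `--shared` clone, as the git-annex man
page describes. Check the result on your own version before relying on it.

```shell
# Hypothetical wrapper: clone SRC into DST with hardlink support, then
# fetch one annexed FILE to see the hardlinking in action.
clone_shared() {
    src=$1 dst=$2 file=$3
    git clone --shared "$src" "$dst" &&
    (
        cd "$dst" &&
        # For --shared clones, init is documented to set annex.hardlink
        # and to mark the new repository as untrusted.
        git annex init &&
        # Content is hardlinked from the origin when the filesystem allows.
        git annex get "$file" &&
        # Show the setting so you can confirm it was enabled.
        git config annex.hardlink
    )
}
```

Usage would be something like `clone_shared repoA repoB somefile`. If the
last command does not show the setting enabled, files are being copied
rather than hardlinked.
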
Real world cases
----------------

* [[forum/share_.git__47__annex__47__objects_across_multiple_repositories_on_one_machine/]]
* at least one IRC discussion

Fixes
-----

Probably no way to fix this in git-annex - if users want to shoot
themselves in the foot by messing with the backend, there's not much
we can do to change that in this case.

---

# **Reinit repo with an existing uuid without `fsck`**

To quote the [[git-annex-reinit]] manpage:

> Normally, initializing a repository generates a new, unique
> identifier (UUID) for that repository. Occasionally it may be useful
> to reuse a UUID -- for example, if a repository got deleted, and
> you're setting it back up.

[[git-annex-reinit]] can be used to reuse UUIDs for deleted
repositories. But what happens if you reuse the UUID of an *existing*
repository, or a repository that hasn't been properly emptied before
being declared dead? This can lead to git-annex getting confused
because, in that case, git-annex may think some files are still
present in the revived repository (while they may not actually be).

This should never result in data loss, because git-annex does not
trust its records about the contents of a repository, and checks
that it really contains files before dropping them from other
repositories. (The one exception to this rule is trusted repositories,
whose contents are never checked. See the next two sections for more
about problems with trusted repositories.)

Proper pattern
--------------

The proper way of using reinit is to make sure you run
[[git-annex-fsck]] (optionally with `--fast` to save time) on the
revived repo right after running reinit. This will ensure that at
least the location log will be updated, and git-annex will notice if
files are missing.

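As a minimal sketch, the two steps can be chained so the fsck is never
forgotten. The function name `revive_repository` is invented for this
page; its argument is the UUID (or description) of the repository being
revived, and it is meant to be run inside the freshly set up repository.

```shell
# Hypothetical wrapper: reuse an old UUID, then immediately check what is
# actually present so the location log matches reality.
revive_repository() {
    git annex reinit "$1" &&
    # --fast skips checksumming file contents; drop it for a full check.
    git annex fsck --fast
}
```

For example: `revive_repository "$OLD_UUID"`, where `OLD_UUID` holds the
UUID you are reusing. Other repositories only learn about the corrected
location log once it has been synced to them.
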
Real world cases
----------------

* [[bugs/remotes_disappeared]]

Fixes
-----

An improvement to git-annex here would be to allow
[[reinit to work without arguments|todo/reinit_should_work_without_arguments]]
to at least not encourage UUID reuse.

# **Deleting data from trusted repositories**

When you use [[git-annex-trust]] on a repository, you disable
some very important sanity checks that make sure that git-annex
never loses the content of files. So trusting a repository
is a good way to shoot yourself in the foot and lose data. Like the
man page says, "Use with care."

When you have made git-annex trust a repository, you can lose data
by dropping files from that repository. For example, suppose file `foo` is
present in the trusted repository, and also in a second repository.

Now suppose you run `git annex drop foo` in both repositories.
Normally, git-annex will not let both copies of the file be removed,
but if the trusted repository is able to verify that the second
repository has a copy, it will delete its copy. Then the drop in the second
repository will *trust* that the trusted repository still has its copy,
and so the last copy of the file gets deleted.

Proper pattern
--------------

Either avoid using trusted repositories, or avoid dropping content
from them, or make sure you run `git annex sync` right after dropping, so
other repositories know that data has been removed from a trusted repository.

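The last two options can be sketched as small wrappers. The function
names `drop_and_sync` and `stop_trusting` are made up for this page; the
git-annex commands they call (`drop`, `sync`, `semitrust`) are real, but
treat this as an illustration rather than a guarantee.

```shell
# Hypothetical wrappers; the function names are made up for this page.

# Drop files from the current (trusted) repository, then sync immediately
# so that other repositories learn the content is gone.
drop_and_sync() {
    git annex drop "$@" &&
    git annex sync
}

# Alternatively, stop trusting a repository altogether ("here" means the
# current repository), so its contents get verified again before drops.
stop_trusting() {
    git annex semitrust "${1:-here}"
}
```

Note that syncing only narrows the window: a repository that has not yet
received the sync will still trust the stale record, so the safest of the
three options remains not trusting the repository in the first place.
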
# **Deleting trusted repositories**

Another way trusted repositories are unsafe is that even after they're
deleted, git-annex will trust that they contained the files they
used to contain.

Proper pattern
--------------

Always use [[git-annex-dead]] to tell git-annex when a repository has
been deleted, especially if it was trusted.

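In practice that could look like the sketch below, run from any remaining
repository that knows about the deleted one. The function name
`retire_repository` is invented here, and the follow-up `git annex sync`
is an assumption about how you spread the news to your other repositories.

```shell
# Hypothetical wrapper: mark a deleted repository (given by name, UUID or
# description) as dead, then sync so other repositories find out too.
retire_repository() {
    git annex dead "$1" &&
    git annex sync
}
```

For example: `retire_repository old-usb-drive`, where `old-usb-drive` is
whatever name or description the lost repository had.
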
Other cases
===========

Feel free to add your lessons in catastrophe here! It's educational
and fun, and will improve git-annex for everyone.

PS: should this be a toplevel page instead of being drowned in the
[[tips]] section? Where should it be linked to? -- [[anarcat]]