This commit is contained in:
parent
ada1f56806
commit
add2f406a8
1 changed files with 55 additions and 0 deletions
|
@ -0,0 +1,55 @@
|
|||
This is not really a proper bug report, but I thought I should post this here
|
||||
in case someone can find any sane, non-supernatural reason for a strange case
|
||||
of data loss I have experienced with git-annex.
|
||||
|
||||
Some time ago I cloned a bunch of git-annex repos from an external drive (let's
|
||||
call it disk1) to a new computer (computer3). On one of my repos git-annex
|
||||
marked a bunch of files corrupt and moved them to .git/annex/bad. Oops, I
|
||||
thought, I must have a failing disk. Luckily I had offsite backups -- no less
|
||||
than two other external hard disks (disk2-3), each having a full copy of the
|
||||
repo in question. However, **both of these** had the same, corrupt files. The
|
||||
files have the correct size, but are filled with zeroes. Other files in the
|
||||
repo are fine, and so are other repos.
|
||||
|
||||
I have been trying to wrap my head around this but I can't think of any reason
|
||||
how this could occur. However the files have gotten corrupted in the first
|
||||
place, the corruption should have been picked up when copying the content to
|
||||
the external drives disk2 and disk3, right? I have to rule out NSA/MIB/aliens
|
||||
from messing with me because these files are not that valuable or sensitive.
|
||||
|
||||
The files in question were added to git-annex back in 2012, so the trail is
|
||||
cold on this one. Naturally, I have no idea on how to reproduce this, nor can I
|
||||
reliably say that git-annex is to blame. I can gather some hints though. The
|
||||
files were all added on the same commit in 2012, but not all files from that
|
||||
commit are corrupted. The corrupted files have consecutive file names. The
|
||||
files were never modified since (except for the corruption), and the content
|
||||
*may* have been copied via an encrypted rsync transfer repository. I have
|
||||
always used git-annex on Arch Linux and in indirect more. The files used the
|
||||
SHA-1 backend.
|
||||
|
||||
All these files have a similar tracking log that looks something like this
|
||||
(uuids replaced with symbolic names):
|
||||
|
||||
1356690700.542152s 1 computer1 <- first added
|
||||
1356691074.253815s 1 disk1 <- copied to disk1
|
||||
1356719321.145126s 1 rsync <- copied to rsync repo
|
||||
1358070999.435676s 1 rsync <- copied to rsync repo (again?)
|
||||
1362166895.310332s 1 disk2 <- copied to disk2
|
||||
1362906850.555869s 1 computer2 (dead) <- copied to another computer
|
||||
1364926664.362195s 0 computer1 <- dropped from computer1 as enough copies in disks
|
||||
1374412057.409496s 0 computer2 (dead) <- dropped from computer2, now dead
|
||||
1445691595.764108s 1 disk3 <- copied to disk3
|
||||
1445770764.165792s 0 rsync <- dropped from rsync repo to save space
|
||||
1482077052.217353646s 0 disk1 <- first noticed as corrupted on disk1
|
||||
1482741278.318274404s 0 disk3 <- WTF, also corrupted on disk3
|
||||
1482926246.268440532s 0 disk2 <- double-WTF, also corrupted on disk2
|
||||
|
||||
The only thing that strikes odd to me is the double entry with the rsync
|
||||
remote. The non-corrupted files from the same commit do not seem to have such a
|
||||
double entry.
|
||||
|
||||
So my main question is, has there ever been a bug in git-annex that could have
|
||||
caused this behavior? Or is there any other realistic explanation for this? In
|
||||
case this is an existing bug, is there any other evidence I can gather?
|
||||
Needless to say, the lesson here is to run `git annex fsck` regularly even if
|
||||
you have offsite backups...
|
Loading…
Add table
Reference in a new issue