From 4f6388f522319ed5d01c9b26a3646939324ff1e8 Mon Sep 17 00:00:00 2001
From: "https://www.google.com/accounts/o8/id?id=AItOawnjjCyhVEcTRM5m4iIBqL3ZCooPx7ZYB_E"
Date: Mon, 14 Jan 2013 13:28:45 +0000
Subject: [PATCH]

---
 ...mode_to_direct_mode_breaks_duplicates.mdwn | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 doc/bugs/Switching_from_indirect_mode_to_direct_mode_breaks_duplicates.mdwn

diff --git a/doc/bugs/Switching_from_indirect_mode_to_direct_mode_breaks_duplicates.mdwn b/doc/bugs/Switching_from_indirect_mode_to_direct_mode_breaks_duplicates.mdwn
new file mode 100644
index 0000000000..c4c1a8385b
--- /dev/null
+++ b/doc/bugs/Switching_from_indirect_mode_to_direct_mode_breaks_duplicates.mdwn
@@ -0,0 +1,21 @@
+#What steps will reproduce the problem?
+
+1. Create a new repository in indirect mode.
+
+2. Add the same file twice under different names. Now you have two symlinks pointing to the same file under .git/annex/objects/
+
+3. Switch to direct mode. The first symlink gets replaced by the actual file. The second stays unchanged and now points nowhere. But git annex whereis still reports that it has a copy.
+
+4. Delete the first file. Git annex whereis still thinks it has a copy of file 2, which is not true -> data loss.
+
+#What is the expected output? What do you see instead?
+
+When switching to direct mode, both symlinks should be replaced by a copy (or at least a hardlink) of the actual file.
+
+#What version of git-annex are you using? On what operating system?
+
+3.20130107 on Arch Linux x64
+
+#Please provide any additional information below.
+
+The deduplication performed by git-annex is very dangerous in itself, because files with identical content are replaced by references to the same file without the user necessarily being aware of it. Think of the user making a copy of a file, then modifying it. He would expect to end up with two files, the unchanged original and the modified copy. But what he really gets is two symlinks pointing to the same modified file.
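
The failure described in steps 2 to 4 can be modelled outside git-annex with plain symlinks. The sketch below is not git-annex code and does not call git-annex. It assumes a toy layout (an `objects` directory holding one file under the made-up key name `KEY-abc`, plus two symlinks `file1` and `file2`), and the helper names `naive_switch` and `careful_switch` are hypothetical. `naive_switch` mimics the reported behaviour, and `careful_switch` mimics the expected behaviour by giving every symlink its own copy.

```python
import os
import shutil
import tempfile


def make_indirect_repo(root):
    """Build a toy 'indirect mode' layout: one stored object, two symlinks to it."""
    objects = os.path.join(root, "objects")
    os.makedirs(objects)
    obj = os.path.join(objects, "KEY-abc")  # hypothetical key name
    with open(obj, "w") as f:
        f.write("payload\n")
    links = [os.path.join(root, name) for name in ("file1", "file2")]
    for link in links:
        os.symlink(obj, link)
    return links


def naive_switch(links):
    """Move the object over the first symlink that reaches it (the reported behaviour)."""
    for link in links:
        if not os.path.islink(link):
            continue
        target = os.path.realpath(link)
        if os.path.isfile(target):
            os.remove(link)
            shutil.move(target, link)
        # Otherwise the object has already been moved away and the
        # symlink is left dangling.


def careful_switch(links):
    """Give every symlink its own copy, then drop the shared object."""
    targets = set()
    for link in links:
        if not os.path.islink(link):
            continue
        target = os.path.realpath(link)
        targets.add(target)
        os.remove(link)
        shutil.copy2(target, link)
    for target in targets:
        os.remove(target)


def demo(switch):
    """Return, for file1 and file2, whether real content is still reachable."""
    with tempfile.TemporaryDirectory() as root:
        links = make_indirect_repo(root)
        switch(links)
        # os.path.exists follows symlinks, so a dangling link reports False.
        return [os.path.exists(path) for path in links]


if __name__ == "__main__":
    print("naive:  ", demo(naive_switch))    # [True, False]
    print("careful:", demo(careful_switch))  # [True, True]
```

With the naive conversion, `file2` ends up as a dangling symlink, so deleting `file1` afterwards would remove the only remaining copy of the content. The careful variant uses copies rather than hardlinks so that a later edit to one file cannot change the other.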