diff --git a/doc/git-annex-unannex/comment_6_d5a8450349b0383ec29d150420189937._comment b/doc/git-annex-unannex/comment_6_d5a8450349b0383ec29d150420189937._comment new file mode 100644 index 0000000000..0ada2033bb --- /dev/null +++ b/doc/git-annex-unannex/comment_6_d5a8450349b0383ec29d150420189937._comment @@ -0,0 +1,97 @@ +[[!comment format=mdwn + username="arnaud.legrand@e79f5d4cff79116f56388885021e8507bef18e12" + nickname="arnaud.legrand" + avatar="http://cdn.libravatar.org/avatar/143239914c3e3c1a374a7c244b56d73e" + subject="Weird behavior of git archive in combination with largefiles configuration" + date="2023-02-17T09:46:34Z" + content=""" +Hi, + +I'm preparing a lecture on how git annex can help research data +management and I stumbled, when playing with `git-annex unannex`, on a +strange behavior that I fail to understand nor to properly work around. +When preparing for a public archive it may make sense to include +**some** annexed files in the archive while it may be desirable to keep +the symlinks for others (e.g., because they are already available from +somewhere else). This is why I do not want to rely on the `git-annex +export` mechanism that would replace the symlinks of all annexed files +by their content. + +Instead, I `unannex` some of my files but surprisingly, depending on git +annex configuration, their content may not be in the archive produced by +`git archive`. Here is a minimal working example. + +``` shell +DIR=/tmp/test +chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR +git init +git annex init +git config --local annex.largefiles 'largerthan=100kb and include=data/*' + +echo \"Hello\" > README +git add README + +mkdir data/ +dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null +git annex add data/foo.dat +git commit -m \"Initial commit\" + +## git config --local annex.largefiles '' +git annex unannex data/foo.dat && git add data/foo.dat && git commit -m \"Unannexing\" +git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD + +tar zxf ../archive.tgz +tree -s nobel_project/ +``` + +``` example + + Initialized empty Git repository in /tmp/test/.git/ +init ok +(recording state in git...) + add data/foo.dat +31.98 KiB 14 MiB/s 0s100% 1 MiB 137 MiB/s 0s ok +(recording state in git...) +[master (root-commit) 8fbb907] Initial commit + 2 files changed, 2 insertions(+) + create mode 100644 README + create mode 120000 data/foo.dat + unannex data/foo.dat ok +(recording state in git...) +[master da73fb0] Unannexing + 1 file changed, 1 insertion(+), 1 deletion(-) +) +100644 +mnobel_project/ +├── [ 4096] data +│   └── [ 102] foo.dat +└── [ 6] README + +1 directory, 2 files +``` + +As you may see from the output, `foo.dat` is only 102 bytes whereas it +should be 1MB. Instead the content of `foo.dat` is: + +``` shell +cat nobel_project/data/foo.dat +``` + +``` example +/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat +``` + +But if I remove the `annex.largefiles` configuration (either upfront or +right before calling `unannex`), everything works as expected, i.e., my +archive comprises the content of the annexed file. + +Is this an expected behavior ? This is the kind of operation I typically +do in a branch that I erase afterward but it (temporarily) messes my +local git configuration, which I don't like, so I'm looking for a better +workaround. + +Thanks for you amazing work, + +Arnaud + +"""]]