[[!comment format=mdwn
 username="arnaud.legrand@e79f5d4cff79116f56388885021e8507bef18e12"
 nickname="arnaud.legrand"
 avatar="http://cdn.libravatar.org/avatar/143239914c3e3c1a374a7c244b56d73e"
 subject="Weird behavior of git archive in combination with largefiles configuration"
 date="2023-02-17T09:46:34Z"
 content="""
Hi,

I'm preparing a lecture on how git annex can help with research data management. While playing with `git-annex unannex`, I stumbled on a strange behavior that I can neither understand nor properly work around.

When preparing a public archive, it may make sense to include the content of **some** annexed files while keeping the symlinks for others (e.g., because they are already available from somewhere else). This is why I do not want to rely on the `git-annex export` mechanism, which would replace the symlinks of all annexed files with their content. Instead, I `unannex` some of my files. Surprisingly, depending on the git annex configuration, their content may not end up in the archive produced by `git archive`.

Here is a minimal working example.

``` shell
DIR=/tmp/test
chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR
git init
git annex init
git config --local annex.largefiles 'largerthan=100kb and include=data/*'
echo \"Hello\" > README
git add README
mkdir data/
dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null
git annex add data/foo.dat
git commit -m \"Initial commit\"
# Workaround: uncomment the next line to clear annex.largefiles before unannexing
## git config --local annex.largefiles ''
git annex unannex data/foo.dat && git add data/foo.dat && git commit -m \"Unannexing\"
git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD
tar zxf ../archive.tgz
tree -s nobel_project/
```

``` example
Initialized empty Git repository in /tmp/test/.git/
init ok
(recording state in git...)
add data/foo.dat 31.98 KiB 14 MiB/s 0s100% 1 MiB 137 MiB/s 0s ok
(recording state in git...)
[master (root-commit) 8fbb907] Initial commit
 2 files changed, 2 insertions(+)
 create mode 100644 README
 create mode 120000 data/foo.dat
unannex data/foo.dat ok
(recording state in git...)
[master da73fb0] Unannexing
 1 file changed, 1 insertion(+), 1 deletion(-)
) 100644 m
nobel_project/
├── [ 4096] data
│   └── [ 102] foo.dat
└── [ 6] README

1 directory, 2 files
```

As you can see from the output, `foo.dat` is only 102 bytes in the archive, whereas it should be 1 MiB. Instead of the data, `foo.dat` contains an annex pointer:

``` shell
cat nobel_project/data/foo.dat
```

``` example
/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat
```

But if I remove the `annex.largefiles` configuration (either upfront or right before calling `unannex`), everything works as expected, i.e., my archive contains the content of the formerly annexed file.

Is this expected behavior? This is the kind of operation I typically do in a branch that I erase afterward, but the workaround (temporarily) messes up my local git configuration, which I don't like, so I'm looking for a better one.

Thanks for your amazing work,

Arnaud
"""]]