arnaud.legrand@e79f5d4cff79116f56388885021e8507bef18e12 2023-02-17 09:43:51 +00:00 committed by admin
parent 4e83ff3bbd
commit d31e605b02

@ -1,97 +0,0 @@
[[!comment format=mdwn
username="arnaud.legrand@e79f5d4cff79116f56388885021e8507bef18e12"
nickname="arnaud.legrand"
avatar="http://cdn.libravatar.org/avatar/143239914c3e3c1a374a7c244b56d73e"
subject="Weird behavior of git archive in combination with largefiles configuration"
date="2023-02-17T09:43:19Z"
content="""
Hi,
I'm preparing a lecture on how git-annex can help with research data
management and, while playing with `git-annex unannex`, I stumbled on a
strange behavior that I can neither understand nor properly work around.
When preparing a public archive, it may make sense to include
**some** annexed files in the archive while keeping
the symlinks for others (e.g., because they are already available from
somewhere else). This is why I do not want to rely on the `git-annex
export` mechanism, which would replace the symlinks of all annexed files
with their content.
Instead, I `unannex` some of my files, but surprisingly, depending on the
git-annex configuration, their content may not end up in the archive produced by
`git archive`. Here is a minimal working example.
``` shell linenums=\"1\"
DIR=/tmp/test
chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR
git init
git annex init
git config --local annex.largefiles 'largerthan=100kb and include=data/*'
echo \"Hello\" > README
git add README
mkdir data/
dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null
git annex add data/foo.dat
git commit -m \"Initial commit\"
# Workaround: uncomment the next line to clear annex.largefiles before unannexing.
# git config --local annex.largefiles ''
git annex unannex data/foo.dat && git add data/foo.dat && git commit -m \"Unannexing\"
git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD
tar zxf ../archive.tgz
tree -s nobel_project/
```
``` example
Initialized empty Git repository in /tmp/test/.git/
init ok
(recording state in git...)
add data/foo.dat
100% 1 MiB 137 MiB/s 0s ok
(recording state in git...)
[master (root-commit) 8fbb907] Initial commit
2 files changed, 2 insertions(+)
create mode 100644 README
create mode 120000 data/foo.dat
unannex data/foo.dat ok
(recording state in git...)
[master da73fb0] Unannexing
1 file changed, 1 insertion(+), 1 deletion(-)
)
100644
nobel_project/
├── [ 4096] data
│   └── [ 102] foo.dat
└── [ 6] README
1 directory, 2 files
```
As you can see from the output, `foo.dat` is only 102 bytes whereas it
should be 1 MiB (1048576 bytes). Its content is instead:
``` shell linenums=\"1\"
cat nobel_project/data/foo.dat
```
``` example
/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat
```
But if I remove the `annex.largefiles` configuration (either upfront or
right before calling `unannex`), everything works as expected, i.e., my
archive contains the content of the previously annexed file.
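A minimal sketch of that workaround, assuming the same `annex.largefiles` expression as in the example above. The names `with_largefiles_cleared` and `unannex_step` are hypothetical helpers, not git-annex commands, and the expression restored at the end is hard-coded, so it would have to be adapted to the repository at hand.

```shell
# Hypothetical helper: call the shell function named in $1 while
# annex.largefiles is empty, then restore the expression from the example.
with_largefiles_cleared() {
    git config --local annex.largefiles ''
    # Remember the exit status of the step so it can be returned afterwards.
    $1 && status=0 || status=$?
    # Assumption: this is the expression that was configured beforehand.
    git config --local annex.largefiles 'largerthan=100kb and include=data/*'
    return $status
}

# The step to run with the setting cleared (same commands as in the example).
unannex_step() {
    git annex unannex data/foo.dat && git add data/foo.dat && git commit -m 'Unannexing'
}

# Usage, from inside the repository:
#   with_largefiles_cleared unannex_step
```

This only wraps the configuration change so that it is undone even when the step fails; it still edits the local configuration for the duration of the call.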
Is this expected behavior? This is the kind of operation I typically
do in a branch that I delete afterward, but the workaround (temporarily)
messes up my local git configuration, which I don't like, so I'm looking
for a better one.
Thanks for your amazing work,
Arnaud
"""]]