98 lines
3.1 KiB
Text
98 lines
3.1 KiB
Text
|
[[!comment format=mdwn
|
||
|
username="arnaud.legrand@e79f5d4cff79116f56388885021e8507bef18e12"
|
||
|
nickname="arnaud.legrand"
|
||
|
avatar="http://cdn.libravatar.org/avatar/143239914c3e3c1a374a7c244b56d73e"
|
||
|
subject="Weird behavior of git archive in combination with largefiles configuration"
|
||
|
date="2023-02-17T09:46:34Z"
|
||
|
content="""
|
||
|
Hi,
|
||
|
|
||
|
I'm preparing a lecture on how git annex can help research data
|
||
|
management and I stumbled, when playing with `git-annex unannex`, on a
|
||
|
strange behavior that I fail to understand nor to properly work around.
|
||
|
When preparing for a public archive it may make sense to include
|
||
|
**some** annexed files in the archive while it may be desirable to keep
|
||
|
the symlinks for others (e.g., because they are already available from
|
||
|
somewhere else). This is why I do not want to rely on the `git-annex
|
||
|
export` mechanism that would replace the symlinks of all annexed files
|
||
|
by their content.
|
||
|
|
||
|
Instead, I `unannex` some of my files but surprisingly, depending on git
|
||
|
annex configuration, their content may not be in the archive produced by
|
||
|
`git archive`. Here is a minimal working example.
|
||
|
|
||
|
``` shell
|
||
|
DIR=/tmp/test
|
||
|
chmod -Rf u+w $DIR; rm -rf $DIR ; mkdir -p $DIR; cd $DIR
|
||
|
git init
|
||
|
git annex init
|
||
|
git config --local annex.largefiles 'largerthan=100kb and include=data/*'
|
||
|
|
||
|
echo \"Hello\" > README
|
||
|
git add README
|
||
|
|
||
|
mkdir data/
|
||
|
dd if=/dev/zero of=data/foo.dat bs=1M count=1 2>/dev/null
|
||
|
git annex add data/foo.dat
|
||
|
git commit -m \"Initial commit\"
|
||
|
|
||
|
## git config --local annex.largefiles ''
|
||
|
git annex unannex data/foo.dat && git add data/foo.dat && git commit -m \"Unannexing\"
|
||
|
git archive --format=tar.gz --prefix nobel_project/ -o ../archive.tgz HEAD
|
||
|
|
||
|
tar zxf ../archive.tgz
|
||
|
tree -s nobel_project/
|
||
|
```
|
||
|
|
||
|
``` example
|
||
|
|
||
|
Initialized empty Git repository in /tmp/test/.git/
|
||
|
init ok
|
||
|
(recording state in git...)
|
||
|
add data/foo.dat
|
||
|
31.98 KiB 14 MiB/s 0s100% 1 MiB 137 MiB/s 0s ok
|
||
|
(recording state in git...)
|
||
|
[master (root-commit) 8fbb907] Initial commit
|
||
|
2 files changed, 2 insertions(+)
|
||
|
create mode 100644 README
|
||
|
create mode 120000 data/foo.dat
|
||
|
unannex data/foo.dat ok
|
||
|
(recording state in git...)
|
||
|
[master da73fb0] Unannexing
|
||
|
1 file changed, 1 insertion(+), 1 deletion(-)
|
||
|
)
|
||
|
100644
|
||
|
mnobel_project/
|
||
|
├── [ 4096] data
|
||
|
│ └── [ 102] foo.dat
|
||
|
└── [ 6] README
|
||
|
|
||
|
1 directory, 2 files
|
||
|
```
|
||
|
|
||
|
As you may see from the output, `foo.dat` is only 102 bytes whereas it
|
||
|
should be 1MB. Instead the content of `foo.dat` is:
|
||
|
|
||
|
``` shell
|
||
|
cat nobel_project/data/foo.dat
|
||
|
```
|
||
|
|
||
|
``` example
|
||
|
/annex/objects/SHA256E-s1048576--30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58.dat
|
||
|
```
|
||
|
|
||
|
But if I remove the `annex.largefiles` configuration (either upfront or
|
||
|
right before calling `unannex`), everything works as expected, i.e., my
|
||
|
archive comprises the content of the annexed file.
|
||
|
|
||
|
Is this an expected behavior ? This is the kind of operation I typically
|
||
|
do in a branch that I erase afterward but it (temporarily) messes my
|
||
|
local git configuration, which I don't like, so I'm looking for a better
|
||
|
workaround.
|
||
|
|
||
|
Thanks for you amazing work,
|
||
|
|
||
|
Arnaud
|
||
|
|
||
|
"""]]
|