2011-01-26 17:21:51 +00:00

git-annex should use smudge/clean filters.

The trick is doing it efficiently. Since git a2b665d (2011-01-05),
something like this works to pass a filename to the clean script:

	git config --global filter.huge.clean "huge-clean %f"

That way the clean script does not need to read all the current file
content from stdin when doing, e.g., a git status or git commit. Instead
it is passed the name of the file that git is operating on; I think that
is the path in the working directory.

So, WORM could just look at that file and easily tell if it is one
it already knows (same mtime and size). If so, it can short-circuit and
do nothing, since the file content is already cached.

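
A minimal sketch of that short-circuit, assuming the filter is installed with `%f` so the path arrives as `$1`. The cache directory `WORM_CACHE`, the key layout, and the function names are made up for illustration; they are not git-annex's actual format.

```shell
#!/bin/sh
# Hypothetical WORM-style clean filter; it would be installed with:
#   git config filter.huge.clean "huge-clean-worm %f"
# WORM_CACHE and the key layout are assumptions for this sketch.
: "${WORM_CACHE:=$HOME/.huge-worm-cache}"

# Print "size mtime" of a file (GNU stat, falling back to BSD stat).
stat_info() {
	stat -c '%s %Y' "$1" 2>/dev/null || stat -f '%z %m' "$1"
}

# Derive a key from size, mtime and basename, without reading content.
worm_key() {
	set -- "$1" $(stat_info "$1")
	echo "WORM-s$2-m$3--$(basename "$1")"
}

# Print the key for a file; copy its content into the cache only
# when no entry with the same key exists yet.
worm_clean() {
	key=$(worm_key "$1")
	if [ ! -e "$WORM_CACHE/$key" ]; then
		mkdir -p "$WORM_CACHE"
		cp "$1" "$WORM_CACHE/$key"
	fi
	echo "$key"
}

# When run as a filter, git passes the path (from %f) as $1.
if [ $# -gt 0 ]; then
	worm_clean "$1"
fi
```

Git still pipes the file content to the filter's stdin; this sketch never reads it, so it relies on git tolerating a filter that does not consume its input. It also does not handle the case where the working tree file holds placeholder content rather than the real thing.
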
SHA1 has a harder job. It would probably not want to re-sha1 the file
every time, so it'd need a cache of file stat info, mapped to known objects.

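
One way such a stat cache could look, as a rough sketch. `STAT_CACHE`, the one-entry-per-path layout, and the name `cached_sha1` are all hypothetical; the only idea taken from the text above is "reuse the known sha1 while size and mtime are unchanged".

```shell
#!/bin/sh
# Hypothetical stat cache: one small file per path, holding
# "size mtime sha1". The location is an assumption for this sketch.
: "${STAT_CACHE:=$HOME/.huge-statcache}"

# Print "size mtime" of a file (GNU stat, falling back to BSD stat).
stat_info() {
	stat -c '%s %Y' "$1" 2>/dev/null || stat -f '%z %m' "$1"
}

# Print the sha1 of a file, hashing it only if its stat info changed.
cached_sha1() {
	file=$1
	info=$(stat_info "$file")
	# Name the cache entry after a hash of the path, to avoid slashes.
	entry=$STAT_CACHE/$(printf '%s' "$file" | sha1sum | cut -d' ' -f1)
	if [ -r "$entry" ]; then
		read -r size mtime sha1 < "$entry"
		if [ "$size $mtime" = "$info" ]; then
			echo "$sha1"    # stat info unchanged: reuse cached sha1
			return 0
		fi
	fi
	# Cache miss or stale entry: hash the content and remember it.
	sha1=$(sha1sum < "$file" | cut -d' ' -f1)
	mkdir -p "$STAT_CACHE"
	echo "$info $sha1" > "$entry"
	echo "$sha1"
}
```

The usual caveat of stat-based caching applies: a file rewritten without any change to its size or mtime would go unnoticed.
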
2011-01-26 17:34:39 +00:00

### dealing with partial content availability

The smudge filter cannot be allowed to fail; that leaves the tree and
index in a weird state. So if a file's content is requested by calling
the smudge filter and is not present, the trick is to instead provide
dummy content indicating it is not available (and perhaps saying to run
"git-annex get").

Then the clean filter has to detect that it is cleaning a file
with that dummy content, and make sure to provide the same identifier as
it would if the file content were there.

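
The detection step could be made a little stricter than a plain substring match. The following is only a sketch: it assumes the placeholder is a single line of the form `<key> not available`, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed placeholder format: one line, "<key> not available".
# The function names are made up for this sketch.

# Emit the dummy content that smudge would write for a missing key.
make_placeholder() {
	echo "$1 not available"
}

# If the file is exactly one placeholder line, print its key and succeed;
# otherwise fail, so the caller hashes the content as usual.
placeholder_key() {
	{
		IFS= read -r first || return 1
		# Anything after the first line means this is real content.
		if IFS= read -r rest || [ -n "$rest" ]; then
			return 1
		fi
	} < "$1"
	case $first in
		?*' not available') echo "${first% not available}" ;;
		*) return 1 ;;
	esac
}
```

A clean filter could call `placeholder_key` on its saved input first and only compute a fresh identifier when it fails, which avoids mistaking real content that merely contains the words "not available" for a placeholder.
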
I've a demo implementation of this technique in the scripts below.

2011-01-26 17:40:11 +00:00

----

It may further be possible to use %f with the smudge filter
(the docs say it's supported), and instead of outputting the dummy content,
it could create a dangling symlink, which would be more like git-annex's
current behavior, and would make it easy to tell what content is not
available with `ls`.

### test files

huge-smudge:

<pre>
#!/bin/sh
# Read the identifier that git stored for this file from stdin.
read -r sha1
echo "smudging $sha1" >&2
if [ -e ~/"$sha1" ]; then
	# Content is present: emit the real file content.
	cat ~/"$sha1"
else
	# Content is missing: emit a placeholder instead of failing.
	echo "$sha1 not available"
fi
</pre>

huge-clean:

<pre>
#!/bin/sh
# Save the content being cleaned; a unique temp file avoids clobbering
# a file named "temp" in the working tree.
temp=$(mktemp)
cat >"$temp"
if grep -q 'not available' "$temp"; then
	# Placeholder content: provide what we would if the content were available.
	awk '{print $1}' "$temp"
	rm "$temp"
	exit 0
fi
sha1=$(sha1sum "$temp" | cut -d' ' -f1)
echo "cleaning $sha1" >&2
ls -l "$temp" >&2
# Store the content under its sha1 and hand git the identifier.
mv "$temp" ~/"$sha1"
echo "$sha1"
</pre>

.gitattributes:

<pre>
*.huge filter=huge
</pre>

in .git/config:

<pre>
[filter "huge"]
	clean = huge-clean
	smudge = huge-smudge
</pre>