2011-01-26 17:21:51 +00:00
|
|
|
git-annex should use smudge/clean filters.
|
|
|
|
|
2011-02-25 19:50:17 +00:00
|
|
|
The clean filter is run when files are staged for commit. So a user could copy
|
|
|
|
any file into the annex, git add it, and git-annex's clean filter causes
|
|
|
|
the file's key to be staged, while its value is added to the annex.
|
|
|
|
|
|
|
|
The smudge filter is run when files are checked out. Since git annex
|
|
|
|
repos have partial content, this would not git annex get the file content.
|
|
|
|
Instead, if the content is not currently available, it would need to do
|
|
|
|
something like return empty file content. (Sadly, it cannot create a
|
|
|
|
symlink, as git still wants to write the file afterwards.
|
|
|
|
|
|
|
|
So the nice current behavior of unavailable files being clearly missing due
|
|
|
|
to dangling symlinks, would be lost when using smudge/clean filters.
|
|
|
|
(Contact git developers to get an interface to do this?)
|
|
|
|
|
|
|
|
Instead, we get the nice behavior of not having to remeber to `git annex
|
|
|
|
add` files, and just being able to use `git add` or `git commit -a`,
|
|
|
|
and have it use git-annex when .gitattributes says to. Also, annexed
|
|
|
|
files can be directly modified without having to `git annex unlock`.
|
|
|
|
|
|
|
|
### efficiency
|
|
|
|
|
|
|
|
The trick is doing it efficiently. Since git a2b665d, v1.7.4.1,
|
2011-01-26 17:21:51 +00:00
|
|
|
something like this works to provide a filename to the clean script:
|
|
|
|
|
|
|
|
git config --global filter.huge.clean huge-clean %f
|
|
|
|
|
|
|
|
This avoids it needing to read all the current file content from stdin
|
|
|
|
when doing eg, a git status or git commit. Instead it is passed the
|
2011-02-25 19:50:17 +00:00
|
|
|
filename that git is operating on, in the working directory.
|
2011-01-26 17:21:51 +00:00
|
|
|
|
|
|
|
So, WORM could just look at that file and easily tell if it is one
|
|
|
|
it already knows (same mtime and size). If so, it can short-circuit and
|
|
|
|
do nothing, file content is already cached.
|
|
|
|
|
|
|
|
SHA1 has a harder job. Would not want to re-sha1 the file every time,
|
|
|
|
probably. So it'd need a cache of file stat info, mapped to known objects.
|
|
|
|
|
2011-01-26 17:34:39 +00:00
|
|
|
### dealing with partial content availability
|
|
|
|
|
|
|
|
The smudge filter cannot be allowed to fail, that leaves the tree and
|
|
|
|
index in a weird state. So if a file's content is requested by calling
|
|
|
|
the smudge filter, the trick is to instead provide dummy content,
|
|
|
|
indicating it is not available (and perhaps saying to run "git-annex get").
|
|
|
|
|
|
|
|
Then, in the clean filter, it has to detect that it's cleaning a file
|
|
|
|
with that dummy content, and make sure to provide the same identifier as
|
|
|
|
it would if the file content was there.
|
|
|
|
|
|
|
|
I've a demo implementation of this technique in the scripts below.
|
2011-01-26 17:21:51 +00:00
|
|
|
|
2011-01-26 17:40:11 +00:00
|
|
|
----
|
|
|
|
|
2011-01-26 17:21:51 +00:00
|
|
|
### test files
|
|
|
|
|
|
|
|
huge-smudge:
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
#!/bin/sh
|
|
|
|
read sha1
|
|
|
|
echo "smudging $sha1" >&2
|
2011-01-26 17:34:39 +00:00
|
|
|
if [ -e ~/$sha1 ]; then
|
|
|
|
cat ~/$sha1
|
|
|
|
else
|
|
|
|
echo "$sha1 not available"
|
|
|
|
fi
|
2011-01-26 17:21:51 +00:00
|
|
|
</pre>
|
|
|
|
|
|
|
|
huge-clean:
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
#!/bin/sh
|
|
|
|
cat >temp
|
2011-01-26 17:34:39 +00:00
|
|
|
if grep -q 'not available' temp; then
|
|
|
|
awk '{print $1}' temp # provide what we would if the content were avail!
|
|
|
|
rm temp
|
|
|
|
exit 0
|
|
|
|
fi
|
2011-01-26 17:21:51 +00:00
|
|
|
sha1=`sha1sum temp | cut -d' ' -f1`
|
|
|
|
echo "cleaning $sha1" >&2
|
|
|
|
ls -l temp >&2
|
|
|
|
mv temp ~/$sha1
|
|
|
|
echo $sha1
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
.gitattributes:
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
*.huge filter=huge
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
in .git/config:
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
[filter "huge"]
|
|
|
|
clean = huge-clean
|
|
|
|
smudge = huge-smudge
|
|
|
|
<pre>
|