further investigation
parent
5cb24bebf4
commit
3f7c0b6970
1 changed files with 23 additions and 9 deletions
|
@ -1,14 +1,34 @@
|
||||||
git-annex should use smudge/clean filters.
|
git-annex should use smudge/clean filters.
|
||||||
|
|
||||||
The trick is doing it efficiently. Since git a2b665d, 2011-01-05,
|
The clean filter is run when files are staged for commit. So a user could copy
|
||||||
|
any file into the annex, git add it, and git-annex's clean filter causes
|
||||||
|
the file's key to be staged, while its value is added to the annex.
|
||||||
|
|
||||||
|
The smudge filter is run when files are checked out. Since git annex
|
||||||
|
repos have partial content, this would not `git annex get` the file content.
|
||||||
|
Instead, if the content is not currently available, it would need to do
|
||||||
|
something like return empty file content. (Sadly, it cannot create a
|
||||||
|
symlink, as git still wants to write the file afterwards.)
|
||||||
|
|
||||||
|
So the nice current behavior of unavailable files being clearly missing due
|
||||||
|
to dangling symlinks, would be lost when using smudge/clean filters.
|
||||||
|
(Contact git developers to get an interface to do this?)
|
||||||
|
|
||||||
|
Instead, we get the nice behavior of not having to remember to `git annex
|
||||||
|
add` files, and just being able to use `git add` or `git commit -a`,
|
||||||
|
and have it use git-annex when .gitattributes says to. Also, annexed
|
||||||
|
files can be directly modified without having to `git annex unlock`.
|
||||||
|
|
||||||
|
### efficiency
|
||||||
|
|
||||||
|
The trick is doing it efficiently. Since git a2b665d, v1.7.4.1,
|
||||||
something like this works to provide a filename to the clean script:
|
something like this works to provide a filename to the clean script:
|
||||||
|
|
||||||
git config --global filter.huge.clean 'huge-clean %f'
|
git config --global filter.huge.clean 'huge-clean %f'
|
||||||
|
|
||||||
This avoids it needing to read all the current file content from stdin
|
This avoids it needing to read all the current file content from stdin
|
||||||
when doing, e.g., a git status or git commit. Instead it is passed the
|
when doing, e.g., a git status or git commit. Instead it is passed the
|
||||||
filename that git is operating on, I think that's from the working
|
filename that git is operating on, in the working directory.
|
||||||
directory.
|
|
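A minimal sketch of how a clean script might use the filename that git passes in place of `%f`, looking only at file metadata instead of reading the content from stdin. The function name `describe_file` is hypothetical and not part of the demo scripts, and the `stat -c` flags assume GNU coreutils (BSD `stat` takes different options).

```shell
#!/bin/sh
# Hypothetical helper for a clean script invoked as `huge-clean %f`:
# "$1" is the path git is operating on, in the working directory.
# Assumes GNU stat (coreutils); BSD stat uses different flags.

describe_file() {
    # Print "<size in bytes> <mtime in epoch seconds>" using metadata only,
    # so the file's content is never read.
    stat -c '%s %Y' -- "$1"
}

# Example run against a small temporary file.
tmp=$(mktemp)
printf 'hello' > "$tmp"
describe_file "$tmp"
rm -f "$tmp"
```

A real clean filter would still have to write the cleaned content to stdout; this only illustrates the cheap lookup that the filename argument makes possible.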
||||||
|
|
||||||
So, WORM could just look at that file and easily tell if it is one
|
So, WORM could just look at that file and easily tell if it is one
|
||||||
it already knows (same mtime and size). If so, it can short-circuit and
|
it already knows (same mtime and size). If so, it can short-circuit and
|
||||||
|
@ -32,12 +52,6 @@ I've a demo implementation of this technique in the scripts below.
|
||||||
|
|
||||||
----
|
----
|
||||||
|
|
||||||
It may further be possible to use the %f with the smudge filter
|
|
||||||
(docs say it's supported), and instead of outputting the dummy content,
|
|
||||||
it could create a dangling symlink, which would be more like git-annex's
|
|
||||||
behavior now, and makes it easy to tell what content is not available
|
|
||||||
with `ls`.
|
|
||||||
|
|
||||||
### test files
|
### test files
|
||||||
|
|
||||||
huge-smudge:
|
huge-smudge:
|
||||||
|
|