file map analysis

Joey Hess 2015-11-24 11:39:47 -04:00
parent d2e83db759
commit 3f63666727


@@ -162,8 +162,8 @@ Data:
be a message saying that the file's content is not currently available.
An annex pointer file is checked into the git repository the same way
that an annex symlink is checked in.
* file2key maps are maintained by git-annex, to keep track of
what files are pointers at keys.
* A file map is maintained by git-annex, to keep track of the keys
that are used by files in the working tree.
Configuration:
@@ -206,16 +206,16 @@ git-annex clean:
also drop that copy once the object gets uploaded to another repo ...
But that gets complicated quickly.
Update file2key map.
Update file map.
Output the pointer file content to stdout.
git-annex smudge:
* Run by eg `git checkout` and passed the filename, as well as fed
the pointer file content on stdin.
* Run by eg `git checkout`
and passed the filename, as well as fed the pointer file content on stdin.
Updates file2key map.
Update file map.
When an object is present in the annex, outputs its content to stdout.
Otherwise, outputs the file pointer content.
@@ -242,16 +242,65 @@ git annex lock/unlock:
itself to break such a hard link. Always finish by locking down the
permissions of the annex object.
All other git-annex commands that look at annex symlinks to get keys will
need to fall back to checking if a given work tree file is stored in git as
a pointer file. This can be done by checking the file2key map (or by looking
it up in the index).
#### file map
Note that I have not verified if file2key maps can be maintained
consistently using the smudge/clean filters. Seems likely to work,
based on when I see smudge/clean filters being run. The file2key
optimisation may not be needed though, looking at the index
might be fast enough.
The file map needs to map from `Key -> [File]`. `File -> Key`
seems useful to have, but in practice is not worthwhile.
Drop and get operations need to know what files in the work tree use a
given key in order to update the work tree.
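A minimal sketch of what a `Key -> [File]` map could look like, kept in memory for illustration. The `FileMap` class and its method names are hypothetical, and nothing here says how git-annex actually stores the map.

```python
from collections import defaultdict


class FileMap:
    """Hypothetical in-memory file map: key -> set of work tree files."""

    def __init__(self):
        self._files = defaultdict(set)

    def add(self, key, path):
        # Record that a work tree file uses this key.
        self._files[key].add(path)

    def remove(self, key, path):
        # Forget one file; drop the key entirely once no file uses it.
        files = self._files.get(key)
        if files is None:
            return
        files.discard(path)
        if not files:
            del self._files[key]

    def files_for(self, key):
        # Files that drop/get would need to update for this key.
        return sorted(self._files.get(key, ()))
```

Drop and get would call something like `files_for(key)` to find the work tree files to update.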
git-annex commands that look at annex symlinks to get keys to act on will
need to fall back to either consulting the file map, or looking at the staged
file to see if it's a pointer to a key. So a `File -> Key` map is a possible
optimisation.
Question: If the smudge/clean filters update the file map incrementally
based on the pointer files they generate/see, will the result
always be consistent with the content of the working tree?
This depends on when git calls the smudge/clean filters and on what.
In particular:
* Does the clean filter always get called when adding a relevant
file to git? Yes.
* Is the clean filter called at any other time? Yes, for example
git diff will clean relevant modified files to generate the diff.
So, the clean filter may see file versions that have not yet been staged
in git.
* Is the clean filter ever passed content not in the work tree?
I don't think so, but not 100% sure.
* Is the smudge filter always called when git updates a relevant file
in the work tree? Yes.
* Is the smudge filter called at any other time? Seems unlikely but then
there could be situations with a detached work tree or such.
* Does git call any useful hooks when removing a file from the work tree,
or converting it to not be annexed?
No!
From this analysis, any file map generated by the smudge/clean filters
is necessarily potentially inaccurate. It may list deleted files.
It may or may not reflect current unstaged changes from the work tree.
It follows that any use of the file map needs to verify the info from it,
and throw out bad cached info (updating the map to match reality).
When downloading a key, check if the files listed in the file map are
still pointer files in the work tree, and only replace them with the
content if so.
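The download check could be sketched roughly as below. This is only an illustration: the pointer format in `POINTER_PREFIX` is an assumption, and `read_pointer_key` and `populate_after_get` are hypothetical names, not git-annex functions.

```python
import os
import shutil

# Assumed pointer file format, for illustration only.
POINTER_PREFIX = "/annex/objects/"


def read_pointer_key(path):
    """Return the key a pointer file refers to, or None if not a pointer."""
    try:
        with open(path, "rb") as f:
            head = f.read(4096)
    except OSError:
        # Deleted files, directories etc. are simply not pointers.
        return None
    line = head.split(b"\n", 1)[0].decode("utf-8", "replace").strip()
    if line.startswith(POINTER_PREFIX):
        return line[len(POINTER_PREFIX):] or None
    return None


def populate_after_get(key, listed_files, object_path):
    """Replace files that are still pointers to key with the content.

    Returns (populated, stale); stale entries are candidates for
    removal from the file map.
    """
    populated, stale = [], []
    for path in listed_files:
        if read_pointer_key(path) == key:
            tmp = path + ".tmp"
            shutil.copyfile(object_path, tmp)
            os.replace(tmp, path)  # swap the content in with one rename
            populated.append(path)
        else:
            stale.append(path)
    return populated, stale
```

The `stale` list is the bad cached info: files that were deleted, modified, or now point at a different key.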
When dropping a key, check if the files listed for it in the file map are
unmodified in the work tree, and are staged as pointers to the key,
and only reset them to the pointers if so. Note that this means that
a modified work tree file that has not yet been staged, but that
corresponds to a key, won't be reset when the key is dropped.
This is probably not a big deal; the user will either add the
file, which will add the key back, or reset the file.
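A rough sketch of the drop-side check, under some loud assumptions: "unmodified" is approximated by comparing the work tree file byte for byte with the annex object before it is removed, the staged key comes from a caller-supplied `staged_key_of` function (hypothetical; it stands in for looking the file up in the index), and the pointer format in `POINTER_PREFIX` is assumed for illustration.

```python
import filecmp
import os

# Assumed pointer file format, for illustration only.
POINTER_PREFIX = "/annex/objects/"


def reset_before_drop(key, listed_files, object_path, staged_key_of):
    """Reset unmodified files staged as pointers to key back to pointers.

    staged_key_of(path) is a hypothetical callback returning the key the
    staged version of path points to, or None. Returns (reset, skipped).
    """
    reset, skipped = [], []
    for path in listed_files:
        try:
            unmodified = filecmp.cmp(path, object_path, shallow=False)
        except OSError:
            unmodified = False  # deleted or unreadable: leave it alone
        if unmodified and staged_key_of(path) == key:
            tmp = path + ".tmp"
            with open(tmp, "w", encoding="utf-8") as f:
                f.write(POINTER_PREFIX + key + "\n")
            # Rename rather than write in place, so a file hard linked
            # to the annex object does not clobber the object.
            os.replace(tmp, path)
            reset.append(path)
        else:
            skipped.append(path)
    return reset, skipped
```

Modified but unstaged files land in `skipped` and are left as they are, matching the behaviour described above.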
Does the `File -> Key` map have any benefits given this inaccuracy?
Answer seems to be no; any answer that map gives may be inaccurate and
needs to be verified by looking at actual repo content, so might as well
just look at the repo content in the first place.
#### Upgrading