file map analysis

2015-11-24 11:39:47 -04:00
commit 3f63666727
parent d2e83db759
1 changed files with 64 additions and 15 deletions
--- a/doc/todo/smudge.mdwn
+++ b/doc/todo/smudge.mdwn
@@ -162,8 +162,8 @@ Data:
  be a message saying that the file's content is not currently available.
  An annex pointer file is checked into the git repository the same way
  that an annex symlink is checked in.
-* file2key maps are maintained by git-annex, to keep track of
-  what files are pointers at keys.
+* A file map is maintained by git-annex, to keep track of the keys
+  that are used by files in the working tree.

 Configuration: 

@@ -206,16 +206,16 @@ git-annex clean:
  also drop that copy once the object gets uploaded to another repo ...
  But that gets complicated quickly.

-  Update file2key map.
+  Update file map.

  Output the pointer file content to stdout.

 git-annex smudge:

-* Run by eg `git checkout` and passed the filename, as well as fed
-  the pointer file content on stdin.
+* Run by eg `git checkout`
+  and passed the filename, as well as fed the pointer file content on stdin.

-  Updates file2key map.
+  Update file map.

  When an object is present in the annex, outputs its content to stdout.
  Otherwise, outputs the file pointer content.
@@ -242,16 +242,65 @@ git annex lock/unlock:
  itself to break such a hard link. Always finish by locking down the
  permissions of the annex object.

-All other git-annex commands that look at annex symlinks to get keys will
-need fall back to checking if a given work tree file is stored in git as
-pointer file. This can be done by checking the file2key map (or by looking
-it up in the index).
+#### file map

-Note that I have not verified if file2key maps can be maintained
-consistently using the smudge/clean filters. Seems likely to work,
-based on when I see smudge/clean filters being run. The file2key
-optimisation may not be needed though, looking at the index 
-might be fast enough.
+The file map needs to map from `Key -> [File]`. `File -> Key`
+seems useful to have, but in practice is not worthwhile.
+
+Drop and get operations need to know what files in the work tree use a
+given key in order to update the work tree.
+
+git-annex commands that look at annex symlinks to get keys to act on will
+need to fall back to either consulting the file map, or looking at the staged
+file to see if it's a pointer to a key. So a `File -> Key` map is a possible
+optimisation.
+
+Question: If the smudge/clean filters update the file map incrementally
+based on the pointer files they generate/see, will the result
+always be consistent with the content of the working tree?
+
+This depends on when git calls the smudge/clean filters and on what.
+In particular:
+
+* Does the clean filter always get called when adding a relevant 
+  file to git? Yes.
+* Is the clean filter called at any other time? Yes, for example
+  git diff will clean relevant modified files to generate the diff.
+  So, the clean filter may see file versions that have not yet been staged
+  in git.
+* Is the clean filter ever passed content not in the work tree?
+  I don't think so, but not 100% sure.
+* Is the smudge filter always called when git updates a relevant file
+  in the work tree? Yes.
+* Is the smudge filter called at any other time? Seems unlikely but then
+  there could be situations with a detached work tree or such.
+* Does git call any useful hooks when removing a file from the work tree,
+  or converting it to not be annexed?
+  No!
+
+From this analysis, any file map generated by the smudge/clean filters
+is necessarily potentially inaccurate. It may list deleted files.
+It may or may not reflect current unstaged changes from the work tree.
+
+It follows that any use of the file map needs to verify the info from it,
+and throw out bad cached info (updating the map to match reality).
+
+When downloading a key, check if the files listed in the file map are
+still pointer files in the work tree, and only replace them with the
+content if so. 
+
+When dropping a key, check if the files listed for it in the file map are
+unmodified in the work tree, and are staged as pointers to the key,
+and only reset them to the pointers if so. Note that this means that
+a modified work tree file that has not yet been staged, but that
+corresponds to a key, won't be reset when the key is dropped.
+This is probably not a big deal; the user will either add the
+file, which will add the key back, or reset the file.
+
+Does the `File -> Key` map have any benefits given this inaccuracy?
+Answer seems to be no; any answer that map gives may be inaccurate and
+needs to be verified by looking at actual repo content, so might as well
+just look at the repo content in the first place.

 #### Upgrading
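
An illustrative note on the `file map` section added by this change: the "verify before acting" rule for get and drop can be sketched roughly as below. This is a toy model in Python, not git-annex's implementation. The `Repo` class, the `annex-pointer:` prefix, and the names `FileMap`, `populate_on_get`, and `depopulate_on_drop` are all hypothetical. Comparing work tree content directly against the object content is an assumed stand-in for however "unmodified" would really be detected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical pointer format, for illustration only; not git-annex's real one.
POINTER_PREFIX = "annex-pointer:"


def pointer_for(key: str) -> str:
    return POINTER_PREFIX + key


def key_of_pointer(content: Optional[str]) -> Optional[str]:
    """Return the key if content looks like a pointer file, else None."""
    if content is not None and content.startswith(POINTER_PREFIX):
        return content[len(POINTER_PREFIX):]
    return None


@dataclass
class Repo:
    """Toy model: path -> content, for the work tree and for the index."""
    worktree: Dict[str, str] = field(default_factory=dict)
    staged: Dict[str, str] = field(default_factory=dict)


class FileMap:
    """Key -> set of files. Updated incrementally, so it may be stale."""

    def __init__(self) -> None:
        self._files: Dict[str, Set[str]] = {}

    def record(self, key: str, path: str) -> None:
        # What the smudge/clean filters would do when they see a pointer.
        self._files.setdefault(key, set()).add(path)

    def forget(self, key: str, path: str) -> None:
        paths = self._files.get(key)
        if paths is not None:
            paths.discard(path)
            if not paths:
                del self._files[key]

    def files_for(self, key: str) -> Set[str]:
        return set(self._files.get(key, ()))


def populate_on_get(fmap: FileMap, repo: Repo, key: str, content: str) -> List[str]:
    """After downloading key, replace only files that are still pointers to it."""
    updated = []
    for path in sorted(fmap.files_for(key)):
        if path not in repo.worktree:
            fmap.forget(key, path)  # deleted file: throw out the stale entry
        elif key_of_pointer(repo.worktree[path]) == key:
            repo.worktree[path] = content
            updated.append(path)
    return updated


def depopulate_on_drop(fmap: FileMap, repo: Repo, key: str, content: str) -> List[str]:
    """When dropping key, reset only unmodified files staged as pointers to it."""
    reset = []
    for path in sorted(fmap.files_for(key)):
        if path not in repo.worktree:
            fmap.forget(key, path)
            continue
        unmodified = repo.worktree[path] == content
        staged_as_pointer = key_of_pointer(repo.staged.get(path)) == key
        if unmodified and staged_as_pointer:
            repo.worktree[path] = pointer_for(key)
            reset.append(path)
    return reset
```

In this sketch a file that the user has edited in the work tree is left alone by both operations, and a map entry for a deleted file is discarded the first time it is consulted. This matches the conclusion in the diff that the map is only a hint and must be checked against actual repo content.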