some open questions

2018-06-14 13:42:25 -04:00 · 3f80aaea3d
commit 3f80aaea3d
parent 466d3fbaab
1 changed files with 23 additions and 6 deletions
--- a/doc/todo/import_tree.mdwn
+++ b/doc/todo/import_tree.mdwn
@@ -78,15 +78,32 @@ A good content identifier needs to:
 * Be reasonably unique, but not necessarily fully unique.  
  For example, if the mtime of a file is used as the content identifier, then
  a rename that swaps two files would be noticed, except for in the
-  unusual case where they have the same mtime. If a new file (or a copy)
+  unusual case where they have the same mtime. If a new file
  is added with the same mtime as some other file in the tree though,
-  git-annex will see that the file is new, and so can still import it, even
+  git-annex will see that the filename is new, and so can still import it,
-  though it's seen that content identifier before. Of course, that might
+  even though it's seen that content identifier before. Of course, that might
-  result in unnecessary downloads, so a more unique content identifier would
+  result in unnecessary downloads (eg of a renamed file), so a more unique
-  be better.
+  content identifier would be better.
 A (size, mtime, inode) tuple is as good a content identifier as git uses in
-its index. That or a hash of the content would be ideal.
+its index. That or a hash of the content would be ideal. 
 Do remotes need to tell git-annex about the properties of content
 identifiers they use, or does git-annex assume a minimum bar, and pay the
 price with some unnecessary transfers of renamed files etc?
 Note that git-annex will need a way to get the content identifiers of files
 that it stores on the remote when exporting a tree to it. There's a race
 here, since a file could be modified on the remote while it's being
 exported, and if the remote then uses its mtime in the content identifier,
 the modification would never be noticed. (Does git have this same race when
 updating the work tree after a merge?)
 Some remotes could avoid that race, if they sent back the content
 identifier in response to the TRANSFEREXPORT message, and kept the file
 quarantined until they had generated the content identifier. Other remotes
 probably can't avoid the race. Is it worth changing the TRANSFEREXPORT
 interface to include the content identifier in the reply?
 ----
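
To make the trade-off between a bare mtime and a (size, mtime, inode) tuple concrete, here is a minimal Python sketch of how a remote might compute such an identifier and how an importer might compare two snapshots of a tree. The names `ContentIdentifier`, `content_identifier`, and `changed_paths` are hypothetical and chosen for illustration only; this is not git-annex's implementation or part of its external remote protocol.

```python
import os
from typing import Dict, List, NamedTuple


class ContentIdentifier(NamedTuple):
    """Hypothetical (size, mtime, inode) identifier, for illustration only."""
    size: int
    mtime_ns: int
    inode: int


def content_identifier(path: str) -> ContentIdentifier:
    """Build the identifier from a single stat() call on the file."""
    st = os.stat(path)
    return ContentIdentifier(st.st_size, st.st_mtime_ns, st.st_ino)


def changed_paths(
    old: Dict[str, ContentIdentifier],
    new: Dict[str, ContentIdentifier],
) -> List[str]:
    """Return paths that are new, or whose identifier differs from the
    one recorded in the previous snapshot."""
    return sorted(p for p, cid in new.items() if old.get(p) != cid)
```

With this sketch, a rename that swaps two files with identical mtimes is invisible when only the mtime is compared, but shows up in `changed_paths` because the size and inode move along with the content. It does nothing about the export race described above: a file modified between the transfer and the `stat()` call would still be recorded with the identifier of the modified content.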