thoughts

2020-06-23 13:51:10 -04:00
commit 3da4caa785
parent 400b03115e
1 changed file with 54 additions and 7 deletions
--- a/doc/todo/import_tree_should_honor_annex.largefiles.mdwn
+++ b/doc/todo/import_tree_should_honor_annex.largefiles.mdwn
@@ -4,10 +4,57 @@ remote.
 Note that the legacy `git annex import` from a directory does honor
 annex.largefiles.
-The tricky bit might be that the largefiles matcher will need to run on
-the temporary annex key that's used to import, before calculating the real
-annex key; there's no corresponding file in the working tree. Also,
-a "branch:subdir" at the command line or in
-remote.name.annex-tracking-branch can change the path
-that the file is being imported to, which needs to be communicated to the
-largefiles matcher.
+> annex.largefiles will either need to be matched by downloadImport
+> (changing to return `Either Sha Key`), or by buildImportTrees.
+>
+> If it's done in downloadImport, to avoid re-download of non-large files,
+> the content identifier will
+> need to be recorded as using the git sha1. This needs a way to encode
+> a git sha1 as a key, that is distinct from annex sha1 keys.
 > 
 > Problem: In downloadImport, startdownload checks getcidkey
 > to see if the ContentIdentifier is already known, and if so, returns the
 > key used for it before. But, with annex.largefiles, the same content
 > might be annexed given one filename, and not annexed with another.
 > So, the key from getcidkey might not be the right one (or there could be
 > more than one, an annex key and a translated git key).
 > 
 > That argues against making downloadImport match annex.largefiles.
 > But, if instead buildImportTrees matches annex.largefiles,
 > then downloadImport has already run moveAnnex on the download,
 > so the content is in the annex. Moving it back out of the annex is
 > difficult (there may be other files in the repo using the same key).
 > So, downloadImport would then need to not moveAnnex, but move it to
 > somewhere temporary. Like the gitAnnexTmpObjectLocation, but using
 > that would be a problem if there was a file in the repo
 > and git-annex get was run on it at the same time. So an equivalent
 > but separate location.
 > 
 > Further problem: downloadImport might skip a download of a CID
 > that's already been seen. That CID might have generated a key
 > before. The key's content may not still be present in the local 
 > repo. Then, if buildImportTrees checks annex.largefiles and wants
 > to add it directly to git, it won't have the content available to add to
 > git. (Conversely, the CID may have been added to git before, but
 > annex.largefiles matches now, and so it would need to extract
 > the content from git only to store it in the annex, which is doable but
 > seems pointless as it's not going to save any space.)
 > 
 > Would it be acceptable for annex.largefiles to be ignored if the same
 > content was already imported from a remote earlier? I think maybe so.
 > 
 > Then all these problems are not a concern, and back to downloadImport
 > checking annex.largefiles being the simplest approach, since it avoids
 > needing the separate temp file location. 
 > 
 > From the user's perspective, the special remote contained a file,
 > it was already imported in the past, and the file has been renamed.
 > It makes no more sense for importing it again to change how it's
 > stored between git and annex than it makes sense for git mv of a file
 > to change how it's stored.
 > 
 > However... If two people can access the special remote, and import
 > from it at different times and get different trees as a result,
 > that might break some assumptions and would certainly lead to merge
 > conflicts. --[[Joey]]
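
The trade-off discussed in the notes above can be modelled with a small sketch. This is not git-annex code (which is Haskell); it is a hypothetical Python model, assuming that `downloadImport` consults a content-identifier map before matching annex.largefiles and returns the equivalent of `Either Sha Key`. The names `GitSha`, `AnnexKey`, `is_large`, `download_import`, and `cid_map`, and the use of a single glob as the largefiles matcher, are all illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, Union


@dataclass(frozen=True)
class GitSha:
    """Content stored directly in git (the `Sha` side of `Either Sha Key`)."""
    sha: str


@dataclass(frozen=True)
class AnnexKey:
    """Content stored in the annex (the `Key` side of `Either Sha Key`)."""
    key: str


Stored = Union[GitSha, AnnexKey]


def is_large(path: str, largefiles_glob: str) -> bool:
    # Hypothetical stand-in for the annex.largefiles matcher: one glob only.
    return fnmatch(path, largefiles_glob)


def download_import(cid: str, path: str, content: bytes,
                    cid_map: Dict[str, Stored], largefiles_glob: str) -> Stored:
    """Model of the proposed behaviour: a known content identifier wins."""
    known = cid_map.get(cid)
    if known is not None:
        # Already imported earlier: reuse the old decision and skip the
        # download, even if annex.largefiles would now decide differently.
        return known
    if is_large(path, largefiles_glob):
        # Illustrative key in the style of the SHA256 backend.
        digest = hashlib.sha256(content).hexdigest()
        stored: Stored = AnnexKey(f"SHA256-s{len(content)}--{digest}")
    else:
        # Git blob id: SHA-1 over a "blob <size>\0" header plus the content.
        header = b"blob %d\x00" % len(content)
        stored = GitSha(hashlib.sha1(header + content).hexdigest())
    cid_map[cid] = stored
    return stored
```

In this model, importing the same content identifier a second time under a path that does not match the glob still yields the original `AnnexKey`, which mirrors the "ignore annex.largefiles for already-imported content" option. It also illustrates the final concern: two users starting from different `cid_map` states could obtain different results for the same remote file.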