design work on annexobjects remotes

Joey Hess 2024-08-03 19:51:03 -04:00
parent a4a06404d4
commit fe01a1e7e1
GPG key ID: DB12DB0FF05F8F38
2 changed files with 42 additions and 16 deletions

@ -78,6 +78,42 @@ So, when calling removeExport, have to also check if the key is present in
the objects location. If so, either don't record the key as missing, or
also remove from the objects location.
----
# trust

Could a remote with annexobjects=yes and exporttree=yes but without
importtree=yes not be forced to be untrusted?

If not, the retrieval from the annexobjects location needs to do strong
verification of the content.

If the annexobjects directory only gets keys uploaded to it, and never has
exported files renamed into it, its content will always be as expected, and
perhaps the remote does not need to be untrusted.

OTOH, if an exported file that is being deleted in an updated export gets
renamed into the annexobjects directory, it's possible that the file has in
fact been overwritten with other content (by git-annex in another clone of
the repository), and so the object in annexobjects would not be as
expected. So unfortunately, it seems that rename can't be done.

Note that exporting a new tree can still delete any file at any time.
If the remote is not untrusted, that could violate numcopies.
So, performUnexport would need to check numcopies first, when using such a
remote.

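
A rough sketch of the check performUnexport would need, written in Python for brevity rather than git-annex's Haskell. The `Copy` record, its `counts` flag, and `safe_to_unexport` are hypothetical names made up for illustration, not git-annex's API. It only models the idea that an exported file may be deleted when enough counted copies remain elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One known location of a key (hypothetical model, not git-annex's types)."""
    remote: str
    counts: bool  # False for an untrusted remote, whose copy is never counted


def safe_to_unexport(copies: list[Copy], dropping_from: str, numcopies: int) -> bool:
    """True if at least `numcopies` counted copies remain on other remotes
    after the copy on `dropping_from` is removed."""
    remaining = [c for c in copies if c.remote != dropping_from and c.counts]
    return len(remaining) >= numcopies
```

With one other counted copy, unexporting would be allowed at numcopies=1 and refused at numcopies=2.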
Even if such remotes are not untrusted, an exported file can't be counted
as a copy. Only a file in the annexobjects location can be. So the remote's
checkPresent will perhaps need to return false for files that are exported?
But surely other things than numcopies use checkPresent. So this might need
a change to checkPresent's type to indicate the difference.

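
One way to picture such a type change, again as a Python sketch rather than Haskell. `Presence`, `check_present`, `is_retrievable`, and `counts_as_copy` are all hypothetical names, and the two boolean inputs stand in for whatever the remote would actually look up. The point is only that a three-way result lets numcopies ignore exported files while other callers still see them as present.

```python
from enum import Enum


class Presence(Enum):
    """Hypothetical richer result for checkPresent."""
    ABSENT = "absent"
    EXPORTED_ONLY = "exported"     # only in the exported tree; can be deleted at any time
    ANNEXOBJECTS = "annexobjects"  # stored in the annexobjects location


def check_present(in_exported_tree: bool, in_annexobjects: bool) -> Presence:
    # The annexobjects location wins, since only it can be counted as a copy.
    if in_annexobjects:
        return Presence.ANNEXOBJECTS
    if in_exported_tree:
        return Presence.EXPORTED_ONLY
    return Presence.ABSENT


def is_retrievable(p: Presence) -> bool:
    """What most callers of checkPresent care about."""
    return p is not Presence.ABSENT


def counts_as_copy(p: Presence) -> bool:
    """What numcopies checking cares about."""
    return p is Presence.ANNEXOBJECTS
```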
Crazy idea: Split the remote into two uuids. Use one for
the annexobjects directory, and the other for the exported files. This
clean separation avoids the above problem. But it would be confusing for
the user. HOWEVER, what if the two were treated as parts of the same
cluster...?

---
Implementing in the "exportreeplus" branch --[[Joey]]

@ -33,25 +33,15 @@ Planned schedule of work:
* Working on `exportreeplus` branch which is groundwork for proxying to
exporttree=yes special remotes.
* `git-annex sync` with an annexobjects=true special remote, when exporting
a subdir that contained a file, which has now been moved out of the
subdir, first unexports the file, and then re-uploads it to the remote.
This could be avoided if when unexporting, it moves to the annex objects
location.
(Might be worth doing that by default, this would let annexobjects=true
special remotes not be untrusted.)
* `git-annex sync --content` to an annexobjects=true special remote should
get and put keys that are not in the exported tree to the annexobjects
location.
* `git-annex export` when renaming an exported file to a temporary name
should use the annexobjects location.
* It would be possible for an annexobjects=true special remote to not be
untrusted, unlike usual exporttree=yes special remotes. Unexporting a
file would need to move it to the annexobjects location.
* Make annexobjects=true remotes not be untrusted, if possible. See todo.
Alternatively, if they do need to be untrusted, the retrieval from the
annexobjects location may also need to do strong verification of the
content, if exported files ever get renamed into the annexobjects
location.
## items deferred until later for p2p protocol over http ## items deferred until later for p2p protocol over http