Merge branch 'exportreeplus'

Joey Hess 2024-08-08 15:31:57 -04:00
commit 2616056cde
GPG key ID: DB12DB0FF05F8F38
40 changed files with 705 additions and 222 deletions

View file

@ -189,8 +189,9 @@ the special remote can reply with `UNSUPPORTED-REQUEST`.
a list of settings with descriptions. Note that the user is not required
to provide all the settings listed here. A block of responses
can be made to this, which must always end with `CONFIGEND`.
(Do not include settings like "encryption" that are common to all external
special remotes.)
(Do not include configs like "encryption" that are common to all external
special remotes. Also avoid including a config named "versioning"
unless using it as described in the [[export_and_import_appendix]].)
* `CONFIG Name Description`
Indicates the name and description of a config setting. The description
should be reasonably short. Example:

View file

@ -153,13 +153,6 @@ support a request, it can reply with `UNSUPPORTED-REQUEST`.
Indicates that `IMPORTKEY` can be used.
* `IMPORTKEYSUPPORTED-FAILURE`
Indicates that `IMPORTKEY` cannot be used.
* `VERSIONED`
Used to check if the special remote is versioned.
Note that this request may be made before or after `PREPARE`.
* `ISVERSIONED`
Indicates that the remote is versioned.
* `NOTVERSIONED`
Indicates that the remote is not versioned.
* `LISTIMPORTABLECONTENTS`
Used to get a list of all the files that are stored in the special
remote. A block of responses
@ -178,10 +171,9 @@ support a request, it can reply with `UNSUPPORTED-REQUEST`.
block of responses. This can be repeated any number of times
(indicating a branching history), and histories can also
be nested multiple levels deep.
This should only be used when the remote supports using
"TRANSFER RECEIVE Key" to retrieve historical versions of files.
And, it should only be used when the remote replies `ISVERSIONED`
to the `VERSIONED` message.
This should only be a response when the remote supports using
"TRANSFER RECEIVE Key" to retrieve historical versions of files,
and when "GETCONFIG versioning" yields "VALUE TRUE".
* `END`
Indicates the end of a block of responses.
* `LOCATION Name`

View file

@ -545,6 +545,10 @@ it pick which of multiple branches to export?
Perhaps configure the annex-tracking-branch in the git-annex branch?
That might be generally useful when working with exporttree=yes remotes.
Or simply configure remote.foo.annex-tracking-branch on the proxy.
This may not meet all use cases, but it's simple and seems like a
reasonable first step.
The first two approaches also have a complication when a key is sent to
the proxy that is not part of the configured annex-tracking-branch. What
does the proxy do with it? There seem to be three possibilities:
@ -610,19 +614,35 @@ were not accessible when it is accessed directly rather than via the proxy.
Simplified design for proxying to exporttree=yes, if those remotes can
store any key:
* Configure annex-tracking-branch for the proxy in the git-annex branch.
(For the proxy as a whole, or for specific exporttree=yes repos behind
it?)
* Configure annex-tracking-branch in the proxy's git config.
* Then the user's workflow is simply: `git-annex push`
* The proxy handles PUT/GET/REMOVE of a key that is not in the
annex-tracking branch that it currently knows about, by using
the special remote's .git/annex/objects/ location.
* Upon receiving a new annex-tracking-branch or any transfer of a key
used in the current annex-tracking-branch, the proxy can update
the exporttree=yes remote. This needs to happen incrementally,
eg upon receiving a key, just proxy it on to the exporttree=yes remote,
and update the export database. Once all keys are received, update
the git-annex branch to indicate a new tree has been exported.
* The proxy handles PUT by always storing to the special remote's
.git/annex/objects/ location, not updating the exported tree.
* The proxy allows REMOVE from the special remote's
.git/annex/objects/ location, but not removal of keys
that are in the currently exported tree.
* When `git-annex post-receive` is run by the post-receive hook
and the annex-tracking-branch has been updated, it exports
the tree to the special remote.
(But, `git-annex push` sends the updated tree first, so
this will often be an incomplete export.)
* When there is an incomplete export and a key is received
that is part of that export, check if it is the *last* key
that is needed to complete the export. If so, export the tree to the
special remote again.
(This avoids overhead and complication of incrementally updating
the export. It relies on the special remote supporting renameExport.
Incrementally updating the export might be worth doing eventually,
for special remotes that do not support renameExport.)
* When exporting a tree to the special remote, handle cases
where a single key is used by multiple files, and the key is not
present locally. In this case it currently fails to update
one of the files (and renames the annexobjects location to the other
one). It will need to download the content from the special remote and
send it back to it.
* When the special remote does not support renameExport, will need to
download from the annexobjects location in order to store to the export
location.
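
The "last key" check described in the list above can be sketched as follows. This is only an illustration in Python, not git-annex's implementation; `IncompleteExport`, `receive_key`, and the `export_tree` callback are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class IncompleteExport:
    """Hypothetical record of an export that could not be completed yet."""
    tree: str                                       # the tree being exported
    missing: Set[str] = field(default_factory=set)  # keys not yet received


def receive_key(export: Optional[IncompleteExport], key: str,
                export_tree: Callable[[str], None]) -> bool:
    """Note that `key` was received; re-run the export if it was the last one.

    Returns True when the export was run again.
    """
    if export is None or key not in export.missing:
        # Not part of a pending export: the key only lives in the
        # annexobjects location, and the exported tree is left alone.
        return False
    export.missing.discard(key)
    if export.missing:
        # Other keys are still needed, so the export stays incomplete.
        return False
    # This was the last missing key: export the whole tree again.
    export_tree(export.tree)
    return True
```

Re-running the whole export once, rather than updating it per key, mirrors the trade-off noted above: it is simpler, but leans on the special remote being able to move content from the annexobjects location into the exported tree.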
## possible enhancement: indirect uploads

View file

@ -14,8 +14,7 @@ Normally files are stored on a git-annex special remote named by their
keys. That is great for reliable data storage, but your filenames are
obscured. Exporting replicates the tree to the special remote as-is.
Mixing key/value storage and exports in the same remote would be a mess and
so is not allowed. You have to configure a special remote with
To use this, you have to configure a special remote with
`exporttree=yes` when initially setting it up with
[[git-annex-initremote]](1).
@ -78,6 +77,20 @@ so the overwritten modification is not lost.)
Specify the special remote to export to.
* `--from=remote`
When the content of a file is not available in the local repository,
this option lets it be downloaded from another remote, and sent on to the
destination remote. The file will be temporarily stored on local disk,
but will never enter the local repository.
This option can be repeated multiple times.
It is possible to use --from with the same remote as --to. If the tree
contains several files with the same content, and the remote being
exported to already contains one copy of the content, this allows making
a copy by downloading the content from it.
* `--tracking`
This is a deprecated way to set "remote.<name>.annex-tracking-branch".

View file

@ -17,6 +17,11 @@ for repositories that have an adjusted branch checked
out. The hook updates the work tree when run in such a repository,
the same as running `git-annex merge` would.
When a repository is configured to proxy to a special remote with
exporttree=yes, and the configured remote.name.annex-tracking-branch
is received, the hook handles updating the tree exported to the
special remote.
# OPTIONS
* The [[git-annex-common-options]](1) can be used.
@ -29,6 +34,8 @@ the same as running `git-annex merge` would.
[[git-annex-merge]](1)
[[git-annex-export]](1)
# AUTHOR
Joey Hess <id@joeyh.name>

View file

@ -92,8 +92,8 @@ See [[git-annex-preferred-content]](1).
This option can be repeated multiple times with different paths.
Note that this option is ignored when syncing with "exporttree=yes"
remotes.
Note that this option does not prevent exporting other files to an
"exporttree=yes" remote.
* `--all` `-A`

View file

@ -37,8 +37,8 @@ do so by using eg `approxlackingcopies=1`.
This option can be repeated multiple times with different paths.
Note that this option is ignored when syncing with "exporttree=yes"
remotes.
Note that this option does not prevent exporting other files to an
"exporttree=yes" remote.
* `--jobs=N` `-JN`

View file

@ -28,6 +28,14 @@ a proxy.
Proxies can only be accessed via ssh or by an annex+http url.
To set up proxying to a special remote that is configured with
exporttree=yes, it's necessary for it to also be configured with
annexobjects=yes. And, "remote.<name>.annex-tracking-branch" needs to
be configured to the branch that will be exported to the special remote.
When that branch is pushed to the proxy, it will update the tree exported
to the special remote. When files are copied to the remote via the proxy,
it will also update the exported tree.
# OPTIONS
* The [[git-annex-common-options]](1) can be used.
@ -36,6 +44,7 @@ Proxies can only be accessed via ssh or by an annex+http url.
* [[git-annex]](1)
* [[git-annex-updatecluster]](1)
* [[git-annex-export]](1)
# AUTHOR

View file

@ -351,7 +351,6 @@ content from the key-value store.
See [[git-annex-extendcluster]](1) for details.
* `updateproxy`
Update records with proxy configuration.

View file

@ -125,6 +125,11 @@ the S3 remote.
When versioning is not enabled, this risks data loss, and so git-annex
will not let you enable a remote with that configuration unless forced.
* `annexobjects` - When set to "yes" along with "exporttree=yes",
this allows storing other objects in the remote along with the
exported tree. They will be stored under .git/annex/objects/ in the
remote.
* `publicurl` - Configure the URL that is used to download files
from the bucket. Using this with a S3 bucket that has been configured
to allow anyone to download its content allows git-annex to download

View file

@ -32,6 +32,11 @@ the adb remote.
by [[git-annex-import]]. When set in combination with exporttree,
this lets files be imported from it, and changes exported back to it.
* `annexobjects` - When set to "yes" along with "exporttree=yes",
this allows storing other objects in the remote along with the
exported tree. They will be stored under .git/annex/objects/ in the
remote.
* `oldandroid` - Set to "yes" if your Android device is too old
to support `find -printf`. Enabling this will make importing slower.
If you see an error like "bad arg '-printf'", you can enable this

View file

@ -41,6 +41,11 @@ remote:
by [[git-annex-import]]. It will not be usable as a general-purpose
special remote.
* `annexobjects` - When set to "yes" along with "exporttree=yes",
this allows storing other objects in the remote along with the
exported tree. They will be stored under .git/annex/objects/ in the
directory.
* `ignoreinodes` - Usually when importing, the inode numbers
of files are used to detect when files have changed. Since some
filesystems generate new inode numbers each time they are mounted,

View file

@ -32,6 +32,9 @@ for a list of known working combinations.
Setting this does not allow trees to be exported to the httpalso remote,
because it's read-only. But it does let exported files be downloaded
from it.
* `annexobjects` - If the other special remote has `annexobjects=yes`
set (along with `exporttree=yes`), it also needs to be set when
initializing the httpalso remote.
Configuration of encryption and chunking is inherited from the other
special remote, and does not need to be specified when initializing the

View file

@ -26,6 +26,11 @@ These parameters can be passed to `git annex initremote` to configure rsync:
by [[git-annex-export]]. It will not be usable as a general-purpose
special remote.
* `annexobjects` - When set to "yes" along with "exporttree=yes",
this allows storing other objects in the remote along with the
exported tree. They will be stored under .git/annex/objects/ in the
remote.
* `shellescape` - Optional. This has no effect when using rsync 3.2.4 or
newer. Set to "no" to avoid shell escaping
normally done when using older versions of rsync over ssh. That escaping

View file

@ -33,6 +33,11 @@ the webdav remote.
by [[git-annex-export]]. It will not be usable as a general-purpose
special remote.
* `annexobjects` - When set to "yes" along with "exporttree=yes",
this allows storing other objects in the remote along with the
exported tree. They will be stored under .git/annex/objects/ in the
remote.
* `chunk` - Enables [[chunking]] when storing large files.
* `chunksize` - Deprecated version of chunk parameter above.

View file

@ -16,9 +16,6 @@ keys, in order to support exporttree=yes remotes.
Another place this would be useful is
[[proxying to exporttree=yes special remotes|design/passthrough_proxy]].
This could also solve [[todo/export_paired_rename_innefficenctcy]]
cleanly.
With this change, a user could just `git-annex copy --to remote`
and copy whatever files they want into it. Then later
`git-annex export master --to remote` would efficiently update the tree
@ -52,6 +49,13 @@ surprising for an existing user!
Perhaps this should not be "exporttree=yes", but something else.
> Currently, if a remote is configured with "exporttree=foo", that
> is treated the same as "exporttree=no". So this will need to be
> a config added to exporttree=yes in order to interoperate
> with old git-annex.
>
> Call it "exporttree=yes annexobjects=yes" --[[Joey]]
----
Consider two repositories A and B that both have access to the same
@ -60,16 +64,58 @@ exporttree=yes special remote R.
* A exports tree T1 to R
* B pulls from A, so knows R has tree T1
* A exports tree T2 to R, which deletes file `foo`. So
it is moved to R's .git/annex/objects/
it is moved to R's .git/annex/objects. Or, alternatively,
`foo` is deleted, and the key is then copied to R again,
also to .git/annex/objects.
* B exports tree T2 to R also. So B deletes file `foo`. But it was not
present anyway. If B then marks the key as not present in R, we will have
lost track of the fact that A moved it to the objects location.
So, when calling removeExport, have to also check if the key is present in
the objects location. If so, don't record the key as missing. (Of course,
it already checks if some other exported file also has the content of the
key.)
the objects location. If so, either don't record the key as missing, or
also remove from the objects location.
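
As a rough sketch of the first option (don't record the key as missing), assuming the exported tree is modelled as a path-to-key mapping and the objects location as a set of keys. `remove_export` here is a hypothetical Python stand-in for illustration, not the actual removeExport code:

```python
from typing import Dict, Set


def remove_export(path: str, exported: Dict[str, str],
                  annexobjects: Set[str]) -> bool:
    """Delete the exported file at `path`.

    Returns True only if the key should now be recorded as missing
    from the remote.
    """
    key = exported.pop(path)
    if key in exported.values():
        # Some other exported file still has the content of the key.
        return False
    if key in annexobjects:
        # The key is still stored in the objects location.
        return False
    return True
```

In the A and B scenario above, B removing `foo` would find the key in the objects location and so would leave the location tracking untouched.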
----
Could a remote with annexobjects=yes and exporttree=yes but without
importtree=yes not be forced to be untrusted?
If not, the retrieval from the annexobjects location needs to do strong
verification of the content.
If the annexobjects directory only gets keys uploaded to it, and never had
exported files renamed into it, its content will always be as expected, and
perhaps the remote does not need to be untrusted.
OTOH, if an exported file that is being deleted in an
updated export gets renamed into the annexobjects directory, it's possible
that the file has in fact been overwritten with other content (by git-annex
in another clone of the repository), and so the object in annexobjects
would not be as expected. So unfortunately, it seems that rename can't be
done without forcing untrusted.
Note that exporting a new tree can still delete any file at any time.
If the remote is not untrusted, that could violate numcopies.
So, performUnexport would need to check numcopies first, when using such a
remote.
Even if they are not untrusted, an exported file can't be counted as a
copy. Only a file in the annexobjects location can be. So the remote's
checkPresent will perhaps need to return false for files that are exported?
But surely other things than numcopies use checkPresent. So this might need
a change to checkPresent's type to indicate the difference.
Crazy idea: Split the remote into two uuids. Use one for
the annexobjects directory, and the other for the exported files. This
clean separation avoids the above problem. But would be confusing for the
user. HOWEVER, what if the two were treated as parts of the same cluster....?
This may be worth revisiting later, but for now, I am leaning to keeping it
untrusted, and following down that line to make it as performant as
possible.
---
Implementing in the "exportreeplus" branch --[[Joey]]
> [[done]] --[[Joey]]

View file

@ -31,7 +31,30 @@ Planned schedule of work:
## work notes
* Working on `exportreeplus` branch which is groundwork for proxying to
exporttree=yes special remotes.
exporttree=yes special remotes. Need to merge it to master.
## completed items for August
* Special remotes configured with exporttree=yes annexobjects=yes
can store objects in .git/annex/objects, as well as an exported tree.
* Support proxying to special remotes configured with
exporttree=yes annexobjects=yes.
* post-receive: When proxying is enabled for an exporttree=yes
special remote and the configured remote.name.annex-tracking-branch
is received, the tree is exported to the special remote.
* When getting from a P2P HTTP remote, prompt for credentials when
required, instead of failing.
* Prevent `updateproxy` and `updatecluster` from adding
an exporttree=yes special remote that does not have
annexobjects=yes, to avoid foot shooting.
* Implement `git-annex export treeish --to=foo --from=bar`, which
gets from bar as needed to send to foo. Make post-receive use
`--to=r --from=r` to handle the multiple files case.
## items deferred until later for p2p protocol over http