Merge branch 'exportreeplus'
commit 2616056cde
40 changed files with 705 additions and 222 deletions
|
@ -189,8 +189,9 @@ the special remote can reply with `UNSUPPORTED-REQUEST`.
|
|||
a list of settings with descriptions. Note that the user is not required
|
||||
to provide all the settings listed here. A block of responses
|
||||
can be made to this, which must always end with `CONFIGEND`.
|
||||
(Do not include settings like "encryption" that are common to all external
|
||||
special remotes.)
|
||||
(Do not include config like "encryption" that are common to all external
|
||||
special remotes. Also avoid including a config named "versioning"
|
||||
unless using it as described in the [[export_and_import_appendix]].)
|
||||
* `CONFIG Name Description`
|
||||
Indicates the name and description of a config setting. The description
|
||||
should be reasonably short. Example:
|
||||
|
|
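As a rough illustration of the `LISTCONFIGS` guidance in the hunk above, here is a minimal Python sketch of how an external special remote might assemble its block of responses. The function name, the `uses_versioning_appendix` flag, and the sample settings are hypothetical and not part of the protocol; only the `CONFIG Name Description` and `CONFIGEND` lines come from the documentation.

```python
# Settings that are common to all external special remotes and so
# should not be listed by the remote itself.
COMMON_SETTINGS = {"encryption"}


def listconfigs_reply(settings, uses_versioning_appendix=False):
    """Build the lines sent in response to LISTCONFIGS.

    settings is a list of (name, description) pairs (illustrative data).
    uses_versioning_appendix is a hypothetical flag: set it only when the
    remote uses "versioning" as described in the export and import appendix.
    """
    lines = []
    for name, description in settings:
        if name in COMMON_SETTINGS:
            continue  # common to all external special remotes
        if name == "versioning" and not uses_versioning_appendix:
            continue  # avoid this name unless it is used as the appendix describes
        lines.append(f"CONFIG {name} {description}")
    lines.append("CONFIGEND")  # the block must always end with CONFIGEND
    return lines


if __name__ == "__main__":
    for line in listconfigs_reply([("directory", "store data here")]):
        print(line)
```

Filtering inside the helper is just one way to follow the advice; a remote could equally leave those names out of its settings table.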
|
@ -153,13 +153,6 @@ support a request, it can reply with `UNSUPPORTED-REQUEST`.
|
|||
Indicates that `IMPORTKEY` can be used.
|
||||
* `IMPORTKEYSUPPORTED-FAILURE`
|
||||
Indicates that `IMPORTKEY` cannot be used.
|
||||
* `VERSIONED`
|
||||
Used to check if the special remote is versioned.
|
||||
Note that this request may be made before or after `PREPARE`.
|
||||
* `ISVERSIONED`
|
||||
Indicates that the remote is versioned.
|
||||
* `NOTVERSIONED`
|
||||
Indicates that the remote is not versioned.
|
||||
* `LISTIMPORTABLECONTENTS`
|
||||
Used to get a list of all the files that are stored in the special
|
||||
remote. A block of responses
|
||||
|
@ -178,10 +171,9 @@ support a request, it can reply with `UNSUPPORTED-REQUEST`.
|
|||
block of responses. This can be repeated any number of times
|
||||
(indicating a branching history), and histories can also
|
||||
be nested multiple levels deep.
|
||||
This should only be used when the remote supports using
|
||||
"TRANSFER RECEIVE Key" to retrieve historical versions of files.
|
||||
And, it should only be used when the remote replies `ISVERSIONED`
|
||||
to the `VERSIONED` message.
|
||||
This should only be a response when the remote supports using
|
||||
"TRANSFER RECEIVE Key" to retrieve historical versions of files,
|
||||
and when "GETCONFIG versioning" yields "VALUE TRUE".
|
||||
* `END`
|
||||
Indicates the end of a block of responses.
|
||||
* `LOCATION Name`
|
||||
|
|
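The condition for reporting historical versions in the hunk above can be sketched as a small predicate. This is only an illustration: the function and parameter names are hypothetical, and it assumes the reply to `GETCONFIG versioning` arrives as a single text line.

```python
def may_report_history(getconfig_reply, can_retrieve_old_versions):
    """Decide whether a history block may be included in the response.

    getconfig_reply is the line received after sending "GETCONFIG versioning".
    can_retrieve_old_versions says whether this remote supports using
    "TRANSFER RECEIVE Key" to retrieve historical versions of files.
    """
    versioning_enabled = getconfig_reply.strip() == "VALUE TRUE"
    return can_retrieve_old_versions and versioning_enabled
```

Both conditions must hold; a remote that cannot retrieve old versions should not report history even when versioning is configured.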
|
@ -545,6 +545,10 @@ it pick which of multiple branches to export?
|
|||
Perhaps configure the annex-tracking-branch in the git-annex branch?
|
||||
That might be generally useful when working with exporttree=yes remotes.
|
||||
|
||||
Or simply configure remote.foo.annex-tracking-branch on the proxy.
|
||||
This may not meet all use cases, but it's simple and seems like a
|
||||
reasonable first step.
|
||||
|
||||
The first two approaches also have a complication when a key is sent to
|
||||
the proxy that is not part of the configured annex-tracking-branch. What
|
||||
does the proxy do with it? There seem to be three possibilities:
|
||||
|
@ -610,19 +614,35 @@ were not accessible when it is accessed directly rather than via the proxy.
|
|||
Simplified design for proxying to exporttree=yes, if those remotes can
|
||||
store any key:
|
||||
|
||||
* Configure annex-tracking-branch for the proxy in the git-annex branch.
|
||||
(For the proxy as a whole, or for specific exporttree=yes repos behind
|
||||
it?)
|
||||
* Configure annex-tracking-branch in the proxy's git config.
|
||||
* Then the user's workflow is simply: `git-annex push`
|
||||
* The proxy handles PUT/GET/REMOVE of a key that is not in the
|
||||
annex-tracking branch that it currently knows about, by using
|
||||
the special remote's .git/annex/objects/ location.
|
||||
* Upon receiving a new annex-tracking-branch or any transfer of a key
|
||||
used in the current annex-tracking-branch, the proxy can update
|
||||
the exporttree=yes remote. This needs to happen incrementally,
|
||||
eg upon receiving a key, just proxy it on to the exporttree=yes remote,
|
||||
and update the export database. Once all keys are received, update
|
||||
the git-annex branch to indicate a new tree has been exported.
|
||||
* The proxy handles PUT by always storing to the special remote's
|
||||
.git/annex/objects/ location, not updating the exported tree.
|
||||
* The proxy allows REMOVE from the special remote's
|
||||
.git/annex/objects/ location, but not removal of keys
|
||||
that are in the currently exported tree.
|
||||
* When `git-annex post-receive` is run by the post-receive hook
|
||||
and the annex-tracking-branch has been updated, it exports
|
||||
the tree to the special remote.
|
||||
(But, `git-annex push` sends the updated tree first, so
|
||||
this will often be an incomplete export.)
|
||||
* When there is an incomplete export and a key is received
|
||||
that is part of that export, check if it is the *last* key
|
||||
that is needed to complete the export. If so, export the tree to the
|
||||
special remote again.
|
||||
(This avoids overhead and complication of incrementally updating
|
||||
the export. It relies on the special remote supporting renameExport.
|
||||
Incrementally updating the export might be worth doing eventually,
|
||||
for special remotes that do not support renameExport.)
|
||||
* When exporting a tree to the special remote, handle cases
|
||||
where a single key is used by multiple files, and the key is not
|
||||
present locally. In this case it currently fails to update
|
||||
one of the files (and renames the annexobjects location to the other
|
||||
one). It will need to download the content from the special remote and
|
||||
send it back to it.
|
||||
* When the special remote does not support renameExport, will need to
|
||||
download from the annexobjects location in order to store to the export
|
||||
location.
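
The "last key" check described in the list above could look roughly like this. It is a sketch under assumptions, not how git-annex implements it: keys are modelled as plain strings, and the function name and both sets are hypothetical stand-ins for what the proxy learns from the tracking branch and the export database.

```python
def completes_export(received_key, tree_keys, present_keys):
    """Return True if received_key is the last key an incomplete export needs.

    tree_keys: keys used by files in the annex-tracking-branch's tree.
    present_keys: keys already stored on the special remote before
    received_key arrived.
    """
    if received_key not in tree_keys:
        return False  # not part of the export; it only goes to annexobjects
    missing = tree_keys - present_keys
    # The export becomes complete only when this key was the sole one missing.
    return missing == {received_key}
```

When this returns True, the proxy would export the tree to the special remote again, as the list describes.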
|
||||
|
||||
## possible enhancement: indirect uploads
|
||||
|
||||
|
|
|
@ -14,8 +14,7 @@ Normally files are stored on a git-annex special remote named by their
|
|||
keys. That is great for reliable data storage, but your filenames are
|
||||
obscured. Exporting replicates the tree to the special remote as-is.
|
||||
|
||||
Mixing key/value storage and exports in the same remote would be a mess and
|
||||
so is not allowed. You have to configure a special remote with
|
||||
To use this, you have to configure a special remote with
|
||||
`exporttree=yes` when initially setting it up with
|
||||
[[git-annex-initremote]](1).
|
||||
|
||||
|
@ -78,6 +77,20 @@ so the overwritten modification is not lost.)
|
|||
|
||||
Specify the special remote to export to.
|
||||
|
||||
* `--from=remote`
|
||||
|
||||
When the content of a file is not available in the local repository,
|
||||
this option lets it be downloaded from another remote, and sent on to the
|
||||
destination remote. The file will be temporarily stored on local disk,
|
||||
but will never enter the local repository.
|
||||
|
||||
This option can be repeated multiple times.
|
||||
|
||||
It is possible to use --from with the same remote as --to. If the tree
|
||||
contains several files with the same content, and the remote being
|
||||
exported to already contains one copy of the content, this allows making
|
||||
a copy by downloading the content from it.
|
||||
|
||||
* `--tracking`
|
||||
|
||||
This is a deprecated way to set "remote.<name>.annex-tracking-branch".
|
||||
|
|
|
@ -17,6 +17,11 @@ for repositories that have an adjusted branch checked
|
|||
out. The hook updates the work tree when run in such a repository,
|
||||
the same as running `git-annex merge` would.
|
||||
|
||||
When a repository is configured to proxy to a special remote with
|
||||
exporttree=yes, and the configured remote.name.annex-tracking-branch
|
||||
is received, the hook handles updating the tree exported to the
|
||||
special remote.
|
||||
|
||||
# OPTIONS
|
||||
|
||||
* The [[git-annex-common-options]](1) can be used.
|
||||
|
@ -29,6 +34,8 @@ the same as running `git-annex merge` would.
|
|||
|
||||
[[git-annex-merge]](1)
|
||||
|
||||
[[git-annex-export]](1)
|
||||
|
||||
# AUTHOR
|
||||
|
||||
Joey Hess <id@joeyh.name>
|
||||
|
|
|
@ -92,8 +92,8 @@ See [[git-annex-preferred-content]](1).
|
|||
|
||||
This option can be repeated multiple times with different paths.
|
||||
|
||||
Note that this option is ignored when syncing with "exporttree=yes"
|
||||
remotes.
|
||||
Note that this option does not prevent exporting other files to an
|
||||
"exporttree=yes" remote.
|
||||
|
||||
* `--all` `-A`
|
||||
|
||||
|
|
|
@ -37,8 +37,8 @@ do so by using eg `approxlackingcopies=1`.
|
|||
|
||||
This option can be repeated multiple times with different paths.
|
||||
|
||||
Note that this option is ignored when syncing with "exporttree=yes"
|
||||
remotes.
|
||||
Note that this option does not prevent exporting other files to an
|
||||
"exporttree=yes" remote.
|
||||
|
||||
* `--jobs=N` `-JN`
|
||||
|
||||
|
|
|
@ -28,6 +28,14 @@ a proxy.
|
|||
|
||||
Proxies can only be accessed via ssh or by an annex+http url.
|
||||
|
||||
To set up proxying to a special remote that is configured with
|
||||
exporttree=yes, it's necessary for it to also be configured with
|
||||
annexobjects=yes. And, "remote.<name>.annex-tracking-branch" needs to
|
||||
be configured to the branch that will be exported to the special remote.
|
||||
When that branch is pushed to the proxy, it will update the tree exported
|
||||
to the special remote. When files are copied to the remote via the proxy,
|
||||
it will also update the exported tree.
|
||||
|
||||
# OPTIONS
|
||||
|
||||
* The [[git-annex-common-options]](1) can be used.
|
||||
|
@ -36,6 +44,7 @@ Proxies can only be accessed via ssh or by an annex+http url.
|
|||
|
||||
* [[git-annex]](1)
|
||||
* [[git-annex-updatecluster]](1)
|
||||
* [[git-annex-export]](1)
|
||||
|
||||
# AUTHOR
|
||||
|
||||
|
|
|
@ -351,7 +351,6 @@ content from the key-value store.
|
|||
|
||||
See [[git-annex-extendcluster]](1) for details.
|
||||
|
||||
|
||||
* `updateproxy`
|
||||
|
||||
Update records with proxy configuration.
|
||||
|
|
|
@ -125,6 +125,11 @@ the S3 remote.
|
|||
When versioning is not enabled, this risks data loss, and so git-annex
|
||||
will not let you enable a remote with that configuration unless forced.
|
||||
|
||||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||||
this allows storing other objects in the remote along with the
|
||||
exported tree. They will be stored under .git/annex/objects/ in the
|
||||
remote.
|
||||
|
||||
* `publicurl` - Configure the URL that is used to download files
|
||||
from the bucket. Using this with a S3 bucket that has been configured
|
||||
to allow anyone to download its content allows git-annex to download
|
||||
|
|
|
@ -32,6 +32,11 @@ the adb remote.
|
|||
by [[git-annex-import]]. When set in combination with exporttree,
|
||||
this lets files be imported from it, and changes exported back to it.
|
||||
|
||||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||||
this allows storing other objects in the remote along with the
|
||||
exported tree. They will be stored under .git/annex/objects/ in the
|
||||
remote.
|
||||
|
||||
* `oldandroid` - Set to "yes" if your Android device is too old
|
||||
to support `find -printf`. Enabling this will make importing slower.
|
||||
If you see an error like "bad arg '-printf'", you can enable this
|
||||
|
|
|
@ -41,6 +41,11 @@ remote:
|
|||
by [[git-annex-import]]. It will not be usable as a general-purpose
|
||||
special remote.
|
||||
|
||||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||||
this allows storing other objects in the remote along with the
|
||||
exported tree. They will be stored under .git/annex/objects/ in the
|
||||
directory.
|
||||
|
||||
* `ignoreinodes` - Usually when importing, the inode numbers
|
||||
of files are used to detect when files have changed. Since some
|
||||
filesystems generate new inode numbers each time they are mounted,
|
||||
|
|
|
@ -32,6 +32,9 @@ for a list of known working combinations.
|
|||
Setting this does not allow trees to be exported to the httpalso remote,
|
||||
because it's read-only. But it does let exported files be downloaded
|
||||
from it.
|
||||
* `annexobjects` - If the other special remote has `annexobjects=yes`
|
||||
set (along with `exporttree=yes`), it also needs to be set when
|
||||
initializing the httpalso remote.
|
||||
|
||||
Configuration of encryption and chunking is inherited from the other
|
||||
special remote, and does not need to be specified when initializing the
|
||||
|
|
|
@ -26,6 +26,11 @@ These parameters can be passed to `git annex initremote` to configure rsync:
|
|||
by [[git-annex-export]]. It will not be usable as a general-purpose
|
||||
special remote.
|
||||
|
||||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||||
this allows storing other objects in the remote along with the
|
||||
exported tree. They will be stored under .git/annex/objects/ in the
|
||||
remote.
|
||||
|
||||
* `shellescape` - Optional. This has no effect when using rsync 3.2.4 or
|
||||
newer. Set to "no" to avoid shell escaping
|
||||
normally done when using older versions of rsync over ssh. That escaping
|
||||
|
|
|
@ -33,6 +33,11 @@ the webdav remote.
|
|||
by [[git-annex-export]]. It will not be usable as a general-purpose
|
||||
special remote.
|
||||
|
||||
* `annexobjects` - When set to "yes" along with "exporttree=yes",
|
||||
this allows storing other objects in the remote along with the
|
||||
exported tree. They will be stored under .git/annex/objects/ in the
|
||||
remote.
|
||||
|
||||
* `chunk` - Enables [[chunking]] when storing large files.
|
||||
|
||||
* `chunksize` - Deprecated version of chunk parameter above.
|
||||
|
|
|
@ -16,9 +16,6 @@ keys, in order to support exporttree=yes remotes.
|
|||
Another place this would be useful is
|
||||
[[proxying to exporttree=yes special remotes|design/passthrough_proxy]].
|
||||
|
||||
This could also solve [[todo/export_paired_rename_innefficenctcy]]
|
||||
cleanly.
|
||||
|
||||
With this change, a user could just `git-annex copy --to remote`
|
||||
and copy whatever files they want into it. Then later
|
||||
`git-annex export master --to remote` would efficiently update the tree
|
||||
|
@ -52,6 +49,13 @@ surprising for an existing user!
|
|||
|
||||
Perhaps this should not be "exporttree=yes", but something else.
|
||||
|
||||
> Currently, if a remote is configured with "exporttree=foo", that
|
||||
> is treated the same as "exporttree=no". So this will need to be
|
||||
> a config added to exporttree=yes in order to interoperate
|
||||
> with old git-annex.
|
||||
>
|
||||
> Call it "exporttree=yes annexobjects=yes" --[[Joey]]
|
||||
|
||||
----
|
||||
|
||||
Consider two repositories A and B that both have access to the same
|
||||
|
@ -60,16 +64,58 @@ exporttree=yes special remote R.
|
|||
* A exports tree T1 to R
|
||||
* B pulls from A, so knows R has tree T1
|
||||
* A exports tree T2 to R, which deletes file `foo`. So
|
||||
it is moved to R's .git/annex/objects/
|
||||
it is moved to R's .git/annex/objects. Or, alternatively,
|
||||
`foo` is deleted, and the key is then copied to R again,
|
||||
also to .git/annex/objects.
|
||||
* B exports tree T2 to R also. So B deletes file `foo`. But it was not
|
||||
present anyway. If B then marks the key as not present in R, we will have
|
||||
lost track of the fact that A moved it to the objects location.
|
||||
|
||||
So, when calling removeExport, have to also check if the key is present in
|
||||
the objects location. If so, don't record the key as missing. (Of course,
|
||||
it already checks if some other exported file also has the content of the
|
||||
key.)
|
||||
the objects location. If so, either don't record the key as missing, or
|
||||
also remove from the objects location.
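
A minimal sketch of the first option above (not recording the key as missing), assuming the caller already knows whether the key is in the objects location. The function and parameter names are hypothetical; the check for other exported files with the same content reflects what the text says removeExport already does.

```python
def record_key_missing(in_objects_location, other_exported_files_with_key=0):
    """Decide whether to record a key as missing after removeExport
    deleted one exported file that had the key's content.
    """
    if other_exported_files_with_key > 0:
        return False  # another exported file still has the content
    if in_objects_location:
        return False  # the content survives in the objects location
    return True
```

The second option would instead remove the key from the objects location too, and then record it as missing.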
|
||||
|
||||
----
|
||||
|
||||
Could a remote with annexobjects=yes and exporttree=yes but without
|
||||
importtree=yes not be forced to be untrusted?
|
||||
|
||||
If not, the retrieval from the annexobjects location needs to do strong
|
||||
verification of the content.
|
||||
|
||||
If the annexobjects directory only gets keys uploaded to it, and never had
|
||||
exported files renamed into it, its content will always be as expected, and
|
||||
perhaps the remote does not need to be untrusted.
|
||||
|
||||
OTOH, if an exported file that is being deleted in an
|
||||
updated export gets renamed into the annexobjects directory, it's possible
|
||||
that the file has in fact been overwritten with other content (by git-annex
|
||||
in another clone of the repository), and so the object in annexobjects
|
||||
would not be as expected. So unfortunately, it seems that rename can't be
|
||||
done without forcing untrusted.
|
||||
|
||||
Note that exporting a new tree can still delete any file at any time.
|
||||
If the remote is not untrusted, that could violate numcopies.
|
||||
So, performUnexport would need to check numcopies first, when using such a
|
||||
remote.
|
||||
|
||||
Even if they are not untrusted, an exported file can't be counted as a
|
||||
copy. Only a file in the annexobjects location can be. So the remote's
|
||||
checkPresent will perhaps need to return false for files that are exported?
|
||||
But surely other things than numcopies use checkPresent. So this might need
|
||||
a change to checkPresent's type to indicate the difference.
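
To make the idea of a richer checkPresent result concrete, here is a hypothetical sketch in Python (git-annex itself is written in Haskell, and nothing below is its actual type). The enum and helper names are invented for illustration only.

```python
from enum import Enum


class Presence(Enum):
    """Hypothetical three-way result for a presence check."""
    NOT_PRESENT = "not present"
    EXPORTED = "present only as an exported file"
    ANNEXOBJECTS = "present in the annexobjects location"


def counts_as_copy(presence):
    # Only content in the annexobjects location can count toward numcopies.
    return presence is Presence.ANNEXOBJECTS


def is_retrievable(presence):
    # Other users of the check only care whether the content can be found.
    return presence is not Presence.NOT_PRESENT
```

With a result like this, numcopies checking could use `counts_as_copy`, while other callers keep the plain present-or-not behaviour via `is_retrievable`.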
|
||||
|
||||
Crazy idea: Split the remote into two uuids. Use one for
|
||||
the annexobjects directory, and the other for the exported files. This
|
||||
clean separation avoids the above problem. But would be confusing for the
|
||||
user. HOWEVER, what if the two were treated as parts of the same cluster....?
|
||||
|
||||
This may be worth revisiting later, but for now, I am leaning to keeping it
|
||||
untrusted, and following down that line to make it as performant as
|
||||
possible.
|
||||
|
||||
---
|
||||
|
||||
Implementing in the "exportreeplus" branch --[[Joey]]
|
||||
|
||||
> [[done]] --[[Joey]]
|
||||
|
|
|
@ -31,7 +31,30 @@ Planned schedule of work:
|
|||
## work notes
|
||||
|
||||
* Working on `exportreeplus` branch which is groundwork for proxying to
|
||||
exporttree=yes special remotes.
|
||||
exporttree=yes special remotes. Need to merge it to master.
|
||||
|
||||
## completed items for August
|
||||
|
||||
* Special remotes configured with exporttree=yes annexobjects=yes
|
||||
can store objects in .git/annex/objects, as well as an exported tree.
|
||||
|
||||
* Support proxying to special remotes configured with
|
||||
exporttree=yes annexobjects=yes.
|
||||
|
||||
* post-receive: When proxying is enabled for an exporttree=yes
|
||||
special remote and the configured remote.name.annex-tracking-branch
|
||||
is received, the tree is exported to the special remote.
|
||||
|
||||
* When getting from a P2P HTTP remote, prompt for credentials when
|
||||
required, instead of failing.
|
||||
|
||||
* Prevent `updateproxy` and `updatecluster` from adding
|
||||
an exporttree=yes special remote that does not have
|
||||
annexobjects=yes, to avoid foot shooting.
|
||||
|
||||
* Implement `git-annex export treeish --to=foo --from=bar`, which
|
||||
gets from bar as needed to send to foo. Make post-receive use
|
||||
`--to=r --from=r` to handle the multiple files case.
|
||||
|
||||
## items deferred until later for p2p protocol over http
|
||||
|
||||
|
|