web: Add urlinclude and urlexclude configuration settings
Sponsored-by: Dartmouth College's DANDI project
parent 8d06930c88
commit 6fa166e1fc
5 changed files with 139 additions and 16 deletions
@ -12,3 +12,16 @@ This special remote uses urls on the web as the source for content.
There are several other ways http can be used to download annexed objects,
including a git remote accessible by http, S3 with a `publicurl` configured,
and the [[httpalso]] special remote.

## configuration

These parameters can be passed to `git annex initremote` or
`git-annex enableremote` to configure a web remote:

* `urlinclude` - Only use urls that match the specified glob.
  For example, `urlinclude="https://s3.amazonaws.com/*"`
  Note: Globs are matched case-insensitively.
* `urlexclude` - Don't use urls that match the specified glob.
  For example, to prohibit http urls, but allow https,
  use `urlexclude="http:*"`
  Note: Globs are matched case-insensitively.
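
To make the matching rules above concrete, here is a rough shell sketch of case-insensitive glob matching against urls. It only approximates the behaviour described: it leans on POSIX `case` patterns rather than git-annex's own glob matcher, and `url_matches` is a made-up helper name, not part of git-annex.

```shell
# Hypothetical helper (not a git-annex command): succeeds when URL matches
# GLOB, ignoring case. Approximates urlinclude/urlexclude matching using
# POSIX shell patterns, where `*` also matches `/`.
url_matches() {
    _g=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    _u=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$_u" in
        $_g) return 0 ;;   # unquoted so the glob is treated as a pattern
        *)   return 1 ;;
    esac
}

# urlexclude="http:*" would catch plain http urls, whatever their case...
url_matches 'http:*' 'HTTP://example.com/file' && echo "excluded"
# ...but not https urls, because "https:" does not start with "http:".
url_matches 'http:*' 'https://example.com/file' || echo "still allowed"
```

Under these assumptions the first call prints `excluded` and the second prints `still allowed`.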
@ -1,3 +1,7 @@
[[!toc ]]

## basic use

The web can be used as a [[special_remote|special_remotes]] too.

    # git annex addurl http://example.com/video.mpeg
@ -48,7 +52,7 @@ You can also attach urls to any file already in the annex:
    00000000-0000-0000-0000-000000000001 -- web
    27a9510c-760a-11e1-b9a0-c731d2b77df9 -- here

## configuring filenames
## configuring addurl filenames

By default, `addurl` will generate a filename for you. You can use
`--file=` to specify the filename to use.
@ -115,3 +119,25 @@ to work.
## podcasts

This is done using `git annex importfeed`. See [[downloading podcasts]].

## configuring which url is used when there are several

An annexed file can have content at multiple urls that git-annex knows
about, and git-annex may use any of those urls for downloading a file.

If some urls are especially fast, you might want to configure
which urls git-annex prefers to use first. To accomplish that,
you can create additional web special remotes that are
configured to only use the fast urls. Then it's simply a matter of
giving those remotes a lower cost than the web special remote.

For example, suppose that you want to prioritize using urls on "fasthost.com".

    git-annex initremote --sameas=web fasthost type=web urlinclude='*//fasthost.com/*'
    git config remote.fasthost.annex-cost 150

Now, `git-annex get` of a file that is on both fasthost.com and another url
will prefer to use the fasthost special remote, rather than the web special
remote (which has a higher cost of 200), and so will use the fasthost.com
url. If that url is not available, it will fall back to the web special
remote, and use the other url.
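
The fallback order described above can be modelled with a small shell sketch. This is an illustrative toy, not how git-annex is implemented: the `REMOTES` table, the `download_order` and `url_matches` helpers, and the assumption that urls within one remote are tried in the order given are all made up for the example.

```shell
# Hypothetical helper: case-insensitive glob match, approximating urlinclude.
url_matches() {
    _g=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    _u=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$_u" in
        $_g) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical table, one "cost name urlinclude-glob" entry per line.
# The plain web remote has cost 200 and accepts any url.
REMOTES='200 web *
150 fasthost *//fasthost.com/*'

# Print "remote url" pairs in the order they would be tried:
# remotes sorted by ascending cost, each offering only the urls it accepts.
download_order() {
    printf '%s\n' "$REMOTES" | sort -n | while read -r cost name glob; do
        for url in "$@"; do
            if url_matches "$glob" "$url"; then
                printf '%s %s\n' "$name" "$url"
            fi
        done
    done
}

download_order https://other.example/video.mpeg https://fasthost.com/video.mpeg
# fasthost https://fasthost.com/video.mpeg
# web https://other.example/video.mpeg
# web https://fasthost.com/video.mpeg
```

In this model the fasthost.com url is tried first through the cheaper remote, and the web remote only comes into play if that attempt fails.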
@ -0,0 +1,49 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2023-01-09T20:48:18Z"
content="""
I've implemented support for multiple web special remotes,
and have added configurations urlinclude= and urlexclude=
(both case-insensitive globs).

Example use:

    git-annex initremote --sameas=web fastweb type=web urlinclude='*//fasthost.com/*' autoenable=true
    git config remote.fastweb.annex-cost 150

And then `git-annex get --from fastweb` will only use urls on that host,
not any other urls. `git-annex get --from web` will still use any urls.
The cost of 150 makes `git-annex get` use fastweb before web.

That's enough to handle the example you gave, just use
`urlinclude='*//dandiarchive.s3.amazonaws.com/*'`

---

But, I don't think this is quite sufficient. Because it should also be
possible to deprioritize urls. And there's not a good way to do that yet.

In particular, this doesn't work:

    git-annex initremote --sameas=web slowweb type=web urlinclude='*//slowhost.com/*' autoenable=true
    git config remote.slowweb.annex-cost 300

Because when getting a file, the main web special remote is tried before
this high-cost slowweb one, and will use any url, including
slowhost.com urls.

Now you can instead do this:

    git-annex initremote --sameas=web fastweb type=web urlexclude='*//slowhost.com/*' autoenable=true
    git config remote.fastweb.annex-cost 150

But when there's a second slow host, that approach falls down, because you
can't specify urlexclude= twice. And even if you could, there would be the
same distributed config merging issue discussed in comment #3.

I think what's needed is for the main web special remote to notice that a
web remote such as fastweb or slowweb exists, and automatically exclude
the urls that the other web remote is configured to use. Which
will be a little bit tricky to implement, but seems doable.
"""]]