git-annex/doc/todo/extensible_addurl.mdwn

2014-12-04 18:56:51 +00:00
`git annex addurl` supports regular urls, as well as detecting videos that
quvi can download. We'd like to make this extensible, so that other kinds
of uris can be handled.

Use cases range from torrent download support, to pulling data
from scientific data repositories that use their own APIs.

The basic idea is to have external special remotes (or perhaps built-in
ones in some cases), which addurl can use to download an object, referred
to by some uri-like thing. The uri starts with "$downloader:" to indicate
that it's not a regular url and so is not handled by the web special
remote.

    git annex addurl torrent:$foo
    git annex addurl CERN:$bar

Problem: This requires mapping from the name of the downloader to the UUID
of a remote. (The name is probably, but not always, the same as the
git-annex-remote-$downloader program implementing the special remote
protocol.) That's assuming we want location tracking to be able to know
that a file is both available from CERN and from a torrent, for example.

Solution: Add a new method to remotes:

    claimUrl :: Maybe (URLString -> Annex Bool)

Remotes that implement this method (including special remotes) will
be queried when such a uri is added, to see which one claims it.

Once the remote is known, addurl --file will record that the Key is present
on that remote, and record the uri in the url log.
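
The claiming step could be sketched roughly as follows. This is only an
illustration under stated assumptions: `Remote` here is a cut-down stand-in
for the real type, the actions run in `IO` rather than `Annex`, the
`torrentRemote`, `cernRemote` and `webRemote` values and the `findClaimer`
helper are hypothetical, and treating the web remote as the fallback for
unclaimed urls is an assumption too.

```haskell
import Data.List (isPrefixOf)

type URLString = String

-- Simplified stand-in for a git-annex remote (assumption: the real type
-- has many more fields, and its actions run in the Annex monad).
data Remote = Remote
  { remoteName :: String
  , claimUrl   :: Maybe (URLString -> IO Bool)
  }

-- Hypothetical remotes, each claiming uris by their "$downloader:" prefix.
torrentRemote :: Remote
torrentRemote = Remote "torrent" (Just (pure . ("torrent:" `isPrefixOf`)))

cernRemote :: Remote
cernRemote = Remote "cern" (Just (pure . ("CERN:" `isPrefixOf`)))

-- Hypothetical fallback that does not implement claimUrl at all.
webRemote :: Remote
webRemote = Remote "web" Nothing

-- Query each remote in turn; the first one that claims the uri wins.
-- Remotes without a claimUrl method are skipped, and a uri that nobody
-- claims goes to the fallback remote.
findClaimer :: Remote -> [Remote] -> URLString -> IO Remote
findClaimer fallback [] _ = pure fallback
findClaimer fallback (r:rs) url = case claimUrl r of
  Nothing -> findClaimer fallback rs url
  Just claims -> do
    claimed <- claims url
    if claimed then pure r else findClaimer fallback rs url
```

With these definitions, `findClaimer webRemote [torrentRemote, cernRemote]`
picks `cernRemote` for `CERN:$bar` and falls back to `webRemote` for an
ordinary http url.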

----

What about using addurl to add a new file? In this mode, the Key is not yet
known. addurl currently handles this by generating a dummy Key for the url
(hitting the url to get its size), and running a Transfer using the dummy
key that downloads from the web. Once the download is done, the dummy Key
is upgraded to the final Key.

Something similar could be done for other remotes, but the url log for the
dummy key would need to have the url added to it, for the remote to know
what to download, and then that could be removed after the download. That
would cause ugly churn in git, and would leave a mess if interrupted.

One option is to add another new method to remotes:

    downloadUrl :: Maybe (URLString -> Annex FilePath)

Or, the url log could have support added for recording temporary key
urls in memory. (done)

Another problem is that the size of the Key isn't known. addurl
could always operate in relaxed mode, where it generates a size-less Key.
Or, yet another method could be added: (done)

    sizeUrl :: URLString -> Annex (Maybe Integer)
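
How the two size options fit together might look something like this. It is
a sketch, not the actual implementation: `Key` is reduced to a url plus an
optional size, `dummyKey` is a hypothetical helper name, and the size lookup
runs in `IO` instead of `Annex`.

```haskell
type URLString = String

-- Simplified stand-in for a git-annex Key (assumption: only the url and
-- an optional size are modelled).
data Key = Key
  { keyUrl  :: URLString
  , keySize :: Maybe Integer
  } deriving (Eq, Show)

-- Build the dummy Key used while downloading. In relaxed mode the size
-- lookup is skipped and the Key is size-less; otherwise the remote's
-- sizeUrl is asked, and an unknown size also gives a size-less Key.
dummyKey :: Bool -> (URLString -> IO (Maybe Integer)) -> URLString -> IO Key
dummyKey relaxed sizeUrl url
  | relaxed   = pure (Key url Nothing)
  | otherwise = Key url <$> sizeUrl url
```

Either way the caller gets a Key back, so the rest of addurl does not need
to care whether the remote could report a size.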

----

Retrieval of the Key works more or less as usual. The only
difference is that remotes that support this interface can look
at the url log to find the url with the right "$downloader:" prefix,
and so know where to download from. (Much as the web special remote already
does.)
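
As a rough sketch of that lookup, assuming the url log for a Key can be
read as a plain list of strings (`urlsFor` is a hypothetical helper, not a
git-annex function):

```haskell
import Data.List (isPrefixOf)

type URLString = String

-- Keep only the logged urls that carry this downloader's prefix.
-- The url log is modelled as a plain list for illustration.
urlsFor :: String -> [URLString] -> [URLString]
urlsFor downloader = filter ((downloader ++ ":") `isPrefixOf`)
```

A torrent remote would call `urlsFor "torrent"` on the Key's logged urls
and ignore the entries belonging to other remotes.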

Prerequisite: Expand the external special remote interface to support
accessing the url log. (done)

----

It would also be nice to be able to easily configure a regexp, so that
normal urls matching it are made to use a particular downloader. So, for
torrents, this would make matching urls have torrent: prefixed to them.

    git config annex.downloader.torrent.regexp '(^magnet:|\.torrent$)'
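
The rewriting this implies could be sketched as below. To keep the example
within base, the regexp is hand-translated into prefix and suffix checks
rather than run through a regexp library, so it only illustrates the
rewriting step; `Rule`, `torrentRule` and `rewriteUrl` are hypothetical
names.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

type URLString = String

-- A downloader name paired with a predicate standing in for its regexp.
type Rule = (String, URLString -> Bool)

-- Hand-translation of '(^magnet:|\.torrent$)' (assumption: a real
-- implementation would compile the configured regexp instead).
torrentRule :: Rule
torrentRule =
  ("torrent", \u -> "magnet:" `isPrefixOf` u || ".torrent" `isSuffixOf` u)

-- Prefix the url with the name of the first downloader whose rule
-- matches; urls that match no rule are left unchanged.
rewriteUrl :: [Rule] -> URLString -> URLString
rewriteUrl rules url = case filter (\(_, matches) -> matches url) rules of
  ((name, _):_) -> name ++ ":" ++ url
  []            -> url
```

After rewriting, the url carries the "torrent:" prefix and can be claimed
by the matching remote like any other "$downloader:" uri.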

It might also be useful to allow bypassing the complexity of the external
special remote interface, and let a downloader be specified simply by:

    git config annex.downloader.torrent.command 'aria2c %url $file'

This could be implemented in either the web special remote or even in an
external special remote.

Some other discussion at <https://github.com/datalad/datalad/issues/10>
> [[done]]! --[[Joey]]