git-annex/doc/design/requests_routing.mdwn

## requesting content
In some situations, nodes only want particular files, and not everything.
(Or don't have the bandwidth to get everything.) One way to handle this,
which should work in a fully ad-hoc, offline distributed network,
was suggested by Vincenzo Tozzi:
* Nodes generate a request for a specific file they want, committed
to git somewhere.
* This request has a TTL (e.g. 3 or 4).
* When syncing, copy the requests that a node has, and decrease their TTL
by 1. Requests with a TTL of 0 have timed out and are not copied.
(So, requests are stored in git, but on, e.g., per-node branches.)
* Only copy content to nodes that have a request for it (either one
originating with them, or one they copied from another node).
* Each request indicates the requesting node, so once no nodes have an
active request for a particular file, it's ok to drop it from the
transfer nodes (honoring numcopies etc of course).
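
The steps above can be sketched as a small model. This is only an illustration, not git-annex code: the `Request` type and the `sync_requests` and `wanted_keys` helpers are hypothetical names, and it assumes one reading of the TTL rule, namely that a copy whose TTL would drop to 0 is not made, and that a node keeps the highest TTL it has seen for a given request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    key: str     # the file being requested
    origin: str  # node where the request originated
    ttl: int     # remaining hops


def sync_requests(source, dest):
    """Return dest's set of requests after it syncs from source.

    Assumption: each copy gets TTL - 1, and a copy that would end up
    with a TTL of 0 has timed out and is not made.
    """
    merged = {(r.key, r.origin): r for r in dest}
    for r in source:
        ttl = r.ttl - 1
        if ttl <= 0:
            continue  # timed out, not copied
        known = merged.get((r.key, r.origin))
        # Assumption: keep whichever copy has the most hops left.
        if known is None or ttl > known.ttl:
            merged[(r.key, r.origin)] = Request(r.key, r.origin, ttl)
    return set(merged.values())


def wanted_keys(requests):
    """Keys a node should accept content for: its own or copied requests."""
    return {r.key for r in requests}
```

With a starting TTL of 3, a request made on node A is copied to a neighbour with TTL 2, to that neighbour's neighbour with TTL 1, and no further.
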
## simulation
A simulation of a network using this method is in [[simroutes.hs]].
Question: How efficient is this method? Does the network fill with many
copies that are not needed, before the request is fulfilled?
## storing requests
Requests could be stored in the location tracking file.
Currently:

    time 0|1 uuid1
    time 0|1 uuid2

* Use negative numbers for the TTL of a request:

        time -3! uuid1
        time -2 uuid2

  The `!` indicates that the request originated on
  that node.
* To propagate a request, set -1 * (TTL+1) in the line
for the uuid of the repository that is propagating it.
This should be done as part of the git-annex branch merge,
so if a location tracking file is merged, any open requests
get propagated to the current repository automatically.
* When a requested file reaches a node that requested it,
the location is set to 1; this automatically clears the
request.
* When a file has no more originating requests, clear all
  the copied requests:

        time 1 uuid1
        time -2 uuid2

  Becomes:

        time 1 uuid1
        time' 0 uuid2
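
The log handling described in this section could look roughly like the following. It is a minimal sketch under stated assumptions, not the git-annex implementation: `Entry`, `parse_line`, `format_entry`, `propagate` and `clear_stale` are hypothetical names, the timestamp is treated as an opaque string, and the stored number is assumed to be the negated TTL, so that propagating `-3` writes `-2` as in the example lines above. A request whose TTL would reach 0 is not propagated, since 0 already means "not present".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    time: str                 # opaque timestamp
    uuid: str                 # repository the line is about
    present: bool = False     # True for a "1" line
    ttl: int = 0              # > 0 means an open request
    originated: bool = False  # True when the value carries "!"


def parse_line(line):
    time, value, uuid = line.split()
    originated = value.endswith("!")
    number = int(value.rstrip("!"))
    if number < 0:
        return Entry(time, uuid, ttl=-number, originated=originated)
    return Entry(time, uuid, present=(number == 1))


def format_entry(e):
    if e.ttl > 0:
        value = f"-{e.ttl}" + ("!" if e.originated else "")
    else:
        value = "1" if e.present else "0"
    return f"{e.time} {value} {e.uuid}"


def propagate(entries, here, now):
    """On merge, copy the strongest open request to repository `here`."""
    # Simplification: leave the log alone if `here` already has the
    # content or already holds a request.
    if any(e.uuid == here and (e.present or e.ttl > 0) for e in entries):
        return entries
    others = [e for e in entries if e.uuid != here]
    best = max((e.ttl for e in others), default=0)
    if best - 1 <= 0:
        return entries  # nothing to copy, or the request has timed out
    return others + [Entry(now, here, ttl=best - 1)]


def clear_stale(entries, now):
    """Reset copied requests to 0 once no originating request remains."""
    if any(e.ttl > 0 and e.originated for e in entries):
        return entries
    return [replace(e, time=now, ttl=0) if e.ttl > 0 else e
            for e in entries]
```

Clearing writes a new timestamp for the reset line, which is how the sketch reads the `time'` in the example above.
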
## generating requests

    git annex request [file...]

Indicates that the file is wanted in the current repository.
(`git annex get` could also do this on failure, or suggest doing this.)
## acting on requests
Add a preferred content expression that looks at request data:

    requestedby=N

Matches files that have been requested by at least N nodes.

    requested

Matches files that the current node has requested.
### Example preferred content expressions
For an immobile node that accumulates files it requests, and also
temporarily stores files requested by other such nodes:

    present or requestedby=1

For a node that only transfers files between the immobile nodes:

    requestedby=1

For an immobile node that only accumulates files it requests, but never
stores files requested by other nodes:

    present or requested
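
To make the semantics of the three expressions concrete, here is a small sketch of how they might be evaluated for a single file. The function names and the `requesters` argument (the set of nodes with an open request for the file) are assumptions made for illustration; they are not part of git-annex's preferred content matcher.

```python
def requestedby(requesters, n):
    """requestedby=N: at least n nodes have requested the file."""
    return len(requesters) >= n


def requested(requesters, here):
    """requested: the current node is one of the requesters."""
    return here in requesters


def accumulate_and_relay(present, requesters, here):
    # present or requestedby=1
    return present or requestedby(requesters, 1)


def transfer_only(present, requesters, here):
    # requestedby=1
    return requestedby(requesters, 1)


def accumulate_own_only(present, requesters, here):
    # present or requested
    return present or requested(requesters, here)
```

For a file requested only by node `B`, evaluated on node `A` which does not have it, the first two expressions match and the third does not. Once nobody requests the file any more, the transfer-only node no longer wants it, which is what allows it to be dropped there.
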
TODO: Would be nice to be able to prioritize files that more nodes are
requesting, or that have some urgent flag set. But currently there is no
way to do that; content is either preferred or not preferred.