This is a summary todo covering several subprojects, which would extend
git-annex to be able to use proxies which sit in front of a cluster of
repositories.

1. [[design/passthrough_proxy]]
2. [[design/p2p_protocol_over_http]]
3. [[design/balanced_preferred_content]]
4. [[todo/track_free_space_in_repos_via_git-annex_branch]]
5. [[todo/proving_preferred_content_behavior]]

## table of contents

[[!toc ]]

## planned schedule

Joey has received funding to work on this.
Planned schedule of work:

* June: git-annex proxies and clusters
* July: p2p protocol over http
* August, part 1: git-annex proxy support for exporttree
* August, part 2: [[track_free_space_in_repos_via_git-annex_branch]]
* September, part 1: balanced preferred content
* September, part 2: streaming through proxy to special remotes (especially S3)
* October, part 1: streaming through proxy continued
* October, part 2: proving behavior of balanced preferred content with proxies

[[!tag projects/openneuro]]

## work notes

* `git-annex assist --rebalance` of `balanced=foo:2`
  sometimes needs several runs to stabilize.

  May not be a bug, needs reproducing and analysis.

* Concurrency issues with RepoSizes calculation and balanced content:

  * What if 2 concurrent threads are considering sending two different
    keys to a repo at the same time? It can hold either but not both.
    It should avoid sending both in this situation.

  * There can also be a race with 2 concurrent threads where one just
    finished sending to a repo, but has not yet updated the location log.
    So the other one won't see an updated repo size.

    The fact that location log changes happen in CommandCleanup makes
    this difficult to fix.

    Could provisionally update Annex.reposizes before starting to send a
    key, and roll it back if the send fails. But then Logs.Location
    would update Annex.reposizes redundantly. So would need to remember
    the provisional update was made until that is called... But what if it
    is never called for some reason?

    Also, in a race between two threads at the checking preferred content
    stage, neither would have started sending yet, and so both would think
    it was ok for them to.

    This race only really matters when the repo becomes full. Then the
    second thread will either fail to send because it's full, or will
    send more than the configured maxsize. Still this would be good to
    fix.

  * If all the above thread concurrency problems are fixed, separate
    processes will still have concurrency problems. One case where that is
    bad is a cluster accessed via ssh. Each connection to the cluster is
    a separate process. So each will be unaware of changes made by others.
    When `git-annex copy --to cluster -Jn` is used, this makes a single
    command behave non-ideally, the same as the thread concurrency
    problems.

  * Possible solution:

    Add to reposizes db a table for live updates.
    Listing process ID, thread ID, UUID, key, addition or removal
    (done)

    Make checking the balanced preferred content limit record a
    live update in the table (done)

    ... and use other live updates in making its decision

    Note: This will only work when preferred content is being checked.
    If a git-annex copy without --auto is run, for example, it won't
    tell other processes that it is in the process of filling up a remote.
    That seems ok though, because if the user is running a command like
    that, they are ok with a remote filling up.

    Make sure that two threads don't check balanced preferred content at the
    same time, so each thread always sees a consistent picture of what is
    happening. Use locking as necessary.

    In the unlikely event that one thread of a process is storing a key and
    another thread is dropping the same key from the same uuid, at the same
    time, reconcile somehow. How? Or is this perhaps something that cannot
    happen? Could just record the liveupdate for one, and not for the
    other.

    Also keep an in-memory cache of the live updates being performed by
    the current process. For use in location log update as follows.

    Make updating location log for a key that is in the in-memory cache
    of the live update table update the db, removing it from that table,
    and updating the in-memory reposizes. (done)

    Make updating location log have locking to make sure redundant
    information is never visible:
    Take lock, journal update, remove from live update table.

    Detect when an upload (or drop) fails, and remove from the live
    update table and in-memory cache. (done)

    Have a counter in the reposizes table that is updated on write. This
    can be used to quickly determine if it has changed. On every check of
    balanced preferred content, check the counter, and if it's been changed
    by another process, re-run calcRepoSizes. This would be expensive, but
    it would only happen when another process is running at the same time.
    The counter could also be a per-UUID counter, so two processes
    operating on different remotes would not have overhead.

    When loading the live update table, check if PIDs in it are still
    running (and are still git-annex), and if not, remove stale entries
    from it, which can accumulate when processes are interrupted.
    Note that it will be ok for the wrong git-annex process, running again
    at a pid, to keep a stale item in the live update table, because that
    is unlikely and exponentially unlikely to happen repeatedly, so stale
    information will only be used for a short time.

    But then, how to check if a PID is git-annex or not? /proc of course,
    but what about other OS's? Windows?

    How? Possibly have a thread that
    waits on an empty MVar. Thread MVar through somehow to location log
    update. (Seems this would need checking preferred content to return
    the MVar? Or alternatively, the MVar could be passed into it, which
    seems better..) Fill MVar on location log update. If MVar gets
    GCed without being filled, the thread will get an exception and can
    remove from table and cache then. This does rely on GC behavior, but if
    the GC takes some time, it will just cause a failed upload to take
    longer to get removed from the table and cache, which will just prevent
    another upload of a different key from running immediately.
    (Need to check if MVar GC behavior operates like this.
    See https://stackoverflow.com/questions/10871303/killing-a-thread-when-mvar-is-garbage-collected )
    Perhaps stale entries can be found in a different way. Require the live
    update table to be updated with a timestamp every 5 minutes. The thread
    that waits on the MVar can do that, as long as the transfer is running. If
    interrupted, it will become stale in 5 minutes, which is probably good
    enough? Could do it every minute, depending on overhead. This could
    also be done by just repeatedly touching a file named with the process's
    pid in it, to avoid sqlite overhead.

* Still implementing LiveUpdate. Check for TODO XXX markers

* Could two processes both doing the same operation end up both
  calling successfullyFinishedLiveSizeChange with the same repo uuid and
  key? If so, the rolling total would get out of whack.

  Logs.Location.logChange only calls updateRepoSize when the presence
  actually changed. So if one process does something and then the other
  process also does the same thing (eg both drop), the second process
  will see what the first process recorded, and won't update the size
  redundantly.

  But: What if they're running at the same time? It seems
  likely that Annex.Branch.maybeChange does not handle that in a way
  that will guarantee this doesn't happen. Does anything else guarantee
  it?

  Can additional locking be added to avoid it? Probably, but it
  will add overhead and so should be avoided in the NoLiveUpdate case.

* In the case where a copy to a remote fails (due eg to annex.diskreserve),
  the LiveUpdate thread may not get a chance to catch its exception when
  the LiveUpdate is gced, before git-annex exits. In this case, the
  database is left with some stale entries in the live update table.

  This is not a big problem because the same can happen when the process is
  interrupted. Still it would be cleaner for this not to happen. Is there
  any way to prevent it? Waiting 1 GC tick before exiting would do it,
  I'd think, but I tried manually doing a performGC at git-annex shutdown
  and it didn't help.

* The assistant is using NoLiveUpdate, but it should be possible to plumb
  a LiveUpdate through it from preferred content checking to location log
  updating.

* `git-annex info` in the limitedcalc path in cachedAllRepoData
  double-counts redundant information from the journal due to using
  overLocationLogs. In the other path it does not, and this should be fixed
  for consistency and correctness.

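The live update table from the "Possible solution" notes above could look roughly like the following. This is a minimal Python sketch, not git-annex's Haskell implementation: the table layout, the function names (`try_reserve`, `finish`, `prune_stale`) and the `is_alive` callback are all assumptions made for illustration. It relies on sqlite's `BEGIN IMMEDIATE` so that only one checker at a time can read the pending updates and record its own.

```python
import os
import sqlite3
import threading

# Hypothetical schema: one row per transfer that is in progress.
SCHEMA = """
CREATE TABLE IF NOT EXISTS live_updates (
    pid   INTEGER NOT NULL,
    tid   INTEGER NOT NULL,
    uuid  TEXT    NOT NULL,
    key   TEXT    NOT NULL,
    delta INTEGER NOT NULL  -- positive for addition, negative for removal
)
"""


def open_db(path=":memory:"):
    # isolation_level=None: transactions are started explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def try_reserve(conn, uuid, key, size, recorded_size, maxsize):
    """Record a live addition of `size` bytes to `uuid` if it still fits.

    `recorded_size` is the size already known from the location log.
    """
    conn.execute("BEGIN IMMEDIATE")  # takes the write lock up front
    try:
        (pending,) = conn.execute(
            "SELECT COALESCE(SUM(delta), 0) FROM live_updates WHERE uuid = ?",
            (uuid,),
        ).fetchone()
        fits = recorded_size + pending + size <= maxsize
        if fits:
            conn.execute(
                "INSERT INTO live_updates VALUES (?, ?, ?, ?, ?)",
                (os.getpid(), threading.get_native_id(), uuid, key, size),
            )
        conn.execute("COMMIT")
        return fits
    except BaseException:
        conn.execute("ROLLBACK")
        raise


def finish(conn, uuid, key):
    """Remove this process's live update, either because the location log
    now records the change or because the transfer failed."""
    conn.execute(
        "DELETE FROM live_updates WHERE pid = ? AND uuid = ? AND key = ?",
        (os.getpid(), uuid, key),
    )


def prune_stale(conn, is_alive):
    """Drop rows left behind by processes that are no longer running.

    How `is_alive` decides that is left open, as in the notes above.
    """
    pids = [row[0] for row in conn.execute("SELECT DISTINCT pid FROM live_updates")]
    for pid in pids:
        if not is_alive(pid):
            conn.execute("DELETE FROM live_updates WHERE pid = ?", (pid,))
```

With a 100 byte maxsize, a first 60 byte reservation succeeds and a second one is refused until `finish` removes the first row. As in the notes, this only protects callers that go through the preferred content check.
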
## completed items for August's work on balanced preferred content

* Balanced preferred content basic implementation, including --rebalance
  option.
* Implemented [[track_free_space_in_repos_via_git-annex_branch]]
* `git-annex maxsize`
* annex.fullybalancedthreshhold

## completed items for August's work on git-annex proxy support for exporttree

* Special remotes configured with exporttree=yes annexobjects=yes
  can store objects in .git/annex/objects, as well as an exported tree.

* Support proxying to special remotes configured with
  exporttree=yes annexobjects=yes.

* post-retrieve: When proxying is enabled for an exporttree=yes
  special remote and the configured remote.name.annex-tracking-branch
  is received, the tree is exported to the special remote.

* When getting from a P2P HTTP remote, prompt for credentials when
  required, instead of failing.

* Prevent `updateproxy` and `updatecluster` from adding
  an exporttree=yes special remote that does not have
  annexobjects=yes, to avoid foot shooting.

* Implement `git-annex export treeish --to=foo --from=bar`, which
  gets from bar as needed to send to foo. Make post-retrieve use
  `--to=r --from=r` to handle the multiple files case.

## items deferred until later for p2p protocol over http

* `git-annex p2phttp` should support serving several repositories at the same
  time (not as proxied remotes), so that eg, every git-annex repository
  on a server can be served on the same port.

* Support proxying to git remotes that use annex+http urls. This needs a
  translation from P2P protocol to servant-client to P2P protocol.

* Should be possible to use a git-remote-annex annex::$uuid url as
  remote.foo.url with remote.foo.annexUrl using annex+http, and so
  not need a separate web server to serve the git repository. Doesn't work
  currently because git-remote-annex urls only support special remotes.
  It would need a new form of git-remote-annex url, eg:
  `annex::$uuid?annex+http://example.com/git-annex/`

* `git-annex p2phttp` could support systemd socket activation. This would
  allow making a systemd unit that listens on port 80.

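For the socket activation item, the general mechanism is that systemd passes already-bound sockets as file descriptors starting at 3, and announces them in the `LISTEN_PID` and `LISTEN_FDS` environment variables. The following Python sketch shows how a server might pick up such a socket and fall back to binding its own port otherwise. The function names are hypothetical, and this does not describe anything `git-annex p2phttp` currently does.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor, per sd_listen_fds(3)


def inherited_fds(environ=os.environ, pid=None):
    """Return the descriptors systemd passed to this process, or []."""
    pid = os.getpid() if pid is None else pid
    try:
        # LISTEN_PID guards against the variables leaking to child processes.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listening_socket(port, environ=os.environ):
    """Use the first inherited socket if there is one, else bind `port`."""
    fds = inherited_fds(environ)
    if fds:
        # Family and type are detected from the descriptor itself.
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.listen()
    return sock
```

Because systemd binds the port itself, the server process would not need privileges to listen on port 80.
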
## completed items for July's work on p2p protocol over http

* HTTP P2P protocol design [[design/p2p_protocol_over_http]].

* addressed [[doc/todo/P2P_locking_connection_drop_safety]]

* implemented server and client for HTTP P2P protocol

* added git-annex p2phttp command to serve HTTP P2P protocol

* Make git-annex p2phttp support https.

* Allow using annex+http urls in remote.name.annexUrl

* Make http server support proxying.

* Make http server support serving a cluster.

## items deferred until later for [[design/passthrough_proxy]]

* Check annex.diskreserve when proxying for special remotes
  to avoid the proxy's disk filling up with the temporary object file
  cached there.

* Resuming an interrupted download from proxied special remote makes the proxy
  re-download the whole content. It could instead keep some of the
  object files around when the client does not send SUCCESS. This would
  use more disk, but without streaming, proxying a special remote already
  needs some disk. And it could minimize to eg, the last 2 or so.
  The design doc has some more thoughts about this.

* Streaming download from proxied special remotes. See design.
  (Planned for September)

* When an upload to a cluster is distributed to multiple special remotes,
  a temporary file is written for each one, which may even happen in
  parallel. This is a lot of extra work and may use excess disk space.
  It should be possible to only write a single temp file.
  (With streaming this won't be an issue.)

* Indirect uploads when proxying for special remote
  (to be considered). See design.

* Getting a key from a cluster currently picks from among
  the lowest cost remotes at random. This could be smarter,
  eg prefer to avoid using remotes that are doing other transfers at the
  same time.

* The cost of a proxied node that is accessed via an intermediate gateway
  is currently the same as a node accessed via the cluster gateway.
  To fix this, there needs to be some way to tell how many hops through
  gateways it takes to get to a node. Currently the only way is to
  guess based on number of dashes in the node name, which is not satisfying.

  Even counting hops is not very satisfying, one cluster gateway could
  be much more expensive to traverse than another one.

  If seriously tackling this, it might be worth making enough information
  available to use spanning tree protocol for routing inside clusters.

* Optimise proxy speed. See design for ideas.

* Speed: A proxy to a local git repository spawns git-annex-shell
  to communicate with it. It would be more efficient to operate
  directly on the Remote. Especially when transferring content to/from it.
  But: When a cluster has several nodes that are local git repositories,
  and is sending data to all of them, this would need an alternative
  interface to `storeKey` that supports streaming chunks
  of a ByteString.

* Use `sendfile()` to avoid data copying overhead when
  `receiveBytes` is being fed right into `sendBytes`.
  Library to use:
  <https://hackage.haskell.org/package/hsyscall-0.4/docs/System-Syscall.html>

* Support using a proxy when its url is a P2P address.
  (Eg tor-annex remotes.)

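The "could be smarter" idea for getting a key from a cluster, mentioned in the list above, can be sketched as follows. This is a hedged Python illustration only: the `(name, cost)` pairs, the `active_transfers` mapping, and the function name `pick_node` are hypothetical, and how a gateway would learn about transfers in progress on its nodes is not addressed.

```python
import random


def pick_node(nodes, active_transfers, rng=random):
    """Pick a node to get a key from.

    nodes: list of (name, cost) pairs for the nodes that have the key.
    active_transfers: dict mapping node name to transfers in progress.
    Among the lowest cost nodes, the least busy ones are preferred, and
    any remaining tie is broken at random, as is done currently.
    """
    if not nodes:
        return None
    lowest = min(cost for _, cost in nodes)
    cheapest = [name for name, cost in nodes if cost == lowest]
    least_busy = min(active_transfers.get(name, 0) for name in cheapest)
    candidates = [n for n in cheapest if active_transfers.get(n, 0) == least_busy]
    return rng.choice(candidates)
```

Cost still takes priority here, so a busy low cost node is chosen over an idle higher cost one. Whether that is the right trade-off is an open question.
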
## completed items for June's work on [[design/passthrough_proxy]]:

* UUID discovery via git-annex branch. Add a log file listing UUIDs
  accessible via proxy UUIDs. It also will contain the names
  of the remotes that the proxy is a proxy for,
  from the perspective of the proxy. (done)

* Add `git-annex updateproxy` command (done)

* Remote instantiation for proxies. (done)

* Implement git-annex-shell proxying to git remotes. (done)

* Proxy should update location tracking information for proxied remotes,
  so it is available to other users who sync with it. (done)

* Implement `git-annex initcluster` and `git-annex updatecluster` commands (done)

* Implement cluster UUID insertion on location log load, and removal
  on location log store. (done)

* Omit cluster UUIDs when constructing drop proofs, since lockcontent will
  always fail on a cluster. (done)

* Don't count cluster UUID as a copy in numcopies checking etc. (done)

* Tab complete proxied remotes and clusters in eg --from option. (done)

* Getting a key from a cluster should proxy from one of the nodes that has
  it. (done)

* Implement upload with fanout to multiple cluster nodes and reporting back
  additional UUIDs over P2P protocol. (done)

* Implement cluster drops, trying to remove from all nodes, and returning
  which UUIDs it was dropped from. (done)

* `git-annex testremote` works against proxied remote and cluster. (done)

* Avoid `git-annex sync --content` etc from operating on cluster nodes by
  default since syncing with a cluster implicitly syncs with its nodes. (done)

* On upload to cluster, send to nodes where it's preferred content, and not
  to other nodes. (done)

* Support annex.jobs for clusters. (done)

* Add `git-annex extendcluster` command and extend `git-annex updatecluster`
  to support clusters with multiple gateways. (done)

* Support proxying for a remote that is proxied by another gateway of
  a cluster. (done)

* Support distributed clusters: Make a proxy for a cluster repeat
  protocol messages on to any remotes that have the same UUID as
  the cluster. Needs extension to P2P protocol to avoid cycles.
  (done)

* Proxied cluster nodes should have slightly higher cost than the cluster
  gateway. (done)

* Basic support for proxying special remotes. (But not exporttree=yes ones
  yet.) (done)

* Tab complete remotes in all relevant commands (done)

* Display cluster and proxy information in git-annex info (done)