git-annex/doc/todo/to_and_from_multiple_remotes.mdwn
Joey Hess cffa2446e8
tagged the past 2 years of open todos and followed up to a few of them
also moved some that were really bug reports to bugs/ and closed a
couple
2020-01-30 15:22:05 -04:00


git annex sync supports git remote groups; that support should be expanded to all
commands supporting --to and --from. And since there will be a [Remote]
list, those commands could also support --to foo --to bar etc.

--[[Joey]]

Started working on this in the multifromto branch.

Problem: Command.Move uses onlyActionOn, so it only allows one remote to
process a key at a time. Seems that onlyActionOn needs to be expanded to
include the remote.

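
To make the idea concrete, here is a minimal Python sketch (not git-annex's Haskell code) of tracking running actions by (key, remote) rather than by key alone. `ActionLocks`, `try_start` and `finish` are hypothetical names, and the sketch only reports a conflict; it does not model what the real onlyActionOn does when a key is already being processed.

```python
import threading


class ActionLocks:
    """Tracks which (key, remote) pairs currently have an action running."""

    def __init__(self):
        self._active = set()
        self._guard = threading.Lock()

    def try_start(self, key, remote=None):
        """Mark the pair active and return True, or return False on conflict."""
        ident = (key, remote)
        with self._guard:
            if ident in self._active:
                return False
            self._active.add(ident)
            return True

    def finish(self, key, remote=None):
        """Mark the pair as no longer active."""
        with self._guard:
            self._active.discard((key, remote))


locks = ActionLocks()

# Identified by key alone: the second action on KEY1 conflicts with the first.
key_only = [locks.try_start("KEY1"), locks.try_start("KEY1")]

# Identified by (key, remote): foo and bar can both process KEY2 at once,
# while a second action for KEY2 on foo still conflicts.
per_remote = [
    locks.try_start("KEY2", "foo"),
    locks.try_start("KEY2", "bar"),
    locks.try_start("KEY2", "foo"),
]
```

With the remote as part of the identity, a key is still protected from two simultaneous actions on the same remote.
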
Problem: The concurrency is not right yet. For example,
`git-annex copy --to foo --to bar -J2` will start actions copying a
file to both remotes. Assume the copy to bar finishes first.
Then it will start an action copying the next file to foo, despite a
copy to foo already running.

To fix this, it would help to look at Annex.activeremotes when starting the
next action. If foo is busy, start the action on bar instead. It would
be possible to build up a queue of actions that have not yet been started on
a remote. That risks using a lot of memory if one remote is very slow.
The queue would need to be capped at some size, and when full,
new actions would have to wait until the laggard remote catches up.
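
A rough single-threaded Python model of that scheduling idea follows. It is only a sketch: `RemoteScheduler`, `max_pending` and the method names are hypothetical, and the `busy` set merely stands in for Annex.activeremotes.

```python
from collections import deque


class RemoteScheduler:
    """Per-remote queues of not-yet-started actions, with a capped backlog."""

    def __init__(self, remotes, max_pending):
        self.busy = set()  # stand-in for Annex.activeremotes
        self.pending = {remote: deque() for remote in remotes}
        self.max_pending = max_pending

    def can_accept(self):
        """False once any remote's backlog is full; the caller should wait."""
        return all(len(q) < self.max_pending for q in self.pending.values())

    def submit(self, key):
        """Queue one copy of key for every remote."""
        if not self.can_accept():
            raise RuntimeError("backlog full; wait for the slow remote")
        for q in self.pending.values():
            q.append(key)

    def start_next(self):
        """Start the oldest queued action on an idle remote, if there is one."""
        for remote, q in self.pending.items():
            if remote not in self.busy and q:
                self.busy.add(remote)
                return (remote, q.popleft())
        return None

    def finished(self, remote):
        """Record that the action running on remote has completed."""
        self.busy.discard(remote)


sched = RemoteScheduler(["foo", "bar"], max_pending=2)
sched.submit("file1")
started = [sched.start_next(), sched.start_next()]

sched.submit("file2")
sched.finished("bar")             # the copy to bar finishes first
next_action = sched.start_next()  # foo is still busy, so bar gets file2

sched.submit("file3")             # foo now has two actions waiting
backlog_full = not sched.can_accept()
```

In this trace the next action goes to bar while foo is still copying file1, and the producer is told to wait once foo's backlog reaches the cap.
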
Bonus: git annex sync --content -J2 already works, but it has the same
problem described above and ought to be able to be fixed the same way.

Also worth noting that with --auto and -J, git-annex may make more transfers
than preferred content settings demand, because it will start several
transfers to different remotes at once. If only one copy is needed
among all the remotes, it won't notice and a copy will be sent to all
remotes. I think this is something the user can understand though?
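
The over-transfer can be illustrated with a toy Python model. It assumes, purely for illustration, that preferred content boils down to a single "copies needed" count and that every job in a concurrent batch checks that count before any of them finishes; `copies_sent` is a hypothetical helper.

```python
def copies_sent(needed, remotes, jobs):
    """Count transfers started when decisions are made `jobs` at a time."""
    have = 0
    sent = 0
    for i in range(0, len(remotes), jobs):
        batch = remotes[i:i + jobs]
        # Every job in the batch sees the same copy count, because none of
        # the other transfers in the batch has completed yet.
        starting = [remote for remote in batch if have < needed]
        sent += len(starting)
        have += len(starting)
    return sent


remotes = ["foo", "bar", "baz"]
sequential = copies_sent(1, remotes, jobs=1)
concurrent = copies_sent(1, remotes, jobs=3)
```

Here `sequential` is 1 and `concurrent` is 3: with three jobs, every remote gets a copy even though one was enough.
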

----

Looking at commands that support --to and --from and what each should do,
there is a lot of diversity.
git annex move --to of course removes the local copy. So if moving to
multiple remotes it would need to delay that removal until it's sent to all
of them. And it really only ought to try to remove the local copy once
at the end, not once per remote moved to.
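
As a sketch of that ordering in Python: `move_to_remotes`, `send` and `drop_local` are hypothetical names, and keeping the local copy when any send fails is an assumption, not something decided above.

```python
def move_to_remotes(key, remotes, send, drop_local):
    """Send key to every remote, then drop the local copy once at the end."""
    results = {remote: send(key, remote) for remote in remotes}
    # Assumption: the local copy is only removed if every remote received it.
    if results and all(results.values()):
        drop_local(key)
    return results


dropped = []


def fake_send(key, remote):
    """Pretend transfer that fails only for the remote named 'offline'."""
    return remote != "offline"


ok = move_to_remotes("KEY1", ["foo", "bar"], fake_send, dropped.append)
partial = move_to_remotes("KEY2", ["foo", "offline"], fake_send, dropped.append)
```

After both calls `dropped` contains only `"KEY1"`, and the drop was attempted once for it rather than once per remote.
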
git annex move --from ought to spread the load among remotes with -Jn,
and once a file is downloaded, it needs to try to remove it from all the
remotes.
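
A possible shape for that, sketched in Python with hypothetical names (`move_from_remotes`, `active`, `download`, `drop_remote`): download from whichever remote holding the key currently has the fewest active transfers, then try to drop the key from every remote that holds it.

```python
def move_from_remotes(key, holders, active, download, drop_remote):
    """Download key from the least busy holder, then drop it from all holders."""
    source = min(holders, key=lambda remote: active.get(remote, 0))
    active[source] = active.get(source, 0) + 1
    try:
        ok = download(key, source)
    finally:
        active[source] -= 1
    if ok:
        # The file is now present locally, so remove it from every remote.
        for remote in holders:
            drop_remote(key, remote)
    return source if ok else None


active = {"foo": 2, "bar": 0}  # foo already has two transfers running
dropped = []
source = move_from_remotes(
    "KEY1",
    ["foo", "bar"],
    active,
    download=lambda key, remote: True,
    drop_remote=lambda key, remote: dropped.append((key, remote)),
)
```

In this run the download comes from bar, and drops are then attempted on both foo and bar.
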
git annex mirror --from mirrors one remote; mirroring from
multiple remotes does not really make any sense. mirror --to multiple
could be done.
git annex unused --from seems unlikely to make sense with multiple remotes,
since it would result in a list of keys distributed among them, and what
would be done with that? Perhaps a git annex drop --from multiple remotes,
but that would be inefficient. A script looping over remotes makes
more sense if the user wants to drop unused from multiple remotes.
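
Such a loop might look like the following Python sketch, which only builds the command lines rather than running them. `drop_unused_commands` is a hypothetical helper and foo and bar are example remote names.

```python
def drop_unused_commands(remotes):
    """Build the git-annex commands that drop unused data from each remote."""
    commands = []
    for remote in remotes:
        # List the unused keys on this remote, then drop all of them from it.
        commands.append(["git", "annex", "unused", "--from", remote])
        commands.append(["git", "annex", "dropunused", "--from", remote, "all"])
    return commands


plan = drop_unused_commands(["foo", "bar"])
```

Each entry in `plan` could then be passed to `subprocess.run(cmd, check=True)` from inside the repository.
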
git annex get/fsck/copy/export/transferkey/drop/dropunused all make sense to
support multiple remotes. But with -Jn the operations that get files behave
differently than the operations that drop files. The gets want to balance
load among the remotes, while the drops and uploads need to run each
action over each remote.
Seems two runners are needed with different concurrency behavior, one that
balances the load among remotes, and one that runs the same action against
multiple remotes concurrently.
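
The difference between the two runners can be shown by the work each would schedule. This Python sketch uses hypothetical names (`balanced_plan`, `fanout_plan`), and the fixed round-robin is a simplification: a real balancer would presumably react to which remotes are idle and which actually contain the key.

```python
from itertools import cycle, product


def balanced_plan(keys, remotes):
    """One action per key, rotating through the remotes (suits gets)."""
    return list(zip(keys, cycle(remotes)))


def fanout_plan(keys, remotes):
    """One action per (key, remote) pair (suits drops and uploads)."""
    return list(product(keys, remotes))


balanced = balanced_plan(["k1", "k2", "k3"], ["foo", "bar"])
fanout = fanout_plan(["k1", "k2"], ["foo", "bar"])
```

The balanced plan touches each key once, while the fan-out plan touches each key once per remote, so the two need different concurrency handling.
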
[[!tag needsthought]]