Used by the assistant, rather than copy. This is faster because it avoids
using git ls-files, avoids redundantly checking the location log, and runs
in oneshot mode, so it does not make a commit to the git-annex branch for
every file transferred.
Found a very cheap way to determine when a disconnected remote has
diverged, and has new content that needs to be transferred: Piggyback on
the git-annex branch update, which already checks for divergence.
However, this does not detect new content that has appeared locally while
disconnected and should be transferred to the remote.
Also, this does not handle cases where the two git repos are in sync,
but their content syncing has not caught up yet.
This code could have its efficiency improved:
* When multiple remotes are synced, if any one has diverged, they're
all queued for transfer scans.
* The transfer scanner could be told whether the remote has new content,
the local repo has new content, or both, and could optimise its scan
accordingly.
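A sketch of the shape of that piggybacking (names are illustrative; the
real hook lives in the assistant's git-annex branch merging code):

    import Control.Monad (when)

    -- Hedged sketch: after merging a remote's git-annex branch, compare
    -- the local branch ref from before and after the merge. If it moved,
    -- the remote had diverged, so queue a transfer scan for it.
    newtype Ref = Ref String deriving (Eq, Show)

    mergeAndScan :: IO Ref -> IO () -> IO () -> IO ()
    mergeAndScan getBranchRef mergeRemoteBranch queueTransferScan = do
        old <- getBranchRef
        mergeRemoteBranch
        new <- getBranchRef
        when (new /= old) queueTransferScan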
This commit includes a paydown on technical debt incurred two years ago,
when I didn't know that it was bad to make custom Read and Show instances
for types. As the routes need Read and Show for Transfer, which includes a
Key, and writing my own Read instance for Key was not practical, I had to
finally clean that up.
So the compact Key read and show functions are now file2key and key2file,
and Read and Show are now derived instances.
Changed all code that used the old instances, compiler checked.
(There are a few places, particularly in Command.Unused and the test
suite, where the Show instance continues to be used for legitimate
comparisons; i.e. show key_x == show key_y (though really in a bloom filter).)
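The shape of the change, as a hedged sketch (the real Key in Types.Key
has more fields and a richer file encoding than shown here):

    -- Illustrative stand-in for git-annex's Key type.
    data Key = Key
        { keyName :: String
        , keyBackendName :: String
        } deriving (Eq, Ord, Read, Show)  -- plain derived instances now

    -- Compact serialization that the custom Show instance used to do.
    key2file :: Key -> FilePath
    key2file k = keyBackendName k ++ "-" ++ keyName k

    -- Inverse of key2file, which the custom Read instance used to do.
    file2key :: FilePath -> Maybe Key
    file2key s = case break (== '-') s of
        (b, '-':n) | not (null b) && not (null n) ->
            Just (Key { keyName = n, keyBackendName = b })
        _ -> Nothing

With that split, the compact on-disk encoding is an explicit pair of
functions, while Read and Show round-trip via the ordinary derived
syntax, which is what the routes need.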
Avoid crashing when "git annex get" fails to download from one location,
and falls back to downloading from a second location.
The problem is that git annex get calls download recursively from within
itself if the first download attempt fails. So the first time through, it
writes a transfer info file, which is then overwritten on the second,
recursive call. Then on cleanup, it tries to delete the file twice, which
of course doesn't work.
Fixed both by not crashing if the transfer file is removed, and by
changing Get to not run download recursively like that. It's the only
thing that did so, and it just seems like a bad idea.
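The "don't crash" half is just tolerating a missing file on cleanup; a
minimal sketch (the helper name is made up; the real cleanup lives in the
transfer info code):

    import Control.Exception (catch, throwIO)
    import System.Directory (removeFile)
    import System.IO.Error (isDoesNotExistError)

    -- Remove a transfer info file, ignoring the case where a nested
    -- (recursive) download already removed it.
    removeTransferInfo :: FilePath -> IO ()
    removeTransferInfo f = removeFile f `catch` \e ->
        if isDoesNotExistError e then return () else throwIO e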
git annex assistant --autostart will start separate daemons in each
listed autostart repo.
Running the webapp outside any git-annex repo will open it on the
first listed autostart repo.
This prevents multiple runs of the assistant in the foreground, and lets
--stop stop foregrounded runs too.
The webapp firstrun case also now writes a pid file, once it's made the git
repo to put it in.
This avoids forking another process, avoids polling, fixes a race,
and avoids a rare forkProcess thread hang that I saw one time
when starting the webapp.
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.
Also wrote some higher-level code that allows running an action
with handles to a process's stdin or stdout (or both), and checking
its exit status, all in a single function call.
As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
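A hedged sketch of one such helper, in the spirit of what's described
(the name and exact shape are illustrative, not the real Utility.Process
API):

    import System.Exit (ExitCode(..))
    import System.IO (Handle)
    import System.Process

    -- Run a command, hand its stdout to an action, wait for it, and
    -- report whether it exited successfully, all in one call.
    withReadHandle :: FilePath -> [String] -> (Handle -> IO a) -> IO (a, Bool)
    withReadHandle cmd params reader =
        withCreateProcess (proc cmd params) { std_out = CreatePipe } $
            \_stdin (Just hout) _stderr pid -> do
                r <- reader hout
                code <- waitForProcess pid
                return (r, code == ExitSuccess)

Variants that feed stdin, or chat on both handles, would follow the same
pattern, which is where the read/feed/chat distinction in the debug
logging comes in.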
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
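For the SafeSystem case, the System.Process equivalent is small; a
hedged sketch (not the exact replacement that went in):

    import System.Exit (ExitCode)
    import System.Process (createProcess, proc, waitForProcess)

    -- Run a command with parameters passed as a list (no shell
    -- involved), wait for it, and return its exit code.
    safeSystem :: FilePath -> [String] -> IO ExitCode
    safeSystem cmd params = do
        (_, _, _, pid) <- createProcess (proc cmd params)
        waitForProcess pid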
This *almost* works.
Along the way, I noticed that the --uuid parameter was being accidentally
passed after the --, so it has never actually been used by
git-annex-shell to verify it's running in the expected repository. Oops. Fixed.
Not yet tested, and the places where git-annex-shell is run need to be
modified to pass the new field settings.
Note that rsyncServerSend was changed to fork, rather than directly exec
rsync, because it needs to keep the transfer lock held, and clean up the
transfer log when done.
Recording a semi-useful filename associated with the key required
plumbing the filename all the way through to the remotes'
storeKey and retrieveKeyFile.
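In practice that means the remote operations carry an optional
associated file; a hedged sketch of the widened types (simplified; the
real Remote type in Types.Remote is much larger):

    -- Placeholder key type; see the real definition in Types.Key.
    data Key = Key String

    -- The work tree file the key was being transferred for, if known.
    type AssociatedFile = Maybe FilePath

    data Remote = Remote
        { storeKey :: Key -> AssociatedFile -> IO Bool
        , retrieveKeyFile :: Key -> AssociatedFile -> FilePath -> IO Bool
        }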
Note that there is potential for deadlock here, narrowly avoided.
Suppose the repos are A and B. A sends file foo to B, and at the same
time, B gets file foo from A. So, A locks its upload transfer info file,
and then locks B's download transfer info file. At the same time,
B is taking the two locks in the opposite order. This is only not a
deadlock because the lock code does not wait, and aborts. So one of A or
B's transfers will be aborted and the other transfer will continue.
Whew!
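The crucial property is that taking the transfer lock never blocks; a
minimal sketch of that behaviour (using GHC's file locking here; the
real code manages its own lock files):

    import GHC.IO.Handle.Lock (LockMode(ExclusiveLock), hTryLock)
    import System.IO (IOMode(ReadWriteMode), withFile)

    -- Try to take an exclusive lock for a transfer; rather than waiting
    -- when the other side already holds it, give up immediately.
    -- Aborting one side is what breaks the A/B lock ordering cycle.
    tryTransferLock :: FilePath -> IO () -> IO Bool
    tryTransferLock lockfile transfer =
        withFile lockfile ReadWriteMode $ \h -> do
            ok <- hTryLock h ExclusiveLock
            if ok then transfer >> return True else return False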
Note this is per-remote, so trying to get the same file from multiple
remotes can still let duplicate downloads run. (And uploading the same file
to multiple remotes is, of course, not duplication at all.)
get, move, and copy are the only git-annex subcommands that transfer
files, but there's still git-annex-shell recvkey and sendkey to deal with too.
I considered modifying retrieveKeyFile or getViaTmp, but they are called
by other code that does not involve expensive file transfers (migrate)
or that does file transfers that should not be checked by this (fsck --from).