Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.
Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.
As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
This *almost* works.
Along the way, I noticed that the --uuid parameter was being accidentially
passed after the --, so that has never been actually used by
git-annex-shell to verify it's running in the expected repository. Oops. Fixed.
In order to record a semi-useful filename associated with the key,
this required plumbing the filename all the way through to the remotes'
storeKey and retrieveKeyFile.
Note that there is potential for deadlock here, narrowly avoided.
Suppose the repos are A and B. A sends file foo to B, and at the same
time, B gets file foo from A. So, A locks its upload transfer info file,
and then locks B's download transfer info file. At the same time,
B is taking the two locks in the opposite order. This is only not a
deadlock because the lock code does not wait, and aborts. So one of A or
B's transfers will be aborted and the other transfer will continue.
Whew!
Not including such remotes turned out to have other consequences,
including annex-truselevel git config being ignored. Instead, add guards
before each operation that might try to operate on such a repo.
Prelude.undefined error message was introduced by
bb4f31a0ee.
It seems best to filter out local repositories that cannot be accessed
from the list of remotes, rather than keeping them in and making every
thing that uses the list have to deal with remotes that may have an unknown
location.
Besides fixing the error message, this also makes unavailable local
remotes' names not be shown in various messages, including in git annex
status output.
Also, move --to an unavailable local repository now avoids some ugly
errors like "changeWorkingDirectory: does not exist".
This was shown redundantly for a tricky reason -- while it runs
inside a doSideAction block that would appear to supress it,
the action being run is in a different state monad; for the remote,
and so the suppression doesn't work.
Always suppressing the message when committing to a local remote is
ok do to though -- it mirrors the /dev/nulling of the git annex shell commit
output. And it turns out that any time there is a git-annex branch state
change to commit on the remote, the local repo has also had a similar
change made, and so the message has been shown already.
The environment needs to override git-config. Changed when git config is
read, and avoid rereading it once it's been read.
chdir for both worktree settings.
getConfig got a remote-specific config, and this confusing name caused it
to be used a couple of places that only were interested in global configs.
Rename to getRemoteConfig and make getConfig only get global configs.
There are no behavior changes here, but remote.<name>.annex-web-options
never actually worked (and per-remote web options is a very unlikely to be
useful case so I didn't make it work), so fix the documentation for it.
openSUSE patches rsync with a patch adding SIP protocol support.
https://gist.github.com/2026167
With this patch, running rsync with no hostname parameter is apparently
supposed to list SIP hosts on the network. Practically, it does nothing
and exits 0.
git-annex uses rsync in a very special way to allow git-annex-shell to be
run on the remote host, and so did not need to specify a hostname, or a
file to transfer as a rsync parameter. So it sent ":", a degenerate case of
"host:file".
But the patch cannot differentiate ":" with no host parameter
(a bug in the SIP patch surely).
Results were that getting files failed, as rsync seemed to succeed, but the
requested file failed to arrive. Also I think that sending files will
make git-annex think a file has been transferred to the remote when
really rsync does nothing.
The workaround for this buggy rsync patch is to use "dummy:" as the
hostname.
Added Annex.cleanup, which is a general purpose interface for adding
actions to run at the end.
Remotes with the old git-annex-shell will commit every time, and have no
commit command, so hide stderr when running the commit command.
If there's no Content-Length, or the key has no size, this check is not
done, but it should happen most of the time, and protect against web
content that has changed.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.
This is used by git-annex-shell and other places where changes are made to
a remote's location log.
For a local git remote, can symlink the file.
For a git remote using rsync, can preseed any local content.
There are a few reasons to use fsck --from on a normal git remote.
One is if it's using gitosis or similar, and you don't have shell access
to run git annex locally. Another reason could be if you just want to
fsck certian files of a bare remote.
With --fast, unavailable local remotes are filtered out of the fast set.
This way, if there are local remotes, --fast always acts only on them,
and if none are mounted, acts on nothing. This consistency is better
than --fast acting on different remotes depending on what's mounted.
A crash on parsing was fixed a while ago. This adds support for fully
correctly parsing multiline git config values, using git config --null.
Since git-annex-shell configlist uses normal git config output, I left in
support for that too; the two forms of config output can be easily
identified by the parser. Since configlist only prints the annex.uuid
config, there's no risk of multiline values there, so no need to change it.
Needed due to this scenario: Bare repo origin is made, foo is cloned from it;
foo is initalized; a file is added to foo's annex; git annex move --to origin
Since the git-annex branch has not yet been pushed to origin, it doesn't
auto-initialize. When the content is sent to it, it's stored, but
the remote has NoUUID, and so nothing is logged in the location log.
Then the content is removed from the local repo, and git-annex has lost
track of it.
git annex fsck in origin will find the lost content, but let's not let this
happen. Content should only be sent to initalized remotes.
This cannot happen for non-local remotes, since git-annex-shell always
checks that the repo is initialized.
Supporting multiple directory hash types will allow converting to a
different one, without a flag day.
gitAnnexLocation now checks which of the possible locations have a file.
This means more statting of files. Several places currently use
gitAnnexLocation and immediately check if the returned file exists;
those need to be optimised.
git-annex-shell inannex now returns always 0, 1, or 100 (the last when
it's unclear if content is currently in the index due to it currently being
moved or dropped).
(Actual locking code still not yet written.)
Many functions took the repo as their first parameter. Changing it
consistently to be the last parameter allows doing some useful things with
currying, that reduce boilerplate.
In particular, g <- gitRepo is almost never needed now, instead
use inRepo to run an IO action in the repo, and fromRepo to get
a value from the repo.
This also provides more opportunities to use monadic and applicative
combinators.
Before the config was read each time onLocal was called, and entirely
redundantly since it's read for same-host remotes on startup.
Also a minor bug fix: When rsyncing to a same-host remote, use the
rsync-options from the repository that the user ran git-annex in, not those of
the receiving repository.
Specifically, disabled trying to update the git-annex branch on the remote,
since that data is never used by operations that act on such remotes.
Also, when copying content to such a remote, skip committing the presence
information changes to its git-annex branch. Leaving it in the journal there
is ok: Any command run on the remote that needs the info will flush the
journal.
This may partially solve this bug:
http://git-annex.branchable.com/bugs/fails_to_handle_lot_of_files/
Although I still see unreaped git processes piling up when doing a copy --to.