I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
Done on unix, could not implement it on windows quite.
The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.
Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
All callers adjusted to update it themselves.
In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.
This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
There was no particular reason not to support this, other than maybe a lack
of a use case. One use case would of course be a remote that you want to
avoid overwriting content on. A new use case is the idea of importing from
backups, eg borg, where exporting is not necessarily supported at all.
This commit was sponsored by Brock Spratlen on Patreon.
It's not concurrent-output safe, and doesn't support
--json-error-messages.
Using Annex.makeRunner is a bit scary, because what if it's run in a
different thread from an active annex action? Normally the same Annex
state is not used concurrently in several threads, and it's not designed
to be fully concurrency safe. (Annex.Concurrent exists to deal with
that.) I think it will be ok in these simple cases though. Eg,
when buffering a warning message to json, Annex.changeState is used,
and it modifies the MVar in a concurrency safe way.
The only warningIO remaining is not a problem.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink
This commit was sponsored by Ethan Aubin.
It looks to me like the old code would have already dealt with the case
of ssh starting a ssh daemon that inherits stderr and keeps it open.
The ender thread closed the handle, which would unblock the other thread
and let it exit. Using hGetLineUntilExitOrEOF makes this more explicit
that it's dealt with and simplifies the code.
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.
Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:
* If two files point to one file, then eg, `git annex get foo` will
update the branch to unlock foo, but will not unlock bar, because it
does not know about it. Might be fixable by making `git annex get
bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info
Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
removeFile changed to removeLink, because AFAICS it should be fine to
remove non-file things here. In particular, it's fine to remove a
symlink, since we're about to write a symlink. (removeLink does not
remove directories, so file, symlink, and unix socket are the only
possibilities.)
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.
This commit was sponsored by Graham Spencer on Patreon.
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrives
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Puttting it here, rather than in fsck
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But it actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"
Fixed two cases where the Haskell98 do indentation flexability let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.
The other place that I think I was concerned about, is GHC manual
section 19.1.1.3. Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.
This commit was sponsored by Denis Dzyubenko on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.
My thinking is, ResourceT may allocate a resource and then free it,
and a unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.
This commit was sponsored by Graham Spencer on Patreon.
So these special remotes are always supported.
IIRC these build flags were added because the dep chains were a bit too
long, or perhaps because the libraries were not available in Debian stable,
or something like that. That was long ago, those reasons no longer apply,
and users get confused when builtin special remotes are not available, so
it seems best to remove the build flags now.
If this does cause a problem it can be reverted of course..
This commit was sponsored by Jochen Bartl on Patreon.
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.
Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.
Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
Also tested what happens if the other special remote has importtree=yes
and exporttree=yes, and in that case, download via httpalso works too,
without needing to implement any importtree methods here.
It might be possible to make it automatically set exporttree=yes if the
--sameas does. Didn't try, will probably be layering issues.
Or perhaps it should be inherited by sameas like some
other configs? But then, wouldn't it also make sense to inherit
importree=yes? But as shown here, it's not needed by this kind of
remote.
"http" was too generic and easy to confuse with web. The new name makes
clear it's used in addition to some other remote. And other protocols
can use the same naming scheme.
I'm sure this used to work, but somewhere along the line something or
things (getCost and getAvailability I think, probably others)
started catching the exception and not displaying it. So, show warnings.
TVars were not updated atomically, which was ok when each thread got its
own External that was the only thing using these TVars. But, with the
async extension, several External instances can share the same var, so
it needs to be a TMVar to avoid read/write conflicts.
In particular, this makes PREPARE only be sent once.
Moving jobid generation to the git-annex side lets it be simplified a
lot.
Note that it will also be possible to generate one jobid per connection,
rather than a new job per request. That will make overflow not an issue,
and will avoid some work, and will simplify some of the code.
Receive loop looks right. Still need the send loop.
And, a complication is that some messages git-annex
sends need to be wrapped in REPLY_ASYNC, while others
do not. So will probably need to split externalSend
into two.
Idea is for ASYNC extension, it will instead contain methods that communicate
with the thread that handles all communication with the external process.
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.
This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unncessary
inAnnex check for every file.
This commit was sponsored by Denis Dzyubenko on Patreon.
adb shell has sha256sum sha1sum and some others, so they could be used.
They're provided by toybox, so seem about as likely to keep
working as find and stat, which it already depends on.
Or to not add a dep, could use stat the same as getExportContentIdentifier
to get a mtime, and make a WORM key. But do I really want this to
default to WORM?
Unsure what's the best path, so punting for now.
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.
git-annex sync --no-content does not yet use this to do imports
Trivial since git-annex cannot remove, but do an active checkKey verification
anyway, in case the data was lost somehow.
This commit was sponsored by Ryan Newton on Patreon.
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
Otherwise use the vendored copy as before.
The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.
Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-git-lfs-dev.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Fix bug that made creds not be stored in git when a special remote was
initialized with gpg encryption, but without an explicit embedcreds=yes.
(Yet nother regression introduced in version 7.20200202.7. 5th so far.)
This makes the creds get saved, since only things recorded there will be
saved.
IIRC, unparsedRemoteConfig was not originally available when I
implemented this; now that it is things get a bit simpler.
More could probably be simplified, is externalConfigChanges needed at
all?
This does not entirely fix the bugs though, because creds are only
embedded when embedcreds=yes, but not when encryption=pubkey is used
without embedcreds=yes.
* Improve display of problems auto-initializing or upgrading local git
remotes.
* When a local git remote cannot be initialized because it has no
git-annex branch or a .noannex file, avoid displaying a message about it.
So stop documenting it, and stop offering it as a choice in the assistant.
Removed the code that parses it into S3.ReducedRedundancy, because
S3.OtherStorageClass with the value will work just the same and avoids a
special case for a deprecated this.
Some recent changes to use mask missed that async exceptions can still
be thrown inside it. The goal is to make sure a block of cleanup code
runs entirely, w/o being interrupted by an async exception, so use
uninterruptibleMask.
Also, converted a few to bracket, which is nicer.
Since an external process can be in the middle of some operation when an
async exception is received, it has to be shut down then. Using
cleanupProcess will close its IO handles and send it a SIGTERM.
If a special remote choses to catch SIGTERM, it's fine for it to do some
cleanup then, but until it finishes, git-annex will be blocked waiting
for it. If a special remote blocked SIGTERM, it would cause a hang.
Mentioned in docs.
Also, in passing, fixed a FD leak, it was not closing the error handle
when shutting down the external. In practice that didn't matter before because
it was only run when git-annex was itself shutting down, but now that it
can run on exception, it would have been a problem.
Fixes reversion in recent conversions, the old code relied on the GC
apparently, but the new code explicitly waits on the process, so must
close stdin handle first or the command will never exit.
It was not the wrong handle. The handle was not being closed, so bup
kept running.
Before 2670890b17, the code was:
withHandle StdinHandle createProcessSuccess cmd feeder
The stdin handle was not closed by the feeder.
Testing this:
withHandle StdinHandle createProcessSuccess (proc "cat" []) (\h -> hPutStrLn h "hi")
There's a rather long pause, a couple seconds, before it completes, but
it does complete. With hClose h, it immediately completes. This must be
the GC noticing that h is out of scope and closing it.
It seems likely that the old code worked only by that accident.
So, other similar changes made in that and nearby commits may also
have this problem, and need to explicitly close handles that were
somehow implicitly closed before.
Except for the assistant, which I think may use them between threads?
Most of the uses of SomeException were already catching only async exceptions.
But I did find a few places that were accidentially catching them.
Masking ensures that EndStderrHandler gets written, so the helper
threads shut down.
However, nothing currently guarantees that calls to closeP2PSshConnection
are async exception safe, so made a note about it.
At this point, I've audited all calls to async, and made them all async
exception safe, except for ones in the assistant, and a few in leaf
commands (remotedaemon, enable-tor, multicast, p2p) which don't need to
be.
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.
Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
This handles all sites where checkSuccessProcess/ignoreFailureProcess
is used, except for one: Git.Command.pipeReadLazy
That one will be significantly more work to convert to bracketing.
(Also skipped Command.Assistant.autoStart, but it does not need to
shut down the processes it started on exception because they are
git-annex assistant daemons..)
forceSuccessProcess is done, except for createProcessSuccess.
All call sites of createProcessSuccess will need to be converted
to bracketing.
(process pools still todo also)
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.
Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.
Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.
That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.
(Had to split out Remote.List.Util to avoid an import cycle.)
Fix bug that made enableremote of S3 and webdav remotes, that have
embedcreds=yes, fail to set up the embedded creds, so accessing the remotes
failed.
(Regression introduced in version 7.20200202.7 in when reworking all the
remote configs to be parsed.)
Root problem is that parseEncryptionConfig excludes all other config keys
except encryption ones, so it is then unable to find the
credPairRemoteField. And since that field is not required to be
present, it proceeds as if it's not, rather than failing in any visible
way.
This causes it to not find any creds, and so it does not cache
them. When when the S3 remote tries to make a S3 connection, it finds no
creds, so assumes it's being used in no-creds mode, and tries to find a
public url. With no public url available, it fails, but the failure doesn't
say a lack of creds is the problem.
Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full
set of configs of the remote can be parsed. A bit annoying to need to
parse the remote config before the full config (as returned by
setRemoteCredPair) is available, but this avoids the problem.
I assume webdav also had the problem by inspection, but didn't try to
reproduce it with it.
Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted
String, but that does not seem right. Now that it runs that code, it
crashed saying it had just a String.
Remotes that have already been enableremoted, and so lack the cached creds
file will work after this fix, because getRemoteCredPair will extract
the creds from the remote config, writing the missing file.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.
A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?
However, it did let me clean up several places in the code.
This commit was sponsored by Ethan Aubin.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Graham Spencer on Patreon.
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.
This commit was sponsored by Ilya Shlyakhter on Patreon.
When storing content on remote fails, always display a reason why.
Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
That had almost no benefit at all, and complicated things quite a lot.
What I proably wanted this to be was something like ResourceT, but it
was not. The few remotes that actually need some preparation done only
once and reused used a MVar and not Preparer.