Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.
This commit was supported by the NSF-funded DataLad project.
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.
This commit was supported by the NSF-funded DataLad project
Implemented so far for the directory special remote.
Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certianly do not, since bup would
need to be used to extract the export; same store for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.
Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes users non-key/value
storage. Any common parts can be factored out.
Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.
Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.
This commit was sponsored by Bruno BEAUFILS on Patreon.
External special remotes will refuse to operate on keys with spaces in
their names. That has never worked correctly due to the design of the
external special remote protocol. Display an error message suggesting
migration.
Not super happy with this, but it's a pragmatic solution. Better than
complicating the external special remote interface and all external special
remotes.
Note that I only made it use SafeKey in Request, not Response. git-annex
does not construct a Response, so that would not add any safety. And
presumably, if git-annex avoids feeding any such keys to an external
special remote, it will never have a reason to make a Response using such a
key. If it did, it would result in a protocol error anyway.
There's still a Serializeable instance for Key; it's used by P2P.Protocol.
There, the Key is always in the final position, so it's ok if it contains
spaces.
Note that the protocol documentation has been fixed to say that the File
may contain spaces. One way that can happen, even though the Key can't,
is when using direct mode, and the work tree filename contains spaces.
When sending such a file to the external special remote the worktree
filename is used.
This commit was sponsored by Thom May on Patreon.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.
This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.
Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.
This commit was sponsored by Shane-o on Patreon.
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.
This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.
Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.
This commit was sponsored by Thom May on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
Multiple external special remote processes for the same remote will be
started as needed when using -J.
This should not beak any existing external special remotes, because running
multiple git-annex commands at the same time could already start multiple
processes for the same external special remotes.
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
Ignore exceptions when getting the cost and availability for the remote,
and return sane defaults. These defaults are not cached, so if a special
remote program has a transient problem, it will re-query it later.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.
RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
Since git-annex unsets these when started, they have to be explicitly
propigated. Also, this makes --git-dir and --work-tree settings be
reflected in the environment.
The need for this came up in
https://github.com/DanielDent/git-annex-remote-rclone/issues/3
Note that, if an url is added to the web log for such a remote, it's not
distinguishable from another url that might be added for the web remote.
(Because the web log doesn't distinguish which remote owns a plain url.
Urls with a downloader set are distinguishable, but we're not using them
here.)
This seems ok-ish.. In such a case, both remotes will try to use both
urls, and both remotes should be able to.
The only issue I see is that dropping a file from the web remote will
remove both urls in this case. This is not often done, and could even
be considered a feature, I suppose.
Now it suffices to run git remote add, followed by git-annex sync. Now the
remote is automatically initialized for use by git-annex, where before the
git-annex branch had to manually be pushed before using git-annex sync.
Note that this involved changes to git-annex-shell, so if the remote is
using an old version, the manual push is still needed.
Implementation required git-annex-shell be changed, so configlist can
autoinit a repository even when no git-annex branch has been pushed yet.
Unfortunate because we'll have to wait for it to get deployed to servers
before being able to rely on this change in the documentation.
Did consider making git-annex sync push the git-annex branch to repos that
didn't have a uuid, but this seemed difficult to do without complicating it
in messy ways.
It would be cleaner to split a command out from configlist to handle
the initialization. But this is difficult without sacrificing backwards
compatability, for users of old git-annex versions which would not use the
new command.
It sounds worse than it is. ;)
Some external special remotes may run commands that display progress on
stderr. If git-annex is run with --quiet, this should filter out such
displays while letting the errors through.
Useful for things like ipfs that don't use regular urls.
An external special remote can add a regular url to a key, and then
git-annex get will download it from the web. But for ipfs, we want to
instead tell git-annex that the uri uses OtherDownloader. Before this
change, the external special remote protocol lacked a way to do that.
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.
Remote types that support encryption or chunking also include that in their
info.
This commit was sponsored by Ævar Arnfjörð Bjarmason.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
encryptionSetup must be called before setRemoteCredPair. Otherwise,
the RemoteConfig doesn't have the cipher in it, and so no cipher is used to
encrypt the embedded creds.
This is a security fix for non-shared encryption methods!
For encryption=shared, there's no security problem, just an
inconsistentency in whether the embedded creds are encrypted.
This is very important to get right, so used some types to help ensure that
setRemoteCredPair is only run after encryptionSetup. Note that the external
special remote bypasses the type safety, since creds can be set after the
initial remote config, if the external special remote program requests it.
Also note that IA remotes never use encryption, so encryptionSetup is not
run for them at all, and again the type safety is bypassed.
This leaves two open questions:
1. What to do about S3 and glacier remotes that were set up
using encryption=pubkey/hybrid with embedcreds?
Such a git repo has a security hole embedded in it, and this needs to be
communicated to the user. Is the changelog enough?
2. enableremote won't work in such a repo, because git-annex will
try to decrypt the embedded creds, which are not encrypted, so fails.
This needs to be dealt with, especially for ecryption=shared repos,
which are not really broken, just inconsistently configured.
Noticing that problem for encryption=shared is what led to commit
fbdeeeed5f, which tried to
fix the problem by not decrypting the embedded creds.
This commit was sponsored by Josh Taylor.
Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.
This allows testing that checkPresent properly throws an exceptions when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.
This immediately found several bugs, which this commit also fixes!
* git remotes using ssh accidentially had checkPresent return
an exception, rather than throwing it
* The chunking code accidentially returned False rather than
propigating an exception when there were no chunks and
checkPresent threw an exception for the non-chunked key.
This commit was sponsored by Carlo Matteo Capocasa.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
This will allow special remotes to eg, open a http connection and reuse it,
while checking if chunks are present, or removing chunks.
S3 and WebDAV both need this to support chunks with reasonable speed.
Note that a special remote might want to cache a http connection across
multiple requests. A simple case of this is that CheckPresent is typically
called before Store or Remove. A remote using this interface can certianly
use a Preparer that eg, uses a MVar to cache a http connection.
However, it's up to the remote to then deal with things like stale or
stalled http connections when eg, doing a series of downloads from a remote
and other places. There could be long delays between calls to a remote,
which could lead to eg, http connection stalls; the machine might even
move to a new network, etc.
It might be nice to improve this interface later to allow
the simple case without needing to handle the full complex case.
One way to do it would be to have a `Transaction SpecialRemote cache`,
where SpecialRemote contains methods for Storer, Retriever, Remover, and
CheckPresent, that all expect to be passed a `cache`.
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:
1. They can all be caught in one place (Remote.hasKey), and we know
every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
Remotes to be able to be implemented without needing to worry about
catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
Bool, which all the other relevant remote methods already did.
I do not see a good way to generalize that type; my previous attempts
failed miserably.
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.
GETSTATE and SETSTATE are added to the external special remote protocol.
Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.
The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.
This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.
This commit was sponsored by Daniel Hofer.
This was unexpectedly difficult because of a depdenency cycle. To parse a
preferred content expression involves several things that need to operate
on the list of remotes. Which needs Remote.External. The only way to avoid
this cycle (I tried breaking it at several points) was to skip parsing the
expression in SETWANTED.
That's sorta ok, because git-annex already has to deal with unparsable
preferred content expressions being stored, in order to handle eg,
upgrades. But I'm still not very happy that I cannot check it.
I feel this is a strong indication that I need to beware of further
bloating the special remote protocol interface.
Changed protocol spec to make SETCONFIG only store it persistently when run
during INITREMOTE. I see no reason to support storing it persistently at
other times, and doing so would unnecessarily complicate the code.
Also, letting that be done would probably result in use for storing data that
doesn't really belong there, and special remote authors who don't
understand how the union merging works would probably be surprised the
results.
That complicated special remote programs, because they had to avoid making
PREPARE fail if some configuration is missing, because the remote might not
be initialized yet. Instead, complicate git-annex slightly by only sending
PREPARE immediately before some other request other than INITREMOTE (or
PREPARE of course).
This has not been tested at all. It compiles!
The only known missing things are support for encryption, and for get/set
of special remote configuration, and of key state. (The latter needs
separate work to add a new per-key log file to store that state.)
Only thing I don't much like is that initremote needs to be passed both
type=external and externaltype=foo. It would be better to have just
type=foo
Most of this is quite straightforward code, that largely wrote itself given
the types. The only tricky parts were:
* Need to lock the remote when using it to eg make a request, because
in theory git-annex could have multiple threads that each try to use
a remote at the same time. I don't think that git-annex ever does
that currently, but better safe than sorry.
* Rather than starting up every external special remote program when
git-annex starts, they are started only on demand, when first used.
This will avoid slowdown, especially when running fast git-annex query
commands. Once started, they keep running until git-annex stops, currently,
which may not be ideal, but it's hard to know a better time to stop them.
* Bit of a chicken and egg problem with caching the cost of the remote,
because setting annex-cost in the git config needs the remote to already
be set up. Managed to finesse that.
This commit was sponsored by Lukas Anzinger.
This is mostly straightforward, but did turn out quite nicely stronly
typed, and with a quite nice automatic tokenization and parsing of received
messages.
Made a few minor changes to the protocol to clear up ambiguities and make
it easier to parse. Note particularly that setting remote configuration
is moved to a separate command, which allows a remote to set arbitrary data.