Commit graph

1332 commits

Author SHA1 Message Date
Joey Hess
ca31d7e54f
refactor
That code was not borg specific, and I can see making more remotes for
other backup software.
2020-12-18 17:08:44 -04:00
Joey Hess
1c054f1cf7
started borg special remote
Still need to implement 3 methods, but importKeyM looks like it will
work well to find annex object files.
2020-12-18 16:56:54 -04:00
Joey Hess
3207e8293b
start borg special remote
Compiles, but unusable so far.
2020-12-18 16:03:51 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
f930176d6e
change info from export=yes to exporttree=yes and same for import
for consistency
2020-12-17 17:06:50 -04:00
Joey Hess
e9db382308
avoid redundant set of a S3 verison ID that is already recorded
I think this could cause unnecessary changes to the git-annex branch,
and retrieveExportWithContentIdentifier is now also used for getting
content from importtree=yes remotes, so it would happen more frequently
so let's avoid.
2020-12-17 16:49:17 -04:00
Joey Hess
77aedbef8b
fix call to warnExportImportConflict
That needs a Remote that has the right export/import set up, not the input
Remote, which does not yet.
2020-12-17 16:25:02 -04:00
Joey Hess
f2ecc6e0da
import remotes use ContentIdentifier for getting and checking content
This is better than using the equivilant actions for export remotes,
especially for getting content, since the ContentIdentifier checking
means we can be sure (enough) that the content is valid to not force
verification of content. Which allows getting keys of types that cannot
be verified.

Also, reorganized the internals of adjustExportImport which was becoming
very hard to follow. Now it's clear what each method does in each case.
2020-12-17 15:55:31 -04:00
Joey Hess
5946e7136e
force verification after getting file from export remote
This way, if annex.verify is disabled, it's still checked, since this is
not a key/value store, it has to be checked.
2020-12-17 15:31:22 -04:00
Joey Hess
ceda8c0066
refactor common code 2020-12-17 14:17:09 -04:00
Joey Hess
4d2cd58ee5
provide missing remote actions for importree only remote
Ah, it seemed too easy before when I was implementing importrree only,
and it was because all the key-based actions needed to be handled too.

Mostly copied from isexport, and this works. It does seem that
an import remote could use retrieveExportWithContentIdentifier
rather than retrieveExport, and checkPresentExportWithContentIdentifier
rather than checkPresentExport, which would both be more accurate.
2020-12-17 13:46:34 -04:00
Joey Hess
6c890d62f6
initremote: Prevent enabling encryption with exporttree=yes/importtree=yes
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
2020-12-15 12:08:08 -04:00
Joey Hess
230e1c88a9
improve display 2020-12-14 13:13:53 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
6a11b6fab8
Support special remotes that are configured with importtree=yes but without exporttree=yes
There was no particular reason not to support this, other than maybe a lack
of a use case. One use case would of course be a remote that you want to
avoid overwriting content on. A new use case is the idea of importing from
backups, eg borg, where exporting is not necessarily supported at all.

This commit was sponsored by Brock Spratlen on Patreon.
2020-12-10 13:17:40 -04:00
Joey Hess
63839532c9
remove uses of warningIO
It's not concurrent-output safe, and doesn't support
--json-error-messages.

Using Annex.makeRunner is a bit scary, because what if it's run in a
different thread from an active annex action? Normally the same Annex
state is not used concurrently in several threads, and it's not designed
to be fully concurrency safe. (Annex.Concurrent exists to deal with
that.) I think it will be ok in these simple cases though. Eg,
when buffering a warning message to json, Annex.changeState is used,
and it modifies the MVar in a concurrency safe way.

The only warningIO remaining is not a problem.
2020-12-02 14:57:43 -04:00
Joey Hess
7776677a5f
Fix hang on shutdown of external special remote using ASYNC protocol extension.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.

Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.

The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-30 13:04:02 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
613455e059
convert to use hGetLineUntilExitOrEOF
It looks to me like the old code would have already dealt with the case
of ssh starting a ssh daemon that inherits stderr and keeps it open.
The ender thread closed the handle, which would unblock the other thread
and let it exit. Using hGetLineUntilExitOrEOF makes this more explicit
that it's dealt with and simplifies the code.
2020-11-19 16:13:31 -04:00
Joey Hess
66497d39b3
convert git config reading to use hGetLineUntilExitOrEOF
Much nicer than the old hack of waiting for a few seconds for stderr to be read.
2020-11-19 15:38:43 -04:00
Kyle Meyer
9e09dcb2cf
BitTorrent: Fix build for "no torrent" code path
The RawFilePath conversions missed a spot in the else arm of "#ifdef
WITH_TORRENTPARSER".
2020-11-19 14:46:21 -04:00
Joey Hess
4b739fc460
Fix build on Windows
Thanks to bug reporter for the patch.
2020-11-19 12:33:00 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
885974be99
add newtypes for QuickCheck to avoid LANG=C issues
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.

A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
2020-11-09 20:21:18 -04:00
Joey Hess
1db49497e0
finished this stage of the RawFilePath conversion
This commit was sponsored by Denis Dzyubenko on Patreon.
2020-11-06 14:10:58 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
87f91ce563
more RawFilePath conversion
451/645
2020-10-30 15:55:59 -04:00
Joey Hess
ca80c3154c
more RawFilePath conversion
removeFile changed to removeLink, because AFAICS it should be fine to
remove non-file things here. In particular, it's fine to remove a
symlink, since we're about to write a symlink. (removeLink does not
remove directories, so file, symlink, and unix socket are the only
possibilities.)
2020-10-30 13:07:41 -04:00
Joey Hess
8f452416f7
more RawFilePath conversion
Better to use Git.repoPath to get a filepath, Git.repoLocation does not
always return one.

This commit was sponsored by Ethan Aubin.
2020-10-30 13:00:12 -04:00
Joey Hess
19694fb280
more RawFilePath conversion
At this point I'll be done by new year's.

This commit was sponsored by Ethan Aubin.
2020-10-30 12:51:34 -04:00
Joey Hess
681b44236a
more RawFilePath conversion
at 377/645

This commit was sponsored by Svenne Krap on Patreon.
2020-10-29 14:20:57 -04:00
Joey Hess
b05015f772
fix name of lock file
It was the stringification of a UUID, so "UUID \"foo\""
2020-10-29 10:53:01 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
8d66f7ba0f
more RawFilePath conversion
Added a RawFilePath createDirectory and kept making stuff build.

Up to 296/645

This commit was sponsored by Mark Reidenbach on Patreon.
2020-10-28 17:25:59 -04:00
Joey Hess
b2bf099aa3
use removeDirGeneric here too for consistency
And because it might be more robust on windows.
2020-10-23 16:12:47 -04:00
Joey Hess
4d063f12c6
turns out this was fixed in 2014 2020-10-22 19:54:26 -04:00
Joey Hess
b62e004c2c
update chunk log after speculated chunks are verified to be present
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrives
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Puttting it here, rather than in fsck
avoids breaking an abstraction boundary, and is nice and inexpensive.
2020-10-22 13:37:09 -04:00
Joey Hess
dad4be97c2
speculatively use remote's configured chunk size as a fallback
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.

This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.

Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
2020-10-22 13:11:06 -04:00
Joey Hess
2dd38b6403
switch to Haskell2010
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But it actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"

Fixed two cases where the Haskell98 do indentation flexability let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.

The other place that I think I was concerned about, is GHC manual
section 19.1.1.3. Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-10-19 11:26:16 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
084b502c7a
httpalso: Support being used with special remotes that do not have encryption= in their config. 2020-09-29 13:56:27 -04:00
Joey Hess
5cfcf1f05f
cache remote.log
Unlikely to speed up any of the existing uses much, but I want to use it
in a message that might be displayed many times.
2020-09-22 13:52:26 -04:00
Joey Hess
5844a54869
aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets.
Remove placeholder version number I used when implementing the feature in
aws.

This commit was sponsored by Ethan Aubin.
2020-09-14 18:37:49 -04:00
Joey Hess
ddf963d019
deepseq all things returned from ResourceT http
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.

My thinking is, ResourceT may allocate a resource and then free it,
and a unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-14 18:30:06 -04:00
Joey Hess
6ea511beb4
Removed the S3 and WebDAV build flags
So these special remotes are always supported.

IIRC these build flags were added because the dep chains were a bit too
long, or perhaps because the libraries were not available in Debian stable,
or something like that. That was long ago, those reasons no longer apply,
and users get confused when builtin special remotes are not available, so
it seems best to remove the build flags now.

If this does cause a problem it can be reverted of course..

This commit was sponsored by Jochen Bartl on Patreon.
2020-09-08 12:42:59 -04:00
Joey Hess
eed20fe3b7
fix some file modes in calls to withTmpFileIn to honor umask
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.

Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.

Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
2020-09-02 14:36:08 -04:00
Joey Hess
26724fb331
display actual download errors
Eg, when config prohibits accessing localhost, need to show that
message, not a generic "download failed".
2020-09-02 12:21:10 -04:00
Joey Hess
31e5785bf7
avoid multiple download failed messages when learning
Also only display one progress meter for all download attempts, to avoid
a bunch of blank lines.
2020-09-02 12:01:50 -04:00
Joey Hess
854cd2ad47
httpalso: support exporttree=yes
Also tested what happens if the other special remote has importtree=yes
and exporttree=yes, and in that case, download via httpalso works too,
without needing to implement any importtree methods here.

It might be possible to make it automatically set exporttree=yes if the
--sameas does. Didn't try, will probably be layering issues.

Or perhaps it should be inherited by sameas like some
other configs? But then, wouldn't it also make sense to inherit
importree=yes? But as shown here, it's not needed by this kind of
remote.
2020-09-02 11:26:00 -04:00