Commit graph

1363 commits

Author SHA1 Message Date
Joey Hess
fa3d71d924
Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself
See my comment in the next commit for some details about why
Verified needs a hash with preimage resistance. As far as tahoe goes,
it's fully cryptographically secure.

I think that bup could also return Verified. However, the Retriever
interface does not currenly support that.
2021-02-09 13:42:16 -04:00
Joey Hess
3a66cd715f
avoid making absolute git remote path relative
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.

Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.

A few things that used fromAbsPath unncessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.

Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
2021-02-08 13:18:01 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
b372d962ae
Added GETGITREMOTENAME to extenal special remote protocol 2021-01-26 12:42:47 -04:00
Joey Hess
b63e3118d7
fix export overwrite on FAT
Don't accept the cid of the temp file that the content has just been
written to as something we will accept if another file has that same
content. There's no reason to, and on FAT, due to mtime resolution,
the test suite hit just such a case.

This fixes a reversion from 73df633a62
which removed inode from the ContentIdentifier.
2021-01-25 13:31:17 -04:00
Joey Hess
73df633a62
omit inode from ContentIdentifier for directory special remote
Directory special remotes with importtree=yes now avoid unncessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.

A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.

I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.

There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.

This commit was sponsored by Brett Eisenberg on Patreon.
2021-01-19 13:15:07 -04:00
Joey Hess
e7134ca1eb
avoid partial functions in Git.Url
After the last commit, it was able to throw errors just due to an
unparseable url. This avoids needing to worry about that, as long
as the call site has already checked that it has a parseable url.
2021-01-18 15:07:23 -04:00
Joey Hess
2aa4fab62a
avoid crashing when there are remotes using unparseable urls
Including the non-standard URI form that git-remote-gcrypt uses for rsync.

Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.

This commit was sponsored by Luke Shumaker on Patreon.
2021-01-18 14:59:08 -04:00
Joey Hess
c8b1fa67b4
Behavior change: --trust-glacier option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

This commit was sponsored by Jake Vosloo on Patreon.
2021-01-07 10:37:43 -04:00
Joey Hess
e10855e723
don't support dropping from thirdPartyPopulated for now
This code I'm reverting works. But it has a problem: The export db and
log and the ContentIdentifier db and log still list the content as being
stored in the remote. So when I ran borg create again and stored the
content in borg again in a new archive, git-annex sync noticed that, but
since it didn't update the tree for the old archives, it then thought
the content that had been removed from them was still in them, and so
git-annex get failed in an ugly way:

	Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched.
	[2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1

	  user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1)

It does not seem worth it to update the git tree for the export when dropping
content, that would make drop of many files very expensive in git tree objects
created. So, let's not support this I suppose..
2020-12-28 16:48:38 -04:00
Joey Hess
69d4b84501
support removing objects from borg 2020-12-28 16:36:52 -04:00
Joey Hess
bfdaee234f
support removal from thirdPartyPopulated
also some other fixes to thirdPartyPopulated
2020-12-28 16:36:26 -04:00
Joey Hess
b16e6fb4e6
borg appendonly config 2020-12-28 16:23:38 -04:00
Joey Hess
0990d74574
revert recent lockContent change
lockContent should only be done when it's versioned
2020-12-28 16:05:14 -04:00
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
5ce7fce74a
simplify
adjustExportImport' is never called with both isexport and isimport False.
2020-12-28 15:06:47 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
2e72590a48
avoid using export method when the remote only supports import 2020-12-23 13:40:56 -04:00
Joey Hess
e3d356fe84
borg: add subdir= config
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?

Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
2020-12-23 13:12:11 -04:00
Joey Hess
4254e2297d
implement retrieveExportWithContentIdentifier
Moved out an XXX to a todo

This seems about ready to merge..
2020-12-22 16:16:48 -04:00
Joey Hess
a9d639c5b5
borg can prompt 2020-12-22 15:48:17 -04:00
Joey Hess
df4942e179
notice when an archive that was seen before gets deleted 2020-12-22 15:45:06 -04:00
Joey Hess
523b7143e0
implemented checkPresentExportWithContentIdentifier 2020-12-22 15:34:41 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
5d8e4a7c74
avoid borg list of archives that have been listed before
This makes sync a lot faster in the common case where there's no new
backup.

There's still room for it to be faster. Currently the old imported tree
has to be traversed, to generate the ImportableContents. Which then
gets turned around to generate the new imported tree, which is
identical. So, it would be possible to just return a "no new imports",
or an ImportableContents that has a way to graft in a tree. The latter
is probably too far to go to optimise this, unless other things need it.
The former might be worth it, but it's already pretty fast, since git
ls-tree is pretty fast.
2020-12-22 14:06:40 -04:00
Joey Hess
7f7094a7cb
include borg archive name in tree, use empty ContentIdentifier
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that in importKeys, it checks if a content
identifier is one that's known before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
2020-12-22 11:53:00 -04:00
Joey Hess
bcd55b365c
import from borg is basically working
Still some issues to deal with, see TODO and XXX.

Here's what gets logged, for each key:

cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==

The "!Mj" are base64 encoded borg archive names, since mine were
dates and contained some characters not allowed in cid logs unescaped.
There were archives that each contained the key. This list will grow as
more borg backups are done and learned about.

tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b	x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c	x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af	x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-12-21 16:37:55 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
Joey Hess
706e2a63fb
fix logic error in thirdPartyPopulated handling 2020-12-21 13:24:07 -04:00
Joey Hess
ca31d7e54f
refactor
That code was not borg specific, and I can see making more remotes for
other backup software.
2020-12-18 17:08:44 -04:00
Joey Hess
1c054f1cf7
started borg special remote
Still need to implement 3 methods, but importKeyM looks like it will
work well to find annex object files.
2020-12-18 16:56:54 -04:00
Joey Hess
3207e8293b
start borg special remote
Compiles, but unusable so far.
2020-12-18 16:03:51 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
f930176d6e
change info from export=yes to exporttree=yes and same for import
for consistency
2020-12-17 17:06:50 -04:00
Joey Hess
e9db382308
avoid redundant set of a S3 verison ID that is already recorded
I think this could cause unnecessary changes to the git-annex branch,
and retrieveExportWithContentIdentifier is now also used for getting
content from importtree=yes remotes, so it would happen more frequently
so let's avoid.
2020-12-17 16:49:17 -04:00
Joey Hess
77aedbef8b
fix call to warnExportImportConflict
That needs a Remote that has the right export/import set up, not the input
Remote, which does not yet.
2020-12-17 16:25:02 -04:00
Joey Hess
f2ecc6e0da
import remotes use ContentIdentifier for getting and checking content
This is better than using the equivilant actions for export remotes,
especially for getting content, since the ContentIdentifier checking
means we can be sure (enough) that the content is valid to not force
verification of content. Which allows getting keys of types that cannot
be verified.

Also, reorganized the internals of adjustExportImport which was becoming
very hard to follow. Now it's clear what each method does in each case.
2020-12-17 15:55:31 -04:00
Joey Hess
5946e7136e
force verification after getting file from export remote
This way, if annex.verify is disabled, it's still checked, since this is
not a key/value store, it has to be checked.
2020-12-17 15:31:22 -04:00
Joey Hess
ceda8c0066
refactor common code 2020-12-17 14:17:09 -04:00
Joey Hess
4d2cd58ee5
provide missing remote actions for importree only remote
Ah, it seemed too easy before when I was implementing importrree only,
and it was because all the key-based actions needed to be handled too.

Mostly copied from isexport, and this works. It does seem that
an import remote could use retrieveExportWithContentIdentifier
rather than retrieveExport, and checkPresentExportWithContentIdentifier
rather than checkPresentExport, which would both be more accurate.
2020-12-17 13:46:34 -04:00
Joey Hess
6c890d62f6
initremote: Prevent enabling encryption with exporttree=yes/importtree=yes
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
2020-12-15 12:08:08 -04:00
Joey Hess
230e1c88a9
improve display 2020-12-14 13:13:53 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
6a11b6fab8
Support special remotes that are configured with importtree=yes but without exporttree=yes
There was no particular reason not to support this, other than maybe a lack
of a use case. One use case would of course be a remote that you want to
avoid overwriting content on. A new use case is the idea of importing from
backups, eg borg, where exporting is not necessarily supported at all.

This commit was sponsored by Brock Spratlen on Patreon.
2020-12-10 13:17:40 -04:00
Joey Hess
63839532c9
remove uses of warningIO
It's not concurrent-output safe, and doesn't support
--json-error-messages.

Using Annex.makeRunner is a bit scary, because what if it's run in a
different thread from an active annex action? Normally the same Annex
state is not used concurrently in several threads, and it's not designed
to be fully concurrency safe. (Annex.Concurrent exists to deal with
that.) I think it will be ok in these simple cases though. Eg,
when buffering a warning message to json, Annex.changeState is used,
and it modifies the MVar in a concurrency safe way.

The only warningIO remaining is not a problem.
2020-12-02 14:57:43 -04:00
Joey Hess
7776677a5f
Fix hang on shutdown of external special remote using ASYNC protocol extension.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.

Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.

The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-30 13:04:02 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00