Commit graph

38719 commits

Author SHA1 Message Date
Joey Hess
f31bdd0b19
todo 2020-12-22 15:01:07 -04:00
Joey Hess
82e43da936
todo 2020-12-22 15:00:11 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip listing importable contents when they are unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
5d8e4a7c74
avoid borg list of archives that have been listed before
This makes sync a lot faster in the common case where there's no new
backup.

There's still room for it to be faster. Currently the old imported tree
has to be traversed to generate the ImportableContents, which then
gets turned around to generate the new imported tree, which is
identical. So it would be possible to just return a "no new imports"
result, or an ImportableContents that has a way to graft in a tree. The
latter probably goes too far just to optimise this, unless other things
need it. The former might be worth it, but this is already pretty fast,
since git ls-tree is fast.
2020-12-22 14:06:40 -04:00
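Here is a minimal Python sketch of the idea in the two borg listing commits above (4f9969d0a1 and 5d8e4a7c74). It assumes the remote can cheaply report its current archive names and that the names listed on earlier syncs are remembered. `plan_listing` is a hypothetical name and is not part of git-annex, which is written in Haskell.

```python
from typing import Iterable, List, Optional, Set


def plan_listing(current_archives: Iterable[str],
                 already_listed: Set[str]) -> Optional[List[str]]:
    """Decide which borg archives still need a (slow) listing.

    Hypothetical model: returns None when the set of archives is
    unchanged since the last sync, otherwise the archives that have
    not been listed before.
    """
    current = set(current_archives)
    if current == already_listed:
        # Nothing changed: the previously imported tree can be reused.
        return None
    # Only archives never listed before need listing; sorted so the
    # result is deterministic.
    return sorted(current - already_listed)
```

Returning None models the "no new imports" shortcut discussed in the commit message. An empty list (for example, after an archive was deleted) still signals that something changed, even though no archive needs listing.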
Joey Hess
06ef1b7d68
improve storage of redundant ContentIdentifiers
When a ContentIdentifier is already recorded, don't add it to the log
again, and avoid updating the log.
2020-12-22 12:03:25 -04:00
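The redundant-write check described above can be sketched in Python. A plain dict stands in for the ContentIdentifier log on the git-annex branch; this is an assumption for illustration, and `record_cid` is a hypothetical name. The boolean result tells the caller whether the log needs updating at all.

```python
from typing import Dict, List


def record_cid(log: Dict[str, List[str]], key: str, cid: str) -> bool:
    """Add cid to the identifiers recorded for key, unless already there.

    Returns True if the log changed and must be written back, or False
    if the cid was already recorded, so the log update can be skipped.
    """
    known = log.setdefault(key, [])
    if cid in known:
        return False
    known.append(cid)
    return True
```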
Joey Hess
7f7094a7cb
include borg archive name in tree, use empty ContentIdentifier
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that importKeys checks whether a content
identifier is one that's been seen before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
2020-12-22 11:53:00 -04:00
Joey Hess
c2d6f335a6
notes on ImportableContents history not being used for retrieval 2020-12-22 11:24:11 -04:00
Joey Hess
bcd55b365c
import from borg is basically working
Still some issues to deal with, see TODO and XXX.

Here's what gets logged, for each key:

cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==

The "!Mj..." values are base64-encoded borg archive names, since mine were
dates and contained some characters not allowed unescaped in cid logs.
There were two archives that each contained the key. This list will grow as
more borg backups are done and learned about.

tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b	x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c	x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af	x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-12-21 16:37:55 -04:00
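The cid log line above can be decoded with a short Python sketch. The layout it assumes is inferred from this single example line: a timestamp, a UUID, and then content identifiers joined by ':', where a leading '!' marks a base64-encoded value. `parse_cid_log_line` is a hypothetical helper, not git-annex's own parser.

```python
import base64
from typing import List, Tuple


def parse_cid_log_line(line: str) -> Tuple[str, str, List[str]]:
    """Split a cid log line into (timestamp, uuid, content identifiers).

    Assumed layout: space-separated timestamp and uuid, then the
    content identifiers joined by ':'. A leading '!' marks a value
    that was base64-encoded because it contained unsafe characters.
    """
    timestamp, uuid, cids = line.split(" ", 2)
    decoded: List[str] = []
    for cid in cids.split(":"):
        if cid.startswith("!"):
            decoded.append(base64.b64decode(cid[1:]).decode("utf-8"))
        else:
            decoded.append(cid)
    return timestamp, uuid, decoded
```

Applied to the logged line, it yields the two archive names 2020-12-21T11:33:17 and 2020-12-21T13:07:26. The ':' characters in those names are why they could not be stored unescaped.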
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that importKey now has to add the size to the key if the key is
supposed to have a size. Remote.Directory relied on the importer adding the
size, which is no longer done, so it was changed; it was the only remote
that relied on this. This way, importKey does not need to behave differently
between regular and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
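Adding a size field to a key name can be sketched in Python. The sketch assumes the key layout seen in keys like SHA256E-s30--<hash> in the tree listing of commit bcd55b365c: a backend name, optional '-'-separated fields such as 's30', then '--' and the rest. Real git-annex key parsing handles more field types. `add_size_field` is a hypothetical name, and placing the size field directly after the backend name is also an assumption.

```python
def add_size_field(key: str, size: int) -> str:
    """Return key with an 's<size>' field, unless it already has one.

    Assumed layout: BACKEND[-field...]--NAME, where a size field
    starts with a lowercase 's'.
    """
    fields, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"not a key: {key!r}")
    backend, *extra = fields.split("-")
    if any(f.startswith("s") for f in extra):
        return key  # already has a size field
    # Insert the size field directly after the backend name.
    return "-".join([backend, f"s{size}", *extra]) + "--" + name
```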
Joey Hess
57b03630b3
support thirdPartyPopulated
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
2020-12-21 13:49:47 -04:00
Joey Hess
706e2a63fb
fix logic error in thirdPartyPopulated handling 2020-12-21 13:24:07 -04:00
Joey Hess
ca31d7e54f
refactor
That code was not borg specific, and I can see making more remotes for
other backup software.
2020-12-18 17:08:44 -04:00
Joey Hess
1c054f1cf7
started borg special remote
Still need to implement 3 methods, but importKeyM looks like it will
work well to find annex object files.
2020-12-18 16:56:54 -04:00
Joey Hess
771b6c64f0
Merge branch 'master' into borg 2020-12-18 16:05:09 -04:00
Joey Hess
e0062c4f93
build fix 2020-12-18 16:04:56 -04:00
Joey Hess
3207e8293b
start borg special remote
Compiles, but unusable so far.
2020-12-18 16:03:51 -04:00
Joey Hess
909318dcee
Merge branch 'master' into borg 2020-12-18 15:27:24 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, e.g., a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So most of the import machinery is reused for a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But internally, a git tree is still generated
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, the tree is recorded in the export log and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that git-annex sync does not yet download objects from such
remotes, even when they are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
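The importKey change above, which returns Nothing to skip a location, can be modelled as a filter over the remote's listing. In this Python sketch, None plays the role of Haskell's Nothing. `import_key` and `build_tree` are hypothetical names, and the rule for recognising an annex object path is an assumption based on the tree paths shown in commit bcd55b365c, not git-annex's actual check.

```python
import posixpath
from typing import Dict, Iterable, Optional


def import_key(location: str) -> Optional[str]:
    """Return the key if location looks like an annex object file, else None.

    Assumed rule: the path contains an annex/objects/ component and the
    file name equals its parent directory name, as in
    .../annex/objects/06/mv/KEY/KEY.
    """
    if "/annex/objects/" not in "/" + location:
        return None
    parent, filename = posixpath.split(location)
    if not filename or posixpath.basename(parent) != filename:
        return None
    return filename


def build_tree(locations: Iterable[str]) -> Dict[str, str]:
    """Keep only locations recognised as annex objects; None means skip."""
    tree: Dict[str, str] = {}
    for loc in locations:
        key = import_key(loc)
        if key is not None:
            tree[loc] = key
    return tree
```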
Joey Hess
e998320318
Merge branch 'master' of ssh://git-annex.branchable.com 2020-12-18 15:14:10 -04:00
Joey Hess
f62aee0525
fix handling of importtree-only remotes
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
without exporttree, so until now this code was ok.

(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
2020-12-18 15:13:30 -04:00
Joey Hess
037f8b6863
update 2020-12-18 11:06:23 -04:00
Joey Hess
f930176d6e
change info from export=yes to exporttree=yes and same for import
for consistency
2020-12-17 17:06:50 -04:00
Ilya_Shlyakhter
738d919df3 Added a comment: encryption=onlycreds 2020-12-17 21:01:33 +00:00
Joey Hess
933c86f186
Merge branch 'master' into borg 2020-12-17 16:50:25 -04:00
Joey Hess
e9db382308
avoid redundantly setting an S3 version ID that is already recorded
I think this could cause unnecessary changes to the git-annex branch,
and retrieveExportWithContentIdentifier is now also used for getting
content from importtree=yes remotes, so it would happen more frequently.
So let's avoid it.
2020-12-17 16:49:17 -04:00
Joey Hess
f0a495fa05
Merge branch 'master' into borg 2020-12-17 16:36:15 -04:00
Joey Hess
e5ef8aea9a
Merge branch 'master' of ssh://git-annex.branchable.com 2020-12-17 16:35:10 -04:00
Joey Hess
4c63cab467
todo 2020-12-17 16:30:51 -04:00
Joey Hess
400bdb48db
update warnExportImportConflict for import-only remotes 2020-12-17 16:25:46 -04:00
Joey Hess
77aedbef8b
fix call to warnExportImportConflict
That needs a Remote that has the right export/import set up, not the input
Remote, which does not yet.
2020-12-17 16:25:02 -04:00
Joey Hess
a4451ac391
add missing space 2020-12-17 15:58:14 -04:00
Joey Hess
f2ecc6e0da
import remotes use ContentIdentifier for getting and checking content
This is better than using the equivalent actions for export remotes,
especially for getting content, since the ContentIdentifier check
means we can be sure (enough) that the content is valid without forcing
verification of it. That allows getting keys of types that cannot
be verified.

Also, reorganized the internals of adjustExportImport, which was becoming
very hard to follow. Now it's clear what each method does in each case.
2020-12-17 15:55:31 -04:00
Joey Hess
5946e7136e
force verification after getting file from export remote
This way, the content is still checked even if annex.verify is disabled;
since this is not a key/value store, it has to be checked.
2020-12-17 15:31:22 -04:00
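The verification behaviour described in the two commits above (f2ecc6e0da and 5946e7136e) can be summarised as a small decision function. This Python sketch is a simplified, hypothetical model. The name `retrieval_check` and its three outcomes are illustrative, and it assumes that export remotes refuse keys whose type cannot be verified. It is not git-annex's actual code path.

```python
def retrieval_check(is_import_remote: bool, cid_matches: bool,
                    key_verifiable: bool, annex_verify: bool) -> str:
    """Return 'reject', 'verify' or 'accept' for just-retrieved content.

    Hypothetical model:
    - export remote (not a key/value store): always verify, even when
      annex.verify is disabled; unverifiable keys are rejected.
    - import remote: the ContentIdentifier must match the recorded one;
      after that, verification follows annex.verify, so unverifiable
      keys can still be retrieved.
    """
    if not is_import_remote:
        return "verify" if key_verifiable else "reject"
    if not cid_matches:
        return "reject"
    if annex_verify and key_verifiable:
        return "verify"
    return "accept"
```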
kyle
48da7e002d Added a comment 2020-12-17 19:05:13 +00:00
Joey Hess
ceda8c0066
refactor common code 2020-12-17 14:17:09 -04:00
dscheffy@c203b7661ec8c1ebd53e52627c84536c5f0c9026
c53057caa3 Added a comment 2020-12-17 18:14:16 +00:00
Joey Hess
4d2cd58ee5
provide missing remote actions for importtree only remote
Ah, it seemed too easy before when I was implementing importtree only,
and that was because all the key-based actions needed to be handled too.

Mostly copied from isexport, and this works. It does seem that
an import remote could use retrieveExportWithContentIdentifier
rather than retrieveExport, and checkPresentExportWithContentIdentifier
rather than checkPresentExport, which would both be more accurate.
2020-12-17 13:46:34 -04:00
Joey Hess
1b5cb77acf
importtree only remotes are untrusted, same as exporttree remotes
Importtree only remotes are new; importtree remotes used to always also be
exporttree, so were untrusted.

Since an import remote is one that can be edited by something other than
git-annex, it's clearly not trustworthy at all.
2020-12-17 13:45:07 -04:00
kyle
49c4d471ff Added a comment 2020-12-17 17:41:36 +00:00
Joey Hess
e81e43b829
improve comment 2020-12-17 13:12:52 -04:00
Joey Hess
ef8c36254a
docs for borg special remote
(which does not exist yet)
2020-12-17 13:12:35 -04:00
Joey Hess
e9af56fef1
typo 2020-12-17 12:53:47 -04:00
Joey Hess
53fd1564b1
improve synopsis 2020-12-17 12:51:49 -04:00
Joey Hess
7c7486a45f
response 2020-12-17 12:47:07 -04:00
Joey Hess
c52550a6a8
Merge branch 'master' of ssh://git-annex.branchable.com 2020-12-17 12:45:07 -04:00
Joey Hess
170185fb78
improve docs 2020-12-17 12:32:41 -04:00
Joey Hess
00352ebe37
man page improvement 2020-12-17 12:17:58 -04:00
Joey Hess
26aad24fd3
simplify
As the only blocking operation now is threadDelaySeconds, there is no need
to calculate the actual time and actual expected minimum size.
2020-12-17 12:09:49 -04:00
kyle
54dc20fb74 Added a comment 2020-12-17 16:06:44 +00:00