Commit graph

38912 commits

Author SHA1 Message Date
Joey Hess
94b3b8f2d9
update 2020-12-28 17:04:41 -04:00
Joey Hess
d95f647572
tip for probably best use case for borg with git-annex 2020-12-28 17:02:39 -04:00
Joey Hess
e10855e723
don't support dropping from thirdPartyPopulated for now
This code I'm reverting works. But it has a problem: The export db and
log and the ContentIdentifier db and log still list the content as being
stored in the remote. So when I ran borg create again and stored the
content in borg again in a new archive, git-annex sync noticed that, but
since it didn't update the tree for the old archives, it then thought
the content that had been removed from them was still in them, and so
git-annex get failed in an ugly way:

	Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched.
	[2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1

	  user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1)

It does not seem worth it to update the git tree for the export when dropping
content, that would make drop of many files very expensive in git tree objects
created. So, let's not support this I suppose..
2020-12-28 16:48:38 -04:00
Joey Hess
69d4b84501
support removing objects from borg 2020-12-28 16:36:52 -04:00
Joey Hess
bfdaee234f
support removal from thirdPartyPopulated
also some other fixes to thirdPartyPopulated
2020-12-28 16:36:26 -04:00
Joey Hess
b16e6fb4e6
borg appendonly config 2020-12-28 16:23:38 -04:00
Joey Hess
0990d74574
revert recent lockContent change
lockContent should only be done when it's versioned
2020-12-28 16:05:14 -04:00
gb@4a49bb1afcf3d183bba8f07297b0395808768c6c
1ff6499878 Bug report: annex.adjustedbranchrefresh ignored 2020-12-28 20:03:34 +00:00
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
5ce7fce74a
simplify
adjustExportImport' is never called with both isexport and isimport False.
2020-12-28 15:06:47 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
AlbertZeyer
5f99217626 2020-12-28 16:31:09 +00:00
AlbertZeyer
4593f002b7 2020-12-28 16:02:24 +00:00
AlbertZeyer
268df65516 2020-12-28 15:50:54 +00:00
AlbertZeyer
5765a9cbc0 2020-12-28 15:46:52 +00:00
AlbertZeyer
b1cd6981f5 2020-12-28 15:33:08 +00:00
Joey Hess
fe4725d66e
switch to createrepo_c
createrepo used python2 and got removed from debian
2020-12-25 16:32:03 -04:00
eric.w@eee65cd362d995ced72640c7cfae388ae93a4234
a83f6075ec removed 2020-12-23 19:57:49 +00:00
Joey Hess
6280af2901
generate more compact git-annex branch for imports
Especially from borg, where the content identifier logs
all end up being the same identical file!

But also, for other imports, the location tracking logs can,
in some cases, be identical files.

Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
2020-12-23 15:25:16 -04:00
Joey Hess
f8aadbfb9b
avoid grafting in an imported tree when it's not changed
This just avoids some churn in the git-annex branch.
2020-12-23 14:31:14 -04:00
Joey Hess
7916fc98a3
graft in imported tree to avoid gc
Fix a bug that could prevent getting files from an importtree=yes remote,
because the imported tree was allowed to be garbage collected.
2020-12-23 14:27:38 -04:00
Joey Hess
c6e693b25d
remove ContentIndentifiersCidRemoteIndex uniqueness constraint
For reasons explained in the bug report.

Implemented using a persistent migration, which works fine. It may add a
little startup overhead when a remote is enabled that uses this, but
probably un-noticable.

On the next major version, it would be fine to delete this database,
and regenerate it from the git-annex branch information. Then this
change could be reverted.

Did nothing about adding back the data that got dropped from the db
due to the bug. Only the borg special remote was probably affected,
and it's not been released yet. rm -rf .git/annex/cidsdb does work.
2020-12-23 14:03:33 -04:00
Joey Hess
b370e6b0ad
bug 2020-12-23 13:41:02 -04:00
Joey Hess
2e72590a48
avoid using export method when the remote only supports import 2020-12-23 13:40:56 -04:00
Joey Hess
e3d356fe84
borg: add subdir= config
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?

Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
2020-12-23 13:12:11 -04:00
Joey Hess
1574972ba9
make sync --content get from third-party populated remotes like borg 2020-12-23 12:10:39 -04:00
Joey Hess
d239a55bd1
Merge branch 'master' of ssh://git-annex.branchable.com 2020-12-22 16:55:15 -04:00
Joey Hess
91624f918d
devblog 2020-12-22 16:53:40 -04:00
Joey Hess
79729bcd76
todo 2020-12-22 16:37:19 -04:00
Joey Hess
a2fe994ebb
move unimplemented option to todo 2020-12-22 16:28:13 -04:00
Joey Hess
cd4c68924b
merged borg
Still a couple related todos, but it's basically usable now.
2020-12-22 16:22:44 -04:00
Joey Hess
310d3c3823
Merge branch 'borg' 2020-12-22 16:19:32 -04:00
Joey Hess
2335476e1e
todo 2020-12-22 16:19:02 -04:00
Joey Hess
4254e2297d
implement retrieveExportWithContentIdentifier
Moved out an XXX to a todo

This seems about ready to merge..
2020-12-22 16:16:48 -04:00
Joey Hess
a9d639c5b5
borg can prompt 2020-12-22 15:48:17 -04:00
Joey Hess
df4942e179
notice when an archive that was seen before gets deleted 2020-12-22 15:45:06 -04:00
Joey Hess
523b7143e0
implemented checkPresentExportWithContentIdentifier 2020-12-22 15:34:41 -04:00
Joey Hess
f31bdd0b19
todo 2020-12-22 15:01:07 -04:00
Joey Hess
82e43da936
todo 2020-12-22 15:00:11 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
5d8e4a7c74
avoid borg list of archives that have been listed before
This makes sync a lot faster in the common case where there's no new
backup.

There's still room for it to be faster. Currently the old imported tree
has to be traversed, to generate the ImportableContents. Which then
gets turned around to generate the new imported tree, which is
identical. So, it would be possible to just return a "no new imports",
or an ImportableContents that has a way to graft in a tree. The latter
is probably too far to go to optimise this, unless other things need it.
The former might be worth it, but it's already pretty fast, since git
ls-tree is pretty fast.
2020-12-22 14:06:40 -04:00
Joey Hess
06ef1b7d68
improve storage of redundant ContentIdentifiers
When a ContentIdentifier is already recorded, don't add it to the log
again, and avoid updating the log.
2020-12-22 12:03:25 -04:00
Joey Hess
7f7094a7cb
include borg archive name in tree, use empty ContentIdentifier
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that in importKeys, it checks if a content
identifier is one that's known before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
2020-12-22 11:53:00 -04:00
Joey Hess
c2d6f335a6
notes on ImportableContents history not being used for retrieval 2020-12-22 11:24:11 -04:00
eric.w@eee65cd362d995ced72640c7cfae388ae93a4234
ce740fb22e 2020-12-21 20:59:21 +00:00
Joey Hess
bcd55b365c
import from borg is basically working
Still some issues to deal with, see TODO and XXX.

Here's what gets logged, for each key:

cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==

The "!Mj" are base64 encoded borg archive names, since mine were
dates and contained some characters not allowed in cid logs unescaped.
There were archives that each contained the key. This list will grow as
more borg backups are done and learned about.

tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b	x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c	x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af	x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-12-21 16:37:55 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
jkniiv
21302c00ef Added a comment: well done! 2020-12-21 18:14:26 +00:00
Joey Hess
57b03630b3
support thirdPartyPopulated
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
2020-12-21 13:49:47 -04:00