Commit graph

29 commits

Author SHA1 Message Date
Joey Hess
0ffc59d341
change retrieveExportWithContentIdentifier to take a list of ContentIdentifier
This partly fixes an issue where there are duplicate files in the
special remote, and the first file gets swapped with another duplicate,
or deleted. The swap case is fixed by this, the deleted case will need
other changes.

This makes retrieveExportWithContentIdentifier take a list of allowed
ContentIdentifier, same as storeExportWithContentIdentifier,
removeExportWithContentIdentifier, and
checkPresentExportWithContentIdentifier.

Of the special remotes that support importtree, borg is a special case
and does not use content identifiers, S3 I assume can't get mixed up
like this, directory certainly has the problem, and adb also appears to
have had the problem.

Sponsored-by: Graham Spencer on Patreon
2022-09-20 13:19:42 -04:00
Yaroslav Halchenko
0151976676
Typo fix unncessary -> unnecessary.
Detected while reading recent CHANGELOG entry but then decided to apply
to entire codebase and docs since why not?
2022-08-20 09:40:19 -04:00
Joey Hess
debcf86029
use RawFilePath version of rename
Some small wins, almost certianly swamped by the system calls, but still
worthwhile progress on the RawFilePath conversion.

Sponsored-by: Erik Bjäreholt on Patreon
2022-06-22 16:47:34 -04:00
Joey Hess
e8a601aa24
incremental verification for retrieval from import remotes
Sponsored-by: Dartmouth College's Datalad project
2022-05-09 15:39:43 -04:00
Joey Hess
69f8e6c7c0
ImportableContentsChunkable
This improves the borg special remote memory usage, by
letting it only load one archive's worth of filenames into memory at a
time, and building up a larger tree out of the chunks.

When a borg repository has many archives, git-annex could easily OOM
before. Now, it will use only memory proportional to the number of
annexed keys in an archive.

Minor implementation wart: Each new chunk re-opens the content
identifier database, and also a new vector clock is used for each chunk.
This is a minor innefficiency only; the use of continuations makes
it hard to avoid, although putting the database handle into a Reader
monad would be one way to fix it.

It may later be possible to extend the ImportableContentsChunkable
interface to remotes that are not third-party populated. However, that
would perhaps need an interface that does not use continuations.

The ImportableContentsChunkable interface currently does not allow
populating the top of the tree with anything other than subtrees. It
would be easy to extend it to allow putting files in that tree, but borg
doesn't need that so I left it out for now.

Sponsored-by: Noam Kremen on Patreon
2021-10-08 13:15:22 -04:00
Joey Hess
9f38ecac1e
borg: Avoid trying to extract xattrs, ACLS, and bsdflags
when retrieving from a borg repository

That broke restoring on linux from a borg backup made on OSX.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-09-03 12:10:14 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
c952c485c8
Fix retrieval of content from borg repos accessed over ssh
It was making the borgrepo path absolute.. even when it was a ssh
repository.

Made BorgRepo a newtype, to guard against accidentially treating it like a
FilePath.

Sponsored-by: Graham Spencer on Patreon
2021-07-15 12:39:24 -04:00
Joey Hess
4a387eda54
fix oops 2021-03-26 14:37:03 -04:00
Joey Hess
f085ae4937
borg: Support importing files that are hard linked in the borg backup
Note that a key with no size field that is hard linked will
result in listImportableContents reporting a file size of 0,
rather than the actual size of the file. One result is that
the progress meter when getting the file will seem to get stuck
at 100%. Another is that the remote's preferred content expression,
if it tries to match against file size, will treat it as an empty file.
I don't see a way to improve the latter behavior, and the former behavior
is a minor enough problem.

This commit was sponsored by Jake Vosloo on Patreon.
2021-03-26 13:29:34 -04:00
Joey Hess
31eb5fddf3
borg: Fix a bug that prevented importing keys of type URL and WORM
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.

The directory special remote also so mangles them. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so a S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-26 12:07:00 -04:00
Joey Hess
a8b837aaef
add git ls-tree --long parser
Not yet used, but allows getting the size of items in the tree fairly
cheaply.

I noticed that CmdLine.Seek uses ls-tree and the feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outwweigh the benefit.
2021-03-23 12:47:00 -04:00
Joey Hess
e10855e723
don't support dropping from thirdPartyPopulated for now
This code I'm reverting works. But it has a problem: The export db and
log and the ContentIdentifier db and log still list the content as being
stored in the remote. So when I ran borg create again and stored the
content in borg again in a new archive, git-annex sync noticed that, but
since it didn't update the tree for the old archives, it then thought
the content that had been removed from them was still in them, and so
git-annex get failed in an ugly way:

	Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched.
	[2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1

	  user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1)

It does not seem worth it to update the git tree for the export when dropping
content, that would make drop of many files very expensive in git tree objects
created. So, let's not support this I suppose..
2020-12-28 16:48:38 -04:00
Joey Hess
69d4b84501
support removing objects from borg 2020-12-28 16:36:52 -04:00
Joey Hess
b16e6fb4e6
borg appendonly config 2020-12-28 16:23:38 -04:00
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
e3d356fe84
borg: add subdir= config
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?

Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
2020-12-23 13:12:11 -04:00
Joey Hess
4254e2297d
implement retrieveExportWithContentIdentifier
Moved out an XXX to a todo

This seems about ready to merge..
2020-12-22 16:16:48 -04:00
Joey Hess
a9d639c5b5
borg can prompt 2020-12-22 15:48:17 -04:00
Joey Hess
df4942e179
notice when an archive that was seen before gets deleted 2020-12-22 15:45:06 -04:00
Joey Hess
523b7143e0
implemented checkPresentExportWithContentIdentifier 2020-12-22 15:34:41 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
5d8e4a7c74
avoid borg list of archives that have been listed before
This makes sync a lot faster in the common case where there's no new
backup.

There's still room for it to be faster. Currently the old imported tree
has to be traversed, to generate the ImportableContents. Which then
gets turned around to generate the new imported tree, which is
identical. So, it would be possible to just return a "no new imports",
or an ImportableContents that has a way to graft in a tree. The latter
is probably too far to go to optimise this, unless other things need it.
The former might be worth it, but it's already pretty fast, since git
ls-tree is pretty fast.
2020-12-22 14:06:40 -04:00
Joey Hess
7f7094a7cb
include borg archive name in tree, use empty ContentIdentifier
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that in importKeys, it checks if a content
identifier is one that's known before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
2020-12-22 11:53:00 -04:00
Joey Hess
bcd55b365c
import from borg is basically working
Still some issues to deal with, see TODO and XXX.

Here's what gets logged, for each key:

cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==

The "!Mj" are base64 encoded borg archive names, since mine were
dates and contained some characters not allowed in cid logs unescaped.
There were archives that each contained the key. This list will grow as
more borg backups are done and learned about.

tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b	x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c	x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af	x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-12-21 16:37:55 -04:00
Joey Hess
ca31d7e54f
refactor
That code was not borg specific, and I can see making more remotes for
other backup software.
2020-12-18 17:08:44 -04:00
Joey Hess
1c054f1cf7
started borg special remote
Still need to implement 3 methods, but importKeyM looks like it will
work well to find annex object files.
2020-12-18 16:56:54 -04:00
Joey Hess
3207e8293b
start borg special remote
Compiles, but unusable so far.
2020-12-18 16:03:51 -04:00