Commit graph

1420 commits

Author SHA1 Message Date
Joey Hess
53744e132d
incremental verification for gitlfs and httpalso
And that should be all the special remotes supporting it on linux now,
except for in the odd edge case here and there.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:17:10 -04:00
Joey Hess
f5e09a1dbe
incremental verification for S3
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:07:00 -04:00
Joey Hess
d154e7022e
incremental verification for web special remote
Except when configuration makes curl be used. It did not seem worth
trying to tail the file when curl is downloading.

But when an interrupted download is resumed, it does not read the whole
existing file to hash it. Same reason discussed in
commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long
time with no progress being displayed. And also there's an open http
request, which needs to be consumed; taking a long time to hash the file
might cause it to time out.

Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:02:22 -04:00
Joey Hess
88b63a43fa
distinguish between incremental verification failing and not being done
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 14:38:02 -04:00
Joey Hess
325bfda12d
refactor 2021-08-18 13:37:00 -04:00
Joey Hess
449851225a
refactor
IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.

It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 13:19:02 -04:00
Joey Hess
f0754a61f5
plumb VerifyConfig into retrieveKeyFile
This fixes the recent reversion that annex.verify is not honored,
because retrieveChunks was passed RemoteVerify baser, but baser
did not have export/import set up.

Sponsored-by: Dartmouth College's DANDI project
2021-08-17 12:43:13 -04:00
Joey Hess
8613770b06
incremental verify for webdav special remote
Sponsored-by: Dartmouth College's DANDI project
2021-08-16 17:29:32 -04:00
Joey Hess
b1622eb932
incremental verify for directory special remote
Added fileRetriever', which will let the remaining special remotes
eventually also support incremental verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 16:51:33 -04:00
Joey Hess
a644f729ce
refactor fileCopier
Sponsored-by: Dartmouth College's DANDI project
2021-08-16 15:56:24 -04:00
Joey Hess
d889ae0c01
move comment 2021-08-16 15:25:06 -04:00
Joey Hess
c4aba8e032
better handling of finishing up incomplete incremental verify
Now it's run in VerifyStage.

I thought about keeping the file handle open, and resuming reading where
tailVerify left off. But that risks leaking open file handles, until the
GC closes them, if the deferred verification does not get resumed. Since
that could perhaps happen if there's an exception somewhere, I decided
that was too unsafe.

Instead, re-open the file, seek, and resume.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 14:52:59 -04:00
Joey Hess
dadbb510f6
incremental hashing for fileRetriever
It uses tailVerify to hash the file while it's being written.

This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.

Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.

The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 15:43:29 -04:00
Joey Hess
7eb3742e4b
incremental verify for chunked remotes
Simply feed each chunk in turn to the incremental verifier.

When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.

Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:42:49 -04:00
Joey Hess
c20358b671
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).

Not done when using chunking, yet.

Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.

Unfortunately, directory uses fileRetriever (except when chunked),
so it is not amoung the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.

Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:20:38 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
a871bcfe77
simplify 2021-08-09 15:17:48 -04:00
Joey Hess
f1176f82a5
rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display
Reasons are same as in commit cee14f147a.
(It was already done when using -J.)

Sponsored-by: Mark Reidenbach on Patreon
2021-08-09 12:06:10 -04:00
Joey Hess
de482c7eeb
move verifyKeyContent to Annex.Verify
The goal is that Database.Keys be able to use it; it can't use
Annex.Content.Presence due to an import loop.

Several other things also needed to be moved to Annex.Verify as a
conseqence.
2021-07-27 14:07:23 -04:00
Joey Hess
e676cd43c0
propagate debugging into remote's Annex monad
This is needed to make the debugging added in
0073384850 actually be displayed when
running git-annex get from a local remote.
2021-07-26 11:40:51 -04:00
Joey Hess
635e7f3e26
split annexLocations
To avoid mistakes like commit 0ccbed4f6f,
be explicit about the two variants of this.

Incidentially avoids a small amount of overhead in calling reverse.

Sponsored-by: Shae Erisson on Patreon
2021-07-16 14:17:56 -04:00
Joey Hess
c952c485c8
Fix retrieval of content from borg repos accessed over ssh
It was making the borgrepo path absolute.. even when it was a ssh
repository.

Made BorgRepo a newtype, to guard against accidentially treating it like a
FilePath.

Sponsored-by: Graham Spencer on Patreon
2021-07-15 12:39:24 -04:00
Joey Hess
df2001aa88
Improve display of errors when transfers fail
Transfers from or to a local git repo could fail without a reason being
given, if the content failed to verify, or if the object file's stat
changed while it was being copied. Now display messages in these cases.

Sponsored-by: Jack Hill on Patreon
2021-06-25 13:17:04 -04:00
Joey Hess
0f73b6d03a
Avoid more than 1 gpg password prompt at the same time
Which could happen occasionally before when concurrency is enabled.
While not much of a problem when it did happen, better to avoid it. Also,
since it seems likely the gpg-agent sometimes fails in such a situation,
this makes it not happen when running a single git-annex command with
concurrency enabled.

This commit was sponsored by Jake Vosloo on Patreon.
2021-04-27 16:36:44 -04:00
Joey Hess
c7a3404b20
add missing whitespace 2021-04-27 15:23:56 -04:00
Joey Hess
f8836306fa
remove "checking remotename" message
This fixes fsck of a remote that uses chunking displaying
(checking remotename) (checking remotename)" for every chunk.

Also, some remotes displayed the message, and others did not, with no
consistency. It was originally displayed only when accessing remotes
that were expensive or might involve a password prompt, I think, but
nothing in the API said when to do it so it became an inconsistent mess.

Originally I thought fsck should always display it. But it only displays
in fsck --from remote, so the user knows the remote is being accessed,
so there is no reason to tell them it's accessing it over and over.

It was also possible for git-annex move to sometimes display it twice,
due to checking if content is present twice. But, the user of move
specifies --from/--to, so it does not need to display when it's
accessing the remote, as the user expects it to access the remote.

git-annex get might display it, but only if the remote also supports
hasKeyCheap, which is really only local git remotes, which didn't
display it always; and in any case nothing displayed it before hasKeyCheap,
which is checked first, so I don't think this needs to display it ever.

mirror is like move. And that's all the main places it would have been
displayed.

This commit was sponsored by Jochen Bartl on Patreon.
2021-04-27 13:05:27 -04:00
Joey Hess
0e830b6bb5
make remoteKeyToRemoteName safer
If it's passed a ConfigKey such as annex.version, avoid returning
an empty remote name and return Nothing instead. Also, foo.bar.baz is
not treated as a remote named "bar".
2021-04-23 13:29:21 -04:00
Joey Hess
fee6504388
fix reversion in recent CoW changes
File handle accidentially left open is both a FD leak and causes the
haskell RTS to reject opening it again with "file is locked".
2021-04-20 11:41:43 -04:00
Joey Hess
58da9f74b7
directory CoW on export
Completing Cow support for directory.
2021-04-14 16:19:43 -04:00
Joey Hess
b86206b553
directory CoW on import 2021-04-14 16:10:09 -04:00
Joey Hess
4b048ca042
directory CoW on store
Not for exports to directory yet though.
2021-04-14 15:11:00 -04:00
Joey Hess
7bb93896af
directory CoW on retrieve
directory: When cp supports reflinks, use it when getting content from a
directory special remote.

Not yet for imports from directory though, and not for store.

Note that, when it's chunked, using cp --reflink would not speed it up, and
when reflink was not supported, would unnecessarily write the chunk to a
file before reading it back in. So, only using a fileRetriever in the
NoChunks case is necessary to keep chunking fast.

fileCopier is told not to verify, because the special remote interface
does not yet support verification in passing. AFAICS, fileCopies can
never return False when not verifying so the added giveup should never
actually happen.
2021-04-14 15:05:12 -04:00
Joey Hess
441f65c2cf
split out Annex.CopyFile
Goal is to use it in Remote.Directory, but also it's nice to shrink Remote.Git.
2021-04-14 14:06:43 -04:00
Joey Hess
2e9d4ac754
fix fastDebug to check if debugging is actually enabled
Had to add to AnnexRead an indication of whether debugging is enabled.

Could have just made setupConsole not install a debug output action that
outputs, and have enableDebug be what installs that, but then in the
common case where there is no debug selector, and so all debug output is
selected, it would run the debug output action every time, which entails
an IORef access. Which would make fastDebug too slow..
2021-04-06 16:28:37 -04:00
Joey Hess
13c090b37a
use fastDebug everywhere it can be used
None of these are likely to yeild a noticable speedup though.
2021-04-06 15:41:24 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
c2f612292a
start splitting out readonly values from AnnexState
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticable.

This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.

The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
2021-04-02 15:51:44 -04:00
Joey Hess
4a387eda54
fix oops 2021-03-26 14:37:03 -04:00
Joey Hess
f085ae4937
borg: Support importing files that are hard linked in the borg backup
Note that a key with no size field that is hard linked will
result in listImportableContents reporting a file size of 0,
rather than the actual size of the file. One result is that
the progress meter when getting the file will seem to get stuck
at 100%. Another is that the remote's preferred content expression,
if it tries to match against file size, will treat it as an empty file.
I don't see a way to improve the latter behavior, and the former behavior
is a minor enough problem.

This commit was sponsored by Jake Vosloo on Patreon.
2021-03-26 13:29:34 -04:00
Joey Hess
31eb5fddf3
borg: Fix a bug that prevented importing keys of type URL and WORM
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.

The directory special remote also so mangles them. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so a S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-26 12:07:00 -04:00
Joey Hess
537f9d9a11
Improved display of errors when accessing a git http remote fails.
New error message:

  Remote foo not usable by git-annex; setting annex-ignore

  http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1

If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-03-24 14:19:32 -04:00
Joey Hess
a8b837aaef
add git ls-tree --long parser
Not yet used, but allows getting the size of items in the tree fairly
cheaply.

I noticed that CmdLine.Seek uses ls-tree and the feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outwweigh the benefit.
2021-03-23 12:47:00 -04:00
Joey Hess
5d75cbcdcf
webdav: deal with buggy webdav servers in renameExport
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.

Then this 4shared server has buggy handling of renames across directories.
While already worked around with for the temp files when storing exports
now being in the same directory as the final filename, that also affected
renameExport when the file moves between directories.

I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to destination and
then still fail.

To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
2021-03-22 13:08:18 -04:00
Joey Hess
0e44c252c8
avoid getting creds from environment during autoenable
When autoenabling special remotes of type S3, weddav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
2021-03-17 09:41:12 -04:00
Joey Hess
3337f7c272
fix exporting when the file is in the top of the repo
takeDirectory "foo" is ".", and that will confuse webdav, so only
use that code path when there is a subdirectory.
2021-03-16 14:17:29 -04:00
Joey Hess
3e41a8f032
move
move use of </> into DavLocation so it always uses unix filepaths due to
imports
2021-03-12 15:19:11 -04:00
Joey Hess
4f49c29d20
webdav: store temp file in same collection as the final export location
This may work better in some webdav server that gets confused at
cross-collection renamed. I don't know, let's find out.

The only real downside of doing this is that the temp files are not all
in the top-level collection, in case an interrupted run leaves one
behind. But that does not seem especially significant.
2021-03-12 14:52:24 -04:00
Joey Hess
1d7fa63149
Added support for git-remote-gcrypt's rsync URIs
Which access a remote using rsync over ssh, and which git pushes to much
more efficiently than ssh urls.

There was some old partial support for rsync URIs from 2013, but it seemed
incomplete, and did not use rsync over ssh. Weird.

I'm not sure if there's any remaining benefit to using the non-rsync url
forms with gcrypt, now that this is implemented? Updated docs to encourage
using the rsync urls.

This commit was sponsored by Svenne Krap on Patreon.
2021-03-09 15:58:09 -04:00
Joey Hess
6940d4ad40
chance case of "transfer failed" to match "Transfer stalled" 2021-03-06 17:47:05 -04:00
Joey Hess
381f203d1a
refactor
Avoiding using a callback simplifies this and should make it easier to
implement incremental checksumming, which will need to happen partly in
writeRetrievedContent and partly in retrieveChunks.
2021-02-16 16:03:28 -04:00