Commit graph

1526 commits

Author SHA1 Message Date
Joey Hess
d154e7022e
incremental verification for web special remote
Except when configuration makes curl be used. It did not seem worth
trying to tail the file when curl is downloading.

But when an interrupted download is resumed, it does not read the whole
existing file to hash it. Same reason discussed in
commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long
time with no progress being displayed. And also there's an open http
request, which needs to be consumed; taking a long time to hash the file
might cause it to time out.

Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:02:22 -04:00
Joey Hess
88b63a43fa
distinguish between incremental verification failing and not being done
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 14:38:02 -04:00
Joey Hess
449851225a
refactor
IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.

It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 13:19:02 -04:00
Joey Hess
57b5ec79e7
remove comment
This comment used to be in Crypto, where it made sense, but it does not
really make any sense in Utility.Hash
2021-08-18 13:02:02 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
a38b724bfa
remove unused function 2021-08-10 20:04:17 -04:00
Joey Hess
86bd9ac186
fix missing new lines in processTranscript 2021-08-02 13:42:27 -04:00
Joey Hess
66089e97de
Fix a rounding bug in display of data sizes
Eg, showImprecise 1 1.99 returned "1.1" rather than "2". The 9 rounded
upward to 10, and that was wrongly used as the decimal, rather than
carrying the 1.

Sponsored-by: Jack Hill on Patreon
2021-07-30 09:56:04 -04:00
Joey Hess
b9db859221
addurl: Avoid crashing when used on beegfs.
Sponsored-by: Dartmouth College's DANDI project
2021-07-05 13:02:40 -04:00
Joey Hess
9905ec19a7
add pointer to annex.security.allowed-url-schemes
Sponsored-by: Kevin Mueller on Patreon
2021-07-02 10:53:45 -04:00
Joey Hess
7b6deb1109
display scanning message whenever reconcileStaged has enough files to chew on
Clear visible progress bar first.

Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays it after it's seen a sufficient to know it's
taking a while.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 12:48:30 -04:00
Joey Hess
0434674c85
avoid displaying the scanning annexed files message when repo is not large
Avoids users thinking this scan is a big deal, when it's not in the
majority of repos.

showSideActionAfter has some ugly caveats, since it has to display in
the background of another action. I could not see a better way to do it
and it works fine in this particular case. It also doesn't really belong
in Annex.Concurrent, but cannot go in Messages due to an import loop.

Sponsored-by: Dartmouth College's Datalad project
2021-06-04 13:16:48 -04:00
Joey Hess
4de3351c5c
set cwd rarher than changing current process directory
This is not actually used in git-annex though.
2021-05-12 17:44:22 -04:00
Joey Hess
947d2a10bc
assistant: Fix a crash on startup by avoiding using forkProcess
ghc 8.8.4 seems to have changed something that broke code that has been
successfully using forkProcess since 2012. Likely a change to GC internals.

Since forkProcess has never had clear documentation about how to
use it safely, avoid using it at all. Instead, when git-annex needs to
daemonize itself, re-run the git-annex command, in a new process group
and session.

This commit was sponsored by Luke Shumaker on Patreon.
2021-05-12 15:08:03 -04:00
Joey Hess
4bf7940d6b
fileRef: make paths relative and simplified
Fix behavior of several commands, including reinject, addurl, and rmurl
when given an absolute path to an unlocked file, or a relative path that
leaves and re-enters the repository.

To avoid slowing down all the cases where the paths are already ok
with an unncessary call to getCurrentDirectory, put in an optimisation
in relPathCwdToFile. That will probably also speed up other parts of
git-annex by some small amount, but I have not benchmarked.

Note that I did not convert branchFileRef, because it seems likely that
it will be used with a file that is not provided by the user, so is already
in a sane format. This is certainly true for the way git-annex uses it,
though maybe arguable to the extent Git.Ref is a reusable library.
2021-05-07 13:25:59 -04:00
Joey Hess
73f330a62e
document an important property of relPathCwdToFile 2021-05-07 12:57:54 -04:00
Joey Hess
6136006106
semigroup and monoid instances for DebugSelector
mempty is NoDebugSelector, so it does not default to matching
everything, or nothing, in a chain like foo <> mempty
2021-04-06 15:12:35 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
537f9d9a11
Improved display of errors when accessing a git http remote fails.
New error message:

  Remote foo not usable by git-annex; setting annex-ignore

  http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1

If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-03-24 14:19:32 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
ed684f651e
add incremental hashing interface to Backend
As yet unused.

Backend.External could perhaps implement it too, although that would
involve sending chunks of data to it via a pipe or something, so likely
to be slow.
2021-02-09 15:00:51 -04:00
Joey Hess
97129388d5
support fuzzy matching of addon commands
Note this does find things in PATH that are not executable.
Like searchPath use, the executable bit is not checked. Thing is,
there does not seem to be a binding for access(), which would be the
right way to check that the right execute bit is set. Anyway, if it's in
PATH and it's a file, it's probably fine to treat it as something that
was intended to be executable.

This commit was sponsored by Brock Spratlen on Patreon.
2021-02-02 19:37:09 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
James Cook
6013abe87a
fix build on openbsd 2021-02-01 11:53:31 -04:00
Joey Hess
c35fa6975b
fix handling of implicit and before parens
Fix an oddity in matching options and preferred content expressions such as
"foo (bar or baz)", which was incorrectly handled as if it were "(foo or
bar) and baz)" rather than the intended "foo and (bar or baz)"

Seemed like a change to consume should be able to handle this case
better, but I was having trouble writing it that way, so instead added
a separate pass that inserts the implicit ands explicitly. Also added
several test cases to make sure versions with and without explicit ands
generate the same.
2021-01-28 13:51:07 -04:00
Joey Hess
9b2084f29a
fix problem on windows with newly rewritten prop_relPathDirToFileAbs_basics
Seems that dropDrive on windows only drops eg c:/ but not a leading /
while on linux, it does drop a leading / (which is what it considers
to be equivilant to a drive letter. I had been relying on it to drop
both. So need to drop leading directory separators.

Also, if the quickcheck generated input is eg "c:c:c:c:foo",
dropDrive will only drop the first one, leaving a path that's
still not relative. So instead of using dropDrive, just remove the
colons from the path.
2021-01-22 14:30:48 -04:00
Joey Hess
ba109ce7df
comment typo 2021-01-21 14:13:55 -04:00
Joey Hess
73df633a62
omit inode from ContentIdentifier for directory special remote
Directory special remotes with importtree=yes now avoid unncessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.

A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.

I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.

There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.

This commit was sponsored by Brett Eisenberg on Patreon.
2021-01-19 13:15:07 -04:00
Joey Hess
7eb54bad12
fix prop_relPathDirToFileAbs_basics fail on windows
It was just slapping on a path separator to the front of the path to
make it absolute, but on windows, a path like "//foo/bar" actually
has a network "drive" of "//foo" and so that broke the test case.

Since "a:foo" is a somehow relative path on windows
(who knows how), drop any drive from the input. But dropDrive also drops
any leading path separator, making the input path relative. So now
it should be safe to slapp on a leading path separator.
2021-01-18 13:26:10 -04:00
Joey Hess
fb921cd0b0
fix build warning 2021-01-13 14:48:41 -04:00
Joey Hess
5e39b7eb8d
Windows: Work around win32 length limits when dealing with lock files 2021-01-13 14:38:35 -04:00
Joey Hess
bb4dc3a399
export TestableFilePath constructor
Useful for eg, replicating failures in ghci. No need for this to be a
smart constructor, as long as it's used with valid filepaths, it's ok
and if not the test breaks.
2021-01-13 13:23:35 -04:00
Joey Hess
99ba471209
rewrite prop_relPathDirToFileAbs_basics
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.

Replaced with some tests of the real basics of that function.
2021-01-13 13:23:26 -04:00
Joey Hess
6b13574827
Windows: include= and exclude= containing '/' will also match filenames that are written using '\'
And vice-versa, but it's better to use '/' for portability.

Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
2020-12-15 12:39:34 -04:00
Joey Hess
94b323a8e8
use TotalSize more extensively 2020-12-11 12:10:43 -04:00
Joey Hess
447d798987
export encode_c' 2020-12-09 15:28:45 -04:00
Joey Hess
19777d1c6f
minor improvements
Adding new instance for Integer, and some parsers for more parameters.

The conversion of readish to readMaybe is done because a serialized
exit code cannot contain additional text after the number.
2020-12-09 15:28:11 -04:00
Joey Hess
41f2c308ff
stall detection is working
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.

This commit was sponsored by Luke Shumaker on Patreon.
2020-12-08 15:22:18 -04:00
Joey Hess
794fc72afb
avoid parseDuration succeeding on empty string 2020-12-08 12:51:56 -04:00
Joey Hess
72e5764a87
move TransferrerPool from assistant
This old code will now be useful for git-annex beyond the assistant.

git-annex won't use the CheckTransferrer part, and won't run transferkeys
as a batch process, and will want withTransferrer to not shut down
transferkeys processes. Still, the rest of this is a good fit for what I
need now.

Also removed some dead code, and simplified a little bit.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-12-07 12:50:48 -04:00
Joey Hess
e5b170aa1c
switch back to POSIXTime
turned out not to need Read MeterState
2020-12-04 13:54:33 -04:00
Joey Hess
5a41e46bd4
start on serializing Messages
Json objects not yet handled, and some other special cases, but this is
the bulk of the messages.

For progress meters, POSIXTime does not have a Read instance (or a
suitable Show instance), so had to switch to using a Double for progress
meters.

This commit was sponsored by Ethan Aubin on Patreon.
2020-12-03 13:03:03 -04:00
Joey Hess
92136284b1
avoid hGetMetered 0 closing the handle
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.

I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-01 15:39:22 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
dce0781391
squash remaining build warnings on windows 2020-11-24 12:35:09 -04:00
Joey Hess
804808d569
squash build warnings on windows 2020-11-23 14:00:17 -04:00
Joey Hess
b13c44cccc
convert processTranscript to use hGetLineUntilExitOrEOF
It does use it on both stdout and stderr. It seems unlikely the problem
could really affect stdout, but the unix implementation of it combines
both into a single handle in any case.
2020-11-19 16:36:37 -04:00
Joey Hess
ff0927bde9
converted reads from stderr to use hGetLineUntilExitOrEOF
These are all unlikely to suffer from the inherited stderr fd problem,
but who knows, it could happen.
2020-11-19 16:21:17 -04:00
Joey Hess
728afbc4b1
preserve other headers when adding resume header
It had lost the hAcceptEncoding header that is set as part of the
overriding of http-client's default decompression of compressed files.
Seems likely that would have caused resuming of compressed files to fail
in some cases.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-11-19 14:45:22 -04:00
Joey Hess
b90b9b936d
don't rely on exception for http 416
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.

What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.

This commit was sponsored by Luke Shumaker on Patreon.
2020-11-19 14:44:42 -04:00