Commit graph

39220 commits

Author SHA1 Message Date
Joey Hess
e3832af5d5
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-10 16:40:16 -04:00
Joey Hess
dc9376feeb
optimisation
IORef rather than MVar sped up benchmark mentioned in last commit to
13.0s.

This makes me wonder if changing the interface to not need the IORef
either would improve speed further.
2021-02-10 16:39:41 -04:00
Joey Hess
f44d4704c6
incremental checksum for local remotes
This benchmarks only slightly faster than the old git-annex. Eg, for a 1
gb file, 14.56s vs 15.57s. (On a ram disk; there would certianly be
more of an effect if the file was written to disk and didn't stay in
cache.)

Commenting out the updateIncremental calls make the same run in 6.31s.
May be that overhead in the implementation, other than the actual
checksumming, is slowing it down. Eg, MVar access.

(I also tried using 10x larger chunks, which did not change the speed.)
2021-02-10 16:05:24 -04:00
Joey Hess
48f63c2798
stop using rsync in fileCopier
This is groundwork for calculating checksums while copying, rather than
in a separate pass, but that's not done yet. For now, avoid using rsync
(and cp on Windows), and instead read and write the file ourselves, with
resume handling.

Benchmarking vs old git-annex that used rsync, this is faster,
at least once the file size is larger than a couple of MB.
2021-02-10 14:44:35 -04:00
Joey Hess
c4c9b99e22
refactoring 2021-02-10 13:38:45 -04:00
Joey Hess
e24ddb8946
Bugfix: fsck --from a ssh remote did not actually check that the content on the remote is not corrupted
Changing to the P2P protocol broke this, because preseedTmp copies
the local copy of the object to the temp file, and then the P2P transfer
sees the right length file and uses it as-is.

When git-annex-shell is too old and rsync is used, it did verify the
content, and when the local repo does not have the object it did verify the
content.
2021-02-10 13:29:12 -04:00
Joey Hess
6487a75d33
comment 2021-02-10 13:15:00 -04:00
Joey Hess
1c75364eac
fix missing call to check after hard linking
This could perhaps have caused a hard link to be made when the content
of the object was modified. I don't think that actually happened,
because the annexed file would have to be unlocked, with annex.thin, for
the object to get modified, and in that case, a hard link is not made.
However, to be sure, run the check.

Note that it seemed best to run the check only once, although the
current implementation is fast and safe to run repeatedly.
2021-02-10 13:07:38 -04:00
Joey Hess
f08d7688e9
Merge branch 'incrementalhash' 2021-02-10 12:42:17 -04:00
Joey Hess
4b63e932f3
incremental checksum on upload to ssh or p2p 2021-02-10 12:41:05 -04:00
Joey Hess
94f6210b68
deal with possibility of short read by S.hGet
It may read less than requested, and may yield an empty string if the
file was somehow shorter than expected.
2021-02-09 22:20:05 -04:00
yarikoptic
f5c3eb86f9 Added a comment 2021-02-09 22:48:07 +00:00
Joey Hess
86814582e8
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-09 17:06:32 -04:00
Joey Hess
d8598dc3a0
comment 2021-02-09 17:05:56 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
ed684f651e
add incremental hashing interface to Backend
As yet unused.

Backend.External could perhaps implement it too, although that would
involve sending chunks of data to it via a pipe or something, so likely
to be slow.
2021-02-09 15:00:51 -04:00
Joey Hess
fd51b0cd83
comment 2021-02-09 13:42:49 -04:00
Joey Hess
fa3d71d924
Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself
See my comment in the next commit for some details about why
Verified needs a hash with preimage resistance. As far as tahoe goes,
it's fully cryptographically secure.

I think that bup could also return Verified. However, the Retriever
interface does not currenly support that.
2021-02-09 13:42:16 -04:00
nguenthe@0f416ab0ba07a395eb8a0c85732e0105e4970e10
c208f8c3d4 Added a comment 2021-02-09 17:09:17 +00:00
nguenthe@0f416ab0ba07a395eb8a0c85732e0105e4970e10
ed28f4a744 Added a comment 2021-02-09 17:06:25 +00:00
Joey Hess
cbe84b62b9
close 2021-02-08 18:17:59 -04:00
Joey Hess
bb8db94655
reorder 2021-02-08 17:54:29 -04:00
Joey Hess
f36a0c1b13
deal with cabal unpack not preserving execute bit 2021-02-08 14:32:24 -04:00
Joey Hess
8f3554a7a8
close as dup 2021-02-08 14:19:43 -04:00
Joey Hess
6bc24b1bce
2021 2021-02-08 13:44:58 -04:00
Joey Hess
3a66cd715f
avoid making absolute git remote path relative
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.

Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.

A few things that used fromAbsPath unncessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.

Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
2021-02-08 13:18:01 -04:00
Joey Hess
df26c9a492
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-08 11:39:36 -04:00
yarikoptic
2038d839cc Added a comment: why relative path? 2021-02-08 14:47:43 +00:00
yarikoptic
30728860ac removed 2021-02-08 14:46:11 +00:00
yarikoptic
6834093433 Added a comment 2021-02-08 14:44:39 +00:00
Joey Hess
8df2215891
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-05 15:34:04 -04:00
Joey Hess
724c97952b
update 2021-02-05 10:22:42 -04:00
falsifian
be5a0e4b8f 2021-02-05 05:11:24 +00:00
Joey Hess
3d3b55e7f2
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-03 15:57:46 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
7db4e62a90
remove accidental duplicated code
The code in Annex.WorkerStage and Annex.Concurrent was 100% identical.
2021-02-03 15:23:52 -04:00
Joey Hess
135757d64a
automatic stall detection
annex.stalldetection can now be set to "true" to make git-annex do
automatic stall detection when it detects a remote is updating its transfer
progress consistently enough.

This commit was sponsored by Luke Shumaker on Patreon.
2021-02-03 13:33:57 -04:00
gueux
59fa8b7671 Added a comment 2021-02-03 17:20:24 +00:00
gueux
31f5482729 Added a comment 2021-02-03 17:06:24 +00:00
Joey Hess
904689f11b
Merge branch 'master' of ssh://git-annex.branchable.com 2021-02-02 19:39:24 -04:00
Joey Hess
97129388d5
support fuzzy matching of addon commands
Note this does find things in PATH that are not executable.
Like searchPath use, the executable bit is not checked. Thing is,
there does not seem to be a binding for access(), which would be the
right way to check that the right execute bit is set. Anyway, if it's in
PATH and it's a file, it's probably fine to treat it as something that
was intended to be executable.

This commit was sponsored by Brock Spratlen on Patreon.
2021-02-02 19:37:09 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
guardcat
ebb805b611 2021-02-02 21:04:21 +00:00
Joey Hess
aec2cf0abe
addon commands
Seems only fair, that, like git runs git-annex, git-annex runs
git-annex-foo.

Implementation relies on O.forwardOptions, so that any options are passed
through to the addon program. Note that this includes options before the
subcommand, eg: git-annex -cx=y foo

Unfortunately, git-annex eats the --help/-h options.
This is because it uses O.hsubparser, which injects that option into each
subcommand. Seems like this should be possible to avoid somehow, to let
commands display their own --help, instead of the dummy one git-annex
displays.

The two step searching mirrors how git works, it makes finding
git-annex-foo fast when "git annex foo" is run, but will also support fuzzy
matching, once findAllAddonCommands gets implemented.

This commit was sponsored by Dr. Land Raider on Patreon.
2021-02-02 16:32:49 -04:00
Joey Hess
e78d2c9642
move global options handling closer to Command definitions 2021-02-02 15:55:45 -04:00
Joey Hess
4cc65d97fc
refactor 2021-02-02 14:27:42 -04:00
jrollins
627940cf7b Added a comment 2021-02-02 17:43:19 +00:00
Joey Hess
d0fe0c5e10
close old todo 2021-02-02 13:25:58 -04:00
Joey Hess
696ee5d464
close 2021-02-02 13:21:34 -04:00
Joey Hess
233b2ab133
reject 2021-02-02 13:19:28 -04:00