Commit graph

2374 commits

Author SHA1 Message Date
Joey Hess
04c12aa6df
custom protocol for transferrer
Rather than using Read/Show, which would force me to preserve data types
into the future.

I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.

So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.

(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)

Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)

This commit was sponsored by Jack Hill on Patreon.
2020-12-09 16:13:59 -04:00
Joey Hess
004a4f5fb1
factor out Types.Transferrer 2020-12-09 13:28:49 -04:00
Joey Hess
677003a6df
rename helper
More consistent name with TransferrerPool
2020-12-09 13:24:24 -04:00
Joey Hess
05c0543e8e
move new interface to git-annex transfer
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-12-09 12:33:56 -04:00
Joey Hess
fcc9e01556
finally using transferkeys
Seems to work! Even progress bars. Have not tested prompting or various
error message displays yet.

transferkeys had to be made to operate in different modes for the
Assistant and Annex monads. A bit ugly, but it did relegate that
really ugly Database.Keys.closeDb in transferkeys to only the assistant
code path.

This commit was sponsored by Noam Kremen.
2020-12-07 16:18:26 -04:00
Joey Hess
4c47568876
refactoring
This is groundwork for using git-annex transferkeys to run transfers,
in order to allow stalled transfers to be interrupted and retried.

The new upload and download are closer to what git-annex transferkeys
does, so the plan is to make them use it.

Then things that were left using upload' and download' won't recover
from stalls. Notably, that includes import and export. But
at least get/move/copy will be able to. (Also the assistant hopefully,
but not yet.)

This commit was sponsored by Jake Vosloo on Patreon.
2020-12-07 14:49:17 -04:00
Joey Hess
438d5be1f7
support prompt in message serialization
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.

At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.

This commit was sponsored by Ethan Aubin.
2020-12-04 14:54:09 -04:00
Joey Hess
7a9b618d5d
fix problem with last commit and assistant
liftAnnex blocks all others calls, so avoid using it with a long-duration
call to readResponse.
2020-12-04 12:20:04 -04:00
Joey Hess
cad147cbbf
new protocol for transferkeys, with message serialization
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.

The new protocol is simple read/show and looks like this:

TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True

Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.

emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.

This commit was sponsored by Ethan Aubin
2020-12-03 16:21:20 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
631c8d3e5b
avoid redundant adjusted branch update in sync
sync still does update it if the config would otherwise not, since it
already did.
2020-11-16 15:13:48 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
26cf26caca
Merge branch 'master' into symlink-missing 2020-11-16 10:03:12 -04:00
Joey Hess
5a8d01f63e
examinekey: Added a "file" format variable
For consistency with find, and for easier scripting.
2020-11-16 09:59:11 -04:00
Joey Hess
ccfa9b2dc4
make sync update --unlock-present branch 2020-11-13 15:04:34 -04:00
Joey Hess
e66b7d2e1b
rename to --unlock-present and better reverse adjusting
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
2020-11-13 14:56:43 -04:00
Joey Hess
3899e216af
Merge branch 'master' into symlink-missing 2020-11-13 14:19:45 -04:00
Joey Hess
a30030c4a6
move: Fix a regression in the last release that made move --to not honor numcopies settings
This commit was sponsored by Svenne Krap on Patreon.
2020-11-13 14:19:32 -04:00
Joey Hess
c8e49c5ef5
git-annex adjust --lock-missing
Like --hide-missing the branch does not get updated when content
availability changes.

Seems to basically work, but sync does not update it yet.

Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
2020-11-13 13:39:44 -04:00
Joey Hess
7566aa6bc5
examinekey: Added --migrate-to-backend
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E

While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
2020-11-12 14:09:14 -04:00
Joey Hess
12e32d1dee
examinekey: Added two new format variables: objectpath and objectpointer 2020-11-12 13:02:31 -04:00
Joey Hess
92b7b1964d
add warning on add of annex link
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.

This commit was sponsored by Jake Vosloo on Patreon.
2020-11-10 12:10:51 -04:00
Joey Hess
e81bb05b25
add debug in two unusual situations 2020-11-09 17:52:06 -04:00
Joey Hess
1db49497e0
finished this stage of the RawFilePath conversion
This commit was sponsored by Denis Dzyubenko on Patreon.
2020-11-06 14:10:58 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
5a1e73617d
finished this stage of the RawFilePath conversion
Finally compiles again, and test suite passes.

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-04 14:20:37 -04:00
Joey Hess
4bcb4030a5
more RawFilePath conversion
580/645

This commit was sponsored by Jack Hill on Patreon.
2020-11-03 18:34:27 -04:00
Joey Hess
eb42cd4d46
more RawFilePath conversion
535/645

This commit was sponsored by Brett Eisenberg on Patreon.
2020-11-03 10:11:04 -04:00
Joey Hess
55400a03d3
more RawFilePath conversion
This commit was sponsored by Luke Shumaker on Patreon.
2020-11-02 16:31:28 -04:00
Joey Hess
87f91ce563
more RawFilePath conversion
451/645
2020-10-30 15:55:59 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
a108b00b33
testremote: Display exceptions when tests fail, to aid debugging 2020-10-23 15:41:57 -04:00
Joey Hess
0133b7e5a8
move: Improve resuming a move that was interrupted after the object was transferred
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.

Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.

An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.

This commit was sponsored by Jake Vosloo on Patreon.
2020-10-21 10:31:56 -04:00
Joey Hess
7036d0a4c1
add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan
This commit was sponsored by Jochen Bartl on Patreon.
2020-10-19 15:36:18 -04:00
Joey Hess
2dd38b6403
switch to Haskell2010
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But it actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"

Fixed two cases where the Haskell98 do indentation flexability let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.

The other place that I think I was concerned about, is GHC manual
section 19.1.1.3. Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-10-19 11:26:16 -04:00
Joey Hess
c56efbbdb6
import: Check gitignores when importing trees from special remotes
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-30 10:41:59 -04:00
Joey Hess
0033e08193
avoid a second traversal of the ImportableContents
Do all filtering in one pass.
2020-09-30 10:10:03 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
1610d94776
addurl: Avoid a redundant git ignores check for speed
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
2020-09-29 13:00:41 -04:00
Joey Hess
658ea7ca3c
sync --no-content import from directory special remote
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)

This commit was sponsored by Svenne Krap on Patreon.
2020-09-28 15:29:08 -04:00
Joey Hess
3eaaec3113
consistently use importKey when available
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.

Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.

Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.

This commit was sponsored by Noam Kremen on Patreon.
2020-09-28 15:27:46 -04:00
Joey Hess
8b74f01a26
split ProvidedInfo and UserProvidedInfo
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.

This is more groundwork for matching largefiles while importing,
without downloading content.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-28 12:12:38 -04:00
Joey Hess
00dbe35fbc
allow matching on files whose content is not present
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.

The real point of this is to not need to provide a dummy content file
when matching.

This commit was sponsored by Martin D on Patreon.
2020-09-28 11:17:46 -04:00
Joey Hess
f624876dc2
remove zombie process in file seeking
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.

It would probably be safe to remove the reapZombies now, but let's wait
and so that in its own commit in case it turns out to cause problems.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-25 11:38:42 -04:00
Joey Hess
ca454c47f2
explicitly wait for a git process
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.

This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.

Only remaining zombies are in CmdLine.Seek
2020-09-25 11:03:12 -04:00
Joey Hess
051e16a945
remove debug print 2020-09-24 15:37:39 -04:00
Joey Hess
d89984b121
sync --all avoid unncessary first pass
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.

Thanks to Lukey for identifying the optimisation.

This commit was sponsored by Brock Spratlen on Patreon.
2020-09-24 15:12:09 -04:00
Joey Hess
b45b37b088
wait for first pass to complete before second pass
Otherwise the bloom filter may not be fully populated when the second
pass starts, which could have led to incorrect behavior with --all -J,
probably in very rare circumstances.
2020-09-24 14:23:25 -04:00
Joey Hess
167da965b9
remove obsolete comment 2020-09-24 14:22:56 -04:00
Joey Hess
c1b4d76e6b
make MatchFiles introspectable
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/

Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.

This commit was sponsored by Svenne Krap on Patreon.
2020-09-24 14:01:53 -04:00