Commit graph

621 commits

Author SHA1 Message Date
Joey Hess
6b13574827
Windows: include= and exclude= containing '/' will also match filenames that are written using '\'
And vice-versa, but it's better to use '/' for portability.

Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
2020-12-15 12:39:34 -04:00
Joey Hess
5e094d02d6
avoid using MatchingKey where MatchingFile can be used now
This is actually matching worktree files, and now that a Key can be
provided along with the file when doing that, using MatchingFile
reflects that.
2020-12-14 17:54:25 -04:00
Joey Hess
01527b21d8
add key to FileInfo
MatchingKey is not the thing to use when matching on actual worktreee
files.

Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
2020-12-14 17:42:02 -04:00
Joey Hess
205a837e8a
clarify comment 2020-12-14 16:52:53 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
0c46ee5ce0
simplify transferr protocol 2020-12-11 12:52:22 -04:00
Joey Hess
095cdc7e83
extend transferrer protocol to send progress bar total size updates
New protocol is not back-compat with old one, but it's never been
released so that's ok.
2020-12-11 12:42:28 -04:00
Joey Hess
94b323a8e8
use TotalSize more extensively 2020-12-11 12:10:43 -04:00
Joey Hess
04c12aa6df
custom protocol for transferrer
Rather than using Read/Show, which would force me to preserve data types
into the future.

I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.

So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.

(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)

Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)

This commit was sponsored by Jack Hill on Patreon.
2020-12-09 16:13:59 -04:00
Joey Hess
004a4f5fb1
factor out Types.Transferrer 2020-12-09 13:28:49 -04:00
Joey Hess
05c0543e8e
move new interface to git-annex transfer
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-12-09 12:33:56 -04:00
Joey Hess
41f2c308ff
stall detection is working
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.

This commit was sponsored by Luke Shumaker on Patreon.
2020-12-08 15:22:18 -04:00
Joey Hess
47016fc656
move TransferrerPool from Assistant state to Annex state
This commit was sponsored by Graham Spencer on Patreon.
2020-12-07 13:21:35 -04:00
Joey Hess
72e5764a87
move TransferrerPool from assistant
This old code will now be useful for git-annex beyond the assistant.

git-annex won't use the CheckTransferrer part, and won't run transferkeys
as a batch process, and will want withTransferrer to not shut down
transferkeys processes. Still, the rest of this is a good fit for what I
need now.

Also removed some dead code, and simplified a little bit.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-12-07 12:50:48 -04:00
Joey Hess
438d5be1f7
support prompt in message serialization
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.

At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.

This commit was sponsored by Ethan Aubin.
2020-12-04 14:54:09 -04:00
Joey Hess
31e417f351
finish message serialization of progress meters
Any given transfer can only display 1 progress meter at a time, or so
this code assumes. In some cases, there are progress meters for
different stages of a transfer, perhaps, and that is supported by this.

This commit was sponsored by Ethan Aubin.
2020-12-04 13:50:46 -04:00
Joey Hess
cad147cbbf
new protocol for transferkeys, with message serialization
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.

The new protocol is simple read/show and looks like this:

TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True

Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.

emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.

This commit was sponsored by Ethan Aubin
2020-12-03 16:21:20 -04:00
Joey Hess
82dbc4387c
comments 2020-12-03 14:57:22 -04:00
Joey Hess
e7f42e2ec7
when serializing messages, include json objects
This is done always, it's up to the comsumer to decide if it wants to
output the json objects or the messages.

Messages.JSON.finalize changed to not need a JSONOptions.
As far as I can see, this does not change its behavior,
since addErrorMessage appends to any list that's already there.

This commit was sponsored by Ethan Aubin.
2020-12-03 14:47:04 -04:00
Joey Hess
5a41e46bd4
start on serializing Messages
Json objects not yet handled, and some other special cases, but this is
the bulk of the messages.

For progress meters, POSIXTime does not have a Read instance (or a
suitable Show instance), so had to switch to using a Double for progress
meters.

This commit was sponsored by Ethan Aubin on Patreon.
2020-12-03 13:03:03 -04:00
Joey Hess
ca4a928635
add show instance 2020-12-01 15:39:57 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
ccfa9b2dc4
make sync update --unlock-present branch 2020-11-13 15:04:34 -04:00
Joey Hess
e66b7d2e1b
rename to --unlock-present and better reverse adjusting
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
2020-11-13 14:56:43 -04:00
Joey Hess
c8e49c5ef5
git-annex adjust --lock-missing
Like --hide-missing the branch does not get updated when content
availability changes.

Seems to basically work, but sync does not update it yet.

Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
2020-11-13 13:39:44 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
f45ad178cb
more RawFilePath conversion
At 318/645 after 4k lines of changes

This commit was sponsored by Jake Vosloo on Patreon.
2020-10-29 12:03:50 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
8b74f01a26
split ProvidedInfo and UserProvidedInfo
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.

This is more groundwork for matching largefiles while importing,
without downloading content.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-28 12:12:38 -04:00
Joey Hess
00dbe35fbc
allow matching on files whose content is not present
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.

The real point of this is to not need to provide a dummy content file
when matching.

This commit was sponsored by Martin D on Patreon.
2020-09-28 11:17:46 -04:00
Joey Hess
ace02f41b0
seek: defer matcher check until more info is known
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.

Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.

For --in here, the speedup was less, 5-10% or so.

(both warm cache)

This commit was sponsored by Jack Hill on Patreon.
2020-09-24 17:59:12 -04:00
Joey Hess
d89984b121
sync --all avoid unncessary first pass
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.

Thanks to Lukey for identifying the optimisation.

This commit was sponsored by Brock Spratlen on Patreon.
2020-09-24 15:12:09 -04:00
Joey Hess
c1b4d76e6b
make MatchFiles introspectable
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/

Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.

This commit was sponsored by Svenne Krap on Patreon.
2020-09-24 14:01:53 -04:00
Joey Hess
83df401d93
Merge branch 'batchasync' into master 2020-09-16 13:02:58 -04:00
Joey Hess
77c42782d0
differentiate between concurrency enabled at command line and by git config
The latter should not affect --batch mode.
2020-09-16 11:47:12 -04:00
Joey Hess
8471df3b6d
rename Configurable for clarity 2020-09-16 11:16:48 -04:00
Joey Hess
fcf5d11c63
add "input" field to json output
The use case of this field is mostly to support -J combined with --json.
When that is implemented, a user will be able to look at the field to
determine which of the requests they have sent it corresponds to.

The field typically has a single value in its list, but in some cases
mutliple values (eg 2 command-line params) are combined together and the
list will have more.

Note that json parsing was already non-strict, so old git-annex metadata
--json --batch can be fed json produced by the new git-annex and will
not stumble over the new field.
2020-09-15 16:22:44 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
ddf963d019
deepseq all things returned from ResourceT http
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.

My thinking is, ResourceT may allocate a resource and then free it,
and a unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-14 18:30:06 -04:00
Joey Hess
d120c73302
sync, assistant: When merge.directoryRenames is not set, default it it to "false"
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".

This is not done when automatic merge conflict resolution is disabled.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-07 13:50:58 -04:00
Joey Hess
e36bae74da
Exposed annex.forward-retry git config
One reason is, 5 is an arbitrary number so ought to be configurable.

The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
2020-09-04 15:16:40 -04:00
Joey Hess
a47787934a
remove Show Cipher
committed by accident, and could have been a security hole, but since
this compiles, it was not
2020-09-01 18:11:22 -04:00
Joey Hess
4c58433c48
avoid using MonadFail in ParseDuration
There's no instance for Either String, so that makes it not as useful as
it could be, so instead just return an Either String.
2020-08-15 15:53:35 -04:00
Joey Hess
5a5873e052
fix bug caught by test suite 2020-07-31 16:11:50 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
9483b10469
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.

Especially when the --all optimisation in the previous commit
pre-cached the location log.

This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.

The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.

But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.

Clearly there could be some further benchmarking and tuning here.
2020-07-07 14:18:55 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
57cceac569
simplify interface by removing size
Add size to the returned key after the fact, unless the remote happened
to add it itself.
2020-07-03 14:22:22 -04:00