git-annex

Author	SHA1	Message	Date
Joey Hess	095cdc7e83	extend transferrer protocol to send progress bar total size updates New protocol is not back-compat with old one, but it's never been released so that's ok.	2020-12-11 12:42:28 -04:00
Joey Hess	94b323a8e8	use TotalSize more extensively	2020-12-11 12:10:43 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00
Joey Hess	004a4f5fb1	factor out Types.Transferrer	2020-12-09 13:28:49 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatibility code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	41f2c308ff	stall detection is working New config annex.stalldetection, remote.name.annex-stalldetection, which can be used to deal with remotes that stall during transfers, or are sometimes too slow to want to use. This commit was sponsored by Luke Shumaker on Patreon.	2020-12-08 15:22:18 -04:00
Joey Hess	47016fc656	move TransferrerPool from Assistant state to Annex state This commit was sponsored by Graham Spencer on Patreon.	2020-12-07 13:21:35 -04:00
Joey Hess	72e5764a87	move TransferrerPool from assistant This old code will now be useful for git-annex beyond the assistant. git-annex won't use the CheckTransferrer part, and won't run transferkeys as a batch process, and will want withTransferrer to not shut down transferkeys processes. Still, the rest of this is a good fit for what I need now. Also removed some dead code, and simplified a little bit. This commit was sponsored by Mark Reidenbach on Patreon.	2020-12-07 12:50:48 -04:00
Joey Hess	438d5be1f7	support prompt in message serialization That seems to be the last thing needed for message serialization. Although it's only used in the assistant currently, so hard to tell if I forgot something. At this point, it should be possible to start using transferkeys when performing transfers, which will allow killing a transferkeys process if a transfer times out or stalls. But that's for another day. This commit was sponsored by Ethan Aubin.	2020-12-04 14:54:09 -04:00
Joey Hess	31e417f351	finish message serialization of progress meters Any given transfer can only display 1 progress meter at a time, or so this code assumes. In some cases, there are progress meters for different stages of a transfer, perhaps, and that is supported by this. This commit was sponsored by Ethan Aubin.	2020-12-04 13:50:46 -04:00
Joey Hess	cad147cbbf	new protocol for transferkeys, with message serialization Necessarily threw out the old protocol, so if an old git-annex assistant is running, and starts a transferkeys from the new git-annex, it would fail. But, that seems unlikely; the assistant starts up transferkeys processes and then keeps them running. Still, may need to test that scenario. The new protocol is simple read/show and looks like this: TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo")) TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9})) TransferOutput (OutputMessage "(checksum...) ") TransferResult True Granted, this is not optimally fast, but it seems good enough, and is probably nearly as fast as the old protocol anyhow. emitSerializedOutput for ProgressMeter is not yet implemented. It needs to somehow start or update a progress meter. There may need to be a new message that allocates a progress meter, and then have ProgressMeter update it. This commit was sponsored by Ethan Aubin	2020-12-03 16:21:20 -04:00
Joey Hess	82dbc4387c	comments	2020-12-03 14:57:22 -04:00
Joey Hess	e7f42e2ec7	when serializing messages, include json objects This is always done; it's up to the consumer to decide if it wants to output the json objects or the messages. Messages.JSON.finalize changed to not need a JSONOptions. As far as I can see, this does not change its behavior, since addErrorMessage appends to any list that's already there. This commit was sponsored by Ethan Aubin.	2020-12-03 14:47:04 -04:00
Joey Hess	5a41e46bd4	start on serializing Messages Json objects not yet handled, and some other special cases, but this is the bulk of the messages. For progress meters, POSIXTime does not have a Read instance (or a suitable Show instance), so had to switch to using a Double for progress meters. This commit was sponsored by Ethan Aubin on Patreon.	2020-12-03 13:03:03 -04:00
Joey Hess	ca4a928635	add show instance	2020-12-01 15:39:57 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	ccfa9b2dc4	make sync update --unlock-present branch	2020-11-13 15:04:34 -04:00
Joey Hess	e66b7d2e1b	rename to --unlock-present and better reverse adjusting An --unlock-present branch reverses back to a branch where all files that get modified or renamed become locked, even if they were originally unlocked. This is the same way that reversing a --unlock branch works, and the new name makes that commonality clearer.	2020-11-13 14:56:43 -04:00
Joey Hess	c8e49c5ef5	git-annex adjust --lock-missing Like --hide-missing the branch does not get updated when content availability changes. Seems to basically work, but sync does not update it yet. Also, when a file is present and so unlocked, git mv followed by git-annex sync results in the basis branch being updated to contain the file with the new name, unlocked. This seems different than what happens in an adjusted unlocked branch, where the commit propagates back locked. Probably the reverse adjustment code needs to be improved to handle this case.	2020-11-13 13:39:44 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an improvement for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	d89984b121	sync --all avoid unnecessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	c1b4d76e6b	make MatchFiles introspectable matchNeedsFileContent is not used yet, but shows how to add information about terminals. That one would be needed for https://git-annex.branchable.com/todo/sync_fast_import/ Note the tricky bit in Annex.FileMatcher.call where it folds over the included matcher to propagate the information. This commit was sponsored by Svenne Krap on Patreon.	2020-09-24 14:01:53 -04:00
Joey Hess	83df401d93	Merge branch 'batchasync' into master	2020-09-16 13:02:58 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	8471df3b6d	rename Configurable for clarity	2020-09-16 11:16:48 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases multiple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	ddf963d019	deepseq all things returned from ResourceT http Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/ although I don't know if it does. My thinking is, ResourceT may allocate a resource and then free it, and an unforced thunk to that resource could result in reading memory that has since been overwritten by something else, or in a SEGV, depending. While that seems kind of like a bug in ResourceT to me, if it is what's happening, this will avoid it. If it's not, this doesn't really hurt much since the values are all smallish. This commit was sponsored by Graham Spencer on Patreon.	2020-09-14 18:30:06 -04:00
Joey Hess	d120c73302	sync, assistant: When merge.directoryRenames is not set, default it to "false" Works better with automatic merge conflict resolution than git's usual default of "conflict". This is not done when automatic merge conflict resolution is disabled. This commit was sponsored by Mark Reidenbach on Patreon.	2020-09-07 13:50:58 -04:00
Joey Hess	e36bae74da	Exposed annex.forward-retry git config One reason is, 5 is an arbitrary number so ought to be configurable. The real reason though, is I wanted to make the man page explain when forward retry can override annex.retry, and having a config made the man page easier to write.	2020-09-04 15:16:40 -04:00
Joey Hess	a47787934a	remove Show Cipher committed by accident, and could have been a security hole, but since this compiles, it was not	2020-09-01 18:11:22 -04:00
Joey Hess	4c58433c48	avoid using MonadFail in ParseDuration There's no instance for Either String, so that makes it not as useful as it could be, so instead just return an Either String.	2020-08-15 15:53:35 -04:00
Joey Hess	5a5873e052	fix bug caught by test suite	2020-07-31 16:11:50 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted to, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	57cceac569	simplify interface by removing size Add size to the returned key after the fact, unless the remote happened to add it itself.	2020-07-03 14:22:22 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory. I need to check the rest, and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	b651d3ede0	test: Fix some test cases that assumed git's default branch name git is making that configurable, and configuring it globally would break the test suite in a few places. No other part of git-annex assumes any branch name. Renamed a few placeholders to make that clearer. This commit was sponsored by Jake Vosloo on Patreon.	2020-06-23 16:40:51 -04:00
Joey Hess	a1d4c8e4ec	external: SETCREDS include creds in externalConfigChanges This makes the creds get saved, since only things recorded there will be saved. IIRC, unparsedRemoteConfig was not originally available when I implemented this; now that it is things get a bit simpler. More could probably be simplified, is externalConfigChanges needed at all? This does not entirely fix the bugs though, because creds are only embedded when embedcreds=yes, but not when encryption=pubkey is used without embedcreds=yes.	2020-06-16 17:24:24 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	e04a931439	improve transfer stages for some commands move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup actions in separate job pool from uploads. transferStages was confusingly named, it's only useful when doing downloads as then the verify actions can be run concurrently with other downloads. For commands that upload, there will be more concurrency from running cleanup actions in a separate job pool. As for sync, I left it using downloadStages although that's not optimal for the part of a sync that uploads. Perhaps it should use the union of both?	2020-05-26 11:55:50 -04:00
Joey Hess	2de40ba4da	improve docs around uri claiming	2020-05-21 11:58:57 -04:00

615 commits
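Commit 9483b10469 describes the git-annex branch read cache as a short list with the most recently accessed file first. That layout optimises the common case of reading the same log file several times in a row. The following is a minimal sketch of that idea, not git-annex's code. The names `Cache`, `lookupCache`, `insertCache` and `maxCacheSize` are hypothetical. The bound of 2 is an assumption, since the commit only says the list should stay short.

```haskell
-- Hypothetical association-list cache, most recently accessed entry first.
type Cache k v = [(k, v)]

-- Assumed bound; the commit message only says the list must not get long.
maxCacheSize :: Int
maxCacheSize = 2

-- Scan from the front and stop at the first match, so a hit on the most
-- recently used entry costs a single comparison. On a hit, the entry is
-- moved to the front of the list.
lookupCache :: Eq k => k -> Cache k v -> (Maybe v, Cache k v)
lookupCache k cache = case break ((== k) . fst) cache of
  (before, (_, v) : after) -> (Just v, (k, v) : before ++ after)
  (_, [])                  -> (Nothing, cache)

-- Add or replace an entry at the front, evicting the least recently
-- accessed entries beyond maxCacheSize.
insertCache :: Eq k => k -> v -> Cache k v -> Cache k v
insertCache k v cache = take maxCacheSize ((k, v) : filter ((/= k) . fst) cache)
```

Keeping the list tiny is the design point. As the commit notes, each extra entry makes every miss a little slower, so a longer list only pays off if repeated reads of older entries are common.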
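Commit 4c58433c48 switched ParseDuration away from MonadFail because base has no `MonadFail` instance for `Either String`. Returning `Either String` directly lets callers keep the error message. The sketch below shows that shape. The function name, the `1d2h30m` input format and the unit letters are assumptions for illustration. They are not git-annex's actual duration grammar.

```haskell
import Data.Char (isDigit)

-- Hypothetical unit table: suffix letter to number of seconds.
unitSeconds :: Char -> Maybe Integer
unitSeconds 's' = Just 1
unitSeconds 'm' = Just 60
unitSeconds 'h' = Just 3600
unitSeconds 'd' = Just 86400
unitSeconds _   = Nothing

-- Parse strings like "1d2h30m" into seconds, reporting failures as Left
-- with a message instead of going through fail.
parseDuration :: String -> Either String Integer
parseDuration "" = Left "empty duration"
parseDuration s0 = go s0
  where
    go [] = Right 0
    go s = case span isDigit s of
      ([], _)        -> Left ("expected a number in " ++ show s0)
      (_, [])        -> Left ("missing unit in " ++ show s0)
      (ds, u : rest) -> case unitSeconds u of
        Nothing -> Left ("unknown unit " ++ show u ++ " in " ++ show s0)
        Just n  -> (\r -> read ds * n + r) <$> go rest
```

`Either String` is still a `Monad`, so callers can chain several parses in do-notation, and the first `Left` short-circuits with its message intact.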
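Commit 04c12aa6df replaced Read/Show with a custom line protocol for the transferrer. The aim is that the wire format does not lock in Haskell data type details. The commit also notes that an unrecognised request type leads to a protocol error. Here is a minimal sketch of that approach. The `Msg` type, its constructors and the `RESULT`/`TOTALSIZE`/`MSG` keywords are hypothetical. They are not git-annex's SimpleProtocol definitions or its real transferrer messages.

```haskell
import Text.Read (readMaybe)

-- Hypothetical message type. Show is derived only for debugging; the wire
-- format below is written by hand and does not depend on it.
data Msg
  = TransferResult Bool
  | TotalSize Integer
  | OutputMessage String   -- assumed never to contain a newline
  deriving (Eq, Show)

-- One message per line: a keyword, a space, then the payload.
serialize :: Msg -> String
serialize (TransferResult b) = "RESULT " ++ (if b then "1" else "0")
serialize (TotalSize n)      = "TOTALSIZE " ++ show n
serialize (OutputMessage s)  = "MSG " ++ s

-- Anything unrecognised, such as a keyword added by a newer version,
-- yields Nothing, which a caller would report as a protocol error.
deserialize :: String -> Maybe Msg
deserialize l = case break (== ' ') l of
  ("RESULT", " 1")       -> Just (TransferResult True)
  ("RESULT", " 0")       -> Just (TransferResult False)
  ("TOTALSIZE", ' ' : n) -> TotalSize <$> readMaybe n
  ("MSG", ' ' : s)       -> Just (OutputMessage s)
  _                      -> Nothing
```

Renaming a constructor or field in `Msg` leaves the wire format unchanged. With derived Read/Show, the same rename would silently change the protocol.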
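Commit 41f2c308ff added annex.stalldetection for remotes that stall or run too slowly during transfers. The log does not say how the check works. The sketch below shows one plausible approach, under stated assumptions. It samples the cumulative byte count once per fixed period and flags a stall when any period moved fewer than a minimum number of bytes. The function name and the sampling scheme are hypothetical. They are not git-annex's implementation or the syntax of its config value.

```haskell
-- Hypothetical stall check. samples are cumulative byte counts taken once
-- per fixed period, oldest first; minBytes is the least progress per
-- period that counts as "not stalled".
detectStall :: Integer -> [Integer] -> Bool
detectStall minBytes samples = any tooSlow (zip samples (drop 1 samples))
  where
    -- Progress made during one period, compared against the threshold.
    tooSlow (before, after) = after - before < minBytes
```

With fewer than two samples, no full period has elapsed, so the check never reports a stall.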