git-annex

Author	SHA1	Message	Date
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	f04d9574d6	fix transfer lock file for Download to not include uuid While redundant concurrent transfers were already prevented in most cases, it failed to prevent the case where two different repositories were sending the same content to the same repository. By removing the uuid from the transfer lock file for Download transfers, one repository sending content will block the other one from also sending the same content. In order to interoperate with old git-annex, the old lock file is still locked, as well as locking the new one. That added a lot of extra code and work, and the plan is to eventually stop locking the old lock file, at some point in time when an old git-annex process is unlikely to be running at the same time. Note that in the case of 2 repositories both doing eg `git-annex copy foo --to origin` the output is not that great: copy b (to origin...) transfer already in progress, or unable to take transfer lock git-annex: transfer already in progress, or unable to take transfer lock 97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe)) Transfer failed Perhaps that output could be cleaned up? Anyway, it's a lot better than letting the redundant transfer happen and then failing with an obscure error about a temp file, which is what it did before. And it seems users don't often try to do this, since nobody ever reported this bug to me before. (The "97%" there is actually how far along the other transfer is.) Sponsored-by: Joshua Antonishen on Patreon	2024-03-25 14:47:46 -04:00
Joey Hess	62129f0b24	fix windows transfer lock check If the lock file was not able to be exclusively locked, don't indicate locking failed. I'm pretty sure this was a typo. It goes all the way back to `891c85cd88` where locking was first introduced on windows, and there's no indication of why it would make sense to return True here. Sponsored-by: Leon Schuermann on Patreon	2024-03-25 14:11:25 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety so VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	e7b7ea78af	lift isCryptographicallySecure to Annex Needed for VURL backend. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:14:13 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes, does not do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	aff37fc208	avoid annexFileMode special case This makes annexFileMode be just an application of setAnnexPerm', which avoids having 2 functions that do different versions of the same thing. Fixes some buggy behavior for some combinations of core.sharedRepository and umask. Sponsored-by: Jack Hill on Patreon	2023-04-27 15:58:37 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	24ae4b291c	addurl, importfeed: Fix failure when annex.securehashesonly is set The temporary URL key used for the download, before the real key is generated, was blocked by annex.securehashesonly. Fixed by passing the Backend that will be used for the final key into runTransfer. When a Backend is provided, have preCheckSecureHashes check that, rather than the key being transferred. Sponsored-by: unqueued on Patreon	2023-03-27 15:10:46 -04:00
Joey Hess	579d9b60c1	improve concurrency of move/copy --from --to Use separate stages for download and upload. In the common case where it downloads the file from one remote and then uploads to the other, those are by far the most expensive operations, and there's a decent chance the two remotes bottleneck on different resources. Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads will be started both downloading from the src remote. They will probably finish at the same time. Then two threads will be started uploading to the dst remote. They will probably take the same time as well. Before this change, it would alternate back and forth, bottlenecking on src and dst. With this change, as soon as the two threads start uploading to dst, two more threads are able to start, downloading from src. So bandwidth to both remotes is saturated more often. Other commands that use transferStages only send in one direction at a time. So the worker threads for the other direction will sit idle, and there will be no change in their behavior. Sponsored-by: Dartmouth College's DANDI project	2023-01-24 13:59:39 -04:00
Joey Hess	1abd457e98	push location log updating up to callers of download Prep for move --to --from, which needs to download from a src repo without updating the location log for the local repo, before sending the content on to the dest repo. Note that callers of download' already update the log themselves. See previous commit `a422a056f2` that pushed it up to download from getViaTmpFrom. (Also removed in passing a debug print + readline that I accidentally committed last week on this branch.) Sponsored-by: Dartmouth College's DANDI project	2023-01-23 13:47:41 -04:00
Joey Hess	a3cdff3fd5	add a comment about checkSaneLock See commit `8c2dd7d8ee` for original introduction of it, but needing to spelunk that far back to understand the code is not good.	2021-10-27 14:55:30 -04:00
Joey Hess	55bfa414b3	move transfer already in progress message to warning This makes it be displayed in the error-messages field with --json-error-messages. And with --quiet, it will let it be displayed, which makes sense because it's telling the user why what they requested to do has failed to happen.	2021-10-27 14:46:21 -04:00
Joey Hess	4fef94d764	simplify annex.stalldetection handling RemoteGitConfig parsing looks for annex.stalldetection when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Noam Kremen on Patreon	2021-09-22 10:46:10 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	51c696679f	avoid using temp file size when deciding whether to retry failed transfer When stall detection is enabled, and a transfer is in progress, it would display a doubled message: (transfer already in progress, or unable to take transfer lock) (transfer already in progress, or unable to take transfer lock) That happened because the forward retry decider had a start size of 0, and an end size of whatever amount of the object the other process had downloaded. So it incorrectly thought that the transferrer process had made progress, when it had in fact immediately given up with that message. Instead, use the reported value from the progress meter. If a remote does not report progress, this will mean it doesn't forward retry, in a situation where it used to. But most remotes do report progress, and any remote that does not can be fixed to, by using watchFileSize when downloading. Also, some remotes might preallocate the temp file (eg bittorrent), so relying on statting its size at this level to get progress is dubious. The same change was made to Annex/Transfer.hs, although only Annex/TransferrerPool.hs needed to be changed to avoid the duplicate message. (An alternate fix would have been to start the retry decider with the size of the object file before downloading begins, rather than 0.) Sponsored-by: Brett Eisenberg on Patreon	2021-06-25 12:04:23 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticeable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00
Joey Hess	004a4f5fb1	factor out Types.Transferrer	2020-12-09 13:28:49 -04:00
Joey Hess	41f2c308ff	stall detection is working New config annex.stalldetection, remote.name.annex-stalldetection, which can be used to deal with remotes that stall during transfers, or are sometimes too slow to want to use. This commit was sponsored by Luke Shumaker on Patreon.	2020-12-08 15:22:18 -04:00
Joey Hess	fcc9e01556	finally using transferkeys Seems to work! Even progress bars. Have not tested prompting or various error message displays yet. transferkeys had to be made to operate in different modes for the Assistant and Annex monads. A bit ugly, but it did relegate that really ugly Database.Keys.closeDb in transferkeys to only the assistant code path. This commit was sponsored by Noam Kremen.	2020-12-07 16:18:26 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	e36bae74da	Exposed annex.forward-retry git config One reason is, 5 is an arbitrary number so ought to be configurable. The real reason though, is I wanted to make the man page explain when forward retry can override annex.retry, and having a config made the man page easier to write.	2020-09-04 15:16:40 -04:00
Joey Hess	1a42b2c5a3	combine retry deciders in better way This fixes the problem that, if forwardRetry was checked for the first 5 and decided to retry, the 6th would go to configuredRetry which would see the counter was 6 and so wait retry-delay*2^5 seconds (default 32). Now, it waits for retry-delay before each retry, even when forwardRetry initiated the retry.	2020-09-04 12:48:30 -04:00
Joey Hess	1d244bafbd	Limit retrying of failed transfers when forward progress is being made to 5 To avoid some unusual edge cases where too much retrying could result in far more data transfer than makes sense.	2020-09-04 12:46:37 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has an Annex state for the remote, with no worker pool. So the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. Because, it's reading back from eg, a removable disk to checksum. That will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transfer. So, removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	82186ca58f	annex.jobs=cpus etc Added the ability to run one job per CPU (core), by setting annex.jobs=cpus, or using option --jobs=cpus or -Jcpus. Built with future expansion in mind, including not defaulting matching on Concurrency so more constructors can later be added, and using "cpu" instead of "0".	2019-05-10 13:27:08 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	9127fe4821	add DebugLocks build flag Using the method described in https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell but my own code to implement it, and with callstacks added. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-19 15:02:43 -04:00
Joey Hess	983c9d5a53	git-annex-shell: fix transfer hang Fix hang when transferring the same objects to two different clients at the same time. (Or when annex.pidlock is used, two different objects to the same or different clients.) Could also potentially occur if a client was downloading an object and somehow lost connection but that git-annex-shell was still running and holding the transfer lock. This does not guarantee that, if `transfer` fails for some other reason, a DATA response will be made. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 13:00:37 -04:00
Joey Hess	1a02fc1159	Fix wrong sorting of remotes when using -J It was sorting by uuid, rather than cost! Avoid future bugs of this kind by changing the Ord to primarily compare by cost, with uuid only used when the cost is the same. This commit was supported by the NSF-funded DataLad project.	2018-08-03 13:10:50 -04:00
Joey Hess	db720f6a9c	Display error message when http download fails. * Display error message when http download fails. There's nothing in the http-client library to nicely format a http exception, so in some cases it has to fall back to using show on it. Seems better than just saying "it failed" or only showing the http status code. * Avoid forward retry when 0 bytes were received. forwardRetry was comparing Nothing to Just 0, and so thought there had been progress made when 0 bytes were received. This commit was supported by the NSF-funded DataLad project.	2018-05-08 16:11:45 -04:00
Joey Hess	9ec1d6b077	add units	2018-03-29 13:31:53 -04:00
Joey Hess	961fa377d9	Also do forward retrying in cases where no exception is thrown, but the transfer failed. I think this used to be the case, but it was accidentally lost way back in commit `3887432c54`. Normally, transfers do not throw exceptions, so probably forward retrying was rarely done due to that oversight. This also affects the new annex.retry etc configuration. If a transfer fails, without making any progress, eg because the file is not present on the remote or the remote is not accessible, it will now retry when configuration calls for it. In some cases such a retry is not desirable, for example the remote could be accessible and not have a copy of the file that the local repo thinks it has. I see no way to distinguish such cases from cases where a retry should really be done. So, it'll be up to the user to configure it to work for them.	2018-03-29 13:22:49 -04:00
Joey Hess	46d4316954	implement annex.retry et al Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries. This commit was supported by the NSF-funded DataLad project.	2018-03-29 13:04:07 -04:00
Joey Hess	ed81762c86	avoid compiler warning add type sig so it's clear createtfile returns unit	2018-03-15 13:21:32 -04:00
Joey Hess	10d3b7fc62	Fix reversion introduced in 6.20171214 that caused concurrent transfers to incorrectly fail with "transfer already in progress". Avoid creating transfer info file before transfer lock is created and locked. The wrong order for one thing caused transfer info to be overwritten when a transfer was already in progress. But worse, it caused checkTransfer to see the transfer info, and so lock the transfer lock in order to verify the transfer was not in progress. Which in a concurrent situation, prevented the transferrer from locking the transfer lock, so it failed with "transfer already in progress". Note that the transferinfo command does not lock the transfer lock before creating the transfer info. But, that's only run after recvkey is running, and recvkey does lock the transfer lock, so that seems more or less ok. (Other than being a super complicated legacy mess that the P2P code has mostly obsoleted now.) This commit was supported by the NSF-funded DataLad project.	2018-03-14 18:55:34 -04:00

91 commits