git-annex

Author	SHA1	Message	Date
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes don't do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certianly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	8034f2e9bb	factor out IncrementalHasher from IncrementalVerifier	2021-11-09 12:33:22 -04:00
Joey Hess	88b63a43fa	distinguish between incremental verification failing and not being done Sponsored-by: Dartmouth College's DANDI project	2021-08-18 14:38:02 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a conseqence.	2021-07-27 14:07:23 -04:00
Joey Hess	6940d4ad40	chance case of "transfer failed" to match "Transfer stalled"	2021-03-06 17:47:05 -04:00
Joey Hess	4b63e932f3	incremental checksum on upload to ssh or p2p	2021-02-10 12:41:05 -04:00
Joey Hess	94f6210b68	deal with possibility of short read by S.hGet It may read less than requested, and may yield an empty string if the file was somehow shorter than expected.	2021-02-09 22:20:05 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	0540e987b3	improve p2p protocol handling of requested object not available Avoid spurious "verification of content failed" message when downloading content from a ssh or tor remote fails due to the remote no longer having a copy of the content. The P2P protocol already handled this case by sending DATA 0, followed by VALID. But VALID was not really right, because the data is not the requested data. So, send DATA 0, followed by INVALID. Old versions of git-annex handle INVALID the same as VALID in this case. Now new versions avoid displaying an incorrect message. It would be better for the P2P protocol to have a different way to indicate this, like perhaps sending INVALID without DATA. But that would be a breaking change and need a new protocol verison. Since INVALID already is part of the protocol and already needs to be handled, using it for this special case too seems ok, and avoids the complication of another protocol version. This commit was sponsored by Jochen Bartl on Patreon.	2020-12-01 16:05:55 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	ca80c3154c	more RawFilePath conversion removeFile changed to removeLink, because AFAICS it should be fine to remove non-file things here. In particular, it's fine to remove a symlink, since we're about to write a symlink. (removeLink does not remove directories, so file, symlink, and unix socket are the only possibilities.)	2020-10-30 13:07:41 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unncessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	a0e573a3cc	comment typo	2018-11-12 11:53:44 -04:00
Joey Hess	6ecd55a9fa	Fixed some other potential hangs in the P2P protocol Finishes the start made in `983c9d5a53`, by handling the case where `transfer` fails for some other reason, and so the ReadContent callback does not get run. I don't know of a case where `transfer` does fail other than the locking dealt with in that commit, but it's good to have a guarantee. StoreContent and StoreContentTo had a similar problem. Things like `getViaTmp` may decide not to run the transfer action. And `transfer` could certianly fail, if another transfer of the same object was in progress. (Or a different object when annex.pidlock is set.) If the transfer action was not run, the content of the object would not all get consumed, and so would get interpreted as protocol commands, which would not go well. My approach to fixing all of these things is to set a TVar only once all the data in the transfer is known to have been read/written. This way the internals of `transfer`, `getViaTmp` etc don't matter. So in ReadContent, it checks if the transfer completed. If not, as long as it didn't throw an exception, send empty and Invalid data to the callback. On an exception the state of the protocol is unknown so it has to raise ProtoFailureException and close the connection, same as before. In StoreContent, if the transfer did not complete some portion of the DATA has been read, so the protocol is in an unknown state and it has to close the conection as well. (The ProtoFailureMessage used here matches the one in Annex.Transfer, which is the most likely reason. Not ideal to duplicate it..) StoreContent did not ever close the protocol connection before. So this is a protocol change, but only in an exceptional circumstance, and it's not going to break anything, because clients already need to deal with the connection breaking at any point. The way this new behavior looks (here origin has annex.pidlock = true so will only accept one upload to it at a time): git annex copy --to origin -J2 copy x (to origin...) ok copy y (to origin...) Lost connection (fd:25: hGetChar: end of file) This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 14:52:32 -04:00
Joey Hess	983c9d5a53	git-annex-shell: fix transfer hang Fix hang when transferring the same objects to two different clients at the same time. (Or when annex.pidlock is used, two different objects to the same or different clients.) Could also potentially occur if a client was downloading an object and somehow lost connection but that git-annex-shell was still running and holding the transfer lock. This does not guarantee that, if `transfer` fails for some other reason, a DATA response will be made. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-06 13:00:37 -04:00
Joey Hess	0b053b9611	Fix a P2P protocol hang When readContent got Nothing from prepSendAnnex, it did not run its callback, and the callback is what sends the DATA reply. sendContent checks with contentSize that the object file is present, but that doesn't really guarantee that prepSendAnnex won't return Nothing. So, it was possible for a P2P protocol GET to not receive a response, and appear to hang. When what it's really doing is waiting for the next protocol command. This seems most likely to happen when the annex is in direct mode, and the file being requested has been modified. It could also happen in an indirect mode repository if genInodeCache somehow failed. Perhaps due to a race with a drop of the content file. Fixed by making readContent behave the way its spec said it should, and run the callback with L.empty in this case. Note that, it's finee for readContent to send any amount of data to the callback, including L.empty. sendBytes deals with that by making sure it sends exactly the specified number of bytes, aborting the protocol if it's too short. So, when L.empty is sent, the protocol will end up aborting. This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2018-11-02 13:41:50 -04:00
Joey Hess	6134431254	clean P2P protocol shutdown on EOF try 2 Same goal as `b18fb1e343` but without breaking backwards compatability. Just return IO exceptions when running the P2P protocol, so that git-annex-shell can detect eof and avoid the ugly message. This commit was sponsored by Ethan Aubin.	2018-09-25 16:49:59 -04:00
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	46d4316954	implement annex.retry et al Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries. This commit was supported by the NSF-funded DataLad project.	2018-03-29 13:04:07 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	e16b069331	use total size from DATA Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete. Fixed for P2P by making the size sent with DATA be used to update the meter's total size. In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate. When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed. This commit was sponsored by Henrik Riomar on Patreon.	2018-03-12 21:46:58 -04:00
Joey Hess	596af7cbc4	move protocol version stuff to the Net free monad Needs to be in Net not Local, so that Net actions can take the protocol version into account. This commit was sponsored by an anonymous bitcoin donor.	2018-03-12 15:20:51 -04:00
Joey Hess	c81768d425	version the P2P protocol Unfortunately ReceiveMessage didn't handle unknown messages the way it was documented to; client sending VERSION would cause the server to return an ERROR and hang up. Fixed that, but old releases of git-annex use the P2P protocol for tor and will still have that behavior. So, version is not negotiated for Remote.P2P connections, only for Remote.Git connections, which will support VERSION from their first release. There will need to be a later flag day to change Remote.P2P; left a commented out line that is the only thing that will need to be changed then. Version 1 of the P2P protocol is not implemented yet, but updated the docs for the DATA change that will be allowed by that version. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-12 14:36:35 -04:00
Joey Hess	ba53f60801	refactor	2018-03-06 15:14:53 -04:00
Joey Hess	00be07070c	fix build on windows	2016-12-30 12:31:51 -04:00
Joey Hess	b219be5100	refactor	2016-12-30 12:31:17 -04:00
Joey Hess	38f9337e16	Revert "p2p --link now defaults to setting up a bi-directional link" This reverts commit `3037feb1bf`. On second thought, this was an overcomplication of what should be the lowest-level primitive. Let's build bi-directional links at the pairing level with eg magic wormhole.	2016-12-16 18:26:07 -04:00
Joey Hess	3037feb1bf	p2p --link now defaults to setting up a bi-directional link Both the local and remote git repositories get remotes added pointing at one-another. Makes pairing twice as easy! Security: The new LINK command in the protocol can be sent repeatedly, but only by a peer who has authenticated with us. So, it's entirely safe to add a link back to that peer, or to some other peer it knows about. Anything we receive over such a link, the peer could send us over the current connection. There is some risk of being flooded with LINKs, and adding too many remotes. To guard against that, there's a hard cap on the number of remotes that can be set up this way. This will only be a problem if setting up large p2p networks that have exceptional interconnectedness. A new, dedicated authtoken is created when sending LINK. This also allows, in theory, using a p2p network like tor, to learn about links on other networks, like telehash. This commit was sponsored by Bruno BEAUFILS on Patreon.	2016-12-16 16:38:06 -04:00
Joey Hess	16c6333f09	fix build with old ghc	2016-12-10 11:12:18 -04:00
Joey Hess	9dd510bf29	make tor hidden service work when directory watching is not available Avoid crashing when built w/o inotify..	2016-12-09 16:40:47 -04:00
Joey Hess	f7687e0876	only start ref change watcher thread once per P2P connection This is more efficient. Note that the peer will get CHANGED messages for all refs changed since the connection opened, even if those changes happened before it sent NOTIFYCHANGE.	2016-12-09 15:08:54 -04:00
Joey Hess	e152c322f8	refactor ref change watching Added to change notification to P2P protocol. Switched to a TBChan so that a single long-running thread can be started, and serve perhaps intermittent requests for change notifications, without buffering all changes in memory. The P2P runner currently starts up a new thread each times it waits for a change, but that should allow later reusing a thread. Although each connection from a peer will still need a new watcher thread to run. The dependency on stm-chans is more or less free; some stuff in yesod uses it, so it was already indirectly pulled in when building with the webapp. This commit was sponsored by Francois Marier on Patreon.	2016-12-09 15:01:09 -04:00
Joey Hess	bdf2a31424	typo	2016-12-09 12:54:12 -04:00
Joey Hess	71e8cd408e	content removal is supposed to succed if the content was already not present	2016-12-09 12:48:22 -04:00
Joey Hess	38516b2fca	update progress logs in remotedaemon send/receive	2016-12-08 19:56:02 -04:00
Joey Hess	0f4ee4f298	fix memory leak I'm unsure why this fixed it, but it did. Seems to suggest that the memory leak is not due to a bug in my code, but that ghc didn't manage to take full advantage of laziness, or was failing to gc something it could have.	2016-12-08 18:42:52 -04:00
Joey Hess	af41519126	convert P2P runners from Maybe to Either String So we get some useful error messages when things fail. This commit was sponsored by Peter Hogg on Patreon.	2016-12-08 15:47:49 -04:00
Joey Hess	0541f19bea	fix math error that caused resumes to always fail	2016-12-07 15:36:39 -04:00
Joey Hess	db79b69aa0	ReadWriteMode not AppendMode AppendMode does not allow seeking..	2016-12-07 15:24:28 -04:00
Joey Hess	99c36f318c	open file for append, not write, so resuming works WriteMode zeros any existing content, so the seek filled with zeros, and verification failed after download.	2016-12-07 15:06:07 -04:00
Joey Hess	f744bd5391	refactor	2016-12-06 15:43:03 -04:00

1 2

55 commits