git-annex

Author	SHA1	Message	Date
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certianly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	e43aaa22be	Merge branch 'p2pflagday'	2021-10-11 15:42:52 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	2e94ba9c70	remove broken code git-annex-shell fsck has never worked, back in commit `1ffb3bb0ba` I discussed maybe adding it one day, but this code has always failed.	2021-10-11 14:59:27 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate gobal config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100ks/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granulatity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't support for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose displays blocks the bandwidth using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	325bfda12d	refactor	2021-08-18 13:37:00 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	a644f729ce	refactor fileCopier Sponsored-by: Dartmouth College's DANDI project	2021-08-16 15:56:24 -04:00
Joey Hess	d889ae0c01	move comment	2021-08-16 15:25:06 -04:00
Joey Hess	e676cd43c0	propagate debugging into remote's Annex monad This is needed to make the debugging added in `0073384850` actually be displayed when running git-annex get from a local remote.	2021-07-26 11:40:51 -04:00
Joey Hess	635e7f3e26	split annexLocations To avoid mistakes like commit `0ccbed4f6f`, be explicit about the two variants of this. Incidentially avoids a small amount of overhead in calling reverse. Sponsored-by: Shae Erisson on Patreon	2021-07-16 14:17:56 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying (checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	441f65c2cf	split out Annex.CopyFile Goal is to use it in Remote.Directory, but also it's nice to shrink Remote.Git.	2021-04-14 14:06:43 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	537f9d9a11	Improved display of errors when accessing a git http remote fails. New error message: Remote foo not usable by git-annex; setting annex-ignore http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1 If git config parse fails, or the git config file is not available at the url, a better error message for that is also shown. This commit was sponsored by Mark Reidenbach on Patreon.	2021-03-24 14:19:32 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, weddav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	48310f2d55	windows build fix from jwodder	2021-02-15 13:35:01 -04:00
Joey Hess	f44d4704c6	incremental checksum for local remotes This benchmarks only slightly faster than the old git-annex. Eg, for a 1 gb file, 14.56s vs 15.57s. (On a ram disk; there would certianly be more of an effect if the file was written to disk and didn't stay in cache.) Commenting out the updateIncremental calls make the same run in 6.31s. May be that overhead in the implementation, other than the actual checksumming, is slowing it down. Eg, MVar access. (I also tried using 10x larger chunks, which did not change the speed.)	2021-02-10 16:05:24 -04:00
Joey Hess	48f63c2798	stop using rsync in fileCopier This is groundwork for calculating checksums while copying, rather than in a separate pass, but that's not done yet. For now, avoid using rsync (and cp on Windows), and instead read and write the file ourselves, with resume handling. Benchmarking vs old git-annex that used rsync, this is faster, at least once the file size is larger than a couple of MB.	2021-02-10 14:44:35 -04:00
Joey Hess	c4c9b99e22	refactoring	2021-02-10 13:38:45 -04:00
Joey Hess	e24ddb8946	Bugfix: fsck --from a ssh remote did not actually check that the content on the remote is not corrupted Changing to the P2P protocol broke this, because preseedTmp copies the local copy of the object to the temp file, and then the P2P transfer sees the right length file and uses it as-is. When git-annex-shell is too old and rsync is used, it did verify the content, and when the local repo does not have the object it did verify the content.	2021-02-10 13:29:12 -04:00
Joey Hess	1c75364eac	fix missing call to check after hard linking This could perhaps have caused a hard link to be made when the content of the object was modified. I don't think that actually happened, because the annexed file would have to be unlocked, with annex.thin, for the object to get modified, and in that case, a hard link is not made. However, to be sure, run the check. Note that it seemed best to run the check only once, although the current implementation is fast and safe to run repeatedly.	2021-02-10 13:07:38 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	230e1c88a9	improve display	2020-12-14 13:13:53 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivilant of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unncessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	0f26782a73	fix windows build more	2020-07-02 12:01:09 -04:00
Joey Hess	00497fd38e	fix windows build	2020-07-02 11:46:26 -04:00
Joey Hess	a76b1ba3d6	local git remote autoinit improvements * Improve display of problems auto-initializing or upgrading local git remotes. * When a local git remote cannot be initialized because it has no git-annex branch or a .noannex file, avoid displaying a message about it.	2020-06-16 13:24:00 -04:00
Joey Hess	2670890b17	convert to withCreateProcess for async exception safety This handles all createProcessSuccess callers, and aside from process pools, the complete conversion of all process running to async exception safety should be complete now. Also, was able to remove from Utility.Process the old API that I now know was not a good idea. And proof it was bad: The code size went down, despite there being a fair bit of boilerplate for some future API to reduce.	2020-06-04 15:45:52 -04:00
Joey Hess	1ee5919d1e	make createProcess calls async exception safe Using cleanupProcess because withCreateProcess cannot run an Annex action, but the effect is the same as using it.	2020-06-03 15:30:30 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	f9ed30de3b	avoid beware of the leopard situation * Display a warning message when a remote uses a protocol, such as git://, that git-annex does not support. Silently skipping such a remote was confusing behavior. It sets annex-ignore, so the warning is only displayed once. * Also display a warning message when a remote, without a known uuid, is located in a directory that does not currently exist, to avoid silently skipping such a remote. This is a bit more debatable, since git-annex get will say, try making repository available. And since it does not set annex-ignore, the warning will be displayed repeatedly. It's also an extreme edge case, I don't think I've ever seen it happen in real life.	2020-05-04 13:01:11 -04:00
Joey Hess	cd1676d604	fix bug involving local git remote and out of date location log get --from, move --from: When used with a local git remote, these used to silently skip files that the location log thought were present on the remote, when the remote actually no longer contained them. Since that behavior could be surprising, now instead display a warning. I got very confused when I encountered this behavior, since it was silently skipping a file I needed that whereis said was on the remote. get without --from already displayed a "unable to access these remotes" message, which while a bit misleading in that the remote is likely accessible, but just doesn't contain the file, at least indicated something went wrong. Having get --from display a warning makes it in line with get w/o --from, so seems certianly ok. It might be there are situations where move --from is used, on eg a whole directory, and the user only wants to move whatever is present in the remote, and is perfectly ok with files that are not present being skipped. So I'm less sure about the new warning being ok there. OTOH, only local git remotes avoiding displaying a warning in that case too, so this just brings them into line with other remotes. (Also note that this makes it a little bit faster when dealing with a lot of files, since it avoids a redundant stat of the file.)	2020-04-21 12:36:58 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unncessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00

1 2 3 4 5 ...

401 commits