git-annex

Author	SHA1	Message	Date
Joey Hess	f1c2e18b8d	improve attribution armoring Split out an author parameter, will make it easier to add authors and reads better. Got rid of the function without the copyright year, because an adversary could have mechanically changed the function with a copyright year to the one without, and so bypassed the protection of LLM copyright year hallucination. Sponsored-by: Luke T. Shumaker on Patreon	2023-11-21 11:34:21 -04:00
Joey Hess	cda3e85164	make my authorship explicit in the code This is intended to guard against LLM code theft, which is the current bubble technology du jour. Note that authorJoeyHess' with a year older than the year I began developing git-annex will behave badly, by intention. Eg, it will spin and eventually crash. This is not the first anti-LLM protection in git-annex. For example see `9562da790f`. That method, while much harder for an adversary to detect and remove, also complicates code somewhat significantly, and needs extensions to be enabled. There are also probably significantly fewer ways to implement that method in Haskell. This new approach, by contrast, will be easy to add throughout the code base, with very little effort, and without complicating reading or maintaining it any more than noticing that yes, I am the author of this code. An adversary could of course remove all calls to these functions before feeding code into their LLM-based laundry facility. I think this would need to be done manually, or with the help of some fairly advanced Haskell parsing though. In some cases, authorJoeyHess needs to be removed, while in other places it needs to be replaced with a value. Also a monadic use of authorJoeyHess' may involve other added monadic machinery which would need to be eliminated to keep the code compiling. Alternatively, an adversary could replace my name with something innocuous. This would be clear intent to remove author attribution from my code, even more than running it through an LLM laundry is. If you work for a large company that is laundering my code through an LLM, please do us a favor and use your immense privilege to quit and go do something socially beneficial. I will not explain further developments of this code in such detail, and you have better things to do than playing cat and mouse with me as I explore directions such as extending this approach to the type level. Sponsored-by: k0ld on Patreon	2023-11-20 12:29:12 -04:00
Joey Hess	df6f9f1ee8	filter out control characters and quote filenames Searched for uses of putStr and hPutStr and changed appropriate ones to filter out control characters and quote filenames. This notably does not make find and findkeys quote filenames in their default output, because they should only do that when stdout is not a pipe. A few commands like calckey and lookupkey seem too low-level to make sense to filter output, so skipped those. Also when relaying output from other commands that is not progress output, have git-annex filter out control characters. Sponsored-by: k0ld on Patreon	2023-04-11 14:27:22 -04:00
Joey Hess	8675b2b075	rename memoryUnits It's not just used for memory sizes.	2022-05-05 15:35:11 -04:00
Joey Hess	1c11dd4793	avoid cursor jitter when updating progress display When the progress display gets longer, and then shorter again, it causes the cursor to jitter back and forth. Somehow I never noticed this until this morning, but then it became intolerable to watch. To fix it, pad the progress display to the maximum length it's occupied. Sponsored-by: Svenne Krap on Patreon	2021-10-07 11:16:41 -04:00
Joey Hess	64cac1a721	avoid potentially very long bwlimit delay at start I first saw this when getting with -J2 over ssh, but later saw it also without the -J2. It was resuming, and the calculated unboundDelay was many minutes. The first update of the meter jumped to some large value, because of the resuming, and so it thought the BW was super fast. Avoid by waiting until the second meter update. Might be a good idea to also guard for the delay being many seconds and avoid waiting. But how many? If BW is legitimately super fast, and a remote happens to read more than a 32kb or so chunk at a time, it could in theory download megabytes or gigabytes of data before the first meter update. It would actually be appropriate then to delay for a long time, if the desired BW was low. Could make up some numbers that are sane now, but tech may improve. (BTW, pleased to see bwlimit does work with -J. I had worried that it might not, if the meter update happened in a different thread than the downloading, but it's done in the same thread.) Sponsored-by: Brett Eisenberg on Patreon	2021-09-22 19:23:30 -04:00
Joey Hess	e8496d62e4	improved bwrate limiting implementation New method is much better. Avoids unrestrained transfer at the beginning (except for the first block). Keeps right at or a few kb/s below the configured limit, with very little variation in the actual reported bandwidth. Removed the /s part of the config as it's not needed. Ready to merge. Sponsored-by: Luke Shumaker on Patreon	2021-09-22 15:27:16 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	7b6deb1109	display scanning message whenever reconcileStaged has enough files to chew on Clear visible progress bar first. Removed showSideActionAfter because it can't be used in reconcileStaged (import loop). Instead, it counts the number of files it processes and displays the message after it's seen a sufficient number to know it's taking a while. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 12:48:30 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	94b323a8e8	use TotalSize more extensively	2020-12-11 12:10:43 -04:00
Joey Hess	e5b170aa1c	switch back to POSIXTime turned out not to need Read MeterState	2020-12-04 13:54:33 -04:00
Joey Hess	5a41e46bd4	start on serializing Messages Json objects not yet handled, and some other special cases, but this is the bulk of the messages. For progress meters, POSIXTime does not have a Read instance (or a suitable Show instance), so had to switch to using a Double for progress meters. This commit was sponsored by Ethan Aubin on Patreon.	2020-12-03 13:03:03 -04:00
Joey Hess	92136284b1	avoid hGetMetered 0 closing the handle This is an edge case, which happened to be triggered by the P2P protocol seeing DATA 0. When reading 0 bytes, getting an empty string does not mean the handle has reached EOF. I verified there was in fact a bug, where get of an empty file followed by another file would get the empty file and then fail with "handle is closed". This fixes it. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-01 15:39:22 -04:00
Joey Hess	e6d741af79	finish conversion to hGetLineUntilExitOrEOF started in `aafae46bcb`	2020-11-18 14:54:02 -04:00
Joey Hess	aafae46bcb	WIP for https://git-annex.branchable.com/bugs/Buggy_external_special_remote_stalls_after_7245a9e/	2020-11-17 17:31:08 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	4466c1001d	improve slightly This probably avoids the situation that caused the exception to be thrown. It also makes sure that both threads end up canceled in the end, while before the exception from wait outt could have caused errt to never be waited on.	2020-08-10 16:33:58 -04:00
Joey Hess	c59a51a065	discard any exception thrown while trying to kill worker threads Since there's a race here, and since Kyle saw an exception leak out, which I have not been able to reproduce. See my comment for what I think might be going on. Note that, I used tryNonAsync, because it seems a later tryNonAsync caught the exception. I don't actually understand how it did, as I understand exception classification, it's the data type, not the way it was thrown. One possibility is that the async exception may have been wrapped in some other, non-async exception, and Show displayed it the same way.	2020-08-10 16:24:51 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	aa492bc659	Fix a hang when using git-annex with an old openssh 7.2p2 This does mean a 2 second delay after transfers when using that ssh, but it's an old and apparently quite weirdly broken version of ssh.	2020-07-22 11:04:33 -04:00
Joey Hess	1f2e2d15e8	async exception safety Convert to withCreateProcess and concurrently, both of which handle cleaning up when there's an async exception thrown to the thread running this.	2020-06-03 13:19:28 -04:00
Joey Hess	322c542b5c	fix ByteString conversion on windows the encode' and decode' functions on Windows should not apply the filesystem encoding, which does not work there. Instead, convert to and from UTF-8. Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem encoding, so won't work as expected on windows.	2019-12-18 13:32:56 -04:00
Joey Hess	8ea5f3ff99	explicit export lists Eliminated some dead code. In other cases, exported a currently unused function, since it was a logical part of the API. Of course this improves the API documentation. It may also sometimes let ghc optimize code better, since it can know a function is internal to a module. 364 modules still to go, according to git grep -E 'module [A-Za-z.]+ where'	2019-11-21 16:08:37 -04:00
Joey Hess	69cefe8190	followup and display rsync exit status	2019-08-15 14:47:22 -04:00
Joey Hess	7d51b0c109	import Utility.FileSystemEncoding in Common	2019-01-03 11:37:02 -04:00
Joey Hess	0f6775f1ff	refactor sinkResponseFile and add downloadC Remote.S3 and Remote.Helper.Http both had similar code to sink a http-conduit Response to a file; refactor out sinkResponseFile. downloadC downloads an url to a file using http-conduit, and supports resuming. Falls back to curl to handle urls that http-conduit does not support. This is not used yet, but the goal is to replace download with it. git-annex.cabal: conduit-extra was not actually used for a long time, remove the dep. conduit moves into the main dependency list, but since http-conduit was already in there, and it depends on conduit, that's not really adding a new build dep. This commit was supported by the NSF-funded DataLad project.	2018-04-06 16:07:08 -04:00
Joey Hess	bebf541aa7	Fix calculation of estimated completion for progress meter. Was estimating transfer of whole file, not remaining part of it.	2018-03-19 23:26:41 -04:00
Joey Hess	2c05bc9dfd	fix build with old base Old base (used on android still) lacks tryReadMVar	2018-03-16 12:06:45 -04:00
Joey Hess	ba44ca80e6	Include amount of data transferred in progress display.	2018-03-14 13:39:14 -04:00
Joey Hess	e16b069331	use total size from DATA Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete. Fixed for P2P by making the size sent with DATA be used to update the meter's total size. In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate. When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed. This commit was sponsored by Henrik Riomar on Patreon.	2018-03-12 21:46:58 -04:00
Joey Hess	9bddc6d5ca	Improve progress display when watching file size, in cases where a transfer does not resume. This commit was supported by the NSF-funded DataLad project.	2017-05-25 14:30:18 -04:00
Joey Hess	a1730cd6af	adieu, MissingH Removed dependency on MissingH, instead depending on the split library. After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing. Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size. This commit was sponsored by Shane-o on Patreon.	2017-05-16 01:03:52 -04:00
Joey Hess	2ad06ded7e	force sofar calculation This could avoid a memory leak. It would only happen when the meter didn't look at sofar.	2016-12-08 16:28:07 -04:00
Joey Hess	ad5ef51040	more p2p progress meters Display progress meter on send and receive from remote. Added a new hGetMetered that can read an exact number of bytes (or less), updating a meter as it goes. This commit was sponsored by Andreas on Patreon.	2016-12-07 14:25:01 -04:00
Joey Hess	83ea1cec86	update progress meter when sending to p2p remote This commit was sponsored by Thom May on Patreon.	2016-12-07 13:37:35 -04:00
Joey Hess	e0fae28c72	Rate limit console progress display updates to 10 per second. Was updating as frequently as changes were reported, up to hundreds of times per second, which used unnecessary bandwidth when running git-annex over ssh etc.	2016-09-08 13:17:43 -04:00
Joey Hess	1244eb3770	refactor	2015-11-16 20:27:01 -04:00
Joey Hess	7943442dff	Display progress meter in -J mode when copying from a local git repo, to a local git repo, and from a remote git repo. Had everything available, just didn't combine the progress meter with the other places progress is sent to update it. (And to a remote repo already did show progress.) Most special remotes should already display progress meters with -J, same as without it. One exception to this is the web, since it relies on wget/curl progress display without -J. Still todo..	2015-11-16 19:32:30 -04:00
Joey Hess	addc82dab7	removed all uses of undefined from code base It's a code smell, can lead to hard to diagnose error messages.	2015-04-19 00:38:29 -04:00
Joey Hess	42281f12d6	bring back --quiet filtering of stdout and stderr, with deadlock fixed I don't quite understand the cause of the deadlock. It only occurred when git-annex-shell transferinfo was being spawned over ssh to feed download transfer progress back. And if I removed this line from feedprogressback, the deadlock didn't occur: bytes <- readSV v The problem was not a leaked FD, as far as I could see. So what was it? I don't know. Anyway, this is a nice clean implementation, that avoids the deadlock. Just fork off the async threads to handle filtering the stdout and stderr, and let them clean up their handles whenever they decide to exit. I've verified that the handles do get promptly closed, although a little later than I would expect. Presumably that "little later" is what was making waiting on the threads deadlock. Despite the late exit, the last line of stdout and stderr appears where I'd want it to, so I guess this is ok..	2015-04-06 20:20:52 -04:00
Joey Hess	0a89d55269	Fixes a bug in the last release that caused rsync and possibly other commands to hang at the end of a file transfer. Stderr reader blocks waiting for all stderr, and so blocks the process ever exiting. I tried several ways to get around this, but no success yet. For now, disable the stderr reader entirely.	2015-04-06 17:12:38 -04:00
Joey Hess	30aa902174	relay external special remote stderr through progress suppression machinery (eep!) It sounds worse than it is. ;) Some external special remotes may run commands that display progress on stderr. If git-annex is run with --quiet, this should filter out such displays while letting the errors through.	2015-04-04 14:54:03 -04:00
Joey Hess	2343f99c85	well along the way to fully quiet --quiet Came up with a generic way to filter out progress messages while keeping errors, for commands that use stderr for both. --json mode will disable command outputs too.	2015-04-04 14:34:03 -04:00
Joey Hess	20fb91a7ad	WIP on making --quiet silence progress, and infra for concurrent progress bars	2015-04-03 16:48:30 -04:00
Joey Hess	a787cead35	bittorrent: Fix mojibake introduced in parsing aria2c progress output. hGetSomeString reads one byte at a time, so unicode bytes are not composed. The problem comes when outputting that to the console with hPut; that tried to apply the handle's encoding, and so we get mojibake. Instead, use ByteStrings, and only convert it to a string for parsing, not for display. Note that there are a couple of other things that use hGetSomeString, which I've left as-is for now.	2015-02-10 12:34:34 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	1c88b59bd0	refactor	2014-12-17 13:21:55 -04:00
Joey Hess	5d946fe3a9	switch from hGetSome to hGet This should be essentially no-op change for hGetContentsMetered, since it always gets the entire contents. So the only difference is that each chunk of the lazy bytestring will always be the full chunk size. So, I'm pretty sure this is safe. Also, the only current users of hGetContentsMetered are reading files, so the stream won't block for long in the middle. The improvement is that hGetUntilMetered will always get some multiple of the defaultChunkSize. This will allow the S3 multipart code to pick a fixed size and know that hGetUntilMetered will really get that size. (cherry picked from commit `bd09046291`)	2014-11-03 22:11:47 -04:00
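
Commit `bebf541aa7` in the log above fixes the estimated completion time so that it is based on the remaining part of a transfer rather than the whole file. A minimal sketch of that arithmetic follows. It is an illustration only: `estimateRemaining` and its parameters are hypothetical names chosen here, not identifiers from git-annex.

```haskell
-- Illustrative sketch only; these names are hypothetical and are not
-- taken from git-annex's source.

-- | Seconds left in a transfer, given the total size in bytes, the
-- bytes transferred so far, and the current rate in bytes per second.
-- Returns Nothing when the rate is not positive, since no estimate
-- is possible then.
estimateRemaining :: Integer -> Integer -> Double -> Maybe Double
estimateRemaining total sofar rate
    | rate <= 0 = Nothing
    | otherwise = Just (fromIntegral remaining / rate)
  where
    -- Only the part not yet transferred counts. Dividing `total` by
    -- the rate instead would estimate the time for the whole file.
    remaining = max 0 (total - sofar)
```

With a 1000 byte file, 750 bytes already transferred, and a rate of 50 bytes per second, `estimateRemaining 1000 750 50` gives `Just 5.0`, where an estimate over the whole file would give 20 seconds.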

57 commits
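
Commit `1c11dd4793` in the log above avoids cursor jitter by padding the progress display to the maximum length it has occupied. One way to sketch that idea is shown below. `padProgress` is a hypothetical helper written for illustration, not the function git-annex uses, and it measures width with `length`, which assumes one display column per character.

```haskell
-- Illustrative sketch only; `padProgress` is a hypothetical name.

-- | Given the widest progress line displayed so far and a newly
-- rendered line, return the updated maximum width together with the
-- line padded with trailing spaces to that width.
padProgress :: Int -> String -> (Int, String)
padProgress widest line = (widest', line ++ replicate pad ' ')
  where
    widest' = max widest (length line)
    -- Never negative, because widest' >= length line.
    pad = widest' - length line
```

The caller would thread the returned width into the next update, so a line that shrinks is still written at the old width and overwrites any stale characters.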