Commit graph

56 commits

Author SHA1 Message Date
Joey Hess
cda3e85164
make my authorship explicit in the code
This is intended to guard against LLM code theft, which is the current
bubble technology de jour.

Note that authorJoeyHess' with a year older than the year I began
developing git-annex will behave badly, by intention. Eg, it will spin
and eventually crash.

This is not the first anti-LLM protection in git-annex. For example see
9562da790f. That method, while much harder
for an adversary to detect and remove, also complicates code somewhat
significantly, and needs extensions to be enabled. There are also
probably significantly fewer ways to implement that method in Haskell.
This new approach, by contrast, will be easy to add throughout the code
base, with very little effort, and without complicating reading or
maintaining it any more than noticing that yes, I am the author of this
code.

An adversary could of course remove all calls to these functions
before feeding code into their LLM-based laundry facility. I think this
would need to be done manually, or with the help of some fairly advanced
Haskell parsing though. In some cases, authorJoeyHess needs to be
removed, while in other places it needs to be replaced with a value.
Also a monadic use of authorJoeyHess' may involve other added monadic
machinery which would need to be eliminated to keep the code compiling.

Alternatively, an adversary could replace my name with something
innocuous. This would be clear intent to remove author attribution
from my code, even more than running it through an LLM laundry is.

If you work for a large company that is laundering my code through an
LLM, please do us a favor and use your immense privilege to quit and go
do something socially beneficial. I will not explain further
developments of this code in such detail, and you have better things to
do than playing cat and mouse with me as I explore directions such as
extending this approach to the type level.

Sponsored-by: k0ld on Patreon
2023-11-20 12:29:12 -04:00
Joey Hess
df6f9f1ee8
filter out control characters and quote filenames
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.

This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.

A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.

Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.

Sponsored-by: k0ld on Patreon
2023-04-11 14:27:22 -04:00
Joey Hess
8675b2b075
rename memoryUnits
It's not just used for memory sizes.
2022-05-05 15:35:11 -04:00
Joey Hess
1c11dd4793
avoid cursor jitter when updating progress display
When the progress display gets longer, and then shorter again, it causes
the cursor to jitter back and forth. Somehow I never noticed this until
this morning, but then it became intolerable to watch.

To fix it, pad the progress display to the maximum length it's occupied.

Sponsored-by: Svenne Krap on Patreon
2021-10-07 11:16:41 -04:00
Joey Hess
64cac1a721
avoid potentially very long bwlimit delay at start
I first saw this getting with -J2 over ssh, but later saw it also
without the -J2. It was resuming, and the calulated unboundDelay was
many minutes. The first update of the meter jumped to some large value,
because of the resuming, and so it thought the BW was super fast.

Avoid by waiting until the second meter update.

Might be a good idea to also guard for the delay being many seconds
and avoid waiting. But how many? If BW is legitimately super fast, and a
remote happens to read more than a 32kb or so chunk at a time, it could
in theory download megabytes or gigabytes of data before the first meter
update. It would actually be appropriate then to delay for a long time,
if the desired BW was low. Could make up some numbers that are sane now,
but tech may improve.

(BTW, pleased to see bwlimit does work with -J. I had worried that
it might not, if the meter update happened in a different thread than
the downloading, but it's done in the same thread.)

Sponsored-by: Brett Eisenberg on Patreon
2021-09-22 19:23:30 -04:00
Joey Hess
e8496d62e4
improved bwrate limiting implementation
New method is much better. Avoids unrestrained transfer at the beginning
(except for the first block. Keeps right at or a few kb/s below the
configured limit, with very little varation in the actual reported bandwidth.

Removed the /s part of the config as it's not needed.

Ready to merge.

Sponsored-by: Luke Shumaker on Patreon
2021-09-22 15:27:16 -04:00
Joey Hess
18e00500ce
bwlimit
Added annex.bwlimit and remote.name.annex-bwlimit config that works for git
remotes and many but not all special remotes.

This nearly works, at least for a git remote on the same disk. With it set
to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with
occasional spikes to 160 kb/s. So it needs to delay just a bit longer...
I'm unsure why.

However, at the beginning a lot of data flows before it determines the
right bandwidth limit. A granularity of less than 1s would probably improve
that.

And, I don't know yet if it makes sense to have it be 100ks/1s rather than
100kb/s. Is there a situation where the user would want a larger
granularity? Does granulatity need to be configurable at all? I only used that
format for the config really in order to reuse an existing parser.

This can't support for external special remotes, or for ones that
themselves shell out to an external command. (Well, it could, but it
would involve pausing and resuming the child process tree, which seems
very hard to implement and very strange besides.) There could also be some
built-in special remotes that it still doesn't work for, due to them not
having a progress meter whose displays blocks the bandwidth using thread.
But I don't think there are actually any that run a separate thread for
downloads than the thread that displays the progress meter.

Sponsored-by: Graham Spencer on Patreon
2021-09-21 16:58:10 -04:00
Joey Hess
7b6deb1109
display scanning message whenever reconcileStaged has enough files to chew on
Clear visible progress bar first.

Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays it after it's seen a sufficient to know it's
taking a while.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 12:48:30 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
94b323a8e8
use TotalSize more extensively 2020-12-11 12:10:43 -04:00
Joey Hess
e5b170aa1c
switch back to POSIXTime
turned out not to need Read MeterState
2020-12-04 13:54:33 -04:00
Joey Hess
5a41e46bd4
start on serializing Messages
Json objects not yet handled, and some other special cases, but this is
the bulk of the messages.

For progress meters, POSIXTime does not have a Read instance (or a
suitable Show instance), so had to switch to using a Double for progress
meters.

This commit was sponsored by Ethan Aubin on Patreon.
2020-12-03 13:03:03 -04:00
Joey Hess
92136284b1
avoid hGetMetered 0 closing the handle
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.

I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-01 15:39:22 -04:00
Joey Hess
e6d741af79
finish conversion to hGetLineUntilExitOrEOF
started in aafae46bcb
2020-11-18 14:54:02 -04:00
Joey Hess
aafae46bcb
WIP
for https://git-annex.branchable.com/bugs/Buggy_external_special_remote_stalls_after_7245a9e/
2020-11-17 17:31:08 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
4466c1001d
improve slightly
This probably avoids the situation that caused the exception to be
thrown. It also makes sure that both threads end up canceled in the end,
while before the exception from wait outt could have caused errt to
never be waited on.
2020-08-10 16:33:58 -04:00
Joey Hess
c59a51a065
discard any exception thrown while trying to kill worker threads
Since there's a race here, and since Kyle saw an exception leak out,
which I have not been able to reproduce that. See my comment for what
I think might be going on.

Note that, I used tryNonAsync, because it seems a later tryNonAsync
caught the exception. I don't actually understand how it did, as I
understand exception classification, it's the data type, not the way it
was thrown. One possibility is that the async exception may have been wrapped
in some other, non-async exception, and Show displayed it the same way.
2020-08-10 16:24:51 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
aa492bc659
Fix a hang when using git-annex with an old openssh 7.2p2
This does mean a 2 second delay after transfers when using that ssh, but
it's an old and apparently quite weirdly broken version of ssh.
2020-07-22 11:04:33 -04:00
Joey Hess
1f2e2d15e8
async exception safety
Convert to withCreateProcess and concurrently, both of which
handle cleaning up when there's an async exception thrown to the thread
running this.
2020-06-03 13:19:28 -04:00
Joey Hess
322c542b5c
fix ByteString conversion on windows
the encode' and decode' functions on Windows should not apply the
filesystem encoding, which does not work there. Instead, convert to and
from UTF-8.

Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem
encoding, so won't work as expected on windows.
2019-12-18 13:32:56 -04:00
Joey Hess
8ea5f3ff99
explict export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
69cefe8190
followup and display rsync exit status 2019-08-15 14:47:22 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
bebf541aa7
Fix calculation of estimated completion for progress meter.
Was estimating transfer of whole file, not remaining part of it.
2018-03-19 23:26:41 -04:00
Joey Hess
2c05bc9dfd
fix build with old base
Old base (used on android still) lacks tryReadMVar
2018-03-16 12:06:45 -04:00
Joey Hess
ba44ca80e6
Include amount of data transferred in progress display. 2018-03-14 13:39:14 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
9bddc6d5ca
Improve progress display when watching file size, in cases where a transfer does not resume.
This commit was supported by the NSF-funded DataLad project.
2017-05-25 14:30:18 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
2ad06ded7e
force sofar calculation
This could avoid a memory leak. It would only happen when
the meter didn't look at sofar.
2016-12-08 16:28:07 -04:00
Joey Hess
ad5ef51040
more p2p progress meters
Display progress meter on send and receive from remote.

Added a new hGetMetered that can read an exact number of bytes (or
less), updating a meter as it goes.

This commit was sponsored by Andreas on Patreon.
2016-12-07 14:25:01 -04:00
Joey Hess
83ea1cec86
update progress meter when sending to p2p remote
This commit was sponsored by Thom May on Patreon.
2016-12-07 13:37:35 -04:00
Joey Hess
e0fae28c72
Rate limit console progress display updates to 10 per second. Was updating as frequently as changes were reported, up to hundreds of times per second, which used unncessary bandwidth when running git-annex over ssh etc. 2016-09-08 13:17:43 -04:00
Joey Hess
1244eb3770
refactor 2015-11-16 20:27:01 -04:00
Joey Hess
7943442dff
Display progress meter in -J mode when copying from a local git repo, to a local git repo, and from a remote git repo.
Had everything available, just didn't combine the progress meter with the
other places progress is sent to update it. (And to a remote repo already
did show progress.)

Most special remotes should already display progress meters with -J,
same as without it. One exception to this is the web, since it relies on
wget/curl progress display without -J. Still todo..
2015-11-16 19:32:30 -04:00
Joey Hess
addc82dab7 removed all uses of undefined from code base
It's a code smell, can lead to hard to diagnose error messages.
2015-04-19 00:38:29 -04:00
Joey Hess
42281f12d6 bring back --quiet filtering of stdout and stderr, with deadlock fixed
I don't quite understand the cause of the deadlock. It only occurred
when git-annex-shell transferinfo was being spawned over ssh to feed
download transfer progress back. And if I removed this line from
feedprogressback, the deadlock didn't occur:
	bytes <- readSV v

The problem was not a leaked FD, as far as I could see. So what was it?
I don't know.

Anyway, this is a nice clean implementation, that avoids the deadlock.
Just fork off the async threads to handle filtering the stdout and stderr,
and let them clean up their handles whenever they decide to exit.

I've verified that the handles do get promptly closed, although a little
later than I would expect. Presumably that "little later" is what
was making waiting on the threads deadlock.

Despite the late exit, the last line of stdout and stderr appears where
I'd want it to, so I guess this is ok..
2015-04-06 20:20:52 -04:00
Joey Hess
0a89d55269 Fixes a bug in the last release that caused rsync and possibly other commands to hang at the end of a file transfer.
Stderr reader blocks waiting for all stderr, and so blocks the process ever
exiting.

I tried several ways to get around this, but no success yet. For now,
disable the stderr reader entirely.
2015-04-06 17:12:38 -04:00
Joey Hess
30aa902174 relay external special remote stderr through progress suppression machinery (eep!)
It sounds worse than it is. ;)

Some external special remotes may run commands that display progress on
stderr. If git-annex is run with --quiet, this should filter out such
displays while letting the errors through.
2015-04-04 14:54:03 -04:00
Joey Hess
2343f99c85 well along the way to fully quiet --quiet
Came up with a generic way to filter out progress messages while keeping
errors, for commands that use stderr for both.

--json mode will disable command outputs too.
2015-04-04 14:34:03 -04:00
Joey Hess
20fb91a7ad WIP on making --quiet silence progress, and infra for concurrent progress bars 2015-04-03 16:48:30 -04:00
Joey Hess
a787cead35 bittorrent: Fix mojibake introduced in parsing arai2c progress output.
hGetSomeString reads one byte at a time, so unicode bytes are not composed.
The problem comes when outputting that to the console with hPut; that
tried to apply the handle's encoding, and so we get mojibake.

Instead, use ByteStrings, and only convert it to a string for parsing, not
for display.

Note that there are a couple of other things that use hGetSomeString,
which I've left as-is for now.
2015-02-10 12:34:34 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
1c88b59bd0 refactor 2014-12-17 13:21:55 -04:00
Joey Hess
5d946fe3a9 switch from hGetSome to hGet
This should be essentially no-op change for hGetContentsMetered, since it
always gets the entire contents. So the only difference is that each chunk
of the lazy bytestring will always be the full chunk size. So, I'm pretty
sure this is safe. Also, the only current users of hGetContentsMetered are
reading files, so the stream won't block for long in the middle.

The improvement is that hGetUntilMetered will always get some multiple of
the defaultChunkSize. This will allow the S3 multipart code to pick a fixed
size and know that hGetUntilMetered will really get that size.

(cherry picked from commit bd09046291)
2014-11-03 22:11:47 -04:00
Joey Hess
0602b26314 hGetUntilMetered 2014-11-03 18:37:05 -04:00