This leaves git annex unused --from remote still using loggedKeysFor
and buffering more than ought to be necessary, but I can't see a way to
improve that.
In Annex.Branch.branch, the (++) was killing laziness.
Rewrote so it streams lazily.
filterM also kills laziness, so made loggedKeys use a Unchecked type,
and check if the key is dead in the seek loop.
Note that loggedKeysFor still buffers, so git-annex info <remote> and
git-annex unused --from remote still use more memory than necessary.
Also removed some unused functions from Annex.Journal.
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.
It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.
This commit was supported by the NSF-funded DataLad project.
Flipped all comparisons. When a TrustLevel list was wanted from Trusted
downwards, used Down to compare it in that order.
This commit was sponsored by mo on Patreon.
This commit removes the Ord and Enum instances, commenting out all code
that depends on them, to make sure that all code effected by the
inversion fix has been identified.
(Assuming no ifdefs involve TrustLevel.)
The next commit will fix up all the identified code.
See the big comment at the bottom of Command.Drop for the full details.
(The --safe/--unsafe options were never released.)
This commit was sponsored by Jake Vosloo on Patreon.
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.
This commit was sponsored by Ethan Aubin.
* For url downloads, git-annex now defaults to using a http library,
rather than wget or curl. But, if annex.web-options is set, it will
use curl. To use the .netrc file, run:
git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
git-annex builds).
Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.
This commit was supported by the NSF-funded DataLad project.
Compare these...
numcopies stats:
numcopies -1: 1986
numcopies +0: 1170
numcopies -2: 769
numcopies +1: 716
numcopies -4: 696
numcopies -3: 485
numcopies -6: 230
numcopies -5: 111
numcopies -7: 91
numcopies -9: 9
numcopies stats:
numcopies +1: 716
numcopies +0: 1170
numcopies -1: 1986
numcopies -2: 769
numcopies -3: 485
numcopies -4: 696
numcopies -5: 111
numcopies -6: 230
numcopies -7: 91
numcopies -9: 9
I feel that the former is a jumbled mess that doesn't tell much overall,
while the second shows pretty clearly that most files are within 1 degree
of the desired number of copies, with some outliers without enough.
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.
Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.
Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.
This commit was supported by the NSF-funded DataLad project.
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.
This commit was supported by the NSF-funded DataLad project.
Avoid creating transfer info file before transfer lock is created and
locked.
The wrong order for one thing caused transfer info to be overwritten
when a transfer was already in progress.
But worse, it caused checkTransfer to see the transfer info,
and so lock the transfer lock in order to verify the transfer was not in
progress. Which in a concurrent situation, prevented the transferrer
from locking the transfer lock, so it failed with "transfer already in
progress".
Note that the transferinfo command does not lock the transfer lock
before creating the transfer info. But, that's only run after
recvkey is running, and recvkey does lock the transfer lock, so that
seems more or less ok. (Other than being a super complicated legacy mess
that the P2P code has mostly obsoleted now.)
This commit was supported by the NSF-funded DataLad project.
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.
Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.
As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.
This commit was sponsored by Jack Hill on Patreon.
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.
Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.
In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.
When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.
This commit was sponsored by Henrik Riomar on Patreon.
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.
So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.
Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.
Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.
Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex 40.867 seconds.
With the P2P protocol 9.905 seconds!
This commit was sponsored by Jochen Bartl on Patreon.
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.
Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.
Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)
Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Not yet used by git-annex, but this will allow faster transfers etc than
using individual ssh connections and rsync.
Not called git-annex-shell p2p, because git-annex p2p does something
else and I don't want two subcommands with the same name between the two
for sanity reasons.
This commit was sponsored by Øyvind Andersen Holm.
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.
That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.
Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)
And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.
So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
sync: Fix bug that prevented pulling changes into direct mode repositories
that were committed to remotes using git commit rather than git-annex sync.
This commit was supported by the NSF-funded DataLad project.
Noticed while running this (which a user posted in a comment they deleted
for some reason):
git-annex importfeed https://vimeo.com/logiingimars/videos/rss
The filename that youtube-dl suggests included a subdirectory,
which didn't exist, so renaming to it failed.
This commit was sponsored by mo on Patreon.
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.
The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.
This commit was supported by the NSF-funded DataLad project.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
git grep writeFile finds some more that might also be problems, but
for now I've concentrated on .git/annex/ log files. There are certianly
cases where writeFile is not a problem too.
This commit was sponsored by mo on Patreon.
The problem with combining these is that Build.Standalone etc need only
the BuildInfo, and since not built with cabal, the BuildFlags ifdefs
were causing bogus warnings.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.
This commit was sponsored by Anthony DeRobertis on Patreon.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.
This commit was sponsored by Ole-Morten Duesund on Patreon.
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.
Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.
This commit was sponsored by Jochen Bartl on Patreon.
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.
This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.
This commit was supported by the NSF-funded DataLad project.
The youtube changes accidentially caused the OtherDownloader url to not
get used here, which broke datalad's test suite luckily.
This commit was supported by the NSF-funded DataLad project.
lookupkey: Support being given an absolute filename to a file within the
current git repository.
This commit was supported by the NSF-funded DataLad project.