659640e224 was buggy, it had a STM
deadlock because two actions both wanted to takeTMVar the WorkerPool
and so blocked one-another.
Fixed by completely reworking how the pool is maintained. Maintenace
threads now wait for the Async actions and update the WorkerPool. This
means twice as many threads as before, but green threads so will only
use a few extra bytes ram per thread.
When running multiple concurrent actions, the cleanup phase is run in a
separate queue than the main action queue. This can make some commands
faster, because less time is spent on bookkeeping in between each file
transfer.
But as far as I can see, nothing will be sped up much by this yet, because
all the existing cleanup actions are very light-weight. This is just groundwork
for deferring checksum verification to cleanup time.
This change does mean that if the user expects -J2 will mean that they see no
more than 2 jobs running at a time, they may be surprised to see 4 in some
cases (if the cleanup actions are slow enough to notice).
It might also make sense to enable background cleanup without the -J,
for at least one cleanup action. Indeed, that's the behavior that -J1
has now. At some point in the future, it make make sense to make the
behavior with no -J the same as -J1. The only reason it's not currently
is that git-annex can build w/o concurrent-output, and also any bugs
in concurrent-output (such as perhaps misbehaving on non-VT100 compatible
terminals) are avoided by default by only using it when -J is used.
Added the ability to run one job per CPU (core), by setting annex.jobs=cpus,
or using option --jobs=cpus or -Jcpus.
Built with future expansion in mind, including not defaulting matching on
Concurrency so more constructors can later be added, and using "cpu"
instead of "0".
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Added graftTree but it's buggy.
Should use graftTree in Annex.Branch.graftTreeish; it will be faster
than the current implementation there.
Started Annex.Import, but untested and it doesn't yet handle tree
grafting.
* fromkey: Added --json.
* fromkey --batch output changed to support using it with --json.
The old output was not parseable for any useful information, so
this is not expected to break anything.
No deprecation warning at run time, just one on the man page.
One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo but not without --branch
but I don't currently have a way to do that.
Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.
This commit was sponsored by Denis Dzyubenko on Patreon.
* findref: Support file matching options: --include, --exclude,
--want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
--exclude, --want-get, --want-drop to filenames from the branch.
Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.
This commit was sponsored by Jake Vosloo on Patreon.
When a command is operating on multiple files and there's an error with
one, try harder to continue to the rest. (As was already done for many
types of errors including IO errors.)
This handles cases like lockContentForRemoval throwing an exception when
the content is already locked. Just because a drop of one file fails, does
not mean it shouldn't go on to try to drop other files.
I looked over uses of `giveup` in Command/*; there are too many to check
them all extensively, but none stood out as being problems that should let
one commandAction stop running other commandActions. Worst case, something
bad will happen and rather than stopping right away with an error,
git-annex will display multiple errors as it fails over and over on each
file. I don't think I ever really intended `error`/`giveup` to stop other
commandActions; this was a relic of old confusion over haskell exception
handling.
Test suite passes.
This commit was sponsored by Ethan Aubin.
Of course, it wasn't used much in those modes, because normal output is
avoided. But it was still initialized and used in a few places,
including a call to hideRegionsWhile.
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.
Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.
When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.
This commit was sponsored by Jochen Bartl on Patreon.
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.
[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/
This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.
Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.
Some documentation specific to the Android app (screenshots etc) needs
to be updated still.
This commit was sponsored by Brett Eisenberg on Patreon.
Added annex.jobs setting, which is like using the -J option.
Of course, -J overrides annex.jobs.
This commit was sponsored by Trenton Cronholm on Patreon.
This is groundwork for nested seek loops, eg seeking over all files and
then performing commandActions on a list of remotes, which can be done
concurrently.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Same goal as b18fb1e343 but without
breaking backwards compatability. Just return IO exceptions when running
the P2P protocol, so that git-annex-shell can detect eof and avoid the
ugly message.
This commit was sponsored by Ethan Aubin.
Added -z option to git-annex commands that use --batch, useful for
supporting filenames containing newlines.
It only controls input to --batch, the output will still be line delimited
unless --json or etc is used to get some other output. While git often
makes -z affect both input and output, I don't like trying them together,
and making it affect output would have been a significant complication,
and also git-annex output is generally not intended to be machine parsed,
unless using --json or a format option.
Commands that take pairs like "file key" still separate them with a space
in --batch mode. All such commands take care to support filenames with
spaces when parsing that, so there was no need to change it, and it would
have needed significant changes to the batch machinery to separate tose
with a null.
To make fromkey and registerurl support -z, I had to give them a --batch
option. The implicit batch mode they enter when not provided with input
parameters does not support -z as that would have complicated option
parsing. Seemed better to move these toward using the same --batch as
everything else, though the implicit batch mode can still be used.
This commit was sponsored by Ole-Morten Duesund on Patreon.
The new second pass sees the file as type changed because the first
pass's changes have typically not reached git yet. So, have to
explicitly check for unmodified files in the second pass.
Note that, if the file has been touched but not really modified,
the first pass will handle it, and so the second pass does nothing.
This commit was sponsored by Jochen Bartl on Patreon.
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.
Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.
The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.
This commit was sponsored by Jake Vosloo on Patreon.
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.
Affected commands: find, add, whereis, drop, copy, move, get
In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?
Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.
git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.
This commit was sponsored by Brett Eisenberg on Patreon.
Useful for dropping old objects from cache repositories.
But also, quite a genrally useful thing to have..
Rather than imitiating find's -atime and other options, all of which are
pretty horrible to use, I made this match files accessed within a time
period, using the same duration format used by git-annex schedule and
--limit-time
In passing, changed the --limit-time option parser to parse the
duration, instead of having it later throw an error.
This commit was supported by the NSF-funded DataLad project.
Makes it allow writes, but not deletion of annexed content. Note that
securing pushes to the git repository is left up to the user.
This commit was sponsored by Jack Hill on Patreon.
In Annex.Branch.branch, the (++) was killing laziness.
Rewrote so it streams lazily.
filterM also kills laziness, so made loggedKeys use a Unchecked type,
and check if the key is dead in the seek loop.
Note that loggedKeysFor still buffers, so git-annex info <remote> and
git-annex unused --from remote still use more memory than necessary.
Also removed some unused functions from Annex.Journal.
Test case is 24 directories each containing files named 1..10000.
The concat and filterM destroyed what laziness there is in
dirContentsRecursive, making it buffer all the filenames. Memory
use was around 300 mb (possibly growing slightly as it progressed).
After this fix, memory use drops to a constant 59 mb.
Note that dirContentsRecursive still buffers the entire content of a
directory (not subdirectories) so this is still not optimal.
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.
So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.
Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
Not yet used by git-annex, but this will allow faster transfers etc than
using individual ssh connections and rsync.
Not called git-annex-shell p2p, because git-annex p2p does something
else and I don't want two subcommands with the same name between the two
for sanity reasons.
This commit was sponsored by Øyvind Andersen Holm.
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.
The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.
This commit was supported by the NSF-funded DataLad project.
Fix behavior of --json-progress followed by --json, in which
the latter option disabled the former.
This commit was supported by the NSF-funded DataLad project.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Test suite is always included.
Building with this flag disabled has actually been broken for some time,
since Command.TestRemote uses tasty. Fewer build flags are better, so good
time to drop it.
This commit was sponsored by Thomas Hochstein on Patreon.
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.
onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.
This commit was supported by the NSF-funded DataLad project.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.
Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.
This commit was supported by the NSF-funded DataLad project.
optparse-applicative-0.14.0.0 adds support for these, so have the
Makefile install their scripts when built with it.
CmdLine/GitAnnex/Options.hs now uses action "file" in cmdParams,
which affects the bash and zsh completions, letting them complete
filenames for subcommands that use that. This is not needed for
bash, since bash-completion.bash enables -o bashdefault, which
lets it complete filenames too. But it does not seem to break the bash
completions. It is needed for zsh; the zsh completion otherwise
does not complete filenames. The fish completion will always complete
filenames no matter what. Messy.
This commit was sponsored by Denis Dzyubenko on Patreon.
When setting metadata of a file that did not exist, no error message was
displayed, unlike getting metadata and most other git-annex commands. Fixed
this oversight.
Note that, if the file exists but is not annexed, there's no error.
This is the same behavior as other git-annex commands.
This commit was supported by the NSF-funded DataLad project.
Reworked remote name parsing to allow things like that. Command.Move
uses it for --to=here, although there's not yet an implementation of
that option.
This commit was sponsored by Ignacio on Patreon.
See my comment. This only avoids the problem for -J; two git-annex
processes started at the same time could still both try to write to
.git/config and one fail. That would be very unlikely though, and it
doesn't really seem worth adding an additional layer of locking around
.git/config.
This commit was supported by the NSF-funded DataLad project.
Fix bug when used with a recently cloned repository, where
"merging" messages were included in the output of configlist (and perhaps
other commands) and caused a "Failed to get annex.uuid configuration"
error.
This does not seem to have been a reversion.
I saw this with configlist, but it seems possible for other commands to be
effected, and it might not always happen only after a fresh clone. Eg, if a
foo/git-annex branch is pushed to the remote, the next git-annex-shell will
auto-merge it and display the message.
Decided to run all git-annex-shell commands with noMessages,
even ones that don't currently use stdout for structured communication.
Better to keep open the possibility for using stdout in the future.
This commit was supported by the NSF-funded DataLad project
It was relying on segmentPaths to work correctly, so when it didn't,
sometimes the file that did not exist got matched up with a non-null
list of results. Fixed by always checking if each parameter exists.
There are two reason segmentPaths might not work correctly.
For one, it assumes that when the original list of paths
has more than 100 paths, it's not worth paying the CPU cost to
preserve input orders.
And then, it fails when a directory such as "." or ".." or
/path/to/repo is in the input list, and the list of found paths
does not start with that same thing. It should probably not be using
dirContains, but something else.
But, it's not clear how to handle this fully. Consider
when [".", "subdir"] has been expanded by git ls-files to
["subdir/1", "subdir/2"]
-- Both of the inputs contained those results, so there's
no one right answer for segmentPaths. All these would be equally valid:
[["subdir/1", "subdir/2"], []]
[[], ["subdir/1", "subdir/2"]]
[["subdir/1"], [""subdir/2"]]
So I've not tried to improve segmentPaths.
Added --securehash option to match files using a secure hash function, and
corresponding securehash preferred content expression.
This commit was sponsored by Ethan Aubin.
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.
This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.
Benchmarks ran in my big repo:
old git-annex info:
real 0m3.338s
user 0m3.124s
sys 0m0.244s
new git-annex info:
real 0m3.216s
user 0m3.024s
sys 0m0.220s
new git-annex find:
real 0m7.138s
user 0m6.924s
sys 0m0.252s
old git-annex find:
real 0m7.433s
user 0m7.240s
sys 0m0.232s
Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.
This commit was supported by the NSF-funded DataLad project.
* Added post-recieve hook, which makes updateInstead work with direct
mode and adjusted branches.
* init: Set up the post-receive hook.
This commit was sponsored by Fernando Jimenez on Patreon.
Any config names can be set using this; git-annex commands will only look
at specific ones that make sense and are worth the overhead of querying the
branch.
This might also be useful for storing whatever other config-type stuff the
user might want to shove into the git-annex branch.
This commit was sponsored by Jochen Bartl on Patreon.
Still a couple bugs:
* Closing the connection to the server leaves git upload-pack /
receive-pack running, which could be used to DOS.
* Sometimes the data is transferred, but it fails at the end, sometimes
with:
git-remote-tor-annex: <socket: 10>: commitBuffer: resource vanished (Broken pipe)
Must be a race condition around shutdown.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
I've long considered the XMPP support in git-annex a wart.
It's nice to remove it.
(This also removes the NetMessager, which was only used for XMPP, and the
daemonstatus's desynced list (likewise).)
Existing XMPP remotes should be ignored by git-annex.
This commit was sponsored by Brock Spratlen on Patreon.
Tor unfortunately does not come out of the box configured to let hidden
services register themselves on the fly via the ControlPort.
And, changing the config to enable the ControlPort and a particular type
of auth for it may break something already using the ControlPort, or
lessen the security of the system.
So, this leaves only one option to us: Add a hidden service to the
torrc. git-annex enable-tor does so, and picks an unused high port for
tor to listen on for connections to the hidden service.
It's up to the caller to somehow pick a local port to listen on
that won't be used by something else. That may be difficult to do..
This commit was sponsored by Jochen Bartl on Patreon.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
Show branch:file that is being operated on.
I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
Added --branch option to copy, drop, fsck, get, metadata, mirror, move, and
whereis commands. This option makes git-annex operate on files that are
included in a specified branch (or other treeish).
The names of the files from the branch that are being operated on are not
displayed yet; only the keys. Displaying the filenames will need changes
to every affected command.
Also, note that --branch can be specified repeatedly. This is not really
documented, but seemed worth supporting, especially since we may later want
the ability to operate on all branches matching a refspec. However, when
operating on two branches that contain the same key, that key will be
operated on twice.
"git annex adjust" may be a temporary interface, but works for a proof of
concept.
It is pretty fast at creating the adjusted branch. The main overhead is
injecting pointer files. It might be worth optimising that by reusing the
symlink target as the pointer file content. When I tried to do that,
the problem was that the clean filter doesn't use that same format, and so
git thought files had changed. Could be dealt with, perhaps make the clean
filter use symlink format for pointer files when on an adjusted branch?
But the real overhead is in checking out the branch, when git runs the
smudge filter once per file. That is perhaps too slow to be usable,
although it may only affect initial checkout of the branch, and not
updates. TBD.
Instead -J will behave as if it was built without concurrent-output support
in this situation. Ie, it will be mostly quiet, except when there's an
error.
Note that it's not a problem for a filename to contain invalid utf-8 when
in a utf-8 locale. That is handled ok by concurrent-output. It's only
displaying unicode characters in a non-unicode locale that doesn't work.
* Removed the webapp-secure build flag, rolling it into the webapp build
flag.
* Removed the quvi and tahoe build flags, which only adds aeson to
the core dependencies.
* Removed the feed build flag, which only adds feed to the core
dependencies.
Build flags have cost in both code complexity and also make Setup configure
have to work harder to find a usable set of build flags when some
dependencies are missing.
When Config.setConfig runs, it throws away the old Repo and loads a new
one. So, add an action to adjust the Repo so that -c settings will persist
across that.
This allows things like Command.Find to use noMessages and generate their
own complete json objects. Previouly, Command.Find managed that only via a
hack, which wasn't compatable with batch mode.
Only Command.Find, Command.Smudge, and Commange.Status use noMessages
currently, and none except for Command.Find are impacted by this change.
Fixes find --json --batch output