Commit graph

110 commits

Author SHA1 Message Date
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
b8ef1bf3be
Fix find --json to output json once more.
Reversion from commit 436f10771, CustomOutput was forcing quiet output
which overrode the json setting.

find happened to be the only command that uses CustomOutput and also
outputs json. (metadata --get does also use CustomOutput and --json does not
enable json output for that, which may be an oversight, but was already the
behavior before this regression.)
2019-07-05 09:58:37 -04:00
Joey Hess
ba2551da6f
add startingNoMessage
Fixes the last wart in the StartMessage transition. A few commands
include other CommandStart actions that generate output, and
do not themselves need to display a start/end message.
2019-06-12 14:11:23 -04:00
Joey Hess
70bc30acb1
get rid of implicitMessages state
Oh joyous day, this is probably git-annex's oldest implementation wart,
source of much unncessary bother.

Now that we have a StartMessage, showEndResult' can look at it to know
if it needs to display an end message or not.

This is also going to be faster, because it avoids an uncessary state
lookup for each file processed.
2019-06-12 14:01:41 -04:00
Joey Hess
8e5ea28c26
finish CommandStart transition
The hoped for optimisation of CommandStart with -J did not materialize.
In fact, not runnign CommandStart in parallel is slower than -J3.
So, CommandStart are still run in parallel.

(The actual bad performance I've been seeing with -J in my big repo
has to do with building the remoteList.)

But, this is still progress toward making -J faster, because it gets rid
of the onlyActionOn roadblock in the way of making CommandCleanup jobs
run separate from CommandPerform jobs.

Added OnlyActionOn constructor for ActionItem which fixes the
onlyActionOn breakage in the last commit.

Made CustomOutput include an ActionItem, so even things using it can
specify OnlyActionOn.

In Command.Move and Command.Sync, there were CommandStarts that used
includeCommandAction, so output messages, which is no longer allowed.
Fixed by using startingCustomOutput, but that's still not quite right,
since it prevents message display for the includeCommandAction run
inside it too.
2019-06-12 13:24:01 -04:00
Joey Hess
436f107715
make CommandStart return a StartMessage
The goal is to be able to run CommandStart in the main thread when -J is
used, rather than unncessarily passing it off to a worker thread, which
incurs overhead that is signficant when the CommandStart is going to
quickly decide to stop.

To do that, the message it displays needs to be displayed in the worker
thread, after the CommandStart has run.

Also, the change will mean that CommandStart will no longer necessarily
run with the same Annex state as CommandPerform. While its docs already
said it should avoid modifying Annex state, I audited all the
CommandStart code as part of the conversion. (Note that CommandSeek
already sometimes runs with a different Annex state, and that has not been
a source of any problems, so I am not too worried that this change will
lead to breakage going forward.)

The only modification of Annex state I found was it calling
allowMessages in some Commands that default to noMessages. Dealt with
that by adding a startCustomOutput and a startingUsualMessages.
This lets a command start with noMessages and then select the output it
wants for each CommandStart.

One bit of breakage: onlyActionOn has been removed from commands that used it.
The plan is that, since a StartMessage contains an ActionItem,
when a Key can be extracted from that, the parallel job runner can
run onlyActionOn' automatically. Then commands won't need to worry about
this detail. Future work.

Otherwise, this was a fairly straightforward process of making each
CommandStart compile again. Hopefully other behavior changes were mostly
avoided.

In a few cases, a command had a CommandStart that called a CommandPerform
that then called showStart multiple times. I have collapsed those
down to a single start action. The main command to perhaps suffer from it
is Command.Direct, which used to show a start for each file, and no
longer does.

Another minor behavior change is that some commands used showStart
before, but had an associated file and a Key available, so were changed
to ShowStart with an ActionItemAssociatedFile. That will not change the
normal output or behavior, but --json output will now include the key.
This should not break it for anyone using a real json parser.
2019-06-06 17:13:54 -04:00
Joey Hess
258a7c5cd1
add Key to all ActionItem constructors 2019-06-06 12:53:24 -04:00
Joey Hess
82186ca58f
annex.jobs=cpus etc
Added the ability to run one job per CPU (core), by setting annex.jobs=cpus,
or using option --jobs=cpus or -Jcpus.

Built with future expansion in mind, including not defaulting matching on
Concurrency so more constructors can later be added, and using "cpu"
instead of "0".
2019-05-10 13:27:08 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
Joey Hess
872af2b2f1
avoid using concurrent-output at all when --quiet or --json
Of course, it wasn't used much in those modes, because normal output is
avoided. But it was still initialized and used in a few places,
including a call to hideRegionsWhile.
2018-11-15 14:26:40 -04:00
Joey Hess
b2bafdb2fc
v6: Fix database inconsistency
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.

Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.

Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-16 13:51:37 -04:00
Joey Hess
7d9f0e0fbe
Added INFO to external special remote protocol.
It's left up to the special remote to detect when git-annex is new enough
to support the message; an old git-annex will blow up.

This commit was supported by the NSF-funded DataLad project.
2018-02-06 13:03:55 -04:00
Joey Hess
4781ca297b
showStart variant for when there's no worktree file
Clean up some uses of showStart with "" for the file,
or in some cases, a non-filename description string. That would
generate bad json, although none of the commands doing that
supported --json.

Using "" for the file resulted in output like "foo  rest";
now the extra space is eliminated.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-11-28 15:14:16 -04:00
Joey Hess
1d45e47e3f
clear regions before ssh prompt
When built with concurrent-output 1.9, ssh password prompts will no longer
interfere with the -J display.

To avoid flicker, only done when ssh actually does need to prompt;
ssh is first run in batch mode and if that succeeds the connection is up
and no need to clear regions.

This commit was supported by the NSF-funded DataLad project.
2017-05-16 15:50:11 -04:00
Joey Hess
2c6cfbe503
also serialize ssh password prompting when json or quiet output is enable 2017-05-13 13:13:13 -04:00
Joey Hess
6992fe133b
Ssh password prompting improved when using -J
When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is
not set), only one ssh password prompt will be made per host, and only one
ssh password prompt will be made at a time.

This also fixes a race in prepSocket's stale ssh connection stopping
when run with -J. It was possible for one thread to start a cached ssh
connection, and another thread to immediately stop it, resulting in excess
connections being made.

This commit was supported by the NSF-funded DataLad project.
2017-05-11 17:36:03 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
d7ea6a5684
drop incremental json object display; clean up code
This gets rid of quite a lot of ugly hacks around json generation.

I doubt that any real-world json parsers can parse incomplete objects, so
while it's not as nice to need to wait for the complete object, especially
for commands like `git annex info` that take a while, it doesn't seem worth
the added complexity.

This also causes the order of fields within the json objects to be
reordered. Since any real json parser shouldn't care, the only possible
problem would be with ad-hoc parsers of the old json output.
2016-09-09 18:13:55 -04:00
Joey Hess
a108235565
better locking for json with -J
Avoid threads emitting json at the same time and scrambling, which was
still possible even with the buffering, just less likely.

Converted json IO actions to JSONChunk data too.
2016-09-09 15:51:34 -04:00
Joey Hess
05d4438383
addurl, get: Added --json-progress option, which adds progress objects to the json output.
This doesn't work right when used with -J yet, and there is some really
ugly hand-crafting of part of the json output.
2016-09-09 15:06:54 -04:00
Joey Hess
4a09b4bbbd
make maybeShowJSON also add to the buffer 2016-09-09 14:21:06 -04:00
Joey Hess
089c592977
buffer json output until done when in concurrent mode 2016-09-09 13:21:38 -04:00
Joey Hess
8ef494a833
disentangle concurrency and message type
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.

Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
2016-09-09 12:57:42 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
bf3327ff25
Added metadata --batch option, which allows getting, setting, deleting, and modifying metadata for multiple files/keys. 2016-07-27 10:46:25 -04:00
Joey Hess
a030d0a8b7
allow using Aeson for streaming JSON output
Keeping Text.JSON use for now, because it seems a better fit for most of
the commands, which don't use very structured JSON objects, but just output
whatever fields suites them. But this lets Aeson be used when a more
structured data type is available to serialize to JSON.
2016-07-26 13:30:07 -04:00
Joey Hess
d13194b230
--branch, stage 2
Show branch:file that is being operated on.

I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
2016-07-20 15:23:43 -04:00
Joey Hess
847944e6b1
more generic showStart' 2016-07-20 14:03:54 -04:00
Joey Hess
acf74ae945
improve json when showStart' is given only a key
Before, the json contained file:key; change that to key:

If a file and a key are given, inclue both file: and key:
2016-03-06 12:57:24 -04:00
Joey Hess
0f18636c8a
Work around problem with concurrent-output when in a non-unicode locale by avoiding use of it in such a locale.
Instead -J will behave as if it was built without concurrent-output support
in this situation. Ie, it will be mostly quiet, except when there's an
error.

Note that it's not a problem for a filename to contain invalid utf-8 when
in a utf-8 locale. That is handled ok by concurrent-output. It's only
displaying unicode characters in a non-unicode locale that doesn't work.
2016-02-14 15:02:42 -04:00
Joey Hess
70b8cad9c8
make noMessages disable closing of json object in --json mode
This allows things like Command.Find to use noMessages and generate their
own complete json objects. Previouly, Command.Find managed that only via a
hack, which wasn't compatable with batch mode.

Only Command.Find, Command.Smudge, and Commange.Status use noMessages
currently, and none except for Command.Find are impacted by this change.

Fixes find --json --batch output
2016-01-20 14:10:13 -04:00
Joey Hess
249f7f4801
Force output to be line-buffered, even when it's not connected to the terminal.
This is particuarly important for commands with --batch output, which was
not always being flushed at an appropriate time.
2016-01-18 13:01:23 -04:00
Joey Hess
b96cfdc094
whereis --json: Make url list be included in machine-parseable form. 2016-01-06 12:33:32 -04:00
Joey Hess
4224fae71f
optimise read and write for Keys database (untested)
Writes are optimised by queueing up multiple writes when possible.
The queue is flushed after the Annex monad action finishes. That makes it
happen on program termination, and also whenever a nested Annex monad action
finishes.

Reads are optimised by checking once (per AnnexState) if the database
exists. If the database doesn't exist yet, all reads return mempty.

Reads also cause queued writes to be flushed, so reads will always be
consistent with writes (as long as they're made inside the same Annex monad).
A future optimisation path would be to determine when that's not necessary,
which is probably most of the time, and avoid flushing unncessarily.

Design notes for this commit:

- separate reads from writes
- reuse a handle which is left open until program
  exit or until the MVar goes out of scope (and autoclosed then)
- writes are queued
  - queue is flushed periodically
  - immediate queue flush before any read
  - auto-flush queue when database handle is garbage collected
  - flush queue on exit from Annex monad
    (Note that this may happen repeatedly for a single database connection;
    or a connection may be reused for multiple Annex monad actions,
    possibly even concurrent ones.)
- if database does not exist (or is empty) the handle
  is not opened by reads; reads instead return empty results
- writes open the handle if it was not open previously
2015-12-23 19:18:52 -04:00
Joey Hess
4b02af57b6
display a message in the unlikely scenario of fsking a dead repository 2015-11-10 14:44:58 -04:00
Joey Hess
468e52fbe3
add back missing newline to showRaw 2015-11-10 14:06:07 -04:00
Joey Hess
4fd03ccd7b
concurrent-output, first pass
Output without -Jn should be unchanged from before. With -Jn,
concurrent-output is used for messages, but regions are not used yet, so
it's a mess.
2015-11-04 13:45:34 -04:00
Joey Hess
4ed82e5328 fsck: Work around bug in persistent that broke display of problematically encoded filenames on stderr when using --incremental. 2015-09-09 17:02:00 -04:00
Joey Hess
43aa881b47 --debug is passed along to git-annex-shell when git-annex is in debug mode. 2015-08-13 15:05:39 -04:00
Joey Hess
7584e47ba3 --debug log messages are now timestamped with fractional seconds. 2015-08-12 14:42:49 -04:00
Joey Hess
e27b97d364 Merge branch 'master' into concurrentprogress
Conflicts:
	Command/Fsck.hs
	Messages.hs
	Remote/Directory.hs
	Remote/Git.hs
	Remote/Helper/Special.hs
	Types/Remote.hs
	debian/changelog
	git-annex.cabal
2015-05-12 13:23:22 -04:00
Joey Hess
efb37e7c78 Improve behavior when a git-annex command is told to operate on a file that doesn't exist. It will now continue to other files specified after that on the command line, and only error out at the end. 2015-04-30 15:28:17 -04:00
Joey Hess
5e9f0a3493 reuse strings 2015-04-14 16:46:06 -04:00
Joey Hess
f8e700ed06 use built-in progress meters for git when in parallel mode 2015-04-10 15:15:21 -04:00
Joey Hess
2343f99c85 well along the way to fully quiet --quiet
Came up with a generic way to filter out progress messages while keeping
errors, for commands that use stderr for both.

--json mode will disable command outputs too.
2015-04-04 14:34:03 -04:00
Joey Hess
20fb91a7ad WIP on making --quiet silence progress, and infra for concurrent progress bars 2015-04-03 16:48:30 -04:00
Joey Hess
bc0180da83 rename showProgress -> showProgressDots 2015-04-03 13:51:32 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
0ee09f05d2 lower case for consistency 2015-01-10 13:41:25 -04:00