Commit graph

62 commits

Author SHA1 Message Date
Joey Hess
f624876dc2
remove zombie process in file seeking
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.

It would probably be safe to remove the reapZombies now, but let's wait
and so that in its own commit in case it turns out to cause problems.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-25 11:38:42 -04:00
Joey Hess
ca454c47f2
explicitly wait for a git process
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.

This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.

Only remaining zombies are in CmdLine.Seek
2020-09-25 11:03:12 -04:00
Joey Hess
e41f8c83f3
close stdin handles before waiting on commands
Fixes reversion in recent conversions, the old code relied on the GC
apparently, but the new code explicitly waits on the process, so must
close stdin handle first or the command will never exit.
2020-06-05 17:27:49 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
438dbe3b66
convert to withCreateProcess for async exception safety
This handles all sites where checkSuccessProcess/ignoreFailureProcess
is used, except for one: Git.Command.pipeReadLazy
That one will be significantly more work to convert to bracketing.

(Also skipped Command.Assistant.autoStart, but it does not need to
shut down the processes it started on exception because they are
git-annex assistant daemons..)

forceSuccessProcess is done, except for createProcessSuccess.
All call sites of createProcessSuccess will need to be converted
to bracketing.

(process pools still todo also)
2020-06-04 12:44:09 -04:00
Joey Hess
279991604d
started converting Ref from String to ByteString
This should make code that reads shas and refs from git faster.

Does not compile yet, a lot needs to be done still.
2020-04-06 17:14:49 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
d830386ab2
update based on profiling
While L.toStrict copies, profiling showed it was only around 0.3% of
git-annex find runtime. Does not seem worth optimising that, which would
probably involve either a major refactoring, or a use of
UnsafeInterleaveIO.

Also, it seems to me that the latter would need to read chunks, and
preappend the leftover part to the next chunk. But a strict ByteString
append itself is a copy, so I'm not convinced that would be faster than
L.toStrict.
2019-11-27 14:09:11 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
c8f7cb8558
fix incorrect comment
The process will typically block until all input is read.
2019-04-24 14:29:46 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
251405eca2
avoid withWorkTreeRelated affecting annex symlink calculation 2016-04-08 14:24:00 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
cd865c3b8f Switch to using relative paths to the git repository.
This allows the git repository to be moved while git-annex is running in
it, with fewer problems.

On Windows, this avoids some of the problems with the absurdly small
MAX_PATH of 260 bytes. In particular, git-annex repositories should
work in deeper/longer directory structures than before. See
http://git-annex.branchable.com/bugs/__34__git-annex:_direct:_1_failed__34___on_Windows/

There are several possible ways this change could break git-annex:

1. If it changes its working directory while it's running, that would
   be Bad News. Good news everyone! git-annex never does so. It would also
   break thread safety, so all such things were stomped out long ago.

2. parentDir "." -> "" which is not a valid path. I had to fix one
   instace of this, and I should probably wipe all calls to parentDir out
   of the git-annex code base; it was never a good idea.

3. Things like relPathDirToFile require absolute input paths,
   and code assumes that the git repo path is absolute and passes it to it
   as-is. In the case of relPathDirToFile, I converted it to not make
   this assumption.

Currently, the test suite has 16 failures.
2015-01-06 16:19:41 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
fc67925fd7
reorg
avoid Git.Command needing Utility.Batch which needs async

For github-backup etc
2014-07-04 12:18:49 -04:00
Joey Hess
a44fd2c019 export CreateProcess fields from Utility.Process
update code to avoid cwd and env redefinition warnings
2014-06-10 19:20:14 -04:00
Joey Hess
3f6e4b8c7c fix all remaining -Wall warnings on Windows 2014-02-25 14:48:50 -04:00
Joey Hess
61ecf76644 unbreak the build 2014-02-12 14:34:01 -04:00
Joey Hess
029a1c431a
remove windows --git-dir unix style path hack
This is no longer necessary, at least with msysgit 1.8.5.2.msysgit.0.
Its root cause may have been fixed by other recent git path fixes.
It was causing the webapp to fail to make repos on other drives.
2014-02-11 16:12:22 -04:00
Joey Hess
c95d0cf7a8 Windows: Fix handling of absolute unix-style git repository paths.
Note that on Windows a remote with a path like /home/foo/bar
is interpreted by git as being some screwy relative path (relative to what
exactly seems ill-defined -- it seemed relative to C:\Program Files\Git\ in
my tests!) So no attempt has been made to handle such a path sanely, just not
to crash when encountering it.

Note that "C:\\foo" </> "/home/foo/bar" yields /home/foo/bar even though
that is not absolute! I don't know what to make of all this,
except that I will be very happy when this crock of **** vanishes from
the face of the earth.
2014-02-08 15:39:04 -04:00
Joey Hess
ed7c61914c assistant: Run the periodic git gc in batch mode. 2014-01-22 17:11:41 -04:00
Joey Hess
858eb26303 Avoid looping if long-running git cat-file or git hash-object crashes and keeps crashing when restarted. 2014-01-01 21:42:25 -04:00
Joey Hess
c2862d9585 pass -c option on to all git commands run
The -c option now not only modifies the git configuration seen by
git-annex, but it is passed along to every git command git-annex runs.

This was easy to plumb through because gitCommandLine is already used to
construct every git command line, to add --git-dir and --work-tree
2013-11-05 13:38:37 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
c979e0ea62 fix 2013-10-17 19:51:16 -04:00
Joey Hess
c116383b5d fix 2013-10-17 19:49:44 -04:00
Joey Hess
81c4259a0d fix 2013-10-17 19:41:00 -04:00
Joey Hess
16243b9972 missing import 2013-10-17 19:39:22 -04:00
Joey Hess
e93206e294 Windows: Deal with strange msysgit 1.8.4 behavior of not understanding DOS formatted paths for --git-dir and --work-tree. 2013-10-17 19:35:57 -04:00
Joey Hess
4e2fab90d5 avoid newline translation when writing to git hash-object
They're like mushrooms, just keep popping up.
2013-06-18 15:08:51 -04:00
Joey Hess
91c4dcfc69 Can now restart certain long-running git processes if they crash, and continue working.
Fuzz tests have shown that git cat-file --batch sometimes stops running.
It's not yet known why (no error message; repo seems ok). But this is
something we can deal with in the CoProcess framework, since all 3 types of
long-running git processes should be restartable if they fail.

Note that, as implemented, only IO errors are caught. So an error thrown
by the reveiver, when it sees something that is not valid output from
git cat-file (etc) will not cause a restart. I don't want it to retry
if git commands change their output or are just outputting garbage.
This does mean that if the command did a partial output and crashed in the
middle, it would still not be restarted.

There is currently no guard against restarting a command repeatedly, if,
for example, it crashes repeatedly on startup.
2013-05-31 12:42:13 -04:00
Joey Hess
a5dded0401 assistant: The ConfigMonitor left one zombie behind each time it checked for changes, now fixed. 2013-03-18 22:09:51 -04:00
Joey Hess
0c13d3065e git subcommand cleanup
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
2013-03-03 13:39:07 -04:00
Joey Hess
4d33423067 assistant: Avoid noise in logs from git commit about typechanged files in direct mode repositories. 2013-03-01 16:21:29 -04:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
919fec85cd better fix for zombie problem, which turns out to be a zombie ssh started by rsync
When rsyncProgress pipes rsync's stdout, this turns out to cause a ssh
process started by rsync to be left behind as a zombie. I don't know why,
but my recent zombie reaping cleanup was correct, it's just that this other
zombie, that's not directly started by git-annex, was no longer reaped
due to changes in the cleanup. Make rsyncProgress reap the zombie started
by rsync, as a workaround.

FWIW, the process tree looks like this. It seems like the rsync child
is for some reason starting but not waiting on this extra ssh process.
Ssh connection caching may be involved -- disabling it seemed to change
the shape of the tree, but did not eliminate the zombie.

 9378 pts/14   S+     0:00  |           \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
 9379 pts/14   S+     0:00  |           |   \_ ssh ...
 9380 pts/14   S+     0:00  |           |   \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
 9381 pts/14   Z+     0:00  |           \_ [ssh] <defunct>
2012-10-17 00:47:52 -04:00
Joey Hess
e05c21cb73 Fix a crash when merging files in the git-annex branch that contain invalid utf8.
The crash actually occurred when writing out the file, which was done to a
handle that had not had fileSystemEncoding applied to it.
2012-10-12 12:19:30 -04:00
Joey Hess
47314c0fad fix last zombies in the assistant
Made Git.LsFiles return cleanup actions, and everything waits on
processes now, except of course for Seek.
2012-10-04 19:56:32 -04:00
Joey Hess
f7f1d25df8 bugfix 2012-10-04 19:41:58 -04:00
Joey Hess
de3ea4adb6 remove now-unnecessary manual reaps 2012-10-04 18:58:57 -04:00
Joey Hess
5594bf0643 more zombie fighting
I'm down to 9 places in the code that can produce unwaited for zombies.

Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.

The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.

The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.

(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
2012-10-04 18:47:31 -04:00
Joey Hess
f67b54e5e3 make a pipeReadStrict, that properly waits on the process
Nearly everything that's reading from git is operating on a small
amount of output and has been switched to use that. Only pipeNullSplit
stuff continues using the lazy version that yields zombies.
2012-10-04 18:04:09 -04:00
Joey Hess
e8188ea611 flip catchDefaultIO 2012-09-17 00:18:07 -04:00
Joey Hess
0b63ee6cd5 run git coprocesses with gitEnv 2012-09-15 17:43:37 -04:00
Joey Hess
c9b3b8829d thread safe git-annex index file use 2012-08-24 20:50:39 -04:00