Commit graph

43 commits

Author SHA1 Message Date
Joey Hess
f09e299156
rawfilepath conversion 2023-02-27 15:06:32 -04:00
Joey Hess
49ee07f93d
fix flush of a closed file handle
Avoids displaying warning about git-annex restage needing to be run in
situations where it does not.

Closing a handle flushes it anyway, so no need for an explict flush. The
handle does get closed twice, but that's fine, the second one does nothing.

Sponsored-by: Dartmouth College's DANDI project
2022-09-30 14:02:31 -04:00
Joey Hess
6a3bd283b8
add restage log
When pointer files need to be restaged, they're first written to the
log, and then when the restage operation runs, it reads the log. This
way, if the git-annex process is interrupted before it can do the
restaging, a later git-annex process can do it.

Currently, this lets a git-annex get/drop command be interrupted and
then re-ran, and as long as it gets/drops additional files, it will
clean up after the interrupted command. But more changes are
needed to make it easier to restage after an interrupted process.

Kept using the git queue to run the restage action, even though the
list of files that it builds up for that action is not actually used by
the action. This could perhaps be simplified to make restaging a cleanup
action that gets registered, rather than using the git queue for it. But
I wasn't sure if that would cause visible behavior changes, when eg
dropping a large number of files, currently the git queue flushes
periodically, and so it restages incrementally, rather than all at the
end.

In restagePointerFiles, it reads the restage log twice, once to get
the number of files and size, and a second time to process it.
This seemed better than reading the whole file into memory, since
potentially a huge number of files could be in there. Probably the OS
will cache the file in memory and there will not be much performance
impact. It might be better to keep running tallies in another file
though. But updating that atomically with the log seems hard.

Also note that it's possible for calcRestageLog to see a different file
than streamRestageLog does. More files may be added to the log in
between. That is ok, it will only cause the filterprocessfaster heuristic to
operate with slightly out of date information, so it may make the wrong
choice for the files that got added and be a little slower than ideal.

Sponsored-by: Dartmouth College's DANDI project
2022-09-23 15:47:24 -04:00
Joey Hess
9c76e503cf
generalize refreshIndex to MonadIO
Sponsored-by: Dartmouth College's DANDI project
2022-09-23 14:28:52 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
87f91ce563
more RawFilePath conversion
451/645
2020-10-30 15:55:59 -04:00
Joey Hess
681b44236a
more RawFilePath conversion
at 377/645

This commit was sponsored by Svenne Krap on Patreon.
2020-10-29 14:20:57 -04:00
Joey Hess
438dbe3b66
convert to withCreateProcess for async exception safety
This handles all sites where checkSuccessProcess/ignoreFailureProcess
is used, except for one: Git.Command.pipeReadLazy
That one will be significantly more work to convert to bracketing.

(Also skipped Command.Assistant.autoStart, but it does not need to
shut down the processes it started on exception because they are
git-annex assistant daemons..)

forceSuccessProcess is done, except for createProcessSuccess.
All call sites of createProcessSuccess will need to be converted
to bracketing.

(process pools still todo also)
2020-06-04 12:44:09 -04:00
Joey Hess
2dc7b5186a
convert to withCreateProcess for async exception safety 2020-06-04 12:05:25 -04:00
Joey Hess
d5d8259937
ByteString Ref continued
Attoparsec parser for diff-tree.

Changed fromRef back to producing a String, to avoid needing to convert
every use of it. However, this does mean I'm going to miss some
opportunities where fromRef is used and the result converted back to a
ByteString. Would be worth revisiting that at some point maybe.
2020-04-07 11:54:27 -04:00
Joey Hess
279991604d
started converting Ref from String to ByteString
This should make code that reads shas and refs from git faster.

Does not compile yet, a lot needs to be done still.
2020-04-06 17:14:49 -04:00
Joey Hess
5e4deb3620
support sha256 git repos
Git will eventually switch to sha2 and there will not be one single
shaSize anymore, but two (40 and 64).

Changed all parsers for git plumbing output to support both sizes of
shas.

One potential problem this does not deal with is, if somewhere in
git-annex it reads two shas from different sources, and compares them
to see if they're the same sha, it would fail if they're sha1 and sha256
of the same value. I don't know if that will really be a concern.
2020-01-07 12:22:19 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
db27ad26bf split out DiffTreeItem
This makes github-backup happier when it reuses this library.
2014-12-22 15:32:51 -04:00
Joey Hess
13260ccc3a undo command
This commit was sponsored by Andrew Cant.
2014-11-14 14:41:07 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
a44fd2c019 export CreateProcess fields from Utility.Process
update code to avoid cwd and env redefinition warnings
2014-06-10 19:20:14 -04:00
Joey Hess
8496d8aa63
improved direct mode dir/file conflicted merge resultion, using tree grafting 2014-03-04 15:00:19 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
3e61749d08 index file recovery 2013-10-22 12:58:04 -04:00
Joey Hess
077ca355d0 Revert "flush stream after each write to update-index, to possibly avoid buffering issues on Windows"
Didn't help.
2013-06-14 14:34:24 -04:00
Joey Hess
b97a9ea786 flush stream after each write to update-index, to possibly avoid buffering issues on Windows 2013-06-14 14:25:17 -04:00
Joey Hess
03e8594369 fix the day's windows permissions damage 2013-05-12 19:09:48 -04:00
Joey Hess
73d2f8b280 deal with git using / internally, even on DOS 2013-05-12 17:29:49 -05:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
5594bf0643 more zombie fighting
I'm down to 9 places in the code that can produce unwaited for zombies.

Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.

The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.

The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.

(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
2012-10-04 18:47:31 -04:00
Joey Hess
c9b3b8829d thread safe git-annex index file use 2012-08-24 20:50:39 -04:00
Joey Hess
1db7d27a45 add back debug logging
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.

Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.

As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
2012-07-19 00:46:52 -04:00
Joey Hess
d1da9cf221 switch from System.Cmd.Utils to System.Process
Test suite now passes with -threaded!

I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.

In the process, --debug stopped printing commands it runs. I may try to
bring that back later.

Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
2012-07-18 18:00:24 -04:00
Joey Hess
da62edb42a optimisation and memory leak fix 2012-06-12 21:13:15 -04:00
Joey Hess
ca9ee21bd7 crazy optimisation
Crazy like a fox..
2012-06-10 19:58:34 -04:00
Joey Hess
5308b51ec0 stage deletions directly using update-index
no need to run git-rm separately
2012-06-10 13:05:58 -04:00
Joey Hess
7f39415600 force thunk for precalculated value 2012-06-10 12:50:15 -04:00
Joey Hess
d45a9a7831 refactor and function name cleanup
(oops, I had a calcMerge and a calc_merge!)
2012-06-08 00:29:39 -04:00
Joey Hess
91db540769 add support for staging other types of blobs, like symlinks, into the index
Also added a utility TopFilePath type, which could stand to be used more
widely.
2012-06-06 14:26:15 -04:00
Joey Hess
f1bd72ea54 factor out generic update-index code from unionmerge code 2012-06-06 00:10:34 -04:00