Commit graph

38 commits

Author SHA1 Message Date
Joey Hess
a6bebe3c0f
make hashFile support paths with newlines
git hash-object --stdin-paths is a newline protocol so it cannot
support them. It would help to not use absPath, when the problem
is that the repository itself is in a path with a newline. But,
there's a reason it used absPath, which is that
git hash-object --stdin-paths actually chdirs to the top of the
repository on startup! That is not documented, and I think is a bug
in git.

I considered making the path relative to the top of the repo, but
then what if this is a git bug and gets fixed? git-annex would break
horribly.

So instead, keep the absPath, but when the path contains a newline,
fall back to running git hash-object once per file, which avoids
the problem with newlines and --stdin-paths. It will be slower,
but this is an edge case. (Similar slow code paths are already used
elsewhere when dealing with filenames with newlines and other parts
of git that use line-based protocols.)

Sponsored-by: Dartmouth College's Datalad project
2023-03-13 13:43:40 -04:00
Joey Hess
f45ad178cb
more RawFilePath conversion
At 318/645 after 4k lines of changes

This commit was sponsored by Jake Vosloo on Patreon.
2020-10-29 12:03:50 -04:00
Joey Hess
08cbaee1f8
more RawFilePath conversion
Most of Git/ builds now.

Notable win is toTopFilePath no longer double converts

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-10-28 15:55:30 -04:00
Joey Hess
279991604d
started converting Ref from String to ByteString
This should make code that reads shas and refs from git faster.

Does not compile yet, a lot needs to be done still.
2020-04-06 17:14:49 -04:00
Joey Hess
ea3cb7d277
fix a case where file tracked by git unexpectedly becomes annex pointer file
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
2019-12-27 15:08:03 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
232b1a08f3
simplification now that all logs use Builder 2019-01-09 14:10:05 -04:00
Joey Hess
53905490df
convert Git.HashObject to use ByteStrings
Both lazy and strict, because sometimes it's more efficient to build a
small strict bytestring, and other times better to lazily stream.
2019-01-03 13:21:01 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
e23028d19b
restart coprocess in raw mode
Restarting a crashing git process could result in filename encoding issues
when not in a unicode locale, as the restarted processes's handles were not
read in raw mode.

Since rawMode is always used when starting a coprocess, didn't bother
to parameterise it and just always enable it for simplicity.

This commit was sponsored by Jake Vosloo on Patreon.
2016-11-01 14:03:59 -04:00
Joey Hess
84f20c9f69
Windows: Avoid terminating git-annex branch lines with \r\n when union merging. 2016-05-27 15:22:52 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
d44b28437d git-hash-object needs absolute files (git bug)
A relative path to a file makes it fail. I am pretty sure this is a git
bug; workaround it.
2015-01-06 17:33:29 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
02c51266ec missed another hash-object call, disable filtering there too 2013-06-18 14:48:15 -04:00
Joey Hess
a1f8771d2b avoid filtering object being hashed
This avoids newline conversion being done on it in Windows.
2013-06-18 13:42:16 -04:00
Joey Hess
91c4dcfc69 Can now restart certain long-running git processes if they crash, and continue working.
Fuzz tests have shown that git cat-file --batch sometimes stops running.
It's not yet known why (no error message; repo seems ok). But this is
something we can deal with in the CoProcess framework, since all 3 types of
long-running git processes should be restartable if they fail.

Note that, as implemented, only IO errors are caught. So an error thrown
by the reveiver, when it sees something that is not valid output from
git cat-file (etc) will not cause a restart. I don't want it to retry
if git commands change their output or are just outputting garbage.
This does mean that if the command did a partial output and crashed in the
middle, it would still not be restarted.

There is currently no guard against restarting a command repeatedly, if,
for example, it crashes repeatedly on startup.
2013-05-31 12:42:13 -04:00
Joey Hess
abe8d549df fix permission damage (thanks, Windows) 2013-05-11 23:54:25 -04:00
Joey Hess
5e1458152f refactoring 2013-05-11 23:11:56 -04:00
Joey Hess
dc22549ab3 git annex init works on Windows!
git hash-object and cat-file both only use \n at ends of line, even on Windows.
2013-05-11 16:02:35 -05:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
de3ea4adb6 remove now-unnecessary manual reaps 2012-10-04 18:58:57 -04:00
Joey Hess
0b63ee6cd5 run git coprocesses with gitEnv 2012-09-15 17:43:37 -04:00
Joey Hess
182526ff68 add debugging 2012-07-17 14:40:05 -04:00
Joey Hess
20f425be19 make watch use the queue
May not work. Certianly needs to flush the queue from time to time
when only symlink changes are being made.
2012-06-07 15:40:44 -04:00
Joey Hess
f596084a59 move hashObject to HashObject library and generalize it to support all git object types 2012-06-06 02:31:31 -04:00
Joey Hess
cac130b205 cleanup 2012-02-21 00:16:24 -04:00
Joey Hess
6c0155efb7 refactor 2012-02-20 15:22:21 -04:00
Joey Hess
7ebd98d8d8 fix memory leak when staging the journal
The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
2012-02-14 14:37:59 -04:00
Joey Hess
a40ec5e03e Fixed a memory leak due to excessive strictness when committing journal files.
When hashing the files, the entire list of shas was read strictly.
That was entirely unnecessary, since there's a cleanup action run
after they're consumed.
2012-02-14 11:20:34 -04:00
Joey Hess
3ac2677e00 comment typo 2012-02-13 16:58:26 -04:00
Joey Hess
586be39952 fix file encoding of HashObject 2012-02-04 13:01:00 -04:00
Joey Hess
ef28b3fef7 split out Git/Command.hs 2011-12-14 15:56:11 -04:00
Joey Hess
02f1bd2bf4 split more stuff out of Git.hs 2011-12-14 15:43:13 -04:00
Joey Hess
46588674b0 avoid closing pipe before all the shas are read from it
Could have just used hGetContentsStrict here, but that would require
storing all the shas in memory. Since this is called at the end of a
git-annex run, it may have created a *lot* of shas, so I avoid that memory
use and stream them out like before.
2011-12-12 21:41:37 -04:00
Joey Hess
0e45b762a0 broke out Git/HashObject.hs 2011-12-12 21:24:55 -04:00