Commit graph

85 commits

Author SHA1 Message Date
Joey Hess
00d814aecc fix filename encoding for git cat-file
The filename sent to git cat-file needs to be sent on a File encoded handle.

Also set the read handle to use the File encoding, so that any error
message mentioning the filename is received properly.

The actual file content is read using Data.ByteString.Char8, which
will ignore the read handle's encoding, so this won't change that.
(Whether that is entirely correct remains to be seen.)
2012-02-26 14:11:50 -04:00
Joey Hess
cac130b205 cleanup 2012-02-21 00:16:24 -04:00
Joey Hess
6c0155efb7 refactor 2012-02-20 15:22:21 -04:00
Joey Hess
f0f07db01d reorder prams and put -- after atrributes, for compatability with old git
(cherry picked from commit c8ec0e233e)
2012-02-15 14:01:06 -04:00
Joey Hess
52c5b164d8 Added a annex.queuesize setting
useful when adding hundreds of thousands of files on a system with plenty
of memory.

git add gets quite slow in such a large repository, so if the system has
more than the ~32 mb of memory the queue can use by default, it's a useful
optimisation to increase the queue size, in order to decrease the number
of times git add is run.
2012-02-15 11:14:19 -04:00
Joey Hess
7ebd98d8d8 fix memory leak when staging the journal
The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
2012-02-14 14:37:59 -04:00
Joey Hess
a40ec5e03e Fixed a memory leak due to excessive strictness when committing journal files.
When hashing the files, the entire list of shas was read strictly.
That was entirely unnecessary, since there's a cleanup action run
after they're consumed.
2012-02-14 11:20:34 -04:00
Joey Hess
8f76d66f32 set fileEncoding on CheckAttr handles
Seemed to work without it, but this is correct.
2012-02-14 04:31:39 -04:00
Joey Hess
a2f241d503 fix LsFiles.typeChanged paths
Passing absolute paths to Command.Add used to work, but after recent
changes doesn't. All LsFiles should use relative paths anyway, so fix it
there.
2012-02-14 00:22:42 -04:00
Joey Hess
cbaebf538a rework git check-attr interface
Now gitattributes are looked up, efficiently, in only the places that
really need them, using the same approach used for cat-file.

The old CheckAttr code seemed very fragile, in the way it streamed files
through git check-attr.
I actually found that cad8824852
was still deadlocking with ghc 7.4, at the end of adding a lot of files.
This should fix that problem, and avoid future ones.

The best part is that this removes withAttrFilesInGit and withNumCopies,
which were complicated Seek methods, as well as simplfying the types
for several other Seek methods that had a Backend tupled in.
2012-02-13 23:52:21 -04:00
Joey Hess
d35a8d85b5 another place hGetBoth was used without a writer thread 2012-02-13 20:23:45 -04:00
Joey Hess
cad8824852 thinko
I removed the now unnecessary forkProcess, but forgot to change back to
pipeBoth, so there was no writer thread.
2012-02-13 20:01:37 -04:00
Joey Hess
3ac2677e00 comment typo 2012-02-13 16:58:26 -04:00
Joey Hess
e4d0923544 wording 2012-02-09 17:35:36 -04:00
Joey Hess
dc682e53a2 use fileEncoding for git-update-index input handle 2012-02-04 13:03:33 -04:00
Joey Hess
586be39952 fix file encoding of HashObject 2012-02-04 13:01:00 -04:00
Joey Hess
d8fb97806c support all filename encodings with ghc 7.4
Under ghc 7.4, this seems to be able to handle all filename encodings
again. Including filename encodings that do not match the LANG setting.
I think this will not work with earlier versions of ghc, it uses some ghc
internals.

Turns out that ghc 7.4 has a special filesystem encoding that it uses when
reading/writing filenames (as FilePaths). This encoding is documented
to allow  "arbitrary undecodable bytes to be round-tripped through it".

So, to get FilePaths from eg, git ls-files, set the Handle that is reading
from git to use this encoding. Then things basically just work.

However, I have not found a way to make Text read using this encoding.
Text really does assume unicode. So I had to switch back to using String
when reading/writing data to git. Which is a pity, because it's some
percent slower, but at least it works.

Note that stdout and stderr also have to be set to this encoding, or
printing out filenames that contain undecodable bytes causes a crash.
IMHO this is a misfeature in ghc, that the user can pass you a filename,
which you can readFile, etc, but that default, putStr of filename may
cause a crash!

Git.CheckAttr gave me special trouble, because the filenames I got back
from git, after feeding them in, had further encoding breakage.
Rather than try to deal with that, I just zip up the input filenames
with the attributes. Which must be returned in the same order queried
for this to work.

Also of note is an apparent GHC bug I worked around in Git.CheckAttr. It
used to forkProcess and feed git from the child process.  Unfortunatly,
after this forkProcess, accessing the `files` variable from the parent
returns []. Not the value that was passed into the function. This screams
of a bad bug, that's clobbering a variable, but for now I just avoid
forkProcess there to work around it. That forkProcess was itself only added
because of a ghc bug, #624389. I've confirmed that the test case for that
bug doesn't reproduce it with ghc 7.4. So that's ok, except for the new ghc
bug I have not isolated and reported. Why does this simple bit of code
magnet the ghc bugs? :)

Also, the symlink touching code is currently broken, when used on utf-8
filenames in a non-utf-8 locale, or probably on any filename containing
undecodable bytes, and I temporarily commented it out.
2012-02-03 16:23:20 -04:00
Joey Hess
3d49258e5b attempt at a quick, utf-8 only fix to the ghc 7.4 problem
If you have only utf-8 filenames, and need to build git-annex with ghc 7.4,
this will work. But, it will crash on non-utf-8 filenames.
2012-02-01 16:16:08 -04:00
Joey Hess
a964012fc3 switch to the strict state monad
I had not realized what a memory leak the lazy state monad could be,
although I have not seen much evidence of actual leaking in git-annex.
However, if running git-annex on a great many files, this could matter.

The additional Utility.State.changeState adds even more strictness,
avoiding a problem I saw in github-backup where repeatedly modifying
state built up a huge pile of thunks.
2012-01-29 22:55:06 -04:00
Joey Hess
97209ac08d fix error message 2012-01-25 20:43:01 -04:00
Joey Hess
3ca7cf5db1 export fromPath
Not used in git-annex, but I am using it in git-backup
2012-01-25 20:42:05 -04:00
Joey Hess
ce5637498f remove Utility.Conditional and use IfElse
This drops the >>! and >>? with the nice low fixity. IfElse does have
undocumented >>=>>! and >>=>>? operators, but I deem that too fishy.
Anyway, using whenM and unlessM is easier; I sometimes mixed the operators
up.
2012-01-24 16:22:07 -04:00
Joey Hess
ba6088b249 rename readMaybe to readish
a stricter (but also partial) readMaybe is getting added to base
2012-01-23 17:00:10 -04:00
Joey Hess
8c87293b48 avoid unnecessary stats when traversing to parent 2012-01-14 11:48:10 -04:00
Joey Hess
92a4af8b20 avoid unnecessary chdir 2012-01-14 11:42:51 -04:00
Joey Hess
1f66af2b53 optimize away 3 stats 2012-01-14 11:28:49 -04:00
Joey Hess
ff5703ce77 tweak 2012-01-13 21:06:00 -04:00
Joey Hess
66aac77467 support relative GIT_DIR 2012-01-13 14:40:36 -04:00
Joey Hess
1ae780ee79 git-annex, git-union-merge: Support GIT_DIR and GIT_WORK_TREE.
Note that GIT_WORK_TREE cannot influence GIT_DIR; that is necessary for
git-fake-bare and vcsh type things to work.
2012-01-13 12:52:09 -04:00
Joey Hess
0d5c402210 Add annex-trustlevel configuration settings, which can be used to override the trust level of a remote.
This overrides the trust.log, and is overridden by the command-line trust
parameters.

It would have been nicer to have Logs.Trust.trustMap just look up the
configuration for all remotes, but a dependency loop prevented that
(Remotes depends on Logs.Trust in several ways). So instead, look up
the configuration when building remotes, storing it in the same forcetrust
field used for the command-line trust parameters.
2012-01-09 23:31:44 -04:00
Joey Hess
9fb5f3edc7 log --after=date 2012-01-06 17:24:03 -04:00
Joey Hess
0b27e6baa0 Support unescaped repository urls, like git does.
Turns out that git will accept a .git/config containing an url with eg,
spaces in its name. Handle this by escaping the url if it's not valid.

This also fixes support for urls containing escaped characters like %20
for space. Before, the path from the url was not unescaped properly.
2012-01-05 14:32:20 -04:00
Joey Hess
f0957426c5 skip local remotes that are not available (ie, not mounted)
With --fast, unavailable local remotes are filtered out of the fast set.
This way, if there are local remotes, --fast always acts only on them,
and if none are mounted, acts on nothing. This consistency is better
than --fast acting on different remotes depending on what's mounted.
2011-12-31 04:50:39 -04:00
Joey Hess
a2ec2d3760 refactor and check for a detached HEAD 2011-12-31 03:38:58 -04:00
Joey Hess
52104dae6f refactor 2011-12-30 18:36:40 -04:00
Joey Hess
26040d6419 add base, under
The describe function was only intended to generate a human-visible
description of a branch, but taking the base of a branch is a useful
operation to be able to do no matter the human-visible representation.

Converting a branch like refs/heads/master to refs/heads/origin/master
is also a useful operation, and under can do that.
2011-12-30 16:48:26 -04:00
Joey Hess
5287d1dc3f fixed behavior when multiple insteadOf configs are provided for the same url base
Consider this git config --list case:

url.git+ssh://git@example.com/.insteadOf=gl
url.git+ssh://git@example.com/.insteadOf=shared

Since config is stored in a Map, only the last of the values for this key
was stored and available for use by the insteadOf code. But that
is wrong; git allows either "gl" or "shared" to be used in an url and
the insteadOf value to be substituted in.

To support this, it seems best to keep the existing config map as-is,
and add a second map that accumulates a list of multiple values for
config keys. This new fullconfig map can be used in the rare places where
multiple values for a key make sense, without needing to complicate
everything else.

Haskell's laziness and data sharing keep the overhead of adding
this second map low.
2011-12-30 14:07:46 -04:00
Joey Hess
cba3ce08df handle C-style escapes in Format
I was happily able to repurpose some code from Git.Filename to handle this.

I remember writing that code... a whole afternoon at a coffee shop, after
which I felt I'd struggled with Haskell and git, and sorta lost, in needing
to write this nasty peice of code. But was also pleased at the use of a
pair of functions and quickcheck that allowed me to get it 100% right.
So, turns out I not only got it right, but the code wasn't as special-purpose
as I'd feared. Yay!
2011-12-23 01:05:16 -04:00
Joey Hess
5a275a3f5d Can now be built with older git versions (before 1.7.7); the resulting binary should only be used with old git.
Remove git old version check from configure, and use the git version
it was built against in the git check-attr code.
2011-12-22 15:01:13 -04:00
Joey Hess
6bffe509d7 Add --include, which is the same as --not --exclude. 2011-12-22 14:00:17 -04:00
Joey Hess
ee3b5b2a42 use Common in a few more modules 2011-12-20 14:37:53 -04:00
Joey Hess
95d2391f58 more partial function removal
Left a few Prelude.head's in where it was checked not null and too hard to
remove, etc.
2011-12-15 18:19:36 -04:00
Joey Hess
fbc3d32f7d avoid partial function, and parse git-ref output better
It's possible that a ref name might contain a space, this properly
preserves the space.
2011-12-15 16:58:04 -04:00
Joey Hess
eb132a854e avoid partial head function
(although it was used safely)
2011-12-15 16:04:08 -04:00
Joey Hess
111b6937ec avoid partial functions, and added check for correct sha content 2011-12-15 15:57:47 -04:00
Joey Hess
a8643ca44c refactor 2011-12-15 13:05:47 -04:00
Joey Hess
09cd042775 Properly handle multiline git config values.
A crash on parsing was fixed a while ago. This adds support for fully
correctly parsing multiline git config values, using git config --null.

Since git-annex-shell configlist uses normal git config output, I left in
support for that too; the two forms of config output can be easily
identified by the parser. Since configlist only prints the annex.uuid
config, there's no risk of multiline values there, so no need to change it.
2011-12-15 12:48:27 -04:00
Joey Hess
ef28b3fef7 split out Git/Command.hs 2011-12-14 15:56:11 -04:00
Joey Hess
02f1bd2bf4 split more stuff out of Git.hs 2011-12-14 15:43:13 -04:00
Joey Hess
9db8ec210f split out two more Git modules 2011-12-13 15:24:23 -04:00