Commit graph

518 commits

Author SHA1 Message Date
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
Joey Hess
fdbdf64d87
fix reversions due to undocumented and buggy git behavior
* Don't use GIT_PREFIX when GIT_WORK_TREE=. because it seems git
  does not intend GIT_WORK_TREE to be relative to GIT_PREFIX in that
  case, despite GIT_WORK_TREE=.. being relative to GIT_PREFIX.
* Don't use GIT_PREFIX to fix up a relative GIT_DIR, because
  git 2.11 sets GIT_PREFIX set to a path it's not relative to.
  and apparently GIT_DIR is never relative to GIT_PREFIX.

Commit e50ed4ba48 led us down this path
by working around a git bug by relying on the barely documented GIT_PREFIX.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-09-11 15:54:21 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
82c5dd8a01
queueing of internal IO actions on files
This would be better if getInternalFiles were
more polymorphic, but I can't see a good
way to accomplish that without messing with Data.Typeable,
which seemed like overkill.

Reverted CommandAction back to the simpler version.

This commit was sponsored by Eric Drechsel on Patreon.
2018-08-17 13:28:21 -04:00
Joey Hess
c5a8abb130
comment 2018-08-17 13:19:37 -04:00
Joey Hess
6a445dc086
support conditionally excluding queued files
Switched code to use a for loop to avoid a filterM that would have
doubled the memory used.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:38:37 -04:00
Joey Hess
42c5d7c64f
moving to filterdriver branch
Not used in master, so remove until/unless filterdriver branch is
merged.
2018-08-14 13:37:17 -04:00
Joey Hess
7a1a3ef11f
add missing flush-pkt after version 2018-08-13 16:22:39 -04:00
Joey Hess
d963d40815
use String not Text
On second thought, git passes filepaths, which may not be valid utf8, so
can't use Text here.

String will be a little bit slower, but not enough to worry about.
2018-08-13 16:22:31 -04:00
Joey Hess
4649625f04
small improvement to packet flushing 2018-08-10 18:08:54 -04:00
Joey Hess
65de44aa13
improve layout 2018-08-10 16:26:31 -04:00
Joey Hess
c6884989dc
capability reply must be a subset of what git sent so filter it 2018-08-10 16:24:06 -04:00
Joey Hess
fd42df78a0
git long-running process handshake implementation
This commit was supported by the NSF-funded DataLad project.
2018-08-10 16:03:04 -04:00
Joey Hess
1272ccdf1e
git pkt-line format
Git uses pkt-line in the pack and http protocols, and for the long-running
filter processes protocol as well.

This should be a quite efficient parser and builder since it uses
attoparsec and bytestring-builder.

This adds a dependency on attoparsec, but it's a free dependency because
eg aeson depends on attoparsec and git-annex depends on aeson.

This commit was supported by the NSF-funded DataLad project.
2018-08-10 14:12:33 -04:00
Joey Hess
081f8e57c6
Support working trees set up by git-worktree.
Support working trees set up by git-worktree, by setting up some symlinks
such that git-annex links work right.

Also improved support for repositories created with --separate-git-dir.
At least recent git makes a .git file for those (older may have used a
symlink?), so that also needs to be converted to a symlink.

This commit was sponsored by Nick Piper on Patreon.
2018-07-18 14:27:26 -04:00
Joey Hess
e50ed4ba48
work around git bug
Work around git bug that runs smudge/clean filters at the top of the
repository while passing them a relative GIT_WORK_TREE that may point
outside of the repository, by using GIT_PREFIX to get back to the
subdirectory where a relative GIT_WORK_TREE is valid.

git devs have been informed of the bug and may fix it, which could conveivably
break this fix, but as it is, this works back to git 1.7.6.

This commit was sponsored by Jochen Bartl on Patreon.
2018-07-17 14:27:39 -04:00
Joey Hess
1c8ee99b46
Fix build with ghc 8.4+, which broke due to the Semigroup Monoid change
https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid

I am not happy with the fragile pile of CPP boilerplate required to support
ghc back to 7.0, which git-annex still targets for both the android build
and the standalone build targeting old linux kernels. It makes me unlikely
to want to use Semigroup more in git-annex, because the benefit of the
abstraction is swamped by the ugliness. I actually considered ripping out
all the Semigroup instances, but some are needed to use
optparse-applicative.

The problem, I think, is they made this transaction on too fast a timeline.
(Although ironically, work on it started in 2015 or earlier!)
In particular, Debian oldstable is not out of security support, and it's
not possible to follow the simpler workarounds documented on the wiki and
have it build on oldstable (because the semigroups package in it is too
old).

I have only tested this build with ghc 8.2.2, not the newer and older
versions that branches of the CPP support. So there could be typoes, we'll
see.

This commit was sponsored by Brock Spratlen on Patreon.
2018-05-30 12:28:43 -04:00
Joey Hess
442e607b0a
Don't allow entering a view with staged or unstaged changes.
In some cases, unstaged changes are safe, eg dotfiles in the top which
are not affected by a view. Or non-annexed files in general which would
prevent view branch checkout from proceeding. But in other cases,
particularly unstaged changes to annexed files, entering a view would wipe
out those changes! And so don't allow entering a view with any unstaged
changes.

Staged changes are not safe when entering a view, because the changes get
committed to the view branch, and so the user is unlikely to remember them
when they exit the view, and so will effectively lose them, even if they're
still present in the view branch.

Also, improved the git status parser, although the improvement turned out
to not really be needed.

This commit was sponsored by Eric Drechsel on Patreon.
2018-05-14 16:51:06 -04:00
Joey Hess
d7021d420f
reuse hashes of dotfiles/dirs/submodules when entering view
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.

dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.

Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.

Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 15:35:20 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
256d8f07e8
avoid insertWith' depreaction warning
Switch to Data.Map.Strict everywhere that used it.

There are still lots of lazy maps in git-annex. I think switching these
is safe. The risk is that there might be a map that is used in a way
that relies on the values not being evaluated to WHNF, and switching to
strict might result in bad performance or memory use. So, I have not
switched everything.
2018-04-22 13:28:31 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
1b6cbb63e9
still can't express custom-setup deps
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.

To avoid this problem, Utility.Env would need to be moved into
unix-compat..
2017-11-14 14:59:51 -04:00
Joey Hess
8d68112be5
split out setEnv to avoid adding dep
Windows needs the setenv package in custom-setup, but I don't want to
pull it in on unix, which would probably break some builds and need more
work. Instead, split out setEnv to a separate module.

Quite likely, unix-compat will get a portable environment layer, and
then both modules can be removed from here.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-14 14:28:49 -04:00
Joey Hess
c7fe7760d1
typo fix 2017-10-30 12:28:44 -04:00
Joey Hess
24883e01cd
Fix export of subdir of a branch.
Seems I forgot to fully test that feature when documenting it.

git rev-parse needs a colon after a branch to de-reference the tree
it points to, rather than the commit. But that had it adding an extra
colon when the user specified "branch:subdir". So, check if there is a
colon before adding one.

This commit was sponsored by Francois Marier on Patreon.
2017-10-30 12:02:22 -04:00
Joey Hess
e8c9a5c515
sync: Added --cleanup, which removes local and remote synced/ branches.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.

This commit was sponsored by andrea rota.
2017-09-28 14:58:48 -04:00
Joey Hess
5483ea90ec
graft exported tree into git-annex branch
So it will be available later and elsewhere, even after GC.

I first though to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to maniupulate the
trees, and not involving the git-annex branch index file at all.

This commit was sponsored by Andreas Karlsson.
2017-08-31 18:06:49 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
da8e84efe9
fix failing quickcheck properties
QuickCheck 2.10 found a counterexample eg "\929184" broke the property.

As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.

So, making Git.FileName's roundtrip test only chars < 256 seems fine.

Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.

Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.

Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.

This commit was sponsored by Henrik Riomar on Patreon.
2017-06-17 16:48:00 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
c8a6be7eef
fix GIT_SSH_COMMAND -n parameter
It was being passed to sh, not to the command, oops. Noticed because it
broke the test suite on OSX, where sh -n silently does nothing. Would
also break on Linux when eg posh was being used as the shell; bash
ignores the -n.

This commit was supported by the NSF-funded DataLad project.
2017-03-20 14:23:19 -04:00
Joey Hess
d674fd5a69
super tricky shell command generation hack
GIT_SSH_COMMAND was not working correctly with git-annex get,
because when used in rsync -e, there were additional parameters
appended at the end, which the GIT_SSH_COMMAND should not see.

Fixed by constructing the shell command differently.

This commit was supported by the NSF-funded DataLad project.
2017-03-17 18:06:59 -04:00
Joey Hess
c9578be5b2
fix over-shell-escape
Seems I had one time too many.
2017-03-17 17:28:25 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
75a15e1ad7
status: Pass --ignore-submodules=when option on to git status.
Didn't make --ignore-submodules without a value be handled because I can't
see a way to make optparse-applicative parse that. I've opened a bug
requesting a way to do that:
https://github.com/pcapriotti/optparse-applicative/issues/243
2017-02-20 17:01:24 -04:00
Joey Hess
a13c0ce66c
adjust: Fix behavior when used in a repository that contains submodules.
Also fixed the LsFiles parser to not assume its output has a fixed width
type field.
2017-02-20 13:44:55 -04:00
Joey Hess
d074532aff
post-recive hook to make updateInstead work in direct mode and adjusted branches
* Added post-recieve hook, which makes updateInstead work with direct
  mode and adjusted branches.
* init: Set up the post-receive hook.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-02-17 14:04:43 -04:00
Joey Hess
ed60f60e9b
unused: Improved memory use significantly when there are a lot of differences between branches.
Argh, didn't need an accumulator here!

I think I use accumulators a lot more than I need to when recusively
processing lists..

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-01-31 19:42:00 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
dbaea98836
fix lack of laziness streaming large diffs
A commit last year that made a partial function use Maybe unfortunately
caused the whole input to need to be consumed, breaking streaming. So,
revert it.

This commit was sponsored by Nick Daly on Patreon.
2017-01-31 17:43:11 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
e23028d19b
restart coprocess in raw mode
Restarting a crashing git process could result in filename encoding issues
when not in a unicode locale, as the restarted processes's handles were not
read in raw mode.

Since rawMode is always used when starting a coprocess, didn't bother
to parameterise it and just always enable it for simplicity.

This commit was sponsored by Jake Vosloo on Patreon.
2016-11-01 14:03:59 -04:00
Joey Hess
b530e43285
Fix reversion in 6.20161012 that prevented adding files with a space in their name. 2016-10-31 18:39:37 -04:00
Joey Hess
2ad7b00e29
Assistant, repair: Fix ignoring of git fsck errors due to duplicate file entries in tree objects. 2016-10-31 14:00:37 -04:00