Switch to Data.Map.Strict everywhere that used it.
There are still lots of lazy maps in git-annex. I think switching these
is safe. The risk is that there might be a map that is used in a way
that relies on the values not being evaluated to WHNF, and switching to
strict might result in bad performance or memory use. So, I have not
switched everything.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
This reverts commit 51228c2306.
No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.
With cabal, System.Posix.Process and System.Posix.Env are both missing.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.
This commit was sponsored by Fernando Jimenez on Patreon.
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.
Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.
This commit was sponsored by Jochen Bartl on Patreon.
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.
To avoid this problem, Utility.Env would need to be moved into
unix-compat..
Windows needs the setenv package in custom-setup, but I don't want to
pull it in on unix, which would probably break some builds and need more
work. Instead, split out setEnv to a separate module.
Quite likely, unix-compat will get a portable environment layer, and
then both modules can be removed from here.
This commit was sponsored by Øyvind Andersen Holm.
Seems I forgot to fully test that feature when documenting it.
git rev-parse needs a colon after a branch to de-reference the tree
it points to, rather than the commit. But that had it adding an extra
colon when the user specified "branch:subdir". So, check if there is a
colon before adding one.
This commit was sponsored by Francois Marier on Patreon.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.
This commit was sponsored by andrea rota.
So it will be available later and elsewhere, even after GC.
I first though to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to maniupulate the
trees, and not involving the git-annex branch index file at all.
This commit was sponsored by Andreas Karlsson.
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.
No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.
Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.
SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.
Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.
This commit was sponsored by Jochen Bartl on Patreon.
QuickCheck 2.10 found a counterexample eg "\929184" broke the property.
As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.
So, making Git.FileName's roundtrip test only chars < 256 seems fine.
Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.
Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.
Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.
This commit was sponsored by Henrik Riomar on Patreon.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
It was being passed to sh, not to the command, oops. Noticed because it
broke the test suite on OSX, where sh -n silently does nothing. Would
also break on Linux when eg posh was being used as the shell; bash
ignores the -n.
This commit was supported by the NSF-funded DataLad project.
GIT_SSH_COMMAND was not working correctly with git-annex get,
because when used in rsync -e, there were additional parameters
appended at the end, which the GIT_SSH_COMMAND should not see.
Fixed by constructing the shell command differently.
This commit was supported by the NSF-funded DataLad project.
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.
So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.
Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.
(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)
This commit was sponsored by Jeff Goeke-Smith on Patreon.
* Added post-recieve hook, which makes updateInstead work with direct
mode and adjusted branches.
* init: Set up the post-receive hook.
This commit was sponsored by Fernando Jimenez on Patreon.
Argh, didn't need an accumulator here!
I think I use accumulators a lot more than I need to when recusively
processing lists..
This commit was sponsored by Jeff Goeke-Smith on Patreon.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
A commit last year that made a partial function use Maybe unfortunately
caused the whole input to need to be consumed, breaking streaming. So,
revert it.
This commit was sponsored by Nick Daly on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
Restarting a crashing git process could result in filename encoding issues
when not in a unicode locale, as the restarted processes's handles were not
read in raw mode.
Since rawMode is always used when starting a coprocess, didn't bother
to parameterise it and just always enable it for simplicity.
This commit was sponsored by Jake Vosloo on Patreon.
When adding a tree like a/b/c/d when a/b already exists, fixes the bug that
the tree that got created was a/b/a/b/c/d
Just need to flatten out the top N directories of the tree that's being
grafted in, so we get the c/d part. This was complicated by the Tree
data type being a rose tree rather than a regular tree.
This commit was sponsored by Nick Daly on Patreon.
The modification flag was not being set when making modifications deep
in a tree, so parent trees were not updated to contain the modified tree.
Seems to have exposed another bug where the wrong filename gets grafted in.
This commit was sponsored by Brock Spratlen on Patreon.
.. and have to be checked to see if they are a pointed to an annexed file.
Cases where such memory use could occur included, but were not limited to:
- git commit -a of a large unlocked file (in v5 mode)
- git-annex adjust when a large file was checked into git directly
Generally, any use of catKey was a potential problem.
Fix by using git cat-file --batch-check to check size before catting.
This adds another git batch process, which is included in the CatFileHandle
for simplicity.
There could be performance impact, anywhere catKey is used. Particularly
likely to affect adjusted branch generation speed, and operations on
unlocked files in v6 mode. Hopefully since the --batch-check and
--batch read the same data, disk buffering will avoid most overhead.
Leaving only the overhead of talking to the process over the pipe and
whatever computation --batch-check needs to do.
This commit was sponsored by Bruno BEAUFILS on Patreon.
Speeds up commands like "git-annex find --in remote" by over 50%.
Profiling showed that adjustGitEnv was 21% of the time and 37% of the
allocations of that command. It copied the environment each time with
getEnvironment.
The only repeated use of adjustGitEnv is in withIndexFile, which tends to
be run at least once per file. So, it was optimised by keeping a cache of
the environment, which can be reused.
There could be other better ways to optimise this. Maybe get the while
environment once at startup. But, then it would have to be serialized back
out each time running a child process, so I doubt that would be a net win.
It might be better to cache a version of the environment that is
pre-modified to use .git-annex/index. But, profiling doesn't show that
modifying the enviroment is taking any significant time.
* sync: Previously, when run in a branch with a slash in its name,
such as "foo/bar", the sync branch was "synced/bar". That conflicted
with the sync branch used for branch "bar", so has been changed to
"synced/foo/bar".
* adjust: Previously, when adjusting a branch with a slash in its name,
such as "foo/bar", the adjusted branch was "adjusted/bar(unlocked)".
That conflicted with the adjusted branch used for branch "bar",
so has been changed to "adjusted/foo/bar(unlocked)"
* Also, running sync in an adjusted branch did not correctly sync
changes back to the parent branch when it had a slash in its name.
This bug has been fixed.
Eliminate use of Git.Ref.under and Git.Ref.basename; using
Git.Ref.underBase and Git.Ref.base make everything handle deep branches
correctly.
Probably noone was adjusting deep branches, and v6 is still experimental
anyway, so I'm not going to worry about the mess that was left by that bug.
In the case of git-annex sync, using a fixed git-annex with an old unfixed
one will mean they use different sync branches for a deep branch, and so
they may stop syncing until the old one is upgraded. However, that's only
a problem when syncing between repositories without going via a central
bare repository. Added a warning about this to the CHANGELOG, but it's
probably not going to affect many people at all.
This commit was sponsored by Riku Voipio.
Show branch:file that is being operated on.
I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
This is actually worse than I thought; when git is being run with a
detached work tree, GIT_INDEX_FILE is treated as a path relative to CWD,
instead of the normal behavior of relative the top of the work tree.
This seems to make it basically impossible for any program that wants to
use GIT_INDEX_FILE to use anything other than an absolute path to it; there
are too many configurations to keep straight that can change how git
interprets what should be a simple relative path to a file.
(I have complained to the git developers.)
This affected git annex view. It turns out that some other places
that use GIT_INDEX_FILE were already working around the bug. I removed the
workaround from Annex.Branch since the new workaround will do.
Since git-annex unsets these when started, they have to be explicitly
propigated. Also, this makes --git-dir and --work-tree settings be
reflected in the environment.
The need for this came up in
https://github.com/DanielDent/git-annex-remote-rclone/issues/3
I'd prefer to use the env var, but let's use what git currently supports.
Revert this when the env var gets supported.
Note that the version checking assumes git 2.8.2 will get support for the
switch.
git 2.8.1 (or perhaps 2.9.0) is going to prevent git merge from merging in
unrelated branches. Since the webapp's pairing etc features often combine
together repositories with unrelated histories, work around this behavior
change by setting GIT_MERGE_ALLOW_UNRELATED_HISTORIES when the assistant
merges.
Note though that this is not done for git annex sync's merges, so
it will follow git's default or configured behavior.
This is how direct mode does it too, and somehow, for reasons that
currently escape me, this makes git merge not care if it's run with an
empty work tree.
Rationalle: User might have hook scripts whose output they want to see.
Also, git commit output may tell the user they forgot to add a file.
The output is not too ugly when there's nothing to commit.
This makes the direct mode to v6 upgrade able to be performed in one clone
of a repository without affecting other clones, which can continue using v5
and direct mode.
This does mean that it has to write out temp files containing updated
objects for the merge. So may use more disk space, and disk IO, but that
should generally win out over needing to launch N separate
git hash-object processes.
Only reverse adjust the changes in the commit, which means that adjustments
do not need to be generally cleanly reversable.
For example, an adjustment can unlock all locked files, but does not need
to worry about files that were originally unlocked when reversing, because
it will only ever be run on files that have been changed. So, it's ok
if it locks all files when reversed, or even leaves all files as-is when
reversed.
So, it will pull and push the original branch, not the adjusted one.
And, for merging, it will use updateAdjustedBranch (not implemented yet).
Note that remaining uses of Git.Branch.current need to be checked too;
for things that should act on the original branch, and not the adjusted
branch.
extractTree has to parse the whole input list in order to generate a tree,
so convert interface to non-streaming.
Some quick memory benchmarks in a repo with 60k files
don't look too bad despite not streaming.
To stream, without building up a whole tree object, one way would
be a new interface:
adjustTree :: MonadIO m :: (TreeItem -> m (Maybe TreeItem)) -> Ref -> Repo -> m Sha
This would only need to buffer tree objects from the current one down
to the root, in order to update trees when a TreeItem is changed.
But, while it supports changing items in the tree, and removing items,
it does not support adding new items, or moving items from one directory to
another.
Previously, it only flushed when the queue got larger than 1.
Also, make the queue auto-flush when items are added, rather than needing
to be flushed as a separate step. This simplifies the code and make it more
efficient too, as it avoids needing to read the queue out of the state to
check if it should be flushed.
03cb2c8ece put a cat-file into the fast
bloomfilter generation path. Instead, add another bloom filter which diffs
from the work tree to the index.
Also, pull the sha of the changed object out of the diffs, and cat that
object directly, rather than indirecting through the filename.
Finally, removed some hacks that are unncessary thanks to the worktree to
index diff.
The repo path is typically relative, not absolute, so
providing it to absPathFrom doesn't yield an absolute path.
This is not a bug, just unclear documentation.
Indeed, there seem to be no reason to simplifyPath here, which absPathFrom
does, so instead just combine the repo path and the TopFilePath.
Also, removed an export of the TopFilePath constructor; asTopFilePath
is provided to construct one as-is.
Fixes several bugs with updates of pointer files. When eg, running
git annex drop --from localremote
it was updating the pointer file in the local repository, not the remote.
Also, fixes drop ../foo when run in a subdir, and probably lots of other
problems. Test suite drops from ~30 to 11 failures now.
TopFilePath is used to force thinking about what the filepath is relative
to.
The data stored in the sqlite db is still just a plain string, and
TopFilePath is a newtype, so there's no overhead involved in using it in
DataBase.Keys.