When the recorded submodule commit changes on an adjusted branch, the
change is carried in the function that reverseAdjustedCommit passes
for adjustTree's adjusttreeitem parameter. Update the CommitObject
handling in adjustTree to consider adjusttreeitem so that a submodule
change is synced back.
This breaks several parts of the upgrade code, when upgrading remotes
of the current repo, but those parts were buggy, and will need to be
fixed somehow anyway.
While git ls-files can actually be used on a repo that is not in the
cwd, it works inconsistently. For example, this fails:
git --git-dir=../foo/.git --work-tree=../foo ls-files ../foo
But change some of the paths to absolute and it will succeed. That seems
like a bug in git.
OTOH, this succeeds:
git --git-dir=../foo/.git --work-tree=../foo ls-files
But, that lists paths relative to the top of the --work-tree,
rather than the usual listing them relative to the cwd. Because the cwd
is not in the repo. And so anything parsing the ls-files output of that
is likely to operate on files in the wrong location. Indeed, there is
code in Upgrade/ that has this problem!
Remaining things needing converted are in the assistant, and Annex.Ssh.
Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
using git credential to get the password
One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
Rather than leaking the name of the temp file, just say the config parse
failed, and where the config was downloaded from.
Not closing the bug report because two issues were reported in the same
bug report, because the universe wants me to continually re-read old
unclosed bug reports to waste my time determining what still needs to be
done.
This avoids hardcoding the sha size, so when git uses sha256, it will
output the full sha256 and not a truncation to 40 characters.
I reviewed git's history, and while there have been some
bugs with commands not supporting --no-abbrev (eg git diff --no-index
--no-abbrev was broken in git 2.1), none of the commands git-annex
uses will be impacted by those old bugs.
Git will eventually switch to sha2 and there will not be one single
shaSize anymore, but two (40 and 64).
Changed all parsers for git plumbing output to support both sizes of
shas.
One potential problem this does not deal with is, if somewhere in
git-annex it reads two shas from different sources, and compares them
to see if they're the same sha, it would fail if they're sha1 and sha256
of the same value. I don't know if that will really be a concern.
prop_encode_decode_roundtrip failed on "\175" in C locale.
This may be a new problem after the switch to RawFilePath, but it
already had filtering for high chars, so changed to only test ascii
chars.
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
* annex.addunlocked can be set to an expression with the same format used by
annex.largefiles, in case you want to default to unlocking some files but
not others.
* annex.addunlocked can be configured by git-annex config.
Added a git-annex-matching-expression man page, broken out from
tips/largefiles.
A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.
Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.
Git.Repo also changed to use RawFilePath for the path to the repo.
This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
File mode is octal not decimal. This broke in the conversion to
attoparsec.
(I've submitted the content of Utility.Attoparsec to the attoparsec
developers.)
Test suite passes 100% now.
I had thought using ByteString would avoid the problem, but the
quickcheck property is still taking Arbitrary String input, so the use
of ByteString internally doesn't matter.
The parser and looking up config keys in the map should both be faster
due to using ByteString.
I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.
Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
While L.toStrict copies, profiling showed it was only around 0.3% of
git-annex find runtime. Does not seem worth optimising that, which would
probably involve either a major refactoring, or a use of
UnsafeInterleaveIO.
Also, it seems to me that the latter would need to read chunks, and
preappend the leftover part to the next chunk. But a strict ByteString
append itself is a copy, so I'm not convinced that would be faster than
L.toStrict.
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.
Benchmarking vs master, this git-annex find is significantly faster!
Specifically:
num files old new speedup
48500 4.77 3.73 28%
12500 1.36 1.02 66%
20 0.075 0.074 0% (so startup time is unchanged)
That's without really finishing the optimization. Things still to do:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)
Currently in a very bad unbuildable in-between state.
Eg:
git clone url --bare r
git --git-dir r annex init
This resulted in worktree = Just "." and so several things that check
worktree to determine when the repo is bare ran code paths intended for
non-bare. One such code path[1] ran git checkout with --worktree=. which
actually makes it ignore core.bare config, and so the current directory
got populated with a checkout of the master branch in this example. There
was probably also other breakage.
The fix is a bit complicated because whether the repo is bare is not
known until after Git.Config reads the config, but Git.Config handles
setting the RepoLocations's worktree when core.worktree is set. So have
to assume the worktree is the cwd, let core.worktree override that,
and then if the repo turns out to be bare, it's set back to Nothing.
(And then GIT_WORK_TREE can still override all of that.)
[1] switchHEADBack, which runs even when the clone is not from a bare repo.
* git-lfs: The url provided to initremote/enableremote will now be
stored in the git-annex branch, allowing enableremote to be used without
an url. initremote --sameas can be used to add additional urls.
* git-lfs: When there's a git remote with an url that's known to be
used for git-lfs, automatically enable the special remote.
Renamed the database to .git/annex/keysdb;
the old .git/annex/keys gets deleted during the upgrade.
It is possible that an old git-annex process is running during the
upgrade. If so, it will be able to continue using the old keys db until the
upgrade is complete, and then will presumably fail in some ugly way. Or
perhaps the upgrade will be unable to delete the open files on some
systems, and so fail with an ugly error message.
It's also possible for multiple processes to be running the upgrade
concurrently. That should be fine; they will both write the same
information into the keys db.
Other databases still need to be upgraded.
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
Work around git cat-file --batch's odd stripping of carriage return from
the end of the line (some windows infection), avoiding crashing when the
repo contains a filename ending in a carriage return.
debian oldoldstable has 2.1, and that's what i386ancient uses. It would be
better to require git 2.2, which is needed to use adjusted branches, but
can't do that w/o losing support for some old linux kernels or a
complicated git backport.
This typo would make "git cat-file cat-file" fail, and the way it's used,
I think it broke querying all info from filenames containing newlines,
because the other queries are only run when it succeeds.