9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink
This commit was sponsored by Ethan Aubin.
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.
Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:
* If two files point to one file, then eg, `git annex get foo` will
update the branch to unlock foo, but will not unlock bar, because it
does not know about it. Might be fixable by making `git annex get
bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info
Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.
This commit was sponsored by Graham Spencer on Patreon.
planned to use for an optimisation
most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
And convert parser to attoparsec, probably faster.
Before, a parse failure threw the whole --stage output line in to the
filename, which was certianly a bad idea, so fixed that.
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.
Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.
annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
37b42e72e7 made it catch exceptions but
thought they were unlikely to be useful to display, which may be right when
a git command fails, but not in the case yoh found.
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.
--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
Remaining things needing converted are in the assistant, and Annex.Ssh.
Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
It's important that it be clear that it overrides a config, such that
reloading the git config won't change it, and in particular, setConfig
won't change it.
Most of the calls to changeGitConfig were actually after setConfig,
which was redundant and unncessary. So removed those.
The only remaining one, besides --debug, is in the handling of
repository-global config values. That one's ok, because the
way mergeGitConfig is implemented, it does not override any value that
is set in git config. If a value with a repo-global setting was passed
to setConfig, it would set it in the git config, reload the git config,
re-apply mergeGitConfig, and use the newly set value, which is the right
thing.
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
The git add behavior changes could be avoided if it turns out to be
really annoying, but then it would need to behave the old way when
annex.dotfiles=false and the new way when annex.dotfiles=true. I'd
rather not have the config option result in such divergent behavior as
`git annex add .` skipping a dotfile (old) vs adding to annex (new).
Note that the assistant always adds dotfiles to the annex.
This is surprising, but not new behavior. Might be worth making it also
honor annex.dotfiles, but I wonder if perhaps some user somewhere uses
it and keeps large files in a directory that happens to begin with a
dot. Since dotfiles and dotdirs are a unix culture thing, and the
assistant users may not be part of that culture, it seems best to keep
its current behavior for now.
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.
Benchmarks indicate around 30% speedup in both commands.
Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.
Git.Repo also changed to use RawFilePath for the path to the repo.
This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
The parser and looking up config keys in the map should both be faster
due to using ByteString.
I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.
Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
Delete the old export dbs on upgrade.
Testing this an exporting to a directory with both exporttree=yes and
importtree=yes, it refused to let an interrupted export proceed after
upgrade, with "unsafe to overwrite file". An import resolved the
problem.
It will be populated automatically by the next command that needs data
from it, the same way it gets populated in a fresh clone. That may be a
little expensive, but it's a one time cost, and no slower than in a
fresh clone.
The old db is cleaned up when a new incremental fsck is started.
The incremental fsck won't pick up where the old one left off, but I
consider this a minor enough thing that it can just be documented and
won't be a problem.
Renamed the database to .git/annex/keysdb;
the old .git/annex/keys gets deleted during the upgrade.
It is possible that an old git-annex process is running during the
upgrade. If so, it will be able to continue using the old keys db until the
upgrade is complete, and then will presumably fail in some ugly way. Or
perhaps the upgrade will be unable to delete the open files on some
systems, and so fail with an ugly error message.
It's also possible for multiple processes to be running the upgrade
concurrently. That should be fine; they will both write the same
information into the keys db.
Other databases still need to be upgraded.
Fix bug that lost modifications to unlocked files when init is re-ran in an
already initialized repo.
In retrospect needing scanUnlockedFiles False in the direct mode upgrade
path was a good hint that it was unsafe when used with True.
However, this bug did not affect upgrade from v5. In such an upgrade, an
unlocked file that is modified is left as-is. The only place
scanUnlockedFiles True did overwrite modified unlocked files is during an
git-annex init of a repo that was already initialized by git-annex.
(I also tried a scenario where the repo had not been initialized by
git-annex yet, but was cloned from a v7 repo with an unlocked file, and the
pointer file replaced with some other content, and the data loss did not
occur in that situation.)
Since the fixed scanUnlockedFiles avoids overwriting non-pointer files,
it should be safe to run in any situation, so there's no need any longer
for the parameter.
That git fixed a memory leak that could cause an OOM during the upgrade.
Most git-annex builds have a new enough git already.
OSX git was upgraded with brew.
Linux i386ancient build's git was too old. Upgrading it to a fixed
git didn't work (due to the newer git not working with the old ssh,
https://bugs.chromium.org/p/git/issues/detail?id=7 )
Choices to deal with that were:
* Somehow make direct mode upgrade work with the old git, avoiding its
OOM problem. One way would be to switch the repo to indirect mode
first, and so upgrade to a repo with locked files. Not good when
the filesystem does not support symlinks.
* backport the OOM fix from git 2.22
(And do what about the version number so git-annex knows it's fixed?)
* backport openssh (and possibly more stuff)
* move the i386ancient build to at least Debian stretch (still backporting git)
But this will make it no longer work with some of the ancient kernels it
targets.
Of those, backporting the OOM fix seemed the best approach. Put "oomfix"
in the git version number to indicate it.
I have not automated building the git backport, so here's the patch I
used:
diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c 2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c 2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
if (start_async(&async))
return 0; /* error was already reported */
- if (strbuf_read(&nbuf, async.out, len) < 0) {
+ if (strbuf_read(&nbuf, async.out, 0) < 0) {
error("read from external filter %s failed", cmd);
ret = 0;
}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN 2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN 2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
#!/bin/sh
GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix
LF='
'
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure 2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure 2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
# Identity of this package.
PACKAGE_NAME='git'
PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
PACKAGE_BUGREPORT='git@vger.kernel.org'
PACKAGE_URL=''
diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version 2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version 2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
If a direct mode file is deleted or modified, and there are no other
files containing the content, the content was lost. That's a normal
thing that can happen in direct mode, but not in v7, so the upgrade
code has to notice it in order for the location log to be accurate.
No longer used. The only possible user of it would be code in
Upgrade.V5, so I verified that the parts of Annex.Content it used were
not used to manipulate direct mode files.