That git fixed a memory leak that could cause an OOM during the upgrade.
Most git-annex builds have a new enough git already.
OSX git was upgraded with brew.
Linux i386ancient build's git was too old. Upgrading it to a fixed
git didn't work (due to the newer git not working with the old ssh,
https://bugs.chromium.org/p/git/issues/detail?id=7 )
Choices to deal with that were:
* Somehow make direct mode upgrade work with the old git, avoiding its
OOM problem. One way would be to switch the repo to indirect mode
first, and so upgrade to a repo with locked files. Not good when
the filesystem does not support symlinks.
* backport the OOM fix from git 2.22
(And do what about the version number so git-annex knows it's fixed?)
* backport openssh (and possibly more stuff)
* move the i386ancient build to at least Debian stretch (still backporting git)
But this will make it no longer work with some of the ancient kernels it
targets.
Of those, backporting the OOM fix seemed the best approach. Put "oomfix"
in the git version number to indicate it.
I have not automated building the git backport, so here's the patch I
used:
diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c 2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c 2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
if (start_async(&async))
return 0; /* error was already reported */
- if (strbuf_read(&nbuf, async.out, len) < 0) {
+ if (strbuf_read(&nbuf, async.out, 0) < 0) {
error("read from external filter %s failed", cmd);
ret = 0;
}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN 2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN 2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
#!/bin/sh
GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix
LF='
'
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure 2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure 2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
# Identity of this package.
PACKAGE_NAME='git'
PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
PACKAGE_BUGREPORT='git@vger.kernel.org'
PACKAGE_URL=''
diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version 2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version 2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
If a direct mode file is deleted or modified, and there are no other
files containing the content, the content was lost. That's a normal
thing that can happen in direct mode, but not in v7, so the upgrade
code has to notice it in order for the location log to be accurate.
Systems such as Debian that have overridden the default fpath will need to
set ZSH_COMPLETIONS_PATH.
I feel that Debian is causing unncessary complexity by making this change,
and have filed a bug report about it.
This also means that when git-annex is installed with PREFIX=/usr/local
it will use /usr/local/share/zsh/site-functions which works with probably
all versions of zsh.
Its repeated opening and writing to the sqlite database somehow caused
inode cache information to occasionally be lost.
This loses code coverage, since running git-annex as a child process
prevents tracking what parts of the code are exercised. I have not looked
at the code coverage in a long time. It would probably be possible to
collect code coverage for the child procesess and merge it together.
This way a failure to clean up the main repo dir from a previous pass
can't result in reusing that repo, which won't be configured right for the
current pass.
Improved probing when CoW copies can be made between files on the same
drive. Now supports CoW between BTRFS subvolumes. And, falls back to rsync
instead of using cp when CoW won't work, eg copies between repos on the
same EXT4 filesystem.
Rather than trying cp --reflink=always for each file copied to a remote,
it's tried once and if it fails it falls back to using rsync thereafter
for the lifetime of the Remote object. That avoids overhead of calling cp
which while small, will add up over a large number of files.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Support running v7 upgrade in a repo where there is no branch checked out,
but HEAD is set directly to some other ref.
This commit was sponsored by Jack Hill on Patreon.
Reversion from commit 436f10771, CustomOutput was forcing quiet output
which overrode the json setting.
find happened to be the only command that uses CustomOutput and also
outputs json. (metadata --get does also use CustomOutput and --json does not
enable json output for that, which may be an oversight, but was already the
behavior before this regression.)
Does --unused bypass required content checks in any meaningful sense?
I documented it as such but am unsure of what required content setting
would be considered to match unused content.
init: Fix a reversion in the last release that prevented automatically
generating and setting a description for the repository.
Seemed best to factor out uuidDescMapRaw that does not
have the default mempty descrition behavior.
I don't much like that behavior, but I know things depend on it.
One thing in particular is `git annex info` which lists the uuids and
descriptions; if the current repo has been initialized in some way that
means it does not have a description, it would not show up w/o that.
(Not only repos created due to this bug might lack that. For example a repo
that was marked dead and had --drop-dead delete its git-annex branch info,
and then came back from the dead would similarly not be in the uuid.log.
Also there have been other versions of git-annex that didn't set a default
description; for years there was no default description.)
When downloading an url and the destination file exists but is empty,
avoid using http range to resume, since a range "bytes=0-" is an unusual
edge case that it's best to avoid relying on working.
This is known to fix a case where importfeed downloaded a partial feed from
such a server. Since importfeed uses withTmpFile, the destination always exists
empty, so it would particularly tickle such problem servers. Resuming from 0
is otherwise possible, but unlikely.
Avoid a delay at startup when concurrency is enabled and there are
rsync or gcrypt special remotes, which was caused by git-annex
opening a ssh connection to the remote too early.
sshOptions makes a connection to the ssh server if one is not already open,
when concurrency is enabled. Avoid doing that at startup, when the remote
list is being built, but the remote may not be used at all.
Instead, rsync/gcrypt now runs sshOptions once per ssh connection to the
server. This should not be significant overhead since Remote.Git already
has the same overhead (as do Bup and Ddar).
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
When a remote is configured to be readonly, don't allow changing what's
exported to it.
This was missed in the original export remote implementation, but it makes
sense for a readonly export remote to not be allowed to change.
~/.profile works for bash, but not all other login shells.
This setting PATH is a minor convenience for users, particuarly since
typing on android is so much harder. The usual linux standalone bundle
just expects the user to know how to add it to PATH. I don't want this
code to grow special cases for every possible login shell. So displaying a
message to the presumably minority who don't use bash seems like the best
choice.
Longer term, I'd hope termux gets some way to set an environment variable
for all login shells. Systems using PAM can, via ~/.pam_environment. Or
alternatively, add a git-annex package to termux, even if just an installer
package. I'd rather spend time on either of those than on making this minor
thing support more login shells.
This commit was sponsored by mo on Patreon.
* init: When the repository already has a description, don't change it.
* describe: When run with no description parameter it used to set
the description to "", now it will error out.
Fixes bug that caused git-annex to fail to add a file when another
git-annex process cleaned up the temp directory it was using.
Solution is just to push withOtherTmp out to a higher level, so that
the whole ingest process can be completed inside it.
But in the assistant, that was not practical to do, since withOtherTmp runs
in the Annex monad and the assistant does not. Worked around by introducing
a separate temp directory that only the assistant uses for lockdown.
Since only one assistant can run at a time, it's easy to clean up that
directory of old cruft at startup.
This is only done for correctness sake; I don't see any way that it
would have caused a problem here. The jlog file escaped withOtherTmp
so another process could swoop in and delete it, but the file is only
used as a buffer for a list of filenames, and its handle gets rewound
and they're read back out, which will still work even if it's already
been deleted.
The only reason I didn't just pre-delete the file and keep the handle
open is I'm not sure that works on all OS's (eg Windows). If there was
a problem that this fixed it might involve an OS that doesn't support
deleting an open file or something like that.
Fix reversion in last release that caused wrong tree to be written to
remote tracking branch after an export of a subtree.
The invariant "commitsha should have the treesha as its tree"
was not met due to a bug. Guarantee it's met by catting the commitsha
to find its actual tree. A little bit slower, but this is not run often.
Fix bug that caused importing from a special remote to repeatedly download
unchanged files when multiple files in the remote have the same content.
Unfortunately, there's really no good way to remove a uniqueness constraint
from a sqlite database. The best that can be done is to make a new table
and copy the data over. But that would require using persistent's
migrations or raw sql, and I don't want to do either.
Instead, a sledgehammer approach: Renamed .git/annex/cid to
.git/annex/cids. When the new database doesn't exist, it will be populated
from the git-annex branch.
Noting deletes the old database. Don't want to delete it out from under
some long-running git-annex process that might be using it. It could
eventually be deleted. But this is such a new feature, probably few repos
have the database in any case.
protocol=https implies port=443 and
port=443 implies protocol=https
-- this was necessary because the existing configs set port=443, but
with a protocol setting, users will naturally want to use it, and then
there's no need for them to supply the default https port. So we keep
back-compat, add a nicer way to enable https, and also add support for
non-standard https ports.
In particular, when two files had the same content, and one was unlocked
and modified, with annex.thin that can corrupt the content of the
annex object, and so fsck on the other file should detect that.
getKeyStatus was relying on Database.Keys.getAssociatedFiles to tell
when a file is unlocked, but that can false positive because the
database can list old associated files.
Instead, separate out the case of unlocked object which has multiple
hardlinks when annex.thin is in use.
To support filenames starting with dashes.
To update the config of existing repositories, you can re-run git-annex init.
Perhaps it should check every time for the old config and update it, but
that has several problems:
- read-only repos
- unexpected commands like `git annex find` changing git configs
might be surprising behavior
Since filenames starting with dashes are not super common and the user can
re-init easily enough if their repo needs fixed, I went for the simplest
fix.
xporting files with '#' or '?' in their name won't work because urls get
truncated on those. Fail in a better way in this case, and avoid failing
when removing such files from the export, so after the user has renamed the
problem files the export will succeed.
Because when git-annex lacks S3 version IDs for files stored in the bucket,
deleting them would cause data loss.
Also because git-annex is not able to download unversioned objects from a bucket
when versioning=yes.
This also prevents setting versioning=no. While that would perhaps be
possible to do safely, it would add complexity, and would mean that if
the user accidentially did enableremote versioning=no, they would not be
able to undo it.
This commit was sponsored by Trenton Cronholm on Patreon.
However, rsync still won't work with 64 bit git and
this is still not the documented way to install it.
So, if both 64 and 32 are installed, go with 32.
And if neither git can be found, default to 32.