This is needed because when preferred content matches on files,
the second pass would otherwise want to drop all keys. Using a bloom filter
avoids this, and in the case of a false positive, a key will be left
undropped that preferred content would allow dropping. Chances of that
happening are a mere 1 in 1 million.
This makes git annex unused use around 48 mb more memory than it did before,
but the massive increase in accuracy makes this worthwhile for all but the
smallest systems.
Also, I want to use the bloom filter for sync --all --content, to avoid
dropping files that the preferred content doesn't want, and 1/1000
false positives would be far too many in that use case, even if it were
acceptable for unused.
Actual memory use numbers:
1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k
1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k
10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k
Based on these numbers, 10 million seemed a better pick than 1 million.
This works, and seems fairly robust. Clean get of 20 files at -J3. At -J10,
there are some messages about ssh multiplexing, probably due to a race
spinning up the ssh connection cacher. But, it manages to get all the files
ok regardless.
The progress bars are a scrambled mess though, due to bugs in
ascii-progress, which I've already filed. Particularly this one:
https://github.com/yamadapc/haskell-ascii-progress/issues/8
Seen for example, a newly checked out git submodule. In this case,
.git/HEAD is a raw sha, rather than the usual reference to a ref.
Removed currentSha in passing, since it was a more roundabout way of
doing what headSha does, and headSha is more robust.
See my comment in the bug report for analysis; basically this is safe
because it's a non-forced push, so won't lose history. Even if it was a
forced push or somehow races, things will eventually become consistent and
no git-annex branch info will be lost.
(This used to be done, but it forgot to do it since version 4.20130909.)
* sync: Use the ssh-options git config when doing git pull and push.
* remotedaemon: Use the ssh-options git config.
Note that the rename env var means that if a new git-annex calls an old one
for git-annex ssh, or a new calls an old, nothing much will go wrong;
just ssh caching won't happen.
Note that while the assistant detects changes made to remote names, I left
the commit message fixed rather than calculating it after every commit. It
doesn't seem worth the CPU to do the latter.
I think this is the last problimatic setCurrentDirectory. I also audited
for extrnal commands that git-annex might run with cwd = foo, and did not
find any that were passed any FilePath that might be absolute.
In the case where a remote of the bare repo has a fetch = configuration,
refs/remotes/origin/master will exist, and so the merge code path tried to
run in the bare repo.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.
The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.
See Debian bug #753720.
Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.
Also disables commit.gpgsign in the test suite.
This commit was sponsored by Antoine Boegli.
When in direct mode, update the master branch after committing to the
annex/direct/master branch. Also, update the synced/master branch.
This fixes a topology A->B where both A and B are in direct mode and
running the assistant, and a change is made to B. Before this fix, A pulled
the changes from B, but since they were only on the annex/direct/master
branch, it did not merge them.
Note that I considered making the assistant merge the
remotes/B/annex/direct/master, but decided to keep it simple and only merge
the sync branches as before.
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.
There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
For sync, saves 1 ssh connection per remote. For remotedaemon, the same
ssh connection that is already open to run git-annex-shell notifychanges
is reused to pull from the remote.
Only potential problem is that this also enables connection caching
when the assistant syncs with a ssh remote. Including the sync it does
when a network connection has just come up. In that case, cached ssh
connections are likely to be stale, and so using them would hang.
Until I'm sure such problems have been dealt with, this commit needs to
stay on the remotecontrol branch, and not be merged to master.
This commit was sponsored by Alexandre Dupas.
This is a new feature, it was not handled before, since it's a bit of an
edge case. However, it can be handled exactly the same as a file/dir
conflict, just leave the non-annexed item alone.
While implementing this, the core resolveMerge' function got a lot simpler
and clearer. Note especially that where before there was an asymetric call to
stagefromdirectmergedir, now graftin is called symmetrically in both cases.
And, in order to add that `graftin us`, the current branch needed to be
known (if there is no current branch, there cannot be a merge conflict).
This led to some cleanups of how autoMergeFrom behaved when there is no
current branch.
This commit was sponsored by Philippe Gauthier.
I think that 751f496c11 didn't quite manage
to actually fix the bug, although I have not checked since its "fix" got
redone.
The test suite now actually checks the file staged in git is a symlink,
rather than relying on the bug casing a later sync failure. This seems a
more reliable way to detect it, and probably avoids a heisenbug in the test
suite.
Added test cases for both ways this can happen, with a conflict involving a
file, or a directory.
Cleaned up resolveMerge to not touch the work tree in direct mode, which
turned out to be the only way to handle things.. And makes it much nicer.
Still need to run test suite on windows.
Using the extract(1) program to do the heavy lifting.
Decided to make git-annex run pre-commit-annex when committing. Since
git-annex pre-commit also runs it, it'll be run when git commit is run too,
via the pre-commit hook. This basically gives back the pre-commit hook
that git-annex took away. The implementation avoids repeatedly looking
for the hook script when the assistant is running and committing
repeatedly; only checks if the hook is available once.
To make the script simpler, made git-annex metadata -s field?=value
only set a field when it's not already got a value.
This commit was sponsored by bak.
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)
Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
Fix bug in automatic merge conflict resolution code when used
on a filesystem not supporting symlinks, which resulted in it losing
track of the symlink bit of annexed files.
This was the underlying bug that was causing another test to fail,
which got worked around in 1c997fd08c.
I've chosen to keep 2 separate test cases since the old test case only
detected the problem accidentially.
Test suite passes on FAT & in windows, as well as on proper unix systems.
This commit was sponsored by Ellis Whitehead.
Need to include the uuid of the local repo in the list of belived locations
of a key after getting it, in order for the drop from remote to include it
in the numcopies calculation.
* sync --content: Honor annex-ignore configuration.
* sync: Don't try to sync with xmpp remotes, which are only currently
supported when using the assistant.
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.
As part of this, preferred content checks pass the Key around.
Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.
I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.
The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.
The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.
This was a real slog from 1 to 5 am.
Test suite passes.
Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.
Wall clock speed is identical, even in large repos.
This commit was sponsored by an anonymous bitcoiner.