Commit graph

155 commits

Author SHA1 Message Date
Joey Hess
f77d485915 move comment 2015-06-16 19:07:14 -04:00
Joey Hess
be9d9cb5ad avoid building unused bloomfilter when run without --all 2015-06-16 19:04:20 -04:00
Joey Hess
5b801fcad9 on second thought, sync --content --unused is probably not useful, remove 2015-06-16 19:01:06 -04:00
Joey Hess
adba0595bd use bloom filter in second pass of sync --all --content
This is needed because when preferred content matches on files,
the second pass would otherwise want to drop all keys. Using a bloom filter
avoids this, and in the case of a false positive, a key will be left
undropped that preferred content would allow dropping. Chances of that
happening are a mere 1 in 1 million.
2015-06-16 18:50:13 -04:00
Joey Hess
8b74aec3ea Increased the default annex.bloomaccuracy from 1000 to 10000000
This makes git annex unused use around 48 mb more memory than it did before,
but the massive increase in accuracy makes this worthwhile for all but the
smallest systems.

Also, I want to use the bloom filter for sync --all --content, to avoid
dropping files that the preferred content doesn't want, and 1/1000
false positives would be far too many in that use case, even if it were
acceptable for unused.

Actual memory use numbers:

1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k
1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k
10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k

Based on these numbers, 10 million seemed a better pick than 1 million.
2015-06-16 18:12:00 -04:00
Joey Hess
29c03145e6 sync: Add support for --all and --unused. 2015-06-16 16:50:03 -04:00
Joey Hess
99a1113461 switch code to using associated files 2015-06-16 15:07:03 -04:00
Joey Hess
8077ccbd54 get, move, copy, mirror: Concurrent downloads and uploads are now supported!
This works, and seems fairly robust. Clean get of 20 files at -J3. At -J10,
there are some messages about ssh multiplexing, probably due to a race
spinning up the ssh connection cacher. But, it manages to get all the files
ok regardless.

The progress bars are a scrambled mess though, due to bugs in
ascii-progress, which I've already filed. Particularly this one:
https://github.com/yamadapc/haskell-ascii-progress/issues/8
2015-04-10 17:08:07 -04:00
Joey Hess
a6db10d565 sync: Fix committing when in a direct mode repo that has no HEAD ref.
Seen for example, a newly checked out git submodule. In this case,
.git/HEAD is a raw sha, rather than the usual reference to a ref.

Removed currentSha in passing, since it was a more roundabout way of
doing what headSha does, and headSha is more robust.
2015-03-04 15:25:35 -04:00
Joey Hess
79b6500111 fix innaccurate comment 2015-03-04 14:44:11 -04:00
Joey Hess
289881bdb8 sync: As well as the synced/git-annex push, attempt a git-annex:git-annex push, as long as the remote branch is an ancestor of the local branch, to better support bare git repos.
See my comment in the bug report for analysis; basically this is safe
because it's a non-forced push, so won't lose history. Even if it was a
forced push or somehow races, things will eventually become consistent and
no git-annex branch info will be lost.

(This used to be done, but it forgot to do it since version 4.20130909.)
2015-02-27 14:49:56 -04:00
Joey Hess
15107d2c5a propigate ssh-options everywhere ssh caching is used
* sync: Use the ssh-options git config when doing git pull and push.
* remotedaemon: Use the ssh-options git config.

Note that the rename env var means that if a new git-annex calls an old one
for git-annex ssh, or a new calls an old, nothing much will go wrong;
just ssh caching won't happen.
2015-02-12 16:14:53 -04:00
Joey Hess
f5b830e07c sync, assistant: Include repository name in head branch commit message.
Note that while the assistant detects changes made to remote names, I left
the commit message fixed rather than calculating it after every commit. It
doesn't seem worth the CPU to do the latter.
2015-02-11 13:34:05 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
4fcf65cda3 devblog 2015-01-13 18:36:17 -04:00
Joey Hess
44c9714fdf handle sync's use of setCurrentDirectory to work with relative paths
I think this is the last problimatic setCurrentDirectory. I also audited
for extrnal commands that git-annex might run with cwd = foo, and did not
find any that were passed any FilePath that might be absolute.
2015-01-06 22:23:04 -04:00
Joey Hess
584eac78e6 sync: Fix an edge case where syncing in a bare repository would try to merge and so fail.
In the case where a remote of the bare repo has a fetch =  configuration,
refs/remotes/origin/master will exist, and so the merge code path tried to
run in the bare repo.
2015-01-05 13:40:49 -04:00
Joey Hess
aba3e11776 sync: Now supports remote groups, the same way git remote update does. 2014-12-29 13:42:58 -04:00
Joey Hess
59f88558d5 doh't use "def" for command definitions, it conflicts with Data.Default.def 2014-10-14 14:20:10 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
4c429ad7ee sync: Ensure that pending changes to git-annex branch are committed when in direct mode. (Fixing a very minor reversion.) 2014-09-11 14:35:28 -04:00
Joey Hess
d41849bc23
support commit.gpgsign
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.

The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.

See Debian bug #753720.

Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.

Also disables commit.gpgsign in the test suite.

This commit was sponsored by Antoine Boegli.
2014-07-04 11:53:51 -04:00
Joey Hess
501cc8623a assistant: Fix one-way assistant->assistant sync in direct mode.
When in direct mode, update the master branch after committing to the
annex/direct/master branch. Also, update the synced/master branch.

This fixes a topology A->B where both A and B are in direct mode and
running the assistant, and a change is made to B. Before this fix, A pulled
the changes from B, but since they were only on the annex/direct/master
branch, it did not merge them.

Note that I considered making the assistant merge the
remotes/B/annex/direct/master, but decided to keep it simple and only merge
the sync branches as before.
2014-06-16 11:32:13 -04:00
Joey Hess
e880d0d22c replace (Key, Backend) with Key
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.

There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
2014-04-17 18:03:39 -04:00
Joey Hess
15917ec1a8 sync, assistant, remotedaemon: Use ssh connection caching for git pushes and pulls.
For sync, saves 1 ssh connection per remote. For remotedaemon, the same
ssh connection that is already open to run git-annex-shell notifychanges
is reused to pull from the remote.

Only potential problem is that this also enables connection caching
when the assistant syncs with a ssh remote. Including the sync it does
when a network connection has just come up. In that case, cached ssh
connections are likely to be stale, and so using them would hang.
Until I'm sure such problems have been dealt with, this commit needs to
stay on the remotecontrol branch, and not be merged to master.

This commit was sponsored by Alexandre Dupas.
2014-04-12 15:59:34 -04:00
Joey Hess
4cc34973d9 copy --fast --to remote: Avoid printing anything for files that are already believed to be present on the remote. 2014-03-13 14:51:22 -04:00
Joey Hess
14d1e878ab sync: Automatically resolve merge conflict between and annexed file and a regular git file.
This is a new feature, it was not handled before, since it's a bit of an
edge case. However, it can be handled exactly the same as a file/dir
conflict, just leave the non-annexed item alone.

While implementing this, the core resolveMerge' function got a lot simpler
and clearer. Note especially that where before there was an asymetric call to
stagefromdirectmergedir, now graftin is called symmetrically in both cases.

And, in order to add that `graftin us`, the current branch needed to be
known (if there is no current branch, there cannot be a merge conflict).
This led to some cleanups of how autoMergeFrom behaved when there is no
current branch.

This commit was sponsored by Philippe Gauthier.
2014-03-04 19:35:55 -04:00
Joey Hess
99295f2c1d factor out Annex.AutoMerge from Command.Sync 2014-03-04 16:26:15 -04:00
Joey Hess
8496d8aa63
improved direct mode dir/file conflicted merge resultion, using tree grafting 2014-03-04 15:00:19 -04:00
Joey Hess
0afa7ae261 much improved test and real fix for FAT symlink loss on conflicted merge
I think that 751f496c11 didn't quite manage
to actually fix the bug, although I have not checked since its "fix" got
redone.

The test suite now actually checks the file staged in git is a symlink,
rather than relying on the bug casing a later sync failure. This seems a
more reliable way to detect it, and probably avoids a heisenbug in the test
suite.
2014-03-04 14:31:26 -04:00
Joey Hess
4a847cdc08 finish fixing direct mode merge bug involving unstaged local files
Added test cases for both ways this can happen, with a conflict involving a
file, or a directory.

Cleaned up resolveMerge to not touch the work tree in direct mode, which
turned out to be the only way to handle things.. And makes it much nicer.

Still need to run test suite on windows.
2014-03-04 02:03:15 -04:00
Joey Hess
6dc8bfad0b rename for clarity 2014-03-03 16:33:21 -04:00
Joey Hess
d6696dbc98 simplfy 2014-03-03 16:21:36 -04:00
Joey Hess
1192d98721 sync: Fix bug in direct mode that caused a file not checked into git to be deleted when merging with a remote that added a file by the same name. (Thanks, jkt) 2014-03-03 14:57:16 -04:00
Joey Hess
d0fce426c4 pre-commit-annex hook script to automatically extract metadata from lots of types of files
Using the extract(1) program to do the heavy lifting.

Decided to make git-annex run pre-commit-annex when committing. Since
git-annex pre-commit also runs it, it'll be run when git commit is run too,
via the pre-commit hook. This basically gives back the pre-commit hook
that git-annex took away. The implementation avoids repeatedly looking
for the hook script when the assistant is running and committing
repeatedly; only checks if the hook is available once.

To make the script simpler, made git-annex metadata -s field?=value
only set a field when it's not already got a value.

This commit was sponsored by bak.
2014-03-02 20:11:58 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
556cfeb8f0
hlint to give autobuilder something to do 2014-02-11 13:11:49 -04:00
Joey Hess
751f496c11 add test case & fix conflict resolution bug on Windows & FAT
Fix bug in automatic merge conflict resolution code when used
on a filesystem not supporting symlinks, which resulted in it losing
track of the symlink bit of annexed files.

This was the underlying bug that was causing another test to fail,
which got worked around in 1c997fd08c.
I've chosen to keep 2 separate test cases since the old test case only
detected the problem accidentially.

Test suite passes on FAT & in windows, as well as on proper unix systems.

This commit was sponsored by Ellis Whitehead.
2014-02-04 17:24:12 -04:00
Joey Hess
fded408b44 sync --content: Drop files from remotes that don't have them after getting them.
Need to include the uuid of the local repo in the list of belived locations
of a key after getting it, in order for the drop from remote to include it
in the numcopies calculation.
2014-02-02 22:48:45 -04:00
Joey Hess
0de879e264 update function name in comment 2014-02-02 22:37:09 -04:00
Joey Hess
4676706756 sync --content: Reuse smart copy code from copy command, including handling and repairing out of date location tracking info. Closes: #737480 2014-02-02 19:57:22 -04:00
Joey Hess
a542781f6a remove some monkey faces 2014-02-01 17:14:38 -04:00
Joey Hess
15ac2ea4de sync --content: Re-pull from remotes after downloading content, since that can take a while and other changes may be pushed in the meantime. 2014-02-01 10:49:50 -04:00
Joey Hess
bbef0cddfd improve sync with xmpp and annex-ignore
* sync --content: Honor annex-ignore configuration.
* sync: Don't try to sync with xmpp remotes, which are only currently
  supported when using the assistant.
2014-02-01 10:33:55 -04:00
Joey Hess
cce69eee4d avoid using function named that conflicts with name used in newer version of process library 2014-01-29 13:44:53 -04:00
Joey Hess
86ffeb73d1 reorganize some files and imports 2014-01-26 16:25:55 -04:00
Joey Hess
3518c586cf fix transfers of key with no associated file
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.

As part of this, preferred content checks pass the Key around.

Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.

I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
2014-01-23 16:44:02 -04:00
Joey Hess
73c420ffcf much better command action handling for sync --content 2014-01-20 13:31:03 -04:00
Joey Hess
34c8af74ba fix inversion of control in CommandSeek (no behavior changes)
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.

The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.

The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.

This was a real slog from 1 to 5 am.

Test suite passes.

Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.

Wall clock speed is identical, even in large repos.

This commit was sponsored by an anonymous bitcoiner.
2014-01-20 04:57:36 -04:00
Joey Hess
e3625e3d89 include information about remotes just uloaded to when calling handleDropsFrom 2014-01-19 18:11:47 -04:00