919 commits

Author SHA1 Message Date
Joey Hess
0ae08947ac
Run ssh with ServerAliveInterval 60
So that stalled transfers will be noticed within about 3 minutes,
even if TCPKeepAlive is disabled or doesn't work.

Rather than setting with -o, use -F with another config file,
so that any settings in ~/.ssh/config or /etc/ssh/ssh_config override this.
2016-10-26 16:41:34 -04:00
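
A minimal sketch of the -F approach from commit 0ae08947ac, assuming the generated file pulls in the usual config files with Include (OpenSSH 7.3 or later) and then sets the fallback. The function name and the exact file layout are hypothetical, not taken from git-annex. It relies on ssh using the first value it obtains for each option, so anything set in the included files wins.

```python
def fallback_ssh_config(interval=60,
                        includes=("~/.ssh/config", "/etc/ssh/ssh_config")):
    """Build the text of a config file to pass to ssh with -F.

    Hypothetical helper: the Include lines come first, so a
    ServerAliveInterval set in either included file takes precedence
    over the fallback value at the end.
    """
    lines = ["Include " + path for path in includes]
    # "Host *" makes the fallback apply to every host, regardless of
    # which Host block the included files happen to end in.
    lines.append("Host *")
    lines.append("ServerAliveInterval " + str(interval))
    return "\n".join(lines) + "\n"


print(fallback_ssh_config(), end="")
```

With ssh's default ServerAliveCountMax of 3, an interval of 60 seconds gives the "about 3 minutes" mentioned in the commit message.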
Joey Hess
1a8ba7eab4
Improve ssh socket cleanup code to skip over the cruft that NFS sometimes puts in a directory when a file is being deleted. 2016-10-26 13:16:41 -04:00
Joey Hess
8e22114735
upgrade: Handle upgrade to v6 when the repository already contains v6 unlocked files whose content is already present.
Closes https://github.com/datalad/datalad/issues/1020

The use of runWriter in scanUnlockedFiles broke due to this change;
it failed with blocked indefinitely in mvar, because the database write
handle was taken while linkFromAnnex needed to also write to it (to update
the inode cache). So, switched to using a separate runWriter for each call
to addAssociatedFileFast. A little less efficient, but not greatly; the
writes should all still be cached.
2016-10-17 15:19:47 -04:00
Joey Hess
148bd0dbfd
refactor 2016-10-17 14:58:33 -04:00
Joey Hess
ee309d6941
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020

The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's exercising the same code path and is easier to set up
than the problem above.

Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.

Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.

Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.

In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I briefly audited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 13:58:43 -04:00
Joey Hess
933bc5c917
Support using v3 repositories without upgrading them to v5.
An easy change now that supportedVersions is a list. Since v3 and v5 are
identical other than version number, just add v3 to the list.

This commit was sponsored by andrea rota.
2016-10-05 16:53:09 -04:00
Joey Hess
f867fc157f
When auto-upgrading a v3 remote, avoid upgrading to version 6, instead keep it at version 5.
Fixes a bug introduced with v6 mode that I didn't notice until now.
Probably not many v3 repos left out there, and upgrading them to v6 mode
is not disastrous, only a little premature.

This commit was sponsored by Riku Voipio
2016-10-05 16:23:09 -04:00
Joey Hess
34530e59d9
Avoid using a lot of memory when large objects are present in the git repository
.. and have to be checked to see if they are a pointer to an annexed file.

Cases where such memory use could occur included, but were not limited to:
  - git commit -a of a large unlocked file (in v5 mode)
  - git-annex adjust when a large file was checked into git directly
Generally, any use of catKey was a potential problem.

Fix by using git cat-file --batch-check to check size before catting.
This adds another git batch process, which is included in the CatFileHandle
for simplicity.

There could be performance impact, anywhere catKey is used. Particularly
likely to affect adjusted branch generation speed, and operations on
unlocked files in v6 mode. Hopefully since the --batch-check and
--batch read the same data, disk buffering will avoid most overhead.
Leaving only the overhead of talking to the process over the pipe and
whatever computation --batch-check needs to do.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2016-10-05 15:24:13 -04:00
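
A sketch of the size check described in commit 34530e59d9, assuming the default `git cat-file --batch-check` output format (`<sha> <type> <size>`, or `<object> missing`). The function names and the size cutoff are hypothetical; the actual limit git-annex uses is not stated in the commit message.

```python
# Hypothetical cutoff: pointers to annexed files are small, so anything
# much larger than this cannot be one and need not be read into memory.
MAX_POINTER_SIZE = 8192


def parse_batch_check(line):
    """Parse one line of `git cat-file --batch-check` default output.

    Returns (object_name, object_type, size), or None if git reported
    the object as missing.
    """
    line = line.rstrip("\n")
    if line.endswith(" missing"):
        return None
    name, objtype, size = line.split()
    return name, objtype, int(size)


def worth_catting(line, limit=MAX_POINTER_SIZE):
    """Decide whether the object should be fetched with --batch at all."""
    info = parse_batch_check(line)
    return info is not None and info[1] == "blob" and info[2] <= limit
```

Only objects that pass `worth_catting` would then be requested from the second, `--batch` process, which is the extra round trip the commit message mentions.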
Joey Hess
1cd02762bf
Optimisations to git-annex branch query and setting, avoiding repeated copies of the environment.
Speeds up commands like "git-annex find --in remote" by over 50%.

Profiling showed that adjustGitEnv was 21% of the time and 37% of the
allocations of that command. It copied the environment each time with
getEnvironment.

The only repeated use of adjustGitEnv is in withIndexFile, which tends to
be run at least once per file. So, it was optimised by keeping a cache of
the environment, which can be reused.

There could be other better ways to optimise this. Maybe get the whole
environment once at startup. But, then it would have to be serialized back
out each time running a child process, so I doubt that would be a net win.

It might be better to cache a version of the environment that is
pre-modified to use .git-annex/index. But, profiling doesn't show that
modifying the environment is taking any significant time.
2016-09-29 13:36:48 -04:00
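
The caching idea in commit 1cd02762bf can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the injectable `read_env` callable stands in for the expensive environment read (getEnvironment in the commit message).

```python
import os


class GitEnvCache:
    """Snapshot the process environment once and reuse it."""

    def __init__(self, read_env=lambda: dict(os.environ)):
        self._read_env = read_env
        self._base = None
        self.reads = 0  # how many times the environment was actually read

    def base(self):
        # Read the environment on first use only; later calls reuse it.
        if self._base is None:
            self._base = self._read_env()
            self.reads += 1
        return self._base

    def with_index_file(self, index_file):
        # Derive a per-call environment from the cached snapshot
        # instead of querying the process environment again.
        env = dict(self.base())
        env["GIT_INDEX_FILE"] = index_file
        return env
```

Each call still builds a derived mapping, which matches the commit's observation that modifying the environment was not the significant cost; only the repeated read is avoided.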
Joey Hess
35446d3c3a
followup 2016-09-29 11:33:42 -04:00
Joey Hess
8794dcf27b
Optimisations to time it takes git-annex to walk working tree and find files to work on. Sped up by around 18%.
key2file and file2key were top cost centers according to profiling.
The repeated use of replace was not efficient. This new approach is quite a
lot more efficient.

This commit was sponsored by Denis Dzyubenko on Patreon.
2016-09-26 16:48:57 -04:00
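
To illustrate why chained replace calls cost more than a single pass (commit 8794dcf27b), here is a sketch that escapes a key name for use as a file name. The escape table is an assumption chosen for illustration and is not confirmed by the commit message; the point is only that both functions produce the same result, while the second walks the string once.

```python
# Assumed escape table, for illustration only.
ESCAPES = {"&": "&a", "%": "&s", ":": "&c", "/": "%"}
UNESCAPES = {"a": "&", "s": "%", "c": ":"}


def escape_chained(name):
    """Repeated replace: one full pass over the string per rule.

    The order matters: "&" must go first and "/" last, so that
    characters introduced by one rule are not escaped again.
    """
    for old, new in (("&", "&a"), ("%", "&s"), (":", "&c"), ("/", "%")):
        name = name.replace(old, new)
    return name


def escape_single_pass(name):
    """One pass: each character is looked up exactly once."""
    return "".join(ESCAPES.get(c, c) for c in name)


def unescape(name):
    """Inverse of the escaping, also in a single pass."""
    out = []
    chars = iter(name)
    for c in chars:
        if c == "%":
            out.append("/")
        elif c == "&":
            # Malformed input (unknown or missing code) raises KeyError.
            out.append(UNESCAPES[next(chars, None)])
        else:
            out.append(c)
    return "".join(out)
```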
Joey Hess
a569f195b7
fix bugs in handling of deep branches with sync and adjusted branches
* sync: Previously, when run in a branch with a slash in its name,
  such as "foo/bar", the sync branch was "synced/bar". That conflicted
  with the sync branch used for branch "bar", so has been changed to
  "synced/foo/bar".
* adjust: Previously, when adjusting a branch with a slash in its name,
  such as "foo/bar", the adjusted branch was "adjusted/bar(unlocked)".
  That conflicted with the adjusted branch used for branch "bar",
  so has been changed to "adjusted/foo/bar(unlocked)"
* Also, running sync in an adjusted branch did not correctly sync
  changes back to the parent branch when it had a slash in its name.
  This bug has been fixed.

Eliminate use of Git.Ref.under and Git.Ref.basename; using
Git.Ref.underBase and Git.Ref.base make everything handle deep branches
correctly.

Probably no one was adjusting deep branches, and v6 is still experimental
anyway, so I'm not going to worry about the mess that was left by that bug.

In the case of git-annex sync, using a fixed git-annex with an old unfixed
one will mean they use different sync branches for a deep branch, and so
they may stop syncing until the old one is upgraded. However, that's only
a problem when syncing between repositories without going via a central
bare repository. Added a warning about this to the CHANGELOG, but it's
probably not going to affect many people at all.

This commit was sponsored by Riku Voipio.
2016-09-21 15:23:47 -04:00
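
The naming change in commit a569f195b7 can be sketched as below. The helper names are hypothetical stand-ins (they are not Git.Ref's API); the branch names they produce are the ones quoted in the commit message.

```python
HEADS = "refs/heads/"


def base(ref):
    """Strip the refs/heads/ prefix, keeping any slashes in the name."""
    return ref[len(HEADS):] if ref.startswith(HEADS) else ref


def basename(ref):
    """Old behavior: keep only the last path component."""
    return ref.rsplit("/", 1)[-1]


def sync_branch_old(ref):
    return "synced/" + basename(ref)


def sync_branch(ref):
    return "synced/" + base(ref)


def adjusted_branch(ref, adjustment="unlocked"):
    return "adjusted/%s(%s)" % (base(ref), adjustment)


# The old scheme maps two different branches to the same sync branch.
print(sync_branch_old("refs/heads/foo/bar"), sync_branch_old("refs/heads/bar"))
# The new scheme keeps them apart.
print(sync_branch("refs/heads/foo/bar"), sync_branch("refs/heads/bar"))
```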
Joey Hess
d4fbc3b460
make --json-progress work for url downloads 2016-09-09 16:15:39 -04:00
Joey Hess
8ef494a833
disentangle concurrency and message type
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.

Concurrent json output is currently a mess though since threads output
chunks on top of one another.
2016-09-09 12:57:42 -04:00
Joey Hess
31289da691
get -J: Download different files from different remotes when the remotes have the same costs.
Only done in -J mode because only if there's concurrency can downloading
from two remotes be faster. Without concurrency, it's likely the case that
sequential downloads from the same remote are faster than switching back
and forth between two remotes.

There is some hairy MVar code here, but basically it just keeps
the activeremotes MVar full except when deciding which remote to assign
to a thread.

Also affects gets by sync --content -J

This commit was sponsored by Jochen Bartl.
2016-09-06 12:45:21 -04:00
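
A rough sketch of spreading concurrent downloads across same-cost remotes, as in commit 31289da691. This swaps in a plain lock and counter for the MVar code the commit mentions, and the class and method names are hypothetical; it shows the idea, not git-annex's implementation.

```python
import threading
from collections import Counter


class RemotePicker:
    """Assign each worker thread the least busy of the candidate remotes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = Counter()  # remote name -> transfers in progress

    def acquire(self, same_cost_remotes):
        # The lock is held only while deciding, so the bookkeeping is
        # consistent whenever another thread looks at it.
        with self._lock:
            remote = min(same_cost_remotes, key=lambda r: self._active[r])
            self._active[remote] += 1
            return remote

    def release(self, remote):
        with self._lock:
            self._active[remote] -= 1
```

A worker would call `acquire` with the remotes that share the lowest cost, download from the returned remote, and call `release` when the transfer ends.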
Joey Hess
10ddf2c3bd
remove TransferObserver
unused after last commit
2016-08-03 13:46:20 -04:00
Joey Hess
f461bcae4b
Re-enable accumulating transfer failure log files for command-line actions
This was disabled in commit 61ccf95004,
because only the assistant used them, and they were clutter. But, now
--failed also uses them.

Remove the failure log files after successful transfers. Should avoid
most of the clutter problems.

Commit 61ccf95004 mentions a subtle behavior
change, which has now been reverted:

    There is one behavior change from this. If glacier is being used, and a
    manual git annex get --from glacier fails because the file isn't available
    yet, the assistant will no longer later see that failed transfer file and
    retry the get.
2016-08-03 13:41:07 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
bf3327ff25
Added metadata --batch option, which allows getting, setting, deleting, and modifying metadata for multiple files/keys. 2016-07-27 10:46:25 -04:00
Joey Hess
e5225f08fc
When built with uuid-1.3.12, generate more random UUIDs than before
Use nextRandom to generate the random UUID, rather than using randomIO.
This gets fixes for the following two bugs in the uuid library.

However, this did not impact git-annex much, so a hard dependency has
not been added on uuid-1.3.12.

https://github.com/aslatter/uuid/issues/15
	"v4 UUIDs are not that random"

	This doesn't greatly affect git-annex, because even with only
	2^64 possible UUIDs, the chance that two git-annex repositories
	that are clones of the same git repo get the same UUID is minuscule.

	And, git-annex generates only one UUID per run, so predicting
	subsequent UUIDs is not a problem.

https://github.com/aslatter/uuid/issues/16
	"Remove Random instance for UUID, or mark it as deprecated"

	git-annex was using that instance; let's stop before it gets
	deprecated or removed.
2016-07-27 07:46:08 -04:00
Joey Hess
d13194b230
--branch, stage 2
Show branch:file that is being operated on.

I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
2016-07-20 15:23:43 -04:00
Joey Hess
2619019630
Avoid any access to keys database in v5 mode repositories, which are not supposed to use that database. 2016-07-19 12:12:19 -04:00
Joey Hess
154c939830
Speed up startup time by caching the refs that have been merged into the git-annex branch.
This can speed up git-annex commands by as much as a second, depending on
the number of remotes.
2016-07-17 12:24:34 -04:00
Joey Hess
cbe3813005
handle SomeAsyncException same as AsyncException
This new class was added to base a while ago; I don't know what uses it,
but it's intended to be an async exception, so make sure we don't catch it.
2016-06-20 10:31:47 -04:00
Joey Hess
142710d1b4
fix build on windows 2016-06-13 14:54:34 -04:00
Joey Hess
bfd00a0f8c
v6: Fix bad merge in an adjusted branch that resulted in an empty tree. 2016-06-13 14:18:22 -04:00
Joey Hess
b6b5a11601
Make git clean filter preserve the backend that was used for a file. 2016-06-09 15:17:08 -04:00
Joey Hess
0249f3aff5
Fix bug in initialization of clone from a repo with an adjusted branch that had not been synced back to master.
This bug caused broken tree objects to get built by a later git annex sync.

This is a somewhat unlikely but not impossible situation, and the test
suite's union_merge_regression test tickled it when it was run on FAT.
2016-06-09 14:11:00 -04:00
Joey Hess
8e4cbefbc6
also avoid crashing in most circumstances if unable to determine the username
Mostly the username is only used for the git committer or other display
purposes, and we can just fall back to a dummy value in these cases.

The only remaining place where an error is thrown is when starting local
pairing, which needs the username to be known.
2016-06-08 15:04:15 -04:00
Joey Hess
9569d6be63
Fix bad automatic merge conflict resolution between an annexed file and a directory with the same name when in an adjusted branch.
When running in an overlay work tree, all unchanged files show as deleted,
so this code that stages deletions should not run.
2016-06-07 12:53:35 -04:00
Joey Hess
8148ee3d4b
withAltRepo needs a separate queue of changes
The queue could potentially contain changes from before withAltRepo, and
get flushed inside the call, which would apply the changes to the modified
repo.

Or, changes could be queued in withAltRepo that were intended to affect
the modified repo, but don't get flushed until later.

I don't know of any cases where either happens, but better safe than sorry.

Note that this affects withIndexFile, which is used in git-annex branch
updates. So, it potentially makes things slower. Should not be by much;
the overhead consists only of querying the current queue a couple of times,
and potentially flushing changes queued within withAltRepo earlier, that
could have maybe been bundled with other later changes.

Notice in particular that the existing queue is not flushed when calling
withAltRepo. So eg when git annex add needs to stage files in the index,
it will still bundle them together efficiently.
2016-06-03 13:57:00 -04:00
Joey Hess
907fc62f2c
Fix initialization of a bare clone of a repo that has an adjusted branch checked out. 2016-06-02 17:02:38 -04:00
Joey Hess
26887745a0
refactor isBareRepo 2016-06-02 16:59:47 -04:00
Joey Hess
3b97c09cde
better avoid switching to direct mode in clone of adjusted branch repo 2016-06-02 16:10:30 -04:00
Joey Hess
69bf128f76
avoid switching to direct mode in clone of adjusted branch repo 2016-06-02 15:36:52 -04:00
Joey Hess
72f0d3d384
Automatically enable v6 mode when initializing in a clone from a repo that has an adjusted branch checked out.
The clone also has the adjusted branch checked out, so it needs to be
initialized to a version that supports that.
2016-06-02 15:34:30 -04:00
Joey Hess
fbf5045d4f
sync --content: Fix bug that caused transfers of files to be made to a git remote that does not have a UUID. This particularly impacted clones from gcrypt repositories.
Added guard in Annex.Transfer to prevent this problem at a deeper level.

I'm unhappy with NoUUID, but having Maybe UUID instead wouldn't help either
if nothing checked that there was a UUID. Since there legitimately need to
be Remotes that do not have a UUID, I can't see a way to fix it at the type
level, short of making there be two separate types of Remotes.
2016-06-02 13:50:43 -04:00
Yaroslav Halchenko
64e844e1fe
minor typo fixes throughout
problematic
flexibility
2016-06-02 11:22:18 -04:00
Joey Hess
714750e593
include 3 in upgradableVersions
Does not change behavior, only git annex version output
2016-05-24 17:13:19 -04:00
Joey Hess
91df4c6b53
Pass the various gnupg-options configs to gpg in several cases where they were not before.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.

RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
follow-on changes, and got RemoteGitConfig plumbed through.
2016-05-23 17:03:20 -04:00
Joey Hess
80b86ff78d
fix recent test suite reversion
git annex adjust --force will overwrite any current adjusted branch.
I didn't document this because for the user, deleting the branch is just as
good.
2016-05-23 11:23:30 -04:00
Joey Hess
097605e2e9
git's handling of relative GIT_INDEX_FILE is more insane than I thought; always make absolute
This is actually worse than I thought; when git is being run with a
detached work tree, GIT_INDEX_FILE is treated as a path relative to CWD,
instead of the normal behavior of relative to the top of the work tree.

This seems to make it basically impossible for any program that wants to
use GIT_INDEX_FILE to use anything other than an absolute path to it; there
are too many configurations to keep straight that can change how git
interprets what should be a simple relative path to a file.

(I have complained to the git developers.)
2016-05-22 15:02:55 -04:00
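
The "always make absolute" rule from commit 097605e2e9 can be sketched as a small helper that resolves the index path before it is exported. The function name and parameters are hypothetical; the caller chooses the directory a relative path is resolved against, so git never has to interpret a relative GIT_INDEX_FILE.

```python
import os


def index_env(index_file, base_dir=None, environ=None):
    """Return an environment whose GIT_INDEX_FILE is an absolute path.

    A relative index_file is resolved against base_dir (the current
    directory by default); an absolute one is kept as it is.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    env = dict(os.environ if environ is None else environ)
    # os.path.join returns index_file unchanged when it is already absolute.
    env["GIT_INDEX_FILE"] = os.path.normpath(os.path.join(base_dir, index_file))
    return env
```

The returned mapping can be passed as the `env` argument when spawning git, for example with `subprocess.run`.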
Joey Hess
823c28d2dc
nub transitionList to avoid ugly message after repeated transitions, and avoid redundant work for repeated ForgetDeadRemotes transitions 2016-05-18 12:26:38 -04:00
Joey Hess
766728c8cf
unify handling of unusual GIT_INDEX_FILE relative path
This is probably a git bug that stuck in its interface.
2016-05-17 14:42:06 -04:00
Joey Hess
b4ab1fb093
Fix crash when entering/changing view in a subdirectory of a repo that has a dotfile in its root. 2016-05-17 13:49:10 -04:00
Joey Hess
e91037a38b
use indexEnv 2016-05-17 13:38:04 -04:00
Joey Hess
93c03b5dd5
Work around git bug in handling of relative path to GIT_INDEX_FILE when in a subdirectory of the repository.
This affected git annex view. It turns out that some other places
that use GIT_INDEX_FILE were already working around the bug. I removed the
workaround from Annex.Branch since the new workaround will do.
2016-05-17 13:29:51 -04:00
Joey Hess
d56175164b
avoid checking locations in regular repo
In commit 2d00523609 I accidentally
made gitAnnexLocation do more work, checking content locations,
when used in a regular repo.
2016-05-16 17:19:07 -04:00
Joey Hess
eda5d9cc74
adjust: Add --fix adjustment, which is useful when the git directory is in a nonstandard place. 2016-05-16 17:18:33 -04:00
Joey Hess
4efc26ca6c
move keys db closure to AutoMerge
This makes git-annex sync also do it, which makes sure that the keys db
info is fresh when doing a sync --content.
2016-05-16 15:11:14 -04:00