Commit graph

666 commits

Author SHA1 Message Date
Joey Hess
7f5958eec2
TrustedCopy is good enough to allow dropping
By definition, a trusted repository is trusted to always have its location
tracking log accurate. Thus, it should never be in a position where content
is being dropped from it concurrently, as that would result in the location
tracking log not being accurate.
2015-10-08 18:34:48 -04:00
Joey Hess
e4a33967a1
try harder to verify until at least one VerifiedCopyLock is obtained
This avoids a failure where eg, we start with RecentlyVerifiedCopies
for all remotes, and so didn't do any active verification, which is
required.

Also, dedup the list of VerifiedCopies when checking if we have enough,
in case 2 copies of a UUID slip in.
2015-10-08 18:20:36 -04:00
Joey Hess
b17f5da6c9
require 1 locked copy while dropping from local or a remote
See doc/bugs/concurrent_drop--from_presence_checking_failures.mdwn for
discussion about why 1 locked copy is all we can require, and how this
fixes concurrent dropping bugs.

Note that, since nothing yet generates a VerifiedCopyLock yet, this commit
breaks dropping temporarily.
2015-10-08 18:11:39 -04:00
Joey Hess
c75c79864d
support invalidating existing VerifiedCopys 2015-10-08 17:58:32 -04:00
Joey Hess
90f7c4b6a2
add VerifiedCopy data type
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].

Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
2015-10-08 16:55:11 -04:00
Joey Hess
beedf1da25
unused import 2015-10-08 14:59:34 -04:00
Joey Hess
9cb9dab69b
I think this comment is stale/confusing; remove 2015-10-08 14:51:44 -04:00
Joey Hess
4d50958ed7
add lockContentShared
Also, rename lockContent to lockContentExclusive

inAnnexSafe should perhaps be eliminated, and instead use
`lockContentShared inAnnex`. However, I'm waiting on that, as there are
only 2 call sites for inAnnexSafe and it's fiddly.
2015-10-08 14:29:35 -04:00
Joey Hess
2def1d0a23 other 80% of avoding verification when hard linking to objects in shared repo
In c6632ee5c8, it actually only handled
uploading objects to a shared repository. To avoid verification when
downloading objects from a shared repository, was a lot harder.

On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.

As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.

It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
2015-10-02 14:35:12 -04:00
Joey Hess
7c7fe895f9 disabling verification also disables size verification
It's not expensive to do size verification, but let's be consistent and
turn it off too.
2015-10-02 12:38:02 -04:00
Joey Hess
c6632ee5c8 avoid verification when hard linking to objects in shared repository
Such a repository is implicitly trusted, so there's no point.
2015-10-02 12:36:03 -04:00
Joey Hess
2fb3722ce9 Do verification of checksums of annex objects downloaded from remotes.
* When annex objects are received into git repositories, their checksums are
  verified then too.
* To get the old, faster, behavior of not verifying checksums, set
  annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
  matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
  setting annex.verify=false.

recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
2015-10-01 15:56:39 -04:00
Joey Hess
b72d3fbeba rename function 2015-10-01 14:18:57 -04:00
Joey Hess
807ba6a903 refactor 2015-10-01 14:07:06 -04:00
Joey Hess
dc2f1f09b7 Improve robustness of direct mode merge, avoiding a crash if the index file is missing.
I couldn't find a good way to make an *empty* index file (zero byte file
won't do), so I punted and just don't make index.lock when there's no index
yet. This means some other git process could race and write an index file
at the same time as the merge is ongoing, in theory. Only happens in new
repos though.
2015-09-22 13:00:18 -04:00
Joey Hess
b88739f0d0 avoid auto-enabling a remote that's already enabled 2015-09-14 15:34:15 -04:00
Joey Hess
c919489c3e avoid autoenable of dead special remotes 2015-09-14 15:28:14 -04:00
Joey Hess
9cfb96c53d Special remotes configured with autoenable=true will be automatically enabled when git-annex init is run. 2015-09-14 14:49:48 -04:00
Joey Hess
97962591d6 init: Fix reversion in detection of repo made with git clone --shared 2015-09-09 13:56:37 -04:00
Joey Hess
c242e248e8 Fix reversion in init when ran as root, introduced in version 5.20150731. 2015-08-19 12:36:17 -04:00
Joey Hess
0f5d6c09ac importfeed --relaxed: Avoid hitting the urls of items in the feed. 2015-08-19 12:24:55 -04:00
Joey Hess
23e9d3bb77 Fix setting/setting/viewing metadata that contains unicode or other special characters, when in a non-unicode locale.
Oh boy, not again. So, another place that the filesystem encoding needs to
be applied. Yay.

In passing, I changed decodeBS so if a NUL is embedded in the input, the
resulting FilePath doesn't get truncated at that NUL. This was needed to
make prop_b64_roundtrips pass, and on reviewing the callers of decodeBS, I
didn't see any where this wouldn't make sense. When a FilePath is used to
operate on the filesystem, it'll get truncated at a NUL anyway, whereas if
a String is being used for something else, it might conceivably have a NUL
in it, and we wouldn't want it to get truncated when going through
decodeBS.
(NB: There may be a speed impact from this change.)
2015-08-11 18:40:59 -04:00
Joey Hess
f7d7995172 clean 2015-08-04 17:07:45 -04:00
Joey Hess
3c971c414e sshopts is never going to be null; the concat of it may be 2015-08-04 16:53:38 -04:00
Joey Hess
a6374b7a3d typo 2015-08-04 15:44:46 -04:00
Joey Hess
f041a65c33 Windows: Fix bug that caused git-annex sync to fail due to missing environment variable.
I think that the problem was caused by windows not having a concept of an
env var that is set, but to the empty string. So, GIT_ANNEX_SSHOPTION
got set to "" and was not seen as set at all.

Easy fix, which also makes git-annex sync a little faster is to not set
GIT_SSH, when GIT_ANNEX_SSHOPTION has no options. Might as well let git use
ssh per usual in this case, no need to run git-annex as the proxy ssh
command..
2015-08-04 15:27:48 -04:00
Joey Hess
6c15cdfcb8 proxy: Fix proxy git commit of non-annexed files in direct mode.
* proxy: Fix proxy git commit of non-annexed files in direct mode.
* proxy: If a non-proxied git command, such as git revert
  would normally fail because of unstaged files in the work tree,
  make the proxied command fail the same way.
2015-08-04 14:01:59 -04:00
Joey Hess
ea765ec022 windows build warning fixes 2015-08-03 15:54:29 -04:00
Joey Hess
9dfe03dbcd Improve shutdown due to --time-limit, especially for fsck
* Perform a clean shutdown when --time-limit is reached.
  This includes running queued git commands, and cleanup actions normally
  run when a command is finished.
* fsck: Commit incremental fsck database when --time-limit is reached.
  Previously, some of the last files fscked did not make it into the
  database when using --time-limit.

Note that this changes Annex.addCleanup hooks, to run after --time-limit
expires. Fsck was using such a hook to clean up after a
--incremental-schedule, and that shouldn't run when --time-limit exipires
it. So, instead, moved that cleanup code to be run by cleanupIncremental.
Resulted in some data type juggling.
2015-07-31 16:01:54 -04:00
Joey Hess
b30324fec7 init: Detect when the filesystem is crippled such that it ignores attempts to remove the write bit from a file, and enable direct mode. Seen with eg, NTFS fuse on linux. 2015-07-30 14:06:17 -04:00
Joey Hess
267f397d82 avoid calling copy when file DNE
This avoids an ugly warning when running git annex fsck --from a rsync
remote in a repo in direct mode.
2015-07-30 13:40:17 -04:00
Joey Hess
24800b1bf1 Only look at reflogs for relevant branches, not for git-annex branches
This speeds it up quite a bit.. May still be too slow in large repos.
2015-07-07 17:36:30 -04:00
Joey Hess
b11d2f5a8a unused: --used-refspec can now be configured to look at refs in the reflog. This provides a way to not consider old versions of files to be unused after they have reached a specified age, when the old refs in the reflog expire.
May be slow.
2015-07-07 17:13:50 -04:00
Joey Hess
f7dc20595e refactor ls-tree params
All in one place to avoid bugs like 174da80ddc
2015-07-06 14:21:43 -04:00
Joey Hess
174da80ddc bugfix: Pass --full-tree when using git ls-files to get a list of files on the git-annex branch, so it works when run in a subdirectory. This bug affected git-annex unused, and potentially also transitions running code and other things. 2015-07-06 14:09:54 -04:00
Joey Hess
adba0595bd use bloom filter in second pass of sync --all --content
This is needed because when preferred content matches on files,
the second pass would otherwise want to drop all keys. Using a bloom filter
avoids this, and in the case of a false positive, a key will be left
undropped that preferred content would allow dropping. Chances of that
happening are a mere 1 in 1 million.
2015-06-16 18:50:13 -04:00
Joey Hess
a0a8127956 instance Hashable Key for bloomfilter 2015-06-16 18:37:41 -04:00
Joey Hess
8b74aec3ea Increased the default annex.bloomaccuracy from 1000 to 10000000
This makes git annex unused use around 48 mb more memory than it did before,
but the massive increase in accuracy makes this worthwhile for all but the
smallest systems.

Also, I want to use the bloom filter for sync --all --content, to avoid
dropping files that the preferred content doesn't want, and 1/1000
false positives would be far too many in that use case, even if it were
acceptable for unused.

Actual memory use numbers:

1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k
1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k
10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k

Based on these numbers, 10 million seemed a better pick than 1 million.
2015-06-16 18:12:00 -04:00
Joey Hess
8c46ea22c2 Added new "anything" preferred content expression, which matches all versions of all files. 2015-06-16 17:03:34 -04:00
Joey Hess
0a998032ed Fix bug that prevented enumerating locally present objects in repos tuned with annex.tune.objecthash1=true
Need to walk 1 level of subdirs less in this case.

The git-annex branch traversal code didn't have a similar bug.
2015-06-11 15:15:05 -04:00
Joey Hess
de3bd11a2c import --clean-duplicates: Fix bug that didn't count local or trusted repo's copy of a file as one of the necessary copies to allow removing it from the import location. 2015-06-03 13:15:38 -04:00
Joey Hess
d28e8fbfd5 get --incomplete: New option to resume any interrupted downloads. 2015-06-02 14:20:38 -04:00
Joey Hess
eb33569f9d remove Params constructor from Utility.SafeCommand
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.

In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.

Problems noticed while doing this conversion:

	* Some uses of Params "oneword" which was entirely unnecessary
	  overhead.
	* A few places that built up a list of parameters with ++
	  and then used Params to split it!

Test suite passes.
2015-06-01 13:52:23 -04:00
Joey Hess
a6d54e49a0 sync, remotedaemon: Pass configured ssh-options even when annex.sshcaching is disabled. 2015-05-30 22:01:52 -04:00
Joey Hess
83b262f1b6 fix windows build 2015-05-22 13:54:54 -04:00
Joey Hess
167539a354 better memoize core.sharedrepository handling
It was memoized, but that was not used consistently. Move it to
Types.GitConfig so it will auto-memoize.
2015-05-19 15:04:24 -04:00
Joey Hess
b47c9fd587 honor core.sharedRepository settings in lockContent
The content file may not be owned by the user running git-annex, in which
case, setting the owner write bit was not enough to let lockContent
act on the file. However, with some core.sharedRepository configs, the file
should be writable by the user's group. So, the thing to do is to call
thawContent on it.
2015-05-19 14:53:19 -04:00
Joey Hess
f4e2093760 fix inAnnexSafe result for direct file that is being dropped
It was returning Just False in this situation, which differed from indirect
mode behavior. I don't think this led to any actual problems; things that
checked if the file being dropped was present just failed to fail, and
instead reported it wasn't present, possibly incorrectly.

Hmm, it's possible that this could have made git annex fsck --from remote
update the location log wrongly, if a remote was in direct mode, and was in
the middle of trying to drop a key, and the drop later failed.
2015-05-19 14:26:07 -04:00
Joey Hess
1312e721ed convert lockContent to use new LockPools
Also cleaned up the code, avoiding creating a lock file if we're going to
open it for create later anyway.

And, if there's an exception while preparing to lock the file, but not at
the point of actually taking the lock, throw an exception, instead of
silently not locking and pretending to succeed.

And, on Windows, always use lock file, even if the repo somehow got into
indirect mode (maybe with cygwin git..)
2015-05-19 14:12:23 -04:00
Joey Hess
ecb0d5c087 use lock pools throughout git-annex
The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.

Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.

This is not quite done; lockContent stil needs to be converted.
2015-05-19 14:09:52 -04:00