Commit graph

261 commits

Author SHA1 Message Date
Joey Hess
d345e5b52f add git fsck to cronner, and UI for repository repair (not yet wired up) 2013-10-22 16:02:52 -04:00
Joey Hess
44bb9a808f clean warnings 2013-10-22 14:52:17 -04:00
Joey Hess
ff3f654cbe make git fsck batch-capable 2013-10-22 14:49:41 -04:00
Joey Hess
3e61749d08 index file recovery 2013-10-22 12:58:04 -04:00
Joey Hess
2fb08acda5 add reflog 2013-10-21 16:41:46 -04:00
Joey Hess
18487c779f corrupt branch resetting (but not yet reflog walking) 2013-10-21 16:20:54 -04:00
Joey Hess
fcd91be6f0 implemented removal of corrupt tracking branches
Oh, git, you made this so hard. Not determining if a branch pointed to some
corrupt object, that was easy, but dealing with corrupt branches using git
plumbing is a PITA.
2013-10-21 15:28:06 -04:00
Joey Hess
6d8250c255 avoid redundant fsck when no changes are made 2013-10-20 19:42:17 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
f482de1b76 remove workaround for bug in git 1.8.4r0 2013-10-20 15:23:06 -04:00
Joey Hess
edbf177628 fix lsTreeFiles to use --full-tree
This makes it show the full tree, not just the current directory,
and enables --full-name, which yields TopFilePaths.
2013-10-18 15:50:26 -04:00
Joey Hess
c979e0ea62 fix 2013-10-17 19:51:16 -04:00
Joey Hess
c116383b5d fix 2013-10-17 19:49:44 -04:00
Joey Hess
81c4259a0d fix 2013-10-17 19:41:00 -04:00
Joey Hess
16243b9972 missing import 2013-10-17 19:39:22 -04:00
Joey Hess
e93206e294 Windows: Deal with strange msysgit 1.8.4 behavior of not understanding DOS formatted paths for --git-dir and --work-tree. 2013-10-17 19:35:57 -04:00
Joey Hess
aff125ddab try working around windows xargs problem 2013-10-17 15:56:56 -04:00
Joey Hess
d785432f78 use TopFilePath for DiffTree and LsTree 2013-10-17 14:51:19 -04:00
Joey Hess
82ff37520f fix off-by-one 2013-10-16 12:14:14 -04:00
Joey Hess
bac078742d Deal with git check-attr -z output format change in git 1.8.5.
I have not actually tested with 1.8.5, which is not yet relesaed, but
git.git commit f7cd8c50b9ab83e084e8f52653ecc8d90665eef2 changes -z
to also apply to output, without regards to back-compat. (But with pretty
good reasons.)

New code should work with both versions, by fingerprinting for NULs and
newlines.
2013-10-15 16:05:27 -04:00
Joey Hess
f1295b5141 fix windows build 2013-10-02 20:26:00 -04:00
Joey Hess
1536ebfe47 Disable receive.denyNonFastForwards when setting up a gcrypt special remote
gcrypt needs to be able to fast-forward the master branch. If a git
repository is set up with git init --shared --bare, it gets that set, and
pushing to it will then fail, even when it's up-to-date.
2013-10-01 15:23:48 -04:00
Joey Hess
57d49a6d04 remove *>=> and >=*> ; use <$$> instead
I forgot I had <$$> hidden away in Utility.Applicative.
It allows doing the same kind of currying as does >=*>
and I found using it made the code more readable for me.

(*>=> was not used)
2013-09-27 19:58:48 -04:00
Joey Hess
e864c8d033 blind enabling gcrypt repos on rsync.net
This pulls off quite a nice trick: When given a path on rsync.net, it
determines if it is an encrypted git repository that the user has
the key to decrypt, and merges with it. This is works even when
the local repository had no idea that the gcrypt remote exists!

(As previously done with local drives.)

This commit sponsored by Pedro Côrte-Real
2013-09-27 16:21:56 -04:00
Joey Hess
1550759220 enabling rsync.net gcrypt repos
Still need to detect when the user is trying to create a repo
that already exists, and jump to the enabling code.
2013-09-26 23:47:30 -04:00
Joey Hess
735ed3b822 prep for enabling remotre gcrypt repos in webapp 2013-09-26 17:26:13 -04:00
Joey Hess
3192b059b5 add back lost check that git-annex-shell supports gcrypt 2013-09-24 17:51:12 -04:00
Joey Hess
7390f08ef9 Use cryptohash rather than SHA for hashing.
This is a massive win on OSX, which doesn't have a sha256sum normally.

Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.

SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.

Used the following benchmark to arrive at the 1 mb number.

1 mb file:

benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe

2 mb file:

benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)

import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common

testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk

main = defaultMain
        [ bgroup "sha256"
                [ bench "internal" $ whnfIO internal
                , bench "external" $ whnfIO external
                ]
        ]

sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy

internal :: IO String
internal = show . sha256 <$> L.readFile testfile

external :: IO String
external = do
	s <- readProcess "sha256sum" [testfile]
        return $ fst $ separate (== ' ') s
2013-09-22 20:06:02 -04:00
Joey Hess
006cf7976f more completely solve catKey memory leak
Done using a mode witness, which ensures it's fixed everywhere.

Fixing catFileKey was a bear, because git cat-file does not provide a
nice way to query for the mode of a file and there is no other efficient
way to do it. Oh, for libgit2..

Note that I am looking at tree objects from HEAD, rather than the index.
Because I cat-file cannot show a tree object for the index.
So this fix is technically incomplete. The only cases where it matters
are:

1. A new large file has been directly staged in git, but not committed.
2. A file that was committed to HEAD as a symlink has been staged
   directly in the index.

This could be fixed a lot better using libgit2.
2013-09-19 16:41:21 -04:00
Joey Hess
f26c996dc6 interface to parse git tree objects 2013-09-19 15:58:35 -04:00
Joey Hess
eb42bde19a sync, pre-commit, indirect: Avoid unnecessarily catting non-symlink files from git, which can be so large it runs out of memory. 2013-09-19 14:48:42 -04:00
Joey Hess
e8e209f4e5 better probing for gcrypt repositories using new --check option
Now can tell if a repo uses gcrypt or not, and whether it's decryptable
with the current gpg keys.

This closes the hole that undecryptable gcrypt repos could have before been
combined into the repo in encrypted mode.
2013-09-19 12:53:24 -04:00
Joey Hess
8062f6337f webapp: support adding existing gcrypt special remotes from removable drives
When adding a removable drive, it's now detected if the drive contains
a gcrypt special remote, and that's all handled nicely. This includes
fetching the git-annex branch from the gcrypt repo in order to find
out how to set up the special remote.

Note that gcrypt repos that are not git-annex special remotes are not
supported. It will attempt to detect such a gcrypt repo and refuse
to use it. (But this is hard to do any may fail; see
https://github.com/blake2-ppc/git-remote-gcrypt/issues/6)

The problem with supporting regular gcrypt repos is that we don't know
what the gcrypt.participants setting is intended to be for the repo.
So even if we can decrypt it, if we push changes to it they might not be
visible to other participants.

Anyway, encrypted sneakernet (or mailnet) is now fully possible with the
git-annex assistant! Assuming that the gpg key distribution is handled
somehow, which the assistant doesn't yet help with.

This commit was sponsored by Navishkar Rao.
2013-09-18 15:55:31 -04:00
Joey Hess
6c35038643 gcrypt: Ensure that signing key is set to one of the participants keys.
Otherwise gcrypt will fail to pull, since it requires this to be the case.

This needs a patched gcrypt, which is in my forked version.
2013-09-17 16:06:29 -04:00
Joey Hess
ab9dd6d8a0 sync: Fix bug that caused direct mode mappings to not be updated when merging files into the tree on Windows. 2013-09-13 13:49:28 -04:00
Joey Hess
7c1a9cdeb9 partially complete gcrypt remote (local send done; rest not)
This is a git-remote-gcrypt encrypted special remote. Only sending files
in to the remote works, and only for local repositories.

Most of the work so far has involved making initremote work. A particular
problem is that remote setup in this case needs to generate its own uuid,
derivied from the gcrypt-id. That required some larger changes in the code
to support.

For ssh remotes, this will probably just reuse Remote.Rsync's code, so
should be easy enough. And for downloading from a web remote, I will need
to factor out the part of Remote.Git that does that.

One particular thing that will need work is supporting hot-swapping a local
gcrypt remote. I think it needs to store the gcrypt-id in the git config of the
local remote, so that it can check it every time, and compare with the
cached annex-uuid for the remote. If there is a mismatch, it can change
both the cached annex-uuid and the gcrypt-id. That should work, and I laid
some groundwork for it by already reading the remote's config when it's
local. (Also needed for other reasons.)

This commit was sponsored by Daniel Callahan.
2013-09-07 18:38:00 -04:00
Joey Hess
dad34e0ea8 add getParticipantList
Note that it needs to look at global git config, since git-remote-gcrypt
will see any setting there as a fallback.
2013-09-05 16:34:13 -04:00
Joey Hess
a48a4e2f8a automatically derive an annex-uuid from a gcrypt-uuids 2013-09-05 16:02:39 -04:00
Joey Hess
6cdac3a003 sync, assistant: Force push of the git-annex branch.
Necessary to ensure it gets pushed to remotes after being rewritten by forget.
See inline rationalles for why I think this is safe!
2013-08-29 14:27:53 -04:00
guilhem
f754779c02 Unused: bugfix
Detect staged files that are not in the working tree.
2013-08-26 13:50:09 -04:00
guilhem
f15fda60ed Speed up the 'unused' command.
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.

Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.

Credits to joeyh for the algorithm. :-)
2013-08-25 21:02:13 -04:00
guilhem
b4a32c7506 Unescape characters in 'file://...' URIs.
That allows, in Git remotes, such URIs to contain spaces or UTF-8
characters. Closes http://git-annex.branchable.com/bugs/Unable_to_use_remotes_with_space_in_the_path/ .
2013-08-22 11:33:16 -04:00
Joey Hess
6fd2935a5a unused: Pay attention to symlinks that are not yet staged in the index. 2013-08-22 10:20:03 -04:00
Joey Hess
a3224ce35b avoid more build warnings on Windows 2013-08-04 14:05:36 -04:00
Joey Hess
b191d5c595 gitignore support for the assistant and watcher
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.

Thanks to Adam Spiers, for getting the necessary support into git for this.

A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.

However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.

Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!

So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.

(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)

This commit was sponsored by Francois Marier. Thanks!
2013-08-02 20:37:03 -04:00
Joey Hess
672cfc3923 better git version checking 2013-08-02 18:32:26 -04:00
Joey Hess
93f2371e09 get rid of __WINDOWS__, use mingw32_HOST_OS
The latter is harder for me to remember, but avoids build failures in code
used by the configure program.
2013-08-02 12:27:32 -04:00
Joey Hess
d16114d024 Slow and ugly work around for bug #718517 in git, which broke git-cat-file --batch for filenames containing spaces.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!

The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.

This needs to be reverted as soon as a fix is available in git!
2013-08-01 17:30:47 -04:00
Joey Hess
ebd778c519 Escape ':' in file/directory names to avoid it being treated as a pathspec by some git commands
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"

This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.

Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)

The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.

This commit was sponsored by Otavio Salvador. Thanks!
2013-08-01 15:15:49 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00