Commit graph

340 commits

Author SHA1 Message Date
Joey Hess
6edac746f0 merge improved fsck types from git-repair and some associated changes 2013-11-30 14:29:11 -04:00
Joey Hess
0980f3dae6 Fix bug that broke switching between local repositories in the webapp when they use the new guarded direct mode.
git treats eg ~/annex as a bare git repository located in ~/.annex/.git
if ~/annex/.git/config has core.bare=true.
2013-11-22 23:27:15 -04:00
Joey Hess
d490bbb891 make runRepairOf run preRepair
This may be a little late, since a fsck has already been done,
but it can't hurt.
2013-11-21 20:13:55 -04:00
Joey Hess
7d682dd844 merge from git-repair 2013-11-21 20:07:44 -04:00
Joey Hess
ff2b0a9df6 merge from git-repair 2013-11-21 00:43:30 -04:00
Joey Hess
8217e97d88 merge from git-repair 2013-11-20 19:34:30 -04:00
Joey Hess
e80d935b53 merge from git-repair 2013-11-20 19:16:42 -04:00
Joey Hess
8a466247ed merge from git-repair 2013-11-20 18:45:22 -04:00
Joey Hess
7dbb702edd merge from git-repair 2013-11-20 18:31:00 -04:00
Joey Hess
ef34316c45 fix repair failure that occurred when index was corrupted, and other objects too
In this case, the index problem prevented fsck from finding the other
problems.
2013-11-19 17:16:33 -04:00
Joey Hess
b1ed98636b merge with git-repair 2013-11-19 17:08:57 -04:00
Joey Hess
b245aa40df moving git-repair to its own package 2013-11-18 13:24:55 -04:00
Joey Hess
eab4470440 better handling of missing index file 2013-11-13 14:39:26 -04:00
Joey Hess
13108b7196 assistant: Notice on startup when the index file is corrupt, and auto-repair. 2013-11-13 14:27:17 -04:00
Joey Hess
5e7e0c7dc0 repair: Handle case where index file is corrupt, but all objects are ok. 2013-11-13 13:41:02 -04:00
Joey Hess
958312885f webapp: Improve UI around remote that have no annex.uuid set, either because setup of them is incomplete, or because the remote git repository is not a git-annex repository.
Complicated by such repositories potentially being repos that should have
an annex.uuid, but it failed to be gotten, perhaps due to the past ssh repo
setup bugs. This is handled now by an Upgrade Repository button.
2013-11-07 18:02:00 -04:00
Joey Hess
59ecc804cd add new status command
This works for both direct and indirect mode.

It may need some performance tuning.

Note that unlike git status, it only shows the status of the work tree, not
the status of the index. So only one status letter, not two .. and since
files that have been added and not yet committed do not differ between the
work tree and the index, they are not shown. Might want to add display of
the index vs the last commit eventually.

This commit was sponsored by an unknown bitcoin contributor, whose
contribution as been going up lately! ;)
2013-11-07 14:07:25 -04:00
Joey Hess
3802f2f270 work around lack of receive.denyCurrentBranch in direct mode
Now that direct mode sets core.bare=true, git's normal prohibition about
pushing into the currently checked out branch doesn't work.

A simple fix for this would be an update hook which blocks the pushes..
but git hooks must be executable, and git-annex needs to be usable on eg,
FAT, which lacks x bits.

Instead, enabling direct mode switches the branch (eg master) to a special
purpose branch (eg annex/direct/master). This branch is not pushed when
syncing; instead any changes that git annex sync commits get written to
master, and it's pushed (along with synced/master) to the remote.

Note that initialization has been changed to always call setDirect,
even if it's just setDirect False for indirect mode. This is needed because
if the user has just cloned a direct mode repo, that nothing has synced
with before, it may have no master branch, and only a annex/direct/master.
Resulting in that branch being checked out locally too. Calling setDirect False
for indirect mode moves back out of this branch, to a new master branch,
and ensures that a manual "git push" doesn't push changes directly to
the annex/direct/master of the remote. (It's possible that the user
makes a commit w/o using git-annex and pushes it, but nothing I can do
about that really.)

This commit was sponsored by Jonathan Harrington.
2013-11-05 21:08:31 -04:00
Joey Hess
cf34e59c8c factor out update 2013-11-05 18:20:52 -04:00
Joey Hess
4510819215 v5 for direct mode, with automatic upgrade
This includes storing the current state of the HEAD ref, which git annex
sync is going to need, but does not make sync use it.
2013-11-05 17:05:03 -04:00
Joey Hess
04768e44b2 automatically set and unset core.bare when switching to/from direct mode 2013-11-05 15:41:24 -04:00
Joey Hess
0edd9ec03a refactored hook setup 2013-11-05 15:29:56 -04:00
Joey Hess
c2862d9585 pass -c option on to all git commands run
The -c option now not only modifies the git configuration seen by
git-annex, but it is passed along to every git command git-annex runs.

This was easy to plumb through because gitCommandLine is already used to
construct every git command line, to add --git-dir and --work-tree
2013-11-05 13:38:37 -04:00
Joey Hess
58db042033 map: Work when there are gcrypt remotes. 2013-11-04 14:14:44 -04:00
Joey Hess
7ed8e87a34 assistant: Support repairing git remotes that are locally accessible
(eg, on removable drives)

gcrypt remotes are not yet handled.

This commit was sponsored by Sören Brunk.
2013-10-27 15:38:59 -04:00
Joey Hess
0036139b33 wire git repair into webapp 2013-10-23 14:43:58 -04:00
Joey Hess
1ab2ad86c7 minor 2013-10-23 13:19:37 -04:00
Joey Hess
435ea52f3c repair command: add handling of git-annex branch and index 2013-10-23 13:00:45 -04:00
Joey Hess
d5eb85acf4 add repair command 2013-10-23 12:21:59 -04:00
Joey Hess
d345e5b52f add git fsck to cronner, and UI for repository repair (not yet wired up) 2013-10-22 16:02:52 -04:00
Joey Hess
44bb9a808f clean warnings 2013-10-22 14:52:17 -04:00
Joey Hess
ff3f654cbe make git fsck batch-capable 2013-10-22 14:49:41 -04:00
Joey Hess
3e61749d08 index file recovery 2013-10-22 12:58:04 -04:00
Joey Hess
2fb08acda5 add reflog 2013-10-21 16:41:46 -04:00
Joey Hess
18487c779f corrupt branch resetting (but not yet reflog walking) 2013-10-21 16:20:54 -04:00
Joey Hess
fcd91be6f0 implemented removal of corrupt tracking branches
Oh, git, you made this so hard. Not determining if a branch pointed to some
corrupt object, that was easy, but dealing with corrupt branches using git
plumbing is a PITA.
2013-10-21 15:28:06 -04:00
Joey Hess
6d8250c255 avoid redundant fsck when no changes are made 2013-10-20 19:42:17 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
f482de1b76 remove workaround for bug in git 1.8.4r0 2013-10-20 15:23:06 -04:00
Joey Hess
edbf177628 fix lsTreeFiles to use --full-tree
This makes it show the full tree, not just the current directory,
and enables --full-name, which yields TopFilePaths.
2013-10-18 15:50:26 -04:00
Joey Hess
c979e0ea62 fix 2013-10-17 19:51:16 -04:00
Joey Hess
c116383b5d fix 2013-10-17 19:49:44 -04:00
Joey Hess
81c4259a0d fix 2013-10-17 19:41:00 -04:00
Joey Hess
16243b9972 missing import 2013-10-17 19:39:22 -04:00
Joey Hess
e93206e294 Windows: Deal with strange msysgit 1.8.4 behavior of not understanding DOS formatted paths for --git-dir and --work-tree. 2013-10-17 19:35:57 -04:00
Joey Hess
aff125ddab try working around windows xargs problem 2013-10-17 15:56:56 -04:00
Joey Hess
d785432f78 use TopFilePath for DiffTree and LsTree 2013-10-17 14:51:19 -04:00
Joey Hess
82ff37520f fix off-by-one 2013-10-16 12:14:14 -04:00
Joey Hess
bac078742d Deal with git check-attr -z output format change in git 1.8.5.
I have not actually tested with 1.8.5, which is not yet relesaed, but
git.git commit f7cd8c50b9ab83e084e8f52653ecc8d90665eef2 changes -z
to also apply to output, without regards to back-compat. (But with pretty
good reasons.)

New code should work with both versions, by fingerprinting for NULs and
newlines.
2013-10-15 16:05:27 -04:00
Joey Hess
f1295b5141 fix windows build 2013-10-02 20:26:00 -04:00
Joey Hess
1536ebfe47 Disable receive.denyNonFastForwards when setting up a gcrypt special remote
gcrypt needs to be able to fast-forward the master branch. If a git
repository is set up with git init --shared --bare, it gets that set, and
pushing to it will then fail, even when it's up-to-date.
2013-10-01 15:23:48 -04:00
Joey Hess
57d49a6d04 remove *>=> and >=*> ; use <$$> instead
I forgot I had <$$> hidden away in Utility.Applicative.
It allows doing the same kind of currying as does >=*>
and I found using it made the code more readable for me.

(*>=> was not used)
2013-09-27 19:58:48 -04:00
Joey Hess
e864c8d033 blind enabling gcrypt repos on rsync.net
This pulls off quite a nice trick: When given a path on rsync.net, it
determines if it is an encrypted git repository that the user has
the key to decrypt, and merges with it. This is works even when
the local repository had no idea that the gcrypt remote exists!

(As previously done with local drives.)

This commit sponsored by Pedro Côrte-Real
2013-09-27 16:21:56 -04:00
Joey Hess
1550759220 enabling rsync.net gcrypt repos
Still need to detect when the user is trying to create a repo
that already exists, and jump to the enabling code.
2013-09-26 23:47:30 -04:00
Joey Hess
735ed3b822 prep for enabling remotre gcrypt repos in webapp 2013-09-26 17:26:13 -04:00
Joey Hess
3192b059b5 add back lost check that git-annex-shell supports gcrypt 2013-09-24 17:51:12 -04:00
Joey Hess
7390f08ef9 Use cryptohash rather than SHA for hashing.
This is a massive win on OSX, which doesn't have a sha256sum normally.

Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.

SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.

Used the following benchmark to arrive at the 1 mb number.

1 mb file:

benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe

2 mb file:

benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)

import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common

testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk

main = defaultMain
        [ bgroup "sha256"
                [ bench "internal" $ whnfIO internal
                , bench "external" $ whnfIO external
                ]
        ]

sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy

internal :: IO String
internal = show . sha256 <$> L.readFile testfile

external :: IO String
external = do
	s <- readProcess "sha256sum" [testfile]
        return $ fst $ separate (== ' ') s
2013-09-22 20:06:02 -04:00
Joey Hess
006cf7976f more completely solve catKey memory leak
Done using a mode witness, which ensures it's fixed everywhere.

Fixing catFileKey was a bear, because git cat-file does not provide a
nice way to query for the mode of a file and there is no other efficient
way to do it. Oh, for libgit2..

Note that I am looking at tree objects from HEAD, rather than the index.
Because I cat-file cannot show a tree object for the index.
So this fix is technically incomplete. The only cases where it matters
are:

1. A new large file has been directly staged in git, but not committed.
2. A file that was committed to HEAD as a symlink has been staged
   directly in the index.

This could be fixed a lot better using libgit2.
2013-09-19 16:41:21 -04:00
Joey Hess
f26c996dc6 interface to parse git tree objects 2013-09-19 15:58:35 -04:00
Joey Hess
eb42bde19a sync, pre-commit, indirect: Avoid unnecessarily catting non-symlink files from git, which can be so large it runs out of memory. 2013-09-19 14:48:42 -04:00
Joey Hess
e8e209f4e5 better probing for gcrypt repositories using new --check option
Now can tell if a repo uses gcrypt or not, and whether it's decryptable
with the current gpg keys.

This closes the hole that undecryptable gcrypt repos could have before been
combined into the repo in encrypted mode.
2013-09-19 12:53:24 -04:00
Joey Hess
8062f6337f webapp: support adding existing gcrypt special remotes from removable drives
When adding a removable drive, it's now detected if the drive contains
a gcrypt special remote, and that's all handled nicely. This includes
fetching the git-annex branch from the gcrypt repo in order to find
out how to set up the special remote.

Note that gcrypt repos that are not git-annex special remotes are not
supported. It will attempt to detect such a gcrypt repo and refuse
to use it. (But this is hard to do any may fail; see
https://github.com/blake2-ppc/git-remote-gcrypt/issues/6)

The problem with supporting regular gcrypt repos is that we don't know
what the gcrypt.participants setting is intended to be for the repo.
So even if we can decrypt it, if we push changes to it they might not be
visible to other participants.

Anyway, encrypted sneakernet (or mailnet) is now fully possible with the
git-annex assistant! Assuming that the gpg key distribution is handled
somehow, which the assistant doesn't yet help with.

This commit was sponsored by Navishkar Rao.
2013-09-18 15:55:31 -04:00
Joey Hess
6c35038643 gcrypt: Ensure that signing key is set to one of the participants keys.
Otherwise gcrypt will fail to pull, since it requires this to be the case.

This needs a patched gcrypt, which is in my forked version.
2013-09-17 16:06:29 -04:00
Joey Hess
ab9dd6d8a0 sync: Fix bug that caused direct mode mappings to not be updated when merging files into the tree on Windows. 2013-09-13 13:49:28 -04:00
Joey Hess
7c1a9cdeb9 partially complete gcrypt remote (local send done; rest not)
This is a git-remote-gcrypt encrypted special remote. Only sending files
in to the remote works, and only for local repositories.

Most of the work so far has involved making initremote work. A particular
problem is that remote setup in this case needs to generate its own uuid,
derivied from the gcrypt-id. That required some larger changes in the code
to support.

For ssh remotes, this will probably just reuse Remote.Rsync's code, so
should be easy enough. And for downloading from a web remote, I will need
to factor out the part of Remote.Git that does that.

One particular thing that will need work is supporting hot-swapping a local
gcrypt remote. I think it needs to store the gcrypt-id in the git config of the
local remote, so that it can check it every time, and compare with the
cached annex-uuid for the remote. If there is a mismatch, it can change
both the cached annex-uuid and the gcrypt-id. That should work, and I laid
some groundwork for it by already reading the remote's config when it's
local. (Also needed for other reasons.)

This commit was sponsored by Daniel Callahan.
2013-09-07 18:38:00 -04:00
Joey Hess
dad34e0ea8 add getParticipantList
Note that it needs to look at global git config, since git-remote-gcrypt
will see any setting there as a fallback.
2013-09-05 16:34:13 -04:00
Joey Hess
a48a4e2f8a automatically derive an annex-uuid from a gcrypt-uuids 2013-09-05 16:02:39 -04:00
Joey Hess
6cdac3a003 sync, assistant: Force push of the git-annex branch.
Necessary to ensure it gets pushed to remotes after being rewritten by forget.
See inline rationalles for why I think this is safe!
2013-08-29 14:27:53 -04:00
guilhem
f754779c02 Unused: bugfix
Detect staged files that are not in the working tree.
2013-08-26 13:50:09 -04:00
guilhem
f15fda60ed Speed up the 'unused' command.
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.

Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.

Credits to joeyh for the algorithm. :-)
2013-08-25 21:02:13 -04:00
guilhem
b4a32c7506 Unescape characters in 'file://...' URIs.
That allows, in Git remotes, such URIs to contain spaces or UTF-8
characters. Closes http://git-annex.branchable.com/bugs/Unable_to_use_remotes_with_space_in_the_path/ .
2013-08-22 11:33:16 -04:00
Joey Hess
6fd2935a5a unused: Pay attention to symlinks that are not yet staged in the index. 2013-08-22 10:20:03 -04:00
Joey Hess
a3224ce35b avoid more build warnings on Windows 2013-08-04 14:05:36 -04:00
Joey Hess
b191d5c595 gitignore support for the assistant and watcher
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.

Thanks to Adam Spiers, for getting the necessary support into git for this.

A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.

However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.

Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!

So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.

(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)

This commit was sponsored by Francois Marier. Thanks!
2013-08-02 20:37:03 -04:00
Joey Hess
672cfc3923 better git version checking 2013-08-02 18:32:26 -04:00
Joey Hess
93f2371e09 get rid of __WINDOWS__, use mingw32_HOST_OS
The latter is harder for me to remember, but avoids build failures in code
used by the configure program.
2013-08-02 12:27:32 -04:00
Joey Hess
d16114d024 Slow and ugly work around for bug #718517 in git, which broke git-cat-file --batch for filenames containing spaces.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!

The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.

This needs to be reverted as soon as a fix is available in git!
2013-08-01 17:30:47 -04:00
Joey Hess
ebd778c519 Escape ':' in file/directory names to avoid it being treated as a pathspec by some git commands
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"

This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.

Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)

The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.

This commit was sponsored by Otavio Salvador. Thanks!
2013-08-01 15:15:49 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
4e2fab90d5 avoid newline translation when writing to git hash-object
They're like mushrooms, just keep popping up.
2013-06-18 15:08:51 -04:00
Joey Hess
02c51266ec missed another hash-object call, disable filtering there too 2013-06-18 14:48:15 -04:00
Joey Hess
a1f8771d2b avoid filtering object being hashed
This avoids newline conversion being done on it in Windows.
2013-06-18 13:42:16 -04:00
Joey Hess
077ca355d0 Revert "flush stream after each write to update-index, to possibly avoid buffering issues on Windows"
Didn't help.
2013-06-14 14:34:24 -04:00
Joey Hess
b97a9ea786 flush stream after each write to update-index, to possibly avoid buffering issues on Windows 2013-06-14 14:25:17 -04:00
Joey Hess
91c4dcfc69 Can now restart certain long-running git processes if they crash, and continue working.
Fuzz tests have shown that git cat-file --batch sometimes stops running.
It's not yet known why (no error message; repo seems ok). But this is
something we can deal with in the CoProcess framework, since all 3 types of
long-running git processes should be restartable if they fail.

Note that, as implemented, only IO errors are caught. So an error thrown
by the reveiver, when it sees something that is not valid output from
git cat-file (etc) will not cause a restart. I don't want it to retry
if git commands change their output or are just outputting garbage.
This does mean that if the command did a partial output and crashed in the
middle, it would still not be restarted.

There is currently no guard against restarting a command repeatedly, if,
for example, it crashes repeatedly on startup.
2013-05-31 12:42:13 -04:00
Joey Hess
a600471a23 include HEAD in CanPush shas 2013-05-21 20:04:38 -04:00
Joey Hess
08c03b2af3 XMPP: Avoid redundant and unncessary pushes. Note that this breaks compatibility with previous versions of git-annex, which will refuse to accept any XMPP pushes from this version. 2013-05-21 18:24:29 -04:00
Joey Hess
25dba9da24 fix windows build 2013-05-21 13:07:43 -04:00
Joey Hess
369fb69fe7 fix warning 2013-05-20 18:01:27 -04:00
Joey Hess
25cb9a48da fix the day's Windows permissions damage 2013-05-14 20:15:14 -04:00
Joey Hess
959536ef03 fill in a few windows stubs 2013-05-14 16:32:03 -05:00
Joey Hess
306a36260f typo 2013-05-14 15:44:49 -04:00
Joey Hess
7b92ffc3a1 more leaning toothpick fixes 2013-05-14 15:43:23 -04:00
Joey Hess
dc66b1f27d Merge branch 'master' into windows
Conflicts:
	Annex/Environment.hs
	Build/Configure.hs
	Git/Construct.hs
	Utility/FileMode.hs
2013-05-14 15:37:24 -04:00
Joey Hess
81cded2b9d detect local urls on DOS 2013-05-14 15:27:39 -04:00
Joey Hess
03e8594369 fix the day's windows permissions damage 2013-05-12 19:09:48 -04:00
Joey Hess
73d2f8b280 deal with git using / internally, even on DOS 2013-05-12 17:29:49 -05:00
Joey Hess
06551ad86b set raw mode for git check-attr 2013-05-12 16:37:06 -05:00
Joey Hess
abe8d549df fix permission damage (thanks, Windows) 2013-05-11 23:54:25 -04:00
Joey Hess
5e1458152f refactoring 2013-05-11 23:11:56 -04:00