Commit graph

358 commits

Author SHA1 Message Date
Joey Hess
d72fb5acc2 Fix encoding of utf-8 etc when storing the description of repository and other content.
Write files in raw mode, to avoid mangling the encoding of content
provided.

Note: This was a longstanding problem, it was not introduced in v3.
2011-06-30 00:35:51 -04:00
Joey Hess
e1c18ddec4 Sped back up fsck, copy --from etc
All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.

Before fsck was taking approximatly 3 hours, now it's running in 8 minutes.

The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
2011-06-29 21:47:31 -04:00
Joey Hess
af45d42224 Merge branch 'master' into v3
Conflicts:
	debian/changelog
2011-06-29 11:42:35 -04:00
Joey Hess
b3aaf980e4 --force will cause add, etc, to operate on ignored files. 2011-06-29 11:42:00 -04:00
Joey Hess
5034d8c298 Modify location log parser to allow future expansion.
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatability with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
2011-06-28 16:15:50 -04:00
Joey Hess
c90652f015 Always ensure git-annex branch exists. 2011-06-26 22:43:48 -04:00
Joey Hess
874fc044c1 releasing version 3.20110624 2011-06-24 14:58:07 -04:00
Joey Hess
068703c405 improve post-upgrade push instructions 2011-06-23 14:51:04 -04:00
Joey Hess
7ee636f6dd avoid unnecessary read of trust.log 2011-06-23 13:39:04 -04:00
Joey Hess
66ceb92702 docs 2011-06-22 23:37:46 -04:00
Joey Hess
68783fd5e0 let's have the major version number be annex.version 2011-06-22 23:02:58 -04:00
Joey Hess
ad3770e0b2 add merge subcommand 2011-06-22 18:46:56 -04:00
Joey Hess
80302d0b46 improve bare repo handing
Many more commands can work in bare repos now, thanks to the git-annex
branch.
2011-06-22 18:32:41 -04:00
Joey Hess
818ae0c6da docs for v3 2011-06-21 20:21:33 -04:00
Joey Hess
9f9e17aa0f unlock: Made atomic. 2011-06-20 22:38:18 -04:00
Joey Hess
c835166a7c add git-union-merge
This is a new git subcommand, that does a generic union merge operation
between two refs, storing the result in a branch. It operates efficiently
without touching the working tree. It does need to write out a temporary
index file, and may need to write out some other temp files as well.

This could be useful for anything that stores data in a branch,
and needs to merge changes into that branch without actually checking the
branch out. Since conflict handling can't be done without a working copy,
the merge type is always a union merge, which is fine for data stored in
log format (as git-annex does), or in non-conflicting files
(as pristine-tar does).

This probably belongs in git proper, but it will live in git-annex for now.

---

Plan is to move .git-annex/ to a git-annex branch, and use git-union-merge
to handle merging changes when pulling from remotes.

Some preliminary benchmarking using real .git-annex/ data indicates
that it's quite fast, except for the "git add" call, which is as slow
as "git add" tends to be with a big index.
2011-06-20 21:37:18 -04:00
Joey Hess
f547277b75 Allow --trust etc to specify a repository by name, for temporarily trusting repositories that are not configured remotes. 2011-06-13 22:19:44 -04:00
Joey Hess
30d7cce7ec rsync is now used when copying files from repos on other filesystems
cp is still used when copying file from repos on the same filesystem, since
--reflink=auto can make it significantly faster on filesystems such as
btrfs.

Directory special remotes still use cp, not rsync. It's not clear what
tmp file should be used when rsyncing to such a remote.
2011-06-13 20:33:52 -04:00
Joey Hess
38e0100a69 releasing version 0.20110610 2011-06-10 11:58:21 -04:00
Joey Hess
9a272815dd Bugfix: Fix fsck to not think all SHAnE keys are bad. 2011-06-10 11:43:28 -04:00
Joey Hess
90dd245522 get --from is the same as copy --from
get not honoring --from has surprised me a few times, so least surprise
suggests it should just behave like copy --from. This leaves the difference
between get and copy being that copy always requires the remote to copy
from, while get will decide whether to get a file from a key/value store or
a remote.
2011-06-09 18:54:49 -04:00
Joey Hess
a8fb97d2ce Add --trust, --untrust, and --semitrust options. 2011-06-01 17:57:31 -04:00
Joey Hess
3d567aa64f Add --numcopies option. 2011-06-01 16:49:17 -04:00
Joey Hess
dc92a788c7 releasing version 0.20110601 2011-06-01 12:00:25 -04:00
Joey Hess
038da52bdd Somewhat sped up git commit of modifications to unlocked files.
Avoid git reset here too, so I no longer need to care that it's much more
expensive than seems wise (but I asked the git list about that anyway).

It's not necessary to reset the staged file content from the index, as
the `git add` of the the symlink will replace it anyway.

`git commit` of unlocked files is still slow, since git still has to shove
their entire content into the index, only to have it be thrown away. So it's
still better to use `git annex add`
2011-05-31 16:08:37 -04:00
Joey Hess
fb259033d4 Fix locking of files with staged changes.
Previously, lock would skip files that had staged changes, but that is
counterintuitive, I think.
2011-05-31 15:00:56 -04:00
Joey Hess
fafe60768f Massively sped up git annex lock by avoiding use of the uber-slow git reset, and only running git checkout once, even when many files are being locked. 2011-05-31 14:50:41 -04:00
Joey Hess
14ffb5d47b bugfix: fix unused list numbering
Introduced in 43f0a666f0
2011-05-28 22:30:06 -04:00
Joey Hess
7ea54e1c6e releasing version 0.20110522 2011-05-27 20:28:01 -04:00
Joey Hess
82b88d0676 typo 2011-05-27 20:21:13 -04:00
Joey Hess
001edb008a Fix bug in --exclude introduced in 0.20110516. 2011-05-27 20:20:20 -04:00
Joey Hess
5b941980aa Closer emulation of git's behavior when told to use "foo/.git" as a git repository instead of just "foo". Closes: #627563 2011-05-22 14:12:16 -04:00
Joey Hess
8ed27db18f add explict build dep on hslogger
pulled in by missingh, but now used directly by git-annex
2011-05-21 13:03:13 -04:00
Joey Hess
944b1207dc releasing version 0.20110521 2011-05-21 11:58:35 -04:00
Joey Hess
93a4f3d4e6 Add --debug option. Closes: #627499
This takes advantage of the debug logging done by missingh, and I added
my own debug messages for executeFile calls. There are still some other
low-level ways git-annex runs stuff that are not shown by debugging,
but this gets most of it easily.
2011-05-21 11:52:13 -04:00
Joey Hess
cd83541872 --backend now overrides any backend configured in .gitattributes files. 2011-05-18 19:34:46 -04:00
Joey Hess
a8816efc14 status: New subcommand to show info about an annex, including its size. 2011-05-16 21:18:34 -04:00
Joey Hess
3ab15b9f4f releasing version 0.20110516 2011-05-16 15:01:05 -04:00
Joey Hess
5256a6b011 migrate: Use current filename when generating new key, for backends where the filename affects the key name. 2011-05-16 12:10:08 -04:00
Joey Hess
e7b309ce02 clarify 2011-05-16 11:49:52 -04:00
Joey Hess
2a8efc7af1 Added filename extension preserving variant backends SHA1E, SHA256E, etc. 2011-05-16 11:46:34 -04:00
Joey Hess
1d2984441c add a few tweaks to make it easy to use the Internet Archive's variant of S3
In particular, munge key filenames to comply with the IA's filename limits,
disable encryption, support their nonstandard way of creating buckets, and
allow x-amz-* headers to be specified in initremote to set item metadata.

Still TODO: initremote does not handle multiword metadata headers right.
2011-05-16 11:20:35 -04:00
Joey Hess
078a6fbd76 Work around a bug in Network.URI's handling of bracketed ipv6 addresses. 2011-05-06 15:21:30 -04:00
Joey Hess
86d3205061 releasing version 0.20110503 2011-05-03 21:49:20 -04:00
Joey Hess
1f84c7a964 S3: When encryption is enabled, the Amazon S3 login credentials are stored, encrypted, in .git-annex/remotes.log, so environment variables need not be set after the remote is initialized. 2011-05-01 14:05:10 -04:00
Joey Hess
43f0a666f0 unused: Now also lists files fsck places in .git/annex/bad/ 2011-04-29 13:59:00 -04:00
Joey Hess
eef3f634e9 Avoid crashing when an existing key is readded to the annex. 2011-04-28 20:41:40 -04:00
Joey Hess
07576f2a2c documentation for hook special remotes
Releasing before I have quite finished the code. Got a little caught
up in Anathem references. Time for a walk and then a tiny bit more coding
and possibly testing.
2011-04-28 15:26:21 -04:00
Joey Hess
d7b330b33b Fix hasKeyCheap setting for bup and rsync special remotes. 2011-04-28 14:39:51 -04:00
Joey Hess
84e1ebfb0e erm, thought I committed this release? 2011-04-28 14:38:01 -04:00
Joey Hess
7a33803193 Avoid pipeline stall when running git annex drop or fsck on a lot of files.
When it's stalled, there are 3 processes:

git annex
  git ls-files
  git check-attr

git-annex stalls trying to write to git check-attr, which stalls trying to
write to stdout (read by git-annex).

git ls-files does not seem to be involved directly; I've seen the stall when
it was still streaming out the file list, and after it had exited and
zombified.

The read and write are supposed to be handled by two different threads,
which pipeBoth forks off, thus avoiding deadlock. But it does deadlock.
(Certian signals unblock the deadlock for a while, then it stalls again.)

So, this is another case of WTF is the ghc IO manager doing today?
I avoid the issue by converting the writer to a separate process.

Possibly this was caused by some change in ghc 7 -- I'm offline and cannot
verify now, but I'm sure I used to be able to run git annex drop w/o it
hanging! And the code does not seem to have changed, except for commit
c1dc407941, which I tried reverting without
success. In fact, I reverted all the way back to 0.20110316 and still
saw the stall.

Update: Minimal test case:

import System.Cmd.Utils

main = do
	as <- checkAttr "blah" $ map show [1..100000]
	sequence $ map (putStrLn . show) as

checkAttr attr files = do
	(_, s) <- pipeBoth "git" params $ unlines files
	return $ lines s
	where
		params = ["check-attr", attr, "--stdin"]

Bug filed on ghc in debian, #624389
2011-04-27 23:18:35 -04:00
Joey Hess
39966ba4ee filter out --delete rsync option
rsync does not have a --no-delete, so do it this way instead
2011-04-27 20:31:56 -04:00
Joey Hess
e68f128a9b rsync special remote
Fully tested and working, including resuming and encryption. (Though not
resuming when sending *with* encryption; gpg doesn't produce identical
output each time.)

Uses same layout as the directory special remote and the .git/annex/objects/
directory.
2011-04-27 20:23:09 -04:00
Joey Hess
27774bdd56 Revert "Use haskell Crypto library instead of haskell SHA library.a"
This reverts commit 892593c5ef.

Conflicts:

	Crypto.hs
	debian/control
2011-04-26 11:24:23 -04:00
Joey Hess
7d71f8770b releasing version 0.20110425 2011-04-25 16:02:57 -04:00
Joey Hess
76911a446a Avoid using absolute paths when staging location log, as that can confuse git when a remote's path contains a symlink. Closes: #621386
This was a real PITA to fix, since location logs can be staged in
both the current repo, as well as in local remote's repos, in
which case the cwd will not be in the repo. And git add needs different
params in both cases, when absolute paths are not used.

In passing, git annex fsck now stages location log fixes.
2011-04-25 14:54:24 -04:00
Joey Hess
8512a4a1a1 Remove testpack from build depends, as it is not available on all architectures.
The test suite will not be run if it cannot be compiled.

It may be possible later to split off the quickcheck using tests into
a separate program and keep most of the tests using just hunit.
2011-04-25 12:43:22 -04:00
Joey Hess
892593c5ef Use haskell Crypto library instead of haskell SHA library.a
Since hS3 needs Crypto anyway, this actually reduces dependencies.
2011-04-21 16:37:14 -04:00
Joey Hess
24feee25c9 releasing version 0.20110420 2011-04-21 15:11:51 -04:00
Joey Hess
6668a061a8 typo 2011-04-21 14:53:07 -04:00
Joey Hess
2467c56771 update on S3 memory leaks
The remaining leaks are in hS3. The leak with encryption was worked around
by the use of the temp file. (And was probably originally caused by
gpgCipherHandle sparking a thread which kept a reference to the start
of the byte string.)
2011-04-21 11:06:29 -04:00
Joey Hess
6fcd3e1ef7 fix S3 upload buffering problem
Provide file size to new version of hS3.
2011-04-21 10:33:17 -04:00
Joey Hess
d8329731c6 missing build dep 2011-04-21 09:58:32 -04:00
Joey Hess
43639f69f6 ghc7
* Update Debian build dependencies for ghc 7.
* Debian package is now built with S3 support. Thanks Joachim Breitner for
  making this possible, also thanks Greg Heartsfield for working to improve
  the hS3 library for git-annex.

Also hid a conflicting new symbol from Control.Monad.State
2011-04-21 02:22:40 -04:00
Joey Hess
143fc7b692 finalize release 2011-04-19 21:40:21 -04:00
Joey Hess
5985acdfad bup: Avoid memory leak when transferring encrypted data.
This was a most surprising leak. It occurred in the process that is forked
off to feed data to gpg. That process was passed a lazy ByteString of
input, and ghc seemed to not GC the ByteString as it was lazily read
and consumed, so memory slowly leaked as the file was read and passed
through gpg to bup.

To fix it, I simply changed the feeder to take an IO action that returns
the lazy bytestring, and fed the result directly to hPut.

AFAICS, this should change nothing WRT buffering. But somehow it makes
ghc's GC do the right thing. Probably I triggered some weakness in ghc's
GC (version 6.12.1).

(Note that S3 still has this leak, and others too. Fixing it will involve
another dance with the type system.)

Update: One theory I have is that this has something to do with
the forking of the feeder process. Perhaps, when the ByteString
is produced before the fork, ghc decides it need to hold a pointer
to the start of it, for some reason -- maybe it doesn't realize that
it is only used in the forked process.
2011-04-19 15:27:03 -04:00
Joey Hess
a441e08da1 Fix stalls in S3 when transferring encrypted data.
Stalls were caused by code that did approximatly:

content' <- liftIO $ withEncryptedContent cipher content return
store content'

The return evaluated without actually reading content from S3,
and so the cleanup code began waiting on gpg to exit before
gpg could send all its data.

Fixing it involved moving the `store` type action into the IO monad:

liftIO $ withEncryptedContent cipher content store

Which was a bit of a pain to do, thank you type system, but
avoids the problem as now the whole content is consumed, and
stored, before cleanup.
2011-04-19 14:45:19 -04:00
Joey Hess
a91a51fc03 Add missing build dep on dataenc. 2011-04-17 14:41:24 -04:00
Joey Hess
7aa668f4b4 Don't run gpg in batch mode, so it can prompt for passphrase when there is no agent. 2011-04-17 14:30:22 -04:00
Joey Hess
36f048979f releasing version 0.20110417 2011-04-17 12:43:36 -04:00
Joey Hess
11da36e48f build dep update 2011-04-16 23:05:26 -04:00
Joey Hess
1247bfeaa7 gpg recommended 2011-04-16 19:13:05 -04:00
Joey Hess
44c65f40b7 bup is now supported as a special type of remote. 2011-04-08 16:44:43 -04:00
Joey Hess
e2404ca409 refactor away whichCmd and some other cleanup 2011-04-07 22:03:31 -04:00
Joey Hess
b889543507 let's use Maybe String for commands that may not be avilable 2011-04-07 21:47:56 -04:00
Joey Hess
bc51387e6d Periodically flush git command queue, to avoid boating memory usage too much.
Since the queue is flushed in between subcommand actions being run,
there should be no issues with actions that expect to queue up some stuff
and have it run after they do other stuff. So I didn't have to audit for
such assumptions.
2011-04-07 13:59:31 -04:00
Joey Hess
ab0e03498f Add doc-base file. Closes: #621408 2011-04-06 21:57:22 -04:00
Joey Hess
c1bbe43422 Add build depend on perlmagick so docs are consistently built. Closes: #621410 2011-04-06 21:53:06 -04:00
Joey Hess
216ad1a4d3 Clear up short option confusion between --from and --force (-f is now --from, and there is no short option for --force). 2011-04-03 12:18:38 -04:00
Joey Hess
868300d4c1 unused/dropunused: support --from 2011-04-02 21:35:02 -04:00
Joey Hess
616e6f8a84 Use lowercase hash directories for locationlog files
to avoid some issues with git on OSX with the mixed-case directories. No
migration is needed; the old mixed case hash directories are still read;
new information is written to the new directories.
2011-04-02 13:49:03 -04:00
Joey Hess
1283ef73f8 releasing version 0.20110401 2011-04-01 21:31:37 -04:00
Joey Hess
ed7fc4fce9 Bugfix: copy --to --fast never really copied, fixed. 2011-04-01 12:34:06 -04:00
Joey Hess
a47ed922e1 add Remote.Directory 2011-03-30 13:24:36 -04:00
Joey Hess
9c96d86502 nasty hack to build when hS3 is not available
So, it would be nicer to just use Cabal and take advantage
of its conditional compilation support. But, Cabal seems to
lack good support for a package with an internal library that is used by
multiple executables. It wants to build everything twice or more.
That's too slow for me.

Anyway, fairly soon, I expect to upgrade hS3 to a requirment, and I
can just revert this.
2011-03-30 01:32:05 -04:00
Joey Hess
43bdebbc2d update 2011-03-29 18:24:26 -04:00
Joey Hess
996e5eee01 Merge branch 'master' into s3
Conflicts:
	debian/changelog
2011-03-28 16:34:58 -04:00
Joey Hess
0956f0dd15 fsck: Ensure that files and directories in .git/annex/objects have proper permissions. 2011-03-28 16:19:20 -04:00
Joey Hess
3162a724f1 S3 updates; gpg keys 2011-03-28 13:48:17 -04:00
Joey Hess
c5fc4f3d2a Merge branch 'master' into s3
Conflicts:
	debian/changelog
2011-03-28 13:20:58 -04:00
Joey Hess
1b6927995d releasing version 0.20110328 2011-03-28 11:12:32 -04:00
Joey Hess
016eea0280 Bugfix: Keys could be received into v1 annexes from v2 annexes, via v1 git-annex-shell. This results in some oddly named keys in the v1 annex. Recognise and fix those keys when upgrading, instead of crashing. 2011-03-28 09:27:28 -04:00
Joey Hess
1878745a46 more s3 docs 2011-03-28 02:13:26 -04:00
Joey Hess
a7bd63eb01 basic s3 remote start
But bucket name is not handled right; it needs to be globally unique.
2011-03-28 01:32:47 -04:00
Joey Hess
4868b64868 Provide a less expensive version of git annex copy --to, enabled via --fast. This assumes that location tracking information is correct, rather than contacting the remote for every file. 2011-03-27 18:34:30 -04:00
Joey Hess
f8693facab doc update 2011-03-27 17:30:44 -04:00
Joey Hess
8bcdf42b99 annex.diskreserve can be given in arbitrary units (ie "0.5 gigabytes") 2011-03-26 14:37:39 -04:00
Joey Hess
bc80ace96b releasing version 0.20110325 2011-03-25 00:51:12 -04:00
Joey Hess
03fdd0d56e dropunused: Significantly sped up; only read unused log file once. 2011-03-23 23:47:02 -04:00
Joey Hess
6246b807f7 migrate: Support migrating v1 SHA keys to v2 SHA keys with size information that can be used for free space checking. 2011-03-23 17:57:10 -04:00