Commit graph

507 commits

Author SHA1 Message Date
Joey Hess
8cae4115a8 releasing version 3.20120227 2012-02-27 13:07:04 -04:00
Joey Hess
2fd294d06f move --from, copy --from: 10 times faster scanning remote on local disk
Rather than go through the location log to see which files are present on
the remote, it simply looks at the disk contents directly.

I benchmarked this speeding up scanning 834 files, from an annex on my
phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.)

Also benchmarked 8139 files, from an annex on spinning storage,
speeding up from 103.17 to 13.39 seconds.

Note that benchmarking with an encrypted annex on flash actually showed a
minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems
the overhead of doing the crypto needed to get the filenames to directly
check can be higher than the overhead of looking up data in the location
log. (Which says good things about how well the location log and git have
been optimised!) It *may* make sense to make encrypted local remotes not
have hasKeyCheap set; further benchmarking is called for.
2012-02-26 14:59:48 -04:00
Joey Hess
12b89a3eb8 configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly. 2012-02-25 19:15:29 -04:00
Joey Hess
c3fbe07d7a do a cleanup commit after moving data from or to a git remote
Added Annex.cleanup, which is a general purpose interface for adding
actions to run at the end.

Remotes with the old git-annex-shell will commit every time, and have no
commit command, so hide stderr when running the commit command.
2012-02-25 18:02:49 -04:00
Joey Hess
1f73db3469 improve alwayscommit=false mode
Now changes are staged into the branch's index, but not committed,
which avoids growing a large journal. And sync and merge always
explicitly commit, ensuring that even when they do nothing else,
they commit the staged changes.

Added a flag file to indicate that the branch's journal contains
uncommitted changes. (Could use git ls-files, but don't want to run
that every time.)

In the future, this ability to have uncommitted changes staged in the
journal might be used on remotes after a series of oneshot commands.
2012-02-25 16:18:55 -04:00
Joey Hess
b49c0c2633 add annex.alwayscommit option
To avoid commits of data to the git-annex branch after each command
is run, set annex.alwayscommit=false. Its data will then be committed
less frequently, when a merge or sync is done.
2012-02-25 15:31:42 -04:00
Joey Hess
bd66f962d3 Deal with NFS problem that caused a failure to remove a directory when removing content from the annex.
I was able to reproduce this on linux using the kernel's nfs server and
mounting localhost:/. Determined that removing the directory fails when
the just-deleted file in it was locked. Considered dropping the lock
before removing the directory, but this would complicate parts of the code
that should not need to worry about locking. So instead, ignore the failure
to remove the directory in this case.

While I was at it, made it attempt to remove both levels of hash
directories, in case they're empty.
2012-02-24 16:30:47 -04:00
Joey Hess
5bf07b3b5c Store web special remote url info in a more efficient location.
storing it in remotes/web/xx/yy/foo.log meant lots of extra directory
objects in git. Now I use xx/yy/foo.log.web, which is just as unique, but
more efficient since foo.log is there anyway.

Of course, it still looks in the old location too.
2012-02-17 23:15:29 -04:00
Joey Hess
db6b4cdfcf rekey: New plumbing level command, can be used to change the keys used for files en masse. 2012-02-16 16:36:35 -04:00
Joey Hess
aeaaa0ff87 reorder 2012-02-16 15:07:59 -04:00
Joey Hess
39c3f56b33 addurl: Add --pathdepth option. 2012-02-16 12:25:19 -04:00
Joey Hess
63152428e9 changelog 2012-02-15 17:33:21 -04:00
Joey Hess
52c5b164d8 Added a annex.queuesize setting
useful when adding hundreds of thousands of files on a system with plenty
of memory.

git add gets quite slow in such a large repository, so if the system has
more than the ~32 mb of memory the queue can use by default, it's a useful
optimisation to increase the queue size, in order to decrease the number
of times git add is run.
2012-02-15 11:14:19 -04:00
Joey Hess
7ebd98d8d8 fix memory leak when staging the journal
The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
2012-02-14 14:37:59 -04:00
Joey Hess
a40ec5e03e Fixed a memory leak due to excessive strictness when committing journal files.
When hashing the files, the entire list of shas was read strictly.
That was entirely unnecessary, since there's a cleanup action run
after they're consumed.
2012-02-14 11:20:34 -04:00
Joey Hess
cb631ce518 whereis: Prints the urls of files that the web special remote knows about. 2012-02-14 03:49:48 -04:00
Joey Hess
59b2adea4f changelog for a964012fc3
Turns out that commit really made some serious improvements to memory use.
With the lazy state monad, git-annex add in a huge tree grew seemingly
without bound until it overflowed the stack. With the strict monad,
it uses 42 mb max.

It's possible another change since the 3.20120123 release fixed that,
but a964012fc3 seems most likely.
2012-02-13 16:58:58 -04:00
Joey Hess
17fed709c8 addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key. 2012-02-10 19:23:46 -04:00
Joey Hess
9030f68452 When checking that an url has a key, verify that the Content-Length, if available, matches the size of the key.
If there's no Content-Length, or the key has no size, this check is not
done, but it should happen most of the time, and protect against web
content that has changed.
2012-02-10 19:23:41 -04:00
Joey Hess
d55f3c0716 Fix teardown of stale cached ssh connections. 2012-02-09 21:49:46 -04:00
Joey Hess
1c0bd81ba6 addurl: Normalize badly encoded urls. 2012-02-09 14:19:58 -04:00
Joey Hess
ef013506cb addurl: Added a --file option
Can be used to specify what file the url is added to. This can be used to
override the default filename that is used when adding an url, which is
based on the url. Or, when the file already exists, the url is recorded as
another location of the file.
2012-02-08 15:35:29 -04:00
Joey Hess
57a747d081 S3: Fix irrefutable pattern failure when accessing encrypted S3 credentials. 2012-02-08 11:41:15 -04:00
Joey Hess
995bf51e10 correction 2012-02-07 16:52:39 -04:00
Joey Hess
3f4f96228e changelog 2012-02-06 20:42:49 -04:00
Joey Hess
b81d662cbf Avoid repeated location log commits when a remote is receiving files.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.

This is used by git-annex-shell and other places where changes are made to
a remote's location log.
2012-01-28 15:41:52 -04:00
Joey Hess
ce5637498f remove Utility.Conditional and use IfElse
This drops the >>! and >>? with the nice low fixity. IfElse does have
undocumented >>=>>! and >>=>>? operators, but I deem that too fishy.
Anyway, using whenM and unlessM is easier; I sometimes mixed the operators
up.
2012-01-24 16:22:07 -04:00
Joey Hess
20d0288802 releasing version 3.20120123 2012-01-23 15:09:50 -04:00
Joey Hess
47250a153a ssh connection caching
Ssh connection caching is now enabled automatically by git-annex. Only one
ssh connection is made to each host per git-annex run, which can speed some
things up a lot, as well as avoiding repeated password prompts. Concurrent
git-annex processes also share ssh connections. Cached ssh connections are
shut down when git-annex exits.

Note: The rsync special remote does not yet participate in the ssh
connection caching.
2012-01-20 17:14:56 -04:00
Joey Hess
61dbad505d fsck --from remote --fast
Avoids expensive file transfers, at the expense of checking file size
and/or contents.

Required some reworking of the remote code.
2012-01-20 13:23:11 -04:00
Joey Hess
90319afa41 fsck --from
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.

(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)

Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.

If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
2012-01-19 15:24:05 -04:00
Joey Hess
2837e8fef1 releasing version 3.20120116 2012-01-16 16:52:26 -04:00
Joey Hess
f161b5eb59 Fix data loss bug in directory special remote
When moving a file to the remote failed, and partially transferred content
was left behind in the directory, re-running the same move would think it
succeeded and delete the local copy.

I reproduced data loss when moving files to a partition that was almost
full. Interrupting a transfer could have similar results.

Easily fixed by using a temp file which is then moved atomically into place
once the transfer completes.

I've audited other calls to copyFileExternal, and other special remote
file transfer code; everything else seems to use temp files correctly
(rsync, git), or otherwise use atomic transfers (bup, S3).
2012-01-16 16:28:15 -04:00
Joey Hess
ce608303a3 releasing version 3.20120115 2012-01-15 14:02:32 -04:00
Joey Hess
37b5b1bf0d Fix QuickCheck dependency in cabal file. 2012-01-15 13:53:51 -04:00
Joey Hess
81856c3175 add a configure check for StatFS
This way, the build log will indicate whether StatFS can be relied on.
I've tested all the failing architectures now, and on all of them,
the StatFS code now returns Nothing, rather than Just nonsense.

Also, if annex.diskreserve is set on a platform where StatFS is not
working, git-annex will complain.

Also, the Makefile was missing the sources target used when building with
cabal.
2012-01-15 13:49:32 -04:00
Joey Hess
0eed604446 Add a sanity check for bad StatFS results.
git-annex FTBFS on s390, mips, powerpc, sparc. That StatFS code is failing
on all of them. At least on s390, the failure appears as:

Just (FileSystemStats {fsStatBlockSize = 4096, fsStatBlockCount = 0,
fsStatByteCount = 0, fsStatBytesFree = 0, fsStatBytesAvailable = 0,
fsStatBytesUsed = 0})

While I don't understand why this is happening, or how to fix it,
bandaid over it by checking for obviously bad values and returning Nothing.
That disables disk free space checking, but at least git-annex will work.

Upstream bug: http://code.google.com/p/xmobar/issues/detail?id=70
2012-01-14 17:17:20 -04:00
Joey Hess
b88ecbdc1b Add libghc-testpack-dev to build depends on all arches. 2012-01-13 15:50:56 -04:00
Joey Hess
1ae780ee79 git-annex, git-union-merge: Support GIT_DIR and GIT_WORK_TREE.
Note that GIT_WORK_TREE cannot influence GIT_DIR; that is necessary for
git-fake-bare and vcsh type things to work.
2012-01-13 12:52:09 -04:00
Joey Hess
0d5c402210 Add annex-trustlevel configuration settings, which can be used to override the trust level of a remote.
This overrides the trust.log, and is overridden by the command-line trust
parameters.

It would have been nicer to have Logs.Trust.trustMap just look up the
configuration for all remotes, but a dependency loop prevented that
(Remotes depends on Logs.Trust in several ways). So instead, look up
the configuration when building remotes, storing it in the same forcetrust
field used for the command-line trust parameters.
2012-01-09 23:31:44 -04:00
Joey Hess
7675b83efa map: Fix display of remote repos
A change to break local cycles made remote repos be dropped entirely.
2012-01-08 16:05:57 -04:00
Joey Hess
a35278430a log: Add --gource mode, which generates output usable by gource.
As part of this, I fixed up how log was getting the descriptions of
remotes.
2012-01-07 18:18:09 -04:00
Joey Hess
3da28cad07 releasing version 3.20120106 2012-01-07 13:50:35 -04:00
Joey Hess
60c1aeeb6f Fix overbroad gpg --no-tty fix from last release.
Only set --no-tty when GPG_AGENT_INFO is set and batch mode is used.

In the test suite, set GPG_AGENT_INFO to /dev/null to avoid the test suite
relying on /dev/tty.
2012-01-07 12:38:08 -04:00
Joey Hess
b59759e33c typo 2012-01-06 17:52:16 -04:00
Joey Hess
a3a9f87047 log: New command that displays the location log for file, showing each repository they were added to and removed from.
This needs to run git log on the location log files to get at all past
versions of the file, which tends to be a bit slow.

It would be possible to make a version optimised for showing the location
logs for every key. That would only need to run git log once, so would be
faster, but it would need to process an enormous amount of data, so
would not speed up the individual file case.

In the future it would be nice to support log --format. log --json also
doesn't work right yet.
2012-01-06 15:40:07 -04:00
Joey Hess
f534fcc7b1 remove S3stub stuff
Let's keep that in a no-s3 branch, which can be merged into eg,
debian-stable.
2012-01-05 23:14:10 -04:00
Joey Hess
c371c40a88 Don't list S3 as a remote type when built without S3 support. 2012-01-05 23:11:07 -04:00
Joey Hess
0b27e6baa0 Support unescaped repository urls, like git does.
Turns out that git will accept a .git/config containing an url with eg,
spaces in its name. Handle this by escaping the url if it's not valid.

This also fixes support for urls containing escaped characters like %20
for space. Before, the path from the url was not unescaped properly.
2012-01-05 14:32:20 -04:00
Joey Hess
338d472ca2 releasing version 3.20120105 2012-01-05 13:51:13 -04:00