Commit graph

3703 commits

Author SHA1 Message Date
https://www.google.com/accounts/o8/id?id=AItOawm2MUhwzcOSnZfYnmWu7_2dMrH4064OKyQ
f047e04179 2012-03-14 11:09:32 +00:00
Joey Hess
b27760aa68 Work around a bug in rsync (IMHO) introduced by openSUSE's SIP patch.
openSUSE patches rsync with a patch adding SIP protocol support.
https://gist.github.com/2026167

With this patch, running rsync with no hostname parameter is apparently
supposed to list SIP hosts on the network. Practically, it does nothing
and exits 0.

git-annex uses rsync in a very special way to allow git-annex-shell to be
run on the remote host, and so did not need to specify a hostname, or a
file to transfer as an rsync parameter. So it sent ":", a degenerate case of
"host:file".

But the patch cannot distinguish ":" from no host parameter
(a bug in the SIP patch surely).

Results were that getting files failed, as rsync seemed to succeed, but the
requested file failed to arrive. Also I think that sending files will
make git-annex think a file has been transferred to the remote when
really rsync does nothing.

The workaround for this buggy rsync patch is to use "dummy:" as the
hostname.
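
A minimal sketch of the workaround, not git-annex's actual code: the names
placeholderRemote and rsyncReceiveArgs are hypothetical, and it assumes the
transport command is handed to rsync through its -e option.

```haskell
-- Hypothetical sketch; these names are not taken from git-annex.

-- | The "host:file" parameter handed to rsync. The transport command does
-- the real work, so the host part only has to be non-empty.
placeholderRemote :: String
placeholderRemote = "dummy:"

-- | Build an rsync argument list that runs the given transport command
-- through rsync's -e option and receives into a local destination.
rsyncReceiveArgs :: String -> FilePath -> [String]
rsyncReceiveArgs transport dest = ["-e", transport, placeholderRemote, dest]
```

Because the host part is never empty, a patched rsync has no reason to treat
the parameter as missing.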
2012-03-12 22:53:43 -04:00
Joey Hess
59e2feeda1 Merge branch 'master' into bloom
Conflicts:
	doc/todo/git-annex_unused_eats_memory.mdwn
2012-03-12 16:33:16 -04:00
Joey Hess
6a95240dff note fixed 2012-03-12 16:32:54 -04:00
Joey Hess
94aff8b878 Merge branch 'master' into bloom
Conflicts:
	debian/changelog
2012-03-12 16:32:29 -04:00
Joey Hess
8540183a02 close 2012-03-12 16:31:41 -04:00
Joey Hess
77fb50e01a bloom branch 2012-03-12 16:20:17 -04:00
Joey Hess
25809ce2e0 finish bloom filters
Add tuning, docs, etc.

Not sure if status is the right place to report size.. perhaps unused
should report the size and also warn if it sees more keys than the bloom
filter allows?
2012-03-12 16:18:35 -04:00
Joey Hess
faf3a94fa7 added second stage bloom filter 2012-03-12 15:21:58 -04:00
Joey Hess
32f9742a88 fixed bloom filter creation space leak
it works!
2012-03-12 14:09:43 -04:00
https://www.google.com/accounts/o8/id?id=AItOawl-J5N9y-JBa_GcOQ4VQXIF8MjAtxgN67w
b6caf8997d Formatting 2012-03-12 11:22:21 +00:00
http://joey.kitenet.net/
a886fe1601 Added a comment 2012-03-12 06:43:03 +00:00
Joey Hess
160715166b try at using bloom filters
leaks memory
2012-03-12 02:39:25 -04:00
https://www.google.com/accounts/o8/id?id=AItOawne9wwsAaMzo0kGyidj6PW_3_IA8eeDv7Y
98ff74a851 2012-03-12 05:16:28 +00:00
Joey Hess
89ee70c43a status: More accurate display of sizes of tmp and bad keys.
Can't trust the key size to be accurate for tmp and bad keys, so check
actual file size. In the wild I saw the old code be wrong by a factor
of about 100!

If all tmp/bad keys are empty, they're not shown in status at all.
Showing 0 bytes and suggesting to clean it up seemed weird..
2012-03-12 00:41:48 -04:00
Joey Hess
83bbb3bc93 prettify 2012-03-11 21:21:51 -04:00
Joey Hess
5df18b311a avoid needing to keep list of present keys
Stale and bad files are rare, so it's more efficient to use inAnnex to see
if they can be deleted, rather than keeping the list of all present keys
around for them.
2012-03-11 20:46:03 -04:00
Joey Hess
6fd0c0bfec move 2012-03-11 18:12:36 -04:00
Joey Hess
b325694645 getKeysPresent is now fully lazy
.. Allowing it to be used by things in constant space!

Random statistics: git annex status has gone from taking 239 mb
of memory and 26 seconds in a repo, to 8 mb and 13 seconds.

The trick here is the unsafeInterleaveIO, and the form of the function's
recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat.
The difference is, this one goes to a limited depth and avoids statting
everything.
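
The general shape of the unsafeInterleaveIO trick can be sketched as below.
This is a generic illustration under assumptions, not getKeysPresent itself:
lazyMapM is a hypothetical helper, and the real function walks directories
rather than mapping over a list.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Like mapM, but each action is deferred until its list cell is demanded,
-- so a consumer can stream the results instead of buffering them all.
-- Hypothetical helper for illustration; not git-annex's getKeysPresent.
lazyMapM :: (a -> IO b) -> [a] -> IO [b]
lazyMapM _ [] = return []
lazyMapM f (x:xs) = unsafeInterleaveIO $ do
    y <- f x
    ys <- lazyMapM f xs  -- returns at once; the tail stays an unevaluated thunk
    return (y : ys)
```

The deferred actions run whenever the list is consumed, so this form only
suits actions whose timing does not matter, such as reading directory
contents that are not expected to change mid-run.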
2012-03-11 18:04:58 -04:00
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
b086e32c63 unused: Reduce memory usage significantly.
Much of the memory bloat turned out to be due to getKeysReferenced
containing a mapM, which is strict and buffered the whole list
rather than streaming it.

The other half of the bloat was due to building a temporary Set
in order to call S.difference. While that is more cpu efficient,
I switched to successive S.delete, since with it, I can run a whole
git annex unused in less than 8 mb of memory.

The whole Set of keys with content available is still stored in memory,
so running unused in a repo with a whole lot of file content will still
use more memory. In a repo containing 6000 files, it needed 40 mb.

Note that the status command still uses the bloatful getKeysReferenced.
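
The switch from S.difference to successive S.delete can be sketched as
follows. This is an illustration only: unusedKeys is a hypothetical name, and
the key type is left generic.

```haskell
import Data.List (foldl')
import qualified Data.Set as S

-- | Delete each referenced key from the set of present keys as it arrives.
-- The result equals S.difference present (S.fromList referenced), but the
-- referenced keys are never collected into a second Set.
-- Hypothetical helper; the name is not from git-annex.
unusedKeys :: Ord k => S.Set k -> [k] -> S.Set k
unusedKeys = foldl' (flip S.delete)
```

If the list of referenced keys is produced lazily, each key can be garbage
collected once it has been deleted from the set, which is where the memory
saving would come from.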
2012-03-11 16:24:07 -04:00
Joey Hess
a13949bf37 fix link 2012-03-11 11:52:26 -04:00
Joey Hess
f2e4187323 fix link 2012-03-11 11:52:02 -04:00
Joey Hess
8e4c66781f Merge branch 'master' of ssh://git-annex.branchable.com 2012-03-11 11:51:33 -04:00
http://joey.kitenet.net/
c90e4fdb66 Added a comment 2012-03-11 15:50:11 +00:00
http://claimid.com/FooBarWidget
43dff01dff 2012-03-11 09:22:58 +00:00
Joey Hess
94d7b323ee remove cruft 2012-03-10 23:02:17 -04:00
Joey Hess
997e29f294 sync: Sync to lower cost remotes first.
This has two benefits.

1. When a lot of refs are going to be received, get them via lower cost
   connection when possible.
2. Allows ctrl-c of sync after the cheaper remotes have been pulled from
   (or pushed to).
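
A minimal sketch of the ordering, assuming each remote carries a numeric cost.
The Remote record, its field names, and syncOrder are hypothetical and only
illustrate the idea.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- | Hypothetical remote record; the field names are illustrative only.
data Remote = Remote
    { remoteName :: String
    , remoteCost :: Int
    } deriving (Eq, Show)

-- | Order remotes so the cheapest is synced first. sortBy is stable, so
-- remotes with equal cost keep their original relative order.
syncOrder :: [Remote] -> [Remote]
syncOrder = sortBy (comparing remoteCost)
```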
2012-03-10 15:37:38 -04:00
Joey Hess
5ab82230f7 fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. 2012-03-10 14:46:21 -04:00
Joey Hess
468fecc315 Setup.hs: import configure
Rather than running make, which runs configure, let Setup.hs just include
the configure code. The standalone configure is retained for use by the
Makefile.

This may work better with cabal-dev, since it avoids the Makefile running
ghc, and lets cabal handle all the compiler running, with whatever
flags it uses to expose dependencies.
2012-03-10 14:00:26 -04:00
Joey Hess
eaa80be917 move text dependency into same block with the other dependencies 2012-03-10 14:00:06 -04:00
Joey Hess
13598d9432 add other-modules for hsc files 2012-03-10 12:47:57 -04:00
Joey Hess
f9d44cccd9 perhaps more clear type 2012-03-10 11:38:38 -04:00
Joey Hess
10d9315b59 cleanup 2012-03-09 20:43:50 -04:00
Joey Hess
41c0d9e969 add news item for git-annex 3.20120309 2012-03-09 20:15:29 -04:00
Joey Hess
433b5fe59e releasing version 3.20120309 2012-03-09 20:14:34 -04:00
Joey Hess
bca3fd65b9 fix key directory hash calculation code
Fix Key directory hash calculation code to behave as it did before version
3.20120227 when a key contains non-ascii.

The hash directories for a given Key are based on its md5sum.
Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was
taken of each byte in turn. With the ghc 7.4 filename encoding change,
keys contain decoded unicode characters (possibly with surrogates for
undecodable bytes). This changes the result of the md5sum, since the md5sum
used is pure haskell and supports unicode. And that won't do, as git-annex
will start looking in a different hash directory for the content of a key.

The surrogates are particularly bad, since that's essentially a ghc
implementation detail, so could change again at any time. Also, changing
the locale changes how the bytes are decoded, which can also change
the md5sum.
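
Why decoding changes the hash can be shown with a small sketch. Everything
here is an assumption made for illustration: toyChecksum is a stand-in for
md5 (which is not reimplemented here), and the example name is hypothetical.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- | Toy checksum standing in for md5, for illustration only.
toyChecksum :: [Int] -> Int
toyChecksum = foldl' step 7
  where
    step acc b = (acc * 31 + b) `mod` 65521

-- | The name "caf\233" as the raw UTF-8 bytes stored on disk.
rawBytes :: [Int]
rawBytes = [0x63, 0x61, 0x66, 0xC3, 0xA9]

-- | The same name after decoding: one code point per character.
decodedChars :: [Int]
decodedChars = map ord "caf\233"

-- | True when the two views of the same name hash differently.
hashesDiffer :: Bool
hashesDiffer = toyChecksum rawBytes /= toyChecksum decodedChars
```

The raw form has five values and the decoded form has four, so any hash fed
one form instead of the other sees different input and will almost certainly
pick a different hash directory.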

Symptoms would include things like:

* git annex fsck would complain that no copies existed of a file,
  despite its symlink pointing to the content that was locally present
* git annex fix would change the symlink to use the wrong hash
  directory.

Only the WORM backend is likely to have been affected, since only it tends
to include much filename data (SHA1E could in theory also be affected).

I have not tried to support the hash directories used by git-annex versions
3.20120227 to 3.20120308, so things added with those versions with WORM
will require manual fixups. Sorry for the inconvenience!
2012-03-09 20:03:51 -04:00
Joey Hess
d6e77595ba factor out Utility.FileSystemEncoding 2012-03-09 19:08:10 -04:00
Joey Hess
789254747b refactor 2012-03-09 18:52:03 -04:00
Joey Hess
581dc819e1 version base dependency for ghc 7.4 2012-03-06 17:32:18 -04:00
Joey Hess
dc9049373e cleanup 2012-03-06 14:12:15 -04:00
Joey Hess
d08ee1a9d2 syscall optimisation 2012-03-06 13:56:20 -04:00
Joey Hess
cd6fd4a1d1 Merge branch 'master' of ssh://git-annex.branchable.com 2012-03-06 13:23:32 -04:00
http://joey.kitenet.net/
a78f699190 Added a comment 2012-03-06 17:22:54 +00:00
Joey Hess
b927dfd970 remove addurl test
addurl --fast used to avoid network, but it always uses it now, getting at
least size. Thus not appropriate for test suite without a lot of work.
2012-03-06 13:21:46 -04:00
https://www.google.com/accounts/o8/id?id=AItOawk_LOahrm_Cdg7io-_H0CNKkaxsRRQgRFo
ca936cd2d8 Added a comment: Test suite failure 2012-03-06 11:20:36 +00:00
http://peter-simons.myopenid.com/
b4b36b6ebe Added a comment 2012-03-05 23:29:42 +00:00
http://joey.kitenet.net/
d2835d4304 Added a comment 2012-03-05 21:32:00 +00:00
http://joey.kitenet.net/
614208ad52 removed 2012-03-05 21:30:08 +00:00
http://joey.kitenet.net/
eedc774c8a Added a comment 2012-03-05 21:29:46 +00:00