Commit graph

3545 commits

Author SHA1 Message Date
Joey Hess
faf3a94fa7 added second stage bloom filter 2012-03-12 15:21:58 -04:00
Joey Hess
32f9742a88 fixed bloom filter creation space leak
it works!
2012-03-12 14:09:43 -04:00
https://www.google.com/accounts/o8/id?id=AItOawl-J5N9y-JBa_GcOQ4VQXIF8MjAtxgN67w
b6caf8997d Formatting 2012-03-12 11:22:21 +00:00
http://joey.kitenet.net/
a886fe1601 Added a comment 2012-03-12 06:43:03 +00:00
Joey Hess
160715166b try at using bloom filters
leaks memory
2012-03-12 02:39:25 -04:00
https://www.google.com/accounts/o8/id?id=AItOawne9wwsAaMzo0kGyidj6PW_3_IA8eeDv7Y
98ff74a851 2012-03-12 05:16:28 +00:00
Joey Hess
89ee70c43a status: More accurate display of sizes of tmp and bad keys.
Can't trust the key size to be accurate for tmp and bad keys, so check
actual file size. In the wild I saw the old code be wrong by a factor
of about 100!

If all tmp/bad keys are empty, they're not shown in status at all.
Showing 0 bytes and suggesting to clean it up seemed weird..
2012-03-12 00:41:48 -04:00
Joey Hess
83bbb3bc93 prettify 2012-03-11 21:21:51 -04:00
Joey Hess
5df18b311a avoid needing to keep list of present keys
Stale and bad files are rare, so it's more efficient to use inAnnex to see
if they can be deleted, rather than keeping the list of all present keys
around for them.
2012-03-11 20:46:03 -04:00
Joey Hess
6fd0c0bfec move 2012-03-11 18:12:36 -04:00
Joey Hess
b325694645 getKeysPresent is now fully lazy
.. Allowing it to be used by things in constant space!

Random statistics: git annex status has gone from taking 239 mb
of memory and 26 seconds in a repo, to 8 mb and 13 seconds.

The trick here is the unsafeInterleaveIO, and the form of the function's
recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat.
The difference is, this one goes to a limited depth and avoids statting
everything.
2012-03-11 18:04:58 -04:00
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
b086e32c63 unused: Reduce memory usage significantly.
Much of the memory bloat turned out to be due to getKeysReferenced
containing a mapM, which is strict and buffered the whole list
rather than streaming it.

The other half of the bloat was due to building a temporary Set
in order to call S.difference. While that is more cpu efficient,
I switched to successive S.delete, since with it, I can run a whole
git annex unused in less than 8 mb of memory.

The whole Set of keys with content available is still stored in memory,
so running unused in a repo with a whole lot of file content will still
use more memory. In a repo containing 6000 files, it needed 40 mb.

Note that the status command still uses the bloatful getKeysReferenced.
2012-03-11 16:24:07 -04:00
Joey Hess
a13949bf37 fix link 2012-03-11 11:52:26 -04:00
Joey Hess
f2e4187323 fix link 2012-03-11 11:52:02 -04:00
Joey Hess
8e4c66781f Merge branch 'master' of ssh://git-annex.branchable.com 2012-03-11 11:51:33 -04:00
http://joey.kitenet.net/
c90e4fdb66 Added a comment 2012-03-11 15:50:11 +00:00
http://claimid.com/FooBarWidget
43dff01dff 2012-03-11 09:22:58 +00:00
Joey Hess
94d7b323ee remove cruft 2012-03-10 23:02:17 -04:00
Joey Hess
997e29f294 sync: Sync to lower cost remotes first.
This has two benefits.

1. When a lot of refs are going to be received, get them via lower cost
   connection when possible.
2. Allows ctrl-c of sync after the cheaper remotes have been pulled from
   (or pushed to).
2012-03-10 15:37:38 -04:00
Joey Hess
5ab82230f7 fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. 2012-03-10 14:46:21 -04:00
Joey Hess
468fecc315 Setup.hs: import configure
Rather than running make, which runs configure, let Setup.hs just include
the configure code. The standalone configure is retained for use by the
Makefile.

This may work better with cabal-dev, since it avoids the Makefile running
ghc, and lets cabal handle all the compiler running, with whatever
flags it uses to expose dependencies.
2012-03-10 14:00:26 -04:00
Joey Hess
eaa80be917 move text dependency into same block with the other dependencies 2012-03-10 14:00:06 -04:00
Joey Hess
13598d9432 add other-modules for hsc files 2012-03-10 12:47:57 -04:00
Joey Hess
f9d44cccd9 perhaps more clear type 2012-03-10 11:38:38 -04:00
Joey Hess
10d9315b59 cleanup 2012-03-09 20:43:50 -04:00
Joey Hess
41c0d9e969 add news item for git-annex 3.20120309 2012-03-09 20:15:29 -04:00
Joey Hess
433b5fe59e releasing version 3.20120309 2012-03-09 20:14:34 -04:00
Joey Hess
bca3fd65b9 fix key directory hash calculation code
Fix Key directory hash calculation code to behave as it did before version
3.20120227 when a key contains non-ascii.

The hash directories for a given Key are based on its md5sum.
Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was
taken of each byte in turn. With the ghc 7.4 filename encoding change,
keys contains decoded unicode characters (possibly with surrigates for
undecodable bytes). This changes the result of the md5sum, since the md5sum
used is pure haskell and supports unicode. And that won't do, as git-annex
will start looking in a different hash directory for the content of a key.

The surrigates are particularly bad, since that's essentially a ghc
implementation detail, so could change again at any time. Also, changing
the locale changes how the bytes are decoded, which can also change
the md5sum.

Symptoms would include things like:

* git annex fsck would complain that no copies existed of a file,
  despite its symlink pointing to the content that was locally present
* git annex fix would change the symlink to use the wrong hash
  directory.

Only WORM backend is likely to have been affected, since only it tends
to include much filename data (SHA1E could in theory also be affected).

I have not tried to support the hash directories used by git-annex versions
3.20120227 to 3.20120308, so things added with those versions with WORM
will require manual fixups. Sorry for the inconvenience!
2012-03-09 20:03:51 -04:00
Joey Hess
d6e77595ba factor out Utility.FileSystemEncoding 2012-03-09 19:08:10 -04:00
Joey Hess
789254747b refactor 2012-03-09 18:52:03 -04:00
Joey Hess
581dc819e1 version base dependency for ghc 7.4 2012-03-06 17:32:18 -04:00
Joey Hess
dc9049373e cleanup 2012-03-06 14:12:15 -04:00
Joey Hess
d08ee1a9d2 syscall optimisation 2012-03-06 13:56:20 -04:00
Joey Hess
cd6fd4a1d1 Merge branch 'master' of ssh://git-annex.branchable.com 2012-03-06 13:23:32 -04:00
http://joey.kitenet.net/
a78f699190 Added a comment 2012-03-06 17:22:54 +00:00
Joey Hess
b927dfd970 remove addurl test
addurl --fast used to avoid network, but it always uses it now, getting at
least size. Thus not appropriate for test suite without a lot of work.
2012-03-06 13:21:46 -04:00
https://www.google.com/accounts/o8/id?id=AItOawk_LOahrm_Cdg7io-_H0CNKkaxsRRQgRFo
ca936cd2d8 Added a comment: Test suite failure 2012-03-06 11:20:36 +00:00
http://peter-simons.myopenid.com/
b4b36b6ebe Added a comment 2012-03-05 23:29:42 +00:00
http://joey.kitenet.net/
d2835d4304 Added a comment 2012-03-05 21:32:00 +00:00
http://joey.kitenet.net/
614208ad52 removed 2012-03-05 21:30:08 +00:00
http://joey.kitenet.net/
eedc774c8a Added a comment 2012-03-05 21:29:46 +00:00
http://peter-simons.myopenid.com/
6a1e334a78 Added a comment 2012-03-05 21:10:48 +00:00
Joey Hess
ee806c1175 add news item for git-annex 3.20120230 2012-03-05 13:47:35 -04:00
Joey Hess
0d41899304 releasing version 3.20120230 2012-03-05 13:47:20 -04:00
Joey Hess
51338486dc Fix a bug in symlink calculation code, that triggered in rare cases where an annexed file is in a subdirectory that nearly matched to the .git/annex/object/xx/yy subdirectories.
This is a straight up pure-code stinker. The relative path calculation
looked for common subdirectories in the two paths, but failed to stop
after the paths diverged. When a later pair of subdirectories were the
same, the resulting relative path was wrong.

Added regression test for this.
2012-03-05 12:42:52 -04:00
Joey Hess
52e88f3ebf add remote start and stop hooks
Locking is used, so that, if there are multiple git-annex processes
using a remote concurrently, the stop hook is only run by the last
process that uses it.
2012-03-04 19:12:58 -04:00
Joey Hess
fba66c55ed foo 2012-03-04 13:16:35 -04:00
Joey Hess
3960825cef better chunked file retrieval
Avoids opening every chunk at once, instead streaming them in.

Not done for encrypted file retrieval yet.
2012-03-04 11:48:23 -04:00
Joey Hess
612ca3cf2e tweak davfs2 settings 2012-03-04 11:12:52 -04:00