Commit graph

576 commits

Author SHA1 Message Date
Joey Hess
52c5b164d8 Added a annex.queuesize setting
useful when adding hundreds of thousands of files on a system with plenty
of memory.

git add gets quite slow in such a large repository, so if the system has
more than the ~32 mb of memory the queue can use by default, it's a useful
optimisation to increase the queue size, in order to decrease the number
of times git add is run.
2012-02-15 11:14:19 -04:00
Joey Hess
7ebd98d8d8 fix memory leak when staging the journal
The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
2012-02-14 14:37:59 -04:00
Joey Hess
a40ec5e03e Fixed a memory leak due to excessive strictness when committing journal files.
When hashing the files, the entire list of shas was read strictly.
That was entirely unnecessary, since there's a cleanup action run
after they're consumed.
2012-02-14 11:20:34 -04:00
Joey Hess
cb631ce518 whereis: Prints the urls of files that the web special remote knows about. 2012-02-14 03:49:48 -04:00
Joey Hess
59b2adea4f changelog for a964012fc3
Turns out that commit really made some serious improvements to memory use.
With the lazy state monad, git-annex add in a huge tree grew seemingly
without bound until it overflowed the stack. With the strict monad,
it uses 42 mb max.

It's possible another change since the 3.20120123 release fixed that,
but a964012fc3 seems most likely.
2012-02-13 16:58:58 -04:00
Joey Hess
17fed709c8 addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key. 2012-02-10 19:23:46 -04:00
Joey Hess
9030f68452 When checking that an url has a key, verify that the Content-Length, if available, matches the size of the key.
If there's no Content-Length, or the key has no size, this check is not
done, but it should happen most of the time, and protect against web
content that has changed.
2012-02-10 19:23:41 -04:00
Joey Hess
d55f3c0716 Fix teardown of stale cached ssh connections. 2012-02-09 21:49:46 -04:00
Joey Hess
1c0bd81ba6 addurl: Normalize badly encoded urls. 2012-02-09 14:19:58 -04:00
Joey Hess
ef013506cb addurl: Added a --file option
Can be used to specify what file the url is added to. This can be used to
override the default filename that is used when adding an url, which is
based on the url. Or, when the file already exists, the url is recorded as
another location of the file.
2012-02-08 15:35:29 -04:00
Joey Hess
57a747d081 S3: Fix irrefutable pattern failure when accessing encrypted S3 credentials. 2012-02-08 11:41:15 -04:00
Joey Hess
995bf51e10 correction 2012-02-07 16:52:39 -04:00
Joey Hess
3f4f96228e changelog 2012-02-06 20:42:49 -04:00
Joey Hess
91fc975964 note 7.4 needed 2012-02-04 14:51:52 -04:00
Joey Hess
ed64bd8a4b remove; unused 2012-01-30 13:20:36 -04:00
Joey Hess
b81d662cbf Avoid repeated location log commits when a remote is receiving files.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.

This is used by git-annex-shell and other places where changes are made to
a remote's location log.
2012-01-28 15:41:52 -04:00
Joey Hess
ce5637498f remove Utility.Conditional and use IfElse
This drops the >>! and >>? with the nice low fixity. IfElse does have
undocumented >>=>>! and >>=>>? operators, but I deem that too fishy.
Anyway, using whenM and unlessM is easier; I sometimes mixed the operators
up.
2012-01-24 16:22:07 -04:00
Joey Hess
20d0288802 releasing version 3.20120123 2012-01-23 15:09:50 -04:00
Joey Hess
47250a153a ssh connection caching
Ssh connection caching is now enabled automatically by git-annex. Only one
ssh connection is made to each host per git-annex run, which can speed some
things up a lot, as well as avoiding repeated password prompts. Concurrent
git-annex processes also share ssh connections. Cached ssh connections are
shut down when git-annex exits.

Note: The rsync special remote does not yet participate in the ssh
connection caching.
2012-01-20 17:14:56 -04:00
Joey Hess
61dbad505d fsck --from remote --fast
Avoids expensive file transfers, at the expense of checking file size
and/or contents.

Required some reworking of the remote code.
2012-01-20 13:23:11 -04:00
Joey Hess
711c154561 update NEWS
Add news item recommending fscking directory special remotes.

Remote news item about URL backend being removed; it was later added back
to be used by git annex addurl --fast.

Link NEWS into top level.
2012-01-19 15:27:39 -04:00
Joey Hess
90319afa41 fsck --from
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.

(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)

Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.

If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
2012-01-19 15:24:05 -04:00
Joey Hess
2837e8fef1 releasing version 3.20120116 2012-01-16 16:52:26 -04:00
Joey Hess
f161b5eb59 Fix data loss bug in directory special remote
When moving a file to the remote failed, and partially transferred content
was left behind in the directory, re-running the same move would think it
succeeded and delete the local copy.

I reproduced data loss when moving files to a partition that was almost
full. Interrupting a transfer could have similar results.

Easily fixed by using a temp file which is then moved atomically into place
once the transfer completes.

I've audited other calls to copyFileExternal, and other special remote
file transfer code; everything else seems to use temp files correctly
(rsync, git), or otherwise use atomic transfers (bup, S3).
2012-01-16 16:28:15 -04:00
Joey Hess
e3ea5fe938 debhelper v9
kills that ugly python message during build
2012-01-15 14:53:38 -04:00
Joey Hess
ce608303a3 releasing version 3.20120115 2012-01-15 14:02:32 -04:00
Joey Hess
37b5b1bf0d Fix QuickCheck dependency in cabal file. 2012-01-15 13:53:51 -04:00
Joey Hess
81856c3175 add a configure check for StatFS
This way, the build log will indicate whether StatFS can be relied on.
I've tested all the failing architectures now, and on all of them,
the StatFS code now returns Nothing, rather than Just nonsense.

Also, if annex.diskreserve is set on a platform where StatFS is not
working, git-annex will complain.

Also, the Makefile was missing the sources target used when building with
cabal.
2012-01-15 13:49:32 -04:00
Joey Hess
0eed604446 Add a sanity check for bad StatFS results.
git-annex FTBFS on s390, mips, powerpc, sparc. That StatFS code is failing
on all of them. At least on s390, the failure appears as:

Just (FileSystemStats {fsStatBlockSize = 4096, fsStatBlockCount = 0,
fsStatByteCount = 0, fsStatBytesFree = 0, fsStatBytesAvailable = 0,
fsStatBytesUsed = 0})

While I don't understand why this is happening, or how to fix it,
bandaid over it by checking for obviously bad values and returning Nothing.
That disables disk free space checking, but at least git-annex will work.

Upstream bug: http://code.google.com/p/xmobar/issues/detail?id=70
2012-01-14 17:17:20 -04:00
Joey Hess
b88ecbdc1b Add libghc-testpack-dev to build depends on all arches. 2012-01-13 15:50:56 -04:00
Joey Hess
1ae780ee79 git-annex, git-union-merge: Support GIT_DIR and GIT_WORK_TREE.
Note that GIT_WORK_TREE cannot influence GIT_DIR; that is necessary for
git-fake-bare and vcsh type things to work.
2012-01-13 12:52:09 -04:00
Joey Hess
0d5c402210 Add annex-trustlevel configuration settings, which can be used to override the trust level of a remote.
This overrides the trust.log, and is overridden by the command-line trust
parameters.

It would have been nicer to have Logs.Trust.trustMap just look up the
configuration for all remotes, but a dependency loop prevented that
(Remotes depends on Logs.Trust in several ways). So instead, look up
the configuration when building remotes, storing it in the same forcetrust
field used for the command-line trust parameters.
2012-01-09 23:31:44 -04:00
Joey Hess
7675b83efa map: Fix display of remote repos
A change to break local cycles made remote repos be dropped entirely.
2012-01-08 16:05:57 -04:00
Joey Hess
a35278430a log: Add --gource mode, which generates output usable by gource.
As part of this, I fixed up how log was getting the descriptions of
remotes.
2012-01-07 18:18:09 -04:00
Joey Hess
3da28cad07 releasing version 3.20120106 2012-01-07 13:50:35 -04:00
Joey Hess
60c1aeeb6f Fix overbroad gpg --no-tty fix from last release.
Only set --no-tty when GPG_AGENT_INFO is set and batch mode is used.

In the test suite, set GPG_AGENT_INFO to /dev/null to avoid the test suite
relying on /dev/tty.
2012-01-07 12:38:08 -04:00
Joey Hess
b59759e33c typo 2012-01-06 17:52:16 -04:00
Joey Hess
a3a9f87047 log: New command that displays the location log for file, showing each repository they were added to and removed from.
This needs to run git log on the location log files to get at all past
versions of the file, which tends to be a bit slow.

It would be possible to make a version optimised for showing the location
logs for every key. That would only need to run git log once, so would be
faster, but it would need to process an enormous amount of data, so
would not speed up the individual file case.

In the future it would be nice to support log --format. log --json also
doesn't work right yet.
2012-01-06 15:40:07 -04:00
Joey Hess
f534fcc7b1 remove S3stub stuff
Let's keep that in a no-s3 branch, which can be merged into eg,
debian-stable.
2012-01-05 23:14:10 -04:00
Joey Hess
c371c40a88 Don't list S3 as a remote type when built without S3 support. 2012-01-05 23:11:07 -04:00
Joey Hess
0b27e6baa0 Support unescaped repository urls, like git does.
Turns out that git will accept a .git/config containing an url with eg,
spaces in its name. Handle this by escaping the url if it's not valid.

This also fixes support for urls containing escaped characters like %20
for space. Before, the path from the url was not unescaped properly.
2012-01-05 14:32:20 -04:00
Joey Hess
338d472ca2 releasing version 3.20120105 2012-01-05 13:51:13 -04:00
Joey Hess
769edd6b08 Run gpg with --no-tty. Closes: #654721 2012-01-05 13:44:09 -04:00
Joey Hess
a1aea174d7 fsck: Do backend-specific check before checking numcopies is satisfied.
This way, when a checksum check fails and the content is moved aside,
the numcopies check also warns if there are not enough copies.
2012-01-03 18:40:47 -04:00
Joey Hess
7e6a54f984 Added quickcheck to build dependencies, and fail if test suite cannot be built. 2012-01-03 14:52:20 -04:00
Joey Hess
34abd7bca8 no implicit dotfiles in add
Dotfiles, and files inside dotdirs are not added by "git annex add" unless
the dotfile or directory is explicitly listed. So "git annex add ." will
add all untracked files in the current directory except for those in
dotdirs.

One reason for this is that it will make git-annex more usable with vcsh,
where you don't want "vcsh big annex add" to check in all the dotfiles
that are already versioned in other repositories.

(If you're using vcsh for repos that contain non-dotfiles, this won't help,
and you'll need to .gitignore such things, but this will cover the common
case.)

A more general reason why this seems like a good idea is the same reason ls
ignores dotfiles, just the unix convention that they are cruft that is kept
out of the way most of the time.

All the other git-annex commands still do deal with any dotfiles that do
get into the annex. This seemed right because if I've gone to the trouble
to add a dotfile, I will want "git annex get ." to get it along with
everything else.
2012-01-03 00:11:00 -04:00
Joey Hess
f0c4a1c770 annex.web-options also works 2012-01-02 14:22:50 -04:00
Joey Hess
aa0882691b Added remote.name.annex-web-options configuration setting, which can be used to provide parameters to whichever of wget or curl git-annex uses (depends on which is available, but most of their important options suitable for use here are the same). 2012-01-02 14:20:20 -04:00
Joey Hess
9b12701b9e releasing version 3.20111231 2011-12-31 15:07:45 -04:00
Joey Hess
e7d3e546c2 sync --fast: Selects some of the remotes with the lowest annex.cost and syncs those, in addition to any specified at the command line. 2011-12-30 21:17:36 -04:00
Joey Hess
dd8451f0f8 update 2011-12-30 20:40:59 -04:00
Joey Hess
8f4fdb3f97 Merge branch 'new-monad-control'
Conflicts:
	debian/changelog
2011-12-30 20:08:01 -04:00
Joey Hess
5287d1dc3f fixed behavior when multiple insteadOf configs are provided for the same url base
Consider this git config --list case:

url.git+ssh://git@example.com/.insteadOf=gl
url.git+ssh://git@example.com/.insteadOf=shared

Since config is stored in a Map, only the last of the values for this key
was stored and available for use by the insteadOf code. But that
is wrong; git allows either "gl" or "shared" to be used in an url and
the insteadOf value to be substituted in.

To support this, it seems best to keep the existing config map as-is,
and add a second map that accumulates a list of multiple values for
config keys. This new fullconfig map can be used in the rare places where
multiple values for a key make sense, without needing to complicate
everything else.

Haskell's laziness and data sharing keep the overhead of adding
this second map low.
2011-12-30 14:07:46 -04:00
Joey Hess
85f1f3a63a Updated to build with monad-control 0.3. 2011-12-24 23:05:23 -04:00
Joey Hess
fdf02986cf find --json 2011-12-23 01:08:19 -04:00
Joey Hess
06bafae9e0 Format strings can be specified using the new --find option, to control what is output by git annex find. 2011-12-22 18:31:44 -04:00
Joey Hess
5a275a3f5d Can now be built with older git versions (before 1.7.7); the resulting binary should only be used with old git.
Remove git old version check from configure, and use the git version
it was built against in the git check-attr code.
2011-12-22 15:01:13 -04:00
Joey Hess
6bffe509d7 Add --include, which is the same as --not --exclude. 2011-12-22 14:00:17 -04:00
Joey Hess
20482712d0 Improve deletion of files from rsync special remotes. Closes: #652849
Rsync is only run once, with include / exclude rules used to specify
exactly what to delete. This is faster, and avoids ugly error messages
from rsync, and doesn't fail if the content already got deleted somehow.
2011-12-21 16:57:03 -04:00
Joey Hess
a76b13b848 test fsck in bare repos (75%) 2011-12-21 14:20:41 -04:00
Joey Hess
8cdcd78b21 test bup special remote (74% coverage) 2011-12-21 13:50:33 -04:00
Joey Hess
c61f3d7b7b test coverage improvements 2011-12-21 12:46:14 -04:00
Joey Hess
82a145df91 test encrypted special remote
This involved adding a test harness to run gpg with a dummy key, and lots
of fun.
2011-12-20 23:24:06 -04:00
Joey Hess
cc88abd0ad Test suite improvements. Current top-level test coverage: 68%
Been higher before, but a lot of new code has been added.
2011-12-20 17:31:25 -04:00
Joey Hess
1c28237e0c map: --fast disables use of dot to display map
Generally useful, and allows the test suite to test it.
2011-12-20 16:42:35 -04:00
Joey Hess
da0bdc1a57 Fix the hook special remote, which bitrotted a while ago. 2011-12-20 12:23:49 -04:00
Joey Hess
09cd042775 Properly handle multiline git config values.
A crash on parsing was fixed a while ago. This adds support for fully
correctly parsing multiline git config values, using git config --null.

Since git-annex-shell configlist uses normal git config output, I left in
support for that too; the two forms of config output can be easily
identified by the parser. Since configlist only prints the annex.uuid
config, there's no risk of multiline values there, so no need to change it.
2011-12-15 12:48:27 -04:00
Joey Hess
6edaabd040 reinject: Add a sanity check for using an annexed file as the source file. 2011-12-12 13:43:52 -04:00
Joey Hess
acd7a52dfd always find optimal merge
Testing b9ac585454, it didn't find the
optimal union merge, the second sha was the one to use, at least in
the case I tried. Let's just try all shas to see if any can be reused.

I stopped using the expensive nub, so despite the use of sets to
sort/uniq file contents, this is probably as fast or faster than it
was before.
2011-12-12 01:59:29 -04:00
Joey Hess
acb2d5a5a6 releasing version 3.20111211 2011-12-11 21:55:51 -04:00
Joey Hess
8680c415de slow, stupid, and safe index updating
Always merge the git-annex branch into .git/annex/index before making a
commit from the index.

This ensures that, when the branch has been changed in any way
(by a push being received, or changes pulled directly into it, or
even by the user checking it out, and committing a change), the index
reflects those changes.

This is much too slow; it needs to be optimised to only update the
index when the branch has really changed, not every time.

Also, there is an unhandled race, when a change is made to the branch
right after the index gets updated. I left it in for now because it's
unlikely and I didn't want to complicate things with additional locking
yet.
2011-12-11 15:05:53 -04:00
Joey Hess
10e8028a42 Fix bug in last version in getting contents from bare repositories. 2011-12-10 18:45:55 -04:00
Joey Hess
c5267802f3 version dependency on old monad-control
This should let cabal build it with the right version.
2011-12-10 12:56:02 -04:00
Joey Hess
fb8231f3a1 sync: New command that synchronises the local repository and default remote, by running git commit, pull, and push for you. 2011-12-09 20:27:22 -04:00
Joey Hess
14e9b87d44 unannex improvements
Added files don't have to be committed before they can be unannexed.

unannex no longer commits existing staged changes

unannex of the last file in a directory now works, before it failed because
git rm deleted the directory out from under it,
2011-12-09 13:07:31 -04:00
Joey Hess
e3f1568e0f Fix caching of decrypted ciphers, which failed when drop had to check multiple different encrypted special remotes. 2011-12-08 16:01:46 -04:00
Joey Hess
8047bba5b9 add: If interrupted, add can leave files converted to symlinks but not yet added to git. Running the add again will now clean up this situtation. 2011-12-07 16:53:53 -04:00
Joey Hess
480495beb4 Prevent key names from containing newlines.
There are several places where it's assumed a key can be written on one
line. One is in the format of the .git/annex/unused files. The difficult
one is that filenames derived from keys are fed into git cat-file --batch,
which has a line based input. (And no -z option.)

So, for now it's best to block such keys being created.
2011-12-06 13:03:09 -04:00
Joey Hess
b6c8a0119a map: Fix a failure to detect a loop when both repositories are local and refer to each other with relative paths. 2011-12-04 12:23:10 -04:00
Joey Hess
ff5df842ea releasing version 3.20111203 2011-12-03 21:13:21 -04:00
Joey Hess
251c01d51e dead: A command which says that a repository is gone for good and you don't want git-annex to mention it again. 2011-12-02 16:59:55 -04:00
Joey Hess
fb68a7881f convert rsync special backend to using both hash directory types 2011-12-02 15:50:27 -04:00
Joey Hess
97f809c006 wording 2011-12-02 14:18:55 -04:00
Joey Hess
998d8f7968 clarify 2011-11-28 23:23:14 -04:00
Joey Hess
f4bf444ae0 store content in hashDirLower directories in bare repositories
When storing content in bare repositories, use the hashDirLower
directories. Bare repositories can be on USB drives, which might
use the FAT filesystem, and fall afoul of recent bugs in linux's handling
of mixed case on FAT. Using hashDirLower avoids that.
2011-11-28 22:55:40 -04:00
Joey Hess
e32ab766b0 --inbackend can be used to make git-annex only operate on files whose content is stored using a specified key-value backend. 2011-11-28 17:45:47 -04:00
Joey Hess
6869e6023e support .git/annex on a different disk than the rest of the repo
The only fully supported thing is to have the main repository on one disk,
and .git/annex on another. Only commands that move data in/out of the annex
will need to copy it across devices.

There is only partial support for putting arbitrary subdirectories of
.git/annex on different devices. For one thing, but this can require more
copies to be done. For example, when .git/annex/tmp is on one device, and
.git/annex/journal on another, every journal write involves a call to
mv(1). Also, there are a few places that make hard links between various
subdirectories of .git/annex with createLink, that are not handled.

In the common case without cross-device, the new moveFile is actually
faster than renameFile, avoiding an unncessary stat to check that a file
(not a directory) is being moved. Of course if a cross-device move is
needed, it is as slow as mv(1) of the data.
2011-11-28 16:17:55 -04:00
Joey Hess
2bf3addf49 Bugfix: dropunused did not drop keys with two spaces in their name. 2011-11-27 13:50:05 -04:00
Joey Hess
a72f0ecc27 changelog 2011-11-26 12:06:03 -04:00
Joey Hess
12243d2279 Flush json output, avoiding a buffering problem that could result in doubled output.
The bug was that with --json, output lines were sometimes doubled. For
example, git annex init --json would output two lines, despite only running
one thing. Adding to the weirdness, this only occurred when the output
was redirected to a pipe or a file.

Strace showed two processes outputting the same buffered output.
The second process was this writer process (only needed to work around
bug #624389):

                _ <- forkProcess $ do
                        hPutStr toh $ unlines paths
                        hClose toh
                        exitSuccess

The doubled output occurs when this process exits, and ghc flushes the
inherited stdout buffer. Why only when piping? I don't know, but ghc may
be behaving differently when stdout is not a terminal.

While this is quite possibly a ghc bug, there is a nice fix in git-annex.
Explicitly flushing after each chunk of json is output works around the
problem, and as a side effect, json is streamed rather than being output
all at the end when performing an expensive operaition.

However, note that this means all uses of putStr in git-annex must be
explicitly flushed. The others were, already.
2011-11-25 11:51:06 -04:00
Joey Hess
75a590bdd8 Put a workaround in the directory special remote for strange behavior with VFAT filesystems on Linux (mounted with shortname=mixed) 2011-11-22 18:21:28 -04:00
Joey Hess
322d9b1cc0 releasing version 3.20111122 2011-11-22 14:40:11 -04:00
Joey Hess
7f7ae7a3b1 find: Support --print0
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
2011-11-22 14:06:31 -04:00
Joey Hess
d675f1c82e status --json now shows most things
Left out the backend usage graph for now, and bad/temp directory sizes
are only displayed when present. Also, disk usage is returned as a string
with units, which I can see changing later.
2011-11-20 14:12:48 -04:00
Joey Hess
c50a5fbeb4 status: Include all special remotes in the list of repositories.
Special remotes do not always have a description listed in uuid.log,
and such ones were not listed before.
2011-11-18 13:22:48 -04:00
Joey Hess
1326bb8635 Avoid excessive escaping for rsync special remotes that are not accessed over ssh.
This is actually tricky, 45bbf210a1 added
the escaping because it's needed for rsync that does go over ssh.
So I had to detect whether the remote's rsync url will use ssh or not,
and vary the escaping.
2011-11-18 12:53:48 -04:00
Joey Hess
c70b78d40a migrate: Don't fall over a stale temp file. 2011-11-17 18:29:28 -04:00
Joey Hess
2bb6b02948 When not run in a git repository, git-annex can still display a usage message, and "git annex version" even works.
Things that sound simple, but are made hard by the Annex monad being built
with the assumption that there will always be a git repo.
2011-11-16 00:49:09 -04:00
Joey Hess
84784e2ca1 cleanup 2011-11-16 00:07:06 -04:00
Joey Hess
21a925dcf1 merge: Now runs in constant space.
Before, a merge was first calculated, by running various actions that
called git and built up a list of lines, which were at the end sent
to git update-index. This necessarily used space proportional to the size
of the diff between the trees being merged.

Now, lines are streamed into git update-index from each of the actions in
turn.

Runtime size of git-annex merge when merging 50000 location log files
drops from around 100 mb to a constant 4 mb.

Presumably it runs quite a lot faster, too.
2011-11-15 23:28:01 -04:00