Commit graph

805 commits

Author SHA1 Message Date
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
b086e32c63 unused: Reduce memory usage significantly.
Much of the memory bloat turned out to be due to getKeysReferenced
containing a mapM, which is strict and buffered the whole list
rather than streaming it.

The other half of the bloat was due to building a temporary Set
in order to call S.difference. While that is more cpu efficient,
I switched to successive S.delete, since with it, I can run a whole
git annex unused in less than 8 mb of memory.

The whole Set of keys with content available is still stored in memory,
so running unused in a repo with a whole lot of file content will still
use more memory. In a repo containing 6000 files, it needed 40 mb.

Note that the status command still uses the bloatful getKeysReferenced.
2012-03-11 16:24:07 -04:00
Joey Hess
997e29f294 sync: Sync to lower cost remotes first.
This has two benefits.

1. When a lot of refs are going to be received, get them via lower cost
   connection when possible.
2. Allows ctrl-c of sync after the cheaper remotes have been pulled from
   (or pushed to).
2012-03-10 15:37:38 -04:00
Joey Hess
5ab82230f7 fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. 2012-03-10 14:46:21 -04:00
Joey Hess
433b5fe59e releasing version 3.20120309 2012-03-09 20:14:34 -04:00
Joey Hess
bca3fd65b9 fix key directory hash calculation code
Fix Key directory hash calculation code to behave as it did before version
3.20120227 when a key contains non-ascii.

The hash directories for a given Key are based on its md5sum.
Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was
taken of each byte in turn. With the ghc 7.4 filename encoding change,
keys contains decoded unicode characters (possibly with surrigates for
undecodable bytes). This changes the result of the md5sum, since the md5sum
used is pure haskell and supports unicode. And that won't do, as git-annex
will start looking in a different hash directory for the content of a key.

The surrigates are particularly bad, since that's essentially a ghc
implementation detail, so could change again at any time. Also, changing
the locale changes how the bytes are decoded, which can also change
the md5sum.

Symptoms would include things like:

* git annex fsck would complain that no copies existed of a file,
  despite its symlink pointing to the content that was locally present
* git annex fix would change the symlink to use the wrong hash
  directory.

Only WORM backend is likely to have been affected, since only it tends
to include much filename data (SHA1E could in theory also be affected).

I have not tried to support the hash directories used by git-annex versions
3.20120227 to 3.20120308, so things added with those versions with WORM
will require manual fixups. Sorry for the inconvenience!
2012-03-09 20:03:51 -04:00
Joey Hess
0d41899304 releasing version 3.20120230 2012-03-05 13:47:20 -04:00
Joey Hess
51338486dc Fix a bug in symlink calculation code, that triggered in rare cases where an annexed file is in a subdirectory that nearly matched to the .git/annex/object/xx/yy subdirectories.
This is a straight up pure-code stinker. The relative path calculation
looked for common subdirectories in the two paths, but failed to stop
after the paths diverged. When a later pair of subdirectories were the
same, the resulting relative path was wrong.

Added regression test for this.
2012-03-05 12:42:52 -04:00
Joey Hess
52e88f3ebf add remote start and stop hooks
Locking is used, so that, if there are multiple git-annex processes
using a remote concurrently, the stop hook is only run by the last
process that uses it.
2012-03-04 19:12:58 -04:00
Joey Hess
9856c24a59 Add progress bar display to the directory special remote.
So far I've only written progress bars for sending files, not yet
receiving.

No longer uses external cp at all. ByteString IO is fast enough.
2012-03-04 03:17:25 -04:00
Joey Hess
3436aba6de Directory special remotes now support chunking files written to them
Avoiding writing files larger than a specified size is useful on certian
things. For example, box.com has a file size limit of 100 mb. Could also
be useful on really crappy removable media.
2012-03-03 18:05:55 -04:00
Joey Hess
1098bc37ab "here" can be used to refer to the current repository, which can read better than the old "." (which still works too). 2012-03-01 22:35:10 -04:00
Joey Hess
6571831b92 releasing version 3.20120229 2012-02-29 02:39:44 -04:00
Joey Hess
e5fee3f352 Fix test suite to not require a unicode locale.
Without a unicode locale, it will fail to print a unicode filename to
console, and fails.
2012-02-29 02:32:05 -04:00
Joey Hess
8cae4115a8 releasing version 3.20120227 2012-02-27 13:07:04 -04:00
Joey Hess
2fd294d06f move --from, copy --from: 10 times faster scanning remote on local disk
Rather than go through the location log to see which files are present on
the remote, it simply looks at the disk contents directly.

I benchmarked this speeding up scanning 834 files, from an annex on my
phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.)

Also benchmarked 8139 files, from an annex on spinning storage,
speeding up from 103.17 to 13.39 seconds.

Note that benchmarking with an encrypted annex on flash actually showed a
minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems
the overhead of doing the crypto needed to get the filenames to directly
check can be higher than the overhead of looking up data in the location
log. (Which says good things about how well the location log and git have
been optimised!) It *may* make sense to make encrypted local remotes not
have hasKeyCheap set; further benchmarking is called for.
2012-02-26 14:59:48 -04:00
Joey Hess
b889581945 version dependency on openssh-client
This is only to ensure that it's as new a version as it was built with, so
partial upgrades work.
2012-02-25 19:31:46 -04:00
Joey Hess
12b89a3eb8 configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly. 2012-02-25 19:15:29 -04:00
Joey Hess
c3fbe07d7a do a cleanup commit after moving data from or to a git remote
Added Annex.cleanup, which is a general purpose interface for adding
actions to run at the end.

Remotes with the old git-annex-shell will commit every time, and have no
commit command, so hide stderr when running the commit command.
2012-02-25 18:02:49 -04:00
Joey Hess
1f73db3469 improve alwayscommit=false mode
Now changes are staged into the branch's index, but not committed,
which avoids growing a large journal. And sync and merge always
explicitly commit, ensuring that even when they do nothing else,
they commit the staged changes.

Added a flag file to indicate that the branch's journal contains
uncommitted changes. (Could use git ls-files, but don't want to run
that every time.)

In the future, this ability to have uncommitted changes staged in the
journal might be used on remotes after a series of oneshot commands.
2012-02-25 16:18:55 -04:00
Joey Hess
b49c0c2633 add annex.alwayscommit option
To avoid commits of data to the git-annex branch after each command
is run, set annex.alwayscommit=false. Its data will then be committed
less frequently, when a merge or sync is done.
2012-02-25 15:31:42 -04:00
Joey Hess
df3a310b83 update copyright format url 2012-02-25 10:40:05 -04:00
Joey Hess
bd66f962d3 Deal with NFS problem that caused a failure to remove a directory when removing content from the annex.
I was able to reproduce this on linux using the kernel's nfs server and
mounting localhost:/. Determined that removing the directory fails when
the just-deleted file in it was locked. Considered dropping the lock
before removing the directory, but this would complicate parts of the code
that should not need to worry about locking. So instead, ignore the failure
to remove the directory in this case.

While I was at it, made it attempt to remove both levels of hash
directories, in case they're empty.
2012-02-24 16:30:47 -04:00
Joey Hess
5bf07b3b5c Store web special remote url info in a more efficient location.
storing it in remotes/web/xx/yy/foo.log meant lots of extra directory
objects in git. Now I use xx/yy/foo.log.web, which is just as unique, but
more efficient since foo.log is there anyway.

Of course, it still looks in the old location too.
2012-02-17 23:15:29 -04:00
Joey Hess
db6b4cdfcf rekey: New plumbing level command, can be used to change the keys used for files en masse. 2012-02-16 16:36:35 -04:00
Joey Hess
aeaaa0ff87 reorder 2012-02-16 15:07:59 -04:00
Joey Hess
39c3f56b33 addurl: Add --pathdepth option. 2012-02-16 12:25:19 -04:00
Joey Hess
4d8afc1713 tweak wording 2012-02-15 19:43:15 -04:00
Joey Hess
63152428e9 changelog 2012-02-15 17:33:21 -04:00
Joey Hess
52c5b164d8 Added a annex.queuesize setting
useful when adding hundreds of thousands of files on a system with plenty
of memory.

git add gets quite slow in such a large repository, so if the system has
more than the ~32 mb of memory the queue can use by default, it's a useful
optimisation to increase the queue size, in order to decrease the number
of times git add is run.
2012-02-15 11:14:19 -04:00
Joey Hess
7ebd98d8d8 fix memory leak when staging the journal
The list of files had to be retained until the end so it could be deleted.
Also, a list of update-index lines was generated and only then fed into it.
Now everything streams in constant space.
2012-02-14 14:37:59 -04:00
Joey Hess
a40ec5e03e Fixed a memory leak due to excessive strictness when committing journal files.
When hashing the files, the entire list of shas was read strictly.
That was entirely unnecessary, since there's a cleanup action run
after they're consumed.
2012-02-14 11:20:34 -04:00
Joey Hess
cb631ce518 whereis: Prints the urls of files that the web special remote knows about. 2012-02-14 03:49:48 -04:00
Joey Hess
59b2adea4f changelog for a964012fc3
Turns out that commit really made some serious improvements to memory use.
With the lazy state monad, git-annex add in a huge tree grew seemingly
without bound until it overflowed the stack. With the strict monad,
it uses 42 mb max.

It's possible another change since the 3.20120123 release fixed that,
but a964012fc3 seems most likely.
2012-02-13 16:58:58 -04:00
Joey Hess
17fed709c8 addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key. 2012-02-10 19:23:46 -04:00
Joey Hess
9030f68452 When checking that an url has a key, verify that the Content-Length, if available, matches the size of the key.
If there's no Content-Length, or the key has no size, this check is not
done, but it should happen most of the time, and protect against web
content that has changed.
2012-02-10 19:23:41 -04:00
Joey Hess
d55f3c0716 Fix teardown of stale cached ssh connections. 2012-02-09 21:49:46 -04:00
Joey Hess
1c0bd81ba6 addurl: Normalize badly encoded urls. 2012-02-09 14:19:58 -04:00
Joey Hess
ef013506cb addurl: Added a --file option
Can be used to specify what file the url is added to. This can be used to
override the default filename that is used when adding an url, which is
based on the url. Or, when the file already exists, the url is recorded as
another location of the file.
2012-02-08 15:35:29 -04:00
Joey Hess
57a747d081 S3: Fix irrefutable pattern failure when accessing encrypted S3 credentials. 2012-02-08 11:41:15 -04:00
Joey Hess
995bf51e10 correction 2012-02-07 16:52:39 -04:00
Joey Hess
3f4f96228e changelog 2012-02-06 20:42:49 -04:00
Joey Hess
91fc975964 note 7.4 needed 2012-02-04 14:51:52 -04:00
Joey Hess
ed64bd8a4b remove; unused 2012-01-30 13:20:36 -04:00
Joey Hess
b81d662cbf Avoid repeated location log commits when a remote is receiving files.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.

This is used by git-annex-shell and other places where changes are made to
a remote's location log.
2012-01-28 15:41:52 -04:00
Joey Hess
ce5637498f remove Utility.Conditional and use IfElse
This drops the >>! and >>? with the nice low fixity. IfElse does have
undocumented >>=>>! and >>=>>? operators, but I deem that too fishy.
Anyway, using whenM and unlessM is easier; I sometimes mixed the operators
up.
2012-01-24 16:22:07 -04:00
Joey Hess
20d0288802 releasing version 3.20120123 2012-01-23 15:09:50 -04:00
Joey Hess
47250a153a ssh connection caching
Ssh connection caching is now enabled automatically by git-annex. Only one
ssh connection is made to each host per git-annex run, which can speed some
things up a lot, as well as avoiding repeated password prompts. Concurrent
git-annex processes also share ssh connections. Cached ssh connections are
shut down when git-annex exits.

Note: The rsync special remote does not yet participate in the ssh
connection caching.
2012-01-20 17:14:56 -04:00
Joey Hess
61dbad505d fsck --from remote --fast
Avoids expensive file transfers, at the expense of checking file size
and/or contents.

Required some reworking of the remote code.
2012-01-20 13:23:11 -04:00
Joey Hess
711c154561 update NEWS
Add news item recommending fscking directory special remotes.

Remote news item about URL backend being removed; it was later added back
to be used by git annex addurl --fast.

Link NEWS into top level.
2012-01-19 15:27:39 -04:00
Joey Hess
90319afa41 fsck --from
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.

(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)

Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.

If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
2012-01-19 15:24:05 -04:00
Joey Hess
2837e8fef1 releasing version 3.20120116 2012-01-16 16:52:26 -04:00
Joey Hess
f161b5eb59 Fix data loss bug in directory special remote
When moving a file to the remote failed, and partially transferred content
was left behind in the directory, re-running the same move would think it
succeeded and delete the local copy.

I reproduced data loss when moving files to a partition that was almost
full. Interrupting a transfer could have similar results.

Easily fixed by using a temp file which is then moved atomically into place
once the transfer completes.

I've audited other calls to copyFileExternal, and other special remote
file transfer code; everything else seems to use temp files correctly
(rsync, git), or otherwise use atomic transfers (bup, S3).
2012-01-16 16:28:15 -04:00
Joey Hess
e3ea5fe938 debhelper v9
kills that ugly python message during build
2012-01-15 14:53:38 -04:00
Joey Hess
ce608303a3 releasing version 3.20120115 2012-01-15 14:02:32 -04:00
Joey Hess
37b5b1bf0d Fix QuickCheck dependency in cabal file. 2012-01-15 13:53:51 -04:00
Joey Hess
81856c3175 add a configure check for StatFS
This way, the build log will indicate whether StatFS can be relied on.
I've tested all the failing architectures now, and on all of them,
the StatFS code now returns Nothing, rather than Just nonsense.

Also, if annex.diskreserve is set on a platform where StatFS is not
working, git-annex will complain.

Also, the Makefile was missing the sources target used when building with
cabal.
2012-01-15 13:49:32 -04:00
Joey Hess
0eed604446 Add a sanity check for bad StatFS results.
git-annex FTBFS on s390, mips, powerpc, sparc. That StatFS code is failing
on all of them. At least on s390, the failure appears as:

Just (FileSystemStats {fsStatBlockSize = 4096, fsStatBlockCount = 0,
fsStatByteCount = 0, fsStatBytesFree = 0, fsStatBytesAvailable = 0,
fsStatBytesUsed = 0})

While I don't understand why this is happening, or how to fix it,
bandaid over it by checking for obviously bad values and returning Nothing.
That disables disk free space checking, but at least git-annex will work.

Upstream bug: http://code.google.com/p/xmobar/issues/detail?id=70
2012-01-14 17:17:20 -04:00
Joey Hess
b88ecbdc1b Add libghc-testpack-dev to build depends on all arches. 2012-01-13 15:50:56 -04:00
Joey Hess
1ae780ee79 git-annex, git-union-merge: Support GIT_DIR and GIT_WORK_TREE.
Note that GIT_WORK_TREE cannot influence GIT_DIR; that is necessary for
git-fake-bare and vcsh type things to work.
2012-01-13 12:52:09 -04:00
Joey Hess
0d5c402210 Add annex-trustlevel configuration settings, which can be used to override the trust level of a remote.
This overrides the trust.log, and is overridden by the command-line trust
parameters.

It would have been nicer to have Logs.Trust.trustMap just look up the
configuration for all remotes, but a dependency loop prevented that
(Remotes depends on Logs.Trust in several ways). So instead, look up
the configuration when building remotes, storing it in the same forcetrust
field used for the command-line trust parameters.
2012-01-09 23:31:44 -04:00
Joey Hess
7675b83efa map: Fix display of remote repos
A change to break local cycles made remote repos be dropped entirely.
2012-01-08 16:05:57 -04:00
Joey Hess
a35278430a log: Add --gource mode, which generates output usable by gource.
As part of this, I fixed up how log was getting the descriptions of
remotes.
2012-01-07 18:18:09 -04:00
Joey Hess
3da28cad07 releasing version 3.20120106 2012-01-07 13:50:35 -04:00
Joey Hess
60c1aeeb6f Fix overbroad gpg --no-tty fix from last release.
Only set --no-tty when GPG_AGENT_INFO is set and batch mode is used.

In the test suite, set GPG_AGENT_INFO to /dev/null to avoid the test suite
relying on /dev/tty.
2012-01-07 12:38:08 -04:00
Joey Hess
b59759e33c typo 2012-01-06 17:52:16 -04:00
Joey Hess
a3a9f87047 log: New command that displays the location log for file, showing each repository they were added to and removed from.
This needs to run git log on the location log files to get at all past
versions of the file, which tends to be a bit slow.

It would be possible to make a version optimised for showing the location
logs for every key. That would only need to run git log once, so would be
faster, but it would need to process an enormous amount of data, so
would not speed up the individual file case.

In the future it would be nice to support log --format. log --json also
doesn't work right yet.
2012-01-06 15:40:07 -04:00
Joey Hess
f534fcc7b1 remove S3stub stuff
Let's keep that in a no-s3 branch, which can be merged into eg,
debian-stable.
2012-01-05 23:14:10 -04:00
Joey Hess
c371c40a88 Don't list S3 as a remote type when built without S3 support. 2012-01-05 23:11:07 -04:00
Joey Hess
0b27e6baa0 Support unescaped repository urls, like git does.
Turns out that git will accept a .git/config containing an url with eg,
spaces in its name. Handle this by escaping the url if it's not valid.

This also fixes support for urls containing escaped characters like %20
for space. Before, the path from the url was not unescaped properly.
2012-01-05 14:32:20 -04:00
Joey Hess
338d472ca2 releasing version 3.20120105 2012-01-05 13:51:13 -04:00
Joey Hess
769edd6b08 Run gpg with --no-tty. Closes: #654721 2012-01-05 13:44:09 -04:00
Joey Hess
a1aea174d7 fsck: Do backend-specific check before checking numcopies is satisfied.
This way, when a checksum check fails and the content is moved aside,
the numcopies check also warns if there are not enough copies.
2012-01-03 18:40:47 -04:00
Joey Hess
7e6a54f984 Added quickcheck to build dependencies, and fail if test suite cannot be built. 2012-01-03 14:52:20 -04:00
Joey Hess
34abd7bca8 no implicit dotfiles in add
Dotfiles, and files inside dotdirs are not added by "git annex add" unless
the dotfile or directory is explicitly listed. So "git annex add ." will
add all untracked files in the current directory except for those in
dotdirs.

One reason for this is that it will make git-annex more usable with vcsh,
where you don't want "vcsh big annex add" to check in all the dotfiles
that are already versioned in other repositories.

(If you're using vcsh for repos that contain non-dotfiles, this won't help,
and you'll need to .gitignore such things, but this will cover the common
case.)

A more general reason why this seems like a good idea is the same reason ls
ignores dotfiles, just the unix convention that they are cruft that is kept
out of the way most of the time.

All the other git-annex commands still do deal with any dotfiles that do
get into the annex. This seemed right because if I've gone to the trouble
to add a dotfile, I will want "git annex get ." to get it along with
everything else.
2012-01-03 00:11:00 -04:00
Joey Hess
f0c4a1c770 annex.web-options also works 2012-01-02 14:22:50 -04:00
Joey Hess
aa0882691b Added remote.name.annex-web-options configuration setting, which can be used to provide parameters to whichever of wget or curl git-annex uses (depends on which is available, but most of their important options suitable for use here are the same). 2012-01-02 14:20:20 -04:00
Joey Hess
9b12701b9e releasing version 3.20111231 2011-12-31 15:07:45 -04:00
Joey Hess
e7d3e546c2 sync --fast: Selects some of the remotes with the lowest annex.cost and syncs those, in addition to any specified at the command line. 2011-12-30 21:17:36 -04:00
Joey Hess
dd8451f0f8 update 2011-12-30 20:40:59 -04:00
Joey Hess
8f4fdb3f97 Merge branch 'new-monad-control'
Conflicts:
	debian/changelog
2011-12-30 20:08:01 -04:00
Joey Hess
5287d1dc3f fixed behavior when multiple insteadOf configs are provided for the same url base
Consider this git config --list case:

url.git+ssh://git@example.com/.insteadOf=gl
url.git+ssh://git@example.com/.insteadOf=shared

Since config is stored in a Map, only the last of the values for this key
was stored and available for use by the insteadOf code. But that
is wrong; git allows either "gl" or "shared" to be used in an url and
the insteadOf value to be substituted in.

To support this, it seems best to keep the existing config map as-is,
and add a second map that accumulates a list of multiple values for
config keys. This new fullconfig map can be used in the rare places where
multiple values for a key make sense, without needing to complicate
everything else.

Haskell's laziness and data sharing keep the overhead of adding
this second map low.
2011-12-30 14:07:46 -04:00
Joey Hess
85f1f3a63a Updated to build with monad-control 0.3. 2011-12-24 23:05:23 -04:00
Joey Hess
fdf02986cf find --json 2011-12-23 01:08:19 -04:00
Joey Hess
06bafae9e0 Format strings can be specified using the new --find option, to control what is output by git annex find. 2011-12-22 18:31:44 -04:00
Joey Hess
5a275a3f5d Can now be built with older git versions (before 1.7.7); the resulting binary should only be used with old git.
Remove git old version check from configure, and use the git version
it was built against in the git check-attr code.
2011-12-22 15:01:13 -04:00
Joey Hess
6bffe509d7 Add --include, which is the same as --not --exclude. 2011-12-22 14:00:17 -04:00
Joey Hess
20482712d0 Improve deletion of files from rsync special remotes. Closes: #652849
Rsync is only run once, with include / exclude rules used to specify
exactly what to delete. This is faster, and avoids ugly error messages
from rsync, and doesn't fail if the content already got deleted somehow.
2011-12-21 16:57:03 -04:00
Joey Hess
a76b13b848 test fsck in bare repos (75%) 2011-12-21 14:20:41 -04:00
Joey Hess
8cdcd78b21 test bup special remote (74% coverage) 2011-12-21 13:50:33 -04:00
Joey Hess
c61f3d7b7b test coverage improvements 2011-12-21 12:46:14 -04:00
Joey Hess
82a145df91 test encrypted special remote
This involved adding a test harness to run gpg with a dummy key, and lots
of fun.
2011-12-20 23:24:06 -04:00
Joey Hess
cc88abd0ad Test suite improvements. Current top-level test coverage: 68%
Been higher before, but a lot of new code has been added.
2011-12-20 17:31:25 -04:00
Joey Hess
1c28237e0c map: --fast disables use of dot to display map
Generally useful, and allows the test suite to test it.
2011-12-20 16:42:35 -04:00
Joey Hess
da0bdc1a57 Fix the hook special remote, which bitrotted a while ago. 2011-12-20 12:23:49 -04:00
Joey Hess
09cd042775 Properly handle multiline git config values.
A crash on parsing was fixed a while ago. This adds support for fully
correctly parsing multiline git config values, using git config --null.

Since git-annex-shell configlist uses normal git config output, I left in
support for that too; the two forms of config output can be easily
identified by the parser. Since configlist only prints the annex.uuid
config, there's no risk of multiline values there, so no need to change it.
2011-12-15 12:48:27 -04:00
Joey Hess
6edaabd040 reinject: Add a sanity check for using an annexed file as the source file. 2011-12-12 13:43:52 -04:00
Joey Hess
acd7a52dfd always find optimal merge
Testing b9ac585454, it didn't find the
optimal union merge, the second sha was the one to use, at least in
the case I tried. Let's just try all shas to see if any can be reused.

I stopped using the expensive nub, so despite the use of sets to
sort/uniq file contents, this is probably as fast or faster than it
was before.
2011-12-12 01:59:29 -04:00
Joey Hess
acb2d5a5a6 releasing version 3.20111211 2011-12-11 21:55:51 -04:00
Joey Hess
8680c415de slow, stupid, and safe index updating
Always merge the git-annex branch into .git/annex/index before making a
commit from the index.

This ensures that, when the branch has been changed in any way
(by a push being received, or changes pulled directly into it, or
even by the user checking it out, and committing a change), the index
reflects those changes.

This is much too slow; it needs to be optimised to only update the
index when the branch has really changed, not every time.

Also, there is an unhandled race, when a change is made to the branch
right after the index gets updated. I left it in for now because it's
unlikely and I didn't want to complicate things with additional locking
yet.
2011-12-11 15:05:53 -04:00
Joey Hess
10e8028a42 Fix bug in last version in getting contents from bare repositories. 2011-12-10 18:45:55 -04:00
Joey Hess
c5267802f3 version dependency on old monad-control
This should let cabal build it with the right version.
2011-12-10 12:56:02 -04:00
Joey Hess
fb8231f3a1 sync: New command that synchronises the local repository and default remote, by running git commit, pull, and push for you. 2011-12-09 20:27:22 -04:00
Joey Hess
14e9b87d44 unannex improvements
Added files don't have to be committed before they can be unannexed.

unannex no longer commits existing staged changes

unannex of the last file in a directory now works, before it failed because
git rm deleted the directory out from under it,
2011-12-09 13:07:31 -04:00
Joey Hess
e3f1568e0f Fix caching of decrypted ciphers, which failed when drop had to check multiple different encrypted special remotes. 2011-12-08 16:01:46 -04:00
Joey Hess
8047bba5b9 add: If interrupted, add can leave files converted to symlinks but not yet added to git. Running the add again will now clean up this situtation. 2011-12-07 16:53:53 -04:00
Joey Hess
480495beb4 Prevent key names from containing newlines.
There are several places where it's assumed a key can be written on one
line. One is in the format of the .git/annex/unused files. The difficult
one is that filenames derived from keys are fed into git cat-file --batch,
which has a line based input. (And no -z option.)

So, for now it's best to block such keys being created.
2011-12-06 13:03:09 -04:00
Joey Hess
b6c8a0119a map: Fix a failure to detect a loop when both repositories are local and refer to each other with relative paths. 2011-12-04 12:23:10 -04:00
Joey Hess
ff5df842ea releasing version 3.20111203 2011-12-03 21:13:21 -04:00
Joey Hess
251c01d51e dead: A command which says that a repository is gone for good and you don't want git-annex to mention it again. 2011-12-02 16:59:55 -04:00
Joey Hess
fb68a7881f convert rsync special backend to using both hash directory types 2011-12-02 15:50:27 -04:00
Joey Hess
97f809c006 wording 2011-12-02 14:18:55 -04:00
Joey Hess
998d8f7968 clarify 2011-11-28 23:23:14 -04:00
Joey Hess
f4bf444ae0 store content in hashDirLower directories in bare repositories
When storing content in bare repositories, use the hashDirLower
directories. Bare repositories can be on USB drives, which might
use the FAT filesystem, and fall afoul of recent bugs in linux's handling
of mixed case on FAT. Using hashDirLower avoids that.
2011-11-28 22:55:40 -04:00
Joey Hess
e32ab766b0 --inbackend can be used to make git-annex only operate on files whose content is stored using a specified key-value backend. 2011-11-28 17:45:47 -04:00
Joey Hess
6869e6023e support .git/annex on a different disk than the rest of the repo
The only fully supported thing is to have the main repository on one disk,
and .git/annex on another. Only commands that move data in/out of the annex
will need to copy it across devices.

There is only partial support for putting arbitrary subdirectories of
.git/annex on different devices. For one thing, but this can require more
copies to be done. For example, when .git/annex/tmp is on one device, and
.git/annex/journal on another, every journal write involves a call to
mv(1). Also, there are a few places that make hard links between various
subdirectories of .git/annex with createLink, that are not handled.

In the common case without cross-device, the new moveFile is actually
faster than renameFile, avoiding an unncessary stat to check that a file
(not a directory) is being moved. Of course if a cross-device move is
needed, it is as slow as mv(1) of the data.
2011-11-28 16:17:55 -04:00
Joey Hess
2bf3addf49 Bugfix: dropunused did not drop keys with two spaces in their name. 2011-11-27 13:50:05 -04:00
Joey Hess
a72f0ecc27 changelog 2011-11-26 12:06:03 -04:00
Joey Hess
12243d2279 Flush json output, avoiding a buffering problem that could result in doubled output.
The bug was that with --json, output lines were sometimes doubled. For
example, git annex init --json would output two lines, despite only running
one thing. Adding to the weirdness, this only occurred when the output
was redirected to a pipe or a file.

Strace showed two processes outputting the same buffered output.
The second process was this writer process (only needed to work around
bug #624389):

                _ <- forkProcess $ do
                        hPutStr toh $ unlines paths
                        hClose toh
                        exitSuccess

The doubled output occurs when this process exits, and ghc flushes the
inherited stdout buffer. Why only when piping? I don't know, but ghc may
be behaving differently when stdout is not a terminal.

While this is quite possibly a ghc bug, there is a nice fix in git-annex.
Explicitly flushing after each chunk of json is output works around the
problem, and as a side effect, json is streamed rather than being output
all at the end when performing an expensive operaition.

However, note that this means all uses of putStr in git-annex must be
explicitly flushed. The others were, already.
2011-11-25 11:51:06 -04:00
Joey Hess
75a590bdd8 Put a workaround in the directory special remote for strange behavior with VFAT filesystems on Linux (mounted with shortname=mixed) 2011-11-22 18:21:28 -04:00
Joey Hess
322d9b1cc0 releasing version 3.20111122 2011-11-22 14:40:11 -04:00
Joey Hess
7f7ae7a3b1 find: Support --print0
It would be nice if command-specific options were supported. The first
difficulty is that which command is being called is not known until after
getopt; but that could be worked around by finding the first non-dashed
parameter. Storing the settings without putting them in the annex monad is
the next difficulty; it could perhaps be handled by making the seek stage
pass applicable settings into the start stage (and from there on to perform
as needed). But that still leaves a problem, what data type to use to
represent the options between getopt and seek?
2011-11-22 14:06:31 -04:00
Joey Hess
d675f1c82e status --json now shows most things
Left out the backend usage graph for now, and bad/temp directory sizes
are only displayed when present. Also, disk usage is returned as a string
with units, which I can see changing later.
2011-11-20 14:12:48 -04:00
Joey Hess
c50a5fbeb4 status: Include all special remotes in the list of repositories.
Special remotes do not always have a description listed in uuid.log,
and such ones were not listed before.
2011-11-18 13:22:48 -04:00
Joey Hess
1326bb8635 Avoid excessive escaping for rsync special remotes that are not accessed over ssh.
This is actually tricky, 45bbf210a1 added
the escaping because it's needed for rsync that does go over ssh.
So I had to detect whether the remote's rsync url will use ssh or not,
and vary the escaping.
2011-11-18 12:53:48 -04:00
Joey Hess
c70b78d40a migrate: Don't fall over a stale temp file. 2011-11-17 18:29:28 -04:00
Joey Hess
2bb6b02948 When not run in a git repository, git-annex can still display a usage message, and "git annex version" even works.
Things that sound simple, but are made hard by the Annex monad being built
with the assumption that there will always be a git repo.
2011-11-16 00:49:09 -04:00
Joey Hess
84784e2ca1 cleanup 2011-11-16 00:07:06 -04:00
Joey Hess
21a925dcf1 merge: Now runs in constant space.
Before, a merge was first calculated, by running various actions that
called git and built up a list of lines, which were at the end sent
to git update-index. This necessarily used space proportional to the size
of the diff between the trees being merged.

Now, lines are streamed into git update-index from each of the actions in
turn.

Runtime size of git-annex merge when merging 50000 location log files
drops from around 100 mb to a constant 4 mb.

Presumably it runs quite a lot faster, too.
2011-11-15 23:28:01 -04:00
Joey Hess
7d05ca1d6d Fix support for insteadOf url remapping. Closes: #644278 2011-11-15 14:06:38 -04:00
Joey Hess
bfe38f8ff1 status --json --fast for esc
* status: Fix --json mode (only the repository lists are currently
  displayed)
* status: --fast is back
2011-11-14 19:27:22 -04:00
Joey Hess
aa4fbbdd33 status: Now displays trusted, untrusted, and semitrusted repositories separately. 2011-11-14 16:14:17 -04:00
Joey Hess
04edae6791 Optimised union merging; now only runs git cat-file once. 2011-11-12 17:45:12 -04:00
Joey Hess
cea65b9e5b init: When run in an already initalized repository, and without a description specified, don't delete the old description. 2011-11-12 15:42:52 -04:00
Joey Hess
e9bfa8eaed avoid unnecessary auto-merge when only changing a file in the branch.
Avoids doing auto-merging in commands that don't need fully current
information from the git-annex branch. In particular, git annex add no
longer needs to auto-merge. Affected commands: Anything that doesn't
look up data from the branch, but does write a change to it.

It might seem counterintuitive that we can change a value without first
making sure we have the current value. This optimisation works because
these two sequences are equivilant:

1. pull from remote
2. union merge
3. read file from branch
4. modify file and write to branch

vs.

1. read file from branch
2. modify file and write to branch
3. pull from remote
4. union merge

After either sequence, the git-annex branch contains the same logical content
for the modified file. (Possibly with lines in a different order or
additional old lines of course).
2011-11-12 15:15:57 -04:00
Joey Hess
897bf938f6 merge: Improve commit messages to mention what was merged. 2011-11-12 14:51:19 -04:00
Joey Hess
71b216d1fb map: Support remotes with /~/ and /~user/
More accurately, it was supported already when map uses git-annex-shell,
but not when it does not.

Note that the user name cannot be shell escaped using git-annex's current
approach for shell escaping. I tried and some shells like dash cannot
cd ~'joey'. Rest of directory is still shell escaped, not for security but
in case a directory has a space or other weird character.
2011-11-11 16:18:53 -04:00
Joey Hess
826d5887b2 Automatically fix up badly formatted uuid.log entries produced by 3.20111105, whenever the uuid.log is changed (ie, by init or describe). 2011-11-11 13:42:31 -04:00
Joey Hess
2de1e2c2ce Optimized copy --from and get --from to avoid checking the location log for files that are already present.
This can be a significant speedup when running in large trees that are
only missing a few files; it makes copy --from just as fast as get.
2011-11-10 21:32:42 -04:00
Joey Hess
cf0174c922 content locking
I've tested that this solves the cyclic drop problem.
Have not looked at cyclic move, etc.
2011-11-09 21:54:42 -04:00
Joey Hess
faa4935047 Handle a case where an annexed file is moved into a gitignored directory, by having fix --force add its change. 2011-11-07 18:10:31 -04:00
Joey Hess
f8911cc69d releasing version 3.20111107 2011-11-07 13:06:58 -04:00
Joey Hess
41eecb4601 Bugfix: In the past two releases, git-annex init has written the uuid.log in the wrong format, with the UUID and description flipped.
This is my own damn fault for not making UUID a real type, and then relying
on the type checker to ensure my refactoring was correct -- which it wasn't!

I should probably add code to clean up bogus entries in the uuid.log, but
right now I want to get the fix out there to prevent people experiencing
this bug.

I should also make UUID a real data type.
2011-11-07 12:47:41 -04:00
Joey Hess
aae0417d94 Don't try to read config from repos with annex-ignore set. 2011-11-07 11:50:30 -04:00
Joey Hess
c99fb58909 merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges
could result in an ongoing pull/merge/push cycle.

While the git-annex branch is fast-forwarded, git-annex's index file is still
updated using the union merge strategy as before. There's no other way to
update the index that would be any faster.

It is possible that a union merge and a fast-forward result in different file
contents: Files should have the same lines, but a union merge may change
their order. If this happens, the next commit made to the git-annex branch
will have some unnecessary changes to line orders, but the consistency
of data should be preserved.

Note that when the journal contains changes, a fast-forward is never attempted,
which is fine, because committing those changes would be vanishingly unlikely
to leave the git-annex branch at a commit that already exists in one of
the remotes.

The real difficulty is handling the case where multiple remotes have all
changed. git-annex does find the best (ie, newest) one and fast forwards
to it. If the remotes are diverged, no fast-forward is done at all. It would
be possible to pick one, fast forward to it, and make a merge commit to
the rest, I see no benefit to adding that complexity.

Determining the best of N changed remotes requires N*2+1 calls to git-log, but
these are fast git-log calls, and N is typically small. Also, typically
some or all of the remote refs will be the same, and git-log is not called to
compare those. In the real world I expect this will almost always add only
1 git-log call to the merge process. (Which already makes N anyway.)
2011-11-06 15:22:40 -04:00
Joey Hess
0556dc812e releasing version 3.20111105 2011-11-05 15:55:19 -04:00
Joey Hess
0bb798e351 Pass -t to rsync to preserve timestamps. 2011-11-04 19:41:11 -04:00
Joey Hess
ef3457196a use SHA256 by default
To get old behavior, add a .gitattributes containing: * annex.backend=WORM

I feel that SHA256 is a better default for most people, as long as their
systems are fast enough that checksumming their files isn't a problem.
git-annex should default to preserving the integrity of data as well as git
does. Checksum backends also work better with editing files via
unlock/lock.

I considered just using SHA1, but since that hash is believed to be somewhat
near to being broken, and git-annex deals with large files which would be a
perfect exploit medium, I decided to go to a SHA-2 hash.

SHA512 is annoyingly long when displayed, and git-annex displays it in a
few places (and notably it is shown in ls -l), so I picked the shorter
hash. Considered SHA224 as it's even shorter, but feel it's a bit weird.

I expect git-annex will use SHA-3 at some point in the future, but
probably not soon!

Note that systems without a sha256sum (or sha256) program will fall back to
defaulting to SHA1.
2011-11-04 15:51:01 -04:00
Joey Hess
1089e85d48 add changelog for bugfix 2011-11-04 15:51:01 -04:00
Joey Hess
eec137f33a Record uuid when auto-initializing a remote so it shows in status. 2011-11-02 14:18:21 -04:00
Joey Hess
00988bcf36 fixed my build environment 2011-10-31 15:40:57 -04:00
Joey Hess
3d3e1c4c25 better command name 2011-10-31 15:18:41 -04:00
Joey Hess
380839299e The fromkey command now takes the key as its first parameter. The --key option is no longer used. 2011-10-31 12:56:07 -04:00
Joey Hess
cc1ea8f844 Removed the setkey command, and added a setcontent command with a more useful interface. 2011-10-31 12:33:41 -04:00
Joey Hess
22e9f445ab unused, dropunused: Now work in bare repositories.
Turned out I had already done all the work needed to support this when
unused started checking all branches.
2011-10-29 19:16:45 -04:00
Joey Hess
2566eb85fe fsck: Now works in bare repositories.
Checks location log information, and file contents.

Does not check that numcopies is satisfied, as .gitattributes information
about numcopies is not available in a bare repository. In practice, that
should not be a problem, since fsck is also run in a checkout and will
check numcopies there.
2011-10-29 18:03:28 -04:00
Joey Hess
ab738a403a status: Now always shows the current repository, even when it does not appear in uuid.log. 2011-10-28 19:49:01 -04:00
Joey Hess
6c31e3a8c3 drop --from is now supported to remove file content from a remote. 2011-10-28 17:26:38 -04:00
Joey Hess
b955238ec7 Fail if --from or --to is passed to commands that do not support them. 2011-10-27 18:56:54 -04:00
Joey Hess
66194684ac uninit: Add guard against being run with the git-annex branch checked out. 2011-10-27 15:47:11 -04:00
Joey Hess
83d11c03c4 wording 2011-10-27 15:24:58 -04:00
Joey Hess
f84d66fa15 reap in onLocal
Each onLocal call involves a new Annex state, so needs to clean up after it.
2011-10-27 14:55:07 -04:00
Joey Hess
373cad993d Sped up some operations on remotes that are on the same host.
Specifically, disabled trying to update the git-annex branch on the remote,
since that data is never used by operations that act on such remotes.

Also, when copying content to such a remote, skip committing the presence
information changes to its git-annex branch. Leaving it in the journal there
is ok: Any command run on the remote that needs the info will flush the
journal.

This may partially solve this bug:
http://git-annex.branchable.com/bugs/fails_to_handle_lot_of_files/
Although I still see unreaped git processes piling up when doing a copy --to.
2011-10-27 14:55:06 -04:00
Joey Hess
270c1af087 releasing version 3.20111025 2011-10-25 13:46:01 -07:00
Joey Hess
e2853b3fec update 2011-10-25 11:39:15 -07:00
Joey Hess
52c8244219 git-annex-shell: GIT_ANNEX_SHELL_READONLY and GIT_ANNEX_SHELL_LIMITED environment variables can be set to limit what commands can be run.
This could be used by eg, gitolite.
2011-10-15 19:06:35 -04:00
Joey Hess
ec169f84b1 migrate: Copy url logs for keys when migrating. 2011-10-15 16:36:56 -04:00
Joey Hess
9fa9214106 A remote can have a annexUrl configured, that is used by git-annex instead of its usual url. (Similar to pushUrl.) 2011-10-14 18:18:28 -04:00
Joey Hess
205a5b2aaa typo 2011-10-12 00:29:49 -04:00
Joey Hess
11b154e811 prep release 2011-10-11 23:03:19 -04:00
Joey Hess
402d9c7c5f oops 2011-10-11 22:54:38 -04:00
Joey Hess
9c04d1e523 fix git 1.7.7 breakage
* This version of git-annex only works with git 1.7.7 and newer.
  The breakage with old versions is subtle, and affects
  annex.numcopies .gitattributes settings, so be sure to upgrade git
  to 1.7.7. (Debian package now depends on that version.)
* Don't pass absolute paths to git show-attr, as it started following
  symlinks when that's done in 1.7.7. Instead, use relative paths,
  which show-attr only handles 100% correctly in 1.7.7. Closes: #645046

Unfortunatly I can find no way to work with the old and new gits, as
the old had bugs that require absolute paths, while the new doesn't like
them at all. And the behavior of git show-attr in 1.7.7. is the same as
eg, git add of an absolute path to a symlink, so seems entirely
intentional and not likely to change.
2011-10-11 22:53:32 -04:00
Joey Hess
10edaf6dc9 reorder 2011-10-10 16:03:32 -04:00
Joey Hess
81ed7b203d Now supports git's insteadOf configuration, to modify the url used to access a remote. Note that pushInsteadOf is not used; that and pushurl are reserved for actual git pushes. Closes: #644278 2011-10-09 14:58:32 -04:00
Joey Hess
5414bbce58 git-annex-shell uuid verification
* git-annex now asks git-annex-shell to verify that it's operating in
  the expected repository.
* Note that this git-annex will not interoperate with remotes using
  older versions of git-annex-shell.

The reason for this check is to avoid git-annex getting confused about
what remote repository actually contains a value. It's a prerequisite for
supporting git insteadOf aliases.
2011-10-06 19:24:11 -04:00
Joey Hess
f011033869 add timestamps to remote.log 2011-10-06 16:07:58 -04:00
Joey Hess
f929d0229c Add timestamps to trust.log. 2011-10-06 15:55:50 -04:00
Joey Hess
3e0d2a0803 add timestamp to uuid.log
* New or changed repository descriptions in uuid.log now have a timestamp,
  which is used to ensure the newest description is used when the uuid.log
  has been merged.
* Note that older versions of git-annex will display the timestamp as part
  of the repository description, which is ugly but otherwise harmless.
2011-10-06 15:31:25 -04:00
Joey Hess
d357556141 Add locking to avoid races when changing the git-annex branch. 2011-10-03 16:32:36 -04:00
Joey Hess
49f21dd9ba Contain the zombie hordes.a
Specifically, when using gpg, a zombie is forked for each file, so waiting
until shutdown to reap won't do.
2011-10-02 11:16:34 -04:00
Joey Hess
29032cb70e When displaying a list of repositories, show git remote names in addition to their descriptions. 2011-09-30 15:02:29 -04:00
Joey Hess
828f3f1b0c status: List all known repositories. 2011-09-30 03:20:24 -04:00
Joey Hess
a7e7dda55a Fix referring to remotes by uuid.
I think that I broke this in some fairly recent refactoring.
2011-09-30 02:23:24 -04:00
Joey Hess
7ff89ccfee convert all git read/write functions to use ByteStrings
This yields a second or so speedup in unused, find, etc. Seems that even
when the ByteString is immediately split and then converted to Strings,
it's faster.

I may try to push ByteStrings out into more of git-annex gradually,
although I suspect most of the time-critical parts are already covered
now, and many of the rest rely on libraries that only support Strings.
2011-09-29 23:48:57 -04:00
Joey Hess
a91c8a15d5 Sped up unused.
Added Git.ByteString which replaces Git IO methods with ones using lazy
ByteStrings. This can be more efficient when large quantities of data are
being read from git.

In Git.LsTree, parse git ls-tree output more efficiently, thanks
to ByteString. This benchmarks 25% faster, in a benchmark that includes
(probably predominately) the run time for git ls-tree itself.

In real world numbers, this makes git annex unused 2 seconds faster for
each branch it needs to check, in my usual large repo.
2011-09-29 19:04:24 -04:00
Joey Hess
7dddb803a0 releasing version 3.20110928 2011-09-28 19:17:12 -04:00
Joey Hess
d75da353b9 documentation/warning message update for future feature 2011-09-23 18:04:38 -04:00
Joey Hess
9f5c7a246b status: Massively sped up; remove --fast mode.
Using Sets is the right thing; they have constant size lookup like my
SizeList, and logn insertation, which beats nub to death.

Runs faster than --fast mode did before, and gives accurate counts.

13 seconds total runtime with a warm cache in a repository with 40 thousand
keys.
2011-09-20 18:57:05 -04:00
Joey Hess
cabbefd9d2 status: In --fast mode, all status info is displayed now; but some of it is only approximate, and is marked as such. 2011-09-20 18:13:08 -04:00
Joey Hess
a4aef6f115 clarify wording 2011-09-19 01:54:20 -04:00
Joey Hess
33cd1ffbfe make find show files meeting limits, even when not present
find: Rather than only showing files whose contents are present, when used
with --exclude --copies or --in, displays all files that match the
specified conditions.

Note that this is a behavior change for find --exclude! Old behavior
can be gotten with find --in . --exclude=...
2011-09-18 20:42:15 -04:00
Joey Hess
9da23dff78 --copies=N can be used to make git-annex only operate on files with the specified number of copies.
(And --not --copies=N for the inverse.)
2011-09-18 20:23:08 -04:00
Joey Hess
1fc3ee2423 add --in limit 2011-09-18 20:14:18 -04:00
Joey Hess
3e73de4054 releasing version 3.20110915 2011-09-17 09:21:09 -04:00
Joey Hess
d036cd590f bugfix: drop and fsck did not honor --exclude 2011-09-15 15:44:32 -04:00
Joey Hess
a0d3a343b5 copy --auto
Only does copy when numcopies is not yet satisfied.
2011-09-15 15:28:58 -04:00
Joey Hess
984c9fc052 remove optimize subcommand; use --auto instead
get, drop: Added --auto option, which decides whether to get/drop content
as needed to work toward the configured numcopies.

The problem with bundling it up in optimize was that I then found I wanted
to run an optmize that did not drop files, only got them. Considered adding
a --only-get switch to it, but that seemed wrong. Instead, let's make
existing subcommands optionally smarter.

Note that the only actual difference between drop and drop --auto is that
the latter does not even try to drop a file if it knows of not enough
copies, and does not print any error messages about files it was unable to
drop.

It might be nice to make get avoid asking git for attributes when not in
auto mode. For now it always asks for attributes.
2011-09-15 13:30:04 -04:00
Joey Hess
949b3f69d0 optimize: A new subcommand that either gets or drops file content as needed to work toward meeting the configured numcopies setting.
This is currently rather simplistic, though still useful.
In the future, it could become smarter about what content is stored where,
etc.
2011-09-14 13:47:22 -04:00
Joey Hess
03d6209e1c addurl: Always use whole url as destination filename, rather than only its file component.
First, this ensures that git annex addurl, when run repeatedly with the
same url, doesn't create duplicate files, which it did before when it
fell back to the longer filename.

Secondly, the file part of an url is frequently not very descriptive on its
own.

The uri scheme, auth, and port is intentionally left out, as clutter.
2011-09-07 19:04:51 -04:00
Joey Hess
72b54d6170 Fix build without S3. 2011-09-07 10:21:19 -04:00
Joey Hess
6f98fd5391 whereis: Show untrusted locations separately and do not include in location count. 2011-09-06 16:59:53 -04:00
Joey Hess
6fd0df7c2f releasing version 3.20110906 2011-09-06 15:54:21 -04:00
Joey Hess
ebb92221fd Fix Makefile to work with cabal again. 2011-09-06 15:35:13 -04:00
Joey Hess
07125dca53 Improve display of newlines around error and warning messages. 2011-09-06 13:46:08 -04:00
Joey Hess
d238bbd9d9 releasing version 3.20110902 2011-09-02 21:32:05 -04:00
Joey Hess
2f4d4d1c45 basic json support
This includes a generic JSONStream library built on top of Text.JSON
(somewhat hackishly).

It would be possible to stream out a single json document describing
all actions, but it's probably better for consumers if they can expect
one json document per line, so I did it that way instead.

Output from external programs used for transferring files is not
currently hidden when outputting json, which probably makes it not very
useful there. This may be dealt with if there is demand for json
output for --get or --move to be parsable.

The version, status, and find subcommands have hand-crafted output and
don't do json. The whereis subcommand needs to be modified to produce
useful json.
2011-09-01 15:22:06 -04:00
Joey Hess
f600444ab6 unused --remote: Reduced memory use to 1/4th what was used before.
Using a single strictness annotation, in just the right place.
Tried several others, none of which helped and some of which potentially
hurt. This is only the second time I've really had to deal with this in
a year of using haskell, which is, I suppose not that bad.
2011-08-31 19:13:02 -04:00
Joey Hess
ea7b1828d4 unused, status: Sped up by avoiding unnecessary stats of annexed files.
Statting files returned by dirContents to see if they exist and are regular
files seems pretty useless. This code was originally part of fsck, and
perhaps the idea then was to avoid things returned by dirContents that were
not files. But it's certianly not needed in the current use cases for
getKeysPresent.
2011-08-30 15:16:34 -04:00
Joey Hess
d1154d0837 init: Make description an optional parameter. 2011-08-29 14:13:38 -04:00
Joey Hess
6e750764b7 The wget command will now be used in preference to curl, if available.
Got tired of curl's various ugly progress bars.
2011-08-27 12:31:50 -04:00
Joey Hess
20259c2955 Set EMAIL when running test suite so that git does not need to be configured first. Closes: #638998 2011-08-23 13:41:32 -04:00
Joey Hess
06f509854a file moved 2011-08-21 13:19:33 -04:00
Joey Hess
3786f8d348 releasing version 3.20110819 2011-08-19 20:38:36 -04:00
Joey Hess
01cd775d92 Fix broken upgrade from V1 repository. Closes: #638584
Had forgotten to keep several old versions of functions needed during this
upgrade.
2011-08-19 20:32:18 -04:00
Joey Hess
8a2197adfa Added annex-cost-command configuration, which can be used to vary the cost of a remote based on the output of a shell command.
Also avoided crashing if the user specified cost value cannot be parsed.
2011-08-18 12:20:47 -04:00
Joey Hess
56f6923ccb Now "git annex init" only has to be run once
when a git repository is first being created. Clones will automatically
notice that git-annex is in use and automatically perform a basic
initalization. It's still recommended to run "git annex init" in any
clones, to describe them.
2011-08-17 14:44:31 -04:00
Joey Hess
f0c2130700 releasing version 3.20110817 2011-08-17 01:34:15 -04:00
Joey Hess
4a023dd1aa Added curl to Debian package dependencies. 2011-08-16 22:22:00 -04:00
Joey Hess
e6752cc064 Added support for getting content from git remotes using http (and https). 2011-08-16 21:12:48 -04:00
Joey Hess
dede05171b addurl: --fast can be used to avoid immediately downloading the url.
The tricky part about this is that to generate a key, the file must be
present already. Worked around by adding (back) an URL key type, which
is used for addurl --fast.
2011-08-06 14:57:22 -04:00
Joey Hess
45bbf210a1 Fix shell escaping in rsync special remote. 2011-07-29 15:28:21 +02:00
Joey Hess
a8a71b9d91 releasing version 3.20110719 2011-07-19 23:52:09 -04:00
Joey Hess
ec9e9343d9 add closure for new bug that I already fixed 2011-07-17 19:05:50 -04:00
Joey Hess
ded2591124 unannex: Clean up use of git commit -a.
This was more complex than would be expected. unannex has to use git commit -a
since it's removing files from git; git commit filelist won't do.

Allow commands to be added to the Git queue that have no associated files,
and run such commands once.
2011-07-14 17:15:37 -04:00
Joey Hess
0c46cbab09 Support the standard git -c name=value
This allows eg, `git-annex -c annex.rsync-options=-6 get file`

The overridden git configs are not passed on to git plumbing commands
that are run. Perhaps someone will find a need to do that, but I don't yet
and it would require storing more state to know what config settings
have been overridden and need to be passed on.
2011-07-14 16:51:20 -04:00
Joey Hess
7919de73af Bugfix: Make add ../ work.
The complication of check-attr returning absolute paths that have to be
converted back to relative paths..
2011-07-10 13:52:53 -04:00
Joey Hess
40c6ba99f5 add: Be even more robust to avoid ever leaving the file seemingly deleted.
A failure at any point after the file is annexed will result in an undo
that puts the original file back into place and wipes the location log.
2011-07-07 21:30:51 -04:00
Joey Hess
2a108982ad add monad-control to build depends
Will use this to handle exceptions in the Annex monad, yay.
2011-07-07 20:53:57 -04:00
Joey Hess
4d4f297c96 releasing version 3.20110707 2011-07-07 19:37:49 -04:00
Joey Hess
67dcc1f171 add: Avoid a failure mode that resulted in the file seemingly being deleted (content put in the annex but no symlink present). 2011-07-07 19:29:36 -04:00
Joey Hess
2fb771f135 Bugfix: Forgot to de-escape keys when upgrading.
Could result in bad location log data for keys that contain [&:%] in their
names. (A workaround for this problem is to run git annex fsck.)

`git annex unused --from remote` could also run into the broken code.
2011-07-07 17:04:21 -04:00
Joey Hess
497b1e6092 Fix sign bug in disk free space checking.
Giulio Eulisse reported that on OSX, bad free space numbers were being
shown. It thought he had negative free space.

While the documentation is not clear, especially across OS's, it seems
likely that statfs uses unsigned long. It doesn't make sense for any
numbers to be negative.
2011-07-05 20:53:58 -04:00
Joey Hess
d583e04d23 releasing version 3.20110705 2011-07-05 15:21:38 -04:00
Joey Hess
5c69ac14eb Drop the dependency on the haskell curl bindings, use regular haskell HTTP. 2011-07-04 19:33:11 -04:00
Joey Hess
71c783bf24 uninit: Use unannex in --fast mode, to support unannexing multiple files that link to the same content. 2011-07-04 16:20:50 -04:00
Joey Hess
22a4f5b348 unannex: In --fast mode, file content is left in the annex, and a hard link made to it. 2011-07-04 16:06:28 -04:00
Joey Hess
5beb6bc76f uninit: delete .git/annex/ 2011-07-04 15:55:03 -04:00
Joey Hess
5c63b409d4 uninit: Delete the git-annex branch. 2011-07-04 15:50:30 -04:00
Joey Hess
48db40857c releasing version 3.20110702 2011-07-02 15:08:05 -04:00
Joey Hess
457d28c676 wording 2011-07-01 17:24:11 -04:00
Joey Hess
a140f7148f documentation for using the web 2011-07-01 16:05:06 -04:00
Joey Hess
2cdacfbae6 remove URL backend 2011-07-01 16:01:04 -04:00
Joey Hess
6ba866ca73 updates for web remote and removing URL backend 2011-07-01 15:39:30 -04:00
Joey Hess
cdbcd6f495 add web special remote
Generalized LocationLog to PresenceLog, and use a presence log to record
urls for the web special remote.
2011-07-01 15:30:42 -04:00
Joey Hess
ee3a0551a7 Merge branch 'master' into v3
Conflicts:
	debian/changelog
2011-06-30 15:01:08 -04:00
Joey Hess
56aeeb4565 cabal can now be used to build git-annex.
This is substantially slower than using make, does not build or install
documentation, does not run the test suite, and is not particularly
recommended, but could be useful to some.
2011-06-30 14:55:03 -04:00
Joey Hess
8562e6096c v3 is now faster than v2
Rebenchmarked v2 vs v3, and v3 is now actually faster. Yes, storing data
in git, using git as a filesystem is actually faster than just using the
filesystem. If you do it just right. :)
2011-06-30 01:16:53 -04:00
Joey Hess
d72fb5acc2 Fix encoding of utf-8 etc when storing the description of repository and other content.
Write files in raw mode, to avoid mangling the encoding of content
provided.

Note: This was a longstanding problem, it was not introduced in v3.
2011-06-30 00:35:51 -04:00
Joey Hess
e1c18ddec4 Sped back up fsck, copy --from etc
All commands that often have to read a lot of information from
the git-annex branch should now be nearly as fast as before
the branch was introduced.

Before fsck was taking approximatly 3 hours, now it's running in 8 minutes.

The code is very nasty. It should be rewritten to read the header line
from git cat-file, and then read the specified number of bytes of content.
2011-06-29 21:47:31 -04:00
Joey Hess
af45d42224 Merge branch 'master' into v3
Conflicts:
	debian/changelog
2011-06-29 11:42:35 -04:00
Joey Hess
b3aaf980e4 --force will cause add, etc, to operate on ignored files. 2011-06-29 11:42:00 -04:00
Joey Hess
5034d8c298 Modify location log parser to allow future expansion.
Since the logs have just been moved into the git-annex branch, don't need
to worry about backwards compatability with old versions of git-annex that
would fail to parse location logs with extra fields tacked on.
2011-06-28 16:15:50 -04:00
Joey Hess
c90652f015 Always ensure git-annex branch exists. 2011-06-26 22:43:48 -04:00
Joey Hess
874fc044c1 releasing version 3.20110624 2011-06-24 14:58:07 -04:00
Joey Hess
068703c405 improve post-upgrade push instructions 2011-06-23 14:51:04 -04:00
Joey Hess
7ee636f6dd avoid unnecessary read of trust.log 2011-06-23 13:39:04 -04:00
Joey Hess
66ceb92702 docs 2011-06-22 23:37:46 -04:00
Joey Hess
68783fd5e0 let's have the major version number be annex.version 2011-06-22 23:02:58 -04:00
Joey Hess
ad3770e0b2 add merge subcommand 2011-06-22 18:46:56 -04:00
Joey Hess
80302d0b46 improve bare repo handing
Many more commands can work in bare repos now, thanks to the git-annex
branch.
2011-06-22 18:32:41 -04:00
Joey Hess
818ae0c6da docs for v3 2011-06-21 20:21:33 -04:00
Joey Hess
9f9e17aa0f unlock: Made atomic. 2011-06-20 22:38:18 -04:00
Joey Hess
c835166a7c add git-union-merge
This is a new git subcommand, that does a generic union merge operation
between two refs, storing the result in a branch. It operates efficiently
without touching the working tree. It does need to write out a temporary
index file, and may need to write out some other temp files as well.

This could be useful for anything that stores data in a branch,
and needs to merge changes into that branch without actually checking the
branch out. Since conflict handling can't be done without a working copy,
the merge type is always a union merge, which is fine for data stored in
log format (as git-annex does), or in non-conflicting files
(as pristine-tar does).

This probably belongs in git proper, but it will live in git-annex for now.

---

Plan is to move .git-annex/ to a git-annex branch, and use git-union-merge
to handle merging changes when pulling from remotes.

Some preliminary benchmarking using real .git-annex/ data indicates
that it's quite fast, except for the "git add" call, which is as slow
as "git add" tends to be with a big index.
2011-06-20 21:37:18 -04:00
Joey Hess
f547277b75 Allow --trust etc to specify a repository by name, for temporarily trusting repositories that are not configured remotes. 2011-06-13 22:19:44 -04:00
Joey Hess
30d7cce7ec rsync is now used when copying files from repos on other filesystems
cp is still used when copying file from repos on the same filesystem, since
--reflink=auto can make it significantly faster on filesystems such as
btrfs.

Directory special remotes still use cp, not rsync. It's not clear what
tmp file should be used when rsyncing to such a remote.
2011-06-13 20:33:52 -04:00
Joey Hess
38e0100a69 releasing version 0.20110610 2011-06-10 11:58:21 -04:00
Joey Hess
9a272815dd Bugfix: Fix fsck to not think all SHAnE keys are bad. 2011-06-10 11:43:28 -04:00
Joey Hess
90dd245522 get --from is the same as copy --from
get not honoring --from has surprised me a few times, so least surprise
suggests it should just behave like copy --from. This leaves the difference
between get and copy being that copy always requires the remote to copy
from, while get will decide whether to get a file from a key/value store or
a remote.
2011-06-09 18:54:49 -04:00
Joey Hess
a8fb97d2ce Add --trust, --untrust, and --semitrust options. 2011-06-01 17:57:31 -04:00
Joey Hess
3d567aa64f Add --numcopies option. 2011-06-01 16:49:17 -04:00
Joey Hess
dc92a788c7 releasing version 0.20110601 2011-06-01 12:00:25 -04:00
Joey Hess
038da52bdd Somewhat sped up git commit of modifications to unlocked files.
Avoid git reset here too, so I no longer need to care that it's much more
expensive than seems wise (but I asked the git list about that anyway).

It's not necessary to reset the staged file content from the index, as
the `git add` of the the symlink will replace it anyway.

`git commit` of unlocked files is still slow, since git still has to shove
their entire content into the index, only to have it be thrown away. So it's
still better to use `git annex add`
2011-05-31 16:08:37 -04:00
Joey Hess
fb259033d4 Fix locking of files with staged changes.
Previously, lock would skip files that had staged changes, but that is
counterintuitive, I think.
2011-05-31 15:00:56 -04:00
Joey Hess
fafe60768f Massively sped up git annex lock by avoiding use of the uber-slow git reset, and only running git checkout once, even when many files are being locked. 2011-05-31 14:50:41 -04:00
Joey Hess
14ffb5d47b bugfix: fix unused list numbering
Introduced in 43f0a666f0
2011-05-28 22:30:06 -04:00
Joey Hess
7ea54e1c6e releasing version 0.20110522 2011-05-27 20:28:01 -04:00
Joey Hess
82b88d0676 typo 2011-05-27 20:21:13 -04:00
Joey Hess
001edb008a Fix bug in --exclude introduced in 0.20110516. 2011-05-27 20:20:20 -04:00
Joey Hess
5b941980aa Closer emulation of git's behavior when told to use "foo/.git" as a git repository instead of just "foo". Closes: #627563 2011-05-22 14:12:16 -04:00
Joey Hess
8ed27db18f add explict build dep on hslogger
pulled in by missingh, but now used directly by git-annex
2011-05-21 13:03:13 -04:00
Joey Hess
944b1207dc releasing version 0.20110521 2011-05-21 11:58:35 -04:00
Joey Hess
93a4f3d4e6 Add --debug option. Closes: #627499
This takes advantage of the debug logging done by missingh, and I added
my own debug messages for executeFile calls. There are still some other
low-level ways git-annex runs stuff that are not shown by debugging,
but this gets most of it easily.
2011-05-21 11:52:13 -04:00
Joey Hess
cd83541872 --backend now overrides any backend configured in .gitattributes files. 2011-05-18 19:34:46 -04:00
Joey Hess
a8816efc14 status: New subcommand to show info about an annex, including its size. 2011-05-16 21:18:34 -04:00
Joey Hess
3ab15b9f4f releasing version 0.20110516 2011-05-16 15:01:05 -04:00
Joey Hess
5256a6b011 migrate: Use current filename when generating new key, for backends where the filename affects the key name. 2011-05-16 12:10:08 -04:00
Joey Hess
e7b309ce02 clarify 2011-05-16 11:49:52 -04:00
Joey Hess
2a8efc7af1 Added filename extension preserving variant backends SHA1E, SHA256E, etc. 2011-05-16 11:46:34 -04:00
Joey Hess
1d2984441c add a few tweaks to make it easy to use the Internet Archive's variant of S3
In particular, munge key filenames to comply with the IA's filename limits,
disable encryption, support their nonstandard way of creating buckets, and
allow x-amz-* headers to be specified in initremote to set item metadata.

Still TODO: initremote does not handle multiword metadata headers right.
2011-05-16 11:20:35 -04:00
Joey Hess
078a6fbd76 Work around a bug in Network.URI's handling of bracketed ipv6 addresses. 2011-05-06 15:21:30 -04:00
Joey Hess
86d3205061 releasing version 0.20110503 2011-05-03 21:49:20 -04:00
Joey Hess
1f84c7a964 S3: When encryption is enabled, the Amazon S3 login credentials are stored, encrypted, in .git-annex/remotes.log, so environment variables need not be set after the remote is initialized. 2011-05-01 14:05:10 -04:00
Joey Hess
43f0a666f0 unused: Now also lists files fsck places in .git/annex/bad/ 2011-04-29 13:59:00 -04:00
Joey Hess
eef3f634e9 Avoid crashing when an existing key is readded to the annex. 2011-04-28 20:41:40 -04:00
Joey Hess
07576f2a2c documentation for hook special remotes
Releasing before I have quite finished the code. Got a little caught
up in Anathem references. Time for a walk and then a tiny bit more coding
and possibly testing.
2011-04-28 15:26:21 -04:00
Joey Hess
d7b330b33b Fix hasKeyCheap setting for bup and rsync special remotes. 2011-04-28 14:39:51 -04:00
Joey Hess
84e1ebfb0e erm, thought I committed this release? 2011-04-28 14:38:01 -04:00
Joey Hess
7a33803193 Avoid pipeline stall when running git annex drop or fsck on a lot of files.
When it's stalled, there are 3 processes:

git annex
  git ls-files
  git check-attr

git-annex stalls trying to write to git check-attr, which stalls trying to
write to stdout (read by git-annex).

git ls-files does not seem to be involved directly; I've seen the stall when
it was still streaming out the file list, and after it had exited and
zombified.

The read and write are supposed to be handled by two different threads,
which pipeBoth forks off, thus avoiding deadlock. But it does deadlock.
(Certian signals unblock the deadlock for a while, then it stalls again.)

So, this is another case of WTF is the ghc IO manager doing today?
I avoid the issue by converting the writer to a separate process.

Possibly this was caused by some change in ghc 7 -- I'm offline and cannot
verify now, but I'm sure I used to be able to run git annex drop w/o it
hanging! And the code does not seem to have changed, except for commit
c1dc407941, which I tried reverting without
success. In fact, I reverted all the way back to 0.20110316 and still
saw the stall.

Update: Minimal test case:

import System.Cmd.Utils

main = do
	as <- checkAttr "blah" $ map show [1..100000]
	sequence $ map (putStrLn . show) as

checkAttr attr files = do
	(_, s) <- pipeBoth "git" params $ unlines files
	return $ lines s
	where
		params = ["check-attr", attr, "--stdin"]

Bug filed on ghc in debian, #624389
2011-04-27 23:18:35 -04:00
Joey Hess
39966ba4ee filter out --delete rsync option
rsync does not have a --no-delete, so do it this way instead
2011-04-27 20:31:56 -04:00
Joey Hess
e68f128a9b rsync special remote
Fully tested and working, including resuming and encryption. (Though not
resuming when sending *with* encryption; gpg doesn't produce identical
output each time.)

Uses same layout as the directory special remote and the .git/annex/objects/
directory.
2011-04-27 20:23:09 -04:00