Commit graph

512 commits

Author SHA1 Message Date
Joey Hess
ec98581112 notice deleted files on startup 2012-06-04 18:14:42 -04:00
Joey Hess
5b4e5ce7e5 deletion
When a new file is annexed, a deletion event occurs when it's moved away
to be replaced by a symlink. Most of the time, there is no problimatic
race, because the same thread runs the add event as the deletion event.
So, once the symlink is in place, the deletion code won't run at all,
due to existing checks that a deleted file is really gone.

But there is a race at startup, as then the inotify thread is running
at the same time as the main thread, which does the initial tree walking
and annexing. It would be possible for the deletion inotify to run
in a perfect race with the addition, and remove the newly added symlink
from the git cache.

To solve this race, added event serialization via a MVar. We putMVar
before running each event, which blocks if an event is already running.
And when an event finishes (or crashes!), we takeMVar to free the lock.

Also, make rm -rf not spew warnings by passing --ignore-unmatch when
deleting directories.
2012-06-04 18:09:18 -04:00
Joey Hess
659e6b1324 suppress "recording state in git" message during add 2012-06-04 17:18:54 -04:00
Joey Hess
677ad74687 add handling of symlink addition events
And just like that, annexed files can be moved and copies around within
the tree, and are automatically fixed to point to the content, and staged
in git. Huzzah!

Delete still remains TODO, with its troublesome race during add..
2012-06-04 15:10:43 -04:00
Joey Hess
7053f5f947 handle directory deletion
When a directory is deleted, or moved away, git rm -r it to stage
the deletion.
2012-06-04 13:30:30 -04:00
Joey Hess
23dbff4b43 add events for symlink creation and directory removal
Improved the inotify code, so it will also notice directory removal
and symlink creation.

In the watch code, optimised away a stat of a file that's being added,
that's done by Command.Add.start. This is the reason symlink creation is
handled separately from file creation, since during initial tree walk
at startup, a stat was already done, and can be reused.
2012-06-04 13:22:56 -04:00
Joey Hess
eab3872d91 Merge branch 'master' into watch 2012-06-04 12:07:59 -04:00
Joey Hess
3a10095d40 import: New subcommand, pulls files from a directory outside the annex and adds them
Use case for this was developed somewhere on the Transiberian Railroad.
2012-05-31 19:47:18 -04:00
Joey Hess
65977a5584 lock: Reset unlocked file to index, rather than to branch head.
Resetting an unlocked file to the branch head failed if it had just been
added, not committed, and unlocked, since the branch didbn't have it.

The code was concerned about dropping any changes that might be staged in the
index, but I cannot see why.
2012-05-30 17:01:22 -04:00
Joey Hess
6e213d04f1 sync: Show a nicer message if a user tries to sync to a special remote. 2012-05-27 20:55:56 -04:00
Joey Hess
bb4f31a0ee Clean up handling of git directory and git worktree.
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.

This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.

A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.

Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)

Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.

One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
2012-05-18 17:03:12 -04:00
Joey Hess
f7d8982672 Fix use of several config settings
annex.ssh-options, annex.rsync-options, annex.bup-split-options.

And adjust types to avoid the bugs that broke several config settings
recently. Now "annex." prefixing is enforced at the type level.
2012-05-05 20:16:56 -04:00
Joey Hess
392931eca9 addunused: New command, the opposite of dropunused, it relinks unused content into the git repository. 2012-05-02 14:59:05 -04:00
Joey Hess
8f45300479 dropunused: Allow specifying ranges to drop.
Sort of by popular demand, but the last straw for not using seq
was that it can run into command line length limits.
2012-05-02 13:15:19 -04:00
Joey Hess
0c9c14b52f percentage library 2012-04-29 17:48:07 -04:00
Joey Hess
d2bfba6324 show percent the bloom filter is full 2012-04-29 16:10:47 -04:00
Joey Hess
eedde34549 show amount of reserved space 2012-04-23 10:37:05 -04:00
Joey Hess
84ac8c58db Add annex.httpheaders and annex.httpheader-command config settings
Allow custom headers to be sent with all HTTP requests.

(Requested by the Internet Archive)
2012-04-22 01:13:09 -04:00
Joey Hess
ed79596b75 noop 2012-04-21 23:32:33 -04:00
Joey Hess
7e45712d19 better file mode setting code 2012-04-21 16:01:56 -04:00
Joey Hess
b4a5e39ee6 Support git's core.sharedRepository configuration
This is incomplete, it does not honor it yet for hash directories
and other annex bookkeeping files. Some of that is not needed for a bare
repo; some of it may be.
2012-04-21 15:36:52 -04:00
Joey Hess
262017e17d export a more generalized checkDiskSpace 2012-04-20 16:06:10 -04:00
Joey Hess
d5ffd2d99d watch subcommand
So far this only handles auto-annexing new files that are created inside
the repository while it's running. To make this really useful,
it needs to at least:

- notice deleted files and stage the deletion
  (tricky; there's a race with add..)
- notice renamed files, auto-fix the symlink, and stage the new file location
- periodically auto-commit staged changes
- honor .gitignore, not adding files it excludes

Also nice to have would be:

- Somehow sync remotes, possibly using a push sync like dvcs-autosync
  does, so they are immediately updated.
- Somehow get content that is unavilable. This is problimatic with inotify,
  since we only get an event once the user has tried (and failed) to read
  from the file. Perhaps instead, automatically copy content that is added
  out to remotes, with the goal of all repos eventually getting a copy,
  if df allows.
- Drop files that have not been used lately, or meet some other criteria
  (as long as there's a copy elsewhere).
- Perhaps automatically dropunused files that have been deleted,
  although I cannot see a way to do that, since by the time the inotify
  deletion event arrives, the file is deleted, and we cannot see what
  its symlink pointed to! Alternatievely, perhaps automatically
  do an expensive unused/dropunused cleanup process.

Some of this probably needs the currently stateless threads to maintain
a common state.
2012-04-12 17:42:05 -04:00
Joey Hess
fcc08c59ec use unabbreviated size units in status 2012-04-06 14:54:41 -04:00
Joey Hess
e38a839a80 Rewrote free disk space checking code
Moving the portability handling into a small C library cleans up things
a lot, avoiding the pain of unpacking structs from inside haskell code.
2012-03-22 17:32:47 -04:00
Joey Hess
f1398b5583 use new getConfig 2012-03-22 17:32:47 -04:00
Joey Hess
4eb5112681 rationalize getConfig
getConfig got a remote-specific config, and this confusing name caused it
to be used a couple of places that only were interested in global configs.
Rename to getRemoteConfig and make getConfig only get global configs.

There are no behavior changes here, but remote.<name>.annex-web-options
never actually worked (and per-remote web options is a very unlikely to be
useful case so I didn't make it work), so fix the documentation for it.
2012-03-22 17:32:47 -04:00
Joey Hess
52b90e5d4c tweak 2012-03-22 17:32:47 -04:00
Joey Hess
188e2edc41 status: Prints available local disk space, or shows if git-annex doesn't know. 2012-03-21 21:55:02 -04:00
Joey Hess
a362c46b70 fun with symbols
Nothing at all on hackage is using <&&> or <||>.

(Also, <&&> should short-circuit on failure.)
2012-03-17 00:38:40 -04:00
Joey Hess
771052a85e optimize monadic ||
(||) used applicative style runs both conditions rather than short
circuiting. Add an orM that properly short-circuits.
2012-03-16 12:28:17 -04:00
Joey Hess
60ab3d84e1 added ifM and nuked 11 lines of code
no behavior changes
2012-03-14 17:43:34 -04:00
Joey Hess
342fc28437 Merge branch 'master' into bloom
Conflicts:
	Command/Commit.hs
	debian/changelog
2012-03-14 12:41:48 -04:00
Joey Hess
6cb4743cfb ignore hook exit status 2012-03-14 12:41:00 -04:00
Joey Hess
5b869eef91 git-annex-shell: Runs hooks/annex-content after content is received or dropped. 2012-03-14 12:18:10 -04:00
Joey Hess
caf97fcffd git-annex-shell: Runs hooks/annex-content after content is received or dropped. 2012-03-14 12:01:56 -04:00
Joey Hess
94aff8b878 Merge branch 'master' into bloom
Conflicts:
	debian/changelog
2012-03-12 16:32:29 -04:00
Joey Hess
25809ce2e0 finish bloom filters
Add tuning, docs, etc.

Not sure if status is the right place to remote size.. perhaps unused
should report the size and also warn if it sees more keys than the bloom
filter allows?
2012-03-12 16:18:35 -04:00
Joey Hess
faf3a94fa7 added second stage bloom filter 2012-03-12 15:21:58 -04:00
Joey Hess
32f9742a88 fixed bloom filter creation space leak
it works!
2012-03-12 14:09:43 -04:00
Joey Hess
160715166b try at using bloom filters
leaks memory
2012-03-12 02:39:25 -04:00
Joey Hess
89ee70c43a status: More accurate display of sizes of tmp and bad keys.
Can't trust the key size to be accurate for tmp and bad keys, so check
actual file size. In the wild I saw the old code be wrong by a factor
of about 100!

If all tmp/bad keys are empty, they're not shown in status at all.
Showing 0 bytes and suggesting to clean it up seemed weird..
2012-03-12 00:41:48 -04:00
Joey Hess
83bbb3bc93 prettify 2012-03-11 21:21:51 -04:00
Joey Hess
5df18b311a avoid needing to keep list of present keys
Stale and bad files are rare, so it's more efficient to use inAnnex to see
if they can be deleted, rather than keeping the list of all present keys
around for them.
2012-03-11 20:46:03 -04:00
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
b086e32c63 unused: Reduce memory usage significantly.
Much of the memory bloat turned out to be due to getKeysReferenced
containing a mapM, which is strict and buffered the whole list
rather than streaming it.

The other half of the bloat was due to building a temporary Set
in order to call S.difference. While that is more cpu efficient,
I switched to successive S.delete, since with it, I can run a whole
git annex unused in less than 8 mb of memory.

The whole Set of keys with content available is still stored in memory,
so running unused in a repo with a whole lot of file content will still
use more memory. In a repo containing 6000 files, it needed 40 mb.

Note that the status command still uses the bloatful getKeysReferenced.
2012-03-11 16:24:07 -04:00
Joey Hess
997e29f294 sync: Sync to lower cost remotes first.
This has two benefits.

1. When a lot of refs are going to be received, get them via lower cost
   connection when possible.
2. Allows ctrl-c of sync after the cheaper remotes have been pulled from
   (or pushed to).
2012-03-10 15:37:38 -04:00
Joey Hess
5ab82230f7 fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. 2012-03-10 14:46:21 -04:00
Joey Hess
dc9049373e cleanup 2012-03-06 14:12:15 -04:00
Joey Hess
1098bc37ab "here" can be used to refer to the current repository, which can read better than the old "." (which still works too). 2012-03-01 22:35:10 -04:00