Commit graph

744 commits

Author SHA1 Message Date
Joey Hess
71b5ad8398 wrote transfer thread
finally!
2012-07-05 14:34:20 -06:00
Joey Hess
3ea708e03b Merge branch 'master' into assistant 2012-07-02 15:45:20 -04:00
Joey Hess
760e028dca pass associatedfile and remoteuuid to git-annex-shell
This *almost* works.

Along the way, I noticed that the --uuid parameter was being accidentially
passed after the --, so that has never been actually used by
git-annex-shell to verify it's running in the expected repository. Oops. Fixed.
2012-07-02 10:57:51 -04:00
Joey Hess
bea0ac0274 record transfers for git-annex-shell
Not yet tested and places git-annex-shell is run need to be modified to
pass the new field settings.

Note that rsyncServerSend was changed to fork, rather than directly exec
rsync, because it needs to keep the transfer lock held, and clean up the
transfer log when done.
2012-07-02 01:31:10 -04:00
Joey Hess
7625319c2c Merge branch 'master' into assistant 2012-07-01 21:00:43 -04:00
Joey Hess
7225c2bfc0 record transfer information on local git remotes
In order to record a semi-useful filename associated with the key,
this required plumbing the filename all the way through to the remotes'
storeKey and retrieveKeyFile.

Note that there is potential for deadlock here, narrowly avoided.
Suppose the repos are A and B. A sends file foo to B, and at the same
time, B gets file foo from A. So, A locks its upload transfer info file,
and then locks B's download transfer info file. At the same time,
B is taking the two locks in the opposite order. This is only not a
deadlock because the lock code does not wait, and aborts. So one of A or
B's transfers will be aborted and the other transfer will continue.
Whew!
2012-07-01 17:15:11 -04:00
Joey Hess
e5fd8b67b7 get, move, copy: Now refuse to do anything when the requested file transfer is already in progress by another process.
Note this is per-remote, so trying to get the same file from multiple
remotes can still let duplicate downloads run. (And uploading the same file
to multiple remotes is not duplicate at all of course.)

get, move, and copy are the only git-annex subcommands that transfer
files, but there's still git-annex-shell recvkey and sendkey to deal with too.

I considered modifying retrieveKeyFile or getViaTmp, but they are called
by other code that does not involve expensive file transfers (migrate)
or that does file transfers that should not be checked by this (fsck --from).
2012-07-01 17:15:11 -04:00
Joey Hess
2e501364d4 Merge branch 'master' into assistant 2012-06-27 18:09:11 -04:00
Joey Hess
2d7ebc0582 typo 2012-06-27 18:08:52 -04:00
Joey Hess
8baff14054 Merge branch 'master' into assistant 2012-06-27 16:14:33 -04:00
Joey Hess
36ddb81df6 use "variant" rather than "version"
While this word may be less familiar to some users, it avoids the
connotation that version 2 is better than version 1, which is wrong
when the two variants were conflicting.
2012-06-27 16:09:17 -04:00
Joey Hess
054ddda18a better filenames for conflict resolution files 2012-06-27 16:03:42 -04:00
Joey Hess
9147ad7493 commit merge resolution
this is necessary so the sync can continue successfully with its push phase
2012-06-27 15:06:47 -04:00
Joey Hess
8810e57995 fix file name 2012-06-27 15:00:26 -04:00
Joey Hess
abd36ed336 don't automerge when the symlinks cannot be parsed as keys 2012-06-27 13:35:02 -04:00
Joey Hess
048b64024a sync: Automatically resolves merge conflicts.
untested, but it compiles :)
2012-06-27 13:08:32 -04:00
Joey Hess
051c68041b properly handle deleted files when processing ls-files --unmerged 2012-06-27 12:11:03 -04:00
Joey Hess
d88ee75a2d Merge branch 'master' into assistant 2012-06-23 10:27:12 -04:00
Joey Hess
c79e3b67e9 sync: Avoid recent git's interactive merge. 2012-06-23 10:22:56 -04:00
Joey Hess
e9630e90de the syncer now pushes out changes to remotes, in parallel
Note that, since this always pushes branch synced/master to the remote, it
assumes that master has already gotten all the commits that are on the
remote merged in. Otherwise, fast-forward prevention may prevent the push.

That's probably ok, because the next stage is to automatically detect
incoming pushes and merge.
2012-06-22 15:49:48 -04:00
Joey Hess
3ee44cf8fe add assistant command
like watch, but more magic
2012-06-22 13:04:03 -04:00
Joey Hess
e0fdfb2e70 maintain set of files pendingAdd
Kqueue needs to remember which files failed to be added due to being open,
and retry them. This commit gets the data in place for such a retry thread.

Broke KeySource out into its own file, and added Eq and Ord instances
so it can be stored in a Set.
2012-06-20 16:31:46 -04:00
Joey Hess
483b1b08c6 Merge branch 'master' into watch 2012-06-20 13:15:59 -04:00
Joey Hess
dfccee2616 unused: Fix crash when file names contain invalid utf8.
Was decoding the git-cat-file of the symlink target as utf8, but that can't
do, unix filenames are from the 70's and need this shiny disco
fileSystemEncoding.
2012-06-20 12:57:00 -04:00
Joey Hess
57cf65eb6d fix kevent symlink creation 2012-06-19 02:40:21 -04:00
Joey Hess
3dac81d345 remove newly created tmp file before linking 2012-06-15 22:19:12 -04:00
Joey Hess
e32dda07ca better temp file handling 2012-06-15 22:16:00 -04:00
Joey Hess
1bae56e4a0 tweak 2012-06-15 22:06:59 -04:00
Joey Hess
53d2e81ffd Merge branch 'master' into watch 2012-06-15 15:20:11 -04:00
Joey Hess
ca9d94a0ad addurl: Was broken by a typo introduced 2 released ago, now fixed. Closes: #677576 2012-06-14 20:20:03 -04:00
Joey Hess
e0095b0bdc fishy commit 2012-06-14 00:01:48 -04:00
Joey Hess
ccc5005245 reorganize 2012-06-13 12:46:39 -04:00
Joey Hess
c31ddeda84 optimise link staging at startup
Now it starts really, really fast! Down from 15 minutes or so on my big
tree to around 1 minute.

The trick is to remember the last time the daemon was running. Links with a
ctime from before that point don't need to be restaged on startup (as long
as they are correct), since the old daemon would have handled them already.

We also assume that if the daemon has never run before, any links that
already exist are good. The pre-commit hook fixes links, so this should be
a safe assumption.

Adds another MVar holding a DaemonStatus data structure. Also
allowed getting rid of the Annex.Fast hack. This data structure will
probably grow a lot of details about the daemon's status, that will
later be used by the webapp's UI.

The code to actually track when the daemon was last running is not written
yet. It's 3 am.
2012-06-13 02:56:16 -04:00
Joey Hess
12dbb9d1d0 plumb file status through to event handlers
The idea, not yet done, is to use this to detect when a file
has an old change time, and avoid expensive restaging of the file.

If git-annex watch keeps track of the last time it finished a full scan,
then any symlink that is older than that time must have been scanned
before, so need not be added. (Relying on moving, copying, etc of a file
all updating its change time.)

Anyway, this info is available for free since inotify already checks it,
so it might as well make it available.
2012-06-13 01:20:37 -04:00
Joey Hess
ab076b2e81 move comment 2012-06-13 00:57:48 -04:00
Joey Hess
7d458c40db tweak 2012-06-12 19:36:11 -04:00
Joey Hess
cb2255e93a do fewer commits during long batch jobs
10 thousand queue size does not use appreciable memory in my testing.
2012-06-12 16:25:56 -04:00
Joey Hess
b240418acc better optimisation of add check
Now really only done in the startup scan.

It turns out to be quite hard for event handlers to know when the startup
scan is complete. I tried to make addWatch pass that info, but found
threading the state very difficult. For now, a quick hack, using the fast
flag.

Note that it's actually possible for inotify events to come in while the
startup scan is still ongoing. Due to my hack, the expensive check will
be done for files added in such inotify events.
2012-06-12 16:24:06 -04:00
Joey Hess
7d2c813396 fix bug that turned files already in git into symlinks
This requires a relatively expensive test at file add time to see if it's
in git already. But it can be optimised to only happen during the startup
scan.
2012-06-12 15:57:24 -04:00
Joey Hess
535d9e4998 add a flag indicating if an event was synthesized during initial dir scan 2012-06-12 14:34:09 -04:00
Joey Hess
d3b9b32f21 cleanup 2012-06-12 13:54:00 -04:00
Joey Hess
942d8f7298 hlint 2012-06-12 11:32:06 -04:00
Joey Hess
d3a6f04abf update 2012-06-11 15:41:26 -04:00
Joey Hess
7f3934520a avoid using STM while the MVar is held
I thought this might be a lock conflict that explains the deadlock when
built with -threaded, but it seems not.. it still locks! It even locks
without the committer thread.

Indeed, it locks when running "git annex add"! -threaded is exposing some
other problem.

Still, this seems conceptually cleaner and did not add any inneficiencies.
Also added some high-level documentation about the threads used.
2012-06-11 15:29:11 -04:00
Joey Hess
f7dbcd58ff tweak 2012-06-11 14:24:13 -04:00
Joey Hess
a5a3cd55ac Merge branch 'master' into watch
Conflicts:
	debian/changelog
2012-06-11 12:13:07 -04:00
Joey Hess
7f70767bfb uninit: Refuse to run in a subdirectory. Closes: #677076 2012-06-11 10:33:58 -04:00
Joey Hess
d0a0a6ae21 git annex watch --stop 2012-06-11 02:01:20 -04:00
Joey Hess
0b3e2bed78 add a pid file
Writes pid to a file. Is supposed to take an exclusive lock, but that's not
working, and it's too late for me to understand why.
2012-06-11 01:20:19 -04:00
Joey Hess
d5884388b0 daemonize git annex watch 2012-06-11 00:39:09 -04:00
Joey Hess
ca9ee21bd7 crazy optimisation
Crazy like a fox..
2012-06-10 19:58:34 -04:00
Joey Hess
c1b432ee54 run git add --update after inotify is started
This way, there's no window where deleted files won't be noticed.
2012-06-10 19:10:18 -04:00
Joey Hess
aae0ba1995 fixed the double commits problem 2012-06-10 18:41:05 -04:00
Joey Hess
fc0dd79774 avoid running pre-commit hook from watch commits 2012-06-10 17:53:17 -04:00
Joey Hess
cda6c4dff5 tweak 2012-06-10 17:40:35 -04:00
Joey Hess
2de50f733a smart commit thread
The commit thread now has access to a channel containing the times of
all uncommitted changes. This lets it be smart about detecting busy times
when a batch job is running (such as rm -rf, or untarring something, etc),
and avoid committing until it's done. While at the same time, instantly
committing one-off changes that the user is going to expect to see
immediately.

I had to use STM to implement the channel, because of
http://hackage.haskell.org/trac/ghc/ticket/4154
While this adds a dependency, I always wanted to use STM, so this actually
makes me happy. ;)

Also happy that shouldCommit is a pure function, so other commit smartness
strategies can easily be played with. Although the current one seems pretty
good.

There is one bug, for some reason it does double commits, every time.
2012-06-10 16:07:48 -04:00
Joey Hess
6e54907e35 add a thread to commit changes
Currently the stupidest possible version, just wakes up every second,
and may make empty commits sometimes.
2012-06-10 13:56:39 -04:00
Joey Hess
e5f855b7f8 generalize and improve state MVar code 2012-06-10 13:23:10 -04:00
Joey Hess
5308b51ec0 stage deletions directly using update-index
no need to run git-rm separately
2012-06-10 13:05:58 -04:00
Joey Hess
7f823b56af fix non-linux build 2012-06-09 14:06:56 -04:00
Joey Hess
d45a9a7831 refactor and function name cleanup
(oops, I had a calcMerge and a calc_merge!)
2012-06-08 00:29:39 -04:00
Joey Hess
7d78cbf97c use git queue for rm too 2012-06-07 21:17:10 -04:00
Joey Hess
20f425be19 make watch use the queue
May not work. Certianly needs to flush the queue from time to time
when only symlink changes are being made.
2012-06-07 15:40:44 -04:00
Joey Hess
0a11b35d89 extend Git.Queue to be able to queue more than simple git commands
While I was in there, I noticed and fixed a bug in the queue size
calculations. It was never encountered only because Queue.add was
only ever run with 1 file in the list.
2012-06-07 15:19:44 -04:00
Joey Hess
727158ff55 Merge branch 'master' into watch 2012-06-07 13:48:55 -04:00
Joey Hess
4d1c114e4d initremote: Automatically describe a remote when creating it.
This ensures that all special remotes show up in git annex status.
Before, a special remote that was not manually described, and was not
a current git remote, did not show up there, although initremote did list
it.
2012-06-07 11:16:48 -04:00
Joey Hess
d5de27ff40 tweak 2012-06-06 23:30:38 -04:00
Joey Hess
b8ae9528ab refactor 2012-06-06 23:20:09 -04:00
Joey Hess
b8f85f7a82 build watch on non-linux, just don't do anything 2012-06-06 22:49:32 -04:00
Joey Hess
c5b11561f0 handle running out of watch descriptors 2012-06-06 16:50:28 -04:00
Joey Hess
db8effb8f3 ignore .gitignore and .gitattributes 2012-06-06 15:50:12 -04:00
Joey Hess
b819f644ad close the git add race
There's a race adding a new file to the annex: The file is moved to the
annex and replaced with a symlink, and then we git add the symlink. If
someone comes along in the meantime and replaces the symlink with
something else, such as a new large file, we add that instead. Which could
be bad..

This race is fixed by avoiding using git add, instead the symlink is
directly staged into the index.

It would be nice to make `git annex add` use this same technique.
I have not done so yet because it currently runs git update-index once per
file, which would slow does `git annex add`. A future enhancement would be
to extend the Git.Queue to include the ability to run update-index with
a list of Streamers.
2012-06-06 14:29:10 -04:00
Joey Hess
993e6459a3 factor out nukeFile 2012-06-06 13:13:13 -04:00
Joey Hess
723eb19bbf split out utility functions 2012-06-06 13:07:30 -04:00
Joey Hess
a7a729bce4 Merge branch 'master' into watch 2012-06-05 20:30:37 -04:00
Joey Hess
c981ccc077 add: Prevent (most) modifications from being made to a file while it is being added to the annex.
Anything that tries to open the file for write, or delete the file,
or replace it with something else, will not affect the add.

Only if a process has the file open for write before add starts
can it still change it while (or after) it's added to the annex.
(fsck will catch this later of course)
2012-06-05 20:28:34 -04:00
Joey Hess
5809f33f8b use createAnnexDirectory when setting up tmp dir 2012-06-05 20:25:32 -04:00
Joey Hess
d3cee987ca separate source of content from the filename associated with the key when generating a key
This already made migrate's code a lot simpler.
2012-06-05 19:51:03 -04:00
Joey Hess
cbdaccd44a run event handlers all in the same Annex monad
Uses a MVar again, as there seems no other way to thread the state through
inotify events.

This is a rather unsatisfactory result. I had wanted to run them in
the same monad so that the git queue could be used to coleasce git commands
and speed things up. But, that led to fragility: If several files are
added, and one is removed before queue flush, git add will fail to add
any of them. So, the queue is still explicitly flushed after each add for
now.

TODO: Investigate using git add --ignore-errors. This would need to be done
in Command.Add. And, git add still exits nonzero with it, so would need
to avoid crashing on queue flush.
2012-06-04 21:21:52 -04:00
Joey Hess
48efa2d2d3 avoid explicit queue flush
The queue is still flushed on add, because each add event is handled by a
separate Annex monad. That needs to be fixed to speed up add a lot.
2012-06-04 20:44:15 -04:00
Joey Hess
bd7857d903 ignore-unmatch when removing a staged file
When a file is added, and then deleted before the add action runs,
the delete event was unhappy that the file never did get staged.
2012-06-04 20:13:25 -04:00
Joey Hess
cbf16f1967 refactor 2012-06-04 19:43:29 -04:00
Joey Hess
ec98581112 notice deleted files on startup 2012-06-04 18:14:42 -04:00
Joey Hess
5b4e5ce7e5 deletion
When a new file is annexed, a deletion event occurs when it's moved away
to be replaced by a symlink. Most of the time, there is no problimatic
race, because the same thread runs the add event as the deletion event.
So, once the symlink is in place, the deletion code won't run at all,
due to existing checks that a deleted file is really gone.

But there is a race at startup, as then the inotify thread is running
at the same time as the main thread, which does the initial tree walking
and annexing. It would be possible for the deletion inotify to run
in a perfect race with the addition, and remove the newly added symlink
from the git cache.

To solve this race, added event serialization via a MVar. We putMVar
before running each event, which blocks if an event is already running.
And when an event finishes (or crashes!), we takeMVar to free the lock.

Also, make rm -rf not spew warnings by passing --ignore-unmatch when
deleting directories.
2012-06-04 18:09:18 -04:00
Joey Hess
659e6b1324 suppress "recording state in git" message during add 2012-06-04 17:18:54 -04:00
Joey Hess
677ad74687 add handling of symlink addition events
And just like that, annexed files can be moved and copies around within
the tree, and are automatically fixed to point to the content, and staged
in git. Huzzah!

Delete still remains TODO, with its troublesome race during add..
2012-06-04 15:10:43 -04:00
Joey Hess
7053f5f947 handle directory deletion
When a directory is deleted, or moved away, git rm -r it to stage
the deletion.
2012-06-04 13:30:30 -04:00
Joey Hess
23dbff4b43 add events for symlink creation and directory removal
Improved the inotify code, so it will also notice directory removal
and symlink creation.

In the watch code, optimised away a stat of a file that's being added,
that's done by Command.Add.start. This is the reason symlink creation is
handled separately from file creation, since during initial tree walk
at startup, a stat was already done, and can be reused.
2012-06-04 13:22:56 -04:00
Joey Hess
eab3872d91 Merge branch 'master' into watch 2012-06-04 12:07:59 -04:00
Joey Hess
3a10095d40 import: New subcommand, pulls files from a directory outside the annex and adds them
Use case for this was developed somewhere on the Transiberian Railroad.
2012-05-31 19:47:18 -04:00
Joey Hess
65977a5584 lock: Reset unlocked file to index, rather than to branch head.
Resetting an unlocked file to the branch head failed if it had just been
added, not committed, and unlocked, since the branch didbn't have it.

The code was concerned about dropping any changes that might be staged in the
index, but I cannot see why.
2012-05-30 17:01:22 -04:00
Joey Hess
6e213d04f1 sync: Show a nicer message if a user tries to sync to a special remote. 2012-05-27 20:55:56 -04:00
Joey Hess
bb4f31a0ee Clean up handling of git directory and git worktree.
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.

This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.

A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.

Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)

Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.

One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
2012-05-18 17:03:12 -04:00
Joey Hess
f7d8982672 Fix use of several config settings
annex.ssh-options, annex.rsync-options, annex.bup-split-options.

And adjust types to avoid the bugs that broke several config settings
recently. Now "annex." prefixing is enforced at the type level.
2012-05-05 20:16:56 -04:00
Joey Hess
392931eca9 addunused: New command, the opposite of dropunused, it relinks unused content into the git repository. 2012-05-02 14:59:05 -04:00
Joey Hess
8f45300479 dropunused: Allow specifying ranges to drop.
Sort of by popular demand, but the last straw for not using seq
was that it can run into command line length limits.
2012-05-02 13:15:19 -04:00
Joey Hess
0c9c14b52f percentage library 2012-04-29 17:48:07 -04:00
Joey Hess
d2bfba6324 show percent the bloom filter is full 2012-04-29 16:10:47 -04:00
Joey Hess
eedde34549 show amount of reserved space 2012-04-23 10:37:05 -04:00
Joey Hess
84ac8c58db Add annex.httpheaders and annex.httpheader-command config settings
Allow custom headers to be sent with all HTTP requests.

(Requested by the Internet Archive)
2012-04-22 01:13:09 -04:00
Joey Hess
ed79596b75 noop 2012-04-21 23:32:33 -04:00
Joey Hess
7e45712d19 better file mode setting code 2012-04-21 16:01:56 -04:00
Joey Hess
b4a5e39ee6 Support git's core.sharedRepository configuration
This is incomplete, it does not honor it yet for hash directories
and other annex bookkeeping files. Some of that is not needed for a bare
repo; some of it may be.
2012-04-21 15:36:52 -04:00
Joey Hess
262017e17d export a more generalized checkDiskSpace 2012-04-20 16:06:10 -04:00
Joey Hess
d5ffd2d99d watch subcommand
So far this only handles auto-annexing new files that are created inside
the repository while it's running. To make this really useful,
it needs to at least:

- notice deleted files and stage the deletion
  (tricky; there's a race with add..)
- notice renamed files, auto-fix the symlink, and stage the new file location
- periodically auto-commit staged changes
- honor .gitignore, not adding files it excludes

Also nice to have would be:

- Somehow sync remotes, possibly using a push sync like dvcs-autosync
  does, so they are immediately updated.
- Somehow get content that is unavilable. This is problimatic with inotify,
  since we only get an event once the user has tried (and failed) to read
  from the file. Perhaps instead, automatically copy content that is added
  out to remotes, with the goal of all repos eventually getting a copy,
  if df allows.
- Drop files that have not been used lately, or meet some other criteria
  (as long as there's a copy elsewhere).
- Perhaps automatically dropunused files that have been deleted,
  although I cannot see a way to do that, since by the time the inotify
  deletion event arrives, the file is deleted, and we cannot see what
  its symlink pointed to! Alternatievely, perhaps automatically
  do an expensive unused/dropunused cleanup process.

Some of this probably needs the currently stateless threads to maintain
a common state.
2012-04-12 17:42:05 -04:00
Joey Hess
fcc08c59ec use unabbreviated size units in status 2012-04-06 14:54:41 -04:00
Joey Hess
e38a839a80 Rewrote free disk space checking code
Moving the portability handling into a small C library cleans up things
a lot, avoiding the pain of unpacking structs from inside haskell code.
2012-03-22 17:32:47 -04:00
Joey Hess
f1398b5583 use new getConfig 2012-03-22 17:32:47 -04:00
Joey Hess
4eb5112681 rationalize getConfig
getConfig got a remote-specific config, and this confusing name caused it
to be used a couple of places that only were interested in global configs.
Rename to getRemoteConfig and make getConfig only get global configs.

There are no behavior changes here, but remote.<name>.annex-web-options
never actually worked (and per-remote web options is a very unlikely to be
useful case so I didn't make it work), so fix the documentation for it.
2012-03-22 17:32:47 -04:00
Joey Hess
52b90e5d4c tweak 2012-03-22 17:32:47 -04:00
Joey Hess
188e2edc41 status: Prints available local disk space, or shows if git-annex doesn't know. 2012-03-21 21:55:02 -04:00
Joey Hess
a362c46b70 fun with symbols
Nothing at all on hackage is using <&&> or <||>.

(Also, <&&> should short-circuit on failure.)
2012-03-17 00:38:40 -04:00
Joey Hess
771052a85e optimize monadic ||
(||) used applicative style runs both conditions rather than short
circuiting. Add an orM that properly short-circuits.
2012-03-16 12:28:17 -04:00
Joey Hess
60ab3d84e1 added ifM and nuked 11 lines of code
no behavior changes
2012-03-14 17:43:34 -04:00
Joey Hess
342fc28437 Merge branch 'master' into bloom
Conflicts:
	Command/Commit.hs
	debian/changelog
2012-03-14 12:41:48 -04:00
Joey Hess
6cb4743cfb ignore hook exit status 2012-03-14 12:41:00 -04:00
Joey Hess
5b869eef91 git-annex-shell: Runs hooks/annex-content after content is received or dropped. 2012-03-14 12:18:10 -04:00
Joey Hess
caf97fcffd git-annex-shell: Runs hooks/annex-content after content is received or dropped. 2012-03-14 12:01:56 -04:00
Joey Hess
94aff8b878 Merge branch 'master' into bloom
Conflicts:
	debian/changelog
2012-03-12 16:32:29 -04:00
Joey Hess
25809ce2e0 finish bloom filters
Add tuning, docs, etc.

Not sure if status is the right place to remote size.. perhaps unused
should report the size and also warn if it sees more keys than the bloom
filter allows?
2012-03-12 16:18:35 -04:00
Joey Hess
faf3a94fa7 added second stage bloom filter 2012-03-12 15:21:58 -04:00
Joey Hess
32f9742a88 fixed bloom filter creation space leak
it works!
2012-03-12 14:09:43 -04:00
Joey Hess
160715166b try at using bloom filters
leaks memory
2012-03-12 02:39:25 -04:00
Joey Hess
89ee70c43a status: More accurate display of sizes of tmp and bad keys.
Can't trust the key size to be accurate for tmp and bad keys, so check
actual file size. In the wild I saw the old code be wrong by a factor
of about 100!

If all tmp/bad keys are empty, they're not shown in status at all.
Showing 0 bytes and suggesting to clean it up seemed weird..
2012-03-12 00:41:48 -04:00
Joey Hess
83bbb3bc93 prettify 2012-03-11 21:21:51 -04:00
Joey Hess
5df18b311a avoid needing to keep list of present keys
Stale and bad files are rare, so it's more efficient to use inAnnex to see
if they can be deleted, rather than keeping the list of all present keys
around for them.
2012-03-11 20:46:03 -04:00
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
b086e32c63 unused: Reduce memory usage significantly.
Much of the memory bloat turned out to be due to getKeysReferenced
containing a mapM, which is strict and buffered the whole list
rather than streaming it.

The other half of the bloat was due to building a temporary Set
in order to call S.difference. While that is more cpu efficient,
I switched to successive S.delete, since with it, I can run a whole
git annex unused in less than 8 mb of memory.

The whole Set of keys with content available is still stored in memory,
so running unused in a repo with a whole lot of file content will still
use more memory. In a repo containing 6000 files, it needed 40 mb.

Note that the status command still uses the bloatful getKeysReferenced.
2012-03-11 16:24:07 -04:00
Joey Hess
997e29f294 sync: Sync to lower cost remotes first.
This has two benefits.

1. When a lot of refs are going to be received, get them via lower cost
   connection when possible.
2. Allows ctrl-c of sync after the cheaper remotes have been pulled from
   (or pushed to).
2012-03-10 15:37:38 -04:00
Joey Hess
5ab82230f7 fsck: Fix up any broken links and misplaced content caused by the directory hash calculation bug fixed in the last release. 2012-03-10 14:46:21 -04:00
Joey Hess
dc9049373e cleanup 2012-03-06 14:12:15 -04:00
Joey Hess
1098bc37ab "here" can be used to refer to the current repository, which can read better than the old "." (which still works too). 2012-03-01 22:35:10 -04:00
Joey Hess
2fd294d06f move --from, copy --from: 10 times faster scanning remote on local disk
Rather than go through the location log to see which files are present on
the remote, it simply looks at the disk contents directly.

I benchmarked this speeding up scanning 834 files, from an annex on my
phone's SSD, from 11.39 seconds to 1.31 seconds. (No files actually moved.)

Also benchmarked 8139 files, from an annex on spinning storage,
speeding up from 103.17 to 13.39 seconds.

Note that benchmarking with an encrypted annex on flash actually showed a
minor slowdown with this optimisation -- from 13.93 to 14.50 seconds. Seems
the overhead of doing the crypto needed to get the filenames to directly
check can be higher than the overhead of looking up data in the location
log. (Which says good things about how well the location log and git have
been optimised!) It *may* make sense to make encrypted local remotes not
have hasKeyCheap set; further benchmarking is called for.
2012-02-26 14:59:48 -04:00
Joey Hess
a3c9d06a26 add git-annex-shell commit
Eventually, git-annex might try running this after making changes to
a remote. I have not yet thought of a good way for it to tell which
remotes it needs to run it on though. It can't just do it when
shutting down a cached ssh connection, because ssh connection caching
is optional, and that would not handle local remotes not accessed over ssh
either.
2012-02-25 16:47:28 -04:00
Joey Hess
1f73db3469 improve alwayscommit=false mode
Now changes are staged into the branch's index, but not committed,
which avoids growing a large journal. And sync and merge always
explicitly commit, ensuring that even when they do nothing else,
they commit the staged changes.

Added a flag file to indicate that the branch's journal contains
uncommitted changes. (Could use git ls-files, but don't want to run
that every time.)

In the future, this ability to have uncommitted changes staged in the
journal might be used on remotes after a series of oneshot commands.
2012-02-25 16:18:55 -04:00
Joey Hess
779ec91908 more robustness fixes 2012-02-18 12:08:02 -04:00
Joey Hess
abd50e01fb don't fail with --pathdepth when file already exists 2012-02-18 12:05:13 -04:00
Joey Hess
00340dfe49 don't error out entirely if an url cannot be downloaded 2012-02-18 11:44:21 -04:00
Joey Hess
1ed5e4d9e3 variable name 2012-02-17 00:21:35 -04:00
Joey Hess
f3c75b601f reorg 2012-02-17 00:19:47 -04:00
Joey Hess
ba5515d422 reorder for clarity 2012-02-16 22:38:08 -04:00
Joey Hess
156a631f63 make Migrate use ReKey rather than the other way around
as ReKey is plumbing, this makes sense
2012-02-16 22:36:56 -04:00
Joey Hess
69a0161c3a fix filename limit when using --pathdepth 2012-02-16 19:37:02 -04:00
Joey Hess
db6b4cdfcf rekey: New plumbing level command, can be used to change the keys used for files en masse. 2012-02-16 16:36:35 -04:00
Joey Hess
d05550e803 zero still bad 2012-02-16 14:28:54 -04:00
Joey Hess
346c934409 allow pathdepth to drop from the front or take from the end (negative) 2012-02-16 14:26:53 -04:00
Joey Hess
c2245260b1 improve usage 2012-02-16 12:37:30 -04:00
Joey Hess
39c3f56b33 addurl: Add --pathdepth option. 2012-02-16 12:25:19 -04:00
Joey Hess
a86d937b5b avoid too long filename when making up a filename for addurl too 2012-02-16 02:09:09 -04:00
Joey Hess
a1e52f0ce5 hlint 2012-02-16 00:44:51 -04:00
Joey Hess
e7aaa55c53 create parent directories as needed for addurl --file 2012-02-16 00:05:49 -04:00
Joey Hess
90a8b38ac0 set oneshot mode on a per-command basis
Avoids ugly (and test suite failing) hack in Command.Version
2012-02-14 12:40:40 -04:00
Joey Hess
2f1f1e6b13 avoid version saving state
This is not the place to commit journal files.
2012-02-14 10:59:48 -04:00
Joey Hess
cb631ce518 whereis: Prints the urls of files that the web special remote knows about. 2012-02-14 03:49:48 -04:00
Joey Hess
cbaebf538a rework git check-attr interface
Now gitattributes are looked up, efficiently, in only the places that
really need them, using the same approach used for cat-file.

The old CheckAttr code seemed very fragile, in the way it streamed files
through git check-attr.
I actually found that cad8824852
was still deadlocking with ghc 7.4, at the end of adding a lot of files.
This should fix that problem, and avoid future ones.

The best part is that this removes withAttrFilesInGit and withNumCopies,
which were complicated Seek methods, as well as simplfying the types
for several other Seek methods that had a Backend tupled in.
2012-02-13 23:52:21 -04:00
Joey Hess
a3ebf16e62 also verify new urls when adding them to existing files 2012-02-10 19:40:54 -04:00
Joey Hess
17fed709c8 addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key. 2012-02-10 19:23:46 -04:00
Joey Hess
1c0bd81ba6 addurl: Normalize badly encoded urls. 2012-02-09 14:19:58 -04:00
Joey Hess
ac97454659 improve error message 2012-02-08 15:49:42 -04:00
Joey Hess
ef013506cb addurl: Added a --file option
Can be used to specify what file the url is added to. This can be used to
override the default filename that is used when adding an url, which is
based on the url. Or, when the file already exists, the url is recorded as
another location of the file.
2012-02-08 15:35:29 -04:00
Joey Hess
a81297065d use "known" instead of "visible"
I think it's clearer, also it's the same length as "local" :)
2012-02-06 20:42:49 -04:00
Joey Hess
90ab17e153 remove old comment 2012-02-04 16:34:13 -04:00
Joey Hess
f1c7dc1212 fix touch and statfs to work on any files in any locale
Use withCAString rather than withCString.

XXX Actually, this only works in non-unicode locales when presented with
unicode characters. Help?
2012-02-04 12:44:51 -04:00
Joey Hess
44b115e0b1 Merge branch 'master' into ghc7.4
Conflicts:
	Utility/Misc.hs
2012-02-03 16:48:40 -04:00
Joey Hess
146c36ca54 IO exception rework
ghc 7.4 comaplains about use of System.IO.Error to catch exceptions.
Ok, use Control.Exception, with variants specialized to only catch IO
exceptions.
2012-02-03 16:47:24 -04:00
Joey Hess
d8fb97806c support all filename encodings with ghc 7.4
Under ghc 7.4, this seems to be able to handle all filename encodings
again. Including filename encodings that do not match the LANG setting.
I think this will not work with earlier versions of ghc, it uses some ghc
internals.

Turns out that ghc 7.4 has a special filesystem encoding that it uses when
reading/writing filenames (as FilePaths). This encoding is documented
to allow  "arbitrary undecodable bytes to be round-tripped through it".

So, to get FilePaths from eg, git ls-files, set the Handle that is reading
from git to use this encoding. Then things basically just work.

However, I have not found a way to make Text read using this encoding.
Text really does assume unicode. So I had to switch back to using String
when reading/writing data to git. Which is a pity, because it's some
percent slower, but at least it works.

Note that stdout and stderr also have to be set to this encoding, or
printing out filenames that contain undecodable bytes causes a crash.
IMHO this is a misfeature in ghc, that the user can pass you a filename,
which you can readFile, etc, but that default, putStr of filename may
cause a crash!

Git.CheckAttr gave me special trouble, because the filenames I got back
from git, after feeding them in, had further encoding breakage.
Rather than try to deal with that, I just zip up the input filenames
with the attributes. Which must be returned in the same order queried
for this to work.

Also of note is an apparent GHC bug I worked around in Git.CheckAttr. It
used to forkProcess and feed git from the child process.  Unfortunatly,
after this forkProcess, accessing the `files` variable from the parent
returns []. Not the value that was passed into the function. This screams
of a bad bug, that's clobbering a variable, but for now I just avoid
forkProcess there to work around it. That forkProcess was itself only added
because of a ghc bug, #624389. I've confirmed that the test case for that
bug doesn't reproduce it with ghc 7.4. So that's ok, except for the new ghc
bug I have not isolated and reported. Why does this simple bit of code
magnet the ghc bugs? :)

Also, the symlink touching code is currently broken, when used on utf-8
filenames in a non-utf-8 locale, or probably on any filename containing
undecodable bytes, and I temporarily commented it out.
2012-02-03 16:23:20 -04:00
Joey Hess
3d49258e5b attempt at a quick, utf-8 only fix to the ghc 7.4 problem
If you have only utf-8 filenames, and need to build git-annex with ghc 7.4,
this will work. But, it will crash on non-utf-8 filenames.
2012-02-01 16:16:08 -04:00
Joey Hess
a964012fc3 switch to the strict state monad
I had not realized what a memory leak the lazy state monad could be,
although I have not seen much evidence of actual leaking in git-annex.
However, if running git-annex on a great many files, this could matter.

The additional Utility.State.changeState adds even more strictness,
avoiding a problem I saw in github-backup where repeatedly modifying
state built up a huge pile of thunks.
2012-01-29 22:55:06 -04:00
Joey Hess
b81d662cbf Avoid repeated location log commits when a remote is receiving files.
Done by adding a oneshot mode, in which location log changes are written to
the journal, but not committed. Taking advantage of git-annex's existing
ability to recover in this situation.

This is used by git-annex-shell and other places where changes are made to
a remote's location log.
2012-01-28 15:41:52 -04:00
Joey Hess
61dbad505d fsck --from remote --fast
Avoids expensive file transfers, at the expense of checking file size
and/or contents.

Required some reworking of the remote code.
2012-01-20 13:23:11 -04:00
Joey Hess
f35a84fac7 use a different tmp file when fscking remote data
Since the content might be symlinked into place, it's not appropriate to
use withTmp here.
2012-01-19 16:56:07 -04:00
Joey Hess
06b0cb6224 add tmp flag parameter to retrieveKeyFile 2012-01-19 16:07:36 -04:00
Joey Hess
90319afa41 fsck --from
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.

(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)

Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.

If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
2012-01-19 15:24:05 -04:00
Joey Hess
d36525e974 convert fsckKey to a Maybe
This way it's clear when a backend does not implement its own fsck checks.
2012-01-19 13:51:30 -04:00
Joey Hess
abdacf58ed tweaks 2012-01-11 00:06:54 -04:00
Joey Hess
16e7178f20 reorg 2012-01-10 15:29:10 -04:00
Joey Hess
07cacbeee9 break module dependancy loop
A PITA but worth it to clean up the trust configuration code.
2012-01-10 13:32:38 -04:00
Joey Hess
7675b83efa map: Fix display of remote repos
A change to break local cycles made remote repos be dropped entirely.
2012-01-08 16:05:57 -04:00
Joey Hess
a35278430a log: Add --gource mode, which generates output usable by gource.
As part of this, I fixed up how log was getting the descriptions of
remotes.
2012-01-07 18:18:09 -04:00
Joey Hess
bdc49ddbdb typo 2012-01-07 00:45:01 -04:00
Joey Hess
dfa76069d4 reap zombies 2012-01-07 00:22:16 -04:00
Joey Hess
b8966433ef sped up git annex log rather a lot
See comment! Isn't git fun, always interesting approaches to optimise
things that seemed unfixably slow.
2012-01-07 00:15:01 -04:00
Joey Hess
945f56f348 cleanup
Broke out pure general functions etc.
2012-01-07 00:11:15 -04:00
Joey Hess
24b35113cf tweak 2012-01-06 23:43:18 -04:00
Joey Hess
64f9d00bed tweak 2012-01-06 21:51:39 -04:00
Joey Hess
2557bb8764 complete set of log options 2012-01-06 21:48:30 -04:00
Joey Hess
8e7de01047 log --before=date 2012-01-06 21:32:08 -04:00
Joey Hess
539f8c6f14 --boundry was not needed 2012-01-06 21:09:23 -04:00
Joey Hess
d8d72781af better data type 2012-01-06 18:58:35 -04:00
Joey Hess
3c88d57399 log --max-count=n 2012-01-06 17:48:02 -04:00
Joey Hess
078788a9e7 change log display
Including the file in the lines behaves better when limiting with --after,
since only files that changed in the time period are shown.

Still not fully happy with the line layout, but putting the +/- first
followed by the date seems a good change.
2012-01-06 17:36:13 -04:00
Joey Hess
9fb5f3edc7 log --after=date 2012-01-06 17:24:03 -04:00
Joey Hess
47646d44b7 use a zipper 2012-01-06 16:24:40 -04:00
Joey Hess
a3a9f87047 log: New command that displays the location log for file, showing each repository they were added to and removed from.
This needs to run git log on the location log files to get at all past
versions of the file, which tends to be a bit slow.

It would be possible to make a version optimised for showing the location
logs for every key. That would only need to run git log once, so would be
faster, but it would need to process an enormous amount of data, so
would not speed up the individual file case.

In the future it would be nice to support log --format. log --json also
doesn't work right yet.
2012-01-06 15:40:07 -04:00
Joey Hess
1f8a1058c9 tweak 2012-01-06 10:57:57 -04:00
Joey Hess
df21cbfdd2 look up --to and --from remote names only once
This will speed up commands like move and drop.
2012-01-06 04:06:13 -04:00
Joey Hess
0a36f92a31 more command-specific options
Made --from and --to command-specific options.

Added generic storage for values of command-specific options,
which allows removing some of the special case fields in AnnexState.

(Also added generic storage for command-specific flags, although there are
not yet any.)

Note that this storage uses a Map, so repeatedly looking up the same value
is slightly more expensive than looking up an AnnexState field. But, the
value can be looked up once in the seek stage, transformed as necessary,
and passed in a closure to the start stage, and this avoids that overhead.

Still, I'm hesitant to use this for things like force or fast flags.
It's probably best to reserve it for flags that are only used by a few
commands, or options like --from and --to that it's important only be
allowed to be used with commands that implement them, to avoid user
confusion.
2012-01-06 03:16:42 -04:00
Joey Hess
ad43f03626 per-command options
Finally commands can define their own options.

Moved --format and --print0 to be options only of find.
2012-01-05 23:11:07 -04:00
Joey Hess
a1aea174d7 fsck: Do backend-specific check before checking numcopies is satisfied.
This way, when a checksum check fails and the content is moved aside,
the numcopies check also warns if there are not enough copies.
2012-01-03 18:40:47 -04:00
Joey Hess
aa0882691b Added remote.name.annex-web-options configuration setting, which can be used to provide parameters to whichever of wget or curl git-annex uses (depends on which is available, but most of their important options suitable for use here are the same). 2012-01-02 14:20:20 -04:00