Commit graph

80 commits

Author SHA1 Message Date
Joey Hess
58c7b0a56d assistant: Always batch changes found in startup scan.
Batch detection is heuristic, so can sometimes fail. I observed one such
failure while starting up in a repository with 87000 files. After the first
several batches of ~5000 files, it fell out of batch mode, and never
re-entered it, and so made many more commits of a few files at a time
than necessary.

So, let's always use batch mode when in the startup scan. This avoids the
heuristic there, at least.

There is clearly also room to improve the heuristic. Possibly 10 files is
too high a bar to be found during a commit, on a system that can commit
quickly.
2013-12-16 16:16:19 -04:00
Joey Hess
2066e90421 avoid needing --force on windows despite no lsof
Note that I still need to think this through and make sure handling of open
files is safe. This is just for testing purposes.
2013-12-09 16:56:15 -04:00
Joey Hess
13108b7196 assistant: Notice on startup when the index file is corrupt, and auto-repair. 2013-11-13 14:27:17 -04:00
Joey Hess
a1b1b5ef52 moved code out of webapp
No code changes, aside from some changes to lifting in code that turned out
to be able to run in Assistant rather than Handler.
2013-10-26 16:58:16 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
635c9a1549 assistant: Detect stale git lock files at startup time, and remove them.
Extends the index.lock handling to other git lock files. I surveyed
all lock files used by git, and found more than I expected. All are
handled the same in git; it leaves them open while doing the operation,
possibly writing the new file content to the lock file, and then closes
them when done.

The gc.pid file is excluded because it won't affect the normal operation
of the assistant, and waiting for a gc to finish on startup wouldn't be
good.

All threads except the webapp thread wait on the new startup sanity checker
thread to complete, so they won't try to do things with git that fail
due to stale lock files. The webapp thread mostly avoids doing that kind of
thing itself. A few configurators might fail on lock files, but only if the
user is explicitly trying to run them. The webapp needs to start
immediately when the user has opened it, even if there are stale lock
files.

Arranging for the threads to wait on the startup sanity checker was a bit
of a bear. Have to get all the NotificationHandles set up before the
startup sanity checker runs, or they won't see its signal. Perhaps
the NotificationBroadcaster is not the best interface to have used for
this. Oh well, it works.

This commit was sponsored by Michael Jakl
2013-10-05 17:04:21 -04:00
Joey Hess
93dbb7842e watcher: Detect at startup time when there is a stale .git/lock, and remove it so it does not interfere with the automatic commits of changed files. 2013-10-03 16:57:21 -04:00
Joey Hess
3ac9c4e672 hlint 2013-10-02 22:59:07 -04:00
Joey Hess
b191d5c595 gitignore support for the assistant and watcher
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.

Thanks to Adam Spiers, for getting the necessary support into git for this.

A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.

However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.

Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!

So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.

(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)

This commit was sponsored by Francois Marier. Thanks!
2013-08-02 20:37:03 -04:00
Joey Hess
ed4febb170 remove debug print 2013-05-25 00:02:39 -04:00
Joey Hess
b8e5b9c645 test suite passes in direct mode
This fixes a bug with git annex add in direct mode. If some files already
existed in the tree pointing at the same key as a file that was just added,
and their content was not present, add neglected to copy the content to
those files.

I also changed the behavior of moveAnnex slightly: When content is moved
into the annex in direct mode, it does not overwrite any content already
present in direct mode files. That content may be modified after all.
2013-05-17 15:59:37 -04:00
Joey Hess
a9081ae473 optimise direct mode startup scan
A recent change made existing symlinks be re-staged. That does not need to
be done during the startup scan though.
2013-04-24 21:20:29 -04:00
Joey Hess
ebee93a837 get rid of need to run pre-commit hook when assistant commits in direct mode
That hook updates associated file bookkeeping info for direct mode.
But, everything already called addAssociatedFile when adding/changing a
file. It only needed to also call removeAssociatedFile when deleting a file,
or a directory.

This should make bulk adds faster, by some possibly significant amount.
Bulk removals may be a little slower, since it has to use catKeyFile now
on each removed file, but will still be faster than adds.
2013-04-24 18:04:59 -04:00
Joey Hess
b8e45ec9d7 refactoring and minor performance tweak 2013-04-24 17:46:46 -04:00
Joey Hess
04a27ad926 assistant: Bug fix to avoid annexing the files that git uses to stand in for symlinks on FAT and other filesystem not supporting symlinks.
also, blog for the day..
2013-04-10 19:57:26 -04:00
Joey Hess
f1b0a4b404 Use lower case hash directories for storing files on crippled filesystems, same as is already done for bare repositories.
* since this is a crippled filesystem anyway, git-annex doesn't use
  symlinks on it
* so there's no reason to use the mixed case hash directories that we're
  stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
  repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
  mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
  test case will recover and it will all just work.
2013-04-04 15:46:33 -04:00
Joey Hess
6e7842475b convert "./file" from inotify to just "file"
This just prettifies some display.
2013-04-02 16:20:23 -04:00
Joey Hess
38d61f934d Update working tree files fully atomically
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.

But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.

(It also probably avoids confusing displays in file managers.)
2013-04-02 15:02:00 -04:00
Joey Hess
8c52b20cc7 optimise last commit
Rather than re-adding a direct mode file unnecessarily when it's not
changed, just re-stage the symlink.
2013-04-02 12:58:56 -04:00
Joey Hess
31cbde8190 assistant: Fix bug that could cause direct mode files to be unstaged from git.
My test case for this bug is to have the assistant running and syncing to
a remote, and create a file in the annex. Then at the command line run
git annex drop. The assistant sees that the file is gone, sees it's a wanted
file, and downloads it from the remote.

With a directory special remote and a small file, I was seeing around 1
time in 3, a race where the file got unstaged from git after it got
downloaded.

Looking at what direct mode content managing code does in this case, it
deletes the symlink, and then adds the file content back. It would be
possible, sometimes, to avoid removing the symlink and do this atomically.
And I probably should.. but in some cases, particularly where the file
needs to be run through `cp` (multiple direct mode files with same
content), there's no way to atomically replace the symlink with the
content.

Anyway, the bug turns out to be something that the watcher does right for
indirect mode, but not for direct mode. When it got an add event, it
checked to see if this was a new file, or one we've already added. In the
latter case, no add event was queued. But that means that only the rm event
is queued, and so it unstages the file.

Fixed by queueing an add event even when the file is already in git.

Tested by running hundreds of drops in a loop; file remained staged.
2013-04-02 12:45:31 -04:00
Joey Hess
5771cfce02 assistant: Check small files into git directly. 2013-03-29 16:54:59 -04:00
Joey Hess
67e817c6a1 New annex.largefiles setting, which configures which files git annex add and the assistant add to the annex.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.

A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
2013-03-29 16:17:13 -04:00
Joey Hess
f340fd324c synthesize RmChange when a directory is deleted
This gets directory renames closer to being fully detected.
There's close to no extra overhead to doing it this way.
2013-03-11 15:14:42 -04:00
Joey Hess
74f723bb50 let's put type modules under the parent module, not in a Types directory 2013-03-10 22:24:13 -04:00
Joey Hess
65a4c7966f moved transfer queueing out of watcher and into committer
This cleaned up the code quite a bit; now the committer just looks at the
Change to see if it's a change that needs to have a transfer queued for it.
If I later want to add dropping keys for files that were removed, or
something like that, this should make it straightforward.

This also fixes a bug. In direct mode, moving a file out of an archive
directory failed to start a transfer to get its content. The problem
was that the file had not been committed to git yet, and so the transfer
code didn't want to touch it, since fileKey failed to get its key.
Only starting transfers after a commit avoids this problem.
2013-03-10 18:16:03 -04:00
Joey Hess
c908672f3d fix another potential race with the watcher and direct mode
Watcher wants to rewrite symlink to fix it. But in direct mode, the symlink
could be replaced at any time with file content that has finished being
transferred by some other process. So, just don't touch it.

FWIW, I audited the rest of the assistant for places where it removes
files, and the rest is ok. I have not audited the rest of git-annex.
2013-03-04 15:09:32 -04:00
Joey Hess
1d388d5579 fixed the race breaking moving files from archive in direct mode
assistant: Fix bug in direct mode that could occur when a symlink is moved
out of an archive directory, and resulted in the file not being set to
direct mode when it was transferred.

The bug was that the direct mode mapping was not up-to-date when the
transferrer finished. So, finding no direct mode place to store the object,
it was put into .git/annex in indirect mode.

To fix this, just make the watcher update the direct mode mapping to
include the new file before it starts the transfer. (Seems we don't need to
update it to remove the old file if the link was moved, because the direct
mode code will notice it's not present and the mapping gets updated for its
removal later.)

The reason this was a race, and was probably not seen often is because
the committer came along and updated the direct mode mapping as part of
adding the moved symlink. But when the file was sufficiently small or
the remote sufficiently fast, this could happen after the transfer
finished.
2013-03-04 14:25:22 -04:00
Joey Hess
a733271a9c add additional debug info about reasons for drops 2013-03-01 15:58:44 -04:00
Joey Hess
46c9cbeb1e add additional debug info about reasons for transfers 2013-03-01 15:23:59 -04:00
Joey Hess
08854afa10 fix inverted logic 2013-02-22 17:01:48 -04:00
Joey Hess
d7c93b8913 fully support core.symlinks=false in all relevant symlink handling code
Refactored annex link code into nice clean new library.

Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:  ( liftIO $ createSymbolicLink linktarget file
  only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs:                createSymbolicLink link link
  test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/FromKey.hs:     liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/Indirect.hs:                    liftIO $ createSymbolicLink l f
  refuses to run if core.symlinks=false
Init.hs:                createSymbolicLink f f2
  test if symlinks can be made
Remote/Directory.hs:    go [file] = catchBoolIO $ createSymbolicLink file f >> return True
  fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs:          liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
  ditto
Upgrade/V1.hs:                          liftIO $ createSymbolicLink link f
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:		( liftIO $ catchMaybeIO $ readSymbolicLink file
  only when core.symlinks=true
Assistant/Threads/Watcher.hs:		ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
  code that fixes real symlinks when inotify sees them
  It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs:		mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
  ditto
Command/Fix.hs:	stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
  command only works in indirect mode
Upgrade/V1.hs:	getsymlink = takeFileName <$> readSymbolicLink file
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:

Assistant/Threads/SanityChecker.hs:                             | isSymbolicLink s -> addsymlink file ms
  only handles staging of symlinks that were somehow not staged
  (might need to be updated to support pseudolinks, but this is
  only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs:         if isSymbolicLink s || not (isRegularFile s)
  avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs:                            | isSymbolicLink s -> void $ flip whenAnnexed f $
  only allowed on systems that support symlinks
Command/Indirect.hs:            whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
  ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
  used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs:                    | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs:                                            | Files.isSymbolicLink s ->
Utility/INotify.hs:                             | Files.isSymbolicLink s ->
Utility/INotify.hs:                     checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs:              | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
  all above are lower-level, not relevant

Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:

Annex/Direct.hs:			| isSymLink (getmode item) =
  This is looking at git diff-tree objects, not files on disk
Command/Unused.hs:		| isSymLink (LsTree.mode l) = do
  This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
  low-level

Done!!
2013-02-17 16:43:14 -04:00
Joey Hess
a261412c25 close 2013-01-28 15:39:51 +11:00
Joey Hess
a8bb2749b2 assistant: Ignore .DS_Store on OSX. 2013-01-28 15:13:22 +11:00
Joey Hess
5cd152b8a9 annex.autocommit
New setting, can be used to disable autocommit of changed files by the
assistant, while it still does data syncing and other tasks.

Also wired into webapp UI
2013-01-27 22:43:05 +11:00
Joey Hess
76ddf9b6d3 webapp: Now allows restarting any threads that crash. 2013-01-26 17:09:33 +11:00
Joey Hess
d7ca6fb856 webapp: Now always logs to .git/annex/daemon.log
It used to not log to daemon.log when a repository was first created, and
when starting the webapp. Now both do. Redirecting stdout and stderr to the
log is tricky when starting the webapp, because the web browser may want to
communicate with the user. (Either a console web browser, or web.browser = echo)
This is handled by restoring the original fds when running the browser.
2013-01-15 13:34:59 -04:00
Joey Hess
aedfcde969 guard readSymbolicLink
throws an exception if the file is not a symlink
2013-01-05 16:07:27 -04:00
Joey Hess
1cdf2b923d assistant: Make expensive transfer scan work fully in direct mode.
The expensive scan uses lookupFile, but in direct mode, that doesn't work
for files that are present. So the scan was not finding things that are
present that need to be uploaded. (It did find things not present that
needed to be downloaded.)

Now lookupFile also works in direct mode. Note that it still prefers
symlinks on disk to info committed to git, in direct mode. This is
necessary to make things like Assistant.Threads.Watcher.onAddSymlink
work correctly, when given a new symlink not yet checked into git (or
replacing a file checked into git).
2013-01-05 15:57:53 -04:00
Joey Hess
8cc27b8afc avoid double commits with inotify when direct mode file is created 2012-12-29 14:58:13 -04:00
Joey Hess
bf3270c5b7 add missing modifyHook for watcher
Needed for FSEvents, which calls that hook for modified files.
inotify seems to call the add hook, so I didn't notice it before.
2012-12-28 16:00:45 -04:00
Joey Hess
eb40227d15 assistant direct mode file add/change bookkeeping
When a file is changed in direct mode, the old content is probably lost
(at least from the local repo), and bookeeping needs to be updated to
reflect this.

Also, synthetic add events are generated at assistant startup, so
make it detect when the file has not really changed, and avoid re-adding
it.

This does add the overhead of querying the runing git cat-file for the
key that's recorded in git for the file, each time a file is added or
modified in direct mode.
2012-12-25 15:48:15 -04:00
Joey Hess
cc5140d295 assistant adding of modified files in direct mode
Works with inotify, but I think in kqueue we don't get events
existing files that get modified.
2012-12-24 14:42:19 -04:00
Joey Hess
95db595e91 make startup scan for deleted files work in direct mode
git add --update cannot be used, because it'll stage typechanged direct
mode files. Intead, use ls-files to find deleted files, and stage them
ourselves.

It seems that no commit was made before when the scan staged deleted files.
(Probably masked since if files were added, a commit happened then..)
Now that I'm doing the staging, I was also able to fix that bug.
2012-12-24 14:24:13 -04:00
Joey Hess
c6d2bbe402 assistant adding of files in direct mode 2012-12-24 13:37:29 -04:00
Joey Hess
82617b92e9 move thirdparty program installation for standalone bundle into haskell program
This allows it to use Build.SysConfig to always install the programs
configure detected. Amoung other fixes, this ensures the right uuid
generator and checksum programs are installed.

I also cleaned up the handling of lsof's path; configure now checks for
it in PATH, but falls back to looking for it in sbin directories.
2012-12-14 16:07:59 -04:00
Joey Hess
463cf58140 webapp and assistant glacier support 2012-11-24 16:30:15 -04:00
Joey Hess
c282c8b492 queue uploads when a new or renamed symlink is handled 2012-11-24 15:38:24 -04:00
Joey Hess
93ffd47d76 finished pushing Assistant monad into all relevant files
All temporary and old functions are removed.
2012-10-30 17:14:51 -04:00
Joey Hess
47d94eb9a4 pushed Assistant monad down into DaemonStatus code
Currently have three old versions of functions that more reworking is
needed to remove: getDaemonStatusOld, modifyDaemonStatusOld_, and
modifyDaemonStatusOld
2012-10-30 15:39:15 -04:00
Joey Hess
ea8df8fe9f cleanup daemonStatus accessors 2012-10-30 14:44:18 -04:00