Commit graph

85 commits

Author SHA1 Message Date
Joey Hess
e4d7e2ebde fix for Windows file timestamp timezone madness
On Windows, changing the time zone causes the apparent mtime of files to
change. This confuses git-annex, which natually thinks this means the files
have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>).

Work around this stupidity, by using the inode sentinal file to detect if
the timezone has changed, and calculate a TSDelta, which will be applied
when generating InodeCaches.

This should add no overhead at all on unix. Indeed, I sped up a few
things slightly in the refactoring.

Seems to basically work! But it has a big known problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.

This commit was sponsored by Vincent Demeester.
2014-06-12 13:42:21 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
964a181026 try to drop unused object if it does not need to be transferred anywhere 2014-01-23 16:51:16 -04:00
Joey Hess
b7e3fe2ebd flip for clarity 2013-12-16 16:24:57 -04:00
Joey Hess
58c7b0a56d assistant: Always batch changes found in startup scan.
Batch detection is heuristic, so can sometimes fail. I observed one such
failure while starting up in a repository with 87000 files. After the first
several batches of ~5000 files, it fell out of batch mode, and never
re-entered it, and so made many more commits of a few files at a time
than necessary.

So, let's always use batch mode when in the startup scan. This avoids the
heuristic there, at least.

There is clearly also room to improve the heuristic. Possibly 10 files is
too high a bar to be found during a commit, on a system that can commit
quickly.
2013-12-16 16:16:19 -04:00
Joey Hess
9d323a98e2 avoid trying to use lsof when it's not in path and --forced 2013-12-04 17:39:44 -04:00
Joey Hess
03932212ec Avoid using git commit in direct mode, since in some situations it will read the full contents of files in the tree.
The assistant's commit code also always avoids git commit, for simplicity.
Indirect mode sync still does a git commit -a to catch unstaged changes.

Note that this means that direct mode sync no longer runs the pre-commit
hook or any other hooks git commit might call. The git annex pre-commit
hook action for direct mode is however explicitly run. (The assistant
already ran git commit with hooks disabled, so no change there.)
2013-12-01 13:59:45 -04:00
Joey Hess
3ac9c4e672 hlint 2013-10-02 22:59:07 -04:00
Joey Hess
98fc7e8a19 add, import, assistant: Better preserve the mtime of symlinks, when when adding content that gets deduplicated.
Note that this turned out to remove a syscall, not add any expense.
Otherwise, I would not have done it.
2013-09-25 16:07:11 -04:00
Joey Hess
672cfc3923 better git version checking 2013-08-02 18:32:26 -04:00
Joey Hess
869c638b82 assistant: Fix bug that caused it to stall when adding a very large number of files at once (around 5 thousand).
This bug was introduced in 82a6db8fe8,
which improved handling of adding very large numbers of files by ensuring
that a minimum number of max size commits (5000 files each) were done.

I accidentially made it wait for another change to appear after such a max
size commit, even if a lot of queued changes were already accumulated.
That resulted in a stall when it got to the end. Now fixed to not wait
any longer than necessary to ensure the watcher has had time to wake back
up after the max size commit.

This commit was sponsored by Michael Linksvayer. Thanks!
2013-07-27 17:42:18 -04:00
Joey Hess
ec4d974dcf assistant: Fix deadlock that could occur when adding a lot of files at once in indirect mode.
This is a laziness problem. Despite the bang pattern on newfiles, the list
was not being fully evaluated before cleanup was called. Moving cleanup out
to after the list is actually used fixes this.

More evidence that I should be using ResourceT or pipes, if any was needed.
2013-07-26 18:42:22 -04:00
Joey Hess
dba1e29949 webapp: Better display of added files. 2013-07-10 15:37:40 -04:00
Joey Hess
82a6db8fe8 committer tweak to wait for Watcher to resume after a max-size commit
Without this, a very large batch add has commits of sizes approx
5000, 2500, 1250, etc down to 10, and then starts over at 5000.
This fixes it so it's 5000+ every time.
2013-04-25 00:48:09 -04:00
Joey Hess
ebee93a837 get rid of need to run pre-commit hook when assistant commits in direct mode
That hook updates associated file bookkeeping info for direct mode.
But, everything already called addAssociatedFile when adding/changing a
file. It only needed to also call removeAssociatedFile when deleting a file,
or a directory.

This should make bulk adds faster, by some possibly significant amount.
Bulk removals may be a little slower, since it has to use catKeyFile now
on each removed file, but will still be faster than adds.
2013-04-24 18:04:59 -04:00
Joey Hess
cd7055631f batch commit every 5 thousand changes, not 10 thousand
There's a tradeoff between making less frequent commits, and
needing to use memory to store all the changes that are coming
in. At 10 thousand, it needs 150 mb of memory. 5 thousand drops
that down to 90 mb or so.

This also turns out to have significant imact on total run time.
I benchmarked 10k changes taking 27 minutes. But two 5k batches
took only 21 minutes.
2013-04-24 16:40:35 -04:00
Joey Hess
bda237f14a convert PendingAddChange back to Change when an add fails
If an add failed, we should lose the KeySource, since it, presumably,
differs due to a change that was made to the file.

(The locked down file is already deleted.)
2013-04-24 16:29:25 -04:00
Joey Hess
a929e6641a show one alert when bulk adding files
Turns out that a lot of the time spent in a bulk add was just updating the
add alert to rotate through each file that was added. Showing one alert
makes for a significant speedup.

Also, when the webapp is open, this makes it take quite a lot less cpu
during bulk adds.

Also, it lets the user know when a bulk add happened, which is sorta
nice..
2013-04-24 13:04:46 -04:00
Joey Hess
ca72b1ac7b assistant: when an add fails, requeue it for later
See analysis in bug report for one way this could happen.
2013-04-23 18:23:04 -04:00
Joey Hess
090a69f00c assistant: Work around misfeature in git 1.8.2 that makes git commit --alow-empty -m "" run an editor.
See http://git-annex.branchable.com/bugs/assistant_hangs_during_commit/
2013-04-18 16:27:17 -04:00
Joey Hess
602baae12e Bugfix: Direct mode no longer repeatedly checksums duplicated files.
Fixed by storing a list of cached inodes for a key, instead of just one.

Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)

Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
2013-04-06 16:07:25 -04:00
Joey Hess
f1b0a4b404 Use lower case hash directories for storing files on crippled filesystems, same as is already done for bare repositories.
* since this is a crippled filesystem anyway, git-annex doesn't use
  symlinks on it
* so there's no reason to use the mixed case hash directories that we're
  stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
  repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
  mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
  test case will recover and it will all just work.
2013-04-04 15:46:33 -04:00
Joey Hess
35a0ae334c assistant: Fix OSX bug that prevented committing changed files to a repository when in indirect mode. 2013-03-17 17:01:43 -04:00
Joey Hess
393340dc3b better handling of batch renames
Rather than wait a full second, which may be longer than needed, or too
short to get all the rename events, we start a mode where we wait 1/10th of
a second, and if there are Changes received, wait again. Basically we're
back in batch mode when this happens.
2013-03-11 15:46:09 -04:00
Joey Hess
14fcfced48 detect directory rename and wait up to 1 second to get all the changes 2013-03-11 15:24:13 -04:00
Joey Hess
f340fd324c synthesize RmChange when a directory is deleted
This gets directory renames closer to being fully detected.
There's close to no extra overhead to doing it this way.
2013-03-11 15:14:42 -04:00
Joey Hess
06046a0d2b finish fast direct mode rename handling. wow, it's fast 2013-03-11 14:14:45 -04:00
Joey Hess
87cba71d5a fix changeFile to not be partial
That led to runtime crashes, without even a warning from -Wall. Yipes!
2013-03-11 13:55:36 -04:00
Joey Hess
61c5e8736c detect renames during commit, and .. um, do nothing special because it's lunch time
But I'm well set up to fast-track direct mode adds for renames now.
2013-03-11 12:56:47 -04:00
Joey Hess
2762ab03b4 assistant: generate better commits for renames 2013-03-10 22:10:26 -04:00
Joey Hess
b2c7ee5551 tweak 2013-03-10 20:20:58 -04:00
Joey Hess
65a4c7966f moved transfer queueing out of watcher and into committer
This cleaned up the code quite a bit; now the committer just looks at the
Change to see if it's a change that needs to have a transfer queued for it.
If I later want to add dropping keys for files that were removed, or
something like that, this should make it straightforward.

This also fixes a bug. In direct mode, moving a file out of an archive
directory failed to start a transfer to get its content. The problem
was that the file had not been committed to git yet, and so the transfer
code didn't want to touch it, since fileKey failed to get its key.
Only starting transfers after a commit avoids this problem.
2013-03-10 18:16:03 -04:00
Joey Hess
724711e4b7 fix 2013-03-03 15:18:24 -04:00
Joey Hess
789ca15012 better prevention of auto repack
Looking through the git sources (documentation is unclear),
it seems commit doesn't ever trigger git-gc, mostly fetching and merging
seems to. I cannot easily override the setting in all those places, so
instead set gc.auto in git config when initializing a repository with
the assistant.

This does mean that the user cannot set gc.auto=0 and completely avoid
repacks, as the assistant does it daily. But, it only does it after there
are 100x the default number of loose objects, so this is probably not going
to be too annoying.
2013-03-03 14:20:07 -04:00
Joey Hess
6dea43831e assistant: Prevent automatic commits from causing git-gc runs, as that can make things quite slow. Instead, git-gc --auto is run once a day. (This can be disabled by the usual gc.auto=0 setting.) 2013-03-03 13:44:35 -04:00
Joey Hess
0c13d3065e git subcommand cleanup
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
2013-03-03 13:39:07 -04:00
Joey Hess
4d33423067 assistant: Avoid noise in logs from git commit about typechanged files in direct mode repositories. 2013-03-01 16:21:29 -04:00
Joey Hess
46c9cbeb1e add additional debug info about reasons for transfers 2013-03-01 15:23:59 -04:00
Joey Hess
2a4dad8bd4 remove debug prints 2013-02-19 23:18:15 -04:00
Joey Hess
d7c93b8913 fully support core.symlinks=false in all relevant symlink handling code
Refactored annex link code into nice clean new library.

Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:  ( liftIO $ createSymbolicLink linktarget file
  only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs:                createSymbolicLink link link
  test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/FromKey.hs:     liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/Indirect.hs:                    liftIO $ createSymbolicLink l f
  refuses to run if core.symlinks=false
Init.hs:                createSymbolicLink f f2
  test if symlinks can be made
Remote/Directory.hs:    go [file] = catchBoolIO $ createSymbolicLink file f >> return True
  fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs:          liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
  ditto
Upgrade/V1.hs:                          liftIO $ createSymbolicLink link f
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:		( liftIO $ catchMaybeIO $ readSymbolicLink file
  only when core.symlinks=true
Assistant/Threads/Watcher.hs:		ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
  code that fixes real symlinks when inotify sees them
  It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs:		mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
  ditto
Command/Fix.hs:	stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
  command only works in indirect mode
Upgrade/V1.hs:	getsymlink = takeFileName <$> readSymbolicLink file
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:

Assistant/Threads/SanityChecker.hs:                             | isSymbolicLink s -> addsymlink file ms
  only handles staging of symlinks that were somehow not staged
  (might need to be updated to support pseudolinks, but this is
  only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs:         if isSymbolicLink s || not (isRegularFile s)
  avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs:                            | isSymbolicLink s -> void $ flip whenAnnexed f $
  only allowed on systems that support symlinks
Command/Indirect.hs:            whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
  ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
  used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs:                    | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs:                                            | Files.isSymbolicLink s ->
Utility/INotify.hs:                             | Files.isSymbolicLink s ->
Utility/INotify.hs:                     checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs:              | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
  all above are lower-level, not relevant

Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:

Annex/Direct.hs:			| isSymLink (getmode item) =
  This is looking at git diff-tree objects, not files on disk
Command/Unused.hs:		| isSymLink (LsTree.mode l) = do
  This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
  low-level

Done!!
2013-02-17 16:43:14 -04:00
Joey Hess
630f4531a7 fix assistant's use of lsof in crippled filesystem mode 2013-02-15 13:08:22 -04:00
Joey Hess
47477b2807 crippled filesystem support, probing and initial support
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.

Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.

Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
2013-02-14 14:15:26 -04:00
Joey Hess
5737c49804 support Android's crippled lsof 2013-02-11 17:33:45 -04:00
Joey Hess
547d7745fb pre-commit: Update direct mode mappings.
Making the pre-commit hook look at git diff-index to find changed direct
mode files and update the mappings works pretty well.

One case where it does not work is when a file is git annex added, and then
git rmed, and then this is committed. That's a no-op commit, so the hook
probably doesn't even run, and it certianly never notices that the file
was deleted, so the mapping will still have the original filename in it.

For this and other reasons, it's important that the mappings still be
treated as possibly inconsistent.

Also, the assistant now allows the pre-commit hook to run when in direct
mode, so the mappings also get updated there.
2013-02-06 12:44:19 -04:00
Joey Hess
b19c2e6122 assistant: Fix location log when adding new file in direct mode. 2013-02-05 13:41:48 -04:00
Joey Hess
76ddf9b6d3 webapp: Now allows restarting any threads that crash. 2013-01-26 17:09:33 +11:00
Joey Hess
f51ad2a00c assistant: Avoid committer crashing if a file is deleted at the wrong instant. 2013-01-14 15:02:13 -04:00
Joey Hess
4008590c68 type based git config handling for remotes
Still a couple of places that use git config ad-hoc, but this is most of it
done.
2013-01-01 13:58:14 -04:00
Joey Hess
7f7c31df1c type based git config handling
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.

Not yet done for annex.<remote>.* configuration settings.
2012-12-29 23:10:18 -04:00
Joey Hess
c0f9810f0b OSX assistant: Uses direct mode by default when setting up a new local repository. 2012-12-28 16:42:11 -04:00