Commit graph

343 commits

Author SHA1 Message Date
Joey Hess
0b57113c42 cleanup 2013-04-02 19:45:52 -04:00
Joey Hess
38d61f934d Update working tree files fully atomically
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.

But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.

(It also probably avoids confusing displays in file managers.)
2013-04-02 15:02:00 -04:00
Joey Hess
67e817c6a1 New annex.largefiles setting, which configures which files git annex add and the assistant add to the annex.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.

A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
2013-03-29 16:17:13 -04:00
Joey Hess
75a1c2f91a cleanup debug print 2013-03-28 14:18:26 -04:00
Joey Hess
80c8c0e62a comment typo 2013-03-18 13:17:43 -04:00
Joey Hess
7a77f98576 move comment to right place 2013-03-18 11:18:04 -04:00
Joey Hess
b3d3ece2ab remove old debug print 2013-03-16 17:04:48 -04:00
Joey Hess
f7de51e8b6 Bugfix: Fix bug in inode cache sentinal check, which broke copying to local repos if the repo being copied from had moved to a different filesystem or otherwise changed all its inodes' 2013-03-12 16:41:54 -04:00
Joey Hess
61c5e8736c detect renames during commit, and .. um, do nothing special because it's lunch time
But I'm well set up to fast-track direct mode adds for renames now.
2013-03-11 12:56:47 -04:00
Joey Hess
40df015d90 remove Eq instance for InodeCache
There are two types of equality here, and which one is right varies,
so this forces me to consider and choose between them.

Based on this, I learned that the commit in git anex sync was
always doing a strong comparison, even when in a repository where
the inodes had changed. Fixed that.
2013-03-11 02:57:48 -04:00
Joey Hess
cbb6e1fae4 tag xmpp pushes with jid
This fixes the issue mentioned in the last commit.

Turns out just collecting UUID of clients behind a XMPP remote is
insufficient (although I should probably still do it for other reasons),
because a single remote repo might be connected via both XMPP and local
pairing. So a way is needed to know when a push was received from any
client using a given XMPP remote over XMPP, as opposed to via ssh.
2013-03-06 16:29:19 -04:00
Joey Hess
c23ea9e311 assistant: Get back in sync with XMPP remotes after network reconnection, and on startup.
Make manualPull send push requests over XMPP.

When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)

The merger handles detecting when a XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.

This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
2013-03-06 15:09:31 -04:00
Joey Hess
974d075108 Run ssh with -T to avoid tty allocation and any login scripts that may do undesired things with it. 2013-03-04 23:36:07 -04:00
Joey Hess
0c13d3065e git subcommand cleanup
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
2013-03-03 13:39:07 -04:00
Joey Hess
cbd53b4a8c Makefile now builds using cabal, taking advantage of cabal's automatic detection of appropriate build flags.
The only thing lost is ./ghci

Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
2013-02-27 02:39:22 -04:00
Joey Hess
2d9c046dea annex.version is now set to 4 for direct mode repositories
To avoid old versions of git-annex getting confused.

There is no upgrade required though.
We switch back to 3 when going from direct to indirect.
2013-02-26 15:13:10 -04:00
Joey Hess
e423190b11 fix 2013-02-24 17:40:14 -04:00
Joey Hess
6ff1ce76b7 hopefully fix a bug 2013-02-24 17:21:04 -04:00
Joey Hess
afb21353c8 remove debug print 2013-02-23 14:34:02 -04:00
Joey Hess
051476c2a9 squelch warning 2013-02-22 18:22:12 -04:00
Joey Hess
08854afa10 fix inverted logic 2013-02-22 17:01:48 -04:00
Joey Hess
4689fbde35 fix sameInodeCache to check the inode change sentinal
This should fix the problem where the assistant, on Android, re-adds every
file on startup.
2013-02-22 15:19:28 -04:00
Joey Hess
faa9d3c22b work around broken getEnvironment on Android in the most important place: git annex init
This resulted in a lot of user complains that git annex init had git
telling them they needed to run git config --global user.email .. which
didn't work because even HOME was not passed into git.
2013-02-22 14:47:29 -04:00
Joey Hess
6f9be431e6 only create inode sentinal file when initializing a new repo 2013-02-20 13:55:53 -04:00
Joey Hess
00b465e213 shorter directory to external ssh socket
Before it was too long to be used.
2013-02-19 17:31:08 -04:00
Joey Hess
624e34649f Direct mode: Support filesystems like FAT which can change their inodes each time they are mounted. 2013-02-19 17:31:03 -04:00
Joey Hess
0f4cc559a7 Android: Support ssh connection caching. 2013-02-19 14:57:45 -04:00
Joey Hess
d799ef3182 set fileSystemEncoding when reading files that might be binary 2013-02-18 17:19:37 -04:00
Joey Hess
422dd28f0b hlint 2013-02-18 02:39:40 -04:00
Joey Hess
9aa979edbd types 2013-02-18 02:35:38 -04:00
Joey Hess
d7c93b8913 fully support core.symlinks=false in all relevant symlink handling code
Refactored annex link code into nice clean new library.

Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:  ( liftIO $ createSymbolicLink linktarget file
  only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs:                createSymbolicLink link link
  test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/FromKey.hs:     liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/Indirect.hs:                    liftIO $ createSymbolicLink l f
  refuses to run if core.symlinks=false
Init.hs:                createSymbolicLink f f2
  test if symlinks can be made
Remote/Directory.hs:    go [file] = catchBoolIO $ createSymbolicLink file f >> return True
  fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs:          liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
  ditto
Upgrade/V1.hs:                          liftIO $ createSymbolicLink link f
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:		( liftIO $ catchMaybeIO $ readSymbolicLink file
  only when core.symlinks=true
Assistant/Threads/Watcher.hs:		ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
  code that fixes real symlinks when inotify sees them
  It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs:		mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
  ditto
Command/Fix.hs:	stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
  command only works in indirect mode
Upgrade/V1.hs:	getsymlink = takeFileName <$> readSymbolicLink file
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:

Assistant/Threads/SanityChecker.hs:                             | isSymbolicLink s -> addsymlink file ms
  only handles staging of symlinks that were somehow not staged
  (might need to be updated to support pseudolinks, but this is
  only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs:         if isSymbolicLink s || not (isRegularFile s)
  avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs:                            | isSymbolicLink s -> void $ flip whenAnnexed f $
  only allowed on systems that support symlinks
Command/Indirect.hs:            whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
  ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
  used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs:                    | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs:                                            | Files.isSymbolicLink s ->
Utility/INotify.hs:                             | Files.isSymbolicLink s ->
Utility/INotify.hs:                     checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs:              | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
  all above are lower-level, not relevant

Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:

Annex/Direct.hs:			| isSymLink (getmode item) =
  This is looking at git diff-tree objects, not files on disk
Command/Unused.hs:		| isSymLink (LsTree.mode l) = do
  This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
  low-level

Done!!
2013-02-17 16:43:14 -04:00
Joey Hess
397082013a proper fix for dropunused
Now getKeysPresent checks that the key's content, not only its directory,
exists. In direct mode, the inode cache file is used as a standin for the
content.

removeAnnex always removes the inode cache file, and drop and move --from
always call removeAnnex, even if the object does not seem to be inAnnex,
to ensure it's always deleted.
2013-02-15 17:58:49 -04:00
Joey Hess
5a8fb26d0a Revert "Clean up direct mode cache and mapping info when dropping keys."
This reverts commit 57780cb3a4.

This was buggy, it caused the direct mode cache to be lost when dropping
keys, so when the file is gotten back, it's stored in indirect mode.

Note to self: Do not attempt bug fixes at 6 am!
2013-02-15 16:37:57 -04:00
Joey Hess
5ea4b91fb4 start to support core.symlinks=false
Utility functions to handle no symlink mode, and converted Annex.Content to
use them; still many other places to convert.
2013-02-15 16:03:11 -04:00
Joey Hess
7ce30b534f add: Improved detection of files that are modified while being added.
In indirect mode, now checks the inode cache to detect changes to a file.
Note that a file can still be changed if a process has it open for write,
after landing in the annex.

In direct mode, some checking of the inode cache was done before, but
from a much later point, so fewer modifications could be detected. Now it's
as good as indirect mode.

On crippled filesystems, no lock down is done before starting to add a
file, so checking the inode cache is the only protection we have.
2013-02-14 16:54:36 -04:00
Joey Hess
a52f8f382b split out Utility.InodeCache 2013-02-14 16:17:40 -04:00
Joey Hess
47477b2807 crippled filesystem support, probing and initial support
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.

Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.

Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
2013-02-14 14:15:26 -04:00
Joey Hess
f202d997f4 Now uses the Haskell uuid library, rather than needing a uuid program.
Been meaning to do this for some time; Android port was last straw.

Note that newer versions of the uuid library have a Data.UUID.V4 that
generates random UUIDs slightly more cleanly, but Debian has an old version
of the library, so I do it slightly round-about.
2013-02-10 14:52:54 -04:00
Joey Hess
57780cb3a4 Clean up direct mode cache and mapping info when dropping keys.
These files were left behind, and made getKeysPresent find keys that were
not present. It would be expensive to make getKeysPresent check that the
actual key files are present (it just lists the directories). But that's not
needed if we just clean up the stale cache and mapping files.

To handle systems that were in direct mode and got switched back with stale
direct mode files, made cleanObjectLoc remove all files in the key's directory.

git annex unused will still list keys that are gone but for which the stale
direct mode files exists. To deal with that, made dropunused remove the key's
directory even if the key does not seem to be present.
2013-02-07 08:28:40 -04:00
Joey Hess
af3a25ee03 Deal with stale mappings for deleted file in direct mode.
The most common way for a mapping to be stale is when a file was deleted,
or renamed. Nothing updates the mappings for deletions yet.
But they can also become stale in other ways. For example a file can
be modified.

So, the mapping is not trusted to be consistent. When we get a key,
only replace symlinks that still point to that key with its content.
When we drop a key, only put back symlinks for files that still have
the direct mode content.
2013-02-05 16:48:00 -04:00
Joey Hess
0e3f931f37 add another setting to GitConfig 2013-01-28 00:33:19 +11:00
Joey Hess
103b572d8e ensure that content directory is thawed when writing direct mode mapping and cache files 2013-01-26 20:09:15 +11:00
Joey Hess
f86462b475 allow lazy reading of map contents
Don't explicitly close; hGetContents will close when read is done.
2013-01-18 13:16:16 -04:00
Joey Hess
e481ca7658 some more direct mode fixes
Avoid a crash if a mapping contains files that no longer exist.
This could happen because eg, one was deleted and a commit has not yet been
done to update the mapping.

Fix path calculation.
2013-01-18 12:39:26 -04:00
Joey Hess
5c58e9c101 Avoid filename encoding errors when writing direct mode mappings. 2013-01-18 12:26:45 -04:00
Joey Hess
bbf0e74f72 Fix direct mode mapping code to always store direct mode filenames relative to the top of the repository, even when operating inside a subdirectory. 2013-01-18 12:20:08 -04:00
Joey Hess
85c564ea94 In direct mode, files with the same key are no longer hardlinked, as that would cause a surprising behavior if modifying one, where the other would also change. 2013-01-14 11:56:37 -04:00
Joey Hess
a6a5ed8121 check for direct mode file change when copying to a local git remote 2013-01-10 11:45:44 -04:00
Joey Hess
1bc49b7158 Special remotes now all rollback storage of keys that get modified during the transfer, which can happen in direct mode. 2013-01-09 18:42:29 -04:00
Joey Hess
858ad6783b add works in direct mode
Also, changed sync to no longer automatically add files in direct mode.
That was only necessary before because add didn't work.
2013-01-06 17:24:22 -04:00
Joey Hess
909f67443f Fix transferring files to special remotes in direct mode. 2013-01-06 14:29:01 -04:00
Joey Hess
e457be7631 direct: Avoid hardlinking symlinks that point to the same content when the content is not present. 2013-01-06 13:57:53 -04:00
Joey Hess
1c83b6c439 work around a very strange git-cat-file behavior
Sometimes it seems that git-cat-file --batch stops getting info for
files in the current repo, when ":file" is fed to it. I have not reproduced
this at the command line, but only when using git annex whereis and git
annex move inside a direct mode repo. Those failed, because cat-file
returned "file missing". OTOH, git annex find works fine, despite passing
the same file to cat-file. It seems that the failing commands first asked
cat-file to show a file on the git-annex branch. Perhaps it got "stuck" on
that branch? But I cannot repoduce it running cat-file by hand. Most
strange. HEAD is a workaround for this extreme weirdness, since I spent a
good 2 hours struggling with it already.
2013-01-05 17:06:24 -04:00
Joey Hess
1cdf2b923d assistant: Make expensive transfer scan work fully in direct mode.
The expensive scan uses lookupFile, but in direct mode, that doesn't work
for files that are present. So the scan was not finding things that are
present that need to be uploaded. (It did find things not present that
needed to be downloaded.)

Now lookupFile also works in direct mode. Note that it still prefers
symlinks on disk to info committed to git, in direct mode. This is
necessary to make things like Assistant.Threads.Watcher.onAddSymlink
work correctly, when given a new symlink not yet checked into git (or
replacing a file checked into git).
2013-01-05 15:57:53 -04:00
Joey Hess
4008590c68 type based git config handling for remotes
Still a couple of places that use git config ad-hoc, but this is most of it
done.
2013-01-01 13:58:14 -04:00
Joey Hess
7f7c31df1c type based git config handling
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.

Not yet done for annex.<remote>.* configuration settings.
2012-12-29 23:10:18 -04:00
Joey Hess
2fdefc656b fix logic error breaking direct mode assistant autocommit of modified files 2012-12-28 16:00:19 -04:00
Joey Hess
eb40227d15 assistant direct mode file add/change bookkeeping
When a file is changed in direct mode, the old content is probably lost
(at least from the local repo), and bookeeping needs to be updated to
reflect this.

Also, synthetic add events are generated at assistant startup, so
make it detect when the file has not really changed, and avoid re-adding
it.

This does add the overhead of querying the runing git cat-file for the
key that's recorded in git for the file, each time a file is added or
modified in direct mode.
2012-12-25 15:48:15 -04:00
Joey Hess
ddb0adb998 more quickcheck fun 2012-12-19 16:36:19 -04:00
Joey Hess
93c430c2a4 comment 2012-12-19 12:46:35 -04:00
Joey Hess
97d670b0d5 normalise associated files
Sometimes ./file will be passed in, and sometimes file; need to treat these
the same.
2012-12-19 12:44:24 -04:00
Joey Hess
05ec4587dd partial and incomplete automatic merging in direct mode
Handles our file right, but not theirs.
2012-12-18 17:15:16 -04:00
Joey Hess
53dbcce645 direct mode merging works!
Automatic merge resoltion code needs to be fixed to preserve objects from
direct mode files.
2012-12-18 15:04:44 -04:00
Joey Hess
5df3c66a85 added direct and indirect commands 2012-12-13 15:44:56 -04:00
Joey Hess
cfe354eccd whitespace fix 2012-12-13 00:46:30 -04:00
Joey Hess
ffdd08fd2e Merge branch 'master' into desymlink 2012-12-13 00:46:10 -04:00
Joey Hess
0d50a6105b whitespace fixes 2012-12-13 00:45:27 -04:00
Joey Hess
b080a58b76 Merge branch 'master' into desymlink
Conflicts:
	Annex/CatFile.hs
	Annex/Content.hs
	Git/LsFiles.hs
	Git/LsTree.hs
2012-12-13 00:29:06 -04:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
e7b8cb0063 direct mode committing 2012-12-12 19:20:38 -04:00
Joey Hess
f2ed0f9659 fix associated files to not fall back to object location 2012-12-12 13:11:59 -04:00
Joey Hess
752b5354ab make parent directory 2012-12-12 13:05:50 -04:00
Joey Hess
9d133270c2 update 2012-12-10 15:02:44 -04:00
Joey Hess
514957914d direct mode mappings now updated by git annex sync
Still lots to do to make sync handle direct mode, but this is a good first
step.
2012-12-10 14:37:24 -04:00
Joey Hess
b4c6da9cbd Got object sending working in direct mode.
However, I don't yet have a reliable way to deal with files being modified
while they're being transferred. I have code that detects it on the sending
side, but the receiver is still free to move the wrong content into its
annex, and record that it has the content. So that's not acceptable, and
I'll need to work on it some more.

However, at this point I can use a direct mode repository as a remote and
transfer files from and to it.
2012-12-08 17:03:39 -04:00
Joey Hess
664765e757 update the cache automatically when moving objects in or out 2012-12-08 13:13:36 -04:00
Joey Hess
ef24751922 support for checking presence of objects in direct mode
Also for dropping objects in direct mode.

Checking presence reliably needs a cache of mtime, size, and inode.
This way, if a file is modified, keys that point to it are no longer
present.

Also, the code for restoring the symlink when removing objects is
unnecessarily messy. calcGitLink was generating links starting with
"../../remote/.git/", when running "git annex move --from remote".
I put in a workaround, but calcGitLink should probably be fixed.

There is not yet support for getting objects from repositories in direct
mode; it still looks for content in .git/annex/objects, and there's no
once place I can change to fix that.

Also, getting objects from direct mode repositories is problematic since
the can be changed while the object is being transferred. It probably needs
to quarantine it first.
2012-12-07 17:29:55 -04:00
Joey Hess
3898d8c091 support for storing files in direct mode 2012-12-07 14:53:02 -04:00
Joey Hess
99a8a5297c --auto fixes
* get/copy --auto: Transfer data even if it would exceed numcopies,
  when preferred content settings want it.
* drop --auto: Fix dropping content when there are no preferred content
  settings.
2012-12-06 13:22:16 -04:00
Joey Hess
b5a9560a1b squelch warning 2012-11-26 16:30:46 -04:00
Joey Hess
da6fb44446 finished XMPP pairing!
This includes keeping track of which buddies we're pairing with, to know
which PairAck are legitimate.
2012-11-05 17:43:17 -04:00
Joey Hess
9767562f65 rsync special remote: Include annex-rsync-options when running rsync to test a key's presence.
Also, use the new withQuietOutput function to avoid running the shell to
/dev/null stderr in two other places.
2012-10-28 13:51:14 -04:00
Joey Hess
3417c55189 remove git-annex branch read cache
This cache prevented noticing changes made by another process.

The case I just ran into involved the assistant dropping a file, which
cached its presence info. Then the same file was downloaded again,
but the assistant didn't know its presence info had changed.

I don't see a way to keep this cache. Will instead rely on the OS level
file cache, for files in the journal. May need to add more higher-level
caching of info that it's ok to have a potentially stale copy of,
although much of git-annex already does so.
2012-10-19 14:25:15 -04:00
Joey Hess
e7780a39f5 Preferred content path matching bugfix.
When in a subdir, both the normal filepath, and the filepath relative to
the top of the git repo are needed for matching. The former for key lookup,
and the latter for include/exclude to match against. Previously, key lookup
didn't work in this situation.
2012-10-17 16:01:09 -04:00
Joey Hess
3156febec8 disable ssh connection caching for standalone builds
The standalone build does not bundle its own ssh, so should be built
to support as wide an array of ssh versions as possible, so turn off
connection caching.

Unfortunatly, as implemented this forces a full rebuild when building the
standalone binary, and of course it makes it somewhat slower.

This is not ideal, but neither is probing the ssh version every time it's
run (slow), or once when initializing a repo (fragile).
2012-10-15 14:49:40 -04:00
Joey Hess
97ea08e2d1 Avoid unsetting HOME when running certian git commands. Closes: #690193
Setting GIT_INDEX_FILE clobbers the rest of the environment, making git
not read ~/.gitconfig, and blow up if GECOS didn't have a name for the
user.

I'm not entirely happy with getEnvironment being run every time now,
that's somewhat expensive. It may make sense to just set GIT_COMMITTER_*
and GIT_AUTHOR_*, but I worry that clobbering the rest could break PATH,
or GIT_PATH, or something else that might be used by a command run in here.
And caching the environment is not a good idea either; it can change..
2012-10-11 12:58:24 -04:00
Joey Hess
39be7eea40 add standard group selector to repo edit form 2012-10-10 16:04:28 -04:00
Joey Hess
9da7dd8874 webapp: configure new repos to use the standard preferred content settings 2012-10-10 15:35:10 -04:00
Joey Hess
3490977d97 webapp: put new repos in standard groups
I'm using transfer for most things, both removable drives and cloud
storage, because it's the safest choice. We'll see if it makes sense
to prompt for the group when setting this up, or let the user pick
something else after the fact.
2012-10-10 15:27:25 -04:00
Joey Hess
f9b81c7a75 refactor 2012-10-10 15:15:56 -04:00
Joey Hess
5ac15149cc assistant: Now honors preferred content settings when deciding what to transfer.
Both when queueing downloads, and uploads, consults the preferred content
settings.

I didn't make it check yet when requeing failed transfers or queuing
deferred downloads; dealing with the preferred content settings (or indeed,
other settings) changing while the assistant is running still needs work.
2012-10-09 12:18:41 -04:00
Joey Hess
fee40dd374 generalized Annex.Wanted
this should make it easy to use from inside the assistant, where
everything is an AssociatedFile.
2012-10-08 17:14:01 -04:00
Joey Hess
836561e057 fix invered logic for shouldDrop 2012-10-08 16:12:02 -04:00
Joey Hess
1eedf495c3 make copy --to check preferred content of the remote 2012-10-08 16:06:56 -04:00
Joey Hess
34e7faf71a uninit: Unset annex.version. Closes: #689852 2012-10-07 16:04:03 -04:00
Joey Hess
47314c0fad fix last zombies in the assistant
Made Git.LsFiles return cleanup actions, and everything waits on
processes now, except of course for Seek.
2012-10-04 19:56:32 -04:00
Joey Hess
5594bf0643 more zombie fighting
I'm down to 9 places in the code that can produce unwaited for zombies.

Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.

The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.

The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.

(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
2012-10-04 18:47:31 -04:00
Joey Hess
bc83179a76 Test that uuid -m works, falling back to plain uuid if not. 2012-09-25 10:48:20 -04:00
Joey Hess
3887432c54 fixes for transfer resume
Fix resuming of downloads, which do not have a transfer info file to read.

When checking upload progress, use the MVar, rather than re-reading
the info file.

Catch exceptions in the transfer action. Required a tryAnnex.
2012-09-24 13:18:16 -04:00
Joey Hess
e8188ea611 flip catchDefaultIO 2012-09-17 00:18:07 -04:00
Joey Hess
ba0334116c more descriptive name for oneshot 2012-09-15 20:46:38 -04:00
Joey Hess
750c4ac6c2 bugfix: avoid staging but not committing changes to git-annex branch
Branch.get is not able to see changes that have been staged to the index
but not committed. This is a limitation of git cat-file --batch; when
reading from the index, as opposed to from a branch, it does not notice
changes made after the first time it reads the index.

So, had to revert the changes made in 1f73db3469
to make annex.alwayscommit=false stage changes.

Also, ensure that Branch.change and Branch.get always see changes
at all points during a commit, by not deleting journal files when
staging to the index. Delete them only after committing the branch.
Before, there was a race during commits where a different git-annex
could see out-of-date info from the branch while a commit was in progress.

That's also done when updating the branch to merge in remote branches.

In the case where the local git-annex branch has had changes pushed into it
that are not yet reflected in the index, and there are journalled changes
as well, a merge commit has to be done.
2012-09-15 20:15:16 -04:00
Joey Hess
a1f93f06fd eliminate some commits to the git-annex branch
Commits used to be made to the git-annex branch whenever there were
journalled changes from a previous command, and the current command looked
up the value of a file. This no longer happens.

This means that transferkey, which is a oneshot command that stages
changes, can be run multiple times by the assistant, without each of them
committing the changes made by the command before. Which will be a lot
faster and use less space by batching up the commits.

Commits still happen if a remote git-annex branch has been changed and is
merged in.
2012-09-15 18:36:42 -04:00
Joey Hess
ca45cea113 Revert "add catFileIndex"
This interface is not a good idea, because a running git cat-file --batch
does not notice when existing files in the index are changed.
2012-09-15 18:30:53 -04:00
Joey Hess
e1baf48d88 add catFileIndex 2012-09-15 17:06:10 -04:00
Joey Hess
87fb9c690e remove withIndexUpdate helper 2012-09-15 15:48:21 -04:00
Joey Hess
5573911d25 Disable ssh connection caching if the path to the control socket would be too long (and use relative path to minimise path to the control socket). 2012-09-13 19:26:39 -04:00
Joey Hess
c9b3b8829d thread safe git-annex index file use 2012-08-24 20:50:39 -04:00
Joey Hess
5c3e14649e avoid unnecessary transfer scans when syncing a disconnected remote
Found a very cheap way to determine when a disconnected remote has
diverged, and has new content that needs to be transferred: Piggyback on
the git-annex branch update, which already checks for divergence.

However, this does not check if new content has appeared locally while
disconnected, that should be transferred to the remote.

Also, this does not handle cases where the two git repos are in sync,
but their content syncing has not caught up yet.

This code could have its efficiency improved:

* When multiple remotes are synced, if any one has diverged, they're
  all queued for transfer scans.
* The transfer scanner could be told whether the remote has new content,
  the local repo has new content, or both, and could optimise its scan
  accordingly.
2012-08-22 15:05:57 -04:00
Joey Hess
9fc94d780b better readProcess 2012-07-19 00:57:40 -04:00
Joey Hess
1db7d27a45 add back debug logging
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.

Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.

As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
2012-07-19 00:46:52 -04:00
Joey Hess
d1da9cf221 switch from System.Cmd.Utils to System.Process
Test suite now passes with -threaded!

I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.

In the process, --debug stopped printing commands it runs. I may try to
bring that back later.

Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
2012-07-18 18:00:24 -04:00
Joey Hess
05310538ef more debugging 2012-07-18 13:31:00 -04:00
Joey Hess
75b6ee81f9 avoid ByteString.Char8 where not needed
Its truncation behavior is a red flag, so avoid using it in these places
where only raw ByteStrings are used, without looking at the data inside.
2012-06-20 13:13:40 -04:00
Joey Hess
e0095b0bdc fishy commit 2012-06-14 00:01:48 -04:00
Joey Hess
942d8f7298 hlint 2012-06-12 11:32:06 -04:00
Joey Hess
ca9ee21bd7 crazy optimisation
Crazy like a fox..
2012-06-10 19:58:34 -04:00
Joey Hess
c5707c84d3 queue size fix
Increase queue size for update-index actions, because otherwise they'll
never be flushed.
2012-06-10 13:56:04 -04:00
Joey Hess
d45a9a7831 refactor and function name cleanup
(oops, I had a calcMerge and a calc_merge!)
2012-06-08 00:29:39 -04:00
Joey Hess
20f425be19 make watch use the queue
May not work. Certianly needs to flush the queue from time to time
when only symlink changes are being made.
2012-06-07 15:40:44 -04:00
Joey Hess
0a11b35d89 extend Git.Queue to be able to queue more than simple git commands
While I was in there, I noticed and fixed a bug in the queue size
calculations. It was never encountered only because Queue.add was
only ever run with 1 file in the list.
2012-06-07 15:19:44 -04:00
Joey Hess
b819f644ad close the git add race
There's a race adding a new file to the annex: The file is moved to the
annex and replaced with a symlink, and then we git add the symlink. If
someone comes along in the meantime and replaces the symlink with
something else, such as a new large file, we add that instead. Which could
be bad..

This race is fixed by avoiding using git add, instead the symlink is
directly staged into the index.

It would be nice to make `git annex add` use this same technique.
I have not done so yet because it currently runs git update-index once per
file, which would slow does `git annex add`. A future enhancement would be
to extend the Git.Queue to include the ability to run update-index with
a list of Streamers.
2012-06-06 14:29:10 -04:00
Joey Hess
993e6459a3 factor out nukeFile 2012-06-06 13:13:13 -04:00
Joey Hess
27cfeca4ea Merge branch 'master' into watch 2012-06-06 02:16:21 -04:00
Joey Hess
f1bd72ea54 factor out generic update-index code from unionmerge code 2012-06-06 00:10:34 -04:00
Joey Hess
7a6fb8ae4e flush the git queue when a new type of action is being added to it
This allows the queue to be used in a single process for multiple possibly
conflicting commands, like add and rm, without running them out of order.

This assumes that running the same git subcommand with different parameters
cannot itself conflict.
2012-06-04 20:41:22 -04:00
Joey Hess
bb4f31a0ee Clean up handling of git directory and git worktree.
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.

This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.

A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.

Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)

Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.

One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
2012-05-18 17:03:12 -04:00
Joey Hess
f7d8982672 Fix use of several config settings
annex.ssh-options, annex.rsync-options, annex.bup-split-options.

And adjust types to avoid the bugs that broke several config settings
recently. Now "annex." prefixing is enforced at the type level.
2012-05-05 20:16:56 -04:00
Joey Hess
76102c1c75 display "Recording state in git..." when staging the journal
A bit tricky to avoid printing it twice in a row when there are queued git
commands to run and journal to stage.

Added a generic way to run an action that may output multiple side
messages, with only the first displayed.
2012-04-27 13:54:33 -04:00
Joey Hess
e0b7012ccc uninit: Clear annex.uuid from .git/config. Closes: #670639 2012-04-27 12:21:38 -04:00
Joey Hess
84ac8c58db Add annex.httpheaders and annex.httpheader-command config settings
Allow custom headers to be sent with all HTTP requests.

(Requested by the Internet Archive)
2012-04-22 01:13:09 -04:00
Joey Hess
ed79596b75 noop 2012-04-21 23:32:33 -04:00
Joey Hess
bee420bd2d in which I discover void
void :: Functor f => f a -> f () -- ah, of course that's useful :)
2012-04-21 23:06:19 -04:00
Joey Hess
cab63b89f2 cache parsed core.sharedrepository 2012-04-21 19:42:49 -04:00
Joey Hess
b98b69e8c6 honor core.sharedRepository when making all the other files in the annex
Lock files, directories, etc.
2012-04-21 19:36:03 -04:00
Joey Hess
7e45712d19 better file mode setting code 2012-04-21 16:01:56 -04:00
Joey Hess
b4a5e39ee6 Support git's core.sharedRepository configuration
This is incomplete, it does not honor it yet for hash directories
and other annex bookkeeping files. Some of that is not needed for a bare
repo; some of it may be.
2012-04-21 15:36:52 -04:00
Joey Hess
b65e257b13 inverted logic 2012-04-20 16:16:13 -04:00
Joey Hess
262017e17d export a more generalized checkDiskSpace 2012-04-20 16:06:10 -04:00
Joey Hess
e38a839a80 Rewrote free disk space checking code
Moving the portability handling into a small C library cleans up things
a lot, avoiding the pain of unpacking structs from inside haskell code.
2012-03-22 17:32:47 -04:00
Joey Hess
f1398b5583 use new getConfig 2012-03-22 17:32:47 -04:00
Joey Hess
4eb5112681 rationalize getConfig
getConfig got a remote-specific config, and this confusing name caused it
to be used a couple of places that only were interested in global configs.
Rename to getRemoteConfig and make getConfig only get global configs.

There are no behavior changes here, but remote.<name>.annex-web-options
never actually worked (and per-remote web options is a very unlikely to be
useful case so I didn't make it work), so fix the documentation for it.
2012-03-22 17:32:47 -04:00
Joey Hess
188e2edc41 status: Prints available local disk space, or shows if git-annex doesn't know. 2012-03-21 21:55:02 -04:00
Joey Hess
181d2ccd20 Improve detection of inability to check free disk space.
Don't check if configure indicated checks won't work. This should fix a
FTBFS on mipsel, where configure correctly detects the checks won't work,
while garbage is returned for disk space info at git-annex runtime. It also
means that, when built via cabal, disk space checks are not enabled,
unfortunatly.
2012-03-21 21:21:20 -04:00
Joey Hess
60ab3d84e1 added ifM and nuked 11 lines of code
no behavior changes
2012-03-14 17:43:34 -04:00
Joey Hess
b325694645 getKeysPresent is now fully lazy
.. Allowing it to be used by things in constant space!

Random statistics: git annex status has gone from taking 239 mb
of memory and 26 seconds in a repo, to 8 mb and 13 seconds.

The trick here is the unsafeInterleaveIO, and the form of the function's
recursion, which I cribbed heavily from System.IO.HVFS.Utils.recurseDirStat.
The difference is, this one goes to a limited depth and avoids statting
everything.
2012-03-11 18:04:58 -04:00
Joey Hess
ff3644ad38 status: Fixed to run in nearly constant space.
Before, it leaked space due to caching lists of keys. Now all necessary
data about keys is calculated as they stream in.

The "nearly constant" is due to getKeysPresent, which builds up a lot
of [] thunks as it traverses .git/annex/objects/. Will deal with it later.
2012-03-11 17:15:58 -04:00
Joey Hess
d08ee1a9d2 syscall optimisation 2012-03-06 13:56:20 -04:00
Joey Hess
12b89a3eb8 configure: Check if ssh connection caching is supported by the installed version of ssh and default annex.sshcaching accordingly. 2012-02-25 19:15:29 -04:00
Joey Hess
1f73db3469 improve alwayscommit=false mode
Now changes are staged into the branch's index, but not committed,
which avoids growing a large journal. And sync and merge always
explicitly commit, ensuring that even when they do nothing else,
they commit the staged changes.

Added a flag file to indicate that the branch's journal contains
uncommitted changes. (Could use git ls-files, but don't want to run
that every time.)

In the future, this ability to have uncommitted changes staged in the
journal might be used on remotes after a series of oneshot commands.
2012-02-25 16:18:55 -04:00