Commit graph

115 commits

Author SHA1 Message Date
Joey Hess
d9d76cf98b Fix minor FD leak in journal code.
Minor because normally only 1 FD is leaked per git-annex run. However,
the test suite leaks a few hundred FDs, and this broke it on the Debian
autobuilders, which seem to have a tigher than usual ulimit.

The leak was introduced by the lazy getDirectoryContents' that was
introduced in e6330988dd in order to scale to
millions of journal files -- if the lazy list was never fully consumed, the
directory handle did not get closed.

Instead, pull in openDirectory/readDirectory/closeDirectory code that I
already developed and submitted in a patch to the haskell directory library
earlier. Using this in journalDirty avoids the place that the lazy list
caused a problem. And using it in stageJournal eliminates the need for
getDirectoryContents'.

The getJournalFiles* functions are switched back to using the regular
strict getDirectoryContents. I'm not sure if those always consume the whole
list, so this avoids any leak. And the things that call those are things
like git annex unused, which also look at every file committed to the
git-annex branch, so would need more work to scale to insane numbers of
files anyway.
2014-07-09 23:36:53 -04:00
Joey Hess
e6330988dd
Fix memory leak when committing millions of changes to the git-annex branch
Eg after git-annex add has run on 2 million files in one go.

Slightly unhappy with the neeed to use a temp file here, but I cannot see
any other alternative (see comments on the bug report).

This commit was sponsored by Hamish Coleman.
2014-07-04 15:28:07 -04:00
Joey Hess
d41849bc23
support commit.gpgsign
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.

The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.

See Debian bug #753720.

Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.

Also disables commit.gpgsign in the test suite.

This commit was sponsored by Antoine Boegli.
2014-07-04 11:53:51 -04:00
Joey Hess
2dd274e4ca webapp: When adding a new local repository, fix bug that caused its group and preferred content to be set in the current repository, even when not combining.
There was a tricky bit here, when it does combine, the edit form is shown,
and so the info needs to be committed to the new repository, but then
pulled into the current one. And caches need to be invalidated for it
to be visible in the edit form.
2014-05-29 20:17:05 -04:00
Joey Hess
95ca3bb022 Fix encoding of data written to git-annex branch. Avoid truncating unicode characters to 8 bits.
Allow any encoding to be used, as with filenames (but utf8 is the sane
choice). Affects metadata and repository descriptions, and preferred
content expressions.

The question of what's the right encoding for the git-annex branch is a
vexing one. utf-8 would be a nice choice, but this leaves the possibility
of bad data getting into a git-annex branch somehow, and this resulting in
git-annex crashing with encoding errors, which is a failure mode I want to
avoid.

(Also, preferred content expressions can refer to filenames, and filenames
can have any encoding, so limiting to utf-8 would not be ideal.)

The union merge code already took care to not assume any encoding for a
file. Except it assumes that any \n is a literal newline, and not part of
some encoding of a character that happens to contain a newline. (At least
utf-8 avoids using newline for anything except liternal newlines.)
Adapted the git-annex branch code to use this same approach.

Note that there is a potential interop problem with Windows, since
FileSystemEncoding doesn't work there, and instead things are always
decoded as utf-8. If someone uses non-utf8 encoding for data on the
git-annex branch, this can lead to an encoding error on windows. However,
this commit doesn't actually make that any worse, because the union merge
code would similarly fail with an encoding error on windows in that
situation.

This commit was sponsored by Kyle Meyer.
2014-05-27 14:16:33 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
a44e01c29c --in can now refer to files that were located in a repository at some past date. For example, --in="here@{yesterday}" 2014-02-06 12:43:56 -04:00
Joey Hess
2ecd42b43b remove debug print
just saw it legitimately occur when 2 git-annex were running
2014-01-26 17:04:12 -04:00
Joey Hess
207ac67aaa avoid needing a build-dep on hxt for Data.AssocList 2014-01-14 16:42:10 -04:00
Joey Hess
d07f2d7865 Fix a long-standing bug that could cause the wrong index file to be used when committing to the git-annex branch, if GIT_INDEX_FILE is set in the environment. This typically resulted in git-annex branch log files being committed to the master branch and later showing up in the work tree. (These log files can be safely removed.) 2014-01-14 15:36:33 -04:00
Joey Hess
03932212ec Avoid using git commit in direct mode, since in some situations it will read the full contents of files in the tree.
The assistant's commit code also always avoids git commit, for simplicity.
Indirect mode sync still does a git commit -a to catch unstaged changes.

Note that this means that direct mode sync no longer runs the pre-commit
hook or any other hooks git commit might call. The git annex pre-commit
hook action for direct mode is however explicitly run. (The assistant
already ran git commit with hooks disabled, so no change there.)
2013-12-01 13:59:45 -04:00
Joey Hess
b876df6fdb Ensure that core.sharedrepository is honored when creating the .git/annex directory. 2013-11-18 18:20:20 -04:00
Joey Hess
310c549b5a Ensure execute bit is set on directories when core.sharedrepsitory is set. 2013-11-18 18:13:09 -04:00
Joey Hess
81117e8a9d typo 2013-11-06 12:39:14 -04:00
Joey Hess
ee23be55fd Fix exception handling bug that could cause .git/annex/index to be used for git commits outside the git-annex branch. Known to affect git-annex when used with the git shipped with Ubuntu 13.10. 2013-11-06 12:21:50 -04:00
Joey Hess
435ea52f3c repair command: add handling of git-annex branch and index 2013-10-23 13:00:45 -04:00
Joey Hess
f8880c4fe4 Automatically and safely detect and recover from dangling .git/annex/index.lock files, which would prevent git from committing to the git-annex branch, eg after a crash. 2013-10-03 15:43:08 -04:00
Joey Hess
83b4b8d589 rename confusing function
The index.lck file is not a lock file. Kept the historical name for now as
changing it would be work.
2013-10-03 15:06:58 -04:00
Joey Hess
f2ee4ef86d ensure that commitBranch is only called when the journal is locked
This is not strictly a requirement, since it does not actually update the
journal. But it's a nice invariant to enforce.
2013-10-03 14:48:46 -04:00
Joey Hess
56c3f68a53 use types to partially prove correctness of journal locking code
My implementation does not guard against double locking of the journal. But
it does ensure that the journal is always locked when operated on, by using
a type that is only produced by lockJournal, and which is required as a
parameter of all functions that operate on the journal.

Note that I had to add the fooStale functions for cases where it does not
make sense to lock the journal when querying it. I was more concerned about
ensuring that anything that modifies the journal is locked.
setJournalFile's implementation ensures that any query of the journal will
get one value or the other atomically, even if the journal is being changed
at the time.
2013-10-03 14:41:57 -04:00
Joey Hess
7a9a16b337 lockJournal when running performTransitions
This may not strictly be needed -- the transition code bypasses the
journal. However, this ensures that the git-annex branch is only
committed with the journal locked. This will allow for further
improvements.
2013-10-03 14:37:46 -04:00
Joey Hess
4079f9cfe8 avoid double commit during transition
The second commit had some bad refs which resulted in the race detection
code running. But that commit was unnecessary anyway, it only was there to
merge in the other refs.
2013-09-03 16:33:15 -04:00
Joey Hess
0831e18372 forget --drop-dead: Completely removes mentions of repositories that have been marked as dead from the git-annex branch.
Wrote nice pure transition calculator, and ugly code to stage its results
into the git-annex branch. Also had to split up several Log modules
that Annex.Branch needed to use, but that themselves used Annex.Branch.

The transition calculator is limited to looking at and changing one file at
a time. While this made the implementation relatively easy, it precludes
transitions that do stuff like deleting old url log files for keys that are
being removed because they are no longer present anywhere.
2013-08-31 17:51:13 -04:00
Joey Hess
2f57d74534 remove print 2013-08-29 20:28:45 -04:00
Joey Hess
6147652cc6 wording 2013-08-29 16:41:59 -04:00
Joey Hess
4a915cd3cd add forget command
Works, more or less. --dead is not implemented, and so far a new branch
is made, but keys no longer present anywhere are not scrubbed.

git annex sync fails to push the synced/git-annex branch after a forget,
because it's not a fast-forward of the existing synced branch. Could be
fixed by making git-annex sync use assistant-style sync branches.
2013-08-28 16:41:13 -04:00
Joey Hess
fcd5c167ef untested transition detection on merging, and transition running code 2013-08-28 15:57:42 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
08c03b2af3 XMPP: Avoid redundant and unncessary pushes. Note that this breaks compatibility with previous versions of git-annex, which will refuse to accept any XMPP pushes from this version. 2013-05-21 18:24:29 -04:00
Joey Hess
3d8355d984 Fix a bug in the git-annex branch handling code that could cause info from a remote to not be merged and take effect immediately.
This bug was turned up by the test suite, running fsck in direct mode.
A repository was cloned, was put into direct mode, was fscked, and fsck
incorrectly said that no copy existed of a file, that was actually present
in origin.

This turned out to occur because fsck first did a Annex.Branch.change,
recording that it did not locally have the file. That was recorded in the
journal. Since neither the git annex direct not the fsck had yet needed to
read any info from the branch, but had only made changes to it, the
origin/git-annex branch was not yet merged in. So the journal got a
location log entry written to it, but this did not include
the location log info for the origin. When fsck then did a
Annex.Branch.get, it trusted the journal was cosnsitent, and returned it,
again w/o merging from origin/git-annex. This latter behavior is the
actual bug.

Refer to commit e9bfa8eaed for the thinking
behind it being ok to make a change to a file on the branch, without
first merging the branch. That thinking still stands. However, it means
that files in the journal cannot be trusted to be consistent if the branch
has not been merged. So, to fix, just enure the branch gets merged, even
when reading from the journal.

In tests, this does not seem to cause any extra merging. Except, of course,
in the one case described above. But git annex add, etc, are able to make
changes w/o first merging the branch.
2013-05-20 15:14:59 -04:00
Joey Hess
3240006c56 fix android build, broken by changes for windows port 2013-05-16 11:52:48 -04:00
Joey Hess
abe8d549df fix permission damage (thanks, Windows) 2013-05-11 23:54:25 -04:00
Joey Hess
18bdff3fae clean up from windows porting 2013-05-11 18:23:41 -04:00
Joey Hess
3c7e30a295 git-annex now builds on Windows (doesn't work) 2013-05-11 15:03:00 -05:00
Joey Hess
763cbda14f fixup #if 0 stubs to use #ifndef mingw32_HOST_OS
That's needed in files used to build the configure program.
For the other files, I'm keeping my __WINDOWS__ define, as I find that much easier to type.
I may search and replace it to use the mingw32_HOST_OS thing later.
2013-05-10 16:57:21 -05:00
Joey Hess
6c74a42cc6 stub out POSIX stuff 2013-05-10 16:29:59 -05:00
Joey Hess
8a5b397ac4 hlint 2013-04-03 03:52:41 -04:00
Joey Hess
0c13d3065e git subcommand cleanup
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
2013-03-03 13:39:07 -04:00
Joey Hess
cbd53b4a8c Makefile now builds using cabal, taking advantage of cabal's automatic detection of appropriate build flags.
The only thing lost is ./ghci

Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
2013-02-27 02:39:22 -04:00
Joey Hess
faa9d3c22b work around broken getEnvironment on Android in the most important place: git annex init
This resulted in a lot of user complains that git annex init had git
telling them they needed to run git config --global user.email .. which
didn't work because even HOME was not passed into git.
2013-02-22 14:47:29 -04:00
Joey Hess
0d50a6105b whitespace fixes 2012-12-13 00:45:27 -04:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
3417c55189 remove git-annex branch read cache
This cache prevented noticing changes made by another process.

The case I just ran into involved the assistant dropping a file, which
cached its presence info. Then the same file was downloaded again,
but the assistant didn't know its presence info had changed.

I don't see a way to keep this cache. Will instead rely on the OS level
file cache, for files in the journal. May need to add more higher-level
caching of info that it's ok to have a potentially stale copy of,
although much of git-annex already does so.
2012-10-19 14:25:15 -04:00
Joey Hess
97ea08e2d1 Avoid unsetting HOME when running certian git commands. Closes: #690193
Setting GIT_INDEX_FILE clobbers the rest of the environment, making git
not read ~/.gitconfig, and blow up if GECOS didn't have a name for the
user.

I'm not entirely happy with getEnvironment being run every time now,
that's somewhat expensive. It may make sense to just set GIT_COMMITTER_*
and GIT_AUTHOR_*, but I worry that clobbering the rest could break PATH,
or GIT_PATH, or something else that might be used by a command run in here.
And caching the environment is not a good idea either; it can change..
2012-10-11 12:58:24 -04:00
Joey Hess
47314c0fad fix last zombies in the assistant
Made Git.LsFiles return cleanup actions, and everything waits on
processes now, except of course for Seek.
2012-10-04 19:56:32 -04:00
Joey Hess
5594bf0643 more zombie fighting
I'm down to 9 places in the code that can produce unwaited for zombies.

Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.

The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.

The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.

(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
2012-10-04 18:47:31 -04:00
Joey Hess
e8188ea611 flip catchDefaultIO 2012-09-17 00:18:07 -04:00
Joey Hess
750c4ac6c2 bugfix: avoid staging but not committing changes to git-annex branch
Branch.get is not able to see changes that have been staged to the index
but not committed. This is a limitation of git cat-file --batch; when
reading from the index, as opposed to from a branch, it does not notice
changes made after the first time it reads the index.

So, had to revert the changes made in 1f73db3469
to make annex.alwayscommit=false stage changes.

Also, ensure that Branch.change and Branch.get always see changes
at all points during a commit, by not deleting journal files when
staging to the index. Delete them only after committing the branch.
Before, there was a race during commits where a different git-annex
could see out-of-date info from the branch while a commit was in progress.

That's also done when updating the branch to merge in remote branches.

In the case where the local git-annex branch has had changes pushed into it
that are not yet reflected in the index, and there are journalled changes
as well, a merge commit has to be done.
2012-09-15 20:15:16 -04:00
Joey Hess
a1f93f06fd eliminate some commits to the git-annex branch
Commits used to be made to the git-annex branch whenever there were
journalled changes from a previous command, and the current command looked
up the value of a file. This no longer happens.

This means that transferkey, which is a oneshot command that stages
changes, can be run multiple times by the assistant, without each of them
committing the changes made by the command before. Which will be a lot
faster and use less space by batching up the commits.

Commits still happen if a remote git-annex branch has been changed and is
merged in.
2012-09-15 18:36:42 -04:00