Commit graph

864 commits

Author SHA1 Message Date
Joey Hess
e2c44bf656 implement chunk logs
Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.

This commit was sponsored by Frank Thomas.
2014-07-24 16:23:36 -04:00
Joey Hess
3e4cb0e7f9 fix windows build 2014-07-14 15:55:48 -04:00
Joey Hess
822f4619ae resolvemerge: finish up by committing 2014-07-11 16:59:49 -04:00
Joey Hess
61a35de433 Deal with change in git 2.0 that made indirect mode merge conflict resolution leave behind old files.
I think this is a git behavior change, but have not checked to be sure.
Conflict cruft used to look like $foo~HEAD, but now just $foo is left
behind as conflict cruft.

With test case.
2014-07-11 16:56:19 -04:00
Joey Hess
cb66ca3a76 resolvemerge: New plumbing command that runs the automatic merge conflict resolver. 2014-07-11 16:45:18 -04:00
Joey Hess
1efa51f344 direct: Fix handling of case where a work tree subdirectory cannot be written to due to permissions.
Running `git annex direct` would cause loss of data, because the object
was moved to a temp file, which it then tried to replace the work tree file
with, and on failure, the temp file got deleted. Now it's instead moved
back into the annex object location.
2014-07-10 14:15:46 -04:00
Joey Hess
26ee27915a refactor locking 2014-07-10 00:32:23 -04:00
Joey Hess
e5b88713a1 refactor 2014-07-10 00:16:53 -04:00
Joey Hess
d9d76cf98b Fix minor FD leak in journal code.
Minor because normally only 1 FD is leaked per git-annex run. However,
the test suite leaks a few hundred FDs, and this broke it on the Debian
autobuilders, which seem to have a tigher than usual ulimit.

The leak was introduced by the lazy getDirectoryContents' that was
introduced in e6330988dd in order to scale to
millions of journal files -- if the lazy list was never fully consumed, the
directory handle did not get closed.

Instead, pull in openDirectory/readDirectory/closeDirectory code that I
already developed and submitted in a patch to the haskell directory library
earlier. Using this in journalDirty avoids the place that the lazy list
caused a problem. And using it in stageJournal eliminates the need for
getDirectoryContents'.

The getJournalFiles* functions are switched back to using the regular
strict getDirectoryContents. I'm not sure if those always consume the whole
list, so this avoids any leak. And the things that call those are things
like git annex unused, which also look at every file committed to the
git-annex branch, so would need more work to scale to insane numbers of
files anyway.
2014-07-09 23:36:53 -04:00
Joey Hess
c75193e88b fix build warning 2014-07-09 15:39:19 -04:00
Joey Hess
84186ee626 fix windows build 2014-07-09 15:37:25 -04:00
Joey Hess
58acaf8026 prospective fix for bad_merge_commit_deleting_all_files
Assuming my analysis of a race is correct. In any case, this certianly closes a
race..
2014-07-09 15:08:19 -04:00
Joey Hess
ba42b67c70 Fix bug in automatic merge conflict resolution
When one side is an annexed symlink, and the other side is a non-annexed symlink.

In this case, git-merge does not replace the annexed symlink in the work
tree with the non-annexed symlink, which is different from it's handling of
conflicts between annexed symlinks and regular files or directories.
So, while git-annex generated the correct merge commit, the work tree
didn't get updated to reflect it.
See comments on bug for additional analysis.

Did not add this to the test suite yet; just unloaded a truckload of firewood
and am feeling lazy.

This commit was sponsored by Adam Spiers.
2014-07-08 13:55:11 -04:00
Joey Hess
4a66cd3f91 assistant: Fix bug, introduced in last release, that caused the assistant to make many unncessary empty merge commits. 2014-07-05 17:12:05 -04:00
Joey Hess
c90e4e8778
work around getDirectoryContents not streaming lazily 2014-07-04 17:59:26 -04:00
Joey Hess
e6330988dd
Fix memory leak when committing millions of changes to the git-annex branch
Eg after git-annex add has run on 2 million files in one go.

Slightly unhappy with the neeed to use a temp file here, but I cannot see
any other alternative (see comments on the bug report).

This commit was sponsored by Hamish Coleman.
2014-07-04 15:28:07 -04:00
Joey Hess
d41849bc23
support commit.gpgsign
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.

The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.

See Debian bug #753720.

Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.

Also disables commit.gpgsign in the test suite.

This commit was sponsored by Antoine Boegli.
2014-07-04 11:53:51 -04:00
Joey Hess
fc80956092
really add non-date metadata too 2014-07-03 14:35:20 -04:00
Joey Hess
d0c1a22e7c import metadata from feeds
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.

Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.

Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.

(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)

The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.

This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
2014-07-03 14:15:00 -04:00
Joey Hess
7a8f8b5ac9 refactor 2014-06-16 18:59:23 -04:00
Joey Hess
b30de0dfd2 work around a bug in git
http://marc.info/?l=git&m=140262402204212&w=2

This git bug manifested on FAT and Windows as the test suite failing in 3
places. All involved merge conflict resolution. It turned out that the
associated file mappings were getting messed up, and that happened because
this git bug lost track of what files were supposed to be symlinks.

This commit was sponsored by Eric Kidd.
2014-06-12 22:00:02 -04:00
Joey Hess
4fe2e53f5b finish fixing windows timezone madness
Rather than calculating the TSDelta once, and caching it, this now
reads the inode sential file's InodeCache file once, and then each time a
new InodeCache is generated, looks at the sentinal file to get the current
delta.

This way, if the time zone changes while git-annex is running, it will
adapt.

This adds some inneffiency, but only on Windows, and only 1 stat per new
file added. The worst innefficiency is that `git annex status` and
`git annex sync` will now (on Windows) stat the inode sentinal file once per
file in the repo.

It would be more efficient to use getCurrentTimeZone, rather than needing
to stat the sentinal file. This should be easy to do, once the time
package gets my bugfix patch.

This commit was sponsored by Jürgen Lüters.
2014-06-12 13:54:08 -04:00
Joey Hess
e4d7e2ebde fix for Windows file timestamp timezone madness
On Windows, changing the time zone causes the apparent mtime of files to
change. This confuses git-annex, which natually thinks this means the files
have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>).

Work around this stupidity, by using the inode sentinal file to detect if
the timezone has changed, and calculate a TSDelta, which will be applied
when generating InodeCaches.

This should add no overhead at all on unix. Indeed, I sped up a few
things slightly in the refactoring.

Seems to basically work! But it has a big known problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.

This commit was sponsored by Vincent Demeester.
2014-06-12 13:42:21 -04:00
Joey Hess
a44fd2c019 export CreateProcess fields from Utility.Process
update code to avoid cwd and env redefinition warnings
2014-06-10 19:20:14 -04:00
Joey Hess
f08fcb5030 simplify 2014-06-09 20:32:11 -04:00
Joey Hess
ab72456bb3 avoid fast-forwarding when a merge conflict was auto-resolved 2014-06-09 20:10:12 -04:00
Joey Hess
d6711800ad avoid bad commits after interrupted direct mode sync (or merge)
It was possible for a interrupted sync or merge in direct mode to
leave the work tree out of sync with the last recorded commit.
This would result in the next commit seeing files missing from the work
tree, and committing their removal.

Now, a direct mode merge happens not only in a throwaway work tree, but using
a temporary index file, and without any commits or index changes
being made until the real work tree has been updated. If the merge is
interrupted, the work tree may have some updated files, but worst case a
commit will redundantly commit changes that come from the merge.

This commit was sponsored by Tony Cantor.
2014-06-09 19:40:28 -04:00
Joey Hess
dcddacfd5c fixed getting files from bare repos on windows 2014-06-05 15:54:06 -04:00
Joey Hess
eb86f1338f wip 2014-06-05 15:31:23 -04:00
Joey Hess
1ab3d7c810 Windows: Fix bug introduced in last release that caused files in the git-annex branch to have lines teminated with \r. 2014-06-05 14:57:01 -04:00
Joey Hess
2dd274e4ca webapp: When adding a new local repository, fix bug that caused its group and preferred content to be set in the current repository, even when not combining.
There was a tricky bit here, when it does combine, the edit form is shown,
and so the info needs to be committed to the new repository, but then
pulled into the current one. And caches need to be invalidated for it
to be visible in the edit form.
2014-05-29 20:17:05 -04:00
Ben Gamari
99b89b22fd Use exceptions in place of deprecated MonadCatchIO-transformers 2014-05-28 17:03:40 -04:00
Joey Hess
95ca3bb022 Fix encoding of data written to git-annex branch. Avoid truncating unicode characters to 8 bits.
Allow any encoding to be used, as with filenames (but utf8 is the sane
choice). Affects metadata and repository descriptions, and preferred
content expressions.

The question of what's the right encoding for the git-annex branch is a
vexing one. utf-8 would be a nice choice, but this leaves the possibility
of bad data getting into a git-annex branch somehow, and this resulting in
git-annex crashing with encoding errors, which is a failure mode I want to
avoid.

(Also, preferred content expressions can refer to filenames, and filenames
can have any encoding, so limiting to utf-8 would not be ideal.)

The union merge code already took care to not assume any encoding for a
file. Except it assumes that any \n is a literal newline, and not part of
some encoding of a character that happens to contain a newline. (At least
utf-8 avoids using newline for anything except liternal newlines.)
Adapted the git-annex branch code to use this same approach.

Note that there is a potential interop problem with Windows, since
FileSystemEncoding doesn't work there, and instead things are always
decoded as utf-8. If someone uses non-utf8 encoding for data on the
git-annex branch, this can lead to an encoding error on windows. However,
this commit doesn't actually make that any worse, because the union merge
code would similarly fail with an encoding error on windows in that
situation.

This commit was sponsored by Kyle Meyer.
2014-05-27 14:16:33 -04:00
Joey Hess
79cf404e75 support being run by ssh as ssh-askpass replacement
To use, set GIT_ANNEX_SSHASKPASS to point to a fifo or regular file
(FIFO is better, avoids touching disk or multiple readers) that contains
the password. Then set SSH_ASKPASS=git-annex, and when ssh runs it, it will
tell ssh the password.

This is not yet used..
2014-04-29 18:08:10 -04:00
Joey Hess
2807e14904
use a subdir of GIT_ANNEX_TMP for ssh connection caching sockets
To prevent any possible collisions with other, non-socket files, like the
xmppgit directory.
2014-04-20 16:56:01 -04:00
Joey Hess
a8bd7a607d When init detects that git is not configured to commit, and sets user.email to work around the problem, also make it set user.name.
I was able to reproduce git failing to commit despite user.email being set,
in a test account on my laptop. The account had no GECOS information.
2014-04-20 14:17:57 -04:00
Joey Hess
e880d0d22c replace (Key, Backend) with Key
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.

There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
2014-04-17 18:03:39 -04:00
Joey Hess
915d038bec reinit: New command that can initialize a new reposotory using the configuration of a previously known repository. Useful if a repository got deleted and you want to clone it back the way it was. 2014-04-15 20:13:35 -04:00
Joey Hess
138d25518d Merge branch 'master' into remotecontrol
Conflicts:
	doc/devblog/day_152__more_ssh_connection_caching.mdwn
2014-04-14 13:38:35 -04:00
Joey Hess
2ff9ba9f74
add missing Network.URI Ord instance for Debian stable 2014-04-14 13:25:49 -04:00
Joey Hess
a0ef99b3f9
don't try to use ssh connection caching for non-ssh urls 2014-04-13 21:39:04 -04:00
Joey Hess
a33b30d0c4 remotedaemon: When network connection is lost, close all cached ssh connections.
This commit was sponsored by Cedric Staub.
2014-04-12 16:32:59 -04:00
Joey Hess
15917ec1a8 sync, assistant, remotedaemon: Use ssh connection caching for git pushes and pulls.
For sync, saves 1 ssh connection per remote. For remotedaemon, the same
ssh connection that is already open to run git-annex-shell notifychanges
is reused to pull from the remote.

Only potential problem is that this also enables connection caching
when the assistant syncs with a ssh remote. Including the sync it does
when a network connection has just come up. In that case, cached ssh
connections are likely to be stale, and so using them would hang.
Until I'm sure such problems have been dealt with, this commit needs to
stay on the remotecontrol branch, and not be merged to master.

This commit was sponsored by Alexandre Dupas.
2014-04-12 15:59:34 -04:00
Johan Kiviniemi
4025515616 Notification: Add action/status-dependent icon and urgency 2014-04-05 20:45:11 +03:00
Johan Kiviniemi
7760dfcc7f Notification: summary is not optional
Use the summary field instead of body.
2014-04-05 20:44:06 +03:00
Joey Hess
a772b23c62 fix warning on !dbus 2014-04-02 18:10:03 -04:00
Joey Hess
fe19e15040 reorg matcher types; no non-type code changes 2014-03-29 14:43:34 -04:00
Joey Hess
0df4848a9e forget --drop-dead: Avoid removing the dead remote from the trust.log, so that if git remotes for it still exist anywhere, git annex info will still know it's dead and not show it. 2014-03-26 13:28:26 -04:00
Joey Hess
16025a4f12
fix build w/o DesktopNotification 2014-03-23 18:17:35 -04:00
Joey Hess
fb8a32cc7f notifications on drop 2014-03-22 15:01:48 -04:00
Joey Hess
a5dcd0e4bd fix failure notification 2014-03-22 14:26:36 -04:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
f64c2d6138 toplevel lastchanged field 2014-03-19 19:10:55 -04:00
Joey Hess
1052eeface Windows: Fix some filename encoding bugs.
http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/

Not a complete fix yet.
2014-03-19 15:57:56 -04:00
Joey Hess
caa97d1271 Each for each metadata field, there's now an automatically maintained "$field-lastchanged" that gives the timestamp of the last change to that field.
Note that this is a nearly entirely free feature. The data was already
stored in the metadata log in an easily accessible way, and already was
parsed to a time when parsing the log. The generation of the metadata
fields may even be done lazily, although probably not entirely (the map
has to be evaulated to when queried).
2014-03-18 18:55:43 -04:00
Joey Hess
6a4dd42328 finish wiring up groupwanted 2014-03-15 17:08:55 -04:00
Joey Hess
3551d40b05 "standard" can now be used as a first-class keyword in preferred content expressions.
For example "standard or (include=otherdir/*)" or even "not standard"

Note that the implementation avoids any potential for loops (if a
standard preferred content expression itself mentioned standard).

This commit was sponsored by Jochen Bartl.
2014-03-14 15:04:33 -04:00
Joey Hess
ba02cd8a80 Fix ssh connection caching stop method to work with openssh 6.5p1, which broke the old method.
Old ssh did not check the hostname passed to -O stop, so I had used "any".
But now ssh does check it! I think this happened as part of the client-side
hostname canonicalization changes in 6.5p1, but have not verified that
introduced the problem.

The symptom was that it would try to dns lookup "any", which often caused a
bit of a delay at shutdown. And the old ssh connection kept running, so
it would do it over and over again.

Fixed by using localhost, which hopefully reliably resolves to some address
that ssh will accept.. Also nukeFile the socket after ssh has been asked to
shutdown, just in case.
2014-03-13 19:35:06 -04:00
Joey Hess
8e2997aa69 only run sshCleanup when the command actually used ssh connection caching
Optimises query commands that do not. More importantly, avoids any ssh
connection cleanup delay causing problems at the end of such commands.
2014-03-13 19:30:13 -04:00
Joey Hess
1f99a6778f Fix direct mode getKeysPresent false positive & also sped up direct mode unused and unannex
unused: In direct mode, files that are deleted from the work tree are no longer incorrectly detected as unused.

Direct mode `git annex info` slows down a bit due to more stringent
checking, but not by a lot.
2014-03-07 12:43:56 -04:00
Joey Hess
8ee3b47d2b style 2014-03-04 22:55:40 -04:00
Joey Hess
14d1e878ab sync: Automatically resolve merge conflict between and annexed file and a regular git file.
This is a new feature, it was not handled before, since it's a bit of an
edge case. However, it can be handled exactly the same as a file/dir
conflict, just leave the non-annexed item alone.

While implementing this, the core resolveMerge' function got a lot simpler
and clearer. Note especially that where before there was an asymetric call to
stagefromdirectmergedir, now graftin is called symmetrically in both cases.

And, in order to add that `graftin us`, the current branch needed to be
known (if there is no current branch, there cannot be a merge conflict).
This led to some cleanups of how autoMergeFrom behaved when there is no
current branch.

This commit was sponsored by Philippe Gauthier.
2014-03-04 19:35:55 -04:00
Joey Hess
99295f2c1d factor out Annex.AutoMerge from Command.Sync 2014-03-04 16:26:15 -04:00
Joey Hess
4a847cdc08 finish fixing direct mode merge bug involving unstaged local files
Added test cases for both ways this can happen, with a conflict involving a
file, or a directory.

Cleaned up resolveMerge to not touch the work tree in direct mode, which
turned out to be the only way to handle things.. And makes it much nicer.

Still need to run test suite on windows.
2014-03-04 02:03:15 -04:00
Joey Hess
fb88e0f02c fix 1192d98721 to handle annexed files in conflicted merge
In the case of a conflicted merge where the remote adds a directory, and we
have a file (which is checked in), resolveMerge' will create the link,
and so the fix for 1192d98721 looked at that,
thought it was an unannexed file (it's not in the oldref), and preserved
it.

This is a hacky fix. It would be better for resolveMerge' to not update the
work tree, at least in direct mode, and only stage the changes, which
mergeDirectCleanUp could then move into tree. I want to make that change,
but this is not the time to do it.
2014-03-03 17:09:53 -04:00
Joey Hess
1192d98721 sync: Fix bug in direct mode that caused a file not checked into git to be deleted when merging with a remote that added a file by the same name. (Thanks, jkt) 2014-03-03 14:57:16 -04:00
Joey Hess
04b77328ef
fix handling of nonexistant hook 2014-03-03 13:59:36 -04:00
Joey Hess
d0fce426c4 pre-commit-annex hook script to automatically extract metadata from lots of types of files
Using the extract(1) program to do the heavy lifting.

Decided to make git-annex run pre-commit-annex when committing. Since
git-annex pre-commit also runs it, it'll be run when git commit is run too,
via the pre-commit hook. This basically gives back the pre-commit hook
that git-annex took away. The implementation avoids repeatedly looking
for the hook script when the assistant is running and committing
repeatedly; only checks if the hook is available once.

To make the script simpler, made git-annex metadata -s field?=value
only set a field when it's not already got a value.

This commit was sponsored by bak.
2014-03-02 20:11:58 -04:00
Joey Hess
308d4b67f3 fix combining of FIlterValues 2014-03-02 15:44:14 -04:00
Joey Hess
7d9486a709 vadd: Allow listing multiple desired values for a field. 2014-03-02 15:36:45 -04:00
Joey Hess
c2e8c21ca6 view, vfilter: Add support for filtering tags and values out of a view, using !tag and field!=value.
Note that negated globs are not supported. Would have complicated the code
to add them, without changing the data type serialization in a
non-backwards-compatable way.

This commit was sponsored by Denver Gingerich.
2014-03-02 14:53:19 -04:00
Joey Hess
7ac37a7854 Probe for quvi version at run time.
Overhead: git annex addurl runs quvi --version once.
And more bloat to Annex state..
2014-02-28 14:54:02 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
06e9080f01 metadata: FIeld names are now case insensative. 2014-02-25 18:45:09 -04:00
Joey Hess
b9147b4012
fix test to work on Windows 2014-02-25 18:09:45 -04:00
Joey Hess
003fc2b7e1
add UrlOptions sum type 2014-02-24 22:00:25 -04:00
Joey Hess
c69d6eb035 Make annex.web-options be used in several places that call curl. 2014-02-24 21:29:37 -04:00
Joey Hess
8d5158fa31 Preserve metadata when staging a new version of an annexed file.
Performance impact: When adding a large tree of new files, this needs
to do some git cat-file queries to check if any of the files already
existed and might need a metadata copy. I tried a benchmark in a copy
of my sound repository (so there was already a significant git tree
to check against.

Adding 10000 small files, with a cold cache:
  before: 1m48.539s
  after:  1m52.791s

So, impact is 0.0004 seconds per file added. Which seems acceptable, so did
not add some kind of configuration to enable/disable this.

This commit was sponsored by Lisa Feilen.
2014-02-24 14:41:33 -04:00
Joey Hess
7498c5dd96 annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files 2014-02-23 00:08:29 -04:00
Joey Hess
fa6f553083 exclude derived metadata when extracting metadata from a viewed file 2014-02-22 18:16:28 -04:00
Joey Hess
079b35a1a8 views: add automatically constructed file location metadata
When constructing views, metadata is available about the location of the
file in the view's reference branch. Allows incorporating parts of the
directory hierarchy in a view.

For example `git annex view tag=* podcasts/=*` makes a view in the form
tag/showname.

Performance impact: I benchmarked git annex view tag=* in the conference
proceedings repo to take 6.459s before this change, and 6.544s after.

FWIW, I considered making the syntax for this be podcasts/*, which might
be easier for the user to learn. However, I think it's not as good:

* The user has to then juggle two different syntaxes, and podcasts/* will
  be expanded by the shell so they also need to quote it, while podcasts/=*
  is unlikely to be expanded by the shell.
* It would allow for things like podcasts/*/* and *.mp3 which do not
  map well into views.

This commit was sponsored by Aurélien Pinceaux.
2014-02-22 16:27:53 -04:00
Joey Hess
73a5245502 prune imports 2014-02-22 14:58:05 -04:00
Joey Hess
cc0a576ab0 change directory encoding in ViewedFile such that the original directory can be extracted from it 2014-02-22 14:54:53 -04:00
Joey Hess
1435c4f149 factor out new module 2014-02-22 13:35:50 -04:00
Joey Hess
24f8136504 --metadata field=value can now use globs to match, and matches case insensatively, the same as git annex view field=value does.
Also refactored glob code into its own module.
2014-02-21 18:34:34 -04:00
Joey Hess
db48b8a4a3 unused: Fix to actually detect unused keys when in direct mode. 2014-02-20 13:53:49 -04:00
Joey Hess
dd7b99c860 add tip about metadata driven views (and more flexible view filtering)
While writing this documentation, I realized that there needed to be a way
to stay in a view like tag=* while adding a filter like tag=work that
applies to the same field.

So, there are really two ways a view can be refined. It can have a new
"field=explicitvalue" filter added to it, which does not change the
"shape" of the view, but narrows the files it shows.
Or, it can have a new view added, which adds another level of
subdirectories.

So, added a vfilter command, which takes explicit values to add to the
filter, and rejects changes that would change the shape of the view.

And, made vadd only accept changes that change the shape of the view.

And, changed the View data type slightly; now components that can match
multiple metadata values can be visible, or not visible.

This commit was sponsored by Stelian Iancu.
2014-02-19 16:29:56 -04:00
Joey Hess
39ebfa1a2e pre-commit: Update metadata when committing changes to annexed files within a view.
So the user can now switch to a view and then move files around within it
to manage metadata. For example, moving a file into a new directory
when in the tags=* view adds a tag to it.

Implementation is fairly efficient. One diff-index, which is no more
expensive than the first stage of a git commit, followed by possibly
some cat-file --batch traffic to find the key (when deleting a file).
Very similar to what's done in direct mode when committing. And like
direct mode when updating the WC after a merge, it has to buffer the
diff-tree values in order to make 2 passes over them.

When not in a view, pre-commit now does one extra git symbolic-ref,
which is tiny overhead.

This commit was sponsored by Andrew Eskridge.
2014-02-19 14:17:58 -04:00
Joey Hess
1fe6cd3c0d decruft 2014-02-19 02:32:22 -04:00
Joey Hess
fb266e2da6 make view globs case-insensative, memoized, and bring back TFDA
I was careful to write the code so its clear how laziness memoizes it,
although it's likely that much less explicit currying would have had
the same effect. Verified that the memoization works using a Debug.Trace.
2014-02-19 02:30:14 -04:00
Joey Hess
0b7ede2088 reject views with too many nested subdirs 2014-02-19 01:28:48 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
72c118152f fix view changing when in subdir
Failed reading some files with relative paths. This is a quick and dirty
fix.
2014-02-18 20:57:14 -04:00
Joey Hess
9b51d43318 view: preserve toplevel dotfiles 2014-02-18 20:32:00 -04:00
Joey Hess
5790acc19f improve view filenames 2014-02-18 20:01:44 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
103dab702b better data types 2014-02-17 00:38:33 -04:00
Joey Hess
e806152f77 split out types 2014-02-17 00:18:57 -04:00
Joey Hess
d7a95098fb
tricky view refining code that keeps track of whether the view is widenening or narrowing 2014-02-16 22:44:28 -04:00
Joey Hess
410f603383 support globs when built w/o TDFA, just slower 2014-02-16 21:26:57 -04:00
Joey Hess
613f8f02e3
add another quickcheck property, and several edge cases handled 2014-02-16 21:00:12 -04:00
Joey Hess
81628d24c8 simplify type 2014-02-16 17:46:52 -04:00
Joey Hess
9633c67842 filter branches (incomplete)
Promosing work toward metadata driven filter branches. A few methods
to construct them are stubbed out; all the data types and pure code
seems good.

This commit was sponsored by Walter Somerville.
2014-02-16 17:39:54 -04:00
Joey Hess
2075cdeb59
limiting files based on metadata
Note that there is currently no caching, so
	--metadata foo=bar --metadata tag=blah
will currently read the log 2x per file.
2014-02-13 02:24:30 -04:00
Joey Hess
9f7e76130e add metadata command to get/set metadata
Adds metadata log, and command.

Note that unsetting field values seems to currently be broken.
And in general this has had all of 2 minutes worth of testing.

This commit was sponsored by Julien Lefrique.
2014-02-12 21:30:33 -04:00
Joey Hess
2d480602aa random hlint (to give the autobuilder something new to build) 2014-02-11 00:41:19 -04:00
Joey Hess
43d17632f6 remove workaround for old bug
ef24751922 described a bug moving between
remotes in direct mode; I can no longer reproduce it with this strange
workaround removed. Also test suite still passes. Hope the broken code just
got fixed in the meantime.
2014-02-06 17:36:14 -04:00
Joey Hess
897d877472 work around absNormPath not working on Windows
When making git-annex links, we want unix-style paths in the link targets.
2014-02-06 17:17:35 -04:00
Joey Hess
a44e01c29c --in can now refer to files that were located in a repository at some past date. For example, --in="here@{yesterday}" 2014-02-06 12:43:56 -04:00
Joey Hess
08afe3a1f6 fix failing test case on Windows
ensure file being modified is all read before it's opened for write
2014-02-03 10:20:18 -04:00
Joey Hess
1572c460e8 avoid using openFile when withFile can be used
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
2014-02-03 10:19:06 -04:00
Joey Hess
fd1382f96f factor out utility function 2014-02-03 10:08:28 -04:00
Joey Hess
a542781f6a remove some monkey faces 2014-02-01 17:14:38 -04:00
Joey Hess
1669e80e85 Windows: Avoid using unix-compat's rename, which refuses to rename directories.
Opened a bug about this: https://github.com/jystic/unix-compat/issues/10
2014-01-29 15:19:03 -04:00
Joey Hess
721cc0cd22 rework annexed object locking in direct mode & support Windows
Seems that locking of annexed objects when they're being dropped was broken
in direct mode:

* When taking the lock before dropping, it created the .git/annex/objects
  file, as an empty file. It seems that the dropping code deleted that,
  but that is not right, and for all I know could in some situation cause
  a corrupted object to leak out.
* When the lock was checked, it actually tried to open each direct mode
  file, and checked if it was locked. Not the same lock used above, and
  could also fail if some consumer of the file locked it.

Fixed this, and added windows support by switching direct mode to lock a
.lck file.
2014-01-28 16:43:11 -04:00
Joey Hess
891c85cd88 use locking on Windows
This is all the easy cases, where there was already a separate lock file.
2014-01-28 14:42:03 -04:00
Joey Hess
2ecd42b43b remove debug print
just saw it legitimately occur when 2 git-annex were running
2014-01-26 17:04:12 -04:00
Joey Hess
74b101d1dd reorg 2014-01-26 16:36:31 -04:00
Joey Hess
b93e485ef1 added annex.secure-erase-command config option. 2014-01-24 12:58:52 -04:00
Joey Hess
3518c586cf fix transfers of key with no associated file
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.

As part of this, preferred content checks pass the Key around.

Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.

I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
2014-01-23 16:44:02 -04:00
Joey Hess
4b55afe9e9 add "unused" preferred content expression
With a really nice optimisation that keeps it from having any overhead
in normal operation!

This commit was sponsored by Ulises Vitulli.
2014-01-22 16:35:32 -04:00
Joey Hess
f2713a3bb9 benchmarked numcopies .gitattributes in preferred content
Checking .gitattributes adds a full minute to a git annex find looking for
files that don't have enough copies. 2:25 increasts to 3:27. I feel this is
too much of a slowdown to justify making it the default. So, exposed two
versions of the preferred content expression, a slow one and a fast but
approximate one.

I'm using the approximate one in the default preferred content expressions
to avoid slowing down the assistant.
2014-01-21 18:49:25 -04:00
Joey Hess
f7cdc40f7b reorg 2014-01-21 18:08:56 -04:00
Joey Hess
0ef282a116 numcopies cleanup, part 2
This includes several bug fixes.
2014-01-21 17:25:39 -04:00
Joey Hess
b40df4f0d0 reorganize numcopies code (no behavior changes)
Move stuff into Logs.NumCopies. Add a NumCopies newtype.

Better names for various serialization classes that are specific to one
thing or another.
2014-01-21 16:08:59 -04:00
Joey Hess
3159da2693 Add and use numcopiesneeded preferred content expression.
* Add numcopiesneeded preferred content expression.
* Client, transfer, incremental backup, and archive repositories
  now want to get content that does not yet have enough copies.

This means the asssistant will make copies of files that don't yet
meet the configured numcopies, even to places that would not normally want
the file.

For example, if numcopies is 4, and there are 2 client repos and
2 transfer repos, and 2 removable backup drives, the file will be sent
to both transfer repos in order to make 4 copies. Once a removable drive
get a copy of the file, it will be dropped from one transfer repo or the
other (but not both).

Another example, numcopies is 3 and there is a client that has a backup
removable drive and two small archive repos. Normally once one of the small
archives has a file, it will not be put into the other one. But, to satisfy
numcopies, the assistant will duplicate it into the other small archive
too, if the backup repo is not available to receive the file.

I notice that these examples are fairly unlikely setups .. the old behavior
was not too bad, but it's nice to finally have it really correct.

.. Almost. I have skipped checking the annex.numcopies .gitattributes
out of fear it will be too slow.

This commit was sponsored by Florian Schlegel.
2014-01-20 17:35:29 -04:00
Joey Hess
d66535f065 global numcopies setting
* numcopies: New command, sets global numcopies value that is seen by all
  clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
  command is used to set the global number of copies, any annex.numcopies
  git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.

This global numcopies setting is needed to let preferred content
expressions operate on numcopies.

It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)

Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.

This commit was sponsored by Nicolas Pouillard.
2014-01-20 16:47:56 -04:00
Joey Hess
73c420ffcf much better command action handling for sync --content 2014-01-20 13:31:03 -04:00
Joey Hess
34c8af74ba fix inversion of control in CommandSeek (no behavior changes)
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.

The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.

The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.

This was a real slog from 1 to 5 am.

Test suite passes.

Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.

Wall clock speed is identical, even in large repos.

This commit was sponsored by an anonymous bitcoiner.
2014-01-20 04:57:36 -04:00
Joey Hess
b6ba0bd556 sync --content: New option that makes the content of annexed files be transferred.
Similar to the assistant, this honors any configured preferred content
expressions.

I am not entirely happpy with the implementation. It would be nicer if
the seek function returned a list of actions which included the individual
file gets and copies and drops, rather than the current list of calls to
syncContent. This would allow getting rid of the somewhat reundant display
of "sync file [ok|failed]" after the get/put display.

But, do that, withFilesInGit would need to somehow be able to construct
such a mixed action list. And it would be less efficient than the current
implementation, which is able to reuse several values between eg get and
drop.

Note that currently this does not try to satisfy numcopies when
getting/putting files (numcopies are of course checked when dropping
files!) This makes it like the assistant, and unlike get --auto
and copy --auto, which do duplicate files when numcopies is not yet
satisfied. I don't know if this is the right decision; it only seemed to
make sense to have this parallel the assistant as far as possible to start
with, since I know the assistant works.

This commit was sponsored by Øyvind Andersen Holm.
2014-01-19 17:49:54 -04:00
Joey Hess
8ce515ffe4 improve matcher data type to allow matching Keys, instead of just files (no behavior changes) 2014-01-18 14:51:55 -04:00
Joey Hess
207ac67aaa avoid needing a build-dep on hxt for Data.AssocList 2014-01-14 16:42:10 -04:00
Joey Hess
d07f2d7865 Fix a long-standing bug that could cause the wrong index file to be used when committing to the git-annex branch, if GIT_INDEX_FILE is set in the environment. This typically resulted in git-annex branch log files being committed to the master branch and later showing up in the work tree. (These log files can be safely removed.) 2014-01-14 15:36:33 -04:00
Joey Hess
78c7c54fdb also check diskreserve for quvi downloads 2014-01-04 15:38:59 -04:00
Joey Hess
f9e7b6cf61 addurl, importfeed: Honor annex.diskreserve as long as the size of the url can be checked.
This adds a http HEAD before the download is done. That was already the
case when the assistant was running, and it seems worth it to avoid filling
up the whole disk, like happened to my server today.
2014-01-04 15:08:06 -04:00
Joey Hess
3e68c1c2fd add remote state logs
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.

GETSTATE and SETSTATE are added to the external special remote protocol.

Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.

The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.

This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.

This commit was sponsored by Daniel Hofer.
2014-01-03 16:35:57 -04:00
Joey Hess
b1d7474c1d Auto-upgrade v3 indirect repos to v5 with no changes. This also fixes a problem when a direct mode repo was somehow set to v3 rather than v4, and so the automatic direct mode upgrade to v5 was not done. 2013-12-29 13:06:23 -04:00
Joey Hess
7d5b25515c Add plumbing-level lookupkey command. 2013-12-15 14:02:23 -04:00
Joey Hess
bef567c31f Fix direct mode's handling when modifications to non-annexed files are pulled from a remote. A bug prevented the files from being updated in the work tree, and this caused the modification to be reverted. 2013-12-12 15:57:09 -04:00
Joey Hess
c160bf9d88 format comment 2013-12-12 15:16:44 -04:00
Joey Hess
03932212ec Avoid using git commit in direct mode, since in some situations it will read the full contents of files in the tree.
The assistant's commit code also always avoids git commit, for simplicity.
Indirect mode sync still does a git commit -a to catch unstaged changes.

Note that this means that direct mode sync no longer runs the pre-commit
hook or any other hooks git commit might call. The git annex pre-commit
hook action for direct mode is however explicitly run. (The assistant
already ran git commit with hooks disabled, so no change there.)
2013-12-01 13:59:45 -04:00
Joey Hess
b25abdb3e6 fix reversion in relative paths to local remotes of direct mode repos
0980f3dae6 broke support for local remotes
from direct mode repos, because the relative path was taken to be from the
gitdir, rather than from the work tree.
2013-11-26 19:33:26 -04:00
Joey Hess
f913deab78 move programPath out of Config.Files to Annex.Path
This works around horribleness in the Mavericks cpp, which falls over on
the #if when configure is running. Moving it avoids the file being built at
that point.

But it's also a location that makes sense..
2013-11-24 16:03:03 -04:00
Joey Hess
e563c7e6f4 fsck distribution key 2013-11-23 21:58:39 -04:00
Joey Hess
b8e74bf489 fix standalone build of this module 2013-11-22 12:21:37 -04:00
Joey Hess
b876df6fdb Ensure that core.sharedrepository is honored when creating the .git/annex directory. 2013-11-18 18:20:20 -04:00
Joey Hess
310c549b5a Ensure execute bit is set on directories when core.sharedrepsitory is set. 2013-11-18 18:13:09 -04:00
Joey Hess
5561b46416 fix windows build 2013-11-18 11:05:16 -04:00
Joey Hess
d48b00ebed Direct mode .git/annex/objects directories are no longer left writable
Because that allowed writing to symlinks of files that are not present,
which followed the link and put bad content in an object location.

fsck: Fix up .git/annex/object directory permissions.

This commit was sponsored by an anonymous bitcoin donor.
2013-11-15 14:52:03 -04:00
Joey Hess
b0f85b3e22 Fix direct mode merge bug when a direct mode file was deleted and replaced with a directory. An ordering problem caused the directory to not get created in this case. Thanks to Tim for the test cases. 2013-11-15 13:40:12 -04:00
Joey Hess
59ecc804cd add new status command
This works for both direct and indirect mode.

It may need some performance tuning.

Note that unlike git status, it only shows the status of the work tree, not
the status of the index. So only one status letter, not two .. and since
files that have been added and not yet committed do not differ between the
work tree and the index, they are not shown. Might want to add display of
the index vs the last commit eventually.

This commit was sponsored by an unknown bitcoin contributor, whose
contribution as been going up lately! ;)
2013-11-07 14:07:25 -04:00
Joey Hess
00c91816fb Merge branch 'master' into directguard 2013-11-06 13:02:35 -04:00
Joey Hess
81117e8a9d typo 2013-11-06 12:39:14 -04:00
Joey Hess
ee23be55fd Fix exception handling bug that could cause .git/annex/index to be used for git commits outside the git-annex branch. Known to affect git-annex when used with the git shipped with Ubuntu 13.10. 2013-11-06 12:21:50 -04:00
Joey Hess
3802f2f270 work around lack of receive.denyCurrentBranch in direct mode
Now that direct mode sets core.bare=true, git's normal prohibition about
pushing into the currently checked out branch doesn't work.

A simple fix for this would be an update hook which blocks the pushes..
but git hooks must be executable, and git-annex needs to be usable on eg,
FAT, which lacks x bits.

Instead, enabling direct mode switches the branch (eg master) to a special
purpose branch (eg annex/direct/master). This branch is not pushed when
syncing; instead any changes that git annex sync commits get written to
master, and it's pushed (along with synced/master) to the remote.

Note that initialization has been changed to always call setDirect,
even if it's just setDirect False for indirect mode. This is needed because
if the user has just cloned a direct mode repo, that nothing has synced
with before, it may have no master branch, and only a annex/direct/master.
Resulting in that branch being checked out locally too. Calling setDirect False
for indirect mode moves back out of this branch, to a new master branch,
and ensures that a manual "git push" doesn't push changes directly to
the annex/direct/master of the remote. (It's possible that the user
makes a commit w/o using git-annex and pushes it, but nothing I can do
about that really.)

This commit was sponsored by Jonathan Harrington.
2013-11-05 21:08:31 -04:00
Joey Hess
4510819215 v5 for direct mode, with automatic upgrade
This includes storing the current state of the HEAD ref, which git annex
sync is going to need, but does not make sync use it.
2013-11-05 17:05:03 -04:00
Joey Hess
0edd9ec03a refactored hook setup 2013-11-05 15:29:56 -04:00
Joey Hess
230bfa9688 add --want-get and --want-drop options
New --want-get and --want-drop options which can be used to test preferred
content settings. For example, "git annex find --in . --want-drop"
2013-10-28 14:50:17 -04:00
Joey Hess
049e80e865 refactor 2013-10-28 14:05:55 -04:00
Joey Hess
435ea52f3c repair command: add handling of git-annex branch and index 2013-10-23 13:00:45 -04:00
Joey Hess
4f871f89ba git-recover-repository 1/2 done 2013-10-20 17:50:51 -04:00
Joey Hess
19816bca41 update for DiffTree type change (which fixes assistant in subdir confusion bug) 2013-10-17 15:11:21 -04:00
Joey Hess
78acbfeb6a ensure merge directory is empty before starting merge
Don't want some past failed merge to lead to bad results, potentially.
2013-10-16 14:57:58 -04:00
Joey Hess
18f4d1b400 queue downloads of keys that fsck finds with bad content 2013-10-10 17:27:00 -04:00
Joey Hess
267c124f67 run ssh in the directory with its socket when stopping
This guarantees that stopping an existing socket never fails.

This might be the route out of the mess of needing to worry about socket
lengths in general. However, it would need quite a lot of refactoring
to make every place in git-annex that runs ssh run it with a cwd that was
determined by the location of its connection caching socket. If this
wasn't already such a mess, I'd consider even the thought of that API a bad
idea..
2013-10-06 21:11:39 -04:00
Joey Hess
6f38426cb8 work around ssh brain-damange
The control socket path passed to ssh needs to be 17 characters shorter
than the maximum unix domain socket length, because ssh appends stuff to it
to make a temporary filename. Closes: #725512

Also, take the shorter of the relative and the absolute paths to the
socket. Typically the relative path will be a lot shorter (unless
deep inside a subdirectory of the repository), and so using it will
avoid flirting with the maximum safe socket lenghts in more situations,
and so lead to less breakage if all my attempts at fixing this are
still buggy.
2013-10-06 20:59:36 -04:00
Joey Hess
f8880c4fe4 Automatically and safely detect and recover from dangling .git/annex/index.lock files, which would prevent git from committing to the git-annex branch, eg after a crash. 2013-10-03 15:43:08 -04:00
Joey Hess
83b4b8d589 rename confusing function
The index.lck file is not a lock file. Kept the historical name for now as
changing it would be work.
2013-10-03 15:06:58 -04:00
Joey Hess
f2ee4ef86d ensure that commitBranch is only called when the journal is locked
This is not strictly a requirement, since it does not actually update the
journal. But it's a nice invariant to enforce.
2013-10-03 14:48:46 -04:00
Joey Hess
56c3f68a53 use types to partially prove correctness of journal locking code
My implementation does not guard against double locking of the journal. But
it does ensure that the journal is always locked when operated on, by using
a type that is only produced by lockJournal, and which is required as a
parameter of all functions that operate on the journal.

Note that I had to add the fooStale functions for cases where it does not
make sense to lock the journal when querying it. I was more concerned about
ensuring that anything that modifies the journal is locked.
setJournalFile's implementation ensures that any query of the journal will
get one value or the other atomically, even if the journal is being changed
at the time.
2013-10-03 14:41:57 -04:00
Joey Hess
7a9a16b337 lockJournal when running performTransitions
This may not strictly be needed -- the transition code bypasses the
journal. However, this ensures that the git-annex branch is only
committed with the journal locked. This will allow for further
improvements.
2013-10-03 14:37:46 -04:00
Joey Hess
12f6b9693a Send a git-annex user-agent when downloading urls.
Overridable with --user-agent option.

Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.

This commit sponsored by Michael Zehrer.
2013-09-28 14:35:21 -04:00
Joey Hess
c45f5fbdb3 indirect: Better behavior when a file in direct mode is not owned by the user running the conversion. 2013-09-25 15:29:56 -04:00
Joey Hess
b405295aee hlint
test suite still passes
2013-09-25 03:09:06 -04:00
Joey Hess
3588729f0d completely solve catKey memory leak
Since 006cf7976f was incomplete, not being
able to get the right mode of the file when the index differs from HEAD,
this is a final workaround. Only buffering the start of the file
in this case avoids leaking memory.

This does not prevent git-cat-file being asked to output the whole file,
which needs to be consumed, and can be slow. But this only happens in a
rare edge case.
2013-09-19 20:09:03 -04:00
Joey Hess
006cf7976f more completely solve catKey memory leak
Done using a mode witness, which ensures it's fixed everywhere.

Fixing catFileKey was a bear, because git cat-file does not provide a
nice way to query for the mode of a file and there is no other efficient
way to do it. Oh, for libgit2..

Note that I am looking at tree objects from HEAD, rather than the index.
Because I cat-file cannot show a tree object for the index.
So this fix is technically incomplete. The only cases where it matters
are:

1. A new large file has been directly staged in git, but not committed.
2. A file that was committed to HEAD as a symlink has been staged
   directly in the index.

This could be fixed a lot better using libgit2.
2013-09-19 16:41:21 -04:00
Joey Hess
eb42bde19a sync, pre-commit, indirect: Avoid unnecessarily catting non-symlink files from git, which can be so large it runs out of memory. 2013-09-19 14:48:42 -04:00
Joey Hess
ab9dd6d8a0 sync: Fix bug that caused direct mode mappings to not be updated when merging files into the tree on Windows. 2013-09-13 13:49:28 -04:00
Joey Hess
a48a4e2f8a automatically derive an annex-uuid from a gcrypt-uuids 2013-09-05 16:02:39 -04:00
Joey Hess
4079f9cfe8 avoid double commit during transition
The second commit had some bad refs which resulted in the race detection
code running. But that commit was unnecessary anyway, it only was there to
merge in the other refs.
2013-09-03 16:33:15 -04:00
Joey Hess
db83cc82d6 Merge branch 'forget'
Conflicts:
	debian/changelog
2013-09-03 14:36:00 -04:00
Joey Hess
67fda9e669 Honor core.sharedrepository when receiving and adding files in direct mode. 2013-09-03 13:35:49 -04:00
Joey Hess
0831e18372 forget --drop-dead: Completely removes mentions of repositories that have been marked as dead from the git-annex branch.
Wrote nice pure transition calculator, and ugly code to stage its results
into the git-annex branch. Also had to split up several Log modules
that Annex.Branch needed to use, but that themselves used Annex.Branch.

The transition calculator is limited to looking at and changing one file at
a time. While this made the implementation relatively easy, it precludes
transitions that do stuff like deleting old url log files for keys that are
being removed because they are no longer present anywhere.
2013-08-31 17:51:13 -04:00
Joey Hess
2f57d74534 remove print 2013-08-29 20:28:45 -04:00
Joey Hess
6147652cc6 wording 2013-08-29 16:41:59 -04:00
Joey Hess
6cdac3a003 sync, assistant: Force push of the git-annex branch.
Necessary to ensure it gets pushed to remotes after being rewritten by forget.
See inline rationalles for why I think this is safe!
2013-08-29 14:27:53 -04:00
Joey Hess
c181efe437 use --force in taggedPush
This should make the assistant force update its tagged push branch
after a transition like git annex forget.
2013-08-29 13:31:29 -04:00
Joey Hess
336d5ec349 Merge branch 'master' into forget 2013-08-29 13:23:02 -04:00
Joey Hess
d3af414568 typo 2013-08-28 17:05:07 -04:00
Joey Hess
4a915cd3cd add forget command
Works, more or less. --dead is not implemented, and so far a new branch
is made, but keys no longer present anywhere are not scrubbed.

git annex sync fails to push the synced/git-annex branch after a forget,
because it's not a fast-forward of the existing synced branch. Could be
fixed by making git-annex sync use assistant-style sync branches.
2013-08-28 16:41:13 -04:00
Joey Hess
fcd5c167ef untested transition detection on merging, and transition running code 2013-08-28 15:57:42 -04:00
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
412dcb8017 Fix bug that caused typechanged symlinks to be assumed to be unlocked files, so they were added to the annex by the pre-commit hook. 2013-08-22 13:57:07 -04:00
Joey Hess
a3224ce35b avoid more build warnings on Windows 2013-08-04 14:05:36 -04:00
Joey Hess
06db8e0bd9 squash compiler warnings on Windows 2013-08-04 13:18:05 -04:00
Joey Hess
b191d5c595 gitignore support for the assistant and watcher
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.

Thanks to Adam Spiers, for getting the necessary support into git for this.

A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.

However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.

Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!

So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.

(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)

This commit was sponsored by Francois Marier. Thanks!
2013-08-02 20:37:03 -04:00
Joey Hess
93f2371e09 get rid of __WINDOWS__, use mingw32_HOST_OS
The latter is harder for me to remember, but avoids build failures in code
used by the configure program.
2013-08-02 12:27:32 -04:00
Joey Hess
ddd46db09a Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..

Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.

Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.

A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.

Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.

Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.

There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.

This commit was sponsored by Alexander Brem. Thanks!
2013-07-30 19:18:29 -04:00
Joey Hess
7b0970b340 Fix inverted logic in last release's fix for data loss bug, that caused git-annex sync on FAT or other crippled filesystems to add symlink standin files to the annex. 2013-07-30 16:08:09 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
6ae2637eb1 For long hostnames, use a hash of the hostname to generate the socket file for ssh connection caching.
This is ok to do now that the socket filename never needs to be mapped back
to a hostname.

Short hostnames will still appear in the clear, which is less obfuscated.
So this cannot possibly make ssh connection caching fail for a hostname it
used to work for.
2013-07-22 15:09:41 -04:00
Joey Hess
c6a020ad1f stop cached ssh connection w/o needing to look up host and port
Turns out that with -O stop -S socketfile, ssh does not need the real
hostname, or port to be specificed. This is because it simply talks to the
ssh behind the socket and tells it to stop. So, can eliminate the
conversion back from a socketfile to host and port. Which will allow using
shorter filenames for sockets in the future.
2013-07-21 14:14:54 -04:00
Joey Hess
ecdfa40cbe avoid false positives when detecting core.symlinks=false symlink standin files
If the file is > 8192 bytes, it's certianly not a symlink file.

And if it contains nuls or newlines or whitespace, it's certianly
not a link to annexed content. But it might be a tarball containing
a git-annex repo.
2013-07-20 19:28:02 -04:00
Joey Hess
ae341c1a37 avoid reading files that are not symlinks when core.symlinks=false
This hack is only needed on FAT filesystems, so there's no point in doing
it the rest of the time. And it's possible for there to be a false
positive, so it's best to avoid the hack when possible.
2013-07-20 19:14:29 -04:00
Joey Hess
3e422cb5fa fix uninit to delete content from annex when it ended up hard linked back to the work tree 2013-07-18 13:30:12 -04:00
Joey Hess
c1307b1388 fsck: Don't claim to fix direct mode when run on a symlink whose content is not present. 2013-07-08 17:29:42 -04:00
Joey Hess
d84a000e92 detect system with no dot in FQDN, where git commit will fail, and workaround
Sigh, git is so *fragile*. Or rather, across the set of systems that use
git-annex, where are no many horribly broken systems..
2013-07-05 12:24:28 -04:00
Joey Hess
7a7e426352 moved AssociatedFile definition 2013-07-04 02:36:02 -04:00
Joey Hess
72ab02ca48 avoid failure creating inode sentinal file
Test suite on windows failed running git annex init in a bare clone of an
annexed repo. The annex directory didn't exist when it tried to write the
inode sentinal file.
2013-06-18 15:38:17 -04:00
Joey Hess
1312cffad0 Revert "Windows: Ssh connection caching is now supported."
Yeah, that didn't actually work. Got error messages like it couldn't read
from the control socket, so probably ssh doesn't really support that on
Windows, at least the cygwin ssh build I'm using.
2013-06-17 22:13:28 -04:00
Joey Hess
07a17f58b7 Windows: Ssh connection caching is now supported.
Turns out the socket stuff just works on windows.
2013-06-17 22:05:49 -04:00
Joey Hess
d80a0f62a4 avoid lazy read of file contents
On Windows, that means the file could still be open when later code wants
to delete it, which fails. Since we're only reading 8k anyway, just read
it, strictly. However, avoid reading the whole file strictly, so no
getContentsStrict here.
2013-06-17 21:12:09 -04:00
Joey Hess
b7674b464b typo in comment 2013-06-17 20:45:04 -04:00
Joey Hess
0527c74c0f assistant: In direct mode, objects are now only dropped when all associated files are unwanted. This avoids a repreated drop/get loop of a file that has a copy in an archive directory, and a copy not in an archive directory. (Indirect mode still has some buggy behavior in this area, since it does not keep track of associated files.) Closes: #712060 2013-06-15 14:44:43 -04:00
Joey Hess
92f036fcb4 avoid warnings when built with ghc 7.6 2013-06-02 15:01:58 -04:00
Joey Hess
eba9ee5bc6 remove debug print 2013-05-27 11:18:18 -04:00
Joey Hess
3b1aedea3d Merge branch 'robustness' 2013-05-25 15:22:18 -04:00
Joey Hess
5eeea0fac9 make direct mode merge cleanup more robust
If the cleanup of a single file fails for some reason, continue
to clean up other files.

This could happen because of a race. The merge pulls in a change to a file,
which gets changed locally at the same time.
2013-05-25 15:22:16 -04:00
Joey Hess
bf86b5ca16 improve robustness of fromDirect and replaceFile
Made fromDirect check that a file in the tree has good content (and is not
a broken symlink either) before copying it to another file that has the
same key.

Made replaceFile clean up the temp file if the action that creates it, or
the file replacement action fails.
2013-05-25 15:06:02 -04:00
Joey Hess
729eab1f89 assistant: Work around git-cat-file's not reloading the index after files are staged.
Argh.
2013-05-25 00:37:41 -04:00
Joey Hess
2b14fe2c98 refactor 2013-05-24 23:07:26 -04:00
Joey Hess
08c03b2af3 XMPP: Avoid redundant and unncessary pushes. Note that this breaks compatibility with previous versions of git-annex, which will refuse to accept any XMPP pushes from this version. 2013-05-21 18:24:29 -04:00
Joey Hess
0cb34f3caa update inode cache after copying content
This was also tripped by the test suite's automatic conflict resolution
test. Which also shows BTW that an unnecessary copy of content is done
sometimes when merging in direct mode. Not going to try to speed that up
now.
2013-05-20 17:11:40 -04:00
Joey Hess
d88be65495 didn't quite get removeDirect right before, this passes test suite 2013-05-20 16:28:33 -04:00
Joey Hess
3d8355d984 Fix a bug in the git-annex branch handling code that could cause info from a remote to not be merged and take effect immediately.
This bug was turned up by the test suite, running fsck in direct mode.
A repository was cloned, was put into direct mode, was fscked, and fsck
incorrectly said that no copy existed of a file, that was actually present
in origin.

This turned out to occur because fsck first did a Annex.Branch.change,
recording that it did not locally have the file. That was recorded in the
journal. Since neither the git annex direct not the fsck had yet needed to
read any info from the branch, but had only made changes to it, the
origin/git-annex branch was not yet merged in. So the journal got a
location log entry written to it, but this did not include
the location log info for the origin. When fsck then did a
Annex.Branch.get, it trusted the journal was cosnsitent, and returned it,
again w/o merging from origin/git-annex. This latter behavior is the
actual bug.

Refer to commit e9bfa8eaed for the thinking
behind it being ok to make a change to a file on the branch, without
first merging the branch. That thinking still stands. However, it means
that files in the journal cannot be trusted to be consistent if the branch
has not been merged. So, to fix, just enure the branch gets merged, even
when reading from the journal.

In tests, this does not seem to cause any extra merging. Except, of course,
in the one case described above. But git annex add, etc, are able to make
changes w/o first merging the branch.
2013-05-20 15:14:59 -04:00
Joey Hess
4c22c2261f minor optimisation and warning fix 2013-05-20 13:58:41 -04:00
Joey Hess
f4ba19f2b8 direct mode bug fix: After a conflicted merge was automatically resolved, the content of a file that was already present could incorrectly be replaced with a symlink.
The bug was in movein, which just replaceFile'd the file with a symlink,
even if it already had the desired content, before trying to pull the
content out of the annex and replace the symlink with it.

That was ok-ish for non conflicted merges, where if the file existed it would
be an old version of the content. But for conflicted merges, the automatic
merge resolver has already run, and will have already put the desired
content into the file for the local variant.

Also, made removeDirect not trust that the associated files map is correct.
Only if it can verify that another file has the content will it not move it
into .git/annex/objects.
2013-05-20 13:41:09 -04:00
Joey Hess
345ee4f37c Switch to MonadCatchIO-transformers for better handling of state while catching exceptions.
As seen in this bug report, the lifted exception handling using the StateT
monad throws away state changes when an action throws an exception.
http://git-annex.branchable.com/bugs/git_annex_fork_bombs_on_gpg_file/
  .. Which can result in cached values being redundantly calculated, or other
     possibly worse bugs when the annex state gets out of sync with reality.

This switches from a StateT AnnexState to a ReaderT (MVar AnnexState).
All changes to the state go via the MVar. So when an Annex action is
running inside an exception handler, and it makes some changes, they
immediately go into affect in the MVar. If it then throws an exception
(or even crashes its thread!), the state changes are still in effect.

The MonadCatchIO-transformers change is actually only incidental.
I could have kept on using lifted-base for the exception handling.
However, I'd have needed to write a new instance of MonadBaseControl
for the new monad.. and I didn't write the old instance.. I begged Bas
and he kindly sent it to me. Happily, MonadCatchIO-transformers is
able to derive a MonadCatchIO instance for my monad.

This is a deep level change. It passes the test suite! What could it break?

Well.. The most likely breakage would be to code that runs an Annex action
in an exception handler, and *wants* state changes to be thrown away.
Perhaps the state changes leaves the state inconsistent, or wrong. Since
there are relatively few places in git-annex that catch exceptions in the
Annex monad, and the AnnexState is generally just used to cache calculated
data, this is unlikely to be a problem.

Oh yeah, this change also makes Assistant.Types.ThreadedMonad a bit
redundant. It's now entirely possible to run concurrent Annex actions in
different threads, all sharing access to the same state! The ThreadedMonad
just adds some extra work on top of that, with its own MVar, and avoids
such actions possibly stepping on one-another's toes. I have not gotten
rid of it, but might try that later. Being able to run concurrent Annex
actions would simplify parts of the Assistant code.
2013-05-19 14:16:36 -04:00
Joey Hess
630a8b9ad2 warning 2013-05-19 12:43:44 -04:00
Joey Hess
1b616c5d37 improve handling of receiving object in direct mode when associated files are modified
Before, if a direct mode repo had one or more associated files that
were modifed, moving the object into it would overwrite the associated
files with the pristine object.

Now, modified associated files are left unchanged. To ensure that,
when an object is moved into a direct mode repo, it's not thrown away,
it gets stored in indirect mode.
2013-05-17 16:25:18 -04:00
Joey Hess
94cb037aa3 store copy in inode cache too 2013-05-17 16:16:10 -04:00
Joey Hess
b8e5b9c645 test suite passes in direct mode
This fixes a bug with git annex add in direct mode. If some files already
existed in the tree pointing at the same key as a file that was just added,
and their content was not present, add neglected to copy the content to
those files.

I also changed the behavior of moveAnnex slightly: When content is moved
into the annex in direct mode, it does not overwrite any content already
present in direct mode files. That content may be modified after all.
2013-05-17 15:59:37 -04:00
Joey Hess
3240006c56 fix android build, broken by changes for windows port 2013-05-16 11:52:48 -04:00
Joey Hess
aba49995b6 Merge branch 'master' into windows 2013-05-15 19:18:04 -04:00
Joey Hess
4829eae883 fix toDirectGen bug introduced in 247b7e9e58 2013-05-15 19:15:40 -04:00
Joey Hess
c62b54d80d start one git-cat-file per index file
This reverts 1c83b6c439 and properly fixes
the issue discussed there.

This makes git-annex behave much nicer in direct mode.
2013-05-15 18:46:38 -04:00
Joey Hess
25cb9a48da fix the day's Windows permissions damage 2013-05-14 20:15:14 -04:00
Joey Hess
8a2ff023a3 convert from internal git path when checking symlink standin file 2013-05-14 15:08:40 -05:00
Joey Hess
15af92291f Merge remote-tracking branch 'gnu/windows' into windows 2013-05-14 14:21:49 -05:00
Joey Hess
fee6cd4635 fix imports 2013-05-14 14:21:35 -05:00
Joey Hess
e7936b1a34 always try to read symlink; only fall back to looking inside file
On Windows with Cygwin, checking out a git-annex repo will create symlinks
on disk, so we need to always try to read the symlink, even when
core.symlinks says they're not supported.
2013-05-14 14:18:47 -04:00
Joey Hess
17952a893e fix imports 2013-05-14 13:53:29 -04:00
Joey Hess
43f2de8522 Merge branch 'windows' of git://git-annex.branchable.com into windows 2013-05-13 20:11:30 -05:00
Joey Hess
1093302eba read inode cache file strictly to avoid failure to drop on windows
Seems that Windows doesn't allow deleting a file that the same process has open.
Here the inode cache file was read and a the value from it gets used later.
But due to laziness, the old file is still open when it gets deleted. Adding
strictness avoids this problem. Of course, the file is small, so it's no
problem to read it all strictly, so this is probably an improvement even
outside of Windows.
2013-05-13 19:29:52 -05:00
Joey Hess
13b629c208 fix warnings 2013-05-13 15:30:18 -04:00
Joey Hess
25a8d4b11c rename module 2013-05-12 19:19:28 -04:00
Joey Hess
03e8594369 fix the day's windows permissions damage 2013-05-12 19:09:48 -04:00
Joey Hess
73d2f8b280 deal with git using / internally, even on DOS 2013-05-12 17:29:49 -05:00
Joey Hess
2f3ce4c02f fix 2013-05-12 15:43:59 -05:00
Joey Hess
838b984797 deal with dos path separators 2013-05-12 15:37:32 -05:00
Joey Hess
abe8d549df fix permission damage (thanks, Windows) 2013-05-11 23:54:25 -04:00
Joey Hess
18bdff3fae clean up from windows porting 2013-05-11 18:23:41 -04:00
Joey Hess
3c7e30a295 git-annex now builds on Windows (doesn't work) 2013-05-11 15:03:00 -05:00
Joey Hess
763cbda14f fixup #if 0 stubs to use #ifndef mingw32_HOST_OS
That's needed in files used to build the configure program.
For the other files, I'm keeping my __WINDOWS__ define, as I find that much easier to type.
I may search and replace it to use the mingw32_HOST_OS thing later.
2013-05-10 16:57:21 -05:00
Joey Hess
6c74a42cc6 stub out POSIX stuff 2013-05-10 16:29:59 -05:00
Joey Hess
adde00f4f3 git-annex-shell: Ensure that received files can be read. Files transferred from some Android devices may have very broken permissions as received. 2013-05-06 17:30:57 -04:00
Joey Hess
247b7e9e58 direct: Fix a bug that could cause some files to be left in indirect mode.
It's possible for files in indirect mode to have a direct mode mapping
file. Probably from when they were in direct mode. In this case,
toDirectGen tried to copy the content from the direct mode file that the
mapping said had it. But, being in indirect mode, it didn't really have the
content. So it did nothing. This fix makes it always move the content from
.git/annex/objects/ when it's there.
2013-05-06 12:43:03 -04:00
Joey Hess
543ffa5b9f work around git/environment/gecos/android suck
I don't know why, but I can't seem to set the environment variables inside
git-annex to work around the git error caused by android's crappy username
and hostname settings. This workaround works, and that's all that's good
about it.
2013-05-03 14:08:26 -04:00
Joey Hess
e23a7598e2 set EMAIL when GECOS workaround is needed
Git fails on Android, because it gets some weird domain for local host like
"localhost.(none)". This works around that. I made it always set EMAIL when
GECOS workaround was needed (unless EMAIL is already set). It might be
nicer to try to get the hostname.domain as git does, and only set it if
that fails. But I don't want to be stuck trying to exactly duplicate
whatever git is doing.
2013-05-03 11:52:04 -04:00
Joey Hess
0807211a67 thaw content directory in direct mode too
A content directory can be frozen in direct mode. One way this can happen
is if the content is transferred before direct mode has a mapping for it,
so it's stored in the content directory.

So, we need to thaw the content directory before doing things with it.
2013-04-30 19:33:43 -04:00
Joey Hess
11ca4cee34 refactor 2013-04-30 19:09:36 -04:00
Joey Hess
0ae8c82c53 per-IA-item content directories 2013-04-25 23:44:55 -04:00
Joey Hess
07580dc3df sync: Bug fix, avoid adding to the annex the dummy symlinks used on crippled filesystems.
The root of the problem is that toInodeCache sees a non-symlink, and so
goes on and generates a new inode cache for the dummy symlink.

Any place that toInodeCache, or sameFileStatus, or genInodeCache are called
may need to deal with this case. Although many of them are ok. For example,
prepSendAnnex calls sameInodeCache, which calls genInodeCache.. but if
the file content is not present, the InodeCache generated for its standin
file is appropriately not the same, and so it returns Nothing.

I've audited some, but have to say I'm not happy with this; it should be
handled at the type level somehow, or a toInodeCache wrapper be used that
is aware of dummy symlinks.

(The Watcher already dealt with it, via the guardSymlinkStandin function.)
2013-04-23 17:14:28 -04:00
Joey Hess
8a2d1988d3 expose Control.Monad.join
I think I've been looking for that function for some time.
Ie, I remember wanting to collapse Just Nothing to Nothing.
2013-04-22 20:24:53 -04:00
Joey Hess
9cb223a8b3 Detect systems that have no user name set in GECOS, and also don't have user.name set in git config, and put in a workaround so that commits to the git-annex branch (and the assistant) will still succeed despite git not liking the system configuration. 2013-04-22 15:36:34 -04:00
guilhem
a1eded8641 Allow rsync to use other remote shells.
Introduced a new per-remote option 'annex-rsync-transport' to specify
the remote shell that it to be used with rsync. In case the value is
'ssh', connections are cached unless 'sshcaching' is unset.
2013-04-13 19:26:24 -04:00
Joey Hess
4f5ceffead implement massReplace
This looks at the string one char at a time, which is hardly efficient..
but more than good enough for expanding variables in
relatively short command lines.
2013-04-08 23:56:37 -04:00
Joey Hess
d440b6047b Added annex.web-download-command setting. 2013-04-08 23:34:05 -04:00
Joey Hess
602baae12e Bugfix: Direct mode no longer repeatedly checksums duplicated files.
Fixed by storing a list of cached inodes for a key, instead of just one.

Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)

Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
2013-04-06 16:07:25 -04:00
Joey Hess
f1b0a4b404 Use lower case hash directories for storing files on crippled filesystems, same as is already done for bare repositories.
* since this is a crippled filesystem anyway, git-annex doesn't use
  symlinks on it
* so there's no reason to use the mixed case hash directories that we're
  stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
  repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
  mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
  test case will recover and it will all just work.
2013-04-04 15:46:33 -04:00
Joey Hess
8a5b397ac4 hlint 2013-04-03 03:52:41 -04:00
Joey Hess
0b57113c42 cleanup 2013-04-02 19:45:52 -04:00
Joey Hess
38d61f934d Update working tree files fully atomically
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.

But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.

(It also probably avoids confusing displays in file managers.)
2013-04-02 15:02:00 -04:00
Joey Hess
67e817c6a1 New annex.largefiles setting, which configures which files git annex add and the assistant add to the annex.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.

A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
2013-03-29 16:17:13 -04:00
Joey Hess
75a1c2f91a cleanup debug print 2013-03-28 14:18:26 -04:00
Joey Hess
80c8c0e62a comment typo 2013-03-18 13:17:43 -04:00
Joey Hess
7a77f98576 move comment to right place 2013-03-18 11:18:04 -04:00
Joey Hess
b3d3ece2ab remove old debug print 2013-03-16 17:04:48 -04:00
Joey Hess
f7de51e8b6 Bugfix: Fix bug in inode cache sentinal check, which broke copying to local repos if the repo being copied from had moved to a different filesystem or otherwise changed all its inodes' 2013-03-12 16:41:54 -04:00
Joey Hess
61c5e8736c detect renames during commit, and .. um, do nothing special because it's lunch time
But I'm well set up to fast-track direct mode adds for renames now.
2013-03-11 12:56:47 -04:00
Joey Hess
40df015d90 remove Eq instance for InodeCache
There are two types of equality here, and which one is right varies,
so this forces me to consider and choose between them.

Based on this, I learned that the commit in git anex sync was
always doing a strong comparison, even when in a repository where
the inodes had changed. Fixed that.
2013-03-11 02:57:48 -04:00
Joey Hess
cbb6e1fae4 tag xmpp pushes with jid
This fixes the issue mentioned in the last commit.

Turns out just collecting UUID of clients behind a XMPP remote is
insufficient (although I should probably still do it for other reasons),
because a single remote repo might be connected via both XMPP and local
pairing. So a way is needed to know when a push was received from any
client using a given XMPP remote over XMPP, as opposed to via ssh.
2013-03-06 16:29:19 -04:00
Joey Hess
c23ea9e311 assistant: Get back in sync with XMPP remotes after network reconnection, and on startup.
Make manualPull send push requests over XMPP.

When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)

The merger handles detecting when a XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.

This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
2013-03-06 15:09:31 -04:00
Joey Hess
974d075108 Run ssh with -T to avoid tty allocation and any login scripts that may do undesired things with it. 2013-03-04 23:36:07 -04:00
Joey Hess
0c13d3065e git subcommand cleanup
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
2013-03-03 13:39:07 -04:00
Joey Hess
cbd53b4a8c Makefile now builds using cabal, taking advantage of cabal's automatic detection of appropriate build flags.
The only thing lost is ./ghci

Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
2013-02-27 02:39:22 -04:00
Joey Hess
2d9c046dea annex.version is now set to 4 for direct mode repositories
To avoid old versions of git-annex getting confused.

There is no upgrade required though.
We switch back to 3 when going from direct to indirect.
2013-02-26 15:13:10 -04:00
Joey Hess
e423190b11 fix 2013-02-24 17:40:14 -04:00
Joey Hess
6ff1ce76b7 hopefully fix a bug 2013-02-24 17:21:04 -04:00
Joey Hess
afb21353c8 remove debug print 2013-02-23 14:34:02 -04:00
Joey Hess
051476c2a9 squelch warning 2013-02-22 18:22:12 -04:00
Joey Hess
08854afa10 fix inverted logic 2013-02-22 17:01:48 -04:00
Joey Hess
4689fbde35 fix sameInodeCache to check the inode change sentinal
This should fix the problem where the assistant, on Android, re-adds every
file on startup.
2013-02-22 15:19:28 -04:00
Joey Hess
faa9d3c22b work around broken getEnvironment on Android in the most important place: git annex init
This resulted in a lot of user complains that git annex init had git
telling them they needed to run git config --global user.email .. which
didn't work because even HOME was not passed into git.
2013-02-22 14:47:29 -04:00
Joey Hess
6f9be431e6 only create inode sentinal file when initializing a new repo 2013-02-20 13:55:53 -04:00
Joey Hess
00b465e213 shorter directory to external ssh socket
Before it was too long to be used.
2013-02-19 17:31:08 -04:00
Joey Hess
624e34649f Direct mode: Support filesystems like FAT which can change their inodes each time they are mounted. 2013-02-19 17:31:03 -04:00
Joey Hess
0f4cc559a7 Android: Support ssh connection caching. 2013-02-19 14:57:45 -04:00
Joey Hess
d799ef3182 set fileSystemEncoding when reading files that might be binary 2013-02-18 17:19:37 -04:00
Joey Hess
422dd28f0b hlint 2013-02-18 02:39:40 -04:00
Joey Hess
9aa979edbd types 2013-02-18 02:35:38 -04:00
Joey Hess
d7c93b8913 fully support core.symlinks=false in all relevant symlink handling code
Refactored annex link code into nice clean new library.

Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:  ( liftIO $ createSymbolicLink linktarget file
  only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs:                createSymbolicLink link link
  test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/FromKey.hs:     liftIO $ createSymbolicLink link file
  command only works in indirect mode
Command/Indirect.hs:                    liftIO $ createSymbolicLink l f
  refuses to run if core.symlinks=false
Init.hs:                createSymbolicLink f f2
  test if symlinks can be made
Remote/Directory.hs:    go [file] = catchBoolIO $ createSymbolicLink file f >> return True
  fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs:          liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
  ditto
Upgrade/V1.hs:                          liftIO $ createSymbolicLink link f
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:

Annex/Link.hs:		( liftIO $ catchMaybeIO $ readSymbolicLink file
  only when core.symlinks=true
Assistant/Threads/Watcher.hs:		ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
  code that fixes real symlinks when inotify sees them
  It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs:		mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
  ditto
Command/Fix.hs:	stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
  command only works in indirect mode
Upgrade/V1.hs:	getsymlink = takeFileName <$> readSymbolicLink file
  v1 repos could not be on a filesystem w/o symlinks

Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:

Assistant/Threads/SanityChecker.hs:                             | isSymbolicLink s -> addsymlink file ms
  only handles staging of symlinks that were somehow not staged
  (might need to be updated to support pseudolinks, but this is
  only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs:         if isSymbolicLink s || not (isRegularFile s)
  avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs:                            | isSymbolicLink s -> void $ flip whenAnnexed f $
  only allowed on systems that support symlinks
Command/Indirect.hs:            whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
  ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
  used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs:                    | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs:                                            | Files.isSymbolicLink s ->
Utility/INotify.hs:                             | Files.isSymbolicLink s ->
Utility/INotify.hs:                     checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs:              | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
  all above are lower-level, not relevant

Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:

Annex/Direct.hs:			| isSymLink (getmode item) =
  This is looking at git diff-tree objects, not files on disk
Command/Unused.hs:		| isSymLink (LsTree.mode l) = do
  This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
  low-level

Done!!
2013-02-17 16:43:14 -04:00
Joey Hess
397082013a proper fix for dropunused
Now getKeysPresent checks that the key's content, not only its directory,
exists. In direct mode, the inode cache file is used as a standin for the
content.

removeAnnex always removes the inode cache file, and drop and move --from
always call removeAnnex, even if the object does not seem to be inAnnex,
to ensure it's always deleted.
2013-02-15 17:58:49 -04:00
Joey Hess
5a8fb26d0a Revert "Clean up direct mode cache and mapping info when dropping keys."
This reverts commit 57780cb3a4.

This was buggy, it caused the direct mode cache to be lost when dropping
keys, so when the file is gotten back, it's stored in indirect mode.

Note to self: Do not attempt bug fixes at 6 am!
2013-02-15 16:37:57 -04:00
Joey Hess
5ea4b91fb4 start to support core.symlinks=false
Utility functions to handle no symlink mode, and converted Annex.Content to
use them; still many other places to convert.
2013-02-15 16:03:11 -04:00
Joey Hess
7ce30b534f add: Improved detection of files that are modified while being added.
In indirect mode, now checks the inode cache to detect changes to a file.
Note that a file can still be changed if a process has it open for write,
after landing in the annex.

In direct mode, some checking of the inode cache was done before, but
from a much later point, so fewer modifications could be detected. Now it's
as good as indirect mode.

On crippled filesystems, no lock down is done before starting to add a
file, so checking the inode cache is the only protection we have.
2013-02-14 16:54:36 -04:00
Joey Hess
a52f8f382b split out Utility.InodeCache 2013-02-14 16:17:40 -04:00
Joey Hess
47477b2807 crippled filesystem support, probing and initial support
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.

Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.

Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
2013-02-14 14:15:26 -04:00
Joey Hess
f202d997f4 Now uses the Haskell uuid library, rather than needing a uuid program.
Been meaning to do this for some time; Android port was last straw.

Note that newer versions of the uuid library have a Data.UUID.V4 that
generates random UUIDs slightly more cleanly, but Debian has an old version
of the library, so I do it slightly round-about.
2013-02-10 14:52:54 -04:00
Joey Hess
57780cb3a4 Clean up direct mode cache and mapping info when dropping keys.
These files were left behind, and made getKeysPresent find keys that were
not present. It would be expensive to make getKeysPresent check that the
actual key files are present (it just lists the directories). But that's not
needed if we just clean up the stale cache and mapping files.

To handle systems that were in direct mode and got switched back with stale
direct mode files, made cleanObjectLoc remove all files in the key's directory.

git annex unused will still list keys that are gone but for which the stale
direct mode files exists. To deal with that, made dropunused remove the key's
directory even if the key does not seem to be present.
2013-02-07 08:28:40 -04:00
Joey Hess
af3a25ee03 Deal with stale mappings for deleted file in direct mode.
The most common way for a mapping to be stale is when a file was deleted,
or renamed. Nothing updates the mappings for deletions yet.
But they can also become stale in other ways. For example a file can
be modified.

So, the mapping is not trusted to be consistent. When we get a key,
only replace symlinks that still point to that key with its content.
When we drop a key, only put back symlinks for files that still have
the direct mode content.
2013-02-05 16:48:00 -04:00
Joey Hess
0e3f931f37 add another setting to GitConfig 2013-01-28 00:33:19 +11:00
Joey Hess
103b572d8e ensure that content directory is thawed when writing direct mode mapping and cache files 2013-01-26 20:09:15 +11:00
Joey Hess
f86462b475 allow lazy reading of map contents
Don't explicitly close; hGetContents will close when read is done.
2013-01-18 13:16:16 -04:00
Joey Hess
e481ca7658 some more direct mode fixes
Avoid a crash if a mapping contains files that no longer exist.
This could happen because eg, one was deleted and a commit has not yet been
done to update the mapping.

Fix path calculation.
2013-01-18 12:39:26 -04:00
Joey Hess
5c58e9c101 Avoid filename encoding errors when writing direct mode mappings. 2013-01-18 12:26:45 -04:00
Joey Hess
bbf0e74f72 Fix direct mode mapping code to always store direct mode filenames relative to the top of the repository, even when operating inside a subdirectory. 2013-01-18 12:20:08 -04:00
Joey Hess
85c564ea94 In direct mode, files with the same key are no longer hardlinked, as that would cause a surprising behavior if modifying one, where the other would also change. 2013-01-14 11:56:37 -04:00
Joey Hess
a6a5ed8121 check for direct mode file change when copying to a local git remote 2013-01-10 11:45:44 -04:00
Joey Hess
1bc49b7158 Special remotes now all rollback storage of keys that get modified during the transfer, which can happen in direct mode. 2013-01-09 18:42:29 -04:00
Joey Hess
858ad6783b add works in direct mode
Also, changed sync to no longer automatically add files in direct mode.
That was only necessary before because add didn't work.
2013-01-06 17:24:22 -04:00
Joey Hess
909f67443f Fix transferring files to special remotes in direct mode. 2013-01-06 14:29:01 -04:00
Joey Hess
e457be7631 direct: Avoid hardlinking symlinks that point to the same content when the content is not present. 2013-01-06 13:57:53 -04:00
Joey Hess
1c83b6c439 work around a very strange git-cat-file behavior
Sometimes it seems that git-cat-file --batch stops getting info for
files in the current repo, when ":file" is fed to it. I have not reproduced
this at the command line, but only when using git annex whereis and git
annex move inside a direct mode repo. Those failed, because cat-file
returned "file missing". OTOH, git annex find works fine, despite passing
the same file to cat-file. It seems that the failing commands first asked
cat-file to show a file on the git-annex branch. Perhaps it got "stuck" on
that branch? But I cannot repoduce it running cat-file by hand. Most
strange. HEAD is a workaround for this extreme weirdness, since I spent a
good 2 hours struggling with it already.
2013-01-05 17:06:24 -04:00
Joey Hess
1cdf2b923d assistant: Make expensive transfer scan work fully in direct mode.
The expensive scan uses lookupFile, but in direct mode, that doesn't work
for files that are present. So the scan was not finding things that are
present that need to be uploaded. (It did find things not present that
needed to be downloaded.)

Now lookupFile also works in direct mode. Note that it still prefers
symlinks on disk to info committed to git, in direct mode. This is
necessary to make things like Assistant.Threads.Watcher.onAddSymlink
work correctly, when given a new symlink not yet checked into git (or
replacing a file checked into git).
2013-01-05 15:57:53 -04:00
Joey Hess
4008590c68 type based git config handling for remotes
Still a couple of places that use git config ad-hoc, but this is most of it
done.
2013-01-01 13:58:14 -04:00
Joey Hess
7f7c31df1c type based git config handling
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.

Not yet done for annex.<remote>.* configuration settings.
2012-12-29 23:10:18 -04:00
Joey Hess
2fdefc656b fix logic error breaking direct mode assistant autocommit of modified files 2012-12-28 16:00:19 -04:00
Joey Hess
eb40227d15 assistant direct mode file add/change bookkeeping
When a file is changed in direct mode, the old content is probably lost
(at least from the local repo), and bookeeping needs to be updated to
reflect this.

Also, synthetic add events are generated at assistant startup, so
make it detect when the file has not really changed, and avoid re-adding
it.

This does add the overhead of querying the runing git cat-file for the
key that's recorded in git for the file, each time a file is added or
modified in direct mode.
2012-12-25 15:48:15 -04:00
Joey Hess
ddb0adb998 more quickcheck fun 2012-12-19 16:36:19 -04:00
Joey Hess
93c430c2a4 comment 2012-12-19 12:46:35 -04:00
Joey Hess
97d670b0d5 normalise associated files
Sometimes ./file will be passed in, and sometimes file; need to treat these
the same.
2012-12-19 12:44:24 -04:00
Joey Hess
05ec4587dd partial and incomplete automatic merging in direct mode
Handles our file right, but not theirs.
2012-12-18 17:15:16 -04:00
Joey Hess
53dbcce645 direct mode merging works!
Automatic merge resoltion code needs to be fixed to preserve objects from
direct mode files.
2012-12-18 15:04:44 -04:00
Joey Hess
5df3c66a85 added direct and indirect commands 2012-12-13 15:44:56 -04:00
Joey Hess
cfe354eccd whitespace fix 2012-12-13 00:46:30 -04:00
Joey Hess
ffdd08fd2e Merge branch 'master' into desymlink 2012-12-13 00:46:10 -04:00
Joey Hess
0d50a6105b whitespace fixes 2012-12-13 00:45:27 -04:00
Joey Hess
b080a58b76 Merge branch 'master' into desymlink
Conflicts:
	Annex/CatFile.hs
	Annex/Content.hs
	Git/LsFiles.hs
	Git/LsTree.hs
2012-12-13 00:29:06 -04:00
Joey Hess
f87a781aa6 finished where indentation changes 2012-12-13 00:24:19 -04:00
Joey Hess
e7b8cb0063 direct mode committing 2012-12-12 19:20:38 -04:00
Joey Hess
f2ed0f9659 fix associated files to not fall back to object location 2012-12-12 13:11:59 -04:00
Joey Hess
752b5354ab make parent directory 2012-12-12 13:05:50 -04:00
Joey Hess
9d133270c2 update 2012-12-10 15:02:44 -04:00
Joey Hess
514957914d direct mode mappings now updated by git annex sync
Still lots to do to make sync handle direct mode, but this is a good first
step.
2012-12-10 14:37:24 -04:00
Joey Hess
b4c6da9cbd Got object sending working in direct mode.
However, I don't yet have a reliable way to deal with files being modified
while they're being transferred. I have code that detects it on the sending
side, but the receiver is still free to move the wrong content into its
annex, and record that it has the content. So that's not acceptable, and
I'll need to work on it some more.

However, at this point I can use a direct mode repository as a remote and
transfer files from and to it.
2012-12-08 17:03:39 -04:00
Joey Hess
664765e757 update the cache automatically when moving objects in or out 2012-12-08 13:13:36 -04:00
Joey Hess
ef24751922 support for checking presence of objects in direct mode
Also for dropping objects in direct mode.

Checking presence reliably needs a cache of mtime, size, and inode.
This way, if a file is modified, keys that point to it are no longer
present.

Also, the code for restoring the symlink when removing objects is
unnecessarily messy. calcGitLink was generating links starting with
"../../remote/.git/", when running "git annex move --from remote".
I put in a workaround, but calcGitLink should probably be fixed.

There is not yet support for getting objects from repositories in direct
mode; it still looks for content in .git/annex/objects, and there's no
once place I can change to fix that.

Also, getting objects from direct mode repositories is problematic since
the can be changed while the object is being transferred. It probably needs
to quarantine it first.
2012-12-07 17:29:55 -04:00
Joey Hess
3898d8c091 support for storing files in direct mode 2012-12-07 14:53:02 -04:00
Joey Hess
99a8a5297c --auto fixes
* get/copy --auto: Transfer data even if it would exceed numcopies,
  when preferred content settings want it.
* drop --auto: Fix dropping content when there are no preferred content
  settings.
2012-12-06 13:22:16 -04:00
Joey Hess
b5a9560a1b squelch warning 2012-11-26 16:30:46 -04:00
Joey Hess
da6fb44446 finished XMPP pairing!
This includes keeping track of which buddies we're pairing with, to know
which PairAck are legitimate.
2012-11-05 17:43:17 -04:00
Joey Hess
9767562f65 rsync special remote: Include annex-rsync-options when running rsync to test a key's presence.
Also, use the new withQuietOutput function to avoid running the shell to
/dev/null stderr in two other places.
2012-10-28 13:51:14 -04:00
Joey Hess
3417c55189 remove git-annex branch read cache
This cache prevented noticing changes made by another process.

The case I just ran into involved the assistant dropping a file, which
cached its presence info. Then the same file was downloaded again,
but the assistant didn't know its presence info had changed.

I don't see a way to keep this cache. Will instead rely on the OS level
file cache, for files in the journal. May need to add more higher-level
caching of info that it's ok to have a potentially stale copy of,
although much of git-annex already does so.
2012-10-19 14:25:15 -04:00
Joey Hess
e7780a39f5 Preferred content path matching bugfix.
When in a subdir, both the normal filepath, and the filepath relative to
the top of the git repo are needed for matching. The former for key lookup,
and the latter for include/exclude to match against. Previously, key lookup
didn't work in this situation.
2012-10-17 16:01:09 -04:00
Joey Hess
3156febec8 disable ssh connection caching for standalone builds
The standalone build does not bundle its own ssh, so should be built
to support as wide an array of ssh versions as possible, so turn off
connection caching.

Unfortunatly, as implemented this forces a full rebuild when building the
standalone binary, and of course it makes it somewhat slower.

This is not ideal, but neither is probing the ssh version every time it's
run (slow), or once when initializing a repo (fragile).
2012-10-15 14:49:40 -04:00
Joey Hess
97ea08e2d1 Avoid unsetting HOME when running certian git commands. Closes: #690193
Setting GIT_INDEX_FILE clobbers the rest of the environment, making git
not read ~/.gitconfig, and blow up if GECOS didn't have a name for the
user.

I'm not entirely happy with getEnvironment being run every time now,
that's somewhat expensive. It may make sense to just set GIT_COMMITTER_*
and GIT_AUTHOR_*, but I worry that clobbering the rest could break PATH,
or GIT_PATH, or something else that might be used by a command run in here.
And caching the environment is not a good idea either; it can change..
2012-10-11 12:58:24 -04:00
Joey Hess
39be7eea40 add standard group selector to repo edit form 2012-10-10 16:04:28 -04:00
Joey Hess
9da7dd8874 webapp: configure new repos to use the standard preferred content settings 2012-10-10 15:35:10 -04:00
Joey Hess
3490977d97 webapp: put new repos in standard groups
I'm using transfer for most things, both removable drives and cloud
storage, because it's the safest choice. We'll see if it makes sense
to prompt for the group when setting this up, or let the user pick
something else after the fact.
2012-10-10 15:27:25 -04:00
Joey Hess
f9b81c7a75 refactor 2012-10-10 15:15:56 -04:00
Joey Hess
5ac15149cc assistant: Now honors preferred content settings when deciding what to transfer.
Both when queueing downloads, and uploads, consults the preferred content
settings.

I didn't make it check yet when requeing failed transfers or queuing
deferred downloads; dealing with the preferred content settings (or indeed,
other settings) changing while the assistant is running still needs work.
2012-10-09 12:18:41 -04:00
Joey Hess
fee40dd374 generalized Annex.Wanted
this should make it easy to use from inside the assistant, where
everything is an AssociatedFile.
2012-10-08 17:14:01 -04:00
Joey Hess
836561e057 fix invered logic for shouldDrop 2012-10-08 16:12:02 -04:00
Joey Hess
1eedf495c3 make copy --to check preferred content of the remote 2012-10-08 16:06:56 -04:00
Joey Hess
34e7faf71a uninit: Unset annex.version. Closes: #689852 2012-10-07 16:04:03 -04:00
Joey Hess
47314c0fad fix last zombies in the assistant
Made Git.LsFiles return cleanup actions, and everything waits on
processes now, except of course for Seek.
2012-10-04 19:56:32 -04:00
Joey Hess
5594bf0643 more zombie fighting
I'm down to 9 places in the code that can produce unwaited for zombies.

Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.

The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.

The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.

(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
2012-10-04 18:47:31 -04:00
Joey Hess
bc83179a76 Test that uuid -m works, falling back to plain uuid if not. 2012-09-25 10:48:20 -04:00
Joey Hess
3887432c54 fixes for transfer resume
Fix resuming of downloads, which do not have a transfer info file to read.

When checking upload progress, use the MVar, rather than re-reading
the info file.

Catch exceptions in the transfer action. Required a tryAnnex.
2012-09-24 13:18:16 -04:00
Joey Hess
e8188ea611 flip catchDefaultIO 2012-09-17 00:18:07 -04:00
Joey Hess
ba0334116c more descriptive name for oneshot 2012-09-15 20:46:38 -04:00
Joey Hess
750c4ac6c2 bugfix: avoid staging but not committing changes to git-annex branch
Branch.get is not able to see changes that have been staged to the index
but not committed. This is a limitation of git cat-file --batch; when
reading from the index, as opposed to from a branch, it does not notice
changes made after the first time it reads the index.

So, had to revert the changes made in 1f73db3469
to make annex.alwayscommit=false stage changes.

Also, ensure that Branch.change and Branch.get always see changes
at all points during a commit, by not deleting journal files when
staging to the index. Delete them only after committing the branch.
Before, there was a race during commits where a different git-annex
could see out-of-date info from the branch while a commit was in progress.

That's also done when updating the branch to merge in remote branches.

In the case where the local git-annex branch has had changes pushed into it
that are not yet reflected in the index, and there are journalled changes
as well, a merge commit has to be done.
2012-09-15 20:15:16 -04:00
Joey Hess
a1f93f06fd eliminate some commits to the git-annex branch
Commits used to be made to the git-annex branch whenever there were
journalled changes from a previous command, and the current command looked
up the value of a file. This no longer happens.

This means that transferkey, which is a oneshot command that stages
changes, can be run multiple times by the assistant, without each of them
committing the changes made by the command before. Which will be a lot
faster and use less space by batching up the commits.

Commits still happen if a remote git-annex branch has been changed and is
merged in.
2012-09-15 18:36:42 -04:00
Joey Hess
ca45cea113 Revert "add catFileIndex"
This interface is not a good idea, because a running git cat-file --batch
does not notice when existing files in the index are changed.
2012-09-15 18:30:53 -04:00
Joey Hess
e1baf48d88 add catFileIndex 2012-09-15 17:06:10 -04:00
Joey Hess
87fb9c690e remove withIndexUpdate helper 2012-09-15 15:48:21 -04:00
Joey Hess
5573911d25 Disable ssh connection caching if the path to the control socket would be too long (and use relative path to minimise path to the control socket). 2012-09-13 19:26:39 -04:00
Joey Hess
c9b3b8829d thread safe git-annex index file use 2012-08-24 20:50:39 -04:00
Joey Hess
5c3e14649e avoid unnecessary transfer scans when syncing a disconnected remote
Found a very cheap way to determine when a disconnected remote has
diverged, and has new content that needs to be transferred: Piggyback on
the git-annex branch update, which already checks for divergence.

However, this does not check if new content has appeared locally while
disconnected, that should be transferred to the remote.

Also, this does not handle cases where the two git repos are in sync,
but their content syncing has not caught up yet.

This code could have its efficiency improved:

* When multiple remotes are synced, if any one has diverged, they're
  all queued for transfer scans.
* The transfer scanner could be told whether the remote has new content,
  the local repo has new content, or both, and could optimise its scan
  accordingly.
2012-08-22 15:05:57 -04:00
Joey Hess
9fc94d780b better readProcess 2012-07-19 00:57:40 -04:00
Joey Hess
1db7d27a45 add back debug logging
Make Utility.Process wrap the parts of System.Process that I use,
and add debug logging to them.

Also wrote some higher-level code that allows running an action
with handles to a processes stdin or stdout (or both), and checking
its exit status, all in a single function call.

As a bonus, the debug logging now indicates whether the process
is being run to read from it, feed it data, chat with it (writing and
reading), or just call it for its side effect.
2012-07-19 00:46:52 -04:00
Joey Hess
d1da9cf221 switch from System.Cmd.Utils to System.Process
Test suite now passes with -threaded!

I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.

In the process, --debug stopped printing commands it runs. I may try to
bring that back later.

Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
2012-07-18 18:00:24 -04:00
Joey Hess
05310538ef more debugging 2012-07-18 13:31:00 -04:00
Joey Hess
75b6ee81f9 avoid ByteString.Char8 where not needed
Its truncation behavior is a red flag, so avoid using it in these places
where only raw ByteStrings are used, without looking at the data inside.
2012-06-20 13:13:40 -04:00
Joey Hess
e0095b0bdc fishy commit 2012-06-14 00:01:48 -04:00
Joey Hess
942d8f7298 hlint 2012-06-12 11:32:06 -04:00
Joey Hess
ca9ee21bd7 crazy optimisation
Crazy like a fox..
2012-06-10 19:58:34 -04:00
Joey Hess
c5707c84d3 queue size fix
Increase queue size for update-index actions, because otherwise they'll
never be flushed.
2012-06-10 13:56:04 -04:00
Joey Hess
d45a9a7831 refactor and function name cleanup
(oops, I had a calcMerge and a calc_merge!)
2012-06-08 00:29:39 -04:00
Joey Hess
20f425be19 make watch use the queue
May not work. Certianly needs to flush the queue from time to time
when only symlink changes are being made.
2012-06-07 15:40:44 -04:00
Joey Hess
0a11b35d89 extend Git.Queue to be able to queue more than simple git commands
While I was in there, I noticed and fixed a bug in the queue size
calculations. It was never encountered only because Queue.add was
only ever run with 1 file in the list.
2012-06-07 15:19:44 -04:00
Joey Hess
b819f644ad close the git add race
There's a race adding a new file to the annex: The file is moved to the
annex and replaced with a symlink, and then we git add the symlink. If
someone comes along in the meantime and replaces the symlink with
something else, such as a new large file, we add that instead. Which could
be bad..

This race is fixed by avoiding using git add, instead the symlink is
directly staged into the index.

It would be nice to make `git annex add` use this same technique.
I have not done so yet because it currently runs git update-index once per
file, which would slow does `git annex add`. A future enhancement would be
to extend the Git.Queue to include the ability to run update-index with
a list of Streamers.
2012-06-06 14:29:10 -04:00
Joey Hess
993e6459a3 factor out nukeFile 2012-06-06 13:13:13 -04:00
Joey Hess
27cfeca4ea Merge branch 'master' into watch 2012-06-06 02:16:21 -04:00
Joey Hess
f1bd72ea54 factor out generic update-index code from unionmerge code 2012-06-06 00:10:34 -04:00
Joey Hess
7a6fb8ae4e flush the git queue when a new type of action is being added to it
This allows the queue to be used in a single process for multiple possibly
conflicting commands, like add and rm, without running them out of order.

This assumes that running the same git subcommand with different parameters
cannot itself conflict.
2012-06-04 20:41:22 -04:00
Joey Hess
bb4f31a0ee Clean up handling of git directory and git worktree.
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.

This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.

A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.

Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)

Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.

One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
2012-05-18 17:03:12 -04:00
Joey Hess
f7d8982672 Fix use of several config settings
annex.ssh-options, annex.rsync-options, annex.bup-split-options.

And adjust types to avoid the bugs that broke several config settings
recently. Now "annex." prefixing is enforced at the type level.
2012-05-05 20:16:56 -04:00
Joey Hess
76102c1c75 display "Recording state in git..." when staging the journal
A bit tricky to avoid printing it twice in a row when there are queued git
commands to run and journal to stage.

Added a generic way to run an action that may output multiple side
messages, with only the first displayed.
2012-04-27 13:54:33 -04:00