Commit graph

1252 commits

Author SHA1 Message Date
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
412dcb8017 Fix bug that caused typechanged symlinks to be assumed to be unlocked files, so they were added to the annex by the pre-commit hook. 2013-08-22 13:57:07 -04:00
Joey Hess
74034ec781 Better error message when trying to use a git remote that has annex.ignore set. 2013-08-22 12:01:53 -04:00
Joey Hess
d1ea1f2474 Unescape characters in 'file://...' URIs. (Thanks, guilhem for the patch.) 2013-08-22 11:36:48 -04:00
Joey Hess
6fd2935a5a unused: Pay attention to symlinks that are not yet staged in the index. 2013-08-22 10:20:03 -04:00
Joey Hess
d603f536bd Set --clobber when running wget to ensure resuming works properly. 2013-08-21 18:19:01 -04:00
Joey Hess
0912e752b5 Revert "Delete empty downloaded file when wget fails, to work around reported resume failure."
This reverts commit 98886e3fbf.

Better fix forthcoming
2013-08-21 18:17:48 -04:00
Joey Hess
98886e3fbf Delete empty downloaded file when wget fails, to work around reported resume failure.
<RichiH> i richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % rm /home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % git annex get P8060008.JPG
<RichiH> get P8060008.JPG (from website...) --2013-08-21 21:42:45--  http://annex.debconf.org/debconf-share/.git//annex/objects/1a4/67d/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> Resolving annex.debconf.org (annex.debconf.org)... 5.153.231.227, 2001:41c8:1000:19::227:2
<RichiH> Connecting to annex.debconf.org (annex.debconf.org)|5.153.231.227|:80... connected.
<RichiH> HTTP request sent, awaiting response... 404 Not Found
<RichiH> 2013-08-21 21:42:45 ERROR 404: Not Found.
<RichiH> File `/home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG' already there; not retrieving.
<RichiH>   Unable to access these remotes: website
<RichiH>   Try making some of these repositories available:
<RichiH>    3e0356ac-0743-11e3-83a5-1be63124a102 -- website (annex.debconf.org)
<RichiH>     a7495021-9f2d-474e-80c7-34d29d09fec6 -- chrysn@hephaistos:~/data/projects/debconf13/debconf-share
<RichiH>     eb8990f7-84cd-4e6b-b486-a5e71efbd073 -- joeyh passport usb drive
<RichiH>     f415f118-f428-4c68-be66-c91501da3a93 -- joeyh laptop
<RichiH> failed
<RichiH> git-annex: get: 1 failed
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn %

I was not able to reproduce the failure, but I did reproduce that
wget -O http://404/ results in an empty file being written.
2013-08-21 16:05:51 -04:00
Joey Hess
0f921307e7 mirror: New command, makes two repositories contain the same set of files.
This is a simple approach for setting up a mirroring repository.

It will work with any type of remotes.

Mirror --from is more expensive than mirror --to in general.
OTOH, mirror --from will get the file from any remote that has it, not only
the named mirror remote. And if the named mirror remote is not the fastest
available remote with a file, that can speed things up.

It would be possible to make the assistant or watch command do a more
dynamic mirroring, that didn't need to scan every time.
2013-08-20 15:46:35 -04:00
Joey Hess
e240cb99f7 Merge branch 'duplicate'
Conflicts:
	debian/changelog
2013-08-20 10:27:24 -04:00
Joey Hess
a6a047192e sync, merge: Bug fix: Don't try to merge into master when in a bare repo. 2013-08-17 21:29:44 +02:00
Joey Hess
641b14b124 Debian: Recommend ssh-askpass, which ssh will use when the assistant is run w/o a tty. Closes: #719832 2013-08-16 10:07:44 +02:00
Joey Hess
59f9781a23 Debian: Run the builtin test suite as an autopkgtest. 2013-08-15 15:49:19 +02:00
Joey Hess
572d6fde14 releasing version 4.20130815 2013-08-15 10:46:33 +02:00
Joey Hess
36693b3e86 releasing version 4.20130815 2013-08-15 10:09:55 +02:00
Joey Hess
38022f4f49 Windows: Fixed permissions problem that prevented removing files from directory special remote.
Directory special remotes now fully usable.
2013-08-04 13:43:48 -04:00
Joey Hess
a837ed47f7 Windows: Added support for encrypted special remotes. 2013-08-04 13:03:34 -04:00
Joey Hess
b28023cb52 importfeed: Fix handling of dots in extensions. 2013-08-03 02:36:38 -04:00
Joey Hess
24c8a6042b importfeed: Ignores transient problems with feeds. Only exits nonzero when a feed has repeatedly had a problems for at least 1 day. 2013-08-03 01:40:21 -04:00
Joey Hess
b191d5c595 gitignore support for the assistant and watcher
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.

Thanks to Adam Spiers, for getting the necessary support into git for this.

A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.

However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.

Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!

So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.

(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)

This commit was sponsored by Francois Marier. Thanks!
2013-08-02 20:37:03 -04:00
Joey Hess
7fdf9ea5dd releasing version 4.20130802 2013-08-02 13:38:18 -04:00
Joey Hess
d16114d024 Slow and ugly work around for bug #718517 in git, which broke git-cat-file --batch for filenames containing spaces.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!

The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.

This needs to be reverted as soon as a fix is available in git!
2013-08-01 17:30:47 -04:00
Joey Hess
ebd778c519 Escape ':' in file/directory names to avoid it being treated as a pathspec by some git commands
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"

This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.

Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)

The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.

This commit was sponsored by Otavio Salvador. Thanks!
2013-08-01 15:15:49 -04:00
Joey Hess
d1ed337035 webapp: Improve handling of remotes whose setup has stalled.
This includes recovery from the ssh-agent problem that led to many reporting
http://git-annex.branchable.com/bugs/Internal_Server_Error:_Unknown_UUID/
(Including fixing up .ssh/config to set IdentitiesOnly.)

Remotes that have no known uuid are now displayed in the webapp as
"unfinished". There's a link to check their status, and if the remote
has been set annex-ignore, a retry button can be used to unset that and
try again to set up the remote.

As this bug has shown, the process of adding a ssh remote has some failure
modes that are not really ideal. It would certianly be better if, when
setting up a ssh remote it would detect if it's failed to get the UUID,
and handle that in the remote setup process, rather than waiting until
later and handling it this way.

However, that's hard to do, particularly for local pairing, since the
PairListener runs as a background thread. The best it could do is pop up an
alert if there's a problem. This solution is not much different.

Also, this solution handles cases where the user has gotten their repo into
a mess manually and let's the assistant help with cleaning it up.

This commit was sponsored by Chia Shee Liang. Thanks!
2013-07-31 16:36:29 -04:00
Joey Hess
cbfdf3ab21 set IdentitiesOnly
When setting up a dedicated ssh key to access the annex on a host,
set IdentitiesOnly to prevent the ssh-agent from forcing use of a different
ssh key.

That behavior could result in unncessary password prompts. I remember
getting a message or two from people who got deluged with password
prompts and I couldn't at the time see why.

Also, it would prevent git-annex-shell from being run on the remote host,
when git-annex was installed there by unpacking the standalone tarball,
since the authorized_keys line for the dedicated ssh key, which sets
up calling git-annex-shell when it's not in path, wouldn't be used.

This fixes
http://git-annex.branchable.com/bugs/Internal_Server_Error:_Unknown_UUID
but I've not closed that bug yet since I should still:

1. Investigate why the ssh remote got set up despite being so broken.
2. Make the webapp not handle the NoUUID state in such an ugly way.
3. Possibly add code to fix up systems that encountered the problem.
   Although since it requires changes to .ssh/config this may be one for
   the release notes.

Thanks to TJ for pointing me in the right direction to understand what
was happening here.
2013-07-31 13:30:49 -04:00
Joey Hess
9476355bc3 find: Avoid polluting stdout with progress messages. Closes: #718186 2013-07-30 20:24:27 -04:00
Joey Hess
ddd46db09a Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..

Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.

Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.

A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.

Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.

Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.

There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.

This commit was sponsored by Alexander Brem. Thanks!
2013-07-30 19:18:29 -04:00
Joey Hess
147a9f8882 Improve test suite on Windows; now tests git annex sync. 2013-07-30 17:03:32 -04:00
Joey Hess
7b0970b340 Fix inverted logic in last release's fix for data loss bug, that caused git-annex sync on FAT or other crippled filesystems to add symlink standin files to the annex. 2013-07-30 16:08:09 -04:00
Joey Hess
f55dc1b64f OSX: Make git-annex-webapp run in the background, so that the app icon can be clicked on the open a new webapp when the assistant is already running. 2013-07-30 15:04:31 -04:00
Joey Hess
10f297d989 typo 2013-07-28 17:04:46 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
869c638b82 assistant: Fix bug that caused it to stall when adding a very large number of files at once (around 5 thousand).
This bug was introduced in 82a6db8fe8,
which improved handling of adding very large numbers of files by ensuring
that a minimum number of max size commits (5000 files each) were done.

I accidentially made it wait for another change to appear after such a max
size commit, even if a lot of queued changes were already accumulated.
That resulted in a stall when it got to the end. Now fixed to not wait
any longer than necessary to ensure the watcher has had time to wake back
up after the max size commit.

This commit was sponsored by Michael Linksvayer. Thanks!
2013-07-27 17:42:18 -04:00
Joey Hess
ec4d974dcf assistant: Fix deadlock that could occur when adding a lot of files at once in indirect mode.
This is a laziness problem. Despite the bang pattern on newfiles, the list
was not being fully evaluated before cleanup was called. Moving cleanup out
to after the list is actually used fixes this.

More evidence that I should be using ResourceT or pipes, if any was needed.
2013-07-26 18:42:22 -04:00
Joey Hess
97f3aecb17 assistant: Fix NetWatcher to not sync with remotes that have remote.<name>.annex-sync set to false.
This affected both the hourly NetWatcherFallback thread and the syncing
when network connection is detected.

It was a reversion of sorts, introduced in
8861e270be, when annex-ignore was changed to
not control git syncing. I forgot to make it check annex-sync at that
point.
2013-07-26 16:54:20 -04:00
Joey Hess
2311f30e4c correct changelog; git annex get --unused doesn't make sense 2013-07-25 19:53:52 -04:00
Joey Hess
c6100aa5cc unused: No longer shows as unused tmp files that are actively being transferred. 2013-07-25 19:51:08 -04:00
Joey Hess
822918089e dropunused behavior change: Now refuses to drop the last copy of a file, unless you use the --force.
This was the last place in git-annex that could remove data referred to by
the git history, without being forced.

Like drop, dropunused checks remotes, and honors the global annex.numcopies
setting. (However, .gitattributes settings cannot apply to unused files.)
2013-07-25 19:50:44 -04:00
Joey Hess
fc1a79835b Always build with -threaded, to avoid a deadlock when communicating with gpg. 2013-07-25 13:57:53 -04:00
Joey Hess
5c82c99c76 webapp: When creating a repository on a removable drive, set core.fsyncobjectfiles, to help prevent data loss when the drive is yanked. 2013-07-23 13:38:05 -04:00
Joey Hess
381637e4c8 Add status message to XMPP presence tag, to identify to others that the client is a git-annex client.
I only added this to the presense messages that are really intended for
presence. The ones used for tunneling git etc don't have the tag, because
that would waste bandwidth.
2013-07-23 12:41:41 -04:00
Joey Hess
ba4e0b4878 releasing version 4.20130723 2013-07-23 11:41:16 -04:00
Joey Hess
5e3a404d4f Support import in direct mode. 2013-07-22 20:18:00 -04:00
Joey Hess
f353f13c9d Support unannex and uninit in direct mode.
In direct mode, it's best to whenever possible not move direct mode files
out of the way, and so I made unannex avoid touching the direct mode file at
all.

That actually turns out to be easy, because in direct mode, unlike indirect
mode, the pre-commit hook won't get confused if the unannexed file later
gets added back by git add. So there's no need to commit the unannex right
away; it can be staged for the user to commit later. This also means that
unannex in direct mode is a lot faster than in indirect mode!

Another subtle bit is the bookkeeping that is done when unannexing a direct
mode file. The inode cache needs to be removed so that when uninit runs
getKeysPresent, it doesn't see the cache and think the key is still
present and crash when it's not.

This commit is sponsored by Douglas Butts. Thanks!
2013-07-22 17:28:53 -04:00
Joey Hess
6ae2637eb1 For long hostnames, use a hash of the hostname to generate the socket file for ssh connection caching.
This is ok to do now that the socket filename never needs to be mapped back
to a hostname.

Short hostnames will still appear in the clear, which is less obfuscated.
So this cannot possibly make ssh connection caching fail for a hostname it
used to work for.
2013-07-22 15:09:41 -04:00
Joey Hess
002de3e547 point to android icons too 2013-07-21 13:28:35 -04:00
Joey Hess
9fb5eaa179 ref debian bug 2013-07-20 21:32:52 -04:00
Joey Hess
780efc775c When an XMPP server has SRV records, try them, but don't then fall back to the regular host if they all fail.
gmail.com has some XMPP SRV records, but does not itself respond to XMPP
traffic, although it does accept connections on port 5222. So if a user
entered the wrong password, it would try all the SRVs and fall back to
trying gmail, and hang at that point.

This seems the right thing to do, not just a workaround.
2013-07-20 21:18:55 -04:00
Joey Hess
6f526518eb fixed nasty data loss bug
I wanted to try to guard against it in Command.Add too, but it's a case of
garbage in, garbage out. Once Command.Add has been told it's dealing with a
dummy symlink, it goes and deletes it, and even though the object it
thinks it points to is not present in the annex, it's Command.Add is still
doing the right thing to go ahead and add the broken symlink. So the two
fixes I was able to put in will have to do.
2013-07-20 19:37:12 -04:00
Joey Hess
9fc1448947 webapp: Differentiate between creating a new S3/Glacier/WebDav remote, and initializing an existing remote. When creating a new remote, avoid conflicts with other existing (or deleted) remotes with the same name. 2013-07-20 18:15:16 -04:00