Commit graph

88 commits

Author SHA1 Message Date
Joey Hess
d0c1a22e7c import metadata from feeds
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.

Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.

Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.

(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)

The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.

This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
2014-07-03 14:15:00 -04:00
Joey Hess
e880d0d22c replace (Key, Backend) with Key
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.

There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
2014-04-17 18:03:39 -04:00
Joey Hess
1c965f5918
improve addurl --file behavior in a confusing corner case involving a ftp url 2014-04-02 15:22:39 -04:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
7ac37a7854 Probe for quvi version at run time.
Overhead: git annex addurl runs quvi --version once.
And more bloat to Annex state..
2014-02-28 14:54:02 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
003fc2b7e1
add UrlOptions sum type 2014-02-24 22:00:25 -04:00
Joey Hess
c69d6eb035 Make annex.web-options be used in several places that call curl. 2014-02-24 21:29:37 -04:00
Joey Hess
86ffeb73d1 reorganize some files and imports 2014-01-26 16:25:55 -04:00
Joey Hess
34c8af74ba fix inversion of control in CommandSeek (no behavior changes)
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.

The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.

The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.

This was a real slog from 1 to 5 am.

Test suite passes.

Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.

Wall clock speed is identical, even in large repos.

This commit was sponsored by an anonymous bitcoiner.
2014-01-20 04:57:36 -04:00
Joey Hess
78c7c54fdb also check diskreserve for quvi downloads 2014-01-04 15:38:59 -04:00
Joey Hess
f9e7b6cf61 addurl, importfeed: Honor annex.diskreserve as long as the size of the url can be checked.
This adds a http HEAD before the download is done. That was already the
case when the assistant was running, and it seems worth it to avoid filling
up the whole disk, like happened to my server today.
2014-01-04 15:08:06 -04:00
Joey Hess
81f498559a importfeed: Support youtube playlists. 2013-12-29 15:52:20 -04:00
Joey Hess
747f5b123c url size fixes
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.

Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
2013-10-11 13:05:00 -04:00
Joey Hess
9b746ee588 honor fileNameLengthLimit for quvi 2013-10-05 13:32:42 -04:00
Joey Hess
478eeea02e addurl: Better sanitization of generated filenames.
Use sanitizeFilePath rather than rolling our own sanitizer.
2013-10-05 13:30:13 -04:00
Joey Hess
12f6b9693a Send a git-annex user-agent when downloading urls.
Overridable with --user-agent option.

Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.

This commit sponsored by Michael Zehrer.
2013-09-28 14:35:21 -04:00
Joey Hess
98fc7e8a19 add, import, assistant: Better preserve the mtime of symlinks, when when adding content that gets deduplicated.
Note that this turned out to remove a syscall, not add any expense.
Otherwise, I would not have done it.
2013-09-25 16:07:11 -04:00
Joey Hess
5c71dc1087 addurl: Fix quvi audodetection, broken in last release. 2013-09-15 16:37:03 -04:00
Joey Hess
ecbb326e9d Allow building without quvi support. 2013-09-09 02:16:22 -04:00
Joey Hess
824241b6fb better cases 2013-08-22 23:44:13 -04:00
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
dc3e0725f9 improve error message 2013-08-02 13:01:25 -04:00
Joey Hess
ddd46db09a Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..

Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.

Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.

A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.

Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.

Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.

There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.

This commit was sponsored by Alexander Brem. Thanks!
2013-07-30 19:18:29 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
74ad3072e4 addurl --pathdepth: Fix failure when the pathdepth specified is deeper than the urls's path. 2013-07-05 12:46:38 -04:00
Joey Hess
8a2d1988d3 expose Control.Monad.join
I think I've been looking for that function for some time.
Ie, I remember wanting to collapse Just Nothing to Nothing.
2013-04-22 20:24:53 -04:00
Joey Hess
9e11699c76 connect existing meters to the transfer log for downloads
Most remotes have meters in their implementations of retrieveKeyFile
already. Simply hooking these up to the transfer log makes that information
available. Easy peasy.

This is particularly valuable information for encrypted remotes, which
otherwise bypass the assistant's polling of temp files, and so don't have
good progress bars yet.

Still some work to do here (see progressbars.mdwn changes), but this
is entirely an improvement from the lack of progress bars for encrypted
downloads.
2013-04-11 17:32:31 -04:00
Joey Hess
1eb3fff787 addurl: Register transfer so the webapp can see it.
* addurl: Register transfer so the webapp can see it.
* addurl: Automatically retry downloads that fail, as long as some
  additional content was downloaded.
2013-04-11 16:14:17 -04:00
Joey Hess
7b4733f0e8 addurl: Bugfix: Did not properly add file in direct mode. 2013-04-11 13:35:52 -04:00
Joey Hess
cfd3b16fe1 add section metadata to all commands
Not yet used .. mindless train work.
2013-03-24 18:28:21 -04:00
Joey Hess
70b7555eaf fix relaxed with existing file 2013-03-12 15:58:36 -04:00
Joey Hess
de6f74ac88 addurl: Add --relaxed option. 2013-03-11 19:55:01 -04:00
Joey Hess
7ce30b534f add: Improved detection of files that are modified while being added.
In indirect mode, now checks the inode cache to detect changes to a file.
Note that a file can still be changed if a process has it open for write,
after landing in the annex.

In direct mode, some checking of the inode cache was done before, but
from a much later point, so fewer modifications could be detected. Now it's
as good as indirect mode.

On crippled filesystems, no lock down is done before starting to add a
file, so checking the inode cache is the only protection we have.
2013-02-14 16:54:36 -04:00
Joey Hess
248090064d addurl in direct mode 2013-01-06 17:34:44 -04:00
Joey Hess
e872c3f648 convert notBareRepo to a CommandCheck
This avoids some small overhead by only running the check once per command;
it also ensures that, even if the command doesn't find anything to run on,
it still fails to run when in a bare repo.
2012-12-29 14:45:19 -04:00
Joey Hess
2ce736ac50 block all commands that don't work in direct mode
I left status working in direct mode, although it doesn't show correct
stats for known annex keys.
2012-12-29 14:28:19 -04:00
Joey Hess
ebd576ebcb where indentation 2012-11-12 01:05:04 -04:00
Joey Hess
e0fdfb2e70 maintain set of files pendingAdd
Kqueue needs to remember which files failed to be added due to being open,
and retry them. This commit gets the data in place for such a retry thread.

Broke KeySource out into its own file, and added Eq and Ord instances
so it can be stored in a Set.
2012-06-20 16:31:46 -04:00
Joey Hess
ca9d94a0ad addurl: Was broken by a typo introduced 2 released ago, now fixed. Closes: #677576 2012-06-14 20:20:03 -04:00
Joey Hess
d3cee987ca separate source of content from the filename associated with the key when generating a key
This already made migrate's code a lot simpler.
2012-06-05 19:51:03 -04:00
Joey Hess
84ac8c58db Add annex.httpheaders and annex.httpheader-command config settings
Allow custom headers to be sent with all HTTP requests.

(Requested by the Internet Archive)
2012-04-22 01:13:09 -04:00
Joey Hess
60ab3d84e1 added ifM and nuked 11 lines of code
no behavior changes
2012-03-14 17:43:34 -04:00
Joey Hess
779ec91908 more robustness fixes 2012-02-18 12:08:02 -04:00
Joey Hess
abd50e01fb don't fail with --pathdepth when file already exists 2012-02-18 12:05:13 -04:00
Joey Hess
00340dfe49 don't error out entirely if an url cannot be downloaded 2012-02-18 11:44:21 -04:00
Joey Hess
69a0161c3a fix filename limit when using --pathdepth 2012-02-16 19:37:02 -04:00
Joey Hess
d05550e803 zero still bad 2012-02-16 14:28:54 -04:00
Joey Hess
346c934409 allow pathdepth to drop from the front or take from the end (negative) 2012-02-16 14:26:53 -04:00
Joey Hess
c2245260b1 improve usage 2012-02-16 12:37:30 -04:00