importfeed: git-annex becomes a podcatcher in 150 LOC

Joey Hess 2013-07-28 15:27:36 -04:00
parent 55bd5a81ad
commit 7e66d260ea
15 changed files with 319 additions and 32 deletions

@ -190,6 +190,19 @@ subdirectories).
git annex import /media/camera/DCIM/
* importfeed [url ...]
Imports the contents of podcast feeds. Only downloads files whose
urls have not already been added to the repository, so you can
delete, rename, etc. the resulting files and repeated runs won't duplicate
them.
Use --template to control where the files are stored.
The default template is '${feedtitle}/${itemtitle}${extension}'
(Other available variables: feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid)
The --relaxed and --fast options behave the same as they do in addurl.
* watch
Watches for changes to files in the current directory and its subdirectories,

@ -21,6 +21,7 @@ quite a lot.
* [UUID](http://hackage.haskell.org/package/uuid)
* [regex-tdfa](http://hackage.haskell.org/package/regex-tdfa)
* [extensible-exceptions](http://hackage.haskell.org/package/extensible-exceptions)
* [feed](http://hackage.haskell.org/package/feed)
* Optional haskell stuff, used by the [[assistant]] and its webapp
* [stm](http://hackage.haskell.org/package/stm)
(version 2.3 or newer)

@ -0,0 +1,44 @@
You can use git-annex as a podcatcher, to download podcast contents.
No additional software is required, but your git-annex must be built
with the Feeds feature (run `git annex version` to check).
All you need to do is put something like this in a cron job:
`cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url`
This downloads the urls, and parses them as RSS, Atom, or RDF feeds.
All enclosures are downloaded and added to the repository, the same as if you
had manually run `git annex addurl` on each of them.
git-annex will avoid downloading a file from a feed if its url has already
been added to the repository. So once a file is downloaded,
you can move it around, delete it, `git annex drop` its content, etc,
and it will not be downloaded again by repeated runs of
`git annex importfeed`. Just how a podcatcher should behave.
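The rule above, that an enclosure is skipped once its url is known, can be modelled in a few lines. This is only a conceptual sketch in Python, not how git-annex implements it: `select_new_enclosures` and the in-memory `known_urls` set are hypothetical names, whereas git-annex keeps its record of urls in the repository itself.

```python
def select_new_enclosures(enclosure_urls, known_urls):
    """Return the urls that still need downloading.

    known_urls is a hypothetical in-memory set standing in for the
    urls the repository has already recorded. It is updated in place,
    so a later run skips everything selected here.
    """
    to_download = []
    for url in enclosure_urls:
        if url in known_urls:
            # Already seen once; skip it even if the file was since
            # renamed, deleted, or dropped.
            continue
        known_urls.add(url)
        to_download.append(url)
    return to_download
```

Calling it twice with the same feed contents yields an empty list the second time, which is the behaviour that makes repeated cron runs safe.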
## templates
To control the filenames used for items downloaded from a feed,
there's a --template option. The default is
`--template='${feedtitle}/${itemtitle}${extension}'`
Other available template variables:
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid
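To get a feel for how a template turns into a filename, here is a rough Python sketch using the standard library's `string.Template`, which happens to share the `${name}` syntax. It is an illustration only: `expand_template` and the sample field values are made up, and git-annex may clean up the resulting filename in ways this sketch does not.

```python
from string import Template

# The default template quoted above.
DEFAULT_TEMPLATE = "${feedtitle}/${itemtitle}${extension}"

def expand_template(template, fields):
    """Fill ${variable} placeholders from a dict of feed/item fields.

    safe_substitute leaves any variable missing from `fields`
    untouched instead of raising an error.
    """
    return Template(template).safe_substitute(fields)

# Hypothetical field values for a single feed item.
example_fields = {
    "feedtitle": "My Podcast",
    "itemtitle": "Episode 1",
    "extension": ".mp3",
}
example_path = expand_template(DEFAULT_TEMPLATE, example_fields)
# example_path is "My Podcast/Episode 1.mp3"
```

A template such as `'${feedauthor}/${itemtitle}${extension}'` would group downloads by author instead of by feed title.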
## catching up
To catch up on a feed without downloading its contents,
use `git annex importfeed --relaxed`, and delete the symlinks it creates.
Next time you run `git annex importfeed` it will only fetch any new items.
## fast mode
To add a feed without downloading its contents right now,
use `git annex importfeed --fast`. Then you can use `git annex get` as
usual to download the content of an item.
## distributed podcatching
A nice benefit of using git-annex as a podcatcher is that you can
run `git annex importfeed` on the same url in different clones
of a repository, and `git annex sync` will sync it all up.