Merge branch 'master' into youtube-dl

This commit is contained in:
Joey Hess 2017-11-30 16:16:58 -04:00
commit 640cb36a5c
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
12 changed files with 207 additions and 2 deletions

@ -299,12 +299,12 @@ dist/caballog: git-annex.cabal
# TODO should be possible to derive this from caballog.
hdevtools:
hdevtools --stop-server || true
-hdevtools check git-annex.hs -g -cpp -g -i -g -idist/build/git-annex/git-annex-tmp -g -i. -g -idist/build/autogen -g -Idist/build/autogen -g -Idist/build/git-annex/git-annex-tmp -g -IUtility -g -DWITH_TESTSUITE -g -DWITH_S3 -g -DWITH_ASSISTANT -g -DWITH_INOTIFY -g -DWITH_DBUS -g -DWITH_PAIRING -g -g -optP-include -g -optPdist/build/autogen/cabal_macros.h -g -odir -g dist/build/git-annex/git-annex-tmp -g -hidir -g dist/build/git-annex/git-annex-tmp -g -stubdir -g dist/build/git-annex/git-annex-tmp -g -threaded -g -Wall -g -XHaskell98 -g -XPackageImports
+hdevtools check git-annex.hs -g -cpp -g -i -g -idist/build/git-annex/git-annex-tmp -g -i. -g -idist/build/autogen -g -Idist/build/autogen -g -Idist/build/git-annex/git-annex-tmp -g -IUtility -g -DWITH_TESTSUITE -g -DWITH_S3 -g -DWITH_ASSISTANT -g -DWITH_INOTIFY -g -DWITH_DBUS -g -DWITH_PAIRING -g -g -optP-include -g -optPdist/build/autogen/cabal_macros.h -g -odir -g dist/build/git-annex/git-annex-tmp -g -hidir -g dist/build/git-annex/git-annex-tmp -g -stubdir -g dist/build/git-annex/git-annex-tmp -g -threaded -g -Wall -g -XHaskell98 -g -XPackageImports -g -XLambdaCase
distributionupdate:
git pull
cabal configure
-ghc -Wall -fno-warn-tabs --make Build/DistributionUpdate -XPackageImports -optP-include -optPdist/build/autogen/cabal_macros.h
+ghc -Wall -fno-warn-tabs --make Build/DistributionUpdate -XLambdaCase -XPackageImports -optP-include -optPdist/build/autogen/cabal_macros.h
./Build/DistributionUpdate
.PHONY: git-annex git-union-merge tags

@ -0,0 +1,15 @@
### Please describe the problem.
Running adjust --unlock is unexpectedly slow and seems to use a lot of space, even on BTRFS, suggesting it probably does not use --reflink=auto like most other commands.
### What steps will reproduce the problem?
Run adjust --unlock with very large files.
### What version of git-annex are you using? On what operating system?
6.20170101-1+deb9u1 on Debian Stretch
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes I have! I've used it to manage lots of video editing disks before, and am now migrating several slightly different copies of 15TB sized documentary footage from random USB3 disks and LTO tapes to a RAID server with BTRFS.

@ -0,0 +1,30 @@
Working on [[todo/switch_from_quvi_to_youtube-dl]], because
quvi is not being maintained and youtube-dl can download a lot more stuff.
Unfortunately, youtube-dl's interface is not a good fit for git-annex,
compared with quvi's interface which was a near-perfect fit. Two things
git-annex relied on quvi for are a way to check if a url has embedded media
without downloading the url, and a way to get the url from which the
embedded media can be downloaded. Youtube-dl supports neither. Also it has
some other warts that make it unnecessarily hard to interface with, like not
always [storing the download in the location specified by --output](https://github.com/rg3/youtube-dl/issues/14864),
and [sometimes crashing when downloading non-media urls (eg over my satellite internet)](http://bugs.debian.org/874321).
I've found ways to avoid all these problems. For example, to make
`git annex addurl` avoid the unnecessary overhead of running youtube-dl
in the common case of downloading some non-web-page file, I'll have it
download the url content, and check if it looks like an html page.
Only then will it use youtube-dl. So addurl of html pages without
embedded media will get slower, but addurl of everything else
will be as fast as before.
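
A rough illustration of that check, as a sketch only: the heuristic git-annex actually uses is not described here, and the function names, the 4096-byte window and the marker strings are all made up for the example.

```python
def looks_like_html(content: bytes) -> bool:
    """Guess whether already-downloaded content is an html page.

    Hypothetical heuristic: look for an html marker near the start.
    """
    head = content[:4096].lower()
    return b"<!doctype html" in head or b"<html" in head


def choose_downloader(content: bytes) -> str:
    """Only html pages are worth handing to youtube-dl.

    Everything else is kept as the plain download it already is.
    """
    return "youtube-dl" if looks_like_html(content) else "direct"


print(choose_downloader(b"<!DOCTYPE html><html><body>video</body></html>"))
print(choose_downloader(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
```

The point of the ordering is that the cheap check runs on content that had to be downloaded anyway, so non-html files never pay for a youtube-dl run.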
But there's an unavoidable change to `addurl --relaxed`. It will not check
for embedded media any more, because that would make it a lot slower, since
it would have to hit the network. `addurl --fast` will have to be used for
such urls instead. I hope this behavior change won't affect workflows
badly.
Today was all coding groundwork, and I just got to the point that I'm
ready to have it run youtube-dl. Hope to finish it tomorrow.
Today's work was sponsored by Jake Vosloo [on Patreon](https://www.patreon.com/joeyh).

@ -0,0 +1,13 @@
It's mostly working now. Still need to fix --fast and --relaxed, and keep
youtube-dl from using up the annex.diskreserve.
The first hour or two was spent adding support for per-key temp
directories. youtube-dl is run inside such a directory, to let it write
whatever files it needs. Like the per-key temp files, these temp directories
are not cleaned up when a download fails or is interrupted, so resuming can
pick up where it left off. Taught `git annex dropunused` and everything
else that cleans up per-key temp files to also clean up the temp
directories.
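
The lifecycle of such a directory can be sketched roughly as below. This is an illustration under assumptions, not git-annex's code: the `.work` suffix, the function names and the layout under `tmp_root` are hypothetical.

```python
import os
import shutil
import tempfile


def key_work_dir(tmp_root, key):
    """Return the per-key temp directory, creating it if needed.

    The name is derived from the key, so a later run finds the same one.
    """
    path = os.path.join(tmp_root, key + ".work")  # hypothetical naming
    os.makedirs(path, exist_ok=True)
    return path


def run_in_key_work_dir(tmp_root, key, action):
    """Run action(workdir); remove the directory only on success.

    If action raises, the directory is left in place so a retry can resume.
    """
    workdir = key_work_dir(tmp_root, key)
    result = action(workdir)
    shutil.rmtree(workdir)
    return result


def cleanup_unused(tmp_root, keys):
    """Remove leftover directories for keys that are no longer wanted."""
    for key in keys:
        shutil.rmtree(os.path.join(tmp_root, key + ".work"), ignore_errors=True)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(run_in_key_work_dir(root, "EXAMPLEKEY", os.listdir))
```

Leaving the directory behind on failure is what makes resuming possible, and it is also why an explicit cleanup pass is needed.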
Today's work was sponsored by Trenton Cronholm on
[Patreon](https://patreon.com/joeyh/)

@ -0,0 +1,32 @@
I initialized a local repository with
git-annex init $HOSTNAME --version=6
Unfortunately, I hadn't changed HOSTNAME on the new machine, so it was 'localhost.localdomain'. I didn't notice that until after I had cloned a git-annex repository.
Now, when I run `git-annex info` in the remote repository (and likewise in the local one), I see
$ git-annex info
...
semitrusted repositories: 6
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
0085(maybe it's not secure to write it on the internet)-e8f803a -- localhost.localdomain
...
(and other repositories. By the way, I never initialized 'web' and 'bittorrent'; where did they come from?)
I would like 'localhost.localdomain' to become my real $HOSTNAME, so that I can tell that machine apart. How could I do that?
I found [How to rename a remote](https://git-annex.branchable.com/forum/How_to_rename_a_remote__63__/), but my 'localhost' is not listed in git-remotes.
I grep-ed .git for 'localhost.localdomain', and changed `.git/COMMIT_EDITMSG`. However, after running git-annex sync it reverts to 'localhost.localdomain'.
$ more .git/COMMIT_EDITMSG
git-annex in Acer
$ git-annex sync
...
$ more .git/COMMIT_EDITMSG
git-annex in localhost.localdomain
I would like to change 'localhost' to my real machine name, both on the remote repository from which I cloned and on the local repository. Thank you.

@ -0,0 +1,13 @@
Hello. Am a newbie to Git Annex (ga), but love it already. For a long time I kept trying to index my own important files, but ended up all tangled up. With ga I now see a light at the end of the tunnel! (Hope it's not a train heading my way :)
So thanks a bucket for writing Git Annex!
I am an "archiver": every file I add to the ga repo is a never-to-be-changed file (its checksum stays the same throughout eternity; only the metadata keeps changing). All I need ga for atm is to tag all the files. Unfortunately, we are talking about a few hundred thousand files, and the performance with the master git-annex-6.20170519 is not quite what one might hope for.
From your design/caching_database doc I gather that the outlook for metadata is positive ( "For metadata, the story is much nicer. Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s. So fast enough to be used in views." ), but that it is not in a db (sqlite) yet in master (git-annex-6.20170519). I tried to dig through some of the links there to find out which commit I could check out and build to try out cached metadata, but to no avail.
Since I never change a file once it gets checked into the ga repo, does that simplify my possible use of the current metadata cache code, or will I have to learn Haskell and write code (creating views and such) to get performance?
TIA for any pointers, tips and caveats, and THANKS AGAIN FOR WRITING GIT-ANNEX.
ganewbie01

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="ganewbie01"
avatar="http://cdn.libravatar.org/avatar/a3b7d6e560486cb87c51cb0cf3328c8e"
subject="development branches inaccessible?"
date="2017-11-26T13:12:16Z"
content="""
To not sit idle, I've been looking for development branches (specifically the one containing the code that gave rise to Joey's claim \"Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s.\"), but could find only repos with one branch, the master branch, which (naturally) doesn't seem to include the code for SQLite metadata tinkering.
Is there someplace I could find such development branches please?
"""]]

@ -0,0 +1,39 @@
[[!comment format=mdwn
username="olaf"
avatar="http://cdn.libravatar.org/avatar/4ae498d3d6ee558d6b65caa658f72572"
subject="comment 2"
date="2017-11-27T05:39:04Z"
content="""
Did you clone the repository?
$ git clone git://git-annex.branchable.com/ git-annex
I see lots of branches (remember they are *remote* branches so you will need the `-a` flag):
$ git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/atomic-store-test
remotes/origin/debian
remotes/origin/debian-jessie-backport
remotes/origin/debian-squeeze-backport
remotes/origin/debian-stable-security-fix
remotes/origin/debian-wheezy-backport
remotes/origin/ghc7.0
remotes/origin/improved-smudge-filters
remotes/origin/master
remotes/origin/newwinrelease
remotes/origin/no-direct-mode
remotes/origin/p2p-map
remotes/origin/setup
remotes/origin/smudge
remotes/origin/tweak-fetch
remotes/origin/uuid-type-rework
remotes/origin/winsplicehack
You can check out one of the branches like this:
$ git checkout remotes/origin/setup
Does that help?
"""]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="ganewbie01"
avatar="http://cdn.libravatar.org/avatar/a3b7d6e560486cb87c51cb0cf3328c8e"
subject="""found it! ( I think ... or should I be still looking for "database" branch? )"""
date="2017-11-28T01:03:05Z"
content="""
hi, thanks for your reply;
I've spent several hours today looking through the git-annex repo. I think it was a great idea to place the forums and everything in one repo! It provides sort of a \"running commentary\" on what was going on and why ...
After a couple of hours looking through the repo using tig, I checked out the key commit \"bb242bdd82a438ebfc937609d8d13b512cb49943\" and found the foo.hs and fooes.hs files which are most likely the ones that Joey was writing about when he expressed hopes for metadata in an sqlite file. ( I didn't find a way to see \"old branches\" though, e.g. the one named `database`. Maybe if I study git more ... )
Thanks for your reply to a silly newbie question anyway! I'll study this some more and see if I have some on-topic questions (hopefully they will be more educated by then :) )
g'day!
"""]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2017-11-28T21:47:54Z"
content="""
Yeah, you found the stuff. That's as far as the metadata cache idea has
gotten yet. I've restored the missing "database" branch, which was just
that commit you found.
I do hope to circle back around to this eventually to speed up generating
views and other metadata queries.
But, as a programmer, you could create your own sqlite database and put
metadata about your git-annex repository in it. Using
`git annex metadata --batch --json` you can query git-annex
for metadata about your files as fast as it can pull it out of git,
and shove it into your database, and then write your own sql queries.
That would be a good first step, because working with real-world
data would help develop the sql schema and see if it'll be fast enough to
bother with putting into git-annex.
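
A minimal sketch of that first step, with several assumptions: each JSON line is taken to carry a `key` string and a `fields` object mapping field names to lists of values, and the table layout and function names are invented for illustration. Feeding it the real output of `git annex metadata --batch --json` is left out, so it runs on sample lines instead.

```python
import json
import sqlite3


def load_metadata(conn, json_lines):
    """Store one row per (key, field, value) from JSON lines.

    Assumed record shape: {"key": "...", "fields": {"tag": ["a", "b"]}}
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "key TEXT NOT NULL, field TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (key, field, value))"
    )
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for field, values in record.get("fields", {}).items():
            for value in values:
                rows.append((record["key"], field, value))
    # INSERT OR IGNORE makes reloading the same data harmless.
    conn.executemany("INSERT OR IGNORE INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def keys_with_tag(conn, tag):
    """Return the keys carrying the given tag, sorted by key."""
    cur = conn.execute(
        "SELECT key FROM metadata WHERE field = 'tag' AND value = ? ORDER BY key",
        (tag,),
    )
    return [row[0] for row in cur]


# Made-up sample records standing in for real git-annex output.
sample = [
    '{"key": "SHA256E-s1--aaa", "fields": {"tag": ["raw", "interview"]}}',
    '{"key": "SHA256E-s2--bbb", "fields": {"tag": ["interview"], "year": ["2017"]}}',
]
conn = sqlite3.connect(":memory:")
load_metadata(conn, sample)
print(keys_with_tag(conn, "interview"))
```

One row per value keeps multi-valued fields such as tags easy to query; whether that schema holds up on a few hundred thousand files is exactly what real-world data would show.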
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="sunny256"
avatar="http://cdn.libravatar.org/avatar/8a221001f74d0e8f4dadee3c7d1996e4"
subject="Version missing from the annex"
date="2017-11-29T16:15:03Z"
content="""
It seems this version is missing from https://downloads.kitenet.net/.git/ ; the newest version there is v6.20171109.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2017-11-29T21:38:26Z"
content="""
Indeed it was. I must have forgotten to push out the files for that
release. Done so now.
"""]]