git-annex (5.20131127) unstable; urgency=low

  * webapp: Detect when upgrades are available, and upgrade if the user
    desires.
    (Only when git-annex is installed using the prebuilt binaries
    from git-annex upstream, not from eg Debian.)
  * assistant: Detect when the git-annex binary is modified or replaced,
    and either prompt the user to restart the program, or automatically
    restart it.
  * annex.autoupgrade configures both the above upgrade behaviors.
  * Added support for quvi 0.9. Slightly suboptimal due to limitations in its
    interface compared with the old version.
  * Bug fix: annex.version did not get set on automatic upgrade to v5 direct
    mode repo, so the upgrade was performed repeatedly, slowing commands down.
  * webapp: Fix bug that broke switching between local repositories
    that use the new guarded direct mode.
  * Android: Fix stripping of the git-annex binary.
  * Android: Make terminal app show git-annex version number.
  * Android: Re-enable XMPP support.
  * reinject: Allow to be used in direct mode.
  * Further improvements to git repo repair. Has now been tested in tens
    of thousands of intentionally damaged repos, and successfully
    repaired them all.
  * Allow use of --unused in bare repository.

# imported from the archive
Joey Hess 2013-11-27 18:41:44 -04:00
commit 7189dfd77d
6383 changed files with 204042 additions and 0 deletions


@ -0,0 +1,111 @@
I've been wrestling with git-annex to try to make it build on Debian, or more specifically, wrestling with Haskell dependencies.
After a fair amount of futzing around, and pestering a bunch of people in the process (thanks for the help! :) ) I finally managed to make it build.
I figured I would post the steps here, since it's not completely trivial, and I expect that a few others might be interested in building newer versions as well.
There currently appear to be two methods:
* Debian packages on Wheezy plus Sid
    * Starting out on Wheezy, and then picking the rest from Sid (it seems at least libghc-safesemaphore-dev from Sid is critical for newer git-annex)
    * WebDAV support will not be available with this method
* Cabal packages
#Debian packages on Wheezy plus Sid
##Start off with a clean wheezy chroot
sudo debootstrap wheezy debian-wheezy
sudo chroot debian-wheezy
##Install some build tools
apt-get update
apt-get install devscripts git
##Get git-annex (either by cloning or simply moving the source into the chroot)
mkdir /src
cd /src
git clone git://git-annex.branchable.com/source.git git-annex
cd git-annex
##Remove WebDAV dependency which can't be satisfied anywhere
sed '/libghc-dav-dev/d' -i debian/control
##Create dummy build-depends package and install all available Wheezy dependencies using it
mk-build-deps
dpkg -i git-annex-build-deps*.deb
apt-get install -f
(this will remove the build-depends package)
##Add Sid sources and install all available Sid dependencies
echo "deb http://http.debian.net/debian sid main" >>/etc/apt/sources.list
apt-get update
dpkg -i git-annex-build-deps*.deb
apt-get install -f
(the build-depends package should now be fully installed)
##Disable the 'make test' that fails due to missing hothasktags
echo >>debian/rules
echo "override_dh_auto_test:" >>debian/rules
##Build!
debuild -us -uc -Igit
#Cabal packages
##Start off with a clean Sid(/Wheezy) chroot
sudo debootstrap sid debian-sid
sudo chroot debian-sid
##Install a smaller set of tools and build-depends from Debian (cabal needs these to compile the Haskell stuff)
apt-get update
apt-get install ghc cabal-install devscripts libz-dev pkg-config c2hs libgsasl7-dev libxml2-dev libgnutls-dev git debhelper ikiwiki perlmagick uuid rsync openssh-client fakeroot
##Get git-annex (either by cloning or simply moving the source into the chroot)
mkdir /src
cd /src
git clone git://git-annex.branchable.com/source.git git-annex
cd git-annex
##Install the Haskell build-dependencies from cabal
cabal update
cabal install --only-dependencies
##Optional step which doesn't work (might in the future)
If we want to run the 'make test' after build we need hothasktags, which is only available via cabal
apt-get install happy
cabal install hothasktags
export PATH=$PATH:~/.cabal/bin
But this currently fails silently inside make test->fast->tags, and if you dig a bit (manually edit the makefile to be more verbose) you see
hothasktags: ./Command/AddUnused.hs: hGetContents: invalid argument (invalid byte sequence)
##Disable the 'make test' that fails
echo >>debian/rules
echo "override_dh_auto_test:" >>debian/rules
##Remove all Debian package haskell depends (taken care of by cabal instead)
sed '/\tlibghc/d' -i debian/control
## Build!
debuild -us -uc -Igit
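
The `sed` filter that drops the Haskell build-depends can be tried on a throwaway file before touching the real `debian/control`. This is only a sketch: it assumes GNU sed (which understands `\t` as a tab), and the sample file contents and package list are made up for illustration.

```shell
# Hypothetical excerpt of debian/control, written to a temporary directory.
tmp=$(mktemp -d)
printf 'Build-Depends:\n\tdebhelper (>= 9),\n\tghc,\n\tlibghc-mtl-dev,\n\tlibghc-safesemaphore-dev,\n\trsync\n' > "$tmp/control"

# Same expression as in the step above, minus -i, so the sample stays untouched.
sed '/\tlibghc/d' "$tmp/control" > "$tmp/control.filtered"

# Count the libghc lines that survived the filter.
remaining=$(grep -c 'libghc' "$tmp/control.filtered" || true)
echo "libghc lines left: $remaining"
```

If the count is not zero, the depends lines in the real file are probably indented with spaces rather than tabs, and the pattern would need adjusting.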


@ -0,0 +1,27 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkCw26IdxXXPBoLcZsQFslM67OJSJynb1w"
nickname="Alexander"
subject="can't install git-annex on OS X Mountain Lion without disabling WebDAV support"
date="2013-04-29T17:57:03Z"
content="""
possibly related to this Debian issue:
trying to install git-annex with cabal on OS X 10.8.3, the build fails with
Loading package DAV-0.4 ... linking ... ghc:
lookupSymbol failed in relocateSection (relocate external)
~/.cabal/lib/DAV-0.4/ghc-7.4.2/HSDAV-0.4.o: unknown symbol `_DAVzm0zi4_PathszuDAV_version1_closure'
ghc: unable to load package `DAV-0.4'
Failed to install git-annex-4.20130417
cabal: Error: some packages failed to install:
git-annex-4.20130417 failed during the building phase. The exception was:
ExitFailure 1
This was after following all of the instructions for the Homebrew install at [http://git-annex.branchable.com/install/OSX/](http://git-annex.branchable.com/install/OSX/)
I was able to work around this issue by installing with the WebDAV flag disabled (ie, added the option --flags=\"-WebDAV\" to last command in the OS X install instructions):
cabal install git-annex --bindir=$HOME/bin --flags=\"-WebDAV\"
"""]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 2"
date="2013-04-30T21:51:50Z"
content="""
@Alexander that DAV-0.4 problem is a bug in DAV, not git-annex. I've informed its author and it should be fixed soon, in a new version of DAV.
"""]]


@ -0,0 +1,35 @@
Here's a workaround to start syncing folders on Windows right now. It's a bit command line heavy, so you might need to set this up for your users. But I would much rather do this than use some other syncing solution and then have to migrate.
(1) Create a remote server git annex repository with the assistant on Linux or Mac.
(2) [Install git](http://git-scm.com/) on the Windows machine.
(3) [Install git-annex for Windows](http://git-annex.branchable.com/install/Windows/) on the Windows machine. Don't forget to run the installer as administrator.
(4) Run _Git Bash_ from the system menu, and run these commands to clone your repository.
ssh-keygen
cat .ssh/id_rsa.pub | ssh username@my-server.com "cat >> ~/.ssh/authorized_keys"
git clone username@my-server.com:/path/to/annex
cd annex
git annex init
(5) Create a script that will trigger a full sync
echo '#!/bin/bash
git annex sync
git annex get *
git annex add .
git annex sync
git annex copy * --to origin
' > sync.sh
chmod +x sync.sh
./sync.sh
(6) Copy the "Git Bash" shortcut from your windows menu to your desktop, and change the link target to:
"C:\Program Files\Git\bin\sh.exe" --login -i "annex/sync.sh"
Now ask your users to run this shortcut before and after they change files. You can also put it into the "autostart" folder to sync at boot.
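
The script from step (5) can also be written with a here-document, which makes it easy to keep the `#!` line as the very first line of the file. A sketch only: the temporary directory stands in for the cloned annex, and `set -e` is an addition that is not part of the steps above.

```shell
# A temporary directory stands in for the cloned annex here.
annex_dir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > "$annex_dir/sync.sh" <<'EOF'
#!/bin/bash
# Stop at the first failing command (an addition, not part of the steps above).
set -e
git annex sync
git annex get *
git annex add .
git annex sync
git annex copy * --to origin
EOF
chmod +x "$annex_dir/sync.sh"

# Show that the interpreter line really is the first line of the file.
first_line=$(head -n 1 "$annex_dir/sync.sh")
echo "first line: $first_line"
```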


@ -0,0 +1,59 @@
If you're anything like me¹, you have a copy of your annex on a computer running at home², set up so you can access it from anywhere like this:
ssh myhome.no-ip.org
This is totally great! Except, there is no way for your home computer to pull your changes, because there is no *on-the-go.no-ip.org*. You can get clunky and use a *bare git repository and git push*, but there is a better way.
First, install *openssh-server* on your *on-the-go* computer
sudo apt-get install openssh-server # Adjust to your flavor of unix
Then, log into your *home* computer, with *port forwarding*:
ssh me@myhome.no-ip.org -R 2201:localhost:22
Your *home* computer can now ssh into your *on-the-go* computer, as long as you keep the above shell running.
You can now add your *on-the-go* computer as a remote on your *home* computer. Use the port forwarding shell you just connected with the command above, if you like.
ssh-keygen -t rsa
ssh-copy-id "me@localhost -p 2201"
cd ~/annex
git remote add on-the-go ssh://me@localhost:2201/home/myuser/annex
Now you can run normal annex operations, as long as the port forwarding shell is running³.
git annex sync
git annex get --from on-the-go some/big/file
git annex info
You can add more computers by repeating with a different port, e.g. 2202 or 2203 (or any other).
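
Since every extra computer needs its own forwarded port, a tiny helper that prints the matching ssh command can save some typing. This is a sketch under assumptions: `reverse_tunnel_cmd` is a made-up name, and the user and host are the placeholders used above. It only prints the command, it does not run it.

```shell
# Print the ssh command that forwards the given port on the home computer
# back to port 22 on the machine it is run from.
reverse_tunnel_cmd() {
    port=$1
    printf 'ssh -N -R %s:localhost:22 me@myhome.no-ip.org\n' "$port"
}

# Example: the command a second on-the-go computer would run.
reverse_tunnel_cmd 2202
```

The matching remote on the home computer would then use the same port, e.g. `ssh://me@localhost:2202/...`.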
If you're security paranoid (like me), read on. If you're not, that's it! Thanks for reading!
---
Paranoid Area
Note you're granting passwordless access to your on-the-go computer to your home computer. I believe that's all right, as long as:
* Your home computer is really in your home, and not at a friend's house or some datacenter
* Your home computer can be accessed only by ssh, and not HTTP or Samba or NTP or (shoot me now!) FTP
* Only you (and perhaps trustworthy family) have access to your home computer
* You have reasonably strong passwords or key-only logins on both your home and on-the-go computers.
* You regularly install security updates on both computers (sudo apt-get update && sudo apt-get upgrade)
In any case, the setup is much, much, much more secure than Dropbox. With Dropbox, you have exactly the same setup, but:
* Your data is stored in some datacenter. It's supposed to be encrypted. It might not be.
* Lots of people have routine access to your files, and plausible reason to. Bored employees might regularly be doing some 'maintenance work' involving your pictures.
* The dropbox software can do anything it likes on your computer, and it's closed source so you don't know if it does. A disgruntled employee could put a trojan into it.
* Dropbox might have a backdoor for employee access to any file on your computer. This might be done with the best of intentions, but a mal-intentioned or careless employee might still erase things or send sensitive files from your computer by email.
* A truly huge number of eyes connected to incredibly smart brains have looked at openssh and found it secure. Everybody trusts openssh. With Dropbox, there is, well, Dropbox. Whoever that is.
-----
¹ Me=Carlo, not Joey. I'm pretty sure doing what I wrote here is a good idea, but in case it turns out to be catastrophically dumb, it's my fault, not his.
² My always-on computer at home is a raspberry pi with a 32GB USB stick. Best self-hosted dropbox you could imagine.
³ You can just forward the port, but not open a shell, by adding the -N option. This could be useful for connecting on startup, e.g. in /etc/rc.local. I prefer to open the shell to forward the ports, maybe use it, and close it to stop it.


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.49"
subject="comment 1"
date="2012-11-30T16:25:58Z"
content="""
If you don't trust your home computer with shell access, you can lock it down in `.ssh/authorized_keys` to only be able to run git-annex-shell. See [[forum/Restricting_git-annex-shell_to_a_specific_repository]]
"""]]


@ -0,0 +1,13 @@
# Problem
I noticed that after installing git-annex assistant, my start up times greatly increased because the assistant does a startup scan while everything else is loading.
# Solution (for people using Gnome)
The solution I came up with is to delay the assistant's startup, as well as setting its IO priority as idle. To do this in Gnome 3, run:
gnome-session-properties
Find the "Git Annex Assistant" entry in the Startup Programs tab, then click edit. Change this:
/usr/local/bin/git-annex assistant --autostart (your location of git-annex may be different)
to this:
bash -c "sleep 30; ionice -c3 /usr/local/bin/git-annex assistant --autostart" (replace /usr/local/bin with wherever git-annex is installed)
The "sleep 30" command delays the startup of the assistant by 30 seconds, and "ionice -c3" sets git-annex's IO priority to "idle," the lowest level.
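
Because the install path and a comfortable delay differ between machines, the startup entry can be generated rather than typed. A small sketch, assuming a made-up helper name `delayed_start_cmd`; it only prints the line to paste into the Startup Programs entry.

```shell
# Print the startup command for a given delay (seconds) and git-annex path.
delayed_start_cmd() {
    delay=$1
    annex_path=$2
    printf 'bash -c "sleep %s; ionice -c3 %s assistant --autostart"\n' "$delay" "$annex_path"
}

# Example: a 60 second delay with git-annex installed in /usr/bin.
delayed_start_cmd 60 /usr/bin/git-annex
```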


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://launchpad.net/~alphapapa"
nickname="alphapapa"
subject="ionice not supported by deadline scheduler"
date="2013-06-28T17:43:47Z"
content="""
Linux's deadline I/O scheduler does not support ionice. It is now the default on some distros, including Ubuntu. CFQ does support ionice.
"""]]


@ -0,0 +1,120 @@
The problem
===========
[Calibre](http://calibre-ebook.com/) is an ebook manager that is
available in [Debian](http://packages.debian.org/sid/calibre). I use
it to maintain my library, but also to download an epub version of a
French newspaper every day and then put it on my Kobo.
Configuring git annex for this
==============================
I wanted to use git-annex, so
$ git init
$ git annex init "some useful name"
But I don't want everything in the annex, because Calibre uses some text
files to save metadata, so I used:
$ git config annex.largefiles "include=* exclude=*.opf exclude=*.json"
Then let's add everything:
$ git annex add *
$ git add *
$ git commit -m "first commit"
Calibre needs read and write access to its database, so let's unlock it:
$ git annex unlock metadata.db
On my other computer I only need to do
$ git clone $user@$host:Calibre\ library
$ cd Calibre\ library
$ git annex init "another useful name"
$ git annex get .
$ git annex unlock metadata.db
The problem is that every time you run `git annex sync`, git annex
will lock metadata.db again, so let's unlock it automatically. I
use git hooks; in `.git/hooks/post-commit` I have
#!/bin/bash
git annex edit metadata.db
Don't forget to make this file executable:
$ chmod a+x .git/hooks/post-commit
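
Creating the hook and making it executable can be done in one go. This is a sketch: a temporary directory stands in for the Calibre library repository, so nothing real is touched; in practice the path would be the `.git/hooks` directory of the library.

```shell
# A temporary directory stands in for the Calibre library repository.
repo=$(mktemp -d)
mkdir -p "$repo/.git/hooks"

# Write the hook; the quoted 'EOF' prevents any expansion inside it.
cat > "$repo/.git/hooks/post-commit" <<'EOF'
#!/bin/bash
# Unlock the Calibre database again after every commit.
git annex edit metadata.db
EOF

# git only runs hooks that are executable.
chmod a+x "$repo/.git/hooks/post-commit"
```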
Day to day operation
====================
$ git annex add .
This puts new files into the annex.
$ git add .
This takes care of the files that should not go into the annex.
$ git annex sync
This makes the repositories exchange information about all this, and
applies remote changes locally.
$ git annex get .
This makes remote books available locally.
Merge conflict
--------------
You should not run Calibre on both computers simultaneously, or
without syncing first. If you do, you will get a conflict that
git-annex will automatically *solve* by renaming both of the files.
You can then either:
- Choose one. If no books have been changed or added on one of the
computers, using the other `metadata.db` will not make you lose
any information.
- Rebuild it. `calibredb restore_database` won't do it, but will tell
you how to do it.
Checking the library
--------------------
You can use `calibredb check_library` to check that your library is
correct. If you use git for it, it will always tell you that it is not
correct: there is an author ".git" it doesn't know about. Just
ignore it.
Maybe this can be solved by using `vcsh`, but apparently
`vcsh`+`git annex` is not well tested yet.
Automatic stuff
---------------
I use `mr` to automatically run all this, but some config could be
done (I believe) to have `git annex copy --auto` do what it should.
There is also the git annex assistant for this kind of automatic
synchronization of content, but I don't know if my automatic
unlocking of one file will break it.
It might be interesting to find some way to unlock and lock the library
only when running Calibre; a simple script to launch Calibre would do
that. Note that each time you lock and unlock, you will have a
new commit in git.
Another solution
===================
You could also use direct mode in place of the auto unlock feature
git annex direct
Then remove the `post-commit` git hook (or do not add it). It's a
simpler solution, but remember that interactions between git annex direct
repositories and plain git are complex and sometimes downright dangerous. See [[direct mode]] for details.
In particular, do *not* call `git add *` in the above steps, as that will commit all books into git.


@ -0,0 +1,19 @@
I worked out how to retroactively annex a large file that had been checked into a git repo some time ago. I thought this might be useful for others, so I am posting it here.
Suppose you have a git repo where somebody had checked in a large file you would like to have annexed, but there are a bunch of commits after it and you don't want to lose history, but you also don't want everybody to have to retrieve the large file when they clone the repo. This will re-write history as if the file had been annexed when it was originally added.
This command works for me, it relies on the current behavior of git which is to use a directory named .git-rewrite/t/ at the top of the git tree for the extracted tree. This will not be fast and it will rewrite history, so be sure that everybody who has a copy of your repo is OK with accepting the new history. If the behavior of git changes, you can specify the directory to use with the -d option. Currently, the t/ directory is created inside the directory you specify, so "-d ./.git-rewrite/" should be roughly equivalent to the default.
Enough with the explanation, on to the command:
<pre>
git filter-branch --tree-filter 'for FILE in file1 file2 file3;do if [ -f "$FILE" ] && [ ! -L "$FILE" ];then git rm --cached "$FILE";git annex add "$FILE";ln -sf `readlink "$FILE"|sed -e "s:^../../::"` "$FILE";fi;done' --tag-name-filter cat -- --all
</pre>
replace file1 file2 file3... with whatever paths you want retroactively annexed. If you wanted bigfile1.bin in the top dir and subdir1/bigfile2.bin to be retroactively annexed try:
<pre>
git filter-branch --tree-filter 'for FILE in bigfile1.bin subdir1/bigfile2.bin;do if [ -f "$FILE" ] && [ ! -L "$FILE" ];then git rm --cached "$FILE";git annex add "$FILE";ln -sf `readlink "$FILE"|sed -e "s:^../../::"` "$FILE";fi;done' --tag-name-filter cat -- --all
</pre>
**If your repo has tags** then you should take a look at the git-filter-branch man page about the --tag-name-filter option and decide what you want to do. By default this will re-write the tags "nearly properly".
You'll probably also want to look at the git-filter-branch man page's section titled "CHECKLIST FOR SHRINKING A REPOSITORY" if you want to free up the space in the existing repo that you just changed history on.
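
The `readlink ... | sed` part of the filter is the least obvious bit: because the file is annexed below `.git-rewrite/t/`, the symlink target gets two extra `../` components that have to be stripped again. A small sketch of just that step, using a made-up annex key as the link target:

```shell
# Hypothetical symlink target as created below .git-rewrite/t/ during filter-branch.
target='../../.git/annex/objects/xx/yy/SHA256-s1--abc/SHA256-s1--abc'

# Strip the two leading parent components, exactly as the filter above does.
fixed=$(printf '%s\n' "$target" | sed -e 's:^../../::')
echo "$fixed"
```

Files in subdirectories have additional `../` components in front of `.git`; those stay in place, since only one leading `../../` is removed.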


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://edheil.wordpress.com/"
ip="173.162.44.162"
subject="comment 1"
date="2012-12-16T00:11:38Z"
content="""
Man, I wish you'd written this a couple weeks ago. :) I was never able to figure that incantation out and ended up unannexing and re-annexing the whole thing to get rid of the file I inadvertently checked into git instead of the annex.
"""]]


@ -0,0 +1,45 @@
[[!comment format=mdwn
username="https://launchpad.net/~arand"
nickname="arand"
subject="comment 2"
date="2013-03-13T12:05:49Z"
content="""
Based on the hints given here I've worked on a filter to both annex and add urls via filter-branch:
[https://gitorious.org/arand-scripts/arand-scripts/blobs/master/annex-filter](https://gitorious.org/arand-scripts/arand-scripts/blobs/master/annex-filter)
The script above is very specific but I think there are a few ideas that can be used in general, the general structure is
#!/bin/bash
# links that already exist
links=$(mktemp)
find . -type l >\"$links\"
# remove from staging area first to not block and then annex
git rm --cached --ignore-unmatch -r bin*
git annex add -c annex.alwayscommit=false bin*
# compare links before and after annexing, remove links that existed before
newlinks=$(mktemp -u)
mkfifo \"$newlinks\"
comm -13 <(sort \"$links\") <(find . -type l | sort) > \"$newlinks\" &
# rewrite links
while IFS= read -r file
do
# link is created below .git-rewrite/t/ during filter-branch, strip two parents for correct target
ln -sf \"$(readlink \"$file\" | sed -e 's%^\.\./\.\./%%')\" \"$file\"
done < \"$newlinks\"
git annex merge
which would be run using
git filter-branch --tree-filter path/annex-filter --tag-name-filter cat -- --all
or similar.
* I'm using `find` to make sure the only rewritten symlinks are for the newly annexed files, this way it is possible to annex an unknown set of filenames
* If doing several git annex commands, using `-c annex.alwayscommit=false` and doing a `git annex merge` at the end instead might be faster.
"""]]


@ -0,0 +1,36 @@
[[!comment format=mdwn
username="arand"
ip="130.238.245.202"
subject="comment 3"
date="2013-03-18T14:39:52Z"
content="""
One thing I noticed is that git-annex needs to checksum each file even if they were previously annexed (rather obviously since there is no general way to tell if the file is the same as the old one without checksumming), but in the specific case that we are replacing files that are already in git, we do actually have the sha1 checksum for each file in question, which could be used.
So, trying to work with this, I wrote a filter script that starts out annexing stuff in the first commit, and continuously writes out sha1<->filename<->git-annex-object triplets to a global file, when it then starts with the next commit, it compares the sha1s of the index with those of the global file, and any matches are manually symlinked directly to the corresponding git-annex-object without checksumming.
I've done a few tests and this seems to be considerably faster than letting git-annex checksum everything.
This is from a git-svn import of the (free software) Red Eclipse game project, there are approximately 3500 files (images, maps, models, etc.) being annexed in each commit (and around 5300 commits, hence why I really, really care about speed):
10 commits: ~7min
100 commits: ~38min
For comparison, the old and new method (the difference should increase with the amount of commits):
old, 20 commits ~32min
new, 20 commits: ~11min
The script itself is a bit of a monstrosity in bash(/grep/sed/awk/git), and the files that are annexed are hardcoded (removed in forming $oldindexfiles), but should be fairly easy to adapt:
[https://gitorious.org/arand-scripts/arand-scripts/blobs/master/annex-ffilter](https://gitorious.org/arand-scripts/arand-scripts/blobs/master/annex-ffilter)
The usage would be something like:
rm /tmp/annex-ffilter.log; git filter-branch --tree-filter 'ANNEX_FFILTER_LOG=/tmp/annex-ffilter.log ~/utv/scripts/annex-ffilter' --tag-name-filter cat -- branchname
I suggest you use it with at least two orders of magnitude more caution than normal filter-branch.
Hope it might be useful for someone else wrestling with filter-branch and git-annex :)
"""]]


@ -0,0 +1,9 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawknOATcOkmzX4jKuET5Z2RsaFUNnLKnQsU"
nickname="Stephen"
subject="comment 4"
date="2013-06-22T07:43:09Z"
content="""
Thanks for the tip :) One question though: how do I push this new history out throughout my other Annexes?
All I managed to make it do was revert the rewrite so the raw file appeared again...
"""]]


@ -0,0 +1,58 @@
[The Internet Archive](http://www.archive.org/) allows members to upload
collections using an Amazon S3
[compatible API](http://www.archive.org/help/abouts3.txt), and this can
be used with git-annex's [[special_remotes/S3]] support.
So, you can locally archive things with git-annex, define remotes that
correspond to "items" at the Internet Archive, and use git-annex to upload
your files to there. Of course, your use of the Internet Archive must
comply with their [terms of service](http://www.archive.org/about/terms.php).
A nice added feature is that whenever git-annex sends a file to the
Internet Archive, it records its url, the same as if you'd run `git annex
addurl`. So any users who can clone your repository can download the files
from archive.org, without needing any login or password info. This makes
the Internet Archive a nice way to publish the large files associated with
a public git repository.
----
Sign up for an account, and get your access keys here:
<http://www.archive.org/account/s3.php>
# export AWS_ACCESS_KEY_ID=blahblah
# export AWS_SECRET_ACCESS_KEY=xxxxxxx
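
Before setting up the remote, it can help to confirm that both variables are actually set in the current shell. A minimal sketch, with `check_ia_credentials` being a made-up helper name; it only checks that the variables are non-empty, not that the keys are valid.

```shell
# Return non-zero (with a message on stderr) if either key is missing.
check_ia_credentials() {
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set" >&2
        return 1
    fi
}

if check_ia_credentials; then
    echo "credentials are set"
fi
```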
Specify `host=s3.us.archive.org` when doing `initremote` to set up
a remote at the Archive. This will enable a special Internet Archive mode:
Encryption is not allowed; you are required to specify a bucket name
rather than having git-annex pick a random one; and you can optionally
specify `x-archive-meta*` headers to add metadata as explained in their
[documentation](http://www.archive.org/help/abouts3.txt).
# git annex initremote archive-panama type=S3 \
host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
x-archive-meta-mediatype=texts x-archive-meta-language=eng \
x-archive-meta-title="original Panama Canal lock design blueprints"
initremote archive-panama (Internet Archive mode) ok
# git annex describe archive-panama "a man, a plan, a canal: panama"
describe archive-panama ok
Then you can annex files and copy them to the remote as usual:
# git annex add photo1.jpeg --backend=SHA1E
add photo1.jpeg (checksum...) ok
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
Once a file has been stored on archive.org, it cannot be (easily) removed
from it. Also, git-annex whereis will tell you a public url for the file
on archive.org. (It may take a while for archive.org to make the file
publicly visible.)
Note the use of the SHA1E [[backend|backends]] when adding files. That is
the default backend used by git-annex, but even if you don't normally use
it, it makes most sense to use the WORM or SHA1E backend for files that
will be stored in the Internet Archive, since the key name will be exposed
as the filename there, and since the Archive does special processing of
files based on their extension.


@ -0,0 +1,34 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
ip="72.0.72.144"
subject="how to use with simply addurl?"
date="2013-10-09T22:27:27Z"
content="""
It doesn't seem like git annex addurl by itself supports the archive.org urls...
[[!format txt \"\"\"
anarcat@marcos:presentations$ git annex addurl --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
addurl re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
failed to verify url exists: http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
\"\"\"]]
I also tried the \"details\" url (<http://archive.org/details/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia>) - but that just downloads the webpage, not the video either...
Even the ultimate video URL doesn't work:
[[!format txt \"\"\"
anarcat@marcos:presentations$ git annex addurl --debug --file=re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
[2013-10-09 18:26:30 EDT] call: quvi [\"-v\",\"mute\",\"--support\",\"http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm\"]
addurl re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm [2013-10-09 18:26:30 EDT] read: curl [\"-s\",\"--head\",\"-L\",\"http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm\",\"-w\",\"%{http_code}\"]
failed to verify url exists: http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012___Eben_Moglen___Freedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
\"\"\"]]
... even though that URL actually gives out a proper 200 OK response code.
Any ideas? --[[anarcat]]
"""]]


@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.4.22"
subject="comment 2"
date="2013-10-11T17:08:27Z"
content="""
This was a misleading error message. The url you are trying to add to the file does not match the size recorded for the file already in the annex. (Or possibly the file's key has no recorded size). If you really want to add the url to the file despite it being a different encoding, you can use --relaxed, although fsck may not like the result if you ever end up downloading that url.
(Please file bug reports for problems in the future, rather than posting comments on only vaguely related pages which as we can see here can turn out to be entirely offtopic.)
"""]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
ip="72.0.72.144"
subject="still a bug, filed separately!"
date="2013-10-11T18:49:06Z"
content="""
Aaah, of course, sorry for the noise here. It turns out that this is *not* because the filesize (or even the checksum, for that matter) are different, so there's clearly a bug there, and i filed it in [[bugs/addurl_fails_on_the_internet_archive]]. Thanks!
"""]]


@ -0,0 +1,32 @@
I have an annex that syncs my personal files on all my computers. It works great. Phones are different.
For one, everything's a bit slower to sync, there's battery considerations, and I just don't need every last old file on my phone. Then there's some files I explicitly don't want on my phone in case it gets lost, like family pictures, passport scans, or private keys.
But I still want photos, videos and voice recordings I make on my phone to be synced to my server. A transfer repo would work, but I want to keep them. Then there's my PDF book collection; that would certainly be nice to always have around in case I have half an hour on a bus. And my music collection ought to be around as well.
So I came up with this solution, and I'm very happy with it.
include=Music/* or include=Books/* or present
This will sync my music and book collections to my phone whenever I add something new on my computers, and it will sync and keep anything I add to the annex on my phone. Best of all worlds! Impressed how flexible preferred content is. More full-sync folders can be added like this:
include=Music/* or include=Books/* or include=Notes/* or present
To add them, I first had to figure out the uuid of my phone repo. So I added a new tab on android, and did
cd /sdcard/annex
git config annex.uuid
Then I went to one of my computers, and did
git annex vicfg
And changed the line
content [phone-uuid] = standard
to
content [phone-uuid] = include=Music/* or include=Books/* or include=Notes/* or present
And waited for it to sync.
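
Since every full-sync folder needs its own `include=` term, the expression can be built from a list of folder names instead of being typed by hand. A sketch, assuming a made-up helper name `phone_wanted`; it only prints the expression, which still has to be pasted into `git annex vicfg`.

```shell
# Build "include=DIR/* or ... or present" from the folder names given.
phone_wanted() {
    wanted=""
    for dir in "$@"; do
        wanted="${wanted}include=${dir}/* or "
    done
    printf '%spresent\n' "$wanted"
}

# Example: the three full-sync folders used above.
phone_wanted Music Books Notes
```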


@ -0,0 +1,14 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.246"
subject="comment 1"
date="2013-11-16T17:29:03Z"
content="""
That's great, that's how I hoped people would be able to use preferred content settings.
I'd suggest adding support for archive directories to this. So if you create a file on the phone and are done with it, you can move it to an archive directory, and it will then be dropped from the phone once it reaches an archive repository.
This should accomplish that. (Untested)
`((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) and (include=Music/* or include=Books/* or present)`
"""]]

View file

@ -0,0 +1,46 @@
[[todo/wishlist: an "assistant" for web-browsing -- tracking the sources of the downloads]] suggests using git-annex as a tool to store downloads tied
to their URLs. This also enables people to have their files stored offline,
while being able to git annex drop them at any time and redownload them
with git annex get. Additionally, a clone of the repo can be used to
download whatever files are desired from online.
This tip explains how to implement a similar system to the one described in
the linked wishlist with existing software and features of git-annex.
The first step is to install the Firefox plugin
[FlashGot](http://flashgot.net/). We will use it to provide the Firefox
shortcuts to add things to our annex.
We also need a normal download manager, if we want to get status updates as
the download is done. We'll need to configure git-annex to use it by
setting `annex.web-download-command` as Joey describes in his comment on
[[todo/wishlist: allow configuration of downloader for addurl]]. See the
manpage [[git-annex]] for more information on setting configuration.
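For illustration, the setting could look something like this. git-annex replaces `%url` and `%file` in the configured command; `curl` only stands in here for whatever download manager you prefer, and the flags are this example's choice, not a recommendation.

```sh
# Run inside the annex. %url is the url to download and %file is the
# file it should be saved to; git-annex substitutes both.
# curl is only a placeholder for your download manager of choice.
git config annex.web-download-command 'curl -L -o %file %url'
```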
Once we have installed all that, we need a script that has an interface
which FlashGot can treat as a downloader, but which calls git-annex to do
the actual downloading. Such a script is available from
<https://gist.github.com/andyg0808/5342434>. Download it and store it
somewhere it can live, or cut and paste:
[[!format sh """
#!/bin/bash
# $1=folder to cd to (must be a git annex repo)
# $2=URL to download
cd "$1" || exit 1
git-annex addurl "$2"
"""]]
Finally, we need to configure FlashGot to use the script as a downloader.
Go to Tools > Add-ons in Firefox. Click "Preferences" on FlashGot. Click
the Add button next to the list of download managers. Enter a name for the
git-annex downloader. Choose the script that was downloaded from the
"Locate executable file" dialog that appears. Now set the command line
arguments template to be "[FOLDER] [URL]" (you can find more substitution
expressions in the Placeholders dropdown above the Command line arguments
template field). You're done!
Go ahead and test it by trying to download a file using FlashGot. It should
offer as one of its available download managers the new manager you created
just above. Select it and have fun!

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 1"
date="2013-04-11T20:16:02Z"
content="""
As of my last commit, you don't really need a separate download manager. The webapp will now display urls that `git annex addurl` is downloading among the other transfers.
"""]]

View file

@ -0,0 +1,31 @@
[[!meta title="using assume-unchanged to speed up git with large trees of annexed files"]]
Git update-index's assume-unchanged feature can be used to speed
up `git status` and stuff by not statting the whole tree looking for changed
files.
This feature works quite well with git-annex. Especially because git
annex's files are immutable, so aren't going to change out from under it,
this is a nice fit. If you have a very large tree and `git status` is
annoyingly slow, you can turn it on:
git config core.ignoreStat true
When `git mv` and `git rm` are used, those changes *do* get noticed, even
on assume-unchanged files. When new files are added, eg by `git annex add`,
they are also noticed.
There are two gotchas. Both occur because `git add` does not stage
assume-unchanged files.
1. When an annexed file is moved to a different directory, it updates
the symlink, and runs `git add` on it. So the file will move,
but the changed symlink will not be noticed by git and it will commit a
dangling symlink.
2. When using `git annex migrate`, it changes the symlink and `git adds`
it. Again this won't be committed.
These can be worked around by running `git update-index --really-refresh`
after performing such operations. I hope that `git add` will be changed
to stage changes to assume-unchanged files, which would remove this
only complication. --[[Joey]]
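Until then, the workaround can be wrapped up so it is not forgotten. This is only a sketch; `migrate_and_refresh` is a made-up name, not a git-annex command.

```sh
#!/bin/sh
# Hypothetical wrapper: migrate annexed files, then make git re-check
# everything so the changed symlinks are noticed despite core.ignoreStat.
migrate_and_refresh() {
    git annex migrate "$@" &&
        git update-index --really-refresh
}

# Example: migrate_and_refresh bigfile.iso
```

The same `git update-index --really-refresh` call applies after moving an annexed file to a different directory.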

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://me.yahoo.com/a/2djv2EYwk43rfJIAQXjYt_vfuOU-#a11a6"
nickname="Olivier R"
subject="It doesn't work 100%"
date="2012-05-03T21:42:54Z"
content="""
When you remove tracked files... it doesn't show the new status. it's like if the file was ignored.
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnxlx1UrzVhdy6_gFjzmF42x6QXxBUxg00"
nickname="Jakukyo"
subject="comment 2"
date="2013-09-05T12:14:42Z"
content="""
> There are two gotchas...
So just always run `git annex add` after editing a file
and `git update-index --really-refresh` after migrating
backend?
"""]]

View file

@ -0,0 +1,15 @@
Normally git-annex does not retrieve file contents when checking out a
tree. In some use cases, it makes sense to always have the contents of
files available after a `git checkout` or `git update`. This can be
accomplished by installing the following as `.git/hooks/post-checkout`:
#!/bin/sh
# Uses git-annex to get all files in the specified directories
# (relative to the top of the repository) on checkout.
dirs=.
top="$(git rev-parse --show-toplevel)"
for dir in $dirs; do git annex get "$top/$dir"; done
By default, all files in the whole repository will be made available. The
`dirs` setting can be configured if you only want to get files in certain
directories.
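Git only runs hooks that are executable, so the file needs `chmod +x` as well. A small installer sketch follows; `install_post_checkout_hook` is a hypothetical name, and the hook body is a version of the script above with the loop quoted so that `dirs` can hold several space-separated directories.

```sh
#!/bin/sh
# Hypothetical installer: write the hook into a repository and make it
# executable. $1 is the top of the repository (defaults to the current dir).
install_post_checkout_hook() {
    repo="${1:-.}"
    hook="$repo/.git/hooks/post-checkout"
    cat > "$hook" <<'EOF'
#!/bin/sh
# Get the contents of all annexed files in $dirs after each checkout.
dirs=.
top="$(git rev-parse --show-toplevel)"
for dir in $dirs; do git annex get "$top/$dir"; done
EOF
    chmod +x "$hook"
}

# Example: install_post_checkout_hook ~/annex
```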

View file

@ -0,0 +1,2 @@
When git annex runs fsck on (for example) a GPG-encrypted special directory remote, it first transfers the whole file into the .git/annex/tmp directory.
If your annex is on an SSD, it's a good idea to make .git/annex/tmp a symlink to, say, /var/tmp so the SSD isn't worn down. This may actually be a better default.
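One thing to keep in mind is that a file cannot be renamed across filesystems, so anything that moves files out of .git/annex/tmp with a plain rename will fail once the symlink points at another filesystem. A rough way to see which case you are in is sketched below; the helper names are invented for this example, and it assumes mount points without spaces.

```sh
#!/bin/sh
# Print the mount point of the filesystem holding a path, using the
# last column of POSIX `df -P` output.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed only when both paths live on the same mounted filesystem.
same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Example: same_filesystem .git/annex /var/tmp || echo "different filesystems"
```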

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawln3ckqKx0x_xDZMYwa9Q1bn4I06oWjkog"
nickname="Michael"
subject="comment 1"
date="2013-07-31T15:15:41Z"
content="""
Of course, this only works when /var/tmp isn't on SSD itself. Perhaps tmpfs (e.g. a /tmp on many distros) is good -- after checking that there's enough space to transfer a particular file.
"""]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawln3ckqKx0x_xDZMYwa9Q1bn4I06oWjkog"
nickname="Michael"
subject="there's a problem"
date="2013-08-04T17:15:05Z"
content="""
If .git/annex/tmp is a symlink to another fs, then adding doesn't work:
add file1.jpg (checksum...)
git-annex: /path/to/.git/annex/tmp/tmp30148: rename: unsupported operation (Invalid cross-device link)
It looks like it would be good to have two types of tmp directories here, one for adding, another one for verifying (and that one could be redirected off SSD).
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="guilhem"
ip="46.239.117.180"
subject="comment 3"
date="2013-08-19T01:05:40Z"
content="""
A nice feature would be to perform the `fsck` on the (encrypted) remote itself, as it would avoid cluttering either the network or the tmpdir. However, that requires some changes in git-annex's backend. Indeed it would no longer be enough to store a single digest per (plain) file: a new digest needs to be stored for each encrypted copy. It is not necessarily a big deal, but the backend would need to be reorganized carefully.
"""]]

View file

@ -0,0 +1,75 @@
If you are starting from nothing (no existing `git` or `git-annex` repository) and want to use a server as a centralised repository, try the following steps.
On the server where you'll hold the "master" repository:
server$ cd /one/git
server$ mkdir m
server$ cd m
server$ git init --bare
Initialized empty Git repository in /one/git/m/
server$ git annex init origin
init origin ok
server$
Clone that to the laptop:
laptop$ cd /other
laptop$ git clone ssh://server//one/git/m
Cloning into 'm'...
Warning: No xauth data; using fake authentication data for X11 forwarding.
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (5/5), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.
laptop$ cd m
laptop$ git annex init laptop
init laptop ok
laptop$
Merge the `git-annex` repository (this is the bit that is often
overlooked!):
laptop$ git annex merge
merge . (merging "origin/git-annex" into git-annex...)
ok
laptop$
Add some content:
laptop$ git annex addurl http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg
"kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg"
addurl kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg (downloading http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg ...) --2011-12-15 08:13:10-- http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg
Resolving kitenet.net (kitenet.net)... 2001:41c8:125:49::10, 80.68.85.49
Connecting to kitenet.net (kitenet.net)|2001:41c8:125:49::10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39362757 (38M) [audio/ogg]
Saving to: `/other/m/.git/annex/tmp/URL--http&c%%kitenet.net%~joey%screencasts%git-annex_coding_in_haskell.ogg'
100%[======================================>] 39,362,757 2.31M/s in 17s
2011-12-15 08:13:27 (2.21 MB/s) - `/other/m/.git/annex/tmp/URL--http&c%%kitenet.net%~joey%screencasts%git-annex_coding_in_haskell.ogg' saved [39362757/39362757]
(checksum...) ok
(Recording state in git...)
laptop$ git commit -m 'See Joey play.'
[master (root-commit) 106e923] See Joey play.
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 120000 kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg
laptop$
All fine, now push it back to the centralised master:
laptop$ git push
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (18/18), 1.50 KiB, done.
Total 18 (delta 1), reused 1 (delta 0)
To ssh://server//one/git/m
3ba1386..ad3bc9e git-annex -> git-annex
laptop$
You can add more "client" repositories by following the `laptop`
sequence of operations.
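That sequence (clone, init, merge) can be collected into one function. This is a sketch only; `setup_client` and its argument order are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: clone the central repository and initialise the
# clone as a git-annex client with the given description.
# $1 = repository URL, $2 = directory to clone into, $3 = description
# The body runs in a subshell so the caller's working directory is kept.
setup_client() (
    git clone "$1" "$2" &&
        cd "$2" &&
        git annex init "$3" &&
        git annex merge
)

# Example: setup_client ssh://server//one/git/m /other/m desktop
```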

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joey.kitenet.net/"
nickname="joey"
subject="comment 1"
date="2011-12-23T19:19:53Z"
content="""
See also: [[centralized_git_repository_tutorial]]
"""]]

View file

@ -0,0 +1,140 @@
The [[walkthrough]] builds up a decentralized git repository setup, but
git-annex can also be used with a centralized bare repository, just like
git can. This tutorial shows how to set up a centralized repository hosted on
GitHub.
## set up the repository, and make a checkout
I've created a repository for technical talk videos, which you can
[fork on Github](https://github.com/joeyh/techtalks).
Or make your own repository on GitHub (or elsewhere) now.
On your laptop, [[install]] git-annex, and clone the repository:
# git clone git@github.com:joeyh/techtalks.git
# cd techtalks
Tell git-annex to use the repository, and describe where this clone is
located:
# git annex init 'my laptop'
init my laptop ok
Let's tell git-annex that GitHub doesn't support running git-annex-shell there.
This means you can't store annexed file *contents* on GitHub; it would
really be better to host the bare repository on your own server, which
would not have this limitation. (If you want to do that, check out
[[using_gitolite_with_git-annex]].)
# git config remote.origin.annex-ignore true
## add files to the repository
Add some files, obtained however.
# youtube-dl -t 'http://www.youtube.com/watch?v=b9FagOVqxmI'
# git annex add *.mp4
add Haskell_Amuse_Bouche-b9FagOVqxmI.mp4 (checksum) ok
(Recording state in git...)
# git commit -m "added a video. I have not watched it yet but it sounds interesting"
This file is available directly from the web; so git-annex can download it:
# git annex addurl http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg
addurl kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg
(downloading http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg ...)
(checksum...) ok
(Recording state in git...)
# git commit -a -m 'added a screencast I made'
Feel free to rename the files, etc, using normal git commands:
# git mv Haskell_Amuse_Bouche-b9FagOVqxmI.mp4 Haskell_Amuse_Bouche.mp4
# git mv kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg git-annex_coding_in_haskell.ogg
# git commit -m 'better filenames'
Now push your changes back to the central repository. This first time,
remember to push the git-annex branch, which is used to track the file
contents.
# git push origin master git-annex
To git@github.com:joeyh/techtalks.git
* [new branch] master -> master
* [new branch] git-annex -> git-annex
That push went fast, because it didn't upload large videos to GitHub.
To check this, you can ask git-annex where the contents of the videos are:
# git annex whereis
whereis Haskell_Amuse_Bouche.mp4 (1 copy)
767e8558-0955-11e1-be83-cbbeaab7fff8 -- here
ok
whereis git-annex_coding_in_haskell.ogg (2 copies)
00000000-0000-0000-0000-000000000001 -- web
767e8558-0955-11e1-be83-cbbeaab7fff8 -- here
ok
## make more checkouts
So far you have a central repository, and a checkout on a laptop.
Let's make another checkout that's used as a backup. You can put it anywhere
you like, just make it be somewhere your laptop can access. A few options:
* Put it on a USB drive that you can plug into the laptop.
* Put it on a desktop.
* Put it on some server in the local network.
* Put it on a remote VPS.
I'll use the VPS option, but these instructions should work for
any of the above.
# ssh server
server# sudo apt-get install git-annex
Clone the central repository as before. (If the clone fails, you need
to add your server's ssh public key to github -- see
[this page](http://help.github.com/ssh-issues/).)
server# git clone git@github.com:joeyh/techtalks.git
server# cd techtalks
server# git config remote.origin.annex-ignore true
server# git annex init 'backup'
init backup (merging origin/git-annex into git-annex...) ok
Notice that the server does not have the contents of any of the files yet.
If you run `ls`, you'll see broken symlinks. We want to populate this
backup with the file contents, by copying them from your laptop.
Back on your laptop, you need to configure a git remote for the backup.
Adjust the ssh url as needed to point to wherever the backup is. (If it
was on a local USB drive, you'd use the path to the repository instead.)
# git remote add backup ssh://server/~/techtalks
Now git-annex on your laptop knows how to reach the backup repository,
and can do things like copy files to it:
# git annex copy --to backup git-annex_coding_in_haskell.ogg
copy git-annex_coding_in_haskell.ogg (checking backup...)
12877824 2% 255.11kB/s 00:00
ok
You can also `git annex move` files to it, to free up space on your laptop.
And then you can `git annex get` files back to your laptop later on, as
desired.
After you use git-annex to move files around, remember to push,
which will broadcast its updated location information.
# git push
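When several checkouts push to the same central repository, a push can be rejected as a non-fast-forward. One way to deal with that, sketched here on the assumption that the remote is called `origin` and the branch `master` as in this tutorial, is to fetch and merge both branches before pushing. `merge_and_push` is a made-up name.

```sh
#!/bin/sh
# Hypothetical helper: bring in what other checkouts have pushed, merge
# the git-annex branch, then push both branches to the central repository.
merge_and_push() {
    git fetch origin &&
        git merge origin/master &&
        git annex merge &&
        git push origin master git-annex
}
```

Running `git annex sync` is the shorter alternative if you don't need to control each step yourself.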
## take it farther
Of course you can create as many checkouts as you desire. If you have a
desktop machine too, you can make a checkout there, and use `git remote
add` to also let your desktop access the backup repository.
You can add remotes for each direct connection between machines you find you
need -- so make the laptop have the desktop as a remote, and the desktop
have the laptop as a remote, and then on either machine git-annex can
access files stored on the other.

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkC0W3ZQERUaTkHoks6k68Tsp1tz510nGo"
nickname="Georg"
subject="sync, push, pull with/to/from centralized bare repository"
date="2013-10-07T06:45:19Z"
content="""
Hi Joey,
thanks for the tutorial with the centralized repo. I am currently trying to set up a central bare repo for two clients (they cannot communicate directly with each other). I am not sure if I am pushing/pulling the right way.
On the server I did:
git init --bare
git annex init origin
On Client Alice (I want to give Bob a chance to call \"git annex get\" from \"origin\"):
git clone ssh://tktest@192.168.56.104/~/annex .
git annex init Alice
git annex merge
git annex add .
git commit -a -m \"Added tutorial\"
git push origin master git-annex
git annex copy . --to origin
On Client Bob I have called \"clone, init, merge, add, push, copy\" also.
Now the tricky part - do I have to call \"git annex sync\" at Alice's side to get the updates from Bob over origin?
I ran into trouble when I called \"copy --to origin\" before \"git push origin master git-annex\". How can I resolve a non-fast-forward on the git-annex branch?
Some notes about how to sync over a central bare repo would be nice here =)
Thanks a lot, Georg
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.153.253.80"
subject="How can I resolve a non-fast-forward on the git-annex branch?"
date="2013-10-07T17:08:32Z"
content="""
By either running `git annex sync`, or if you want to pull and push yourself, by running `git annex merge` before pushing.
"""]]

View file

@ -0,0 +1,63 @@
You can use git-annex as a podcatcher, to download podcast contents.
No additional software is required, but your git-annex must be built
with the Feeds feature (run `git annex version` to check).
All you need to do is put something like this in a cron job:
`cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url`
This downloads the urls, and parses them as RSS, Atom, or RDF feeds.
All enclosures are downloaded and added to the repository, the same as if you
had manually run `git annex addurl` on each of them.
git-annex will avoid downloading a file from a feed if its url has already
been stored in the repository before. So once a file is downloaded,
you can move it around, delete it, `git annex drop` its content, etc,
and it will not be downloaded again by repeated runs of
`git annex importfeed`. Just how a podcatcher should behave.
## templates
To control the filenames used for items downloaded from a feed,
there's a --template option. The default is
`--template='${feedtitle}/${itemtitle}${extension}'`
Other available template variables:
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid
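Note that the template has to be single-quoted, or the shell will try to expand `${feedtitle}` itself before git-annex sees it. A small sketch, using a made-up wrapper name and a template that puts `itemid` in front of the title (the choice of variables is only an example):

```sh
#!/bin/sh
# Hypothetical wrapper around importfeed with a custom filename template.
# Single quotes keep the shell from expanding ${...}, so the template
# reaches git-annex unchanged.
importfeed_with_id() {
    git annex importfeed \
        --template='${feedtitle}/${itemid}_${itemtitle}${extension}' "$@"
}

# Example: importfeed_with_id http://url/to/podcast
```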
## catching up
To catch up on a feed without downloading its contents,
use `git annex importfeed --relaxed`, and delete the symlinks it creates.
Next time you run `git annex importfeed` it will only fetch any new items.
## fast mode
To add a feed without downloading its contents right now,
use `git annex importfeed --fast`. Then you can use `git annex get` as
usual to download the content of an item.
## storing the podcast list in git
You can check the list of podcast urls into git right next to the
files it downloads. Just make a file named feeds and add one podcast url
per line.
Then you can run git-annex on all the feeds:
`xargs git-annex importfeed < feeds`
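For a cron job, this can be wrapped in a small script. The sketch below also skips blank lines and `#` comments in the feeds file; that is an extra assumed here, since the plain `xargs` line only needs one url per line. The function names are invented for this example.

```sh
#!/bin/sh
# Print the feed urls from a list file, skipping blank lines and
# lines starting with '#'. Comment support is an assumption of this
# sketch, not something git-annex requires.
list_feeds() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Import every feed in the list; intended to be run from cron.
# $1 = repository directory, containing a file named "feeds".
# The body runs in a subshell so the caller's directory is unchanged.
import_all_feeds() (
    cd "$1" || exit 1
    list_feeds feeds | xargs git annex importfeed
)

# Example: import_all_feeds ~/podcasts
```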
## distributed podcatching
A nice benefit of using git-annex as a podcatcher is that you can
run `git annex importfeed` on the same url in different clones
of a repository, and `git annex sync` will sync it all up.
## centralized podcatching
You can also have a designated machine which always fetches all podcasts
to local disk and stores them. That way, you can archive podcasts with
time-delayed deletion of upstream content. You can also work around slow
downloads upstream by podcatching to a server with ample bandwidth or work
around a slow local Internet connection by podcatching to your home server
and transferring to your laptop on demand.

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="2001:4978:f:21a::2"
subject="comment 10"
date="2013-08-05T16:47:30Z"
content="""
`cabal install feed` should get the necessary library installed so that git-annex will build with feeds support.
"""]]

View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="http://a-or-b.myopenid.com/"
ip="220.244.41.108"
subject="comment 11"
date="2013-08-06T04:20:16Z"
content="""
$ cabal install feed
Resolving dependencies...
All the requested packages are already installed:
feed-0.3.9.1
Use --reinstall if you want to reinstall anyway.
Then I reinstalled `git-annex` but it still doesn't find the feeds flag.
$ git annex version
git-annex version: 4.20130802
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV FsEvents XMPP DNS
Do I need to do something like:
cabal install git-annex --bindir=$HOME/bin -f\"-assistant -webapp -webdav -pairing -xmpp -dns -feed\"
...but what are the default flags to include in addition to `-feed`
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="2001:4978:f:21a::2"
subject="comment 12"
date="2013-08-06T04:24:10Z"
content="""
-f-Feed will disable the feature. -fFeed will try to force it on.
You can probably work out what's going wrong using cabal install -v3
"""]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="http://a-or-b.myopenid.com/"
ip="220.244.41.108"
subject="comment 13"
date="2013-08-06T05:42:45Z"
content="""
So I ran `cabal install -v3` and looked at the output,
Flags chosen: feed=True, tdfa=True, testsuite=True, android=False,
production=True, dns=True, xmpp=True, pairing=True, webapp=True,
assistant=True, dbus=True, inotify=True, webdav=True, s3=True
This looks like feed should be on.
There doesn't appear to be any errors in the compile either.
Is it as simple as a bug where this flag just doesn't show in the `git annex version` command?
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="2001:4978:f:21a::2"
subject="comment 14"
date="2013-08-07T16:03:12Z"
content="""
Yes, it did turn out to be as simple as my having forgotten that I have to manually add features to the version list.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://23.gs/"
ip="46.165.197.5"
subject="No file extension?"
date="2013-08-12T13:21:50Z"
content="""
It seems git-annex is a bit overzealous when sanitizing the file extension, currently I get: \"Nerdkunde/Let_s_go_to_the_D_M_C_A_m4a\" from http://www.nerdkunde.de/episodes.m4a.rss with the default template and only \"Nerdkunde/Let_s_go_to_the_D_M_C_A._m4a\" if I add the \".\" in the template myself...
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="arand"
ip="130.243.226.21"
subject="comment 16"
date="2013-08-12T13:32:46Z"
content="""
The filename extension is a known issue and already fixed in the development version, see <http://git-annex.branchable.com/bugs/importfeed_uses___34____95__foo__34___as_extension/>
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlpKmTa1OPwy5Jk24pOoD8Vlo2jahzTPnw"
nickname="Stephen"
subject="rss authentication"
date="2013-08-13T13:32:52Z"
content="""
If a podcast requires authentication, is there a way to pass credentials through? I tried `http://user:pass@site.com/rss.xml` but it didn't work.
"""]]

View file

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="http://www.joachim-breitner.de/"
nickname="nomeata"
subject="--fast and --relaxed"
date="2013-08-16T07:27:59Z"
content="""
Hi,
the explanations to --fast and --relaxed on this page could be extended a bit. I looked it up in the man page, but it is not yet clear to me when I would use one or the other with feeds. Also, does “Next time you run git annex addurl it will only fetch any new items.” really only apply to --relaxed, and not --fast?
Furthermore, it would be good if there were a template variable `itemnum` that I can use to ensure that `ls` prints the casts in the right order, even when the titles of the items are not helpful.
Greetings,
Joachim
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.63"
subject="comment 19"
date="2013-08-22T15:25:02Z"
content="""
I would expect user:pass@site.com to work if the site is using http basic auth. `importfeed` just runs `wget` (or `curl`) to do all downloads, and wget's documentation says that works. It also says you can use ~/.netrc to store the password for a site.
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="ckeen"
ip="79.249.110.228"
subject="Filename too long"
date="2013-07-30T14:39:44Z"
content="""
It seems that some of my feeds get stored into keys that generate a too long filename:
podcasts/.git/annex/tmp/b1f_325_URL-s143660317--http&c%%feedproxy.google.com%~r%mixotic%~5%urTIRWQK2OQ%Mixotic__258__-__Michael__Miller__-__Galactic__Technolgies.mp3.log.web:
openBinaryFile: invalid argument (File name too long)
Is there a way to work around this?
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.63"
subject="comment 20"
date="2013-08-22T15:29:11Z"
content="""
The git-annex man page has a bit more to say about --relaxed and --fast. Their behavior when used with `importfeed` is the same as with `addurl`.
If the podcast feed provides an `itemid`, you can use that in the filename template. I don't know how common that is. Due to the way `importfeed` works, it cannot keep track of eg, an incrementing item number itself.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.21"
subject="comment 2"
date="2013-07-30T17:16:07Z"
content="""
@ckeen You seem to be using a filesystem that does not support filenames 150 characters long. This is unusual -- even windows and android can support a filename up to 255 characters in length. `git-annex addurl` already deals with this sort of problem by limiting the filename to 255 characters. If you'd like to file a bug report with details about your system, I can try to make git-annex support its limitations, I suppose.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://www.joachim-breitner.de/"
nickname="nomeata"
subject="Great stuff!"
date="2013-07-30T21:21:57Z"
content="""
Looking forward to seeing it in Debian unstable; where it will definitely replace my hpodder setup.
I guess there is no easy way to re-use the files already downloaded with hpodder? At first I thought that `git annex importfeed --relaxed` followed by adding the files to the git annex would work, but `importfeed` stores URLs, not content-based hashes, so it wouldn't match up.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.21"
subject="comment 4"
date="2013-07-30T21:29:50Z"
content="""
@nomeata, well, you can, but it has to download the files again.
When run without --fast, `importfeed` does use content based hashes, so if you run it in a temporary directory, it will download the content redundantly, hash it and see it's the same, and add the url to that hash. You can then delete the temporary directory, and the files hpodder had downloaded will have the url attached to them now. I don't know if this really buys you anything over deleting the hpodder files and starting over though.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="ckeen"
ip="79.249.110.228"
subject="Force a reload of a feed?"
date="2013-07-31T10:35:50Z"
content="""
Currently I have my podcasts imported with --fast. For some reason there are podcast episodes missing. This probably happened during my period of toying with the feature. If I retry on a clean annex I see all episodes. My suspicion is that git-annex has been interrupted during downloading a feed but now somehow thinks it's already there. How can I debug this situation and/or force git annex to retry all the links in a feed?
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.21"
subject="use the force"
date="2013-07-31T16:20:39Z"
content="""
The only way it can skip downloading a file is if its url has already been seen before. Perhaps you deleted them?
I've made `importfeed --force` re-download files it's seen before.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="ckeen"
ip="78.108.63.46"
subject="--force reload all URLs"
date="2013-08-01T09:47:34Z"
content="""
Is it intentionally saving URLs with a prefixed 2_? I have sorted out all missing URLs and renamed it, so no harm done, but it has been a bit of a hassle to get there.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.152.108.145"
subject="comment 8"
date="2013-08-01T16:05:10Z"
content="""
I've now made importfeed --force a bit smarter about reusing existing files.
"""]]

View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="http://a-or-b.myopenid.com/"
ip="220.244.41.108"
subject="How do I switch on the 'feeds' feature?"
date="2013-08-05T04:52:41Z"
content="""
Joey - your initial post said:
git-annex must be built with the Feeds feature (run git annex version to check).
...but how do I actually switch on the feeds feature?
I install git-annex from cabal, so I do
cabal update
cabal install git-annex
which I did this morning and now `git annex version` gives me:
git-annex version: 4.20130802
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV FsEvents XMPP DNS
So it is the latest version, but without Feeds. :-(
"""]]

View file

@ -0,0 +1,28 @@
dropboxannex
=========
Hook program for gitannex to use dropbox as backend
# Requirements:
python2
Credit for the Dropbox api interface goes to Dropbox.
# Install
Clone the git repository in your home folder.
git clone git://github.com/TobiasTheViking/dropboxannex.git
This should make a ~/dropboxannex folder
# Setup
Run the program once to set it up.
cd ~/dropboxannex; python2 dropboxannex.py
# Commands for gitannex:
git config annex.dropbox-hook '/usr/bin/python2 ~/dropboxannex/dropboxannex.py'
git annex initremote dropbox type=hook hooktype=dropbox encryption=shared
git annex describe dropbox "the dropbox library"

View file

@ -0,0 +1,20 @@
bergey has developed an emacs mode for browsing git-annex repositories,
dired style.
<https://gitorious.org/emacs-contrib/annex-mode>
Locally available files are colored differently, and pressing g runs
`git annex get` on the file at point.
----
John Wiegley has developed a brand new git-annex interaction mode for
Emacs, which aims to integrate with the standard facilities
(C-x C-q, M-x dired, etc) rather than invent its own interface.
<https://github.com/jwiegley/git-annex-el>
He has also added support to org-attach; if
`org-attach-git-annex-cutoff` is non-nil and smaller than the size
of the file you're attaching, then org-attach will `git annex add` the
file; otherwise it will `git add` it.

View file

@ -0,0 +1,21 @@
Maybe you had a lot of files scattered around on different drives, and you
added them all into a single git-annex repository. Some of the files are
surely duplicates of others.
While git-annex stores the file contents efficiently, it would still
help in cleaning up this mess if you could find, and perhaps remove
the duplicate files.
Here's a command line that will show duplicate sets of files grouped together:
git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --all-repeated=separate -f1 | \
sed 's/ [^ ]*$//'
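The pipeline above splits each line on spaces, so file names containing spaces confuse it. A possible variation, offered as a sketch rather than a tested replacement, prints the key first and groups the lines with awk. `group_duplicates` is a name invented for this example, and file names containing newlines are still not handled.

```sh
#!/bin/sh
# Read "key filename" lines on stdin and print the names of files that
# share a key, one group per key, with a blank line between groups.
# Everything after the first space is treated as the file name.
group_duplicates() {
    sort | awk '
        {
            key = $1
            file = substr($0, length(key) + 2)
            count[key]++
            files[key] = files[key] file "\n"
            if (count[key] == 1) order[++n] = key
        }
        END {
            first = 1
            for (i = 1; i <= n; i++) {
                key = order[i]
                if (count[key] > 1) {
                    if (!first) print ""
                    printf "%s", files[key]
                    first = 0
                }
            }
        }'
}

# Example:
# git annex find --include '*' --format='${escaped_key} ${file}\n' | group_duplicates
```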
Here's a command line that will remove one of each duplicate set of files:
git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
xargs -d '\n' git rm
--[[Joey]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmTNrhkVQ26GBLaLD5-zNuEiR8syTj4mI8"
nickname="Juan"
subject="comment 10"
date="2013-08-31T18:20:58Z"
content="""
I'm already spreading the word. Handling scientific papers, data, simulations and code has been quite a challenge during my academic career. While code was solved long ago, the three first items remained a huge problem.
I'm sure many of my colleagues will be happy to use it.
Is there any hashtag or twitter account? I've seen that you collected some of my tweets, but I don't know how you did it. Did you search for git-annex?
Best,
Juan
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="Cool"
date="2011-12-23T19:16:50Z"
content="""
Very nice :) Just for reference, here's [my Perl implementation](https://github.com/aspiers/git-config/blob/master/bin/git-annex-finddups). As per [this discussion](http://git-annex.branchable.com/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/#comment-fb15d5829a52cd05bcbd5dc53edaffb2) it would be interesting to benchmark these two approaches and see if one is substantially more efficient than the other w.r.t. CPU and memory usage.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="bremner"
ip="156.34.89.108"
subject="problems with spaces in filenames"
date="2012-09-05T02:12:18Z"
content="""
note that the sort -k2 doesn't work right for filenames with spaces in them. On the other hand, git-rm doesn't seem to like the escaped names from escaped_file.
"""]]

View file

@ -0,0 +1,39 @@
[[!comment format=mdwn
username="mhameed"
ip="82.32.202.53"
subject="problems with spaces in filenames"
date="Wed Sep 5 09:38:56 BST 2012"
content="""
Spaces and other special chars can make filename handling ugly.
If you don't have a restriction on keeping the exact filenames, then
it might be easiest just to get rid of the problematic chars.
    #!/bin/bash
    # Recursively rename files and directories: drop [, ] and ' and
    # replace spaces with underscores.
    function process() {
        local dir="$1"
        echo "processing $dir"
        pushd "$dir" >/dev/null 2>&1 || return
        for fileOrDir in *; do
            nfileOrDir=$(echo "$fileOrDir" | sed -e 's/\[//g' -e 's/\]//g' -e 's/ /_/g' -e "s/'//g")
            if [ "$fileOrDir" != "$nfileOrDir" ]; then
                echo "renaming $fileOrDir to $nfileOrDir"
                git mv "$fileOrDir" "$nfileOrDir"
            else
                echo "skipping $fileOrDir, no need to rename."
            fi
        done
        # Recurse into subdirectories, but never into .git itself.
        find ./ -mindepth 1 -maxdepth 1 -type d ! -name .git | while read -r d; do
            process "$d"
        done
        popd >/dev/null 2>&1
    }
    process .
Maybe you can run something like this before checking for duplicates.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="bremner"
ip="156.34.89.108"
subject="more about spaces..."
date="2012-09-09T19:33:01Z"
content="""
Ironically, previous renaming to remove spaces, plus some syncing, is how I ended up with these duplicates. For what it is worth, aspiers' Perl script worked out for me with a small modification: I only printed out the duplicates with spaces in them (quoted).
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkaBh9VNJ-RZ26wJZ4BEhMN1IlPT-DK6JA"
nickname="Alex"
subject="printing keys first is the easiest workaround"
date="2013-04-01T23:32:23Z"
content="""
Since the keys are sure to have no spaces in them, putting them first makes working with the output with tools like sort, uniq, and awk simpler.
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnkBYpLu_NOj7Uq0-acvLgWhxF8AUEIJbo"
nickname="Chris"
subject="Find files by key"
date="2013-05-03T04:14:55Z"
content="""
Is there any simple way to search for files with a given key?
At the moment, the best I've come up with is this:
````
git annex find --include '*' --format='${key} ${file}\n' | grep <KEY>
````
where `<KEY>` is the key. This seems like an awfully longwinded approach, but I don't see anything in the docs indicating a simpler way to do it. Am I missing something?
"""]]

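As a hedged refinement of the `grep` approach in the comment above: `grep` matches substrings anywhere on the line, so it can also hit file names or longer keys that merely contain the key. A minimal sketch that compares the first field exactly, assuming the `${key} ${file}\n` format from that comment; `files_for_key` is a made-up helper name:

```shell
# files_for_key: read "KEY FILE" lines on stdin and print the files whose
# key is exactly $1. (Hypothetical helper name.)
files_for_key() {
    # Appending "" forces a string comparison in awk.
    awk -v want="$1" '($1 "") == (want "") { print substr($0, length($1) + 2) }'
}

# Usage (inside a git-annex repository):
#   git annex find --include '*' --format='${key} ${file}\n' | files_for_key "$KEY"
```
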
View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 7"
date="2013-05-13T18:42:14Z"
content="""
@Chris I guess there's no really easy way because searching for a given key is not something many people need to do.
However, git does provide a way. Try `git log --stat -S $KEY`
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmTNrhkVQ26GBLaLD5-zNuEiR8syTj4mI8"
nickname="Juan"
subject="This is an awesome feature"
date="2013-08-28T13:40:23Z"
content="""
Thanks. I have quite a lot of papers in PDF formats. Now I'm saving space, have them controlled, synchronized with many devices and found more than 200 duplicates.
Is there a way to donate to the project? You really deserve it.
Thanks.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.153.8.7"
subject="comment 9"
date="2013-08-28T20:25:20Z"
content="""
@Juan the best thing to do is tell people about git-annex, help them use it, and file bug reports. Just generally be part of the git-annex community.
(If you really want to donate to me, <http://campaign.joeyh.name/> is still open.)
"""]]

62
doc/tips/flickrannex.mdwn Normal file
View file

@ -0,0 +1,62 @@
# Latest version 0.1.10
Hook program for gitannex to use flickr as backend.
This allows storing any type of file on flickr, not only images and movies.
# Requirements:
python2
Credit for the flickr api interface goes to: <http://stuvel.eu/flickrapi>
Credit for the png library goes to: <https://github.com/drj11/pypng>
Credit for the png tEXt patch goes to: <https://code.google.com/p/pypng/issues/detail?id=65>
# Install
Clone the git repository in your home folder.
git clone git://github.com/TobiasTheViking/flickrannex.git
This should make a ~/flickrannex folder
# Setup
Run the program once to set it up.
cd ~/flickrannex; python2 flickrannex.py
After the setup has finished, it will print the git-annex configure lines.
# Configuring git-annex
git config annex.flickr-hook '/usr/bin/python2 ~/flickrannex/flickrannex.py'
git annex initremote flickr type=hook hooktype=flickr encryption=shared
git annex describe flickr "the flickr library"
# Notes
## Unencrypted mode
The photo name on flickr is currently the GPGHMACSHA1 version.
Run the following command in your annex directory
git annex wanted flickr uuid include=*.jpg or include=*.jpeg or include=*.gif or include=*.png
## Encrypted mode
The current version base64 encodes all the data, which results in ~35% larger filesize.
I might look into yyenc instead. I'm not sure if it will work in the tEXt field.
Run the following command in your annex directory
git annex wanted flickr exclude=largerthan=30mb
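
As a rough check on the size overhead mentioned above: base64 turns every 3 input bytes into 4 output characters, so the encoded payload alone is about a third larger, before any line breaks or container overhead are counted. A small sketch of that arithmetic (the function name is made up for illustration):

```shell
# b64_size: size in bytes of the base64 encoding (with padding, without
# line breaks) of an input of $1 bytes.
b64_size() {
    echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

b64_size 3000000    # prints 4000000
```
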
## Including directories as tags
To get each of the directories below the top level git directory added as tags to uploads:
git config annex.flickr-hook 'GIT_TOP_LEVEL=`git rev-parse --show-toplevel` /usr/bin/python2 %s/flickrannex.py'
In this case the image:
/home/me/annex-photos/holidays/2013/Greenland/img001.jpg
would get the following tags: "holidays" "2013" "Greenland"
(assuming "/home/me/annex-photos" is the top level in the annex...)
Caveat Emptor - Tags will *always* be NULL for indirect repos - we don't (easily) know the human-readable file name.

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmkBwMWvNKZZCge_YqobCSILPMeK6xbFw8"
nickname="develop"
subject="comment 10"
date="2013-06-07T09:39:59Z"
content="""
I'm not even sure if chunksize is exposed to the hooks at all.
As it is, the hook will check the filesize, and if the filesize is more than 30mbyte it will exit 1.
Chunking may be implemented down the road. I do believe joeyh might have some plans that will touch this issue, so I'd rather wait than re-invent the wheel yet again.
"""]]

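The size check described in the comment above could be sketched in shell roughly as follows. This is only an illustration of the idea: the real hook is a Python program, `too_big` is a made-up name, and reading "30mbyte" as 30 * 1024 * 1024 bytes is an assumption.

```shell
# too_big: succeed (return 0) when file $1 is larger than $2 bytes.
# Returns 2 if $1 is not a regular file. (Hypothetical helper name.)
too_big() {
    [ -f "$1" ] || return 2
    size=$(wc -c < "$1")
    [ "$((size))" -gt "$2" ]
}

limit=$((30 * 1024 * 1024))   # assumed meaning of "30mbyte"

# Usage, mirroring the behaviour described above:
#   if too_big "$file" "$limit"; then exit 1; fi
```
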
View file

@ -0,0 +1,46 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="git annex get failed"
date="2013-08-02T14:29:30Z"
content="""
Hi, I am coming back to this and testing Flickr as a repository for moving files about and have run into what may be my very basic misunderstanding with vanilla annex.
I copied one file to Flickr and dropped it elsewhere (--force). I assumed that the file was on Flickr ok but that the numcopies setting required the force because of the semi-trust level of the Flickr remote.
Then I find I can't get the file back, even though there is a record of it from whereis.
Can you help enlighten me as to what I am missing? I assumed whereis would only report files that exist and can be copied back. If it is not my error, I can raise a bug or search for logs. Thanks in advance for any help.
[[!format perl \"\"\"
nrb@nrb-ThinkPad-T61:~/tmp$ git annex whereis
whereis libpeerconnection.log (3 copies)
31124688-0792-4214-9e00-7ed115aa6b8e -- flickr (the flickr library)
3e3d40d7-de8f-4591-a4ab-747d74a3b278 -- origin (my laptop)
ec2d64fc-30d6-48b4-99bf-7b1bc22d420d -- portable USB drive
ok
whereis test.cgi (1 copy)
31124688-0792-4214-9e00-7ed115aa6b8e -- flickr (the flickr library)
ok
whereis walkthrough.sh (3 copies)
31124688-0792-4214-9e00-7ed115aa6b8e -- flickr (the flickr library)
3e3d40d7-de8f-4591-a4ab-747d74a3b278 -- origin (my laptop)
ec2d64fc-30d6-48b4-99bf-7b1bc22d420d -- portable USB drive
ok
whereis walkthrough.sh~ (3 copies)
31124688-0792-4214-9e00-7ed115aa6b8e -- flickr (the flickr library)
3e3d40d7-de8f-4591-a4ab-747d74a3b278 -- origin (my laptop)
ec2d64fc-30d6-48b4-99bf-7b1bc22d420d -- portable USB drive
ok
nrb@nrb-ThinkPad-T61:~/tmp$ git annex get test.cgi
get test.cgi (from flickr...)
git-annex: /home/nrb/tmp/.git/annex/tmp/SHA256E-s48--a01eedbee949120aeda41e566f9ae8faef1c2bacaa6d7bb8e45050fb8df6d09d.cgi: rename: does not exist (No such file or directory)
failed
git-annex: get: 1 failed
nrb@nrb-ThinkPad-T61:~/tmp$
\"\"\"]]
"""]]

View file

@ -0,0 +1,58 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="re: git annex get failed"
date="2013-08-02T15:02:14Z"
content="""
Another try - this time a slightly simpler setup using my version of the walkthrough commands
[[!format bash \"\"\"
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$ git annex drop walkthrough.sh --from usbdrive
drop usbdrive walkthrough.sh ok
(Recording state in git...)
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$ git annex move walkthrough.sh --to flickr
move walkthrough.sh (gpg) (checking flickr...) (to flickr...)
/home/nrb/repos/gits/flickrannex/flickrannex.py:92: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if res:
/home/nrb/repos/gits/flickrannex/flickrannex.py:100: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
if res:
ok
(Recording state in git...)
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$ git annex whereis
whereis walkthrough.sh (1 copy)
161b7af0-2075-4314-9767-308a49b86018 -- flickr (the flickr library)
ok
whereis walkthrough.sh~ (3 copies)
161b7af0-2075-4314-9767-308a49b86018 -- flickr (the flickr library)
7803d853-d231-4bb4-b696-f12a950fb96b -- here (my laptop)
d60d75f9-d878-4214-af20-fa055134ae77 -- usbdrive (portable USB drive)
ok
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$ git annex get walkthrough.sh
get walkthrough.sh (from flickr...) (gpg)
git-annex: /home/nrb/repos/annex/laptop-annex/.git/annex/tmp/GPGHMACSHA1--02f600d7e8b071d2945270fd5e7fc26dd066ff31: openBinaryFile: does not exist (No such file or directory)
gpg: decrypt_message failed: eof
Unable to access these remotes: flickr
Try making some of these repositories available:
161b7af0-2075-4314-9767-308a49b86018 -- flickr (the flickr library)
failed
git-annex: get: 1 failed
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$ git annex fsck --from flickr
fsck walkthrough.sh (gpg) (checking flickr...) (fixing location log)
** Based on the location log, walkthrough.sh
** was expected to be present, but its content is missing.
** No known copies exist of walkthrough.sh
failed
fsck walkthrough.sh~ (checking flickr...) (fixing location log)
** Based on the location log, walkthrough.sh~
** was expected to be present, but its content is missing.
failed
(Recording state in git...)
git-annex: fsck: 2 failed
nrb@nrb-ThinkPad-T61:~/repos/annex/laptop-annex$
\"\"\" ]]
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmkBwMWvNKZZCge_YqobCSILPMeK6xbFw8"
nickname="develop"
subject="Version 0.1.10 pushed"
date="2013-09-11T20:31:25Z"
content="""
Since the initial release of this hook a lot of issues have been fixed, and a few features added.
I would highly suggest that everyone who is using this hook update to the latest version, as I would consider one of the bugs to be fairly major.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmkBwMWvNKZZCge_YqobCSILPMeK6xbFw8"
nickname="develop"
subject="comment 2"
date="2013-06-05T21:33:42Z"
content="""
Get the statically linked version from here http://git-annex.branchable.com/install/Linux_standalone/
I believe the new hook format was introduced in version 4.20130521
"""]]

View file

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="missing configuration for flickr-checkpresent-hook"
date="2013-06-05T20:44:25Z"
content="""
<https://github.com/TobiasTheViking/flickrannex/issues/3>
9 days ago: [the annex] \"hook format a few versions ago, and this is using the new hook format\".
Looks very handy. I am just starting with this, but can't seem to get it working as a remote after following the simple walkthrough. All goes well until:
$ git annex copy . --to flickr
copy walkthrough.sh (checking flickr...)
missing configuration for flickr-checkpresent-hook
git-annex: checkpresent hook misconfigured
my Ubuntu 12.04:
$ git annex version
git-annex version: 4.20130516.1
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP
local repository version: 3
default repository version: 3
supported repository versions: 3 4
upgrade supported from repository versions: 0 1 2
I guess my \"git-annex version is still too old\"? Any idea what version is needed? Even better if I can figure out which Linux distribution/release has the most up to date version of annex.
"""]]

View file

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmkBwMWvNKZZCge_YqobCSILPMeK6xbFw8"
nickname="develop"
subject="comment 4"
date="2013-06-05T22:02:29Z"
content="""
The path for the binary \"/usr/bin/python2\" is wrong.
It could be any of /usr/bin/python /usr/bin/python2.6 /usr/bin/python2.7
Or maybe in /usr/local/bin
you can try running \"which python\" or \"which python2\" to get the real path.
"""]]

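The advice above about locating the interpreter can be sketched as a small lookup. This is a hedged illustration: `first_available` is a made-up helper, the candidate names are the ones listed in the comment, and `$HOME/flickrannex` assumes the clone location from the install instructions.

```shell
# first_available: print the path of the first command in "$@" that is
# found on PATH; fail if none is. (Hypothetical helper name.)
first_available() {
    for candidate in "$@"; do
        if found=$(command -v "$candidate"); then
            printf '%s\n' "$found"
            return 0
        fi
    done
    return 1
}

# Usage (note that a plain "python" may be Python 3 on newer systems):
#   py=$(first_available python2 python2.7 python2.6 python) &&
#       git config annex.flickr-hook "$py $HOME/flickrannex/flickrannex.py"
```
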
View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="missing configuration for flickr-checkpresent-hook"
date="2013-06-05T22:00:48Z"
content="""
Many thanks.
I used gitannex-install and was left with a slight anomaly:
Installing...........done
git-annex version 4.20130601 has been installed
$ git-annex version
git-annex version: 4.20130531-g5df09b5
But I guess this includes the new hook format. I get a bit further:
$ git annex copy . --to flickr
copy walkthrough.sh (checking flickr...) (user error (sh [\"-c\",\"/usr/bin/python2 /home/nrb/repos/gits/flickrannex/flickrannex.py\"] exited 1)) failed
copy walkthrough.sh~ (checking flickr...) (user error (sh [\"-c\",\"/usr/bin/python2 /home/nrb/repos/gits/flickrannex/flickrannex.py\"] exited 1)) failed
git-annex: copy: 2 failed
"""]]

View file

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="comment 5"
date="2013-06-05T22:11:14Z"
content="""
Thanks, but on my machine I get:
$ which python2
/usr/bin/python2
I have scripted all my walkthrough commands, blowing away the test repositories and flickr settings first each time. This re-runs the flickr scripts and git config annex.flickr-hook etc.
I can't spot anything here.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmWg4VvDTer9f49Y3z-R0AH16P4d1ygotA"
nickname="Tobias"
subject="comment 6"
date="2013-06-06T09:44:11Z"
content="""
That's weird...
You could try adding \"--dbglevel 1 --stderr\" arguments to the hook command and give me the output. But the way I read the log, it seems like it doesn't even launch the python interpreter. I might be wrong though.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnaH44G3QbxBAYyDwy0PbvL0ls60XoaR3Y"
nickname="Nigel"
subject="Unencrypted flickr can only accept picture and video files"
date="2013-06-06T10:24:58Z"
content="""
Thanks, and sorry to trouble you; it is my error. I picked the unencrypted option (thinking it would be less of an issue) and am using a text file for the test, which gave this error line:
10:53:07 [flickrannex-0.1.5] main : 'Unencrypted flickr can only accept picture and video files'
I've not looked through your code yet, but could that message be printed when not in debug mode?
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmWg4VvDTer9f49Y3z-R0AH16P4d1ygotA"
nickname="Tobias"
subject="comment 8"
date="2013-06-06T10:51:39Z"
content="""
I'll make it so, in the next version I push.
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawleVyKk2kQsB_HgEdS7w1s0BmgRGy1aay0"
nickname="Milan"
subject="chunksize"
date="2013-06-07T09:09:56Z"
content="""
Hi! Does this backend support chunksize option? If yes, is it possible to set it after the remote has been added to the repository?
Thanks, Milan.
"""]]

View file

@ -0,0 +1,127 @@
[git-remote-gcrypt](https://github.com/joeyh/git-remote-gcrypt/)
adds support for encrypted remotes to git. The git-annex
[[gcrypt special remote|special_remotes/gcrypt]] allows git-annex to
also store its files in such repositories. Naturally, git-annex encrypts
the files it stores too, so everything stored on the remote is encrypted.
Here are some ways you can use this awesome stuff..
[[!toc ]]
This page will show how to set it up at the command line, but the git-annex
[[assistant]] can also be used to help you set up encrypted git
repositories.
## prerequisites
* Install
[git-remote-gcrypt](https://github.com/joeyh/git-remote-gcrypt/)
* Install git-annex version 4.20130909 or newer.
## encrypted backup drive
Let's make a USB drive into an encrypted backup repository. It will contain
both the full contents of your git repository, and all the files you
instruct git-annex to store on it, and everything will be encrypted so that
only you can see it.
First, you need to set up a gpg key. You might consider generating a
special purpose key just for this use case, since you may end up wanting to
put the key on multiple machines that you would not trust with your
main gpg key.
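
As a hedged sketch of that first step, using standard gpg commands (the prompts and defaults vary between gpg versions, and choosing a dedicated key is only the suggestion made above):

```shell
# Interactively generate a new key pair (gpg asks for name, email, passphrase).
gpg --gen-key

# List secret keys with long key ids, to find the id to pass as keyid=.
gpg --list-secret-keys --keyid-format long

# Assumption for the examples on this page: store the chosen id in $mykey.
# mykey=<id shown by the listing above>
```
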
You need to tell git-annex the keyid of the key when setting up the
encrypted repository:
git init --bare /mnt/encryptedbackup
git annex initremote encryptedbackup type=gcrypt gitrepo=/mnt/encryptedbackup keyid=$mykey
git annex sync encryptedbackup
Now you can copy (or even move) files to the repository. After
sending files to it, you'll probably want to do a sync, which pushes
the git repository changes to it as well.
git annex copy --to encryptedbackup ...
git annex sync encryptedbackup
Note that if you lose your gpg key, it will be *impossible* to get the
data out of your encrypted backup. You need to find a secure way to store a
backup of your gpg key. Printing it out and storing it in a safe deposit box,
for example.
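
One way to produce such a backup, sketched with standard gpg options. The output file name is an arbitrary example, and the exported file holds your private key, so treat it accordingly:

```shell
# Export the secret key in ASCII-armored form, suitable for printing
# or for storing on offline media. $mykey is the key id used above.
gpg --export-secret-keys --armor "$mykey" > mykey-backup.asc
```
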
You can actually specify keyid= as many times as you like to allow any one
of a set of gpg keys to access this repository. So you could add a friend's
key, or another gpg key you have.
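
For instance, a sketch of the backup drive setup with two keys. `$friendkey` is a placeholder for the second key id, and the `keyid+=` form for adding a key later is the one quoted in the comments on this page, so check it against your git-annex version:

```shell
# Set up the remote so that either key can decrypt it.
git annex initremote encryptedbackup type=gcrypt gitrepo=/mnt/encryptedbackup keyid=$mykey keyid=$friendkey

# Or, for a remote that already exists, add a further key afterwards.
git annex enableremote encryptedbackup keyid+=$friendkey
```
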
To restore from the backup, just plug the drive into any machine that has
the gpg key used to encrypt it, and then:
git clone gcrypt::/mnt/encryptedbackup restored
cd restored
git annex enableremote encryptedbackup gitrepo=/mnt/encryptedbackup
git annex get --from encryptedbackup
## encrypted git-annex repository on a ssh server
If you have a ssh server that has rsync installed, you can set up an
encrypted repository there. Works just like the encrypted drive except
without the cable.
First, on the server, run:
git init --bare encryptedrepo
(Also, install git-annex on the server if it's possible & easy to do so.
While this will work without git-annex being installed on the server, it
is recommended to have it installed.)
Now, in your existing git-annex repository, set up the encrypted remote:
git annex initremote encryptedrepo type=gcrypt gitrepo=ssh://my.server/home/me/encryptedrepo keyid=$mykey
git annex sync encryptedrepo
If you're going to be sharing this repository with others, be sure to also
include their keyids, by specifying keyid= repeatedly.
Now you can copy (or even move) files to the repository. After
sending files to it, you'll probably want to do a sync, which pushes
the git repository changes to it as well.
git annex copy --to encryptedrepo ...
git annex sync encryptedrepo
Anyone who has access to the repo and has one of the keys
used to encrypt it can check it out:
git clone gcrypt::ssh://my.server/home/me/encryptedrepo myrepo
cd myrepo
git annex enableremote encryptedrepo gitrepo=ssh://my.server/home/me/encryptedrepo
git annex get --from encryptedrepo
## private encrypted git remote on hosting site
You can use gcrypt to store your git repository in encrypted form on any
hosting site that supports git. Only you can decrypt its contents.
Using it this way, git-annex does not store large files on the hosting site; it's
only used to store your git repository itself.
git remote add encrypted gcrypt::ssh://hostingsite/myrepo.git
git push encrypted master git-annex
Now you can carry on using git-annex with your new repository. For example,
`git annex sync` will sync with it.
To check out the repository from the hosting site, use the same gcrypt::
url you used when setting it up:
git clone gcrypt::ssh://hostingsite/myrepo.git
## multiuser encrypted git remote on hosting site
Suppose two users want to share an encrypted git remote. Both of you
need to set up the remote, and configure gcrypt to encrypt it so that both
of you can see it.
git remote add sharedencrypted gcrypt::ssh://hostingsite/myrepo.git
git config remote.sharedencrypted.gcrypt-participants "$mykey $friendkey"
git push sharedencrypted master git-annex

View file

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="tanen"
ip="83.128.159.25"
subject="comment 10"
date="2013-11-04T17:58:36Z"
content="""
> \"We could symmetrically encrypt the repository with a keyfile that's stored in the repository itself\"
> Then you would need to decrypt the repository in order get the key you need to decrypt the repository. The impossibility of this design is why I didn't do that!
Sorry, I meant that the file containing the symmetric encryption key should obviously not be used to encrypt itself; it would be stored in the repository \"unencrypted\" (but protected with a passphrase).
> store a non-encrypted gpg key alongside the repsitory encrypted with it, but then you have to rely on a passphrase for all your security.
Exactly. I think such a mode would be a great addition. It might not be as secure as encryption based on a private key, depending on the passphrase strength, but it would certainly be a lot more convenient and portable (and still much more secure than the shared encryption method).
"""]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkbpbjP5j8MqWt_K4NASwv0WvB8T4rQ-pM"
nickname="Fabrice"
subject="Is there a way to specify a preferred pgp key?"
date="2013-11-01T18:57:38Z"
content="""
Hi,
I think the current behavior of the special remote is a bit annoying when one has several pgp keys.
Indeed, I've followed the encrypted backup drive example specifying the id of a dedicated key in the initremote step, so far so good. Doing that, I was prompted for my key phrase by the gnome keyring daemon, as expected.
The annoying part starts right at the git annex sync step. Indeed, when git-remote-gcrypt tries to decrypt the manifest from the encrypted remote, rather than trying only the key specified during the initremote step, it tries all my (secret) keys. This means that I get prompted for the key phrase of all those keys (minus the correct one which is already unlocked...).
In the future, this might be possible to avoid by allowing gcrypt to fetch a preferred key from git config and to use it with the --try-secret-key option available in gnupg 2.1.x. But for 1.x or 2.0.x, the simpler option --default-key does not seem to alter the order in which keys are tried to decrypt the manifest. Also, it does not seem to be a problem of the gnome keyring daemon, but rather a gpg problem, as the same problem occurs when the daemon is replaced by the standard gpg-agent.
Meanwhile, is there any way to avoid this problem?
"""]]

View file

@ -0,0 +1,21 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkbpbjP5j8MqWt_K4NASwv0WvB8T4rQ-pM"
nickname="Fabrice"
subject="A possible solution"
date="2013-11-02T14:22:13Z"
content="""
I'm answering myself :-). A possible solution to the annoying passphrase prompts with current gnupg is to use a specialized secret keyring. One first exports the secret key used for this repository into a specific keyring as follows:
`gpg --export-secret-keys keyid | gpg --import --no-default-keyring --secret-keyring mygitannexsecret.gpg`
This will create a keyring in $HOME/.gnupg with only the specific key.
Then, in the git-remote-gcrypt shell script, gpg should be called as follows
`gpg --no-default-keyring --secret-keyring mygitannexsecret.gpg -q -d ...`
when decrypting the manifest in order to try only the specific key. This behavior can be easily triggered via some git configuration variable.
Any comment?
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.47"
subject="comment 3"
date="2013-11-02T17:32:28Z"
content="""
Fabrice, I've filed a bug report about this: <https://github.com/blake2-ppc/git-remote-gcrypt/issues/9>
"""]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="tanen"
ip="83.128.159.25"
subject="comment 4"
date="2013-11-03T22:35:07Z"
content="""
The way I would want to setup git-annex (assistant) is \"Wuala/Spideroak style\": two computers with a full checkout of the repository, changes automatically being synced between them, even if the two computers are never online simultaneously, and encryption should be done locally: the (special) remote should not be able to view file listings or content.
Do I understand it correctly that the gcrypt remote is the only way to make this happen? I tried to create such a setup via the webapp but failed. Adding the repository and remote (via \"Encrypt with GnuPG key\") on the first computer went OK*, but trying to enable that remote on the other computer fails: clicking enable asks me for the SSH password, but after that I just get redirected to a blank screen, with nothing to see in the logfile after the successful call to ssh-keygen. No entry for the second computer is being added to authorized_keys on the remote.
Perhaps this is because at this point the assistant is unable to actually parse the content of the encrypted repository? I tried importing the private key that was used while creating the repository on the other computer, but that made no difference.
Thinking about this for a while, I believe gpg keys aren't actually particularly suited for this use case. Even without the bug above, one would either have to awkwardly copy a private key to all hosts that are syncing to the repository; or, every time a new (or reinstalled) host wants to sync the repository, you would manually have to add the new keyid to the config and do the forced push + GCRYPT_FULL_REPACK, presumably having to reupload your entire history. Apart from this, having to back up a private key (outside of your git-annex based backups!) would be quite inconvenient.
How would you feel about adding a new mode of operation where encryption is simply based on a passphrase? We could symmetrically encrypt the repository with a keyfile that's stored in the repository itself, protecting the keyfile with a passphrase which - if stored at all - would be stored on the individual computers, outside of the repository.
*although it erroneously used \"E0D2F776E7F674E3\" as key-id while the actual id is E7F674E3; where did that other half come from?
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmNu4V5fvpLlBhaCUfXXOB0MI5NXwh8SkU"
nickname="Adam"
subject="comment 5"
date="2013-11-04T04:40:53Z"
content="""
> How would you feel about adding a new mode of operation where encryption is simply based on a passphrase? We could symmetrically encrypt the repository with a keyfile that's stored in the repository itself, protecting the keyfile with a passphrase which - if stored at all - would be stored on the individual computers, outside of the repository.
Isn't that what the regular shared-encryption remote already does? Except it doesn't put a passphrase on the key, because anyone who has access to the local repo wouldn't need access to the remote one anyway.
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkbpbjP5j8MqWt_K4NASwv0WvB8T4rQ-pM"
nickname="Fabrice"
subject="comment 6"
date="2013-11-04T07:39:21Z"
content="""
> _How would you feel about adding a new mode of operation where encryption is simply based on a passphrase? We could symmetrically encrypt the repository with a keyfile that's stored in the repository itself, protecting the keyfile with a passphrase which - if stored at all - would be stored on the individual computers, outside of the repository._
As Adam wrote, without a passphrase, this is the shared encryption method. With an encrypted key, this is more or less the hybrid (default) scheme. The thing is that you have to share a secret to have an encrypted remote. I don't use the webapp, so I don't know what's happening in your case, but this is how it should work with the command line tools. First Alice creates the encrypted remote with her pgp key. As far as I understand, git annex creates (via gpg) a key for a symmetric cipher which is stored in the repository, encrypted with Alice's public key. If Alice wants to share the repository with Bob, she must either give a key pair (so the private key also, of course) to Bob or ask Bob for his public key. In the first case, Bob can clone the repository directly (upon reception of the key pair), while in the second case, Alice has to activate Bob's public key (with `git annex enableremote myremote keyid+=bobsId`). In this case, again as far as I understand, the symmetric key is reencrypted for both Alice and Bob in the repo.
I understand that you tried the first case with the webapp and that it did not work. I had a similar problem documented in this [bug](http://git-annex.branchable.com/bugs/git-annex-shell:_gcryptsetup_permission_denied). Maybe you could add some comments to this bug description?
> _*although it erroneously used \"E0D2F776E7F674E3\" as key-id while the actual id is E7F674E3; where did that other half come from?_
This is the long id of your pgp key (16 characters as opposed to 8 for the short id).
"""]]

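To illustrate the point about long and short key ids: the short id is simply the last 8 hexadecimal characters of the long one. A tiny sketch, with `short_keyid` as a made-up helper name:

```shell
# short_keyid: print the last 8 characters of a (long) gpg key id.
short_keyid() {
    printf '%s' "$1" | tail -c 8
}

short_keyid E0D2F776E7F674E3   # prints E7F674E3
```
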
View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="tanen"
ip="83.128.159.25"
subject="comment 7"
date="2013-11-04T09:01:13Z"
content="""
Thanks for the responses. Please correct me if I'm wrong, but the way I understood it, using the shared encryption scheme creates a conflict between \"changes being synced between them, even if the two computers are never online simultaneously\" and \"encryption should be done locally: the (special) remote should not be able to view file listings or content.\"
- If I use shared encryption \"the webapp way\", only the file contents will be rsynced to the remote, not the repository itself. This means that different hosts are unable to sync unless they are online simultaneously, so that commit data can be sent directly between them via XMPP. In practice, this would mean my hosts are never synced (because I don't keep my home computer running when I leave for work, and vice versa).
- If I use shared encryption and additionally put the repository itself on a remote, that remote would have the keys to fully decrypt the repository; that's not acceptable.
Reading through the docs again, the hybrid scheme actually seems to be closer to what I want than the shared scheme, but it still has a major downside: the encryption only applies to the files themselves, so in order to get \"offline sync\" there still has to be a 'remote' for the repository itself, which will contain all your metadata unencrypted. It would also depend on the user being able to manually set up and back up a set of gpg keys instead of just memorizing a secure passphrase.
@Fabrice Looks like the bug you found could very well be the cause of the problem I had; I'll try it again when a new version is available.
"""]]


@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkbpbjP5j8MqWt_K4NASwv0WvB8T4rQ-pM"
nickname="Fabrice"
subject="comment 8"
date="2013-11-04T10:31:56Z"
content="""
I think you are (at least partially) right. Of course, the only way to fully sync computers that are never on at the same time is to use either a usb drive or a third, always-on computer. (I have to confess I did not understand this at first when I read the git annex docs, shame on me ;-) ) If you don't want to completely trust this computer (I don't, for instance), you must:
* use an encrypted git repository on this computer;
* and use either hybrid or pubkey encryption.
But contrary to what you seem to imply (I hope I understand you correctly), if you do that, the third computer can still figure out a few things (usage patterns, such as where connections come from), but that's all. You've got full sync and everything is encrypted, both the git part and the files handled by the annex. This applies only to encrypted git special remotes, as other remotes do not store the git part.
"""]]


@ -0,0 +1,14 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.47"
subject="comment 9"
date="2013-11-04T17:07:55Z"
content="""
\"We could symetrically encrypt the repository with a keyfile that's stored in the repository itself\"
Then you would need to decrypt the repository in order to get the key you need to decrypt the repository. The impossibility of this design is why I didn't do that!
It would certainly be possible to store a non-encrypted gpg key alongside the repository encrypted with it, but then you have to rely on a passphrase for all your security.
You should file a bug report for the bug you saw.
"""]]


@ -0,0 +1,28 @@
googledriveannex
=========
Hook program for git-annex to use Google Drive as a backend

# Requirements:

    python2

Credit for the googledrive api interface goes to google

## Install

Clone the git repository in your home folder.

    git clone git://github.com/TobiasTheViking/googledriveannex.git

This should make a ~/googledriveannex folder

## Setup

Run the program once to make an empty config file

    cd ~/googledriveannex; python2 googledriveannex.py

## Commands for git-annex:

    git config annex.googledrive-hook '/usr/bin/python2 ~/googledriveannex/googledriveannex.py'
    git annex initremote googledrive type=hook hooktype=googledrive encryption=shared
    git annex describe googledrive "the googledrive library"

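The three commands above follow a fixed pattern, so they can be wrapped in a small shell function. This is only a sketch: `setup_hook_remote` is a made-up name, and it assumes the hook script is run with `/usr/bin/python2`, as in the commands above.

```shell
# Hypothetical helper: configure a hook program as a git-annex special remote.
# $1 = remote and hook name (e.g. googledrive)
# $2 = path to the hook script
setup_hook_remote() {
    name=$1
    script=$2
    git config "annex.${name}-hook" "/usr/bin/python2 ${script}" &&
    git annex initremote "$name" type=hook "hooktype=${name}" encryption=shared &&
    git annex describe "$name" "the ${name} library"
}

# Example (run inside the git-annex repository):
#   setup_hook_remote googledrive ~/googledriveannex/googledriveannex.py
```

Each step only runs if the previous one succeeded, so a failed `initremote` does not leave a misleading description behind.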

27
doc/tips/imapannex.mdwn Normal file

@ -0,0 +1,27 @@
imapannex
=========
Hook program for git-annex to use imap as a backend

# Requirements:

    python2

# Install

Clone the git repository in your home folder.

    git clone git://github.com/TobiasTheViking/imapannex.git

This should make a ~/imapannex folder

# Setup

Run the program once to set it up.

    cd ~/imapannex; python2 imapannex.py

# Commands for git-annex:

    git config annex.imap-hook '/usr/bin/python2 ~/imapannex/imapannex.py'
    git annex initremote imap type=hook hooktype=imap encryption=shared
    git annex describe imap "the imap library"
    git annex wanted imap exclude=largerthan=30mb

41
doc/tips/megaannex.mdwn Normal file

@ -0,0 +1,41 @@
[Megaannex](https://github.com/TobiasTheViking/megaannex)
is a hook program for git-annex to use mega.co.nz as a backend

# Requirements:

    python2
    requests>=0.10
    pycrypto

Credit for the mega api interface goes to:
<https://github.com/richardasaurus/mega.py>

## Install

Clone the git repository in your home folder.

    git clone git://github.com/TobiasTheViking/megaannex.git

This should make a ~/megaannex folder

## Setup

Run the program once to make an empty config file.

    cd ~/megaannex; python2 megaannex.py

Edit the megaannex.conf file. Add your mega.co.nz username, password, and folder name.

## Configuring git-annex

    git config annex.mega-hook '/usr/bin/python2 ~/megaannex/megaannex.py'
    git annex initremote mega type=hook hooktype=mega encryption=shared
    git annex describe mega "the mega.co.nz library"

## Notes

You may need to use a different command than "python2", depending
on your python installation.
-- Tobias
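For the note about needing a different command than "python2", one way to find out what is installed is to probe a few candidate names. This is a sketch only: `first_available` is a made-up helper, and the candidate names in the example are guesses that may not match your system.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of them is found.
# (Hypothetical helper, not part of megaannex or git-annex.)
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: python_cmd=$(first_available python2 python2.7)
```

Whatever name this reports is the one to use in place of `python2` in the commands above.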


@ -0,0 +1,16 @@
Maybe you started out using the WORM backend, and have now configured
git-annex to use SHA1. But files you added to the annex before still
use the WORM backend. There is a simple command that can migrate that
data:

    # git annex migrate my_cool_big_file
    migrate my_cool_big_file (checksum...) ok

You can only migrate files whose content is currently available. Other
files will be skipped.

After migrating a file to a new backend, the old content in the old backend
will still be present. That is necessary because multiple files
can point to the same content. The `git annex unused` subcommand can be
used to clear up that detritus later. Note that hard links are used,
to avoid wasting disk space.
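
The migrate-then-clean-up sequence described above could be scripted roughly as follows. This is a sketch, and `migrate_and_report` is a made-up name; it only uses the `migrate` and `unused` subcommands mentioned in this tip.

```shell
# Hypothetical helper: migrate each named file to the configured backend,
# then list the content that no file refers to any more.
migrate_and_report() {
    for f in "$@"; do
        git annex migrate "$f" || return 1
    done
    # Prints a numbered list of unused content, including old-backend leftovers.
    git annex unused
}

# Example: migrate_and_report my_cool_big_file
```

The numbers printed by `git annex unused` can then be passed to `git annex dropunused` once you are sure the old content is no longer needed.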


@ -0,0 +1,77 @@
Scenario
--------
You are a new git-annex user. You already have files spread around many computers and wish to migrate those into git-annex, without having to recopy all files all over the place.

Let's say, for example, you have a server named `marcos` and a workstation named `angela`. You have your audio collection stored in `/srv/mp3` on `marcos` and `~/mp3` on `angela`, but only `marcos` has all the files, and `angela` only has a subset.

We also assume that `marcos` has an SSH server.

How do you add all this stuff to git-annex?

Create the biggest git-annex repository
---------------------------------------

Start with `marcos`, with the complete directory:

    cd /srv/mp3
    git init
    git annex init
    git annex add .
    git commit -m"git annex yay"

This will checksum all files and add them to the `git-annex` branch of the git repository. Wait for this process to complete.

Create the smaller repo and synchronise
---------------------------------------

On `angela`, we want to synchronise the git annex metadata with `marcos`. We need to initialize a git repo with `marcos` as a remote:

    cd ~/mp3
    git init
    git remote add marcos marcos.example.com:/srv/mp3
    git fetch marcos
    git annex info # this should display the two repos
    git annex add .

This will, again, checksum all files and add them to git annex. Once that is done, you can verify that the files are really the same as on `marcos` with `whereis`:

    git annex whereis

This should display something like:

    whereis Orange Seeds/I remember.wav (2 copies)
    b7802161-c984-4c9f-8d05-787a29c41cfe -- marcos (anarcat@marcos:/srv/mp3)
    c2ca4a13-9a5f-461b-a44b-53255ed3e2f9 -- here (anarcat@angela)
    ok

Once you are sure things went okay, you can synchronise this with `marcos`:

    git annex sync

This will push the metadata information to `marcos`, so it knows which files are available on `angela`. From there on, you can freely get and move files between the two repos!

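The reason `whereis` can report two copies without any file having been transferred is that, with a checksum backend, the key for a file is derived from its content, so the same file added independently on both machines gets the same key. The idea can be illustrated outside git-annex with a sketch like the following. Note the assumptions: the function names are made up, and `cksum` is used only as a portable stand-in, whereas git-annex's checksum backends use cryptographic hashes.

```shell
# Print a fingerprint that depends only on a file's content, not its name.
# (cksum is a stand-in here; it is NOT what git-annex uses.)
content_fingerprint() {
    cksum < "$1"
}

# Succeed when two files have the same fingerprint.
same_content() {
    [ "$(content_fingerprint "$1")" = "$(content_fingerprint "$2")" ]
}

# Example: same_content /srv/mp3/song.wav ~/mp3/song.wav && echo "same content"
```

Two copies of a song stored under different paths compare as equal here, which is the property that lets the two repositories agree on what they hold.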
Importing files from a third directory
--------------------------------------

Say that some files on `angela` are actually spread out outside of the `~/mp3` directory. You can use the `git annex import` command to add those extra directories:

    cd ~/mp3
    git annex import ~/music/

(!) Be careful that `~/music` is not a git-annex repository, or this will [[destroy it!|bugs/git annex import destroys a fellow git annex repository]].

Deleting deleted files
----------------------

It is quite possible some files were removed (or renamed!) on `marcos` but not on `angela`, since it was synchronised only some time ago. A good way to find out about those files is to use the `--not --in` argument, for example, on `angela`:

    git annex whereis --in here --not --in marcos

This will show files that are on `angela` and not on `marcos`. They could be new files that were only added on `angela`, so be careful! A manual analysis is necessary, but if you are certain those files are not relevant anymore, you can delete them from `angela`:

    git annex drop <file>

If the file is a renamed or modified version from the original, you may need to use `--force`, but be careful! If you delete the wrong file, it will be lost forever!

> (!) Maybe this wouldn't happen with [[direct mode]] and an fsck? --[[anarcat]]

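Since a forced drop cannot be undone, it may help to save the list of candidates for review before dropping anything. A minimal sketch, assuming the remote is named as in this tip; `only_here` is a made-up helper name, and it uses `git annex find` with the same matching options as the `whereis` command above:

```shell
# Hypothetical helper: list files whose content is present in this
# repository but not on the given remote. Nothing is dropped.
only_here() {
    git annex find --in here --not --in "$1"
}

# Example: only_here marcos > /tmp/candidates.txt
```

The resulting file can be inspected by hand, and only the entries you are sure about need to be passed to `git annex drop`.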


@ -0,0 +1,69 @@
After you've used git-annex for a while, you will have data in your repository
that you don't want to keep in the limited disk space of a laptop or a server,
but that you don't want to entirely delete.

This is where git-annex's support for offline archive drives shines.
You can move old files to an archive drive, which can be kept offline if
it's not practical to keep it spinning. Better, you can move old files to
two or more archive drives, in case one of them later fails to spin up.
(One consideration when [[future_proofing]] your archive.)

To set up an archive drive, you can take any removable drive, format
it with a filesystem you'll be able to read some years later, and then follow
the [[walkthrough]] to set up a repository on it that is a git remote of
the repository on your computer that you want to archive. In short:

    cd /media/archive
    git clone ~/annex
    cd ~/annex
    git remote add archivedrive /media/archive/annex
    git annex sync archivedrive

Don't forget to tell git-annex this is an archive drive (or perhaps a backup
drive). Also, give the drive a description that matches something you write on
its label, so you can find it later:

    git annex group archivedrive archive
    git annex wanted archivedrive standard
    git annex describe archivedrive "my first archive drive (SATA)"

Or you can use the assistant to set up the drive for you.
(Nice video tutorial here: [[videos/git-annex_assistant_archiving]])

(Keeping the archive drive in an offsite location? Consider encrypting
it! See [[fully_encrypted_git_repositories_with_gcrypt]].)

Then, when the archive drive is plugged in, you can easily copy files to
it:

    cd ~/annex
    git-annex copy --auto --to archivedrive

Or, if you're using the assistant, it will automatically notice when the drive
gets plugged in and copy files that need to be archived.

When you want to get rid of the local file, leaving only the copy on the
archive, you can just:

    git annex drop file

The archive drive has to be plugged in for this to work, so git-annex
can verify it still has the file. If you had configured git-annex to
always store 2 [[copies]], it will need 2 archive drives plugged in.
You may find it useful to configure a [[trust]] setting for the drive to
avoid needing to haul it out of storage to drop a file.

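The copy-then-drop steps above can be combined for specific files. The following is a sketch under the assumption that the drive is set up as the remote `archivedrive`; `archive_files` is a made-up helper name.

```shell
# Hypothetical helper: copy each file to the given remote, then drop the
# local copy. A file is only dropped if its copy succeeded, and the loop
# stops at the first failure.
archive_files() {
    remote=$1
    shift
    for f in "$@"; do
        git annex copy --to "$remote" "$f" && git annex drop "$f" || return 1
    done
}

# Example: archive_files archivedrive old_video.mov old_disk.iso
```

git-annex still does its own check that the archive has the file before dropping it, so the drive must stay plugged in until the function returns. The built-in `git annex move --to archivedrive` achieves much the same in a single command.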
Now the really nice thing. When your archive drive gets filled up, you
can simply remove it, store it somewhere safe, and replace it with a new
drive, which can be mounted at the same location for simplicity. Set up
the new drive the same way described above, and use it to archive even more
files.

Finally, when you want to access one of the files you archived, you can
just ask for it:

    git annex get file

If necessary git-annex will tell you which archive drive you need to
pull out of storage to get the file back. This is where the description
you entered earlier comes in handy.
