Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2015-03-15 14:52:31 -04:00
commit 1032a6fc5b
20 changed files with 426 additions and 0 deletions

@ -0,0 +1,28 @@
### Please describe the problem.
Two regular repositories created with the assistant, one on the computer and one on a USB stick, cannot synchronise. The log reports a problem with the index file:
/media/usb/annex/.git/index: copyFile: does not exist (No such file or directory)
### What steps will reproduce the problem?
Create a repository, then create another on a USB stick (with `Add another repository`), and add a file to the first one. It doesn't synchronise.
### What version of git-annex are you using? On what operating system?
git-annex 5.20141125 from Debian Jessie
### Please provide any additional information below.
I was able to solve the problem by copying the .git/index file from the first repository to the second one.
Also note that the second repository is on a FAT32 USB stick.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]

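The workaround described in the report above can be sketched as a small shell function. This is a minimal sketch, not git-annex's own fix: `copy_git_index` is a hypothetical helper name, and the two repository paths passed to it are assumptions to be replaced with the real locations (the log mentions `/media/usb/annex` for the USB repository).

```sh
# Hypothetical helper: copy .git/index from a working repository to one
# whose index file is missing. Arguments: <source repo> <target repo>.
copy_git_index() {
    src_repo=$1
    dst_repo=$2
    # Refuse to continue if the source has no index to copy.
    if [ ! -f "$src_repo/.git/index" ]; then
        echo "no index file in $src_repo" >&2
        return 1
    fi
    # Only fill in a missing index; never overwrite an existing one.
    if [ -e "$dst_repo/.git/index" ]; then
        echo "$dst_repo already has an index file" >&2
        return 1
    fi
    cp "$src_repo/.git/index" "$dst_repo/.git/index"
}
```

For example, `copy_git_index ~/annex /media/usb/annex`, where `~/annex` stands in for wherever the first repository lives.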
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlc-3pdibcizrdz4WmZooECL0k6AvM1cWc"
nickname="Joe"
subject="direct-mode to direct-mode perhaps?"
date="2015-03-09T17:40:59Z"
content="""
It seems like the problem is when using a direct-mode repository on both sides of the copy.
I can get and copy files to and from the USB stick when I'm dealing with my Linux-based (indirect mode) repository.
Could it be that git-annex isn't properly noticing that c:\annex is in direct mode, and so it should copy from c:\annex\readme.txt instead of c:\annex\.git\annex\objects\... ?
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlc-3pdibcizrdz4WmZooECL0k6AvM1cWc"
nickname="Joe"
subject="Ah, but I can directly overwrite the file"
date="2015-03-09T17:46:58Z"
content="""
If I copy f:\annex\bin\s3.exe into c:\annex\bin\s3.exe, (overwriting the .git symlink record) git annex sync will properly record that the file is now in this replica.
So I guess I have a workaround(ish)
--Joe
"""]]

@ -0,0 +1,44 @@
### Please describe the problem.
Can't clone repository on Windows 7 64bit
### What steps will reproduce the problem?
git clone git://git-annex.branchable.com/ gitannex
...
error: Invalid path 'doc/walkthrough/fsck:_verifying_your_data.mdwn'
error: Invalid path 'doc/walkthrough/fsck:_when_things_go_wrong.mdwn'
error: Invalid path 'doc/walkthrough/quiet_please:_When_git-annex_seems_to_skip_files.mdwn'
error: Invalid path 'doc/walkthrough/removing_files:_When_things_go_wrong.mdwn'
error: Invalid path 'doc/walkthrough/transferring_files:_When_things_go_wrong.mdwn'
Checking out files: 100% (7235/7235), done.
`git status` shows many files as deleted.
`git reset --hard` shows the same errors as the clone.
### What version of git-annex are you using? On what operating system?
git annex version
git-annex version: 5.20150219-g3fc8d83
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV DNS Feeds Quvi TDFA TorrentParser
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E MD5E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: unknown
supported repository version: 5
upgrade supported from repository versions: 2 3 4
git --version
git version 1.9.5.msysgit.0
Windows 7 64bit
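
The paths git rejects above all contain a colon, which Windows does not allow in file names. A minimal sketch for listing such paths from a checkout on another system follows. `windows_unsafe_paths` is a hypothetical helper, and the character set is the commonly documented list of characters reserved in Windows file names (backslash is left out, since git uses forward slashes as separators); it is not taken from git itself.

```sh
# Hypothetical filter: read one path per line on stdin and print those
# containing a character that Windows reserves in file names.
windows_unsafe_paths() {
    grep '[<>:"|?*]' || true
}

# Example (run inside a clone on Linux or macOS):
#   git ls-files | windows_unsafe_paths
```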

@ -0,0 +1,34 @@
[[!comment format=mdwn
username="effigies"
subject="Strategy for getting up and running"
date="2015-03-09T17:32:19Z"
content="""
The following is the set of steps I use when setting the assistant up on a new repository:

    git clone user@host:repo.git
    pushd repo
    git annex init
    touch EMPTY
    git annex add EMPTY
    git commit -m 'Initial commit'
    git push --all
    git annex copy --to origin
    git annex direct
    git annex sync
    git annex untrust .
    popd

Entering the folder path now lets the assistant take over.

For an existing repository:

    git clone user@host:repo.git
    pushd repo
    git annex get .
    git annex direct
    git annex sync
    git annex untrust .
    popd

Not sure if this would be helpful for working out what the assistant behavior should be, but maybe it'll help others get to the point where the assistant works with a gitolite managed repo.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus"
nickname="Jimmy"
subject="comment 2"
date="2015-03-09T16:48:18Z"
content="""
I've tried throwing about 16 million files at git/git-annex in the past, around 30% of which were 1-2 KB in size. git/git-annex doesn't work well at that scale.
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmsy_GIefGlGGD_XJp_R6EsWIRUC4ev9XU"
nickname="David"
subject="This is a BIG task"
date="2015-03-13T20:48:56Z"
content="""
If I understand it correctly, 20PB in 2400 shards of 8TB each, with 3 copies, is 24TB per shard. At 1TB per client that is 2400*24 = ~60K clients, assuming no churn. So it would probably need ~100K clients to cover the churn and have a good chance that each shard had 3 copies at all times. That's 1/3 the size of BOINC's active population.
It would take time to scale to that population. And it would take time to get three copies out of the Archive. During that time, the Archive is growing. The back of my envelope says that doing this in 2.5yrs roughly doubles the Archive's outbound bandwidth if you average it across the 2.5 years. But the population would grow slowly to start with, then faster, so that the bandwidth impact would be back-loaded. And at the end of the 2.5 years, you would need a lot more than the 100K users.
A design that used erasure coding or entanglement would reduce the storage and bandwidth demand considerably while providing adequate reliability.
"""]]

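The back-of-envelope estimate in the comment above can be reproduced with shell arithmetic. This is only a sketch of the stated numbers (2400 shards, 8TB each, 3 copies, 1TB per client); the variable names are illustrative, and the ~100K figure that allows for churn is the commenter's judgment, not something computed here.

```sh
shards=2400        # number of shards
shard_tb=8         # size of each shard, in TB
copies=3           # desired copies of each shard
client_tb=1        # storage contributed per client, in TB

# Storage needed per shard across all copies: 8 * 3 = 24 TB.
per_shard_tb=$((shard_tb * copies))

# Clients needed with no churn: 2400 * 24 / 1 = 57600, roughly 60K.
clients=$((shards * per_shard_tb / client_tb))

echo "TB per shard (all copies): $per_shard_tb"
echo "clients needed (no churn): $clients"
```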
@ -0,0 +1,28 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlZ-6dtxJY4cP7shhvV8E6YyuV0Rak8it4"
nickname="Giovanni"
subject="comment 1"
date="2015-03-10T22:16:09Z"
content="""
I have a gcrypt special remote encrypted in hybrid mode. When I try to add a keyid using:
git annex enableremote myremote keyid+=XXXXXXXX
I get this error:
enableremote myremote (encryption update) (hybrid cipher with gpg keys XXXXXXXX XXXXXXX) fatal: remote myremote already exists.
git-annex: git [Params \"remote add\",Param \"myremote\",Param \"gcrypt::XXXXXXXXXXX:gcrypt-tests\"] failed
This is my git-annex version info:
git-annex version: 5.20141125
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA CryptoHash
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav tahoe glacier ddar hook external
local repository version: 5
supported repository version: 5
upgrade supported from repository versions: 0 1 2 4
Am I doing something wrong? Thank you, Giovanni
"""]]

@ -0,0 +1,92 @@
I need some help understanding what would cause ``git-annex sync`` to still be running after 4+ hours on a new remote (a FAT32 USB drive, BTEST). From all my searching, it appears to be part of git's optimization routines, but for it to be this long and slow seems odd. Also, I am not sure what there would be to optimize, since it's a brand new remote. The source (WTEST) doesn't seem to show any need to run ``git gc``.
I followed the walkthrough's setup of a remote and followed that up with ``git-annex sync``. I did not specify the source for the sync, as there is only one other repository (WTEST). The content is mostly ISO files and Win32 executables in 7 separate commits. In between ``git-annex import`` and ``git commit``, the working directory was removed with ``git rm '*'`` but no ``git-annex drop``. There were lots of duplicate files and it truly was satisfying to not see the disk usage increase. My backend is also MD5E for purposes of quicker imports and ease of hash lookup from other sources.
I would like to have 20-30 USB remotes. The length of time to get one remote up and running at this point is horrendous. Other people seem to have better experiences with it. The only odd thing that I see with my configuration is that the remote FAT32 drive is in ``indirect`` mode. My understanding is that ``git-annex`` would automatically switch to ``direct`` if it detected a filesystem that did not support symbolic links.
What is wrong with my setup? How can I fix it?
Here's the basic config of WTEST.
[[!format sh """
WTEST$ git-annex info
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 4
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
98dfasdf-ab83-4a0e-8b73-4dfasffdsaff1ae -- WTEST [here]
9dfdsfdf8-c5d5-4761-ab67-ffsadfsadfsa83 -- BTEST
untrusted repositories: 0
transfers in progress: none
available local disk space: 488.16 gigabytes (+1 megabyte reserved)
local annex keys: 57327
local annex size: 58.57 gigabytes
annexed files in working tree: 4322
size of annexed files in working tree: 6.82 gigabytes
bloom filter size: 16 mebibytes (11.5% full)
backend usage:
MD5E: 61649
"""]]
[[!format sh """
WTEST$ git count-objects -v
count: 521829
size: 66794240
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
"""]]
[[!format sh """
WTEST$ git config -l
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.ignorecase=true
core.precomposeunicode=true
annex.uuid=98dfasdf-ab83-4a0e-8b73-4dfasffdsaff1ae
annex.sshcaching=false
annex.version=5
annex.backends=MD5E
annex.queuesize=102400
annex.genmetadata=true
remote.BTEST.url=/Volumes/BTEST
remote.BTEST.fetch=+refs/heads/*:refs/remotes/BTEST/*
remote.BTEST.annex-uuid=9dfdsfdf8-c5d5-4761-ab67-ffsadfsadfsa83
"""]]
**The long running sync.**
[[!format sh """
BTEST$ git-annex sync
commit ok
pull origin
Auto packing the repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.
Counting objects: 521834, done.
Delta compression using up to 4 threads.
Compressing objects: 12% (59687/462488)
"""]]
**Update 7hrs later:**
[[!format sh """
Writing objects: 70% (366184/521834)
"""]]

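The "Auto packing the repository" message in the transcript above comes from git's automatic garbage collection, which by default runs once a repository holds more than about 6700 loose objects (the `gc.auto` setting). A minimal sketch for checking this from the `count:` field of `git count-objects -v` follows. `loose_count` and `exceeds_gc_auto` are hypothetical helper names, and 6700 is git's documented default, which a repository may override.

```sh
# Hypothetical helper: read `git count-objects -v` output on stdin and
# print the number of loose objects (the "count:" field).
loose_count() {
    awk -F': ' '$1 == "count" { print $2 }'
}

# Hypothetical helper: succeed if the loose-object count given as $1 is
# above the threshold given as $2 (default 6700, git's default gc.auto).
exceeds_gc_auto() {
    [ "$1" -gt "${2:-6700}" ]
}

# Example (run inside a repository):
#   n=$(git count-objects -v | loose_count)
#   exceeds_gc_auto "$n" && echo "auto gc will want to repack ($n loose objects)"
```

With the 521829 loose objects reported for WTEST, the check succeeds, which is consistent with the repack starting as soon as those objects arrive on the new remote.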
@ -0,0 +1,56 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnudiRFAyVAwehTACPhuNV_dhhWsHKIACw"
nickname="Jose"
subject="git annex on chromebook"
date="2015-03-12T18:32:18Z"
content="""
I know I'm commenting on something old, but I was looking this topic up and felt this may help with any future questions. :S
* It's just like any other machine with git annex. You can connect the instance of git annex that runs in the chroot (crouton) to your existing setup like normal: USB, SSH, XMPP+cloud.
* You will need a transfer repository; that VPS should do nicely (I use a few USB drives).
* This depends on whether the VPS also has git annex installed.
If it does, great! If you want it configured for you, you can run git annex webapp to add a remote server and give it the details to connect to your VPS via ssh. Then tell it where your transfer repository will be and that should be it. If you plan on sharing this repo with your friend, or if you want more control over the data, you may want to set up your ssh repo via the command line instead. You may be able to add encryption and share your repository without necessarily letting your friend have access to everything on your account (if you have separate accounts).
If it does not, that is also doable. Again, if you want simple configuration, git annex webapp will prompt you for credentials; in this case it may ask you to encrypt the data (http://git-annex.branchable.com/encryption/).
* How to add an ssh remote via the command line (please don't just copy and paste; a quick search on the commands will help you understand what they are doing):
$ cd ~/Downloads/Documents
$ git init
$ git annex init \"chromebook\"
$ git remote add my-ssh ssh://user@google.com/home/user/annex/Documents #I recommend you use an 'ssh alias' so you can just do 'ssh://alias1/home/user/annex/Documents'
$ git annex add
$ git annex sync
$ git annex sync --content #I do not remember but I believe it may be necessary to do this for some versions of git annex.
git annex may or may not complain about git annex being installed on your machine. (it does for me each time I do this... perhaps I'm missing a step?)
If you know git annex is installed in the $PATH of your server (also your machine) then you can just run
$ git config remote.my-ssh.annex-ignore false
$ git annex sync
$ git annex sync --content
* How come I cannot see my annexed files on ChromeOS but I can in crouton?
Because you are using git annex in 'indirect' mode:
$ cd ~/Downloads/Documents
$ git annex direct
You can also 'unlock' the files, but you will need to run git annex unlock every time you need to read/write a file.
$ git annex unlock
"""]]

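The 'ssh alias' recommended in the comment above is normally a `Host` entry in the OpenSSH client configuration. A minimal sketch that appends such an entry follows. `add_ssh_alias` is a hypothetical helper, and `alias1`, the host name, and the user are placeholders, not values taken from the comment.

```sh
# Hypothetical helper: append a Host entry to an OpenSSH client config.
# Arguments: <config file> <alias> <host name> <user>
add_ssh_alias() {
    config=$1
    alias_name=$2
    host=$3
    user=$4
    # Host, HostName and User are standard ssh_config keywords.
    printf 'Host %s\n    HostName %s\n    User %s\n' \
        "$alias_name" "$host" "$user" >> "$config"
}

# Example (placeholder values):
#   add_ssh_alias ~/.ssh/config alias1 vps.example.com user
#   git remote add my-ssh ssh://alias1/home/user/annex/Documents
```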
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawksOCeakibYmGDt3wLLo4nkY0FkB72I2Uo"
nickname="Source"
subject="comment 1"
date="2015-03-07T02:50:19Z"
content="""
I think I have tracked down part of the issue. It is due to the options that `git-annex` passes to `rsync` when copying between the local NTFS hard drive and the SSH remote.
I got `rsync` to work by using `rsync -vrc source dest` as options (outside of git-annex). Found [here](http://askubuntu.com/questions/112863/rsync-not-working-between-ntfs-fat-and-ext).
Is there any way to change the rsync options that git-annex uses when copying/moving to and from local NTFS drives on Linux?
"""]]

@ -0,0 +1,13 @@
Hello,
I downloaded git-annex-installer.exe from https://downloads.kitenet.net/git-annex/windows/current/git-annex-installer.exe .
When I try to run it on my work Windows 7 64-bit laptop, Symantec Endpoint Protection complains with a WS.Reputation.1 flag that the file has a low reputation and quarantines the file.
I searched online trying to find why git-annex is flagged as such by Symantec but was unable to find anything.
I prefer not to build from source if possible, as this is my work laptop and it would require that I install Haskell, MinGW, and Cygwin.
Any pointers/suggestions would be greatly appreciated.
Thanks,
Ahmad

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkWHj0RxNMfuwvFzo2d-V6vBKOYwW_Fnfk"
nickname="Andrew"
subject="Thanks for the clarification"
date="2015-03-07T03:00:14Z"
content="""
I will post the todo, and in the meantime I can script a `git annex copy --unused` followed by a `git annex sync --content` to capture the full history in the archive
regards
Andrew
"""]]

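The script mentioned in the comment above might look like the following. This is a sketch under assumptions: `archive_history` and `run_or_print` are hypothetical names, the remote is passed as an argument because `git annex copy` needs a `--to` destination that the comment does not name, and setting `DRY_RUN=1` only prints the commands instead of running them.

```sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: send unused content to a remote, then sync.
# Argument: <remote name>.
archive_history() {
    remote=$1
    # --unused acts on what the last `git annex unused` run found.
    run_or_print git annex copy --unused --to "$remote" &&
        # Then sync git branches and the content of work tree files.
        run_or_print git annex sync --content
}
```

Running `DRY_RUN=1 archive_history backupdrive` (with `backupdrive` as a placeholder remote name) shows the two commands without touching the repository.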
@ -0,0 +1,10 @@
<img src=http://s.natalian.org/2015-03-10/where-are-the-files.png>
I managed to sync files to a remote ssh store "bible" with `git annex sync --content`. However, when I ssh to bible, I was surprised not to see any of the JPG files that were copied there.
What am I missing?
# Solution
I need to run `git annex sync` on the host bible too!

@ -0,0 +1,7 @@
Hi
So I've installed git-annex on my Debian VPS.
How do I find out the URL to access the web UI / assistant?
Thanks!

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="dupe?"
date="2015-03-07T02:06:06Z"
content="""
see also [[todo/Facilitate_public_pretty_S3_URLs]]
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="about copying to the local store"
date="2015-03-07T13:16:02Z"
content="""
there's a [discussion](https://github.com/jbenet/go-ipfs/issues/875) happening upstream about how copying to the local datastore could be avoided.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="similar to a forum question"
date="2015-03-07T02:05:26Z"
content="""
this is similar to a forum question i asked: [[forum/original_filename_on_s3/]]. --[[anarcat]]
"""]]

@ -0,0 +1,4 @@
This has been described as Google's [[special_remotes/glacier]].
* [Announcement](http://googlecloudplatform.blogspot.in/2015/03/introducing-Google-Cloud-Storage-Nearline-near-online-data-at-an-offline-price.html)
* <https://cloud.google.com/storage/docs/nearline-storage>

@ -0,0 +1,20 @@
I wish to preserve all history on the backup drives using standard groups and the `sync` command. Namely, if I do the following:

    touch test-of-annex-backup.txt
    git annex add test-of-annex-backup.txt
    git commit --message='test: Create empty test-of-annex-backup.txt file'
    git annex edit test-of-annex-backup.txt
    echo "This line creates version 2 of this file" > test-of-annex-backup.txt
    git annex add test-of-annex-backup.txt
    git commit --message='test: Create version 2 of test-of-annex-backup.txt'
    git annex sync --content --all

I expect both versions of `test-of-annex-backup.txt` to be copied to each accessible annex repository in the `backup` [standard group](http://git-annex.branchable.com/preferred_content/standard_groups/).

At present, the `backup` standard group prefers unused files, but the `sync` command cannot act on this configuration, since it lacks an `--all` option. This is surprising to me as a user, and appears to contradict the intent of the preferred content, as evinced by the awkward explanation of why it is there in [preferred content](http://git-annex.branchable.com/preferred_content/).

Please add an `--all` option to the `sync` command.
Thanks
Andrew