Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2017-04-24 10:36:58 -04:00
commit 2a110c939d
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
11 changed files with 416 additions and 5 deletions

View file

@ -0,0 +1,247 @@
### Please describe the problem.
I have a Synology DS216+ NAS, which can export shares via SMB2, AFP or NFS. Since I have a lot of "media" data in git-annex, I'd like to be able to put a useful copy of that data (ie, original file names) onto the NAS, and have git-annex track that content as another "local" copy of the content that is available (rather than having to track it by hand).
Because the NFS shares have UID mapping issues (AFAICT NFS v4 id mapping is not supported without Kerberos authentication, due to changes since about Linux 3.4; the Synology is based on a Linux 3.10 kernel), I have been trying to use SMBv2 shares, with symlinks enabled ("Allow symbolic links within shared folders"). I'm aware that `git-annex` on a network file system is less supported, but this is still a use case that would be quite useful to manage larger pools of files.
After updating to the latest release (to [fix git annex init timeouts](https://git-annex.branchable.com/forum/git_annex_init_timeout/)), I am able to complete the `git clone ...`, `git annex init`, and `git sync` steps. (The SMB mount supports symlinks, but not hard links; hence pidlocks being needed.)
But `git annex get` or `git annex copy` to transfer any of the content files fails with:
get README (transfer already in progress, or unable to take transfer lock)
Unable to access these remotes: ashram-data
whether initiated from the SMB mounted `git-annex` or a `git-annex` with the data on a local file system. No content files get transferred.
Of note, on this "SMB share with symlinks enabled" I can manually create symlinks, but `git annex` still detects symlinks as not possible and switches to direct mode. However it leaves the existing symlinks from `git clone ...` in place. It is unclear to me whether this is part of the cause of the "transfer lock" issue, or whether the transfer locks are, eg, not using pidlocks and thus failing because flock() does not work.
### What steps will reproduce the problem?
* Mount a remote file system via SMB v2, with symlinks enabled (eg mounting from a Samba share should be sufficient, as that's what is running on the Synology NAS)
* `git clone` an existing `git-annex` onto a directory on that SMB mounted filesystem
* `git remote ....` configure one or more remotes with the file data
* `git annex init "SMB share"`
* `git annex sync`
* `git annex get ....` of an existing file
or alternatively after the init/sync, go to a source `git-annex` and do:
* `git remote add smb ....`
* `git annex sync`
* `git annex copy --to=smb ...` of an existing file with content in that `git-annex`
### What version of git-annex are you using? On what operating system?
`git-annex` 6.20170320 (downloaded 2017-04-23) on Mac OS X 10.11 (El Capitan):
ewen@ashram:~$ uname -a
Darwin ashram 15.6.0 Darwin Kernel Version 15.6.0: Fri Feb 17 10:21:18 PST 2017; root:xnu-3248.60.11.4.1~1/RELEASE_X86_64 x86_64
ewen@ashram:~$ git annex version
git-annex version: 6.20170320-g41c5d9d
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV FsEvents ConcurrentOutput TorrentParser MagicMime Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
ewen@ashram:~$
### Please provide any additional information below.
[[!format sh """
# Example session
ewen@ashram:/nas01/annex/purchased$ git clone /bkup/purchased/mediabytes
Cloning into 'mediabytes'...
done.
ewen@ashram:/nas01/annex/purchased$ cd mediabytes/
ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote add ashram-data /data/purchased/mediabytes
ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote add ashram /bkup/purchased/mediabytes
ewen@ashram:/nas01/annex/purchased/mediabytes$ git remote remove origin
ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex init 'nas01 file server'
init nas01 file server
Detected a filesystem without POSIX fcntl lock support.
Enabling annex.pidlock.
Detected a filesystem without fifo support.
Disabling ssh connection caching.
Detected a crippled filesystem.
Disabling core.symlinks.
Enabling direct mode.
ok
(recording state in git...)
ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex sync
commit ok
pull ashram
From /bkup/purchased/mediabytes
* [new branch] git-annex -> ashram/git-annex
* [new branch] master -> ashram/master
* [new branch] synced/git-annex -> ashram/synced/git-annex
* [new branch] synced/master -> ashram/synced/master
ok
pull ashram-data
From /data/purchased/mediabytes
* [new branch] git-annex -> ashram-data/git-annex
* [new branch] master -> ashram-data/master
* [new branch] synced/git-annex -> ashram-data/synced/git-annex
* [new branch] synced/master -> ashram-data/synced/master
ok
(merging ashram-data/git-annex into git-annex...)
(recording state in git...)
push ashram
Counting objects: 8, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (8/8), 806 bytes | 0 bytes/s, done.
Total 8 (delta 2), reused 1 (delta 0)
To /bkup/purchased/mediabytes
02b7699..0edf575 git-annex -> synced/git-annex
ok
push ashram-data
Counting objects: 8, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (8/8), 806 bytes | 0 bytes/s, done.
Total 8 (delta 2), reused 1 (delta 0)
To /data/purchased/mediabytes
02b7699..0edf575 git-annex -> synced/git-annex
ok
ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex sync
commit ok
pull ashram
ok
pull ashram-data
ok
ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex get .
get README (transfer already in progress, or unable to take transfer lock)
Unable to access these remotes: ashram-data
Try making some of these repositories available:
00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
failed
get alex-lindsay-1.m4a (transfer already in progress, or unable to take transfer lock)
Unable to access these remotes: ashram-data
Try making some of these repositories available:
00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
failed
[...]
[...]
failed
git-annex: get: 59 failed
ewen@ashram:/nas01/annex/purchased/mediabytes$
ewen@ashram:/nas01/annex/purchased/mediabytes$ ls -l | head
total 118
lrwx------ 1 ewen staff 182 23 Apr 18:53 README -> .git/annex/objects/3M/8w/SHA256E-s273--1eb5c07e12d99c99af8c163b5e005162820518020cc9922152852bd1422858ae/SHA256E-s273--1eb5c07e12d99c99af8c163b5e005162820518020cc9922152852bd1422858ae
lrwx------ 1 ewen staff 200 23 Apr 18:53 alex-lindsay-1.m4a -> .git/annex/objects/9M/xJ/SHA256E-s26905448--c8ecda8ca67c3ff8ab1061b34a2974be41eb4b44421e3c39d1dda63f4de6b910.m4a/SHA256E-s26905448--c8ecda8ca67c3ff8ab1061b34a2974be41eb4b44421e3c39d1dda63f4de6b910.m4a
lrwx------ 1 ewen staff 194 23 Apr 18:53 alexlindsay.txt -> .git/annex/objects/P7/7g/SHA256E-s17467--2e7ae57eff6db5a832681443eb3787d1d7f9e57a52e4fedb2b693136a3df19f0.txt/SHA256E-s17467--2e7ae57eff6db5a832681443eb3787d1d7f9e57a52e4fedb2b693136a3df19f0.txt
lrwx------ 1 ewen staff 198 23 Apr 18:53 bourne-1.mp3 -> .git/annex/objects/98/vg/SHA256E-s7560841--a24ce7acabcb0dd46b9a9df6df8aff50145d8b1d5253308a51573a7742652943.mp3/SHA256E-s7560841--a24ce7acabcb0dd46b9a9df6df8aff50145d8b1d5253308a51573a7742652943.mp3
lrwx------ 1 ewen staff 192 23 Apr 18:53 bourne.txt -> .git/annex/objects/QK/2p/SHA256E-s1031--d45a0f0aba935126bf29d6265e9c252faae8b735abdc20edbec69eb3cc9edcd7.txt/SHA256E-s1031--d45a0f0aba935126bf29d6265e9c252faae8b735abdc20edbec69eb3cc9edcd7.txt
lrwx------ 1 ewen staff 200 23 Apr 18:53 chasejarvis-1.m4a -> .git/annex/objects/M1/8z/SHA256E-s80616681--d1b63892a9d68c51ddef9eb5c0a56ced795a449afe13d4884663ca6ecc76cade.m4a/SHA256E-s80616681--d1b63892a9d68c51ddef9eb5c0a56ced795a449afe13d4884663ca6ecc76cade.m4a
lrwx------ 1 ewen staff 194 23 Apr 18:53 chasejarvis.txt -> .git/annex/objects/35/jv/SHA256E-s50443--687b0cf63cc8f5724e3a694e1de6221ca4d1dd3449a6e6ad9b6f34501bfa30ce.txt/SHA256E-s50443--687b0cf63cc8f5724e3a694e1de6221ca4d1dd3449a6e6ad9b6f34501bfa30ce.txt
lrwx------ 1 ewen staff 184 23 Apr 18:53 checksums -> .git/annex/objects/3Q/vv/SHA256E-s1672--31b3cb7ccbcafc96d5061775dc31aa974bb4ce2f16984b2ab0104eb66674d81b/SHA256E-s1672--31b3cb7ccbcafc96d5061775dc31aa974bb4ce2f16984b2ab0104eb66674d81b
lrwx------ 1 ewen staff 200 23 Apr 18:53 chrismarquardt-1.m4a -> .git/annex/objects/k4/f3/SHA256E-s61861112--c14bacb5adfe3eca237547760183a986499808084dd1925a0202598981e4406b.m4a/SHA256E-s61861112--c14bacb5adfe3eca237547760183a986499808084dd1925a0202598981e4406b.m4a
ewen@ashram:/nas01/annex/purchased/mediabytes$
ewen@ashram:/nas01/annex/purchased/mediabytes$ git annex info
repository mode: direct
trusted repositories: 0
semitrusted repositories: 13
00000000-0000-0000-0000-000000000001 -- web
00000000-0000-0000-0000-000000000002 -- bittorrent
00420eb2-1c9a-47e8-abfa-675b9ffb0782 -- naos_wd_1 backup drive
0ac1326e-9e03-4c1e-bd51-6cd664988caa -- Naos (colo) fileserver
0ba93724-85a8-4ac7-855b-e656a6abaf09 -- ashram (laptop) internal drive [ashram]
0e5fd428-ba5b-4817-8489-9780bc62e655 -- naos_wd_2 backup drive
1144edcc-b18f-4cb1-a828-6c272d7788d6 -- naos_wd_3 backup drive
4e5bfc02-1adc-44d2-a60a-885d9120ed5d -- nas01 file server [here]
6142cfd4-83ba-410f-bf54-fd4b79e49fdb -- ashram external data drive [ashram-data]
77aeca6a-a4f8-4734-befc-2a0b50687c52 -- tv home fileserver
bde87f8e-4740-411f-b068-3dbbd9db03d4 -- nas01 file server
cc1c45f3-7c96-4103-b9c0-cce0f5f2cc36 -- bethel_data_drive
eb0ca2b6-e8c6-43b0-a805-de17a4d7eeae -- naos_wd_4 backup drive
untrusted repositories: 0
transfers in progress: none
available local disk space: 3.63 terabytes (+1 megabyte reserved)
local annex keys: 0
local annex size: 0 bytes
annexed files in working tree: 59
size of annexed files in working tree: 1.67 gigabytes
bloom filter size: 32 mebibytes (0% full)
backend usage:
SHA256E: 59
ewen@ashram:/nas01/annex/purchased/mediabytes$
"""]]
For reference Synology DS216+ `/etc/samba/smb.conf`:
ewen@nas01:/volume1$ cat /etc/samba/smb.conf
[global]
printcap name=cups
winbind enum groups=yes
include=/var/tmp/nginx/smb.netbios.aliases.conf
min protocol=SMB2
security=user
local master=no
realm=*
passdb backend=smbpasswd
printing=cups
max protocol=SMB3
winbind enum users=yes
load printers=yes
workgroup=WORKGROUP
ewen@nas01:/volume1$
and the relevant share:
ewen@nas01:/volume1$ cat /etc/samba/smb.share.conf
[annex]
recycle bin admin only=no
ftp disable modify=no
ftp disable download=no
write list=nobody,nobody
browseable=yes
mediaindex=no
hide unreadable=no
win share=yes
enable recycle bin=no
invalid users=nobody,nobody
read list=nobody,nobody
ftp disable list=no
edit synoacl=yes
valid users=nobody,nobody
writeable=yes
guest ok=yes
path=/volume1/annex
skip smb perm=yes
comment="Git Annexes"
[music]
[...]
ewen@nas01:/volume1$
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I'm an extremely regular user of git-annex on OS X and Linux, for several years, using it as a podcatcher and to manage most of my "large file" media. It's one of those "couldn't live without" tools. Thanks for writing it.
(Touched, to ask for comments to be emailed to me)

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="git annex standalone on Synology NAS"
date="2017-04-23T07:52:19Z"
content="""
After posting this bug, I found a [git annex tip on running a standalone build on the Synology NAS](http://git-annex.branchable.com/tips/Synology_NAS_and_git_annex/), so that may provide another workaround for me. Particularly since the Synology DS216+ is actually a x86-64 based system:
ewen@nas01:/volume1$ uname -a
Linux nas01 3.10.102 #15047 SMP Thu Feb 23 02:23:28 CST 2017 x86_64 GNU/Linux synology_braswell_216+II
ewen@nas01:/volume1$
but it is probably still worth figuring out the cause of the transfer lock issue on SMB for other SMB use cases.
Ewen
"""]]

View file

@ -1,6 +1,6 @@
Hi,
this is a minor issue and probably there is no better solution, but nevertheless I would like to point out it and maybe discuss a little about the issue.
this is a minor issue and probably there is no better solution, but nevertheless I would like to point it out and maybe discuss a little about the issue.
Given that the symlinks generated by annex are pretty large in size (they point to a file named by a large hash number), ext4 is using an entire block (4K) of storage instead of [embedding the symlink into the inode][inode] itself. For the "archivist use case" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.

View file

@ -0,0 +1,21 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 1"
date="2017-04-24T13:55:02Z"
content="""
> For the \"archivist use case\" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.
$ pwd
/home/sorting_annex/mnt/keyfile
$ du -shc *-*
...
33M fd0dc9d3-ad62-429e-ba1b-acc26a453ca4
33M fd2fc989-bea7-4ffb-bbc8-2e34cd0e5be5
33M fd79bbd4-d41e-4ea8-acc8-86437c5eed7c
33M ffbd042e-f6d9-4450-9a57-8ed1086f587c
2.7G total
$
Just a bit :P (yes, that is 2.7G of symlinks *so far*)
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="git annex get ... transfer lock issues"
date="2017-04-23T07:36:08Z"
content="""
The fix a couple of months ago appears to allow `git annex init` to complete on a SMB file share, which is great. But FTR there are currently [issues with file transfers](https://git-annex.branchable.com/bugs/SMB__58___git_annex_clone_works__44___get_fails_on_transfer_lock/) (link is to bug report about failing to get transfer lock), so use with a NAS is still \"work in progress\".
Ewen
"""]]

View file

@ -1,5 +1,7 @@
Just as git does not scale well with large files, it can also become painful to work with when you have a large *number* of files. Below are things I have found to minimise the pain.
[[!toc]]
# Using version 4 index files
During operations which affect the index, git writes an entirely new index out to index.lck and then replaces .git/index with it. With a large number of files, this index file can be quite large and take several seconds to write every time you manipulate the index!
@ -40,9 +42,14 @@ If it takes a long time to list the files in a directory, naturally, git(-annex)
You can avoid this by keeping the number of files in a directory to between 5000 and 20000 (depends on the filesystem and its settings).
[fpart](http://contribs.martymac.org/fpart/) can be a very useful tool to achieve this.
[fpart](https://sourceforge.net/projects/fpart/) can be a very useful tool to achieve this.
## Topics discussing this sort of usage
This sort of usage was discussed in [[forum/Handling_a_large_number_of_files]] and [[forum/__34__git_annex_sync__34___synced_after_8_hours]]. -- [[CandyAngel]]
* [[forum/Handling_a_large_number_of_files]]
* [[forum/__34__git_annex_sync__34___synced_after_8_hours]]
# Forget tracking information
In addition to keeping track of where files are, git-annex keeps a *log* that keeps track of where files *were*. This can take up space as well and slow down certain operations.
You can use the [[git-annex-forget]] command to drop historical location tracking info for files.
Note: this was discussed in [[forum/scalability_with_lots_of_files]]. -- [[anarcat]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="anarcat"
avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
subject="merge with scalability?"
date="2017-04-24T14:10:29Z"
content="""
shouldn't this tip be merged into the [[scalability]] page directly?
"""]]

View file

@ -1,3 +1,5 @@
Note: this is the reverse of [[splitting_a_repository]].
Scenario
--------

View file

@ -0,0 +1,58 @@
[[!meta title="Splitting a git-annex repository"]]
Note: this is the reverse of [[migrating two seperate disconnected directories to git annex]].
I have a [git annex](https://git-annex.branchable.com/) repo for all my media
that has grown to 57866 files and git operations are getting slow, especially
on external spinning hard drives, so I decided to split it into separate
repositories.
This is how I did it, with some help from `#git-annex`. Suppose the old big repo is at `~/oldrepo`:
```
# Create a new repo for photos only
mkdir ~/photos
cd photos
git init
git annex init laptop
# Hardlink all the annexed data from the old repo
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
# Regenerate the git annex metadata
git annex fsck --fast
# Also split the repo on the usb key
cd /media/usbkey
git clone ~/photos
cd photos
git annex init usbkey
cp -rl ../oldrepo/.git/annex/objects .git/annex/
git annex fsck --fast
# Connect the annexes as remotes of each other
git remote add laptop ~/photos
cd ~/photos
git remote add usbkey /media/usbkey
```
At this point, I went through all repos doing standard cleanup:
```
# Remove unneeded hard links
git annex unused
git annex dropunused --force 1-12345
# Sync
git annex sync
```
To make sure nothing is missing, I used `git annex find --not --in=here`
to see if, for example, the usbkey that should have everything could be missing
some thing.
Update: Antoine Beaupré pointed me to
[this tip about Repositories with large number of files](http://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/)
which I will try next time one of my repositories grows enough to hit a performance issue.
> This document was originally written by [Enrico Zini](http://www.enricozini.org/blog/2017/debian/splitting-a-git-annex-repository/) and added to this wiki by [[anarcat]].

View file

@ -3,3 +3,43 @@ there's a lot of good documentation on this wiki, but it's hard to find sometime
a good example of this problem is [[todo/document_standard_groups_more_extensively_in_the_UI]]. --[[anarcat]]
update: a beginning of this may be the the [[workflow]] page but it lacks a lot of details...
So we have those entry points so far:
* [[git-annex]] - the manpage
* [[walkthrough]] - "A walkthrough of some of the basic features of git-annex, using the command line", described as "only one possible workflow for using git-annex"
* [[assistant]] - a whole subtree of pages describing the assistant, includes a [[assistant/quickstart]] - introduction to the assistant with a series of screenshots, described in [[walkthrough]] as "If you don't want to use the command line, see quickstart instead.", linked from the [[assistant]] page
* [[workflow]] - a summary of the different workflows that git-annex can use
* [[special remotes]] - a good list of "supported backends", which may be a better wording
* inversely, [[not]] is what is *not* supported, obviously
* [[install]] - how to install git-annex, of course
* [[tips]] - a mish-mash list of "how to do X in git-annex", 68 pages at the time of writing
* there's the "details" section on the frontpage which covers lots of the [[internals]], [[design]] and so on
* there are also what i consider to be "leaf" pages like [[how it works]] or [[sync]] there
So it seems the fundamentals of such a user guide are there. It's just a matter of grouping this in a meaningful way.
I am thinking the following structure may be a good basis:
* Introduction
* [[how it works]]?
* ...?
* [[Install|install]]
* Walkthrough
* [[comandline|walkthrough]]
* [[graphical|assistant/quickstart]]?
* How do I...
* [[Supported backends|special remotes]]
* [[Unsupported features|not]]
* [[split repositories|tips/splitting_a_repository]]
* [[merge repositories|tips/migrating_two_seperate_disconnected_directories_to_git_annex]]
* deal with lots of files: [[tips/Repositories_with_large_number_of_files]] or merge into [[scalability/]]?
* decide which files go where? something about [[preferred_content]] and [[preferred_content/standard_groups/]]?
* sort and regroup the best [[tips]] pages
* Troubleshooting / FAQ?
* [[Common mistakes|tips/antipatterns]]
* sort out the best of [[forums]] and [[tips]] that commonly occur
* [[Design]]
* [[internals]]
* [[encryption]]
* ... etc - all the developer stuff users shouldn't usually have to know unless they care about performance or need to reimplement something

View file

@ -27,6 +27,8 @@ produce file changes. When you move files into or out of your repository
folder, git-annex should record the changes and automatically propagate
them to other connected machines.
The webapp is part of the larger [[assistant]].
# 2. [[git annex assistant|git-annex-assistant]] without the webapp
You could call [[`git annex assistant`|git-annex-assistant]] the