Merge branch 'master' of ssh://git-annex.branchable.com
commit 38874c4fe0
4 changed files with 129 additions and 0 deletions
@@ -0,0 +1,23 @@
### Please describe the problem.

git-annex's S3 special remote doesn't seem to work with DRA (Durable Reduced Availability) buckets in Google Cloud Storage.

### What steps will reproduce the problem?

I created a DRA-style bucket in Google Cloud Storage:

    gsutil mb gs://gitannex-dra

Then I followed [this hint](https://gist.github.com/jterrace/4576324) to
set up use of GCS, except that it didn't work:

    git annex initremote gcs type=S3 encryption=none host=storage.googleapis.com port=80 bucket=gitannex-dra
    initremote gcs (checking bucket...) git-annex: Invalid argument.

### What version of git-annex are you using? On what operating system?

Debian Wheezy, git-annex version: 5.20141024~bpo70+1

### Please provide any additional information below.

There didn't seem to be any extra logs, and `--debug` didn't seem to add anything useful.

@@ -0,0 +1,33 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnwNDA50ZupMvOgpgDqzDRyu5B-mYlVwa4"
nickname="Andreas"
subject="comment 4"
date="2015-01-21T07:30:51Z"
content="""
This is what I see:

    ➜ ~ mkdir test
    ➜ ~ cd test
    ➜ test git init
    Initialized empty Git repository in /home/deas/test/.git/
    ➜ test git:(master)
    ➜ test git:(master) git annex init
    init ok
    (Recording state in git...)
    ➜ test git:(master) touch foobar.txt
    ➜ test git:(master) ✗ git annex add
    add foobar.txt ok
    (Recording state in git...)
    ➜ test git:(master) ✗ git annex direct
    commit
    [master (root-commit) a6e3d83] commit before switching to direct mode
    1 file changed, 1 insertion(+)
    create mode 120000 foobar.txt
    ok
    direct foobar.txt
    /home/deas/test/.git/annex/misctmp/tmp6895: rename: does not exist (No such file or directory)

    leaving this file as-is; correct this problem and run git annex fsck on it
    direct ok
    ➜ test git:(annex/direct/master)
"""]]

@@ -0,0 +1,30 @@
### Please describe the problem.

I have 2 indirect mode repos, both on network filesystems, that I have only used for adding
data on one end, then syncing via `git annex sync` and `git annex get`. The problem
is that `.nfs` copies are being made for each git-annex object data file, e.g.:

`./.git/annex/objects/34/2x/SHA256E-s4112535690--c5f0e5a8af7bf17dd4a8ca192c8ddfb01fe6ec10908c80cffa5ac64c00e28443.vtk.gz/.nfs0000000006d0018600002147`

Reading up on .nfs files, they are generated when "an open file is removed but is still being accessed".
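
The quoted condition can be reproduced in miniature. The following is only a sketch, assuming a POSIX system; the file name `object.dat` and its contents are made up for illustration and have nothing to do with git-annex's own code. On a local filesystem the deleted file simply stays readable through the open handle and no placeholder shows up, whereas an NFS client is known to rename it to a `.nfsXXXX` entry until the last handle is closed.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Delete a file while it is still open, then read it through the handle."""
    path = os.path.join(directory, "object.dat")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"annexed content")

    handle = open(path, "rb")
    os.unlink(path)  # remove the name while the file is still open

    # On a local filesystem this list is empty; on NFS it would be expected
    # to contain a .nfsXXXX placeholder for as long as the handle stays open.
    leftovers = sorted(os.listdir(directory))

    data = handle.read()  # the open handle still works after the unlink
    handle.close()
    return data, leftovers


with tempfile.TemporaryDirectory() as d:
    data, leftovers = read_after_unlink(d)
    print(data, leftovers)
```

Running this inside the affected network mount (by passing a directory there instead of the temporary one) would be one way to check whether the `.nfs` entries come from this open-then-delete pattern.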

### What steps will reproduce the problem?
Clone a git-annex repo on a network file system, then run
`git annex sync`,
`git annex drop`, and
`git annex get`.

### What version of git-annex are you using? On what operating system?
* git-annex version: 5.20140818-g10bf03a
* 2.6.34.9-69.fc13.x86_64 Fedora 13
* 2.6.32-279.22.1.el6.x86_64 CentOS

### Please provide any additional information below.

[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log


# End of transcript or log.
"""]]

doc/forum/scalability_with_lots_of_files.mdwn (Normal file, 43 lines added)
@@ -0,0 +1,43 @@
What is git-annex's [[scalability]] with a large number (10k+) of files and a few (~10) repositories?

I have had a hard time maintaining a music archive of around 20k files, spread around 17 repositories.

`ncdu` tells me, of the actual files in the direct repository:

<pre>
$ ncdu --exclude .git
Total disk usage: 109,3GiB Apparent size: 109,3GiB Items: 23771
</pre>

Now looking at the git-annex metadata:

<pre>
$ time git clone -b git-annex /srv/mp3
Cloning into 'mp3'...
done.
Checking out files: 100% (31207/31207), done.
0.69user 1.72system 0:04.65elapsed 51%CPU (0avgtext+0avgdata 47732maxresident)k
40inputs+489552outputs (1major+2906minor)pagefaults 0swaps
$ git branch
  annex/direct/master
* git-annex
  master
$ wc -l uuid.log
7 uuid.log
$ find -type f | wc
31429 62214 3013920
$ du -sh .
361M .
$ du -sch * | tail -1
243M total
</pre>

So basically, it looks like the git-annex location tracking takes up around 243M, 361M if we include git's history of it (I assume). This means around 8KiB of storage per file, and 4KiB/file for history (git is doing a pretty good job here). (8KiB kind of makes sense here: one file for the tracking log (4KiB) and another directory to hold it (another 4KiB)...)
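
The per-file figures can be rechecked from the transcript. This is just a back-of-the-envelope sketch: it assumes `du`'s `M` means MiB and divides by the 31207 checked-out files rather than the 31429 reported by `find`, which also counts files under `.git`. The function and constant names are made up for illustration.

```python
def kib_per_file(total_mib, file_count):
    """Average size per file in KiB, given a total in MiB."""
    return total_mib * 1024 / file_count


CHECKOUT_MIB = 243       # `du -sch *`: the checked-out git-annex branch
WITH_HISTORY_MIB = 361   # `du -sh .`: checkout plus the .git directory
FILES = 31207            # "Checking out files: 100% (31207/31207)"

tracking = kib_per_file(CHECKOUT_MIB, FILES)
history = kib_per_file(WITH_HISTORY_MIB - CHECKOUT_MIB, FILES)
print(f"tracking: {tracking:.1f} KiB/file, history: {history:.1f} KiB/file")
```

Both numbers land close to the 8KiB and 4KiB estimates, and using the `find` count instead changes them only slightly.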

Is that about right? Are there ways to compress that somehow? Could I at least drop the *history* of that from git without too much harm - that would already save 120MiB...

That repository is around 18 months old.

(It's interesting to notice the limitation of the "one file per record" storage format here: since git-annex has so many little files, and each of those takes at least $blocksize (it seems to be 4KiB here), it takes up space pretty quickly. Another good point for git here: packing files together saves a *lot* of space! Could files be packed *before* being stored in the git-annex branch? Or is that totally stupid? :)
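
The block-size effect can be measured directly. The sketch below assumes a POSIX system where `st_blocks` is reported in 512-byte units; the log line is a made-up stand-in for a location log entry, not real git-annex data. It writes 100 tiny files and compares their apparent size with the space actually allocated for them, which depends on the filesystem.

```python
import os
import tempfile

# Hypothetical one-line record, standing in for a small per-file log.
LINE = "1421825451.0s 1 00000000-0000-0000-0000-000000000000\n"


def disk_usage(directory):
    """Return (apparent_bytes, allocated_bytes) for regular files under directory."""
    apparent = allocated = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
    return apparent, allocated


with tempfile.TemporaryDirectory() as d:
    for i in range(100):
        with open(os.path.join(d, f"{i}.log"), "w") as f:
            f.write(LINE)
    apparent, allocated = disk_usage(d)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a filesystem with 4KiB blocks the allocated figure would be expected to come out far larger than the apparent one, though filesystems that store small files inline may report something quite different.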

Thanks! --[[anarcat]]