Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2019-01-04 15:12:02 -04:00
commit 9cf9ef5077
GPG key ID: DB12DB0FF05F8F38
7 changed files with 97 additions and 2 deletions

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="Chymera"
avatar="http://cdn.libravatar.org/avatar/555d585d6d78c68894ac90fd1e984309"
subject="comment 6"
date="2019-01-03T22:56:25Z"
content="""
git annex info | grep \"size of annexed files in working tree\"
This does nothing but hang and I am not sure whether it's git annex or grep that hangs:
chymera@clusterhost /mnt/overflow $ ps aux | ag annex
chymera 5884 0.0 0.0 139920 3388 pts/7 S+ 23:53 0:00 git annex info
chymera 5885 0.0 0.0 133216 900 pts/7 S+ 23:53 0:00 grep --colour=auto size of annexed files in working tree
chymera 5886 6.4 0.0 1074610112 102528 pts/7 Dl+ 23:53 0:05 /usr/bin/git-annex info
chymera 5905 0.0 0.0 11304 1084 pts/8 S+ 23:55 0:00 ag annex
chymera@clusterhost /mnt/overflow $ ps aux | ag git
chymera 5884 0.0 0.0 139920 3388 pts/7 S+ 23:53 0:00 git annex info
chymera 5886 6.3 0.0 1074610112 102528 pts/7 Dl+ 23:53 0:05 /usr/bin/git-annex info
chymera 5893 0.0 0.0 258580 4492 pts/7 S+ 23:54 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs -c core.bare=false cat-file --batch
chymera 5894 0.0 0.0 139920 3740 pts/7 S+ 23:54 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs -c core.bare=false cat-file --batch-check=%(objectname) %(objecttype) %(objectsize)
chymera 5909 0.0 0.0 11304 1032 pts/8 S+ 23:55 0:00 ag git
chymera@clusterhost /mnt/overflow $ ps aux | ag grep
chymera 5885 0.0 0.0 133216 900 pts/7 S+ 23:53 0:00 grep --colour=auto size of annexed files in working tree
chymera 5913 0.0 0.0 11304 1072 pts/8 S+ 23:55 0:00 ag grep
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="andrew"
avatar="http://cdn.libravatar.org/avatar/acc0ece1eedf07dd9631e7d7d343c435"
subject="comment 7"
date="2019-01-03T23:48:03Z"
content="""
Aaah, sorry, yeah, `git-annex info` is very slow; it checks many things locally and remotely (I've seen it run for 30min+ on some of my repos). No worries, I don't think we'll learn much more from that command than we learned from the `du` commands.
You do indeed have some unaccounted-for space in `.git`. I usually expect most of the space to be in the git-annex or git objects folders, but that only accounts for 1.6 GB of the 501 GB in your `.git` folder.
What are the outputs of `du -h -d 1 .git/` (a level-1 listing of the contents of `.git`) and `du -h -d 1 .git/annex/` (the same for the annex-specific folder)? That will help narrow down where the space is being eaten up. Perhaps `.git/annex/misctmp` or `.git/annex/tmp` are the culprits.
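If it helps, the same kind of listing can be sorted so that the biggest consumer ends up on the last lines. This is only a sketch: `dir_usage` is a made-up helper name, not a git-annex command, and it assumes a `du` that supports `-d` (GNU and BSD both should).

```shell
# Hypothetical helper: print the size (in KiB) of each immediate
# subdirectory of a directory, sorted numerically so that the largest
# entries (and finally the directory total) are printed last.
dir_usage() {
    du -k -d 1 "${1:-.git/annex}" | sort -n
}

# Example: dir_usage .git/annex
```

Sizes are reported in KiB rather than with `-h` so that a plain numeric sort works everywhere.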
"""]]

@ -0,0 +1,28 @@
[[!comment format=mdwn
username="Chymera"
avatar="http://cdn.libravatar.org/avatar/555d585d6d78c68894ac90fd1e984309"
subject="comment 8"
date="2019-01-04T03:52:22Z"
content="""
Additionally, I notice that the git-annex repository version my repo uses (5) is 2 versions old. Given the git-annex versions available on my distributions, I think I could bump this to 6. Do you suggest I do this now, or after I have this issue handled?
chymera@clusterhost ~/ni_data $ du -h -d 1 .git/
12K .git/info
52K .git/hooks
218M .git/objects
501G .git/annex
124K .git/refs
172K .git/logs
501G .git/
chymera@clusterhost ~/ni_data $ du -h -d 1 .git/annex/
499G .git/annex/misctmp
4.0K .git/annex/ssh
8.3M .git/annex/journal
60K .git/annex/keys
30M .git/annex/transfer
1.6G .git/annex/objects
4.0K .git/annex/tmp
501G .git/annex/
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="andrew"
avatar="http://cdn.libravatar.org/avatar/acc0ece1eedf07dd9631e7d7d343c435"
subject="comment 9"
date="2019-01-04T12:49:51Z"
content="""
Aaah. All the space is in `.git/annex/misctmp`. This is essentially a directory for git annex to stage things temporarily, but I don't know too much about what gets put in this directory and when it is safe to delete it (the only official documentation is in [internals](http://git-annex.branchable.com/internals/)).
One person had their `.git/annex/misctmp` dir fill up after [interrupting the assistant during transfers](https://git-annex.branchable.com/forum/misctmp_filling_up/), another person had their misctmp fill up [after interrupting git annex while it was switching to direct mode](https://git-annex.branchable.com/bugs/direct_command_leaves_repository_inconsistent_if_interrupted/).
Maybe one of those situations applies to you? Perhaps take a look at some of the files in misctmp and try to evaluate whether you feel they are safe to delete. They should have somewhat recognizable names. I don't know if running `git annex fsck` will clean up any of these files (Joey?).
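For a first look, something like the following lists the largest files in that directory. It is a sketch only: `largest_files` is a hypothetical helper, not part of git-annex, and it only reads the directory; nothing is deleted.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# biggest first, as "size-in-KiB <tab> path".
# Defaults: directory .git/annex/misctmp, N = 10.
largest_files() {
    dir=${1:-.git/annex/misctmp}
    count=${2:-10}
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Example: largest_files .git/annex/misctmp 20
```

The file names in the output are the thing to inspect before deciding whether anything is safe to remove.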
I would personally not rush into upgrading from v5. v6 has been deprecated, so the latest git-annex will auto-upgrade v6 to v7 (you can't have a v6 repo anymore). Your only options are therefore staying on v5 or upgrading to v7. But there are (currently) some significant differences that you need to be aware of. v7 no longer supports direct mode (it has features that are similar but not equivalent in all situations). v7 (and v6) take control of `git add`, so files are actually added to the annex (not git) when you use this command, unless you have configured largefiles (this makes it a bit more difficult to maintain repos that have a mix of git and git-annex files). Unlocked and locked files are also treated differently.
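As an illustration of the largefiles setting mentioned above, a `.gitattributes` entry along these lines should keep small files in git and send larger ones to the annex. The 100kb threshold is an arbitrary example value, not a recommendation for your repo.

```
* annex.largefiles=(largerthan=100kb)
```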
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="chocolate.camera@ec2ecab153906be21ac5f36652c33786ad0e0b60"
nickname="chocolate.camera"
avatar="http://cdn.libravatar.org/avatar/4f00dfc3ad590ef7492788b854ceba78"
subject="comment 1"
date="2019-01-04T16:11:12Z"
content="""
I still do not know how to fix the repo. An Internet search seems to indicate that the error comes from git-annex being written in Haskell: see [A newcomer's run-in with lazy I/O](https://ianthehenry.com/2016/3/9/lazy-io/).
"""]]

@ -0,0 +1,6 @@
Hello, the teams I work with have repositories for tracking CI pipelines and build scripts. There are binary resources and sensitive information that we would like to store with the repo somehow, but in a secure fashion. Would a scenario like this be feasible with git-annex?
* create an annex attached to existing code repositories, with S3 as the special remote.
* each developer is able to read from and add to the encrypted bucket, using either the key they use for signed commits or an ssh key.
We already reject non-signed commits, our repositories are not public-facing, and S3 is not accessible without credentials. The developers with access to the repository all have the same access level within the company, with permission to do what they must with the keys. I'm sorry if this is an obvious 'yes' or 'no' question. Using git-annex privately as a file store for myself has been excellent thus far.

@ -30,8 +30,8 @@ designed to interoperate with it.
uses git-annex to "create a two-way, distributed content distribution
network for communities with poor connexions to the internet".
-* [The Japanese American Legacy Project](http://www.densho.org/)
-uses git-annex to manage upwards of 100 terabytes of collections,
+* [Densho](http://www.densho.org/)
+uses git-annex to manage upwards of 10+ terabytes of collections,
transporting them from small cultural heritage sites on USB drives.
User interface is a [Django web app](https://github.com/densho).