Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2015-06-16 18:56:20 -04:00
commit 8b0549b408
3 changed files with 21 additions and 2 deletions

@@ -0,0 +1,3 @@
I have noticed performance getting really slow when adding files (`git annex add .`) to a directory already containing several hundred thousand files. When using git annex, is it recommended to split large numbers of files into multiple directories containing fewer files? Is there a particular recommended way of handling large numbers of files (say, getting into the millions) in git annex?
Thanks

@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="anarcat"
subject="comment 2"
date="2015-06-16T20:10:50Z"
content="""
understood: i thought `-f` was `--from`... hence my confusion.
as for `remoteFsck`, i guess what i am saying is exactly that: there *does* seem to be a way to do a remote checksum of the file *without* downloading it. it seems to be a critical advantage over having to download the whole repository to check it... maybe `--fast` could use that technique and non-`--fast` would download?
as for the on-wire MD5 stuff, that does seem to be overkill...
"""]]

@@ -8,11 +8,11 @@ hook to do this. --[[Joey]]
There are two levels of checking it seems such a command could do:
1. Only allow certian files to be changed. For example, maye clients are only
1. Only allow certain files to be changed. For example, maybe clients are only
expected to change location tracking files, and the activity.log
file, but not others like trust.log.
2. Only allow moidiciations of data about a specific UUID. The UUID
2. Only allow modifications of data about a specific UUID. The UUID
would be provided to the command (and could be determined based on a
per-client ssh key or etc).
@@ -34,3 +34,8 @@ This might be too limiting for some situations:
changes to remote.log, which the first level of checking would not allow.
And, it would add another UUID, which the second level of checking would
need to be configured to allow.
Python implementation
---------------------
I started an implementation of this in Python. For technical reasons the git repo is not publicly available, but here's a [dump](http://paste.debian.net/232563/) of the code. I went through what seems to be a rather convoluted process with libgit there because I wanted to have some proper unit tests, and generating git commands by hand in a shell script is rather painful.

Also, it currently adopts a "blocking" approach, i.e. it blocks known problems, but maybe it should be based on an "allow" approach instead, that is: only allow certain things to go through. So far it only forbids removals and changes to trust.log. A bunch of stuff is still missing, like parameters (to allow changing the list of protected files) and checking the log tracking info. Feedback welcome. --[[anarcat]]
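
To make the difference between the "blocking" and "allow" approaches concrete, here is a minimal sketch. It is not the code from the dump above. It assumes the changes in a push have already been reduced to `(status, path)` pairs, as `git diff --name-status` would print them, and that location tracking files are the `.log` files living in subdirectories of the git-annex branch. The names `check_blocking`, `check_allow`, `PROTECTED_FILES` and `ALLOWED_TOP_LEVEL` are made up for illustration.

```python
from typing import Iterable, List, Tuple

# One changed file: (status, path), in the style of `git diff --name-status`
# ("A" added, "M" modified, "D" deleted).
Change = Tuple[str, str]

# Hypothetical defaults; a real hook would take these as parameters.
PROTECTED_FILES = {"trust.log"}
ALLOWED_TOP_LEVEL = {"activity.log"}


def is_location_log(path: str) -> bool:
    """Assumption: location tracking files are .log files in subdirectories."""
    return "/" in path and path.endswith(".log")


def check_blocking(changes: Iterable[Change]) -> List[str]:
    """Blocking approach: reject only known problems, accept everything else."""
    violations = []
    for status, path in changes:
        if status == "D":
            violations.append(f"removal of {path}")
        elif path in PROTECTED_FILES:
            violations.append(f"change to protected file {path}")
    return violations


def check_allow(changes: Iterable[Change]) -> List[str]:
    """Allow approach: accept only expected changes, reject everything else."""
    violations = []
    for status, path in changes:
        if status == "D":
            violations.append(f"removal of {path}")
        elif not (path in ALLOWED_TOP_LEVEL or is_location_log(path)):
            violations.append(f"change to disallowed file {path}")
    return violations
```

With this sketch, a change to remote.log passes the blocking check but is rejected by the allow check, which matches the limitation of the first level of checking described above.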