Merge branch 'master' of ssh://git-annex.branchable.com

2015-06-16 18:56:20 -04:00 · 2015-06-16 18:56:20 -04:00 · 8b0549b408
commit 8b0549b408
parent 2c77fb5cae 0c69e6055d
3 changed files with 21 additions and 2 deletions
--- a/doc/forum/Handling_a_large_number_of_files.mdwn
+++ b/doc/forum/Handling_a_large_number_of_files.mdwn
@@ -0,0 +1,3 @@
+I have noticed performance getting really slow when adding files (`git annex add .`) to a directory that already contains several hundred thousand files. When using git annex, is it recommended to split large numbers of files across multiple directories, each containing fewer files? Is there a recommended way of handling large numbers of files (say, getting into the millions) in git annex?
+
+Thanks
--- a/doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment
+++ b/doc/todo/S3_fsck_support/comment_2_7a1ce64d362b8f75adf22709771a7787._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="anarcat"
+ subject="comment 2"
+ date="2015-06-16T20:10:50Z"
+ content="""
+Understood: I thought `-f` was `--from`, hence my confusion.
+
+As for `remoteFsck`, I guess what I am saying is exactly that: there *does* seem to be a way to do a remote checksum of the file *without* downloading it. That seems to be a critical advantage over having to download the whole repository to check it. Maybe `--fast` could use that technique, and a non-`--fast` fsck would download?
+
+As for the on-wire MD5 stuff, that does seem to be overkill.
+"""]]
--- a/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn
+++ b/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn
@@ -8,11 +8,11 @@ hook to do this. --[[Joey]]

 There are two levels of checking it seems such a command could do:

-1. Only allow certian files to be changed. For example, maye clients are only
+1. Only allow certain files to be changed. For example, maybe clients are only
   expected to change location tracking files, and the activity.log
   file, but not others like trust.log.

-2. Only allow moidiciations of data about a specific UUID. The UUID
+2. Only allow modifications of data about a specific UUID. The UUID
   would be provided to the command (and could be determined based on a
   per-client ssh key or etc).
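The two levels of checking above might be expressed as two small predicate functions. This is a hedged sketch: the function names, the pattern list, and the assumptions that location logs live at hashed paths like `abc/def/KEY.log` and contain lines of the form `<timestamp>s <0|1> <uuid>` are illustrative, not taken from git-annex's code.

```python
import fnmatch
from typing import Iterable, List, Sequence

# Hypothetical allowlist: location tracking logs, assumed to sit at
# hashed paths such as "abc/def/KEY.log", plus activity.log.
# Note: fnmatch's "*" also matches "/", so deeper paths match too.
ALLOWED_PATTERNS = ("*/*/*.log", "activity.log")


def disallowed_paths(changed: Iterable[str],
                     patterns: Sequence[str] = ALLOWED_PATTERNS) -> List[str]:
    """Level 1: return the changed paths that match no allowed pattern."""
    return [path for path in changed
            if not any(fnmatch.fnmatchcase(path, pat) for pat in patterns)]


def foreign_uuid_lines(changed_lines: Iterable[str], uuid: str) -> List[str]:
    """Level 2: return changed log lines that concern some other UUID.

    Assumes location-log lines look like "<timestamp>s <0|1> <uuid>";
    anything that does not parse that way is reported too.
    """
    bad = []
    for line in changed_lines:
        fields = line.split()
        if len(fields) != 3 or fields[2] != uuid:
            bad.append(line)
    return bad


print(disallowed_paths(["abc/def/SHA256E-s5--x.txt.log", "activity.log", "trust.log"]))
# ['trust.log']
```

Keeping both checks as pure functions over paths and lines makes them unit-testable without constructing git commits; the hook only has to feed them the diff of the pushed git-annex branch.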

@@ -34,3 +34,8 @@ This might be too limiting for some situations:
  changes to remote.log, which the first level of checking would not allow.
  And, it would add another UUID, which the second level of checking would
  need to be configured to allow.
+
+Python implementation
+---------------------
+
+I started doing an implementation of this in Python. For technical reasons the git repo is not publicly available, but here's a [dump](http://paste.debian.net/232563/) of the code. I went through what seems to be a rather convoluted process with libgit there because I wanted to have proper unit tests, and generating git commands by hand in a shell script is rather painful. Also, it currently takes a "blocking" approach, i.e. it blocks known problems; maybe it should instead take an "allow" approach, that is: only allow certain things to go through. So far it only forbids removals and changes to trust.log. A bunch of stuff is still missing, like parameters (to allow changing the list of protected files) and checking the log tracking info. Feedback welcome. --[[anarcat]]
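A minimal sketch of the "blocking" approach described above, written independently of the pasted code (which is not reproduced here): a `pre-receive` hook that, for updates to `refs/heads/git-annex`, lists changes with `git diff --name-status --no-renames` and rejects removals and changes to trust.log. The names `changes`, `problems`, `is_null`, and `PROTECTED` are hypothetical; an "allow" approach would replace `problems` with an allowlist check.

```python
#!/usr/bin/env python3
import subprocess
import sys
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical default; ideally this would be a hook parameter.
PROTECTED: FrozenSet[str] = frozenset({"trust.log"})


def is_null(sha: str) -> bool:
    """True for the all-zero id git uses for ref creation/deletion."""
    return set(sha) == {"0"}


def changes(old: str, new: str) -> List[Tuple[str, str]]:
    """Return (status, path) pairs between two commits."""
    out = subprocess.run(
        ["git", "diff", "--name-status", "--no-renames", old, new],
        check=True, capture_output=True, text=True).stdout
    return [tuple(line.split("\t", 1)) for line in out.splitlines()]


def problems(pairs: Iterable[Tuple[str, str]],
             protected: FrozenSet[str] = PROTECTED) -> List[str]:
    """Blocking approach: reject removals and changes to protected files."""
    errors = []
    for status, path in pairs:
        if status == "D":
            errors.append(f"removal of {path} is not allowed")
        elif path in protected:
            errors.append(f"changes to {path} are not allowed")
    return errors


def main() -> int:
    # pre-receive hooks receive "<old-sha> <new-sha> <refname>" lines on stdin.
    errors: List[str] = []
    for line in sys.stdin:
        old, new, ref = line.split()
        if ref != "refs/heads/git-annex" or is_null(old) or is_null(new):
            continue  # branch creation/deletion is not checked in this sketch
        errors.extend(problems(changes(old, new)))
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0
```

Installed as `hooks/pre-receive` in the bare repository, the script would end with a call to `sys.exit(main())`; a non-zero exit status makes git reject the whole push.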