Merge branch 'master' into database
commit 823cc9b800
20 changed files with 318 additions and 3 deletions
@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 3"""
 date="2015-02-17T21:54:33Z"
 content="""
Since the two repos' git-annex branches have diverged, you need to run `git
annex merge` to merge them before you can push that branch.

Of course, `git annex sync` handles all that for you. It can be used
against a bare repository as well as a non-bare one.
"""]]
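A minimal sketch of the workflow the comment above describes; the remote name `origin` is an assumption:

<pre>
# resolve the diverged git-annex branches by hand, then push
git annex merge              # merges the remote's git-annex branch into the local one
git push origin git-annex    # the push is now a fast-forward
# or let sync handle the pull/merge/push in one step:
git annex sync origin
</pre>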
@@ -0,0 +1,88 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawlmLuHhscJsoAqb9q0N3LdtHum6LjY1LK4"
 nickname="Markus"
 subject="comment 7"
 date="2015-02-17T14:43:02Z"
 content="""
ssh -t makes no difference. The strace output is completely repetitive; only the futex and mmap calls show up at random positions (the mmap calls probably account for the enormous memory consumption):

    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {31, 737743240}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365100, 810332327}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {31, 737155560}) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    futex(0x2b32fb1c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b32fb18, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    futex(0x2b32fb48, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x41981d0, FUTEX_WAKE_PRIVATE, 1) = 1
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {31, 851239760}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365100, 933314386}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {31, 850549960}) = 0
    mmap2(0x30b00000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30b00000
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [HUP ILL TRAP KILL USR1 USR2 CHLD TSTP TTIN URG XFSZ VTALRM IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {56, 575838240}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365125, 751101804}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {56, 574935120}) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [ILL FPE KILL SEGV USR2 PIPE TERM STOP TSTP URG XCPU XFSZ VTALRM])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
"""]]
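For reference, a trace like the one above can be captured along these lines; the exact git-annex command, remote name, and output file are placeholders, not taken from the report:

<pre>
# follow forks, timestamp each call, and write to a file instead of the terminal
strace -f -tt -o annex.strace git annex copy --to myremote somefile
</pre>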
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="sairon"
 subject="comment 2"
 date="2015-02-17T15:04:55Z"
 content="""
looks like it was the assistant
"""]]
@@ -0,0 +1,9 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:39:23Z"
 content="""
Re finding repos, if the assistant is configured to automatically
start managing the repo at boot/login, the repo will be
listed in ~/.config/git-annex/autostart
"""]]
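A small usage sketch for the autostart file mentioned above:

<pre>
cat ~/.config/git-annex/autostart      # one repository path per line
git annex assistant --autostart        # start the assistant in every listed repository
</pre>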
@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
 Help me prioritize my work: What special remote would you most like
 to use with the git-annex assistant?
 
-[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 13 "OpenStack SWIFT" 36 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 14 "OpenStack SWIFT" 36 "Google Drive"]]
 
 This poll is ordered with the options I consider easiest to build
 listed first. Mostly because git-annex already supports them and they
@@ -1,3 +1,5 @@
+[[!meta title="day 254 sqlite for incremental fsck"]]
+
 Yesterday I did a little more investigation of key/value stores.
 I'd love a pure haskell key/value store that didn't buffer everything in
 memory, and that allowed concurrent readers, and was ACID, and production
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T20:15:36Z"
 content="""
@anarcat, see [[design/caching_database]] for my thinking on that.
"""]]
doc/devblog/day_255__sqlite_concurrent_writers_problem.mdwn (new file, 34 lines)
@@ -0,0 +1,34 @@
Worked today on making incremental fsck's use of sqlite be safe with
multiple concurrent fsck processes.

The first problem was that having `fsck --incremental` running and starting a
new `fsck --incremental` caused it to crash. And with good reason, since
starting a new incremental fsck deletes the old database, the old process
was left writing to a database that had been deleted and recreated out from
underneath it. Fixed with some locking.

Next problem is harder. Sqlite doesn't support multiple concurrent writers
at all. One of them will fail to write. It's not even possible to have two
processes building up separate transactions at the same time. Before using
sqlite, incremental fsck could work perfectly well with multiple fsck
processes running concurrently. I'd like to keep that working.

My partial solution, so far, is to make git-annex buffer writes, and every
so often send them all to sqlite at once, in a transaction. So most of the
time, nothing is writing to the database. (And if it gets unlucky and
a write fails due to a collision with another writer, it can just wait and
retry the write later.) This lets multiple processes write to the database
successfully.

But, for the purposes of concurrent, incremental fsck, it's not ideal.
Each process doesn't immediately learn of files that another process has
checked. So they'll tend to do redundant work. Only way I can see to
improve this is to use some other mechanism for short-term IPC between the
fsck processes.

----

Also, I made `git annex fsck --from remote --incremental` use a different
database per remote. This is a real improvement over the sticky bits;
multiple incremental fscks can be in progress at once,
checking different remotes.
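A rough illustration of the batching idea from the devblog entry above, using the sqlite3 command line tool; this is not git-annex's actual code, and the database and table names are made up:

<pre>
# one transaction per flush instead of one write per checked file;
# BEGIN IMMEDIATE takes the write lock up front, so a concurrent
# writer fails quickly with "database is locked" and can retry later
sqlite3 fsck.db 'CREATE TABLE IF NOT EXISTS fscked (key TEXT PRIMARY KEY)'
sqlite3 fsck.db <<'EOF'
BEGIN IMMEDIATE;
INSERT OR IGNORE INTO fscked VALUES ('SHA256E-s100--aaa');
INSERT OR IGNORE INTO fscked VALUES ('SHA256E-s200--bbb');
COMMIT;
EOF
</pre>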
@@ -106,11 +106,14 @@ with appropriate handling of the direct mode files.
 
 ## undoing changes in direct mode
 
-There is also the `undo` command to do the equivalent of the above revert in a simpler way. Say you made a change in direct mode, the assistant dutifully committed it and you realise your mistake, you can try:
+There is also the `undo` command to do the equivalent of the above revert
+in a simpler way. Say you made a change in direct mode, the assistant
+dutifully committed it and you realise your mistake, you can try:
 
     git annex undo file
 
-to revert the last change to `file`. Note that you can use the `--depth` flag to revert earlier versions of the file.
+to revert the last change to `file`. Note that you can use the `--depth`
+flag to revert earlier versions of the file.
 
 ## forcing git to use the work tree in direct mode
 
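A small usage sketch of the `undo` command described in the hunk above; the `--depth=2` value is an assumption about the flag's syntax, not taken from the manual:

<pre>
git annex undo somefile             # revert the most recent change to somefile
git annex undo --depth=2 somefile   # hypothetically, step back one change further
</pre>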
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="comment 16"
 date="2015-02-17T05:22:00Z"
 content="""
i believe this is [answered here](https://git-annex.branchable.com/todo/windows_support/#comment-e72601243c643d7821e68d3a04489fcb). TL;DR: NTFS + symlinks works on Linux, but not in Windows/Cygwin, which git-annex seems to be using. YMMV.
"""]]
@@ -0,0 +1,12 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawnPgn611P6ym5yyL0BS8rUzO0_ZKRldMt0"
 nickname="Samuel"
 subject="Resetting to the git-annex branch"
 date="2015-02-17T09:21:12Z"
 content="""
Well, it appears you explicitly asked for resetting to the git-annex branch with the following command:

    git reset --hard git-annex

To go back to the master branch, containing the symlinks, just do:

    git checkout master
"""]]
@@ -0,0 +1,24 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:31:42Z"
 content="""
There is never a reason to run "git reset --hard git-annex"! For that matter,
don't mess with the git-annex branch if you have not read and understood
the [[internals]] documentation. Even if you have, it's entirely the wrong
thing to be messing with in this situation. It has nothing at all to do
with your problem, except that after running that completely random reset
command, you now have two problems.

The right answer to your interrupted add is something like:

* `git reset --hard master`
* Or, run the `git-annex add` command again and let it resume.
* Or, run `git commit` to commit any changes the add made,
  followed by `git annex unannex` to back out adding those files.

Or, if this is an entirely new git repo that you have
never committed to before
(my guess based on the "bad default revision 'HEAD'" error),
just `rm -rf .git` and start over.
"""]]
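A hedged sketch of the commit-then-unannex option from the list above; `somefile` is a placeholder for whatever the interrupted add had staged:

<pre>
git commit -m 'finish interrupted git-annex add'
git annex unannex somefile    # backs the add out, leaving somefile as a regular file
</pre>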
@@ -0,0 +1,37 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="the actual process i use"
 date="2015-02-17T00:58:38Z"
 content="""
So it seems i am able to forget all of this within a matter of a few days, and since this is so error-prone, here is a more detailed explanation.

What I do is:

<pre>
git clone repo repo.test
cd repo.test
git annex indirect # be safe! this may take a while, but it's necessary!
git tag bak # keep track of a good working state
git log --stat --stat-count=3 # find the commits we want to trash
git tag firstbad badbeef1 # the first commit we want to kill
git tag keep dada1234 # the first commit we want to keep
git rebase -p --onto firstbad^ keep # drop everything between firstbad (inclusive) and keep (exclusive)
git diff --stat keep # make sure this did what we expected
git branch -D annex/direct/master synced/master # destroy these old branches that still have refs to the old commits
</pre>

Then for each repo:

<pre>
cd repo
git tag bak
git fetch origin # sync the master branch in
git remote prune origin # make sure the dropped branches are gone
git annex indirect # be safe
git reset --hard origin/master
git branch -D synced/master annex/direct/master
git diff --stat bak # should show what changed
</pre>

It would be useful to have that transition propagate properly everywhere so I don't have to do this in every repo, but at least the above should work fairly reliably.
"""]]
@@ -0,0 +1,13 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 9"""
 date="2015-02-17T21:43:16Z"
 content="""
It's entirely expected and normal for git-annex to update the UUID
of a remote with `url = somepath` when it notices that the repo at
`somepath` has changed.

This is what you want to happen. If git-annex didn't notice and react to
the UUID change, its location tracking information (for UUID A) would be
inconsistent with the actual status of the repo (using UUID B).
"""]]
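For reference, a sketch of where that cached UUID lives; the remote name `foo` is a placeholder:

<pre>
git config remote.foo.annex-uuid   # the UUID git-annex has recorded for the remote
git annex info foo                 # probes the remote; the recorded UUID may be refreshed here
</pre>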
@@ -0,0 +1,18 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2015-02-17T21:46:01Z"
 content="""
Yes, that's the same, except lookupkey only operates on files that are
checked into git.

(Also, lookupkey will work in a direct mode repo, while such a repo
may not have a symlink to examine.)

25ms doesn't seem bad for a "whole runtime" to fire up. :) I think most of
the overhead probably involves reading the git config and running
git-ls-files.

Note that lookupkey can be passed a whole set of files, so you could avoid
the startup overhead that way too.
"""]]
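A minimal sketch of the batching suggestion above; the file names are placeholders:

<pre>
# one git-annex startup for many files, instead of one per file
git annex lookupkey file1 file2 file3
</pre>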
@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:50:11Z"
 content="""
And yes, it's fine to bypass git-annex when querying git.

Or even when manipulating the git-annex branch, so long as you either
delete or update .git/annex/index. git-annex is not intended to be magical,
see [[internals]].
"""]]
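A hedged sketch of the delete option mentioned above, assuming git-annex will rebuild this index the next time it needs it:

<pre>
# after writing to the git-annex branch with plain git commands,
# drop git-annex's cached index of that branch
rm -f .git/annex/index
</pre>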
@@ -0,0 +1,15 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2015-02-17T21:41:27Z"
 content="""
I would not recommend running the assistant as root. Any security issue
would escalate to root access; any bug could result in some root-level
damage to the system.

Of course, I don't know of any such security issues or bugs. If I did, I'd
be fixing them.

On my system, /usr/local is managed by group staff. It seems much safer to
make the assistant be run by some non-root user who is in the staff group.
"""]]
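A sketch of the non-root setup described above, assuming a Debian-style system; the user name `annexuser` is made up:

<pre>
adduser annexuser                  # dedicated unprivileged user
adduser annexuser staff            # the staff group manages /usr/local here
su - annexuser -c 'git annex assistant --autostart'
</pre>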
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="joey"
 subject="""re: why md5sum?"""
 date="2015-02-17T21:51:59Z"
 content="""
Not all types of keys contain hashes.
"""]]
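For illustration, two key formats as I understand them; the field values shown are made up:

<pre>
# a SHA256E key embeds a content hash:
#   SHA256E-s1048576--9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08.jpg
# a WORM key does not; it is built from size, mtime, and filename:
#   WORM-s1048576-m1424210000--somefile.jpg
</pre>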
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="document in the manpage?"
 date="2015-02-17T05:28:33Z"
 content="""
the manpage makes a passing reference to \"groups\", but nowhere in the manpage is there a reference to this page, which i had to find through google. maybe this should be in the manpage?
"""]]
doc/todo/wishlist:_global_progress_status.mdwn (new file, 3 lines)
@@ -0,0 +1,3 @@
similar to [[do_not_bug_me_about_intermediate_files]] - i feel that massive `git annex get` operations should have better progress information than the current individual `rsync --progress` bits. i wonder if this couldn't be accomplished with `rsync --info=PROGRESS2`, which gives overall rsync progress, combined with copying multiple files at once with rsync (which would have the side-effect of speeding up `git annex get` for a large number of small files).

once this is done, it could be sent back to the webapp UI to give the user a global sense of the overall sync progress (as opposed to per-file progress). --[[anarcat]]
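A rough illustration of the rsync flag the todo mentions; git-annex does not currently invoke it this way, and the paths are placeholders:

<pre>
# one overall progress line for the whole transfer, instead of per-file output
rsync -a --info=progress2 srcdir/ destdir/
</pre>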