Merge branch 'master' into database

Joey Hess 2015-02-18 14:12:34 -04:00
commit 823cc9b800
20 changed files with 318 additions and 3 deletions

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2015-02-17T21:54:33Z"
content="""
Since the two repos' git-annex branches have diverged, you need to run `git
annex merge` to merge them before you can push that branch.
Of course, `git annex sync` handles all that for you. It can be used
against a bare repository as well as a non-bare.
"""]]

@ -0,0 +1,88 @@
[[!comment format=c
username="https://www.google.com/accounts/o8/id?id=AItOawlmLuHhscJsoAqb9q0N3LdtHum6LjY1LK4"
nickname="Markus"
subject="comment 7"
date="2015-02-17T14:43:02Z"
content="""
ssh -t makes no difference, the strace output:
it's completely repetitive, only the futex and mmap calls are at random positions (mmap probably leads to the enormous memory consumption)
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
clock_gettime(0x2 /* CLOCK_??? */, {31, 737743240}) = 0
clock_gettime(CLOCK_MONOTONIC, {365100, 810332327}) = 0
clock_gettime(0x3 /* CLOCK_??? */, {31, 737155560}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x2b32fb1c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b32fb18, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2b32fb48, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x41981d0, FUTEX_WAKE_PRIVATE, 1) = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
clock_gettime(0x2 /* CLOCK_??? */, {31, 851239760}) = 0
clock_gettime(CLOCK_MONOTONIC, {365100, 933314386}) = 0
clock_gettime(0x3 /* CLOCK_??? */, {31, 850549960}) = 0
mmap2(0x30b00000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30b00000
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [HUP ILL TRAP KILL USR1 USR2 CHLD TSTP TTIN URG XFSZ VTALRM IO PWR])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
clock_gettime(0x2 /* CLOCK_??? */, {56, 575838240}) = 0
clock_gettime(CLOCK_MONOTONIC, {365125, 751101804}) = 0
clock_gettime(0x3 /* CLOCK_??? */, {56, 574935120}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [ILL FPE KILL SEGV USR2 PIPE TERM STOP TSTP URG XCPU XFSZ VTALRM])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="sairon"
subject="comment 2"
date="2015-02-17T15:04:55Z"
content="""
looks like it was the assistant
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-02-17T21:39:23Z"
content="""
Re finding repos, if the assistant is configured to automatically
start managing the repo at boot/login, the repo will be
listed in ~/.config/git-annex/autostart
"""]]

@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
Help me prioritize my work: What special remote would you most like
to use with the git-annex assistant?
[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 13 "OpenStack SWIFT" 36 "Google Drive"]]
[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 14 "OpenStack SWIFT" 36 "Google Drive"]]
This poll is ordered with the options I consider easiest to build
listed first. Mostly because git-annex already supports them and they

@ -1,3 +1,5 @@
[[!meta title="day 254 sqlite for incremental fsck"]]
Yesterday I did a little more investigation of key/value stores.
I'd love a pure haskell key/value store that didn't buffer everything in
memory, and that allowed concurrent readers, and was ACID, and production

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-02-17T20:15:36Z"
content="""
@anarcat, see [[design/caching_database]] for my thinking on that.
"""]]

@ -0,0 +1,34 @@
Worked today on making incremental fsck's use of sqlite be safe with
multiple concurrent fsck processes.
The first problem was that having `fsck --incremental` running and starting a
new `fsck --incremental` caused it to crash. And with good reason, since
starting a new incremental fsck deletes the old database, the old process
was left writing to a database that had been deleted and recreated out from
underneath it. Fixed with some locking.
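
The post does not say how the locking works, so the following is only a sketch of one possible scheme, not git-annex's actual implementation. It assumes a Unix system with `flock`, and the names `hold_shared_lock`, `try_reset_database`, and the lock file path are hypothetical. A running fsck holds a shared lock; a new fsck may delete the database only if it can take an exclusive lock, which fails while any other process is still using the database.

```python
import fcntl
import os


def hold_shared_lock(lock_path):
    """Take a shared lock for the lifetime of an fsck run.

    Returns the open file; closing it releases the lock.
    """
    f = open(lock_path, "a")
    fcntl.flock(f, fcntl.LOCK_SH)
    return f


def try_reset_database(lock_path, db_path):
    """Delete the database only if nobody else holds the lock.

    Returns True if the database was reset, False if it is in use.
    """
    with open(lock_path, "a") as f:
        try:
            # Non-blocking: fail at once if another fsck holds a shared lock.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        if os.path.exists(db_path):
            os.remove(db_path)
        return True
```

In this sketch a process that gets `False` back would simply keep using the existing database instead of recreating it.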
Next problem is harder. Sqlite doesn't support multiple concurrent writers
at all. One of them will fail to write. It's not even possible to have two
processes building up separate transactions at the same time. Before using
sqlite, incremental fsck could work perfectly well with multiple fsck
processes running concurrently. I'd like to keep that working.
My partial solution, so far, is to make git-annex buffer writes, and every
so often send them all to sqlite at once, in a transaction. So most of the
time, nothing is writing to the database. (And if it gets unlucky and
a write fails due to a collision with another writer, it can just wait and
retry the write later.) This lets multiple processes write to the database
successfully.
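
The buffering approach described above can be sketched roughly as follows. This is an illustration in Python using the standard `sqlite3` module, not the code git-annex uses; the class name `BufferedFsckLog`, the `fscked` table, and the `flush_every`, `retries`, and `delay` parameters are all assumptions made for the example.

```python
import sqlite3
import time


class BufferedFsckLog:
    """Buffer writes in memory and send them to sqlite in one transaction."""

    def __init__(self, db_path, flush_every=100):
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves;
        # the short timeout makes a busy database fail quickly.
        self.conn = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS fscked (key TEXT PRIMARY KEY)")
        self.pending = []
        self.flush_every = flush_every

    def record(self, key):
        """Remember that a key was checked; flush once the buffer is full."""
        self.pending.append(key)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self, retries=5, delay=0.1):
        """Write all buffered keys at once. Returns False if every attempt
        collided with another writer; the keys stay buffered for later."""
        if not self.pending:
            return True
        for _ in range(retries):
            try:
                self.conn.execute("BEGIN IMMEDIATE")
                self.conn.executemany(
                    "INSERT OR IGNORE INTO fscked (key) VALUES (?)",
                    [(k,) for k in self.pending])
                self.conn.execute("COMMIT")
                self.pending.clear()
                return True
            except sqlite3.OperationalError:
                # Typically "database is locked": back off and retry.
                if self.conn.in_transaction:
                    self.conn.execute("ROLLBACK")
                time.sleep(delay)
        return False
```

Because nothing is removed from `pending` until the commit succeeds, a failed flush loses no data; the same keys are simply retried on the next call.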
But, for the purposes of concurrent, incremental fsck, it's not ideal.
Each process doesn't immediately learn of files that another process has
checked. So they'll tend to do redundant work. Only way I can see to
improve this is to use some other mechanism for short-term IPC between the
fsck processes.
----
Also, I made `git annex fsck --from remote --incremental` use a different
database per remote. This is a real improvement over the sticky bits;
multiple incremental fscks can be in progress at once,
checking different remotes.
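
As a rough illustration of keeping one database per remote, the database path can be derived from the remote's UUID. The directory layout and the function name `fsck_db_path` below are hypothetical and are not taken from git-annex itself.

```python
import os


def fsck_db_path(annex_dir, remote_uuid=None):
    """Return a database path that is distinct for each remote.

    The layout is an assumption for this sketch; None stands for a
    fsck of the local repository.
    """
    subdir = remote_uuid if remote_uuid is not None else "local"
    return os.path.join(annex_dir, "fsck", subdir, "fsck.db")
```

Since each remote maps to its own file, incremental fscks of different remotes never contend for the same sqlite database.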

@ -106,11 +106,14 @@ with appropriate handling of the direct mode files.
## undoing changes in direct mode
There is also the `undo` command to do the equivalent of the above revert in a simpler way. Say you made a change in direct mode, the assistant dutifully committed it and you realise your mistake, you can try:
There is also the `undo` command to do the equivalent of the above revert
in a simpler way. Say you made a change in direct mode, the assistant
dutifully committed it and you realise your mistake, you can try:
git annex undo file
to revert the last change to `file`. Note that you can use the `--depth` flag to revert earlier versions of the file.
to revert the last change to `file`. Note that you can use the `--depth`
flag to revert earlier versions of the file.
## forcing git to use the work tree in direct mode

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="comment 16"
date="2015-02-17T05:22:00Z"
content="""
i believe this is [answered here](https://git-annex.branchable.com/todo/windows_support/#comment-e72601243c643d7821e68d3a04489fcb). TLDR; basically NTFS + symlink works in Linux, but not in Windows/Cygwin, which git-annex seems to be using. YMMV.
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnPgn611P6ym5yyL0BS8rUzO0_ZKRldMt0"
nickname="Samuel"
subject="Resetting to the git-annex branch"
date="2015-02-17T09:21:12Z"
content="""
Well, it appears you explicitly asked to reset to the git-annex branch with the following command
git reset --hard git-annex
To go back to the master branch, containing the symlinks, just do
git checkout master
"""]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-02-17T21:31:42Z"
content="""
There is never a reason to run "git reset --hard git-annex"! For that matter,
don't mess with the git-annex branch if you have not read and understand
the [[internals]] documentation. Even if you have, it's entirely the wrong
thing to be messing with in this situation. It has nothing at all to do
with your problem, except that after running that completely random reset
command, you now have two problems.
The right answer to your interrupted add is something like:
* `git reset --hard master`
* Or, run the `git-annex add` command again and let it resume
* Or, run `git commit` to commit any changes the add made,
followed by `git annex unannex` to back out adding those files.
Or, if this is an entirely new git repo that you have
never committed to before
(my guess based on the "bad default revision 'HEAD'"),
just `rm -rf .git` and start over.
"""]]

@ -0,0 +1,37 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="the actual process i use"
date="2015-02-17T00:58:38Z"
content="""
So it seems i am able to forget all of this within the matter of a few days, and since this is so error prone, here goes a more detailed explanation.
What I do is:
<pre>
git clone repo repo.test
cd repo.test
git annex indirect # be safe! this may take a while, but it's necessary!
git tag bak # keep track of a good working state
git log --stat --stat-count=3 # find the commits we want to trash
git tag firstbad badbeef1 # the first commit we want to kill
git tag keep dada1234 # the first commit we want to keep
git rebase -p --onto firstbad^ keep # drop everything between firstbad (inclusive) and keep (exclusive)
git diff --stat keep # make sure this did what we expected
git branch -D annex/direct/master synced/master # destroy this old branch that still has refs to the old commits
</pre>
Then for each repo:
<pre>
cd repo
git tag bak
git fetch origin # sync the master branch in
git remote prune origin # make sure the dropped branches are gone
git annex indirect # be safe
git reset --hard origin/master
git branch -D synced/master annex/direct/master
git diff --stat bak # should change
</pre>
It would be useful to have that transition propagate properly everywhere so I don't have to do this in every repo, but at least the above should work fairly reliably.
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2015-02-17T21:43:16Z"
content="""
It's entirely expected and normal for git-annex to update the UUID
of a remote with `url = somepath` when it notices that the repo at
`somepath` has changed.
This is what you want to happen. If git-annex didn't notice and react to
the UUID change, its location tracking information (for UUID A) would be
inconsistent with the actual status of the repo (using UUID B).
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2015-02-17T21:46:01Z"
content="""
Yes, that's the same, except lookupkey only operates on files that are
checked into git.
(Also, lookupkey will work in a direct mode repo, while such a repo
may not have a symlink to examine.)
25ms doesn't seem bad for a "whole runtime" to fire up. :) I think most of
the overhead probably involves reading the git config and running
git-ls-files.
Note that lookupkey can be passed a whole set of files, so you could avoid
the startup overhead that way too.
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-02-17T21:50:11Z"
content="""
And yes, it's fine to bypass git-annex when querying git.
Or even when manipulating the git-annex branch, so long as you either
delete or update .git/annex/index. git-annex is not intended to be magical,
see [[internals]].
"""]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2015-02-17T21:41:27Z"
content="""
I would not recommend running the assistant as root. Any security issue
would escalate to root access; any bug could result in root-level
damage to the system.
Of course, I don't know of any such security issues or bugs. If I did, I'd
be fixing them.
On my system, /usr/local is managed by group staff. It seems much safer to
make the assistant be run by some non-root user who is in the staff group.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""re: why md5sum?"""
date="2015-02-17T21:51:59Z"
content="""
Not all types of keys contain hashes.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="https://id.koumbit.net/anarcat"
subject="document in the manpage?"
date="2015-02-17T05:28:33Z"
content="""
the manpage makes a passing reference to \"groups\", but nowhere in the manpage is there a reference to this page, which i had to find through google. maybe this should be in the manpage?
"""]]

@ -0,0 +1,3 @@
similar to [[do_not_bug_me_about_intermediate_files]] - i feel that massive `git annex get` operations should have better progress information than the current individual `rsync --progress` bits. i wonder if this couldn't be accomplished with `rsync --info=PROGRESS2`, which gives overall rsync progress, combined with copying multiple files at once with rsync (which would have the side-effect of speeding up `git annex get` for a large number of small files).
once this is done, it could be sent back to the webapp UI to give the user a global sense of the overall sync progress (as opposed to per-file progress). --[[anarcat]]