Merge branch 'master' into database
commit 823cc9b800
20 changed files with 318 additions and 3 deletions
@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 3"""
 date="2015-02-17T21:54:33Z"
 content="""
Since the two repos' git-annex branches have diverged, you need to run `git
annex merge` to merge them before you can push that branch.

Of course, `git annex sync` handles all that for you. It can be used
against a bare repository as well as a non-bare one.
"""]]
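A minimal sketch of the workflow the comment above describes; the remote name `origin` is an assumption:

<pre>
# resolve the diverged git-annex branches by hand, then push
git annex merge              # merges the remote's git-annex branch into the local one
git push origin git-annex    # the push is now a fast-forward
# or let sync handle the pull/merge/push in one step:
git annex sync origin
</pre>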
@@ -0,0 +1,88 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawlmLuHhscJsoAqb9q0N3LdtHum6LjY1LK4"
 nickname="Markus"
 subject="comment 7"
 date="2015-02-17T14:43:02Z"
 content="""
ssh -t makes no difference. The strace output is completely repetitive; only the futex and mmap calls show up at random positions (the mmap calls probably account for the enormous memory consumption):

    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {31, 737743240}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365100, 810332327}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {31, 737155560}) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    futex(0x2b32fb1c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b32fb18, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
    futex(0x2b32fb48, FUTEX_WAKE_PRIVATE, 1) = 1
    futex(0x41981d0, FUTEX_WAKE_PRIVATE, 1) = 1
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {31, 851239760}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365100, 933314386}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {31, 850549960}) = 0
    mmap2(0x30b00000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30b00000
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [HUP ILL TRAP KILL USR1 USR2 CHLD TSTP TTIN URG XFSZ VTALRM IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
    clock_gettime(0x2 /* CLOCK_??? */, {56, 575838240}) = 0
    clock_gettime(CLOCK_MONOTONIC, {365125, 751101804}) = 0
    clock_gettime(0x3 /* CLOCK_??? */, {56, 574935120}) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [ILL FPE KILL SEGV USR2 PIPE TERM STOP TSTP URG XCPU XFSZ VTALRM])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    sigreturn() = ? (mask now [QUIT ABRT BUS PIPE TERM CONT STOP URG IO PWR])
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
"""]]
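For reference, a trace like the one above can be captured along these lines; the exact git-annex command, remote name, and output file are placeholders, not taken from the report:

<pre>
# follow forks, timestamp each call, and write to a file instead of the terminal
strace -f -tt -o annex.strace git annex copy --to myremote somefile
</pre>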
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="sairon"
 subject="comment 2"
 date="2015-02-17T15:04:55Z"
 content="""
looks like it was the assistant
"""]]
@@ -0,0 +1,9 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:39:23Z"
 content="""
Re finding repos, if the assistant is configured to automatically
start managing the repo at boot/login, the repo will be
listed in ~/.config/git-annex/autostart
"""]]
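A small usage sketch for the autostart file mentioned above:

<pre>
cat ~/.config/git-annex/autostart      # one repository path per line
git annex assistant --autostart        # start the assistant in every listed repository
</pre>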
@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
 Help me prioritize my work: What special remote would you most like
 to use with the git-annex assistant?
 
-[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 13 "OpenStack SWIFT" 36 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 12 "Amazon Glacier (done)" 10 "Box.com (done)" 74 "My phone (or MP3 player)" 25 "Tahoe-LAFS" 14 "OpenStack SWIFT" 36 "Google Drive"]]
 
 This poll is ordered with the options I consider easiest to build
 listed first. Mostly because git-annex already supports them and they
@@ -1,3 +1,5 @@
+[[!meta title="day 254 sqlite for incremental fsck"]]
+
 Yesterday I did a little more investigation of key/value stores.
 I'd love a pure haskell key/value store that didn't buffer everything in
 memory, and that allowed concurrent readers, and was ACID, and production
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T20:15:36Z"
 content="""
@anarcat, see [[design/caching_database]] for my thinking on that.
"""]]
doc/devblog/day_255__sqlite_concurrent_writers_problem.mdwn (new file, 34 lines)
@@ -0,0 +1,34 @@
Worked today on making incremental fsck's use of sqlite be safe with
multiple concurrent fsck processes.

The first problem was that having `fsck --incremental` running and starting a
new `fsck --incremental` caused it to crash. And with good reason, since
starting a new incremental fsck deletes the old database, the old process
was left writing to a database that had been deleted and recreated out from
underneath it. Fixed with some locking.

Next problem is harder. Sqlite doesn't support multiple concurrent writers
at all. One of them will fail to write. It's not even possible to have two
processes building up separate transactions at the same time. Before using
sqlite, incremental fsck could work perfectly well with multiple fsck
processes running concurrently. I'd like to keep that working.

My partial solution, so far, is to make git-annex buffer writes, and every
so often send them all to sqlite at once, in a transaction. So most of the
time, nothing is writing to the database. (And if it gets unlucky and
a write fails due to a collision with another writer, it can just wait and
retry the write later.) This lets multiple processes write to the database
successfully.

But, for the purposes of concurrent, incremental fsck, it's not ideal.
Each process doesn't immediately learn of files that another process has
checked. So they'll tend to do redundant work. Only way I can see to
improve this is to use some other mechanism for short-term IPC between the
fsck processes.

----

Also, I made `git annex fsck --from remote --incremental` use a different
database per remote. This is a real improvement over the sticky bits;
multiple incremental fscks can be in progress at once,
checking different remotes.
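A rough illustration of the batching idea from the devblog entry above, using the sqlite3 command line tool; this is not git-annex's actual code, and the database and table names are made up:

<pre>
# one transaction per flush instead of one write per checked file;
# BEGIN IMMEDIATE takes the write lock up front, so a concurrent
# writer fails quickly with "database is locked" and can retry later
sqlite3 fsck.db 'CREATE TABLE IF NOT EXISTS fscked (key TEXT PRIMARY KEY)'
sqlite3 fsck.db <<'EOF'
BEGIN IMMEDIATE;
INSERT OR IGNORE INTO fscked VALUES ('SHA256E-s100--aaa');
INSERT OR IGNORE INTO fscked VALUES ('SHA256E-s200--bbb');
COMMIT;
EOF
</pre>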
@@ -106,11 +106,14 @@ with appropriate handling of the direct mode files.
 
 ## undoing changes in direct mode
 
-There is also the `undo` command to do the equivalent of the above revert in a simpler way. Say you made a change in direct mode, the assistant dutifully committed it and you realise your mistake, you can try:
+There is also the `undo` command to do the equivalent of the above revert
+in a simpler way. Say you made a change in direct mode, the assistant
+dutifully committed it and you realise your mistake, you can try:
 
     git annex undo file
 
-to revert the last change to `file`. Note that you can use the `--depth` flag to revert earlier versions of the file.
+to revert the last change to `file`. Note that you can use the `--depth`
+flag to revert earlier versions of the file.
 
 ## forcing git to use the work tree in direct mode
 
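A small usage sketch of the `undo` command described in the hunk above; the `--depth=2` value is an assumption about the flag's syntax, not taken from the manual:

<pre>
git annex undo somefile             # revert the most recent change to somefile
git annex undo --depth=2 somefile   # hypothetically, step back one change further
</pre>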
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="comment 16"
 date="2015-02-17T05:22:00Z"
 content="""
i believe this is [answered here](https://git-annex.branchable.com/todo/windows_support/#comment-e72601243c643d7821e68d3a04489fcb). TL;DR: NTFS + symlinks works on Linux, but not in Windows/Cygwin, which git-annex seems to be using. YMMV.
"""]]
@@ -0,0 +1,12 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawnPgn611P6ym5yyL0BS8rUzO0_ZKRldMt0"
 nickname="Samuel"
 subject="Resetting to the git-annex branch"
 date="2015-02-17T09:21:12Z"
 content="""
Well, it appears you explicitly asked for resetting to the git-annex branch with the following command:

    git reset --hard git-annex

To go back to the master branch, containing the symlinks, just do:

    git checkout master
"""]]
@@ -0,0 +1,24 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:31:42Z"
 content="""
There is never a reason to run "git reset --hard git-annex"! For that matter,
don't mess with the git-annex branch if you have not read and understood
the [[internals]] documentation. Even if you have, it's entirely the wrong
thing to be messing with in this situation. It has nothing at all to do
with your problem, except that after running that completely random reset
command, you now have two problems.

The right answer to your interrupted add is something like:

* `git reset --hard master`
* Or, run the `git-annex add` command again and let it resume.
* Or, run `git commit` to commit any changes the add made,
  followed by `git annex unannex` to back out adding those files.

Or, if this is an entirely new git repo that you have
never committed to before
(my guess based on the "bad default revision 'HEAD'" error),
just `rm -rf .git` and start over.
"""]]
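A hedged sketch of the commit-then-unannex option from the list above; `somefile` is a placeholder for whatever the interrupted add had staged:

<pre>
git commit -m 'finish interrupted git-annex add'
git annex unannex somefile    # backs the add out, leaving somefile as a regular file
</pre>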
@@ -0,0 +1,37 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="the actual process i use"
 date="2015-02-17T00:58:38Z"
 content="""
So it seems i am able to forget all of this within a matter of a few days, and since this is so error-prone, here is a more detailed explanation.

What I do is:

<pre>
git clone repo repo.test
cd repo.test
git annex indirect # be safe! this may take a while, but it's necessary!
git tag bak # keep track of a good working state
git log --stat --stat-count=3 # find the commits we want to trash
git tag firstbad badbeef1 # the first commit we want to kill
git tag keep dada1234 # the first commit we want to keep
git rebase -p --onto firstbad^ keep # drop everything between firstbad (inclusive) and keep (exclusive)
git diff --stat keep # make sure this did what we expected
git branch -D annex/direct/master synced/master # destroy these old branches that still have refs to the old commits
</pre>

Then for each repo:

<pre>
cd repo
git tag bak
git fetch origin # sync the master branch in
git remote prune origin # make sure the dropped branches are gone
git annex indirect # be safe
git reset --hard origin/master
git branch -D synced/master annex/direct/master
git diff --stat bak # should show what changed
</pre>

It would be useful to have that transition propagate properly everywhere so I don't have to do this in every repo, but at least the above should work fairly reliably.
"""]]
@@ -0,0 +1,13 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 9"""
 date="2015-02-17T21:43:16Z"
 content="""
It's entirely expected and normal for git-annex to update the UUID
of a remote with `url = somepath` when it notices that the repo at
`somepath` has changed.

This is what you want to happen. If git-annex didn't notice and react to
the UUID change, its location tracking information (for UUID A) would be
inconsistent with the actual status of the repo (using UUID B).
"""]]
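For reference, a sketch of where that cached UUID lives; the remote name `foo` is a placeholder:

<pre>
git config remote.foo.annex-uuid   # the UUID git-annex has recorded for the remote
git annex info foo                 # probes the remote; the recorded UUID may be refreshed here
</pre>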
@@ -0,0 +1,18 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2015-02-17T21:46:01Z"
 content="""
Yes, that's the same, except lookupkey only operates on files that are
checked into git.

(Also, lookupkey will work in a direct mode repo, while such a repo
may not have a symlink to examine.)

25ms doesn't seem bad for a "whole runtime" to fire up. :) I think most of
the overhead probably involves reading the git config and running
git-ls-files.

Note that lookupkey can be passed a whole set of files, so you could avoid
the startup overhead that way too.
"""]]
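A minimal sketch of the batching suggestion above; the file names are placeholders:

<pre>
# one git-annex startup for many files, instead of one per file
git annex lookupkey file1 file2 file3
</pre>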
@@ -0,0 +1,11 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2015-02-17T21:50:11Z"
 content="""
And yes, it's fine to bypass git-annex when querying git.

Or even when manipulating the git-annex branch, so long as you either
delete or update .git/annex/index. git-annex is not intended to be magical,
see [[internals]].
"""]]
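A hedged sketch of the delete option mentioned above, assuming git-annex will rebuild this index the next time it needs it:

<pre>
# after writing to the git-annex branch with plain git commands,
# drop git-annex's cached index of that branch
rm -f .git/annex/index
</pre>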
@@ -0,0 +1,15 @@
[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2015-02-17T21:41:27Z"
 content="""
I would not recommend running the assistant as root. Any security issue
would escalate to root access; any bug could result in some root-level
damage to the system.

Of course, I don't know of any such security issues or bugs. If I did, I'd
be fixing them.

On my system, /usr/local is managed by group staff. It seems much safer to
make the assistant be run by some non-root user who is in the staff group.
"""]]
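A sketch of the non-root setup described above, assuming a Debian-style system; the user name `annexuser` is made up:

<pre>
adduser annexuser                  # dedicated unprivileged user
adduser annexuser staff            # the staff group manages /usr/local here
su - annexuser -c 'git annex assistant --autostart'
</pre>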
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="joey"
 subject="""re: why md5sum?"""
 date="2015-02-17T21:51:59Z"
 content="""
Not all types of keys contain hashes.
"""]]
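For illustration, two key formats as I understand them; the field values shown are made up:

<pre>
# a SHA256E key embeds a content hash:
#   SHA256E-s1048576--9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08.jpg
# a WORM key does not; it is built from size, mtime, and filename:
#   WORM-s1048576-m1424210000--somefile.jpg
</pre>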
@@ -0,0 +1,7 @@
[[!comment format=mdwn
 username="https://id.koumbit.net/anarcat"
 subject="document in the manpage?"
 date="2015-02-17T05:28:33Z"
 content="""
the manpage makes a passing reference to \"groups\", but nowhere in the manpage is there a reference to this page, which i had to find through google. maybe this should be in the manpage?
"""]]
doc/todo/wishlist:_global_progress_status.mdwn (new file, 3 lines)
@@ -0,0 +1,3 @@
similar to [[do_not_bug_me_about_intermediate_files]] - i feel that massive `git annex get` operations should have better progress information than the current individual `rsync --progress` bits. i wonder if this couldn't be accomplished with `rsync --info=PROGRESS2`, which gives overall rsync progress, combined with copying multiple files at once with rsync (which would have the side-effect of speeding up `git annex get` for a large number of small files).

once this is done, it could be sent back to the webapp UI to give the user a global sense of the overall sync progress (as opposed to per-file progress). --[[anarcat]]
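A rough illustration of the rsync flag the todo mentions; git-annex does not currently invoke it this way, and the paths are placeholders:

<pre>
# one overall progress line for the whole transfer, instead of per-file output
rsync -a --info=progress2 srcdir/ destdir/
</pre>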