Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2022-06-06 11:55:02 -04:00
commit 7851d8fb42
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 120 additions and 0 deletions


@ -0,0 +1,25 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 16"
date="2022-06-05T12:55:04Z"
content="""
Nodatacow is not an option. It disables most safety guarantees of btrfs (also w.r.t. RAID, I believe) and is essentially a hack. As pointed out, it's also difficult to enable retroactively.
Autodefrag is fundamentally broken and might result in overall worse performance. Also not an option.
If a file is highly fragmented (which it might not be; the number of extents isn't an accurate representation of fragmentation on btrfs), the correct measure is to `btrfs filesystem defragment` it.
If git-annex db fragmentation really is a common problem on btrfs (again, you'd have to do actual performance tests; I'm not aware of other means of measuring fragmentation), perhaps an automatic defrag similar to git's automatic gc would be appropriate. Keys DB files range in the megabytes IME, so re-writing them every now and then shouldn't be a huge load.
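A minimal sketch of what such a periodic defrag could look like (the `.git/annex/keysdb` path is my assumption about where the keys database lives):
```
# recursively defragment the annex database directory;
# files in the megabyte range should rewrite quickly
btrfs filesystem defragment -r .git/annex/keysdb
```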
A file's on-disk fragmentation also shouldn't cause issues on DB writes, because btrfs is CoW; new writes don't go where the old data was. It can only impact read performance.
Free space fragmentation might impact write performance but that's not something a userspace program can or should solve. @yarikoptic try `btrfs balance start -dusage=10`.
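For reference, a hedged example of that invocation (the mount point is a placeholder):
```
# rewrite data block groups that are less than 10% used,
# consolidating free space
btrfs balance start -dusage=10 /mnt/annex
# monitor progress from another shell
btrfs balance status /mnt/annex
```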
I'm not sure I understand the situation 100%, but poor DB write performance with hangs and stalls can also be caused by long filesystem commits.
Writing a ton of file data to the filesystem only puts it in the cache; it gets written later. When that \"later\" occurs (usually after 30s), as I understand it, all of that data (potentially megabytes to gigabytes) needs to be written to disk *before* a DB's synchronous write returns.
This is essentially what git-annex does when getting data from a remote; it \"writes\" a file's data while downloading (without committing, so no disk activity whatsoever) and then does a sync at the end (lots of disk activity) before starting the next download. A DB commit at the end can take as long as it takes to write up to 30s worth of transferred data. With a fast source/pipe, that could potentially be a lot of data.
Btrfs itself also has issues with insane commit times under some conditions (dedup especially), compounding the issue.
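A quick way to see that cache-then-burst behavior in isolation (a sketch, independent of git-annex; the file path is a placeholder):
```
# returns almost immediately: the data only lands in the page cache
dd if=/dev/zero of=/mnt/annex/testfile bs=1M count=1024
# this is where the actual disk writes (and any stall) happen
time sync
```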
It might be worth looking into the timings of `git annex get`'s write operations using `iostat --human 1`.
"""]]


@ -0,0 +1,41 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 17"
date="2022-06-06T14:00:05Z"
content="""
FWIW, for Joey:
> I have made a standalone tarball built that way available here: https://downloads.kitenet.net/git-annex/linux/debuglocks/
I have tried that version. Placing a similar load, I just got some `move`s failing ([our issue + idea to just call it multiple times](https://github.com/dandi/dandisets/issues/176)) and some other oddities (not new, yet to figure out), but overall I have not spotted similar MVar messages :-/ (filing a comment to \"jinx\" it and rerunning again -- seeing some `get`s going ;))
For Atemu:
> I'm not aware of other means of measuring fragmentation
isn't https://github.com/kilobyte/compsize the one that reports the number of fragments, and thus the degree of fragmentation? As I summarized in a [comment above](http://git-annex.branchable.com/bugs/get_is_busy_doing_nothing/#comment-9c647c8d9837d46e45675de30ebfeefc), I have used it + `btrfs fi defrag`.
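For reference, a sketch of that measurement (the db path is my assumption):
```
# reports the number of extents alongside compression stats
compsize .git/annex/keysdb
```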
> It might be worth looking into the timings of git annex get's write operations using iostat --human 1.
well -- it confirms that there is \"write silence\" for a while followed by a burst, e.g.:
```
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
dm-0 50.00 800.0k 0.0k 0.0k 800.0k 0.0k 0.0k
dm-0 9.00 144.0k 0.0k 0.0k 144.0k 0.0k 0.0k
dm-0 0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k
dm-0 0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k
dm-0 0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k
dm-0 0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k
dm-0 0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k
dm-0 17.00 0.0k 207.3M 0.0k 0.0k 207.3M 0.0k
dm-0 50.00 4.0M 319.9M 0.0k 4.0M 319.9M 0.0k
dm-0 111.00 12.6M 365.8M 0.0k 12.6M 365.8M 0.0k
dm-0 99.00 2.9M 685.5M 0.0k 2.9M 685.5M 0.0k
dm-0 107.00 1.6M 10.4M 0.0k 1.6M 10.4M 0.0k
dm-0 236.00 3.7M 0.0k 0.0k 3.7M 0.0k 0.0k
dm-0 330.00 5.2M 0.0k 0.0k 5.2M 0.0k 0.0k
dm-0 323.00 5.0M 0.0k 0.0k 5.0M 0.0k 0.0k
```
"""]]


@ -0,0 +1,41 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 18"
date="2022-06-06T14:50:16Z"
content="""
> I have made a standalone tarball built that way available here: https://downloads.kitenet.net/git-annex/linux/debuglocks/
> It should display a backtrace on stderr when the MVar deadlock happens.
jinxing helped... though at first I thought to complain that I didn't see any traceback; apparently that was due to `-J5` (I guess) and lines being overwritten, but I managed to `Ctrl-s` at a point showing
```
get 0/0/0/3/5/50 (from web...)
MVar deadlock detected CallStack (from HasCallStack):
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
thread blocked indefinitely in an MVar operation
(Delaying 1s before retrying....)
```
which, when I let it go, became something like
```
get 0/0/0/3/0/98 (from web...)
thread blocked indefinitely in an MVar operation
(Delaying 1s before retrying....)
ok
get 0/0/0/3/1/111 (from web...) ok
```
and reconfirmed for those in the screenlog I quickly collected:
```
(base) dandi@drogon:~$ grep debugLocks screenlog.3
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
debugLocks, called at ./Database/Queue.hs:55:30 in main:Database.Queue
```
"""]]


@ -0,0 +1,13 @@
[[!comment format=mdwn
username="aurelia@b44312a63326710de6cea9c43290e5debbd55607"
nickname="aurelia"
avatar="http://cdn.libravatar.org/avatar/818bf579caf9992f9123bd9b58321b2b"
subject="comment 6"
date="2022-06-06T12:38:26Z"
content="""
The biggest reason to use age over PGP seems to be its simplicity / small attack surface. It deliberately does not include options, in order to combat complexity and insecure configurations. It also has a lot less baggage and complexity than PGP (obscure packet-based format, web of trust, subkeys); age does a single thing, and it does it well. I do have a use case for hybrid encryption, but I'd rather not touch GPG ever again if I don't need to. Just the squabble about importing keys without identities makes me want to stay far, far away. Age keys handle like SSH keys, so if you have a strategy for those, age fits into your workflow very easily.
Age also supports passphrase-derived keys now, so the \"shared\" use case is covered.
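For illustration, the basic age workflow (real age CLI flags; the file names and recipient key are placeholders):
```
# generate a keypair; the public key is printed to stderr
age-keygen -o key.txt
# encrypt to a recipient's public key
age -r age1exampleplaceholder -o backup.tar.age backup.tar
# or derive the key from a passphrase instead (the \"shared\" case)
age -p -o backup.tar.age backup.tar
# decrypt using the identity file
age -d -i key.txt -o backup.tar backup.tar.age
```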
"""]]