Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2019-07-09 10:10:12 -04:00
commit 7d7d68f0c6
GPG key ID: DB12DB0FF05F8F38
4 changed files with 38 additions and 0 deletions

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 1"
date="2019-07-08T16:37:50Z"
content="""
Thanks for the quick fix.
"""]]

@ -0,0 +1,19 @@
### Please describe the problem.
Originally shown and discussed on the [local caching](http://git-annex.branchable.com/tips/local_caching_of_annexed_files/#comment-7f214f4eaa629b7731f82014a2e98964) tips page; I decided to give it a separate page for easier tracking.
In my use case, `/home` and `/mnt/btrfs/` are two subvolumes of the same BTRFS filesystem:
[[!format sh """
/dev/md10 on /mnt/btrfs type btrfs (rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/)
/dev/md10 on /home type btrfs (rw,noatime,compress=lzo,space_cache,subvolid=257,subvol=/home)
"""]]
BTRFS's copy-on-write (CoW) is a great feature for git-annex, but whenever I run `git annex get` across those two subvolumes (git-annex version 7.20190322+git133-g59922f1f4-1~ndall+1), `rsync` is used instead of `cp`, the disks get really busy, and I end up using more than 70GB of additional space, which is "suboptimal".

As asked in the original comment thread: what is the advantage of using `rsync` over a regular `cp` across devices? (The `Device` field reported by `stat` differs between the two subvolumes, so it is a poor indicator of whether they share a filesystem.)

If there is some generic benefit to `rsync`, could there at least be a configuration setting, which I would set globally on machines with btrfs, to use `cp` instead of `rsync` for local transfers?
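A minimal sketch of the `cp`-based local copy being asked for, assuming GNU coreutils `cp` and `stat`. `local_copy` and `device_of` are hypothetical helper names for illustration, not git-annex code. `cp --reflink=auto` clones the file's extents when the filesystem and kernel allow it and silently falls back to an ordinary full copy when they do not; `stat -c %d` prints the device number, which is the value that differs between btrfs subvolumes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; this is not git-annex's code.
set -eu

# Copy a file locally, preferring a CoW clone (reflink). With
# --reflink=auto, GNU cp falls back to a regular copy if cloning fails.
local_copy() {
    src=$1
    dst=$2
    cp --reflink=auto -- "$src" "$dst"
}

# Print the device number (st_dev) that stat reports for a path.
# On btrfs each subvolume gets its own device number, so two subvolumes
# of one filesystem can report different values here.
device_of() {
    stat -c %d -- "$1"
}

# Demo in a scratch directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'annexed content\n' > "$workdir/src"
local_copy "$workdir/src" "$workdir/dst"

echo "src device: $(device_of "$workdir/src")"
echo "dst device: $(device_of "$workdir/dst")"
cmp -s "$workdir/src" "$workdir/dst" && echo "copy ok"
```

Running `device_of /home` and `device_of /mnt/btrfs` on the setup above should show the differing device numbers described in the question, even though both paths live on `/dev/md10`.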
[[!meta author="yoh"]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="FTR: a dedicated issue on CoW across subvolumes"
date="2019-07-08T19:05:45Z"
content="""
[bugs/Uses_rsync_instead_of___96__cp_--reflink__61__auto__96___on_volumes_of_the_same_BTRFS_partition/](https://git-annex.branchable.com/bugs/Uses_rsync_instead_of___96__cp_--reflink__61__auto__96___on_volumes_of_the_same_BTRFS_partition/)
"""]]

@ -0,0 +1,3 @@
In a number of scenarios (e.g. [[bugs/still_seeing_errors_with_parallel_git-annex-add]], [[bugs/parallel_copy_fails]], [[git-annex-sync/#comment-aceb18109c0a536e04bcdd3aa04bda29]]), `git-annex` operations may fail or hang due to transient conditions. It would help a lot if `git-annex` could be configured to fail timed-out operations, and to retry failed operations after a delay. This would especially help when using `git-annex` in a script or a higher-level tool. I've tried wrapping some retry logic around `git-annex` calls, but it seems `git-annex` itself is in the best position to do that sensibly (e.g. only retrying idempotent operations, or capping retries per remote). This would be a catch-all fix for unusual conditions that are hard to test for.
`git-annex` already has config options `annex.retry` and `annex.retry-delay`, but it seems that they don't cover all failure types.
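A minimal sketch of the kind of external retry wrapper the request mentions trying, assuming a POSIX shell and GNU coreutils `timeout`. `retry_with_timeout` is a hypothetical name, and this is not how git-annex's own `annex.retry` handling works. Each attempt gets a time limit, so a hang turns into a failure, and failed attempts are retried after a fixed delay. Because `timeout` executes its argument as a program, the wrapped command must be an external command such as `git annex get`, not a shell function.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of git-annex: run a command with a
# per-attempt time limit and retry it after a delay if it fails or times out.
# Only wrap operations that are safe to repeat (idempotent).
# Note: plain POSIX sh has no `local`, so these variables are global.

retry_with_timeout() {
    max_attempts=$1   # total number of attempts
    delay=$2          # seconds to sleep between attempts
    limit=$3          # per-attempt time limit passed to timeout(1)
    shift 3

    attempt=1
    while :; do
        # timeout exits with status 124 when the time limit is reached
        if timeout "$limit" "$@"; then
            return 0
        else
            status=$?
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (last status $status): $*" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (status $status), retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_with_timeout 3 30 600 git annex get somefile` (with `somefile` a placeholder) would allow three attempts of up to ten minutes each, 30 seconds apart. As the request points out, a wrapper like this cannot tell which operations are safe to repeat, which is why handling inside git-annex would be preferable.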