Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-02-06 22:13:15 -08:00
commit a92104c8d8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 44 additions and 0 deletions

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="t+gitannex@1d62779e8b54f30a854739f61542a6885167b01f"
nickname="t+gitannex"
avatar="http://cdn.libravatar.org/avatar/87c7f62c00e4a744aa500423e421120f"
subject="comment 6"
date="2020-02-06T11:07:34Z"
content="""
I'm able to reproduce this with git-annex 7.20191230 and git 2.25.0 on Arch Linux, and I've seen it on OS X in the past as well. The annex is a v7 repository. I don't need to do anything besides unlocking some files and running `git status`. With 10 files unlocked, `git status` takes 3s; with 85 files it takes 20s, so it seems to scale roughly linearly with the number of unlocked files. Happy to share more details about the repository if that's useful.
"""]]
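
As a rough check of the "scales linearly" reading, the two timings reported in the comment can be fitted with a straight line. This is only a sketch: two data points cannot confirm linearity, and the names `per_file`, `fixed` and `predict` are made up for illustration.

```python
# The two timings reported in the comment: (unlocked files, seconds for git status).
samples = [(10, 3.0), (85, 20.0)]

(n1, t1), (n2, t2) = samples
per_file = (t2 - t1) / (n2 - n1)  # marginal cost per unlocked file, in seconds
fixed = t1 - per_file * n1        # extrapolated cost with no unlocked files


def predict(n_files):
    # Linear model; an assumption, since only two measurements are available.
    return fixed + per_file * n_files
```

Under that assumption the marginal cost works out to roughly 0.23s per unlocked file. A third measurement at a different file count would show whether the straight line actually holds.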

@ -0,0 +1 @@
I've implemented true resumable upload in git-annex-remote-googledrive, which means that uploads, just like downloads, can be resumed at any point, even within a chunk. However, it currently does not work with encrypted files (or chunks) because GPG encryption is non-deterministic: encrypting the same file again yields different ciphertext, so an interrupted upload cannot be continued from it. To make this feature usable with encrypted files, I propose not overwriting encrypted files that are already present in the `tmp` directory.
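
The "reuse instead of overwrite" idea could look roughly like the sketch below. Everything in it is hypothetical: `encrypted_tmp_path`, its parameters and the `encrypt` callback are illustrative names, not part of git-annex or git-annex-remote-googledrive.

```python
import os
import tempfile


def encrypted_tmp_path(tmp_dir, key, plaintext_path, encrypt):
    # Hypothetical helper: return the encrypted copy for `key`, creating it
    # only if an earlier attempt did not already leave one in tmp_dir.
    target = os.path.join(tmp_dir, key)
    if os.path.exists(target):
        # Reuse the earlier ciphertext so that byte offsets of a partial
        # upload still refer to the same data.
        return target
    # Encrypt into a scratch file first, so that an interrupted encryption
    # never leaves a truncated file that would later be reused.
    fd, partial = tempfile.mkstemp(dir=tmp_dir, prefix=key + '.partial.')
    os.close(fd)
    try:
        encrypt(plaintext_path, partial)
        os.replace(partial, target)
    except BaseException:
        if os.path.exists(partial):
            os.remove(partial)
        raise
    return target
```

The rename at the end matters for the design: only complete ciphertext ever appears under the final name, so anything found there on a later attempt is safe to resume from.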

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="potential security issues?"
date="2020-02-06T21:00:55Z"
content="""
I wonder if storing checksums in a general-purpose mutable metadata field may cause security issues. Someone could use the [[`git-annex-metadata`|git-annex-metadata]] command to overwrite the checksum. It should be stored in a read-only field written only by `git-annex` itself, like the `field-lastchanged` metadata already is.
Of course, if someone can write to the [[git-annex branch|internals#The_git-annex_branch]] directly, or get the user to pull and merge changes into it, they could alter the checksum stored there. Maybe only trust stored checksums if `merge.verifySignatures=true`?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="aborting stuck operations so they can be retried"
date="2020-02-05T16:39:36Z"
content="""
\"The only way to guarantee such an abort is to kill the whole git-annex process and let the signal reap its children\" -- then maybe the initial `git-annex` command can be made a wrapper that starts a separate `git-annex` process to do the actual work, monitors its progress, and kills/reaps/restarts it if it gets stuck? Or `-Jn` could work by starting up several separate git-annex processes, [[each handling a subset of files|parallel_possibilities/#comment-304240ba804513291c1a996b8eb3fd1c]], and the original process could kill/reap/restart any sub-process that gets stuck. This of course presumes idempotent operations.
"""]]
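
The wrapper idea in this comment can be sketched from the outside with a small supervisor. This is a minimal sketch, not how git-annex works: it is POSIX-only, `run_with_watchdog` and its parameters are made-up names, and a plain wall-clock timeout stands in for real progress monitoring.

```python
import os
import signal
import subprocess


def run_with_watchdog(cmd, timeout_s, max_attempts=3):
    # Run cmd; if it has not finished after timeout_s seconds, kill it
    # together with its children and start over. This is only safe when
    # the command is idempotent.
    for _ in range(max_attempts):
        # start_new_session=True gives the child its own process group,
        # so helper processes it spawns can be signalled along with it.
        proc = subprocess.Popen(cmd, start_new_session=True)
        try:
            return proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group was already gone
            proc.wait()  # reap the killed child
    raise TimeoutError('command still stuck after %d attempts' % max_attempts)
```

A call might look like `run_with_watchdog(['git', 'annex', 'copy', '--to', 'myremote'], timeout_s=3600)`, where `myremote` is a placeholder remote name. Detecting "stuck" rather than "slow" would need some progress signal from the child, which this sketch does not attempt.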

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="retries due to locked index file"
date="2020-02-05T16:59:40Z"
content="""
\"A locked git index file does not prevent git-annex from making transfers\" -- by \"mask transient failures\" I meant all types of failures, not just transfers. So e.g. if concurrent operations fail due to contention for the index file lock, retries (after increasing, randomized intervals) could mask the failure. This would help especially for writing scripts/tools on top of git-annex. Logically, some operations -- like `git-annex-add` -- should never fail, and being able to assume that makes scripting easier.
"""]]
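
Retrying "after increasing, randomized intervals" is commonly done as exponential backoff with jitter. Below is a minimal external sketch of that pattern, not git-annex code: `backoff_delays`, `run_with_retries` and the default values are assumptions chosen for illustration.

```python
import random
import subprocess
import time


def backoff_delays(base_s=0.5, cap_s=30.0, attempts=5, rng=random):
    # 'Full jitter': the i-th delay is drawn uniformly from
    # [0, min(cap_s, base_s * 2**i)], so the upper bound grows with each
    # retry while concurrent callers are unlikely to wake in lockstep.
    for i in range(attempts):
        yield rng.uniform(0, min(cap_s, base_s * 2 ** i))


def run_with_retries(cmd, attempts=5, sleep=time.sleep):
    # Run cmd up to `attempts` times, sleeping a jittered delay between
    # tries. Returns the exit status of the last run.
    result = subprocess.run(cmd)
    for delay in backoff_delays(attempts=attempts - 1):
        if result.returncode == 0:
            break
        sleep(delay)
        result = subprocess.run(cmd)
    return result.returncode
```

For example, `run_with_retries(['git', 'annex', 'add', 'somefile'])` would retry a failed add up to four more times. Note that this retries on any non-zero exit, including failures that are not transient.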

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="example of where retries could help"
date="2020-02-05T22:19:26Z"
content="""
As one example, I just had a `git-annex-copy` command fail twice with `git-annex: thread blocked indefinitely in an STM transaction`, then have the same command succeed (or at least get much further -- still running) on the third try. I can write my own wrappers to mask such errors, but a built-in implementation seems generally useful and would know better which failures are likely transient.
"""]]
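
A user-side wrapper of the kind mentioned here might retry only when the error output matches a known transient message. This is a sketch under assumptions: the marker list contains just the one message quoted in the comment, and `looks_transient` and `run_retrying_transient` are hypothetical names.

```python
import subprocess

# Error text treated as transient. Only the message quoted in the comment
# is listed; anything beyond it would be a guess.
TRANSIENT_MARKERS = (
    'thread blocked indefinitely in an STM transaction',
)


def looks_transient(stderr_text, markers=TRANSIENT_MARKERS):
    return any(marker in stderr_text for marker in markers)


def run_retrying_transient(cmd, max_attempts=3):
    # Rerun cmd only while it fails with a known transient message.
    # Assumes max_attempts >= 1. stderr is captured rather than shown, so
    # the caller should print result.stderr if it wants the messages.
    result = None
    for _ in range(max_attempts):
        result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
        if result.returncode == 0 or not looks_transient(result.stderr):
            break
    return result
```

Matching on message text is brittle, which is the comment's point: a built-in implementation would know which failures are transient without having to parse its own error output.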