Merge branch 'master' into bs

Joey Hess 2019-12-05 11:41:30 -04:00
commit c7a4411e71
26 changed files with 537 additions and 10 deletions

@@ -0,0 +1,5 @@
It would be useful to have a [[`git-annex-cat`|forum/Is_there_a___34__git_annex_cat-file__34___type_command__63__/]] command that outputs the contents of an annexed file without storing it in the annex. This [[can be faster|OPT: "bundle" get + check (of checksum) in a single operation]] than `git-annex-get` followed by `cat`, even when the file is already present. It avoids some failure modes of `git-annex-get` (such as running out of local disk space, or contending for locks). It supports the common use case of needing a file only for a single operation, without having to remember to drop it afterwards. It could also be used to implement a web server or FUSE filesystem that serves files from a git-annex repo on demand.
If the file is not present, or `remote.here.cost` is higher than `remote.someremote.cost` where `someremote` has the file, `someremote` would get a `TRANSFER` request whose `FILE` argument is a named pipe, and a `cat` of that named pipe would be started.
If the file is not annexed, `git-annex-cat file` would, for uniformity, simply run `cat file`.
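A very rough sketch of the named-pipe plumbing (hypothetical Haskell, assuming the `unix` and `async` packages; `catViaPipe` and the `transferToFile` callback are made-up names standing in for the remote's `TRANSFER` handling, not git-annex's actual internals):

```haskell
import System.Posix.Files (createNamedPipe, ownerReadMode, ownerWriteMode, unionFileModes)
import Control.Concurrent.Async (concurrently_)
import qualified Data.ByteString.Lazy as L

-- Hypothetical sketch: the remote's TRANSFER writes into a named pipe
-- while the content is streamed to stdout, so it never lands in the
-- annex's object store.
catViaPipe :: FilePath -> (FilePath -> IO ()) -> IO ()
catViaPipe pipe transferToFile = do
    createNamedPipe pipe (ownerReadMode `unionFileModes` ownerWriteMode)
    concurrently_
        (transferToFile pipe)            -- remote fills the pipe
        (L.readFile pipe >>= L.putStr)   -- the `cat` of the named pipe
```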

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="reference original bug report"
date="2019-11-29T17:58:28Z"
content="""
The original bug report was https://git-annex.branchable.com/bugs/git-lfs_remote_URL_is_not_recorded__63__/, filed after an attempt to share some NWB data via GitHub's LFS.
"""]]

@@ -0,0 +1,19 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="representing paths"
date="2019-11-27T15:08:40Z"
content="""
Thanks for working on this, Joey.
I don't know Haskell or the git-annex architecture, so my thoughts might make no sense, but I'll post them just in case.
\"There are likely quite a few places where a value is converted back and forth several times\" -- as a quick/temporary fix, could memoization speed this up? Or memoizing the results of some system calls?
The many filenames flying around often share long prefixes. Could that be used to speed things up? E.g. if they could be represented as pointers into some compact storage, maybe cache performance would improve.
\"git annex find... files fly by much more snappily\" -- does this mean `git-annex-find` is testing each file individually, as opposed to constructing an SQL query against an indexed db? Maybe simpler `git-annex-find` queries that map fully onto SQL could be special-cased?
Sorry for the naive comments; I'll eventually read up on Haskell and make more sense...
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="parallelization"
date="2019-11-27T17:23:14Z"
content="""
When operating on many files, maybe run N parallel commands where the i'th command ignores paths for which `(hash(filename) modulo N) != i`. Or, if the git index has size I, the i'th command ignores paths that are not lexicographically between `index[(I/N)*i]` and `index[(I/N)*(i+1)]` (for the index state at command start). Extending [[git-annex-matching-options]] with a `--block=i` option would let this be done using `xargs`.
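A rough sketch of the hash-based split, in case it helps (hypothetical helper, assuming the `hashable` package; no such option exists yet):

```haskell
import Data.Hashable (hash)

-- Hypothetical: worker i of n handles only the paths that hash into its
-- bucket, so n workers partition the tree without any coordination.
belongsToWorker :: Int -> Int -> FilePath -> Bool
belongsToWorker n i path = hash path `mod` n == i
```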
"""]]