Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2019-09-21 21:27:06 -04:00
commit 5e787b936b
GPG key ID: DB12DB0FF05F8F38
13 changed files with 205 additions and 1 deletions


@ -0,0 +1,17 @@
The other day, in the middle of one of my repositories I found a file. There was something strange about this file: it was a regular file, not a symlink like most of the files in my repository.
Maybe it got lost. Maybe I copied it in and forgot to `add` it. So I tried to add it, but it failed. Or rather, it gave no error, but it also changed nothing. Why?
$ git-annex add -d Cats.mkv
[2019-09-21 12:12:15.243550924] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-09-21 12:12:15.246431114] process done ExitSuccess
[2019-09-21 12:12:15.246610667] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-09-21 12:12:15.249072467] process done ExitSuccess
[2019-09-21 12:12:15.249345571] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","Cats & Dogs (2001).mkv"]
[2019-09-21 12:12:15.254996251] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","ls-files","--modified","-z","--","Cats.mkv"]
[2019-09-21 12:12:15.260730984] read: git ["--git-dir=../../.git","--work-tree=../..","--literal-pathspecs","diff","--name-only","--diff-filter=T","-z","--cached","--","Cats.mkv"]
I also found this same file in all clones of this repository! What is going on?
Is it something to do with locked/unlocked files? I never unlock anything. `git-annex lock Cats.mkv` changes nothing. `git-annex list` shows no files. `git-annex checkpresentkey $(git-annex calckey Cats.mkv)` exits with status 1. So it seems safe to say that the file is not in the repository and it _won't_ go into the repository (not like this). I find this very puzzling!
Maybe I could `reinject` the file into the repository, but I'm trying to diagnose the situation and determine what is going on. However, I'm out of ideas. Any help?
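One way to check whether other stray regular files exist is to walk the working tree and list everything that is not a symlink. The following is a minimal Python sketch, not part of git-annex; the function name `find_regular_files` is made up for illustration, and it assumes a repository where annexed files appear as symlinks (locked files). Files committed directly to git, and unlocked annexed files, would be listed too.

```python
import os


def find_regular_files(root):
    """Yield paths (relative to root) of regular files that are not symlinks.

    Hypothetical helper: in a repository that only uses locked annexed
    files, anything reported here is either stored directly in git or
    was never added at all.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if name == ".git":
                continue  # a ".git" file (e.g. in a submodule) is not content
            path = os.path.join(dirpath, name)
            # os.walk lists symlinks to files (even broken ones) in
            # filenames, so they have to be filtered out explicitly.
            if os.path.isfile(path) and not os.path.islink(path):
                yield os.path.relpath(path, root)
```

Running `sorted(find_regular_files("."))` from the top of the working tree gives a list that can then be compared against what `git ls-files` reports.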


@ -0,0 +1,64 @@
Hello. git-annex has been a wonderful program, but I still run into the occasional annoyance. These tend to be small issues, such as this one, which affects only one repository (other clones seem fine):
I use directory names with non-ASCII (accented Latin) characters, such as "animação". This particular repository insists that all files under that directory are untracked, e.g. `"anima\303\247\303\243o/Anita/banner.jpg"`
If I run `git-annex whereis animação/Anita/banner.jpg`, it promptly shows that it knows where the file is. So the data seems to be safe; it appears to be just an issue with how git reads the filesystem.
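A note on the quoted path above: when `core.quotePath` is enabled (git's default), git prints every byte above 0x7F as a backslash followed by three octal digits. The Python sketch below decodes such a string back into a path; `unquote_git_path` is a name invented for this example, not a git or git-annex function. Decoding shows that `\303\247` and `\303\243` are the two-byte UTF-8 sequences for "ç" and "ã" (in Latin-1 each would be a single byte, `\347` and `\343`).

```python
def unquote_git_path(quoted):
    """Decode a path printed in git's C-style quoted form.

    Illustrative helper only; handles octal escapes and the common
    single-character escapes.
    """
    if not (quoted.startswith('"') and quoted.endswith('"')):
        return quoted  # git only quotes paths that need quoting
    body = quoted[1:-1]
    simple = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11,
              "f": 12, "r": 13, '"': 34, "\\": 92}
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif body[i + 1] in "01234567":
            # Three octal digits encode one raw byte.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
        else:
            out.append(simple[body[i + 1]])
            i += 2
    # surrogateescape keeps bytes that are not valid UTF-8 representable.
    return out.decode("utf-8", errors="surrogateescape")
```

Setting `git config core.quotePath false` makes git print such paths verbatim, which can make the `git status` output easier to compare with the names on disk.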
It is quite annoying to see all those hundreds of file names scrolling by every time I sync. Can someone suggest something to help me solve this issue? Here is some more information.
git-annex version: 7.20190912-gab739242a3
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.3 feed-1.2.0.0 ghc-8.6.5 http-client-0.6.4 persistent-sqlite-2.10.5 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
The filesystem is an ext4.
tune2fs 1.45.3 (14-Jul-2019)
Filesystem volume name: Toshiba3
Last mounted on: /media/gus/Toshiba3
Filesystem UUID: 8be65206-e627-417b-9ead-2ff0a78bc42e
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 183148544
Block count: 732566385
Reserved block count: 73256
Free blocks: 44132786
Free inodes: 183067035
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Jul 11 14:43:53 2017
Last mount time: Sat Sep 21 11:49:39 2019
Last write time: Sat Sep 21 11:49:39 2019
Mount count: 192
Maximum mount count: -1
Last checked: Tue Jul 11 14:43:53 2017
Check interval: 0 (<none>)
Lifetime writes: 3387 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 2f0f1511-b2fd-4d47-a528-c91f3caf50b9
Journal backup: inode blocks


@ -0,0 +1 @@
Of potential interest: [RepoFS: File system view of Git repositories](https://www.sciencedirect.com/science/article/pii/S2352711018300712)


@ -0,0 +1,5 @@
I have a git-annex repository on a portable hard drive. Sometimes I plug it into one computer, sometimes into another. So that it is reachable in both situations, I have also added it as an ssh remote, using the same `annex-uuid`. This works well, but now I always get the fatal error `fatal: Could not read from remote repository.` when I have the same unit plugged in.
May I suggest that this be downgraded to a "warning" or an "info"? After all, the same repository (annex-uuid) was discovered and processed, but at another location (url). It does not need to sound so cataclysmic. Perhaps it does not even need to try ssh if it has already found the same repository at a "cheaper" location; and if the repository is not found at one location, a failure could be declared only if it is also unavailable at every other url.
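The behaviour suggested above can be sketched roughly as follows. This is a Python illustration of the proposal only, not how git-annex is implemented: the function `sync_by_uuid`, the tuple layout, and the cost numbers in the usage note are all assumptions made for the example.

```python
from collections import defaultdict


def sync_by_uuid(remotes, try_sync):
    """Try each repository once, cheapest location first.

    remotes:  list of (name, uuid, cost) tuples (hypothetical layout).
    try_sync: callable taking a remote name, returning True on success.

    Returns (warnings, failures): remote names that were unreachable
    although the same uuid was reached elsewhere, and uuids that could
    not be reached at any location.
    """
    by_uuid = defaultdict(list)
    for name, uuid, cost in remotes:
        by_uuid[uuid].append((cost, name))

    warnings, failures = [], []
    for uuid, candidates in by_uuid.items():
        unreachable = []
        for _cost, name in sorted(candidates):
            if try_sync(name):
                # Found at this location: more expensive ones are skipped,
                # and earlier misses are only worth a warning.
                warnings.extend(unreachable)
                break
            unreachable.append(name)
        else:
            # No location for this uuid worked: a real failure.
            failures.append(uuid)
    return warnings, failures
```

With `[("usb", "uuid-1", 100), ("usb-ssh", "uuid-1", 200)]` and the drive plugged in locally, the ssh url is never tried; with the drive plugged into the other machine, the local path only produces a warning.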
Thank you for your time (and your fine work).


@ -0,0 +1,9 @@
[[!comment format=mdwn
username="lykos@d125a37d89b1cfac20829f12911656c40cb70018"
nickname="lykos"
avatar="http://cdn.libravatar.org/avatar/085df7b04d3408ba23c19f9c49be9ea2"
subject="comment 1"
date="2019-09-20T20:12:31Z"
content="""
Yes! I've had issues with parallel operations on google remotes. I thought it might be a problem in git-annex-remote-googledrive and I haven't had time to investigate it further, so I didn't report it here.
"""]]


@ -0,0 +1,13 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="meanwhile"
date="2019-09-20T18:53:03Z"
content="""
I am [thinking about some abominations](https://github.com/datalad/datalad/issues/3696) to work around the inability to specify something like
```
licenses/* annex.commit.metadata=(distribution-restrictions=sensitive)
```
in `.gitattributes`. IMHO, limiting it to `.commit` makes it clear when annex should add the metadata. Sure, it could leak across files with the same content (0-length files are most prone), but IMHO that is unrelated, since a direct call to `git annex metadata licenses/* -s ...` would have the same effect anyway.
"""]]

doc/projects/dandi.mdwn Normal file

@ -0,0 +1,25 @@
DANDI [https://dandiarchive.org](https://dandiarchive.org)
==========================================================
DANDI (Distributed Archives for Neurophysiology Data Integration) is a platform for publishing, sharing, and processing neurophysiology data, funded by the BRAIN Initiative. The platform is under construction, with an initial release slated for the end of September 2019.
## TODOs
[[!inline pages="todo/* and !todo/done and !link(todo/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0]]
### Done:
[[!inline pages="todo/* and !todo/done and link(todo/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0]]
## BUGs
[[!inline pages="bugs/* and !bugs/done and !link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0]]
### Done:
[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0]]


@ -0,0 +1,22 @@
ReproNim [https://repronim.org](https://repronim.org)
=====================================================
The Center for Reproducible Neuroimaging Computation develops standards and tools to improve reproducibility. `git-annex` is used via [DataLad](http://datalad.org), is at the core of efforts such as [YODA](https://github.com/myyoda) and [repronim/containers](https://github.com/ReproNim/containers), and is heavily used by [ReproMan](https://github.com/ReproNim/reproman/).
## TODOs
[[!inline pages="todo/* and !todo/done and !link(todo/done) and tagged(projects/repronim)" sort=mtime feeds=no actions=yes archive=yes show=0]]
### Done:
[[!inline pages="todo/* and !todo/done and link(todo/done) and tagged(projects/repronim)" sort=mtime feeds=no actions=yes archive=yes show=0]]
## BUGs
[[!inline pages="bugs/* and !bugs/done and !link(bugs/done) and tagged(projects/repronim)" sort=mtime feeds=no actions=yes archive=yes show=0]]
### Done:
[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and tagged(projects/repronim)" sort=mtime feeds=no actions=yes archive=yes show=0]]


@ -13,4 +13,4 @@ proceeds just fine although one of the options is not anything that special remo
At least for the built-in special remotes (not external) this should be possible and would help to avoid issues such as [OpenNeuroOrg/datalad-service/issues/67](https://github.com/OpenNeuroOrg/datalad-service/issues/67) etc. Ideally, parameter verification should also be provided for in the external special remote protocol.
[[!meta author=yoh]]
[[!meta project=dandi]]
[[!tag projects/dandi]]


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 3"
date="2019-09-20T20:46:41Z"
content="""
[git docs](https://git-scm.com/docs/gitattributes#_code_filter_code) say \"Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.\" Are there cases where this could cause problems?
"""]]

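For reference, a clean filter that follows the contract quoted from the git docs in the comment above reads the content only from standard input and writes its result to standard output. The Python sketch below shows that shape. It is not git-annex's actual filter; the pointer format it emits is made up for illustration.

```python
import hashlib


def clean(src, dst, chunk_size=65536):
    """Replace the content read from src with a short pointer line on dst.

    src and dst are binary streams (for a real filter, stdin and stdout).
    The file on disk is never opened, as the git documentation advises.
    The "pointer ..." format is hypothetical.
    """
    digest = hashlib.sha256()
    size = 0
    # Stream in chunks so large files are not held in memory.
    for chunk in iter(lambda: src.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    pointer = "pointer sha256:{} size:{}\n".format(digest.hexdigest(), size)
    dst.write(pointer.encode("ascii"))
```

A script that calls `clean(sys.stdin.buffer, sys.stdout.buffer)` could be wired up through `filter.<name>.clean` in the git config. The cost of this design is that git has to pipe the whole file through the filter, which is the performance concern raised in this thread.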

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 6"
date="2019-09-20T20:41:25Z"
content="""
\" Improving the interface to let the clean filter read the content of the file itself, rather than it being piped through, would be the best way to improve git add performance\" -- right, but that sounds unlikely near-term?
"""]]


@ -0,0 +1,28 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="found it"
date="2019-09-20T18:56:52Z"
content="""
see [datalad/issues/139](https://github.com/datalad/datalad/issues/139#issuecomment-97948143). Quoting a part of it:
*But I'd like to investigate adding --batch to individual commands first,
since this seems more git-like, and also simpler. It would probably be
helpful to talk about the specific commands you need to call a lot.*
*Things like `git annex lookupkey --batch`, `git-annex readpresentkey --batch`
etc should be able to be spun up and run as long-duration servers, which
you could query as needed, not batched up all at once. This is how
git-annex uses `git cat-file --batch` etc.*
*There's some potential for such a long-running command to either
buffer stale data so it doesn't answer with the current state of the
repository, or for it to buffer changes and not commit them to disk
immediately. For example, a `git annex add --batch` would have the
latter problem.*
*That is actually an argument for only adding --batch mode to specific
commands though, since that would be an opportunity to check their
behavior. A single `git-annex shell` interface would expose any such
problems in all commands.*
"""]]

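The "long-duration server" usage described in the quote above can be sketched from the caller's side as follows. This is a minimal Python illustration, assuming a command that answers every input line with exactly one output line. The class name `BatchProcess` is invented for the example, and commands with multi-line replies (such as `git cat-file --batch`) would need a different reader.

```python
import subprocess


class BatchProcess:
    """Keep one line-oriented child process running and query it on demand."""

    def __init__(self, argv):
        # Text mode with line buffering, so each request is sent promptly.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def query(self, line):
        """Send one request line and return the single reply line."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close stdin so the child sees EOF, then return its exit status."""
        self.proc.stdin.close()
        status = self.proc.wait()
        self.proc.stdout.close()
        return status
```

A caller could start it with, for example, `BatchProcess(["git", "annex", "lookupkey", "--batch"])` and call `query(path)` whenever a key is needed, instead of spawning a new process per file. As the quote warns, a long-running helper may answer from stale state, so this fits read-only queries better than commands that buffer changes.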

@ -9,3 +9,7 @@ To discover more, visit
- [Center for Open Neuroscience](http://centerforopenneuroscience.org)
- [yarikoptic@GitHub](https://github.com/yarikoptic)
- [yarikoptic@Twitter](http://twitter.com/yarikoptic)
# TODO/BUGs pages which should likely be tagged with one of the projects
[[!inline pages="(todo/* or bugs/*) and (author(yoh) or author(mih) or author(ben) or author(yarikoptic) or author(kyle)) and !tagged(projects/*)" sort=mtime feeds=no actions=yes archive=yes show=0]]