move old fixed datalad/dandi/repronim bugs to the project pages

This is to cut down on the number of files in bugs/, which makes it slow
to file new bug reports or update active bug reports. These old bugs
were about 1/3rd of the files in there. These projects want lists of
their old bugs to still be accessible, and have the lists on their
project pages, which will still list the old bugs.

Commands used:

for f in $(git grep -l '\[\[!tag projects/dandi\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/dandi/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/dandi/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/repronim\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/repronim/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/repronim/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/datalad\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/datalad/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/datalad/bugs-done; fi; fi; done

That assumes that bugs are not tagged by multiple projects at the same
time. Of the ones I moved, I've checked and none are.

Could do the same with todo/ but there are only 370 files in there, and
less than 84 of them could be moved this way, which does not seem likely
to produce a sizeable speedup.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2023-01-05 13:16:15 -04:00
parent 946fc20165
commit bcc69f07e8
GPG key ID: DB12DB0FF05F8F38
1011 changed files with 4 additions and 4 deletions

@ -21,6 +21,6 @@ DANDI: Distributed Archives for Neurophysiology Data Integration is a platform f
<details>
<summary>Done</summary>
-[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]]
+[[!inline pages="(bugs/* or projects/dandi/bugs-done/*) and !bugs/done and link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]]
</details>

@ -0,0 +1,49 @@
### Please describe the problem.
The original complaints can be found in the comments of the [importfeed page](https://git-annex.branchable.com/git-annex-importfeed/): when using `addurl`, even when the server provides a Content-Disposition field with the filename, git-annex seems to take that filename value and obfuscate it (replacing '-' with '_' etc) instead of using the original filename. (BTW -- no Content-Disposition was mentioned in the --debug output.)
[[!format sh """
$> mkdir /tmp/testrepo; cd /tmp/testrepo; git init; git annex init;
mkdir: cannot create directory /tmp/testrepo: File exists
E: could not determine git repository root
Initialized empty Git repository in /tmp/testrepo/.git/
init ok
(recording state in git...)
$> git annex addurl --fast https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
addurl https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download (to sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb) ok
(recording state in git...)
$> ls -l
total 4
lrwxrwxrwx 1 yoh yoh 184 May 7 17:02 sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb -> .git/annex/objects/Gj/9z/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923
"""]]
This happens whenever the original Content-Disposition has "-" in the filename, which is perfectly safe in a filename AFAIK:
[[!format sh """
$> wget -S https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
... bunch of forwards to the final one with the content disposition field
Resolving dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)... 52.219.101.51
Connecting to dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)|52.219.101.51|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
x-amz-id-2: VgJE1jV5XUkBQXZDWgR5WEDfmHJp4Fj6fGo6z2tYkLfyTsxDWC+m92B2qOSVppCuiRFu2QpNV5M=
x-amz-request-id: 1221CAC30E3931CF
Date: Thu, 07 May 2020 21:02:52 GMT
Last-Modified: Wed, 22 Apr 2020 00:54:32 GMT
ETag: "acf3b4f5951435245a0efcd4a518e77d"
Content-Disposition: attachment; filename="sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2.nwb"
...
$> git annex version
git-annex version: 7.20190708+git9-gfa3524b95-1~ndall+1
"""]]
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]] --[[Joey]]
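
For reference, the server-suggested name in the `Content-Disposition` header from the wget output above can be extracted with a few lines of Python. This is only an illustrative sketch built on the standard library's `email.message.Message`; it makes no claim about how git-annex parses the header, and the helper name `suggested_filename` is made up for this example.

```python
from email.message import Message


def suggested_filename(content_disposition):
    # Hypothetical helper: parse a Content-Disposition header value and
    # return its filename parameter, or None when there is none.
    msg = Message()
    msg['Content-Disposition'] = content_disposition
    return msg.get_filename()


# Header value copied from the wget -S output in this report.
header = ('attachment; filename='
          '"sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2'
          '_cell-20180420-sample-2.nwb"')
print(suggested_filename(header))
```

The printed name keeps every '-' from the header, which is the name this report expected `addurl` to use.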

@ -0,0 +1,42 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-05-08T16:50:14Z"
content="""
This is due to the filename being passed through sanitizeFilePath.
There are security concerns here. If the filename contains "../"
it absolutely has to be modified, or the command would have to fail and
refuse to import it.
If the filename contains an ANSI escape sequence, it could potentially
lead to a security hole. Or if the filename starts with "-" it could be
somewhere between a possible security hole and just very annoying to work
with. As could a filename that contains a newline, which will
break large quantities of shell pipelines. While generally git repos can
have these problems with files in them too, the exposure seems larger when
talking to some random web server than when pulling from a repo.
Also, cross filesystem compatibility is a concern. It used to allow "|" in
the filename, but a bug pointed out that cannot be used on fat filesystems.
And "\\" means different things on linux and windows, so probably best to avoid
filenames containing it on linux too.
Finally, it's somewhat opinionated, since it replaces spaces with
underscores. That's certainly the least defensible thing.
(git-annex may also truncate the filename if it's longer than what the
filesystem supports.)
So, it's clearly wrong that it should be taken as-is without obfuscation,
IMHO. Maybe there's a way to improve it to meet some use case though.
I could see having a config that avoids sanitizing the filename, but
makes addurl fail if the filename looks like a security problem.
Though that has the downside that git-annex would then need to
comprehensively track, going forward, all the ways that people find to make
filenames be a security problem; the current method, by being strict in
what it lets through, probably limits exploits to ones involving a) unicode
or b) the user's wetware.
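
To make the strict-allowlist idea concrete, here is a rough Python sketch. It is not the real sanitizeFilePath (which is Haskell and handles more cases); the function name `sanitize_filename` and the exact set of allowed characters are assumptions chosen to mirror the concerns listed above.

```python
import string

# Assumed allowlist for this sketch: ASCII letters, digits, '.' and '_'.
ALLOWED = set(string.ascii_letters + string.digits + '._')


def sanitize_filename(name):
    # Replace every character outside the allowlist with '_'. That covers
    # '/', backslash, '|', spaces, newlines, escape sequences and '-'.
    cleaned = ''.join(c if c in ALLOWED else '_' for c in name)
    # With '/' gone, '..' can no longer traverse directories, but still
    # avoid producing the special names '.' and '..' or a hidden dotfile.
    cleaned = cleaned.lstrip('.')
    return cleaned or '_'


print(sanitize_filename('../etc/passwd'))
print(sanitize_filename('sub-mouse-AAYYT_ses-20180420.nwb'))
```

Because everything not explicitly allowed is replaced, new tricks have to get through a short list of harmless characters rather than past a growing list of known-bad ones.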
"""]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-05-08T18:19:20Z"
content="""
`git-annex import` does not do any sanitization, and that could be
considered inconsistent, particularly when importing from a remote like S3.
A difference with that is, it creates a remote tracking branch for the
imported files. (That happens to avoid "../" path traversal because git
generally avoids it.) Maybe the real difference is, import from a special
remote is completely analogous to fetching from a git remote. So it feels
different to me than adding an url does.
If I sync with a S3 bucket and it turns out it imported a escape sequence
file, well I could have looked at the bucket first, or imported and
reviewed the branch before merging it. And if I was syncing with a git
remote the same thing could happen. So it feels like I should have no
expectation git-annex would protect me. Whereas, if I add an url and the
web server uses an obscure-ish http header to surprise me with a similar
malicious filename, I had no way beforehand to know that would happen, and
so it does feel like git-annex should protect me.
(Although if git did prevent that, git-annex should too, and I'd be
fine with git preventing that.)
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-05-08T19:56:27Z"
content="""
Implemented git-annex addurl --preserve-filename, which will do what you
want.
Leaving this bug open because I only implemented it for web urls, not yet
for torrents and other special remotes that have their own url scheme.
The sanitization for those is currently done at a lower level than addurl,
and so that will take a bit more work to implement.
(importfeed does not, I think, need to implement this option, because
the filenames are based on information from the rss feed, and it's
perfectly fine to sanitize eg a podcast episode title to get a reasonable
filename.)
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2020-05-09T22:10:43Z"
content="""
> If the filename contains an ANSI escape sequence, it could potentially lead to a security hole.
> ... As could a filename that contains a newline, which will break large quantities of shell pipelines.
IMHO those indeed are ok to target for sanitization
> Or if the filename starts with \"-\" it could be somewhere between a possible security hole and just very annoying to work with.
So why not sanitize it only at the beginning of the filename?
`-` is a very common and a safe character to use within filename. For that matter we VERY frequently use `-` in filenames. It even became part of our BIDS standard in neuroimaging: https://bids-specification.readthedocs.io where we separate `_key` from `value`, e.g.in ` . I really do not see why git-annex should so aggressively sanitize filenames as replacing \"-\" within filenames -- it makes nothing more secure or convenient.
> While generally git repos can have these problems with files in them too, the exposure seems larger when talking to some random web server than when pulling from a repo.
Well, not sure about ansi characters and new line symbols, but typically files are saved by the browsers with the name suggested by the server.
"""]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-05-11T17:20:07Z"
content="""
I agree that it may as well allow non-leading '-'. But, if you are relying
on getting the unsanitized filename generally, you should use
--preserve-filename
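
A small sketch of that relaxation, written in Python rather than git-annex's Haskell, with an assumed function name `sanitize_keeping_dashes` and an assumed allowlist: '-' is kept everywhere except at the start of the name, where it could be mistaken for a command-line option.

```python
import string

# Assumed allowlist for illustration; '-' is now included.
ALLOWED = set(string.ascii_letters + string.digits + '._-')


def sanitize_keeping_dashes(name):
    # Characters outside the allowlist still become '_'.
    cleaned = ''.join(c if c in ALLOWED else '_' for c in name)
    # Drop leading dots so the result is never '.', '..' or a dotfile.
    cleaned = cleaned.lstrip('.')
    # Only a leading '-' is a problem, so only that one is replaced.
    if cleaned.startswith('-'):
        cleaned = '_' + cleaned[1:]
    return cleaned or '_'


print(sanitize_keeping_dashes('sub-mouse-AAYYT_ses-20180420.nwb'))
print(sanitize_keeping_dashes('-rf'))
```
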
Web browsers do do some sanitization, particularly of '/'.
Chrome removes leading "." as well. Often files are downloaded
without the user confirming it. I suspect there is enough insecurity
in that area that someone could make a living injecting bitcoin miners into
dotfiles.
"""]]

@ -0,0 +1,6 @@
While running `git-annex addurl --batch --with-files --jobs 10 --json --json-error-messages --json-progress --raw`, I occasionally run into files that fail to download for no discernable reason, and the `"error-messages"` key in the output from the command is an empty list. This makes it hard to figure out exactly why the download is failing.
[[!meta author=jwodder]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-10-27T16:23:52Z"
content="""
Is it reproducible with a particular url? Does it only happen with -J?
Version would also be good to know. There were recent relevant
changes eg [[!commit 4f42292b13dc5a6664eeb19b5c9d48991eaef292]].
I've spent a while hunting for a code path where it fails without
displaying a warning, and have not found one. Since the code in addurl
is structured as return Nothing and hopefully display a warning
beforehand, rather than as throw an error, it's certainly possible that
happens.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="jwodder"
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
subject="comment 2"
date="2021-10-27T18:16:43Z"
content="""
It appears that the problem occurs whenever one tries to download the same URL to two different paths at the same time. When this occurs, one of the downloads fails, and though its \"error-messages\" is empty, its \"notes\" field reads, \"transfer already in progress, or unable to take transfer lock\".
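
A consumer of the JSON output can work around this by falling back to the other field. The sketch below is Python and assumes each output line is one JSON object carrying the two keys named above; `failure_reason` is a made-up helper name, and the sample record is built by hand rather than captured from git-annex.

```python
import json


def failure_reason(line):
    # Parse one line of JSON output and return a human-readable reason,
    # preferring error-messages and falling back to notes.
    record = json.loads(line)
    messages = record.get('error-messages') or []
    if messages:
        return '; '.join(messages)
    return record.get('notes') or 'no reason reported'


# Hand-built sample mimicking the failing record described above.
sample = json.dumps({
    'error-messages': [],
    'notes': 'transfer already in progress, or unable to take transfer lock',
})
print(failure_reason(sample))
```
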
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="jwodder"
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
subject="comment 3"
date="2021-10-27T18:19:23Z"
content="""
As to your questions, I am using git-annex 8.20211011 on macOS 11.6. The problem does not occur when the `--jobs` option is omitted, but that's not viable for the current project we're using git-annex for.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-10-27T18:40:48Z"
content="""
Aha, that makes sense! addurl constructs a url-based Key to use while
downloading, and the key transfer machinery prevents redundant downloads
of the same Key at the same time.
Arguably, the problem is not where the message gets put, but that
it fails when adding an url to two different paths at the same time.
I have, though, moved that message so it will appear in error-messages.
"""]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-10-27T18:56:23Z"
content="""
The best solution I can find is for it to notice when another thread is
downloading the same url, and wait until it finishes. Then proceed
with downloading the url for a second time.
It's not very satisfying to re-download. But once the url Key is downloaded,
it does not keep that url Key populated, but hashes the content and moves
the content to the final Key. It would be a real complication to
communicate, across threads, what Key the content ended up at, and have the
waiting thread use that. And addurl is already complicated well beyond a
point I am comfortable with.
Also, the content of an url can of course change over time. If I feed
"$url foo" into git-annex addurl --batch -J10 and then some time
later, I feed "$url bar", I might expect that file bar gets whatever
content the url has now, not the content that the url had back when I added
the same url to file foo. And if I cared about avoiding re-downloading,
I could add the url to the first file, and then copy the annex link to the
second file myself.
Implemented this approach.
"""]]

@ -0,0 +1,21 @@
### Please describe the problem.
This is a continuation to the [prior report/discussion](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-424548e59fc41618ffeeb65f418694b3) to facilitate access to private repositories on public hosting portals.
Setting aside the more odd/custom behavior of gitlab etc installations, which forward to a login screen (thus no 401 or 404 response) upon an attempt to access something which might be within a private repo, the situation with github and gogs (a github clone), which powers gin (which I had [mentioned](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-ec2193d97bb19945ad74cee13f747b35) in that prior discussion), is different: they return a 404 response. And I think (didn't check the git code, just going by its behavior) `git` then asks for credentials as the "next way to try". I think git-annex should do the same -- if a 404 is received, ask `git credential` to fill for that domain (as it would do now in case of 401).
### What steps will reproduce the problem?
Try to clone and get data from a private repository on [https://gin.g-node.org/](https://gin.g-node.org/) (repo could be created, or let me know and I would create one, but you would still need to register there). I am not yet 100% certain that upon authentication you would be able to fetch that `/config` (haven't tried). Satellite issue/discussion I just initiated on gin is [here](https://github.com/G-Node/gogs/issues/111)
### What version of git-annex are you using? On what operating system?
8.20201127+git54-ga1b227171-1~ndall+1
edit 1: although probably a deeper look into how/why git decides to ask for credentials for private repos might be due. May be similar check should be done by git-annex first, since otherwise there might be no way to tell apart from a "proper" 404 for inability to get `/config` from github
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[notabug|done]] --[[Joey]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-01-21T16:57:06Z"
content="""
The git source code does not appear to behave
like that, see http.c `normalize_curl_result`, which reauths on 401, but
not on 404. If you think git behaves like this, you need to show an example
where it clearly accesses an url that is 404 and goes on to authenticate.
Seems to me that these hosting sites may simply not be exposing foo.git/config
to http. Git does not request that file over http. Such a hosting site would
probably also not expose foo.git/annex/ over http, so git-annex would not be
able to use it anyway. To support git-annex, it would need to
expose both, and then git-annex's handling of 401 should work fine for
authentication.
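
The behaviour described above, reduced to a Python sketch. This is a simplification for illustration, not a translation of git's http.c; `fetch_with_auth` and its two callable parameters are hypothetical names, and no real HTTP client is involved.

```python
def fetch_with_auth(url, request, get_credentials):
    # request(url, creds) returns an HTTP status code; both callables are
    # placeholders supplied by the caller.
    status = request(url, None)
    if status == 401:
        # Only an explicit 401 triggers a retry with credentials.
        status = request(url, get_credentials(url))
    # A 404 (or anything else) is returned as-is, with no credential prompt.
    return status
```

A server that answers 404 for a private repo never reaches the retry branch, which is why the fix belongs on the server side.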
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-01-21T18:36:50Z"
content="""
a quick one: https://gin.g-node.org/ does expose `foo.git/annex/` -- that is what gin has extended original gogs with. Example repo to try on https://gin.g-node.org/ljchang/Sherlock . The problem/difficulty is only in access to \"private\" repositories -- access to config and annexed files is working fine through http
"""]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-01-21T19:20:00Z"
content="""
It still seems easy to demonstrate that git does not ask for creds on 404:
joey@darkstar:~> git clone http://google.com/this-url-does-not-exist
Cloning into 'this-url-does-not-exist'...
fatal: repository 'http://google.com/this-url-does-not-exist/' not found
So I need you to show me what makes you think that git does such a strange
thing, before I can take seriously a request to replicate that behavior in
git-annex. Because the only possible reason I would implement such an
insane thing is if git has lost its collective mind and so I needed to
follow into the abyss.
If the actual issue is that gogs has implemented support for git-annex,
but that it sends 404 when git-annex requests config from a
private repo, rather than 401, it seems to me the place to fix that is in
gogs.
"""]]

@ -0,0 +1,112 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2021-01-22T01:47:46Z"
content="""
yeap, it is not about 404 ...
<details>
<summary>with gogs/gin situation is obscure but \"easyish\" - 401 is returned upon access to `/info/refs` but not above:</summary>
```shell
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs\"
--2021-01-21 20:37:22-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 401 Unauthorized
Date: Fri, 22 Jan 2021 01:37:23 GMT
Server: Apache/2.4.38 (Debian)
content-type: text/plain
www-authenticate: Basic realm=\".\"
content-length: 0
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
set-cookie: gnode_gin=823b677f19feb8ef; Path=/; HttpOnly
set-cookie: _csrf=GrekbiqDJleLLNcVyax5z77buGY6MTYxMTI3OTQ0MzYwMTMyMzE4NQ; Path=/; Expires=Sat, 23 Jan 2021 01:37:23 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Username/Password Authentication Failed.
1 51975 ->6 [2].....................................:Thu 21 Jan 2021 08:37:23 PM EST:.
(git)lena:~/proj/misc/git[master]git
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info\"
--2021-01-21 20:37:52-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 404 Not Found
Date: Fri, 22 Jan 2021 01:37:53 GMT
Server: Apache/2.4.38 (Debian)
content-type: text/html; charset=UTF-8
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
set-cookie: gnode_gin=26d42c5108c8715d; Path=/; HttpOnly
set-cookie: _csrf=SAKUL4rdspufTb_lxEWIijnzYBU6MTYxMTI3OTQ3Mjk5MDczODgzMA; Path=/; Expires=Sat, 23 Jan 2021 01:37:52 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
2021-01-21 20:37:53 ERROR 404: Not Found.
```
</details>
github is ... trickier, or to say -- my C/gdb/whatever foo is not good enough, since
<details>
<summary>it is still 404 with simple wget but git remote-https seems to get 401:</summary>
```shell
(gdb) p results
$15 = {curl_result = CURLE_HTTP_RETURNED_ERROR, http_code = 401, auth_avail = 1, http_connectcode = 0}
(gdb) p rl
No symbol \"rl\" in current context.
(gdb) p url
$16 = 0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\"
(gdb) bt
#0 http_request (url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\",
result=<optimized out>, target=<optimized out>, options=0x7fffffffd920) at http.c:1981
#1 0x00005555555665bf in http_request_reauth (
url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\", result=0x7fffffffd880,
target=0, options=0x7fffffffd920) at http.c:2040
#2 0x000055555555f7f3 in discover_refs (service=<optimized out>, service@entry=0x5555556b622c \"git-upload-pack\",
for_push=for_push@entry=0) at remote-curl.c:493
#3 0x000055555556137e in get_refs (for_push=<optimized out>) at remote-curl.c:548
#4 cmd_main (argc=argc@entry=3, argv=argv@entry=0x7fffffffdcd8) at remote-curl.c:1523
#5 0x000055555555ee94 in main (argc=3, argv=0x7fffffffdcd8) at common-main.c:52
```
```
$> wget --header \"Git-Protocol: version=2\" --header \"Pragma: no-cache\" -S 'https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack'
--2021-01-21 20:41:21-- https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 404 Not Found
Server: GitHub.com
Date: Fri, 22 Jan 2021 01:41:21 GMT
Content-Type: text/plain; charset=utf-8
Status: 404 Not Found
Vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
Cache-Control: no-cache
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Expect-CT: max-age=2592000, report-uri=\"https://api.github.com/_private/browser/errors\"
Content-Security-Policy: default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'
Set-Cookie: _gh_sess=UoF3mYOvfYf5mFbK1tr7aWOuYpQbNoJVhajA5nr2ANUvg%2FekQjtgh0h3xLva0EcwHnLNNsl7VMEdVLXNGi9Yn4AbjrBxX0sdo51DL1XQYR%2Bm3ZeS71I7keexEnrZspp%2FQxaT7cJpceXr7ZrKg2HwJu8dMo%2Bcz13Vr%2F9p7MtZ6cIjUMMF3ql8GX%2BYO949RdgS31KNBb1Ln917v7GlLaZhbejgGAYJOFI2YMuWhs3WkZxOZCMy1JnW%2Bbp3OcdyffBt0ToaKaLcUx1mt6kzzOb4Ow%3D%3D--FD5dTEIs8HUBjIdH--P%2B86pTRJ%2FwWUndICVXAaNA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1513753117.1611279681; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; HttpOnly; Secure; SameSite=Lax
Content-Length: 9
X-GitHub-Request-Id: 8F40:2881:CD3AD3:1222997:600A2D41
2021-01-21 20:41:21 ERROR 404: Not Found.
```
</details>
but overall the point is that git does seem to get 401 with auth availability (although I failed to dig out how exactly it gets it). So I will leave it to the experts to figure out how
"""]]

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-01-22T18:36:50Z"
content="""
These possibilities seem about equally likely to me:
1. gogs has not implemented authed access to the files git-annex needs
for private repositories
2. gogs has a bug where it returns 404 rather than 401 when not authed,
but serves the files up when authed.
So why try to work around it in git-annex when it's a coin flip whether
git-annex can at all, when in either case there's clearly a bug in gogs,
and is specifically in code in gogs that is intended to support git-annex?
github has a bad habit of using user-agent to make urls do different
things when git accesses them than when other http clients do. That is the
case in your example; use wget -U git/1 and it will 401. But I don't
see how that's relevant, since git-annex does not talk to github except for
a) via git and b) via its git-lfs implementation (which supports http basic
auth although I can't remember if I tested it against github's server or only
other servers like gitlab).
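
For anyone wanting to repeat that experiment from a script, the Python sketch below builds the equivalent request with the standard library. It only constructs the request and does not send it; `git_like_request` is a made-up name, and how github responds is whatever the wget test above shows, not something this snippet verifies.

```python
from urllib.request import Request


def git_like_request(url):
    # Build (but do not send) a request that announces itself as git,
    # mirroring the wget -U git/1 experiment.
    return Request(url, headers={'User-Agent': 'git/1'})


req = git_like_request(
    'https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack')
# urllib stores header names with only the first letter capitalized.
print(req.get_header('User-agent'))
```
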
If github's lfs endpoint did do user-agent sniffing, IMHO that would
violate their spec, but also yeah, I'd probably put in some appropriately
snarky fake user-agent in git-annex there. But not in general, and none of
this says git-annex should be treating 404 like 401.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2021-01-25T15:18:39Z"
content="""
THANK YOU Joey. That is indeed quite odd (\"security through obscurity\") behavior from github (note: github returns 401 even if that repo does not exist, so it is at least consistent in not revealing presence/absence of private repos at a url). Feel welcome to close this issue since I guess nothing should indeed be done on git-annex side, and ideally `gin` portal just returns 401 in such cases
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2021-01-28T16:37:59Z"
content="""
github's rationale for the sniffing, such as it is, is that an url to a
git repository lets you view it in the web ui, and the same url can be
cloned by git.
Agreed, I'll close this in git-annex, and they can fix it in gin.
"""]]

@ -0,0 +1,88 @@
### Please describe the problem.
decided to test annex on a new to me file system -- beegfs
```
$> mount | grep beegfs
beegfs_nodev on /mnt/beegfs type beegfs (rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev)
```
```
$> modinfo beegfs
filename: /lib/modules/5.4.0-77-generic/updates/fs/beegfs_autobuild/beegfs.ko
version: 7.2.2
alias: fs-beegfs
author: Fraunhofer ITWM, CC-HPC
description: BeeGFS parallel file system client (http://www.beegfs.com)
license: GPL v2
srcversion: 533BB7E5866E52F63B9ACCB
depends: ib_core,rdma_cm
retpoline: Y
name: beegfs
vermagic: 5.4.0-77-generic SMP mod_unload modversions
```
### What steps will reproduce the problem?
1. get beegfs
2.
```
leviathan:/mnt/beegfs/yoh/tmp
$> TMPDIR=$PWD/annex-tmp git annex test
```
### What version of git-annex are you using? On what operating system?
```
leviathan:/mnt/beegfs/yoh/tmp
$> git annex version
git-annex version: 8.20210621-g91f9aac
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
```
### Please provide any additional information below.
looking in detail -- it seems it is not init, but addurl (but subject is set in stone now, can't edit) -- got misled I guess by the interleaving stdout/err:
[[!format sh """
addurl: FAIL (2.79s)
Init Tests
init: ./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo96/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo96_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo96%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
...
addurl: FAIL (1.86s)
./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo193/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo193_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
Init Tests
...
addurl: FAIL (2.29s)
./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo293/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo293_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo293%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
3 out of 984 tests failed (1776.96s)
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
on days ending with `y` it seems to work quite nicely.
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]], I think, though have not installed beegfs to test.
> --[[Joey]]

@ -0,0 +1,23 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-07-02T14:26:34Z"
content="""
EBUSY The rename fails because oldpath or newpath is a directory that
is in use by some process (perhaps as current working directory, or as
root directory, or because it was open for reading) or is in use by the
system (for example as mount point), while the system considers this an
error. (Note that there is no requirement to return EBUSY in such cases—
there is nothing wrong with doing the rename anyway—but it is allowed to
return EBUSY if the system cannot otherwise handle such situations.)
".git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl"
is not a directory, it is a file. So, rename seems to have no business failing
in this way. Probably the FS is buggy.
"""]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-07-04T03:27:20Z"
content="""
Thank you Joey! indeed most likely a \"too fancy\" of a file system.
On [https://www.beegfs.io/release/beegfs_6/Changelog.txt](https://www.beegfs.io/release/beegfs_6/Changelog.txt) I found
```
== Changes in 6.11 (release date: 2017-05-26) ==
General Changes:
* client: Add option sysRenameEbusyAsXdev to return EXDEV instead of EBUSY if
rename() is called on open files. (Tools like \"mv\" can handle EXDEV as return
value.)
```
do you think EXDEV would be worked out Ok if that is the culprit? (meanwhile I will let the beegfs users know as well - may be they could try)
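
For context, the way tools like mv cope with EXDEV looks roughly like the Python sketch below: try a rename, and fall back to copy-then-delete when the filesystem says the rename cannot be done in place. Whether git-annex's own rename path behaves like this is not something the sketch claims; `rename_or_copy` is a hypothetical name.

```python
import errno
import os
import shutil


def rename_or_copy(src, dst):
    # Try a plain rename first; if the filesystem reports EXDEV, fall
    # back to copy-then-delete, roughly what mv does.
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            # Anything else, including EBUSY, is still an error.
            raise
        shutil.copy2(src, dst)
        os.unlink(src)
```
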
"""]]

@ -0,0 +1,50 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-07-05T16:18:39Z"
content="""
I've checked with strace, to see if the file was open while it was being
renamed. Not that there is anything generally wrong with renaming an open
file on a POSIX file system, but it would possibly be a problem on windows,
where some forms of opening a file locks it in place. And apparently
this filesystem is not trying to be very POSIX either.
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 17
413026 write(17, "hi\n", 3) = 3
413026 close(17) = 0
...
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 11
413026 read(11, "hi\n", 8192) = 3
...
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK <unfinished ...>
413028 <... futex resumed>) = 0
413026 <... openat resumed>) = 16
...
413026 read(16, "hi\n", 32752) = 3
...
413026 close(16) = 0
...
413026 rename(".git/annex/tmp/URL-s3--file&c%%%tmp%foo", "_tmp_foo") = 0
...
413028 close(11) = 0
So the file is left open across the rename. That ought to be changeable,
and changing it would presumably fix the problem.
It's also a bit odd that the file gets read twice after being copied.
Once for the checksum makes sense, but what's the other one?
(Copying while checksumming should be able to avoid one of the reads,
but there is an open todo tracking progress on that.)
Aah, the other read is when it's probing if the file is html in case it ought
to be passed off to youtube-dl. That is the read that lingers for a while,
because it's done with a lazy readFile and probing if the file is html doesn't
read to the end and close it, so the file handle lingers until the GC gets
around to closing it. Of course youtube-dl won't be able to do anything with a
file url, but git-annex doesn't know that. And anyway the failure on this
filesystem would also happen when adding a http url.
Ok, fixed it to close the handle promptly. That should fix the test suite.
It does not seem unlikely that something else will break due to this
filesystem's unusual behavior though.
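For illustration, here is a small shell sketch of the two orderings discussed above: renaming while a read handle is still open, versus closing the handle first. The paths and variable names are made up for the demo, and it says nothing about how git-annex itself is implemented. On an ordinary local POSIX filesystem both renames are expected to succeed, while a filesystem like the one in this report may reject the first.

```shell
#!/bin/sh
# Sketch only: the directory and file names are hypothetical.
dir=$(mktemp -d "${TMPDIR:-/tmp}/rename-demo-XXXXXXX")
printf 'hi\n' > "$dir/tmpfile"

# Ordering 1: rename while a read handle (fd 3) is still open.
exec 3< "$dir/tmpfile"
mv "$dir/tmpfile" "$dir/renamed-open"
open_status=$?
read -r line_after_rename <&3    # the open handle still reads the content
exec 3<&-

# Ordering 2: read, close the handle promptly, then rename.
printf 'hi\n' > "$dir/tmpfile"
exec 3< "$dir/tmpfile"
read -r line_before_rename <&3
exec 3<&-                        # closed before the rename happens
mv "$dir/tmpfile" "$dir/renamed-closed"
closed_status=$?

echo "rename with open handle: exit $open_status"
echo "rename after close:      exit $closed_status"
```

The second ordering is the one that does not depend on the filesystem tolerating renames of open files.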
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-07-05T17:17:59Z"
content="""
Also looked over other uses of readFile. While there are a couple that
don't read the whole file and so may have a lag closing, none of them are
files that are used in ways that seem likely to trigger this kind of
problem.
"""]]

@ -0,0 +1,28 @@
### Please describe the problem.
Probably it is more of a todo than a bug.
### What steps will reproduce the problem?
This is a use-case where I am trying to establish a special remote to be shared by multiple unrelated repositories.
So I had original repo1 in which I
- created an external special remote with chunking, it got UUID1
- uploaded some data (all got chunked)
created repo2 in which I
- initialized special remote with identical settings and provided `uuid=UUID1`
- decided to test if annex would be able to get a key from the shared special remote
But `annex fsck --key KEY --from remote --fast` fails: since repo2 does not have the exact chunking list for the key, git-annex asks the special remote backend only for the original full key, which is obviously not found. But I wondered -- couldn't `git-annex` just use the chunk size to "mint" the possible chunk keys and test for those on the special remote, since it has all the information? After all, chunk keys are AFAIK minted deterministically and are pretty much just the original key "augmented" with `-S<chunksize>-C<chunkindex>`.
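To make the idea concrete, here is a minimal sketch of how such chunk keys could be minted. It assumes a key of the form `BACKEND-s<size>--<name>` with no other fields, that the `-S`/`-C` fields go before the `--` separator, and that chunks are numbered from 1. These are assumptions for illustration, and `mint_chunk_keys` is a hypothetical helper, not part of git-annex.

```shell
#!/bin/sh
# Hypothetical helper: print the chunk keys that a key of known size
# would be split into for a given chunk size (both in bytes).
# Assumes keys look like BACKEND-s<size>--<name> (no mtime field) and
# that chunk numbering starts at 1.
mint_chunk_keys() {
    key=$1
    chunksize=$2
    prefix=${key%%--*}     # everything before the first "--"
    name=${key#*--}        # everything after the first "--"
    size=${prefix##*-s}    # the number following "-s"
    # Number of chunks, rounding up.
    count=$(( (size + chunksize - 1) / chunksize ))
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s-S%s-C%s--%s\n' "$prefix" "$chunksize" "$i" "$name"
        i=$((i + 1))
    done
}

# Example: a 10 byte key with a 4 byte chunk size needs 3 chunks.
mint_chunk_keys SHA256E-s10--abc.dat 4
```

Note that the sketch only works when the key records its size, since the number of chunks is derived from it.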
### What version of git-annex are you using? On what operating system?
8.20200908+git175-g95d02d6e2-1~ndall+1
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]] --[[Joey]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-10-22T16:09:17Z"
content="""
Note that what you are trying to do will only work if the special remote
is not encrypted.
As well as your use case, which seems very unusual, I think one other use
case would be if a clone uploaded to the special remote, but never synced
out its git-annex branch before being lost, and fsck --from
remote is being run in another clone to reconstruct it. Currently it
won't try chunks as none are recorded.
Speculatively trying the current remote's chunk config would handle the
majority of cases, though wouldn't help if the other clone had adjusted the
special remote's chunk size too.
There's some overhead, but it can check it last, and not check it if
it's in the list of known chunks, so the overhead would only usually
be paid if the content git-annex expected to be present had gone missing,
which I think is rare enough to not care about.
(Also, this can only be done when the size of the key is known, so not
eg addurl --relaxed keys.)
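The ordering described above (known chunk sizes first, the remote's configured chunk size last, and skipped if already listed) could be sketched like this. `chunk_sizes_to_try` and its argument layout are hypothetical, chosen only to illustrate the idea, and are not how git-annex represents the chunk log.

```shell
#!/bin/sh
# Hypothetical sketch: print the chunk sizes to probe, in order.
# $1 is the remote's currently configured chunk size; the remaining
# arguments are the chunk sizes already recorded for the key.
chunk_sizes_to_try() {
    configured=$1
    shift
    seen=no
    for s in "$@"; do
        printf '%s\n' "$s"
        if [ "$s" = "$configured" ]; then
            seen=yes
        fi
    done
    # The speculative probe goes last, and only if not already listed.
    if [ "$seen" = no ]; then
        printf '%s\n' "$configured"
    fi
}

# Nothing recorded: only the configured size is tried.
chunk_sizes_to_try 1048576
# One other size recorded: it is tried first, the configured one last.
chunk_sizes_to_try 1048576 524288
```

With this ordering the extra probe is only reached after every recorded chunk size has failed to find the content.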
"""]]

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-10-22T17:00:25Z"
content="""
Implemented that. But..
As implemented, there's nothing to make the chunk size get stored in the
chunk log for a key, after it accesses its content using the configured
chunk size.
So, changing the chunk= of the remote can prevent accessing content that
was accessible before. Of course, avoiding that is why chunk sizes are
logged in the first place.
Seems like maybe fsck --from should fix the chunk log? I think
fsck would always need to be used, to fix up the location log, before any
other commands rely on the data being in the special remote, so it seems
fine to only fix the chunk log there.
But, also a bit unclear how fsck would find out when it needs to do this.
It only needs to when the remote's configured chunk size is not
listed in the chunk log. But that's also common after changing the chunk
size of a remote. So it would have to mess around with checking the
presence of chunk keys itself, which would be extra work and also ugly
to implement.
I'm leaving this todo^Wbug open for now due to this.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-10-22T17:36:12Z"
content="""
Ok, made it update the chunk log as needed while checking if chunks are
present. So this is done.
"""]]

@ -0,0 +1,119 @@
### Please describe the problem.
I was trying to follow https://git-annex.branchable.com/special_remotes/git-lfs/ (only without any encryption), to store at least some data on github via LFS (e.g., for https://github.com/dandi-datasets/nwb_test_data).
Even though I do provide URL to the `annex initremote` call, it is not stored within `remote.log`:
[[!format sh """
$> sudo rm -rf /tmp/testds2 && ( mkdir /tmp/testds2 && cd /tmp/testds2 && git init && git annex init && git annex initremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git show git-annex:remote.log; )
Initialized empty Git repository in /tmp/testds2/.git/
init (scanning for unlocked files...)
ok
(recording state in git...)
initremote gh-lfs ok
(recording state in git...)
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 autoenable=true encryption=none name=gh-lfs type=git-lfs timestamp=1570642576.06742667s
"""]]
git annex 7.20190912-1~ndall+1
If I just proceed, populate and copy some data via lfs (example uses datalad's `create-sibling-github` to create a new repo):
[[!format sh """
$> ( cd /tmp/testds2 && touch 123 && git annex add 123 && git commit -m 'add 123' && datalad create-sibling-github -s origin testds2 && git push -u origin master && git annex copy --to=gh-lfs 123; git push origin git-annex; )
add 123
ok
(recording state in git...)
[master (root-commit) d2b2f52] add 123
1 file changed, 1 insertion(+)
create mode 120000 123
[WARNING] Authentication failed using a token.
.: origin(-) [https://github.com/yarikoptic/testds2.git (git)]
'https://github.com/yarikoptic/testds2.git' configured as sibling 'origin' for <Dataset path=/tmp/testds2>
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 307 bytes | 307.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To github.com:yarikoptic/testds2.git
* [new branch] master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
copy 123 (to gh-lfs...)
ok
(recording state in git...)
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Delta compression using up to 4 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (19/19), 1.66 KiB | 567.00 KiB/s, done.
Total 19 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), done.
remote:
remote: Create a pull request for 'git-annex' on GitHub by visiting:
remote: https://github.com/yarikoptic/testds2/pull/new/git-annex
remote:
To github.com:yarikoptic/testds2.git
* [new branch] git-annex -> git-annex
"""]]
on a new clone I get a complaint that `url=` is missing, and no data is fetched
[[!format sh """
$> sudo rm -rf testds2-clone && git clone git@github.com:yarikoptic/testds2.git testds2-clone && ( cd testds2-clone && git annex init && git annex get 123; )
Cloning into 'testds2-clone'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 22 (delta 5), reused 21 (delta 4), pack-reused 0
Receiving objects: 100% (22/22), done.
Resolving deltas: 100% (5/5), done.
123@
init (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
Invalid command: 'git-annex-shell 'configlist' '/~/yarikoptic/testds2.git''
You appear to be using ssh to clone a git:// URL.
Make sure your core.gitProxy config option and the
GIT_PROXY_COMMAND environment variable are NOT set.
Remote origin does not have git-annex installed; setting annex-ignore
This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
(Auto enabling special remote gh-lfs...)
Specify url=
ok
(recording state in git...)
get 123 (not available)
Try making some of these repositories available:
92ce3cfc-8c58-42db-8aa3-ea4d4b3a6011 -- yoh@hopa:/tmp/testds2
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 -- gh-lfs
(Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed
"""]]
So I had to `enableremote` it while providing the URL; after that I was able to `get` the file:
[[!format sh """
$> git annex enableremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git annex get 123
enableremote gh-lfs ok
(recording state in git...)
get 123 (from gh-lfs...)
(checksum...) ok
(recording state in git...)
"""]]
Shouldn't that URL be recorded in remote.log? (similarly to `type=git` remotes)
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; see my comment --[[Joey]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-10-21T19:07:42Z"
content="""
That is intentional, because a git-lfs remote can have multiple urls that
can access it, and different users of the remote might want to use
different urls.
It's also documented to work that way, the same as the directory
special remote documents that you have to provide directory= each time it's
enabled.
But, now that git-annex supports sameas remotes, it would be possible to
have one special remote for each different url to a given git-lfs remote,
and have git-annex know they're the same repository. The user can then
enableremote whichever one they want.
See [[todo/git-lfs_special_remote_simpler_setup]] for where I hope this
will lead.
Closing this bug report as redundant with that todo item, and not actually a
bug since it is documented to behave the way it currently behaves.
"""]]

@ -0,0 +1,49 @@
### Please describe the problem.
I am trying to import (and then reimport) a directory that I sync from a box.com folder shared with me.
I have used the `--duplicate` option to not delete the original files upon `import`. But then, upon re-running the `import` command, git-annex errors out if the file already exists. `--reinject-duplicates` seems to be the option to use, but all those modes are "exclusive", so I cannot use `--duplicate --reinject-duplicates`, and using `--reinject-duplicates` alone would result in removing the original files (as without `--duplicate`).
### What version of git-annex are you using? On what operating system?
7.20190819+git2-g908476a9b-1~ndall+1
### Please provide any additional information below.
My little demo snippet for import using `--duplicate`, and then both options at the same time:
[[!format sh """
$> mkdir /tmp/d-in /tmp/d-repo && touch /tmp/d-in/file && ( cd /tmp/d-repo && git init && git annex init && for r in 1 2; do echo "Run $r"; ls -l ../d-in && git annex import --duplicate ../d-in/.; done )
Initialized empty Git repository in /tmp/d-repo/.git/
init ok
(recording state in git...)
Run 1
total 0
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
import ./file ok
(recording state in git...)
Run 2
total 0
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
import ./file
not overwriting existing ./file (is a symlink)
failed
git-annex: import: 1 failed
$> cd d-repo
$> git annex import ../d-in/. --reinject-duplicates --duplicate 2>&1 | head -n 3
Invalid option `--duplicate'
Usage: git-annex COMMAND
"""]]
Or maybe there is a better way to set up a re-runnable workflow for importing from a directory?
[[!meta author=yoh]]
[[!tag projects/dandi]]
[[!tag moreinfo]]
> [[done]] --[[Joey]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-19T17:12:41Z"
content="""
I think that you can accomplish what you want by making the directory
you're importing from be a directory special remote with exporttree=yes
importtree=yes and using the new `git annex import master --from remote`.
If that does not do what you want, I'd prefer to look at making it be able
to do so. I hope to eventually remove the legacy git-annex import from
directory, since we have this new more general interface.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-03-30T15:50:17Z"
content="""
Tagged moreinfo since I'm waiting on a reply to my suggestion.
"""]]

@ -0,0 +1,59 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2020-10-06T01:26:59Z"
content="""
I think it worked wonderfully
<details>
<summary>here is my script I have tried</summary>
```shell
#!/bin/bash
export PS4='> '
set -x
set -eu
cd \"$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)\"
mkdir d-in d-repo
echo content >| d-in/file
function dance() {
git annex import master --from d-in
# but we need to merge it
git merge d-in/master
ls -l
grep -e . *
}
(
cd d-repo
git init
git annex init
git annex initremote d-in type=directory directory=../d-in exporttree=yes importtree=yes encryption=none
ls -l ../d-in
for r in 1 2; do
echo \"Run $r\";
dance
done
echo \"more\" >> ../d-in/file
echo \"new\" > ../d-in/newfile
dance
rm ../d-in/file
dance
)
```
</details>
and it seemed to do the right job! I have not tried to add some `.gitattributes` into that branch it imports into to tell some files to go to git, but I hope it would just work, and if not -- I will come back! feel welcome to close this issue.
Cheers
"""]]

@ -0,0 +1,70 @@
### Please describe the problem.
[original question raised by John](https://github.com/dandi/dandisets/issues/139#issuecomment-1149948239) which led me on this goose chase.
The following reproducer
```
#!/bin/bash
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eux
git init --bare remote
( cd remote; git annex init; cat config )
rpath=$PWD/remote
git init repo
cd repo
git annex init
echo 'This is test text.' > file.txt
git add file.txt
git commit -m Init file.txt
git remote add --fetch remote-git $rpath
# without this -- there is no annex-uuid for remote -- git-annex branch is not getting merged
git annex info
cat .git/config
# but this still fails
git annex initremote testremote type=git location=$rpath autoenable=true
```
ends with
```
[remote "remote-git"]
url = /home/yoh/.tmp/dl-VjO0aSF/remote
fetch = +refs/heads/*:refs/remotes/remote-git/*
annex-uuid = afdc6d54-cd6d-4a20-b639-a639f9c7ef09
+ git annex initremote testremote type=git location=/home/yoh/.tmp/dl-VjO0aSF/remote autoenable=true
initremote testremote
git-annex: could not find existing git remote with specified location
failed
initremote: 1 failed
```
so
- the error "could not find existing git remote with specified location" does not seem descriptive of the underlying problem, since the location matches the url. It is still not clear why initremote fails
- as you can see in the script, `annex info` is needed to get annex-uuid populated, and judging by the [code](https://git.kitenet.net/index.cgi/git-annex.git/tree/Remote/Git.hs?id=af0d854460c28230dc682faa7c6daf3d96698cb6#n110) comment, the UUID must be known. If it is not known, ideally there should be a dedicated error message ("remote blah found but lacks uuid, check if remote is annex")
- IMHO a manual `annex info` should not be needed to merge the git-annex branch
### What steps will reproduce the problem?
above
### What version of git-annex are you using? On what operating system?
10.20220504
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-06-08T16:55:50Z"
content="""
Hmm, I think this only works for ssh:// urls currently.
Even the ssh url form host:/path does not work, because it gets
normalized to a ssh:// url.
The implementation does not support non-url's at all; the provided location
is treated as an url (`Git.Url location`). And even if it were treated as a
path, the path gets normalized to a relative path and an absolute path (or
differently relativized path) would not work.
Using paths with this is rather problematic too, because if the repo is
cloned to another machine, it would not find the repo at the recorded path.
Similarly, relative paths are also problematic. But it may as well support
them to the extent it can.
I think this needs changes to the core Git data structure, to store the
original, unmodified git.remote.path. Or a different interface than the
current, one that accepts any repo location and probes it to find the uuid.
The latter idea seems better because it simplifies the UI rather than
complicating the internal representation.
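As a rough illustration of the normalization mentioned above, the hypothetical shell function below rewrites the scp-like `host:path` form into a ssh:// url and leaves urls and local paths alone. It is only a sketch of the idea, not the code git-annex uses, and the `/~/` form used for relative remote paths is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: normalize a repository location.
normalize_location() {
    loc=$1
    case $loc in
        *://*)
            # Already an url (ssh://, file://, https://, ...).
            printf '%s\n' "$loc" ;;
        /*|./*|../*)
            # A local path; left as it is.
            printf '%s\n' "$loc" ;;
        *:*)
            # scp-like host:path form.
            host=${loc%%:*}
            rpath=${loc#*:}
            case $rpath in
                /*) printf 'ssh://%s%s\n' "$host" "$rpath" ;;
                *)  printf 'ssh://%s/~/%s\n' "$host" "$rpath" ;;
            esac ;;
        *)
            printf '%s\n' "$loc" ;;
    esac
}

normalize_location host:/srv/repo       # prints ssh://host/srv/repo
normalize_location /home/user/remote    # printed unchanged
```

Comparing a user-supplied `host:/srv/repo` string against the normalized form would fail unless both sides go through the same normalization, which is the kind of mismatch described above.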
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-06-09T17:04:23Z"
content="""
Implemented probing of the uuid of the repo location. Which may change
how you use this feature. Although the old roundabout method of having an
existing git remote and running initremote with the same location will
work too, it's not necessary to do that anymore.
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 2"
date="2022-06-09T02:32:18Z"
content="""
Wouldn't it be possible to support (absolute) file:// urls, eg. something similar to
`file:///home/jkniiv/test-VEfBrTZ/remote2`? In my mind they feel like a reasonable approximation
of ssh:// urls and could be useful for getting a feel for git special remotes before setting
up a bare git-repo/annex on an ssh-server. I know they are not the same thing implementation wise
but I feel that being able to try this feature out on a least-effort basis would be useful
from a pedagogical standpoint.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2022-06-09T17:28:19Z"
content="""
Re file:// urls, it does now work to use them in location=. I don't know if
I'd consider using them any better than absolute paths though. YMMV.
"""]]

@ -0,0 +1,150 @@
[[!meta title="http remotes that require authentication are not yet supported"]]
It is not a ground-shaking issue, but it would probably be best to handle it more gracefully.
Initially noticed while doing an install using datalad. An account/permission is required to access this particular repo; ask the Canadians for access if you don't have it yet, Joey. The credentials, I guess, got asked for and cached by git upon the initial invocation, so subsequent calls didn't ask for any:
[[!format sh """
$> datalad install https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
[INFO ] Cloning https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids [1 other candidates] into '/tmp/Coffey-mri-bids'
[INFO ] fatal: bad config line 1 in file /home/yoh/.tmp/git-annex96493-5.tmp
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
install(ok): /tmp/Coffey-mri-bids (dataset)
"""]]
which boiled down to that message being spit out during `git annex init`, which probes the remote but fails to download the config and instead gets a redirected html page:
[[!format sh """
$> git clone https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
Cloning into 'Coffey-mri-bids'...
warning: redirecting to https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids.git/
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (398/398), done.
remote: Compressing objects: 100% (282/282), done.
remote: Total 398 (delta 53), reused 393 (delta 48)
Receiving objects: 100% (398/398), 34.97 KiB | 795.00 KiB/s, done.
Resolving deltas: 100% (53/53), done.
$> git -C Coffey-mri-bids annex init --debug
...
[2019-11-27 19:27:01.341315979] Request {
host = "git.bic.mni.mcgill.ca"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/7.20190819+git2-g908476a9b-1~ndall+1")]
path = "/bic/Coffey-mri-bids/config"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2019-11-27 19:27:01.90016181] read: git ["config","--null","--list","--file","/home/yoh/.tmp/git-annex228094-5.tmp"]
fatal: bad config line 1 in file /home/yoh/.tmp/git-annex228094-5.tmp
[2019-11-27 19:27:01.913302324] process done ExitFailure 128
Remote origin not usable by git-annex; setting annex-ignore
$> wget -S https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
--2019-11-27 19:29:25-- https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
Resolving git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)... 132.216.133.92
Connecting to git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)|132.216.133.92|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: no-cache
Location: https://git.bic.mni.mcgill.ca/users/sign_in
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; secure; HttpOnly
X-Request-Id: xTcSyu4H36
X-Runtime: 0.071681
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Location: https://git.bic.mni.mcgill.ca/users/sign_in [following]
--2019-11-27 19:29:26-- https://git.bic.mni.mcgill.ca/users/sign_in
Reusing existing connection to git.bic.mni.mcgill.ca:443.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, private, must-revalidate
Etag: W/"305857ff0ba591a1e4ee7fec83b5687c"
Referrer-Policy: strict-origin-when-cross-origin
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; expires=Thu, 28 Nov 2019 02:29:26 -0000; secure; HttpOnly
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: MHFi7Yjxe82
X-Runtime: 0.063359
X-Ua-Compatible: IE=edge
X-Xss-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Length: unspecified [text/html]
Saving to: config
config [ <=> ] 13.19K --.-KB/s in 0s
2019-11-27 19:29:26 (89.1 MB/s) - config saved [13505]
$> cat config
<!DOCTYPE html>
<html class="devise-layout-html">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="object" property="og:type">
<meta content="GitLab" property="og:site_name">
<meta content="Sign in" property="og:title">
...
"""]]
I guess the problem is multi-faceted:
1. in the case of an authenticated http remote, `git` caches credentials, but `git annex` then tries to download the file directly (instead of somehow going via git), so it cannot "sense" that the remote is a valid annex and/or get files from it.
You can try with this simple one -- user "demo", password "demo":
[[!format sh """
$> git clone http://www.onerussian.com/tmp/secret-repo/.git
Cloning into 'secret-repo'...
Username for 'http://www.onerussian.com': demo
Password for 'http://demo@www.onerussian.com':
$> git -C secret-repo annex init
init (merging origin/git-annex into git-annex...)
(recording state in git...)
Remote origin not usable by git-annex; setting annex-ignore
ok
(recording state in git...)
"""]]
Although the remote is a proper annex, `git annex` indeed cannot use it, since it does not authenticate as git does.
So even though the error message is not incorrect, I would say the situation is suboptimal.
2. if the remote server, instead of just returning a 404 or 403 error code (as eg github seems to do in similar cases of non-authenticated access), redirects to some login page, annex feeds that page as a config to git, ignores the error message and just marks that remote as ignored for annex, while leaking that obscure "fatal" error message from git.
IMHO, ideally 1. should be addressed properly (authentication), and for 2. annex should spit out some more sensible message ("git failed to parse a config file fetched from the remote X. Please inspect it at this /path/config"), and keep that file around for debugging. As it is now I had to dig quite deep to figure out WTF is going on.
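For point 2, a minimal sketch of the kind of check meant here: parse the fetched file with `git config --list --file` and, on failure, leave the file in place and say where it is. `check_fetched_config` and its arguments are hypothetical names for illustration, not git-annex code.

```shell
#!/bin/sh
# Hypothetical helper: check a fetched config with git before using it.
# $1 is the remote name (only used in the message), $2 the fetched file.
# On failure the file is left in place so that it can be inspected.
check_fetched_config() {
    remote=$1
    file=$2
    if git config --list --file "$file" >/dev/null 2>&1; then
        return 0
    fi
    echo "git failed to parse a config file fetched from the remote $remote." >&2
    echo "Please inspect it at $file" >&2
    return 1
}
```

A caller could then set annex-ignore only after printing this message, instead of leaking git's own "fatal: bad config line" output.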
git annex 7.20190819+git2-g908476a9b-1~ndall+1 and the same with bleeding edge 7.20191114+git43-ge29663773-1~ndall+1 (probably that commit is the one with my patch for stricter git versioning, so use the count of 42 ;))
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; the error message is improved and also git remotes that need
> http basic auth to access will get password from `git credential`.
> --[[Joey]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="related: shouldn't git annex try external remotes to download config?"
date="2019-11-28T01:22:53Z"
content="""
I haven't tested, but I can see the situation where a specific repository URL could be handled by external special remote (such as datalad, downloaders of which do handle obscure setups such as this one without 403/404 but rather forwarding to login page) which would provide authenticated access to the URL. Would annex even try that config URL via external special remotes?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-11-29T18:09:45Z"
content="""
one of the use-cases (will be) https://gin.g-node.org/ -- an archive of (primarily) electrophys data. The platform is based on gogs, but uses git-annex underneath. It \"will be\" because currently access to git-annex is provided only via ssh, but as of today it is already possible to `git clone` (tried on public, didn't try private) datasets via https, and developers are looking into exposing git-annex also via http. To access private datasets authentication will need to be handled
"""]]

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-22T16:04:37Z"
content="""
git-annex could use `git credential` if the config download fails with
401 unauthorized and then retry with the credentials. (The git-lfs special
remote already does this.) And it would also need to do the same thing
when getting a key from the remote.
But that would not help with the https://git.bic.mni.mcgill.ca example,
apparently, because there's no 401, but a 302 redirect to a 200,
that is indistinguishable from a successful download.
Yeah, when git-annex expects a git config, if it doesn't parse as one,
it could retry, asking for credentials.
But that seems asking for trouble: what if it fails to parse for
another reason, maybe the web server served up something other than the
expected config, maybe a captive portal got in the way. There would be a
username/password prompt that doesn't make sense to the user at all.
And if this happens in a key download, git-annex certainly has no way to
tell that what it downloaded is not intended as the content of a key,
short of verifying the content, and failure to verify certainly doesn't
justify prompting for a username/password.
So, I am not comfortable with falling back to ask for credentials unless
I've seen a http status code that indicates they are necessary.
And IMHO gitlab's use of a 302 redirect to a login page is a bug in
gitlab, and will need to be fixed there, or a better http server used.
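The policy described above can be written down as a tiny decision function. This is a hypothetical sketch rather than git-annex's code, and the action names are made up. It takes the final http status of the download and says what to do next.

```shell
#!/bin/sh
# Hypothetical policy sketch: map the final http status of a download
# to the next step. Only a 401 justifies asking for credentials.
action_for_status() {
    case $1 in
        200) echo use-response ;;
        401) echo ask-git-credential ;;
        *)   echo fail ;;
    esac
}

action_for_status 401   # prints ask-git-credential
action_for_status 200   # prints use-response
```

A 302 redirect that ends at a login page served with 200 falls into the first branch, which is why it cannot be told apart from a successful download by status code alone.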
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""re: related: shouldn't git annex try external remotes to download config?"""
date="2020-01-22T16:31:16Z"
content="""
No, the external special remote protocol is not aimed at downloading git
config files. Anyway, this code path is never involved with using
special remotes; the uuid of a special remote is known and so there is no
need to ever download a git config file to discover it.
"""]]

@ -0,0 +1,48 @@
### Please describe the problem.
Maybe not a problem per se, but I decided to check whether this is expected. Following [this advice](http://git-annex.branchable.com/todo/git_smudge_clean_interface_suboptiomal/#comment-65f848510d8684bf65c6698f68b700dd) I have `git config filter.annex.process "git-annex filter-process"` in that git-annex repo and now observe the following tree (in htop) of processes:
```
3799768 dandi 20 0 1025G 191M 40616 S 6.6 0.3 0:31.87 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
3799796 dandi 20 0 191M 5088 4680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805272 dandi 20 0 6892 3420 2992 S 0.0 0.0 0:00.27 │ │ │ ├─ /bin/bash /usr/bin/git-annex-remote-rclone
3805640 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:02.82 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805646 dandi 20 0 20432 13044 4036 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3805650 dandi 20 0 31900 4064 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805685 dandi 20 0 30144 4000 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805704 dandi 20 0 30144 16076 15792 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805705 dandi 20 0 30144 3976 3728 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805717 dandi 20 0 30144 15968 15680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805781 dandi 20 0 30144 3980 3724 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805786 dandi 20 0 30144 4068 3820 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805807 dandi 20 0 30144 16028 15744 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805808 dandi 20 0 30144 3884 3636 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805828 dandi 20 0 30144 4008 3764 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805848 dandi 20 0 20432 13104 4092 S 0.0 0.0 0:00.04 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805852 dandi 20 0 20432 12948 3940 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805865 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3806054 dandi 20 0 30144 4004 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806066 dandi 20 0 45216 5108 4700 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3806067 dandi 20 0 30144 3888 3640 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806068 dandi 20 0 30144 16032 15748 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806095 dandi 20 0 30144 4060 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806104 dandi 20 0 20432 12928 3916 S 0.0 0.0 0:00.06 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3806110 dandi 20 0 30144 15944 15660 S 0.0 0.0 0:00.02 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3804258 dandi 20 0 1024G 44336 37772 S 0.0 0.1 0:00.04 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
3804277 dandi 20 0 40844 5124 4740 S 0.0 0.0 0:00.00 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805399 dandi 20 0 1024G 23508 20844 S 0.0 0.0 0:00.61 │ │ ├─ git-annex examinekey --batch --migrate-to-backend=SHA256E
3805493 dandi 20 0 1024G 36516 26184 S 0.0 0.1 0:01.51 │ │ ├─ git-annex fromkey --force --batch --json --json-error-messages
3805503 dandi 20 0 25788 5120 4712 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805510 dandi 20 0 12472 3984 3732 S 0.0 0.0 0:00.05 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
```
which might be ok, but I still wonder why so many of them are just sleeping there, well over one per `--jobs` slot. git annex 10.20220624-g769be12
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; this is now handled like other git helper processes
> and will be capped to the maximum of the number of jobs or cpu cores,
> and in practice usually fewer than that will be started. --[[Joey]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-07-25T20:37:55Z"
content="""
I was able to reproduce this by feeding 10 urls into git-annex addurl
-J5 and got 7 hash-object processes running.
filter.annex.process has nothing to do with this. I reproduced the behavior
without it set.
Seems like a simple concurrency issue, where each thread potentially starts
its own hash-object handle, and there can be around 2x as many threads
started as the -J number due to job stages. Annex.Concurrent sets up pools of
handles for other similar git processes, but not hash-object.
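
A minimal sketch of the pooling idea, written in Python rather than git-annex's Haskell: worker threads borrow a helper handle from a bounded pool instead of each starting its own. `HandlePool` and `make_handle` are hypothetical names used only for illustration; this is not how Annex.Concurrent is actually implemented.

```python
import queue
import threading

class HandlePool:
    # Bounded pool: at most `size` handles are ever created, no matter
    # how many threads ask for one.
    def __init__(self, size, make_handle):
        self._make_handle = make_handle
        self._idle = queue.Queue()
        self._slots = threading.Semaphore(size)
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self):
        self._slots.acquire()                # blocks while `size` handles are in use
        try:
            return self._idle.get_nowait()   # reuse an idle handle if there is one
        except queue.Empty:
            with self._lock:
                self.created += 1
            return self._make_handle()

    def release(self, handle):
        self._idle.put(handle)               # return the handle before freeing the slot
        self._slots.release()
```

With a pool sized to the -J number, ten queued urls would share that many handles rather than each thread stage starting another process.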
"""]]

View file

@ -0,0 +1,8 @@
(Sorry about the title; I was trying to work within the character limit.)
When invoking `git-annex metadata --batch --json --json-error-messages`, if an error occurs in response to some input — say, because the name of a nonexistent file was supplied (or, in my case, because the name of a file downloaded milliseconds ago in a parallel addurl process was supplied) — then `git-annex metadata` will output "git-annex: not an annexed file: {filepath}" to standard error and immediately exit. Not only is this in contrast to what it seems `--json-error-messages` should do, but the "exiting immediately" bit is in contrast to my understanding of how batch mode is supposed to work. Surely this should be fixed?
[[!meta author=jwodder]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-11-01T16:27:48Z"
content="""
For consistency with other --batch, I've made it reply with a blank line
when the input is not an annexed file.
Do note that --json-error-messages cannot cram every possible kind of error
message into a json object. In particular, errors that occur at startup,
and not when acting on a particular file or key, do not fit into the json
schema.
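
A hedged sketch of how a caller might interpret replies under this convention, assuming one reply line per input line and a blank line meaning the input was not an annexed file. `parse_batch_reply` is a hypothetical helper on the caller's side, not part of git-annex.

```python
import json

def parse_batch_reply(line):
    # One reply line per request. A blank line means the input was not
    # an annexed file; anything else is expected to be a JSON object.
    line = line.rstrip('\n')
    if line == '':
        return None
    return json.loads(line)
```

Errors that happen at startup would still arrive on stderr, so a caller cannot rely on this alone.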
"""]]

View file

@ -0,0 +1,44 @@
### Please describe the problem.
From [https://github.com/DanielDent/git-annex-remote-rclone/pull/57](https://github.com/DanielDent/git-annex-remote-rclone/pull/57), where we use that rclone special remote for backup of DANDI data to dropbox
Seems like a test sometimes fails on Mac OS with:
```
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
copy: 1 failed
Error: Process completed with exit code 1.
```
Indeed, so far it seems to happen only on Mac:
```
(git)smaug:/mnt/datasets/datalad/ci/git-annex-remote-rclone[master]2022
$> datalad foreach-dataset git grep 'file is locked'
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone]
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T06:47:44.4978580Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T06:47:44.4978530Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/10_test (macos-latest, v1.33).txt:2022-10-03T23:35:41.8464390Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T23:37:44.0652500Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.33)/9_tests.txt:2022-10-03T23:35:41.8463970Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T23:37:44.0652360Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
foreach-dataset(ok): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/10 (dataset)
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08]
```
### What steps will reproduce the problem?
no minimal reproducer yet but happens as part of [this test "script"](https://github.com/DanielDent/git-annex-remote-rclone/blob/master/tests/all-in-one.sh)
### What version of git-annex are you using? On what operating system?
git-annex version: 10.20220927
[[!meta author=yoh]]
[[!tag projects/dandi]]
> Presumed [[fixed|done]]; please followup if I'm wrong. --[[Joey]]

View file

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-10-07T16:44:04Z"
content="""
I doubt this is really OSX specific. This must be two threads running logMove
at the same time, so that both try to write, or one writes while the other
reads. That causes the Haskell RTS to fail this way.
Since it does use a lock file when writing and appending to the log file,
I think it must be the call to checkLogFile that is failing. That avoids
taking the lock, for performance reasons. The performance gain is pretty
minimal though, since taking the lock is cheap. Only when modifyLogFile
is called at the same time might it need to block on the file being
rewritten, but the file only ever has 100 items, so that never takes long
either.
So, I have added locking to checkLogFile (and to calcLogFile though it's
not used here, just because it has the same problem). That should fix it,
though we'll need to wait on the test to know for sure. I'm going to close
this, as I'm pretty sure of the fix.
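
A minimal sketch of the idea, in Python rather than Haskell: the reader takes the same lock file as the writer, so it never opens the log while it is being modified. The function names and the shared/exclusive split are assumptions for illustration; they are not the actual checkLogFile code.

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def log_lock(lock_path, exclusive):
    # Hold an advisory lock on a separate lock file for the duration
    # of the block: shared for readers, exclusive for writers.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)   # closing the descriptor releases the lock

def append_log(log_path, lock_path, item):
    with log_lock(lock_path, exclusive=True):
        with open(log_path, 'a') as f:
            f.write(item + '\n')

def check_log(log_path, lock_path, item):
    # The reader takes the lock too, so it cannot race with a writer.
    with log_lock(lock_path, exclusive=False):
        if not os.path.exists(log_path):
            return False
        with open(log_path) as f:
            return any(line.rstrip('\n') == item for line in f)
```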
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2022-11-04T12:41:47Z"
content="""
ok, did the archaeologic expedition to figure when fixed -- was fixed in [10.20221003-19-g4a42c6909 AKA 10.20221103~28](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=4a42c69092a03cce7b31b79b862e59c9842ced77) , brew still (well -- we are just 1 day post release! ;)) has 10.20221003 so in testing git-annex-remote-rclone we keep getting hit but hopefully it would go away soon with update of git-annex in brew.
"""]]

View file

@ -0,0 +1,96 @@
### Please describe the problem.
git status reports having staged changes and no changes from index
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use "git add" and/or "git commit -a")
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
```
although git shows no diff and sha256 checksum corresponds to the key:
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
```
I think the tricky part may be that this repo is at
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
10
```
although I thought that we kept it at 8. I also have a user-wide config setting
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
git-annex filter-process
```
That setting was recommended to me to speed up operations while avoiding the upgrade to 10, but I guess running the most recent version once led to the upgrade, since all the other repos are still at 8, as I expected:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
1 version = 10
186 version = 8
```
Having it reported as modified causes our script, which sanity-checks that it operates only on a clean repo, to fail.
`git reset --hard` seems to have mitigated that:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
HEAD is now at b859efed7d [backups2datalad] 66 files added
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
```
I will now rerun our script and see what state I end up in (although, once again, the repo is already at version 10, so the behavior may be different).
### What steps will reproduce the problem?
I think I get it after I `annex move` and then `annex get` that file back. Just for my own reference -- git-annex repo is result of the https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron
### What version of git-annex are you using? On what operating system?
10.20220822-g84f1875 (conda build), originally observed on earlier 10.20220724-ge30d846
[[!meta author=yoh]]
[[!tag projects/dandi]]
[[!meta title="annex.stalldetection prevents git-annex get from restaging unlocked files"]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 10"
date="2022-09-22T17:34:35Z"
content="""
damn, I should have shared my config! I also do have `annex.stalldetection` set!
```
[annex]
stalldetection = 1KB/120s
```
never thought it might be related. We should look into having some matrix test run with such config set.
"""]]

View file

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2022-09-22T17:38:45Z"
content="""
Yeah, a whole git-annex test run with stalldetection set would have found
this bug. Which seems a bit heavy-weight for the test suite to try as a
separate pass by default. But then again, stalldetection does significantly
change how git-annex operates since it has to fork off child processes that
it can kill when they stall.
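
A rough sketch of the kind of check a parent process could run on progress reports from a child, in Python. It assumes a setting like `1KB/120s` means "stalled if fewer than that many bytes moved within that period"; `is_stalled` and its sample format are hypothetical, not git-annex's implementation.

```python
def is_stalled(samples, min_bytes, window_secs):
    # samples: list of (timestamp_secs, total_bytes_so_far), oldest first.
    if not samples:
        return False
    now, bytes_now = samples[-1]
    cutoff = now - window_secs
    baseline = None
    for t, b in samples:
        if t <= cutoff:
            baseline = b      # latest sample at or before the cutoff
        else:
            break
    if baseline is None:
        return False          # not enough history yet to judge
    return bytes_now - baseline < min_bytes
```

When this returns true, the parent would kill the child (for example with `subprocess.Popen.kill`) and retry, which is why the transfer has to run in a separate process in the first place.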
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 12"
date="2022-09-22T18:14:15Z"
content="""
I'm adding a matrix run with custom config settings to our [datalad/git-annex](https://github.com/datalad/git-annex/pull/133) CI run. Let's see how that goes. Maybe there are other interesting config settings to add there, e.g. retries? Or is the global `~/.gitconfig` not used/mocked away during tests? (E.g. we do that in datalad, so I had to work around that in a [PR against datalad](https://github.com/datalad/datalad/pull/7056) to test with this setting set.)
"""]]

View file

@ -0,0 +1,32 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2022-09-22T17:40:57Z"
content="""
So, `git-annex transferrer`, after downloading the content, does handle
populating pointer files. So it calls restagePointerFile to register a cleanup
action.
Whatever is making that process exit 1 must be preventing the cleanup
action from being run. And I think what that is, is that its stdout handle
gets closed at the same time its stdin handle is closed. I tried running
`git-annex transferrer` manually and feeding it a transfer request on
stdin. After its stdin was closed, it proceeded to send
`"om (recording state in git...)\n"` to stdout, and that would fail
with stdout already closed.
Worse, I suspect there's another problem.. When a stall actually
is detected, git-annex kills the `git-annex transferrer` process that has
stalled. But suppose that process has already successfully downloaded some
content and populated pointer files. Killing it would prevent it from
running restagePointerFile on those. It seems that to solve this,
it would need to communicate back to the parent what pointer files need to
be restaged. (Which would also solve the exit 1 problem, although not
necessarily in the best way.)
Also, I think that multiple processes running the restagePointerFile
cleanup action at the same time can be a problem, because one will
lock the index and the rest will fail to restage. Not what's happening
here, but with -J, there would be multiple `git-annex transferrer`
processes doing that at the same time at the end.
"""]]

View file

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 13"""
date="2022-09-22T18:16:22Z"
content="""
Avoided the early stdout handle close, and that did fix this bug as
reported.
The related problems I identified in comment #12 are still unfixed, so
leaving this open for now.
I think what ought to be done to wrap this up is make restagePointerFile
record the files that need to be restaged in a log file. Then at shutdown,
git-annex can read the log file, and restage everything listed in it.
This will solve multiple problems:
* When a previous git-annex process was interrupted after a get/drop of an
unlocked file, the file will be in the log, so git-annex can notice
that and handle the restaging.
* When a stalled `git-annex transferrer` is killed, the parent git-annex
will read the log and handle the restaging that it was not able to do.
* When multiple processes are trying to restage files at the same time,
an exclusive lock can be used to make only one of them run, and it can
handle restaging the files that the others have recorded in the log too.
* As a bonus, in the situations where git-annex is legitimately unable to
restage files, it can still record them to be restaged later. And the
"only a cosmetic problem" message can tell the user to run a single
simple git-annex command, rather than a complicated
`git update-index` command per file.
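
The log-file plan above can be sketched roughly as follows, in Python. The file layout, function names and the `restage` callback are all hypothetical; this only illustrates the record-then-drain pattern with an exclusive lock, not the eventual git-annex code.

```python
import fcntl
import os

def record_restage(log_path, path):
    # Append-only record of a pointer file that still needs restaging.
    with open(log_path, 'a') as f:
        f.write(path + '\n')

def restage_pending(log_path, lock_path, restage):
    # Try to become the single process that drains the log.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False   # someone else holds the lock and will drain the log
        if not os.path.exists(log_path):
            return True
        with open(log_path) as f:
            pending = sorted({line.rstrip('\n') for line in f if line.strip()})
        if pending:
            restage(pending)
        # Simplification: an entry appended between the read above and
        # this removal would be lost; real code would need to guard that.
        os.remove(log_path)
        return True
    finally:
        os.close(fd)       # closing the descriptor releases the lock
```

Because entries persist in the log until some process drains it, an interrupted or killed process leaves its work behind for the next one to pick up.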
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 15"""
date="2022-09-22T18:42:06Z"
content="""
@yarikoptic oh, `git-annex test` does prevent global gitconfig from
influencing the tests. So your matrix test won't work if you're
running `git-annex test` in it. If you're running other git-annex commands
in datalad's test suite, it would work though.
I've opened [[todo/specify_gitconfig_for_test_suite]].
"""]]

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""status update"""
date="2022-09-23T19:57:38Z"
content="""
I've implemented the log file. The stalled transferrer case is now handled.
This bug is fixed.
As to a few other cases I considered in comments upthread:
When a get/drop was interrupted before it could restage,
the next get/drop will cause the necessary restaging for the
interrupted process to happen. However, this doesn't help if there's
nothing left to get/drop. Should git-annex always run restagePointerFiles
on shutdown? That would make any git-annex command handle the restaging.
But it doesn't seem right for query commands to do potentially a lot of
work to handle this case. Anyway, I don't think this needs to be dealt
with in this bug report.
When multiple processes try to restage at the same time, one will
restage everything that all of them logged. The others will still display a
warning to the user that they couldn't restage. It would be hard to avoid
displaying that warning, since it does need to warn when it was
unable to restage because git has the index locked at the time. Anyway,
I think it's ok to display the message despite the files having been
restaged, because it's the same as a later git-annex process handling the
restaging. (It does seem like two transferrers belonging to the same parent
could collide in this way, and one display the warning, which isn't great..)
I also implemented a "git-annex restage" command that
is an easier way to restage in the cases where git-annex is not able
to do it itself.
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-09-21T17:05:51Z"
content="""
Is .dandi/assets.json an unlocked file?
`git diff --cached` seems like the wrong thing to run, because
that would show changes that you have staged for commit.
This change is one that has not been staged for commit.
So `git diff` should show it.
"""]]

View file

@ -0,0 +1,46 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2022-09-21T18:46:50Z"
content="""
d'oh forgot to show that I have tried that one too. Here is everything at once again with `git diff` and again doing checksums (that should have been different in my prev examples as well if different only in tree but not in index):
```shell
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
It took 3.19 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
no changes added to commit (use \"git add\" and/or \"git commit -a\")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
```
"""]]

View file

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2022-09-21T18:49:06Z"
content="""
the workaround you suggest elsewhere for \"cosmetic\" problem works here too
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use \"git add\" and/or \"git commit -a\")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git update-index -q --refresh .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
```
but since we are relying on output from `status`, it is not just a \"cosmetic\" issue. IMHO if such `update-index` is needed, it should have been done by git-annex automagically somehow/sometime.
"""]]

View file

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2022-09-21T19:19:08Z"
content="""
So you can reproduce this? I am pretty sure it's not as simple as a drop
followed by a get, so more information about reproducing it seems crucial.
I assume you are *not* seeing the "This is only a cosmetic problem affecting git status"
message?
I expect that running `git update-index --refresh .dandi/assets.json`
will fix git status. Can you confirm?
The only way I know of that this can happen without the message is if a
drop or a get is still running, or gets interrupted. One of the last things
git-annex does before exiting is restage all the unlocked files that it has
updated.
Short of that, it seems like it would have to be a bug that prevents
restagePointerFile from working. Which might not be a bug in git-annex,
if the problem involves git's handling of timestamps in the index, for
example. (Which is known to have some odd behaviors.)
(git-annex could be improved to do the
restaging later when interrupted and possibly after such a bug.
But there's no way to make it recover in `git status`, because
git doesn't run it in this situation.)
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2022-09-21T22:06:49Z"
content="""
Seems likely that the --time-limit option, when combined with -J,
could result in git-annex exiting before a worker thread gets a chance to
call stagePointerFile. I have not verified this, and it would be unlikely
to result in the same file being affected reproducibly.
"""]]

View file

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2022-09-22T01:03:18Z"
content="""
Maybe it is one of those options, but in my case it is just a straight `get` on that single unlocked file:
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ cat .dandi/assets.json
/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex get .dandi/assets.json
get .dandi/assets.json (from dandi-dandisets-dropbox...)
(checksum...) ok
(recording state in git...)
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use \"git add\" and/or \"git commit -a\")
```
"""]]

View file

@ -0,0 +1,58 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2022-09-22T01:33:24Z"
content="""
sorry I have not mentioned your [earlier comment 4](http://git-annex.branchable.com/bugs/reports_file___34__modified__34___whenever_it_is_not/#comment-ca0281ff580c91c40e429fbbb71a3791) but my clarification above I think gives the answers to your questions ;)
<details>
<summary>FWIW here is the get --debug output </summary>
```shell
[2022-09-21 21:29:59.904218] (Utility.Process) process [3968193] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"--stage\",\"-z\",\"--error-unmatch\",\"--\",\".dandi/assets.json\"]
[2022-09-21 21:29:59.904725] (Utility.Process) process [3968194] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
[2022-09-21 21:29:59.905645] (Utility.Process) process [3968195] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
[2022-09-21 21:29:59.906012] (Utility.Process) process [3968196] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"]
[2022-09-21 21:29:59.907578] (Utility.Process) process [3968196] done ExitSuccess
[2022-09-21 21:29:59.907891] (Utility.Process) process [3968197] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2022-09-21 21:29:59.913611] (Utility.Process) process [3968197] done ExitSuccess
[2022-09-21 21:29:59.914676] (Utility.Process) process [3968198] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..5f5efa8544ff02c9261dd1590425dcea37a55526\",\"--pretty=%H\",\"-n1\"]
[2022-09-21 21:29:59.916707] (Utility.Process) process [3968198] done ExitSuccess
[2022-09-21 21:29:59.916968] (Utility.Process) process [3968199] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..18497e6e9cab7754a85256416c361fee36ba65b2\",\"--pretty=%H\",\"-n1\"]
[2022-09-21 21:29:59.918722] (Utility.Process) process [3968199] done ExitSuccess
[2022-09-21 21:29:59.919069] (Utility.Process) process [3968200] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
get .dandi/assets.json [2022-09-21 21:29:59.921463] (Utility.Process) process [3968202] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"]
(from dandi-dandisets-dropbox...) [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]
[2022-09-21 21:29:59.93162] (Annex.TransferrerPool) > d rdandi-dandisets-dropbox SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json .dandi/assets.json
[2022-09-21 21:29:59.942599] (Annex.TransferrerPool) < opb
[2022-09-21 21:29:59.942718] (Annex.TransferrerPool) < ops 69507227
[2022-09-21 21:30:03.103409] (Annex.TransferrerPool) < ope
[2022-09-21 21:30:03.103539] (Annex.TransferrerPool) < om (checksum...)
(checksum...) [2022-09-21 21:30:03.768599] (Annex.TransferrerPool) < t
[2022-09-21 21:30:03.768843] (Annex.Branch) read 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
[2022-09-21 21:30:03.770259] (Annex.Branch) set 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
ok
[2022-09-21 21:30:03.770361] (Utility.Process) process [3968200] done ExitSuccess
[2022-09-21 21:30:03.770425] (Utility.Process) process [3968195] done ExitSuccess
[2022-09-21 21:30:03.770484] (Utility.Process) process [3968194] done ExitSuccess
[2022-09-21 21:30:03.770531] (Utility.Process) process [3968193] done ExitSuccess
[2022-09-21 21:30:03.771187] (Utility.Process) process [3968452] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"hash-object\",\"-w\",\"--stdin-paths\",\"--no-filters\"]
[2022-09-21 21:30:03.77319] (Utility.Process) process [3968453] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
[2022-09-21 21:30:04.063182] (Utility.Process) process [3968453] done ExitSuccess
[2022-09-21 21:30:04.063779] (Utility.Process) process [3968463] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2022-09-21 21:30:04.065352] (Utility.Process) process [3968463] done ExitSuccess
(recording state in git...)
[2022-09-21 21:30:04.06587] (Utility.Process) process [3968464] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
[2022-09-21 21:30:04.407935] (Utility.Process) process [3968464] done ExitSuccess
[2022-09-21 21:30:04.408528] (Utility.Process) process [3968468] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"56c62dcc21145201f9454a2dd6e75cc37f072ee4\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\"]
[2022-09-21 21:30:04.410591] (Utility.Process) process [3968468] done ExitSuccess
[2022-09-21 21:30:04.413623] (Utility.Process) process [3968469] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"c3a1f9208649b47621b1424b055bd9871aa2fc79\"]
[2022-09-21 21:30:04.415318] (Utility.Process) process [3968469] done ExitSuccess
[2022-09-21 21:30:04.416301] (Utility.Process) process [3968202] done ExitSuccess
[2022-09-21 21:30:04.416574] (Utility.Process) process [3968452] done ExitSuccess
[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1
```
</details>
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2022-09-22T17:02:04Z"
content="""
I've fixed the issue I found with --timestamp combined with -J, which I do
think could have resulted in the same kind of problem. But you've shown
that is not the cause in your case.
"""]]

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2022-09-22T17:04:35Z"
content="""
Thanks for the --debug. It shows that git-annex is not running
`git update-index --refresh` at all.
And it shows that the transfer happens in a `git-annex transferrer` process.
So, I think you have annex.stalldetection set.

    [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]

And interestingly, that transferrer process fails at the end:

    [2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1

Aha! I can reproduce it by setting annex.stalldetection.
"""]]

@ -0,0 +1,72 @@
### Please describe the problem.
NB: I can't change the title, but this is not really about Depends, since libgcc-s1 is an essential package; most likely some LD_LIBRARY_PATH manipulation is in place, or something like that.
[Testing of git-annex-remote-rclone on ubuntu-20.04 crashed](https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3750292044/jobs/6370225718) with
```
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
libgcc_s.so.1 must be installed for pthread_cancel to work
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 124: 3066 Aborted (core dumped) git-annex copy -J5 --quiet . --to GA-rclone-CI
Error: Process completed with exit code 134.
```
Installation of git-annex:
```
Run datalad-installer --sudo ok git-annex -m datalad/git-annex:release
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-j8s29if7.sh
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Installing git-annex via datalad/git-annex:release
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Version: None
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Downloading https://github.com/datalad/git-annex/releases/download/10.20221212/git-annex-standalone_10.20221212-1.ndall%2B1_amd64.deb
2022-12-21T15:10:33+0000 [INFO ] datalad_installer Running: sudo dpkg -i /tmp/tmpah14ch03/git-annex-standalone_10.20221212-1.ndall+1_amd64.deb
Selecting previously unselected package git-annex-standalone.
(Reading database ... 236921 files and directories currently installed.)
Preparing to unpack .../git-annex-standalone_10.20221212-1.ndall+1_amd64.deb ...
Unpacking git-annex-standalone (10.20221212-1~ndall+1) ...
Setting up git-annex-standalone (10.20221212-1~ndall+1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for man-db (2.10.2-1) ...
2022-12-21T15:10:35+0000 [INFO ] datalad_installer git-annex is now installed at /usr/bin/git-annex
```
Or maybe it is an issue with `rclone`? In this case it was installed with:
```
Run datalad-installer --sudo ok rclone=v1.59.2 -m downloads.rclone.org
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-aon5z6_f.sh
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Installing rclone from downloads.rclone.org
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Version: v1.59.2
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Bin dir: /usr/local/bin
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Man dir: None
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Downloading https://downloads.rclone.org/v1.59.2/rclone-v1.59.2-linux-amd64.zip
2022-12-21T15:10:38+0000 [INFO ] datalad_installer Moving /tmp/tmp75sde__c/rclone-v1.59.2-linux-amd64/rclone to /usr/local/bin/rclone
2022-12-21T15:10:38+0000 [INFO ] datalad_installer rclone is now installed at /usr/local/bin/rclone
```
I have tried to reproduce locally with exactly those installations of rclone and git-annex, but I am not getting the same problem :-/
I have also run it with `--debug` and got
```
[2022-12-21 17:20:10.056928113] (Utility.Process) process [11603] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","c95a5c849daca7183eefc28c360942104d01e900","--no-gpg-sign","-p","refs/heads/git-annex"]
[2022-12-21 17:20:10.060448661] (Utility.Process) process [11603] done ExitSuccess
[2022-12-21 17:20:10.060806165] (Utility.Process) process [11604] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","248cef615747c4aba64fbb475b0a03c8d2a78b27"]
[2022-12-21 17:20:10.063957208] (Utility.Process) process [11604] done ExitSuccess
[2022-12-21 17:20:10.066005436] (Utility.Process) process [11127] done ExitSuccess
[2022-12-21 17:20:10.066266539] (Utility.Process) process [11114] done ExitSuccess
[2022-12-21 17:20:10.066702845] (Utility.Process) process [11126] done ExitSuccess
[2022-12-21 17:20:10.067107151] (Utility.Process) process [11125] done ExitSuccess
[2022-12-21 17:20:10.067357854] (Utility.Process) process [11599] done ExitSuccess
libgcc_s.so.1 must be installed for pthread_cancel to work
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 125: 11083 Aborted (core dumped) git-annex drop -J5 --debug .
Error: Process completed with exit code 134.
```
in https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3751417971/jobs/6372374929 .
Any ideas, Joey?
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,23 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-12-22T18:38:32Z"
content="""
I'm a bit surprised git-annex is using `pthread_cancel`, since `strings`
does not show it contains that symbol. Perhaps one of the other pthread
symbols it uses ends up calling that.
It does seem though from the message that it's git-annex and not a program
it runs that is core dumping on this. Also I checked, and the rclone you
installed is a statically linked binary so I would not expect it to use
`libgcc_s.so`. And git-annex-remote-rclone is a bash script, and bash
doesn't use pthreads.
(I do think that, in general, using the git-annex standalone tarball and
then trying to run additional programs besides git-annex inside it is not
going to always work well. Standalone interposes its own versions of libraries,
which may not work with the other programs. There is already a todo about that,
[[todo/restore_original_environment_when_running_external_special_remotes_from_standalone_git-annex__63__]].)
I've added `libgcc_s.so.1` to the standalone build.
"""]]

@ -24,7 +24,7 @@ My bugs
<details>
<summary>Fixed</summary>
[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and
(author(mih) or author(ben) or author(kyle) or tagged(projects/datalad))" feeds=no actions=yes archive=yes show=0 template=buglist template=buglist]]
[[!inline pages="(bugs/* or projects/datalad/bugs-done/*) and !bugs/done and link(bugs/done) and
(author(mih) or author(ben) or author(kyle) or tagged(projects/datalad))" feeds=no actions=yes archive=yes show=0 template=buglist]]
</details>

@ -0,0 +1,69 @@
### Please describe the problem.
Identified while troubleshooting another [issue](https://git-annex.branchable.com/bugs/enableremote_stuck_with_a_recentish_git-annex/#comment-2116c5e109aaf39ffd62f3bdeeb14602)
[[!format sh """
$> 'git-annex' 'enableremote' --debug -cremote.target1.blah=1 'target1'
enableremote target1 ok
$> 'git-annex' 'enableremote' -cremote.target1.blah=1 --debug 'target1'
[2020-02-26 14:46:47.789794028] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","show-ref","git-annex"]
[2020-02-26 14:46:47.797917978] process done ExitSuccess
[2020-02-26 14:46:47.798350533] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","show-ref","--hash","refs/heads/git-annex"]
[2020-02-26 14:46:47.802576899] process done ExitSuccess
[2020-02-26 14:46:47.802884873] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","log","refs/heads/git-annex..b1ab0b11fbbc94ffd3d52adb7a0e93c3d45d8b52","--pretty=%H","-n1"]
[2020-02-26 14:46:47.813289406] process done ExitSuccess
[2020-02-26 14:46:47.815873454] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","cat-file","--batch"]
[2020-02-26 14:46:47.818598891] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-02-26 14:46:47.824657055] read: git ["config","--null","--list"]
[2020-02-26 14:46:47.835897478] process done ExitSuccess
enableremote target1 [2020-02-26 14:46:47.83652184] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","config","remote.target1.annex-ignore","false"]
[2020-02-26 14:46:47.842277017] process done ExitSuccess
[2020-02-26 14:46:47.842703576] read: git ["config","--null","--list"]
[2020-02-26 14:46:47.853478328] process done ExitSuccess
ok
[2020-02-26 14:46:47.855317715] process done ExitSuccess
[2020-02-26 14:46:47.856835556] process done ExitSuccess
"""]]
I consider it a bug since options shouldn't be order dependent, and even if they were -- `--debug` is listed before `-c` in `git annex enableremote --help`:
[[!format sh """
$> git annex enableremote --help
git-annex enableremote - enables git-annex to use a remote
Usage: git-annex enableremote [NAME K=V ...]
Available options:
--force allow actions that may lose annexed data
-F,--fast avoid slow operations
-q,--quiet avoid verbose output
-v,--verbose allow verbose output (default)
-d,--debug show debug messages
--no-debug don't show debug messages
-b,--backend NAME specify key-value backend to use
-N,--numcopies NUMBER override default number of copies
--trust REMOTE override trust setting
--semitrust REMOTE override trust setting back to default
--untrust REMOTE override trust setting to untrusted
-c,--config NAME=VALUE override git configuration setting
--user-agent NAME override default User-Agent
--trust-glacier Trust Amazon Glacier inventory
--notify-finish show desktop notification after transfer finishes
--notify-start show desktop notification after transfer starts
-h,--help Show this help text
For details, run: git-annex help enableremote
$> git annex version
git-annex version: 7.20190819+git2-g908476a9b-1~ndall+1
"""]]
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]
> fixed in [8.20200226-3-gc089f395b](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=c089f395b0c7d6416a3d4f2bf3211404acfd5b0e) --[[yarikoptic]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-02-27T04:20:58Z"
content="""
-c uses adjustGitRepo which calls changeGitRepo, which
re-extracts the GitConfig. --debug uses changeGitConfig which
sets annexDebug in the GitConfig, which does not survive the changeGitRepo.
There might be a broader problem here, as changeGitRepo is also
called by setConfig in many parts of the code. I think it narrowly
escapes being a problem, because by the time a command is started,
it's already enabled debug output, and so the GitConfig being reloaded
doesn't disable debugging.
Other calls to changeGitConfig could also be a problem, if followed by
an adjustGitRepo which loses those changes. There are only a few others,
which look probably ok, but this would be an easy gotcha to hit later.
So changeGitConfig needs to make a config change that persists across
changeGitRepo.
Done.
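
The problem described above can be modelled with a minimal Python sketch. This is not git-annex's Haskell code, and every name in it (`Repo`, `extract_config`, `change_config`, and so on) is hypothetical; it only illustrates why a change made to the parsed config is lost when the config is re-extracted, and how recording the change avoids that.

```python
# Hypothetical model of the bug; these names do not exist in git-annex.

def extract_config(raw):
    # Parse the raw git config into the settings the program uses.
    return {"debug": raw.get("annex.debug") == "true"}


class Repo:
    def __init__(self, raw):
        self.raw = dict(raw)
        self.overrides = []  # in-memory changes that must persist
        self.config = extract_config(self.raw)

    def reload(self, raw):
        # Like changeGitRepo: the config is re-extracted from scratch,
        # then any recorded overrides are applied again.
        self.raw = dict(raw)
        self.config = extract_config(self.raw)
        for override in self.overrides:
            self.config = override(self.config)

    def change_config_lossy(self, override):
        # The buggy behaviour: the change lives only in the current config.
        self.config = override(self.config)

    def change_config(self, override):
        # The fix: remember the change so that it survives a reload.
        self.overrides.append(override)
        self.config = override(self.config)


def enable_debug(config):
    return {**config, "debug": True}


lossy = Repo({})
lossy.change_config_lossy(enable_debug)     # --debug
lossy.reload({"remote.target1.blah": "1"})  # -c remote.target1.blah=1
print(lossy.config["debug"])                # False: debugging was lost

fixed = Repo({})
fixed.change_config(enable_debug)
fixed.reload({"remote.target1.blah": "1"})
print(fixed.config["debug"])                # True
```

In this model the order of the two operations no longer matters once the override is recorded, which is the property the bug report asks for.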
"""]]

@ -0,0 +1,34 @@
### Please describe the problem.
### What steps will reproduce the problem?
I am plowing through making git-annex available within conda-forge "natively" for Windows. For now I just took the recently built installer, the one now available from [datasets.datalad.org](http://datasets.datalad.org/datalad/packages/windows/) and built on the datalad-extensions GitHub setup. I just extracted the git-annex components from the installer and placed them within the conda hierarchy (having installed the `posix` package with all the needed basic tools). Overall -- looks great, but:
[[!format sh """
prop_view_roundtrips: FAIL (0.30s)
*** Failed! Falsified (after 524 tests and 1 shrink):
"a"
MetaData (fromList [(MetaField "8",fromList [MetaValue (CurrentlySet False) "",MetaValue (CurrentlySet True) "\nD\EM",MetaValue (CurrentlySet True) "GO`!)",MetaValue (CurrentlySet False) "k\FS\CAN"]),(MetaField "dU",fromList [MetaValue (CurrentlySet True) "",MetaValue (CurrentlySet False) "\NUL44Vfm[\t",MetaValue (CurrentlySet True) "\nLMEgYc",MetaValue (CurrentlySet True) "\SO[",MetaValue (CurrentlySet True) "\FS\DC4\DLE\"3",MetaValue (CurrentlySet True) ";\f0&Wc\GS{^",MetaValue (CurrentlySet True) "D",MetaValue (CurrentlySet True) "c:"]),(MetaField "sV",fromList [MetaValue (CurrentlySet True) "",MetaValue (CurrentlySet False) "\STX8#w",MetaValue (CurrentlySet False) "\ny",MetaValue (CurrentlySet False) "\DC4qOq",MetaValue (CurrentlySet True) "\FSbqjq",MetaValue (CurrentlySet True) "T_bx%[lN",MetaValue (CurrentlySet True) "W0`",MetaValue (CurrentlySet True) "~ ueY"]),(MetaField "V",fromList [MetaValue (CurrentlySet False) "",MetaValue (CurrentlySet False) "\t\DC1~`\SOHv\DC1",MetaValue (CurrentlySet True) "\DLE3",MetaValue (CurrentlySet True) "/MZh$",MetaValue (CurrentlySet False) "0",MetaValue (CurrentlySet False) "MEulc",MetaValue (CurrentlySet True) "P5D",MetaValue (CurrentlySet True) "i|S,",MetaValue (CurrentlySet True) "x|C"])])
True
Use --quickcheck-replay=742853 to reproduce.
"""]]
Unfortunately I cannot tell from that output what could be the problem. Please let me know if it is hard to figure out and I should provide access to such an environment (ATM that needs effort, so I do not want to spend time on it unless there is "no other way").
And it seems it might be a flaky test -- I started another run, which is still running, but this test did not fail:
```
$ grep prop_view_roundtrips git-annex-test-miniconda*.log
git-annex-test-miniconda-2.log: prop_view_roundtrips: OK (2.51s)
git-annex-test-miniconda.log: prop_view_roundtrips: FAIL (0.30s)
```
Cheers,
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-10-26T16:03:40Z"
content="""
I can reproduce it in a windows VM running
`git-annex test --quickcheck-replay=742853`
These quickcheck tests test random input so not flaky exactly.
Does not happen with that seed on linux, so it probably involves something
encoding specific. An area where the windows port is known to have
extensive problems.
([[!commit 1b8026b2cbc8df0274082c5f08a8b4f8ca47c5c9]] was similar,
although that was MetaField and this appears to be MetaValue.)
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2020-10-26T17:57:12Z"
content="""
you say \"windows port\", I say \"windows as a whole\"; e.g. today's revelation to me (or just a comeback, if I ran into it before but forgot) [was the inability to have a file/directory named `con`...](https://github.com/datalad/datalad/issues/5097) - no bloody sense in how such a design decision happened and how it got dragged all the way into the flagship product of 2020.
"""]]

@ -0,0 +1,42 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-10-26T18:41:27Z"
content="""
Hmm, this uses viewedFiles, which generates filenames
based on the MetaValue. Note use of pathProduct, which uses
System.FilePath.combine.
So, generating random ascii (including escape sequences)
bytestrings, and passing them through decodeBS to generate FilePaths,
and then operating on those filepaths. What could possibly go wrong.
And aha! I made pathProduct use System.FilePath.Windows.combine
and was able to reproduce the test suite failure on Linux.
And aha again:

    MetaValue (CurrentlySet True) "c:"

Which of course breaks it on windows because it wanted to generate
something like "bar/c:/baz/a" but instead it gets "c:/bar/baz/a"
git-annex does replace '/' and '\' when generating these filenames.
Not as a security measure (when the view branch is checked out, git's
security checks apply same as any branch so it piggybacks on those),
but to let the user build a view and successfully check it out
when their metadata happens to include such stuff.
However, windows does have enough special filenames and gotchas
that it simply does not seem to make sense for git-annex to try to work
around them all in the view code. If a MetaValue happens to end with a
period, or is "nul", and so the generated filename is illegal on Windows,
it'll blow up at checkout time, and I am ok with that.
So I think it would make sense to also escape ':', but that's about as far
as this should go. *Especially* because the filenames it generates need to
roundtrip back to metadata cleanly, which is what this test case is
testing. While I can finesse individual characters, it would be quite hard
to make a filename w/o a trailing dot roundtrip back to one with it, for
example.
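
The effect of a "c:" component on path joining can be seen with a small Python sketch. It uses Python's `ntpath` and `posixpath` modules as stand-ins for System.FilePath.Windows.combine and the POSIX variant; that the two behave alike here is an assumption, and the component names are only illustrative.

```python
import ntpath
import posixpath

# Path components as they might come from metadata values.
parts = ["bar", "c:", "baz", "a"]

posix_path = posixpath.join(*parts)
windows_path = ntpath.join(*parts)

print(posix_path)    # bar/c:/baz/a
print(windows_path)  # the "bar" component is gone; the path starts with c:
```

With Windows rules the drive-like component discards everything joined before it, so the generated filename cannot be mapped back to the metadata it came from.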
"""]]

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2021-01-22T16:44:58Z"
content="""
Did it come back? I see
```
2021-01-22T04:32:25.5012547Z prop_view_roundtrips: FAIL (0.09s)
2021-01-22T04:32:25.5015902Z *** Failed! Falsified (after 218 tests):
2021-01-22T04:32:25.5016251Z AssociatedFile (Just \"rdmBBP\")
2021-01-22T04:32:25.5018130Z MetaData (fromList [(MetaField \"CkL\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet True) \"\SOH5:R9\EM\DC4\",MetaValue (CurrentlySet True) \"\STX\US\fL2\ACK|\\\r[$\",MetaValue (CurrentlySet False) \"\ETBRi\",MetaValue (CurrentlySet False) \"/\FS}\",MetaValue (CurrentlySet True) \"W\",MetaValue (CurrentlySet False) \"X=sQh\NAK^\",MetaValue (CurrentlySet False) \"l\SUB\a\"]),(MetaField \"jM\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet False) \"\FSSivk\",MetaValue (CurrentlySet True) \"J'<\SYN\STXGJP\"]),(MetaField \"V\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet True) \"\n\NUL\",MetaValue (CurrentlySet True) \"\r\",MetaValue (CurrentlySet False) \"+X\",MetaValue (CurrentlySet True) \"@aN\t~c\SIy\",MetaValue (CurrentlySet False) \"K>xq\",MetaValue (CurrentlySet True) \"a:\"]),(MetaField \"W\",fromList [MetaValue (CurrentlySet True) \"0\DC4qL\",MetaValue (CurrentlySet False) \"K\",MetaValue (CurrentlySet False) \"LD\DC3<M\",MetaValue (CurrentlySet False) \"a\v\",MetaValue (CurrentlySet True) \"dO\",MetaValue (CurrentlySet True) \"w\EOT\"])])
2021-01-22T04:32:25.5020545Z True
2021-01-22T04:32:25.5020894Z Use --quickcheck-replay=455629 to reproduce.
```
on https://github.com/datalad/git-annex/runs/1746587663?check_suite_focus=true with `8.20201129+git169-gaa07e68ed_x64`
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-01-22T16:59:55Z"
content="""
Not sure if this is really the same bug, though certainly related. These
quickcheck tests are fuzz tests, they can find numerous bugs, that's kind of
the point of them. In any case, posting to a closed bug report risks your
followup being lost and deprioritises it.
The problem this new failure shows is that toViewPath is failing to escape the
final character in the path in some cases. Which is not a windows-specific
bug at all really, it could also happen with a metadata value such as "foo/"
being set on linux. Fixed that bug.
Which shows the point of these quickcheck fuzz tests: To be able to catch
lots of different bugs with a single test case.
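
As an illustration of that point, here is a minimal Python sketch of a seeded roundtrip fuzz test. It is not the git-annex test suite or toViewPath: the percent-style escaping, the function names and the alphabet are all assumptions made for the example, and `buggy_escape` deliberately reproduces the kind of bug described above (the final character is left unescaped).

```python
import random

SPECIAL = "/\\:%"       # characters that must not appear raw in a filename
ALPHABET = "ab/\\:%."   # small alphabet so special characters come up often


def escape(value):
    # Hypothetical escaping: replace each special character with %XX.
    return "".join("%{:02X}".format(ord(c)) if c in SPECIAL else c
                   for c in value)


def buggy_escape(value):
    # Deliberately wrong: forgets to escape the final character.
    if not value:
        return value
    return escape(value[:-1]) + value[-1]


def unescape(name):
    out = []
    i = 0
    while i < len(name):
        if name[i] == "%":
            # Raises ValueError when the escape sequence is cut short.
            out.append(chr(int(name[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def roundtrips(esc, value):
    name = esc(value)
    if any(c in name for c in "/\\:"):
        return False
    try:
        return unescape(name) == value
    except ValueError:
        return False


def find_counterexample(esc, seed, tests=1000):
    # The seed makes a failure reproducible, like --quickcheck-replay.
    rng = random.Random(seed)
    for _ in range(tests):
        length = rng.randint(0, 6)
        value = "".join(rng.choice(ALPHABET) for _ in range(length))
        if not roundtrips(esc, value):
            return value
    return None


print(find_counterexample(escape, 455629))        # None
print(repr(find_counterexample(buggy_escape, 455629)))
```

A single property (the escaped name contains no separators and decodes back to the original value) is enough to expose both the "c:" style problem and the unescaped trailing character.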
"""]]

@ -0,0 +1,42 @@
### Please describe the problem.
Since cron build of 20210828
```
(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2021/08[master]git
$> git grep -l 'Unable to remove all write permissions'
cron-20210828/build-macos.yaml-403-69466103-failed/2_test-annex (crippled-tmp).txt
cron-20210828/build-macos.yaml-403-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
cron-20210829/build-macos.yaml-404-69466103-failed/2_test-annex (crippled-tmp).txt
cron-20210829/build-macos.yaml-404-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
cron-20210830/build-macos.yaml-405-69466103-failed/2_test-annex (crippled-tmp).txt
cron-20210830/build-macos.yaml-405-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
cron-20210831/build-macos.yaml-406-69466103-failed/2_test-annex (crippled-tmp).txt
cron-20210831/build-macos.yaml-406-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
```
we have been getting two test failures on a crippled FS on macOS (this does not happen on Linux AFAIK).
[example CI log](https://github.com/datalad/git-annex/runs/3468283573?check_suite_focus=true)
Both look like
```
2021-08-31T02:15:42.0758760Z magic: OK (2.41s)
2021-08-31T02:15:42.6972710Z import: FAIL (0.62s)
2021-08-31T02:15:42.6973680Z ./Test/Framework.hs:57:
2021-08-31T02:15:42.6974230Z import failed (transcript follows)
2021-08-31T02:15:42.6974760Z import import1/f
2021-08-31T02:15:42.6976570Z Unable to remove all write permissions from /Volumes/crippledfs/importtestvjfjz3/import1/f -- perhaps it has an xattr or ACL set.
2021-08-31T02:15:42.6977430Z failed
2021-08-31T02:15:42.6977830Z import: 1 failed
2021-08-31T02:15:44.1985050Z reinject: OK (1.50s)
```
[here is the script](https://github.com/datalad/git-annex/blob/master/.github/workflows/tools/setup_crippledfs#L24) to set up such a crippled (FAT32) FS on OSX.
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] (provisionally, waiting on test run) --[[Joey]]

@ -0,0 +1,34 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-09-01T13:48:57Z"
content="""
Seems that mounting that way on OSX results in a FS where files are always mode
777 and the permissions cannot be changed.
When I tried using git-annex on such a FS, I saw:

    datalads-imac:x joey$ git annex init
    init
    Detected a filesystem without fifo support.
    Disabling ssh connection caching.
    Filesystem allows writing to files whose write bit is not set.
    Detected a crippled filesystem.

And it skips the new permissions check when on a crippled filesystem.
But in that test run, it seems it is failing to detect a crippled
filesystem. Both because of the failure and also the test suite does
not even run the "v8 unlocked" tests when it detects a crippled filesystem.
Is the test suite running as root? Looks like probably yes. Running as
root prevents detecting the issue that made it use a crippled FS above. And it
seems that, when a FAT fs is mounted on OSX that way, symlinks actually work
(!!!) so the other crippled FS tests also don't notice a problem.
So, the fix should be for init to also test if it can remove the write
bits from a file, and it should try that test even when root.
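
A probe of that kind could look like the following Python sketch. This is only an illustration of the idea, not the code git-annex uses; `can_remove_write_bits` is a hypothetical name. It checks the resulting mode rather than attempting a write, so the answer is the same when running as root.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def can_remove_write_bits(directory):
    # Create a probe file in the directory being tested.
    fd, probe = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        mode = stat.S_IMODE(os.stat(probe).st_mode)
        os.chmod(probe, mode & ~WRITE_BITS)
        # Look at the mode afterwards instead of trying to write, since
        # root can write to a file regardless of its write bits.
        after = stat.S_IMODE(os.stat(probe).st_mode)
        return (after & WRITE_BITS) == 0
    except OSError:
        # Some filesystems refuse chmod outright.
        return False
    finally:
        try:
            # Restore the write bit so the probe can be deleted everywhere.
            os.chmod(probe, stat.S_IRUSR | stat.S_IWUSR)
        except OSError:
            pass
        os.remove(probe)


with tempfile.TemporaryDirectory() as scratch:
    print(can_remove_write_bits(scratch))
```

On a filesystem where files are always mode 777, either the chmod fails or the mode read back still has write bits set, and the probe reports False in both cases.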
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-09-01T16:22:22Z"
content="""
> Is the test suite running as root? Looks like probably yes.
FWIW, it is a `runner` user [ref](https://github.com/datalad/git-annex/pull/76/checks?check_run_id=3486443350#step:8:1) (checked in a temp [PR](https://github.com/datalad/git-annex/pull/76)) who is not `root` but is part of the `admin` group, and thus indeed has super privileges (I guess that is why we can also use `hdiutil` directly to mount that crippled FS).
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-09-02T16:22:56Z"
content="""
OSX test is still failing after that fix, reopened.
"""]]

@ -0,0 +1,27 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-09-02T16:26:14Z"
content="""
My prior analysis seems right, as far as it running as root would go, but it is
not running as root. So I missed something.
The test failures are both of `git-annex import`.
Otherwise locking down files does succeed. The difference with import
must be that the file is located in a directory outside the repository.
Aha... The test suite is being run with TEMPDIR set to the crippled FS,
but `.t` is in another, non-crippled FS. A very smart idea to test that,
although I think this import test is the only one that actually uses
TEMPDIR. (Reading the workflow file, I think it was maybe expected that
all the tests would run in TEMPDIR, but they don't; `git-annex test`
writes to `./.t`, other than this one test.)
When the import directory is on a crippled FS, and the repo
is not, it will think the FS is not crippled. Then it fails
to remove write perms from the file while it is in the import
directory, and the perm check then fails.
So, I think it should skip the perm check when doing the initial lockdown
of the file it's going to import.
"""]]

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-09-02T17:39:20Z"
content="""
Ok, fixed some more, hopefully all the way this time.
"""]]

@ -0,0 +1,66 @@
### Please describe the problem.
There was some recent work to ["centralize" such prompts](https://git-annex.branchable.com/devblog/day_457__improved_ssh_password_prompting/) but it seems some are still "leaking through" multiple times. Maybe it is because there are 2 available repos on that remote host, so annex generates one prompt for each of those? (Although it knows only about origin.)
### What version of git-annex are you using? On what operating system?
6.20170810+gitgff6f9e203-1~ndall+1
### Please provide any additional information below.
[[!format sh """
$> git annex get -J5 .
get R042/R042-2013-08-16/R042-2013-08-16-CSC01a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC02a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC03a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC05a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC04a.ncs (from datalad-archives...)
(from datalad-archives...) (from datalad-archives...)
(from datalad-archives...)
(from datalad-archives...)
[ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
| Unable to access these remotes: origin
|
| Try making some of these repositories available:
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
| failed
| err=git-annex: get: 1 failed
|
[ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
| Unable to access these remotes: origin
|
| Try making some of these repositories available:
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
| failed
| err=git-annex: get: 1 failed
|
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--0501aab6b4d1ce0565921728bc92ef74f81edf0d7bcd5a77946ca58f977f2537.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--8b3b08310db20ca7e3e784a21f935a78f8669efdf1396168596411f1e355e43b.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
(from origin...) (from origin...) [ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
| Unable to access these remotes: origin
|
| Try making some of these repositories available:
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
| failed
| err=git-annex: get: 1 failed
|
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--08ce5a67c7fc09f02b994a3987812a75727eaf51f3e70fa7e1030dae934f9fbc.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
(from origin...) [ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
| Unable to access these remotes: origin
|
| Try making some of these repositories available:
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
| failed
| err=git-annex: get: 1 failed
|
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--bc145f07c79584181cad3763a763a2ea047282bd41153d20a63d85a44fb27a7f.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
(from origin...) yhalchen@discovery.dartmouth.edu's password: yhalchen@discovery.dartmouth.edu's password:
"""]]
[[!meta author=yoh]]
[[!tag projects/datalad]]
> warning added; [[done]] --[[Joey]]

@ -0,0 +1,34 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 1"
date="2017-08-12T04:08:04Z"
content="""
ha -- if I do specify `--from=origin` -- only 1 prompt
[[!format sh \"\"\"
$> git annex get -J5 --from=origin .
get R042/R042-2013-08-16/R042-2013-08-16-CSC01a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC03a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC02a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC04a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC05a.ncs (from origin...) yhalchen@discovery.dartmouth.edu's password:
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
Unable to run git-annex-shell on remote .
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
Unable to run git-annex-shell on remote .
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
Unable to run git-annex-shell on remote .
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
Unable to run git-annex-shell on remote .
SHA256E-s17136940--08ce5a67c7fc09f02b994a3987812a75727eaf51f3e70fa7e1030dae934f9fbc.ncs
0 0% 0.00kB/s 0:00:00 SHA256E-s17136940--bc145f07c79584181cad3763a763a2ea047282bd41153d20a63d85a44fb27a7f.ncs
0 0% 0.00kB/s 0:00:00 SHA256E-s17136940--c3a8af948c77a2df422eae50807a6e7e6e5db7a3451a562bca529d3f1a1a234f.ncs
0 0% 0.00kB/s 0:00:00 git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
\"\"\"]]
"""]]

View file

@ -0,0 +1,25 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2017-08-15T16:43:07Z"
content="""
The improvements around ssh password prompting require ssh connection
caching to work. If a ssh connection fails because the wrong password is
entered or because there's no usable tty or whatever, there's no cached
ssh connection to reuse, so the next attempt to access that host will
result in another password prompt.
Also, datalad does not seem to be running git-annex with -J. So it *can't*
be trying to make two ssh connections at the same time. My recent work on
ssh password prompting was mostly to fix cases where git-annex is run with
-J.
It's also possible that some ssh configuration that I don't know of could
make ssh password prompt even when git-annex is running it with
`BatchMode=true` to avoid password prompts (in order to test if the ssh
connection is already up). That would then result in two ssh password
prompts, one after the other, which seems to match your transcript.
If you have only one remote, specifying `--from=origin` won't change
anything. Entering the right password would change something there though..
"""]]

View file

@ -0,0 +1,43 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="the issue persists"
date="2019-11-01T18:12:27Z"
content="""
Ran into the same problem again, and it is not clear to me whether connection caching is enabled or not (and why?):
[[!format sh \"\"\"
[d31548v@discovery7 bids]$ git -c annex.sshcaching=true annex --debug get -J2 --from=origin sub-sid000005
[2019-11-01 14:10:56.178577] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"git-annex\"]
[2019-11-01 14:10:56.475956] process done ExitSuccess
[2019-11-01 14:10:56.47622] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2019-11-01 14:10:56.836271] process done ExitSuccess
[2019-11-01 14:10:56.865928] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"log\",\"refs/heads/git-annex..8a694d5c54eb81b1e5c5446fa63bdcd13daa34b3\",\"--pretty=%H\",\"-n1\"]
[2019-11-01 14:10:57.229787] process done ExitSuccess
[2019-11-01 14:10:57.234655] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2019-11-01 14:10:57.23592] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
[2019-11-01 14:10:57.546203] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"symbolic-ref\",\"-q\",\"HEAD\"]
[2019-11-01 14:10:57.780246] process done ExitSuccess
[2019-11-01 14:10:57.780454] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"refs/heads/master\"]
[2019-11-01 14:10:58.097345] process done ExitSuccess
[2019-11-01 14:10:58.09754] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"ls-files\",\"--cached\",\"-z\",\"--\",\"sub-sid000005\"]
[2019-11-01 14:10:58.298181] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2019-11-01 14:10:58.29998] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
[2019-11-01 14:10:58.305022] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2019-11-01 14:10:58.306024] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
[2019-11-01 14:10:58.62005] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2019-11-01 14:10:58.621714] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
[2019-11-01 14:10:58.632596] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2019-11-01 14:10:58.6338] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
get sub-sid000005/ses-actions1/fmap/sub-sid000005_ses-actions1_acq-25mm_magnitude2.nii.gz get sub-sid000005/ses-actions1/fmap/sub-sid000005_ses-actions1_acq-25mm_magnitude1.nii.gz (from origin...) (from origin...)
[2019-11-01 14:10:59.489719] chat: ssh [\"yohtest@rolando.cns.dartmouth.edu\",\"-T\",\"git-annex-shell 'p2pstdio' '/inbox/BIDS/Haxby/Sam/1021_actions' '--debug' 'fd3f7af9-cf7d-4d7e-8efd-30e6bedf838d' --uuid d839134c-3afe-4456-920a-e280ce0fdf2a\"]
[2019-11-01 14:10:59.553029] chat: ssh [\"yohtest@rolando.cns.dartmouth.edu\",\"-T\",\"git-annex-shell 'p2pstdio' '/inbox/BIDS/Haxby/Sam/1021_actions' '--debug' 'fd3f7af9-cf7d-4d7e-8efd-30e6bedf838d' --uuid d839134c-3afe-4456-920a-e280ce0fdf2a\"]
yohtest@rolando.cns.dartmouth.edu's password: yohtest@rolando.cns.dartmouth.edu's password:
[d31548v@discovery7 bids]$ git annex version
git-annex version: 7.20191024-g6dc2272
\"\"\"]]
Could you give me a hint on what/where to dig?
"""]]

View file

@ -0,0 +1,21 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-01-23T15:51:46Z"
content="""
I notice that the debug output has no BatchMode=true in any ssh call. But
the version of git-annex you show always runs ssh with that when
-J is used, unless sshcaching is disabled.
More evidence that sshcaching is disabled in your transcript is that when
it does run ssh, it does not pass -S.
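A quick way to make the same observation on a saved `--debug` log is sketched below. The helper name `ssh_calls_without_socket` is made up for illustration, and it assumes the log has the raw form shown in the transcript (lines containing `: ssh [` with plainly double-quoted arguments):

```sh
# Hypothetical helper: print the ssh invocations in a saved
# `git annex --debug` log that do not pass a control socket via -S.
ssh_calls_without_socket () {
    # Keep only the lines that record an ssh invocation ...
    grep -F -e ': ssh [' -- "$1" |
        # ... and drop those that include "-S" among the arguments.
        grep -v -F -e '"-S"' || true
}

# Usage (debug.log is an assumed file name):
#   git annex --debug get -J2 --from=origin sub-sid000005 2>debug.log
#   ssh_calls_without_socket debug.log
```

If every ssh call shows up in the output, that matches the reading that connection caching is not in effect.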
I think the repository must be on a crippled filesystem, on which
git-annex can't do ssh connection caching, because the filesystem
does not support unix sockets. (Or it potentially could be crippled in some
other way.) So it ignores the annex.sshcaching setting.
You could work around this by setting the (undocumented)
GIT_ANNEX_TMP_DIR to some temporary directory on a non-crippled filesystem.
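A minimal sketch of that workaround, assuming `$TMPDIR` (or `/tmp`) is on a local filesystem that supports unix sockets. The directory name pattern is only illustrative, and the commented `git annex get` line is the command from the transcript:

```sh
# Assumption: ${TMPDIR:-/tmp} is on a local, non-crippled filesystem.
GIT_ANNEX_TMP_DIR=$(mktemp -d "${TMPDIR:-/tmp}/git-annex-tmp-XXXXXX")
export GIT_ANNEX_TMP_DIR

# Re-run the failing command with the variable in the environment, e.g.:
#   git annex get -J2 --from=origin sub-sid000005
```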
I'm going to add a warning message in this situation.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2020-01-23T17:51:58Z"
content="""
Thank you Joey! I can only confirm that the file system was likely a crippled/NFS one... So we would likely need to do some sensing on DataLad side and instruct git-annex. Will continue on our end at https://github.com/datalad/datalad/issues/4075
"""]]

View file

@ -0,0 +1,79 @@
In datalad test builds with git-annex 7.20191114+git43-ge29663773, one
of the new test failures is due to an unexpectedly dirty repository
([related datalad issue][0]). The dirty status comes from a file that
was tracked in Git switching over to an annex pointer file. Here's a
script that distills enough of the test to trigger the failure on my
end.
[[!format sh """
#!/bin/sh
set -eu
assert_clean () {
    if test -n "$(git status --porcelain)"
    then
        printf "\n\nUnexpectedly dirty:\n" >&2
        git status >&2
        git diff >&2
        exit 1
    fi
}
cd "$(mktemp -d --tmpdir gx-pointer-dirty-XXXXXXX)"
git init && git annex init
printf content-git >file-git
git -c annex.largefiles=nothing annex add -- file-git
git commit -m'file-git added'
assert_clean
printf content-annex >file-annex
git -c annex.largefiles=anything annex add -- file-annex
git commit -m'file-annex annexed'
assert_clean
"""]]
On Travis as well as my local machine, the failure is intermittent,
but seems to happen much more often than not. In the failing case,
the last assert_clean call shows:
```
Unexpectedly dirty:
On branch master
Changes not staged for commit:
modified: file-git
no changes added to commit
diff --git a/file-git b/file-git
index d1c416a..b41ca32 100644
--- a/file-git
+++ b/file-git
@@ -1 +1 @@
-content-git
\ No newline at end of file
+/annex/objects/SHA256E-s11--726732d25826965592478fcc7c145d5a10fa1aa70c49fe3a4f847174b6d8889c
```
I see the failure with git-annex built from the latest master
b962471c2 (2019-12-12). Bisecting against the git-annex repo (with a
commit being marked "bad" if there was a failure within ten runs of the
above script), points to ec08b66bd (shouldAnnex: check isInodeKnown,
2019-10-23) as the first bad commit. Just looking at the topic of
the commit, that result seems plausible to me.
### Other details
My git version is 2.24.1, and locally I'm building git-annex through guix.
On the failing Travis run, git-annex 7.20191114+git43-ge29663773 came
from neurodebian, and the git version was 2.24.0.
Hopefully the script above is sufficient to trigger the issue on your end.
Thanks for having a look.
[0]: https://github.com/datalad/datalad/issues/3890
[[!meta author=kyle]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,97 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-12-26T16:56:38Z"
content="""
The title makes it sound like a work tree file gets replaced with a
dangling pointer file, which is not the case. A worktree file that was
not annexed is being added to the annex, if you choose to commit that
state.
For whatever reason, git becomes confused about whether this file is
modified. I seem to recall that git distrusts information it recorded in
its own index if the mtime of the index file is too close to the
mtime recorded inside it, or something like that. (Likely as a
workaround for mtime granularity issues with various filesystems.) Whatever
the reason, git-annex is not involved in it; it will happen sometimes even
when git-annex has not initialized the repo and is not being used.
It's not normally a problem that git gets confused or distrusts its
index or whatever, since all it does is stat the file, or
feed it through the clean filter again, and if the file is not
modified, nothing changes.
Why does the clean filter decide to add the file to annex in this case?
Well, because this is all happening inside this:
git -c annex.largefiles=anything annex add -- file-annex
And there you've told it to add all files to the annex with
annex.largefiles=anything. So it does.
To complete the description of what happens:
`git-annex add` runs `git add` on the `file-annex` symlink it's adding.
`git add file-annex`, for whatever reason, decides to run the clean filter on
file-git.
The annex.largefiles=anything gets inherited through this chain of calls.
While the resulting "change" does not get staged by `git add`
(it was never asked to operate on that file), the clean filter
duly ingests the content into the annex, and remembers its inode.
So when the clean filter later gets run by `git status`, it sees an inode
it knows it saw before, and assumes it should remain annexed.
(This is why the commit that checks for known inodes was fingered by the
bisection.)
---
Note that you can accomplish the same thing without setting
annex.largefiles, assuming a current version of git-annex:
git add file-git
git annex add file-annex
I think the only reason for setting annex.largefiles in either of the two
places you did is if there's a default value that you want to
temporarily override?
----
Also, just touching file-git before the annex.largefiles=anything
operation causes the same problem: again git-annex add runs git add
file-annex, which runs the clean filter on file-git, which this time
is legitimately modified.
---
Possible ways to improve this short of improving git's behavior:
`git annex` could set annex.gitaddtoannex=false when it runs `git add`.
Since git-annex never relies on `git add` adding files to the annex,
that seems entirely safe to always do (perhaps even when running all git
commands aside from git-annex commands of course). But, that would
not help with a variant where rather than `git-annex add`,
this is run:
git -c annex.largefiles=anything add file-annex
The clean filter could delay long enough that git stops distrusting
its index based on timestamps. A 1 second sleep if the file's mtime
is too close to the current time works; I prototyped a patch doing that.
But, that does not deal with the case
mentioned above where file-git gets touched or legitimately modified.
The clean filter could check if the file is already
in the index but is not annexed, and avoid converting it to annexed.
But that would prevent legitimate conversions from git to annexed
as well, which rely on the same kind of use of annex.largefiles.
Temporary overrides of annex.largefiles could be ignored by the clean
filter. Same problem as previous.
So, I think that fixing this will involve adding a new interface for
converting between git and annexed files that does not involve
-c annex.largefiles. That plus having the clean filter check for
non-annexed files seems like the best approach.
"""]]

View file

@ -0,0 +1,42 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-12-27T06:22:23Z"
content="""
On second thought, making the clean filter check for non-annexed files
would prevent use cases like annex.largefiles=largerthan(100kb)
from working as the user intended and letting a small file start out
non-annexed and get annexed once it gets too large. Users certainly rely on
that and this bug that only affects an edge case does not justify breaking
that.
What would work is to make the clean filter detect when a file's content
has not changed, though its mtime (or inode) has changed. In that case,
it's reasonable for the clean filter to ignore annex.largefiles and keep
the content represented in git however it already was (non-annexed or
annexed).
To detect that, in the case where the file in the index is not annexed:
First check if the file size is the same as the
size in the index. If it is, run git hash-object on the file, and see if
the sha1 is the same as in the index. This avoids hashing any unusually
large files, so the clean filter only gets a bit slower.
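A rough shell sketch of that check for the non-annexed case. The helper name `unchanged_in_index` is made up for illustration, and this is not the clean filter's implementation; it only shows the size comparison followed by `git hash-object`. `--no-filters` is used so that the raw worktree content is hashed rather than fed through the clean filter again.

```sh
# Hypothetical helper: succeed only if the worktree file's content is
# identical to the blob that the index records for it.
unchanged_in_index () {
    f=$1
    # Blob id recorded in the index (empty if the file is not tracked).
    index_sha=$(git ls-files -s -- "$f" | awk '{print $2}')
    test -n "$index_sha" || return 1
    # Cheap check first: compare sizes before hashing anything.
    index_size=$(git cat-file -s "$index_sha")
    file_size=$(wc -c <"$f" | tr -d ' ')
    test "$index_size" = "$file_size" || return 1
    # Sizes match, so hash the raw content and compare with the index.
    test "$(git hash-object --no-filters -- "$f")" = "$index_sha"
}

# Small demonstration in a throwaway repository.
cd "$(mktemp -d)"
git init -q
printf content-git >file-git
git add file-git
touch file-git                      # mtime changes, content does not
unchanged_in_index file-git && echo "file-git: content unchanged"
printf changed >file-git            # now the content really differs
unchanged_in_index file-git || echo "file-git: content changed"
```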
And when the file in the index is annexed, check if the file size is the
same as the size of the annexed key. If it is, verify if the file content
matches the key (typically by hashing). Cases where keys lack size or
don't use a checksum could lead to false positives or negatives though.
Although, I've not managed to find a version of this bug that makes an
annexed file get converted to git unintentionally, so maybe this part does
not need to be done?
----
Or.. Since the root of the problem is temporarily overriding annex.largefiles,
it could just be documented that it's not a good idea to use
-c annex.largefiles=anything/nothing, because such broad overrides
can affect other files than the ones you intended.
(And since the documented methods of converting files from annexed to git and
git to annexed use such overrides, that documentation would need to be
changed.)
"""]]

View file

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-12-27T17:11:42Z"
content="""
A variant of this where an annexed unlocked file is added first,
then the file is touched, and then some other file is added
with -c annex.largefiles=nothing does result in the clean filter sending
the whole annexed file content back to git, rather than keeping it annexed.
For whatever reason, git does not store that content in .git/objects or
update the index for that file though, so it doesn't show up as a change.
So *apparently* that variant is only potentially an expensive cat of a
large annexed file, and does not need to be dealt with. Unless git
sometimes behaves otherwise.
"""]]

View file

@ -0,0 +1,45 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2019-12-27T18:41:12Z"
content="""
It's almost possible to get the same unwanted conversion without any git
races:
    echo content-git > file-git
    sleep 2
    git add file-git
    git commit -m add
    echo foo > file-git
    echo content-annex > file-annex
    git -c annex.largefiles=anything annex add file-annex
In this case, git currently does not run the modified file-git through the
clean filter in the last line, so the annex.largefiles=anything doesn't
affect it.
But, as far as I can see, there's nothing preventing a future version
of git from deciding it does want to run file-git through the clean filter
in this case.
I am not going to try to guard against such a thing happening.
As far as I can see, anything that the clean filter can possibly do to
avoid such a situation will cripple existing use cases of
annex.largefiles, like largerthan() as mentioned above.
The user has told git-annex to annex "anything", and if git
decides to run the clean filter while that is in effect, caveat emptor.
Which is not to say I'm not going to fix the specific case this bug was
filed about. I actually have a fix developed now. But just to say that
setting annex.largefiles=anything/nothing temporarily is a blunt instrument,
and you risk accidental conversion when using it, and so it would be a good
idea to not do that.
One idea: Make `git-annex add --annex` and `git-annex add --git`
add a specific file to annex or git, bypassing annex.largefiles and all
other configuration and state. This could also be used to easily switch
a file from one storage to the other. I'd hope the existence of that
would prevent one-off setting of annex.largefiles=anything/nothing.
[[todo/git_annex_add_option_to_control_to_where]]
"""]]

View file

@ -0,0 +1,58 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 5"
date="2019-12-28T21:06:46Z"
content="""
Thanks for the explanation and the fix.
> For whatever reason, git becomes confused about whether this file is
> modified. I seem to recall that git distrusts information it recorded in
> its own index if the mtime of the index file is too close to the
> mtime recorded inside it, or something like that.
I see. I think the problem and associated workaround you're referring
to is described in git's Documentation/technical/racy-git.txt.
> Note that, you can accomplish the same thing without setting
> annex.largefiles, assuming a current version of git-annex:
>
> git add file-git
> git annex add file-annex
>
> I think the only reason for setting annex.largefiles in either of the two
> places you did is if there's a default value that you want to
> temporarily override?
Right. DataLad's methods that are responsible for calling out to `git
annex add` have a `git={None,False,True}` parameter. By default
(`None`), DataLad just calls `git annex add ...` and lets any
configuration in the repo control whether the file goes to git or is
annexed. But with `git=True` or `git=False`, the `annex add` call
includes a `-c annex.largefiles=` argument with a value of `nothing`
or `anything`, respectively.
> But just to say that setting annex.largefiles=anything/nothing
> temporarily is a blunt instrument, and you risk accidental
> conversion when using it, and so it would be a good idea to not do
> that.
Noted. As mentioned above, DataLad's default behavior is to honor the
repo's `annex.largefiles` configuration. And the documentation for
`datalad save`, DataLad's main user-facing entry point for `annex
add`, recommends that the user configure .gitattributes rather than
using the option that leads to calling `annex add` with `-c
annex.largefiles=nothing`.
> One idea: Make `git-annex add --annex` and `git-annex add --git`
> add a specific file to annex or git, bypassing annex.largefiles and all
> other configuration and state. This could also be used to easily switch
> a file from one storage to the other. I'd hope the existence of that
> would prevent one-off setting of annex.largefiles=anything/nothing.
As far as I can see, those flags would completely cover DataLad's
one-off setting of `annex.largefiles=anything/nothing`. They map
directly to DataLad's `git=False/True` option described above. So,
from DataLad's perspective, they'd be very useful and welcome.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2020-01-01T17:41:13Z"
content="""
I've added git-annex add --force-large and --force-small, which would be
good to use to avoid this kind of too-broad overriding problem in the future.
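For a caller with a three-way switch like the `git={None,False,True}` parameter described in the earlier comment, the mapping onto these options might look roughly like this. The function name `annex_add_flag` is hypothetical and only illustrates the idea; it is not DataLad's code.

```sh
# Hypothetical helper: translate a git=True/False/None style setting
# into the matching `git annex add` option.
annex_add_flag () {
    case "$1" in
        True)  echo "--force-small" ;;  # always add to git
        False) echo "--force-large" ;;  # always add to the annex
        None)  echo "" ;;               # let annex.largefiles decide
        *)     return 1 ;;
    esac
}

# Usage, replacing the -c annex.largefiles overrides from the report:
#   git annex add $(annex_add_flag True)  -- file-git
#   git annex add $(annex_add_flag False) -- file-annex
```

Because the option applies only to the files named on the command line, the clean filter is not told to treat every other file as large or small while the add runs.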
"""]]
