move old fixed datalad/dandi/repronim bugs to the project pages

This is to cut down on the number of files in bugs/, which makes it slow
to file new bug reports or update active bug reports. These old bugs
were about 1/3rd of the files in there. These projects want lists of
their old bugs to still be accessible, and have the lists on their
project pages, which will still list the old bugs.

Commands used:

for f in $(git grep -l '\[\[!tag projects/dandi\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/dandi/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/dandi/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/repronim\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/repronim/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/repronim/bugs-done; fi; fi; done
for f in $(git grep -l '\[\[!tag projects/datalad\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/datalad/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/datalad/bugs-done; fi; fi; done

That assumes that bugs are not tagged by multiple projects at the same
time. Of the ones I moved, I've checked and none are.
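
A check like that could be sketched as follows. This is only an illustration, not the command that was actually run for this commit; the function name `multi_project_bugs` is made up here, and it assumes that file names contain no ":" and that each project tag sits in its own `[[!tag projects/NAME]]` directive.

```sh
# List files that carry more than one distinct [[!tag projects/NAME]] directive.
# Hypothetical helper; assumes no ":" in file names.
multi_project_bugs() {
    # -H prefixes each match with its file name, -o prints only the match.
    grep -H -o '\[\[!tag projects/[a-z]*]]' "$@" |
        sort -u |       # one line per distinct (file, tag) pair
        cut -d: -f1 |   # keep only the file name
        uniq -d         # print names that occur more than once
}
```

Run in bugs/ as `multi_project_bugs *.mdwn`; empty output would mean no bug is tagged by two projects.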

Could do the same with todo/ but there are only 370 files in there, and
less than 84 of them could be moved this way, which does not seem likely
to produce a sizeable speedup.

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2023-01-05 13:16:15 -04:00
parent 946fc20165
commit bcc69f07e8
1011 changed files with 4 additions and 4 deletions

@ -0,0 +1,49 @@
### Please describe the problem.
The original complaints can be found in the comments of the [importfeed page](https://git-annex.branchable.com/git-annex-importfeed/): when using `addurl`, even when the server provides a Content-Disposition field with the filename, git-annex seems to take that filename value and obfuscate it (replacing '-' with '_' etc.) instead of using the original filename. (BTW, no Content-Disposition was mentioned in the --debug output.)
[[!format sh """
$> mkdir /tmp/testrepo; cd /tmp/testrepo; git init; git annex init;
mkdir: cannot create directory /tmp/testrepo: File exists
E: could not determine git repository root
Initialized empty Git repository in /tmp/testrepo/.git/
init ok
(recording state in git...)
$> git annex addurl --fast https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
addurl https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download (to sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb) ok
(recording state in git...)
$> ls -l
total 4
lrwxrwxrwx 1 yoh yoh 184 May 7 17:02 sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb -> .git/annex/objects/Gj/9z/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923
"""]]
This happens whenever the original Content-Disposition has "-" in the filename, which AFAIK is a perfectly safe character in a filename:
[[!format sh """
$> wget -S https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
... bunch of forwards to the final one with the content disposition field
Resolving dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)... 52.219.101.51
Connecting to dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)|52.219.101.51|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
x-amz-id-2: VgJE1jV5XUkBQXZDWgR5WEDfmHJp4Fj6fGo6z2tYkLfyTsxDWC+m92B2qOSVppCuiRFu2QpNV5M=
x-amz-request-id: 1221CAC30E3931CF
Date: Thu, 07 May 2020 21:02:52 GMT
Last-Modified: Wed, 22 Apr 2020 00:54:32 GMT
ETag: "acf3b4f5951435245a0efcd4a518e77d"
Content-Disposition: attachment; filename="sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2.nwb"
...
$> git annex version
git-annex version: 7.20190708+git9-gfa3524b95-1~ndall+1
"""]]
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]] --[[Joey]]

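The relation between the header and the name git-annex chose can be checked with a small sketch. The helper name `content_disposition_filename` is an assumption made for this example, and it only handles the simple quoted `filename="..."` form shown in the wget output above, not every form the header can take.

```sh
# Extract the quoted filename from a Content-Disposition header line.
# Hypothetical helper; only understands filename="...".
content_disposition_filename() {
    printf '%s\n' "$1" | sed -n 's/.*filename="\([^"]*\)".*/\1/p'
}

header='Content-Disposition: attachment; filename="sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2.nwb"'

# Name as sent by the server.
content_disposition_filename "$header"
# Replacing every "-" with "_" gives the name seen in the addurl output above.
content_disposition_filename "$header" | sed 's/-/_/g'
```
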
@ -0,0 +1,42 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-05-08T16:50:14Z"
content="""
This is due to the filename being passed through sanitizeFilePath.
There are security concerns here. If the filename contains "../"
it absolutely has to be modified, or the command would have to fail and
refuse to import it.
If the filename contains an ANSI escape sequence, it could potentially
lead to a security hole. Or if the filename starts with "-" it could be
somewhere between a possible security hole and just very annoying to work
with. As could a filename that contains a newline, which will
break large quantities of shell pipelines. While generally git repos can
have these problems with files in them too, the exposure seems larger when
talking to some random web server than when pulling from a repo.
Also, cross filesystem compatibility is a concern. It used to allow "|" in
the filename, but a bug report pointed out that it cannot be used on FAT filesystems.
And "\\" means different things on linux and windows, so probably best to avoid
filenames containing it on linux too.
Finally, it's somewhat opinionated, since it replaces spaces with
underscores. That's certainly the least defensible thing.
(git-annex may also truncate the filename if it's longer than what the
filesystem supports.)
So, it's clearly wrong that it should be taken as-is without obfuscation,
IMHO. Maybe there's a way to improve it to meet some use case though.
I could see having a config that avoids sanitizing the filename, but
makes addurl fail if the filename looks like a security problem.
Though that has the downside that git-annex would then need to
comprehensively track, going forward, all the ways that people find to make
filenames be a security problem; the current method, by being strict in
what it lets through, probably limits exploits to ones involving a) unicode
or b) the user's wetware.
"""]]

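The "fail instead of sanitizing" idea from the comment above could look roughly like this. It is a sketch under assumptions, not git-annex's code: the function name `filename_is_risky` and the exact set of checks are invented for illustration and cover only the cases named in the comment (path traversal, leading "-", control characters such as newline or the ESC that starts an ANSI sequence).

```sh
# Return 0 (true) when a server-supplied filename looks dangerous.
# Hypothetical check; the list of rules is illustrative, not complete.
filename_is_risky() {
    case $1 in
        '')            return 0 ;;  # empty name
        -*)            return 0 ;;  # could be mistaken for an option
        */*)           return 0 ;;  # path separator, which covers "../"
        *[[:cntrl:]]*) return 0 ;;  # newline, ESC and other control characters
        *)             return 1 ;;
    esac
}
```

A caller would refuse the download when `filename_is_risky "$name"` succeeds and use the name unchanged otherwise. As the comment notes, such a deny-list has to keep up with every newly discovered trick, which is the downside compared with strict sanitization.
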
@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-05-08T18:19:20Z"
content="""
`git-annex import` does not do any sanitization, and that could be
considered inconsistent, particularly when importing from a remote like S3.
A difference with that is, it creates a remote tracking branch for the
imported files. (That happens to avoid "../" path traversal because git
generally avoids it.) Maybe the real difference is, import from a special
remote is completely analogous to fetching from a git remote. So it feels
different to me than adding an url does.
If I sync with an S3 bucket and it turns out it imported an escape sequence
file, well I could have looked at the bucket first, or imported and
reviewed the branch before merging it. And if I was syncing with a git
remote the same thing could happen. So it feels like I should have no
expectation git-annex would protect me. Whereas, if I add an url and the
web server uses an obscure-ish http header to surprise me with a similar
malicious filename, I had no way beforehand to know that would happen, and
so it does feel like git-annex should protect me.
(Although if git did prevent that, git-annex should too, and I'd be
fine with git preventing that.)
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-05-08T19:56:27Z"
content="""
Implemented git-annex addurl --preserve-filename, which will do what you
want.
Leaving this bug open because I only implemented it for web urls, not yet
for torrents and other special remotes that have their own url scheme.
The sanitization for those is currently done at a lower level than addurl,
and so that will take a bit more work to implement.
(importfeed does not, I think, need to implement this option, because
the filenames are based on information from the rss feed, and it's
perfectly fine to sanitize eg a podcast episode title to get a reasonable
filename.)
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2020-05-09T22:10:43Z"
content="""
> If the filename contains an ANSI escape sequence, it could potentially lead to a security hole.
> ... As could a filename that contains a newline, which will break large quantities of shell pipelines.
IMHO those indeed are ok to target for sanitization
> Or if the filename starts with \"-\" it could be somewhere between a possible security hole and just very annoying to work with.
So why not sanitize it only at the beginning of the filename?
`-` is a very common and safe character to use within a filename. For that matter, we VERY frequently use `-` in filenames. It even became part of our BIDS standard in neuroimaging (https://bids-specification.readthedocs.io), where it separates `_key` from `value`. I really do not see why git-annex should sanitize filenames so aggressively as to replace \"-\" within them: it makes nothing more secure or convenient.
> While generally git repos can have these problems with files in them too, the exposure seems larger when talking to some random web server than when pulling from a repo.
Well, not sure about ansi characters and new line symbols, but typically files are saved by the browsers with the name suggested by the server.
"""]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-05-11T17:20:07Z"
content="""
I agree that it may as well allow non-leading '-'. But, if you are relying
on getting the unsanitized filename generally, you should use
--preserve-filename.
Web browsers do do some sanitization, particularly of '/'.
Chrome removes leading "." as well. Often files are downloaded
without the user confirming it. I suspect there is enough insecurity
in that area that someone could make a living injecting bitcoin miners into
dotfiles.
"""]]

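A sanitizer that keeps non-leading '-', as agreed above, might be sketched like this. It is not git-annex's `sanitizeFilePath`; the function name and the allow-list of characters are assumptions chosen for the example, and it works on bytes, so non-ASCII characters are replaced as well.

```sh
# Replace everything outside a conservative allow-list with "_", and also
# replace a leading "-" so the result cannot be taken for an option.
# Hypothetical sketch, not the actual git-annex implementation.
sanitize_filename() {
    local cleaned
    cleaned=$(printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_')
    case $cleaned in
        -*) cleaned="_${cleaned#-}" ;;
    esac
    printf '%s\n' "$cleaned"
}
```

With this, a BIDS-style name such as `sub-01_ses-02.nwb` passes through unchanged, while `-rf` becomes `_rf` and `../etc/passwd` becomes `.._etc_passwd`.
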
@ -0,0 +1,6 @@
While running `git-annex addurl --batch --with-files --jobs 10 --json --json-error-messages --json-progress --raw`, I occasionally run into files that fail to download for no discernable reason, and the `"error-messages"` key in the output from the command is an empty list. This makes it hard to figure out exactly why the download is failing.
[[!meta author=jwodder]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-10-27T16:23:52Z"
content="""
Is it reproducible with a particular url? Does it only happen with -J?
Version would also be good to know. There were recent relevant
changes eg [[!commit 4f42292b13dc5a6664eeb19b5c9d48991eaef292]].
I've spent a while hunting for a code path where it fails without
displaying a warning, and have not found one. Since the code in addurl
is structured as return Nothing and hopefully display a warning
beforehand, rather than as throw an error, it's certainly possible that
happens.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="jwodder"
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
subject="comment 2"
date="2021-10-27T18:16:43Z"
content="""
It appears that the problem occurs whenever one tries to download the same URL to two different paths at the same time. When this occurs, one of the downloads fails, and though its \"error-messages\" is empty, its \"notes\" field reads, \"transfer already in progress, or unable to take transfer lock\".
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="jwodder"
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
subject="comment 3"
date="2021-10-27T18:19:23Z"
content="""
As to your questions, I am using git-annex 8.20211011 on macOS 11.6. The problem does not occur when the `--jobs` option is omitted, but that's not viable for the current project we're using git-annex for.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-10-27T18:40:48Z"
content="""
Aha, that makes sense! addurl constructs a url-based Key to use while
downloading, and the key transfer machinery prevents redundant downloads
of the same Key at the same time.
Arguably, the problem is not where the message gets put, but that
it fails when adding an url to two different paths at the same time.
I have, though, moved that message so it will appear in error-messages.
"""]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-10-27T18:56:23Z"
content="""
The best solution I can find is for it to notice when another thread is
downloading the same url, and wait until it finishes. Then proceed
with downloading the url for a second time.
It's not very satisfying to re-download. But once the url Key is downloaded,
it does not keep that url Key populated, but hashes the content and moves
the content to the final Key. It would be a real complication to
communicate, across threads, what Key the content ended up at, and have the
waiting thread use that. And addurl is already complicated well beyond a
point I am comfortable with.
Also, the content of an url can of course change over time. If I feed
"$url foo" into git-annex addurl --batch -J10 and then some time
later, I feed "$url bar", I might expect that file bar gets whatever
content the url has now, not the content that the url had back when I added
the same url to file foo. And if I cared about avoiding re-downloading,
I could add the url to the first file, and then copy the annex link to the
second file myself.
Implemented this approach.
"""]]

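For callers who do want to avoid the second download, the manual route mentioned above (add the url once, then copy the annex link) can be planned before the batch is fed to addurl. The following is only a sketch: `plan_addurl_batch` is a made-up name, it assumes "URL FILE" input lines without whitespace in either field, and it ignores the point made above that the content behind an url can change over time.

```sh
# Read "URL FILE" lines on stdin. The first file seen for each URL is
# emitted as "add URL FILE"; later files for the same URL are emitted as
# "copy FIRSTFILE FILE" so the caller can duplicate the annex link instead.
# Hypothetical helper for illustration.
plan_addurl_batch() {
    awk '{
        url = $1; file = $2
        if (url in first) {
            print "copy", first[url], file
        } else {
            first[url] = file
            print "add", url, file
        }
    }'
}
```

Only the "add" lines would then be passed to `git-annex addurl --batch`; how the link is copied for the "copy" lines is left to the caller.
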
@ -0,0 +1,21 @@
### Please describe the problem.
This is a continuation to the [prior report/discussion](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-424548e59fc41618ffeeb65f418694b3) to facilitate access to private repositories on public hosting portals.
Setting aside the more odd/custom behavior of gitlab etc. installations, which forward to a login screen (thus no 401 or 404 response) upon an attempt to access something that might be within a private repo, the situation with github and gogs (a github clone), which powers gin (which I had [mentioned](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-ec2193d97bb19945ad74cee13f747b35) in that prior discussion), is different: they return a 404 response. And I think (I didn't check the git code, this is based just on its behavior) `git` then asks for credentials as the "next way to try". I think git-annex should do the same: if a 404 is received, ask `git credential` to fill in credentials for that domain (as it does now in the case of 401).
### What steps will reproduce the problem?
Try to clone and get data from a private repository on [https://gin.g-node.org/](https://gin.g-node.org/) (repo could be created, or let me know and I would create one, but you would still need to register there). I am not yet 100% certain that upon authentication you would be able to fetch that `/config` (haven't tried). Satellite issue/discussion I just initiated on gin is [here](https://github.com/G-Node/gogs/issues/111)
### What version of git-annex are you using? On what operating system?
8.20201127+git54-ga1b227171-1~ndall+1
edit 1: although probably a deeper look into how/why git decides to ask for credentials for private repos might be due. Maybe a similar check should be done by git-annex first, since otherwise there might be no way to tell this apart from a "proper" 404 for inability to get `/config` from github
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[notabug|done]] --[[Joey]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-01-21T16:57:06Z"
content="""
The git source code does not appear to behave
like that, see http.c `normalize_curl_result`, which reauths on 401, but
not on 404. If you think git behaves like this, you need to show an example
where it clearly accesses an url that is 404 and goes on to authenticate.
Seems to me that these hosting sites may simply not be exposing foo.git/config
to http. Git does not request that file over http. Such a hosting site would
probably also not expose foo.git/annex/ over http, so git-annex would not be
able to use it anyway. To support git-annex, it would need to
expose both, and then git-annex's handling of 401 should work fine for
authentication.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-01-21T18:36:50Z"
content="""
a quick one: https://gin.g-node.org/ does expose `foo.git/annex/`; that is what gin has extended the original gogs with. An example repo to try is https://gin.g-node.org/ljchang/Sherlock . The problem/difficulty is only in access to \"private\" repositories; access to config and annexed files works fine through http
"""]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-01-21T19:20:00Z"
content="""
It still seems easy to demonstrate that git does not ask for creds on 404:
joey@darkstar:~> git clone http://google.com/this-url-does-not-exist
Cloning into 'this-url-does-not-exist'...
fatal: repository 'http://google.com/this-url-does-not-exist/' not found
So I need you to show me what makes you think that git does such a strange
thing, before I can take seriously a request to replicate that behavior in
git-annex. Because the only possible reason I would implement such an
insane thing is if git has lost its collective mind and so I needed to
follow into the abyss.
If the actual issue is that gogs has implemented support for git-annex,
but that it sends 404 when git-annex requests config from a
private repo, rather than 401, it seems to me the place to fix that is in
gogs.
"""]]

@ -0,0 +1,112 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2021-01-22T01:47:46Z"
content="""
yeap, it is not about 404 ...
<details>
<summary>with gogs/gin situation is obscure but \"easyish\" - 401 is returned upon access to `/info/refs` but not above:</summary>
```shell
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs\"
--2021-01-21 20:37:22-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 401 Unauthorized
Date: Fri, 22 Jan 2021 01:37:23 GMT
Server: Apache/2.4.38 (Debian)
content-type: text/plain
www-authenticate: Basic realm=\".\"
content-length: 0
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
set-cookie: gnode_gin=823b677f19feb8ef; Path=/; HttpOnly
set-cookie: _csrf=GrekbiqDJleLLNcVyax5z77buGY6MTYxMTI3OTQ0MzYwMTMyMzE4NQ; Path=/; Expires=Sat, 23 Jan 2021 01:37:23 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Username/Password Authentication Failed.
1 51975 ->6 [2].....................................:Thu 21 Jan 2021 08:37:23 PM EST:.
(git)lena:~/proj/misc/git[master]git
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info\"
--2021-01-21 20:37:52-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 404 Not Found
Date: Fri, 22 Jan 2021 01:37:53 GMT
Server: Apache/2.4.38 (Debian)
content-type: text/html; charset=UTF-8
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
set-cookie: gnode_gin=26d42c5108c8715d; Path=/; HttpOnly
set-cookie: _csrf=SAKUL4rdspufTb_lxEWIijnzYBU6MTYxMTI3OTQ3Mjk5MDczODgzMA; Path=/; Expires=Sat, 23 Jan 2021 01:37:52 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
2021-01-21 20:37:53 ERROR 404: Not Found.
```
</details>
github is ... trickier, or to say -- my C/gdb/whatever foo is not good enough, since
<details>
<summary>it is still 404 with simple wget but git remote-https seems to get 401:</summary>
```shell
(gdb) p results
$15 = {curl_result = CURLE_HTTP_RETURNED_ERROR, http_code = 401, auth_avail = 1, http_connectcode = 0}
(gdb) p rl
No symbol \"rl\" in current context.
(gdb) p url
$16 = 0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\"
(gdb) bt
#0 http_request (url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\",
result=<optimized out>, target=<optimized out>, options=0x7fffffffd920) at http.c:1981
#1 0x00005555555665bf in http_request_reauth (
url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\", result=0x7fffffffd880,
target=0, options=0x7fffffffd920) at http.c:2040
#2 0x000055555555f7f3 in discover_refs (service=<optimized out>, service@entry=0x5555556b622c \"git-upload-pack\",
for_push=for_push@entry=0) at remote-curl.c:493
#3 0x000055555556137e in get_refs (for_push=<optimized out>) at remote-curl.c:548
#4 cmd_main (argc=argc@entry=3, argv=argv@entry=0x7fffffffdcd8) at remote-curl.c:1523
#5 0x000055555555ee94 in main (argc=3, argv=0x7fffffffdcd8) at common-main.c:52
```
```
$> wget --header \"Git-Protocol: version=2\" --header \"Pragma: no-cache\" -S 'https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack'
--2021-01-21 20:41:21-- https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 404 Not Found
Server: GitHub.com
Date: Fri, 22 Jan 2021 01:41:21 GMT
Content-Type: text/plain; charset=utf-8
Status: 404 Not Found
Vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
Cache-Control: no-cache
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Frame-Options: deny
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Expect-CT: max-age=2592000, report-uri=\"https://api.github.com/_private/browser/errors\"
Content-Security-Policy: default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'
Set-Cookie: _gh_sess=UoF3mYOvfYf5mFbK1tr7aWOuYpQbNoJVhajA5nr2ANUvg%2FekQjtgh0h3xLva0EcwHnLNNsl7VMEdVLXNGi9Yn4AbjrBxX0sdo51DL1XQYR%2Bm3ZeS71I7keexEnrZspp%2FQxaT7cJpceXr7ZrKg2HwJu8dMo%2Bcz13Vr%2F9p7MtZ6cIjUMMF3ql8GX%2BYO949RdgS31KNBb1Ln917v7GlLaZhbejgGAYJOFI2YMuWhs3WkZxOZCMy1JnW%2Bbp3OcdyffBt0ToaKaLcUx1mt6kzzOb4Ow%3D%3D--FD5dTEIs8HUBjIdH--P%2B86pTRJ%2FwWUndICVXAaNA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1513753117.1611279681; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; HttpOnly; Secure; SameSite=Lax
Content-Length: 9
X-GitHub-Request-Id: 8F40:2881:CD3AD3:1222997:600A2D41
2021-01-21 20:41:21 ERROR 404: Not Found.
```
</details>
but overall the point is that git does seem to get 401 with auth availability (although I failed to dig out how exactly it gets it). So I will leave it to the experts to figure out how.
"""]]

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-01-22T18:36:50Z"
content="""
These possibilities seem about equally likely to me:
1. gogs has not implemented authed access to the files git-annex needs
for private repositories
2. gogs has a bug where it returns 404 rather than 401 when not authed,
but serves the files up when authed.
So why try to work around it in git-annex when it's a coin flip whether
git-annex can at all, when in either case there's clearly a bug in gogs,
and is specifically in code in gogs that is intended to support git-annex?
github has a bad habit of using user-agent to make urls do different
things when git accesses them than when other http clients do. That is the
case in your example; use wget -U git/1 and it will 401. But I don't
see how that's relevant, since git-annex does not talk to github except for
a) via git and b) via its git-lfs implementation (which supports http basic
auth although I can't remember if I tested it against github's server or only
other servers like gitlab).
If github's lfs endpoint did do user-agent sniffing, IMHO that would
violate their spec, but also yeah, I'd probably put in some appropriately
snarky fake user-agent in git-annex there. But not in general, and none of
this says git-annex should be treating 404 like 401.
"""]]

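The distinction argued for in this thread, that only a 401 should lead to a credential prompt, can be written down as a small sketch. Both function names are invented for this example; `http_status` uses curl rather than the wget seen in the transcripts and is defined but not called here because it needs network access, and the user-agent behavior it would probe is only what the comment above reports.

```sh
# Map an HTTP status code to the action discussed in the thread:
# 401 means "ask for credentials", 404 is a plain "not found".
# Hypothetical helper for illustration.
auth_action_for_status() {
    case $1 in
        401) echo "ask-credentials" ;;
        404) echo "not-found" ;;
        2??) echo "ok" ;;
        *)   echo "error" ;;
    esac
}

# Print only the status code a URL returns for a given User-Agent.
# Usage: http_status URL USER_AGENT (requires network access).
http_status() {
    curl -s -o /dev/null -A "$2" -w '%{http_code}' "$1"
}
```

Comparing `auth_action_for_status "$(http_status "$url" git/1)"` with the same call using a different user agent would show whether a server varies its answer by client.
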
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2021-01-25T15:18:39Z"
content="""
THANK YOU Joey. That is indeed quite odd (\"security through obscurity\") behavior from github (note: github returns 401 even if that repo does not exist, so it is at least consistent in not revealing presence/absence of private repos at a url). Feel welcome to close this issue since I guess nothing should indeed be done on git-annex side, and ideally `gin` portal just returns 401 in such cases
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2021-01-28T16:37:59Z"
content="""
github's rationale for the sniffing, such as it is, is that an url to a
git repository lets you view it in the web ui, and the same url can be
cloned by git.
Agreed, I'll close this in git-annex, and they can fix it in gin.
"""]]

@ -0,0 +1,88 @@
### Please describe the problem.
decided to test annex on a new to me file system -- beegfs
```
$> mount | grep beegfs
beegfs_nodev on /mnt/beegfs type beegfs (rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev)
```
```
$> modinfo beegfs
filename: /lib/modules/5.4.0-77-generic/updates/fs/beegfs_autobuild/beegfs.ko
version: 7.2.2
alias: fs-beegfs
author: Fraunhofer ITWM, CC-HPC
description: BeeGFS parallel file system client (http://www.beegfs.com)
license: GPL v2
srcversion: 533BB7E5866E52F63B9ACCB
depends: ib_core,rdma_cm
retpoline: Y
name: beegfs
vermagic: 5.4.0-77-generic SMP mod_unload modversions
```
### What steps will reproduce the problem?
1. get beegfs
2.
```
leviathan:/mnt/beegfs/yoh/tmp
$> TMPDIR=$PWD/annex-tmp git annex test
```
### What version of git-annex are you using? On what operating system?
```
leviathan:/mnt/beegfs/yoh/tmp
$> git annex version
git-annex version: 8.20210621-g91f9aac
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
```
### Please provide any additional information below.
looking in detail -- it seems it is not init, but addurl (but subject is set in stone now, can't edit) -- got misled I guess by the interleaving stdout/err:
[[!format sh """
addurl: FAIL (2.79s)
Init Tests
init: ./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo96/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo96_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo96%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
...
addurl: FAIL (1.86s)
./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo193/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo193_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
Init Tests
...
addurl: FAIL (2.29s)
./Test/Framework.hs:57:
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo293/myurl failed (transcript follows)
(to _mnt_beegfs_yoh_tmp_.t_tmprepo293_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo293%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
3 out of 984 tests failed (1776.96s)
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
on days ending with `y` it seems to work quite nicely.
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]], I think, though have not installed beegfs to test.
> --[[Joey]]

@ -0,0 +1,23 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-07-02T14:26:34Z"
content="""
EBUSY The rename fails because oldpath or newpath is a directory that is
in use by some process (perhaps as current working directory, or as root
directory, or because it was open for reading) or is in use by the system
(for example as mount point), while the system considers this an error.
(Note that there is no requirement to return EBUSY in such cases—there is
nothing wrong with doing the rename anyway—but it is allowed to return
EBUSY if the system cannot otherwise handle such situations.)
".git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl"
is not a directory, it is a file. So, rename seems to have no business failing
in this way. Probably the FS is buggy.
"""]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-07-04T03:27:20Z"
content="""
Thank you Joey! indeed most likely a \"too fancy\" of a file system.
On [https://www.beegfs.io/release/beegfs_6/Changelog.txt](https://www.beegfs.io/release/beegfs_6/Changelog.txt) I found
```
== Changes in 6.11 (release date: 2017-05-26) ==
General Changes:
* client: Add option sysRenameEbusyAsXdev to return EXDEV instead of EBUSY if
rename() is called on open files. (Tools like \"mv\" can handle EXDEV as return
value.)
```
Do you think EXDEV would be handled OK if that is the culprit? (Meanwhile I will let the beegfs users know as well; maybe they could try it.)
"""]]

@ -0,0 +1,50 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-07-05T16:18:39Z"
content="""
I've checked with strace, to see if the file was open while it was being
renamed. Not that there is anything generally wrong with renaming an open
file on a POSIX file system, but it would possibly be a problem on windows,
where some forms of opening a file lock it in place. And apparently
this filesystem is not trying to be very POSIX either.
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 17
413026 write(17, "hi\n", 3) = 3
413026 close(17) = 0
...
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 11
413026 read(11, "hi\n", 8192) = 3
...
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK <unfinished ...>
413028 <... futex resumed>) = 0
413026 <... openat resumed>) = 16
...
413026 read(16, "hi\n", 32752) = 3
...
413026 close(16) = 0
...
413026 rename(".git/annex/tmp/URL-s3--file&c%%%tmp%foo", "_tmp_foo") = 0
...
413028 close(11) = 0
So the file is left open across the rename, which ought to be able to be
changed and would presumably fix the problem.
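To check a filesystem for this behavior without git-annex, a minimal shell sketch follows. It keeps a read handle open on a file while renaming it, which is roughly the situation in the strace above. The function name `rename_while_open` and the temporary file names are made up for illustration, and `mv` is only a stand-in for the `rename()` call.

```shell
# Try to rename a file while a read handle on it is still open.
# Prints "ok" if the rename succeeds and "failed" otherwise.
rename_while_open() {
    dir=$1
    src="$dir/rename-test.$$"
    dst="$dir/rename-test.$$.moved"
    printf 'hi\n' > "$src" || return 2
    exec 3< "$src"        # keep a read handle open across the rename
    if mv "$src" "$dst" 2>/dev/null; then
        result=ok
    else
        result=failed
    fi
    exec 3<&-             # close the lingering handle
    rm -f "$src" "$dst"
    echo "$result"
}
```

Running `rename_while_open .` inside a directory on the filesystem in question should print `ok` on a filesystem that allows renaming open files; `failed` would be consistent with the EBUSY behavior discussed here.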
It's also a bit odd that the file gets read twice after being copied.
Once for the checksum makes sense, but what's the other one?
(Copying while checksumming should be able to avoid one of the reads,
but there is an open todo tracking progress on that.)
Aah, the other read is when it's probing if the file is html in case it ought
to be passed off to youtube-dl. That is the read that lingers for a while,
because it's done with a lazy readFile and probing if the file is html doesn't
read to the end and close it, so the file handle lingers until the GC gets
around to closing it. Of course youtube-dl won't be able to do anything with a
file url, but git-annex doesn't know that. And anyway the failure on this
filesystem would also happen when adding a http url.
Ok, fixed it to close the handle promptly. That should fix the test suite.
It does not seem unlikely that something else will break due to this
filesystem's unusual behavior though.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-07-05T17:17:59Z"
content="""
Also looked over other uses of readFile. While there are a couple that
don't read the whole file and so may have a lag closing, none of them are
files that are used in ways that seem likely to trigger this kind of
problem.
"""]]

View file

@ -0,0 +1,28 @@
### Please describe the problem.
Probably it is more of a todo than a bug.
### What steps will reproduce the problem?
This is a use-case where I am trying to establish a special remote to be shared by multiple unrelated repositories.
So I had original repo1 in which I
- created an external special remote with chunking, it got UUID1
- uploaded some data (all got chunked)
created repo2 in which I
- initialized special remote with identical settings and provided `uuid=UUID1`
- decided to test if annex would be able to get a key from the shared special remote
But `annex fsck --key KEY --from remote --fast`, since it doesn't have an exact chunk list, just gives the special remote backend the original full key, which is obviously not found, and it reports failure. But I wondered: couldn't `git-annex` just use the chunk size and "mint" possible chunk keys to test on the special remote, since it has all the information? After all, chunk keys AFAIK are minted deterministically and are pretty much just the original key "augmented" with `-S<chunksize>-C<chunkindex>`.
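The "minting" idea can be sketched in shell. This is only an illustration under stated assumptions: that the `-S` and `-C` fields go into the key's field list before the `--` separator, and that chunk indexes start at 1. `candidate_chunk_keys` is a hypothetical helper, not a git-annex command, and it only works for keys that record their size.

```shell
# Hypothetical helper: list the chunk keys that could exist for KEY
# if it was stored with the given chunk size (in bytes).
# Assumes a key layout of BACKEND-sSIZE[-...]--NAME.
candidate_chunk_keys() {
    key=$1 chunksize=$2
    fields=${key%%--*}    # everything before the first "--"
    name=${key#*--}       # everything after the first "--"
    # Extract the size from the "-s<digits>" field, if present.
    size=$(printf '%s\n' "$fields" | sed -n 's/.*-s\([0-9][0-9]*\).*/\1/p')
    [ -n "$size" ] || { echo "key has no size field" >&2; return 1; }
    # Number of chunks, rounding up.
    count=$(( (size + chunksize - 1) / chunksize ))
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s-S%s-C%s--%s\n' "$fields" "$chunksize" "$i" "$name"
        i=$((i + 1))
    done
}
```

For example, `candidate_chunk_keys SHA256E-s2500--abc.dat 1000` prints three candidate keys, `SHA256E-s2500-S1000-C1--abc.dat` through `SHA256E-s2500-S1000-C3--abc.dat`, which could then be probed on the special remote.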
### What version of git-annex are you using? On what operating system?
8.20200908+git175-g95d02d6e2-1~ndall+1
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]] --[[Joey]]

View file

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-10-22T16:09:17Z"
content="""
Note that what you are trying to do will only work if the special remote
is not encrypted.
As well as your use case, which seems very unusual, I think one other use
case would be if a clone uploaded to the special remote, but never synced
out its git-annex branch before being lost, and fsck --from
remote is being run in another clone to reconstruct it. Currently it
won't try chunks as none are recorded.
Speculatively trying the current remote's chunk config would handle the
majority of cases, though wouldn't help if the other clone had adjusted the
special remote's chunk size too.
There's some overhead, but it can check it last, and not check it if
it's in the list of known chunks, so the overhead would only usually
be paid if the content git-annex expected to be present had gone missing,
which I think is rare enough to not care about.
(Also, this can only be done when the size of the key is known, so not
eg addurl --relaxed keys.)
"""]]

View file

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-10-22T17:00:25Z"
content="""
Implemented that. But..
As implemented, there's nothing to make the chunk size get stored in the
chunk log for a key, after it accesses its content using the configured
chunk size.
So, changing the chunk= of the remote can prevent accessing content that
was accessible before. Of course, avoiding that is why chunk sizes are
logged in the first place.
Seems like maybe fsck --from should fix the chunk log? I think
fsck would always need to be used, to fix up the location log, before any
other commands rely on the data being in the special remote, so it seems
fine to only fix the chunk log there.
But, also a bit unclear how fsck would find out when it needs to do this.
It only needs to when the remote's configured chunk size is not
listed in the chunk log. But that's also common after changing the chunk
size of a remote. So it would have to mess around with checking the
presence of chunk keys itself, which would be extra work and also ugly
to implement.
I'm leaving this todo^Wbug open for now due to this.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-10-22T17:36:12Z"
content="""
Ok, made it update the chunk log as needed while checking if chunks are
present. So this is done.
"""]]

View file

@ -0,0 +1,119 @@
### Please describe the problem.
I was trying to follow https://git-annex.branchable.com/special_remotes/git-lfs/ (only without any encryption), to store at least some data on github via LFS (e.g., for https://github.com/dandi-datasets/nwb_test_data).
Even though I do provide the URL to the `annex initremote` call, it is not stored in `remote.log`:
[[!format sh """
$> sudo rm -rf /tmp/testds2 && ( mkdir /tmp/testds2 && cd /tmp/testds2 && git init && git annex init && git annex initremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git show git-annex:remote.log; )
Initialized empty Git repository in /tmp/testds2/.git/
init (scanning for unlocked files...)
ok
(recording state in git...)
initremote gh-lfs ok
(recording state in git...)
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 autoenable=true encryption=none name=gh-lfs type=git-lfs timestamp=1570642576.06742667s
"""]]
git annex 7.20190912-1~ndall+1
If I just proceed, populate and copy some data via lfs (example uses datalad's `create-sibling-github` to create a new repo):
[[!format sh """
$> ( cd /tmp/testds2 && touch 123 && git annex add 123 && git commit -m 'add 123' && datalad create-sibling-github -s origin testds2 && git push -u origin master && git annex copy --to=gh-lfs 123; git push origin git-annex; )
add 123
ok
(recording state in git...)
[master (root-commit) d2b2f52] add 123
1 file changed, 1 insertion(+)
create mode 120000 123
[WARNING] Authentication failed using a token.
.: origin(-) [https://github.com/yarikoptic/testds2.git (git)]
'https://github.com/yarikoptic/testds2.git' configured as sibling 'origin' for <Dataset path=/tmp/testds2>
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 307 bytes | 307.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To github.com:yarikoptic/testds2.git
* [new branch] master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
copy 123 (to gh-lfs...)
ok
(recording state in git...)
Enumerating objects: 19, done.
Counting objects: 100% (19/19), done.
Delta compression using up to 4 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (19/19), 1.66 KiB | 567.00 KiB/s, done.
Total 19 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), done.
remote:
remote: Create a pull request for 'git-annex' on GitHub by visiting:
remote: https://github.com/yarikoptic/testds2/pull/new/git-annex
remote:
To github.com:yarikoptic/testds2.git
* [new branch] git-annex -> git-annex
"""]]
on a new clone I get a complaint that `url=` is missing, and no data is fetched
[[!format sh """
$> sudo rm -rf testds2-clone && git clone git@github.com:yarikoptic/testds2.git testds2-clone && ( cd testds2-clone && git annex init && git annex get 123; )
Cloning into 'testds2-clone'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 22 (delta 5), reused 21 (delta 4), pack-reused 0
Receiving objects: 100% (22/22), done.
Resolving deltas: 100% (5/5), done.
123@
init (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
Invalid command: 'git-annex-shell 'configlist' '/~/yarikoptic/testds2.git''
You appear to be using ssh to clone a git:// URL.
Make sure your core.gitProxy config option and the
GIT_PROXY_COMMAND environment variable are NOT set.
Remote origin does not have git-annex installed; setting annex-ignore
This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
(Auto enabling special remote gh-lfs...)
Specify url=
ok
(recording state in git...)
get 123 (not available)
Try making some of these repositories available:
92ce3cfc-8c58-42db-8aa3-ea4d4b3a6011 -- yoh@hopa:/tmp/testds2
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 -- gh-lfs
(Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed
"""]]
so I had to `enableremote` it while providing the URL, after which I was able to `get` the file:
[[!format sh """
$> git annex enableremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git annex get 123
enableremote gh-lfs ok
(recording state in git...)
get 123 (from gh-lfs...)
(checksum...) ok
(recording state in git...)
"""]]
Shouldn't that URL be recorded in remote.log? (similarly to `type=git` remotes)
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; see my comment --[[Joey]]

View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-10-21T19:07:42Z"
content="""
That is intentional, because a git-lfs remote can have multiple urls that
can access it, and different users of the remote might want to use
different urls.
It's also documented to work that way, the same as the directory
special remote documents that you have to provide directory= each time it's
enabled.
But, now that git-annex supports sameas remotes, it would be possible to
have one special remote for each different url to a given git-lfs remote,
and have git-annex know they're the same repository. The user can then
enableremote whichever one they want.
See [[todo/git-lfs_special_remote_simpler_setup]] for where I hope this
will lead.
Closing this bug report as redundant with that todo item, and not actually a
bug since it is documented to behave the way it currently behaves.
"""]]

View file

@ -0,0 +1,49 @@
### Please describe the problem.
I am trying to import (and then reimport) a directory into which I sync a box.com folder that is shared with me.
I have used the `--duplicate` option so that the original files are not deleted upon `import`. But then, upon rerunning the `import` command, git-annex errors out if a file already exists. `--reinject-duplicates` seems to be the option to use, but all those modes are "exclusive", so I cannot use `--duplicate --reinject-duplicates`, and using `--reinject-duplicates` alone would result in removing the original files (as without `--duplicate`).
### What version of git-annex are you using? On what operating system?
7.20190819+git2-g908476a9b-1~ndall+1
### Please provide any additional information below.
My little demo snippet for import using `--duplicate`, and then with both options at the same time:
[[!format sh """
$> mkdir /tmp/d-in /tmp/d-repo && touch /tmp/d-in/file && ( cd /tmp/d-repo && git init && git annex init && for r in 1 2; do echo "Run $r"; ls -l ../d-in && git annex import --duplicate ../d-in/.; done )
Initialized empty Git repository in /tmp/d-repo/.git/
init ok
(recording state in git...)
Run 1
total 0
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
import ./file ok
(recording state in git...)
Run 2
total 0
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
import ./file
not overwriting existing ./file (is a symlink)
failed
git-annex: import: 1 failed
$> cd d-repo
$> git annex import ../d-in/. --reinject-duplicates --duplicate 2>&1 | head -n 3
Invalid option `--duplicate'
Usage: git-annex COMMAND
"""]]
Or maybe there is a better way to establish a re-runnable workflow for importing from a directory?
[[!meta author=yoh]]
[[!tag projects/dandi]]
[[!tag moreinfo]]
> [[done]] --[[Joey]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-19T17:12:41Z"
content="""
I think that you can accomplish what you want by making the directory
you're importing from be a directory special remote with exporttree=yes
importtree=yes and use the new `git annex import master --from remote`.
If that does not do what you want, I'd prefer to look at making it be able
to do so. I hope to eventually remove the legacy git-annex import from
directory, since we have this new more general interface.
"""]]

View file

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-03-30T15:50:17Z"
content="""
Tagged moreinfo since I'm waiting on a reply to my suggestion.
"""]]

View file

@ -0,0 +1,59 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2020-10-06T01:26:59Z"
content="""
I think it worked wonderfully
<details>
<summary>here is my script I have tried</summary>
```shell
#!/bin/bash
export PS4='> '
set -x
set -eu
cd \"$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)\"
mkdir d-in d-repo
echo content >| d-in/file
function dance() {
git annex import master --from d-in
# but we need to merge it
git merge d-in/master
ls -l
grep -e . *
}
(
cd d-repo
git init
git annex init
git annex initremote d-in type=directory directory=../d-in exporttree=yes importtree=yes encryption=none
ls -l ../d-in
for r in 1 2; do
echo \"Run $r\";
dance
done
echo \"more\" >> ../d-in/file
echo \"new\" > ../d-in/newfile
dance
rm ../d-in/file
dance
)
```
</details>
and it seemed to do the right job! I have not tried adding a `.gitattributes` to the branch it imports into, to tell some files to go to git, but I hope it would just work, and if not, I will come back! Feel welcome to close this issue.
Cheers
"""]]

View file

@ -0,0 +1,70 @@
### Please describe the problem.
[Original question raised by John](https://github.com/dandi/dandisets/issues/139#issuecomment-1149948239), which led me on a goose chase.
The following reproducer
```
#!/bin/bash
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
set -eux
git init --bare remote
( cd remote; git annex init; cat config )
rpath=$PWD/remote
git init repo
cd repo
git annex init
echo 'This is test text.' > file.txt
git add file.txt
git commit -m Init file.txt
git remote add --fetch remote-git $rpath
# without this -- there is no annex-uuid for remote -- git-annex branch is not getting merged
git annex info
cat .git/config
# but this still fails
git annex initremote testremote type=git location=$rpath autoenable=true
```
ends with
```
[remote "remote-git"]
url = /home/yoh/.tmp/dl-VjO0aSF/remote
fetch = +refs/heads/*:refs/remotes/remote-git/*
annex-uuid = afdc6d54-cd6d-4a20-b639-a639f9c7ef09
+ git annex initremote testremote type=git location=/home/yoh/.tmp/dl-VjO0aSF/remote autoenable=true
initremote testremote
git-annex: could not find existing git remote with specified location
failed
initremote: 1 failed
```
so
- the error "could not find existing git remote with specified location" does not describe the underlying problem, since the location matches the url. It is still not clear why we can't initremote
- as you can see in the script, `annex info` is needed to get annex-uuid populated, and judging by the [code](https://git.kitenet.net/index.cgi/git-annex.git/tree/Remote/Git.hs?id=af0d854460c28230dc682faa7c6daf3d96698cb6#n110) comment, the UUID must be known. If it is not known, ideally there should be a dedicated error message ("remote blah found but lacks uuid, check if remote is annex")
- IMHO a manual `annex info` should not be needed to merge the git-annex branch
### What steps will reproduce the problem?
above
### What version of git-annex are you using? On what operating system?
10.20220504
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-06-08T16:55:50Z"
content="""
Hmm, I think this only works for ssh:// urls currently.
Even the ssh url form host:/path does not work, because it gets
normalized to a ssh:// url.
The implementation does not support non-urls at all; the provided location
is treated as an url (`Git.Url location`). And even if it were treated as a
path, the path gets normalized to a relative path and an absolute path (or
differently relativized path) would not work.
Using paths with this is rather problematic too, because if the repo is
cloned to another machine, it would not find the repo at the recorded path.
Similarly, relative paths are also problematic. But it may as well support
them to the extent it can.
I think this needs changes to the core Git data structure, to store the
original, unmodified git.remote.path. Or a different interface than the
current, one that accepts any repo location and probes it to find the uuid.
The latter idea seems better because it simplifies the UI rather than
complicating the internal representation.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2022-06-09T17:04:23Z"
content="""
Implemented probing of the uuid of the repo location. Which may change
how you use this feature. Although the old roundabout method of having an
existing git remote and running initremote with the same location will
work too, it's not necessary to do that anymore.
"""]]

View file

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="comment 2"
date="2022-06-09T02:32:18Z"
content="""
Wouldn't it be possible to support (absolute) file:// urls, eg. something similar to
`file:///home/jkniiv/test-VEfBrTZ/remote2`? In my mind they feel like a reasonable approximation
of ssh:// urls and could be useful for getting a feel for git special remotes before setting
up a bare git-repo/annex on an ssh-server. I know they are not the same thing implementation-wise
but I feel that being able to try this feature out on a least-effort basis would be useful
from a pedagogical standpoint.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2022-06-09T17:28:19Z"
content="""
Re file:// urls, it does now work to use them in location=. I don't know if
I'd consider using them any better than absolute paths though. YMMV.
"""]]

View file

@ -0,0 +1,150 @@
[[!meta title="http remotes that require authentication are not yet supported"]]
It is not an earth-shattering issue, but it would probably be best to handle it more gracefully.
I initially noticed it while doing an install using datalad. An account/permission is required to access this particular repo; ask the Canadians for access if you don't have it yet, Joey. The credentials, I guess, got asked for and cached by git upon the initial invocation, so subsequent calls didn't ask for any:
[[!format sh """
$> datalad install https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
[INFO ] Cloning https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids [1 other candidates] into '/tmp/Coffey-mri-bids'
[INFO ] fatal: bad config line 1 in file /home/yoh/.tmp/git-annex96493-5.tmp
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
install(ok): /tmp/Coffey-mri-bids (dataset)
"""]]
which boiled down to that message being spat out during `git annex init`, which samples the remote but fails to download the config and instead gets a redirected html page:
[[!format sh """
$> git clone https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
Cloning into 'Coffey-mri-bids'...
warning: redirecting to https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids.git/
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (398/398), done.
remote: Compressing objects: 100% (282/282), done.
remote: Total 398 (delta 53), reused 393 (delta 48)
Receiving objects: 100% (398/398), 34.97 KiB | 795.00 KiB/s, done.
Resolving deltas: 100% (53/53), done.
$> git -C Coffey-mri-bids annex init --debug
...
[2019-11-27 19:27:01.341315979] Request {
host = "git.bic.mni.mcgill.ca"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/7.20190819+git2-g908476a9b-1~ndall+1")]
path = "/bic/Coffey-mri-bids/config"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2019-11-27 19:27:01.90016181] read: git ["config","--null","--list","--file","/home/yoh/.tmp/git-annex228094-5.tmp"]
fatal: bad config line 1 in file /home/yoh/.tmp/git-annex228094-5.tmp
[2019-11-27 19:27:01.913302324] process done ExitFailure 128
Remote origin not usable by git-annex; setting annex-ignore
$> wget -S https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
--2019-11-27 19:29:25-- https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
Resolving git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)... 132.216.133.92
Connecting to git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)|132.216.133.92|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: no-cache
Location: https://git.bic.mni.mcgill.ca/users/sign_in
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; secure; HttpOnly
X-Request-Id: xTcSyu4H36
X-Runtime: 0.071681
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Location: https://git.bic.mni.mcgill.ca/users/sign_in [following]
--2019-11-27 19:29:26-- https://git.bic.mni.mcgill.ca/users/sign_in
Reusing existing connection to git.bic.mni.mcgill.ca:443.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, private, must-revalidate
Etag: W/"305857ff0ba591a1e4ee7fec83b5687c"
Referrer-Policy: strict-origin-when-cross-origin
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; expires=Thu, 28 Nov 2019 02:29:26 -0000; secure; HttpOnly
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: MHFi7Yjxe82
X-Runtime: 0.063359
X-Ua-Compatible: IE=edge
X-Xss-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Length: unspecified [text/html]
Saving to: config
config [ <=> ] 13.19K --.-KB/s in 0s
2019-11-27 19:29:26 (89.1 MB/s) - config saved [13505]
$> cat config
<!DOCTYPE html>
<html class="devise-layout-html">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="object" property="og:type">
<meta content="GitLab" property="og:site_name">
<meta content="Sign in" property="og:title">
...
"""]]
I guess the problem is multi-faceted:
1. In the case of an authenticated http remote, `git` caches credentials, but `git annex` tries to download the file directly (instead of somehow via git), so it cannot "sense" that the remote is a valid annex and/or get files from it.
You can try with this simple one -- user "demo", password "demo":
[[!format sh """
$> git clone http://www.onerussian.com/tmp/secret-repo/.git
Cloning into 'secret-repo'...
Username for 'http://www.onerussian.com': demo
Password for 'http://demo@www.onerussian.com':
$> git -C secret-repo annex init
init (merging origin/git-annex into git-annex...)
(recording state in git...)
Remote origin not usable by git-annex; setting annex-ignore
ok
(recording state in git...)
"""]]
Although the remote is a proper annex, `git annex` indeed cannot use it, since it does not authenticate as git does.
So even though the error message is not incorrect, I would say the situation is suboptimal.
2. If the remote server, instead of just returning a 404 or 403 error code (as e.g. github seems to do in similar cases of non-authenticated access), redirects to some login page, annex feeds that page as a config to git, ignores the error message and just marks that remote as ignored for annex, while leaking that obscure "fatal" error message from git.
IMHO, ideally 1. should be addressed properly (authentication), and for 2. annex should spit out some more sensible message ("git failed to parse a config file fetched from the remote X. Please inspect it at this /path/config"), and so keep that file around for debugging. As it is now, I had to dig quite deep to figure out WTF is going on.
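To make point 2 concrete, here is a rough shell sketch of the kind of diagnostic I have in mind. It is not how git-annex works internally; `looks_like_html` and `check_fetched_config` are hypothetical helper names, and sniffing the first 512 bytes for an HTML marker is just an assumed heuristic that is meant only for producing a better message.

```shell
# Hypothetical helper: succeed if FILE appears to be an HTML page rather
# than a git config file. Only the first 512 bytes are inspected.
looks_like_html() {
    head -c 512 "$1" | tr 'A-Z' 'a-z' | grep -q -e '<!doctype html' -e '<html'
}

# Hypothetical check to run on the downloaded file before giving up on
# the remote: report where the file is so that it can be inspected.
check_fetched_config() {
    if looks_like_html "$1"; then
        echo "config fetched from the remote looks like an HTML page (login redirect?); inspect $1" >&2
        return 1
    fi
}
```

With the `config` file saved by `wget` above, `check_fetched_config config` would print the hint and fail, while a real git config file would pass silently.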
git annex 7.20190819+git2-g908476a9b-1~ndall+1 and the same with bleeding edge 7.20191114+git43-ge29663773-1~ndall+1 (probably that commit is the one with my patch for stricter git versioning, so use the count of 42 ;))
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; the error message is improved and also git remotes that need
> http basic auth to access will get password from `git credential`.
> --[[Joey]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="related: shouldn't git annex try external remotes to download config?"
date="2019-11-28T01:22:53Z"
content="""
I haven't tested, but I can see the situation where a specific repository URL could be handled by external special remote (such as datalad, downloaders of which do handle obscure setups such as this one without 403/404 but rather forwarding to login page) which would provide authenticated access to the URL. Would annex even try that config URL via external special remotes?
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-11-29T18:09:45Z"
content="""
one of the use-cases (will be) https://gin.g-node.org/ -- an archive of (primarily) electrophys data. The platform is based on gogs, but uses git-annex underneath. It \"will be\" because currently access to git-annex is provided only via ssh, but as of today it is already possible to `git clone` (tried on public, didn't try private) datasets via https, and developers are looking into exposing git-annex also via http. To access private datasets authentication will need to be handled
"""]]

View file

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-01-22T16:04:37Z"
content="""
git-annex could use `git credential` if the config download fails with
401 unauthorized and then retry with the credentials. (The git-lfs special
remote already does this.) And it would also need to do the same thing
when getting a key from the remote.
But that would not help with the https://git.bic.mni.mcgill.ca example,
apparently, because there's no 401, but a 302 redirect to a 200,
that is indistinguishable from a successful download.
Yeah, when git-annex expects a git config, if it doesn't parse as one,
it could retry, asking for credentials.
But that seems asking for trouble: what if it fails to parse for
another reason, maybe the web server served up something other than the
expected config, maybe a captive portal got in the way. There would be a
username/password prompt that doesn't make sense to the user at all.
And if this happens in a key download, git-annex certainly has no way to
tell that what it downloaded is not intended as the content of a key,
short of verifying the content, and failure to verify certainly doesn't
justify prompting for a username/password.
So, I am not comfortable with falling back to ask for credentials unless
I've seen a http status code that indicates they are necessary.
And IMHO gitlab's use of a 302 redirect to a login page is a bug in
gitlab, and will need to be fixed there, or a better http server used.
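The policy described above can be sketched in shell, as an illustration only and not the actual git-annex code. `needs_credentials` and `fetch_config` are hypothetical names, the use of `curl` is an assumption made for the sketch, and the only trigger for asking `git credential` is a 401 status.

```shell
# Succeed only for the http status that says credentials are required.
needs_credentials() {
    [ "$1" = 401 ]
}

# Hypothetical fetch wrapper: download URL to OUT, asking git for
# credentials and retrying once only when the server answered 401.
fetch_config() {
    url=$1 out=$2
    status=$(curl -s -L -o "$out" -w '%{http_code}' "$url") || return 1
    if needs_credentials "$status"; then
        proto=${url%%://*}
        host=${url#*://}; host=${host%%/*}
        cred=$(printf 'protocol=%s\nhost=%s\n\n' "$proto" "$host" | git credential fill) || return 1
        user=$(printf '%s\n' "$cred" | sed -n 's/^username=//p')
        pass=$(printf '%s\n' "$cred" | sed -n 's/^password=//p')
        status=$(curl -s -L -u "$user:$pass" -o "$out" -w '%{http_code}' "$url") || return 1
    fi
    [ "$status" = 200 ]
}
```

A 302 that ends in a 200 login page never reaches the credential step in this sketch, which is exactly the gitlab case above. A fuller version would presumably also pass the result to `git credential approve` or `git credential reject`.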
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""re: related: shouldn't git annex try external remotes to download config?"""
date="2020-01-22T16:31:16Z"
content="""
No, the external special remote protocol is not aimed at downloading git
config files. Anyway, this code path is never involved with using
special remotes; the uuid of a special remote is known and so there is no
need to ever download a git config file to discover it.
"""]]

View file

@ -0,0 +1,48 @@
### Please describe the problem.
Maybe not a problem per se, but I decided to check whether this is expected. Following [this advice](http://git-annex.branchable.com/todo/git_smudge_clean_interface_suboptiomal/#comment-65f848510d8684bf65c6698f68b700dd) I have `git config filter.annex.process "git-annex filter-process"` in that git-annex repo and now observe the following tree of processes (in htop):
```
3799768 dandi 20 0 1025G 191M 40616 S 6.6 0.3 0:31.87 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
3799796 dandi 20 0 191M 5088 4680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805272 dandi 20 0 6892 3420 2992 S 0.0 0.0 0:00.27 │ │ │ ├─ /bin/bash /usr/bin/git-annex-remote-rclone
3805640 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:02.82 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805646 dandi 20 0 20432 13044 4036 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3805650 dandi 20 0 31900 4064 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805685 dandi 20 0 30144 4000 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805704 dandi 20 0 30144 16076 15792 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805705 dandi 20 0 30144 3976 3728 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805717 dandi 20 0 30144 15968 15680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805781 dandi 20 0 30144 3980 3724 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805786 dandi 20 0 30144 4068 3820 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805807 dandi 20 0 30144 16028 15744 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805808 dandi 20 0 30144 3884 3636 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805828 dandi 20 0 30144 4008 3764 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3805848 dandi 20 0 20432 13104 4092 S 0.0 0.0 0:00.04 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805852 dandi 20 0 20432 12948 3940 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
3805865 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3806054 dandi 20 0 30144 4004 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806066 dandi 20 0 45216 5108 4700 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3806067 dandi 20 0 30144 3888 3640 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806068 dandi 20 0 30144 16032 15748 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806095 dandi 20 0 30144 4060 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3806104 dandi 20 0 20432 12928 3916 S 0.0 0.0 0:00.06 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
3806110 dandi 20 0 30144 15944 15660 S 0.0 0.0 0:00.02 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
3804258 dandi 20 0 1024G 44336 37772 S 0.0 0.1 0:00.04 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
3804277 dandi 20 0 40844 5124 4740 S 0.0 0.0 0:00.00 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805399 dandi 20 0 1024G 23508 20844 S 0.0 0.0 0:00.61 │ │ ├─ git-annex examinekey --batch --migrate-to-backend=SHA256E
3805493 dandi 20 0 1024G 36516 26184 S 0.0 0.1 0:01.51 │ │ ├─ git-annex fromkey --force --batch --json --json-error-messages
3805503 dandi 20 0 25788 5120 4712 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
3805510 dandi 20 0 12472 3984 3732 S 0.0 0.0 0:00.05 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
```
which might be OK, but I still wonder why they are just sleeping there, with more of them than the `--jobs` number. git-annex version: 10.20220624-g769be12
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[done]]; this is now handled like other git helper processes
> and will be capped to the maximum of the number of jobs or cpu cores,
> and in practice usually fewer than that will be started. --[[Joey]]

@ -0,0 +1,16 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-07-25T20:37:55Z"
content="""
I was able to reproduce this by feeding 10 urls into git-annex addurl
-J5 and got 7 hash-object processes running.
filter.annex.process has nothing to do with this. I reproduced the behavior
without it set.
Seems like a simple concurrency issue, where each thread potentially starts
its own hash-object handle, and there can be around 2x as many threads
started as the -J number due to job stages. Annex.Concurrent sets up pools of
handles for other similar git processes, but not hash-object.
"""]]
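
The difference between "each thread starts its own handle" and "threads borrow from a shared pool" can be sketched roughly as below. This is a Python illustration only, not git-annex's Haskell code: `FakeHandle`, `HandlePool` and `hash_all` are hypothetical names, and the fake handle merely counts how many helper processes would have been started. The cap of max(jobs, cpu cores) follows the closing note on this bug.

```python
import os
import queue
import threading


class FakeHandle:
    """Stand-in for one long-running `git hash-object` helper process."""
    started = 0
    _count_lock = threading.Lock()

    def __init__(self):
        with FakeHandle._count_lock:
            FakeHandle.started += 1

    def hash_object(self, path):
        return "hash-of-" + path


class HandlePool:
    """Lends out at most `cap` handles; further threads wait for a free one."""

    def __init__(self, cap):
        self._slots = threading.Semaphore(cap)
        self._idle = queue.LifoQueue()

    def with_handle(self, action):
        with self._slots:
            try:
                handle = self._idle.get_nowait()  # reuse an idle handle
            except queue.Empty:
                handle = FakeHandle()             # start one only on demand
            try:
                return action(handle)
            finally:
                self._idle.put(handle)            # return it before freeing the slot


def hash_all(paths, jobs, threads):
    """Hash paths from `threads` workers that all share one capped pool."""
    cap = max(jobs, os.cpu_count() or 1)
    pool = HandlePool(cap)
    results = {}
    results_lock = threading.Lock()

    def worker(chunk):
        for path in chunk:
            digest = pool.with_handle(lambda h: h.hash_object(path))
            with results_lock:
                results[path] = digest

    workers = [threading.Thread(target=worker, args=(paths[i::threads],))
               for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results, cap
```

With 10 worker threads and `jobs=5`, the number of handles started stays at or below the cap, and in practice is usually lower because idle handles get reused.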

@ -0,0 +1,8 @@
(Sorry about the title; I was trying to work within the character limit.)
When invoking `git-annex metadata --batch --json --json-error-messages`, if an error occurs in response to some input — say, because the name of a nonexistent file was supplied (or, in my case, because the name of a file downloaded milliseconds ago in a parallel addurl process was supplied) — then `git-annex metadata` will output "git-annex: not an annexed file: {filepath}" to standard error and immediately exit. Not only is this in contrast to what it seems `--json-error-messages` should do, but the "exiting immediately" bit is in contrast to my understanding of how batch mode is supposed to work. Surely this should be fixed?
[[!meta author=jwodder]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2021-11-01T16:27:48Z"
content="""
For consistency with other --batch commands, I've made it reply with a blank
line when the input is not an annexed file.
Do note that --json-error-messages cannot cram every possible kind of error
message into a json object. In particular, errors that occur at startup,
and not when acting on a particular file or key, do not fit into the json
schema.
"""]]
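
The batch behavior described above, one reply line per input line with a blank line for input that is not an annexed file, might look like this in outline. It is a Python sketch under assumptions, not git-annex code: `lookup_metadata`, `batch` and the `annexed` dictionary are hypothetical, and writing the message to stderr is only one possible choice.

```python
import io
import json


def lookup_metadata(path, annexed):
    """Hypothetical per-item action: fails for anything that is not annexed."""
    if path not in annexed:
        raise KeyError(path)
    return {"file": path, "fields": annexed[path]}


def batch(lines, out, err, annexed):
    """Answer every input line with exactly one output line and never exit early."""
    for line in lines:
        path = line.rstrip("\n")
        try:
            reply = json.dumps(lookup_metadata(path, annexed))
        except KeyError:
            # Per-item failure: report it, reply with a blank line, keep going.
            err.write("not an annexed file: " + path + "\n")
            reply = ""
        out.write(reply + "\n")


if __name__ == "__main__":
    out, err = io.StringIO(), io.StringIO()
    batch(["a\n", "missing\n", "b\n"], out, err, {"a": {}, "b": {"tag": ["x"]}})
    print(out.getvalue(), end="")
```

A caller that pairs requests with replies by line position stays in sync this way, since a failed item still consumes exactly one reply line.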

@ -0,0 +1,44 @@
### Please describe the problem.
From [https://github.com/DanielDent/git-annex-remote-rclone/pull/57](https://github.com/DanielDent/git-annex-remote-rclone/pull/57), where we use that rclone special remote for backup of DANDI data to dropbox
Seems like a test sometimes fails on Mac OS with:
```
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
copy: 1 failed
Error: Process completed with exit code 1.
```
Indeed, so far it seems to have happened only on Mac:
```
(git)smaug:/mnt/datasets/datalad/ci/git-annex-remote-rclone[master]2022
$> datalad foreach-dataset git grep 'file is locked'
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone]
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T06:47:44.4978580Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T06:47:44.4978530Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/10_test (macos-latest, v1.33).txt:2022-10-03T23:35:41.8464390Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T23:37:44.0652500Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.33)/9_tests.txt:2022-10-03T23:35:41.8463970Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T23:37:44.0652360Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
foreach-dataset(ok): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/10 (dataset)
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09]
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08]
```
### What steps will reproduce the problem?
no minimal reproducer yet but happens as part of [this test "script"](https://github.com/DanielDent/git-annex-remote-rclone/blob/master/tests/all-in-one.sh)
### What version of git-annex are you using? On what operating system?
git-annex version: 10.20220927
[[!meta author=yoh]]
[[!tag projects/dandi]]
> Presumed [[fixed|done]]; please followup if I'm wrong. --[[Joey]]

@ -0,0 +1,22 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-10-07T16:44:04Z"
content="""
I doubt this is really OSX specific. This must be two threads running logMove
at the same time, which end up either both writing, or one writing and one
reading, at the same moment. That causes the Haskell RTS to fail this way.
Since it does use a lock file when writing and appending to the log file,
I think it must be the call to checkLogFile that is failing. That avoids
taking the lock, for performance reasons. The performance gain is pretty
minimal though, since taking the lock is cheap. Only when modifyLogFile
is called at the same time might it need to block on the file being
rewritten, but the file only ever has 100 items, so that never takes long
either.
So, I have added locking to checkLogFile (and to calcLogFile though it's
not used here, just because it has the same problem). That should fix it,
though we'll need to wait on the test to know for sure. I'm going to close
this anyway, as I'm pretty sure.
"""]]
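
The shape of that fix, where the read-only check takes the same lock as the writers, can be sketched as follows. This is a Python illustration with hypothetical names (`LogFile`, `append`, `check`); an in-process `threading.Lock` stands in for the lock file that git-annex uses, and the 100-item limit is taken from the comment above.

```python
import os
import threading

MAX_ITEMS = 100  # the log "only ever has 100 items"


class LogFile:
    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # stand-in for the real lock file

    def append(self, item):
        """Writer: rewrite the file under the lock, keeping the newest items."""
        with self._lock:
            items = self._read()
            items.append(item)
            with open(self.path, "w") as f:
                f.write("".join(i + "\n" for i in items[-MAX_ITEMS:]))

    def check(self, predicate):
        """Reader: takes the lock too, instead of reading unguarded."""
        with self._lock:
            return any(predicate(i) for i in self._read())

    def _read(self):
        # Callers must already hold the lock.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return f.read().splitlines()
```

Because the file is small, a reader waiting for a concurrent rewrite is only blocked briefly, which matches the reasoning above about the negligible cost of locking.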

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2022-11-04T12:41:47Z"
content="""
OK, I did an archaeological expedition to figure out when this was fixed: it was fixed in [10.20221003-19-g4a42c6909 AKA 10.20221103~28](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=4a42c69092a03cce7b31b79b862e59c9842ced77). brew still has 10.20221003 (well, we are just 1 day post release! ;)), so in testing git-annex-remote-rclone we keep getting hit, but hopefully that will go away soon once git-annex is updated in brew.
"""]]

@ -0,0 +1,96 @@
### Please describe the problem.
`git status` reports a modified file (changes not staged for commit), even though nothing differs from the index:
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use "git add" and/or "git commit -a")
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
```
although git shows no diff and sha256 checksum corresponds to the key:
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
```
I think maybe the tricky part is that I have it at
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
10
```
although I thought that we kept it at 8. But I do have a user-wide config setting
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
git-annex filter-process
```
which I was recommended to set to speed up operations while avoiding an upgrade to 10, but I guess running the most recent version once led to the upgrade, since all the other repos are still at 8, as I thought this one would be:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
1 version = 10
186 version = 8
```
Having it reported as modified causes our script, which does a sanity check so that it operates only on a clean repo, to fail.
`git reset --hard` seems to have mitigated that:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
HEAD is now at b859efed7d [backups2datalad] 66 files added
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
```
I will now rerun our script and see in what state I end up (although, once again, this repo has already ended up at version 10, so maybe the behavior will be different).
### What steps will reproduce the problem?
I think I get it after I `annex move` and then `annex get` that file back. Just for my own reference -- git-annex repo is result of the https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron
### What version of git-annex are you using? On what operating system?
10.20220822-g84f1875 (conda build), originally observed on earlier 10.20220724-ge30d846
[[!meta author=yoh]]
[[!tag projects/dandi]]
[[!meta title="annex.stalldetection prevents git-annex get from restaging unlocked files"]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,15 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 10"
date="2022-09-22T17:34:35Z"
content="""
damn, I should have shared my config! I also do have `annex.stalldetection` set!
```
[annex]
stalldetection = 1KB/120s
```
never thought it might be related. We should look into having some matrix test run with such config set.
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2022-09-22T17:38:45Z"
content="""
Yeah, a whole git-annex test run with stalldetection set would have found
this bug. Which seems a bit heavy-weight for the test suite to try as a
separate pass by default. But then again, stalldetection does significantly
change how git-annex operates since it has to fork off child processes that
it can kill when they stall.
"""]]
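
Why stall detection needs a killable child process can be illustrated with a small sketch. This is Python under assumptions, not how git-annex implements it: `is_stalled` and `kill_if_stalled` are hypothetical, and reading a setting like `1KB/120s` as "fewer than `min_bytes` transferred over the last `window_seconds`" is an assumed interpretation.

```python
def is_stalled(samples, min_bytes, window_seconds):
    """samples: list of (timestamp_seconds, total_bytes_so_far), oldest first."""
    if not samples:
        return False
    now, done_now = samples[-1]
    if now - samples[0][0] < window_seconds:
        return False  # not enough history to judge yet
    # Newest sample that is at least one full window old.
    baseline = max((s for s in samples if now - s[0] >= window_seconds),
                   key=lambda s: s[0])
    return done_now - baseline[1] < min_bytes


def kill_if_stalled(child, samples, min_bytes, window_seconds):
    """Kill the child process doing the transfer once it looks stalled.

    `child` is anything with kill() and wait(), such as subprocess.Popen.
    A thread inside the parent could not be stopped this way, which is why
    the transfer has to run in a separate process.
    """
    if is_stalled(samples, min_bytes, window_seconds):
        child.kill()
        child.wait()
        return True
    return False
```

Anything the killed child had queued up to do at exit is lost, so work that must happen after a transfer needs to be recoverable by the parent.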

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 12"
date="2022-09-22T18:14:15Z"
content="""
Adding a matrix run with custom config settings to our [datalad/git-annex](https://github.com/datalad/git-annex/pull/133) CI run. Let's see how that goes. Maybe there are some other interesting config settings to add there, e.g. retries etc.? Or is the global `~/.gitconfig` not used/mocked away during tests? (E.g. we do that in datalad, so I had to work around that in a [PR against datalad](https://github.com/datalad/datalad/pull/7056) to test with this setting set.)
"""]]

@ -0,0 +1,32 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2022-09-22T17:40:57Z"
content="""
So, `git-annex transferrer`, after downloading the content, does handle
populating pointer files. So it calls restagePointerFile to register a cleanup
action.
Whatever is making that process exit 1 must be preventing the cleanup
action from being run. And I think what that is, is that its stdout handle
gets closed at the same time its stdin handle is closed. I tried running
`git-annex transferrer` manually and feeding it a transfer request on
stdin. After its stdin was closed, it proceeded to send
`"om (recording state in git...)\n"` to stdout, and that would fail
with stdout already closed.
Worse, I suspect there's another problem.. When a stall actually
is detected, git-annex kills the `git-annex transferrer` process that has
stalled. But suppose that process has already successfully downloaded some
content and populated pointer files. Killing it would prevent it from
running restagePointerFile on those. It seems that to solve this,
it would need to communicate back to the parent what pointer files need to
be restaged. (Which would also solve the exit 1 problem, although not
necessarily in the best way.)
Also, I think that multiple processes running the restagePointerFile
cleanup action at the same time can be a problem, because one will
lock the index and the rest will fail to restage. Not what's happening
here, but with -J, there would be multiple `git-annex transferrer`
processes doing that at the same time at the end.
"""]]

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 13"""
date="2022-09-22T18:16:22Z"
content="""
Avoided the early stdout handle close, and that did fix this bug as
reported.
The related problems I identified in comment #12 are still unfixed, so
leaving this open for now.
I think what ought to be done to wrap this up is make restagePointerFile
record the files that need to be restaged in a log file. Then at shutdown,
git-annex can read the log file, and restage everything listed in it.
This will solve multiple problems:
* When a previous git-annex process was interrupted after a get/drop of an
unlocked file, the file will be in the log, so git-annex can notice
that and handle the restaging.
* When a stalled `git-annex transferrer` is killed, the parent git-annex
will read the log and handle the restaging that it was not able to do.
* When multiple processes are trying to restage files at the same time,
an exclusive lock can be used to make only one of them run, and it can
handle restaging the files that the others have recorded in the log too.
* As a bonus, in the situations where git-annex is legitimately unable to
restage files, it can still record them to be restaged later. And the
"only a cosmetic problem" message can tell the user to run a single
simple git-annex command, rather than a complicated
`git update-index` command per file.
"""]]
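
The log-file plan above can be sketched roughly as follows. This is a Python illustration under assumptions, not git-annex's implementation: `record_for_restage`, `restage_from_log` and the file layout are hypothetical, `restage` is whatever callback does the actual index update, and `fcntl.flock` makes the sketch Unix-only.

```python
import fcntl
import os


def record_for_restage(log_path, filename):
    """Append a file name to the log instead of restaging right away."""
    with open(log_path, "a") as log:
        fcntl.flock(log, fcntl.LOCK_EX)  # brief lock so appends don't interleave
        log.write(filename + "\n")


def restage_from_log(log_path, lock_path, restage):
    """Restage everything logged so far.

    Returns False without doing anything when another process already holds
    the exclusive restage lock; that process handles the logged files.
    """
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        if not os.path.exists(log_path):
            return True
        with open(log_path, "r+") as log:
            fcntl.flock(log, fcntl.LOCK_EX)
            pending = sorted(set(log.read().splitlines()))
            if pending:
                restage(pending)
            # Clear the log only after restaging succeeded.
            log.seek(0)
            log.truncate()
        return True
```

Since entries stay in the log until a restage succeeds, a process that is killed or interrupted after recording a file leaves enough behind for a later process to finish the job.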

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 15"""
date="2022-09-22T18:42:06Z"
content="""
@yarikoptic oh, `git-annex test` does prevent global gitconfig from
influencing the tests. So your matrix test won't work if you're
running `git-annex test` in it. If you're running other git-annex commands
in datalad's test suite, it would work though.
I've opened [[todo/specify_gitconfig_for_test_suite]].
"""]]

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="joey"
subject="""status update"""
date="2022-09-23T19:57:38Z"
content="""
I've implemented the log file. The stalled transferrer case is now handled.
This bug is fixed.
As to a few other cases I considered in comments upthread:
When a get/drop was interrupted before it could restage,
the next get/drop will cause the necessary restaging for the
interrupted process to happen. However, this doesn't help if there's
nothing left to get/drop. Should git-annex always run restagePointerFiles
on shutdown? That would make any git-annex command handle the restaging.
But it doesn't seem right for query commands to do potentially a lot of
work to handle this case. Anyway, I don't think this needs to be dealt
with in this bug report.
When multiple processes try to restage at the same time, one will
restage everything that all of them logged. The others will still display a
warning to the user that they couldn't restage. It would be hard to avoid
displaying that warning, since it does need to warn when it was
unable to restage because git has the index locked at the time. Anyway,
I think it's ok to display the message despite the files having been
restaged, because it's the same as a later git-annex process handling the
restaging. (It does seem like two transferrers belonging to the same parent
could collide in this way, and one display the warning, which isn't great..)
I also implemented a "git-annex restage" command that
is an easier way to restage in the cases where git-annex is not able
to do it itself.
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-09-21T17:05:51Z"
content="""
Is .dandi/assets.json an unlocked file?
`git diff --cached` seems like the wrong thing to run, because
that would show changes that you have staged for commit.
This change is one that has not been staged for commit.
So `git diff` should show it.
"""]]

@ -0,0 +1,46 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2022-09-21T18:46:50Z"
content="""
d'oh forgot to show that I have tried that one too. Here is everything at once again with `git diff` and again doing checksums (that should have been different in my prev examples as well if different only in tree but not in index):
```shell
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
It took 3.19 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
no changes added to commit (use \"git add\" and/or \"git commit -a\")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
```
"""]]

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 3"
date="2022-09-21T18:49:06Z"
content="""
the workaround you suggest elsewhere for \"cosmetic\" problem works here too
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use \"git add\" and/or \"git commit -a\")
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git update-index -q --refresh .dandi/assets.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
```
but since we are relying on output from `status`, it is not just a \"cosmetic\" issue. IMHO if such `update-index` is needed, it should have been done by git-annex automagically somehow/sometime.
"""]]

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2022-09-21T19:19:08Z"
content="""
So you can reproduce this? I am pretty sure it's not as simple as a drop
followed by a get, so more information about reproducing it seems crucial.
I assume you are *not* seeing the "This is only a cosmetic problem affecting git status"
message?
I expect that running `git update-index --refresh .dandi/assets.json`
will fix git status. Can you confirm?
The only way I know of that this can happen without the message is if a
drop or a get is still running, or gets interrupted. One of the last things
git-annex does before exiting is restage all the unlocked files that it has
updated.
Short of that, it seems like it would have to be a bug that prevents
restagePointerFile from working. Which might not be a bug in git-annex,
if the problem involves git's handling of timestamps in the index, for
example. (Which is known to have some odd behaviors.)
(git-annex could be improved to do the
restaging later when interrupted and possibly after such a bug.
But there's no way to make it recover in `git status`, because
git doesn't run it in this situation.)
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2022-09-21T22:06:49Z"
content="""
Seems likely that the --time-limit option, when combined with -J,
could result in git-annex exiting before a worker thread gets a chance to
call stagePointerFile. I have not verified this, and it would be unlikely
to result in the same file being affected reproducibly.
"""]]

@ -0,0 +1,33 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2022-09-22T01:03:18Z"
content="""
Maybe it could be one of those options, but in my case it is just a straight `get` on that single unlocked file:
```
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
nothing to commit, working tree clean
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ cat .dandi/assets.json
/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex get .dandi/assets.json
get .dandi/assets.json (from dandi-dandisets-dropbox...)
(checksum...) ok
(recording state in git...)
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.
Changes not staged for commit:
(use \"git add <file>...\" to update what will be committed)
(use \"git restore <file>...\" to discard changes in working directory)
modified: .dandi/assets.json
no changes added to commit (use \"git add\" and/or \"git commit -a\")
```
"""]]

@ -0,0 +1,58 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2022-09-22T01:33:24Z"
content="""
sorry I have not mentioned your [earlier comment 4](http://git-annex.branchable.com/bugs/reports_file___34__modified__34___whenever_it_is_not/#comment-ca0281ff580c91c40e429fbbb71a3791) but my clarification above I think gives the answers to your questions ;)
<details>
<summary>FWIW here is the get --debug output </summary>
```shell
[2022-09-21 21:29:59.904218] (Utility.Process) process [3968193] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"--stage\",\"-z\",\"--error-unmatch\",\"--\",\".dandi/assets.json\"]
[2022-09-21 21:29:59.904725] (Utility.Process) process [3968194] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
[2022-09-21 21:29:59.905645] (Utility.Process) process [3968195] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
[2022-09-21 21:29:59.906012] (Utility.Process) process [3968196] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"]
[2022-09-21 21:29:59.907578] (Utility.Process) process [3968196] done ExitSuccess
[2022-09-21 21:29:59.907891] (Utility.Process) process [3968197] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2022-09-21 21:29:59.913611] (Utility.Process) process [3968197] done ExitSuccess
[2022-09-21 21:29:59.914676] (Utility.Process) process [3968198] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..5f5efa8544ff02c9261dd1590425dcea37a55526\",\"--pretty=%H\",\"-n1\"]
[2022-09-21 21:29:59.916707] (Utility.Process) process [3968198] done ExitSuccess
[2022-09-21 21:29:59.916968] (Utility.Process) process [3968199] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..18497e6e9cab7754a85256416c361fee36ba65b2\",\"--pretty=%H\",\"-n1\"]
[2022-09-21 21:29:59.918722] (Utility.Process) process [3968199] done ExitSuccess
[2022-09-21 21:29:59.919069] (Utility.Process) process [3968200] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
get .dandi/assets.json [2022-09-21 21:29:59.921463] (Utility.Process) process [3968202] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"]
(from dandi-dandisets-dropbox...) [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]
[2022-09-21 21:29:59.93162] (Annex.TransferrerPool) > d rdandi-dandisets-dropbox SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json .dandi/assets.json
[2022-09-21 21:29:59.942599] (Annex.TransferrerPool) < opb
[2022-09-21 21:29:59.942718] (Annex.TransferrerPool) < ops 69507227
[2022-09-21 21:30:03.103409] (Annex.TransferrerPool) < ope
[2022-09-21 21:30:03.103539] (Annex.TransferrerPool) < om (checksum...)
(checksum...) [2022-09-21 21:30:03.768599] (Annex.TransferrerPool) < t
[2022-09-21 21:30:03.768843] (Annex.Branch) read 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
[2022-09-21 21:30:03.770259] (Annex.Branch) set 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
ok
[2022-09-21 21:30:03.770361] (Utility.Process) process [3968200] done ExitSuccess
[2022-09-21 21:30:03.770425] (Utility.Process) process [3968195] done ExitSuccess
[2022-09-21 21:30:03.770484] (Utility.Process) process [3968194] done ExitSuccess
[2022-09-21 21:30:03.770531] (Utility.Process) process [3968193] done ExitSuccess
[2022-09-21 21:30:03.771187] (Utility.Process) process [3968452] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"hash-object\",\"-w\",\"--stdin-paths\",\"--no-filters\"]
[2022-09-21 21:30:03.77319] (Utility.Process) process [3968453] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
[2022-09-21 21:30:04.063182] (Utility.Process) process [3968453] done ExitSuccess
[2022-09-21 21:30:04.063779] (Utility.Process) process [3968463] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2022-09-21 21:30:04.065352] (Utility.Process) process [3968463] done ExitSuccess
(recording state in git...)
[2022-09-21 21:30:04.06587] (Utility.Process) process [3968464] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
[2022-09-21 21:30:04.407935] (Utility.Process) process [3968464] done ExitSuccess
[2022-09-21 21:30:04.408528] (Utility.Process) process [3968468] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"56c62dcc21145201f9454a2dd6e75cc37f072ee4\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\"]
[2022-09-21 21:30:04.410591] (Utility.Process) process [3968468] done ExitSuccess
[2022-09-21 21:30:04.413623] (Utility.Process) process [3968469] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"c3a1f9208649b47621b1424b055bd9871aa2fc79\"]
[2022-09-21 21:30:04.415318] (Utility.Process) process [3968469] done ExitSuccess
[2022-09-21 21:30:04.416301] (Utility.Process) process [3968202] done ExitSuccess
[2022-09-21 21:30:04.416574] (Utility.Process) process [3968452] done ExitSuccess
[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1
```
</details>
"""]]


@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2022-09-22T17:02:04Z"
content="""
I've fixed the issue I found with --timestamp combined with -J, which I do
think could have resulted in the same kind of problem. But you've shown
that is not the cause in your case.
"""]]


@ -0,0 +1,19 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2022-09-22T17:04:35Z"
content="""
Thanks for the --debug. It shows that git-annex is not running
`git update-index --refresh` at all.

And it shows that the transfer happens in a `git-annex transferrer` process.
So, I think you have annex.stalldetection set.

    [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]

And interestingly, that transferrer process fails at the end:

    [2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1

Aha! I can reproduce it by setting annex.stalldetection.
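To check for that setting, something like the following rough sketch may help. The helper name `show_stalldetection` is made up for illustration; it only asks git for `annex.stalldetection` and the per-remote `remote.<name>.annex-stalldetection` setting, and assumes it is run inside the affected repository.

```shell
# Hypothetical helper: list any stall detection settings that git can see
# from the current directory (repository, global and system config).
show_stalldetection() {
    # git config --get-regexp exits nonzero when no key matches,
    # so the fallback message is printed only in that case.
    git config --get-regexp '^(annex\.stalldetection|remote\..*\.annex-stalldetection)$' \
        || echo 'stall detection is not configured'
}
```

Running `show_stalldetection` prints one `key value` line per matching setting, or the fallback message when nothing is set.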
"""]]


@ -0,0 +1,72 @@
### Please describe the problem.
NB: I can't change the title, but this is not about Depends, since libgcc-s1 is essential. Most likely some LD_LIBRARY_PATH manipulation is in place, or something like that.
[Testing of git-annex-remote-rclone on ubuntu-20.04 crashed](https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3750292044/jobs/6370225718) with
```
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
libgcc_s.so.1 must be installed for pthread_cancel to work
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 124: 3066 Aborted (core dumped) git-annex copy -J5 --quiet . --to GA-rclone-CI
Error: Process completed with exit code 134.
```
Installation of git-annex:
```
Run datalad-installer --sudo ok git-annex -m datalad/git-annex:release
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-j8s29if7.sh
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Installing git-annex via datalad/git-annex:release
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Version: None
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Downloading https://github.com/datalad/git-annex/releases/download/10.20221212/git-annex-standalone_10.20221212-1.ndall%2B1_amd64.deb
2022-12-21T15:10:33+0000 [INFO ] datalad_installer Running: sudo dpkg -i /tmp/tmpah14ch03/git-annex-standalone_10.20221212-1.ndall+1_amd64.deb
Selecting previously unselected package git-annex-standalone.
(Reading database ... 236921 files and directories currently installed.)
Preparing to unpack .../git-annex-standalone_10.20221212-1.ndall+1_amd64.deb ...
Unpacking git-annex-standalone (10.20221212-1~ndall+1) ...
Setting up git-annex-standalone (10.20221212-1~ndall+1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for man-db (2.10.2-1) ...
2022-12-21T15:10:35+0000 [INFO ] datalad_installer git-annex is now installed at /usr/bin/git-annex
```
Or maybe this is an issue with `rclone`? In this case it was installed with
```
Run datalad-installer --sudo ok rclone=v1.59.2 -m downloads.rclone.org
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-aon5z6_f.sh
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Installing rclone from downloads.rclone.org
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Version: v1.59.2
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Bin dir: /usr/local/bin
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Man dir: None
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Downloading https://downloads.rclone.org/v1.59.2/rclone-v1.59.2-linux-amd64.zip
2022-12-21T15:10:38+0000 [INFO ] datalad_installer Moving /tmp/tmp75sde__c/rclone-v1.59.2-linux-amd64/rclone to /usr/local/bin/rclone
2022-12-21T15:10:38+0000 [INFO ] datalad_installer rclone is now installed at /usr/local/bin/rclone
```
I have tried to reproduce this locally with exactly those installations of rclone and git-annex, but I am not getting the same problem :-/
I have also run it with `--debug` and got
```
[2022-12-21 17:20:10.056928113] (Utility.Process) process [11603] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","c95a5c849daca7183eefc28c360942104d01e900","--no-gpg-sign","-p","refs/heads/git-annex"]
[2022-12-21 17:20:10.060448661] (Utility.Process) process [11603] done ExitSuccess
[2022-12-21 17:20:10.060806165] (Utility.Process) process [11604] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","248cef615747c4aba64fbb475b0a03c8d2a78b27"]
[2022-12-21 17:20:10.063957208] (Utility.Process) process [11604] done ExitSuccess
[2022-12-21 17:20:10.066005436] (Utility.Process) process [11127] done ExitSuccess
[2022-12-21 17:20:10.066266539] (Utility.Process) process [11114] done ExitSuccess
[2022-12-21 17:20:10.066702845] (Utility.Process) process [11126] done ExitSuccess
[2022-12-21 17:20:10.067107151] (Utility.Process) process [11125] done ExitSuccess
[2022-12-21 17:20:10.067357854] (Utility.Process) process [11599] done ExitSuccess
libgcc_s.so.1 must be installed for pthread_cancel to work
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 125: 11083 Aborted (core dumped) git-annex drop -J5 --debug .
Error: Process completed with exit code 134.
```
in https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3751417971/jobs/6372374929 .
Any ideas Joey?
[[!meta author=yoh]]
[[!tag projects/dandi]]
> [[fixed|done]] --[[Joey]]


@ -0,0 +1,23 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2022-12-22T18:38:32Z"
content="""
I'm a bit surprised git-annex is using `pthread_cancel`, since `strings`
does not show it contains that symbol. Perhaps one of the other pthread
symbols it uses ends up calling that.

It does seem though from the message that it's git-annex and not a program
it runs that is core dumping on this. Also I checked, and the rclone you
installed is a statically linked binary so I would not expect it to use
`libgcc_s.so`. And git-annex-remote-rclone is a bash script, and bash
doesn't use pthreads.

(I do think that, in general, using the git-annex standalone tarball and
then trying to run additional programs besides git-annex inside it is not
going to always work well. Standalone interposes its own versions of libraries,
which may not work with the other programs. There is already a todo about that,
[[todo/restore_original_environment_when_running_external_special_remotes_from_standalone_git-annex__63__]].)

I've added `libgcc_s.so.1` to the standalone build.
"""]]