move old fixed datalad/dandi/repronim bugs to the project pages
This is to cut down on the number of files in bugs/, which makes it slow to file new bug reports or update active bug reports. These old bugs were about 1/3rd of the files in there.

These projects want lists of their old bugs to still be accessible, and have the lists on their project pages, which will still list the old bugs.

Commands used:

    for f in $(git grep -l '\[\[!tag projects/dandi\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/dandi/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/dandi/bugs-done; fi; fi; done
    for f in $(git grep -l '\[\[!tag projects/repronim\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/repronim/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/repronim/bugs-done; fi; fi; done
    for f in $(git grep -l '\[\[!tag projects/datalad\]\]'); do if grep -q 'done\]\]' "$f"; then git mv "$f" ../projects/datalad/bugs-done; g=$(echo "$f" | sed 's/.mdwn//'); if [ -d "$g" ]; then git mv "$g" ../projects/datalad/bugs-done; fi; fi; done

That assumes that bugs are not tagged by multiple projects at the same time. Of the ones I moved, I've checked and none are.

Could do the same with todo/ but there are only 370 files in there, and less than 84 of them could be moved this way, which does not seem likely to produce a sizeable speedup.

Sponsored-by: Dartmouth College's Datalad project
parent 946fc20165
commit bcc69f07e8
1011 changed files with 4 additions and 4 deletions
|
@ -21,6 +21,6 @@ DANDI: Distributed Archives for Neurophysiology Data Integration is a platform f
 <details>
 <summary>Done</summary>

-[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]]
+[[!inline pages="(bugs/* or projects/dandi/bugs-done/*) and !bugs/done and link(bugs/done) and tagged(projects/dandi)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]]

 </details>

@ -0,0 +1,49 @@
|
|||
### Please describe the problem.
|
||||
|
||||
The original complaints are mentioned in the comments of the [importfeed page](https://git-annex.branchable.com/git-annex-importfeed/): when using `addurl`, even when the server provides a Content-Disposition field with the filename, git-annex seems to take that filename value and obfuscate it (replacing '-' with '_' etc) instead of using what is supposed to be the original filename. (BTW -- no Content-Disposition was mentioned in the --debug output.)
|
||||
|
||||
|
||||
[[!format sh """
|
||||
$> mkdir /tmp/testrepo; cd /tmp/testrepo; git init; git annex init;
|
||||
mkdir: cannot create directory ‘/tmp/testrepo’: File exists
|
||||
E: could not determine git repository root
|
||||
Initialized empty Git repository in /tmp/testrepo/.git/
|
||||
init ok
|
||||
(recording state in git...)
|
||||
|
||||
$> git annex addurl --fast https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
|
||||
addurl https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download (to sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb) ok
|
||||
(recording state in git...)
|
||||
|
||||
$> ls -l
|
||||
total 4
|
||||
lrwxrwxrwx 1 yoh yoh 184 May 7 17:02 sub_mouse_AAYYT_ses_20180420_sample_2_slice_20180420_slice_2_cell_20180420_sample_2.nwb -> .git/annex/objects/Gj/9z/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923/URL-s9335000--https&c%%girder.dandiarchive.org-48163bc503cb7181516be86ef215f923
|
"""]]
|
||||
|
||||
whenever the original Content-Disposition had "-" in the filename, which is perfectly safe in a filename AFAIK:
|
||||
|
||||
[[!format sh """
|
||||
$> wget -S https://girder.dandiarchive.org/api/v1/item/5e9f9588b5c9745bad9f58ff/download
|
||||
... bunch of forwards to the final one with the content disposition field
|
||||
Resolving dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)... 52.219.101.51
|
||||
Connecting to dandiarchive.s3.amazonaws.com (dandiarchive.s3.amazonaws.com)|52.219.101.51|:443... connected.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 200 OK
|
||||
x-amz-id-2: VgJE1jV5XUkBQXZDWgR5WEDfmHJp4Fj6fGo6z2tYkLfyTsxDWC+m92B2qOSVppCuiRFu2QpNV5M=
|
||||
x-amz-request-id: 1221CAC30E3931CF
|
||||
Date: Thu, 07 May 2020 21:02:52 GMT
|
||||
Last-Modified: Wed, 22 Apr 2020 00:54:32 GMT
|
||||
ETag: "acf3b4f5951435245a0efcd4a518e77d"
|
||||
Content-Disposition: attachment; filename="sub-mouse-AAYYT_ses-20180420-sample-2_slice-20180420-slice-2_cell-20180420-sample-2.nwb"
|
||||
...
|
||||
|
||||
$> git annex version
|
||||
git-annex version: 7.20190708+git9-gfa3524b95-1~ndall+1
|
||||
|
||||
"""]]
|
||||
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[done]] --[[Joey]]
|
|
@ -0,0 +1,42 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2020-05-08T16:50:14Z"
|
||||
content="""
|
||||
This is due to the filename being passed through sanitizeFilePath.
|
||||
|
||||
There are security concerns here. If the filename contains "../"
|
||||
it absolutely has to be modified, or the command would have to fail and
|
||||
refuse to import it.
|
||||
|
||||
If the filename contains an ANSI escape sequence, it could potentially
|
||||
lead to a security hole. Or if the filename starts with "-" it could be
|
||||
somewhere between a possible security hole and just very annoying to work
|
||||
with. As could a filename that contains a newline, which will
|
||||
break large quantities of shell pipelines. While generally git repos can
|
||||
have these problems with files in them too, the exposure seems larger when
|
||||
talking to some random web server than when pulling from a repo.
|
||||
|
||||
Also, cross filesystem compatibility is a concern. It used to allow "|" in
|
||||
the filename, but a bug pointed out that cannot be used on fat filesystems.
|
||||
And "\\" means different things on linux and windows, so probably best to avoid
|
||||
filenames containing it on linux too.
|
||||
|
||||
Finally, it's somewhat opinionated, since it replaces spaces with
|
||||
underscores. That's certainly the least defensible thing.
|
||||
|
||||
(git-annex may also truncate the filename if it's longer than what the
|
||||
filesystem supports.)
|
||||
|
||||
So, it's clearly wrong that it should be taken as-is without obfuscation,
|
||||
IMHO. Maybe there's a way to improve it to meet some use case though.
|
||||
|
||||
I could see having a config that avoids sanitizing the filename, but
|
||||
makes addurl fail if the filename looks like a security problem.
|
||||
|
||||
Though that has the downside that git-annex would then need to
|
||||
comprehensively track, going forward, all the ways that people find to make
|
||||
filenames be a security problem; the current method, by being strict in
|
||||
what it lets through, probably limits exploits to ones involving a) unicode
|
||||
or b) the user's wetware.
|
||||
"""]]
|
|
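The concerns raised in the comment above (path traversal via "../", escape sequences, newlines, a leading "-", and characters such as "|" and "\\" that are awkward across filesystems) can be illustrated with a small whitelist-based sanitizer. This is only a sketch under stated assumptions: `sanitize_filename` is a hypothetical shell helper, not git-annex's actual `sanitizeFilePath`, and the set of allowed characters is chosen here purely for illustration. It leaves a non-leading "-" alone.

```shell
# Hypothetical helper for illustration; NOT git-annex's sanitizeFilePath.
sanitize_filename() {
    local name="$1"
    # Replace every byte outside a conservative whitelist with "_".
    # This covers "/", spaces, newlines, escape characters, "|" and "\".
    name=$(printf '%s' "$name" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_')
    # A leading "-" could be taken for an option and a leading "."
    # hides the file, so replace only the first character in those cases.
    case $name in
        [-.]*) name="_${name#?}" ;;
    esac
    # Never return an empty name.
    [ -n "$name" ] || name=_
    printf '%s\n' "$name"
}
```

Being strict about what is let through means the helper does not need a list of known-dangerous inputs; the cost is that harmless characters outside the whitelist are rewritten too.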
@ -0,0 +1,26 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2020-05-08T18:19:20Z"
|
||||
content="""
|
||||
`git-annex import` does not do any sanitization, and that could be
|
||||
considered inconsistent, particularly when importing from a remote like S3.
|
||||
|
||||
A difference with that is, it creates a remote tracking branch for the
|
||||
imported files. (That happens to avoid "../" path traversal because git
|
||||
generally avoids it.) Maybe the real difference is, import from a special
|
||||
remote is completely analogous to fetching from a git remote. So it feels
|
||||
different to me than adding an url does.
|
||||
|
||||
If I sync with an S3 bucket and it turns out it imported an escape sequence
|
||||
file, well I could have looked at the bucket first, or imported and
|
||||
reviewed the branch before merging it. And if I was syncing with a git
|
||||
remote the same thing could happen. So it feels like I should have no
|
||||
expectation git-annex would protect me. Whereas, if I add an url and the
|
||||
web server uses an obscure-ish http header to surprise me with a similar
|
||||
malicious filename, I had no way beforehand to know that would happen, and
|
||||
so it does feel like git-annex should protect me.
|
||||
|
||||
(Although if git did prevent that, git-annex should too, and I'd be
|
||||
fine with git preventing that.)
|
||||
"""]]
|
|
@ -0,0 +1,18 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2020-05-08T19:56:27Z"
|
||||
content="""
|
||||
Implemented git-annex addurl --preserve-filename, which will do what you
|
||||
want.
|
||||
|
||||
Leaving this bug open because I only implemented it for web urls, not yet
|
||||
for torrents and other special remotes that have their own url scheme.
|
||||
The sanitization for those is currently done at a lower level than addurl,
|
||||
and so that will take a bit more work to implement.
|
||||
|
||||
(importfeed does not, I think, need to implement this option, because
|
||||
the filenames are based on information from the rss feed, and it's
|
||||
perfectly fine to sanitize eg a podcast episode title to get a reasonable
|
||||
filename.)
|
||||
"""]]
|
|
@ -0,0 +1,20 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 4"
|
||||
date="2020-05-09T22:10:43Z"
|
||||
content="""
|
||||
> If the filename contains an ANSI escape sequence, it could potentially lead to a security hole.
|
||||
> ... As could a filename that contains a newline, which will break large quantities of shell pipelines.
|
||||
|
||||
IMHO those indeed are ok to target for sanitization
|
||||
|
||||
> Or if the filename starts with \"-\" it could be somewhere between a possible security hole and just very annoying to work with.
|
||||
|
||||
So why not sanitize it only at the beginning of the filename?
|
||||
`-` is a very common and a safe character to use within filename. For that matter we VERY frequently use `-` in filenames. It even became part of our BIDS standard in neuroimaging: https://bids-specification.readthedocs.io where we separate `_key` from `value`, e.g.in ` . I really do not see why git-annex should so aggressively sanitize filenames as replacing \"-\" within filenames -- it makes nothing more secure or convenient.
|
||||
|
||||
> While generally git repos can have these problems with files in them too, the exposure seems larger when talking to some random web server than when pulling from a repo.
|
||||
|
||||
Well, not sure about ansi characters and new line symbols, but typically files are saved by the browsers with the name suggested by the server.
|
||||
"""]]
|
|
@ -0,0 +1,15 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2020-05-11T17:20:07Z"
|
||||
content="""
|
||||
I agree that it may as well allow non-leading '-'. But, if you are relying
|
||||
on getting the unsanitized filename generally, you should use
|
||||
--preserve-filename
|
||||
|
||||
Web browsers do do some sanitization, particularly of '/'.
|
||||
Chrome removes leading "." as well. Often files are downloaded
|
||||
without the user confirming it. I suspect there is enough insecurity
|
||||
in that area that someone could make a living injecting bitcoin miners into
|
||||
dotfiles.
|
||||
"""]]
|
|
@ -0,0 +1,6 @@
|
|||
While running `git-annex addurl --batch --with-files --jobs 10 --json --json-error-messages --json-progress --raw`, I occasionally run into files that fail to download for no discernable reason, and the `"error-messages"` key in the output from the command is an empty list. This makes it hard to figure out exactly why the download is failing.
|
||||
|
||||
[[!meta author=jwodder]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,16 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2021-10-27T16:23:52Z"
|
||||
content="""
|
||||
Is it reproducible with a particular url? Does it only happen with -J?
|
||||
|
||||
Version would also be good to know. There were recent relevant
|
||||
changes eg [[!commit 4f42292b13dc5a6664eeb19b5c9d48991eaef292]].
|
||||
|
||||
I've spent a while hunting for a code path where it fails without
|
||||
displaying a warning, and have not found one. Since the code in addurl
|
||||
is structured as return Nothing and hopefully display a warning
|
||||
beforehand, rather than as throw an error, it's certainly possible that
|
||||
happens.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="jwodder"
|
||||
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
|
||||
subject="comment 2"
|
||||
date="2021-10-27T18:16:43Z"
|
||||
content="""
|
||||
It appears that the problem occurs whenever one tries to download the same URL to two different paths at the same time. When this occurs, one of the downloads fails, and though its \"error-messages\" is empty, its \"notes\" field reads, \"transfer already in progress, or unable to take transfer lock\".
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="jwodder"
|
||||
avatar="http://cdn.libravatar.org/avatar/b06e01332c949b895c681cc92934f36a"
|
||||
subject="comment 3"
|
||||
date="2021-10-27T18:19:23Z"
|
||||
content="""
|
||||
As to your questions, I am using git-annex 8.20211011 on macOS 11.6. The problem does not occur when the `--jobs` option is omitted, but that's not viable for the current project we're using git-annex for.
|
||||
"""]]
|
|
@ -0,0 +1,14 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-10-27T18:40:48Z"
|
||||
content="""
|
||||
Aha, that makes sense! addurl constructs a url-based Key to use while
|
||||
downloading, and the key transfer machinery prevents redundant downloads
|
||||
of the same Key at the same time.
|
||||
|
||||
Arguably, the problem is not where the message gets put, but that
|
||||
it fails when adding an url to two different paths at the same time.
|
||||
|
||||
I have, though, moved that message so it will appear in error-messages.
|
||||
"""]]
|
|
@ -0,0 +1,26 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2021-10-27T18:56:23Z"
|
||||
content="""
|
||||
The best solution I can find is for it to notice when another thread is
|
||||
downloading the same url, and wait until it finishes. Then proceed
|
||||
with downloading the url for a second time.
|
||||
|
||||
It's not very satisfying to re-download. But once the url Key is downloaded,
|
||||
it does not keep that url Key populated, but hashes the content and moves
|
||||
the content to the final Key. It would be a real complication to
|
||||
communicate, across threads, what Key the content ended up at, and have the
|
||||
waiting thread use that. And addurl is already complicated well beyond a
|
||||
point I am comfortable with.
|
||||
|
||||
Also, the content of an url can of course change over time. If I feed
|
||||
"$url foo" into git-annex addurl --batch -J10 and then some time
|
||||
later, I feed "$url bar", I might expect that file bar gets whatever
|
||||
content the url has now, not the content that the url had back when I added
|
||||
the same url to file foo. And if I cared about avoiding re-downloading,
|
||||
I could add the url to the first file, and then copy the annex link to the
|
||||
second file myself.
|
||||
|
||||
Implemented this approach.
|
||||
"""]]
|
|
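The approach described in the last comment above (notice that the same url is already being downloaded, wait for that to finish, then download it again for the second file) can be sketched with a per-url lock. This is a rough analogy using shell processes rather than git-annex's threads; `download_waiting` and `fetch_url` are hypothetical names, and `fetch_url` is a stand-in that writes a placeholder instead of performing a real transfer.

```shell
# Hypothetical stand-in for the real transfer; it only writes a placeholder.
fetch_url() {
    printf 'content of %s\n' "$1" > "$2"
}

# Download url to dest, waiting while another process handles the same url.
download_waiting() {
    local url="$1" dest="$2" lockdir status
    # One lock directory per url; mkdir is atomic, so only one caller wins.
    # (Assumption: a stale lock left by a crashed run is cleaned up by hand.)
    lockdir="${TMPDIR:-/tmp}/addurl-sketch-$(printf '%s' "$url" | cksum | cut -d' ' -f1).lock"
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 0.1   # another download of this url is in progress; wait
    done
    # The earlier download is not reused: fetch the url again for this dest.
    fetch_url "$url" "$dest"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

Running `download_waiting "$url" foo &` and `download_waiting "$url" bar &` followed by `wait` leaves both files populated, at the cost of fetching the url twice, which matches the trade-off the comment describes.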
@ -0,0 +1,21 @@
|
|||
### Please describe the problem.
|
||||
|
||||
This is a continuation to the [prior report/discussion](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-424548e59fc41618ffeeb65f418694b3) to facilitate access to private repositories on public hosting portals.
|
||||
|
||||
Setting aside the more odd/custom behavior of gitlab etc installations, which forward to a login screen (thus no 401 or 404 response) upon an attempt to access something which might be within a private repo, the situation with github and gogs (a github clone), which powers gin (which I had [mentioned](https://git-annex.branchable.com/bugs/leaks_git_config_error_message_upon_inability_to_read_downloaded___34__config__34___file/#comment-ec2193d97bb19945ad74cee13f747b35) in that prior discussion), is different: they return a 404 response. And I think (I didn't check the git code, this is just based on its behavior) `git` then asks for credentials as the "next way to try". I think git-annex should do the same -- if a 404 is received, ask `git credential` to fill for that domain (as it would do now in case of 401).
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
Try to clone and get data from a private repository on [https://gin.g-node.org/](https://gin.g-node.org/) (repo could be created, or let me know and I would create one, but you would still need to register there). I am not yet 100% certain that upon authentication you would be able to fetch that `/config` (haven't tried). Satellite issue/discussion I just initiated on gin is [here](https://github.com/G-Node/gogs/issues/111)
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
8.20201127+git54-ga1b227171-1~ndall+1
|
||||
|
||||
|
||||
edit 1: although probably a deeper look into how/why git decides to ask for credentials for private repos might be due. Maybe a similar check should be done by git-annex first, since otherwise there might be no way to tell this apart from a "proper" 404 for inability to get `/config` from github
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[notabug|done]] --[[Joey]]
|
|
@ -0,0 +1,17 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2021-01-21T16:57:06Z"
|
||||
content="""
|
||||
The git source code does not appear to behave
|
||||
like that, see http.c `normalize_curl_result`, which reauths on 401, but
|
||||
not on 404. If you think git behaves like this, you need to show an example
|
||||
where it clearly accesses an url that is 404 and goes on to authenticate.
|
||||
|
||||
Seems to me that these hosting sites may simply not be exposing foo.git/config
|
||||
to http. Git does not request that file over http. Such a hosting site would
|
||||
probably also not expose foo.git/annex/ over http, so git-annex would not be
|
||||
able to use it anyway. To support git-annex, it would need to
|
||||
expose both, and then git-annex's handling of 401 should work fine for
|
||||
authentication.
|
||||
"""]]
|
|
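The distinction drawn in the comment above, re-authenticating on 401 but not on 404, can be written down as a small decision table. This is a simplified illustration, not a transcription of git's `normalize_curl_result`; `auth_action` and the action names it prints are hypothetical.

```shell
# Hypothetical decision table: what a client does with an HTTP status code.
auth_action() {
    case $1 in
        2??) echo ok ;;        # success, nothing more to do
        401) echo reauth ;;    # ask for credentials and retry the request
        404) echo notfound ;;  # report "not found"; do NOT ask for credentials
        *)   echo error ;;     # anything else is treated as a plain failure
    esac
}
```

A status code to feed into it can be obtained with, for example, `curl -s -o /dev/null -w '%{http_code}' "$url"`, where `$url` is whatever endpoint is being probed.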
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2021-01-21T18:36:50Z"
|
||||
content="""
|
||||
a quick one: https://gin.g-node.org/ does expose `foo.git/annex/` -- that is what gin has extended the original gogs with. Example repo to try: https://gin.g-node.org/ljchang/Sherlock . The problem/difficulty is only in access to \"private\" repositories -- access to config and annexed files is working fine through http
|
||||
"""]]
|
|
@ -0,0 +1,22 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2021-01-21T19:20:00Z"
|
||||
content="""
|
||||
It still seems easy to demonstrate that git does not ask for creds on 404:
|
||||
|
||||
joey@darkstar:~> git clone http://google.com/this-url-does-not-exist
|
||||
Cloning into 'this-url-does-not-exist'...
|
||||
fatal: repository 'http://google.com/this-url-does-not-exist/' not found
|
||||
|
||||
So I need you to show me what makes you think that git does such a strange
|
||||
thing, before I can take seriously a request to replicate that behavior in
|
||||
git-annex. Because the only possible reason I would implement such an
|
||||
insane thing is if git has lost its collective mind and so I needed to
|
||||
follow into the abyss.
|
||||
|
||||
If the actual issue is that gogs has implemented support for git-annex,
|
||||
but that it sends 404 when git-annex requests config from a
|
||||
private repo, rather than 401, it seems to me the place to fix that is in
|
||||
gogs.
|
||||
"""]]
|
|
@ -0,0 +1,112 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 4"
|
||||
date="2021-01-22T01:47:46Z"
|
||||
content="""
|
||||
yeap, it is not about 404 ...
|
||||
|
||||
<details>
|
||||
<summary>with gogs/gin situation is obscure but \"easyish\" - 401 is returned upon access to `/info/refs` but not above:</summary>
|
||||
|
||||
```shell
|
||||
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs\"
|
||||
--2021-01-21 20:37:22-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info/refs
|
||||
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
|
||||
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 401 Unauthorized
|
||||
Date: Fri, 22 Jan 2021 01:37:23 GMT
|
||||
Server: Apache/2.4.38 (Debian)
|
||||
content-type: text/plain
|
||||
www-authenticate: Basic realm=\".\"
|
||||
content-length: 0
|
||||
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
|
||||
set-cookie: gnode_gin=823b677f19feb8ef; Path=/; HttpOnly
|
||||
set-cookie: _csrf=GrekbiqDJleLLNcVyax5z77buGY6MTYxMTI3OTQ0MzYwMTMyMzE4NQ; Path=/; Expires=Sat, 23 Jan 2021 01:37:23 GMT
|
||||
Keep-Alive: timeout=5, max=100
|
||||
Connection: Keep-Alive
|
||||
|
||||
Username/Password Authentication Failed.
|
||||
1 51975 ->6 [2].....................................:Thu 21 Jan 2021 08:37:23 PM EST:.
|
||||
(git)lena:~/proj/misc/git[master]git
|
||||
$> wget -S \"https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info\"
|
||||
--2021-01-21 20:37:52-- https://gin.g-node.org/SakshamSharda/ophys_testing1.git/info
|
||||
Resolving gin.g-node.org (gin.g-node.org)... 141.84.41.219
|
||||
Connecting to gin.g-node.org (gin.g-node.org)|141.84.41.219|:443... connected.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 404 Not Found
|
||||
Date: Fri, 22 Jan 2021 01:37:53 GMT
|
||||
Server: Apache/2.4.38 (Debian)
|
||||
content-type: text/html; charset=UTF-8
|
||||
set-cookie: lang=en-US; Path=/; Max-Age=2147483647
|
||||
set-cookie: gnode_gin=26d42c5108c8715d; Path=/; HttpOnly
|
||||
set-cookie: _csrf=SAKUL4rdspufTb_lxEWIijnzYBU6MTYxMTI3OTQ3Mjk5MDczODgzMA; Path=/; Expires=Sat, 23 Jan 2021 01:37:52 GMT
|
||||
Keep-Alive: timeout=5, max=100
|
||||
Connection: Keep-Alive
|
||||
Transfer-Encoding: chunked
|
||||
2021-01-21 20:37:53 ERROR 404: Not Found.
|
||||
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
|
||||
github is ... trickier, or to say -- my C/gdb/whatever foo is not good enough, since
|
||||
|
||||
<details>
|
||||
<summary>it is still 404 with simple wget but git remote-https seems to get 401:</summary>
|
||||
|
||||
```shell
|
||||
(gdb) p results
|
||||
$15 = {curl_result = CURLE_HTTP_RETURNED_ERROR, http_code = 401, auth_avail = 1, http_connectcode = 0}
|
||||
(gdb) p rl
|
||||
No symbol \"rl\" in current context.
|
||||
(gdb) p url
|
||||
$16 = 0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\"
|
||||
(gdb) bt
|
||||
#0 http_request (url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\",
|
||||
result=<optimized out>, target=<optimized out>, options=0x7fffffffd920) at http.c:1981
|
||||
#1 0x00005555555665bf in http_request_reauth (
|
||||
url=0x5555557a4450 \"https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack\", result=0x7fffffffd880,
|
||||
target=0, options=0x7fffffffd920) at http.c:2040
|
||||
#2 0x000055555555f7f3 in discover_refs (service=<optimized out>, service@entry=0x5555556b622c \"git-upload-pack\",
|
||||
for_push=for_push@entry=0) at remote-curl.c:493
|
||||
#3 0x000055555556137e in get_refs (for_push=<optimized out>) at remote-curl.c:548
|
||||
#4 cmd_main (argc=argc@entry=3, argv=argv@entry=0x7fffffffdcd8) at remote-curl.c:1523
|
||||
#5 0x000055555555ee94 in main (argc=3, argv=0x7fffffffdcd8) at common-main.c:52
|
||||
|
||||
```
|
||||
|
||||
```
|
||||
$> wget --header \"Git-Protocol: version=2\" --header \"Pragma: no-cache\" -S 'https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack'
|
||||
--2021-01-21 20:41:21-- https://github.com/yarikoptic/abcd-testds2/info/refs?service=git-upload-pack
|
||||
Resolving github.com (github.com)... 140.82.114.3
|
||||
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 404 Not Found
|
||||
Server: GitHub.com
|
||||
Date: Fri, 22 Jan 2021 01:41:21 GMT
|
||||
Content-Type: text/plain; charset=utf-8
|
||||
Status: 404 Not Found
|
||||
Vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
|
||||
Cache-Control: no-cache
|
||||
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
|
||||
X-Frame-Options: deny
|
||||
X-Content-Type-Options: nosniff
|
||||
X-XSS-Protection: 1; mode=block
|
||||
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
|
||||
Expect-CT: max-age=2592000, report-uri=\"https://api.github.com/_private/browser/errors\"
|
||||
Content-Security-Policy: default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'
|
||||
Set-Cookie: _gh_sess=UoF3mYOvfYf5mFbK1tr7aWOuYpQbNoJVhajA5nr2ANUvg%2FekQjtgh0h3xLva0EcwHnLNNsl7VMEdVLXNGi9Yn4AbjrBxX0sdo51DL1XQYR%2Bm3ZeS71I7keexEnrZspp%2FQxaT7cJpceXr7ZrKg2HwJu8dMo%2Bcz13Vr%2F9p7MtZ6cIjUMMF3ql8GX%2BYO949RdgS31KNBb1Ln917v7GlLaZhbejgGAYJOFI2YMuWhs3WkZxOZCMy1JnW%2Bbp3OcdyffBt0ToaKaLcUx1mt6kzzOb4Ow%3D%3D--FD5dTEIs8HUBjIdH--P%2B86pTRJ%2FwWUndICVXAaNA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
|
||||
Set-Cookie: _octo=GH1.1.1513753117.1611279681; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; Secure; SameSite=Lax
|
||||
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Sat, 22 Jan 2022 01:41:21 GMT; HttpOnly; Secure; SameSite=Lax
|
||||
Content-Length: 9
|
||||
X-GitHub-Request-Id: 8F40:2881:CD3AD3:1222997:600A2D41
|
||||
2021-01-21 20:41:21 ERROR 404: Not Found.
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
but overall the point is that git does seem to get 401 with auth availability (although I failed to dig out how exactly it gets it). So I will leave it to the experts to figure out how
|
||||
"""]]
|
|
@ -0,0 +1,29 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2021-01-22T18:36:50Z"
|
||||
content="""
|
||||
These possibilities seem about equally likely to me:
|
||||
|
||||
1. gogs has not implemented authed access to the files git-annex needs
|
||||
for private repositories
|
||||
2. gogs has a bug where it returns 404 rather than 401 when not authed,
|
||||
but serves the files up when authed.
|
||||
|
||||
So why try to work around it in git-annex when it's a coin flip whether
|
||||
git-annex can at all, when in either case there's clearly a bug in gogs,
|
||||
and is specifically in code in gogs that is intended to support git-annex?
|
||||
|
||||
github has a bad habit of using user-agent to make urls do different
|
||||
things when git accesses them than when other http clients do. That is the
|
||||
case in your example; use wget -U git/1 and it will 401. But I don't
|
||||
see how that's relevant, since git-annex does not talk to github except for
|
||||
a) via git and b) via its git-lfs implementation (which supports http basic
|
||||
auth although I can't remember if I tested it against github's server or only
|
||||
other servers like gitlab).
|
||||
|
||||
If github's lfs endpoint did do user-agent sniffing, IMHO that would
|
||||
violate their spec, but also yeah, I'd probably put in some appropriately
|
||||
snarky fake user-agent in git-annex there. But not in general, and none of
|
||||
this says git-annex should be treating 404 like 401.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 6"
|
||||
date="2021-01-25T15:18:39Z"
|
||||
content="""
|
||||
THANK YOU Joey. That is indeed quite odd (\"security through obscurity\") behavior from github (note: github returns 401 even if that repo does not exist, so it is at least consistent in not revealing presence/absence of private repos at a url). Feel welcome to close this issue since I guess nothing should indeed be done on git-annex side, and ideally `gin` portal just returns 401 in such cases
|
||||
"""]]
|
|
@ -0,0 +1,11 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 7"""
|
||||
date="2021-01-28T16:37:59Z"
|
||||
content="""
|
||||
github's rationale for the sniffing, such as it is, is that an url to a
|
||||
git repository lets you view it in the web ui, and the same url can be
|
||||
cloned by git.
|
||||
|
||||
Agreed, I'll close this in git-annex, and they can fix it in gin.
|
||||
"""]]
|
|
@ -0,0 +1,88 @@
|
|||
### Please describe the problem.
|
||||
|
||||
decided to test annex on a new to me file system -- beegfs
|
||||
|
||||
```
|
||||
$> mount | grep beegfs
|
||||
beegfs_nodev on /mnt/beegfs type beegfs (rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev)
|
||||
|
||||
```
|
||||
|
||||
```
|
||||
$> modinfo beegfs
|
||||
filename: /lib/modules/5.4.0-77-generic/updates/fs/beegfs_autobuild/beegfs.ko
|
||||
version: 7.2.2
|
||||
alias: fs-beegfs
|
||||
author: Fraunhofer ITWM, CC-HPC
|
||||
description: BeeGFS parallel file system client (http://www.beegfs.com)
|
||||
license: GPL v2
|
||||
srcversion: 533BB7E5866E52F63B9ACCB
|
||||
depends: ib_core,rdma_cm
|
||||
retpoline: Y
|
||||
name: beegfs
|
||||
vermagic: 5.4.0-77-generic SMP mod_unload modversions
|
||||
|
||||
```
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
1. get beegfs
|
||||
|
||||
2.
|
||||
```
|
||||
leviathan:/mnt/beegfs/yoh/tmp
|
||||
$> TMPDIR=$PWD/annex-tmp git annex test
|
||||
```
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
```
|
||||
leviathan:/mnt/beegfs/yoh/tmp
|
||||
$> git annex version
|
||||
git-annex version: 8.20210621-g91f9aac
|
||||
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
|
||||
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
|
||||
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
|
||||
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
|
||||
operating system: linux x86_64
|
||||
supported repository versions: 8
|
||||
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
|
||||
```
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
Looking in detail -- it seems it is not init but addurl that fails (the subject is set in stone now, can't edit) -- I guess I got misled by the interleaved stdout/stderr:
|
||||
|
||||
[[!format sh """
|
||||
addurl: FAIL (2.79s)
|
||||
Init Tests
|
||||
init: ./Test/Framework.hs:57:
|
||||
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo96/myurl failed (transcript follows)
|
||||
(to _mnt_beegfs_yoh_tmp_.t_tmprepo96_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo96%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
|
||||
|
||||
...
|
||||
addurl: FAIL (1.86s)
|
||||
./Test/Framework.hs:57:
|
||||
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo193/myurl failed (transcript follows)
|
||||
(to _mnt_beegfs_yoh_tmp_.t_tmprepo193_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
|
||||
Init Tests
|
||||
...
|
||||
addurl: FAIL (2.29s)
|
||||
./Test/Framework.hs:57:
|
||||
addurl on file:///mnt/beegfs/yoh/tmp/.t/tmprepo293/myurl failed (transcript follows)
|
||||
(to _mnt_beegfs_yoh_tmp_.t_tmprepo293_myurl) git-annex: .git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo293%myurl: renameFile:renamePath:rename: resource busy (Device or resource busy)failedaddurl: 1 failed
|
||||
|
||||
3 out of 984 tests failed (1776.96s)
|
||||
|
||||
"""]]
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
on days ending with `y` it seems to work quite nicely.
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]], I think, though have not installed beegfs to test.
|
||||
> --[[Joey]]
|
|
@ -0,0 +1,23 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2021-07-02T14:26:34Z"
|
||||
content="""
|
||||
    EBUSY  The rename fails because oldpath or newpath is a directory that is
           in use by some process (perhaps as current working directory, or as
           root directory, or because it was open for reading) or is in use by
           the system (for example as mount point), while the system considers
           this an error. (Note that there is no requirement to return EBUSY in
           such cases -- there is nothing wrong with doing the rename anyway --
           but it is allowed to return EBUSY if the system cannot otherwise
           handle such situations.)
|
||||
|
||||
".git/annex/tmp/URL-s3--file&c%%%mnt%beegfs%yoh%tmp%.t%tmprepo193%myurl"
|
||||
is not a directory, it is a file. So, rename seems to have no business failing
|
||||
in this way. Probably the FS is buggy.
|
||||
"""]]
|
|
@ -0,0 +1,24 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2021-07-04T03:27:20Z"
|
||||
content="""
|
||||
Thank you Joey! Indeed, most likely it is a \"too fancy\" file system.
|
||||
|
||||
On [https://www.beegfs.io/release/beegfs_6/Changelog.txt](https://www.beegfs.io/release/beegfs_6/Changelog.txt) I found
|
||||
|
||||
|
||||
```
|
||||
== Changes in 6.11 (release date: 2017-05-26) ==
|
||||
|
||||
General Changes:
|
||||
|
||||
* client: Add option sysRenameEbusyAsXdev to return EXDEV instead of EBUSY if
|
||||
rename() is called on open files. (Tools like \"mv\" can handle EXDEV as return
|
||||
value.)
|
||||
```
|
||||
|
||||
Do you think EXDEV would be handled OK if that is the culprit? (Meanwhile I will let the beegfs users know as well; maybe they could try it.)
|
||||
|
||||
"""]]
|
|
@ -0,0 +1,50 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2021-07-05T16:18:39Z"
|
||||
content="""
|
||||
I've checked with strace, to see if the file was open while it was being
|
||||
renamed. Not that there is anything generally wrong with renaming an open
|
||||
file on a POSIX file system, but it would possibly be a problem on windows,
|
||||
where some forms of opening a file locks it in place. And apparently
|
||||
this filesystem is not trying to be very POSIX either.
|
||||
|
||||
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 17
|
||||
413026 write(17, "hi\n", 3) = 3
|
||||
413026 close(17) = 0
|
||||
...
|
||||
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 11
|
||||
413026 read(11, "hi\n", 8192) = 3
|
||||
...
|
||||
413026 openat(AT_FDCWD, ".git/annex/tmp/URL-s3--file&c%%%tmp%foo", O_RDONLY|O_NOCTTY|O_NONBLOCK <unfinished ...>
|
||||
413028 <... futex resumed>) = 0
|
||||
413026 <... openat resumed>) = 16
|
||||
...
|
||||
413026 read(16, "hi\n", 32752) = 3
|
||||
...
|
||||
413026 close(16) = 0
|
||||
...
|
||||
413026 rename(".git/annex/tmp/URL-s3--file&c%%%tmp%foo", "_tmp_foo") = 0
|
||||
...
|
||||
413028 close(11) = 0
|
||||
|
||||
So the file is left open across the rename, which ought to be able to be
|
||||
changed and would presumably fix the problem.
|
||||
|
||||
It's also a bit odd that the file gets read twice after being copied,
|
||||
once for checksum makes sense, but what's the other one?
|
||||
(Copying while checksumming should be able to avoid one of the reads,
|
||||
but there is an open todo tracking progress on that.)
|
||||
|
||||
Aah, the other read is when it's probing if the file is html in case it ought
|
||||
to be passed off to youtube-dl. That is the read that lingers for a while,
|
||||
because it's done with a lazy readFile and probing if the file is html doesn't
|
||||
read to the end and close it, so the file handle lingers until the GC gets
|
||||
around to closing it. Of course youtube-dl won't be able to do anything with a
|
||||
file url, but git-annex doesn't know that. And anyway the failure on this
|
||||
filesystem would also happen when adding a http url.
|
||||
|
||||
Ok, fixed it to close the handle promptly. That should fix the test suite.
|
||||
It does not seem unlikely that something else will break due to this
|
||||
filesystem's unusual behavior though.
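
For anyone who wants to check whether a given filesystem tolerates this pattern without running the whole test suite, here is a minimal sketch. It only mimics the open-across-rename sequence from the strace above; the function name `rename_while_open` and the use of fd 9 are arbitrary choices for illustration, not anything git-annex does.

```shell
# Rename a file while a read-only descriptor on it is still open, the
# pattern visible in the strace above. Prints "ok" or "failed".
# The body runs in a subshell, so fd 9 (an arbitrary choice) and the
# variables do not leak into the caller.
rename_while_open() (
    dir=$1
    printf 'hi\n' > "$dir/src" || return 2
    exec 9< "$dir/src"                 # leave a reader open across the rename
    if mv "$dir/src" "$dir/dst" 2>/dev/null; then
        echo ok
    else
        echo failed
        return 1
    fi
)
```

Running `rename_while_open /mnt/beegfs/some/scratch/dir` (the path is a placeholder) should print `ok` on filesystems that permit renaming open files, and would presumably print `failed` where rename returns EBUSY.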
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-07-05T17:17:59Z"
|
||||
content="""
|
||||
Also looked over other uses of readFile. While there are a couple that
|
||||
don't read the whole file and so may have a lag closing, none of them are
|
||||
files that are used in ways that seem likely to trigger this kind of
|
||||
problem.
|
||||
"""]]
|
|
@ -0,0 +1,28 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Probably it is more of a todo than a bug.
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
This is a use-case where I am trying to establish a special remote to be shared by multiple unrelated repositories.
|
||||
|
||||
So I had original repo1 in which I
|
||||
|
||||
- created an external special remote with chunking, it got UUID1
|
||||
- uploaded some data (all got chunked)
|
||||
|
||||
created repo2 in which I
|
||||
|
||||
- initialized special remote with identical settings and provided `uuid=UUID1`
|
||||
- decided to test if annex would be able to get a key from the shared special remote
|
||||
|
||||
But `annex fsck --key KEY --from remote --fast`, since it has no recorded chunk list, just asks the special remote for the original full key only, which is obviously not found, so it reports failure. But I wondered -- couldn't `git-annex` just use the chunk size and "mint" the possible chunk keys to test on the special remote, since it has all the information? After all, chunk keys AFAIK are minted deterministically and are pretty much just the original key "augmented" with `-S<chunksize>-C<chunkindex>`.
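
To illustrate what I mean by minting, here is a rough sketch. It assumes that the `-S<chunksize>-C<chunkindex>` fields go right before the `--` separator of the key, that chunk indexes start at 1, and that the key carries a `-s<size>` field; I have not verified these details against the git-annex source, and `mint_chunk_keys` is just a name made up for this example.

```shell
# Print the chunk keys that a key of known size would be split into.
# Assumptions (not verified against git-annex): the chunk fields are
# inserted before the "--" separator, and chunk indexes start at 1.
# The subshell body keeps the variables local to the function.
mint_chunk_keys() (
    key=$1 chunksize=$2
    [ "$chunksize" -gt 0 ] || { echo "chunk size must be positive" >&2; return 1; }
    fields=${key%%--*}    # everything before the first "--", e.g. SHA256E-s10
    name=${key#*--}       # everything after the first "--"
    # Extract the number following "-s"; empty if the key records no size.
    size=$(printf '%s\n' "$fields" | sed -n 's/^[^-]*-s\([0-9][0-9]*\).*$/\1/p')
    [ -n "$size" ] || { echo "key has no size field: $key" >&2; return 1; }
    count=$(( (size + chunksize - 1) / chunksize ))   # ceiling division
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s-S%s-C%s--%s\n' "$fields" "$chunksize" "$i" "$name"
        i=$((i + 1))
    done
)
```

For example, `mint_chunk_keys SHA256E-s10--abc.dat 4` would list three candidate chunk keys that could then be checked for presence on the remote. A key without a recorded size cannot be handled this way.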
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
8.20200908+git175-g95d02d6e2-1~ndall+1
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[done]] --[[Joey]]
|
|
@ -0,0 +1,26 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2020-10-22T16:09:17Z"
|
||||
content="""
|
||||
Note that what you are trying to do will only work if the special remote
|
||||
is not encrypted.
|
||||
|
||||
As well as your use case, which seems very unusual, I think one other use
|
||||
case would be if a clone uploaded to the special remote, but never synced
|
||||
out its git-annex branch before being lost, and fsck --from
|
||||
remote is being run in another clone to reconstruct it. Currently it
|
||||
won't try chunks as none are recorded.
|
||||
|
||||
Speculatively trying the current remote's chunk config would handle the
|
||||
majority of cases, though wouldn't help if the other clone had adjusted the
|
||||
special remote's chunk size too.
|
||||
|
||||
There's some overhead, but it can check it last, and not check it if
|
||||
it's in the list of known chunks, so the overhead would only usually
|
||||
be paid if the content git-annex expected to be present had gone missing,
|
||||
which I think is rare enough to not care about.
|
||||
|
||||
(Also, this can only be done when the size of the key is known, so not
|
||||
eg addurl --relaxed keys.)
|
||||
"""]]
|
|
@ -0,0 +1,29 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2020-10-22T17:00:25Z"
|
||||
content="""
|
||||
Implemented that. But..
|
||||
|
||||
As implemented, there's nothing to make the chunk size get stored in the
|
||||
chunk log for a key, after it accesses its content using the configured
|
||||
chunk size.
|
||||
|
||||
So, changing the chunk= of the remote can prevent accessing content that
|
||||
was accessible before. Of course, avoiding that is why chunk sizes are
|
||||
logged in the first place.
|
||||
|
||||
Seems like maybe fsck --from should fix the chunk log? I think
|
||||
fsck would always need to be used, to fix up the location log, before any
|
||||
other commands rely on the data being in the special remote, so it seems
|
||||
fine to only fix the chunk log there.
|
||||
|
||||
But, also a bit unclear how fsck would find out when it needs to do this.
|
||||
It only needs to when the remote's configured chunk size is not
|
||||
listed in the chunk log. But that's also common after changing the chunk
|
||||
size of a remote. So it would have to mess around with checking the
|
||||
presence of chunk keys itself, which would be extra work and also ugly
|
||||
to implement.
|
||||
|
||||
I'm leaving this todo^Wbug open for now due to this.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2020-10-22T17:36:12Z"
|
||||
content="""
|
||||
Ok, made it update the chunk log as needed while checking if chunks are
|
||||
present. So this is done.
|
||||
"""]]
|
|
@ -0,0 +1,119 @@
|
|||
### Please describe the problem.
|
||||
|
||||
I was trying to follow https://git-annex.branchable.com/special_remotes/git-lfs/ (only without any encryption), to store at least some data on github via LFS (e.g., for https://github.com/dandi-datasets/nwb_test_data).
|
||||
|
||||
Even though I do provide the URL to the `annex initremote` call, it is not stored in `remote.log`:
|
||||
|
||||
|
||||
[[!format sh """
|
||||
$> sudo rm -rf /tmp/testds2 && ( mkdir /tmp/testds2 && cd /tmp/testds2 && git init && git annex init && git annex initremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git show git-annex:remote.log; )
|
||||
Initialized empty Git repository in /tmp/testds2/.git/
|
||||
init (scanning for unlocked files...)
|
||||
ok
|
||||
(recording state in git...)
|
||||
initremote gh-lfs ok
|
||||
(recording state in git...)
|
||||
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 autoenable=true encryption=none name=gh-lfs type=git-lfs timestamp=1570642576.06742667s
|
||||
|
||||
"""]]
|
||||
|
||||
git annex 7.20190912-1~ndall+1
|
||||
|
||||
|
||||
If I just proceed, populate and copy some data via lfs (example uses datalad's `create-sibling-github` to create a new repo):
|
||||
|
||||
[[!format sh """
|
||||
$> ( cd /tmp/testds2 && touch 123 && git annex add 123 && git commit -m 'add 123' && datalad create-sibling-github -s origin testds2 && git push -u origin master && git annex copy --to=gh-lfs 123; git push origin git-annex; )
|
||||
add 123
|
||||
ok
|
||||
(recording state in git...)
|
||||
[master (root-commit) d2b2f52] add 123
|
||||
1 file changed, 1 insertion(+)
|
||||
create mode 120000 123
|
||||
[WARNING] Authentication failed using a token.
|
||||
.: origin(-) [https://github.com/yarikoptic/testds2.git (git)]
|
||||
'https://github.com/yarikoptic/testds2.git' configured as sibling 'origin' for <Dataset path=/tmp/testds2>
|
||||
Enumerating objects: 3, done.
|
||||
Counting objects: 100% (3/3), done.
|
||||
Delta compression using up to 4 threads
|
||||
Compressing objects: 100% (2/2), done.
|
||||
Writing objects: 100% (3/3), 307 bytes | 307.00 KiB/s, done.
|
||||
Total 3 (delta 0), reused 0 (delta 0)
|
||||
To github.com:yarikoptic/testds2.git
|
||||
* [new branch] master -> master
|
||||
Branch 'master' set up to track remote branch 'master' from 'origin'.
|
||||
copy 123 (to gh-lfs...)
|
||||
ok
|
||||
(recording state in git...)
|
||||
Enumerating objects: 19, done.
|
||||
Counting objects: 100% (19/19), done.
|
||||
Delta compression using up to 4 threads
|
||||
Compressing objects: 100% (15/15), done.
|
||||
Writing objects: 100% (19/19), 1.66 KiB | 567.00 KiB/s, done.
|
||||
Total 19 (delta 4), reused 0 (delta 0)
|
||||
remote: Resolving deltas: 100% (4/4), done.
|
||||
remote:
|
||||
remote: Create a pull request for 'git-annex' on GitHub by visiting:
|
||||
remote: https://github.com/yarikoptic/testds2/pull/new/git-annex
|
||||
remote:
|
||||
To github.com:yarikoptic/testds2.git
|
||||
* [new branch] git-annex -> git-annex
|
||||
|
||||
"""]]
|
||||
|
||||
on a new clone I get a complaint that `url=` is missing, and no data is fetched
|
||||
|
||||
[[!format sh """
|
||||
$> sudo rm -rf testds2-clone && git clone git@github.com:yarikoptic/testds2.git testds2-clone && ( cd testds2-clone && git annex init && git annex get 123; )
|
||||
Cloning into 'testds2-clone'...
|
||||
remote: Enumerating objects: 22, done.
|
||||
remote: Counting objects: 100% (22/22), done.
|
||||
remote: Compressing objects: 100% (13/13), done.
|
||||
remote: Total 22 (delta 5), reused 21 (delta 4), pack-reused 0
|
||||
Receiving objects: 100% (22/22), done.
|
||||
Resolving deltas: 100% (5/5), done.
|
||||
123@
|
||||
init (merging origin/git-annex into git-annex...)
|
||||
(recording state in git...)
|
||||
(scanning for unlocked files...)
|
||||
Invalid command: 'git-annex-shell 'configlist' '/~/yarikoptic/testds2.git''
|
||||
You appear to be using ssh to clone a git:// URL.
|
||||
Make sure your core.gitProxy config option and the
|
||||
GIT_PROXY_COMMAND environment variable are NOT set.
|
||||
|
||||
Remote origin does not have git-annex installed; setting annex-ignore
|
||||
|
||||
This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin
|
||||
(Auto enabling special remote gh-lfs...)
|
||||
|
||||
Specify url=
|
||||
ok
|
||||
(recording state in git...)
|
||||
get 123 (not available)
|
||||
Try making some of these repositories available:
|
||||
92ce3cfc-8c58-42db-8aa3-ea4d4b3a6011 -- yoh@hopa:/tmp/testds2
|
||||
c9132e68-e9d8-40b5-ba34-5d60a8b9c844 -- gh-lfs
|
||||
|
||||
(Note that these git remotes have annex-ignore set: origin)
|
||||
failed
|
||||
git-annex: get: 1 failed
|
||||
"""]]
|
||||
|
||||
so I had to `enableremote` it while providing the URL, after which I was able to `get` the file:
|
||||
|
||||
[[!format sh """
|
||||
$> git annex enableremote gh-lfs autoenable=true type=git-lfs url=git@github.com:yarikoptic/testds2.git encryption=none && git annex get 123
|
||||
enableremote gh-lfs ok
|
||||
(recording state in git...)
|
||||
get 123 (from gh-lfs...)
|
||||
(checksum...) ok
|
||||
(recording state in git...)
|
||||
"""]]
|
||||
|
||||
|
||||
Shouldn't that URL be recorded in remote.log? (similarly to `type=git` remotes)
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[done]]; see my comment --[[Joey]]
|
|
@ -0,0 +1,24 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2019-10-21T19:07:42Z"
|
||||
content="""
|
||||
That is intentional, because a git-lfs remote can have multiple urls that
|
||||
can access it, and different users of the remote might want to use
|
||||
different urls.
|
||||
|
||||
It's also documented to work that way, the same as the directory
|
||||
special remote documents that you have to provide directory= each time it's
|
||||
enabled.
|
||||
|
||||
But, now that git-annex supports sameas remotes, it would be possible to
|
||||
have one special remote for each different url to a given git-lfs remote,
|
||||
and have git-annex know they're the same repository. The user can then
|
||||
enableremote whichever one they want.
|
||||
|
||||
See [[todo/git-lfs_special_remote_simpler_setup]] for where I hope this
|
||||
will lead.
|
||||
|
||||
Closing this bug report as redundant with that todo item, and not actually a
|
||||
bug since it is documented to behave the way it currently behaves.
|
||||
"""]]
|
|
@ -0,0 +1,49 @@
|
|||
### Please describe the problem.
|
||||
|
||||
I am trying to import (and then reimport) a directory into which I sync a box.com folder that was shared with me.
|
||||
I have used the `--duplicate` option to not delete the original files upon `import`. But then, upon re-running the `import` command, git-annex errors out if the file already exists. `--reinject-duplicates` seems to be the option to use, but all those modes are "exclusive", so I cannot use `--duplicate --reinject-duplicates`, and using `--reinject-duplicates` alone would result in removing the original files (as without `--duplicate`).
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
7.20190819+git2-g908476a9b-1~ndall+1
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
My little demo snippet for import using `--duplicate`, and then both options at the same time:
|
||||
|
||||
[[!format sh """
|
||||
$> mkdir /tmp/d-in /tmp/d-repo && touch /tmp/d-in/file && ( cd /tmp/d-repo && git init && git annex init && for r in 1 2; do echo "Run $r"; ls -l ../d-in && git annex import --duplicate ../d-in/.; done )
|
||||
Initialized empty Git repository in /tmp/d-repo/.git/
|
||||
init ok
|
||||
(recording state in git...)
|
||||
Run 1
|
||||
total 0
|
||||
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
|
||||
import ./file ok
|
||||
(recording state in git...)
|
||||
Run 2
|
||||
total 0
|
||||
-rw------- 1 yoh yoh 0 Oct 14 10:51 file
|
||||
import ./file
|
||||
not overwriting existing ./file (is a symlink)
|
||||
failed
|
||||
git-annex: import: 1 failed
|
||||
|
||||
|
||||
$> cd d-repo
|
||||
$> git annex import ../d-in/. --reinject-duplicates --duplicate 2>&1 | head -n 3
|
||||
Invalid option `--duplicate'
|
||||
|
||||
Usage: git-annex COMMAND
|
||||
|
||||
"""]]
|
||||
|
||||
|
||||
Or maybe there is a better way to establish a re-runnable workflow for importing from a directory?
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
[[!tag moreinfo]]
|
||||
|
||||
> [[done]] --[[Joey]]
|
|
@ -0,0 +1,13 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2019-11-19T17:12:41Z"
|
||||
content="""
|
||||
I think that you can accomplish what you want by making the directory
|
||||
you're importing from be a directory special remote with exporttree=yes
|
||||
importtree=yes and use the new `git annex import master --from remote`.
|
||||
|
||||
If that does not do what you want, I'd prefer to look at making it be able
|
||||
to do so. I hope to eventually remove the legacy git-annex import from
|
||||
directory, since we have this new more general interface.
|
||||
"""]]
|
|
@ -0,0 +1,7 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2020-03-30T15:50:17Z"
|
||||
content="""
|
||||
Tagged moreinfo since I'm waiting on a reply to my suggestion.
|
||||
"""]]
|
|
@ -0,0 +1,59 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 3"
|
||||
date="2020-10-06T01:26:59Z"
|
||||
content="""
|
||||
I think it worked wonderfully
|
||||
|
||||
<details>
|
||||
<summary>here is my script I have tried</summary>
|
||||
|
||||
```shell
|
||||
#!/bin/bash
|
||||
|
||||
export PS4='> '
|
||||
set -x
|
||||
set -eu
|
||||
cd \"$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)\"
|
||||
|
||||
mkdir d-in d-repo
|
||||
echo content >| d-in/file
|
||||
|
||||
function dance() {
|
||||
git annex import master --from d-in
|
||||
# but we need to merge it
|
||||
git merge d-in/master
|
||||
ls -l
|
||||
grep -e . *
|
||||
}
|
||||
|
||||
(
|
||||
cd d-repo
|
||||
git init
|
||||
git annex init
|
||||
git annex initremote d-in type=directory directory=../d-in exporttree=yes importtree=yes encryption=none
|
||||
|
||||
ls -l ../d-in
|
||||
|
||||
for r in 1 2; do
|
||||
echo \"Run $r\";
|
||||
dance
|
||||
done
|
||||
|
||||
echo \"more\" >> ../d-in/file
|
||||
echo \"new\" > ../d-in/newfile
|
||||
dance
|
||||
|
||||
rm ../d-in/file
|
||||
dance
|
||||
|
||||
)
|
||||
|
||||
```
|
||||
</details>
|
||||
|
||||
and it seemed to do the right job! I have not tried adding a `.gitattributes` to the branch it imports into, to tell some files to go to git, but I hope it would just work, and if not -- I will come back! Feel free to close this issue.
|
||||
|
||||
Cheers
|
||||
"""]]
|
|
@ -0,0 +1,70 @@
|
|||
### Please describe the problem.
|
||||
|
||||
|
||||
The [original question raised by John](https://github.com/dandi/dandisets/issues/139#issuecomment-1149948239) is what led me on this goose chase.
|
||||
|
||||
The following reproducer
|
||||
|
||||
```
|
||||
#!/bin/bash
|
||||
|
||||
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
|
||||
set -eux
|
||||
|
||||
git init --bare remote
|
||||
( cd remote; git annex init; cat config )
|
||||
rpath=$PWD/remote
|
||||
|
||||
git init repo
|
||||
cd repo
|
||||
git annex init
|
||||
echo 'This is test text.' > file.txt
|
||||
git add file.txt
|
||||
git commit -m Init file.txt
|
||||
|
||||
git remote add --fetch remote-git $rpath
|
||||
|
||||
# without this -- there is no annex-uuid for remote -- git-annex branch is not getting merged
|
||||
git annex info
|
||||
|
||||
cat .git/config
|
||||
|
||||
# but this still fails
|
||||
git annex initremote testremote type=git location=$rpath autoenable=true
|
||||
|
||||
```
|
||||
|
||||
ends with
|
||||
|
||||
```
|
||||
[remote "remote-git"]
|
||||
url = /home/yoh/.tmp/dl-VjO0aSF/remote
|
||||
fetch = +refs/heads/*:refs/remotes/remote-git/*
|
||||
annex-uuid = afdc6d54-cd6d-4a20-b639-a639f9c7ef09
|
||||
+ git annex initremote testremote type=git location=/home/yoh/.tmp/dl-VjO0aSF/remote autoenable=true
|
||||
initremote testremote
|
||||
git-annex: could not find existing git remote with specified location
|
||||
failed
|
||||
initremote: 1 failed
|
||||
|
||||
```
|
||||
|
||||
so
|
||||
|
||||
- The error "could not find existing git remote with specified location" does not seem descriptive of the underlying problem, since the location matches the url. It is still not clear why we can't initremote.
|
||||
- As you can see in the script, `annex info` is needed to have annex-uuid populated, and judging by the [code](https://git.kitenet.net/index.cgi/git-annex.git/tree/Remote/Git.hs?id=af0d854460c28230dc682faa7c6daf3d96698cb6#n110) comment, initremote requires the UUID to be known. If it is not known, ideally there should be a dedicated error message ("remote blah found but lacks uuid, check if remote is annex").
|
||||
- IMHO should not need manual `annex info` to merge git-annex branch
|
||||
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
above
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
10.20220504
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,26 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2022-06-08T16:55:50Z"
|
||||
content="""
|
||||
Hmm, I think this only works for ssh:// urls currently.
|
||||
|
||||
Even the ssh url form host:/path does not work, because it gets
|
||||
normalized to a ssh:// url.
|
||||
|
||||
The implementation does not support non-url's at all; the provided location
|
||||
is treated as an url (`Git.Url location`). And even if it were treated as a
|
||||
path, the path gets normalized to a relative path and an absolute path (or
|
||||
differently relativized path) would not work.
|
||||
|
||||
Using paths with this is rather problematic too, because if the repo is
|
||||
cloned to another machine, it would not find the repo at the recorded path.
|
||||
Similarly, relative paths are also problematic. But it may as well support
|
||||
them to the extent it can.
|
||||
|
||||
I think this needs changes to the core Git data structure, to store the
|
||||
original, unmodified git.remote.path. Or a different interface than the
|
||||
current, one that accepts any repo location and probes it to find the uuid.
|
||||
The latter idea seems better because it simplifies the UI rather than
|
||||
complicating the internal representation.
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2022-06-09T17:04:23Z"
|
||||
content="""
|
||||
Implemented probing of the uuid of the repo location. Which may change
|
||||
how you use this feature. Although the old roundabout method of having an
|
||||
existing git remote and running initremote with the same location will
|
||||
work too, it's not necessary to do that anymore.
|
||||
"""]]
|
|
@ -0,0 +1,13 @@
|
|||
[[!comment format=mdwn
|
||||
username="jkniiv"
|
||||
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
|
||||
subject="comment 2"
|
||||
date="2022-06-09T02:32:18Z"
|
||||
content="""
|
||||
Wouldn't it be possible to support (absolute) file:// urls, eg. something similar to
|
||||
`file:///home/jkniiv/test-VEfBrTZ/remote2`? In my mind they feel like a reasonable approximation
|
||||
of ssh:// urls and could be useful for getting a feel for git special remotes before setting
|
||||
up a bare git-repo/annex on an ssh-server. I know they are not the same thing implementation wise
|
||||
but I feel that being able to try this feature out on a least-effort basis would be useful
|
||||
from a pedagogical standpoint.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2022-06-09T17:28:19Z"
|
||||
content="""
|
||||
Re file:// urls, it does now work to use them in location=. I don't know if
|
||||
I'd consider using them any better than absolute paths though. YMMV.
|
||||
"""]]
|
|
@ -0,0 +1,150 @@
|
|||
[[!meta title="http remotes that require authentication are not yet supported"]]
|
||||
|
||||
It is not a ground shaking issue, but probably would be best to handle it more gracefully.
|
||||
|
||||
Initially noticed while doing an install using datalad. An account/permission is required to access this particular repo; ask the Canadians for access if you don't have it yet, Joey. I guess the credentials were asked for and cached by git upon the initial invocation, so subsequent calls didn't ask for any:
|
||||
|
||||
[[!format sh """
|
||||
$> datalad install https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
|
||||
[INFO ] Cloning https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids [1 other candidates] into '/tmp/Coffey-mri-bids'
|
||||
[INFO ] fatal: bad config line 1 in file /home/yoh/.tmp/git-annex96493-5.tmp
|
||||
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
|
||||
install(ok): /tmp/Coffey-mri-bids (dataset)
|
||||
"""]]
|
||||
|
||||
which boiled down to that message being spat out during `git annex init`, which probes the remote but fails to download the config and instead gets a redirected html page:
|
||||
|
||||
[[!format sh """
|
||||
$> git clone https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
|
||||
Cloning into 'Coffey-mri-bids'...
|
||||
warning: redirecting to https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids.git/
|
||||
remote: Enumerating objects: 398, done.
|
||||
remote: Counting objects: 100% (398/398), done.
|
||||
remote: Compressing objects: 100% (282/282), done.
|
||||
remote: Total 398 (delta 53), reused 393 (delta 48)
|
||||
Receiving objects: 100% (398/398), 34.97 KiB | 795.00 KiB/s, done.
|
||||
Resolving deltas: 100% (53/53), done.
|
||||
|
||||
|
||||
$> git -C Coffey-mri-bids annex init --debug
|
||||
...
|
||||
[2019-11-27 19:27:01.341315979] Request {
|
||||
host = "git.bic.mni.mcgill.ca"
|
||||
port = 443
|
||||
secure = True
|
||||
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/7.20190819+git2-g908476a9b-1~ndall+1")]
|
||||
path = "/bic/Coffey-mri-bids/config"
|
||||
queryString = ""
|
||||
method = "GET"
|
||||
proxy = Nothing
|
||||
rawBody = False
|
||||
redirectCount = 10
|
||||
responseTimeout = ResponseTimeoutDefault
|
||||
requestVersion = HTTP/1.1
|
||||
}
|
||||
|
||||
[2019-11-27 19:27:01.90016181] read: git ["config","--null","--list","--file","/home/yoh/.tmp/git-annex228094-5.tmp"]
|
||||
fatal: bad config line 1 in file /home/yoh/.tmp/git-annex228094-5.tmp
|
||||
[2019-11-27 19:27:01.913302324] process done ExitFailure 128
|
||||
|
||||
Remote origin not usable by git-annex; setting annex-ignore
|
||||
|
||||
$> wget -S https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
|
||||
--2019-11-27 19:29:25-- https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
|
||||
Resolving git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)... 132.216.133.92
|
||||
Connecting to git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)|132.216.133.92|:443... connected.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 302 Found
|
||||
Server: nginx
|
||||
Date: Thu, 28 Nov 2019 00:29:26 GMT
|
||||
Content-Type: text/html; charset=utf-8
|
||||
Content-Length: 109
|
||||
Connection: keep-alive
|
||||
Cache-Control: no-cache
|
||||
Location: https://git.bic.mni.mcgill.ca/users/sign_in
|
||||
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; secure; HttpOnly
|
||||
X-Request-Id: xTcSyu4H36
|
||||
X-Runtime: 0.071681
|
||||
Strict-Transport-Security: max-age=31536000
|
||||
Referrer-Policy: strict-origin-when-cross-origin
|
||||
Location: https://git.bic.mni.mcgill.ca/users/sign_in [following]
|
||||
--2019-11-27 19:29:26-- https://git.bic.mni.mcgill.ca/users/sign_in
|
||||
Reusing existing connection to git.bic.mni.mcgill.ca:443.
|
||||
HTTP request sent, awaiting response...
|
||||
HTTP/1.1 200 OK
|
||||
Server: nginx
|
||||
Date: Thu, 28 Nov 2019 00:29:26 GMT
|
||||
Content-Type: text/html; charset=utf-8
|
||||
Transfer-Encoding: chunked
|
||||
Connection: keep-alive
|
||||
Vary: Accept-Encoding
|
||||
Cache-Control: max-age=0, private, must-revalidate
|
||||
Etag: W/"305857ff0ba591a1e4ee7fec83b5687c"
|
||||
Referrer-Policy: strict-origin-when-cross-origin
|
||||
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; expires=Thu, 28 Nov 2019 02:29:26 -0000; secure; HttpOnly
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Download-Options: noopen
|
||||
X-Frame-Options: DENY
|
||||
X-Permitted-Cross-Domain-Policies: none
|
||||
X-Request-Id: MHFi7Yjxe82
|
||||
X-Runtime: 0.063359
|
||||
X-Ua-Compatible: IE=edge
|
||||
X-Xss-Protection: 1; mode=block
|
||||
Strict-Transport-Security: max-age=31536000
|
||||
Referrer-Policy: strict-origin-when-cross-origin
|
||||
Length: unspecified [text/html]
|
||||
Saving to: ‘config’
|
||||
|
||||
config [ <=> ] 13.19K --.-KB/s in 0s
|
||||
|
||||
2019-11-27 19:29:26 (89.1 MB/s) - ‘config’ saved [13505]
|
||||
|
||||
$> cat config
|
||||
<!DOCTYPE html>
|
||||
<html class="devise-layout-html">
|
||||
<head prefix="og: http://ogp.me/ns#">
|
||||
<meta charset="utf-8">
|
||||
<meta content="IE=edge" http-equiv="X-UA-Compatible">
|
||||
<meta content="object" property="og:type">
|
||||
<meta content="GitLab" property="og:site_name">
|
||||
<meta content="Sign in" property="og:title">
|
||||
...
|
||||
"""]]
|
||||
|
||||
I guess the problem is multi-faceted:
|
||||
|
||||
1. In the case of an authenticated http remote, `git` caches credentials, but `git annex` then tries to download files directly (instead of somehow going via git), so it cannot "sense" that the remote is a valid annex and/or get files from it.
|
||||
|
||||
You can try with this simple one -- user "demo", password "demo":
|
||||
|
||||
[[!format sh """
|
||||
$> git clone http://www.onerussian.com/tmp/secret-repo/.git
|
||||
Cloning into 'secret-repo'...
|
||||
Username for 'http://www.onerussian.com': demo
|
||||
Password for 'http://demo@www.onerussian.com':
|
||||
|
||||
$> git -C secret-repo annex init
|
||||
init (merging origin/git-annex into git-annex...)
|
||||
(recording state in git...)
|
||||
|
||||
Remote origin not usable by git-annex; setting annex-ignore
|
||||
ok
|
||||
(recording state in git...)
|
||||
|
||||
"""]]
|
||||
|
||||
Although the remote is a proper annex, `git annex` indeed cannot use it, since it does not authenticate as git does.
|
||||
So even though the error message is not incorrect, I would say the situation is suboptimal
|
||||
|
||||
2. If the remote server, instead of just returning a 404 or 403 error code (as e.g. GitHub seems to do in similar cases of non-authenticated access), redirects to some login page, annex feeds that page to git as a config, ignores the error, and just marks that remote as ignored for annex, while leaking that obscure "fatal" error message from git.
|
||||
|
||||
IMHO, ideally 1. should be addressed properly (authentication), and for 2. annex should print a more sensible message ("git failed to parse a config file fetched from the remote X. Please inspect it at this /path/config") and keep that file around for debugging. As it is now, I had to dig quite deep to figure out WTF is going on.
|
||||
|
||||
git annex 7.20190819+git2-g908476a9b-1~ndall+1 and the same with bleeding edge 7.20191114+git43-ge29663773-1~ndall+1 (probably that commit is the one with my patch for stricter git versioning, so use the count of 42 ;))
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[done]]; the error message is improved and also git remotes that need
|
||||
> http basic auth to access will get password from `git credential`.
|
||||
> --[[Joey]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="related: shouldn't git annex try external remotes to download config?"
|
||||
date="2019-11-28T01:22:53Z"
|
||||
content="""
|
||||
I haven't tested, but I can see the situation where a specific repository URL could be handled by external special remote (such as datalad, downloaders of which do handle obscure setups such as this one without 403/404 but rather forwarding to login page) which would provide authenticated access to the URL. Would annex even try that config URL via external special remotes?
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2019-11-29T18:09:45Z"
|
||||
content="""
|
||||
one of the use-cases (will be) https://gin.g-node.org/ -- an archive of (primarily) electrophys data. The platform is based on gogs, but uses git-annex underneath. It \"will be\" because currently access to git-annex is provided only via ssh, but as of today it is already possible to `git clone` (tried on public, didn't try private) datasets via https, and developers are looking into exposing git-annex also via http. To access private datasets authentication will need to be handled
|
||||
"""]]
|
|
@ -0,0 +1,31 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2020-01-22T16:04:37Z"
|
||||
content="""
|
||||
git-annex could use `git credential` if the config download fails with
|
||||
401 unauthorized and then retry with the credentials. (The git-lfs special
|
||||
remote already does this.) And it would also need to do the same thing
|
||||
when getting a key from the remote.
|
||||
|
||||
But that would not help with the https://git.bic.mni.mcgill.ca example,
|
||||
apparently, because there's no 401, but a 302 redirect to a 200,
|
||||
that is indistinguishable from a successful download.
|
||||
|
||||
Yeah, when git-annex expects a git config, if it doesn't parse as one,
|
||||
it could retry, asking for credentials.
|
||||
But that seems asking for trouble: what if it fails to parse for
|
||||
another reason, maybe the web server served up something other than the
|
||||
expected config, maybe a captive portal got in the way. There would be a
|
||||
username/password prompt that doesn't make sense to the user at all.
|
||||
|
||||
And if this happens in a key download, git-annex certainly has no way to
|
||||
tell that what it downloaded is not intended as the content of a key,
|
||||
short of verifying the content, and failure to verify certainly doesn't
|
||||
justify prompting for a username/password.
|
||||
|
||||
So, I am not comfortable with falling back to ask for credentials unless
|
||||
I've seen a http status code that indicates they are necessary.
|
||||
And IMHO gitlab's use of a 302 redirect to a login page is a bug in
|
||||
gitlab, and will need to be fixed there, or a better http server used.
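
The policy above can be sketched in a few lines of Python. This is only an illustration, not git-annex code: the function name and the list-of-status-codes input are made up for the example. The point it shows is that an explicit 401 justifies a credential prompt, while a 302 that ends in a 200 gives no such signal.

```python
# Hypothetical helper, not part of git-annex: decide whether a download
# attempt justifies asking `git credential` for a username/password.
def should_ask_for_credentials(status_chain):
    # status_chain lists the HTTP status codes seen, in order,
    # including any redirects that were followed.
    if not status_chain:
        return False
    # Only an explicit 401 Unauthorized says that credentials are needed.
    return status_chain[-1] == 401


# A plainly protected server: the config download fails with 401.
print(should_ask_for_credentials([401]))       # True
# The gitlab case: 302 to a login page that is served with 200.
print(should_ask_for_credentials([302, 200]))  # False
```

With this rule a captive portal or any other unexpected 200 response never triggers a prompt that would make no sense to the user.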
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""re: related: shouldn't git annex try external remotes to download config?"""
|
||||
date="2020-01-22T16:31:16Z"
|
||||
content="""
|
||||
No, the external special remote protocol is not aimed at downloading git
|
||||
config files. Anyway, this code path is never involved with using
|
||||
special remotes; the uuid of a special remote is known and so there is no
|
||||
need to ever download a git config file to discover it.
|
||||
"""]]
|
|
@ -0,0 +1,48 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Maybe not a problem per se, but I decided to check whether this is expected. Following [this advice](http://git-annex.branchable.com/todo/git_smudge_clean_interface_suboptiomal/#comment-65f848510d8684bf65c6698f68b700dd) I have `git config filter.annex.process "git-annex filter-process"` set in that git-annex repo and now observe the following tree (in htop) of processes:
|
||||
|
||||
```
|
||||
3799768 dandi 20 0 1025G 191M 40616 S 6.6 0.3 0:31.87 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
|
||||
3799796 dandi 20 0 191M 5088 4680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
|
||||
3805272 dandi 20 0 6892 3420 2992 S 0.0 0.0 0:00.27 │ │ │ ├─ /bin/bash /usr/bin/git-annex-remote-rclone
|
||||
3805640 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:02.82 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
|
||||
3805646 dandi 20 0 20432 13044 4036 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
|
||||
3805650 dandi 20 0 31900 4064 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805685 dandi 20 0 30144 4000 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805704 dandi 20 0 30144 16076 15792 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805705 dandi 20 0 30144 3976 3728 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805717 dandi 20 0 30144 15968 15680 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805781 dandi 20 0 30144 3980 3724 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805786 dandi 20 0 30144 4068 3820 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805807 dandi 20 0 30144 16028 15744 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805808 dandi 20 0 30144 3884 3636 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805828 dandi 20 0 30144 4008 3764 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3805848 dandi 20 0 20432 13104 4092 S 0.0 0.0 0:00.04 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
|
||||
3805852 dandi 20 0 20432 12948 3940 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. check-ignore -z --stdin --verbose --non-matching
|
||||
3805865 dandi 20 0 20432 13032 4024 S 0.0 0.0 0:00.02 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
|
||||
3806054 dandi 20 0 30144 4004 3752 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3806066 dandi 20 0 45216 5108 4700 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
|
||||
3806067 dandi 20 0 30144 3888 3640 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3806068 dandi 20 0 30144 16032 15748 S 0.0 0.0 0:00.01 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3806095 dandi 20 0 30144 4060 3816 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3806104 dandi 20 0 20432 12928 3916 S 0.0 0.0 0:00.06 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs check-attr -z --stdin annex.backend annex.largefiles annex.numcopies annex.mincopies --
|
||||
3806110 dandi 20 0 30144 15944 15660 S 0.0 0.0 0:00.02 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
3804258 dandi 20 0 1024G 44336 37772 S 0.0 0.1 0:00.04 │ │ ├─ git-annex addurl --batch --with-files --jobs 5 --json --json-error-messages --json-progress --raw
|
||||
3804277 dandi 20 0 40844 5124 4740 S 0.0 0.0 0:00.00 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
|
||||
3805399 dandi 20 0 1024G 23508 20844 S 0.0 0.0 0:00.61 │ │ ├─ git-annex examinekey --batch --migrate-to-backend=SHA256E
|
||||
3805493 dandi 20 0 1024G 36516 26184 S 0.0 0.1 0:01.51 │ │ ├─ git-annex fromkey --force --batch --json --json-error-messages
|
||||
3805503 dandi 20 0 25788 5120 4712 S 0.0 0.0 0:00.00 │ │ │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
|
||||
3805510 dandi 20 0 12472 3984 3732 S 0.0 0.0 0:00.05 │ │ │ └─ git --git-dir=.git --work-tree=. --literal-pathspecs hash-object -w --stdin-paths --no-filters
|
||||
```
|
||||
|
||||
which might be OK, but I still wonder why there are more of them sleeping there than the `--jobs` number. git annex 10.20220624-g769be12
|
||||
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
|
||||
> [[done]]; this is now handled like other git helper processes
|
||||
> and will be capped to the maximum of the number of jobs or cpu cores,
|
||||
> and in practice usually fewer than that will be started. --[[Joey]]
|
|
@ -0,0 +1,16 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2022-07-25T20:37:55Z"
|
||||
content="""
|
||||
I was able to reproduce this by feeding 10 urls into git-annex addurl
|
||||
-J5 and got 7 hash-object processes running.
|
||||
|
||||
filter.annex.process has nothing to do with this. I reproduced the behavior
|
||||
without it set.
|
||||
|
||||
Seems like a simple concurrency issue, where each thread potentially starts
|
||||
its own hash-object handle, and there can be around 2x as many threads
|
||||
started as the -J number due to job stages. Annex.Concurrent sets up pools of
|
||||
handles for other similar git processes, but not hash-object.
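
A rough Python sketch of what such a pool does, assuming nothing about how Annex.Concurrent is actually written: `Handle`, `HandlePool` and `run_tasks` are hypothetical names, and `Handle` merely stands in for a long-running `git hash-object` process. Threads borrow a handle and return it, so the number of helpers started never exceeds the cap no matter how many threads exist.

```python
import queue
import threading


# Hypothetical stand-in for a long-running `git hash-object` helper.
class Handle:
    def __init__(self, ident):
        self.ident = ident


class HandlePool:
    # Hands out at most `limit` handles; once the cap is reached,
    # threads wait for a free handle instead of starting their own.
    def __init__(self, limit):
        self.limit = limit
        self.started = 0
        self.idle = queue.Queue()
        self.lock = threading.Lock()

    def acquire(self):
        try:
            return self.idle.get_nowait()
        except queue.Empty:
            pass
        with self.lock:
            if self.started < self.limit:
                self.started += 1
                return Handle(self.started)
        # At the cap: block until another thread releases a handle.
        return self.idle.get()

    def release(self, handle):
        self.idle.put(handle)


def run_tasks(limit, tasks):
    # Start one thread per task; each borrows a handle and returns it.
    pool = HandlePool(limit)

    def worker():
        handle = pool.acquire()
        pool.release(handle)

    threads = [threading.Thread(target=worker) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.started


# 20 threads, but never more than 3 helpers get started.
print(run_tasks(3, 20) <= 3)  # True
```

Often fewer helpers than the cap are started, because a thread that finds an idle handle reuses it.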
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
(Sorry about the title; I was trying to work within the character limit.)
|
||||
|
||||
When invoking `git-annex metadata --batch --json --json-error-messages`, if an error occurs in response to some input — say, because the name of a nonexistent file was supplied (or, in my case, because the name of a file downloaded milliseconds ago in a parallel addurl process was supplied) — then `git-annex metadata` will output "git-annex: not an annexed file: {filepath}" to standard error and immediately exit. Not only is this in contrast to what it seems `--json-error-messages` should do, but the "exiting immediately" bit is in contrast to my understanding of how batch mode is supposed to work. Surely this should be fixed?
|
||||
|
||||
[[!meta author=jwodder]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,13 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2021-11-01T16:27:48Z"
|
||||
content="""
|
||||
For consistency with other --batch, I've made it reply with a blank line
|
||||
when the input is not an annexed file.
|
||||
|
||||
Do note that --json-error-messages cannot cram every possible kind of error
|
||||
message into a json object. In particular, errors that occur at startup,
|
||||
and not when acting on a particular file or key, do not fit into the json
|
||||
schema.
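
The batch contract can be sketched as follows. This is a simplified Python illustration, not the git-annex implementation: the `ANNEXED` table is invented, and input lines are plain file names here for brevity. What matters is that every input line gets exactly one reply line, and an input that is not an annexed file gets a blank one instead of ending the process.

```python
import io
import json

# Invented lookup table standing in for the annex; not git-annex code.
ANNEXED = {'a.dat': {'tag': ['x']}}


def batch_metadata(lines, out):
    for line in lines:
        name = line.rstrip('\n')
        fields = ANNEXED.get(name)
        if fields is None:
            # Not an annexed file: reply with a blank line and keep going.
            out.write('\n')
        else:
            out.write(json.dumps({'file': name, 'fields': fields}) + '\n')


out = io.StringIO()
batch_metadata(['a.dat\n', 'missing.dat\n', 'a.dat\n'], out)
replies = out.getvalue().split('\n')[:-1]
print(len(replies))       # 3
print(replies[1] == '')   # True
```

A caller can therefore pair replies with requests by position, which is what makes batch mode usable from a driving program.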
|
||||
"""]]
|
|
@ -0,0 +1,44 @@
|
|||
### Please describe the problem.
|
||||
|
||||
From [https://github.com/DanielDent/git-annex-remote-rclone/pull/57](https://github.com/DanielDent/git-annex-remote-rclone/pull/57), where we use that rclone special remote for backup of DANDI data to dropbox
|
||||
|
||||
Seems like a test sometimes fails on Mac OS with:
|
||||
|
||||
```
|
||||
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
|
||||
git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
copy: 1 failed
|
||||
Error: Process completed with exit code 1.
|
||||
```
|
||||
|
||||
indeed so far seemed to happen only on Mac
|
||||
|
||||
```
|
||||
(git)smaug:/mnt/datasets/datalad/ci/git-annex-remote-rclone[master]2022
|
||||
$> datalad foreach-dataset git grep 'file is locked'
|
||||
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone]
|
||||
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T06:47:44.4978580Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
03/cron/20221003T064418/da57e9a/github-Tests-144-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T06:47:44.4978530Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
03/push/master/1d0d3ce/github-Tests-146-failed/10_test (macos-latest, v1.33).txt:2022-10-03T23:35:41.8464390Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
03/push/master/1d0d3ce/github-Tests-146-failed/9_test (macos-latest, v1.53.3).txt:2022-10-03T23:37:44.0652500Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.33)/9_tests.txt:2022-10-03T23:35:41.8463970Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
03/push/master/1d0d3ce/github-Tests-146-failed/test (macos-latest, v1.53.3)/9_tests.txt:2022-10-03T23:37:44.0652360Z git-annex: .git/annex/move.log: openFile: resource busy (file is locked)
|
||||
foreach-dataset(ok): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/10 (dataset)
|
||||
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/06]
|
||||
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/07]
|
||||
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/09]
|
||||
foreach-dataset(error): /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08 (dataset) [CommandError: 'git grep 'file is locked'' failed with exitcode 1 under /mnt/datasets/datalad/ci/git-annex-remote-rclone/2022/08]
|
||||
```
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
no minimal reproducer yet but happens as part of [this test "script"](https://github.com/DanielDent/git-annex-remote-rclone/blob/master/tests/all-in-one.sh)
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
git-annex version: 10.20220927
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> Presumed [[fixed|done]]; please followup if I'm wrong. --[[Joey]]
|
|
@ -0,0 +1,22 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2022-10-07T16:44:04Z"
|
||||
content="""
|
||||
I doubt this is really OSX specific. This must be two threads running logMove
|
||||
at the same time, that end up trying to both write or one write and one
|
||||
read at the same time. That causes the haskell RTS to fail this way.
|
||||
|
||||
Since it does use a lock file when writing and appending to the log file,
|
||||
I think it must be the call to checkLogFile that is failing. That avoids
|
||||
taking the lock, for performance reasons. The performance gain is pretty
|
||||
minimal though, taking the lock is not much. Only when modifyLogFile
|
||||
is called at the same time might it need to block on the file being
|
||||
rewritten, but the file only ever has 100 items, so that never takes long
|
||||
either.
|
||||
|
||||
So, I have added locking to checkLogFile (and to calcLogFile though it's
|
||||
not used here, just because it has the same problem). That should fix it,
|
||||
though we'll need to wait on the test to know for sure. I'm going to close
|
||||
this, as I'm pretty sure though..
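
The shape of that fix, sketched in Python under stated assumptions: the helper names only loosely echo the ones mentioned above, the lock is a separate lock file taken with `fcntl.flock` (Unix only), and using a shared lock for readers is a choice made for this example, not a statement about what git-annex does.

```python
import fcntl
import os
import tempfile


# Hypothetical helper: hold a lock on a separate lock file
# while running action().
def with_lock(lock_path, mode, action):
    with open(lock_path, 'a') as lock:
        fcntl.flock(lock, mode)
        try:
            return action()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def append_log(log_path, lock_path, line):
    def write():
        with open(log_path, 'a') as f:
            f.write(line + '\n')
    with_lock(lock_path, fcntl.LOCK_EX, write)


def check_log(log_path, lock_path, wanted):
    # Readers take the lock too, so they never open the log
    # while a writer has it open.
    def read():
        if not os.path.exists(log_path):
            return False
        with open(log_path) as f:
            return any(l.rstrip('\n') == wanted for l in f)
    return with_lock(lock_path, fcntl.LOCK_SH, read)


tmp = tempfile.mkdtemp()
log = os.path.join(tmp, 'move.log')
lck = os.path.join(tmp, 'move.lck')
append_log(log, lck, 'key1')
print(check_log(log, lck, 'key1'))  # True
print(check_log(log, lck, 'key2'))  # False
```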
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2022-11-04T12:41:47Z"
|
||||
content="""
|
||||
ok, did the archaeologic expedition to figure when fixed -- was fixed in [10.20221003-19-g4a42c6909 AKA 10.20221103~28](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=4a42c69092a03cce7b31b79b862e59c9842ced77) , brew still (well -- we are just 1 day post release! ;)) has 10.20221003 so in testing git-annex-remote-rclone we keep getting hit but hopefully it would go away soon with update of git-annex in brew.
|
||||
"""]]
|
|
@ -0,0 +1,96 @@
|
|||
### Please describe the problem.
|
||||
|
||||
git status reports unstaged changes although there is no diff against the index
|
||||
|
||||
```shell
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
Changes not staged for commit:
|
||||
(use "git add <file>..." to update what will be committed)
|
||||
(use "git restore <file>..." to discard changes in working directory)
|
||||
modified: .dandi/assets.json
|
||||
|
||||
no changes added to commit (use "git add" and/or "git commit -a")
|
||||
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
|
||||
M ./.dandi/assets.json
|
||||
```
|
||||
|
||||
although git shows no diff and sha256 checksum corresponds to the key:
|
||||
|
||||
```shell
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
|
||||
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
|
||||
Author: DANDI User <info@dandiarchive.org>
|
||||
Date: Fri Sep 16 22:22:29 2022 +0000
|
||||
|
||||
[backups2datalad] 66 files added
|
||||
|
||||
diff --git a/.dandi/assets.json b/.dandi/assets.json
|
||||
index d3ef95e1ee..62fe372810 100644
|
||||
--- a/.dandi/assets.json
|
||||
+++ b/.dandi/assets.json
|
||||
@@ -1 +1 @@
|
||||
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
|
||||
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
|
||||
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
|
||||
```
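
The same check can be written as a small Python sketch. It assumes only the `SHA256E-s<size>--<hash><ext>` key shape visible in the diff above, and the function names are made up for the example.

```python
import hashlib
import re

# Handles only the SHA256E-s<size>--<hash><ext> shape shown in the diff.
KEY_RE = re.compile(r'^SHA256E-s(\d+)--([0-9a-f]{64})(\..*)?$')


def parse_key(key):
    # Return (size, sha256 hex digest) recorded in the key name.
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError('unsupported key: ' + key)
    return int(m.group(1)), m.group(2)


def content_matches_key(data, key):
    # Content matches when both the size and the checksum agree.
    size, digest = parse_key(key)
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


key = ('SHA256E-s69507227--'
       '6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json')
print(parse_key(key))
```

The digest returned for this key is the one `sha256sum` printed for `.dandi/assets.json`, so the work tree content matches what HEAD points to.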
|
||||
|
||||
I think maybe the tricky part is that I have it at
|
||||
|
||||
```
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
|
||||
10
|
||||
```
|
||||
|
||||
although I thought that we kept it at 8, but I have a user-wide config setting
|
||||
|
||||
```
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
|
||||
git-annex filter-process
|
||||
```
|
||||
|
||||
I was recommended it to speed up operations while avoiding an upgrade to 10, but I guess running the most recent version once led to the upgrade, since all the other repos are still at 8, as I thought this one would be
|
||||
|
||||
```
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
|
||||
1 version = 10
|
||||
186 version = 8
|
||||
```
|
||||
|
||||
Having it reported as modified causes our script, which does a sanity check to operate only on a clean repo, to fail.
|
||||
|
||||
`git reset --hard` seems to have mitigated that
|
||||
|
||||
```
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
|
||||
HEAD is now at b859efed7d [backups2datalad] 66 files added
|
||||
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
nothing to commit, working tree clean
|
||||
```
|
||||
|
||||
all. I will now rerun our script and see in what state I would end up (although, once again, I ended up in version 10 of the repo already, so may be behavior would be different).
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
I think I get it after I `annex move` and then `annex get` that file back. Just for my own reference -- git-annex repo is result of the https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
10.20220822-g84f1875 (conda build), originally observed on earlier 10.20220724-ge30d846
|
||||
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
[[!meta title="annex.stalldetection prevents git-annex get from restaging unlocked files"]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,15 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 10"
|
||||
date="2022-09-22T17:34:35Z"
|
||||
content="""
|
||||
damn, I should have shared my config! I also do have `annex.stalldetection` set!
|
||||
|
||||
```
|
||||
[annex]
|
||||
stalldetection = 1KB/120s
|
||||
```
|
||||
|
||||
never thought it might be related. We should look into having some matrix test run with such config set.
|
||||
"""]]
|
|
@ -0,0 +1,11 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 11"""
|
||||
date="2022-09-22T17:38:45Z"
|
||||
content="""
|
||||
Yeah, a whole git-annex test run with stalldetection set would have found
|
||||
this bug. Which seems a bit heavy-weight for the test suite to try as a
|
||||
separate pass by default. But then again, stalldetection does significantly
|
||||
change how git-annex operates since it has to fork off child processes that
|
||||
it can kill when they stall.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 12"
|
||||
date="2022-09-22T18:14:15Z"
|
||||
content="""
|
||||
Adding a matrix run where I initiated a custom config setting to our [datalad/git-annex](https://github.com/datalad/git-annex/pull/133) CI run. Let's see how that goes. Maybe there are some other interesting config settings to add there, e.g. retries etc.? Or is the global `~/.gitconfig` not used/mocked away during tests? (E.g. we do that in datalad, so I had to work around that in a [PR against datalad](https://github.com/datalad/datalad/pull/7056) to test with this setting set.)
|
||||
"""]]
|
|
@ -0,0 +1,32 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 12"""
|
||||
date="2022-09-22T17:40:57Z"
|
||||
content="""
|
||||
So, `git-annex transferrer`, after downloading the content, does handle
|
||||
populating pointer files. So it calls restagePointerFile to register a cleanup
|
||||
action.
|
||||
|
||||
Whatever is making that process exit 1 must be preventing the cleanup
|
||||
action from being run. And I think what that is, is that its stdout handle
|
||||
gets closed at the same time its stdin handle is closed. I tried running
|
||||
`git-annex transferrer` manually and feeding it a transfer request on
|
||||
stdin. After its stdin was closed, it proceeded to send
|
||||
`"om (recording state in git...)\n"` to stdout, and that would fail
|
||||
with stdout already closed.
|
||||
|
||||
Worse, I suspect there's another problem.. When a stall actually
|
||||
is detected, git-annex kills the `git-annex transferrer` process that has
|
||||
stalled. But suppose that process has already successfully downloaded some
|
||||
content and populated pointer files. Killing it would prevent it from
|
||||
running restagePointerFile on those. It seems that to solve this,
|
||||
it would need to communicate back to the parent what pointer files need to
|
||||
be restaged. (Which would also solve the exit 1 problem, although not
|
||||
necessarily in the best way.)
|
||||
|
||||
Also, I think that multiple processes running the restagePointerFile
|
||||
cleanup action at the same time can be a problem, because one will
|
||||
lock the index and the rest will fail to restage. Not what's happening
|
||||
here, but with -J, there would be multiple `git-annex transferrer`
|
||||
processes doing that at the same time at the end.
|
||||
"""]]
|
|
@ -0,0 +1,30 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 13"""
|
||||
date="2022-09-22T18:16:22Z"
|
||||
content="""
|
||||
Avoided the early stdout handle close, and that did fix this bug as
|
||||
reported.
|
||||
|
||||
The related problems I identified in comment #12 are still unfixed, so
|
||||
leaving this open for now.
|
||||
|
||||
I think what ought to be done to wrap this up is make restagePointerFile
|
||||
record the files that need to be restaged in a log file. Then at shutdown,
|
||||
git-annex can read the log file, and restage everything listed in it.
|
||||
This will solve multiple problems:
|
||||
|
||||
* When a previous git-annex process was interrupted after a get/drop of an
|
||||
unlocked file, the file will be in the log, so git-annex can notice
|
||||
that and handle the restaging.
|
||||
* When a stalled `git-annex transferrer` is killed, the parent git-annex
|
||||
will read the log and handle the restaging that it was not able to do.
|
||||
* When multiple processes are trying to restage files at the same time,
|
||||
an exclusive lock can be used to make only one of them run, and it can
|
||||
handle restaging the files that the others have recorded in the log too.
|
||||
* As a bonus, in the situations where git-annex is legitimately unable to
|
||||
restage files, it can still record them to be restaged later. And the
|
||||
"only a cosmetic problem" message can tell the user to run a single
|
||||
simple git-annex command, rather than a complicated
|
||||
`git update-index` command per file.
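
A minimal Python sketch of that log-file design, assuming made-up names and a plain text log with one file name per line (the real format is not decided here). Recording is a cheap append. Restaging takes an exclusive, non-blocking lock on a separate lock file, so only one process does the work, and it handles everything the others recorded too.

```python
import fcntl
import os
import tempfile


# Hypothetical sketch of the restage log idea; all names are made up.
def record_for_restage(log_path, filename):
    # Appending is cheap, so any process can do it right after a get/drop.
    with open(log_path, 'a') as f:
        f.write(filename + '\n')


def restage_from_log(log_path, lock_path, restage):
    with open(lock_path, 'a') as lock:
        try:
            # Exclusive and non-blocking: only one process restages.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        try:
            if not os.path.exists(log_path):
                return True
            # Move the log aside first so later appends start a fresh one.
            work = log_path + '.work'
            os.replace(log_path, work)
            with open(work) as f:
                files = sorted({l.rstrip('\n') for l in f if l.strip()})
            restage(files)
            os.remove(work)
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


tmp = tempfile.mkdtemp()
log = os.path.join(tmp, 'restage.log')
lck = os.path.join(tmp, 'restage.lck')
for name in ['a', 'b', 'a']:
    record_for_restage(log, name)

restaged = []
print(restage_from_log(log, lck, restaged.extend))  # True
print(restaged)                                     # ['a', 'b']
```

A process that fails to get the lock returns `False` and leaves the log alone, which corresponds to the case where it can only tell the user that restaging was left to someone else.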
|
||||
"""]]
|
|
@ -0,0 +1,12 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 15"""
|
||||
date="2022-09-22T18:42:06Z"
|
||||
content="""
|
||||
@yarikoptic oh, `git-annex test` does prevent global gitconfig from
|
||||
influencing the tests. So your matrix test won't work if you're
|
||||
running `git-annex test` in it. If you're running other git-annex commands
|
||||
in datalad's test suite, it would work though.
|
||||
|
||||
I've opened [[todo/specify_gitconfig_for_test_suite]].
|
||||
"""]]
|
|
@ -0,0 +1,33 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""status update"""
|
||||
date="2022-09-23T19:57:38Z"
|
||||
content="""
|
||||
I've implemented the log file. The stalled transferrer case is now handled.
|
||||
This bug is fixed.
|
||||
|
||||
As to a few other cases I considered in comments upthread:
|
||||
|
||||
When a get/drop was interrupted before it could restage,
|
||||
the next get/drop will cause the necessary restaging for the
|
||||
interrupted process to happen. However, this doesn't help if there's
|
||||
nothing left to get/drop. Should git-annex always run restagePointerFiles
|
||||
on shutdown? That would make any git-annex command handle the restaging.
|
||||
But it doesn't seem right for query commands to do potentially a lot of
|
||||
work to handle this case. Anyway, I don't think this needs to be dealt
|
||||
with in this bug report.
|
||||
|
||||
When multiple processes try to restage at the same time, one will
|
||||
restage everything that all of them logged. The others will still display a
|
||||
warning to the user that they couldn't restage. It would be hard to avoid
|
||||
displaying that warning, since it does need to warn when it was
|
||||
unable to restage because git has the index locked at the time. Anyway,
|
||||
I think it's ok to display the message despite the files having been
|
||||
restaged, because it's the same as a later git-annex process handling the
|
||||
restaging. (It does seem like two transferrers belonging to the same parent
|
||||
could collide in this way, and one display the warning, which isn't great..)
|
||||
|
||||
I also implemented a "git-annex restage" command that
|
||||
is an easier way to restage in the cases where git-annex is not able
|
||||
to do it itself.
|
||||
"""]]
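
The log-file approach described in the status update above can be sketched roughly as follows. This is only an illustration under stated assumptions, not git-annex's implementation: `RESTAGE_LOG`, `RESTAGE_LOCK`, `log_restage` and `restage_logged` are hypothetical names, and the `echo` stands in for the real restaging work.

```shell
#!/bin/sh
# Hypothetical sketch; the names and paths below are made up for illustration.
RESTAGE_LOG=${RESTAGE_LOG:-/tmp/restage-demo.log}
RESTAGE_LOCK=${RESTAGE_LOCK:-/tmp/restage-demo.lock}

# Record a file that still needs restaging (append-only, one path per line).
log_restage() {
    printf '%s\n' "$1" >> "$RESTAGE_LOG"
}

# Try to restage everything that any process has logged so far.
# mkdir is atomic, so only one caller wins the lock; the others warn
# and leave their entries in the log for whoever restages next.
restage_logged() {
    if ! mkdir "$RESTAGE_LOCK" 2>/dev/null; then
        echo "warning: unable to restage now; logged for later" >&2
        return 1
    fi
    if [ -s "$RESTAGE_LOG" ]; then
        # Move the log aside so that new entries go to a fresh file.
        mv "$RESTAGE_LOG" "$RESTAGE_LOG.work"
        sort -u "$RESTAGE_LOG.work" | while IFS= read -r f; do
            echo "restaged $f"    # stand-in for the real restaging
        done
        rm -f "$RESTAGE_LOG.work"
    fi
    rmdir "$RESTAGE_LOCK"
}
```

If a second process calls `restage_logged` while the lock directory exists, it prints the warning and leaves its entries in the log, which mirrors the situation described above where one process restages on behalf of the others. A small window remains between the `mv` and a concurrent append, so this is a sketch rather than a robust design.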
|
|
@ -0,0 +1,12 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2022-09-21T17:05:51Z"
|
||||
content="""
|
||||
Is .dandi/assets.json an unlocked file?
|
||||
|
||||
`git diff --cached` seems like the wrong thing to run, because
|
||||
that would show changes that you have staged for commit.
|
||||
This change is one that has not been staged for commit.
|
||||
So `git diff` should show it.
|
||||
"""]]
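
The distinction drawn in the comment above between `git diff` and `git diff --cached` can be seen in a scratch repository. This is a minimal sketch for an ordinary (non-annexed) file; `diff_demo` is a hypothetical helper name, and the throwaway identity passed with `-c` only exists so that the commit succeeds without any global configuration.

```shell
#!/bin/sh
# Shows which changes each form of `git diff` reports.
# Runs in a subshell so the caller's working directory is untouched.
diff_demo() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    echo one > f
    git add f
    git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add f'

    echo two > f    # modify the file, but do not stage it
    echo "unstaged=[$(git diff --name-only)]"        # working tree vs index
    echo "staged=[$(git diff --cached --name-only)]" # index vs HEAD

    git add f       # now stage the change
    echo "unstaged=[$(git diff --name-only)]"
    echo "staged=[$(git diff --cached --name-only)]"
)
```

Calling `diff_demo` prints `unstaged=[f]` and `staged=[]` before the `git add`, then `unstaged=[]` and `staged=[f]` after it.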
|
|
@ -0,0 +1,46 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2022-09-21T18:46:50Z"
|
||||
content="""
|
||||
d'oh forgot to show that I have tried that one too. Here is everything at once again with `git diff` and again doing checksums (that should have been different in my prev examples as well if different only in tree but not in index):
|
||||
|
||||
```shell
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
Changes not staged for commit:
|
||||
(use \"git add <file>...\" to update what will be committed)
|
||||
(use \"git restore <file>...\" to discard changes in working directory)
|
||||
modified: .dandi/assets.json
|
||||
|
||||
|
||||
It took 3.19 seconds to enumerate untracked files. 'status -uno'
|
||||
may speed it up, but you have to be careful not to forget to add
|
||||
new files yourself (see 'git help status').
|
||||
no changes added to commit (use \"git add\" and/or \"git commit -a\")
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
|
||||
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
|
||||
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
|
||||
Author: DANDI User <info@dandiarchive.org>
|
||||
Date: Fri Sep 16 22:22:29 2022 +0000
|
||||
|
||||
[backups2datalad] 66 files added
|
||||
|
||||
diff --git a/.dandi/assets.json b/.dandi/assets.json
|
||||
index d3ef95e1ee..62fe372810 100644
|
||||
--- a/.dandi/assets.json
|
||||
+++ b/.dandi/assets.json
|
||||
@@ -1 +1 @@
|
||||
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
|
||||
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
|
||||
M ./.dandi/assets.json
|
||||
|
||||
```
|
||||
"""]]
|
|
@ -0,0 +1,30 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 3"
|
||||
date="2022-09-21T18:49:06Z"
|
||||
content="""
|
||||
the workaround you suggest elsewhere for \"cosmetic\" problem works here too
|
||||
|
||||
```
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
Changes not staged for commit:
|
||||
(use \"git add <file>...\" to update what will be committed)
|
||||
(use \"git restore <file>...\" to discard changes in working directory)
|
||||
modified: .dandi/assets.json
|
||||
|
||||
no changes added to commit (use \"git add\" and/or \"git commit -a\")
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git update-index -q --refresh .dandi/assets.json
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
nothing to commit, working tree clean
|
||||
|
||||
```
|
||||
|
||||
but since we are relying on output from `status`, it is not just a \"cosmetic\" issue. IMHO if such `update-index` is needed, it should have been done by git-annex automagically somehow/sometime.
|
||||
"""]]
|
|
@ -0,0 +1,29 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2022-09-21T19:19:08Z"
|
||||
content="""
|
||||
So you can reproduce this? I am pretty sure it's not as simple as a drop
|
||||
followed by a get, so more information about reproducing it seems crucial.
|
||||
|
||||
I assume you are *not* seeing the "This is only a cosmetic problem affecting git status"
|
||||
message?
|
||||
|
||||
I expect that running `git update-index --refresh .dandi/assets.json`
|
||||
will fix git status. Can you confirm?
|
||||
|
||||
The only way I know of that this can happen without the message is if a
|
||||
drop or a get is still running, or gets interrupted. One of the last things
|
||||
git-annex does before exiting is restage all the unlocked files that it has
|
||||
updated.
|
||||
|
||||
Short of that, it seems like it would have to be a bug that prevents
|
||||
restagePointerFile from working. Which might not be a bug in git-annex,
|
||||
if the problem involves git's handling of timestamps in the index, for
|
||||
example. (Which is known to have some odd behaviors.)
|
||||
|
||||
(git-annex could be improved to do the
|
||||
restaging later when interrupted and possibly after such a bug.
|
||||
But there's no way to make it recover in `git status`, because
|
||||
git doesn't run it in this situation.)
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2022-09-21T22:06:49Z"
|
||||
content="""
|
||||
Seems likely that the --time-limit option, when combined with -J,
|
||||
could result in git-annex exiting before a worker thread gets a chance to
|
||||
call stagePointerFile. I have not verified this, and it would be unlikely
|
||||
to result in the same file being affected reproducibly.
|
||||
"""]]
|
|
@ -0,0 +1,33 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 6"
|
||||
date="2022-09-22T01:03:18Z"
|
||||
content="""
|
||||
maybe it is one of those options; in my case it is just a straight `get` on that single unlocked file:
|
||||
|
||||
```
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
nothing to commit, working tree clean
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ cat .dandi/assets.json
|
||||
/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex get .dandi/assets.json
|
||||
get .dandi/assets.json (from dandi-dandisets-dropbox...)
|
||||
(checksum...) ok
|
||||
(recording state in git...)
|
||||
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
|
||||
On branch draft
|
||||
Your branch is up to date with 'github/draft'.
|
||||
|
||||
Changes not staged for commit:
|
||||
(use \"git add <file>...\" to update what will be committed)
|
||||
(use \"git restore <file>...\" to discard changes in working directory)
|
||||
modified: .dandi/assets.json
|
||||
|
||||
no changes added to commit (use \"git add\" and/or \"git commit -a\")
|
||||
|
||||
```
|
||||
"""]]
|
|
@ -0,0 +1,58 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 7"
|
||||
date="2022-09-22T01:33:24Z"
|
||||
content="""
|
||||
sorry I have not mentioned your [earlier comment 4](http://git-annex.branchable.com/bugs/reports_file___34__modified__34___whenever_it_is_not/#comment-ca0281ff580c91c40e429fbbb71a3791) but my clarification above I think gives the answers to your questions ;)
|
||||
|
||||
<details>
|
||||
<summary>FWIW here is the get --debug output </summary>
|
||||
|
||||
```shell
|
||||
[2022-09-21 21:29:59.904218] (Utility.Process) process [3968193] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"--stage\",\"-z\",\"--error-unmatch\",\"--\",\".dandi/assets.json\"]
|
||||
[2022-09-21 21:29:59.904725] (Utility.Process) process [3968194] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
|
||||
[2022-09-21 21:29:59.905645] (Utility.Process) process [3968195] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
|
||||
[2022-09-21 21:29:59.906012] (Utility.Process) process [3968196] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"]
|
||||
[2022-09-21 21:29:59.907578] (Utility.Process) process [3968196] done ExitSuccess
|
||||
[2022-09-21 21:29:59.907891] (Utility.Process) process [3968197] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
|
||||
[2022-09-21 21:29:59.913611] (Utility.Process) process [3968197] done ExitSuccess
|
||||
[2022-09-21 21:29:59.914676] (Utility.Process) process [3968198] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..5f5efa8544ff02c9261dd1590425dcea37a55526\",\"--pretty=%H\",\"-n1\"]
|
||||
[2022-09-21 21:29:59.916707] (Utility.Process) process [3968198] done ExitSuccess
|
||||
[2022-09-21 21:29:59.916968] (Utility.Process) process [3968199] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..18497e6e9cab7754a85256416c361fee36ba65b2\",\"--pretty=%H\",\"-n1\"]
|
||||
[2022-09-21 21:29:59.918722] (Utility.Process) process [3968199] done ExitSuccess
|
||||
[2022-09-21 21:29:59.919069] (Utility.Process) process [3968200] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch=%(objectname) %(objecttype) %(objectsize)\",\"--buffer\"]
|
||||
get .dandi/assets.json [2022-09-21 21:29:59.921463] (Utility.Process) process [3968202] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"]
|
||||
(from dandi-dandisets-dropbox...) [2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]
|
||||
[2022-09-21 21:29:59.93162] (Annex.TransferrerPool) > d rdandi-dandisets-dropbox SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json .dandi/assets.json
|
||||
[2022-09-21 21:29:59.942599] (Annex.TransferrerPool) < opb
|
||||
|
||||
[2022-09-21 21:29:59.942718] (Annex.TransferrerPool) < ops 69507227
|
||||
[2022-09-21 21:30:03.103409] (Annex.TransferrerPool) < ope
|
||||
[2022-09-21 21:30:03.103539] (Annex.TransferrerPool) < om (checksum...)
|
||||
(checksum...) [2022-09-21 21:30:03.768599] (Annex.TransferrerPool) < t
|
||||
[2022-09-21 21:30:03.768843] (Annex.Branch) read 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
|
||||
[2022-09-21 21:30:03.770259] (Annex.Branch) set 6e0/a70/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json.log
|
||||
ok
|
||||
[2022-09-21 21:30:03.770361] (Utility.Process) process [3968200] done ExitSuccess
|
||||
[2022-09-21 21:30:03.770425] (Utility.Process) process [3968195] done ExitSuccess
|
||||
[2022-09-21 21:30:03.770484] (Utility.Process) process [3968194] done ExitSuccess
|
||||
[2022-09-21 21:30:03.770531] (Utility.Process) process [3968193] done ExitSuccess
|
||||
[2022-09-21 21:30:03.771187] (Utility.Process) process [3968452] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"hash-object\",\"-w\",\"--stdin-paths\",\"--no-filters\"]
|
||||
[2022-09-21 21:30:03.77319] (Utility.Process) process [3968453] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
|
||||
[2022-09-21 21:30:04.063182] (Utility.Process) process [3968453] done ExitSuccess
|
||||
[2022-09-21 21:30:04.063779] (Utility.Process) process [3968463] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
|
||||
[2022-09-21 21:30:04.065352] (Utility.Process) process [3968463] done ExitSuccess
|
||||
(recording state in git...)
|
||||
[2022-09-21 21:30:04.06587] (Utility.Process) process [3968464] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
|
||||
[2022-09-21 21:30:04.407935] (Utility.Process) process [3968464] done ExitSuccess
|
||||
[2022-09-21 21:30:04.408528] (Utility.Process) process [3968468] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"56c62dcc21145201f9454a2dd6e75cc37f072ee4\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\"]
|
||||
[2022-09-21 21:30:04.410591] (Utility.Process) process [3968468] done ExitSuccess
|
||||
[2022-09-21 21:30:04.413623] (Utility.Process) process [3968469] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"c3a1f9208649b47621b1424b055bd9871aa2fc79\"]
|
||||
[2022-09-21 21:30:04.415318] (Utility.Process) process [3968469] done ExitSuccess
|
||||
[2022-09-21 21:30:04.416301] (Utility.Process) process [3968202] done ExitSuccess
|
||||
[2022-09-21 21:30:04.416574] (Utility.Process) process [3968452] done ExitSuccess
|
||||
[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1
|
||||
```
|
||||
</details>
|
||||
"""]]
|
|
@ -0,0 +1,9 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 8"""
|
||||
date="2022-09-22T17:02:04Z"
|
||||
content="""
|
||||
I've fixed the issue I found with --time-limit combined with -J. Which I do
|
||||
think could have resulted in the same kind of problem. But you've shown
|
||||
that is not the cause in your case..
|
||||
"""]]
|
|
@ -0,0 +1,19 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 9"""
|
||||
date="2022-09-22T17:04:35Z"
|
||||
content="""
|
||||
Thanks for the --debug. It shows that git-annex is not running
|
||||
`git update-index --refresh` at all.
|
||||
|
||||
And it shows that the transfer happens in a `git-annex transferrer` process.
|
||||
So, I think you have annex.stalldetection set.
|
||||
|
||||
[2022-09-21 21:29:59.931525] (Utility.Process) process [3968203] chat: /home/dandi/miniconda3/envs/dandisets/bin/git-annex [\"transferrer\",\"-c\",\"annex.debug=true\"]
|
||||
|
||||
And interestingly, that transferrer process fails at the end:
|
||||
|
||||
[2022-09-21 21:30:06.373343] (Utility.Process) process [3968203] done ExitFailure 1
|
||||
|
||||
Aha! I can reproduce it by setting annex.stalldetection.
|
||||
"""]]
|
|
@ -0,0 +1,72 @@
|
|||
### Please describe the problem.
|
||||
|
||||
NB: I can't change the title, but this is not about depends, since libgcc-s1 is essential... so most likely some LD_LIBRARY_PATH manipulation is in place or something like that.
|
||||
|
||||
[Testing of git-annex-remote-rclone on ubuntu-20.04 crashed](https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3750292044/jobs/6370225718) with
|
||||
|
||||
```
|
||||
+ git-annex copy -J5 --quiet . --to GA-rclone-CI
|
||||
libgcc_s.so.1 must be installed for pthread_cancel to work
|
||||
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 124: 3066 Aborted (core dumped) git-annex copy -J5 --quiet . --to GA-rclone-CI
|
||||
Error: Process completed with exit code 134.
|
||||
```
|
||||
|
||||
installation of git annex
|
||||
|
||||
```
|
||||
Run datalad-installer --sudo ok git-annex -m datalad/git-annex:release
|
||||
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-j8s29if7.sh
|
||||
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Installing git-annex via datalad/git-annex:release
|
||||
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Version: None
|
||||
2022-12-21T15:10:30+0000 [INFO ] datalad_installer Downloading https://github.com/datalad/git-annex/releases/download/10.20221212/git-annex-standalone_10.20221212-1.ndall%2B1_amd64.deb
|
||||
2022-12-21T15:10:33+0000 [INFO ] datalad_installer Running: sudo dpkg -i /tmp/tmpah14ch03/git-annex-standalone_10.20221212-1.ndall+1_amd64.deb
|
||||
Selecting previously unselected package git-annex-standalone.
|
||||
(Reading database ... 236921 files and directories currently installed.)
|
||||
Preparing to unpack .../git-annex-standalone_10.20221212-1.ndall+1_amd64.deb ...
|
||||
Unpacking git-annex-standalone (10.20221212-1~ndall+1) ...
|
||||
Setting up git-annex-standalone (10.20221212-1~ndall+1) ...
|
||||
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
|
||||
Processing triggers for hicolor-icon-theme (0.17-2) ...
|
||||
Processing triggers for man-db (2.10.2-1) ...
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer git-annex is now installed at /usr/bin/git-annex
|
||||
```
|
||||
|
||||
or maybe that is an issue with `rclone`? In this case it was installed like so:
|
||||
|
||||
```
|
||||
Run datalad-installer --sudo ok rclone=v1.59.2 -m downloads.rclone.org
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Writing environment modifications to /tmp/dl-env-aon5z6_f.sh
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Installing rclone from downloads.rclone.org
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Version: v1.59.2
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Bin dir: /usr/local/bin
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Man dir: None
|
||||
2022-12-21T15:10:35+0000 [INFO ] datalad_installer Downloading https://downloads.rclone.org/v1.59.2/rclone-v1.59.2-linux-amd64.zip
|
||||
2022-12-21T15:10:38+0000 [INFO ] datalad_installer Moving /tmp/tmp75sde__c/rclone-v1.59.2-linux-amd64/rclone to /usr/local/bin/rclone
|
||||
2022-12-21T15:10:38+0000 [INFO ] datalad_installer rclone is now installed at /usr/local/bin/rclone
|
||||
```
|
||||
|
||||
I have tried to reproduce locally with exactly those installations of rclone and git-annex but not getting the same problem :-/
|
||||
|
||||
I have also run with `--debug` and got
|
||||
```
|
||||
[2022-12-21 17:20:10.056928113] (Utility.Process) process [11603] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","c95a5c849daca7183eefc28c360942104d01e900","--no-gpg-sign","-p","refs/heads/git-annex"]
|
||||
[2022-12-21 17:20:10.060448661] (Utility.Process) process [11603] done ExitSuccess
|
||||
[2022-12-21 17:20:10.060806165] (Utility.Process) process [11604] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","248cef615747c4aba64fbb475b0a03c8d2a78b27"]
|
||||
[2022-12-21 17:20:10.063957208] (Utility.Process) process [11604] done ExitSuccess
|
||||
[2022-12-21 17:20:10.066005436] (Utility.Process) process [11127] done ExitSuccess
|
||||
[2022-12-21 17:20:10.066266539] (Utility.Process) process [11114] done ExitSuccess
|
||||
[2022-12-21 17:20:10.066702845] (Utility.Process) process [11126] done ExitSuccess
|
||||
[2022-12-21 17:20:10.067107151] (Utility.Process) process [11125] done ExitSuccess
|
||||
[2022-12-21 17:20:10.067357854] (Utility.Process) process [11599] done ExitSuccess
|
||||
libgcc_s.so.1 must be installed for pthread_cancel to work
|
||||
/home/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/tests/all-in-one.sh: line 125: 11083 Aborted (core dumped) git-annex drop -J5 --debug .
|
||||
Error: Process completed with exit code 134.
|
||||
```
|
||||
in https://github.com/DanielDent/git-annex-remote-rclone/actions/runs/3751417971/jobs/6372374929 .
|
||||
|
||||
Any ideas Joey?
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/dandi]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,23 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2022-12-22T18:38:32Z"
|
||||
content="""
|
||||
I'm a bit surprised git-annex is using `pthread_cancel`, since `strings`
|
||||
does not show it contains that symbol. Perhaps one of the other pthread
|
||||
symbols it uses ends up calling that.
|
||||
|
||||
It does seem though from the message that it's git-annex and not a program
|
||||
it runs that is core dumping on this. Also I checked, and the rclone you
|
||||
installed is a statically linked binary so I would not expect it to use
|
||||
`libgcc_s.so`. And git-annex-remote-rclone is a bash script, and bash
|
||||
doesn't use pthreads.
|
||||
|
||||
(I do think that, in general, using the git-annex standalone tarball and
|
||||
then trying to run additional programs besides git-annex inside it is not
|
||||
going to always work well. Standalone interposes its own versions of libraries,
|
||||
which may not work with the other programs. There is already a todo about that,
|
||||
[[todo/restore_original_environment_when_running_external_special_remotes_from_standalone_git-annex__63__]].)
|
||||
|
||||
I've added `libgcc_s.so.1` to the standalone build.
|
||||
"""]]
|
|
@ -24,7 +24,7 @@ My bugs
|
|||
<details>
|
||||
<summary>Fixed</summary>
|
||||
|
||||
[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and
|
||||
(author(mih) or author(ben) or author(kyle) or tagged(projects/datalad))" feeds=no actions=yes archive=yes show=0 template=buglist template=buglist]]
|
||||
[[!inline pages="(bugs/* or projects/datalad/bugs-done/*) and !bugs/done and link(bugs/done) and
|
||||
(author(mih) or author(ben) or author(kyle) or tagged(projects/datalad))" feeds=no actions=yes archive=yes show=0 template=buglist]]
|
||||
|
||||
</details>
|
||||
|
|
|
@ -0,0 +1,69 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Identified while troubleshooting another [issue](https://git-annex.branchable.com/bugs/enableremote_stuck_with_a_recentish_git-annex/#comment-2116c5e109aaf39ffd62f3bdeeb14602)
|
||||
|
||||
[[!format sh """
|
||||
$> 'git-annex' 'enableremote' --debug -cremote.target1.blah=1 'target1'
|
||||
enableremote target1 ok
|
||||
|
||||
$> 'git-annex' 'enableremote' -cremote.target1.blah=1 --debug 'target1'
|
||||
[2020-02-26 14:46:47.789794028] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","show-ref","git-annex"]
|
||||
[2020-02-26 14:46:47.797917978] process done ExitSuccess
|
||||
[2020-02-26 14:46:47.798350533] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2020-02-26 14:46:47.802576899] process done ExitSuccess
|
||||
[2020-02-26 14:46:47.802884873] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","log","refs/heads/git-annex..b1ab0b11fbbc94ffd3d52adb7a0e93c3d45d8b52","--pretty=%H","-n1"]
|
||||
[2020-02-26 14:46:47.813289406] process done ExitSuccess
|
||||
[2020-02-26 14:46:47.815873454] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","cat-file","--batch"]
|
||||
[2020-02-26 14:46:47.818598891] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
|
||||
[2020-02-26 14:46:47.824657055] read: git ["config","--null","--list"]
|
||||
[2020-02-26 14:46:47.835897478] process done ExitSuccess
|
||||
enableremote target1 [2020-02-26 14:46:47.83652184] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","remote.target1.blah=1","config","remote.target1.annex-ignore","false"]
|
||||
[2020-02-26 14:46:47.842277017] process done ExitSuccess
|
||||
[2020-02-26 14:46:47.842703576] read: git ["config","--null","--list"]
|
||||
[2020-02-26 14:46:47.853478328] process done ExitSuccess
|
||||
ok
|
||||
[2020-02-26 14:46:47.855317715] process done ExitSuccess
|
||||
[2020-02-26 14:46:47.856835556] process done ExitSuccess
|
||||
|
||||
"""]]
|
||||
|
||||
I consider it a bug since options shouldn't be order dependent, and even if they were -- `--debug` is listed before `-c` in `git annex enableremote --help`:
|
||||
|
||||
[[!format sh """
|
||||
$> git annex enableremote --help
|
||||
git-annex enableremote - enables git-annex to use a remote
|
||||
|
||||
Usage: git-annex enableremote [NAME K=V ...]
|
||||
|
||||
Available options:
|
||||
--force allow actions that may lose annexed data
|
||||
-F,--fast avoid slow operations
|
||||
-q,--quiet avoid verbose output
|
||||
-v,--verbose allow verbose output (default)
|
||||
-d,--debug show debug messages
|
||||
--no-debug don't show debug messages
|
||||
-b,--backend NAME specify key-value backend to use
|
||||
-N,--numcopies NUMBER override default number of copies
|
||||
--trust REMOTE override trust setting
|
||||
--semitrust REMOTE override trust setting back to default
|
||||
--untrust REMOTE override trust setting to untrusted
|
||||
-c,--config NAME=VALUE override git configuration setting
|
||||
--user-agent NAME override default User-Agent
|
||||
--trust-glacier Trust Amazon Glacier inventory
|
||||
--notify-finish show desktop notification after transfer finishes
|
||||
--notify-start show desktop notification after transfer starts
|
||||
-h,--help Show this help text
|
||||
|
||||
For details, run: git-annex help enableremote
|
||||
|
||||
$> git annex version
|
||||
git-annex version: 7.20190819+git2-g908476a9b-1~ndall+1
|
||||
|
||||
"""]]
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
||||
|
||||
> fixed in [8.20200226-3-gc089f395b](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=c089f395b0c7d6416a3d4f2bf3211404acfd5b0e) --[[yarikoptic]]
|
|
@ -0,0 +1,24 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2020-02-27T04:20:58Z"
|
||||
content="""
|
||||
-c uses adjustGitRepo which calls changeGitRepo, which
|
||||
re-extracts the GitConfig. --debug uses changeGitConfig which
|
||||
sets annexDebug in the GitConfig, which does not survive the changeGitRepo.
|
||||
|
||||
There might be a broader problem here, as changeGitRepo is also
|
||||
called by setConfig in many parts of the code. I think it narrowly
|
||||
escapes being a problem, because by the time a command is started,
|
||||
it's already enabled debug output, and so the GitConfig being reloaded
|
||||
doesn't disable debugging.
|
||||
|
||||
Other calls to changeGitConfig could also be a problem, if followed by
|
||||
an adjustGitRepo which loses those changes. There are only a few others,
|
||||
which probably look ok, but this would be an easy gotcha to hit later.
|
||||
|
||||
So changeGitConfig needs to make a config change that persists across
|
||||
changeGitRepo.
|
||||
|
||||
Done.
|
||||
"""]]
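
The order dependence analysed in the comment above can be modelled with a toy option parser. This is a hypothetical shell sketch, not the Haskell code: `naive`, `fixed` and `want_debug` are made-up names, and "re-extracting the config" is reduced to resetting a single variable.

```shell
#!/bin/sh
# Naive model: --debug changes in-memory state, while -c reloads the
# configuration and so forgets any in-memory change made before it.
naive() {
    debug=false
    for opt in "$@"; do
        case $opt in
            --debug) debug=true ;;
            -c*)     debug=false ;;   # reload loses the earlier change
        esac
    done
    echo "debug=$debug"
}

# Fixed model: in-memory changes are remembered and re-applied after
# every reload, so they persist regardless of option order.
fixed() {
    debug=false
    want_debug=false
    for opt in "$@"; do
        case $opt in
            --debug) want_debug=true; debug=true ;;
            -c*)     debug=false; debug=$want_debug ;;   # reload, then re-apply
        esac
    done
    echo "debug=$debug"
}

naive --debug -cremote.target1.blah=1   # prints debug=false
naive -cremote.target1.blah=1 --debug   # prints debug=true
fixed --debug -cremote.target1.blah=1   # prints debug=true
```

The first two calls reproduce the reported asymmetry, and the third shows the effect of making the change persist across the reload.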
|
|
@ -0,0 +1,34 @@
|
|||
### Please describe the problem.
|
||||
|
||||
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
|
||||
I am plowing through on making git-annex available within conda-forge "natively" for Windows. For now I just took the recently built installer, the one now available from [datasets.datalad.org](http://datasets.datalad.org/datalad/packages/windows/) and built on the datalad-extensions github setup. I just extracted the git-annex components from the installer and placed them within the conda hierarchy (having installed the `posix` package with all the needed basic tools). Overall -- looks great, but:
|
||||
|
||||
[[!format sh """
|
||||
prop_view_roundtrips: FAIL (0.30s)
|
||||
*** Failed! Falsified (after 524 tests and 1 shrink):
|
||||
"a"
|
||||
MetaData (fromList [(MetaField "8",fromList [MetaValue (CurrentlySet False) "",MetaValue (CurrentlySet True) "\nD\EM",MetaValue (CurrentlySet True) "GO`!)",MetaValue (CurrentlySet False) "k\FS\CAN"]),(MetaField "dU",fromList [MetaValue (CurrentlySet True) "",MetaValue (CurrentlySet False) "\NUL44Vfm[\t",MetaValue (CurrentlySet True) "\nLMEgYc",MetaValue (CurrentlySet True) "\SO[",MetaValue (CurrentlySet True) "\FS\DC4\DLE\"3",MetaValue (CurrentlySet True) ";\f0&Wc\GS{^",MetaValue (CurrentlySet True) "D",MetaValue (CurrentlySet True) "c:"]),(MetaField "sV",fromList [MetaValue (CurrentlySet True) "",MetaValue (CurrentlySet False) "\STX8#w",MetaValue (CurrentlySet False) "\ny",MetaValue (CurrentlySet False) "\DC4qOq",MetaValue (CurrentlySet True) "\FSbqjq",MetaValue (CurrentlySet True) "T_bx%[lN",MetaValue (CurrentlySet True) "W0`",MetaValue (CurrentlySet True) "~ ueY"]),(MetaField "V",fromList [MetaValue (CurrentlySet False) "",MetaValue (CurrentlySet False) "\t\DC1~`\SOHv\DC1",MetaValue (CurrentlySet True) "\DLE3",MetaValue (CurrentlySet True) "/MZh$",MetaValue (CurrentlySet False) "0",MetaValue (CurrentlySet False) "MEulc",MetaValue (CurrentlySet True) "P5D",MetaValue (CurrentlySet True) "i|S,",MetaValue (CurrentlySet True) "x|C"])])
|
||||
True
|
||||
Use --quickcheck-replay=742853 to reproduce.
|
||||
"""]]
|
||||
|
||||
Unfortunately I cannot tell from that output what the problem could be. Please let me know if it is hard to figure out and I should provide access to such an environment (ATM that needs effort, so I do not want to spend time on it unless there is "no other way").
|
||||
|
||||
And it seems it might be a flaky test -- I started another run; it is still running, but this test did not fail
|
||||
|
||||
```
|
||||
$ grep prop_view_roundtrips git-annex-test-miniconda*.log
|
||||
git-annex-test-miniconda-2.log: prop_view_roundtrips: OK (2.51s)
|
||||
git-annex-test-miniconda.log: prop_view_roundtrips: FAIL (0.30s)
|
||||
```
|
||||
|
||||
Cheers,
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,17 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2020-10-26T16:03:40Z"
|
||||
content="""
|
||||
I can reproduce it in a windows VM running
|
||||
`git-annex test --quickcheck-replay=742853`
|
||||
|
||||
These quickcheck tests use random input, so it is not exactly flaky.
|
||||
|
||||
Does not happen with that seed on linux, so it probably involves something
|
||||
encoding specific. An area where the windows port is known to have
|
||||
extensive problems.
|
||||
|
||||
([[!commit 1b8026b2cbc8df0274082c5f08a8b4f8ca47c5c9]] was similar,
|
||||
although that was MetaField and this appears to be MetaValue.)
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2020-10-26T17:57:12Z"
|
||||
content="""
|
||||
you say \"windows port\", I say \"windows as a whole\", e.g. today's revelation (or just a comeback, if I ran into it before but forgot) to me [was the inability to have a file/directory named `con`...](https://github.com/datalad/datalad/issues/5097) - no bloody sense in how such a design decision happened and how it got dragged all the way into the flagship product of 2020.
|
||||
"""]]
|
|
@ -0,0 +1,42 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2020-10-26T18:41:27Z"
|
||||
content="""
|
||||
Hmm, this uses viewedFiles, which generates filenames
|
||||
based on the MetaValue. Note use of pathProduct, which uses
|
||||
System.FilePath.combine.
|
||||
|
||||
So, generating random ascii (including escape sequences)
|
||||
bytestrings, and passing them through decodeBS to generate FilePaths,
|
||||
and then operating on those filepaths. What could possibly go wrong.
|
||||
|
||||
And aha! I made pathProduct use System.FilePath.Windows.combine
|
||||
and was able to reproduce the test suite failure on Linux.
|
||||
|
||||
And aha again:
|
||||
|
||||
MetaValue (CurrentlySet True) "c:"
|
||||
|
||||
Which of course breaks it on windows because it wanted to generate
|
||||
something like "bar/c:/baz/a" but instead it gets "c:/bar/baz/a"
|
||||
|
||||
git-annex does replace '/' and '\' when generating these filenames.
|
||||
Not as a security measure (when the view branch is checked out, git's
|
||||
security checks apply same as any branch so it piggybacks on those),
|
||||
but to let the user build a view and successfully check it out
|
||||
when their metadata happens to include such stuff.
|
||||
|
||||
However, windows does have enough special filenames and gotchas
|
||||
that it simply does not seem to make sense for git-annex to try to work
|
||||
around them all in the view code. If a MetaValue happens to end with a
|
||||
period, or is "nul", and so the generated filename is illegal on Windows,
|
||||
it'll blow up at checkout time, and I am ok with that.
|
||||
|
||||
So I think it would make sense to also escape ':', but that's about as far
|
||||
as this should go. *Especially* because the filenames it generates need to
|
||||
roundtrip back to metadata cleanly, which is what this test case is
|
||||
testing. While I can finesse individual characters, it would be quite hard
|
||||
to make a filename w/o a trailing dot roundtrip back to one with it, for
|
||||
example.
|
||||
"""]]
|
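The roundtrip requirement discussed in the comment above can be illustrated with a small shell sketch. The `%XX` escaping and the function names `escape_value`, `unescape_value` and `roundtrips` are hypothetical, chosen only for this illustration; git-annex's view code is written in Haskell and uses its own escaping scheme.

```sh
#!/bin/sh
# Hypothetical escaping, for illustration only; this is NOT the scheme
# git-annex uses. Values containing newlines are not handled.

# Escape '%' first, so the escapes added afterwards stay unambiguous.
escape_value () {
    printf '%s\n' "$1" |
        sed -e 's/%/%25/g' -e 's|/|%2F|g' -e 's/\\/%5C/g' -e 's/:/%3A/g'
}

# Undo the escapes, decoding '%25' last.
unescape_value () {
    printf '%s\n' "$1" |
        sed -e 's/%3A/:/g' -e 's/%5C/\\/g' -e 's|%2F|/|g' -e 's/%25/%/g'
}

# A value roundtrips if unescaping its escaped form gives it back.
roundtrips () {
    test "$(unescape_value "$(escape_value "$1")")" = "$1"
}
```

Escaping single characters like this keeps the mapping reversible. As the comment notes, cases such as a trailing period or a reserved name like "nul" cannot be handled this way without losing the clean roundtrip.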
|
@ -0,0 +1,19 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 4"
|
||||
date="2021-01-22T16:44:58Z"
|
||||
content="""
|
||||
did it come back, I see
|
||||
|
||||
```
|
||||
2021-01-22T04:32:25.5012547Z prop_view_roundtrips: FAIL (0.09s)
|
||||
2021-01-22T04:32:25.5015902Z *** Failed! Falsified (after 218 tests):
|
||||
2021-01-22T04:32:25.5016251Z AssociatedFile (Just \"rdmBBP\")
|
||||
2021-01-22T04:32:25.5018130Z MetaData (fromList [(MetaField \"CkL\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet True) \"\SOH5:R9\EM\DC4\",MetaValue (CurrentlySet True) \"\STX\US\fL2\ACK|\\\r[$\",MetaValue (CurrentlySet False) \"\ETBRi\",MetaValue (CurrentlySet False) \"/\FS}\",MetaValue (CurrentlySet True) \"W\",MetaValue (CurrentlySet False) \"X=sQh\NAK^\",MetaValue (CurrentlySet False) \"l\SUB\a\"]),(MetaField \"jM\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet False) \"\FSSivk\",MetaValue (CurrentlySet True) \"J'<\SYN\STXGJP\"]),(MetaField \"V\",fromList [MetaValue (CurrentlySet False) \"\",MetaValue (CurrentlySet True) \"\n\NUL\",MetaValue (CurrentlySet True) \"\r\",MetaValue (CurrentlySet False) \"+X\",MetaValue (CurrentlySet True) \"@aN\t~c\SIy\",MetaValue (CurrentlySet False) \"K>xq\",MetaValue (CurrentlySet True) \"a:\"]),(MetaField \"W\",fromList [MetaValue (CurrentlySet True) \"0\DC4qL\",MetaValue (CurrentlySet False) \"K\",MetaValue (CurrentlySet False) \"LD\DC3<M\",MetaValue (CurrentlySet False) \"a\v\",MetaValue (CurrentlySet True) \"dO\",MetaValue (CurrentlySet True) \"w\EOT\"])])
|
||||
2021-01-22T04:32:25.5020545Z True
|
||||
2021-01-22T04:32:25.5020894Z Use --quickcheck-replay=455629 to reproduce.
|
||||
```
|
||||
|
||||
on https://github.com/datalad/git-annex/runs/1746587663?check_suite_focus=true with `8.20201129+git169-gaa07e68ed_x64`
|
||||
"""]]
|
|
@ -0,0 +1,18 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-01-22T16:59:55Z"
|
||||
content="""
|
||||
Not sure if this is really the same bug, though certainly related. These
|
||||
quickcheck tests are fuzz tests, they can find numerous bugs, that's kind of
|
||||
the point of them. In any case, posting to a closed bug report risks your
|
||||
followup being lost and deprioritises it.
|
||||
|
||||
The problem this new failure shows is that toViewPath is failing to escape the
|
||||
final character in the path in some cases. Which is not a windows-specific
|
||||
bug at all really, it could also happen with a metadata value such as "foo/"
|
||||
being set on linux. Fixed that bug.
|
||||
|
||||
Which shows the point of these quickcheck fuzz tests: To be able to catch
|
||||
lots of different bugs with a single test case.
|
||||
"""]]
|
|
@ -0,0 +1,42 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Since cron build of 20210828
|
||||
|
||||
```
|
||||
(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2021/08[master]git
|
||||
$> git grep -l 'Unable to remove all write permissions'
|
||||
cron-20210828/build-macos.yaml-403-69466103-failed/2_test-annex (crippled-tmp).txt
|
||||
cron-20210828/build-macos.yaml-403-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
|
||||
cron-20210829/build-macos.yaml-404-69466103-failed/2_test-annex (crippled-tmp).txt
|
||||
cron-20210829/build-macos.yaml-404-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
|
||||
cron-20210830/build-macos.yaml-405-69466103-failed/2_test-annex (crippled-tmp).txt
|
||||
cron-20210830/build-macos.yaml-405-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
|
||||
cron-20210831/build-macos.yaml-406-69466103-failed/2_test-annex (crippled-tmp).txt
|
||||
cron-20210831/build-macos.yaml-406-69466103-failed/test-annex (crippled-tmp)/8_Run tests.txt
|
||||
```
|
||||
|
||||
we got two test fails on a crippled FS on Mac (does not happen on linux afaik)
|
||||
|
||||
[example CI log](https://github.com/datalad/git-annex/runs/3468283573?check_suite_focus=true)
|
||||
|
||||
Both look like
|
||||
|
||||
```
|
||||
2021-08-31T02:15:42.0758760Z magic: OK (2.41s)
|
||||
2021-08-31T02:15:42.6972710Z import: FAIL (0.62s)
|
||||
2021-08-31T02:15:42.6973680Z ./Test/Framework.hs:57:
|
||||
2021-08-31T02:15:42.6974230Z import failed (transcript follows)
|
||||
2021-08-31T02:15:42.6974760Z import import1/f
|
||||
2021-08-31T02:15:42.6976570Z Unable to remove all write permissions from /Volumes/crippledfs/importtestvjfjz3/import1/f -- perhaps it has an xattr or ACL set.
|
||||
2021-08-31T02:15:42.6977430Z failed
|
||||
2021-08-31T02:15:42.6977830Z import: 1 failed
|
||||
2021-08-31T02:15:44.1985050Z reinject: OK (1.50s)
|
||||
```
|
||||
|
||||
[here is the script](https://github.com/datalad/git-annex/blob/master/.github/workflows/tools/setup_crippledfs#L24) to setup such a crippled (FAT32) FS on OSX.
|
||||
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] (provisionally, waiting on test run) --[[Joey]]
|
|
@ -0,0 +1,34 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2021-09-01T13:48:57Z"
|
||||
content="""
|
||||
Seems that mounting that way on OSX results in a FS where files are always mode
|
||||
777 and the permissions cannot be changed.
|
||||
|
||||
When I tried using git-annex on such a FS, I saw:
|
||||
|
||||
datalads-imac:x joey$ git annex init
|
||||
init
|
||||
Detected a filesystem without fifo support.
|
||||
|
||||
Disabling ssh connection caching.
|
||||
|
||||
Filesystem allows writing to files whose write bit is not set.
|
||||
|
||||
Detected a crippled filesystem.
|
||||
|
||||
And it skips the new permissions check when on a crippled filesystem.
|
||||
|
||||
But in that test run, it seems it is failing to detect a crippled
|
||||
filesystem. Both because of the failure and also the test suite does
|
||||
not even run the "v8 unlocked" tests when it detects a crippled filesystem.
|
||||
|
||||
Is the test suite running as root? Looks like probably yes. Running as
|
||||
root prevents detecting the issue that made it use a crippled FS above. And it
|
||||
seems that, when a FAT fs is mounted on OSX that way, symlinks actually work
|
||||
(!!!) so the other crippled FS tests also don't notice a problem.
|
||||
|
||||
So, the fix should be for init to also test if it can remove the write
|
||||
bits from a file, and it should try that test even when root.
|
||||
"""]]
|
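The probe proposed in the comment above, testing whether the write bits can actually be removed from a file, might look roughly like this in shell. This is only a sketch: the function name `can_remove_write_bits` and the probe file name are assumptions made for the illustration, and git-annex's own check is implemented in Haskell and may differ in detail.

```sh
#!/bin/sh
# Sketch of a write-bit probe; not git-annex's actual implementation.
# Returns 0 if write bits can be removed from a file in directory $1,
# 1 if they remain set, 2 if the probe file could not be created.
can_remove_write_bits () {
    probe="$1/.write-bit-probe.$$"
    : > "$probe" || return 2
    chmod a-w "$probe" 2>/dev/null
    # Inspect the mode string rather than using `test -w`, which
    # reports true for root regardless of the mode bits.
    mode=$(ls -ld "$probe" | cut -c1-10)
    rm -f "$probe"
    case "$mode" in
        *w*) return 1 ;;
        *)   return 0 ;;
    esac
}
```

On a filesystem where files always show mode 777, as described above, the mode string still contains `w` after the `chmod`, so the probe reports failure even when run as root.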
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 2"
|
||||
date="2021-09-01T16:22:22Z"
|
||||
content="""
|
||||
> Is the test suite running as root? Looks like probably yes.
|
||||
|
||||
FWIW, it is a `runner` user [ref](https://github.com/datalad/git-annex/pull/76/checks?check_run_id=3486443350#step:8:1) (did in a temp [PR](https://github.com/datalad/git-annex/pull/76)) who is not `root` but is part of the `admin` group thus with super privileges indeed (that is why I guess we can also use `hdiutil` directly to mount that crippled FS).
|
||||
"""]]
|
|
@ -0,0 +1,7 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2021-09-02T16:22:56Z"
|
||||
content="""
|
||||
OSX test is still failing after that fix, reopened.
|
||||
"""]]
|
|
@ -0,0 +1,27 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2021-09-02T16:26:14Z"
|
||||
content="""
|
||||
My prior analysis seems right, as far as it running as root would go, but it is
|
||||
not running as root. So I missed something.
|
||||
|
||||
The test failures are both of `git-annex import`.
|
||||
Otherwise locking down files does succeed. The difference with import
|
||||
must be that the file is located in a directory outside the repository.
|
||||
|
||||
Aha... The test suite is being run with TEMPDIR set to the crippled FS,
|
||||
but `.t` is in another, non-crippled FS. A very smart idea to test that,
|
||||
although I think this import test is the only one that actually uses
|
||||
TEMPDIR. (Reading the workflow file, I think it was maybe expected that
|
||||
all the tests would run in TEMPDIR, but they don't; `git-annex test`
|
||||
writes to `./.t`, other than this one test.)
|
||||
|
||||
When the import directory is on a crippled FS, and the repo
|
||||
is not, it will think the FS is not crippled. Then it fails
|
||||
to remove write perms from the file while it is in the import
|
||||
directory, and the perm check then fails.
|
||||
|
||||
So, I think it should skip the perm check when doing the initial lockdown
|
||||
of the file it's going to import.
|
||||
"""]]
|
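The situation described in the comment above, where TEMPDIR and `./.t` live on different filesystems, can be checked with a short shell sketch. The function names `mount_point_of` and `same_filesystem` are hypothetical helpers for this illustration, and the parsing assumes that neither the device name nor the mount point contains spaces.

```sh
#!/bin/sh
# Print the mount point of the filesystem containing path $1.
# Assumes POSIX `df -P` output with no spaces in the device name
# or the mount point.
mount_point_of () {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeed if both paths live on the same mounted filesystem.
same_filesystem () {
    test "$(mount_point_of "$1")" = "$(mount_point_of "$2")"
}
```

For example, `same_filesystem "$TEMPDIR" .` run from the directory where `git-annex test` is started would be expected to fail in the CI setup described above, since only TEMPDIR is on the crippled FS.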
|
@ -0,0 +1,7 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 5"""
|
||||
date="2021-09-02T17:39:20Z"
|
||||
content="""
|
||||
Ok, fixed some more, hopefully all the way this time..
|
||||
"""]]
|
|
@ -0,0 +1,66 @@
|
|||
### Please describe the problem.
|
||||
|
||||
There was some recent work to ["centralize" such prompts](https://git-annex.branchable.com/devblog/day_457__improved_ssh_password_prompting/) but it seems some are still "leaking through" multiple times. Maybe it is because there are 2 available repos on that remote host, so annex generates one prompt for each of those? (although it knows only about origin)
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
6.20170810+gitgff6f9e203-1~ndall+1
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
[[!format sh """
|
||||
|
||||
$> git annex get -J5 .
|
||||
get R042/R042-2013-08-16/R042-2013-08-16-CSC01a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC02a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC03a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC05a.ncs get R042/R042-2013-08-16/R042-2013-08-16-CSC04a.ncs (from datalad-archives...)
|
||||
(from datalad-archives...) (from datalad-archives...)
|
||||
|
||||
(from datalad-archives...)
|
||||
(from datalad-archives...)
|
||||
[ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
|
||||
| Unable to access these remotes: origin
|
||||
|
|
||||
| Try making some of these repositories available:
|
||||
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
|
||||
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
|
||||
| failed
|
||||
| err=git-annex: get: 1 failed
|
||||
|
|
||||
[ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
|
||||
| Unable to access these remotes: origin
|
||||
|
|
||||
| Try making some of these repositories available:
|
||||
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
|
||||
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
|
||||
| failed
|
||||
| err=git-annex: get: 1 failed
|
||||
|
|
||||
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--0501aab6b4d1ce0565921728bc92ef74f81edf0d7bcd5a77946ca58f977f2537.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
|
||||
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--8b3b08310db20ca7e3e784a21f935a78f8669efdf1396168596411f1e355e43b.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
|
||||
(from origin...) (from origin...) [ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
|
||||
| Unable to access these remotes: origin
|
||||
|
|
||||
| Try making some of these repositories available:
|
||||
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
|
||||
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
|
||||
| failed
|
||||
| err=git-annex: get: 1 failed
|
||||
|
|
||||
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--08ce5a67c7fc09f02b994a3987812a75727eaf51f3e70fa7e1030dae934f9fbc.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
|
||||
(from origin...) [ERROR] Failed to run ['git-annex', 'get', '--key', 'MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip'] under '/mnt/btrfs/datasets/datalad/crawl/workshops/mind-2017/MotivationalT'. Exit code=1. out=get MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip (transfer already in progress, or unable to take transfer lock)
|
||||
| Unable to access these remotes: origin
|
||||
|
|
||||
| Try making some of these repositories available:
|
||||
| aaa0bc14-51fc-45c8-81c2-76dff067755b -- mvdm@atlantis.hpcc.dartmouth.edu:~/data/mind-2017/MotivationalT_
|
||||
| f7f97046-ea49-4af1-9f5a-8475a5ea1e0a -- yhalchen@atlantis.hpcc.dartmouth.edu:~/mind-2017/MotivationalT [origin]
|
||||
| failed
|
||||
| err=git-annex: get: 1 failed
|
||||
|
|
||||
[ERROR] Failed to fetch any archive containing SHA256E-s17136940--bc145f07c79584181cad3763a763a2ea047282bd41153d20a63d85a44fb27a7f.ncs. Tried: ['MD5E-s237624713--dbdc4079b005b8b7f1549e00647b36d6.zip']
|
||||
(from origin...) yhalchen@discovery.dartmouth.edu's password: yhalchen@discovery.dartmouth.edu's password:
|
||||
|
||||
"""]]
|
||||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> warning added; [[done]] --[[Joey]]
|
|
@ -0,0 +1,34 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 1"
|
||||
date="2017-08-12T04:08:04Z"
|
||||
content="""
|
||||
ha -- if I do specify `--from=origin` -- only 1 prompt
|
||||
|
||||
[[!format sh \"\"\"
|
||||
$> git annex get -J5 --from=origin .
|
||||
get R042/R042-2013-08-16/R042-2013-08-16-CSC01a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC03a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC02a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC04a.ncs (from origin...) get R042/R042-2013-08-16/R042-2013-08-16-CSC05a.ncs (from origin...) yhalchen@discovery.dartmouth.edu's password:
|
||||
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
|
||||
|
||||
Unable to run git-annex-shell on remote .
|
||||
|
||||
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
|
||||
|
||||
Unable to run git-annex-shell on remote .
|
||||
|
||||
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
|
||||
|
||||
Unable to run git-annex-shell on remote .
|
||||
|
||||
git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
|
||||
|
||||
Unable to run git-annex-shell on remote .
|
||||
|
||||
SHA256E-s17136940--08ce5a67c7fc09f02b994a3987812a75727eaf51f3e70fa7e1030dae934f9fbc.ncs
|
||||
0 0% 0.00kB/s 0:00:00 SHA256E-s17136940--bc145f07c79584181cad3763a763a2ea047282bd41153d20a63d85a44fb27a7f.ncs
|
||||
0 0% 0.00kB/s 0:00:00 SHA256E-s17136940--c3a8af948c77a2df422eae50807a6e7e6e5db7a3451a562bca529d3f1a1a234f.ncs
|
||||
0 0% 0.00kB/s 0:00:00 git-annex-shell: git: createProcess: runInteractiveProcess: chdir: does not exist (No such file or directory)
|
||||
|
||||
\"\"\"]]
|
||||
"""]]
|
|
@ -0,0 +1,25 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2017-08-15T16:43:07Z"
|
||||
content="""
|
||||
The improvements around ssh password prompting require ssh connection
|
||||
caching to work. If a ssh connection fails because the wrong password is
|
||||
entered or because there's no usable tty or whatever, there's no cached
|
||||
ssh connection to reuse, so the next attempt to access that host will
|
||||
result in another password prompt.
|
||||
|
||||
Also, datalad does not seem to be running git-annex with -J. So it *can't*
|
||||
be trying to make two ssh connections at the same time. My recent work on
|
||||
ssh password prompting was mostly to fix cases where git-annex is run with
|
||||
-J.
|
||||
|
||||
It's also possible that some ssh configuration that I don't know of could
|
||||
make ssh password prompt even when git-annex is running it with
|
||||
`BatchMode=true` to avoid password prompts (in order to test if the ssh
|
||||
connection is already up). That would then result in two ssh password
|
||||
prompts, one after the other, which seems to match your transcript.
|
||||
|
||||
If you have only one remote, specifying `--from=origin` won't change
|
||||
anything. Entering the right password would change something there though..
|
||||
"""]]
|
|
@ -0,0 +1,43 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="the issue persists"
|
||||
date="2019-11-01T18:12:27Z"
|
||||
content="""
|
||||
Ran into the same problem again, and it is not clear to me whether connection caching is enabled or not (and why?):
|
||||
|
||||
[[!format sh \"\"\"
|
||||
[d31548v@discovery7 bids]$ git -c annex.sshcaching=true annex --debug get -J2 --from=origin sub-sid000005
|
||||
[2019-11-01 14:10:56.178577] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"git-annex\"]
|
||||
[2019-11-01 14:10:56.475956] process done ExitSuccess
|
||||
[2019-11-01 14:10:56.47622] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
|
||||
[2019-11-01 14:10:56.836271] process done ExitSuccess
|
||||
[2019-11-01 14:10:56.865928] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"log\",\"refs/heads/git-annex..8a694d5c54eb81b1e5c5446fa63bdcd13daa34b3\",\"--pretty=%H\",\"-n1\"]
|
||||
[2019-11-01 14:10:57.229787] process done ExitSuccess
|
||||
[2019-11-01 14:10:57.234655] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
|
||||
[2019-11-01 14:10:57.23592] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
|
||||
[2019-11-01 14:10:57.546203] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"symbolic-ref\",\"-q\",\"HEAD\"]
|
||||
[2019-11-01 14:10:57.780246] process done ExitSuccess
|
||||
[2019-11-01 14:10:57.780454] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"refs/heads/master\"]
|
||||
[2019-11-01 14:10:58.097345] process done ExitSuccess
|
||||
[2019-11-01 14:10:58.09754] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"ls-files\",\"--cached\",\"-z\",\"--\",\"sub-sid000005\"]
|
||||
[2019-11-01 14:10:58.298181] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
|
||||
[2019-11-01 14:10:58.29998] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
|
||||
[2019-11-01 14:10:58.305022] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
|
||||
[2019-11-01 14:10:58.306024] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
|
||||
[2019-11-01 14:10:58.62005] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
|
||||
[2019-11-01 14:10:58.621714] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
|
||||
[2019-11-01 14:10:58.632596] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
|
||||
[2019-11-01 14:10:58.6338] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"]
|
||||
get sub-sid000005/ses-actions1/fmap/sub-sid000005_ses-actions1_acq-25mm_magnitude2.nii.gz get sub-sid000005/ses-actions1/fmap/sub-sid000005_ses-actions1_acq-25mm_magnitude1.nii.gz (from origin...) (from origin...)
|
||||
[2019-11-01 14:10:59.489719] chat: ssh [\"yohtest@rolando.cns.dartmouth.edu\",\"-T\",\"git-annex-shell 'p2pstdio' '/inbox/BIDS/Haxby/Sam/1021_actions' '--debug' 'fd3f7af9-cf7d-4d7e-8efd-30e6bedf838d' --uuid d839134c-3afe-4456-920a-e280ce0fdf2a\"]
|
||||
|
||||
[2019-11-01 14:10:59.553029] chat: ssh [\"yohtest@rolando.cns.dartmouth.edu\",\"-T\",\"git-annex-shell 'p2pstdio' '/inbox/BIDS/Haxby/Sam/1021_actions' '--debug' 'fd3f7af9-cf7d-4d7e-8efd-30e6bedf838d' --uuid d839134c-3afe-4456-920a-e280ce0fdf2a\"]
|
||||
yohtest@rolando.cns.dartmouth.edu's password: yohtest@rolando.cns.dartmouth.edu's password:
|
||||
|
||||
|
||||
[d31548v@discovery7 bids]$ git annex version
|
||||
git-annex version: 7.20191024-g6dc2272
|
||||
\"\"\"]]
|
||||
Could you hint me on what/where to dig?
|
||||
"""]]
|
|
@ -0,0 +1,21 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2020-01-23T15:51:46Z"
|
||||
content="""
|
||||
I notice that debug output has no BatchMode=true in any ssh call. But
|
||||
the version of git-annex you show always runs ssh with that when
|
||||
-J is used, unless sshcaching is disabled.
|
||||
|
||||
More evidence that sshcaching is disabled in your transcript is that when
|
||||
it does run ssh, it does not pass -S.
|
||||
|
||||
I think the repository must be on a crippled filesystem, on which
|
||||
git-annex can't do ssh connection caching, because the filesystem
|
||||
does not support unix sockets. (Or it potentially could be crippled in some
|
||||
other way.) So it ignores the annex.sshcaching setting.
|
||||
You could work around this by setting the (undocumented)
|
||||
GIT_ANNEX_TMP_DIR to some temporary directory on a non-crippled filesystem.
|
||||
|
||||
I'm going to add a warning message in this situation.
|
||||
"""]]
|
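The two signs mentioned in the comment above (no `BatchMode=true` and no `-S` in any ssh call) can be looked for mechanically in a `--debug` transcript. The following is a rough sketch; the function name `ssh_caching_in_use` is hypothetical, and it assumes the debug output prints each ssh argument as a separate quoted string, as in the transcript earlier in this thread.

```sh
#!/bin/sh
# Reads `git annex --debug` output on stdin. Succeeds if any ssh
# invocation in it carries BatchMode=true or a "-S" option, which
# would suggest that ssh connection caching is active.
ssh_caching_in_use () {
    grep 'ssh \[' | grep -q -e 'BatchMode=true' -e '"-S"'
}
```

It could be used as `git annex --debug get -J2 --from=origin sub-sid000005 2>&1 | ssh_caching_in_use`. If that fails, one thing to try is the workaround suggested above: point `GIT_ANNEX_TMP_DIR` at a directory on a non-crippled filesystem and rerun.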
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 5"
|
||||
date="2020-01-23T17:51:58Z"
|
||||
content="""
|
||||
Thank you Joey! I can only confirm that the file system was likely a crippled/NFS one... So we would likely need to do some sensing on DataLad side and instruct git-annex. Will continue on our end at https://github.com/datalad/datalad/issues/4075
|
||||
"""]]
|
|
@ -0,0 +1,79 @@
|
|||
In datalad test builds with git-annex 7.20191114+git43-ge29663773, one
|
||||
of the new test failures is due to an unexpectedly dirty repository
|
||||
([related datalad issue][0]). The dirty status comes from a file that
|
||||
was tracked in Git switching over to an annex pointer file. Here's a
|
||||
script that distills enough of the test to trigger the failure on my
|
||||
end.
|
||||
|
||||
[[!format sh """
|
||||
#!/bin/sh
|
||||
|
||||
set -eu
|
||||
|
||||
assert_clean () {
|
||||
if test -n "$(git status --porcelain)"
|
||||
then
|
||||
printf "\n\nUnexpectedly dirty:\n" >&2
|
||||
git status >&2
|
||||
git diff >&2
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
cd "$(mktemp -d --tmpdir gx-pointer-dirty-XXXXXXX)"
|
||||
git init && git annex init
|
||||
|
||||
printf content-git >file-git
|
||||
git -c annex.largefiles=nothing annex add -- file-git
|
||||
git commit -m'file-git added'
|
||||
assert_clean
|
||||
|
||||
printf content-annex >file-annex
|
||||
git -c annex.largefiles=anything annex add -- file-annex
|
||||
git commit -m'file-annex annexed'
|
||||
assert_clean
|
||||
"""]]
|
||||
|
||||
On Travis as well as my local machine, the failure is intermittent,
|
||||
but seems to happen much more often than not. In the failing case,
|
||||
the last assert_clean call shows:
|
||||
|
||||
```
|
||||
Unexpectedly dirty:
|
||||
On branch master
|
||||
Changes not staged for commit:
|
||||
modified: file-git
|
||||
|
||||
no changes added to commit
|
||||
diff --git a/file-git b/file-git
|
||||
index d1c416a..b41ca32 100644
|
||||
--- a/file-git
|
||||
+++ b/file-git
|
||||
@@ -1 +1 @@
|
||||
-content-git
|
||||
\ No newline at end of file
|
||||
+/annex/objects/SHA256E-s11--726732d25826965592478fcc7c145d5a10fa1aa70c49fe3a4f847174b6d8889c
|
||||
```
|
||||
|
||||
I see the failure with git-annex built from the latest master
|
||||
b962471c2 (2019-12-12). Bisecting against the git-annex repo (with a
|
||||
commit being marked "bad" if there was a failure within ten runs of the
|
||||
above script), points to ec08b66bd (shouldAnnex: check isInodeKnown,
|
||||
2019-10-23) as the first bad commit. Just looking at the topic of
|
||||
the commit, that result seems plausible to me.
|
||||
|
||||
### Other details
|
||||
|
||||
My git version is 2.24.1 and locally I'm building git-annex through guix.
|
||||
On the failing Travis run, git-annex 7.20191114+git43-ge29663773 came
|
||||
from neurodebian, and the git version was 2.24.0.
|
||||
|
||||
Hopefully the script above is sufficient to trigger the issue on your end.
|
||||
Thanks for having a look.
|
||||
|
||||
[0]: https://github.com/datalad/datalad/issues/3890
|
||||
|
||||
[[!meta author=kyle]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
|
@ -0,0 +1,97 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 1"""
|
||||
date="2019-12-26T16:56:38Z"
|
||||
content="""
|
||||
The title makes it sound like a work tree file gets replaced with a
|
||||
dangling pointer file, which is not the case. A worktree file that was
|
||||
not annexed is being added to the annex, if you choose to commit that
|
||||
state.
|
||||
|
||||
For whatever reason, git becomes confused about whether this file is
|
||||
modified. I seem to recall that git distrusts information it recorded in
|
||||
its own index if the mtime of the index file is too close to the
|
||||
mtime recorded inside it, or something like that. (Likely as a
|
||||
workaround for mtime granularity issues with various filesystems.) Whatever
|
||||
the reason, git-annex is not involved in it; it will happen sometimes even
|
||||
when git-annex has not initialized the repo and is not being used.
|
||||
|
||||
It's not normally a problem that git gets confused or distrusts its
|
||||
index or whatever, since all it does is stat the file, or
|
||||
feed it through the clean filter again, and if the file is not
|
||||
modified, nothing changes.
|
||||
|
||||
Why does the clean filter decide to add the file to annex in this case?
|
||||
|
||||
Well, because this is all happening inside this:
|
||||
|
||||
git -c annex.largefiles=anything annex add -- file-annex
|
||||
|
||||
And there you've told it to add all files to the annex with
|
||||
annex.largefiles=anything. So it does.
|
||||
|
||||
To complete the description of what happens:
|
||||
`git-annex add` runs `git add` on the `file-annex` symlink it's adding.
|
||||
`git add file-annex`, for whatever reason, decides to run the clean filter on
|
||||
file-git.
|
||||
The annex.largefiles=anything gets inherited through this chain of calls.
|
||||
|
||||
While the resulting "change" does not get staged by `git add`
|
||||
(it was never asked to operate on that file), the clean filter
|
||||
duly ingests the content into the annex, and remembers its inode.
|
||||
So when the clean filter later gets run by `git status`, it sees an inode
|
||||
it knows it saw before, and assumes it should remain annexed.
|
||||
(This is why the commit that checks for known inodes was fingered by the
|
||||
bisection.)
|
||||
|
||||
---
|
||||
|
||||
Note that, you can accomplish the same thing without setting
|
||||
annex.largefiles, assuming a current version of git-annex:
|
||||
|
||||
git add file-git
|
||||
git annex add file-annex
|
||||
|
||||
I think the only reason for setting annex.largefiles in either of the two
|
||||
places you did is if there's a default value that you want to
|
||||
temporarily override?
|
||||
|
||||
----
|
||||
|
||||
Also, just touching file-git before the annex.largefiles=anything
|
||||
operation causes the same problem, again git-annex add runs git add
|
||||
file-annex, which runs the clean filter on file-git, which this time
|
||||
is legitimately modified.
|
||||
|
||||
---
|
||||
|
||||
Possible ways to improve this short of improving git's behavior:
|
||||
|
||||
`git annex` could set annex.gitaddtoannex=false when it runs `git add`.
|
||||
Since git-annex never relies on `git add` adding files to the annex,
|
||||
that seems entirely safe to always do (perhaps even when running all git
|
||||
commands aside from git-annex commands of course). But, that would
|
||||
not help with a variant where rather than `git-annex add`,
|
||||
this is run:
|
||||
|
||||
git -c annex.largefiles=anything add file-annex
|
||||
|
||||
The clean filter could delay long enough that git stops distrusting
|
||||
its index based on timestamps. A 1 second sleep if the file's mtime
|
||||
is too close to the current time works; I prototyped a patch doing that.
|
||||
But, that does not deal with the case
|
||||
mentioned above where file-git gets touched or legitimately modified.
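
The delay idea can be sketched roughly as follows. This is only an illustration, not the prototyped patch: the helper name `delay_if_recently_modified`, the injectable `now`/`sleep` parameters, and the 1 second window are all assumptions made for the example.

```python
import os
import time

# Assumed window, taken from the "1 second sleep" mentioned above.
RACY_WINDOW_SECONDS = 1.0


def delay_if_recently_modified(path, now=time.time, sleep=time.sleep,
                               window=RACY_WINDOW_SECONDS):
    # Hypothetical helper: if the file's mtime is closer to the current
    # time than the window, sleep for the window so that a later index
    # write gets a clearly newer timestamp. Returns the seconds slept.
    age = now() - os.stat(path).st_mtime
    if age < window:
        sleep(window)
        return window
    return 0.0
```

As noted, a delay like this only addresses the timestamp race. It does nothing when file-git has really been touched or modified.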
|
||||
|
||||
The clean filter could check if the file is already
|
||||
in the index but is not annexed, and avoid converting it to annexed.
|
||||
But that would prevent legitimate conversions from git to annexed
|
||||
as well, which rely on the same kind of use of annex.largefiles.
|
||||
|
||||
Temporary overrides of annex.largefiles could be ignored by the clean
|
||||
filter. Same problem as previous.
|
||||
|
||||
So, I think that fixing this will involve adding a new interface for
|
||||
converting between git and annexed files that does not involve
|
||||
-c annex.largefiles. That plus having the clean filter check for
|
||||
non-annexed files seems like the best approach.
|
||||
"""]]
|
|
@ -0,0 +1,42 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2019-12-27T06:22:23Z"
|
||||
content="""
|
||||
On second thought, making the clean filter check for non-annexed files
|
||||
would prevent use cases like annex.largefiles=largerthan(100kb)
|
||||
from working as the user intended and letting a small file start out
|
||||
non-annexed and get annexed once it gets too large. Users certainly rely on
|
||||
that and this bug that only affects an edge case does not justify breaking
|
||||
that.
|
||||
|
||||
What would work is to make the clean filter detect when a file's content
|
||||
has not changed, though its mtime (or inode) has changed. In that case,
|
||||
it's reasonable for the clean filter to ignore annex.largefiles and keep
|
||||
the content represented in git however it already was (non-annexed or
|
||||
annexed).
|
||||
|
||||
To detect that, in the case where the file in the index is not annexed:
|
||||
First check if the file size is the same as the
|
||||
size in the index. If it is, run git hash-object on the file, and see if
|
||||
the sha1 is the same as in the index. This avoids hashing any unusually
|
||||
large files, so the clean filter only gets a bit slower.
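
A minimal sketch of that check, for illustration only. It assumes a SHA-1 repository and that no other content filter (such as end-of-line conversion) applies to the file. The function names and the `index_size`/`index_sha1` inputs are hypothetical; how they are read from the index is not shown. Rather than running git hash-object, it computes the same blob hash directly.

```python
import hashlib
import os


def git_blob_sha1(data):
    # git hashes a blob as SHA-1 over the header "blob <size>" followed
    # by a NUL byte, then the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_unchanged(path, index_size, index_sha1):
    # Cheap check first: a different size means the content changed,
    # so the file never needs to be read or hashed.
    if os.path.getsize(path) != index_size:
        return False
    # Same size: hash the content and compare with the index entry.
    with open(path, "rb") as f:
        return git_blob_sha1(f.read()) == index_sha1
```

When running the real command instead, `git hash-object --no-filters` is probably what is wanted, since otherwise hashing a path can invoke the clean filter again.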
|
||||
|
||||
And when the file in the index is annexed, check if the file size is the
|
||||
same as the size of the annexed key. If it is, verify if the file content
|
||||
matches the key (typically by hashing). Cases where keys lack size or
|
||||
don't use a checksum could lead to false positives or negatives though.
|
||||
Although, I've not managed to find a version of this bug that makes an
|
||||
annexed file get converted to git unintentionally, so maybe this part does
|
||||
not need to be done?
|
||||
|
||||
----
|
||||
|
||||
Or.. Since the root of the problem is temporarily overriding annex.largefiles,
|
||||
it could just be documented that it's not a good idea to use
|
||||
-c annex.largefiles=anything/nothing, because such broad overrides
|
||||
can affect other files than the ones you intended.
|
||||
(And since the documented methods of converting files from annexed to git and
|
||||
git to annexed use such overrides, that documentation would need to be
|
||||
changed.)
|
||||
"""]]
|
|
@ -0,0 +1,16 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2019-12-27T17:11:42Z"
|
||||
content="""
|
||||
A variant of this where an annexed unlocked file is added first,
|
||||
then the file is touched, and then some other file is added
|
||||
with -c annex.largefiles=nothing does result in the clean filter sending
|
||||
the whole annexed file content back to git, rather than keeping it annexed.
|
||||
For whatever reason, git does not store that content in .git/objects or
|
||||
update the index for that file though, so it doesn't show up as a change.
|
||||
|
||||
So *apparently* that variant is only potentially an expensive cat of a
|
||||
large annexed file, and does not need to be dealt with. Unless git
|
||||
sometimes behaves otherwise.
|
||||
"""]]
|
|
@ -0,0 +1,45 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2019-12-27T18:41:12Z"
|
||||
content="""
|
||||
It's almost possible to get the same unwanted conversion without any git
|
||||
races:
|
||||
|
||||
echo content-git > file-git
|
||||
sleep 2
|
||||
git add file-git
|
||||
git commit -m add
|
||||
|
||||
echo foo > file-git
|
||||
echo content-annex > file-annex
|
||||
git -c annex.largefiles=anything annex add file-annex
|
||||
|
||||
In this case, git currently does not run the modified file-git through the
|
||||
clean filter in the last line, so the annex.largefiles=anything doesn't
|
||||
affect it.
|
||||
|
||||
But, as far as I can see, there's nothing preventing a future version
|
||||
of git from deciding it does want to run file-git through the clean filter
|
||||
in this case.
|
||||
|
||||
I am not going to try to guard against such a thing happening.
|
||||
As far as I can see, anything that the clean filter can possibly do to
|
||||
avoid such a situation will cripple existing use cases of
|
||||
annex.largefiles, like largerthan() as mentioned above.
|
||||
The user has told git-annex to annex "anything", and if git
|
||||
decides to run the clean filter while that is in effect, caveat emptor.
|
||||
|
||||
Which is not to say I'm not going to fix the specific case this bug was
|
||||
filed about. I actually have a fix developed now. But just to say that
|
||||
setting annex.largefiles=anything/nothing temporarily is a blunt instrument,
|
||||
and you risk accidental conversion when using it, and so it would be a good
|
||||
idea to not do that.
|
||||
|
||||
One idea: Make `git-annex add --annex` and `git-annex add --git`
|
||||
add a specific file to annex or git, bypassing annex.largefiles and all
|
||||
other configuration and state. This could also be used to easily switch
|
||||
a file from one storage to the other. I'd hope the existence of that
|
||||
would prevent one-off setting of annex.largefiles=anything/nothing.
|
||||
[[todo/git_annex_add_option_to_control_to_where]]
|
||||
"""]]
|
|
@ -0,0 +1,58 @@
|
|||
[[!comment format=mdwn
|
||||
username="kyle"
|
||||
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
|
||||
subject="comment 5"
|
||||
date="2019-12-28T21:06:46Z"
|
||||
content="""
|
||||
Thanks for the explanation and the fix.
|
||||
|
||||
> For whatever reason, git becomes confused about whether this file is
|
||||
> modified. I seem to recall that git distrusts information it recorded in
|
||||
> its own index if the mtime of the index file is too close to the
|
||||
> mtime recorded inside it, or something like that.
|
||||
|
||||
I see. I think the problem and associated workaround you're referring
|
||||
to is described in git's Documentation/technical/racy-git.txt.
|
||||
|
||||
> Note that, you can accomplish the same thing without setting
|
||||
> annex.largefiles, assuming a current version of git-annex:
|
||||
>
|
||||
> git add file-git
|
||||
> git annex add file-annex
|
||||
>
|
||||
> I think the only reason for setting annex.largefiles in either of the two
|
||||
> places you did is if there's a default value that you want to
|
||||
> temporarily override?
|
||||
|
||||
Right. DataLad's methods that are responsible for calling out to `git
|
||||
annex add` have a `git={None,False,True}` parameter. By default
|
||||
(`None`), DataLad just calls `git annex add ...` and lets any
|
||||
configuration in the repo control whether the file goes to git or is
|
||||
annexed. But with `git=True` or `git=False`, the `annex add` call
|
||||
includes a `-c annex.largefiles=` argument with a value of `nothing`
|
||||
or `anything`, respectively.
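
To make that mapping concrete, here is a rough sketch. It is not DataLad's actual code. `annex_add_command` is a hypothetical helper that only builds the argument list described above.

```python
def annex_add_command(paths, git=None):
    # Hypothetical helper, not DataLad's implementation.
    # git=None  -> no override; the repo's configuration decides
    # git=True  -> annex.largefiles=nothing  (file goes to git)
    # git=False -> annex.largefiles=anything (file gets annexed)
    cmd = ["git"]
    if git is True:
        cmd += ["-c", "annex.largefiles=nothing"]
    elif git is False:
        cmd += ["-c", "annex.largefiles=anything"]
    cmd += ["annex", "add", "--"] + list(paths)
    return cmd
```

The override is attached to the whole `git` invocation, which is why it can reach files other than the ones listed in `paths`.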
|
||||
|
||||
> But just to say that setting annex.largefiles=anything/nothing
|
||||
> temporarily is a blunt instrument, and you risk accidental
|
||||
> conversion when using it, and so it would be a good idea to not do
|
||||
> that.
|
||||
|
||||
Noted. As mentioned above, DataLad's default behavior is to honor the
|
||||
repo's `annex.largefiles` configuration. And the documentation for
|
||||
`datalad save`, DataLad's main user-facing entry point for `annex
|
||||
add`, recommends that the user configure .gitattributes rather than
|
||||
using the option that leads to calling `annex add` with `-c
|
||||
annex.largefiles=nothing`.
|
||||
|
||||
> One idea: Make `git-annex add --annex` and `git-annex add --git`
|
||||
> add a specific file to annex or git, bypassing annex.largefiles and all
|
||||
> other configuration and state. This could also be used to easily switch
|
||||
> a file from one storage to the other. I'd hope the existence of that
|
||||
> would prevent one-off setting of annex.largefiles=anything/nothing.
|
||||
|
||||
As far as I can see, those flags would completely cover DataLad's
|
||||
one-off setting of `annex.largefiles=anything/nothing`. They map
|
||||
directly to DataLad's `git=False/True` option described above. So,
|
||||
from DataLad's perspective, they'd be very useful and welcome.
|
||||
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 6"""
|
||||
date="2020-01-01T17:41:13Z"
|
||||
content="""
|
||||
I've added git-annex add --force-large and --force-small, which would be
|
||||
good to use to avoid this kind of too-broad overriding problem in the future.
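
As a rough illustration of how a caller might use these options in place of a temporary annex.largefiles override: `annex_add_forced` is a hypothetical helper, and it assumes that --force-large sends the listed files to the annex and --force-small sends them to git.

```python
def annex_add_forced(paths, git=None):
    # Hypothetical helper that only builds an argument list.
    # git=None  -> plain add; the repo's configuration decides
    # git=True  -> --force-small (assumed: store the files in git)
    # git=False -> --force-large (assumed: store the files in the annex)
    cmd = ["git", "annex", "add"]
    if git is True:
        cmd.append("--force-small")
    elif git is False:
        cmd.append("--force-large")
    return cmd + ["--"] + list(paths)
```

Since no `-c annex.largefiles=...` is set, there is no too-broad override for the clean filter to pick up when git runs it on other files.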
|
||||
"""]]
|