merging sqlite and bs branches

Since the sqlite branch uses blobs extensively, there are some
performance benefits: ByteStrings now get stored and retrieved w/o
conversion in some cases, as in Database.Export.
Joey Hess 2019-12-06 15:17:54 -04:00
commit 2f9a80d803
266 changed files with 2860 additions and 1325 deletions

@ -0,0 +1,123 @@
### Please describe the problem.
git-annex fails to add a file to the repository because of a permission problem (probably faulty permission handling in WSL). Interestingly, it is possible to add the file anyway by executing `git annex add` twice. Unfortunately, files added this way are writable when they shouldn't be.
It's probably outside the scope of git-annex development, but I think it's good to keep track of the problem.
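A minimal sketch of one way to check for the writable state described above, by looking at permission bits rather than trying to write. The helper name `writable_annex_objects` is hypothetical, and the sketch assumes that annexed content lives under `.git/annex/objects` and is expected to be read-only for locked files; on a filesystem that does not report real permission bits the result may not be meaningful.

```shell
# Hypothetical helper (not part of git-annex): list annexed object files
# that still have a write bit set.
# Assumption: object content lives under .git/annex/objects and is meant
# to be read-only for locked files.
writable_annex_objects() {
    repo=${1:-.}
    # -perm -NNN matches when all the given bits are set, so test the
    # owner, group and other write bits one at a time.
    find "$repo/.git/annex/objects" -type f \
        \( -perm -200 -o -perm -020 -o -perm -002 \) 2>/dev/null
}
```

Running `writable_annex_objects .` in the test repository prints one path per writable object file and prints nothing when every object file is read-only.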
### What steps will reproduce the problem?
```
cd /mnt/c
git init test
cd test
git annex init test
init test
touch file
git annex add file
```
### What version of git-annex are you using? On what operating system?
Windows 10 Pro version 1909 build 18363.476 - WSL (Arch)
```
git-annex version: 7.20191114-ga95efcbc55
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.3 feed-1.2.0.1 ghc-8.6.5 http-client-0.6.4 persistent-sqlite-2.10.5 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
```
### Please provide any additional information below.
[[!format sh """
$ git annex init test
init test
Detected a filesystem without fifo support.
Disabling ssh connection caching.
(scanning for unlocked files...)
ok
(recording state in git...)
$ touch file
$ git annex add file --debug
[2019-11-28 11:52:53.048398] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:52:53.0570463] process done ExitSuccess
[2019-11-28 11:52:53.0573639] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:52:53.0656397] process done ExitFailure 1
[2019-11-28 11:52:53.0660529] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file"]
[2019-11-28 11:52:53.0742999] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2019-11-28 11:52:53.0822627] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:52:53.0853736] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
add file [2019-11-28 11:52:53.0949002] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:52:53.1027361] process done ExitSuccess
[2019-11-28 11:52:53.1030132] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:52:53.1122577] process done ExitFailure 1
[2019-11-28 11:52:53.1232169] call: cp ["--reflink=auto","--preserve=timestamps",".git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","file"]
[2019-11-28 11:52:53.1606206] process done ExitSuccess
git-annex: .git/annex/othertmp/file.0/file: rename: permission denied (Permission denied)
failed
[2019-11-28 11:52:53.1617248] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file"]
[2019-11-28 11:52:53.1693198] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","diff","--name-only","--diff-filter=T","-z","--cached","--","file"]
[2019-11-28 11:52:53.1825925] process done ExitSuccess
[2019-11-28 11:52:53.1835521] process done ExitSuccess
[2019-11-28 11:52:53.1844047] process done ExitSuccess
git-annex: add: 1 failed
"""]]
Second attempt:
[[!format sh """
$ git annex add file --debug
[2019-11-28 11:57:56.4029726] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:57:56.4114361] process done ExitSuccess
[2019-11-28 11:57:56.411681] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:57:56.4201317] process done ExitFailure 1
[2019-11-28 11:57:56.420548] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file"]
[2019-11-28 11:57:56.4316368] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2019-11-28 11:57:56.4416827] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:57:56.4452357] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
add file [2019-11-28 11:57:56.4545013] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:57:56.4626846] process done ExitSuccess
[2019-11-28 11:57:56.4629866] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:57:56.4735385] process done ExitFailure 1
[2019-11-28 11:57:56.4848163] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:57:56.488706] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
ok
[2019-11-28 11:57:56.4964438] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file"]
[2019-11-28 11:57:56.5043041] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","diff","--name-only","--diff-filter=T","-z","--cached","--","file"]
(recording state in git...)
[2019-11-28 11:57:56.5152453] feed: xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"]
[2019-11-28 11:57:56.5426207] process done ExitSuccess
[2019-11-28 11:57:56.5438586] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2019-11-28 11:57:56.5478542] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-index","-z","--index-info"]
[2019-11-28 11:57:56.5713] process done ExitSuccess
[2019-11-28 11:57:56.5716027] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2019-11-28 11:57:56.5803067] process done ExitSuccess
[2019-11-28 11:57:56.5807703] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","write-tree"]
[2019-11-28 11:57:56.6111405] process done ExitSuccess
[2019-11-28 11:57:56.6115303] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","ffa5a12eba0b2ea9bc5b529278597615f70c901c","--no-gpg-sign","-p","refs/heads/git-annex"]
[2019-11-28 11:57:56.6269742] process done ExitSuccess
[2019-11-28 11:57:56.6272697] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/heads/git-annex","0ece4a3a069693ea12cb61168cfb701040c8a7a7"]
[2019-11-28 11:57:56.6465065] process done ExitSuccess
[2019-11-28 11:57:56.6506175] process done ExitSuccess
[2019-11-28 11:57:56.651426] process done ExitSuccess
[2019-11-28 11:57:56.6520969] process done ExitSuccess
[2019-11-28 11:57:56.6527282] process done ExitSuccess
[2019-11-28 11:57:56.6536136] process done ExitSuccess
[2019-11-28 11:57:56.6554327] process done ExitSuccess
$ echo "this should fail" > file
$ cat file
this should fail
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! Thank you very much Joey for your hard work and digging into WSL bugs :)

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="ply"
avatar="http://cdn.libravatar.org/avatar/1270501a59ed4a4042366b00295fe236"
subject="comment 3"
date="2019-11-28T11:18:50Z"
content="""
Thanks Joey for investigating this! It looks like I need to wait for WSL 2 to become available in a public Windows release. In the meantime I've submitted a bug on [faulty behaviour of `git annex add` on DrvFs](https://git-annex.branchable.com/bugs/WSL1__58___git-annex-add_fails_in_DrvFs_filesystem/). I don't think you can fix it, as it is apparently a WSL problem, but I think it's good to keep track of it and warn potential users.
"""]]

@ -0,0 +1,146 @@
It is not an earth-shattering issue, but it would probably be best to handle it more gracefully.
I first noticed this while doing an install using datalad. An account with permission is required to access this particular repo; Joey, ask the Canadians for access if you don't have it yet. I guess git asked for credentials and cached them upon the initial invocation, so subsequent calls didn't ask for any:
[[!format sh """
$> datalad install https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
[INFO ] Cloning https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids [1 other candidates] into '/tmp/Coffey-mri-bids'
[INFO ] fatal: bad config line 1 in file /home/yoh/.tmp/git-annex96493-5.tmp
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
install(ok): /tmp/Coffey-mri-bids (dataset)
"""]]
which boiled down to that message being spit out during `git annex init`, which samples the remote but fails to download the config and instead gets a redirected HTML page:
[[!format sh """
$> git clone https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
Cloning into 'Coffey-mri-bids'...
warning: redirecting to https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids.git/
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (398/398), done.
remote: Compressing objects: 100% (282/282), done.
remote: Total 398 (delta 53), reused 393 (delta 48)
Receiving objects: 100% (398/398), 34.97 KiB | 795.00 KiB/s, done.
Resolving deltas: 100% (53/53), done.
$> git -C Coffey-mri-bids annex init --debug
...
[2019-11-27 19:27:01.341315979] Request {
host = "git.bic.mni.mcgill.ca"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/7.20190819+git2-g908476a9b-1~ndall+1")]
path = "/bic/Coffey-mri-bids/config"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2019-11-27 19:27:01.90016181] read: git ["config","--null","--list","--file","/home/yoh/.tmp/git-annex228094-5.tmp"]
fatal: bad config line 1 in file /home/yoh/.tmp/git-annex228094-5.tmp
[2019-11-27 19:27:01.913302324] process done ExitFailure 128
Remote origin not usable by git-annex; setting annex-ignore
$> wget -S https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
--2019-11-27 19:29:25-- https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
Resolving git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)... 132.216.133.92
Connecting to git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)|132.216.133.92|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: no-cache
Location: https://git.bic.mni.mcgill.ca/users/sign_in
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; secure; HttpOnly
X-Request-Id: xTcSyu4H36
X-Runtime: 0.071681
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Location: https://git.bic.mni.mcgill.ca/users/sign_in [following]
--2019-11-27 19:29:26-- https://git.bic.mni.mcgill.ca/users/sign_in
Reusing existing connection to git.bic.mni.mcgill.ca:443.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, private, must-revalidate
Etag: W/"305857ff0ba591a1e4ee7fec83b5687c"
Referrer-Policy: strict-origin-when-cross-origin
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; expires=Thu, 28 Nov 2019 02:29:26 -0000; secure; HttpOnly
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: MHFi7Yjxe82
X-Runtime: 0.063359
X-Ua-Compatible: IE=edge
X-Xss-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Length: unspecified [text/html]
Saving to: config
config [ <=> ] 13.19K --.-KB/s in 0s
2019-11-27 19:29:26 (89.1 MB/s) - config saved [13505]
$> cat config
<!DOCTYPE html>
<html class="devise-layout-html">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="object" property="og:type">
<meta content="GitLab" property="og:site_name">
<meta content="Sign in" property="og:title">
...
"""]]
I guess the problem is multi-faceted:
1. in the case of an authenticated http remote, `git` caches credentials, but `git annex` then tries to download the file directly (instead of somehow going via git), so it cannot "sense" that the remote is a valid annex and/or get files from it.
You can try with this simple one -- user "demo", password "demo":
[[!format sh """
$> git clone http://www.onerussian.com/tmp/secret-repo/.git
Cloning into 'secret-repo'...
Username for 'http://www.onerussian.com': demo
Password for 'http://demo@www.onerussian.com':
$> git -C secret-repo annex init
init (merging origin/git-annex into git-annex...)
(recording state in git...)
Remote origin not usable by git-annex; setting annex-ignore
ok
(recording state in git...)
"""]]
Although the remote is a proper annex, `git annex` indeed cannot use it since it does not authenticate as git does.
So even though the error message is not incorrect, I would say the situation is suboptimal.
2. if the remote server, instead of just returning a 404 or 403 error code (as e.g. github seems to do in similar cases of non-authenticated access), redirects to some login page, annex feeds that page to git as a config, ignores the error message, and just marks that remote as ignored for annex, while leaking that obscure "fatal" error message from git.
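The redirect described in 2. can be spotted from saved response headers alone. Below is a minimal sketch, assuming the headers were captured as text (for example the header lines printed by `wget -S`); the function name `first_redirect_target` is hypothetical and this is not how git-annex itself inspects responses.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# Location target of the first 3xx response, if there is one.
first_redirect_target() {
    awk '
        # A status line with a 3xx code starts a redirect response.
        /^[ \t]*HTTP\/[0-9.]+ 3[0-9][0-9]/ { in3xx = 1; next }
        # Any other status line ends it.
        /^[ \t]*HTTP\//                    { in3xx = 0; next }
        # Inside a redirect response, report its Location header.
        in3xx && tolower($1) == "location:" {
            sub(/\r$/, "", $2)   # tolerate CRLF line endings
            print $2
            exit
        }
    '
}
```

Fed the headers from the trace above, this would print the `/users/sign_in` URL; a non-empty result for a request to `.../config` suggests that a login page, rather than a git config, is about to be downloaded.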
IMHO, ideally 1. should be addressed properly (authentication), and for 2. annex should spit out a more sensible message ("git failed to parse a config file fetched from the remote X. Please inspect it at this /path/config") and keep that file around for debugging. As it is now, I had to dig quite deep to figure out WTF is going on.
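To make the suggestion for 2. concrete, here is a rough shell sketch of the kind of check meant. It is only an illustration under stated assumptions: git-annex is written in Haskell and does not contain these helpers, the names `looks_like_html` and `explain_bad_config` are hypothetical, and sniffing the first 512 bytes for an HTML marker is a heuristic.

```shell
# Hypothetical helper: succeed if FILE begins like an HTML page rather
# than a git config (heuristic: an HTML marker in the first 512 bytes).
looks_like_html() {
    head -c 512 "$1" | grep -qiE '<!DOCTYPE html|<html'
}

# Hypothetical helper: print a clearer explanation for an unusable
# config that was fetched from a remote and kept on disk.
explain_bad_config() {
    file=$1
    remote=$2
    if looks_like_html "$file"; then
        echo "config fetched from remote $remote is an HTML page (login redirect?); kept at $file"
    else
        echo "git failed to parse the config fetched from remote $remote; kept at $file"
    fi
}
```

Such a message would only be printed after `git config --null --list --file FILE` has failed, as it does in the debug output above; the point is to name the remote and keep the path instead of leaking only git's "fatal" line.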
git annex 7.20190819+git2-g908476a9b-1~ndall+1 and the same with bleeding edge 7.20191114+git43-ge29663773-1~ndall+1 (probably that commit is the one with my patch for stricter git versioning, so use the count of 42 ;))
[[!meta author=yoh]]
[[!tag projects/dandi]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="related: shouldn't git annex try external remotes to download config?"
date="2019-11-28T01:22:53Z"
content="""
I haven't tested, but I can see the situation where a specific repository URL could be handled by an external special remote (such as datalad, whose downloaders do handle obscure setups such as this one, which forwards to a login page rather than returning 403/404), which would provide authenticated access to the URL. Would annex even try that config URL via external special remotes?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-11-29T18:09:45Z"
content="""
One of the use cases (will be) https://gin.g-node.org/ -- an archive of (primarily) electrophys data. The platform is based on gogs, but uses git-annex underneath. It \"will be\" because currently access to git-annex is provided only via ssh, but as of today it is already possible to `git clone` datasets via https (tried on public, didn't try private), and the developers are looking into exposing git-annex via http as well. To access private datasets, authentication will need to be handled.
"""]]