Merge branch 'master' of ssh://git-annex.branchable.com
This commit is contained in:
commit
f22289e90f
15 changed files with 182 additions and 0 deletions
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 24"
|
||||
date="2018-09-19T16:50:05Z"
|
||||
content="""
|
||||
It seems that *E backends ignore file extensions longer than four chars: https://git-annex.branchable.com/bugs/file_extensions_of___62__4_chars_ignored_by___42__E_backends/
|
||||
Is there some reason for doing it this way?
|
||||
|
||||
"""]]
|
|
@ -0,0 +1,14 @@
|
|||
### Please describe the problem.
|
||||
If a special remote has URIs that do not end in a file extension, then symlinks to annexed files fetched from that remote will have null extensions, even if
|
||||
CHECKURL returned a filename with an extension.
|
||||
git annex --verbose --debug addurl dx://file-BXF0vYQ0QyBF509G9J12g927
|
||||
creates the symlink
|
||||
contaminants.fasta -> ../.git/annex/objects/ZF/Gj/SHA256E-s18932--c1ed24754e8b7736e275359a34b312d2d6ce2efc1d236061b04d27a9e8147c1a/SHA256E-s18932--c1ed24754e8b7736e275359a34b312d2d6ce2efc1d236061b04d27a9e8147c1a
|
||||
|
||||
which doesn't have an extension even though
|
||||
|
||||
[2018-09-18 23:49:54.875801] git-annex-remote-dnanexus[1] --> CHECKURL-CONTENTS 18932 contaminants.fasta
|
||||
addurl dx://file-BXF0vYQ0QyBF509G9J12g927 (from dnanexus) (to contaminants.fasta) [2018-09-18 23:49:54.876016] read: git ["--version"]
|
||||
[2018-09-18 23:49:54.876825] process done ExitSuccess
|
||||
|
||||
and the SHA256E backend is used.
|
|
@ -0,0 +1,38 @@
|
|||
### Please describe the problem.
|
||||
It seems that SHA256E, MD5E etc backends ignore file extensions longer than 4 characters, considering files with such extensions to have an empty extension.
|
||||
But it's not uncommon to have longer extensions; e.g. .fasta and .fasta.gz files are common in bioinformatics.
|
||||
Is it possible to remove this 4-character limit, or make it configurable?
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
(master_env_py27_v28) [12:37 PM /data/ilya-work/sw]$ cp c10.yyy c11.yyyy
|
||||
(master_env_py27_v28) [12:37 PM /data/ilya-work/sw]$ git annex calckey c11.yyyy
|
||||
SHA256E-s18841--9fd9a2607e019b7726c722d9d6f6171e6578f255bc60a0b79c525f8a3ffa05de.yyyy
|
||||
(master_env_py27_v28) [12:37 PM /data/ilya-work/sw]$ cp c10.yyy c12.yyyyy
|
||||
(master_env_py27_v28) [12:37 PM /data/ilya-work/sw]$ git annex calckey c12.yyyyy
|
||||
SHA256E-s18841--9fd9a2607e019b7726c722d9d6f6171e6578f255bc60a0b79c525f8a3ffa05de
|
||||
|
||||
(master_env_py27_v28) [12:43 PM /data/ilya-work/sw]$ git annex calckey c10.yyyy.gz
|
||||
SHA256E-s2168--21bb6c514473754cc49a455f45bc84961fe4fceb2cb0527ba2a1cfabdce6bf80.yyyy.gz
|
||||
(master_env_py27_v28) [12:43 PM /data/ilya-work/sw]$ mv c10.yyyy.gz c10.yyyyy.gz
|
||||
(master_env_py27_v28) [12:43 PM /data/ilya-work/sw]$ git annex calckey c10.yyyyy.gz
|
||||
SHA256E-s2168--21bb6c514473754cc49a455f45bc84961fe4fceb2cb0527ba2a1cfabdce6bf80.gz
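
For readers trying to predict which extension ends up in a key, here is a rough model that reproduces the four `calckey` results above. It is only a sketch inferred from those outputs, not git-annex's actual implementation: `key_extension`, `MAX_EXT_LEN` and `MAX_EXTS` are hypothetical names, and the cap of two extensions is an assumption that this report does not demonstrate.

```python
import os

MAX_EXT_LEN = 4  # limit observed in the calckey outputs above
MAX_EXTS = 2     # assumption: at most two trailing extensions are kept


def key_extension(filename, max_len=MAX_EXT_LEN, max_exts=MAX_EXTS):
    """Return the extension suffix a *E key would get under this model."""
    # Everything after the first dot-separated component is a candidate.
    parts = os.path.basename(filename).split(".")[1:]
    kept = []
    # Walk from the last extension backwards and stop at the first one
    # that is empty or too long; anything before it is dropped as well.
    for ext in reversed(parts):
        if not ext or len(ext) > max_len or len(kept) == max_exts:
            break
        kept.append(ext)
    return "".join("." + ext for ext in reversed(kept))
```

Under this model, raising `max_len` to 5 would keep `.yyyyy` and `.fasta`, which is the kind of configurable limit requested above.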
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
git-annex version: 6.20180807-gebc1bb5
|
||||
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV Inotify ConcurrentOutput TorrentParser MagicMime Feed\
|
||||
s Testsuite
|
||||
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client-0.5.7.0 persistent-s\
|
||||
qlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
|
||||
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_2\
|
||||
24 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE\
|
||||
2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256\
|
||||
BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
|
||||
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
|
||||
operating system: linux x86_64
|
||||
supported repository versions: 3 5 6
|
||||
upgrade supported from repository versions: 0 1 2 3 4 5
|
||||
local repository version: 5
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 31"
|
||||
date="2018-09-19T11:01:31Z"
|
||||
content="""
|
||||
What exactly is the difference between SETURIPRESENT and SETURLPRESENT?
|
||||
"""]]
|
|
@ -0,0 +1,11 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 32"
|
||||
date="2018-09-19T18:19:22Z"
|
||||
content="""
|
||||
Some questions about CHECKPRESENT Key: (1) if Key is a URL backend key, should this return true if CHECKURL on the URL would return CHECKURL-CONTENTS?
|
||||
(2) Should the external special remote implementation call GETURLS on the key and return true if CHECKURL would return CHECKURL-CONTENTS for any of the URLs?
|
||||
(3) Calling GETURLS on a URL key returns an empty list; shouldn't it return a one-element list containing the included URL (at least if a CHECKURL call on that URL
|
||||
would return CHECKURL-CONTENTS)?
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 19"
|
||||
date="2018-09-19T19:12:58Z"
|
||||
content="""
|
||||
This page says direct mode is \"deprecated\"; but, v6 is not yet official ( https://git-annex.branchable.com/forum/default_repo_version_is_still_5__63__/ )? So, for now, direct mode is the recommended thing to use?
|
||||
"""]]
|
|
@ -0,0 +1,11 @@
|
|||
Hello there!
|
||||
|
||||
I sent some of my files to a special rsync remote: ```git annex move --all --to rsync_remote```
|
||||
Then I rewrote the history of my repository: ```git annex forget```
|
||||
Some days later, I tried ```git annex whereis file```, which tells me there is no copy left, and ```git annex info rsync_remote```, which reports no annexed keys.
|
||||
Now, I know the data is still in the rsync special remote because, when I look for the files' SHA keys there, I can find their annex object directories.
|
||||
The annex objects directories are different between the local git annex and the special rsync remote, supposedly because of my rewrite.
|
||||
|
||||
Is there any simple way to repair that? To tell git-annex that the keys are accessible in the special rsync remote?
|
||||
I might be able to recover the data by copying the whole rsync directory locally and reinjecting the files, but that seems overcomplicated when the keys are still valid.
|
||||
Moreover, the computer hosting the special rsync remote is a remote machine with limited support (I can't install git-annex on it), so copying all the data over the network would be very slow.
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="CandyAngel"
|
||||
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
|
||||
subject="comment 1"
|
||||
date="2018-09-19T11:03:54Z"
|
||||
content="""
|
||||
I think `git annex fsck --fast --from $remote` will make the local annex relearn all the files the remote has without transferring them.
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="webanck"
|
||||
avatar="http://cdn.libravatar.org/avatar/cd273f76ef8c4218510b4f50ef7e1f3d"
|
||||
subject="comment 2"
|
||||
date="2018-09-19T11:40:56Z"
|
||||
content="""
|
||||
Indeed, this command did the trick!
|
||||
I didn't even think about using ```fsck``` because the ```git annex info``` was returning 0 references.
|
||||
Thanks a lot!
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 1"
|
||||
date="2018-09-19T17:40:16Z"
|
||||
content="""
|
||||
If a key has a URL by which it can be downloaded from an external special remote, and the remote supports checkurl but not checkpresent, checkpresentkey says the key isn't in the remote; but git annex whereis --key says there is. Maybe it's as intended, checkpresentkey being a plumbing command, but just wanted to note.
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 1"
|
||||
date="2018-09-19T03:40:05Z"
|
||||
content="""
|
||||
After doing rmurl to remove the last URL at which a file is available, 'git annex whereis filename' still shows 'web' as having one of the copies. Bug?
|
||||
"""]]
|
|
@ -0,0 +1,8 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 1"
|
||||
date="2018-09-19T18:57:13Z"
|
||||
content="""
|
||||
Maybe, mention in the docs that whereis does not actually contact remotes to check they still have the files (unlike drop). Maybe, print the timestamp of the last check?
|
||||
"""]]
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="Ilya_Shlyakhter"
|
||||
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
|
||||
subject="comment 8"
|
||||
date="2018-09-19T16:07:44Z"
|
||||
content="""
|
||||
\"Each subdirectory has the name of a key in one of the key-value backends. The file inside also has the name of the key.\" -- is it necessary for the file inside to also have the name of the key? Repeating the already long key name leads to very long symlink targets. Could the file inside just be 'f.txt' (or whatever the extension is)?
|
||||
|
||||
Also, the terms \"key\" and \"name of the key\" are used in various places; are these the same thing?
|
||||
"""]]
|
|
@ -0,0 +1,25 @@
|
|||
Given that git-annex has built-in support for AWS S3, and similar to my whining about ssh:// urls, I wondered whether s3:// urls could be supported directly by git-annex.
|
||||
Unfortunately, that is not the case, and the messages are a tiny bit misleading (see below): initially annex just says that the configuration disallows access to S3, but once that is allowed, it seems to offload the download to libcurl, which doesn't support s3://.
|
||||
|
||||
The reason I am asking is that lots of data is on S3, and so far we have relied on our datalad special remote to provide access to s3:// urls so that we could authenticate; for public buckets, though, it would be overkill to demand datalad. Although we could replace them with http urls, I thought it might be better if annex could just download s3:// directly.
|
||||
|
||||
[[!format sh """
|
||||
$> git annex addurl s3://images.cocodataset.org/annotations/image_info_test2014.zip
|
||||
addurl s3://images.cocodataset.org/annotations/image_info_test2014.zip Configuration does not allow accessing s3://images.cocodataset.org/annotations/image_info_test2014.zip
|
||||
|
||||
Configuration does not allow accessing s3://images.cocodataset.org/annotations/image_info_test2014.zip
|
||||
failed
|
||||
git-annex: addurl: 1 failed
|
||||
|
||||
$> git -c annex.security.allowed-url-schemes="http https s3" -c annex.security.allowed-http-addresses=all annex addurl s3://images.cocodataset.org/annotations/image_info_test2014.zip
|
||||
addurl s3://images.cocodataset.org/annotations/image_info_test2014.zip
|
||||
curl: (1) Protocol "s3" not supported or disabled in libcurl
|
||||
failed
|
||||
git-annex: addurl: 1 failed
|
||||
|
||||
$> git annex version
|
||||
git-annex version: 6.20180913+git33-g2cd5a723f-1~ndall+1
|
||||
|
||||
"""]]
|
||||
|
||||
[[!meta author=yoh]]
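
The http workaround mentioned above (replacing s3:// urls with http ones for public buckets) could be sketched roughly as follows. This is only an illustration under stated assumptions: `s3_to_https` is a hypothetical helper, and it assumes a publicly readable bucket that can be reached through the path-style `s3.amazonaws.com` endpoint, which may not hold for every bucket or region.

```python
from urllib.parse import quote, urlsplit


def s3_to_https(url, endpoint="s3.amazonaws.com"):
    """Map s3://bucket/key to a path-style https URL (public buckets only)."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # Path-style addressing keeps the bucket out of the hostname, which
    # avoids TLS certificate mismatches for bucket names containing dots.
    # Assumption: the endpoint still serves this bucket path-style.
    return f"https://{endpoint}/{parts.netloc}{quote(parts.path)}"
```

The resulting URL could then be given to `git annex addurl` without changing `annex.security.allowed-url-schemes`.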
|
|
@ -0,0 +1,5 @@
|
|||
"Downloading unverified content from (non-encrypted) external special remotes is prevented, because they could follow http redirects to web servers on localhost or on a private network, or in some cases to a file:/// url" -- it would be good if an exception to this could be configured for a given type of external special remote, and/or for specific special remotes.
|
||||
Sometimes I _know_ that a given external special remote doesn't do redirects, or that a given special remote repository won't have bad URLs. Remembering to do
|
||||
git -c annex.security.allow-unverified-downloads=ACKTHPPT annex get myfile
|
||||
every time is another thing to think about, when the whole point of git-annex is to not have to think about where things are :) But configuring
|
||||
annex.security.allow-unverified-downloads=ACKTHPPT permanently opens security holes.
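
Until something like a per-remote exception exists, a small wrapper can at least take the remembering out of it. The sketch below is not a git-annex feature: `annex_get_argv` and the `trusted_remotes` set are hypothetical names, and the list of remotes is maintained by hand. It adds the one-off `-c` override only when fetching from a remote the user has explicitly listed, so nothing is configured permanently.

```python
ALLOW_UNVERIFIED = "annex.security.allow-unverified-downloads=ACKTHPPT"


def annex_get_argv(path, remote, trusted_remotes):
    """Build the argv for `git annex get`, overriding only for listed remotes."""
    argv = ["git"]
    if remote in trusted_remotes:
        # One-off override for this invocation only; nothing is written
        # to the repository's configuration.
        argv += ["-c", ALLOW_UNVERIFIED]
    argv += ["annex", "get", "--from", remote, path]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Restricting the fetch to a single remote with `--from` keeps the override from applying to any other remote that happens to hold the file.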
|