Merge branch 'master' into bs

Joey Hess 2019-12-05 11:41:30 -04:00
commit c7a4411e71
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
26 changed files with 537 additions and 10 deletions

View file

@ -1,4 +1,4 @@
When a file is annexed, a key is generated from its content and/or filesystem
When a file is annexed, a [[key|internals/key_format]] is generated from its content and/or filesystem
metadata. The file checked into git symlinks to the key. This key can later
be used to retrieve the file's content (its value).

View file

@ -0,0 +1,123 @@
### Please describe the problem.
git-annex fails to add a file to the repository because of a permission problem (probably faulty permission handling in WSL). Interestingly, it is possible to add the file anyway by executing `git annex add` twice. Unfortunately, files added this way are writable when they shouldn't be.
It's probably not in the scope of git-annex development, but I think it's good to keep track of the problem.
### What steps will reproduce the problem?
```
cd /mnt/c
git init test
cd test
git annex init test
init test
touch file
git annex add file
```
### What version of git-annex are you using? On what operating system?
Windows 10 Pro version 1909 build 18363.476 - WSL (Arch)
```
git-annex version: 7.20191114-ga95efcbc55
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.3 feed-1.2.0.1 ghc-8.6.5 http-client-0.6.4 persistent-sqlite-2.10.5 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
```
### Please provide any additional information below.
[[!format sh """
$ git annex init test
init test
Detected a filesystem without fifo support.
Disabling ssh connection caching.
(scanning for unlocked files...)
ok
(recording state in git...)
$ touch file
$ git annex add file --debug
[2019-11-28 11:52:53.048398] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:52:53.0570463] process done ExitSuccess
[2019-11-28 11:52:53.0573639] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:52:53.0656397] process done ExitFailure 1
[2019-11-28 11:52:53.0660529] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file"]
[2019-11-28 11:52:53.0742999] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2019-11-28 11:52:53.0822627] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:52:53.0853736] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
add file [2019-11-28 11:52:53.0949002] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:52:53.1027361] process done ExitSuccess
[2019-11-28 11:52:53.1030132] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:52:53.1122577] process done ExitFailure 1
[2019-11-28 11:52:53.1232169] call: cp ["--reflink=auto","--preserve=timestamps",".git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","file"]
[2019-11-28 11:52:53.1606206] process done ExitSuccess
git-annex: .git/annex/othertmp/file.0/file: rename: permission denied (Permission denied)
failed
[2019-11-28 11:52:53.1617248] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file"]
[2019-11-28 11:52:53.1693198] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","diff","--name-only","--diff-filter=T","-z","--cached","--","file"]
[2019-11-28 11:52:53.1825925] process done ExitSuccess
[2019-11-28 11:52:53.1835521] process done ExitSuccess
[2019-11-28 11:52:53.1844047] process done ExitSuccess
git-annex: add: 1 failed
"""]]
Second attempt:
[[!format sh """
$ git annex add file --debug
[2019-11-28 11:57:56.4029726] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:57:56.4114361] process done ExitSuccess
[2019-11-28 11:57:56.411681] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:57:56.4201317] process done ExitFailure 1
[2019-11-28 11:57:56.420548] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--others","--exclude-standard","-z","--","file"]
[2019-11-28 11:57:56.4316368] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2019-11-28 11:57:56.4416827] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:57:56.4452357] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
add file [2019-11-28 11:57:56.4545013] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2019-11-28 11:57:56.4626846] process done ExitSuccess
[2019-11-28 11:57:56.4629866] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2019-11-28 11:57:56.4735385] process done ExitFailure 1
[2019-11-28 11:57:56.4848163] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2019-11-28 11:57:56.488706] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
ok
[2019-11-28 11:57:56.4964438] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--modified","-z","--","file"]
[2019-11-28 11:57:56.5043041] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","diff","--name-only","--diff-filter=T","-z","--cached","--","file"]
(recording state in git...)
[2019-11-28 11:57:56.5152453] feed: xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"]
[2019-11-28 11:57:56.5426207] process done ExitSuccess
[2019-11-28 11:57:56.5438586] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2019-11-28 11:57:56.5478542] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-index","-z","--index-info"]
[2019-11-28 11:57:56.5713] process done ExitSuccess
[2019-11-28 11:57:56.5716027] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2019-11-28 11:57:56.5803067] process done ExitSuccess
[2019-11-28 11:57:56.5807703] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","write-tree"]
[2019-11-28 11:57:56.6111405] process done ExitSuccess
[2019-11-28 11:57:56.6115303] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","ffa5a12eba0b2ea9bc5b529278597615f70c901c","--no-gpg-sign","-p","refs/heads/git-annex"]
[2019-11-28 11:57:56.6269742] process done ExitSuccess
[2019-11-28 11:57:56.6272697] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/heads/git-annex","0ece4a3a069693ea12cb61168cfb701040c8a7a7"]
[2019-11-28 11:57:56.6465065] process done ExitSuccess
[2019-11-28 11:57:56.6506175] process done ExitSuccess
[2019-11-28 11:57:56.651426] process done ExitSuccess
[2019-11-28 11:57:56.6520969] process done ExitSuccess
[2019-11-28 11:57:56.6527282] process done ExitSuccess
[2019-11-28 11:57:56.6536136] process done ExitSuccess
[2019-11-28 11:57:56.6554327] process done ExitSuccess
$ echo "this should fail" > file
$ cat file
this should fail
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! Thank you very much Joey for your hard work and digging into WSL bugs :)

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="ply"
avatar="http://cdn.libravatar.org/avatar/1270501a59ed4a4042366b00295fe236"
subject="comment 3"
date="2019-11-28T11:18:50Z"
content="""
Thanks Joey for investigating this! It looks like I need to wait for WSL 2 to become available in a public Windows release. In the meantime I've submitted a bug on [faulty behaviour of `git annex add` on DrvFs](https://git-annex.branchable.com/bugs/WSL1__58___git-annex-add_fails_in_DrvFs_filesystem/). I don't think you can fix it, as it is apparently a WSL problem, but I think it's good to keep track of it and warn potential users.
"""]]

View file

@ -0,0 +1,146 @@
It is not an earth-shattering issue, but it would probably be best to handle it more gracefully.
Initially noticed while doing an install using datalad. An account/permission is required to access this particular repo; ask Canadians for access if you don't have it yet, Joey. I guess the credentials were asked for and cached by git upon initial invocation, so subsequent calls didn't ask for any:
[[!format sh """
$> datalad install https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
[INFO ] Cloning https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids [1 other candidates] into '/tmp/Coffey-mri-bids'
[INFO ] fatal: bad config line 1 in file /home/yoh/.tmp/git-annex96493-5.tmp
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
install(ok): /tmp/Coffey-mri-bids (dataset)
"""]]
which boiled down to that message being spat out during `git annex init`, which probes the remote but fails to download the config and instead gets a redirected HTML page:
[[!format sh """
$> git clone https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids
Cloning into 'Coffey-mri-bids'...
warning: redirecting to https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids.git/
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (398/398), done.
remote: Compressing objects: 100% (282/282), done.
remote: Total 398 (delta 53), reused 393 (delta 48)
Receiving objects: 100% (398/398), 34.97 KiB | 795.00 KiB/s, done.
Resolving deltas: 100% (53/53), done.
$> git -C Coffey-mri-bids annex init --debug
...
[2019-11-27 19:27:01.341315979] Request {
host = "git.bic.mni.mcgill.ca"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/7.20190819+git2-g908476a9b-1~ndall+1")]
path = "/bic/Coffey-mri-bids/config"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
[2019-11-27 19:27:01.90016181] read: git ["config","--null","--list","--file","/home/yoh/.tmp/git-annex228094-5.tmp"]
fatal: bad config line 1 in file /home/yoh/.tmp/git-annex228094-5.tmp
[2019-11-27 19:27:01.913302324] process done ExitFailure 128
Remote origin not usable by git-annex; setting annex-ignore
$> wget -S https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
--2019-11-27 19:29:25-- https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids/config
Resolving git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)... 132.216.133.92
Connecting to git.bic.mni.mcgill.ca (git.bic.mni.mcgill.ca)|132.216.133.92|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 109
Connection: keep-alive
Cache-Control: no-cache
Location: https://git.bic.mni.mcgill.ca/users/sign_in
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; secure; HttpOnly
X-Request-Id: xTcSyu4H36
X-Runtime: 0.071681
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Location: https://git.bic.mni.mcgill.ca/users/sign_in [following]
--2019-11-27 19:29:26-- https://git.bic.mni.mcgill.ca/users/sign_in
Reusing existing connection to git.bic.mni.mcgill.ca:443.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 28 Nov 2019 00:29:26 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, private, must-revalidate
Etag: W/"305857ff0ba591a1e4ee7fec83b5687c"
Referrer-Policy: strict-origin-when-cross-origin
Set-Cookie: _gitlab_session=8a4f8d5569636004aaebfb73588a2d53; path=/; expires=Thu, 28 Nov 2019 02:29:26 -0000; secure; HttpOnly
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: MHFi7Yjxe82
X-Runtime: 0.063359
X-Ua-Compatible: IE=edge
X-Xss-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
Length: unspecified [text/html]
Saving to: config
config [ <=> ] 13.19K --.-KB/s in 0s
2019-11-27 19:29:26 (89.1 MB/s) - config saved [13505]
$> cat config
<!DOCTYPE html>
<html class="devise-layout-html">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="object" property="og:type">
<meta content="GitLab" property="og:site_name">
<meta content="Sign in" property="og:title">
...
"""]]
I guess the problem is multi-faceted:
1. in case of an authenticated http remote, `git` caches credentials, but `git annex` then tries to download the file directly (instead of somehow via git), so it cannot "sense" that the remote is a valid annex and/or get files from it.
You can try with this simple one -- user "demo", password "demo":
[[!format sh """
$> git clone http://www.onerussian.com/tmp/secret-repo/.git
Cloning into 'secret-repo'...
Username for 'http://www.onerussian.com': demo
Password for 'http://demo@www.onerussian.com':
$> git -C secret-repo annex init
init (merging origin/git-annex into git-annex...)
(recording state in git...)
Remote origin not usable by git-annex; setting annex-ignore
ok
(recording state in git...)
"""]]
although the remote is a proper annex, `git annex` indeed cannot use it since it does not authenticate as git does.
So even though the error message is not incorrect, I would say the situation is suboptimal.
2. if the remote server, instead of just returning a 404 or 403 error code (as e.g. github seems to do in similar cases of non-authenticated access), redirects to some login page, annex feeds that page as a config to git, ignores the error message and just marks that remote as ignored for annex, while leaking that obscure "fatal" error message from git.
IMHO, ideally 1. should be addressed properly (authentication), and for 2. annex should spit out some more sensible message ("git failed to parse a config file fetched from the remote X. Please inspect it at this /path/config") and keep that file around for debugging. As it is now, I had to dig quite deep to figure out WTF is going on.
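The kind of message suggested for point 2 can be approximated outside git-annex. This is only a sketch, assuming `curl` is available; the function names are hypothetical, and it reuses the `git config --null --list --file` call from the debug output above to decide whether the download is a real config file:

```shell
# Hypothetical helpers, not part of git-annex.

# Succeeds if git can parse FILE as a config file.
is_git_config() {
    git config --null --list --file "$1" >/dev/null 2>&1
}

# Fetch URL/config (following redirects) and keep the downloaded file
# around when it does not parse, so that it can be inspected.
check_remote_config() {
    url=$1
    tmp=$(mktemp) || return 1
    curl -sSL -o "$tmp" "$url/config" || { rm -f "$tmp"; return 1; }
    if is_git_config "$tmp"; then
        rm -f "$tmp"
    else
        echo "git failed to parse the config fetched from $url; inspect it at $tmp" >&2
        return 1
    fi
}
```

For example, `check_remote_config https://git.bic.mni.mcgill.ca/bic/Coffey-mri-bids` would be expected to report the saved login page rather than a bare "fatal: bad config line 1".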
git annex 7.20190819+git2-g908476a9b-1~ndall+1 and the same with bleeding edge 7.20191114+git43-ge29663773-1~ndall+1 (probably that commit is the one with my patch for stricter git versioning, so use the count of 42 ;))
[[!meta author=yoh]]
[[!tag projects/dandi]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="related: shouldn't git annex try external remotes to download config?"
date="2019-11-28T01:22:53Z"
content="""
I haven't tested this, but I can see a situation where a specific repository URL could be handled by an external special remote (such as datalad, whose downloaders do handle obscure setups such as this one, which forwards to a login page rather than returning 403/404) that would provide authenticated access to the URL. Would annex even try that config URL via external special remotes?
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-11-29T18:09:45Z"
content="""
one of the use-cases (will be) https://gin.g-node.org/ -- an archive of (primarily) electrophys data. The platform is based on gogs, but uses git-annex underneath. It \"will be\" because currently access to git-annex is provided only via ssh, but as of today it is already possible to `git clone` (tried on public, didn't try private) datasets via https, and developers are looking into exposing git-annex also via http. To access private datasets authentication will need to be handled
"""]]

View file

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="anarcat"
avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
subject="amazing!"
date="2019-11-26T21:07:32Z"
content="""
66% performance improvements is an amazing number! i take it this will be especially good for repositories with a large number of files? if so this could make my life MUCH better! :)
i wonder if this connects with the [problems Goerzen identified in Python 3 about POSIX paths](https://changelog.complete.org/archives/10063-the-fundamental-problem-in-python-3)... does Haskell have similar problems with non-Unicode filenames?
in any case, I thank you for this awesome work...
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="parallelization"
date="2019-11-27T17:30:12Z"
content="""
This is great.
One other potential for speedup is fixing [[issues with parallel operations|forum/people's_experience_with_parallel_git-annex_operations]]. My current fix is to use `-J1`, giving up a potential 96X speedup. There may also be additional [[todo/parallel_possibilities]].
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="linnearight02@915958f850452a19de84ec14a765402d1f7ecdb0"
nickname="linnearight02"
avatar="http://cdn.libravatar.org/avatar/9c146ceff6ab204aa75ec5a686bd6cfb"
subject="Online Coursework Service"
date="2019-11-26T11:11:07Z"
content="""
Get the best [online coursework service](https://www.allassignmenthelp.com/online-coursework-service.html) by the top Aussie writers at cheap rates. We at [AllAssignmentHelp](https://www.allassignmenthelp.com/) known to provide custom coursework services and unlimited support to the Australian students when they place an order with us. All of our writers are well-qualified and trained professional writers, thus no need to be worried about the quality of the delivered work.
"""]]

View file

@ -0,0 +1,29 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="can't find unused objects"
date="2019-12-02T07:26:41Z"
content="""
I recently migrated an annex to SHA256 (without \"E\") and I'm now trying to clean the repo from unused data.
I have a strange situation: there are 62G of unused objects:
$ du -ks .git/annex/objects/
64334024 .git/annex/objects/
but 'git annex unused' gives me only:
$ git annex unused
unused ...
Some annexed data is no longer used by any files:
NUMBER KEY
1 SHA256E-s27--32efec98dc9e05442fc2385bb85d855a8c7824c68abd4ab5bf55a4dfe412b334.pdf
(To see where data was previously used, try: git log --stat --no-textconv -S'KEY')
To remove unwanted data: git-annex dropunused NUMBER
ok
I've checked (through a small shell script) that none of the objects is in fact referenced by any symlink...
May I delete them? Shall I do some other checking/fscking/repairing?
Thank you
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="P.S. they are all SHA256E"
date="2019-12-02T07:29:06Z"
content="""
the \"lost\" objects are all SHA256E (*with* the \"E\")
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="P.P.S. i dropped all local copies"
date="2019-12-02T08:03:01Z"
content="""
I forgot to mention that after migrating I synced to all remotes and dropped everything in 'here'; that's why I was expecting no more objects locally.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 7"
date="2019-12-02T16:02:47Z"
content="""
Did you commit after [[git-annex-migrate]]? Does the worktree have any uncommitted changes?
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="dropping contents of old keys after migration"
date="2019-12-02T16:48:47Z"
content="""
\"May I delete them\" -- `git-annex-drop --force` may be safer, as it also updates [[location_tracking]]. You might also want to [[git-annex-dead]] the dropped keys to prevent [[git-annex-fsck]] from complaining about lost contents.
Re: why [[git-annex-unused]] isn't finding the unused contents, try running it with `--used-refspec=+HEAD`, and make sure `annex.used-refspec` git config is not set. Note that this will mark as unused any annexed contents not referenced from the latest tree of the HEAD branch, e.g. annexed files that were removed in some older commit.
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 9"
date="2019-12-02T16:58:25Z"
content="""
\"I synced to all remotes and dropped everything in 'here'\" -- [[git-annex-unused]] \"Checks the *annex*\" for the unused contents (unless `--from=repository` is used), so if you dropped everything in `here`, there's nothing to find. But it seems from `du` results that contents wasn't actually dropped? [[git-annex-whereis]] tells where git-annex thinks contents is.
"""]]

View file

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="nangal.vivek@08b8bc308cb03037792b7930fd839b9deec118df"
nickname="nangal.vivek"
avatar="http://cdn.libravatar.org/avatar/f8e7f170b837feb1008df116c7f9d0de"
subject="not able to find git-annex on openSUSE using zypper"
date="2019-12-01T17:04:53Z"
content="""
Getting the following error on running `zypper in git-annex`
Loading repository data...
Reading installed packages...
Package 'git-annex' not found.
I am running openSUSE for WSL with the following info
NAME=\"openSUSE Leap\"
VERSION=\"15.1 \"
ID=\"opensuse-leap\"
ID_LIKE=\"suse opensuse\"
VERSION_ID=\"15.1\"
PRETTY_NAME=\"openSUSE Leap 15.1\"
ANSI_COLOR=\"0;32\"
CPE_NAME=\"cpe:/o:opensuse:leap:15.1\"
BUG_REPORT_URL=\"https://bugs.opensuse.org\"
HOME_URL=\"https://www.opensuse.org/\"
"""]]

View file

@ -0,0 +1,31 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="duplicate objects?"
date="2019-11-30T14:04:17Z"
content="""
Do I understand correctly that in .git/annex/objects dir there should be no duplicates?
Here follows a run of 'rdfind' done in the objects dir:
$ rdfind .
Now scanning \".\", found 12874 files.
Now have 12874 files in total.
Removed 0 files due to nonunique device and inode.
Total size is 75579281486 bytes or 70 GiB
Removed 8376 files due to unique sizes from list.4498 files left.
Now eliminating candidates based on first bytes:removed 68 files from list.4430 files left.
Now eliminating candidates based on last bytes:removed 66 files from list.4364 files left.
Now eliminating candidates based on sha1 checksum:removed 0 files from list.4364 files left.
It seems like you have 4364 files that are not unique
Totally, 10 GiB can be reduced.
Now making results file results.txt
And here is an example pair of dupes (excerpt from the abovementioned 'results.txt'):
DUPTYPE_FIRST_OCCURRENCE 2073 3 86558 26 21057567 1 ./53/zv/SHA256E-s86558--e79a0891bb94fc9212ce2f28178fe84591c5fb24c07b5239d367099118e12ede.jpg/SHA256E-s86558--e79a0891bb94fc9212ce2f28178fe84591c5fb24c07b5239d367099118e12ede.jpg
DUPTYPE_WITHIN_SAME_TREE -2073 3 86558 26 1080608 1 ./7w/w2/SHA256E-s86558--e79a0891bb94fc9212ce2f28178fe84591c5fb24c07b5239d367099118e12ede.56.jpeg/SHA256E-s86558--e79a0891bb94fc9212ce2f28178fe84591c5fb24c07b5239d367099118e12ede.56.jpeg
Any clues?
Thank you
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="same contents with different keys"
date="2019-11-30T16:51:58Z"
content="""
@atrent -- some [[backends]] (like SHA256E) base the key not just on object contents, but also on part of its filename (the extension). So the same content can exist with two different keys. In your example, the same contents exists in one file ending with .jpg and in another ending with .56.jpeg . (This is done to give the annexed contents the same extension as the original file had before annexing, to avoid confusing some programs). There are also backends like WORM and URL, not based on checksums, that could lead to different keys with same contents. There could also be same contents added under different backends (see also [[`git-annex-migrate`|git-annex-migrate]]). Finally, there is the theoretical possibility of hash collisions.
"""]]

View file

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="no collisions"
date="2019-11-30T20:37:00Z"
content="""
I can confirm that these are not collisions: these identical files are the same photos with different names, shame on Dropbox syncing from my smartphone. I was actually hoping to dedupe through git-annex ;-)
Some more questions/suggestions/conversation-starters:
* I suppose I can dedup them with rdfind (i.e., hardlinking identical files), do you foresee any side effects?
* may I change the hash function of git-annex to something not depending on filenames? (I suppose so, I'll have a look at the docs)
* if I can change the hash function can I regenerate the whole annex without re-creating it? (again I'll have a look at docs)
Thanks
"""]]

View file

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 13"
date="2019-11-30T21:11:53Z"
content="""
[[git-annex-migrate]] to a backend not ending in E (e.g. SHA256 not SHA256E), then [[git-annex-unused]] to drop the old keys.
"""]]

View file

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="hardlinking identical files in annex may break invariants"
date="2019-11-30T21:36:38Z"
content="""
P.S. Re: hardlinking identical files -- git-annex [[keeps track of inodes|todo/inode_based_clean_filter_for_less_surprising_git_add]] where contents is stored, so deleting a file might make that info stale. Also, dropping one key will drop another key's contents without updating [[location_tracking]] info. And dropping then getting files would lead to two separate copies again. So I wouldn't recommend that.
See also [[tips/local_caching_of_annexed_files]].
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="atrent"
avatar="http://cdn.libravatar.org/avatar/6069dfebff03997460874771defa0fa4"
subject="migrating..."
date="2019-11-30T22:30:06Z"
content="""
I'm git-annex-migrating (to SHA256) now, thank you for all suggestions!
"""]]

View file

@ -0,0 +1,5 @@
It would be useful to have a [[`git-annex-cat`|forum/Is_there_a___34__git_annex_cat-file__34___type_command__63__/]] command that outputs the contents of an annexed file without storing it in the annex. This [[can be faster|OPT: "bundle" get + check (of checksum) in a single operation]] than `git-annex-get` followed by `cat`, even if file is already present. It avoids some failure modes of `git-annex-get` (like running out of local space, or contending for locks). It supports a common use case of just needing a file for some operation, without needing to remember to drop it later. It could be used to implement a web server or FUSE filesystem that serves git-annex repo files on demand.
If file is not present, or `remote.here.cost` is higher than `remote.someremote.cost` where file is present, `someremote` would get a `TRANSFER` request where the `FILE` argument is a named pipe, and a `cat` of that named pipe would be started.
If file is not annexed, for uniformity `git-annex-cat file` would just call `cat file`.
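Until such a command exists, the behaviour can be roughly emulated with existing commands, which is the get/cat/drop sequence this proposal wants to avoid. A sketch with a hypothetical function name `annex_cat`, assuming locked (symlinked) annexed files; unlike the proposed command it stores the content in the annex temporarily and does not stream from a named pipe:

```shell
# Hypothetical emulation, not the proposed git-annex-cat.
annex_cat() {
    f=$1
    # A plain file, or an annexed file whose content is present, can be
    # read directly ([ -e ] follows the annex symlink).
    if [ -e "$f" ]; then
        cat "$f"
        return
    fi
    # Otherwise fetch the content, print it, and drop it again.
    # git-annex's own output goes to stderr to keep stdout clean.
    git annex get "$f" >&2 || return 1
    cat "$f"
    status=$?
    git annex drop "$f" >&2
    return "$status"
}
```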

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="reference original bug report"
date="2019-11-29T17:58:28Z"
content="""
original bug report was https://git-annex.branchable.com/bugs/git-lfs_remote_URL_is_not_recorded__63__/ for an attempt to share some NWB data on github's LFS
"""]]

View file

@ -0,0 +1,19 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="representing paths"
date="2019-11-27T15:08:40Z"
content="""
Thanks for working on this Joey.
I don't know Haskell or git-annex architecture, so my thoughts might make no sense, but I'll post just in case.
\"There are likely quite a few places where a value is converted back and forth several times\" -- as a quick/temp fix, could memoization speed this up? Or memoizing the results of some system calls?
The many filenames flying around often share long prefixes. Could that be used to speed things up? E.g. if they could be represented as pointers into some compact storage, maybe cache performance would improve.
\"git annex find... files fly by much more snappily\" -- does this mean `git-annex-find` is testing each file individually, as opposed to constructing a SQL query to an indexed db? Maybe, simpler `git-annex-find` queries that are fully mappable to SQL queries could be special-cased?
Sorry for naive comments, I'll eventually read up on Haskell and make more sense...
"""]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="parallelization"
date="2019-11-27T17:23:14Z"
content="""
When operating on many files, maybe run N parallel commands where i'th command ignores paths for which `(hash(filename) modulo N) != i`. Or, if git index has size I, i'th command ignores paths that are not lexicographically between `index[(I/N)*i]` and `index[(I/N)*(i+1)]` (for index state at command start). Extending [[git-annex-matching-options]] with `--block=i` would let this be done using `xargs`.
"""]]