Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-02-02 16:48:58 -04:00
commit 1cfe72c103
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
9 changed files with 98 additions and 1 deletion

@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="jeanpmbox-456@7222359de8d1f37a7cf25a519e8faf90a9517b50"
nickname="jeanpmbox-456"
avatar="http://cdn.libravatar.org/avatar/164eb4254c5f83d95d3e0b810ff7aab9"
subject="comment 1"
date="2020-02-01T11:35:37Z"
content="""
Thanks to the file `Build/NullSoftInstaller.hs` and to NirSoft's WhatInStartup program, I finally found that the startup script is located in `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup`.

It would be nice to have an option in the installer to enable or disable this.
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="erewhon"
avatar="http://cdn.libravatar.org/avatar/b9bd5ad7176ebe149d0f051dcfe0a63e"
subject="Thank you"
date="2020-02-02T18:44:51Z"
content="""
Thank you both for the suggestions. I am going to give the approach using the local cache a try.
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="alternate keys"
date="2020-01-31T19:23:35Z"
content="""
\"every time something about a key is looked up in the git-annex branch, it would also need to look at the metadata to see if this alt_keys field is set\" -- not every time, only when checking whether the key is checksum-based and whether its content matches the checksum. Also, isn't metadata [[cached in a database|design/caching_database]]?
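
A minimal Python sketch of the two checks mentioned here, assuming the simplified key layout `BACKEND-sSIZE--HASH` seen elsewhere on this page (with *E backends appending the file extension after the hash). The backend table is an illustrative subset rather than git-annex's real list, and the function names are hypothetical.

```python
import hashlib

# Illustrative subset of checksum-based backends, mapped to hashlib names.
# An assumption for this sketch, not git-annex's full backend list.
CHECKSUM_BACKENDS = {
    'MD5': 'md5', 'MD5E': 'md5',
    'SHA1': 'sha1', 'SHA1E': 'sha1',
    'SHA256': 'sha256', 'SHA256E': 'sha256',
    'SHA512': 'sha512', 'SHA512E': 'sha512',
}


def parse_key(key):
    # Split e.g. 'SHA1-s1234--somehash' into ('SHA1', ['s1234'], 'somehash').
    fields, sep, name = key.partition('--')
    if not sep:
        raise ValueError('not a key: ' + key)
    backend, *rest = fields.split('-')
    return backend, rest, name


def is_checksum_key(key):
    backend, _, _ = parse_key(key)
    return backend in CHECKSUM_BACKENDS


def content_matches(key, data):
    # Returns None when the key carries no checksum (e.g. WORM or URL keys).
    backend, _, name = parse_key(key)
    algo = CHECKSUM_BACKENDS.get(backend)
    if algo is None:
        return None
    # *E backends append the extension; a hex digest never contains '.'.
    expected = name.split('.', 1)[0] if backend.endswith('E') else name
    return hashlib.new(algo, data).hexdigest() == expected
```

For WORM and URL keys `content_matches` has nothing to compare against; that is the gap a separately stored checksum would fill.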
"""]]

@@ -0,0 +1,15 @@
[[!comment format=mdwn
username="https://christian.amsuess.com/chrysn"
nickname="chrysn"
avatar="http://christian.amsuess.com/avatar/c6c0d57d63ac88f3541522c4b21198c3c7169a665a2f2d733b4f78670322ffdc"
subject="Re: comment 1 "
date="2020-01-31T19:47:59Z"
content="""
The proposed implementation may be inefficient, but the idea has merit.

What if that information were stored in a place where it could be used to verify migrations?

For example, when recording in `git-annex:aaa/bbb/SHA1-s1234--somehash.log` that the migrating remote dropped the data, a record could be added nearby saying that this key was migrated to `SHA512-s1234--longerhash`. Then, when all the other remotes are asked to drop that file, they can actually do so: they see that it has been migrated, can verify the migration, and are free to drop the file.

Even later, when a remote wants to get an old name (e.g. because it checked out an old version of master), it can look up the key, find where it was migrated to, and make the data available under its own name (by copying, or perhaps by placing a symlink at `.git/annex/objects/Aa/Bb/SHA1-s1234--somehash/SHA1-s1234--somehash` that points to the new object).
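
A small Python sketch of that lookup, under stated assumptions: the per-key migration records are modeled as a plain dict from old key to new key (the comment does not specify a storage format), and the two-level hash directories (`Aa/Bb` in the example) are passed in rather than computed the way git-annex computes them. All function names are hypothetical.

```python
import posixpath


def resolve_migration(key, migrated_to):
    # Follow old-key -> new-key records until reaching a key that was not
    # migrated further. 'migrated_to' stands in for hypothetical records
    # stored next to the key's log in the git-annex branch.
    seen = {key}
    while key in migrated_to:
        key = migrated_to[key]
        if key in seen:
            raise ValueError('migration cycle at ' + key)
        seen.add(key)
    return key


def object_path(key, hashdir):
    # Layout from the comment: .git/annex/objects/Aa/Bb/KEY/KEY.
    # The hash directory is an input here, not computed from the key.
    return posixpath.join('.git/annex/objects', hashdir, key, key)


def symlink_target(old_key, old_hashdir, new_key, new_hashdir):
    # Relative target for a symlink placed at the old key's object path,
    # pointing at the new key's object file.
    old = object_path(old_key, old_hashdir)
    new = object_path(new_key, new_hashdir)
    return posixpath.relpath(new, posixpath.dirname(old))
```

Following a chain of records means a key migrated twice (say SHA1 to SHA256 to SHA512) still resolves to the newest object; the cycle check guards against inconsistent records.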
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2020-01-31T20:32:00Z"
content="""
\"can be used to verify migrations\" -- my hope was to *avoid* migrations, i.e. to get the benefit you'd get from migrating to a checksum-based key, without doing the migration.
"""]]

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="simpler proposal"
date="2020-01-31T21:46:57Z"
content="""
So, fully and properly implementing what the title of this todo suggests -- \"alternate keys for same content\" -- might be hard. But simply enabling checksums to be added to WORM/URL keys, stored separately on the git-annex branch rather than encoded in the key's name, is easier. This would let some WORM/URL keys be treated as checksum-based keys when getting content from untrusted remotes or when checking integrity with `git-annex-fsck`. But this isn't really \"alternate keys for same content\": the content would be stored only under the WORM/URL key under which it was initially recorded, and the corresponding MD5 key would not be recorded as present in [[location_tracking]].

Checking whether a WORM/URL key has an associated checksum could be sped up by keeping a Bloom filter representing the set of WORM/URL keys for which `alt_keys` is set.
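
A minimal Python sketch of such a filter. The sizes (8192 bits, 4 probes) and deriving the probe positions from one SHA-256 digest via double hashing are arbitrary choices for illustration, and how the filter would be stored and kept in sync with the git-annex branch is left open. A negative answer is definitive, so most WORM/URL keys would skip the branch lookup entirely; a positive answer still needs the real lookup, since Bloom filters allow false positives.

```python
import hashlib


class KeyBloomFilter:
    # Approximate set of WORM/URL keys that have alt_keys set.
    # No false negatives; false positives fall back to the real lookup.

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: probe i is (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(key.encode('utf-8')).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big') | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```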
In the `addurl --fast` case for special remotes, where the remote can determine a file's checksum without downloading, a checksum-based key would be recorded to begin with, as happens with `addurl` without `--fast`. Currently I do this by manually calling plumbing commands like `git-annex-setpresentkey`, but having `addurl` do it seems better.
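
A rough Python sketch of recording a checksum-based key from the start: given the size and hex digest a special remote reports without downloading, build a key name in the `BACKEND-sSIZE--HASH` form used on this page instead of a URL key. The function name and digest-length table are hypothetical, and real git-annex key construction (extension rules, extra key fields) is more involved.

```python
# Hex digest lengths for a few backends (illustrative subset).
DIGEST_HEX_LEN = {'MD5': 32, 'SHA1': 40, 'SHA256': 64, 'SHA512': 128}


def checksum_key(backend, size, hexdigest, extension=''):
    # Build BACKEND-sSIZE--HASH from a size and checksum the remote reported.
    # *E backends append the extension; the others ignore it.
    is_e = backend.endswith('E')
    base = backend[:-1] if is_e else backend
    if len(hexdigest) != DIGEST_HEX_LEN.get(base, -1):
        raise ValueError('unexpected digest length for ' + backend)
    suffix = extension if is_e else ''
    return '%s-s%d--%s%s' % (backend, size, hexdigest.lower(), suffix)
```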
"""]]

@@ -0,0 +1,19 @@
[[!comment format=mdwn
username="Chel"
avatar="http://cdn.libravatar.org/avatar/a42feb5169f70b3edf7f7611f7e3640c"
subject="comment 6"
date="2020-02-01T02:32:01Z"
content="""
There is also `aaa/bbb/*.log.cid` in the git-annex branch for \"per-remote content identifiers for keys\".
It could be another place to store alternate keys, but it is per-remote, so... no.

As for the metadata field `alt_keys`, it is another case of
\"[setting a metadata field to a key](/todo/Bidirectional_metadata/#comment-788380998b25267c5b99c4a865277102)\"
in [[Bidirectional metadata]].

Also, there is an interesting idea of [[git-annex-migrate using git-replace]].

By the way, as far as I know (though things may have changed since I last looked),
ipfs has a similar problem of different identifiers for the same content,
because the identifier encodes how the content is stored, and hash functions can also be changed.
"""]]

@@ -34,5 +34,5 @@ be useful to speed up checks on larger files. The license is a
I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]
> Let's concentrate on xxhash or other new hashes that are getting general
> Let's concentrate on [[xxhash|todo/add_xxHash_backend]] or other new hashes that are getting general
> adoption, not niche hashes like meow. [[done]] --[[Joey]]

@@ -0,0 +1,16 @@
[[!comment format=mdwn
username="Chel"
avatar="http://cdn.libravatar.org/avatar/a42feb5169f70b3edf7f7611f7e3640c"
subject="comment 1"
date="2020-02-01T02:55:03Z"
content="""
Very interesting idea! But there are some problems:

- As mentioned, not only do the `.git/annex/<...>` blobs need to be replaced for every key, but also `/annex/<...>`
and all `../.git/annex/<...>`, `../../.git/annex/<...>`, etc.

- In big repositories this can create a huge number of *refs/replace/* refs.
I don't know how that affects performance if they are stored in `.git/packed-refs`,
but it can interfere with normal operation of a repo.
For example, `git show-ref` becomes hard to use without ` | grep` or something similar.
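
To make the scale concrete, a Python sketch that lists, for one key, the strings whose blobs would each need a *refs/replace/* ref: the unlocked pointer form `/annex/objects/KEY` plus one symlink target per directory depth. It assumes symlinks of the `.git/annex/objects/Aa/Bb/KEY/KEY` form with one `../` per directory level, takes the hash directory as an input, and does not model exact blob bytes; the function names are hypothetical.

```python
def replaced_targets(key, hashdir, max_depth):
    # Pointer-file form first, then one symlink target per directory depth
    # (0 = file at the repository root, 1 = one directory down, ...).
    targets = ['/annex/objects/' + key]
    link = '.git/annex/objects/%s/%s/%s' % (hashdir, key, key)
    for depth in range(max_depth + 1):
        targets.append('../' * depth + link)
    return targets


def replace_ref_count(num_keys, max_depth):
    # Upper bound: one replace ref per target string, per key.
    return num_keys * (max_depth + 2)
```

Under this model, 100,000 keys with files up to 5 directories deep could need up to 700,000 replace refs.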
"""]]