Joey Hess 2024-04-30 16:08:46 -04:00
parent fa0bcba86e
commit 5b36e6b4fb
GPG key ID: DB12DB0FF05F8F38
5 changed files with 71 additions and 0 deletions

@ -0,0 +1,7 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-04-30T19:31:35Z"
content="""
See also [[todo/wishlist__58___derived_content_support]].
"""]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2024-04-30T19:34:34Z"
content="""
An interesting benefit of using URL keys for this is that today's release
added VURL keys, which work just like URL keys, except that a checksum
gets calculated when the content is downloaded from the URL.
This allows `git-annex fsck` to verify the checksums, and lets the
checksum be verified when content is transferred between repositories.
(See the `git-annex addurl --verifiable` documentation.)

And a nice thing about using URL or VURL keys for this is that it allows
for both fully reproducible computations and computations that generate
equivalent but not identical files. The latter corresponds to `git-annex
addurl --relaxed`.

If you use a VURL key and give it a size, then the checksum is calculated
on first download from your compute special remote, and subsequent
downloads are required to have the same checksum. Without a size, it's
relaxed and anything your compute special remote generates is treated as
effectively the same key, so there can be several checksums that git-annex
knows about, attached to the same VURL key.
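
To make the sized vs. unsized behaviour concrete, here is a rough toy model
in Python. It is only a sketch of the rule as described above, not how
git-annex implements it: the `VurlKey` class, its `accept` method, the use of
SHA-256, and the explicit length check are all assumptions made for
illustration.

```python
import hashlib


class VurlKey:
    # Toy stand-in for a VURL key (hypothetical, not git-annex's data model).
    def __init__(self, url, size=None):
        self.url = url
        self.size = size        # None models a relaxed key without a size
        self.checksums = set()  # checksums recorded for this key so far

    def accept(self, content):
        # Return True if this download would be accepted for the key.
        digest = hashlib.sha256(content).hexdigest()  # hash choice is an assumption
        if self.size is None:
            # Relaxed: whatever gets generated counts as the same key,
            # so several checksums can accumulate.
            self.checksums.add(digest)
            return True
        if len(content) != self.size:
            # Assumption: a sized key rejects content of another length.
            return False
        if not self.checksums:
            # First download: record its checksum.
            self.checksums.add(digest)
            return True
        # Later downloads must match the recorded checksum.
        return digest in self.checksums


strict = VurlKey("http://example.invalid/result", size=3)
print(strict.accept(b"abc"), strict.accept(b"abd"))

relaxed = VurlKey("http://example.invalid/result")
print(relaxed.accept(b"abc"), relaxed.accept(b"something else"))
```

In this model the sized key pins the first checksum it sees, while the
unsized key just collects every checksum it is given.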
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2024-04-30T19:48:55Z"
content="""
About worktree provisioning, couldn't you record the sha1 of the tree
containing the files needed to generate the object, and then use
`git worktree` to make a temporary checkout of that tree? You could
`git-annex get` whatever files are necessary within the temp worktree,
which could result in recursive computations to get dependencies.

I would be careful to avoid dependency cycles, though.
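
One way a special remote might guard against that is to check the
dependency graph before starting a recursive computation. The following is a
minimal sketch, assuming the remote has some mapping from each computed file
to the files it needs. The `deps` dictionary and the `find_cycle` function
are hypothetical names, not anything git-annex provides.

```python
def find_cycle(deps, start):
    # deps maps a file name to the files needed to compute it (hypothetical).
    # Returns the chain of files forming a cycle reachable from start, or None.
    visiting = []  # chain of computations currently in progress
    done = set()   # files already known to be free of cycles

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # name depends on itself through the current chain.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for dep in deps.get(name, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    return visit(start)


deps = {
    "plot.png": ["stats.csv"],
    "stats.csv": ["raw.dat"],
    "raw.dat": ["plot.png"],  # deliberately cyclic for the example
}
print(find_cycle(deps, "plot.png"))
```

A remote could refuse the request and report the returned chain, rather than
recursing until something runs out of resources.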
"""]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2024-04-30T19:53:43Z"
content="""
On trust, it seems to me that if someone chooses to enable a particular
special remote, they are choosing to trust whatever kind of computations it
supports.

Eg a special remote could choose to always run a computation inside a
particular container system, and then if you trust that container system
to be secure, you can choose to use it.
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2024-04-30T20:00:53Z"
content="""
About request one, receive many, it would be possible for a special remote
to run eg `git-annex reinject --guesskeys` to move additional generated
object files into .git/annex/objects/.

(Doesn't datalad do something like that when it downloads and unpacks a
tarball that contains several annexed files besides the one it was running
the download to get? Or perhaps it only stores the tarball in the annex and
unpacks it several times?)
"""]]