Added a comment
This commit is contained in:
parent 2f278fd4b3
commit acdefd77a6
1 changed file with 24 additions and 0 deletions

@@ -0,0 +1,24 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
subject="comment 13"
date="2025-01-29T09:56:12Z"
content="""
> @m.risse in your example the \"data.nc\" file gets new content when retrieved from the special remote and the source file has changed.

True, that can happen, and the user has been explicit about it: either they don't care (non-checksum backend, the URL keys in my PoC), or they do care (checksum backend), in which case git-annex would fail the checksum verification on retrieval.
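
To sketch what I mean (simplified key formats, not git-annex's actual code): a checksum key pins the content, while a URL-style key accepts whatever the remote returns.

    # Sketch of the verify-on-get distinction; key formats are simplified
    # and this is not git-annex's actual code.
    import hashlib

    def verify_on_get(key, data):
        # Checksum backends pin the content: data from the remote must
        # still hash to the value embedded in the key.
        if key.startswith('SHA256--'):
            expected = key.split('--', 1)[1]
            return hashlib.sha256(data).hexdigest() == expected
        # URL-style keys carry no checksum: there is nothing to verify
        # against, so any content is accepted as-is.
        return True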

> But if you already have data.nc file present in a repository, it does not get updated immediately when you update the source \"data.grib\" file.
>
> So, a drop and re-get of a file changes the version of the file you have available. For that matter, if the old version has been stored on other remotes, a get may retrieve either an old or a new version. That is not intuitive and it makes me wonder if using a special remote is really a good fit for what you're wanting to do.

This I haven't entirely thought through. I'd say that if the key uses a non-checksum backend, it is the user's responsibility to ensure (and can only be assumed) that the resulting file is functionally identical, even if not bit-by-bit. E.g. with netCDF, checksums can differ due to small details like chunking while the data is the same. With a checksum backend git-annex would just fail the next recompute, but the interactions with copies on other remotes could indeed get confusing.
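
To illustrate the netCDF case, here is a minimal sketch (assuming the netCDF4 Python bindings; the file names are made up) that writes the same array with two different chunk sizes and ends up with two different SHA-256 sums:

    # Sketch: identical data, different on-disk chunking, different checksums.
    # Assumes the netCDF4 Python bindings; file names are made up.
    import hashlib
    import netCDF4
    import numpy as np

    data = np.arange(1000, dtype='f8')
    for path, chunks in [('a.nc', (100,)), ('b.nc', (250,))]:
        with netCDF4.Dataset(path, 'w') as ds:
            ds.createDimension('x', data.size)
            var = ds.createVariable('v', 'f8', ('x',), chunksizes=chunks)
            var[:] = data

    digests = [hashlib.sha256(open(p, 'rb').read()).hexdigest()
               for p in ('a.nc', 'b.nc')]
    print(digests[0] == digests[1])  # False: same data, different keys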

> In your \"cdo\" example, it's not clear to me if the new version of the software generates an identical file to the old, or if it has a bug fix that causes it to generate a significantly different output. If the two outputs are significantly different then treating them as the same git-annex key seems questionable to me.

Again, there are two possible cases, depending on whether the key uses a checksum or a non-checksum backend. With a checksum: if the new version produces the same output, everything is fine; if it produces different output, git-annex would indicate the discrepancy on the next recompute and the user has to decide how to handle it (probably by checking that the output of the new version is either functionally the same as or in some way \"better\" than the old one, and updating the repository to record the new key for that file).

Without a checksum backend the user would again have been explicit that they don't care if the data changes for whatever reason: the key is essentially just a placeholder for the computation, without a guarantee about its content.

Something like VURL would be a compromise between the two: it would avoid the upfront cost of computing all files (which might be very expensive), but still instruct git-annex to error out if the checksum changes at some point after the first compute. A regular migration of the computed-files-so-far to a checksum backend could achieve the same.
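
A rough sketch of those semantics (a hypothetical helper, not git-annex's implementation): nothing is verified up front, but the first compute records a checksum that every later compute has to reproduce.

    # Sketch of VURL-like semantics: no checksum is demanded up front, but
    # the first compute records one and later computes must reproduce it.
    # Hypothetical helper, not git-annex's implementation.
    import hashlib

    recorded = {}  # key -> digest learned on the first compute

    def compute(key, run):
        data = run()  # the (possibly expensive) computation
        digest = hashlib.sha256(data).hexdigest()
        if key not in recorded:
            recorded[key] = digest  # first compute: just record it
        elif recorded[key] != digest:
            raise ValueError('content of %s changed after first compute' % key)
        return data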
"""]]