When writing doc/tips/computing_annexed_files.mdwn, I noticed
that a recompute --reproducible followed by a drop and a re-get did not
actually test if the file could be reproducible computed again.
Turns out that get and drop both operate on staged files. If there is an
unstaged modification in the work tree, that's ignored. Somewhat
surprisingly, other commands like info do operate on staged files. So
behavior is inconsistent, and fairly surprising really, when there are
unstaged modifications to files.
Probably this is rarely noticed because `git-annex add` is used to add a
new version of a file, and then it's staged. Or `git mv` is used to move
a file, rather than `mv` of a file over top of an existing file. So it's
uncommon to have an unstaged annexed file in a worktree.
It might be worth making things more consistent, but that's out of scope
for what I'm working on currently.
Also, I anticipate that supporting unlocked files with recompute will
require it to stage changes anyway.
So, make recompute stage the new version of the file.
I considered having recompute refuse to overwrite an existing staged
file. After all, whatever version was staged before will get lost when
the new version is staged over top of it. But, that's no different than
`git-annex addcomputed` being run with the name of an existing staged
file. Or `git-annex add` being run with a new file content when there is
an existing staged file. Or, for that matter, `git add` being ran with a
new content when there is an existing staged file.
I've lost track of them all, but it includes:
* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
different, but the key remains the same, makes the object file
be updated with the new content
* Detecting some other ways the program behavior can change, just for
completeness.
* Also adds --backend to addcomputed.
When a computed file has been renamed, a recompute needs to write to the
new filename.
I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.
Added temporary TODO-compute file
Proper behavior without --others implemented.
And eliminated most of the code duplication through refactoring.
Also, changed it to not stage recomputed files. This way, git diff will
show files that have differences.
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.
So, the subdir is part of the computation state.
Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program comunicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.