fix recompute of renamed files

When a computed file has been renamed, a recompute needs to write to the
new filename.

I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.

Added temporary TODO-compute file
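
The renaming fix comes down to mapping each output name of the original computation to the place it should be written now. The following is a minimal sketch of that mapping, not the code from this commit: `Path` and `Key` are simplified stand-ins for git-annex's `OsPath` and `Key`, `originalOutput` and `destFile` are hypothetical names, and `listToMaybe` is assumed to play the role of `headMaybe`.

```haskell
import qualified Data.Map as M
import Data.Maybe (listToMaybe)

-- Hypothetical stand-ins for git-annex's OsPath and Key types.
type Path = FilePath
type Key = String

-- | Find the output name that the original computation used for the key
-- being recomputed. An output maps to Nothing when no key was recorded.
originalOutput :: Key -> M.Map Path (Maybe Key) -> Maybe Path
originalOutput key outputs =
    listToMaybe $ M.keys $ M.filter (== Just key) outputs

-- | Decide where an output of the recomputation should be written.
-- Only the output that originally produced the key is kept, and it is
-- written to the file's current (possibly renamed) name.
destFile :: Path -> Key -> M.Map Path (Maybe Key) -> Path -> Maybe Path
destFile currentName key outputs outputfile
    | Just outputfile == originalOutput key outputs = Just currentName
    | otherwise = Nothing
```

For example, if the original outputs were `out.jpg` (key `KEY1`) and `log.txt` (key `KEY2`), then recomputing a file with key `KEY1` that has since been renamed to `photos/renamed.jpg` maps `out.jpg` to `Just "photos/renamed.jpg"` and `log.txt` to `Nothing`, so only the renamed file is written.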
Joey Hess 2025-02-27 11:10:44 -04:00
parent 5d2a608a56
commit 9c2c3002a6
GPG key ID: DB12DB0FF05F8F38
4 changed files with 71 additions and 42 deletions


@@ -96,11 +96,11 @@ perform o r = do
 Remote.Compute.runComputeProgram program state
 (Remote.Compute.ImmutableState False)
 (getInputContent fast)
-(addComputed "adding" True r (reproducible o) (const True) fast)
+(addComputed "adding" True r (reproducible o) Just fast)
 next $ return True
-addComputed :: StringContainingQuotedPath -> Bool -> Remote -> Maybe Reproducible -> (OsPath -> Bool) -> Bool -> Remote.Compute.ComputeState -> OsPath -> NominalDiffTime -> Annex ()
-addComputed addaction stagefiles r reproducibleconfig wantfile fast state tmpdir ts = do
+addComputed :: StringContainingQuotedPath -> Bool -> Remote -> Maybe Reproducible -> (OsPath -> Maybe OsPath) -> Bool -> Remote.Compute.ComputeState -> OsPath -> NominalDiffTime -> Annex ()
+addComputed addaction stagefiles r reproducibleconfig destfile fast state tmpdir ts = do
 let outputs = Remote.Compute.computeOutputs state
 when (M.null outputs) $
 giveup "The computation succeeded, but it did not generate any files."
@@ -120,29 +120,29 @@ addComputed addaction stagefiles r reproducibleconfig wantfile fast state tmpdir
 where
 addfile outputfile
 | fast = do
-when (wantfile outputfile) $
-if stagefiles
-then addSymlink outputfile stateurlk Nothing
-else makelink stateurlk
+case destfile outputfile of
+Nothing -> noop
+Just f
+| stagefiles -> addSymlink f stateurlk Nothing
+| otherwise -> makelink f stateurlk
 return stateurlk
 | isreproducible = do
 sz <- liftIO $ getFileSize outputfile'
 metered Nothing sz Nothing $ \_ p ->
-if wantfile outputfile
-then ingesthelper p Nothing
-else genkey p
-| otherwise =
-if wantfile outputfile
-then ingesthelper nullMeterUpdate
-(Just stateurlk)
-else return stateurlk
+case destfile outputfile of
+Just f -> ingesthelper f p Nothing
+Nothing -> genkey outputfile p
+| otherwise = case destfile outputfile of
+Just f -> ingesthelper f nullMeterUpdate
+(Just stateurlk)
+Nothing -> return stateurlk
 where
 stateurl = Remote.Compute.computeStateUrl r state outputfile
 stateurlk = fromUrl stateurl Nothing True
 outputfile' = tmpdir </> outputfile
-ld = LockedDown ldc ks
-ks = KeySource
-{ keyFilename = outputfile
+ld f = LockedDown ldc (ks f)
+ks f = KeySource
+{ keyFilename = f
 , contentLocation = outputfile'
 , inodeCache = Nothing
 }
@@ -151,16 +151,16 @@ addComputed addaction stagefiles r reproducibleconfig wantfile fast state tmpdir
 Just k -> do
 logStatus NoLiveUpdate k InfoPresent
 return k
-genkey p = do
+genkey f p = do
 backend <- chooseBackend outputfile
-fst <$> genKey ks p backend
-makelink k = void $ makeLink outputfile k Nothing
-ingesthelper p mk
+fst <$> genKey (ks f) p backend
+makelink f k = void $ makeLink f k Nothing
+ingesthelper f p mk
 | stagefiles = ingestwith $
-ingestAdd' p (Just ld) mk
+ingestAdd' p (Just (ld f)) mk
 | otherwise = ingestwith $ do
-mk' <- fst <$> ingest p (Just ld) mk
-maybe noop makelink mk'
+mk' <- fst <$> ingest p (Just (ld f)) mk
+maybe noop (makelink f) mk'
 return mk'
 ldc = LockDownConfig


@@ -29,7 +29,6 @@ cmd = notBareRepo $
 data RecomputeOptions = RecomputeOptions
 { recomputeThese :: CmdParams
 , originalOption :: Bool
-, othersOption :: Bool
 , reproducible :: Maybe Reproducible
 , computeRemote :: Maybe (DeferredParse Remote)
 }
@@ -41,10 +40,6 @@ optParser desc = RecomputeOptions
 ( long "original"
 <> help "recompute using original content of input files"
 )
-<*> switch
-( long "others"
-<> help "stage other files that are recomputed in passing"
-)
 <*> parseReproducible
 <*> optional (mkParseRemoteOption <$> parseRemoteOption)
@@ -111,25 +106,28 @@ start' o r si file key =
 -- TODO When reproducible is not set, preserve the
 -- reproducible/unreproducible of the input key.
 perform :: RecomputeOptions -> Remote -> OsPath -> Key -> Remote.Compute.ComputeState -> CommandPerform
-perform o r file key oldstate = do
+perform o r file key origstate = do
 program <- Remote.Compute.getComputeProgram r
 fast <- Annex.getRead Annex.fast
 showOutput
-Remote.Compute.runComputeProgram program oldstate
+Remote.Compute.runComputeProgram program origstate
 (Remote.Compute.ImmutableState True)
 (getinputcontent program fast)
-(addComputed "processing" False r (reproducible o) wantfile fast)
+(addComputed "processing" False r (reproducible o) destfile fast)
 next $ return True
 where
 getinputcontent program fast p
 | originalOption o =
-case M.lookup p (Remote.Compute.computeInputs oldstate) of
+case M.lookup p (Remote.Compute.computeInputs origstate) of
 Just inputkey -> getInputContent' fast inputkey
 (fromOsPath p ++ "(key " ++ serializeKey inputkey ++ ")")
 Nothing -> Remote.Compute.computationBehaviorChangeError program
 "requesting a new input file" p
 | otherwise = getInputContent fast p
-wantfile outputfile
-| othersOption o = True
-| otherwise = outputfile == file
+destfile outputfile
+| Just outputfile == origfile = Just file
+| otherwise = Nothing
+origfile = headMaybe $ M.keys $ M.filter (== Just key)
+(Remote.Compute.computeOutputs origstate)

TODO-compute (new file, 36 additions)

@@ -0,0 +1,36 @@
+* recompute could ingest keys for other files than the one being
+recomputed, and remember them. Then recomputing those files could just
+use those keys, without re-running a computation. (Better than --others
+which got removed.)
+* `git-annex recompute foo bar baz`, when foo depends on bar which depends
+on baz, and when baz has changed, will not recompute foo, because bar has
+not changed. It then recomputes bar. So running the command again is
+needed to recompute foo.
+What it could do is, after it recomputes bar, notice that it already
+considered foo, and revisit foo, and recompute it then. It could either
+use a bloom filter to remember the files it considered but did not
+compute, or it could just notice that the command line includes foo
+(or includes a directory that contains foo), and then foo is not
+modified.
+Or it could build a DAG and traverse it, but building a DAG of a large
+directory tree has its own problems.
+* recompute should use the same key backend for a file that it used before
+(except when --reproducible/--unreproducible is passed).
+* Check recompute's handling of --reproducible and --unreproducible.
+* addcomputed should honor annex.addunlocked.
+* Perhaps recompute should write a new version of a file as an unlocked
+file when the file is currently unlocked?
+* Support non-annexed files as inputs to computations.
+* Should addcomputed honor annex.smallfiles? That would seem to imply
+that recompute should also support recomputing non-annexed files.
+Otherwise, adding a file and then recomputing it would vary in
+what the content of the file is, depending on annex.smallfiles setting.
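
The DAG alternative mentioned in the second TODO item can be sketched for just the files named on the command line, which would sidestep walking a large directory tree. This is only an illustration under assumptions: the `Deps` map (each computed file mapped to the inputs it was computed from) and `recomputeOrder` are hypothetical and not part of git-annex. The ordering itself uses `stronglyConnComp` from the `containers` package, which returns components with dependencies before their dependents.

```haskell
import Data.Graph (flattenSCCs, stronglyConnComp)
import qualified Data.Map as M

-- Hypothetical: each computed file mapped to the input files it was
-- computed from.
type Deps = M.Map FilePath [FilePath]

-- | Order files so that every file comes after the inputs it depends on.
-- Inputs that are not themselves keys of the map are ignored by Data.Graph.
recomputeOrder :: Deps -> [FilePath]
recomputeOrder deps = flattenSCCs $ stronglyConnComp
    [ (file, file, inputs) | (file, inputs) <- M.toList deps ]
```

With foo depending on bar and bar depending on baz, `recomputeOrder` yields baz, then bar, then foo, so a single pass in that order would recompute bar before considering foo. Whether this is preferable to the revisit approach described in the item is left open there.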


@@ -23,11 +23,6 @@ updated with the new content.
 Use the original content of input files.
-* `--others`
-When recomputing one file also generates new versions of other files,
-update those other files too.
 * `--unreproducible`, `-u`
 Convert files that were added with `git-annex addcomputed --reproducible`