From a673fc7cfd3a8d104b9d25c4369142e468cbb4a4 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 12 Mar 2025 13:36:16 -0400
Subject: [PATCH] recompute: stage new version of file in git

When writing doc/tips/computing_annexed_files.mdwn, I noticed that a
recompute --reproducible followed by a drop and a re-get did not
actually test whether the file could be reproducibly computed again.

Turns out that get and drop both operate on staged files. If there is an
unstaged modification in the work tree, that's ignored. Somewhat
surprisingly, other commands like info do operate on unstaged files. So
behavior is inconsistent, and fairly surprising really, when there are
unstaged modifications to files.

Probably this is rarely noticed because `git-annex add` is used to add a
new version of a file, and then it's staged. Or `git mv` is used to move
a file, rather than `mv` of a file over top of an existing file. So it's
uncommon to have an unstaged annexed file in a worktree.

It might be worth making things more consistent, but that's out of scope
for what I'm working on currently. Also, I anticipate that supporting
unlocked files with recompute will require it to stage changes anyway.

So, make recompute stage the new version of the file.

I considered having recompute refuse to overwrite an existing staged
file. After all, whatever version was staged before will get lost when
the new version is staged over top of it. But that's no different than
`git-annex addcomputed` being run with the name of an existing staged
file. Or `git-annex add` being run with new file content when there is
an existing staged file. Or, for that matter, `git add` being run with
new content when there is an existing staged file.
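To make the drop/get problem concrete, here is a toy model in Haskell. It is a hypothetical sketch, not git-annex code: `Repo`, `recomputeUnstaged`, `recomputeStaged` and `keyForGetDrop` are made-up names, and keys are plain strings. It assumes only what is stated above, that get and drop look up the key from the staged file and ignore unstaged modifications in the work tree.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical toy model, not git-annex code: each annexed file maps to a
-- key, and keys are plain strings here.
type Key = String

data Repo = Repo
    { staged   :: M.Map FilePath Key  -- the key recorded in the git index
    , worktree :: M.Map FilePath Key  -- the key the work tree symlink points to
    } deriving (Show, Eq)

-- Old behavior: recompute wrote the new version to the work tree only,
-- leaving whatever was staged untouched.
recomputeUnstaged :: FilePath -> Key -> Repo -> Repo
recomputeUnstaged f k r = r { worktree = M.insert f k (worktree r) }

-- New behavior: the new version is also staged. M.insert replaces any
-- previously staged key, so that earlier staged version is lost, just as
-- with `git add` over an existing staged file.
recomputeStaged :: FilePath -> Key -> Repo -> Repo
recomputeStaged f k r = r
    { worktree = M.insert f k (worktree r)
    , staged   = M.insert f k (staged r)
    }

-- get and drop find the key via the staged file and ignore any
-- unstaged modification in the work tree.
keyForGetDrop :: FilePath -> Repo -> Maybe Key
keyForGetDrop f = M.lookup f . staged
```

In this model, starting from a repo where "out" is staged as "KEY-old", `keyForGetDrop "out"` after `recomputeUnstaged "out" "KEY-new"` still returns `Just "KEY-old"`, so a drop and re-get exercises the old key and says nothing about whether the new one is reproducible. After `recomputeStaged` it returns `Just "KEY-new"`.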
---
 Command/AddComputed.hs                        | 19 +++++--------------
 Command/Recompute.hs                          |  2 +-
 doc/git-annex-recompute.mdwn                  |  3 +--
 ...ompute_special_remote_remaining_todos.mdwn |  7 -------
 4 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/Command/AddComputed.hs b/Command/AddComputed.hs
index dd6c310b06..02d8826683 100644
--- a/Command/AddComputed.hs
+++ b/Command/AddComputed.hs
@@ -102,12 +102,11 @@ perform o r = do
 		(Remote.Compute.ImmutableState False)
 		(getInputContent fast)
 		Nothing
-		(addComputed (Just "adding") True r (reproducible o) chooseBackend Just fast)
+		(addComputed (Just "adding") r (reproducible o) chooseBackend Just fast)
 	next $ return True
 
 addComputed
 	:: Maybe StringContainingQuotedPath
-	-> Bool
 	-> Remote
 	-> Maybe Reproducible
 	-> (OsPath -> Annex Backend)
@@ -117,7 +116,7 @@ addComputed
 	-> OsPath
 	-> NominalDiffTime
 	-> Annex ()
-addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fast result tmpdir ts = do
+addComputed maddaction r reproducibleconfig choosebackend destfile fast result tmpdir ts = do
 	when (M.null outputs) $
 		giveup "The computation succeeded, but it did not generate any files."
 	oks <- forM (M.keys outputs) $ \outputfile -> do
@@ -148,9 +147,7 @@ addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fa
 		| fast = do
 			case destfile outputfile of
 				Nothing -> noop
-				Just f
-					| stagefiles -> addSymlink f stateurlk Nothing
-					| otherwise -> makelink f stateurlk
+				Just f -> addSymlink f stateurlk Nothing
 			return stateurlk
 		| isreproducible = do
 			sz <- liftIO $ getFileSize outputfile'
@@ -175,16 +172,10 @@ addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fa
 		genkey f p = do
 			backend <- choosebackend outputfile
 			fst <$> genKey (ks f) p backend
-		makelink f k = void $ makeLink f k Nothing
-		ingesthelper f p mk
-			| stagefiles = ingestwith $ do
+		ingesthelper f p mk =
+			ingestwith $ do
 				k <- maybe (genkey f p) return mk
 				ingestAdd' p (Just (ld f)) (Just k)
-			| otherwise = ingestwith $ do
-				k <- maybe (genkey f p) return mk
-				mk' <- fst <$> ingest p (Just (ld f)) (Just k)
-				maybe noop (makelink f) mk'
-				return mk'
 		ingestwith a = a >>= \case
 			Nothing -> giveup "ingestion failed"
 			Just k -> do
diff --git a/Command/Recompute.hs b/Command/Recompute.hs
index 17246d10e4..82ed7ab37e 100644
--- a/Command/Recompute.hs
+++ b/Command/Recompute.hs
@@ -137,7 +137,7 @@ perform o r file origkey origstate = do
 
 	go program reproducibleconfig result tmpdir ts = do
 		checkbehaviorchange program (Remote.Compute.computeState result)
-		addComputed Nothing False r reproducibleconfig
+		addComputed Nothing r reproducibleconfig
 			choosebackend destfile False result tmpdir ts
 
 	checkbehaviorchange program state = do
diff --git a/doc/git-annex-recompute.mdwn b/doc/git-annex-recompute.mdwn
index 498c85e26c..f10125827c 100644
--- a/doc/git-annex-recompute.mdwn
+++ b/doc/git-annex-recompute.mdwn
@@ -15,8 +15,7 @@ By default, this only recomputes files whose input files have changed.
 The new contents of the input files are used to re-run the computation.
 
 When the output of the computation is different, the computed file is
-updated with the new content. The updated file is written to the worktree,
-but is not staged, in order to avoid overwriting any staged changes.
+updated with the new content. The updated file is staged in git.
 
 # OPTIONS
 
diff --git a/doc/todo/compute_special_remote_remaining_todos.mdwn b/doc/todo/compute_special_remote_remaining_todos.mdwn
index 820b423199..c6e5a64de6 100644
--- a/doc/todo/compute_special_remote_remaining_todos.mdwn
+++ b/doc/todo/compute_special_remote_remaining_todos.mdwn
@@ -1,13 +1,6 @@
 This is the remainder of my todo list while I was building the
 compute special remote. --[[Joey]]
 
-* recompute should stage files in git. Otherwise,
-  `git-annex drop` after recompute --reproducible drops the staged
-  file, and `git-annex get` gets the staged file, and if it wasn't
-  actually reproducible, this is not apparent.
-
-  This is blocking adding the tip.
-
 * Support parallel get of input files. The design allows for this,
   but how much parallelism makes sense? Would it be possible to
   use the usual worker pool?