recompute: stage new version of file in git

When writing doc/tips/computing_annexed_files.mdwn, I noticed
that a recompute --reproducible followed by a drop and a re-get did not
actually test whether the file could be reproducibly computed again.
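
The check went roughly like this (a sketch; `foo` stands in for a
computed file):

    git annex recompute --reproducible foo
    git annex drop foo
    git annex get foo    # getting should re-run the computation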

Turns out that get and drop both operate on staged files. If there is an
unstaged modification in the work tree, that's ignored. Somewhat
surprisingly, other commands like info do operate on unstaged files. So
behavior is inconsistent, and fairly surprising really, when there are
unstaged modifications to files.

Probably this is rarely noticed because `git-annex add` is used to add a
new version of a file, and then it's staged. Or `git mv` is used to move
a file, rather than `mv` of a file over top of an existing file. So it's
uncommon to have an unstaged annexed file in a worktree.
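
For example (hypothetical file names, both already annexed and committed):

    mv bar foo      # overwrite foo without git mv or git-annex add
    git status      # foo now has an unstaged modification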

It might be worth making things more consistent, but that's out of scope
for what I'm working on currently.

Also, I anticipate that supporting unlocked files with recompute will
require it to stage changes anyway.

So, make recompute stage the new version of the file.
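
After this change, roughly (`foo` is a hypothetical computed file):

    git annex recompute foo    # output changed, so the new version is staged
    git status                 # foo appears under "Changes to be committed"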

I considered having recompute refuse to overwrite an existing staged
file. After all, whatever version was staged before will get lost when
the new version is staged over top of it. But, that's no different than
`git-annex addcomputed` being run with the name of an existing staged
file. Or `git-annex add` being run with new file content when there is
an existing staged file. Or, for that matter, `git add` being run with
new content when there is an existing staged file.
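
A minimal illustration of that last case:

    echo one > file
    git add file    # stages the first version
    echo two > file
    git add file    # replaces it; the staged "one" is gone from the index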
Joey Hess 2025-03-12 13:36:16 -04:00
parent 21b45da406
commit a673fc7cfd
4 changed files with 7 additions and 24 deletions

@@ -102,12 +102,11 @@ perform o r = do
         (Remote.Compute.ImmutableState False)
         (getInputContent fast)
         Nothing
-        (addComputed (Just "adding") True r (reproducible o) chooseBackend Just fast)
+        (addComputed (Just "adding") r (reproducible o) chooseBackend Just fast)
     next $ return True
 
 addComputed
     :: Maybe StringContainingQuotedPath
-    -> Bool
     -> Remote
     -> Maybe Reproducible
     -> (OsPath -> Annex Backend)
@@ -117,7 +116,7 @@ addComputed
     -> OsPath
     -> NominalDiffTime
     -> Annex ()
-addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fast result tmpdir ts = do
+addComputed maddaction r reproducibleconfig choosebackend destfile fast result tmpdir ts = do
     when (M.null outputs) $
         giveup "The computation succeeded, but it did not generate any files."
     oks <- forM (M.keys outputs) $ \outputfile -> do
@@ -148,9 +147,7 @@ addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fa
         | fast = do
             case destfile outputfile of
                 Nothing -> noop
-                Just f
-                    | stagefiles -> addSymlink f stateurlk Nothing
-                    | otherwise -> makelink f stateurlk
+                Just f -> addSymlink f stateurlk Nothing
             return stateurlk
         | isreproducible = do
             sz <- liftIO $ getFileSize outputfile'
@@ -175,16 +172,10 @@ addComputed maddaction stagefiles r reproducibleconfig choosebackend destfile fa
     genkey f p = do
         backend <- choosebackend outputfile
         fst <$> genKey (ks f) p backend
-    makelink f k = void $ makeLink f k Nothing
-    ingesthelper f p mk
-        | stagefiles = ingestwith $ do
-            k <- maybe (genkey f p) return mk
-            ingestAdd' p (Just (ld f)) (Just k)
-        | otherwise = ingestwith $ do
-            k <- maybe (genkey f p) return mk
-            mk' <- fst <$> ingest p (Just (ld f)) (Just k)
-            maybe noop (makelink f) mk'
-            return mk'
+    ingesthelper f p mk =
+        ingestwith $ do
+            k <- maybe (genkey f p) return mk
+            ingestAdd' p (Just (ld f)) (Just k)
     ingestwith a = a >>= \case
         Nothing -> giveup "ingestion failed"
         Just k -> do

@@ -137,7 +137,7 @@ perform o r file origkey origstate = do
     go program reproducibleconfig result tmpdir ts = do
         checkbehaviorchange program
             (Remote.Compute.computeState result)
-        addComputed Nothing False r reproducibleconfig
+        addComputed Nothing r reproducibleconfig
             choosebackend destfile False result tmpdir ts
     checkbehaviorchange program state = do

@@ -15,8 +15,7 @@ By default, this only recomputes files whose input files have changed.
 The new contents of the input files are used to re-run the computation.
 When the output of the computation is different, the computed file is
-updated with the new content. The updated file is written to the worktree,
-but is not staged, in order to avoid overwriting any staged changes.
+updated with the new content. The updated file is staged in git.
 
 # OPTIONS

@@ -1,13 +1,6 @@
 This is the remainder of my todo list while I was building the
 compute special remote. --[[Joey]]
 
-* recompute should stage files in git. Otherwise,
-  `git-annex drop` after recompute --reproducible drops the staged
-  file, and `git-annex get` gets the staged file, and if it wasn't
-  actually reproducible, this is not apparent.
-
-  This is blocking adding the tip.
-
 * Support parallel get of input files. The design allows for this,
   but how much parallelism makes sense? Would it be possible to use the
   usual worker pool?