improve sqlite MultiWriter handling of read after write
This removes a messy caveat that was easy to forget and caused at least one bug. The price paid is that, after a write to a MultiWriter db, it has to close the db connection that it had been using to read, and open a new connection. So it might be a little bit slower. But, writes are usually batched together, so there's often only a single write, and so there should not be much of a slowdown. Notice that SingleWriter already closed the db connection after a write, so paid the same overhead.

This is the second try at fixing a bug: git-annex get when run as the first git-annex command in a new repo did not populate all unlocked files. (Reversion in version 8.20210621)

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
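
The trade-off described in the commit message can be illustrated with a toy model. This is only a sketch, not git-annex's code: `Disk`, `Conn`, `openConn`, `writeRow` and `runJobs` are hypothetical names, and the assumption that a connection serves reads from a snapshot taken when it was opened is a deliberate exaggeration of the caching the commit refers to.

```haskell
import Data.IORef

-- Hypothetical stand-in for a sqlite database file on disk.
type Disk = IORef [String]

-- Hypothetical stand-in for a read connection. Assumption: reads are
-- served from a snapshot taken when the connection was opened.
newtype Conn = Conn [String]

openConn :: Disk -> IO Conn
openConn disk = Conn <$> readIORef disk

query :: Conn -> [String]
query (Conn rows) = rows

-- Writes bypass the read connection and go straight to the disk,
-- loosely mirroring a write made through a separate connection.
writeRow :: Disk -> String -> IO ()
writeRow disk row = modifyIORef disk (++ [row])

data Job = QueryJob | ChangeJob String

-- Run jobs in order and collect the result of every query.
-- When reopenAfterWrite is True, the read connection is replaced
-- after each write; when False, the old connection is kept.
runJobs :: Bool -> Disk -> [Job] -> IO [[String]]
runJobs reopenAfterWrite disk jobs = openConn disk >>= go jobs
  where
    go [] _ = return []
    go (QueryJob : rest) conn = (query conn :) <$> go rest conn
    go (ChangeJob row : rest) conn = do
        writeRow disk row
        conn' <- if reopenAfterWrite then openConn disk else return conn
        go rest conn'

main :: IO ()
main = do
    stale <- newIORef [] >>= \d -> runJobs False d [ChangeJob "foo", QueryJob]
    fresh <- newIORef [] >>= \d -> runJobs True d [ChangeJob "foo", QueryJob]
    print stale
    print fresh
```

In this model the first run prints `[[]]`, because the query misses the write, and the second prints `[["foo"]]`. The cost of the second behaviour is one extra `openConn` per write, which is why batching writes keeps the overhead small.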
parent ade67b78c5
commit f4bdecc4ec
4 changed files with 54 additions and 19 deletions
@@ -14,6 +14,9 @@ git-annex (8.20211012) UNRELEASED; urgency=medium
     occurred when downloading the chunk, rather than the error that
     occurred when trying to download the unchunked content, which is less
     likely to actually be stored in the remote.
+  * git-annex get when run as the first git-annex command in a new repo
+    did not populate all unlocked files.
+    (Reversion in version 8.20210621)

  -- Joey Hess <id@joeyh.name> Mon, 11 Oct 2021 14:09:13 -0400

@@ -45,19 +45,13 @@ type TableName = String
 {- Sqlite only allows a single write to a database at a time; a concurrent
  - write will crash.
  -
- - MultiWrter works around this limitation.
- - The downside of using MultiWriter is that after writing a change to the
- - database, the a query using the same DbHandle will not immediately see
- - the change! This is because the change is actually written using a
- - separate database connection, and caching can prevent seeing the change.
- - Also, consider that if multiple processes are writing to a database,
- - you can't rely on seeing values you've just written anyway, as another
- - process may change them.
+ - MultiWrter works around this limitation. It uses additional resources
+ - when writing, because it needs to open the database multiple times. And
+ - writes to the database may block for some time, if other processes are also
+ - writing to it.
  -
  - When a database can only be written to by a single process (enforced by
- - a lock file), use SingleWriter. Changes written to the database will
- - always be immediately visible then. Multiple threads can write; their
- - writes will be serialized.
+ - a lock file), use SingleWriter. (Multiple threads can still write.)
  -}
 data DbConcurrency = SingleWriter | MultiWriter

@@ -89,9 +83,6 @@ closeDb (DbHandle _ worker jobs) = do
  - Only one action can be run at a time against a given DbHandle.
  - If called concurrently in the same process, this will block until
  - it is able to run.
- -
- - Note that when the DbHandle was opened in MultiWriter mode, recent
- - writes may not be seen by queryDb.
  -}
 queryDb :: DbHandle -> SqlPersistM a -> IO a
 queryDb (DbHandle _ _ jobs) a = do
@@ -165,7 +156,7 @@ workerThread db tablename jobs = go
     Right (QueryJob a) -> a >> loop
     Right (ChangeJob a) -> do
         a
-        -- Exit this sqlite transaction so the
+        -- Exit this sqlite connection so the
         -- database gets updated on disk.
         return True
     -- Change is run in a separate database connection
@@ -174,7 +165,11 @@ workerThread db tablename jobs = go
     -- that the write is made to.
     Right (RobustChangeJob a) -> do
         liftIO (a (runSqliteRobustly tablename db))
-        loop
+        -- Exit this sqlite connection so the
+        -- change that was just written, using
+        -- a different db handle, is immediately
+        -- visible to queries.
+        return True

 -- Like runSqlite, but more robust.
 --
@@ -64,9 +64,6 @@ flushDbQueue (DQ hdl qvar) = do
  -
  - Queries will not see changes that have been recently queued,
  - so use with care.
- -
- - Also, when the database was opened in MultiWriter mode,
- - queries may not see changes even after flushDbQueue.
  -}
 queryDbQueue :: DbQueue -> SqlPersistM a -> IO a
 queryDbQueue (DQ hdl _) = queryDb hdl
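
The caveat that remains in the comment above (queries do not see changes that are still queued) can be sketched with an in-memory toy. Everything here is hypothetical: `ToyQueue`, `queueRow`, `flushToyQueue` and `queryToyQueue` are illustration-only names and do not reflect how git-annex's `DbQueue` is implemented.

```haskell
import Data.IORef

-- Hypothetical in-memory stand-in; not git-annex's real DbQueue.
data ToyQueue = ToyQueue
    { stored  :: IORef [String]  -- rows already written to the "database"
    , pending :: IORef [String]  -- rows queued but not yet written
    }

newToyQueue :: IO ToyQueue
newToyQueue = ToyQueue <$> newIORef [] <*> newIORef []

-- Queue a row; nothing is written to the store yet.
queueRow :: ToyQueue -> String -> IO ()
queueRow q row = modifyIORef (pending q) (++ [row])

-- Write all queued rows in one batch, then empty the queue.
flushToyQueue :: ToyQueue -> IO ()
flushToyQueue q = do
    rows <- readIORef (pending q)
    modifyIORef (stored q) (++ rows)
    writeIORef (pending q) []

-- Queries only consult rows that have been written.
queryToyQueue :: ToyQueue -> IO [String]
queryToyQueue = readIORef . stored

main :: IO ()
main = do
    q <- newToyQueue
    queueRow q "foo"
    queueRow q "bar"
    before <- queryToyQueue q
    flushToyQueue q
    after <- queryToyQueue q
    print before
    print after
```

Here the query before the flush prints `[]` and the query after it prints `["foo","bar"]`. Batching writes this way is also what keeps the number of separate writes low.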
@@ -43,3 +43,43 @@ This outputs 1 for foo, followed by annex pointer files for files bar and baz.

 The previous fix attempt did make foo get populated, before that none
 of the files were populated.
+
+----
+
+`GIT_TRACE=1` shows that git only runs the smudge filter on the first
+file, not the other two. And indeed, restagePointerFile is only called
+on the first file.
+
+Added debugging to Database.Keys.reconcileStaged, and it adds all 3 files to
+the associated files table, but only adds the inode cache of foo.
+And that's what I see in the db after the fact too. Which is
+not itself a problem, to the extent that the other files are not
+populated, and only populated files have an inode cache recorded.
+
+So, Database.Keys.reconcileStaged is called after it gets foo,
+but before the other files are present, and in reconcilepointerfile it
+calls populatePointerFile and records the inode cache for foo.
+That is how foo gets populated.
+
+But, the other 2 files do not have populatePointerFile run on them.
+In moveAnnex, it calls getAssociatedFiles and somehow that returns
+`[]`, for all 3 files. This does not matter for foo, because it gets
+populated by reconcileStaged as explained above. But for the other 2, with
+no known associated files of course it fails to populate them.
+
+So: Why is getAssociatedFiles returning `[]`? Those calls come
+after Database.Keys.reconcileStaged has added the associated files,
+but are somehow not seeing the changes it made.
+
+Ah.. The keys db is opened in MultiWriter mode.
+See the comment above the definition of MultiWriter,
+which explains that a write to a MultiWriter database,
+followed by a flushDbQueue may not be visible when reading
+from that same database.
+
+Verified this by making it re-open the db after reconcileStaged,
+which did fix the problem.
+
+A better fix is possible: Make MultiWriter mode not have this hidden
+gotcha, by re-opening the db after writing to it always. [[done]]
+--[[Joey]]