improve sqlite MultiWriter handling of read after write
This removes a messy caveat that was easy to forget and caused at least one bug. The price paid is that, after a write to a MultiWriter db, it has to close the db connection that it had been using to read, and open a new connection. So it might be a little bit slower. But, writes are usually batched together, so there's often only a single write, and so there should not be much of a slowdown. Notice that SingleWriter already closed the db connection after a write, so paid the same overhead. This is the second try at fixing a bug: git-annex get when run as the first git-annex command in a new repo did not populate all unlocked files. (Reversion in version 8.20210621) Sponsored-by: Boyd Stephen Smith Jr. on Patreon
This commit is contained in:
parent
ade67b78c5
commit
f4bdecc4ec
4 changed files with 54 additions and 19 deletions
|
@ -14,6 +14,9 @@ git-annex (8.20211012) UNRELEASED; urgency=medium
|
||||||
occurred when downloading the chunk, rather than the error that
|
occurred when downloading the chunk, rather than the error that
|
||||||
occurred when trying to download the unchunked content, which is less
|
occurred when trying to download the unchunked content, which is less
|
||||||
likely to actually be stored in the remote.
|
likely to actually be stored in the remote.
|
||||||
|
* git-annex get when run as the first git-annex command in a new repo
|
||||||
|
did not populate all unlocked files.
|
||||||
|
(Reversion in version 8.20210621)
|
||||||
|
|
||||||
-- Joey Hess <id@joeyh.name> Mon, 11 Oct 2021 14:09:13 -0400
|
-- Joey Hess <id@joeyh.name> Mon, 11 Oct 2021 14:09:13 -0400
|
||||||
|
|
||||||
|
|
|
@ -45,19 +45,13 @@ type TableName = String
|
||||||
{- Sqlite only allows a single write to a database at a time; a concurrent
|
{- Sqlite only allows a single write to a database at a time; a concurrent
|
||||||
- write will crash.
|
- write will crash.
|
||||||
-
|
-
|
||||||
- MultiWrter works around this limitation.
|
- MultiWrter works around this limitation. It uses additional resources
|
||||||
- The downside of using MultiWriter is that after writing a change to the
|
- when writing, because it needs to open the database multiple times. And
|
||||||
- database, the a query using the same DbHandle will not immediately see
|
- writes to the database may block for some time, if other processes are also
|
||||||
- the change! This is because the change is actually written using a
|
- writing to it.
|
||||||
- separate database connection, and caching can prevent seeing the change.
|
|
||||||
- Also, consider that if multiple processes are writing to a database,
|
|
||||||
- you can't rely on seeing values you've just written anyway, as another
|
|
||||||
- process may change them.
|
|
||||||
-
|
-
|
||||||
- When a database can only be written to by a single process (enforced by
|
- When a database can only be written to by a single process (enforced by
|
||||||
- a lock file), use SingleWriter. Changes written to the database will
|
- a lock file), use SingleWriter. (Multiple threads can still write.)
|
||||||
- always be immediately visible then. Multiple threads can write; their
|
|
||||||
- writes will be serialized.
|
|
||||||
-}
|
-}
|
||||||
data DbConcurrency = SingleWriter | MultiWriter
|
data DbConcurrency = SingleWriter | MultiWriter
|
||||||
|
|
||||||
|
@ -89,9 +83,6 @@ closeDb (DbHandle _ worker jobs) = do
|
||||||
- Only one action can be run at a time against a given DbHandle.
|
- Only one action can be run at a time against a given DbHandle.
|
||||||
- If called concurrently in the same process, this will block until
|
- If called concurrently in the same process, this will block until
|
||||||
- it is able to run.
|
- it is able to run.
|
||||||
-
|
|
||||||
- Note that when the DbHandle was opened in MultiWriter mode, recent
|
|
||||||
- writes may not be seen by queryDb.
|
|
||||||
-}
|
-}
|
||||||
queryDb :: DbHandle -> SqlPersistM a -> IO a
|
queryDb :: DbHandle -> SqlPersistM a -> IO a
|
||||||
queryDb (DbHandle _ _ jobs) a = do
|
queryDb (DbHandle _ _ jobs) a = do
|
||||||
|
@ -165,7 +156,7 @@ workerThread db tablename jobs = go
|
||||||
Right (QueryJob a) -> a >> loop
|
Right (QueryJob a) -> a >> loop
|
||||||
Right (ChangeJob a) -> do
|
Right (ChangeJob a) -> do
|
||||||
a
|
a
|
||||||
-- Exit this sqlite transaction so the
|
-- Exit this sqlite connection so the
|
||||||
-- database gets updated on disk.
|
-- database gets updated on disk.
|
||||||
return True
|
return True
|
||||||
-- Change is run in a separate database connection
|
-- Change is run in a separate database connection
|
||||||
|
@ -174,7 +165,11 @@ workerThread db tablename jobs = go
|
||||||
-- that the write is made to.
|
-- that the write is made to.
|
||||||
Right (RobustChangeJob a) -> do
|
Right (RobustChangeJob a) -> do
|
||||||
liftIO (a (runSqliteRobustly tablename db))
|
liftIO (a (runSqliteRobustly tablename db))
|
||||||
loop
|
-- Exit this sqlite connection so the
|
||||||
|
-- change that was just written, using
|
||||||
|
-- a different db handle, is immediately
|
||||||
|
-- visible to queries.
|
||||||
|
return True
|
||||||
|
|
||||||
-- Like runSqlite, but more robust.
|
-- Like runSqlite, but more robust.
|
||||||
--
|
--
|
||||||
|
|
|
@ -64,9 +64,6 @@ flushDbQueue (DQ hdl qvar) = do
|
||||||
-
|
-
|
||||||
- Queries will not see changes that have been recently queued,
|
- Queries will not see changes that have been recently queued,
|
||||||
- so use with care.
|
- so use with care.
|
||||||
-
|
|
||||||
- Also, when the database was opened in MultiWriter mode,
|
|
||||||
- queries may not see changes even after flushDbQueue.
|
|
||||||
-}
|
-}
|
||||||
queryDbQueue :: DbQueue -> SqlPersistM a -> IO a
|
queryDbQueue :: DbQueue -> SqlPersistM a -> IO a
|
||||||
queryDbQueue (DQ hdl _) = queryDb hdl
|
queryDbQueue (DQ hdl _) = queryDb hdl
|
||||||
|
|
|
@ -43,3 +43,43 @@ This outputs 1 for foo, followed by annex pointer files for files bar and baz.
|
||||||
|
|
||||||
The previous fix attempt did make foo get populated, before that none
|
The previous fix attempt did make foo get populated, before that none
|
||||||
of the files were populated.
|
of the files were populated.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
`GIT_TRACE=1` shows that git only runs the smudge filter on the first
|
||||||
|
file, not the other two. And indeed, restagePointerFile is only called
|
||||||
|
on the first file.
|
||||||
|
|
||||||
|
Added debugging to Database.Keys.reconcileStaged, and it adds all 3 files to
|
||||||
|
the associated files table, but only adds the inode cache of foo.
|
||||||
|
And that's what I see in the db after the fact too. Which is
|
||||||
|
not itself a problem, to the extent that the other files are not
|
||||||
|
populated, and only populated files have an inode cache recorded.
|
||||||
|
|
||||||
|
So, Database.Keys.reconcileStaged is called after it gets foo,
|
||||||
|
but before the other files are present, and in reconcilepointerfile it
|
||||||
|
calls populatePointerFile and records the inode cache for foo.
|
||||||
|
That is how foo gets populated.
|
||||||
|
|
||||||
|
But, the other 2 files do not have populatePointerFile run on them.
|
||||||
|
In moveAnnex, it calls getAssociatedFiles and somehow that returns
|
||||||
|
`[]`, for all 3 files. This does not matter for foo, because it gets
|
||||||
|
populated by reconcileStaged as explained above. But for the other 2, with
|
||||||
|
no known associated files of course it fails to populate them.
|
||||||
|
|
||||||
|
So: Why is getAssociatedFiles returning `[]`? Those calls come
|
||||||
|
after Database.Keys.reconcileStaged has added the associated files,
|
||||||
|
but are somehow not seeing the changes it made.
|
||||||
|
|
||||||
|
Ah.. The keys db is opened in MultiWriter mode.
|
||||||
|
See the comment above the definition of MultiWriter,
|
||||||
|
which explains that a write to a MultiWriter database,
|
||||||
|
followed by a flushDbQueue may not be visible when reading
|
||||||
|
from that same database.
|
||||||
|
|
||||||
|
Verified this by making it re-open the db after reconcileStaged,
|
||||||
|
which did fix the problem.
|
||||||
|
|
||||||
|
A better fix is possible: Make MultiWriter mode not have this hidden
|
||||||
|
gotcha, by re-opening the db after writing to it always. [[done]]
|
||||||
|
--[[Joey]]
|
||||||
|
|
Loading…
Reference in a new issue