diff --git a/CHANGELOG b/CHANGELOG index e8e4ad1d08..8c86d935a7 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -14,6 +14,9 @@ git-annex (8.20211012) UNRELEASED; urgency=medium occurred when downloading the chunk, rather than the error that occurred when trying to download the unchunked content, which is less likely to actually be stored in the remote. + * git-annex get when run as the first git-annex command in a new repo + did not populate all unlocked files. + (Reversion in version 8.20210621) -- Joey Hess Mon, 11 Oct 2021 14:09:13 -0400 diff --git a/Database/Handle.hs b/Database/Handle.hs index d7f1822dc1..349dbca30d 100644 --- a/Database/Handle.hs +++ b/Database/Handle.hs @@ -45,19 +45,13 @@ type TableName = String {- Sqlite only allows a single write to a database at a time; a concurrent - write will crash. - - - MultiWrter works around this limitation. - - The downside of using MultiWriter is that after writing a change to the - - database, the a query using the same DbHandle will not immediately see - - the change! This is because the change is actually written using a - - separate database connection, and caching can prevent seeing the change. - - Also, consider that if multiple processes are writing to a database, - - you can't rely on seeing values you've just written anyway, as another - - process may change them. + - MultiWrter works around this limitation. It uses additional resources + - when writing, because it needs to open the database multiple times. And + - writes to the database may block for some time, if other processes are also + - writing to it. - - When a database can only be written to by a single process (enforced by - - a lock file), use SingleWriter. Changes written to the database will - - always be immediately visible then. Multiple threads can write; their - - writes will be serialized. + - a lock file), use SingleWriter. (Multiple threads can still write.) -} data DbConcurrency = SingleWriter | MultiWriter @@ -89,9 +83,6 @@ closeDb (DbHandle _ worker jobs) = do - Only one action can be run at a time against a given DbHandle. - If called concurrently in the same process, this will block until - it is able to run. - - - - Note that when the DbHandle was opened in MultiWriter mode, recent - - writes may not be seen by queryDb. -} queryDb :: DbHandle -> SqlPersistM a -> IO a queryDb (DbHandle _ _ jobs) a = do @@ -165,7 +156,7 @@ workerThread db tablename jobs = go Right (QueryJob a) -> a >> loop Right (ChangeJob a) -> do a - -- Exit this sqlite transaction so the + -- Exit this sqlite connection so the -- database gets updated on disk. return True -- Change is run in a separate database connection @@ -174,7 +165,11 @@ workerThread db tablename jobs = go -- that the write is made to. Right (RobustChangeJob a) -> do liftIO (a (runSqliteRobustly tablename db)) - loop + -- Exit this sqlite connection so the + -- change that was just written, using + -- a different db handle, is immediately + -- visible to queries. + return True -- Like runSqlite, but more robust. -- diff --git a/Database/Queue.hs b/Database/Queue.hs index 68e3e42f87..434acfc9a7 100644 --- a/Database/Queue.hs +++ b/Database/Queue.hs @@ -64,9 +64,6 @@ flushDbQueue (DQ hdl qvar) = do - - Queries will not see changes that have been recently queued, - so use with care. - - - - Also, when the database was opened in MultiWriter mode, - - queries may not see changes even after flushDbQueue. -} queryDbQueue :: DbQueue -> SqlPersistM a -> IO a queryDbQueue (DQ hdl _) = queryDb hdl diff --git a/doc/bugs/initial_get_of_unlocked_file_fails_to_populate_pointer.mdwn b/doc/bugs/initial_get_of_unlocked_file_fails_to_populate_pointer.mdwn index 885205068d..74806b6952 100644 --- a/doc/bugs/initial_get_of_unlocked_file_fails_to_populate_pointer.mdwn +++ b/doc/bugs/initial_get_of_unlocked_file_fails_to_populate_pointer.mdwn @@ -43,3 +43,43 @@ This outputs 1 for foo, followed by annex pointer files for files bar and baz. The previous fix attempt did make foo get populated, before that none of the files were populated. + +---- + +`GIT_TRACE=1` shows that git only runs the smudge filter on the first +file, not the other two. And indeed, restagePointerFile is only called +on the first file. + +Added debugging to Database.Keys.reconcileStaged, and it adds all 3 files to +the associated files table, but only adds the inode cache of foo. +And that's what I see in the db after the fact too. Which is +not itself a problem, to the extent that the other files are not +populated, and only populated files have an inode cache recorded. + +So, Database.Keys.reconcileStaged is called after it gets foo, +but before the other files are present, and in reconcilepointerfile it +calls populatePointerFile and records the inode cache for foo. +That is how foo gets populated. + +But, the other 2 files do not have populatePointerFile run on them. +In moveAnnex, it calls getAssociatedFiles and somehow that returns +`[]`, for all 3 files. This does not matter for foo, because it gets +populated by reconcileStaged as explained above. But for the other 2, with +no known associated files of course it fails to populate them. + +So: Why is getAssociatedFiles returning `[]`? Those calls come +after Database.Keys.reconcileStaged has added the associated files, +but are somehow not seeing the changes it made. + +Ah.. The keys db is opened in MultiWriter mode. +See the comment above the definition of MultiWriter, +which explains that a write to a MultiWriter database, +followed by a flushDbQueue may not be visible when reading +from that same database. + +Verified this by making it re-open the db after reconcileStaged, +which did fix the problem. + +A better fix is possible: Make MultiWriter mode not have this hidden +gotcha, by re-opening the db after writing to it always. [[done]] +--[[Joey]]