diff --git a/CHANGELOG b/CHANGELOG
index 6c045d7609..3f118af109 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -12,6 +12,7 @@ git-annex (10.20221004) UNRELEASED; urgency=medium
     by only asking for files under the prefix.
   * When importing from versioned remotes, fix tracking of the content
     of deleted files.
+  * More robust handling of ErrorBusy when writing to sqlite databases.
 
  -- Joey Hess <id@joeyh.name>  Mon, 03 Oct 2022 13:36:42 -0400
 
diff --git a/Database/Handle.hs b/Database/Handle.hs
index ce22d0a661..283eef4cf2 100644
--- a/Database/Handle.hs
+++ b/Database/Handle.hs
@@ -82,22 +82,20 @@ queryDb (DbHandle _ jobs) a = do
 
 {- Writes a change to the database.
  -
- - Writes can fail if another write is happening concurrently.
- - So write failures are caught and retried repeatedly for up to 10
- - seconds, which should avoid all but the most exceptional problems.
+ - Writes can fail when another write is happening concurrently.
+ - So write failures are caught and retried repeatedly.
  -}
 commitDb :: DbHandle -> SqlPersistM () -> IO ()
-commitDb h wa = robustly Nothing 100 (commitDb' h wa)
+commitDb h wa = robustly (commitDb' h wa)
   where
-    robustly :: Maybe SomeException -> Int -> IO (Either SomeException ()) -> IO ()
-    robustly e 0 _ = error $ "failed to commit changes to sqlite database: " ++ show e
-    robustly _ n a = do
+    robustly :: IO (Either SomeException ()) -> IO ()
+    robustly a = do
         r <- a
         case r of
             Right _ -> return ()
-            Left e -> do
+            Left _ -> do
                 threadDelay 100000 -- 1/10th second
-                robustly (Just e) (n-1) a
+                robustly a
 
 commitDb' :: DbHandle -> SqlPersistM () -> IO (Either SomeException ())
 commitDb' (DbHandle _ jobs) a = do
diff --git a/doc/bugs/get_is_busy_doing_nothing/comment_27_45168f110bded2f8c8f9777e1edda945._comment b/doc/bugs/get_is_busy_doing_nothing/comment_27_45168f110bded2f8c8f9777e1edda945._comment
new file mode 100644
index 0000000000..55307eb53d
--- /dev/null
+++ b/doc/bugs/get_is_busy_doing_nothing/comment_27_45168f110bded2f8c8f9777e1edda945._comment
@@ -0,0 +1,36 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 27"""
+ date="2022-10-17T18:49:47Z"
+ content="""
+[[todo/withExclusiveLock_blocking_issue]] does not have to be solved for
+every other lock in git-annex first. Since the sqlite database lock would
+be a new lock file, it could use the mtime update method described in there
+without backwards compatibility issues.
+
+ErrorBusy can also occur when opening a new database connection for read,
+but it retries that as often as necessary. Which does mean that suspending
+git-annex at just the wrong time can already cause other git-annex
+processes to stall forever waiting to read from the database.
+
+So, in a way, it would be ok for write to also retry each time it gets
+ErrorBusy, rather than the current limited number of retries. If that does
+cause git-annex to block when another git-annex process is suspended, it
+would not be a new behavior.
+
+Also, the mtime file method described in
+[[todo/withExclusiveLock_blocking_issue]] could be used without a lock file
+in order to detect when a suspended process is causing ErrorBusy. And can
+avoid that situation for both writes and reads.
+
+So, plan:
+
+1. Retry forever on ErrorBusy when writing to sqlite database.
+   (I've made this change now... So I think probably this bug can't
+   occur any longer.)
+2. While running opensettle and ChangeJob, have a background thread that
+   periodically updates an mtime file.
+3. If ErrorBusy is received repeatedly for some amount of time,
+   check the mtime file. If it's not being updated, give up, since
+   a suspended git-annex process apparently has the sqlite database locked.
+"""]]
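As a rough illustration of the plan above (not part of this patch), here is a minimal Haskell sketch of steps 2 and 3: a background thread that keeps a heartbeat mtime file fresh while the database is in use, and a retry loop that only gives up when that file exists but stops advancing. The module, function, and file names here are hypothetical, and the timings are only examples; this is not git-annex's actual code.

```haskell
module HeartbeatSketch where

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Exception (SomeException, bracket, try)
import Control.Monad (forever)
import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime)
import System.Directory (getModificationTime)

-- Step 2 of the plan: while the database is in use, touch the heartbeat
-- file every 10 seconds so other processes can tell we are not suspended.
withHeartbeat :: FilePath -> IO a -> IO a
withHeartbeat heartbeat a = bracket (forkIO beat) killThread (const a)
  where
    beat = forever $ do
        writeFile heartbeat ""   -- rewriting the file updates its mtime
        threadDelay 10000000     -- 10 seconds

-- Step 3 of the plan: retry a failing write indefinitely, but give up if
-- the heartbeat file exists and has gone stale, which suggests a suspended
-- process has the sqlite database locked.
retryUnlessStalled :: FilePath -> IO (Either SomeException ()) -> IO ()
retryUnlessStalled heartbeat a = go
  where
    go = do
        r <- a
        case r of
            Right () -> return ()
            Left e -> do
                stalled <- heartbeatStalled heartbeat 60
                if stalled
                    then error $ "sqlite database appears to be locked by a suspended process: " ++ show e
                    else threadDelay 100000 >> go   -- wait 1/10th second and retry

-- True when the heartbeat file exists but its mtime is older than the
-- given number of seconds.
heartbeatStalled :: FilePath -> Double -> IO Bool
heartbeatStalled heartbeat maxage = do
    r <- try (getModificationTime heartbeat) :: IO (Either SomeException UTCTime)
    case r of
        Left _ -> return False   -- no heartbeat file yet; keep retrying
        Right mtime -> do
            now <- getCurrentTime
            return (realToFrac (diffUTCTime now mtime) > maxage)
```

A caller would wrap its database work in withHeartbeat and funnel write attempts through retryUnlessStalled; if no heartbeat file exists at all, the sketch simply retries forever, matching step 1.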
diff --git a/doc/todo/withExclusiveLock_blocking_issue.mdwn b/doc/todo/withExclusiveLock_blocking_issue.mdwn
index 6915015da3..32c750ed7b 100644
--- a/doc/todo/withExclusiveLock_blocking_issue.mdwn
+++ b/doc/todo/withExclusiveLock_blocking_issue.mdwn
@@ -1,20 +1,83 @@
-Some parts of git-annex use withExclusiveLock or otherwise wait for an
-exclusive lock and hold it while performing an operation. Now consider what
-happens if the git-annex process is suspended. Another git-annex process
-that is running and that blocks on the same lock will stall forever, until
-the git-annex process is resumed.
+Some parts of git-annex wait for an exclusive lock, and once they take it,
+hold it while performing an operation. Now consider what happens if the
+git-annex process is suspended. Another git-annex process that is running
+and that waits to take the same exclusive lock (or a shared lock of the
+same file) will stall forever, until the git-annex process is resumed.
 
 These time windows tend to be small, but may not always be.
 
-Would it be better for the second git-annex process, rather than hanging
-indefinitely, to try to take the lock a few times over a few seconds, and
-then error out? The risk with doing that is, when 2 concurrent git-annex
-processes are running and taking the locks repeatedly, one might get
-unlucky, fail to take the lock, and error out, when waiting a little longer
-would have succeeded, because the other process is not holding the lock all
-the time.
-
 Is there any better way git-annex could handle this? Is it a significant
 problem at all? I don't think I've ever seen it happen, but I rarely
 ^Z git-annex either. How do other programs handle this, if at all?
 --[[Joey]]
+
+----
+
+Would it be better for the second git-annex process, rather than hanging
+indefinitely, to time out after a few seconds?
+
+But how many seconds? What if the system is under heavy load?
+
+> What could be done is, update the lock file's mtime after successfully
+> taking the lock. Then, as long as the mtime is advancing, some other
+> process is actively using it, and it's ok for our process to wait
+> longer.
+>
+> (Updating the mtime would be a problem when locking annex object files
+> in v9 and earlier. Luckily, that locking is not done with a blocking
+> lock anyway.)
+
+> If the lock file's mtime is being checked, the process that is
+> blocking with the lock held could periodically update the mtime.
+> A background thread could manage that. If that's done every ten seconds,
+> then an mtime more than 20 seconds old indicates that the lock is
+> held by a suspended process. So git-annex would stall for up to 20-30
+> seconds before erroring out when a lock is held by a suspended process.
+> That seems acceptable, it doesn't need to deal with this situation
+> instantly, it just needs to not block indefinitely. And updating the
+> mtime every 10 seconds should not be too much IO.
+>
+> When an old version of git-annex has the lock held, it won't be updating
+> the mtime. So if it takes longer than 10 seconds to do the operation with
+> the lock held, a new version may complain that it's suspended when it's
+> really not. This could be avoided by checking what process holds the
+> lock, and whether it's suspended. But probably 10 seconds is enough
+> time for all the operations git-annex takes a blocking lock for
+> currently to finish, and if so we don't need to worry about this situation?
+>
+> > Unfortunately not: importKeys takes an exclusive lock and holds it while
+> > downloading all the content! This seems like a bug though, because it can
+> > cause other git-annex processes that are eg storing content in a remote
+> > to block for a long time.
+> >
+> > Another one is Database.Export.writeLockDbWhile, which takes an
+> > exclusive lock while running eg, Command.Export.changeExport,
+> > which may sometimes need to do a lot of work.
+> >
+> > Another one is Annex.Queue.flush, which probably mostly runs in under
+> > 10 seconds, but maybe not always, and when annex.queuesize is changed,
+> > could surely take longer.
+>
+> To avoid problems when old git-annexes are also being used, it could
+> update and check the mtime of a different file than the lock file.
+>
+> Start by trying to take the lock for up to 10 seconds. If it takes the
+> lock, create the mtime file and start a thread that updates the mtime
+> every 10 seconds until the lock is closed, and delete the mtime file
+> before closing the lock handle.
+>
+> When it times out taking the lock, if the mtime file does not exist, an
+> old git-annex has the lock; if the mtime file does exist, then check
+> if its timestamp has advanced; if not then a new git-annex has the lock
+> and is suspended and it can error out.
+>
+> Oops: There's a race in the method above; a timeout may occur
+> right when the other process has taken the lock, but has not updated
+> the mtime file yet. Then that process would incorrectly be treated
+> as an old git-annex process.
+>
+> So: To support old git-annex, it seems it will need to check, when the
+> lock is held, what process has the lock. And then check if that process
+> is suspended or not. Which means looking in /proc. Ugh.
+>
+> Or: Change to checking lock mtimes only in git-annex v11..
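For concreteness, here is a similar sketch (again illustrative, not part of this patch) of the lock-taking side described above, assuming the filelock package's System.FileLock API: poll for the exclusive lock for about 10 seconds, and on timeout consult the separate mtime file to decide whether the holder is an old git-annex (no mtime file), alive (mtime advancing), or suspended (mtime stale). All names and timings are hypothetical, and it deliberately shares the race noted above: a new holder that has not yet created the mtime file is indistinguishable from an old git-annex.

```haskell
module LockTimeoutSketch where

import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import Data.Time.Clock (UTCTime, diffUTCTime, getCurrentTime)
import System.Directory (getModificationTime)
import System.FileLock (FileLock, SharedExclusive(Exclusive), tryLockFile)

-- Poll for the exclusive lock for ~10 seconds (100 tries, 0.1s apart).
-- On timeout, look at the side-car mtime file to decide what to do.
takeLockOrGiveUp :: FilePath -> FilePath -> IO FileLock
takeLockOrGiveUp lockfile mtimefile = go (100 :: Int)
  where
    go 0 = do
        r <- try (getModificationTime mtimefile) :: IO (Either SomeException UTCTime)
        case r of
            -- No mtime file: an old git-annex (which maintains no such
            -- file) presumably holds the lock, so just keep waiting.
            Left _ -> go 1
            Right mtime -> do
                now <- getCurrentTime
                if realToFrac (diffUTCTime now mtime) > (20 :: Double)
                    -- mtime stale: the holder is probably suspended.
                    then error "lock appears to be held by a suspended process"
                    -- mtime advancing: the holder is alive; keep waiting.
                    else go 1
    go n = tryLockFile lockfile Exclusive
        >>= maybe (threadDelay 100000 >> go (n - 1)) return
```

The holder side would create the mtime file right after taking the lock, update it every 10 seconds from a background thread (as in the earlier sketch), and delete it before releasing the lock.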