{- Sqlite database of information about Keys
 -
 - Copyright 2015-2022 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}

module Database.Keys (
	DbHandle,
	closeDb,
	flushDb,
	addAssociatedFile,
	getAssociatedFiles,
	getAssociatedFilesIncluding,
	getAssociatedKey,
	removeAssociatedFile,
	storeInodeCaches,
	addInodeCaches,
	getInodeCaches,
	removeInodeCaches,
	removeInodeCache,
	isInodeKnown,
	runWriter,
	updateDatabase,
) where

import qualified Database.Keys.SQL as SQL
import Database.Types
import Database.Keys.Handle
import Database.Keys.Tables
import qualified Database.Queue as H
import Database.Init
import Annex.Locations
import Annex.Common hiding (delete)
import qualified Annex
import Annex.LockFile
import Annex.Content.PointerFile
import Annex.Content.Presence.LowLevel
import Annex.Link (Restage(..), maxPointerSz, parseLinkTargetOrPointerLazy)
import Utility.InodeCache
import Annex.InodeSentinal
import Git
import Git.FilePath
import Git.Command
import Git.Types
import Git.Index
import Git.Sha
import Git.CatFile
import Git.Branch (writeTreeQuiet, update')
import qualified Git.Ref
import Config
import Config.Smudge
import qualified Utility.RawFilePath as R

import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified System.FilePath.ByteString as P
import Control.Concurrent.Async

{- Runs an action that reads from the database.
 -
 - If the database is already open, any writes are flushed to it, to ensure
 - consistency.
 -
 - Any queued writes to the table will be flushed before the read.
 -}
runReader :: Monoid v => DbTable -> (SQL.ReadHandle -> Annex v) -> Annex v
runReader t a = do
	h <- Annex.getRead Annex.keysdbhandle
	withDbState h go
  where
	go DbUnavailable = return (mempty, DbUnavailable)
	go (DbOpen (qh, tableschanged)) = do
		tableschanged' <- if isDbTableChanged tableschanged t
			then do
				liftIO $ H.flushDbQueue qh
				return mempty
			else return tableschanged
		v <- a (SQL.ReadHandle qh)
		return (v, DbOpen (qh, tableschanged'))
	go DbClosed = do
		st <- openDb False DbClosed
		v <- case st of
			(DbOpen (qh, _)) -> a (SQL.ReadHandle qh)
			_ -> return mempty
		return (v, st)

runReaderIO :: Monoid v => DbTable -> (SQL.ReadHandle -> IO v) -> Annex v
runReaderIO t a = runReader t (liftIO . a)
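The `withDbState h go` calls above thread the database state through every access: `go` receives the current state and hands back the action's result paired with the state to store for the next caller. A minimal, self-contained sketch of that pattern, using an `MVar` in place of the real `DbHandle` (all names here are hypothetical stand-ins, not git-annex code):

```haskell
import Control.Concurrent.MVar

-- Stand-in for DbState; Int plays the role of the real queue handle.
data MiniState = MiniClosed | MiniOpen Int
  deriving (Eq, Show)

-- Like withDbState: run an action on the current state, store the
-- state it hands back, and return the action's result.
withMiniState :: MVar MiniState -> (MiniState -> IO (a, MiniState)) -> IO a
withMiniState h go = modifyMVar h $ \st -> do
  (v, st') <- go st
  return (st', v)

main :: IO ()
main = do
  h <- newMVar MiniClosed
  v <- withMiniState h $ \st -> case st of
    MiniClosed -> return ("opened", MiniOpen 1)
    MiniOpen _ -> return ("already open", st)
  putStrLn v
```

The `MVar` also gives the sketch the same serialization guarantee the real handle relies on: two concurrent callers cannot both observe `MiniClosed`.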

{- Runs an action that writes to the database. Typically this is used to
 - queue changes, which will be flushed at a later point.
 -
 - The database is created if it doesn't exist yet. -}
runWriter :: DbTable -> (SQL.WriteHandle -> Annex ()) -> Annex ()
runWriter t a = do
	h <- Annex.getRead Annex.keysdbhandle
	withDbState h go
  where
	go (DbOpen (qh, tableschanged)) = do
		v <- a (SQL.WriteHandle qh)
		return (v, DbOpen (qh, addDbTable tableschanged t))
	go st = do
		st' <- openDb True st
		v <- case st' of
			DbOpen (qh, _) -> a (SQL.WriteHandle qh)
			_ -> error "internal"
		return (v, st')

runWriterIO :: DbTable -> (SQL.WriteHandle -> IO ()) -> Annex ()
runWriterIO t a = runWriter t (liftIO . a)
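
`runWriter` only queues changes; nothing reaches sqlite until the queue is flushed, which is what keeps per-change write overhead low. A hedged, self-contained sketch of that queued-write idea (toy types; the real queue lives in `Database.Queue` and this is not its API):

```haskell
import Data.IORef

-- A toy write queue: changes accumulate in memory and reach the
-- "database" (here just a list) only when flushed, in one batch.
data MiniQueue = MiniQueue
  { pendingRef :: IORef [String]
  , storedRef  :: IORef [String]
  }

newMiniQueue :: IO MiniQueue
newMiniQueue = MiniQueue <$> newIORef [] <*> newIORef []

-- Queuing a write is cheap: it only touches memory.
queueWrite :: MiniQueue -> String -> IO ()
queueWrite q w = modifyIORef' (pendingRef q) (w :)

-- Commit every pending write at once, oldest first.
flushMiniQueue :: MiniQueue -> IO ()
flushMiniQueue q = do
  ws <- readIORef (pendingRef q)
  modifyIORef' (storedRef q) (++ reverse ws)
  writeIORef (pendingRef q) []
```

As with the real queue, anything still pending when the process stops without a flush is simply lost, which is why a flush must happen before shutdown.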

{- Opens the database, creating it if it doesn't exist yet.
 -
 - Multiple readers and writers can have the database open at the same
 - time. Database.Handle deals with the concurrency issues.
 - The lock is held while opening the database, so that when
 - the database doesn't exist yet, one caller wins the lock and
 - can create it undisturbed.
 -}
openDb :: Bool -> DbState -> Annex DbState
openDb _ st@(DbOpen _) = return st
openDb False DbUnavailable = return DbUnavailable
openDb forwrite _ = do
	lck <- calcRepo' gitAnnexKeysDbLock
	catchPermissionDenied permerr $ withExclusiveLock lck $ do
		dbdir <- calcRepo' gitAnnexKeysDbDir
		let db = dbdir P.</> "db"
		dbexists <- liftIO $ R.doesPathExist db
		case dbexists of
			True -> open db False
			False -> do
				initDb db SQL.createTables
				open db True
  where
	-- If permissions don't allow opening the database, and it's being
	-- opened for read, treat it as if it does not exist.
	permerr e
		| forwrite = throwM e
		| otherwise = return DbUnavailable

	open db dbisnew = do
		qh <- liftIO $ H.openDbQueue db SQL.containedTable
		tc <- reconcileStaged dbisnew qh
		return $ DbOpen (qh, tc)
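
The existence check and the creation both happen under the exclusive lock, so only the caller that wins the lock runs the initialization, and later callers see the finished database. The shape of that check-then-create pattern, sketched with an `MVar` standing in for `withExclusiveLock` (hypothetical helper; the real lock is a file lock):

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Check-then-create, done entirely under the lock: only one caller
-- ever runs the initialization branch, everyone else reuses its work.
openOrCreate :: MVar () -> IORef (Maybe String) -> IO String
openOrCreate lck db = withMVar lck $ \_ -> do
  existing <- readIORef db
  case existing of
    Just d  -> return d            -- already initialized
    Nothing -> do                  -- we won the race; create it
      writeIORef db (Just "db")
      return "db"
```

Checking existence outside the lock would reintroduce the race: two callers could both see `Nothing` and both try to create.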

{- Closes the database if it was open. Any writes will be flushed to it.
 -
 - This does not prevent further use of the database; it will be re-opened
 - as necessary.
 -}
closeDb :: Annex ()
closeDb = liftIO . closeDbHandle =<< Annex.getRead Annex.keysdbhandle

{- Flushes any queued writes to the database. -}
flushDb :: Annex ()
flushDb = liftIO . flushDbQueue =<< Annex.getRead Annex.keysdbhandle

addAssociatedFile :: Key -> TopFilePath -> Annex ()
addAssociatedFile k f = runWriterIO AssociatedTable $ SQL.addAssociatedFile k f

{- Note that the files returned were once associated with the key, but
 - some of them may not be any longer. -}
getAssociatedFiles :: Key -> Annex [TopFilePath]
getAssociatedFiles k = emptyWhenBare $ runReaderIO AssociatedTable $
	SQL.getAssociatedFiles k

{- Queries for associated files never return anything when in a bare
 - repository, since without a work tree there can be no associated files.
 -
 - Normally the keys database is not even populated with associated files
 - in a bare repository, but it might happen if a non-bare repo got
 - converted to bare. -}
emptyWhenBare :: Annex [a] -> Annex [a]
emptyWhenBare a = ifM isBareRepo
	( return []
	, a
	)
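
`emptyWhenBare` relies on `ifM` from git-annex's utility layer to select between the two branches. For reference, an equivalent definition and a pure rendition of the same guard (assumed equivalent for illustration; not the actual git-annex code):

```haskell
-- Monadic if-then-else: the condition itself runs in the monad.
ifM :: Monad m => m Bool -> (m a, m a) -> m a
ifM cond (t, f) = cond >>= \c -> if c then t else f

-- The same shape as emptyWhenBare, over any monad: when the
-- condition holds, short-circuit to an empty list.
emptyWhenTrue :: Monad m => m Bool -> m [a] -> m [a]
emptyWhenTrue bare a = ifM bare (return [], a)
```

Because `ifM` takes the branches as monadic actions, the database query in the second branch is never run at all in a bare repository.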

{- Include a known associated file along with any recorded in the database. -}
getAssociatedFilesIncluding :: AssociatedFile -> Key -> Annex [RawFilePath]
getAssociatedFilesIncluding afile k = emptyWhenBare $ do
	g <- Annex.gitRepo
	l <- map (`fromTopFilePath` g) <$> getAssociatedFiles k
	return $ case afile of
		AssociatedFile (Just f) -> f : filter (/= f) l
		AssociatedFile Nothing -> l
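
The case analysis above reduces to a small pure function: a provided known file goes first, with any duplicate occurrence dropped from the recorded list. Extracted here for illustration (a hypothetical helper, not part of the module):

```haskell
-- Put a known element first and drop its duplicates from the rest.
includeKnown :: Eq a => Maybe a -> [a] -> [a]
includeKnown (Just f) l = f : filter (/= f) l
includeKnown Nothing  l = l
```

Note the ordering guarantee this gives callers: when a file is known, it is always the head of the result.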

{- Gets any keys that are on record as having a particular associated file.
 - (Should be one or none but the database doesn't enforce that.) -}
getAssociatedKey :: TopFilePath -> Annex [Key]
getAssociatedKey f = emptyWhenBare $ runReaderIO AssociatedTable $
	SQL.getAssociatedKey f

removeAssociatedFile :: Key -> TopFilePath -> Annex ()
removeAssociatedFile k = runWriterIO AssociatedTable .
	SQL.removeAssociatedFile k

{- Stats the files, and stores their InodeCaches. -}
storeInodeCaches :: Key -> [RawFilePath] -> Annex ()
storeInodeCaches k fs = withTSDelta $ \d ->
	addInodeCaches k . catMaybes
		=<< liftIO (mapM (\f -> genInodeCache f d) fs)

addInodeCaches :: Key -> [InodeCache] -> Annex ()
addInodeCaches k is = runWriterIO ContentTable $ SQL.addInodeCaches k is

{- A key may have multiple InodeCaches; one for the annex object, and one
 - for each pointer file that is a copy of it.
 -
 - When there are no pointer files, the annex object typically does not
 - have its InodeCache recorded either, so the list will be empty.
 -
 - Note that, in repos upgraded from v7, there may be InodeCaches recorded
 - for pointer files, but none recorded for the annex object.
 -}
getInodeCaches :: Key -> Annex [InodeCache]
getInodeCaches = runReaderIO ContentTable . SQL.getInodeCaches

{- Remove all inodes cached for a key. -}
removeInodeCaches :: Key -> Annex ()
removeInodeCaches = runWriterIO ContentTable . SQL.removeInodeCaches

{- Remove cached inodes, for any key. -}
removeInodeCache :: InodeCache -> Annex ()
removeInodeCache = runWriterIO ContentTable . SQL.removeInodeCache
|
smudge: check for known annexed inodes before checking annex.largefiles
smudge: Fix a case where an unlocked annexed file that annex.largefiles
does not match could get its unchanged content checked into git, due to git
running the smudge filter unecessarily.
When the file has the same inodecache as an already annexed file,
we can assume that the user is not intending to change how it's stored in
git.
Note that checkunchangedgitfile already handled the inverse case, where the
file was added to git previously. That goes further and actually sha1
hashes the new file and checks if it's the same hash in the index.
It would be possible to generate a key for the file and see if it's the
same as the old key, however that could be considerably more expensive than
sha1 of a small file is, and it is not necessary for the case I have, at
least, where the file is not modified or touched, and so its inode will
match the cache.
git-annex add was changed, when adding a small file, to remove the inode
cache for it. This is necessary to keep the recipe in
doc/tips/largefiles.mdwn for converting from annex to git working.
It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn
which the earlier try at this change introduced.
2021-05-10 17:05:08 +00:00
|
|
|
|
2019-10-23 18:06:11 +00:00
|
|
|
isInodeKnown :: InodeCache -> SentinalStatus -> Annex Bool
|
2022-10-12 19:21:19 +00:00
|
|
|
isInodeKnown i s = or <$> runReaderIO ContentTable
|
|
|
|
((:[]) <$$> SQL.isInodeKnown i s)
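Conceptually, both removeInodeCache and isInodeKnown are single-table queries against the sqlite keys database. A minimal sqlite3 sketch, using a hypothetical two-column schema chosen only for illustration (the real schema is managed by persistent and differs):

```shell
# Hypothetical schema, for illustration only; the real keys
# database schema is richer and managed by persistent.
db="$(mktemp)"
sqlite3 "$db" "CREATE TABLE content (key TEXT, cache TEXT);"
sqlite3 "$db" "INSERT INTO content VALUES ('SHA256E-s5--d2a8', 'inode:42 size:5 mtime:100');"
# isInodeKnown: is this inode cache recorded for any key?
sqlite3 "$db" "SELECT count(*) FROM content WHERE cache = 'inode:42 size:5 mtime:100';"
# removeInodeCache: forget this inode cache, whatever key it belongs to.
sqlite3 "$db" "DELETE FROM content WHERE cache = 'inode:42 size:5 mtime:100';"
```

The SELECT prints 1 here; after the DELETE the same query would print 0.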
|
2019-10-23 18:06:11 +00:00
|
|
|
|
include locked files in the keys database associated files
Before only unlocked files were included.
The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.
reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.
On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.
reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticeable overhead. What may turn out to be a noticeable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.
Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.
Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
runs and accesses the keys db, reconcileStaged will run and update it.
There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.
However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 19:47:37 +00:00
|
|
|
{- Looks at staged changes to annexed files, and updates the keys database,
|
|
|
|
- so that its information is consistent with the state of the repository.
|
2018-08-21 20:48:20 +00:00
|
|
|
-
|
|
|
|
- This is run with a lock held, so only one process can be running this at
|
|
|
|
- a time.
|
2018-08-21 20:48:20 +00:00
|
|
|
-
|
2022-08-19 21:45:04 +00:00
|
|
|
- To avoid unnecessary work, the index file is statted, and if it's not
|
2018-08-21 20:48:20 +00:00
|
|
|
- changed since last time this was run, nothing is done.
|
2018-08-22 17:04:12 +00:00
|
|
|
-
|
|
|
|
- A tree is generated from the index, and the diff between that tree
|
|
|
|
- and the last processed tree is examined for changes.
|
2018-08-22 19:28:57 +00:00
|
|
|
-
|
|
|
|
- This also cleans up after a race between eg a git mv and git-annex
|
2021-06-08 14:43:48 +00:00
|
|
|
- get/drop/similar. If git moves a pointer file between this being run and the
|
|
|
|
- get/drop, the moved pointer file won't be updated for the get/drop.
|
2018-08-22 19:28:57 +00:00
|
|
|
- The next time this runs, it will see the staged change. It then checks
|
2021-06-08 14:43:48 +00:00
|
|
|
- if the pointer file needs to be updated to contain or not contain the
|
|
|
|
- annex content.
|
2018-08-22 19:28:57 +00:00
|
|
|
-
|
2021-06-08 14:43:48 +00:00
|
|
|
- Note: There is a situation where, after this has run, the database can
|
|
|
|
- still contain associated files that have been deleted from the index.
|
2021-05-24 16:05:35 +00:00
|
|
|
- That happens when addAssociatedFile is used to record a newly
|
|
|
|
- added file, but that file then gets removed from the index before
|
|
|
|
- this is run. Eg, "git-annex add foo; git rm foo"
|
|
|
|
- So when using getAssociatedFiles, have to make sure the file still
|
|
|
|
- is an associated file.
|
2018-08-21 20:48:20 +00:00
|
|
|
-}
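The stat-based short-circuit described above can be sketched in shell (uses GNU stat; the cache path is hypothetical, chosen only for illustration):

```shell
#!/bin/sh
# Skip the expensive diff when .git/index is unchanged since last run.
index=".git/index"
cache=".git/keysdb-indexcache"   # hypothetical location
cur="$(stat -c '%i %s %Y' "$index")"   # inode, size, mtime
if [ -e "$cache" ] && [ "$(cat "$cache")" = "$cur" ]; then
    echo "index unchanged, nothing to do"
else
    echo "index changed, reconciling"
    # ... diff trees and update the database here ...
    printf '%s\n' "$cur" > "$cache"
fi
```

The real code compares a full InodeCache rather than this three-field stat, but the shape of the check is the same: stat, compare to the cached value, and only do work on a mismatch.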
|
2022-11-18 17:16:57 +00:00
|
|
|
reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged
|
2023-02-14 18:11:23 +00:00
|
|
|
reconcileStaged dbisnew qh = ifM isBareRepo
|
2022-10-12 19:21:19 +00:00
|
|
|
( return mempty
|
|
|
|
, do
|
|
|
|
gitindex <- inRepo currentIndexFile
|
|
|
|
indexcache <- fromRawFilePath <$> calcRepo' gitAnnexKeysDbIndexCache
|
|
|
|
withTSDelta (liftIO . genInodeCache gitindex) >>= \case
|
|
|
|
Just cur -> readindexcache indexcache >>= \case
|
|
|
|
Nothing -> go cur indexcache =<< getindextree
|
|
|
|
Just prev -> ifM (compareInodeCaches prev cur)
|
|
|
|
( return mempty
|
|
|
|
, go cur indexcache =<< getindextree
|
|
|
|
)
|
|
|
|
Nothing -> return mempty
|
|
|
|
)
|
2018-08-21 20:48:20 +00:00
|
|
|
where
|
|
|
|
lastindexref = Ref "refs/annex/last-index"
|
|
|
|
|
2021-05-24 15:38:22 +00:00
|
|
|
readindexcache indexcache = liftIO $ maybe Nothing readInodeCache
|
|
|
|
<$> catchMaybeIO (readFile indexcache)
|
|
|
|
|
2021-05-24 15:33:23 +00:00
|
|
|
getoldtree = fromMaybe emptyTree <$> inRepo (Git.Ref.sha lastindexref)
|
2021-06-07 18:51:38 +00:00
|
|
|
|
2021-05-24 15:33:23 +00:00
|
|
|
go cur indexcache (Just newtree) = do
|
|
|
|
oldtree <- getoldtree
|
|
|
|
when (oldtree /= newtree) $ do
|
2021-06-08 15:57:23 +00:00
|
|
|
fastDebug "Database.Keys" "reconcileStaged start"
|
2021-06-07 18:51:38 +00:00
|
|
|
g <- Annex.gitRepo
|
2021-06-08 13:11:24 +00:00
|
|
|
void $ catstream $ \mdfeeder ->
|
|
|
|
void $ updatetodiff g
|
|
|
|
(Just (fromRef oldtree))
|
|
|
|
(fromRef newtree)
|
|
|
|
(procdiff mdfeeder)
|
|
|
|
liftIO $ writeFile indexcache $ showInodeCache cur
|
|
|
|
-- Storing the tree in a ref makes sure it does not
|
|
|
|
-- get garbage collected, and is available to diff
|
|
|
|
-- against next time.
|
|
|
|
inRepo $ update' lastindexref newtree
|
2021-06-08 15:57:23 +00:00
|
|
|
fastDebug "Database.Keys" "reconcileStaged end"
|
2022-10-12 19:21:19 +00:00
|
|
|
return (DbTablesChanged True True)
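Recording the processed tree in refs/annex/last-index is ordinary git plumbing. A self-contained sketch in a throwaway repo, mirroring what the update' on lastindexref does:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo hi > f
git add f
tree="$(git write-tree)"
# Pointing a ref at the tree keeps git gc from collecting it,
# and makes it available to diff against on the next run.
git update-ref refs/annex/last-index "$tree"
git rev-parse refs/annex/last-index   # prints the same tree sha
```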
|
2021-05-24 15:33:23 +00:00
|
|
|
-- git write-tree will fail if the index is locked or when there is
|
|
|
|
-- a merge conflict. To get up-to-date with the current index,
|
2021-06-07 16:52:36 +00:00
|
|
|
-- diff --staged with the old index tree. The current index tree
|
2021-05-24 15:33:23 +00:00
|
|
|
-- is not known, so not recorded, and the inode cache is not updated,
|
|
|
|
-- so the next time git-annex runs, it will diff again, even
|
|
|
|
-- if the index is unchanged.
|
2021-06-07 16:52:36 +00:00
|
|
|
--
|
|
|
|
-- When there is a merge conflict, that will not see the new local
|
|
|
|
-- version of the files that are conflicted. So a second diff
|
|
|
|
-- is done, with --staged but no old tree.
|
2021-05-24 15:33:23 +00:00
|
|
|
go _ _ Nothing = do
|
2021-06-08 15:57:23 +00:00
|
|
|
fastDebug "Database.Keys" "reconcileStaged start (in conflict)"
|
2021-05-24 15:33:23 +00:00
|
|
|
oldtree <- getoldtree
|
2021-06-07 18:51:38 +00:00
|
|
|
g <- Annex.gitRepo
|
|
|
|
catstream $ \mdfeeder -> do
|
|
|
|
conflicted <- updatetodiff g
|
|
|
|
(Just (fromRef oldtree)) "--staged" (procdiff mdfeeder)
|
|
|
|
when conflicted $
|
|
|
|
void $ updatetodiff g Nothing "--staged"
|
|
|
|
(procmergeconflictdiff mdfeeder)
|
2021-06-08 15:57:23 +00:00
|
|
|
fastDebug "Database.Keys" "reconcileStaged end"
|
2022-10-12 19:21:19 +00:00
|
|
|
return (DbTablesChanged True True)
|
2021-06-07 16:52:36 +00:00
|
|
|
|
2021-06-07 18:51:38 +00:00
|
|
|
updatetodiff g old new processor = do
|
|
|
|
(l, cleanup) <- pipeNullSplit' (diff old new) g
|
|
|
|
processor l False
|
|
|
|
`finally` void cleanup
|
2018-08-22 17:04:12 +00:00
|
|
|
|
2021-05-24 19:31:06 +00:00
|
|
|
-- Avoid running smudge clean filter, which would block trying to
|
|
|
|
-- access the locked database. git write-tree sometimes calls it,
|
|
|
|
-- even though it is not adding work tree files to the index,
|
|
|
|
-- and so the filter cannot have an effect on the contents of the
|
|
|
|
-- index or on the tree that gets written from it.
|
|
|
|
getindextree = inRepo $ \r -> writeTreeQuiet $ r
|
|
|
|
{ gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig }
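Since git write-tree only reads the index, the filters can be disabled without affecting the result. A sketch, assuming bypassSmudgeConfig amounts to empty filter.annex overrides (an assumption about its expansion, not taken from this file):

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo content > f
git add f
# Overriding the filters is safe here: write-tree never reads
# work tree files, so the filters could not change its output anyway.
git -c filter.annex.smudge= -c filter.annex.clean= write-tree
```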
|
|
|
|
|
2021-05-24 15:33:23 +00:00
|
|
|
diff old new =
|
2021-05-24 19:31:06 +00:00
|
|
|
-- Avoid running smudge clean filter, since we want the
|
|
|
|
-- raw output, and it would block trying to access the
|
2018-09-13 17:55:25 +00:00
|
|
|
-- locked database. The --raw normally avoids git diff
|
|
|
|
-- running them, but older versions of git need this.
|
2021-01-04 17:12:28 +00:00
|
|
|
bypassSmudgeConfig ++
|
|
|
|
-- Avoid using external diff command, which would be slow.
|
|
|
|
-- (The -G option may make it be used otherwise.)
|
|
|
|
[ Param "-c", Param "diff.external="
|
2018-08-21 20:48:20 +00:00
|
|
|
, Param "diff"
|
2021-06-07 16:52:36 +00:00
|
|
|
] ++ maybeToList (Param <$> old) ++
|
|
|
|
[ Param new
|
2018-08-21 20:48:20 +00:00
|
|
|
, Param "--raw"
|
|
|
|
, Param "-z"
|
2020-01-07 16:29:37 +00:00
|
|
|
, Param "--no-abbrev"
|
|
|
|
-- Optimization: Limit to pointer files and annex symlinks.
|
|
|
|
-- This is not perfect. A file could contain this and not
|
|
|
|
-- be a pointer file. And a pointer file that is replaced with
|
|
|
|
-- a non-pointer file will match this. This is only a
|
|
|
|
-- prefilter so that's ok.
|
|
|
|
, Param $ "-G" ++ fromRawFilePath (toInternalGitPath $
|
2022-06-22 20:08:49 +00:00
|
|
|
P.pathSeparator `S.cons` objectDir)
|
2018-08-21 20:48:20 +00:00
|
|
|
-- Disable rename detection.
|
|
|
|
, Param "--no-renames"
|
|
|
|
-- Avoid other complications.
|
|
|
|
, Param "--ignore-submodules=all"
|
2022-02-01 17:43:18 +00:00
|
|
|
-- Avoid using external textconv command, which would be slow
|
|
|
|
-- and possibly wrong.
|
|
|
|
, Param "--no-textconv"
|
2018-08-21 20:48:20 +00:00
|
|
|
, Param "--no-ext-diff"
|
|
|
|
]
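The parameter list above assembles into a command line along these lines. A runnable sketch in a throwaway repo (the real invocation also prepends the bypassSmudgeConfig overrides and the -G/annex/objects prefilter, omitted here because this toy file is not a pointer file):

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo one > f; git add f; old="$(git write-tree)"
echo two > f; git add f; new="$(git write-tree)"
# --raw -z emits machine-readable records of the form
#   :srcmode dstmode srcsha dstsha status\0file\0
git -c diff.external= diff "$old" "$new" \
    --raw -z --no-abbrev --no-renames \
    --ignore-submodules=all --no-textconv --no-ext-diff \
    | tr '\0' '\n'
```

The NUL-separated info and file fields are what procdiff below consumes pairwise, splitting the info field on spaces.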
|
|
|
|
|
2021-06-07 18:51:38 +00:00
|
|
|
procdiff mdfeeder (info:file:rest) conflicted
|
2020-04-07 17:27:11 +00:00
|
|
|
| ":" `S.isPrefixOf` info = case S8.words info of
|
2021-06-01 15:24:15 +00:00
|
|
|
(_colonsrcmode:dstmode:srcsha:dstsha:status:[]) -> do
|
2021-06-07 16:52:36 +00:00
|
|
|
let conflicted' = status == "U"
|
2021-06-01 15:24:15 +00:00
|
|
|
-- avoid removing associated file when
|
|
|
|
-- there is a merge conflict
|
2021-06-07 18:51:38 +00:00
|
|
|
unless conflicted' $
|
|
|
|
send mdfeeder (Ref srcsha) $ \case
|
2021-06-01 15:24:15 +00:00
|
|
|
Just oldkey -> do
|
|
|
|
liftIO $ SQL.removeAssociatedFile oldkey
|
|
|
|
(asTopFilePath file)
|
|
|
|
(SQL.WriteHandle qh)
|
|
|
|
return True
|
|
|
|
Nothing -> return False
|
2021-06-07 18:51:38 +00:00
|
|
|
send mdfeeder (Ref dstsha) $ \case
|
|
|
|
Just key -> do
|
2022-11-18 17:16:57 +00:00
|
|
|
liftIO $ addassociatedfile key
|
|
|
|
(asTopFilePath file)
|
|
|
|
(SQL.WriteHandle qh)
|
|
|
|
when (dstmode /= fmtTreeItemType TreeSymlink) $
|
2021-06-08 13:27:53 +00:00
|
|
|
reconcilepointerfile (asTopFilePath file) key
|
include locked files in the keys database associated files

Before, only unlocked files were included.

The initial scan now scans for locked as well as unlocked files. This
does make it a little bit slower, although I optimised it as well as I
think it can be.

reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.

On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to the current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.

reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index; since there will be few changes since the last time, it will
not be a noticeable overhead. What may turn out to be a noticeable
slowdown is that, after changing to a branch, it has to go through the
diff from the previous index to the new one, and if there are lots of
changes, that could take a long time. The same applies after adding a
lot of files, deleting a lot of files, moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong, because a
newly locked file still needs to have its associated file tracked.

Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when the file is locked, but that is not really
necessary: rekeying changes the index, so the next time git-annex
runs and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and
no longer need to, for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.

However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or git-annex would need to keep trying to
remove it.
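The diff-driven reconciliation described above can be modeled in miniature. This is a hypothetical, pure sketch (not the real implementation): the previous index tree and the current index are each reduced to a file-to-key map, and diffing them yields the associated-file removals and additions that reconcileStaged would write to the keys database. `Op`, `reconcile`, and the `String` stand-ins for files and keys are all invented for illustration.

```haskell
import qualified Data.Map.Strict as M

type File = String
type Key = String

-- The database updates a diff would drive.
data Op = RemoveAssociated File Key | AddAssociated File Key
    deriving (Eq, Show)

-- Compare the file->key mapping of the previous index tree with the
-- current index, emitting removals for deleted files, additions for
-- new files, and both for files whose key changed.
reconcile :: M.Map File Key -> M.Map File Key -> [Op]
reconcile old new = concatMap go (M.keys (M.union old new))
  where
    go f = case (M.lookup f old, M.lookup f new) of
        (Just k, Nothing) -> [RemoveAssociated f k]   -- deletion
        (Nothing, Just k) -> [AddAssociated f k]      -- addition
        (Just k, Just k')
            | k /= k' -> [RemoveAssociated f k, AddAssociated f k']
        _ -> []                                       -- unchanged
```

On upgrade there is no recorded previous tree, so `old` is the empty map and every file in the current index is added, which matches the full population described above.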
						return True
					Nothing -> return False
				procdiff mdfeeder rest
					(conflicted || conflicted')
			_ -> return conflicted -- parse failed
	procdiff _ _ conflicted = return conflicted

	-- Processing a diff --index when there is a merge conflict.
	-- This diff will have the new local version of a file as the
	-- first sha, and a null sha as the second sha, and we only
	-- care about files that are in conflict.
	procmergeconflictdiff mdfeeder (info:file:rest) conflicted
		| ":" `S.isPrefixOf` info = case S8.words info of
			(_colonmode:_mode:sha:_sha:status:[]) -> do
				send mdfeeder (Ref sha) $ \case
					Just key -> do
						liftIO $ SQL.addAssociatedFile key
							(asTopFilePath file)
							(SQL.WriteHandle qh)
						return True
					Nothing -> return False
				let conflicted' = status == "U"
				procmergeconflictdiff mdfeeder rest
					(conflicted || conflicted')
			_ -> return conflicted -- parse failed
	procmergeconflictdiff _ _ conflicted = return conflicted

	reconcilepointerfile file key = do
		ics <- liftIO $ SQL.getInodeCaches key (SQL.ReadHandle qh)
		obj <- calcRepo (gitAnnexLocation key)
		mobjic <- withTSDelta (liftIO . genInodeCache obj)
		let addinodecaches k v = liftIO $
			SQL.addInodeCaches k v (SQL.WriteHandle qh)
		-- Like inAnnex, check the annex object is unmodified
		-- when annex.thin is set.
		keypopulated <- ifM (annexThin <$> Annex.getGitConfig)
			( case mobjic of
				Just objic -> isUnmodifiedLowLevel addinodecaches key obj objic ics
				Nothing -> pure False
			, pure (isJust mobjic)
			)
		p <- fromRepo $ fromTopFilePath file
		filepopulated <- sameInodeCache p ics
		case (keypopulated, filepopulated) of
			(True, False) ->
				populatePointerFile (Restage True) key obj p >>= \case
					Nothing -> return ()
					Just ic -> addinodecaches key
						(catMaybes [Just ic, mobjic])
			(False, True) -> depopulatePointerFile key p
			_ -> return ()

	send :: ((Maybe Key -> Annex a, Ref) -> IO ()) -> Ref -> (Maybe Key -> Annex a) -> IO ()
	send feeder r withk = feeder (withk, r)

	-- Streaming through git cat-file like this is significantly
	-- faster than using catKey.
	catstream a = do
		g <- Annex.gitRepo
		catObjectMetaDataStream g $ \mdfeeder mdcloser mdreader ->
			catObjectStream g $ \catfeeder catcloser catreader -> do
				feedt <- liftIO $ async $
					a mdfeeder
						`finally` void mdcloser
				proct <- liftIO $ async $
					procthread mdreader catfeeder
						`finally` void catcloser
				dbchanged <- dbwriter False largediff catreader
				-- Flush database changes now
				-- so other processes can see them.
				when dbchanged $
					liftIO $ H.flushDbQueue qh
				() <- liftIO $ wait feedt
				liftIO $ wait proct
				return ()
	  where
		procthread mdreader catfeeder = mdreader >>= \case
			Just (ka, Just (sha, size, _type))
				| size <= fromIntegral maxPointerSz -> do
					() <- catfeeder (ka, sha)
					procthread mdreader catfeeder
			Just _ -> procthread mdreader catfeeder
			Nothing -> return ()

		dbwriter dbchanged n catreader = liftIO catreader >>= \case
			Just (ka, content) -> do
				changed <- ka (parseLinkTargetOrPointerLazy =<< content)
				!n' <- countdownToMessage n
				dbwriter (dbchanged || changed) n' catreader
			Nothing -> return dbchanged

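The feeder/closer/reader streaming used by catstream can be illustrated with a small self-contained model. This is a hypothetical sketch, not the real git cat-file wiring: `streamPipeline` plays the role of one pipeline stage, with a feeder thread pushing work onto a channel, a `Nothing` sentinel standing in for the closer, and the caller draining results, while the lookup itself is just a pure function.

```haskell
import Control.Concurrent

-- A feeder thread streams items through a channel to a reader thread,
-- closing the stream with a Nothing sentinel; the reader applies the
-- (here pure, stand-in) lookup and hands back results in order.
streamPipeline :: [Int] -> (Int -> Int) -> IO [Int]
streamPipeline xs f = do
    chan <- newChan
    done <- newEmptyMVar
    -- feeder thread: send each item, then close the stream
    _ <- forkIO $ do
        mapM_ (writeChan chan . Just) xs
        writeChan chan Nothing
    -- reader thread: accumulate results until the stream is closed
    _ <- forkIO $ do
        let loop acc = readChan chan >>= \mx -> case mx of
                Just x -> loop (f x : acc)
                Nothing -> putMVar done (reverse acc)
        loop []
    takeMVar done
```

The real code runs two such stages back to back (object metadata, then object content), which is what lets it keep git cat-file busy instead of doing one catKey round trip per file.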
	-- When the diff is large, the scan can take a while,
	-- so let the user know what's going on.
	countdownToMessage n
		| n < 1 = return 0
		| n == 1 = do
			showSideAction "scanning for annexed files"
			return 0
		| otherwise = return (pred n)

	-- How large is large? Too large and there will be a long
	-- delay before the message is shown; too small and the message
	-- will clutter things up unnecessarily. It's uncommon for 1000
	-- files to change in the index, and processing that many files
	-- takes less than half a second, so that seems about right.
	largediff :: Int
	largediff = 1000

	-- When the database is known to have been newly created and empty
	-- before reconcileStaged started, it is more efficient to use
	-- newAssociatedFile. It's safe to use it here because this is run
	-- with a lock held that blocks any other process that opens the
	-- database, and when the database is newly created, there is no
	-- existing process that has it open already. And it's not possible
	-- for reconcileStaged to call this twice on the same filename with
	-- two different keys.
	addassociatedfile
		| dbisnew = SQL.newAssociatedFile
		| otherwise = SQL.addAssociatedFile
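The optimisation above can be modeled with a simple map standing in for the associated-files table. This is a hypothetical sketch of the behavioural difference, not the SQL implementation: the general path must first clear any stale association for the file (an extra delete per insert), while the fast path for a freshly created database can insert unconditionally.

```haskell
import qualified Data.Map.Strict as M

type File = String
type Key = String

-- Model the associated-files table as a map from file to key.
type Table = M.Map File Key

-- General path: remove any stale association first, then insert.
addAssociatedFile :: File -> Key -> Table -> Table
addAssociatedFile f k = M.insert f k . M.delete f

-- Fast path for a newly created, empty database: insert unconditionally,
-- safe because no stale row can exist yet.
newAssociatedFile :: File -> Key -> Table -> Table
newAssociatedFile = M.insert
```

Starting from the empty table the two produce identical results, which is why the swap is safe under the conditions listed in the comment above.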
{- Normally the keys database is updated incrementally when opened,
 - by reconcileStaged. Calling this explicitly allows running the
 - update at an earlier point.
 -}
updateDatabase :: Annex ()
updateDatabase = runWriter ContentTable (const noop)