git-annex/Annex
Joey Hess f6aa097a39
avoid import writing to cidsdb initially
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.

Before, import would write to both the git-annex branch and the sqlite
database. But the next time it was run, needsUpdateFromLog would see that
the branch had changed and so run updateFromLog, which would make the
same writes to the sqlite database a second time.

Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.
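
The deferred flow described above can be sketched as a toy model. This is not git-annex code: the `Repo` record, the integer standing in for the branch tree, `emptyRepo` and `importCids` are all hypothetical, and `needsUpdateFromLog` and `updateFromLog` only borrow the names used in this message, with made-up signatures.

```haskell
import qualified Data.Set as Set

-- Toy model only; every type and signature here is an assumption.
type Cid = String

data Repo = Repo
  { branchTree :: Int          -- stand-in for the git-annex branch tree
  , branchCids :: Set.Set Cid  -- cids recorded in the branch's logs
  , dbTree     :: Int          -- branch tree the database was last synced to
  , dbCids     :: Set.Set Cid  -- cids present in the sqlite database
  } deriving (Show, Eq)

emptyRepo :: Repo
emptyRepo = Repo 0 Set.empty 0 Set.empty

-- The database is stale when the branch has moved since the last sync.
needsUpdateFromLog :: Repo -> Bool
needsUpdateFromLog r = branchTree r /= dbTree r

-- Bring the database up to date with whatever the branch now contains.
updateFromLog :: Repo -> Repo
updateFromLog r = r
  { dbCids = Set.union (dbCids r) (branchCids r)
  , dbTree = branchTree r
  }

-- An import run: sync the database first if needed, then write the new
-- cids to the branch only. The database is left for the next run.
importCids :: [Cid] -> Repo -> Repo
importCids cs r0 = synced
  { branchTree = branchTree synced + 1
  , branchCids = Set.union (branchCids synced) (Set.fromList cs)
  }
  where
    synced = if needsUpdateFromLog r0 then updateFromLog r0 else r0
```

In this model, after `importCids ["a"] emptyRepo` the database is still empty and `needsUpdateFromLog` is true. A second `importCids` run picks up `"a"` through `updateFromLog` before it writes anything new, so each cid reaches the database once.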

Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.

But, there could be other processes running at the same time, and one of
them may update the git-annex branch, e.g. by merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.
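
The race can be illustrated with a small toy model. Everything in it is an assumption made for illustration: the `Repo` record, the integer used in place of a branch tree, and the helpers `eagerWrite`, `concurrentMerge`, `recordBranchTree` (a stand-in for recordAnnexBranchTree), `needsUpdate` and `missingFromDb` are hypothetical and do not match the real code.

```haskell
import qualified Data.Set as Set

-- Toy model only; none of these definitions are git-annex's own.
type Cid = String

data Repo = Repo
  { branchTree :: Int          -- stand-in for the git-annex branch tree
  , branchCids :: Set.Set Cid  -- cids recorded in the branch's logs
  , dbTree     :: Int          -- branch tree the database claims to reflect
  , dbCids     :: Set.Set Cid  -- cids present in the database
  }

-- Hypothetical eager import step: write to the branch and the database.
eagerWrite :: Cid -> Repo -> Repo
eagerWrite c r = r
  { branchTree = branchTree r + 1
  , branchCids = Set.insert c (branchCids r)
  , dbCids     = Set.insert c (dbCids r)
  }

-- Another process merges a remote branch: only the branch changes.
concurrentMerge :: [Cid] -> Repo -> Repo
concurrentMerge cs r = r
  { branchTree = branchTree r + 1
  , branchCids = Set.union (Set.fromList cs) (branchCids r)
  }

-- Stand-in for recordAnnexBranchTree: declare the database in sync
-- with whatever the branch currently is.
recordBranchTree :: Repo -> Repo
recordBranchTree r = r { dbTree = branchTree r }

needsUpdate :: Repo -> Bool
needsUpdate r = branchTree r /= dbTree r

missingFromDb :: Repo -> Set.Set Cid
missingFromDb r = Set.difference (branchCids r) (dbCids r)

-- The import writes eagerly, a merge lands, then the import records the tree.
raced :: Repo
raced = recordBranchTree (concurrentMerge ["remote-cid"] (eagerWrite "local-cid" start))
  where
    start = Repo 0 Set.empty 0 Set.empty
```

In this model `needsUpdate raced` is false while `missingFromDb raced` still contains `"remote-cid"`, so the database claims to be current while lacking the merged entry.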

I don't think there's a good way to prevent or detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.

Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.
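
A minimal sketch of why the in-process map makes the database writes unnecessary for the current process, assuming lookups consult the map before the database. The `Lookup` record, `recordCid` and `lookupCid` are hypothetical names, not the real cidmap code.

```haskell
import qualified Data.Map.Strict as Map

-- Toy model only; the real cidmap and database interfaces differ.
type Cid = String
type Key = String

data Lookup = Lookup
  { cidMap     :: Map.Map Cid Key  -- changes made by the current process
  , dbContents :: Map.Map Cid Key  -- what the sqlite database holds
  }

-- Remember a new cid in memory only; the database is not touched.
recordCid :: Cid -> Key -> Lookup -> Lookup
recordCid c k l = l { cidMap = Map.insert c k (cidMap l) }

-- Check the in-memory map first, then fall back to the database.
lookupCid :: Cid -> Lookup -> Maybe Key
lookupCid c l = case Map.lookup c (cidMap l) of
  Just k  -> Just k
  Nothing -> Map.lookup c (dbContents l)
```

Under that assumption, a cid recorded with `recordCid` is found by `lookupCid` in the same process whether or not it has reached the database.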

Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.

Sponsored-by: Noam Kremen on Patreon
2023-05-30 17:05:28 -04:00
AdjustedBranch filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Branch handle transitions with read-only unmerged git-annex branches 2021-12-28 13:23:32 -04:00
Concurrent differentiate between concurrency enabled at command line and by git config 2020-09-16 11:47:12 -04:00
Content filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Debug implement fastDebug 2021-04-06 15:24:28 -04:00
LockPool avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
MetaData update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
SpecialRemote configremote 2023-04-18 15:30:49 -04:00
VectorClock deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
View annex.maxextensionlength for view 2023-03-24 14:01:38 -04:00
Action.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
AdjustedBranch.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
AutoMerge.hs Windows: Support long filenames in more (possibly all) of the code 2023-03-01 15:55:58 -04:00
BloomFilter.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Branch.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
BranchState.hs disable journalIgnorable in enableInteractiveBranchAccess 2022-07-15 13:48:41 -04:00
CatFile.hs read a consistent amount from pointer file 2022-02-23 12:52:34 -04:00
ChangedRefs.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
CheckAttr.hs mincopies 2021-01-06 14:15:19 -04:00
CheckIgnore.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Common.hs rename Git.Filename to Git.Quote 2023-04-12 17:22:03 -04:00
Concurrent.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Content.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
CopyFile.hs Copy with a reflink when exporting a tree to a directory special remote 2023-03-28 13:09:14 -04:00
CurrentBranch.hs refactor getCurrentBranch 2018-10-19 17:29:18 -04:00
Debug.hs fix fastDebug to check if debugging is actually enabled 2021-04-06 16:28:37 -04:00
Difference.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
DirHashes.hs Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. 2020-09-01 15:16:35 -04:00
Drop.hs prevent numcopies or mincopies being configured to 0 2022-03-28 15:20:34 -04:00
Environment.hs improve comments 2023-04-04 15:23:39 -04:00
Export.hs rename Git.Filename to Git.Quote 2023-04-12 17:22:03 -04:00
ExternalAddonProcess.hs use fastDebug everywhere it can be used 2021-04-06 15:41:24 -04:00
FileMatcher.hs Support "inbackend" in preferred content expressions 2022-09-26 16:06:49 -04:00
Fixup.hs fix a bug that prevented git-annex init from working in a submodule 2021-01-21 15:33:15 -04:00
GitOverlay.hs filter out control characters in error messages 2023-04-10 13:50:51 -04:00
HashObject.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Hook.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Import.hs avoid import writing to cidsdb initially 2023-05-30 17:05:28 -04:00
Ingest.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Init.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
InodeSentinal.hs fix perms for core.sharedRepository 2023-04-26 16:29:11 -04:00
Journal.hs fix perms for core.sharedRepository 2023-04-26 16:29:11 -04:00
Link.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Locations.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
LockFile.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
LockPool.hs update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
Magic.hs Serialize use of C magic library, which is not thread safe. 2020-09-17 17:27:42 -04:00
MetaData.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Multicast.hs use programPath consistently, not readProgramFile 2020-03-30 16:06:27 -04:00
Notification.hs fix build when dbus is enabled 2022-07-05 13:06:45 -04:00
NumCopies.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Path.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Perms.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
PidLock.hs fix windows build 2022-09-26 12:08:04 -04:00
Queue.hs add restage log 2022-09-23 15:47:24 -04:00
RemoteTrackingBranch.hs refactor 2019-11-11 19:10:52 -04:00
ReplaceFile.hs improve createDirectoryUnder to allow alternate top directories 2022-08-12 12:52:37 -04:00
SpecialRemote.hs init: Avoid autoenabling special remotes that have control characters in their names 2023-04-12 12:37:12 -04:00
Ssh.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
StallDetection.hs bwlimit 2021-09-21 16:58:10 -04:00
TaggedPush.hs Ref ByteString conversion done 2020-04-07 17:41:09 -04:00
Tmp.hs Windows: Support long filenames in more (possibly all) of the code 2023-03-01 15:55:58 -04:00
Transfer.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
TransferrerPool.hs avoid build warning on windows 2023-03-27 12:19:26 -04:00
UntrustedFilePath.hs fix mojibake reversion in display of utf8 2023-04-12 13:53:30 -04:00
UpdateInstead.hs v7 for all repositories 2019-08-30 14:09:14 -04:00
Url.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
UUID.hs simplify and speed up Utility.FileSystemEncoding 2021-08-11 12:13:31 -04:00
VariantFile.hs more RawFilePath 2019-12-18 17:10:28 -04:00
VectorClock.hs deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
Verify.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Version.hs v8 repositories automatically upgrade to v9 2022-07-25 16:20:04 -04:00
View.hs annex.maxextensionlength for view 2023-03-24 14:01:38 -04:00
Wanted.hs new matching options --want-get-by and --want-drop-by 2022-07-28 13:26:03 -04:00
WorkerPool.hs start splitting out readonly values from AnnexState 2021-04-02 15:51:44 -04:00
WorkTree.hs use lookupKeyStaged in --batch code paths 2022-10-26 14:43:06 -04:00
YoutubeDl.hs default to yt-dlp and fix progress parsing bugs 2023-05-27 13:04:53 -04:00