git-annex/Annex
Joey Hess ce455223df
split out appending to journal from writing, high level only
Currently this is not an improvement, but it allows for optimising
appendJournalFile later. With an optimised appendJournalFile, this will
greatly speed up access patterns like git-annex addurl of a lot of urls
to the same key, where the log file can grow rather large. Appending
rather than re-writing the journal file for each line can save a lot of
disk writes.

It still has to read the current journal or branch file, both to check
whether it can append to it and so that, when the journal file does not exist
yet, it can write the old content from the branch to it. Probably the re-reads
are better cached by the filesystem than repeated writes. (If the
re-reads turn out to keep performance bad, they could be eliminated, at
the cost of not being able to compact the log when replacing old
information in it. That could be enabled by a switch.)
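
The append-or-initialise behaviour described above could look roughly like the following sketch. It is not git-annex's actual appendJournalFile: the name appendJournalSketch, its arguments, and the demo file name are hypothetical, and locking and journal directory handling are left out.

```haskell
module Main where

import Control.Monad (when)
import qualified Data.ByteString.Char8 as B
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.FilePath ((</>))

-- Hypothetical helper (an assumption, not the git-annex API).
--   journalPath:   location of the journal file
--   branchContent: old content of the file as stored on the branch
--   line:          the new log line, without trailing newline
appendJournalSketch :: FilePath -> B.ByteString -> B.ByteString -> IO ()
appendJournalSketch journalPath branchContent line = do
    exists <- doesFileExist journalPath
    if exists
        -- Journal file already present: only the new line is written.
        then B.appendFile journalPath (B.snoc line '\n')
        -- No journal file yet: seed it with the branch content first.
        else B.writeFile journalPath (B.append branchContent (B.snoc line '\n'))

-- Small demonstration using a scratch file in the temp directory.
demo :: IO B.ByteString
demo = do
    tmp <- getTemporaryDirectory
    let path = tmp </> "journal-sketch-demo.log"
    stale <- doesFileExist path
    when stale (removeFile path)
    appendJournalSketch path (B.pack "old 1\n") (B.pack "new 1")
    appendJournalSketch path (B.pack "old 1\n") (B.pack "newer 1")
    content <- B.readFile path
    removeFile path
    return content

main :: IO ()
main = demo >>= B.putStr
```

In the demo, the first call finds no journal file and writes the branch content plus the new line; the second call only appends.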

While the immediate need is to affect addurl writes, it was implemented
at the level of presence logs, so it may also speed up location logs.
The only added overhead is the call to isNewInfo, which only needs to
compare ByteStrings. Helping to balance that out, it avoids compactLog
when it's able to append.
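
As a rough illustration of that trade-off, here is a minimal sketch of choosing between appending and rewriting. It assumes a simplified, hypothetical line format of "<info> <status>"; the names isNewInfoSketch and addLineSketch are invented for this example and do not reproduce git-annex's isNewInfo or compactLog.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B

-- What needs to be written to the journal.
data Change
    = Append B.ByteString   -- only the new line is written
    | Rewrite B.ByteString  -- the whole, compacted file is written
    deriving (Eq, Show)

-- Info field of a line: everything before the first space
-- (hypothetical format, assumed for this sketch only).
lineInfo :: B.ByteString -> B.ByteString
lineInfo = B.takeWhile (/= ' ')

-- True when no existing line carries the same info.
-- The check only compares ByteStrings.
isNewInfoSketch :: B.ByteString -> [B.ByteString] -> Bool
isNewInfoSketch new old = lineInfo new `notElem` map lineInfo old

-- Append when the info is new; otherwise drop the superseded
-- lines (a stand-in for compaction) and rewrite the file.
addLineSketch :: B.ByteString -> B.ByteString -> Change
addLineSketch new content
    | isNewInfoSketch new old = Append (B.snoc new '\n')
    | otherwise = Rewrite (B.unlines (kept ++ [new]))
  where
    old = B.lines content
    kept = filter ((/= lineInfo new) . lineInfo) old

main :: IO ()
main = do
    let content = B.pack "uuid1 1\nuuid2 1\n"
    print (addLineSketch (B.pack "uuid3 1") content)  -- new info: append
    print (addLineSketch (B.pack "uuid1 0") content)  -- replaces old info: rewrite
```

Only the second case pays for compaction; the first skips it entirely, which is the saving described above.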

Sponsored-by: Dartmouth College's DANDI project
2022-07-18 13:22:50 -04:00
AdjustedBranch use RawFilePath version of rename 2022-06-22 16:47:34 -04:00
Branch handle transitions with read-only unmerged git-annex branches 2021-12-28 13:23:32 -04:00
Concurrent differentiate between concurrency enabled at command line and by git config 2020-09-16 11:47:12 -04:00
Content move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Debug implement fastDebug 2021-04-06 15:24:28 -04:00
LockPool fine-grained locking when annex.pidlock is enabled 2021-12-03 17:20:21 -04:00
MetaData update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
SpecialRemote remove redundant imports 2020-06-22 11:05:34 -04:00
VectorClock deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
View Fix test suite failure on Windows 2021-08-24 14:03:29 -04:00
Action.hs fix cat-file leak in get with -J 2021-11-19 12:51:08 -04:00
AdjustedBranch.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
AutoMerge.hs support git 2.34.0's handling of merge conflict between annexed and non-annexed file 2021-11-22 16:10:24 -04:00
BloomFilter.hs Revert "data type that starts off using a set but converts to a bloom filter when large" 2020-07-01 20:12:19 -04:00
Branch.hs split out appending to journal from writing, high level only 2022-07-18 13:22:50 -04:00
BranchState.hs disable journalIgnorable in enableInteractiveBranchAccess 2022-07-15 13:48:41 -04:00
CatFile.hs read a consistent amount from pointer file 2022-02-23 12:52:34 -04:00
ChangedRefs.hs more RawFilePath conversion 2020-10-29 14:20:57 -04:00
CheckAttr.hs mincopies 2021-01-06 14:15:19 -04:00
CheckIgnore.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Common.hs use fastDebug everywhere it can be used 2021-04-06 15:41:24 -04:00
Concurrent.hs remove unused import 2021-11-23 16:15:57 -04:00
Content.hs use RawFilePath version of rename 2022-06-22 16:47:34 -04:00
CopyFile.hs incremental verification for retrieval from import remotes 2022-05-09 15:39:43 -04:00
CurrentBranch.hs refactor getCurrentBranch 2018-10-19 17:29:18 -04:00
Debug.hs fix fastDebug to check if debugging is actually enabled 2021-04-06 16:28:37 -04:00
Difference.hs include git-annex-shell back in 2019-12-02 11:51:52 -04:00
DirHashes.hs Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. 2020-09-01 15:16:35 -04:00
Drop.hs prevent numcopies or mincopies being configured to 0 2022-03-28 15:20:34 -04:00
Environment.hs include git-annex-shell back in 2019-12-02 11:51:52 -04:00
Export.hs convert Key to ShortByteString 2021-10-05 20:20:08 -04:00
ExternalAddonProcess.hs use fastDebug everywhere it can be used 2021-04-06 15:41:24 -04:00
FileMatcher.hs prep for fixing find --branch --unlocked 2021-03-02 13:39:31 -04:00
Fixup.hs fix a bug that prevented git-annex init from working in a submodule 2021-01-21 15:33:15 -04:00
GitOverlay.hs add: Significantly speed up adding lots of non-large files to git 2021-01-04 13:12:28 -04:00
HashObject.hs more RawFilePath conversion 2020-10-28 17:25:59 -04:00
Hook.hs don't try to remove pre-commit-annex and post-update-annex-hooks 2020-10-19 13:13:49 -04:00
Import.hs incremental verification for retrieval from import remotes 2022-05-09 15:39:43 -04:00
Ingest.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Init.hs make path absolute for display 2022-05-31 12:17:27 -04:00
InodeSentinal.hs add debugging in sameInodeCache 2021-07-26 10:58:07 -04:00
Journal.hs split out appending to journal from writing, high level only 2022-07-18 13:22:50 -04:00
Link.hs use RawFilePath version of rename 2022-06-22 16:47:34 -04:00
Locations.hs remove objectDir' 2022-06-22 16:08:49 -04:00
LockFile.hs more RawFilePath conversion 2020-10-29 10:50:29 -04:00
LockPool.hs update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
Magic.hs Serialize use of C magic library, which is not thread safe. 2020-09-17 17:27:42 -04:00
MetaData.hs fix error message 2021-12-09 15:25:59 -04:00
Multicast.hs use programPath consistently, not readProgramFile 2020-03-30 16:06:27 -04:00
Notification.hs fix build when dbus is enabled 2022-07-05 13:06:45 -04:00
NumCopies.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Path.hs assistant: Fix a crash on startup by avoiding using forkProcess 2021-05-12 15:08:03 -04:00
Perms.hs fix typo in comment 2022-05-23 12:53:55 -04:00
PidLock.hs close pid lock only once no threads use it 2021-12-06 15:01:39 -04:00
Queue.hs Avoid git status taking a long time after git-annex unlock of many files. 2022-02-18 15:06:40 -04:00
RemoteTrackingBranch.hs refactor 2019-11-11 19:10:52 -04:00
ReplaceFile.hs use RawFilePath version of rename 2022-06-22 16:47:34 -04:00
SpecialRemote.hs info: Added --autoenable option 2022-06-01 14:20:38 -04:00
Ssh.hs Added annex.adviceNoSshCaching config. 2021-05-27 12:37:49 -04:00
StallDetection.hs bwlimit 2021-09-21 16:58:10 -04:00
TaggedPush.hs Ref ByteString conversion done 2020-04-07 17:41:09 -04:00
Tmp.hs propagate signals to the transferrer process group 2020-12-11 15:32:00 -04:00
Transfer.hs add a comment about checkSaneLock 2021-10-27 14:55:30 -04:00
TransferrerPool.hs avoid using temp file size when deciding whether to retry failed transfer 2021-06-25 12:04:23 -04:00
UntrustedFilePath.hs importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_' 2020-08-05 11:35:00 -04:00
UpdateInstead.hs v7 for all repositories 2019-08-30 14:09:14 -04:00
Url.hs final readonly values moves to AnnexRead 2022-06-28 16:04:58 -04:00
UUID.hs simplify and speed up Utility.FileSystemEncoding 2021-08-11 12:13:31 -04:00
VariantFile.hs more RawFilePath 2019-12-18 17:10:28 -04:00
VectorClock.hs deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
Verify.hs incremental verification for retrieval from all export remotes 2022-05-09 13:49:33 -04:00
Version.hs have v9 autoupgrade to v10 2022-01-26 13:16:06 -04:00
View.hs turn of PackageImports in cabal file 2022-02-25 13:16:36 -04:00
Wanted.hs prevent dropping required content of other file using same content 2021-05-25 11:34:06 -04:00
WorkerPool.hs start splitting out readonly values from AnnexState 2021-04-02 15:51:44 -04:00
WorkTree.hs work around strange auto-init bug 2021-07-30 18:36:03 -04:00
YoutubeDl.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00