git-annex/Annex
Joey Hess 6a3bd283b8
add restage log
When pointer files need to be restaged, they're first written to the
log, and then when the restage operation runs, it reads the log. This
way, if the git-annex process is interrupted before it can do the
restaging, a later git-annex process can do it.
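
The write-then-read pattern described above can be sketched roughly as follows. This is a minimal illustration, not git-annex's actual code: the names appendRestageLog and drainRestageLog, the one-path-per-line log format, and the absence of any locking are all assumptions made for the example.

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, removeFile)

-- Hypothetical log format: one worktree path per line.
-- Record a pointer file that needs restaging before the restage is attempted.
appendRestageLog :: FilePath -> FilePath -> IO ()
appendRestageLog logFile f = appendFile logFile (f ++ "\n")

-- Run the restage action on every logged path, then clear the log.
-- If the process is interrupted before this point, the log file is still
-- on disk, so a later process can call this and finish the work.
drainRestageLog :: FilePath -> ([FilePath] -> IO ()) -> IO ()
drainRestageLog logFile restage = do
    exists <- doesFileExist logFile
    when exists $ do
        contents <- readFile logFile
        let fs = lines contents
        -- Force the whole (lazily read) file so its handle is closed
        -- before the action runs and the log is removed.
        length fs `seq` restage fs
        removeFile logFile
```

Note that this sketch reads the whole log into memory for brevity, and a real implementation would also need to guard the log against concurrent writers.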

Currently, this lets a git-annex get/drop command be interrupted and
then re-run, and as long as it gets/drops additional files, it will
clean up after the interrupted command. But more changes are
needed to make it easier to restage after an interrupted process.

Kept using the git queue to run the restage action, even though the
list of files that it builds up for that action is not actually used by
the action. This could perhaps be simplified to make restaging a cleanup
action that gets registered, rather than using the git queue for it. But
I wasn't sure whether that would cause visible behavior changes. For
example, when dropping a large number of files, the git queue currently
flushes periodically, so it restages incrementally rather than all at
the end.

In restagePointerFiles, it reads the restage log twice, once to get
the number of files and size, and a second time to process it.
This seemed better than reading the whole file into memory, since
potentially a huge number of files could be in there. Probably the OS
will cache the file in memory and there will not be much performance
impact. It might be better to keep running tallies in another file,
but updating that atomically with the log seems hard.

Also note that it's possible for calcRestageLog to see different log
contents than streamRestageLog does, since more files may be added to
the log in between. That is ok; it will only cause the filterprocessfaster
heuristic to operate with slightly out of date information, so it may make
the wrong choice for the files that got added and be a little slower than ideal.
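
A rough sketch of the two-pass approach: one pass folds over the log to tally the count and total size, and a second pass streams the entries to an action, so the whole log is never held in memory. The names tallyLog and streamLog, the Tally type, and the "SIZE PATH" line format are hypothetical and only loosely modeled on calcRestageLog and streamRestageLog; the real functions' signatures and log format may differ.

```haskell
import System.IO

-- Running totals with strict fields, so the fold does not build up thunks.
data Tally = Tally !Int !Integer deriving (Eq, Show)

-- Hypothetical line format: "<size> <path>"; the path may contain spaces.
parseEntry :: String -> Maybe (Integer, FilePath)
parseEntry l = case reads l of
    [(sz, ' ':path)] -> Just (sz, path)
    _ -> Nothing

-- First pass: count the entries and sum their sizes, one line at a time.
tallyLog :: FilePath -> IO Tally
tallyLog logFile = withFile logFile ReadMode (go (Tally 0 0))
  where
    go t@(Tally n total) h = do
        eof <- hIsEOF h
        if eof
            then return t
            else do
                l <- hGetLine h
                case parseEntry l of
                    Just (sz, _) -> go (Tally (n + 1) (total + sz)) h
                    Nothing -> go t h

-- Second pass: hand each logged path to the action, one line at a time.
streamLog :: FilePath -> (FilePath -> IO ()) -> IO ()
streamLog logFile act = withFile logFile ReadMode go
  where
    go h = do
        eof <- hIsEOF h
        if eof
            then return ()
            else do
                l <- hGetLine h
                mapM_ (act . snd) (parseEntry l)
                go h
```

Because the two passes open the file separately, entries appended between them are processed by streamLog without having been counted by tallyLog, which matches the out-of-date-tally situation described above.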

Sponsored-by: Dartmouth College's DANDI project
2022-09-23 15:47:24 -04:00
AdjustedBranch Typo fix unncessary -> unnecessary. 2022-08-20 09:40:19 -04:00
Branch handle transitions with read-only unmerged git-annex branches 2021-12-28 13:23:32 -04:00
Concurrent differentiate between concurrency enabled at command line and by git config 2020-09-16 11:47:12 -04:00
Content Typo fix unncessary -> unnecessary. 2022-08-20 09:40:19 -04:00
Debug implement fastDebug 2021-04-06 15:24:28 -04:00
LockPool fine-grained locking when annex.pidlock is enabled 2021-12-03 17:20:21 -04:00
MetaData update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
SpecialRemote remove redundant imports 2020-06-22 11:05:34 -04:00
VectorClock deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
View Fix test suite failure on Windows 2021-08-24 14:03:29 -04:00
Action.hs fix cat-file leak in get with -J 2021-11-19 12:51:08 -04:00
AdjustedBranch.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
AutoMerge.hs support git 2.34.0's handling of merge conflict between annexed and non-annexed file 2021-11-22 16:10:24 -04:00
BloomFilter.hs Revert "data type that starts off using a set but converts to a bloom filter when large" 2020-07-01 20:12:19 -04:00
Branch.hs work around git segfault 2022-08-04 14:20:57 -04:00
BranchState.hs disable journalIgnorable in enableInteractiveBranchAccess 2022-07-15 13:48:41 -04:00
CatFile.hs read a consistent amount from pointer file 2022-02-23 12:52:34 -04:00
ChangedRefs.hs improve createDirectoryUnder to allow alternate top directories 2022-08-12 12:52:37 -04:00
CheckAttr.hs mincopies 2021-01-06 14:15:19 -04:00
CheckIgnore.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Common.hs add annex.dbdir (WIP) 2022-08-11 16:58:53 -04:00
Concurrent.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Content.hs use RawFilePath version of rename 2022-06-22 16:47:34 -04:00
CopyFile.hs incremental verification for retrieval from import remotes 2022-05-09 15:39:43 -04:00
CurrentBranch.hs refactor getCurrentBranch 2018-10-19 17:29:18 -04:00
Debug.hs fix fastDebug to check if debugging is actually enabled 2021-04-06 16:28:37 -04:00
Difference.hs include git-annex-shell back in 2019-12-02 11:51:52 -04:00
DirHashes.hs Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. 2020-09-01 15:16:35 -04:00
Drop.hs prevent numcopies or mincopies being configured to 0 2022-03-28 15:20:34 -04:00
Environment.hs include git-annex-shell back in 2019-12-02 11:51:52 -04:00
Export.hs convert Key to ShortByteString 2021-10-05 20:20:08 -04:00
ExternalAddonProcess.hs use fastDebug everywhere it can be used 2021-04-06 15:41:24 -04:00
FileMatcher.hs prep for fixing find --branch --unlocked 2021-03-02 13:39:31 -04:00
Fixup.hs fix a bug that prevented git-annex init from working in a submodule 2021-01-21 15:33:15 -04:00
GitOverlay.hs add: Significantly speed up adding lots of non-large files to git 2021-01-04 13:12:28 -04:00
HashObject.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Hook.hs don't try to remove pre-commit-annex and post-update-annex-hooks 2020-10-19 13:13:49 -04:00
Import.hs change retrieveExportWithContentIdentifier to take a list of ContentIdentifier 2022-09-20 13:19:42 -04:00
Ingest.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Init.hs avoid redundant prompt for http password in git-annex get that does autoinit 2022-09-09 14:43:43 -04:00
InodeSentinal.hs add debugging in sameInodeCache 2021-07-26 10:58:07 -04:00
Journal.hs add annex.dbdir (WIP) 2022-08-11 16:58:53 -04:00
Link.hs add restage log 2022-09-23 15:47:24 -04:00
Locations.hs add restage log 2022-09-23 15:47:24 -04:00
LockFile.hs add annex.dbdir (WIP) 2022-08-11 16:58:53 -04:00
LockPool.hs update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
Magic.hs Serialize use of C magic library, which is not thread safe. 2020-09-17 17:27:42 -04:00
MetaData.hs fix error message 2021-12-09 15:25:59 -04:00
Multicast.hs use programPath consistently, not readProgramFile 2020-03-30 16:06:27 -04:00
Notification.hs fix build when dbus is enabled 2022-07-05 13:06:45 -04:00
NumCopies.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Path.hs assistant: Fix a crash on startup by avoiding using forkProcess 2021-05-12 15:08:03 -04:00
Perms.hs use a subdirectory of annex.dbdir 2022-08-12 13:18:15 -04:00
PidLock.hs add restage log 2022-09-23 15:47:24 -04:00
Queue.hs add restage log 2022-09-23 15:47:24 -04:00
RemoteTrackingBranch.hs refactor 2019-11-11 19:10:52 -04:00
ReplaceFile.hs improve createDirectoryUnder to allow alternate top directories 2022-08-12 12:52:37 -04:00
SpecialRemote.hs info: Added --autoenable option 2022-06-01 14:20:38 -04:00
Ssh.hs Added annex.adviceNoSshCaching config. 2021-05-27 12:37:49 -04:00
StallDetection.hs bwlimit 2021-09-21 16:58:10 -04:00
TaggedPush.hs Ref ByteString conversion done 2020-04-07 17:41:09 -04:00
Tmp.hs add annex.dbdir (WIP) 2022-08-11 16:58:53 -04:00
Transfer.hs add a comment about checkSaneLock 2021-10-27 14:55:30 -04:00
TransferrerPool.hs drain transferrer read handle when shutting it down 2022-09-22 14:39:39 -04:00
UntrustedFilePath.hs importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_' 2020-08-05 11:35:00 -04:00
UpdateInstead.hs v7 for all repositories 2019-08-30 14:09:14 -04:00
Url.hs don't force use of conduit in withUrlOptionsPromptingCreds 2022-09-09 16:07:32 -04:00
UUID.hs simplify and speed up Utility.FileSystemEncoding 2021-08-11 12:13:31 -04:00
VariantFile.hs more RawFilePath 2019-12-18 17:10:28 -04:00
VectorClock.hs deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
Verify.hs incremental verification for retrieval from all export remotes 2022-05-09 13:49:33 -04:00
Version.hs v8 repositories automatically upgrade to v9 2022-07-25 16:20:04 -04:00
View.hs turn of PackageImports in cabal file 2022-02-25 13:16:36 -04:00
Wanted.hs new matching options --want-get-by and --want-drop-by 2022-07-28 13:26:03 -04:00
WorkerPool.hs start splitting out readonly values from AnnexState 2021-04-02 15:51:44 -04:00
WorkTree.hs work around strange auto-init bug 2021-07-30 18:36:03 -04:00
YoutubeDl.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00