git-annex

Author	SHA1	Message	Date
Joey Hess	d4cb7afeed	remove unused Key parameter from isCryptographicallySecure This will allow using isCryptographicallySecure on a Backend, before a Key has been generated. Sponsored-by: Lawrence Brogan on Patreon	2023-03-27 14:34:00 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Yaroslav Halchenko	0ae5ff797f	Typo: sansative -> sensitive	2023-03-17 15:14:50 -04:00
Joey Hess	38e9ea8497	one-way escaping of newlines in uuid.log A repository can have a newline in its description due to being in a directory containing a newline, or due to git-annex describe being passed a string with a newline in it for some reason. Putting that newline in uuid.log breaks its format. So, escape the newline when it enters uuid.log, to \n This is a one-way escaping, it is not converted back to a newline when reading the log. If it were, commands like git-annex info and whereis would display a multi-line description, which could be confusing to read. And, implementing roundtripping would necessarily cause problems if an old version of git-annex were used to set a description that contained whatever special character is used to escape the \n. Eg, a \ or if it used the ! prefix before base64 data that is used in some other logs, the ! character. Then the description set by the old git-annex would not roundtrip. There just doesn't seem to be any benefit of roundtripping newlines through, so why bother? And, git often displays \n for newline when a filename contains a newline, so git-annex doing it in this case seems sorta ok by analogy to git. (Some other git-annex logs can also have newlines put into them if the user really wants to break git-annex. For example: git-annex config annex.largefiles "foo bar" The full list is probably config.log, remote.log, group.log, preferred-content.log, required-content.log, group-preferred-content.log, schedule.log. Probably there is no good reason to use a newline in any of these, and the breakage is probably limited to the bad data the user put in not coming back out. And users can write any garbage to log files themselves manually in any case. So, I am not going to address all of those at this time. If a problem such as this one with the newline in the repository path comes up, it can be dealt with on a case by case basis.) Sponsored-by: Dartmouth College's Datalad project	2023-03-13 14:19:32 -04:00
Joey Hess	aa0350ff49	add directory to views for files that lack specified metadata * view: New field?=glob and ?tag syntax that includes a directory "_" in the view for files that do not have the specified metadata set. * Added annex.viewunsetdirectory git config to change the name of the "_" directory in a view. When in a view using the new syntax, old git-annex will fail to parse the view log. It errors with "Not in a view.", which is not ideal. But that only affects view commands. annex.viewunsetdirectory is included in the View for a couple of reasons. One is to avoid needing to warn the user that it should not be changed when in a view, since that would confuse git-annex. Another reason is that it helped with plumbing the value through to some pure functions. annex.viewunsetdirectory is actually mangled the same as any other view directory. So if it's configured to something like "N/A", there won't be multiple levels of directories, which would also confuse git-annex. Sponsored-By: Jack Hill on Patreon	2023-02-07 16:28:46 -04:00
Joey Hess	579d9b60c1	improve concurrency of move/copy --from --to Use separate stages for download and upload. In the common case where it downloads the file from one remote and then uploads to the other, those are by far the most expensive operations, and there's a decent chance the two remotes bottleneck on different resources. Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads will be started both downloading from the src remote. They will probably finish at the same time. Then two threads will be started uploading to the dst remote. They will probably take the same time as well. Before this change, it would alternate back and forth, bottlenecking on src and dst. With this change, as soon as the two threads start uploading to dst, two more threads are able to start, downloading from src. So bandwidth to both remotes is saturated more often. Other commands that use transferStages only send in one direction at a time. So the worker threads for the other direction will sit idle, and there will be no change in their behavior. Sponsored-by: Dartmouth College's DANDI project	2023-01-24 13:59:39 -04:00
Joey Hess	2b5e6ff20a	test: Add --test-debug option This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.	2022-11-28 15:12:53 -04:00
Joey Hess	c2ad84b423	all keys are still present on versioned remote after import of a tree When importing from versioned remotes, fix tracking of the content of deleted files. Only S3 supports versioning so far, so only it was affected. But, the draft import/export interface for external remotes also seemed to need a change, so that versionedExport could be set.	2022-10-11 13:05:40 -04:00
Reiko Asakura	445aa0d93b	Fix annex.adviceNoSshCaching having no effect git will always return option names in lowercase	2022-09-30 14:03:06 -04:00
Joey Hess	f64eff9355	test: Added --test-with-git-config option Sponsored-by: Dartmouth College's DANDI project	2022-09-22 15:58:45 -04:00
Joey Hess	34e313f786	annex.diskreserve default increased from 1 mb to 100 mb It's hard to know what's a good default for this. But 1 mb seems way too small, because it's very easy for a git pull or some similar operation that we don't think of as using much space to use up 1 mb of space. Most people would want to free up some space if a filesystem only had 100 mb free. But on a small VPS, it's probably not uncommon to have only 1 gb free. So 1 gb is too large for annex.diskreserve. While old 1 gb USB keys are around, it's unlikely that anyone is relying on them to shuttle annex data around; it would be worth anyone's time to upgrade to a 32 gb or larger cheap modern USB key ($5). Sponsored-by: Kevin Mueller on Patreon	2022-09-21 15:00:13 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	4cfe17a9e8	use a subdirectory of annex.dbdir This allows annex.dbdir to be set globally or always set to the same value when needed. Each repository uses a subdirectory of it. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 13:18:15 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	3a513cfe73	add --dry-run: New option This is intended for users who want to see what it would output in order to eg, check if a file would be added to git or the annex. It is not intended as a way for scripts to get information. Sponsored-by: Dartmouth College's Datalad project	2022-08-03 11:16:04 -04:00
Joey Hess	36f0bdcd57	add annex.alwayscompact Added annex.alwayscompact setting which can be unset to speed up writes to the git-annex branch in some cases. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 16:39:19 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking effect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrieving from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	280d41b58f	Fix a build failure with ghc 9.2.2 Thanks, gnezdo for the patch.	2022-05-02 14:21:48 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	025c18128b	test: Added --jobs option Default to the number of CPU cores, which seems about optimal on my laptop. Using one more saves me 2 seconds actually. Better packing of workers improves speed significantly. In 2 test runs, I saw segfaulting workers despite my attempt to work around that issue. So detect when a worker does, and re-run it. Removed installSignalHandlers again, because I was seeing an error "lost signal due to full pipe", which I guess was somehow caused by using it. Sponsored-by: Dartmouth College's Datalad project	2022-03-16 14:42:07 -04:00
Joey Hess	cbd138e042	factor out Utility.Aeson.textKey	2022-03-02 18:24:06 -04:00
sternenseemann	ca596e7c54	allow building with aeson >= 2.0 In aeson 2.0, Text has been replaced by the Key type and HashMap by the KeyMap interface. Accommodating this required adding some CPP in order to still be able to compile with aeson < 2.0. The required changes were: * Prevent Key from being re-exported by Utilities.Aeson, as it clashes with git-annex's own Key type. * Fix up conversion from String/Text to Key (or Text in aeson 1.) in a couple of places * Import Data.Aeson.KeyMap instead of Data.HashMap.Strict, as they are mostly API-compatible. insertWith needs to be replaced by unionWith, however, as KeyMap lacks the former function.	2022-03-02 18:01:41 -04:00
Joey Hess	07215cfeb5	complete annex.skipunknown transition annex.skipunknown now defaults to false, so commands like `git annex get foo*` will not silently skip over files/dirs that are not checked into git. Sponsored-by: Brock Spratlen on Patreon	2022-02-18 13:18:05 -04:00
Joey Hess	856ce5cf5f	split upgrade into v9 and v10 v10 will run 1 year after the upgrade to v9, to give time for any v8 processes to die. Until that point, the v10 upgrade will be tried by every process but deferred, so added support for deferring upgrades. The upgrade prevention lock file that will be used by v10 is not yet implemented, so it does not yet defer. Sponsored-by: Dartmouth College's Datalad project	2022-01-19 13:09:33 -04:00
Joey Hess	b1d719f9d2	handle transitions with read-only unmerged git-annex branches Capstone to this feature. Any transitions that have been performed on an unmerged remote ref but not on the local git-annex branch, or vice-versa have to be applied on the fly when reading files. Sponsored-by: Dartmouth College's Datalad project	2021-12-28 13:23:32 -04:00
Joey Hess	6d7ecd9e5d	merge git-annex branch in memory in read-only repository Improved support for using git-annex in a read-only repository, git-annex branch information from remotes that cannot be merged into the git-annex branch will now not crash it, but will be merged in memory. To avoid this making git-annex behave one way in a read-only repository, and another way when it can write, it's important that Annex.Branch.get return the same thing (modulo log file compaction) in both cases. This manages that mostly. There are some exceptions: - When there is a transition in one of the remote git-annex branches that has not yet been applied to the local or other git-annex branches. Transitions are not handled. - `git-annex log` runs git log on the git-annex branch, and so it will not be able to show information coming from the other, not yet merged branches. - Annex.Branch.files only looks at files in the git-annex branch and not unmerged branches. This affects git-annex info output. - Annex.Branch.hs.overBranchFileContents ditto. Affects --all and also importfeed (but importfeed cannot work in a read-only repo anyway). - CmdLine.Seek.seekFilteredKeys when precaching location logs. Note use of Annex.Branch.fullname - Database.ContentIdentifier.needsUpdateFromLog and updateFromLog These warts make this not suitable to be merged yet. This readonly code path is more expensive, since it has to query several branches. The value does get cached, but still large queries will be slower in a read-only repository when there are unmerged git-annex branches. When annex.merge-annex-branches=false, updateTo skips doing anything, and so the read-only repository code does not get triggered. So a user who is bothered by the extra work can set that. Other writes to the repository can still result in permissions errors. This includes the initial creation of the git-annex branch, and of course any writes to the git-annex branch. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 13:21:15 -04:00
Joey Hess	15d617f7e1	have setConcurrency stop any running git coprocesses When non-concurrent git coprocesses have been started, setConcurrency used to not stop them, and so could leak processes when enabling concurrency, eg when forkState is called. I do not think that ever actually happened, given where setConcurrency is called. And it probably would only leak one of each process, since it never downgrades from concurrent to non-concurrent.	2021-11-19 12:00:39 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	620813b889	minor optimisation	2021-10-05 21:26:11 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	45dfddd33f	convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation This adds the overhead of a copy whenever converting to/from ExportLocation and ImportLocation. borg: Some improvements to memory use when importing a lot of archives. (It's still pretty bad.) Sponsored-by: Mark Reidenbach on Patreon	2021-10-05 14:51:55 -04:00
Joey Hess	e47b4badb3	separate handles for cat-file and cat-file --batch-check This avoids starting one process when only the other one is needed. Eg in git-annex smudge --clean, this reduces the total number of cat-file processes that are started from 4 to 2. The only performance penalty is that when both are needed, it has to do twice as much work to maintain the two Maps. But both are very small, consisting of 1 or 2 items, so that work is negligible. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 13:16:13 -04:00
Joey Hess	e8496d62e4	improved bwrate limiting implementation New method is much better. Avoids unrestrained transfer at the beginning (except for the first block). Keeps right at or a few kb/s below the configured limit, with very little variation in the actual reported bandwidth. Removed the /s part of the config as it's not needed. Ready to merge. Sponsored-by: Luke Shumaker on Patreon	2021-09-22 15:27:16 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	05a097cde8	Merge branch 'master' into bwlimit	2021-09-22 10:48:27 -04:00
Joey Hess	4fef94d764	simplify annex.stalldetection handling RemoteGitConfig parsing looks for annex.stalldetection when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Noam Kremen on Patreon	2021-09-22 10:46:10 -04:00
Joey Hess	55b405a965	fix remote git config vs global git config order Bug fix: Git configs such as annex.verify were incorrectly overriding per-remote git configs such as remote.name.annex-verify. This dates all the way back to 2013, commit `8a5b397ac4`, where hlint apparently somehow confused me into parsing in the wrong order. Before that it was correct. Amazing no one has noticed until now. Sponsored-by: Kevin Mueller on Patreon	2021-09-22 10:41:56 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	6d4a728455	Added annex.youtube-dl-command config This can be used to run some forks of youtube-dl. Sponsored-by: Brett Eisenberg on Patreon	2021-08-27 09:44:23 -04:00
Joey Hess	449851225a	refactor IncrementalVerifier moved to Utility.Hash, which will let Utility.Url use it later. It's perhaps not really specific to hashing, but making a separate module just for the data type seemed unnecessary. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 13:19:02 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	c4aba8e032	better handling of finishing up incomplete incremental verify Now it's run in VerifyStage. I thought about keeping the file handle open, and resuming reading where tailVerify left off. But that risks leaking open file handles, until the GC closes them, if the deferred verification does not get resumed. Since that could perhaps happen if there's an exception somewhere, I decided that was too unsafe. Instead, re-open the file, seek, and resume. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:52:59 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	e07625df8a	convert tailVerify to not finalize the verification Added failIncremental so it can force failure to verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 13:39:02 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of a "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overruled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different pieces of information, and it may be best to only use vector clocks for the same piece of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seem like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
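
The clock-selection rule described in the last entry above (1acdd18ea8) can be sketched roughly as follows. This is an illustrative Python sketch only, not git-annex's Haskell implementation: the function name `next_vector_clock`, the use of plain float seconds, and treating an equal timestamp the same as a newer one (so that a fixed GIT_ANNEX_VECTOR_CLOCK still produces increasing values) are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical helper, not part of git-annex. Timestamps are seconds as floats.
ADVANCE_SECONDS = 1.0  # the commit message advances "by a second"


def next_vector_clock(now: float, newest_in_file: Optional[float]) -> float:
    """Choose the timestamp for a new log line.

    now            -- current time, or the fixed GIT_ANNEX_VECTOR_CLOCK value
    newest_in_file -- newest timestamp already used for the same piece of
                      information in the log file, or None if there is none
    """
    # Assumption: an equal timestamp is treated like a newer one.
    if newest_in_file is not None and newest_in_file >= now:
        return newest_in_file + ADVANCE_SECONDS
    return now


# Local clock set in the past (or an earlier writer's clock set in the future):
skewed = next_vector_clock(now=100.0, newest_in_file=500.0)

# Real time has caught up again, so the wall clock is used as-is:
normal = next_vector_clock(now=600.0, newest_in_file=skewed)

# A fixed clock value of 1 still yields strictly increasing timestamps:
first = next_vector_clock(now=1.0, newest_in_file=None)
second = next_vector_clock(now=1.0, newest_in_file=first)
third = next_vector_clock(now=1.0, newest_in_file=second)
```

Under these assumptions the new line never carries a timestamp older than what is already in the file, which is the property the commit message relies on to avoid a fresh entry being ignored in favor of a stale one.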

705 commits