git-annex

Author	SHA1	Message	Date
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	c4aba8e032	better handling of finishing up incomplete incremental verify Now it's run in VerifyStage. I thought about keeping the file handle open, and resuming reading where tailVerify left off. But that risks leaking open file handles, until the GC closes them, if the deferred verification does not get resumed. Since that could perhaps happen if there's an exception somewhere, I decided that was too unsafe. Instead, re-open the file, seek, and resume. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:52:59 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	e07625df8a	convert tailVerify to not finalize the verification Added failIncremental so it can force failure to verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 13:39:02 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not amoung the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overrulled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different peices of information, and it may be best to only use vector clocks for the same peice of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seems like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
Joey Hess	4b1b9d7a83	Added annex.freezecontent-command and annex.thawcontent-command configs Freeze first sets the file perms, and then runs freezecontent-command. Thaw runs thawcontent-command before restoring file permissions. This is in case the freeze command prevents changing file perms, as eg setting a file immutable does. Also, changing file perms tends to mess up previously set ACLs. git-annex init's probe for crippled filesystem uses them, so if file perms don't work, but freezecontent-command manages to prevent write to a file, it won't treat the filesystem as crippled. When the the filesystem has been probed as crippled, the hooks are not used, because there seems to be no point then; git-annex won't be relying on locking annex objects down. Also, this avoids them being run when the file perms have not been changed, in case they somehow rely on git-annex's setting of the file perms in order to work. Sponsored-by: Dartmouth College's Datalad project	2021-06-21 14:40:52 -04:00
Joey Hess	7b6deb1109	display scanning message whenever reconcileStaged has enough files to chew on Clear visible progress bar first. Removed showSideActionAfter because it can't be used in reconcileStaged (import loop). Instead, it counts the number of files it processes and displays it after it's seen a sufficient to know it's taking a while. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 12:48:30 -04:00
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	a166d2520b	check mincopies is satisfied even when numcopies is known to be satisfied I had been assuming that numcopies would be a larger or at most equal to mincopies, so no need to check both. But users get confused and use configs that don't really make sense, so make sure to handle mincopies being larger than numcopies. Also add something to the mincopies man page to discourage this misconfiguration. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-04-27 13:37:18 -04:00
Joey Hess	d3e49b210a	git-annex-config: Allow setting annex.securehashesonly Which has otherwise been supported since 2019, but was missing from the list of allowed repo-global configs. Reordered the list to match the order in the git-annex-config man page, to make them easy to cross-compare. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-04-26 13:50:37 -04:00
Joey Hess	32138b8cd8	implement annex.privateremote and remote.name.private configs The slightly unusual parsing in Types.GitConfig avoids the need to look at the remote list to get configs of remotes. annexPrivateRepos combines all the configs, and will only be calculated once, so it's nice and fast. privateUUIDsKnown and regardingPrivateUUID now need to read from the annex mvar, so are not entirely free. But that overhead can be optimised away, as seen in getJournalFileStale. The other call sites didn't seem worth optimising to save a single MVar access. The feature should have impreceptable speed overhead when not being used.	2021-04-23 14:21:57 -04:00
Joey Hess	d16d739ce2	implement fastDebug Most of the changes here involve global option parsing: GlobalSetter changed so it can both run an Annex action to set state, but can also change the AnnexRead value, which is immutable once the Annex monad is running. That allowed a debugselector value to be added to AnnexRead, seeded from the git config. The --debugfilter option's GlobalSetter then updates the AnnexRead. This improved GlobalSetter can later be used to move more stuff to AnnexRead. Things that don't involve a git config will be easier to move, and probably a lot of things can be moved eventually. fastDebug, while implemented, is not used anywhere yet. But it should be fast..	2021-04-06 15:24:28 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	798f685077	New annex.supportunlocked config Can beet to false to avoid some expensive things needed to support unlocked files. See my comment for why this only controls what init sets up, and not other behavior. I didn't bother with making the v5 upgrade code path look at this, though it easily could, because the docs say to run git-annex init after setting it to make it take effect.	2021-03-23 14:04:34 -04:00
Joey Hess	5d75cbcdcf	webdav: deal with buggy webdav servers in renameExport box.com already had a special case, since its renaming was known buggy. In its case, renaming to the temp file succeeds, but then renaming the temp file to final destination fails. Then this 4shared server has buggy handling of renames across directories. While already worked around with for the temp files when storing exports now being in the same directory as the final filename, that also affected renameExport when the file moves between directories. I'm not entirely clear what happens on the 4shared server when it fails this way. It kind of looks like it may rename the file to destination and then still fail. To handle both, when rename fails, delete both the source and the destination, and fall back to uploading the content again. In the box.com case, the temp file is the source, and deleting it makes sure the temp file gets cleaned up. In the 4shared case, the file may have been renamed to the destination and so cleaning that up avoids any interference with the re-upload to the destination.	2021-03-22 13:08:18 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, weddav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	6481991208	export --json: Fill in the file field Like import was using ActionItemWorkTreeFile, it's ok to use it for export, even though it might not correspond with a file in the work tree. And renamed it to ActionItemTreeFile to make that clearer. Note that when an export has to rename files, it still uses ActionItemOther, so file will still be null in that case, but as no file is being transferred, that seems ok.	2021-03-12 14:11:31 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	ed684f651e	add incremental hashing interface to Backend As yet unused. Backend.External could perhaps implement it too, although that would involve sending chunks of data to it via a pipe or something, so likely to be slow.	2021-02-09 15:00:51 -04:00
Joey Hess	fa3d71d924	Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself See my comment in the next commit for some details about why Verified needs a hash with preimage resistance. As far as tahoe goes, it's fully cryptographically secure. I think that bup could also return Verified. However, the Retriever interface does not currenly support that.	2021-02-09 13:42:16 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	135757d64a	automatic stall detection annex.stalldetection can now be set to "true" to make git-annex do automatic stall detection when it detects a remote is updating its transfer progress consistently enough. This commit was sponsored by Luke Shumaker on Patreon.	2021-02-03 13:33:57 -04:00
Joey Hess	aec2cf0abe	addon commands Seems only fair, that, like git runs git-annex, git-annex runs git-annex-foo. Implementation relies on O.forwardOptions, so that any options are passed through to the addon program. Note that this includes options before the subcommand, eg: git-annex -cx=y foo Unfortunately, git-annex eats the --help/-h options. This is because it uses O.hsubparser, which injects that option into each subcommand. Seems like this should be possible to avoid somehow, to let commands display their own --help, instead of the dummy one git-annex displays. The two step searching mirrors how git works, it makes finding git-annex-foo fast when "git annex foo" is run, but will also support fuzzy matching, once findAllAddonCommands gets implemented. This commit was sponsored by Dr. Land Raider on Patreon.	2021-02-02 16:32:49 -04:00
Joey Hess	5d2a7f7764	remove blank	2021-01-11 13:15:21 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	cefbfc678d	document what importKey returning Nothing does That was added for thirdpartypopulated remotes, but for others it also has the effect of skipping including the file in the imported tree.	2020-12-30 13:23:16 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	6280af2901	generate more compact git-annex branch for imports Especially from borg, where the content identifier logs all end up being the same identical file! But also, for other imports, the location tracking logs can, in some cases, be identical files. Bonus optimisation: Avoid looking up (and parsing when set) GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to. Although the lookup does happen at startup even when no log will be written now.	2020-12-23 15:25:16 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	c2d6f335a6	notes on ImportableContents history not being used for retrieval	2020-12-22 11:24:11 -04:00
Joey Hess	1c054f1cf7	started borg special remote Still need to implement 3 methods, but importKeyM looks like it will work well to find annex object files.	2020-12-18 16:56:54 -04:00
Joey Hess	3207e8293b	start borg special remote Compiles, but unusable so far.	2020-12-18 16:03:51 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	e81e43b829	improve comment	2020-12-17 13:12:52 -04:00
Joey Hess	170185fb78	improve docs	2020-12-17 12:32:41 -04:00
Joey Hess	6b13574827	Windows: include= and exclude= containing '/' will also match filenames that are written using '\' And vice-versa, but it's better to use '/' for portability. Notably, standardPreferredContent contains "archive/*" and that might not match if the filename ends up coming in with the slashes the other way around.	2020-12-15 12:39:34 -04:00
Joey Hess	5e094d02d6	avoid using MatchingKey where MatchingFile can be used now This is actually matching worktree files, and now that a Key can be provided along with the file when doing that, using MatchingFile reflects that.	2020-12-14 17:54:25 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktreee files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	205a837e8a	clarify comment	2020-12-14 16:52:53 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivilant of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	0c46ee5ce0	simplify transferr protocol	2020-12-11 12:52:22 -04:00
Joey Hess	095cdc7e83	extend transferrer protocol to send progress bar total size updates New protocol is not back-compat with old one, but it's never been released so that's ok.	2020-12-11 12:42:28 -04:00
Joey Hess	94b323a8e8	use TotalSize more extensively	2020-12-11 12:10:43 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00

1 2 3 4 5 ...

662 commits