git-annex

Author	SHA1	Message	Date
Joey Hess	eb594c710e	unregisterurl: New command Implemented by generalizing registerurl. Without the implicit batch mode of registerurl since that is only a backwards compatibility thing (see commit `1d1054faa6`).	2021-03-01 14:28:24 -04:00
Joey Hess	97ae474585	registerurl: Allow it to be used in a bare repository.	2021-03-01 14:03:03 -04:00
Joey Hess	a8b627d82b	uninit: Fix a small bug that left a lock file in .git/annex unannex using git queue caused the queue lock to be taken after uninit had cleaned out .git/annex. Flush the queue earlier to avoid.	2021-03-01 13:05:47 -04:00
Joey Hess	530e96b80e	fix unannex data overwrite bug unannex, uninit: When an annexed file is modified, don't overwrite the modified version with an older version from the annex This commit was sponsored by Mark Reidenbach on Patreon.	2021-02-22 13:35:00 -04:00
Joey Hess	62d5a73bdd	unannex, uninit: Avoid running git rm once per annexed file, for a large speedup.	2021-02-22 12:56:11 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unnecessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	1b63132ca3	add searchPathContents And rename related functions for consistency.	2021-02-02 19:06:15 -04:00
Joey Hess	8d4eb2d34e	get: Improve output when getting a file fails showTriedRemotes lists the remotes it tried to access. So there's no need to list those again in "Try making some of these remotes available".	2021-01-29 15:11:19 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	d4aac64282	fix breakage caused by recent commit `34a535ebea` broke the test suite. Getting a file started failing in one case, because the annex object did not have its inode cached, so was not trusted to be unmodified. This adds something very similar to what was added to linkAnnex in commit `2e9341a47d` -- if there are not yet any inodes cached for a key, add the inode of the annex object when adding the inode of the unlocked file. Feels like this should be handled in a more principled way. How do we know the addInodeCaches call in getMoveRaceRecovery just above this change is currently correct? It doesn't add the annex object inode cache. Ah well, maybe sometime when I've not had my entire evening eaten by a reversion that the test suite caught as I was cooking dinner.	2021-01-25 21:22:18 -04:00
Joey Hess	47338bf270	support modifying and running git add on an unlocked file that used an URL key Avoids the smudge --clean filter failing because URL keys do not support genKey. Instead the modified content will be added using the default backend. This commit was sponsored by Jochen Bartl on Patreon.	2021-01-25 17:37:16 -04:00
Joey Hess	34a535ebea	adjust: Fix some bad behavior when unlocked files use URL keys. This avoids the smudge --clean filter failing on the URL keys. git checkout runs the post-checkout hook, which runs smudge --update. That populates all the pointer files, but it neglected to store their inode caches in the keys db. With that done, and the keys db flushed before smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap check can tell the file is not modified, so will not try to re-ingest it, which does not work with URL keys because they do not support genKey. It also seems possible that the isUnmodifiedCheap was also failing for non-URL keys, which would cause them to be re-ingested, leading to a lot of extra work. I have not verified that, but don't see why it wouldn't have happened. So this probably also speeds up checking out adjusted branches. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-01-25 17:25:42 -04:00
Joey Hess	6a30d04ece	Bug fix: export with -J could fail when two files had the same content. Exporting is done inside a call to writeLockDbWhile which guarantees there is only one process uploading to a given ExportLocation.	2021-01-13 14:50:48 -04:00
Joey Hess	09b0562ec3	test: avoid unnecessary tests of variants of git remote Configuring chunking and encryption for a git remote has no effect, so skip testing those variants in the TestRemote call. It would be better if TestRemote itself could do this, but it doesn't seem possible there. There is no way to look at a Remote and tell if it supports chunking or encryption. Note that, while the test suite displays output as if it's testing exporting, it actually skips doing anything for the tests when run on the git remote. So at least does not waste time even though the output is not ideal. This commit was sponsored by Noam Kremen on Patreon.	2021-01-11 13:43:55 -04:00
Joey Hess	8db09feeba	fix format of message newlines are eaten	2021-01-11 13:14:09 -04:00
Joey Hess	6a0030a110	Behavior change: git-annex trust now needs --force Since unconsidered use of trusted repositories can lead to data loss. Trusted has always been this way, but it used to be acceptable for git-annex to be set up so that data could be lost without using --force, and most or all other ways that can happen have already been eliminated. This commit was sponsored by Mark Reidenbach on Patreon.	2021-01-07 10:09:39 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	6280af2901	generate more compact git-annex branch for imports Especially from borg, where the content identifier logs all end up being the same identical file! But also, for other imports, the location tracking logs can, in some cases, be identical files. Bonus optimisation: Avoid looking up (and parsing when set) GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to. Although the lookup does happen at startup even when no log will be written now.	2020-12-23 15:25:16 -04:00
Joey Hess	7916fc98a3	graft in imported tree to avoid gc Fix a bug that could prevent getting files from an importtree=yes remote, because the imported tree was allowed to be garbage collected.	2020-12-23 14:27:38 -04:00
Joey Hess	1574972ba9	make sync --content get from third-party populated remotes like borg	2020-12-23 12:10:39 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	15000dee07	improve thirdpartypopulated support May actually work now. Note that, importKey now has to add the size to the key if it's supposed to have size. Remote.Directory relied on the importer adding the size, which is no longer done, so it was changed; it was the only one. This way, importKey does not need to behave differently between regular and thirdpartypopulated imports.	2020-12-21 16:19:44 -04:00
Joey Hess	57b03630b3	support thirdPartyPopulated These don't have importTree in their config, because they don't support tree import, but they do still support import, and do not support export or key/value modification.	2020-12-21 13:49:47 -04:00
Joey Hess	771b6c64f0	Merge branch 'master' into borg	2020-12-18 16:05:09 -04:00
Joey Hess	e0062c4f93	build fix	2020-12-18 16:04:56 -04:00
Joey Hess	909318dcee	Merge branch 'master' into borg	2020-12-18 15:27:24 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	f62aee0525	fix handling of importtree-only remotes Don't want to try to use these remotes as key/value remotes, which will surely fail. It only recently became possible for importtree to be set w/o exporttree, so before this code was ok. (cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)	2020-12-18 15:13:30 -04:00
Joey Hess	53fd1564b1	improve synopsis	2020-12-17 12:51:49 -04:00
Joey Hess	2abda21123	update	2020-12-15 16:35:06 -04:00
Joey Hess	f29d49d478	check Remote.hasKeyCheap again In `cd1676d604`, it stopped using that to avoid surprising behavior when the location log and remote content were out of sync. But, it seems that may have changed some behavior users relied on as well, and also Remote.hasKeyCheap should be faster than checking the location log. So, try Remote.hasKeyCheap first, and only if it does not have the key, fall back to checking the location log. If the location log still thinks it's present, go ahead and try to get it, so the user will see a failure rather than silently skipping a file that whereis says is on the remote. This does make slightly slower the case where the remote does not have the key, and location log and Remote.hasKeyCheap agree, since it now checks both. But only 1 stat slower.	2020-12-15 14:44:00 -04:00
Joey Hess	00526a6739	pass along -c options to child git-annex processes	2020-12-15 10:49:29 -04:00
Joey Hess	ed68a2166d	importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support This is common in some feeds, which might mix some items with enclosures, with others that link to posts or whatever. Before this, it would try to use youtube-dl and fail, or if youtube-dl was not allowed, it would incorrectly complain that an url was supported by youtube-dl.	2020-12-15 01:13:21 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktree files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	4a8723246d	avoid transferrer committing the git-annex branch on shutdown The parent will do it when it shuts down, and having both of them trying to do it at the same time seems like something good to avoid.	2020-12-11 16:16:07 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivalent of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	cedad7b37d	refactor	2020-12-10 16:33:52 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00
Joey Hess	004a4f5fb1	factor out Types.Transferrer	2020-12-09 13:28:49 -04:00
Joey Hess	677003a6df	rename helper More consistent name with TransferrerPool	2020-12-09 13:24:24 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatibility code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	fcc9e01556	finally using transferkeys Seems to work! Even progress bars. Have not tested prompting or various error message displays yet. transferkeys had to be made to operate in different modes for the Assistant and Annex monads. A bit ugly, but it did relegate that really ugly Database.Keys.closeDb in transferkeys to only the assistant code path. This commit was sponsored by Noam Kremen.	2020-12-07 16:18:26 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	438d5be1f7	support prompt in message serialization That seems to be the last thing needed for message serialization. Although it's only used in the assistant currently, so hard to tell if I forgot something. At this point, it should be possible to start using transferkeys when performing transfers, which will allow killing a transferkeys process if a transfer times out or stalls. But that's for another day. This commit was sponsored by Ethan Aubin.	2020-12-04 14:54:09 -04:00
Joey Hess	7a9b618d5d	fix problem with last commit and assistant liftAnnex blocks all other calls, so avoid using it with a long-duration call to readResponse.	2020-12-04 12:20:04 -04:00
Joey Hess	cad147cbbf	new protocol for transferkeys, with message serialization Necessarily threw out the old protocol, so if an old git-annex assistant is running, and starts a transferkeys from the new git-annex, it would fail. But, that seems unlikely; the assistant starts up transferkeys processes and then keeps them running. Still, may need to test that scenario. The new protocol is simple read/show and looks like this: TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo")) TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9})) TransferOutput (OutputMessage "(checksum...) ") TransferResult True Granted, this is not optimally fast, but it seems good enough, and is probably nearly as fast as the old protocol anyhow. emitSerializedOutput for ProgressMeter is not yet implemented. It needs to somehow start or update a progress meter. There may need to be a new message that allocates a progress meter, and then have ProgressMeter update it. This commit was sponsored by Ethan Aubin	2020-12-03 16:21:20 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	631c8d3e5b	avoid redundant adjusted branch update in sync sync still does update it if the config would otherwise not, since it already did.	2020-11-16 15:13:48 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	26cf26caca	Merge branch 'master' into symlink-missing	2020-11-16 10:03:12 -04:00
Joey Hess	5a8d01f63e	examinekey: Added a "file" format variable For consistency with find, and for easier scripting.	2020-11-16 09:59:11 -04:00
Joey Hess	ccfa9b2dc4	make sync update --unlock-present branch	2020-11-13 15:04:34 -04:00
Joey Hess	e66b7d2e1b	rename to --unlock-present and better reverse adjusting An --unlock-present branch reverses back to a branch where all files that get modified or renamed become locked, even if they were originally unlocked. This is the same way that reversing a --unlock branch works, and the new name makes that commonality more clear.	2020-11-13 14:56:43 -04:00
Joey Hess	3899e216af	Merge branch 'master' into symlink-missing	2020-11-13 14:19:45 -04:00
Joey Hess	a30030c4a6	move: Fix a regression in the last release that made move --to not honor numcopies settings This commit was sponsored by Svenne Krap on Patreon.	2020-11-13 14:19:32 -04:00
Joey Hess	c8e49c5ef5	git-annex adjust --lock-missing Like --hide-missing the branch does not get updated when content availability changes. Seems to basically work, but sync does not update it yet. Also, when a file is present and so unlocked, git mv followed by git-annex sync results in the basis branch being updated to contain the file with the new name, unlocked. This seems different than what happens in an adjusted unlocked branch, where the commit propagates back locked. Probably the reverse adjustment code needs to be improved to handle this case.	2020-11-13 13:39:44 -04:00
Joey Hess	7566aa6bc5	examinekey: Added --migrate-to-backend Note that, the way the SeekInput parser is written to support batch mode, it's actually possible to do git-annex examinekey "SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E While that might be kind of useful to support multiple migrations not using batch mode, I have not documented it. It would be better to take pairs of key and file in that case.	2020-11-12 14:09:14 -04:00
Joey Hess	12e32d1dee	examinekey: Added two new format variables: objectpath and objectpointer	2020-11-12 13:02:31 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding an annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	e81bb05b25	add debug in two unusual situations	2020-11-09 17:52:06 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	4bcb4030a5	more RawFilePath conversion 580/645 This commit was sponsored by Jack Hill on Patreon.	2020-11-03 18:34:27 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	55400a03d3	more RawFilePath conversion This commit was sponsored by Luke Shumaker on Patreon.	2020-11-02 16:31:28 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	a108b00b33	testremote: Display exceptions when tests fail, to aid debugging	2020-10-23 15:41:57 -04:00
Joey Hess	0133b7e5a8	move: Improve resuming a move that was interrupted after the object was transferred In cases where numcopies checks prevented the resumed move from dropping the object from the source repository, it now relies on a log of recent moves to replicate the behavior of the interrupted command. Performance: Probably noticeable impact, since it has to add to the log, check the log, and remove from the log. Seems worth it to avoid this annoying edge case. The log functions are pretty well optimised to avoid unnecessary work. A performance improvement to make later would be to avoid cleanup doing anything if it's not written to the log file, and has confirmed that the log file does not contain the log line. This commit was sponsored by Jake Vosloo on Patreon.	2020-10-21 10:31:56 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	2dd38b6403	switch to Haskell2010 When I put in Haskell98 this spring, I was under the mistaken apprehension that ghc defaulted to that. But actually its default is a third mode, which is closer to Haskell2010 but with some differences. The manual says "By default, GHC mainly aims to behave (mostly) like a Haskell 2010 compiler" Fixed two cases where the Haskell98 do indentation flexibility let wrongly indented code build. That is one of the places where ghc does not behave like Haskell2010 by default. The other place that I think I was concerned about, is GHC manual section 19.1.1.3. Expressions and patterns. But that only seems to affect code using bottoms, so would only affect pure functions throwing an error, which I don't think git-annex does in many places as it's pretty horrid style. And it would only affect rare cases like shown in that section. If it did happen, it would mean that the error was not thrown before specifying Haskell98, and then was. Haskell2010 behaves the same as Haskell98. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-19 11:26:16 -04:00
Joey Hess	c56efbbdb6	import: Check gitignores when importing trees from special remotes It seemed best to do this, for consistency with every other way files can get into a git-annex repo. Although it's just a bit strange that a local .gitignore file affects the pseudo-commits made for the remote that's imported from. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-30 10:41:59 -04:00
Joey Hess	0033e08193	avoid a second traversal of the ImportableContents Do all filtering in one pass.	2020-09-30 10:10:03 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	1610d94776	addurl: Avoid a redundant git ignores check for speed Ensure that checkCanAdd is used everywhere a file is added to git, so git add is run with -f, presumably avoiding the work it would usually do to check ignores.	2020-09-29 13:00:41 -04:00
Joey Hess	658ea7ca3c	sync --no-content import from directory special remote sync: When run without --content, import without copying from importtree=yes directory special remotes. (Other special remotes may support this later as well.) This commit was sponsored by Svenne Krap on Patreon.	2020-09-28 15:29:08 -04:00
Joey Hess	3eaaec3113	consistently use importKey when available This avoids import with --no-content and with --content potentially generating two different trees, leading to a merge conflict when run in two different clones of a repo. And it's necessary groundwork to make git-annex sync --no-content import from special remotes that support importKey. Only the directory special remote currently supports importKey, and it generates the same key as git-annex usually does, so there is no behavior change for it. Future special remotes will need to take care when adding importKey, if it generates different keys. Added some warnings about that to comments. This commit was sponsored by Noam Kremen on Patreon.	2020-09-28 15:27:46 -04:00
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	f624876dc2	remove zombie process in file seeking This was the last one marked as a zombie. There might be others I don't know about, but except for in the hypothetical case of a thread dying due to an async exception before it can wait on a process it started, I don't know of any. It would probably be safe to remove the reapZombies now, but let's wait and do that in its own commit in case it turns out to cause problems. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-25 11:38:42 -04:00
Joey Hess	ca454c47f2	explicitly wait for a git process Eliminate a zombie that was only cleaned up by the later zombie cleanup code. This is still not ideal, it would be cleaner if it used conduit or something, and if the thread gets killed before waiting, it won't stop the process. Only remaining zombies are in CmdLine.Seek	2020-09-25 11:03:12 -04:00
Joey Hess	051e16a945	remove debug print	2020-09-24 15:37:39 -04:00
Joey Hess	d89984b121	sync --all avoid unnecessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	b45b37b088	wait for first pass to complete before second pass Otherwise the bloom filter may not be fully populated when the second pass starts, which could have led to incorrect behavior with --all -J, probably in very rare circumstances.	2020-09-24 14:23:25 -04:00
Joey Hess	167da965b9	remove obsolete comment	2020-09-24 14:22:56 -04:00
Joey Hess	c1b4d76e6b	make MatchFiles introspectable matchNeedsFileContent is not used yet, but shows how to add information about terminals. That one would be needed for https://git-annex.branchable.com/todo/sync_fast_import/ Note the tricky bit in Annex.FileMatcher.call where it folds over the included matcher to propagate the information. This commit was sponsored by Svenne Krap on Patreon.	2020-09-24 14:01:53 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	3457b526ef	make git-annex add --no-check-gitignore not skip ignored files, same as with --force	2020-09-18 13:33:35 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases multiple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	2a3c2b1843	use Branch.name instead of hard coding the branch name Makes much more clear why ActionItemOther is being passed "git-annex".	2020-09-15 15:47:22 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	f4c4b89aa3	refactor Make all calls to git merge go through autoMergeFrom, in preparation for fine-tuning git merge's config for automatic merge conflict resolution. This commit was sponsored by Ryan Newton on Patreon.	2020-09-07 13:26:16 -04:00
Joey Hess	46eb48d7c0	Retry transfers to exporttree=yes remotes same as for other remotes The comment about noRetry is not well-justified, because transfers to many remotes cannot be resumed, but retries are still allowed for those.	2020-09-04 13:24:08 -04:00
Joey Hess	7bdb0cdc0d	add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess Fixes reversion in 8.20200617 that made annex.pidlock being enabled result in some commands stalling, particularly those needing to autoinit. Renamed runsGitAnnexChildProcess to make clearer where it should be used. Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad-hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command.	2020-08-25 14:57:49 -04:00
Joey Hess	2ca1ff62dc	addurl --file youtube-dl reversion fix addurl: Fix reversion in 7.20190322 that made --file not be honored when youtube-dl was used to download media. `8758f9c561` was on the right track, but missed that | otherwise prevented the code it added from being used. Also, refactored out a common function. This commit was sponsored by Graham Spencer on Patreon.	2020-08-25 12:56:45 -04:00
Joey Hess	4c58433c48	avoid using MonadFail in ParseDuration There's no instance for Either String, so that makes it not as useful as it could be, so instead just return an Either String.	2020-08-15 15:53:35 -04:00
Joey Hess	5d380c6c5c	when workTreeItems finds a problem with a parameter, don't go on to process it Part of workTreeItems is trying to detect a case where git porcelain refuses to process a file, and where git ls-files silently outputs nothing. But, it's hard to perfectly replicate git's behavior, and besides, git's behavior could change. So it could be that we warn, but then git ls-files does not skip over it, and so git-annex also processes it after warning about it. So, if we think we have a problem with a parameter, display the warning, and skip processing it at all. Implementing this was complicated by needing to handle the case where all command-line parameters get filtered out this way. Which is different than the case where there are none, because we don't want to operate on all files in this new case.	2020-08-06 13:47:45 -04:00
Joey Hess	283d2f85d1	importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_' sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was running it on parts of the template. So eg the leading '.' in the extension got sanitized. Note the added case for sanitizeLeadingFilePathCharacter ('/':_) -- this was added because, if the template is title/episode and the title is not set, it would expand to "/episode". So this is another potential security fix.	2020-08-05 11:35:00 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unnecessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	c30fd24d91	add back inAnnex check after seeking The test suite noticed this case, where two files with the same key are dropped, and the seek stage sees both have content due to the way files stream through it. But then locking the content to drop fails on the second file, because the first file has already been dropped. So, add back otherwise redundant inAnnex check.	2020-07-25 11:18:50 -04:00
Joey Hess	18f1fb5841	drop performance improvements Sped up seeking files to drop by 2x, and also some performance improvements to checking numcopies. Interestingly, the seek speedup is not due to precaching, but I think is due to calling getParsed earlier. Annex.Drop had to be changed to check inAnnex there, since it was removed from Command.Drop. All other users of Command.Drop already checked inAnnex themselves. This commit was sponsored by Ryan Newton on Patreon.	2020-07-24 13:27:46 -04:00
Joey Hess	a01aa214be	enable location log precaching for mirror It will be some perf increase, but the command is not much used so I have not bothered to benchmark it.	2020-07-24 13:19:24 -04:00
Joey Hess	d732ef1a89	move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x.	2020-07-24 12:56:02 -04:00
Joey Hess	00865cdae8	Fix a bug in find --branch in the previous version inAnnex check was lost for that code path. To avoid more such mistakes, made withKeyOptions check it when the AnnexedFileSeeker specifies.	2020-07-24 12:05:28 -04:00
Joey Hess	2d771a7d32	add back inAnnex check for keys options Lost in recent commit.	2020-07-24 11:49:15 -04:00
Joey Hess	4685612f43	small git-annex get speedup Remove a redundant inAnnex check. The checkContentPresent handles that, and after the last commit also does in batch mode.	2020-07-22 14:29:30 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	abd56fb019	Fix a bug in find --batch in the previous version.	2020-07-20 19:50:53 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	a7156b875c	fix fsck reversion `75aab72d23` made fsck skip files whose content is not present, but it should complain if there are not enough copies.	2020-07-15 11:21:43 -04:00
Joey Hess	9c23f99d45	add back missing check that content is present Lost in `75aab72d23` and some related commits. unannex skips files whose content is not present.	2020-07-15 11:15:28 -04:00
Joey Hess	377866d884	remove unused import	2020-07-14 14:37:40 -04:00
Joey Hess	7b2d236556	importfeed: stream metadata for 5% speedup On top of the 10% speedup from streaming url logs.	2020-07-14 14:35:26 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	df58609804	convert sync to use seekFilteredKeys This only speeds up sync --content from 34.75 to 33.17 seconds; location log precaching will probably be a bigger win.	2020-07-13 15:02:52 -04:00
Joey Hess	88a7fb5cbb	convert all applicable commands to new 2x faster annexed file seeking This removes all calls to inAnnex, except for some involving --batch. It may be that the batch code could get a similar speedup, but I don't know if people habitually pass a huge number of files through --batch that git-annex does not need to do anything to process, so I skipped it for now. A few calls to ifAnnexed remain, and might be worth doing more to convert. In particular, Command.Sync has one that would probably speed it up by a good amount. (also removed some dead code from Command.Lock)	2020-07-10 15:45:38 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	4229713e63	importfeed: Added some additional --template variables for date and time This commit was sponsored by Ethan Aubin.	2020-06-24 14:24:50 -04:00
Joey Hess	7757c0e900	Honor annex.largefiles when importing a tree from a special remote. This commit was sponsored by Martin D on Patreon.	2020-06-23 16:07:18 -04:00
Joey Hess	5098236c6b	testremote: Fix over-allocation of resources and bad caching Including starting up a large number of external special remote processes. (Regression introduced in version 8.20200501)	2020-06-22 14:25:49 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	d5451afc8f	fix deadlock Fix a deadlock that could occur after git-annex got an unlocked file, causing the command to hang indefinitely. Known to happen on vfat filesystems, possibly others. Note that a deadlock is still theoretically possible, if anything smudge --clean does causes it to run the git queue for some other reason. Apparently that doesn't happen, but will need to keep an eye on it.	2020-06-18 12:56:29 -04:00
Joey Hess	96f6aa39dd	add runsGitAnnexChildProcess calls This is all the calls to git-annex that seem capable of possibly locking the same pidlock as their parent. Except possibly for some in the assistant.	2020-06-17 15:31:03 -04:00
Joey Hess	c4f2c56f5e	checkpresentkey: fix behavior to match documentation checkpresentkey: When no remote is specified, try all remotes, not only ones that the location log says contain the key. This is what the documentation has always said it did. Still try the logged remotes first, because they are far more likely to have the key.	2020-06-16 13:54:26 -04:00
Joey Hess	2670890b17	convert to withCreateProcess for async exception safety This handles all createProcessSuccess callers, and aside from process pools, the complete conversion of all process running to async exception safety should be complete now. Also, was able to remove from Utility.Process the old API that I now know was not a good idea. And proof it was bad: The code size went down, despite there being a fair bit of boilerplate for some future API to reduce.	2020-06-04 15:45:52 -04:00
Joey Hess	92f775eba0	convert to withCreateProcess for async exception safety Not yet 100% done, so far I've grepped for waitForProcess and converted everything that uses that to start the process with withCreateProcess. Except for some things like P2P.IO and Assistant.TransferrerPool, and Utility.CoProcess, that manage a pool of processes. See #2 in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7 for how those will need to be dealt with. checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess call waitForProcess, so callers of them will also need to be dealt with, and have not been yet.	2020-06-03 15:48:09 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	484a74f073	auto-init autoenable=yes Try to enable special remotes configured with autoenable=yes when git-annex auto-initialization happens in a new clone of an existing repo. Previously, git-annex init had to be explicitly run to enable them. That was a bit of a wart of a special case for users to need to keep in mind. Special remotes cannot display anything when autoenabled this way, to avoid interfering with the output of git-annex query commands. Any error messages will be hidden, and if it fails, nothing is displayed. The user will realize the remote isn't enabled when they try to use it, and can run git-annex init manually then to try the autoenable again and see what failed. That seems like a reasonable approach, and it's less complicated than communicating something across a pipe in order to display it as a side message. Other reason not to do that is that, if the first command the user runs is one like git-annex find that has machine readable output, any message about autoenable failing would need to not be displayed anyway. So better to not display a failure message ever, for consistency. (Had to split out Remote.List.Util to avoid an import cycle.)	2020-05-27 12:40:35 -04:00
Joey Hess	3824645368	change to new waitForAllRunningCommandActions waitForAllRunningCommandActions is a subset of finishCommandActions and more appropriate for what is being done here: Just a concurrency barrier.	2020-05-26 14:00:51 -04:00
Joey Hess	864ba4ecaa	disable buggy concurrency in Command.Export Fix a crash or potentially not all files being exported when sync -J --content is used with an export remote. Crash as described in fixed bug report. waitForAllRunningCommandActions inserted in several points where all the commandActions started before need to have finished before moving on to the next stage of the export. A race across those points could have maybe resulted in not all files being exported, or a wrong tree being exported. For example, changeExport starting up an action like a rename of A to B. Then, with that action still running, fillExport uploading a new A, before the rename occurred. That race seems unlikely to have happened. There are some other ones that this also fixes.	2020-05-26 13:54:08 -04:00
Joey Hess	e04a931439	improve transfer stages for some commands move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup actions in separate job pool from uploads. transferStages was confusingly named, it's only useful when doing downloads as then the verify actions can be run concurrently with other downloads. For commands that upload, there will be more concurrency from running cleanup actions in a separate job pool. As for sync, I left it using downloadStages although that's not optimal for the part of a sync that uploads. Perhaps it should use the union of both?	2020-05-26 11:55:50 -04:00
Joey Hess	0d82a88742	drop: use commandStages, not transferStages I cannot find any rationale for why this was changed before. drop certainly does not do any transfers, so commandStages will perform better.	2020-05-26 11:47:54 -04:00
Joey Hess	0bcecb67f5	export: Let concurrent transfers be done with -J or annex.jobs Tested working, although I did find this bug in my testing, which also afflicts sync -J to an export remote.	2020-05-26 11:44:07 -04:00
Joey Hess	f7fe71602c	import: Added --json-progress Already supported --json, but not that. Also checked all other commands that only support --json, and the only other one that does transfers is fsck (--from), which it did not seem worth adding --json-progress to really.	2020-05-26 11:27:47 -04:00
Joey Hess	5b8524e1e6	addurl: Make --preserve-filename also apply when eg a torrent contains multiple files Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath here.	2020-05-26 10:45:57 -04:00
Joey Hess	fc9833f68d	export: Added options for json output Just worked, no need to do anything except add the options.	2020-05-26 10:31:10 -04:00
Joey Hess	d7c7245438	whereis: Added --format option. One way this can be used is to remove all urls for some website that went away: git-annex whereis --format '${file} ${url}\0' | grep -z whatever.com | git-annex rmurl --batch -z Combining ${url} and ${uuid} is a bit of a combinatorial explosion. It didn't seem worth only outputting a uuid alongside an url belonging to it, so each uuid is output beside each url.	2020-05-19 16:20:56 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00
Joey Hess	037440ef36	convert removeExportDirectory to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 14:43:18 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	dc7dc1e179	refactor	2020-05-14 14:21:58 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	39d7e6dd2a	addurl --preserve-filename for other remotes Finishing work begun in `6952060665` Also, truncate filenames provided by other remotes if they're too long, when --preserve-filename is not used. That seems to have been omitted before by accident.	2020-05-11 14:33:27 -04:00
Joey Hess	5f5170b22b	remove SafeFilePath Move sanitizeFilePath call to where fromSafeFilePath had been.	2020-05-11 14:04:56 -04:00
Thomas Koch	8a0480daf3	Fix haddock parse error I run haddock with `cabal haddock --executables`. It fails with: Types/Remote.hs:271:17: error: parse error on input ‘->’ Apparently haddock does not like to find haddock blocks outside of declarations? In any case, this patch makes these type of errors go away. Afterwards, I see errors like these, that need to be investigated as a next step: haddock: internal error: internal: extractDecl CallStack (from HasCallStack): error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create	2020-05-11 08:40:13 +02:00
Joey Hess	6952060665	addurl --preserve-filename and a few related changes * addurl --preserve-filename: New option, uses server-provided filename without any sanitization, but with some security checking. Not yet implemented for remotes other than the web. * addurl, importfeed: Avoid adding filenames with leading '.', instead it will be replaced with '_'. This might be considered a security fix, but a CVE seems unwarranted. It was possible for addurl to create a dotfile, which could change behavior of some program. It was also possible for a web server to say the file name was ".git" or "foo/.git". That would not overwrite the .git directory, but would cause addurl to fail; of course git won't add "foo/.git". sanitizeFilePath is too opinionated to remain in Utility, so moved it. The changes to mkSafeFilePath are because it used sanitizeFilePath. In particular: isDrive will never succeed, because "c:" gets munged to "c_" ".." gets sanitized now ".git" gets sanitized now It will never be null, because sanitizeFilePath keeps the length the same, and splitDirectories never returns a null path. Also, on the off chance a web server suggests a filename of "", ignore that, rather than trying to save to such a filename, which would fail in some way.	2020-05-08 16:22:55 -04:00
Joey Hess	0040d2c129	sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from Now the warning gets displayed, which is better than an arcane git error. The warning is still kind of ugly, especially when the pull later in the sync will clear up what it warns about. But, this is an unusual situation not likely to happen, and if there is no remote to pull from, the warning message is needed or the sync will seem to succeed despite not merging the synced master branch. Would still be better if it could merge the synced master branch in this situation, making an empty commit to master to do it seems wrong, and otherwise it would need a whole separate code path, and would bypass using git merge in favor of say, setting master to the synced branch. Which would bypass git configs like arguably merge.ff and certainly merge.verifySignatures. So don't want to do that.	2020-05-05 14:31:37 -04:00
Joey Hess	9fa940569c	added remote variants Todo item is done at last. Might later want to think about testing some other types of remotes that can be tested locally. The git remote itself is probably already well enough tested by the test suite that testremote is not needed. Could test things like bup, or rsync to a local directory. Or even external, although that would require embedding an external special remote program into the test suite..	2020-04-30 13:52:03 -04:00
Joey Hess	fc1ae62ef1	added export remote tests	2020-04-30 13:13:08 -04:00
Joey Hess	735d2e90df	testremote in test is working Not yet testing export, or remote variants, but it already adds several hundred test cases, so big win.	2020-04-30 12:59:20 -04:00
Joey Hess	d7db481471	wip This does not compile, and I hit a bad dead end. Wah.	2020-04-29 15:48:39 -04:00
Joey Hess	20f954c3b2	groundwork for adding testremote to git-annex test Factored out a mkTestTree, which can be used to get a TestTree, w/o needing to first run any annex actions, which the main test suite cannot do because it does not operate in an annex repo to start with, and it needs to start testing before a repo is available.	2020-04-29 13:16:43 -04:00
Joey Hess	fa98025de0	fix testremote to not throw away annex state `aeca7c2207` exposed this problem, but it was never a good idea to have a series of test cases, some of which depend on prior ones, and throw away annex state after each.	2020-04-28 17:19:07 -04:00
Joey Hess	19b5137227	addurl --fast error message improvement addurl: When run with --fast on an url that annex.security.allowed-ip-addresses prevents accessing, display a more useful message. (Also importfeed --fast potentially.)	2020-04-27 13:48:14 -04:00
Joey Hess	c05c4e549e	sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes Do not sync with a faster remote that was not specified. That old behavior was only documented in the changelog, and was certainly surprising. It also meant adding --fast made it slower.	2020-04-23 16:08:45 -04:00
Joey Hess	cd1676d604	fix bug involving local git remote and out of date location log get --from, move --from: When used with a local git remote, these used to silently skip files that the location log thought were present on the remote, when the remote actually no longer contained them. Since that behavior could be surprising, now instead display a warning. I got very confused when I encountered this behavior, since it was silently skipping a file I needed that whereis said was on the remote. get without --from already displayed an "unable to access these remotes" message, which while a bit misleading in that the remote is likely accessible, but just doesn't contain the file, at least indicated something went wrong. Having get --from display a warning makes it in line with get w/o --from, so seems certainly ok. It might be there are situations where move --from is used, on eg a whole directory, and the user only wants to move whatever is present in the remote, and is perfectly ok with files that are not present being skipped. So I'm less sure about the new warning being ok there. OTOH, only local git remotes avoiding displaying a warning in that case too, so this just brings them into line with other remotes. (Also note that this makes it a little bit faster when dealing with a lot of files, since it avoids a redundant stat of the file.)	2020-04-21 12:36:58 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unnecessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	957a87b437	fix absolute filenames fed into --batch and git-annex info	2020-04-15 16:04:05 -04:00
Joey Hess	f85ca7dc80	fix all remaining -Wincomplete-uni-patterns warnings A couple of these were probably actual bugs in edge cases. Most of the changes I'm fine with. The fact that aeson's object returns something that we know will be an Object, but the type checker does not know is kind of annoying.	2020-04-15 13:55:08 -04:00
Joey Hess	9cb69dbb76	support boolean git configs that are represented by the name of the setting with no value Eg, "core.bare" is the same as "core.bare = true". Note that git treats "core.bare =" the same as "core.bare = false", so the code had to become more complicated in order to treat the absence of a value differently than an empty value. Ugh.	2020-04-13 13:35:22 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominates over an MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other daemon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c0cd07c36b	Ref ByteString conversion done Test suite passes.	2020-04-07 17:41:09 -04:00
Joey Hess	435c722904	remove unused import	2020-03-30 16:07:10 -04:00
Joey Hess	87d5583a91	use programPath consistently, not readProgramFile Improve git-annex's ability to find the path to its program, especially when it needs to run itself in another repo to upgrade it. Some parts of the code used readProgramFile, probably because I forgot that programPath exists. I noticed this when a git-annex auto-upgrade failed because it was running git-annex upgrade --autoonly, but the code to run git-annex used readProgramFile, which happened to point to an older build of git-annex.	2020-03-30 16:06:27 -04:00
Joey Hess	0e4d80d5c1	remove pre-commit hook This was originally added so that unannex could prevent the hook from running while files were in a state that the hook would interpret as old-style unlocked and so would lock. Now that's gone, so the only thing the hook was preventing was two pre-commit processes running simultaneously. But such concurrency is normal in git-annex and should not be a problem. Does mean that .git/hooks/pre-commit-annex might run more concurrently, that seems the only risk of it causing any problems.	2020-03-30 11:54:04 -04:00
Kyle Meyer	39131b55ca	add --force-small: Send all non-regular files through addFile Running `git annex add --force-small` on a modified submodule fails when the submodule path is fed to hash-object. This failure is unlikely to be triggered by a caller passing a submodule explicitly to `git annex add` because there's nothing useful that annex-add can do with a submodule. A more likely scenario for hitting this failure is that the caller passes "." or a subdirectory to `annex-add` while a submodule underneath the specified path happens to be modified. addSmallOverridden already routes symbolic links through addFile rather than using the custom hash-object/update-index call. The latter is valid only for regular files, so extend this condition so that everything that isn't a regular file goes through addFile. Doing so avoids the above error because submodules come in as directories.	2020-03-26 13:14:16 -04:00
Kyle Meyer	339aebc6ad	add --force-small: Don't dereference link when checking file status addSmallOverridden calls getFileStatus and then checks the result with isSymbolicLink. getFileStatus dereferences symbolic links, so isSymbolicLink will always return false (assuming the getFileStatus call doesn't fail on a broken link). Use getSymbolicLinkStatus instead.	2020-03-26 13:11:27 -04:00
Joey Hess	afe72d04ff	fix problems with upgrade of local remotes Upgrade other repos than the current one by running git-annex upgrade inside them, which avoids problems with upgrade code making assumptions that the cwd will be inside the repo being upgraded. In particular, this fixes a problem where upgrading a v7 repo to v8 caused an ugly git error message. I actually could not find a way to make Upgrade.V7 work properly without changing directory to the remote. Once I got git ls-files to work, the git cat-file failed because :path can only be used in the current git repo.	2020-03-09 16:49:28 -04:00
Joey Hess	1978a24207	Fix bug that caused unlocked annexed dotfiles to be added to git by the smudge filter when annex.dotfiles was not set.	2020-03-09 14:20:02 -04:00
Joey Hess	01acb5212e	fix build	2020-03-09 12:31:14 -04:00
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	eaa49ab53d	convert replaceFile to createDirectoryUnder Since it was used on both worktree and .git/annex files, split into multiple functions. In passing, this also improves permissions of created directories in .git/annex, using createAnnexDirectory on those.	2020-03-06 11:31:01 -04:00
Joey Hess	ccd8c43dc8	git-annex config: guard against non-repo-global configs git-annex config: Only allow configs to be set that are ones git-annex actually supports reading from repo-global config, to avoid confused users trying to set other configs with this.	2020-03-02 15:54:18 -04:00
Joey Hess	f6d629e483	changelog and minor style	2020-02-28 12:57:55 -04:00
Peter Simons	73cf523a4b	Fix build with ghc-8.8.x. The 'fail' method has been moved to the 'MonadFail' class. I made the changes so that the code still compiles with previous versions of 'base' that don't have the new MonadFail class exported by Prelude yet.	2020-02-28 12:54:20 -04:00
Joey Hess	2366e7fb84	catch whereisKey exception and provide error messages when external programs neglect to * whereis: If a remote fails to report on urls where a key is located, display a warning, rather than giving up and not displaying any information. * When external special remotes fail but neglect to provide an error message, say what request failed, which is better than displaying an empty error message to the user.	2020-02-27 14:09:18 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	d37975357d	Bugfix: export --tracking (a deprecated option) set annex-annex-tracking-branch, instead of annex-tracking-branch. (cherry picked from commit `a3a674d15b`)	2020-02-26 18:08:04 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	79a0435b77	automate remote.name.skipFetchAll initremote, enableremote: Set remote.name.skipFetchAll when the remote cannot be fetched from by git, so git fetch --all will not try to use it.	2020-02-19 13:58:26 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework remoteAnnexConfig will avoid bugs like `a3a674d15b` Use now more generic remoteConfig in a couple places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	a3a674d15b	Bugfix: export --tracking (a deprecated option) set annex-annex-tracking-branch, instead of annex-tracking-branch.	2020-02-19 13:34:24 -04:00
Joey Hess	72959b23e5	remove mention of receive.denyNonFastforwards on push failure That was added back in 2013 commit `2af652e1b8` and I'm a bit unclear about the reasons. It seemed that, at the time, receive.denyNonFastforwards=true, which is the default in a repo created by git init --shared --bare (but not without --shared), which the assistant did, caused problems syncing. But even at the time the bug report showed an error message clearly explaining that it was a non-fast-forward push being denied. I tried it with the current version, and since git-annex sync pulls from the bare repo and merges, it pushes a fast-forward. So there's no failure to push. (There could be one if another push happened after the pull, but you'd want it to fail then presumably.) I'm not 100% sure what changed to make it not be a problem, but I know I've seen this message in many circumstances and I can't ever recall it having anything to do with any issue that prevented a push. Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn, which showed the problem when syncing from a direct mode repo, and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment which seems to show the problem was actually a problem pulling, I think there's a good chance that the problem actually involved direct mode.	2020-02-19 11:46:24 -04:00
Joey Hess	06f6eb7a70	--only-annex --no-content combination	2020-02-18 12:29:31 -04:00
Joey Hess	a78eb6dd58	sync --only-annex and annex.synconlyannex * Added sync --only-annex, which syncs the git-annex branch and annexed content but leaves managing the other git branches up to you. * Added annex.synconlyannex git config setting, which can also be set with git-annex config to configure sync in all clones of the repo. Use case is when the user has their own git workflow, and wants to use git-annex without disrupting that, so they sync --only-annex to get the git-annex stuff in sync in addition to their usual git workflow. When annex.synconlyannex is set, --not-only-annex can be used to override it. It's not entirely clear what --only-annex --commit or --only-annex --push should do, and I left that combination not documented because I don't know if I might want to change the current behavior, which is that such options do not override the --only-annex. My gut feeling is that there is no good reason to use such combinations; if you want to use your own git workflow, you'll be doing your own committing and pulling and pushing. A subtle question is, how should import/export special remotes be handled? Importing updates their remote tracking branch and merges it into master. If --only-annex prevented that git branch stuff, then it would prevent exporting to the special remote, in the case where it has changes that were not imported yet, because there would be an unresolved conflict. I decided that it's best to treat the fact that there's a remote tracking branch for import/export as an implementation detail in this case. The more important thing is that an import/export special remote is entirely annexed content, and so it makes a lot of sense that --only-annex will still sync with it.	2020-02-17 16:33:10 -04:00
Joey Hess	879f52a116	annex.tune.branchhash1=true bugfix Fix support for repositories tuned with annex.tune.branchhash1=true, including --all not working and git-annex log not displaying anything for annexed files.	2020-02-14 15:22:48 -04:00
Joey Hess	352963690a	fsck --from remote -J concurrency bug fsck --from remote: Fix a concurrency bug that could make it incorrectly detect that content in the remote is corrupt, and remove it, resulting in data loss.	2020-02-14 14:52:15 -04:00
Joey Hess	1883f7ef8f	support git remotes that need http basic auth using git credential to get the password One thing this doesn't do is wrap the password prompting inside the prompt action. So with -J, the output can be a bit garbled.	2020-01-22 16:16:19 -04:00
Joey Hess	5c6bf1be97	--whatelse is a better name than --describe-other-params The use case is basically the user having forgotten, so --help would be best, but it would be quite hard to include this in --help, since it may even have to spin up an external special remote program. I also considered --umm but typoed it the first time I tried it as --uum, and while memorable, it's too cutesy. --whatelse is good because it explicitly asks, what other params, besides the ones I've given?	2020-01-20 17:04:45 -04:00
Joey Hess	2be4122bfc	include passthrough params in --describe-other-params	2020-01-20 16:53:27 -04:00
Joey Hess	aa949bbb7d	initremote --describe-other-params Does not yet include descriptions from external special remote programs.	2020-01-20 16:05:51 -04:00
Joey Hess	987076690c	started on --list-params-for	2020-01-15 14:09:30 -04:00
Joey Hess	6a982e38eb	a few more field functions	2020-01-15 12:57:56 -04:00
Joey Hess	2edf0506a5	a few forgotten remote config fields preferreddir can be used with any special remote, so its parser needs to be included in the commonFieldParsers. initremote with uuid= changed to delete that field, so it does not need to be included in commonFieldParsers. Note that, existing remotes initialized before this change will have the field in remote.log. This will not cause problems parsing, because the value will be Accepted. Grepping for 'Accepted "' found these, and I'm pretty sure this is all of them.	2020-01-15 11:22:36 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse fails. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	6db4aee7df	use --no-abbrev instead of --abbrev=40 This avoids hardcoding the sha size, so when git uses sha256, it will output the full sha256 and not a truncation to 40 characters. I reviewed git's history, and while there have been some bugs with commands not supporting --no-abbrev (eg git diff --no-index --no-abbrev was broken in git 2.1), none of the commands git-annex uses will be impacted by those old bugs.	2020-01-07 12:29:37 -04:00
Joey Hess	5e4deb3620	support sha256 git repos Git will eventually switch to sha2 and there will not be one single shaSize anymore, but two (40 and 64). Changed all parsers for git plumbing output to support both sizes of shas. One potential problem this does not deal with is, if somewhere in git-annex it reads two shas from different sources, and compares them to see if they're the same sha, it would fail if they're sha1 and sha256 of the same value. I don't know if that will really be a concern.	2020-01-07 12:22:19 -04:00
Joey Hess	2de3dddfd2	reinject --known: Fix bug that prevented it from working in a bare repo. ifAnnexed in a bare repo passes to git cat-file :./filename , which it refuses to do since the repo is bare. Note that, reinject somefile someannexedfile in a bare repo silently does nothing, because someannexedfile is never actually an annexed worktree file, because the repo is bare.	2020-01-06 14:22:22 -04:00
Joey Hess	2cea674d1e	Merge branch 'master' into v8	2020-01-01 14:26:43 -04:00
Joey Hess	503788238c	add --force-annex/--force-git options make it easier to override annex.largefiles configuration (and potentially safer as it avoids bugs like the smudge bug fixed in the last release) Deleted some old comments that were posted to the man page discussing such options. Updated docs that used -c annex.largefiles to use the options. Note that addSmallOverridden was needed to avoid the clean filter running on the file. It would be possible to make addFile also update the index directly, rather than going via git add. However, it was not necessary, and I want to avoid breaking on some edge case, particularly if the code in addSmallOverridden has some oversight. Also, when annex.addunlocked is set and annex.largefiles does not match a file, git annex add --force-large works, but git status will then show the file as added, with an unstaged modification. The unstaged modification adds the file to git. This is identical behavior to using -c annex.largefiles=nothing when annex.addunlocked is set. This does not prevent committing what was intended to be added. I have not gotten to the bottom of why git thinks the file is modified and runs it through the clean filter in this case.	2020-01-01 14:03:06 -04:00
Joey Hess	ea3cb7d277	fix a case where file tracked by git unexpectedly becomes annex pointer file smudge: When annex.largefiles=anything, files that were already stored in git, and have not been modified could sometimes be converted to being stored in the annex. Changes in 7.20191024 made this more of a problem. This case is now detected and prevented.	2019-12-27 15:08:03 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	de14a7bab5	didn't mean to commit this incomplete workaround though I suppose it's nice to have it in the history..	2019-12-26 15:07:50 -04:00
Joey Hess	293f95c2d6	analysis	2019-12-26 15:05:36 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	5591622731	git-annex-config --set/--unset: No longer change the local git config setting `e53070c1f` quietly made it set the local git config too, but that was never documented anywhere, and it had surprising results. If I set annex.largefiles globally in a repo, I would expect to be able to change it in another repo, and the original repo would get the change and use it, rather than being stuck on the old value set there. And, if I have a local annex.largefiles and set a different global default, I'd be surprised to have my local setting overwritten. annex.securehashesonly does need to be set locally, since it's a security feature and the global is only a default until it gets set locally. So special cased.	2019-12-20 13:17:28 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	7d9dff5b05	Merge branch 'master' into bs and update changelog	2019-12-18 15:13:30 -04:00
Joey Hess	7fd5376334	inprogress: Support --key	2019-12-18 14:14:16 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	a7004375ec	avoid deprecation warning	2019-12-06 15:47:56 -04:00
Joey Hess	a0168cd9a2	use RawFilePath getSymbolicLinkStatus for speed	2019-12-06 15:42:54 -04:00
Joey Hess	5f391179f1	use RawFilePath getFileStatus for speed Only done on those calls to getFileStatus that had a RawFilePath, not a FilePath. The others would probably be just as fast if converted to use it with toRawFilePath, but I'm not 100% sure. Note that genInodeCache' uses fromRawFilePath, but that value only gets used on Windows, so on unix the thunk will never be evaluated.	2019-12-06 14:44:42 -04:00
Joey Hess	0e9d699ef3	use R.readSymbolicLink This will be faster once gitAnnexLink is converted to a RawFilePath.	2019-12-06 14:20:18 -04:00
Joey Hess	3266ad3ff7	everything is building again However, the test suite fails some quickchecks, so this branch is not yet in a mergeable state.	2019-12-05 15:10:23 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	3c7fd09ec8	get many more commands building again about half are building now	2019-12-05 11:40:10 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agony of making it build), but still very unmergeable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically (num files: old, new, speedup): 48500: 4.77, 3.73, 28%; 12500: 1.36, 1.02, 66%; 20: 0.075, 0.074, 0% (so startup time is unchanged). That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster; whereis is 3% faster; get when all files are already present is 5% faster. Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	25ba8156bc	improve benchmark --databases * benchmark: Changed --databases to take a parameter specifying the size of the database to benchmark. * benchmark --databases: Display size of the populated database. * benchmark --databases: Improve the "addAssociatedFile to (new)" benchmark to really add new values, not overwriting old values.	2019-11-21 17:25:20 -04:00
Joey Hess	6f35b576d7	encourage use of import from directory special remote rather than legacy interface	2019-11-19 13:30:27 -04:00
Joey Hess	890330f0fe	make --json-error-messages capture url download errors Convert Utility.Url to return Either String so the error message can be displayed in the annex monad and so captured. (When curl is used, its errors are still not caught.)	2019-11-12 13:52:38 -04:00
Joey Hess	0be23bae2f	refactor Better to not have a single function module, and better to have a more specific type than Bool. This commit was sponsored by Jack Hill on Patreon	2019-11-11 19:10:52 -04:00
Joey Hess	3b34d123ed	Added annex.allowsign option. This commit was sponsored by Ilya Shlyakhter on Patreon.	2019-11-11 16:28:56 -04:00
Joey Hess	25f912de5b	benchmark: Add --databases to benchmark sqlite databases Rescued from commit `11d6e2e260` which removed db benchmarks in favor of benchmarking arbitrary git-annex commands. Which is nice and general, but microbenchmarks are useful too.	2019-10-29 16:59:27 -04:00
Joey Hess	4a3f3a2cb5	make git add only annex when configured by annex.largefiles	2019-10-24 14:17:29 -04:00
Joey Hess	168f91efec	avoid warning over name	2019-10-24 11:46:40 -04:00
Joey Hess	bd197be3ad	annex.gitaddtoannex configuration Added annex.gitaddtoannex configuration. Setting it to false prevents git add from usually adding files to the annex. (Unless the file was annexed before, or a renamed annexed file is detected.) Currently left at true; some users are encouraging it be set to false.	2019-10-23 15:29:46 -04:00
Joey Hess	ec08b66bda	shouldAnnex: check isInodeKnown Renamed unlocked files are now detected, and will always be annexed, unless annex.largefiles disallows it. This allows for git add's behavior to later be changed to otherwise not annex files (whether by default or as a config option), without worrying about the rename case. This is not a major behavior change; annexing is still the default. But there is one case where the behavior is changed, I think for the better: `touch f; git -c annex.largefiles=nothing add f; git add bigfile; git commit -m ...; mv bigfile f; git add f` Before, git-annex would see that f was previously not annexed, and so the renamed bigfile content gets added to git. Now, it notices that the inode is the one that bigfile used, and so it annexes it. This potentially slows down git add a lot in some repositories because of the poor performance of isInodeKnown when there are a lot of unlocked files. Configuring annex.largefiles avoids the speed hit.	2019-10-23 14:49:45 -04:00
Joey Hess	3d4aab38ce	remove obsolete comment	2019-10-21 13:51:38 -04:00
Joey Hess	668b878995	remove recently added and unnecessary cwd parameter I later made Utility.Su change back to the cwd, so this parameter is not needed.	2019-10-21 13:48:52 -04:00
Joey Hess	9a5d9019ba	Deal with pkexec changing to root's home directory when running a command. Wow, that's not documented anywhere, and seems like a major gotcha in pkexec. Broke enable-tor.	2019-10-21 12:39:19 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	debafcba2b	autoenable sameas remotes	2019-10-11 15:52:40 -04:00
Joey Hess	ec778888d2	got enableremote working for sameas Also the assistant can enable sameas remotes, should work, but not tested.	2019-10-11 15:11:08 -04:00
Joey Hess	35d7ffe128	initremote --sameas fully working And using sameas remotes is working. Moved annex-config-uuid setting out of Remote.Helper.Special. EnableRemote will also have to set it.	2019-10-11 14:19:10 -04:00
Joey Hess	91eed85fd4	add sameas inherited configs to newConfig This makes initremote --sameas work with encryption inherited.	2019-10-11 13:05:20 -04:00
Joey Hess	59908586f4	rename RemoteConfigKey to RemoteConfigField And some associated renames. I was going to have some values named fooKeyKey otherwise..	2019-10-10 15:44:05 -04:00
Joey Hess	d1130ea04a	get rid of hardcoded "name" lookups Support "sameas-name" being set instead. In RenameRemote, rename whichever of the two is set.	2019-10-10 13:25:10 -04:00
Joey Hess	97b499a4dc	use sameas-name and sameas-uuid for sameas remotes initremote --sameas=remotename sets sameas-name and sameas-uuid Using sameas-name rather than name prevents old git-annex initremote from enabling a sameas remote by name, since it would not handle it correctly.	2019-10-10 12:32:05 -04:00
Joey Hess	61b384d2b7	add --sameas option, not yet used	2019-10-01 12:36:25 -04:00
Joey Hess	2b55a2b882	remotedaemon: Don't list --stop in help since it's not supported. Also, move out of plumbing section. When using tor, the remotedaemon is part of the user's workflow, as it runs the tor hidden service.	2019-09-30 14:40:46 -04:00
Joey Hess	090898a138	adjust --lock: This enters an adjusted branch where files are locked. Straightforward, except for the issue of how to reverse LockAdjustment. With --unlock, a commit that modifies/adds unlocked files gets reverse adjusted to use locked files. That's fairly reasonable, I think. But reversing --lock by unlocking all modified files feels wrong. Maybe that's just because repositories typically seem to still have mostly locked files in them (unless one is in an adjusted unlocked branch of course!) It may be that eventually how to reverse both will need to be configurable, I don't know.	2019-09-27 14:23:25 -04:00
Joey Hess	53fd746705	avoid some build warnings on windows	2019-09-12 14:11:19 -04:00
Joey Hess	99b509572d	post-receive hook updateInstead emulation cleanup The code is only needed because for a long time, git-annex didn't install hooks in repos on crippled filesystems. Now it does, and they work at least on FAT (where all files are executable) and Windows. It would be possible to remove this code in v8 simply by re-installing the hooks.	2019-09-11 14:41:51 -04:00
Joey Hess	061231621e	Merge branch 'master' into v7-default	2019-09-10 16:06:43 -04:00
Joey Hess	0af7ebdc2a	info: Display trust level when getting info on a uuid, same as on a remote.	2019-09-01 16:48:46 -04:00
Joey Hess	f845195354	Added annex.autoupgraderepository configuration Can be set to false to prevent any automatic repository upgrades. Also, removed direct mode specific upgrade code in Annex.Init, and made needsUpgrade always include the name/path of the repo, so if there's a problem it's clear what repo has the problem. And, made needsUpgrade catch any exceptions that might occur during the upgrade, so it can display a more useful error message than just the exception.	2019-09-01 13:42:26 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	4f59ac05b6	info: remove "repository mode" info: Removed the "repository mode" from its output (including the --json output) since with the removal of direct mode, there is no repository mode.	2019-08-29 14:12:22 -04:00
Joey Hess	36cf61d752	simplification Whether or not there's a false index, it can't Restage here. When there's a false index, restaging would alter it and not the real index, but it fails anyway because that index is locked. When there's not a false index, the index is locked, and so restaging can't alter it.	2019-08-28 15:46:35 -04:00
Joey Hess	da6f4d8887	remove direct mode support from Annex.Content No longer used. The only possible user of it would be code in Upgrade.V5, so I verified that the parts of Annex.Content it used were not used to manipulate direct mode files.	2019-08-27 13:14:06 -04:00
Joey Hess	3a0842d9f8	fix bug introduced in direct mode conversion oops, the code was "if direct && not present" and I removed the direct which made the wrong path be taken.	2019-08-27 12:29:05 -04:00
Joey Hess	a51a479fb9	fix a couple warnings	2019-08-27 12:24:31 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	20741b1eb4	Automatically convert direct mode repositories to v7 with adjusted unlocked branches * Automatically convert direct mode repositories to v7 with adjusted unlocked branches and set annex.thin. * init: When run on a crippled filesystem with --version=5, will error out, since version 7 is needed for adjusted unlocked branch. * direct: This command always errors out as direct mode is no longer supported. * indirect: This command has become a deprecated noop. * proxy: This command is deprecated because it was only needed in direct mode. (But it continues to work.) Also removed mentions of direct mode throughout the documentation. I have not removed all the direct mode code yet.	2019-08-26 15:05:25 -04:00
Joey Hess	c650389118	info: error out when file matching options used on non-directory When file matching options are specified when getting info of something other than a directory, they won't have any effect, so error out to avoid confusion. This commit was sponsored by mo on Patreon.	2019-08-24 13:20:19 -04:00
Joey Hess	88c61dea00	typo	2019-08-13 13:36:52 -04:00
Joey Hess	3049271fd0	fix build warnings	2019-08-13 13:12:41 -04:00
Joey Hess	b87ea12b6b	git-annex merge branch * merge: When run with a branch parameter, merges from that branch. This is especially useful when using an adjusted branch, because it applies the same adjustment to the branch before merging it.	2019-08-09 13:21:15 -04:00
Joey Hess	70b71bf660	have init --version fail when repo is already initialized with other version init: When the repo is already initialized, and --version requests a different version, error out rather than silently not changing the version.	2019-08-08 14:13:02 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of several haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) harder, which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	d2cc747d66	add back setDirect, lost in recent commit Oops, thank goodness for the test suite that found this.	2019-06-25 13:38:18 -04:00
Joey Hess	42c386fc47	add: Display progress meter when hashing files. * add: Display progress meter when hashing files. * add: Support --json-progress option.	2019-06-25 13:12:47 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	7264203eb1	importfeed: When there's a problem parsing the feed, --debug will output the feed content that was downloaded. And let the user know about it in the failure messages.	2019-06-20 12:37:07 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has an Annex state for the remote, with no worker pool. So the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. Because, it's reading back from eg, a removable disk to checksum. That will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transfer. So, removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unnecessarily build the remote list (only when they're run with concurrency enabled), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	04cc470201	run download checksum verification in separate job pool get, move, copy, sync: When -J or annex.jobs has enabled concurrency, checksum verification uses a separate job pool than is used for downloads, to keep bandwidth saturated. Not yet done for upload checksum verification, but that only affects remotes on local disks.	2019-06-17 14:58:02 -04:00
Joey Hess	ba2551da6f	add startingNoMessage Fixes the last wart in the StartMessage transition. A few commands include other CommandStart actions that generate output, and do not themselves need to display a start/end message.	2019-06-12 14:11:23 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not running CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unnecessarily passing it off to a worker thread, which incurs overhead that is significant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. 
Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	082e1f1738	Don't try to import .git directories from special remotes Because git does not support storing git repositories inside a git repository.	2019-06-04 15:14:20 -04:00
Joey Hess	a14f6ce758	fix repo description setting bugs * init: When the repository already has a description, don't change it. * describe: When run with no description parameter it used to set the description to "", now it will error out.	2019-05-23 12:51:01 -04:00
Joey Hess	e06feb7316	honor preferred content when importing Importing from a special remote honors its preferred content too; unwanted files are not imported. But, some preferred content expressions can't be checked before files are imported, and trying to import with such an expression will fail. Tested this with scenarios including changing the preferred content expression and making sure merging the import didn't delete files that were no longer wanted. There was one minor inefficiency mentioned in the todo that I punted on.	2019-05-21 14:38:06 -04:00

2667 commits