git-annex

Author	SHA1	Message	Date
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	4acbb40112	git-annex config annex.largefiles annex.largefiles can be configured by git-annex config, to more easily set a default that will also be used by clones, without needing to shoehorn the expression into the gitattributes file. The git config and gitattributes override that. Whenever something is added to git-annex config, we have to consider what happens if a user puts a purposefully bad value in there. Or, if a new git-annex adds some new value that an old git-annex can't parse. In this case, a global annex.largefiles that can't be parsed currently causes an error to be thrown. That might not be ideal, but the gitattribute behaves the same, and is almost equally repo-global. Performance notes: git-annex add and addurl construct a matcher once and use it for every file, so the added time penalty for reading the global config log is minor. If the gitattributes annex.largefiles were deprecated, git-annex add would get around 2% faster (excluding hashing), because looking that up for each file is not fast. So this new way of setting it is progress toward speeding up add. git-annex smudge does need to load the log every time, as well as checking the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids both overheads.	2019-12-20 13:01:41 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	650a631ef8	include all remotes back in	2019-12-02 12:26:33 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agony of making it build), but still very unmergeable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files | old | new | speedup; 48500 | 4.77 | 3.73 | 28%; 12500 | 1.36 | 1.02 | 66%; 20 | 0.075 | 0.074 | 0% (so startup time is unchanged). That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	b207d944f3	sync, assistant: Pull and push from git-lfs remotes. Oversight, forgot to add it to gitSyncableRemote	2019-11-18 16:13:21 -04:00
Joey Hess	5877de5e80	git-lfs: remember urls, and autoenable remotes using known urls * git-lfs: The url provided to initremote/enableremote will now be stored in the git-annex branch, allowing enableremote to be used without an url. initremote --sameas can be used to add additional urls. * git-lfs: When there's a git remote with an url that's known to be used for git-lfs, automatically enable the special remote.	2019-11-18 16:09:09 -04:00
Joey Hess	cee14f147a	stop displaying rsync progress, and use git-annex's own progress display for local-to-local repo transfers Reasons to do this include: 1. I've gotten pretty used to git-annex's own progress display, which is used for all transfers over ssh (except to old git-annex-shell), and for most special remote transfers. It's getting to seem weird to see the rsync progress display instead. 2. When -J was used, the rsync output could not be shown, and so there was no progress display. Now there will be. Progress will also be displayed now when cp CoW is used. But I'd expect a CoW copy to typically run so fast that the progress display will barely be noticeable. This commit was sponsored by Peter on Patreon.	2019-11-15 13:21:06 -04:00
Joey Hess	890330f0fe	make --json-error-messages capture url download errors Convert Utility.Url to return Either String so the error message can be displayed in the annex monad and so captured. (When curl is used, its errors are still not caught.)	2019-11-12 13:52:38 -04:00
Joey Hess	9e8d40181f	remove some unnecessary uses of warningIO warningIO is not concurrent output safe, and it doesn't go to --json-error-messages There are a few more that would be too hard to remove, and there are also several dozen direct prints to stderr still.	2019-11-12 10:07:27 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	35d7ffe128	initremote --sameas fully working And using sameas remotes is working. Moved annex-config-uuid setting out of Remote.Helper.Special. EnableRemote will also have to set it.	2019-10-11 14:19:10 -04:00
Joey Hess	2bd6e81bb0	support annex-config-uuid when generating remote This is used by a special remote with sameas-uuid= The remote's uuid is the sameas-uuid, but it needs to get its RemoteConfig from the annex-config-uuid.	2019-10-11 12:34:11 -04:00
Joey Hess	df5b0ffab3	inherit other fields I think this is all that needs to be inherited.	2019-10-10 16:11:21 -04:00
Joey Hess	c3975ff3b4	sameas RemoteConfig inheritance I found a way to avoid inheritance complicating anything outside of Logs.Remote. It seems fine to require all inherited values to be inherited and not set in the sameas remote's config. Since inherited values will be used for stuff like encryption and perhaps chunking, which control the actual content stored on the remote, it seems likely that there will not be any reason to need them to vary between two remotes that access the same underlying data store. The newer version of containers is free; the minimum ghc version is bundled with a newer version than that.	2019-10-10 15:58:22 -04:00
Joey Hess	59908586f4	rename RemoteConfigKey to RemoteConfigField And some associated renames. I was going to have some values named fooKeyKey otherwise..	2019-10-10 15:44:05 -04:00
Joey Hess	d1130ea04a	get rid of hardcoded "name" lookups Support "sameas-name" being set instead. In RenameRemote, rename which ever of the two is set.	2019-10-10 13:25:10 -04:00
Joey Hess	92ff30df70	set annex-config-uuid when RemoteConfig contains a sameas-uuid Initremote sets that, so after both initremote and enableremote, the git config will be set. Any remote that does not use Annex.SpecialRemote won't set annex-config-uuid. But that's only Remote.Git, which doesn't use RemoteConfig anyway.	2019-10-10 12:58:59 -04:00
Joey Hess	46071a2435	use storeUUIDIn	2019-10-10 12:38:17 -04:00
Joey Hess	06c04ffe29	use storeUUIDIn	2019-10-10 12:12:09 -04:00
Joey Hess	a6c3d1cb6d	avoid unnecessary extra blank line before git-credentials prompt	2019-09-24 18:06:10 -04:00
Joey Hess	bc1b9a2c0a	improved GitLFS api	2019-09-24 18:05:11 -04:00
Joey Hess	6ae0a44c64	git-lfs: Added support for http basic auth	2019-09-24 14:46:20 -04:00
Joey Hess	de564df8b3	git-lfs: Only do endpoint discovery once when concurrency is enabled This avoids some extra work, but I don't think it was possible for two ssh endpoint discoveries run concurrently to both prompt for the ssh password; Annex.Ssh itself deals with concurrency. This is mostly groundwork for http password prompting.	2019-09-24 13:01:51 -04:00
Joey Hess	53fd746705	avoid some build warnings on windows	2019-09-12 14:11:19 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	e804f48f82	remove a few more isDirect tests	2019-08-28 11:53:10 -04:00
Joey Hess	16f646c9a6	don't hide message when ensureInitialized fails	2019-08-27 12:38:47 -04:00
Joey Hess	bb18bbd426	consolidate calls to ensureInitialized tryGitConfigRead may run ensureInitialized first, but when checkuuid = false, that is skipped. So, make sure it's run before all onLocal actions. ensureInitialized is inexpensive, so the extra call by tryGitConfigRead is not a big deal. But since it was easy to do, I made it only be run once by all calls to onLocal. A few calls to onLocal didn't call ensureInitialized before. Notably, the checkPresent action didn't, and does now. That means that there's a guarantee that any necessary repo upgrades will be run before the checkPresent action runs in the repo. Which is important especially for the direct mode conversion, because without that upgrade, the checkPresent action would need to support direct mode still. Now I can remove the last bits of direct mode support in Annex.Content without worrying that it will break accessing remotes that have not been upgraded. This does necessarily mean that checkPresent needs to write to the disk when performing such a repo upgrade. The other remote actions already did, so retrieval from a readonly remote that needed to be upgraded would fail. Having checkPresent also fail doesn't seem like a large reversion, especially since it already failed in the default case when checkuuid = true.	2019-08-27 12:18:01 -04:00
Joey Hess	708fc6567f	S3: Fix encoding when generating public urls of S3 objects. This code feels worryingly stringily typed, but using URI does not help because the uriPath still has to be constructed with the right uri-encoding.	2019-08-15 12:56:46 -04:00
Joey Hess	386c0ce90a	close handle so windows can stat the file windows cannot stat a file that another process has open, which caused this to crash with an exception	2019-08-13 13:26:25 -04:00
Joey Hess	cfd0b4108e	avoid windows build warning	2019-08-13 13:10:33 -04:00
Joey Hess	5004381dd9	improve error display when storing to an export/import remote fails Prompted by the test suite on windows failing with "export foo failed" and no information about what went wrong. Note that only storeExportWithContentIdentifier has been converted. storeExport still returns a Bool and so exceptions may be hidden. However, storeExportWithContentIdentifier has many more failure modes, since it needs to avoid overwriting modified files. So it's more important it have better error display.	2019-08-13 12:05:00 -04:00
Joey Hess	f27c5db5c5	avoid rsync failing with a permissions error The test suite was intermittently failing with rsync complaining it could not write to dest. get foo (from origin...) SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77 20 100% 0.00kB/s 0:00:00 ^M 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1) (from origin...) SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77 20 100% 0.00kB/s 0:00:00 ^M 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1) rsync: open "/home/joey/src/git-annex/.t/tmprepo1103/.git/annex/tmp/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77" failed: Permission denied (13) rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3] It seems that the first rsync actually transferred the file, but then for some reason git-annex thinks it failed, so it retries. The second rsync then fails because the first rsync copied the file mode over and so the file is not writable now. So, this fixes that problem, but leaves open the question of why git-annex would think rsync failed when it wrote the file and didn't output any error message. Possibly a bug in rsyncProgress that either hides an error message, or somehow makes rsync unhappy?	2019-08-09 15:26:58 -04:00
Joey Hess	fb7d92457f	support using gcrypt with git-lfs special remote	2019-08-05 13:43:45 -04:00
Joey Hess	8401b09e32	Allow setting up a gcrypt special remote with encryption=shared It was documented to work, but seems it has been broken for a while/forever.	2019-08-05 12:41:05 -04:00
Joey Hess	3f450f0f4a	add encryption warning	2019-08-05 11:35:26 -04:00
Joey Hess	ecf7f34c23	remember sha256 and size when necessary Using Logs.RemoteState for this means that if the same key gets uploaded twice to a git-lfs remote, but somehow has different content the two times (eg it's an URL key with non-stable content), the sha256/size of the newer content uploaded will overwrite what was remembered before. That seems ok; it just means that git-annex will request the newer version of the content when downloading from git-lfs. It will remember the sha256 and size if both are not known, or if only the sha256 is not known but the size is known, it only remembers the sha256, to avoid wasting space on the size. I did not add special case for when the sha256 is known and the size is not, because it's been a long time since git-annex created SHA256 keys without a size. (See doc/upgrades/SHA_size.mdwn)	2019-08-05 11:05:59 -04:00
Joey Hess	f5eb28682a	expand	2019-08-04 13:59:24 -04:00
Joey Hess	408cb0af39	remove unused imports	2019-08-04 12:43:53 -04:00
Joey Hess	9aab851a55	fix reversion lost check of resp_actions in `b82ecf7076`	2019-08-04 12:43:16 -04:00
Joey Hess	7269851550	download from LFS working including resuming	2019-08-04 12:32:36 -04:00
Joey Hess	b82ecf7076	verify that LFS server responds with requested object The protocol design allows the server to respond with some other object; if for some reason a server did that, it would not be right for git-annex to download its content. I don't think it would be a security hole, since git-annex is downloading a specific key and will verify the key's content. Seems like a good idea to belt-and-suspenders test for such a misuse of the protocol.	2019-08-03 16:23:47 -04:00
Joey Hess	28c0395d61	start at retrieval from LFS Doesn't yet download the content, which will need to support resuming.	2019-08-03 12:51:16 -04:00
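
The rule described in commit ecf7f34c23 (what a git-lfs remote remembers about an uploaded key) can be sketched roughly as below. This is an illustration only, written in Python rather than git-annex's Haskell; the names `Remembered` and `state_to_remember` are hypothetical and do not come from the git-annex source. The case where the sha256 is known but the size is not has no special handling in the commit, so treating it like the "remember both" case here is an assumption.

```python
from typing import NamedTuple, Optional


class Remembered(NamedTuple):
    """What would be written to per-remote state (hypothetical type)."""
    sha256: Optional[str]
    size: Optional[int]


def state_to_remember(key_sha256: Optional[str],
                      key_size: Optional[int],
                      content_sha256: str,
                      content_size: int) -> Optional[Remembered]:
    """Decide what to remember for a key uploaded to a git-lfs remote.

    key_sha256 / key_size are what the key itself already records
    (None when unknown); content_* are computed from the uploaded content.
    """
    if key_sha256 is not None and key_size is not None:
        # The key already carries both values, so nothing is remembered.
        return None
    if key_sha256 is None and key_size is not None:
        # Only the sha256 is missing: remember just that, to avoid
        # wasting space on the size.
        return Remembered(sha256=content_sha256, size=None)
    # Both unknown: remember both. Assumption: a key with a known sha256
    # but unknown size also lands here, since the commit adds no special
    # case for it.
    return Remembered(sha256=content_sha256, size=content_size)
```

A later upload of the same key with different content would simply overwrite the remembered values, which matches the behaviour the commit message accepts for URL keys with non-stable content.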

1168 commits