git-annex

Author	SHA1	Message	Date
Joey Hess	e7134ca1eb	avoid partial functions in Git.Url After the last commit, it was able to throw errors just due to an unparseable url. This avoids needing to worry about that, as long as the call site has already checked that it has a parseable url.	2021-01-18 15:07:23 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	2dd38b6403	switch to Haskell2010 When I put in Haskell98 this spring, I was under the mistaken apprehension that ghc defaulted to that. But actually its default is a third mode, which is closer to Haskell2010 but with some differences. The manual says "By default, GHC mainly aims to behave (mostly) like a Haskell 2010 compiler" Fixed two cases where the Haskell98 do indentation flexibility let wrongly indented code build. That is one of the places where ghc does not behave like Haskell2010 by default. The other place that I think I was concerned about, is GHC manual section 19.1.1.3. Expressions and patterns. But that only seems to affect code using bottoms, so would only affect pure functions throwing an error, which I don't think git-annex does in many places as it's pretty horrid style. And it would only affect rare cases like shown in that section. If it did happen, it would mean that the error was not thrown before specifying Haskell98, and then was. Haskell2010 behaves the same as Haskell98. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-19 11:26:16 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certainly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	01eb863a14	Build with the git-lfs library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-git-lfs-dev. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:21:25 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I probably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used an MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	69e2e4763e	only check --force at init time, not enable time git-lfs repos that encrypt the annexed content but not the git repo only need --force passed to initremote, allow enableremote and autoenable of such remotes without forcing again. Needing --force again particularly made autoenable of such a repo not work. And once such a repo has been set up, it seems a second --force when enabling it elsewhere has little added value. It does tell the user about the possibly insecure configuration, but if the git repo has already been pushed to that remote in the clear, data has already been exposed. The goal of that --force was not to prevent every situation where such an exposure can happen -- anyone who sets up a public git repo and pushes to it will expose things similarly and git-annex is not involved. Instead, the purpose of the --force is to point out to the user that they're asking for a configuration where encryption is inconsistently applied.	2020-05-07 15:59:29 -04:00
Joey Hess	ccd8c43dc8	git-annex config: guard against non-repo-global configs git-annex config: Only allow configs to be set that are ones git-annex actually supports reading from repo-global config, to avoid confused users trying to set other configs with this.	2020-03-02 15:54:18 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework remoteAnnexConfig will avoid bugs like `a3a674d15b` Use now more generic remoteConfig in a couple places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	1883f7ef8f	support git remotes that need http basic auth using git credential to get the password One thing this doesn't do is wrap the password prompting inside the prompt action. So with -J, the output can be a bit garbled.	2020-01-22 16:16:19 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	99cb3e75f1	add LISTCONFIGS to external special remote protocol Special remote programs that use GETCONFIG/SETCONFIG are recommended to implement it. The description is not yet used, but will be useful later when adding a way to make initremote list all accepted configs. configParser now takes a RemoteConfig parameter. Normally, that's not needed, because configParser returns a parser, it does not parse it itself. But, it's needed to look at externaltype and work out what external remote program to run for LISTCONFIGS. Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported used to use NoUUID. The code that now checks for Nothing used to behave in some undefined way if the external program made requests that triggered it. Also, note that in externalSetup, once it generates external, it parses the RemoteConfig strictly. That generates a ParsedRemoteConfig, which is thrown away. The reason it's ok to throw that away, is that, if the strict parse succeeded, the result must be the same as the earlier, lenient parse. initremote of an external special remote now runs the program three times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least one of those, and it should be possible to only run the program once.	2020-01-17 16:07:17 -04:00
Joey Hess	c498269a88	convert configParser to Annex action and add passthrough option Needed so Remote.External can query the external program for its configs. When the external program does not support the query, the passthrough option will make all input fields be available.	2020-01-14 13:52:03 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse fails. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	5877de5e80	git-lfs: remember urls, and autoenable remotes using known urls * git-lfs: The url provided to initremote/enableremote will now be stored in the git-annex branch, allowing enableremote to be used without an url. initremote --sameas can be used to add additional urls. * git-lfs: When there's a git remote with an url that's known to be used for git-lfs, automatically enable the special remote.	2019-11-18 16:09:09 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	d1130ea04a	get rid of hardcoded "name" lookups Support "sameas-name" being set instead. In RenameRemote, rename which ever of the two is set.	2019-10-10 13:25:10 -04:00
Joey Hess	a6c3d1cb6d	avoid unnecessary extra blank line before git-credentials prompt	2019-09-24 18:06:10 -04:00
Joey Hess	bc1b9a2c0a	improved GitLFS api	2019-09-24 18:05:11 -04:00
Joey Hess	6ae0a44c64	git-lfs: Added support for http basic auth	2019-09-24 14:46:20 -04:00
Joey Hess	de564df8b3	git-lfs: Only do endpoint discovery once when concurrency is enabled This avoids some extra work, but I don't think it was possible for two ssh endpoint discoveries run concurrently to both prompt for the ssh password; Annex.Ssh itself deals with concurrency. This is mostly groundwork for http password prompting.	2019-09-24 13:01:51 -04:00
Joey Hess	fb7d92457f	support using gcrypt with git-lfs special remote	2019-08-05 13:43:45 -04:00
Joey Hess	3f450f0f4a	add encryption warning	2019-08-05 11:35:26 -04:00
Joey Hess	ecf7f34c23	remember sha256 and size when necessary Using Logs.RemoteState for this means that if the same key gets uploaded twice to a git-lfs remote, but somehow has different content the two times (eg it's an URL key with non-stable content), the sha256/size of the newer content uploaded will overwrite what was remembered before. That seems ok; it just means that git-annex will request the newer version of the content when downloading from git-lfs. It will remember the sha256 and size if both are not known, or if only the sha256 is not known but the size is known, it only remembers the sha256, to avoid wasting space on the size. I did not add special case for when the sha256 is known and the size is not, because it's been a long time since git-annex created SHA256 keys without a size. (See doc/upgrades/SHA_size.mdwn)	2019-08-05 11:05:59 -04:00
Joey Hess	f5eb28682a	expand	2019-08-04 13:59:24 -04:00
Joey Hess	408cb0af39	remove unused imports	2019-08-04 12:43:53 -04:00
Joey Hess	9aab851a55	fix reversion lost check of resp_actions in `b82ecf7076`	2019-08-04 12:43:16 -04:00
Joey Hess	7269851550	download from LFS working including resuming	2019-08-04 12:32:36 -04:00
Joey Hess	b82ecf7076	verify that LFS server responds with requested object The protocol design allows the server to respond with some other object; if for some reason a server did that, it would not be right for git-annex to download its content. I don't think it would be a security hole, since git-annex is downloading a specific key and will verify the key's content. Seems like a good idea to belt-and-suspenders test for such a misuse of the protocol.	2019-08-03 16:23:47 -04:00
Joey Hess	28c0395d61	start at retrieval from LFS Doesn't yet download the content, which will need to support resuming.	2019-08-03 12:51:16 -04:00
Joey Hess	5be0a35dae	implemented checkPresent for git-lfs	2019-08-03 12:21:28 -04:00
Joey Hess	a16e83eec8	also debug http response status code	2019-08-03 11:30:06 -04:00
Joey Hess	fc09a41ed1	storing objects in git-lfs is working Still need to record the sha256 and size when they cannot be determined by inspecting the key.	2019-08-02 13:56:55 -04:00
Joey Hess	6c1130a3bb	lfs endpoint discovery and caching in git-lfs special remote	2019-08-02 12:38:14 -04:00
Joey Hess	1cef791cf3	skeleton git-lfs special remote This is a special remote and a git remote at the same time; git can pull and push to it and git-annex can use it as a special remote. Remote.Git has to check if it's configured as a git-lfs special remote and sets it up as one if so. Object methods not implemented yet.	2019-08-01 15:30:12 -04:00

49 commits