git-annex

Author	SHA1	Message	Date
Joey Hess	9a67ed0f10	importtree: support preferred content expressions needing keys When importing from a special remote, support preferred content expressions that use terms that match on keys (eg "present", "copies=1"). Such terms are ignored when importing, since the key is not known yet. When "standard" or "groupwanted" is used, the terms in those expressions also get pruned accordingly. This does allow setting preferred content to "not (copies=1)" to make a special remote into a "source" type of repository. Importing from it will import all files. Then exporting to it will drop all files from it. In the case of setting preferred content to "present", it's pruned on import, so everything gets imported from it. Then on export, it's applied, and everything in it is left on it, and no new content is exported to it. Since the old behavior on these preferred content expressions was for importtree to error out, there's no backwards compatability to worry about. Except that sync/pull/etc will now import where before it errored out.	2023-12-18 16:27:59 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	d89984b121	sync --all avoid unncessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	57b89c635f	support required groupwanted When the required content is set to "groupwanted", use whatever expression has been set in groupwanted as the required content of the repo, similar to how setting required content to "standard" already worked.	2020-04-28 13:31:26 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	b646a85c4a	oops, fixed wrong change	2019-05-23 12:44:27 -04:00
Joey Hess	e1aa8b9001	avoid build failure with older base	2019-05-23 12:16:57 -04:00
Joey Hess	e06feb7316	honor preferred content when importing Importing from a special remote honors its preferred content too; unwanted files are not imported. But, some preferred content expressions can't be checked before files are imported, and trying to import with such an expression will fail. Tested this with scenarios including changing the preferred content expression and making sure merging the import didn't delete files that were no longer wanted. There was one minor inefficiency mentioned in the todo that I punted on.	2019-05-21 14:38:06 -04:00
Joey Hess	354c0eb57f	support standard and groupwanted in keyless mode Only when the preferred content expression includes them will a parse failure due to them needing keys result in the preferred content expression not parsing in keyless mode.	2019-05-14 14:59:03 -04:00
Joey Hess	9411a7c93c	matching preferred content before key is known This will let import try to match preferred content expressions before downloading the content and generating its key. If an expression needs a key, it preferredContentParser with preferredContentKeylessTokens will fail to parse it. standard and groupwanted are not in preferredContentKeylessTokens because they may refer to an expression that refers to a key. That needs further work to support them.	2019-05-14 14:28:23 -04:00
Joey Hess	aa7710982b	avoid list lookup by parseToken Minor optimisation to parsing of a preferred content expression.	2019-05-14 13:11:29 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	9887a378fe	renamings to make clean when old-format logs are being used	2019-02-21 13:43:44 -04:00
Joey Hess	591e4b145f	convert old uuid-based log parsers to attoparsec This preserves the workaround for the old bug that caused NoUUID items to be stored in the log, prefixing log lines with " ". It's now handled implicitly, by using takeWhile1 (/= ' ') to get the uuid. There is a behavior change from the old parser, which split the value into words and then recombined it. That meant that "foo bar" and "foo\tbar" came out as "foo bar". That behavior was not documented, and seems surprising; it meant that after a git-annex describe here "foo bar", you wouldn't get that same string back out when git-annex displayed repo descriptions. Otoh, some other parsers relied on the old behavior, and the attoparsec rewrites had to deal with the issue themselves... For group.log, there are some edge cases around the user providing a group name with a leading or trailing space. The old parser would ignore such excess whitespace. The new parser does too, because the alternative is to refuse to parse something like " group1 group2 " due to excess whitespace, which would be even more confusing behavior. The only git-annex branch log file that is not converted to attoparsec and bytestring-builder now is transitions.log.	2019-01-10 16:34:20 -04:00
Joey Hess	bfc9039ead	convert git-annex branch access to ByteStrings and Builders Most of the individual logs are not converted yet, only presense logs have an efficient ByteString Builder implemented so far. The rest convert to and from String.	2019-01-03 13:21:48 -04:00
Joey Hess	10138056dc	v6: avoid accidental conversion when annex.largefiles is not configured v6: When annex.largefiles is not configured for a file, running git add or git commit, or otherwise using git to stage a file will add it to the annex if the file was in the annex before, and to git otherwise. This is to avoid accidental conversion. Note that git-annex add's behavior has not changed, for reasons explained in the added comment. Performance: No added overhead when annex.largefiles is configured. When not configured, there is an added call to catObjectMetaData, which involves a round trip through git cat-file --batch. However, the earlier catKeyFile primes the cache for it. This commit was supported by the NSF-funded DataLad project.	2018-08-27 14:51:10 -04:00
Joey Hess	403b56fb91	Limit annex.largefiles parsing to the subset of preferred content expressions that make sense in its context. So, not "standard" or "lackingcopies", etc.	2016-02-03 15:04:42 -04:00
Joey Hess	cdf5977053	simplify	2016-02-03 13:23:34 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	983c1894eb	avoid unnecessary reading of git-annex branch data when matching on annex.largefiles This makes git annex clean not look at the git-annex branch at all, and so speeds it up by 50% or more.	2015-12-04 15:06:41 -04:00
Joey Hess	6fbabfcf16	oops, didn't mean to commit this debug	2015-10-06 17:28:20 -04:00
Joey Hess	ba7ecf68c0	analysis	2015-10-06 17:11:52 -04:00
Joey Hess	16947ef654	Fix bug in combination of preferred and required content settings. When one was set to the empty string and the other set to some expression, this bug caused all files to be wanted, instead of only files matching the expression. Avoid: MAny `MOr` otherexpression Which matches anything.	2015-09-15 12:50:14 -04:00
Joey Hess	6e829939e9	add test case that all standard group preferred content expressions parse	2015-06-17 13:44:19 -04:00
Joey Hess	e8c376e0ad	import Data.Default in Common	2015-01-28 16:11:28 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	9eaabf0382	webapp: avoid overwriting remote configs when enabling it Avoid stomping on existing group and preferred content settings when enabling or combining with an already existing remote. Two level fix. First, use defaultStandardGroup rather than setStandardGroup, so if there is an existing configuration in the git-annex branch, it's not overwritten. To handle pre-existing ssh remotes (including gcrypt), a second level is needed, because before syncing with the remote, it's configuration won't be available locally. (And syncing could take a long time.) So, in this case, keep track of whether the remote is being created or enabled, and only set configs when creating it. This commit was sponsored by Anders Lannerback.	2014-05-30 14:03:04 -04:00
Joey Hess	065248f3d2	Added required content configuration. This includes checking when dropping files that any required content configuration is satisfied. However, it does not yet include an active check on the required content; the location log is trusted when checking the required content expression.	2014-03-29 16:03:33 -04:00
Joey Hess	fe19e15040	reorg matcher types; no non-type code changes	2014-03-29 14:43:34 -04:00
Joey Hess	ed30b81e2c	Improve behavior when unable to parse a preferred content expression (thanks, ion). Fall back to "present" as the preferred conent expression, which will not result in any content movement.	2014-03-20 00:10:12 -04:00
Joey Hess	6a4dd42328	finish wiring up groupwanted	2014-03-15 17:08:55 -04:00
Joey Hess	417aea25be	vicfg: Allows editing preferred content expressions for groups. This is stored in the git-annex branch, but not yet actually hooked up and used.	2014-03-15 16:17:01 -04:00
Joey Hess	3551d40b05	"standard" can now be used as a first-class keyword in preferred content expressions. For example "standard or (include=otherdir/*)" or even "not standard" Note that the implementation avoids any potential for loops (if a standard preferred content expression itself mentioned standard). This commit was sponsored by Jochen Bartl.	2014-03-14 15:04:33 -04:00
Joey Hess	3518c586cf	fix transfers of key with no associated file Several places assumed this would not happen, and when the AssociatedFile was Nothing, did nothing. As part of this, preferred content checks pass the Key around. Note that checkMatcher is sometimes now called with Just Key and Just File. It currently constructs a FileMatcher, ignoring the Key. However, if it constructed a FileKeyMatcher, which contained both, then it might be possible to speed up parts of Limit, which currently call the somewhat expensive lookupFileKey to get the Key. I have not made this optimisation yet, because I am not sure if the key is always the same. Will need some significant checking to satisfy myself that's the case..	2014-01-23 16:44:02 -04:00
Joey Hess	8e3032df2d	added GETWANTED, SETWANTED for Tobias's flickr remote This was unexpectedly difficult because of a depdenency cycle. To parse a preferred content expression involves several things that need to operate on the list of remotes. Which needs Remote.External. The only way to avoid this cycle (I tried breaking it at several points) was to skip parsing the expression in SETWANTED. That's sorta ok, because git-annex already has to deal with unparsable preferred content expressions being stored, in order to handle eg, upgrades. But I'm still not very happy that I cannot check it. I feel this is a strong indication that I need to beware of further bloating the special remote protocol interface.	2014-01-01 20:12:20 -04:00
Joey Hess	f0a6de1ca2	add PreferredContentExpression type	2014-01-01 19:58:02 -04:00
Richard Hartmann	974fe009bf	Another round of s/amoung/among/	2013-12-19 12:30:53 -04:00
Joey Hess	049e80e865	refactor	2013-10-28 14:05:55 -04:00
Joey Hess	62beaa1a86	refactor git-annex branch log filename code into central location Having one module that knows about all the filenames used on the branch allows working back from an arbitrary filename to enough information about it to implement dropping dead remotes and doing other log file compacting as part of a forget transition.	2013-08-29 19:13:00 -04:00
Joey Hess	0ae8c82c53	per-IA-item content directories	2013-04-25 23:44:55 -04:00
Joey Hess	91b7de97e8	invalidated the wrong cache when setting preferred content	2013-03-31 19:00:14 -04:00
Joey Hess	67e817c6a1	New annex.largefiles setting, which configures which files `git annex add` and the assistant add to the annex. I would have sort of liked to put this in .gitattributes, but it seems it does not support multi-word attribute values. Also, making this a single config setting makes it easy to only parse the expression once. A natural next step would be to make the assistant `git add` files that are not annex.largefiles. OTOH, I don't think `git annex add` should `git add` such files, because git-annex command line tools are not in the business of wrapping git command line tools.	2013-03-29 16:17:13 -04:00
Joey Hess	0d50a6105b	whitespace fixes	2012-12-13 00:45:27 -04:00
Joey Hess	99a8a5297c	--auto fixes * get/copy --auto: Transfer data even if it would exceed numcopies, when preferred content settings want it. * drop --auto: Fix dropping content when there are no preferred content settings.	2012-12-06 13:22:16 -04:00
Joey Hess	2172cc586e	where indenting	2012-11-11 00:51:07 -04:00
Joey Hess	c7c2015435	add ConfigMonitor thread Monitors git-annex branch for changes, which are noticed by the Merger thread whenever the branch ref is changed (either due to an incoming push, or a local change), and refreshes cached config values for modified config files. Rate limited to run no more often than once per minute. This is important because frequent git-annex branch changes happen when files are being added, or transferred, etc. A primary use case is that, when preferred content changes are made, and get pushed to remotes, the remotes start honoring those settings. Other use cases include propigating repository description and trust changes to remotes, and learning when a remote has added a new special remote, so the webapp can present the GUI to enable that special remote locally. Also added a uuid.log cache. All other config files already had caches.	2012-10-20 16:43:35 -04:00
Joey Hess	40aab719df	Replace "in=" with "present" in preferred content expressions in= was problimatic in two ways. First, it referred to a remote by name, but preferred content expressions can be evaluated elsewhere, where that remote doesn't exist, or a different remote has the same name. This name lookup code could error out at runtime. Secondly, in= seemed pretty useless. in=here did not cause content to be gotten, but it did let present content be dropped. present is more useful, although "not present" is unstable and should be avoided.	2012-10-19 16:09:21 -04:00
Joey Hess	e7780a39f5	Preferred content path matching bugfix. When in a subdir, both the normal filepath, and the filepath relative to the top of the git repo are needed for matching. The former for key lookup, and the latter for include/exclude to match against. Previously, key lookup didn't work in this situation.	2012-10-17 16:01:09 -04:00

1 2

63 commits