git-annex

Author	SHA1	Message	Date
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and the feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outwweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktreee files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	55400a03d3	more RawFilePath conversion This commit was sponsored by Luke Shumaker on Patreon.	2020-11-02 16:31:28 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	ca454c47f2	explicitly wait for a git process Eliminate a zombie that was only cleaned up by the later zombie cleanup code. This is still not ideal, it would be cleaner if it used conduit or something, and if the thread gets killed before waiting, it won't stop the process. Only remaining zombies are in CmdLine.Seek	2020-09-25 11:03:12 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases mutliple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	957a87b437	fix absolute filenames fed into --batch and git-annex info	2020-04-15 16:04:05 -04:00
Joey Hess	c0cd07c36b	Ref ByteString conversion done Test suite passes.	2020-04-07 17:41:09 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	0af7ebdc2a	info: Display trust level when getting info on a uuid, same as on a remote.	2019-09-01 16:48:46 -04:00
Joey Hess	4f59ac05b6	info: remove "repository mode" info: Removed the "repository mode" from its output (including the --json output) since with the removal of direct mode, there is no repository mode.	2019-08-29 14:12:22 -04:00
Joey Hess	da6f4d8887	remove direct mode support from Annex.Content No longer used. The only possible user of it would be code in Upgrade.V5, so I verified that the parts of Annex.Content it used were not used to manipulate direct mode files.	2019-08-27 13:14:06 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	c650389118	info: error out when file matching options used on non-directory When file matching options are specified when getting info of something other than a directory, they won't have any effect, so error out to avoid confusion. This commit was sponsored by mo on Patreon.	2019-08-24 13:20:19 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	8fdea8f444	WIP Added graftTree but it's buggy. Should use graftTree in Annex.Branch.graftTreeish; it will be faster than the current implementation there. Started Annex.Import, but untested and it doesn't yet handle tree grafting.	2019-02-21 17:32:59 -04:00
Joey Hess	f0a57825e2	shorten some too-long descriptions	2019-01-16 14:16:32 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	9cc6d5549b	convert UUID from String to ByteString This should make == comparison of UUIDs somewhat faster, and perhaps a few other operations around maps of UUIDs etc. FromUUID/ToUUID are used to convert String, which is still used for all IO of UUIDs. Eventually the hope is those instances can be removed, and all git-annex branch log files etc use ByteString throughout, for a real speed improvement. Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually contains only alphanumerics and so could be treated as ascii, it's conceivable that some git-annex repository has been initialized using a UUID that is not only not a canonical UUID, but contains high unicode or invalid unicode. Using the filesystem encoding avoids any problems with such a thing. However, a NUL in a UUID seems extremely unlikely, so I didn't use encodeBS / decodeBS to avoid their extra overhead in handling NULs. The Read/Show instance for UUID luckily serializes the same way for ByteString as it did for String.	2019-01-01 14:45:33 -04:00
Joey Hess	38d691a10f	removed the old Android app Running git-annex linux builds in termux seems to work well enough that the only reason to keep the Android app would be to support Android 4-5, which the old Android app supported, and which I don't know if the termux method works on (although I see no reason why it would not). According to [1], Android 4-5 remains on around 29% of devices, down from 51% one year ago. [1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/ This is a rather large commit, but mostly very straightfoward removal of android ifdefs and patches and associated cruft. Also, removed support for building with very old ghc < 8.0.1, and with yesod < 1.4.3, and without concurrent-output, which were only being used by the cross build. Some documentation specific to the Android app (screenshots etc) needs to be updated still. This commit was sponsored by Brett Eisenberg on Patreon.	2018-10-13 01:41:11 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	1d1054faa6	added -z Added -z option to git-annex commands that use --batch, useful for supporting filenames containing newlines. It only controls input to --batch, the output will still be line delimited unless --json or etc is used to get some other output. While git often makes -z affect both input and output, I don't like trying them together, and making it affect output would have been a significant complication, and also git-annex output is generally not intended to be machine parsed, unless using --json or a format option. Commands that take pairs like "file key" still separate them with a space in --batch mode. All such commands take care to support filenames with spaces when parsing that, so there was no need to change it, and it would have needed significant changes to the batch machinery to separate tose with a null. To make fromkey and registerurl support -z, I had to give them a --batch option. The implicit batch mode they enter when not provided with input parameters does not support -z as that would have complicated option parsing. Seemed better to move these toward using the same --batch as everything else, though the implicit batch mode can still be used. This commit was sponsored by Ole-Morten Duesund on Patreon.	2018-09-20 16:11:47 -04:00
Joey Hess	6091b7b9db	info: Display uuid and description when a repository is identified by uuid, and for "here".	2018-06-24 17:38:18 -04:00
Joey Hess	1c8ee99b46	Fix build with ghc 8.4+, which broke due to the Semigroup Monoid change https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid I am not happy with the fragile pile of CPP boilerplate required to support ghc back to 7.0, which git-annex still targets for both the android build and the standalone build targeting old linux kernels. It makes me unlikely to want to use Semigroup more in git-annex, because the benefit of the abstraction is swamped by the ugliness. I actually considered ripping out all the Semigroup instances, but some are needed to use optparse-applicative. The problem, I think, is they made this transaction on too fast a timeline. (Although ironically, work on it started in 2015 or earlier!) In particular, Debian oldstable is not out of security support, and it's not possible to follow the simpler workarounds documented on the wiki and have it build on oldstable (because the semigroups package in it is too old). I have only tested this build with ghc 8.2.2, not the newer and older versions that branches of the CPP support. So there could be typoes, we'll see. This commit was sponsored by Brock Spratlen on Patreon.	2018-05-30 12:28:43 -04:00
Joey Hess	2fc768ce72	avoid git annex info remote buffering list of keys This leaves git annex unused --from remote still using loggedKeysFor and buffering more than ought to be necessary, but I can't see a way to improve that.	2018-04-26 16:13:05 -04:00
Joey Hess	89e1a05a8f	Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale As long as all code imports Utility.Aeson rather than Data.Aeson, and no Strings that may contain utf-8 characters are used for eg, object keys via T.pack, this is guaranteed to fix the problem everywhere that git-annex generates json. It's kind of annoying to need to wrap ToJSON with a ToJSON', especially since every data type that has a ToJSON instance has to be ported over. However, that only took 50 lines of code, which is worth it to ensure full coverage. I initially tried an alternative approach of a newtype FileEncoded, which had to be used everywhere a String was fed into aeson, and chasing down all the sites would have been far too hard. Did consider creating an intentionally overlapping instance ToJSON String, and letting ghc fail to build anything that passed in a String, but am not sure that wouldn't pollute some library that git-annex depends on that happens to use ToJSON String internally. This commit was supported by the NSF-funded DataLad project.	2018-04-16 16:21:21 -04:00
Joey Hess	6cb5b7294f	info: Changed sorting of numcopies stats table, so it's ordered by the variance from the desired number of copies. Compare these... numcopies stats: numcopies -1: 1986 numcopies +0: 1170 numcopies -2: 769 numcopies +1: 716 numcopies -4: 696 numcopies -3: 485 numcopies -6: 230 numcopies -5: 111 numcopies -7: 91 numcopies -9: 9 numcopies stats: numcopies +1: 716 numcopies +0: 1170 numcopies -1: 1986 numcopies -2: 769 numcopies -3: 485 numcopies -4: 696 numcopies -5: 111 numcopies -6: 230 numcopies -7: 91 numcopies -9: 9 I feel that the former is a jumbled mess that doesn't tell much overall, while the second shows pretty clearly that most files are within 1 degree of the desired number of copies, with some outliers without enough.	2018-04-05 14:54:39 -04:00
Joey Hess	817ebb5765	info: Added "combined size of repositories containing these files" stat when run on a directory This commit was sponsored by andrea rota.	2018-04-05 14:44:58 -04:00
Joey Hess	6583448bab	add --json-error-messages (not yet implemented) Added --json-error-messages option, which includes error messages in the json output, rather than outputting them to stderr. The actual rediretion of errors is not implemented yet, this is only the docs and option plumbing. This commit was supported by the NSF-funded DataLad project.	2018-02-19 14:32:15 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	27eca014be	fix up Read instance incompatability caused by recent commit `9c4650358c` changed the Read instance for Key. I've checked all uses of that instance (by removing it and seeing what breaks), and they're all limited to the webapp, except one. That is GitAnnexDistribution's Read instance. So, `9c4650358c` would have broken upgrades of git-annex from downloads.kitenet.net. Once the .info files there got updated for a new release, old releases would have failed to parse them and never upgraded. To fix this, I found a way to make the .info files that contain GitAnnexDistribution values be readable by the old version of git-annex. This commit was sponsored by Ewen McNeill.	2017-02-24 18:59:12 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	0e30e71e9c	info: Support being passed a treeish, and show info about the annexed files in it similar to how a directory is handled.	2016-09-15 12:51:00 -04:00
Joey Hess	1a0e2c9901	get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/	2016-08-03 12:37:12 -04:00
Joey Hess	f0886a1bdd	info: When run on a file now includes an indication of whether the content is present locally.	2016-07-30 12:29:59 -04:00
Joey Hess	870873bdaa	Removed dependency on json library; all JSON is now handled by aeson. I've eyeballed all --json commands, and the only difference should be that some fields are re-ordered.	2016-07-26 19:15:34 -04:00
Joey Hess	a030d0a8b7	allow using Aeson for streaming JSON output Keeping Text.JSON use for now, because it seems a better fit for most of the commands, which don't use very structured JSON objects, but just output whatever fields suites them. But this lets Aeson be used when a more structured data type is available to serialize to JSON.	2016-07-26 13:30:07 -04:00

1 2

95 commits