git-annex

Author	SHA1	Message	Date
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	5f391179f1	use RawFilePath getFileStatus for speed Only done on those calls to getFileStatus that had a RawFilePath, not a FilePath. The others would probably be just as fast if converted to use it with toRawFilePath, but I'm not 100% sure. Note that genInodeCache' uses fromRawFilePath, but that value only gets used on Windows, so on unix the thunk will never be evaluated.	2019-12-06 14:44:42 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	3a0842d9f8	fix bug introduced in direct mode conversion oops, the code was "if direct && not present" and I removed the direct which made the wrong path be taken.	2019-08-27 12:29:05 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions should that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unncessary build the remote list (only when they're run with concurrency enable), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	d5ee5fef65	fsck: Detect situations where annex.thin has caused data loss to the content of locked files. In particular, when two files had the same content, and one was unlocked and modified, with annex.thin that can corrupt the content of the annex object, and so fsck on the other file should detect that. getKeyStatus was relying on Database.Keys.getAssociatedFiles to tell when a file is unlocked, but that can false positive because the database can list old associated files. Instead, separate out the case of unlocked object which has multiple hardlinks when annex.thin is in use.	2019-03-18 15:59:43 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	5d98cba923	use ByteStrings when reading annex symlinks and pointers Now there's a ByteString used all the way from disk to Key. The main complication in this conversion was the use of fromInternalGitPath in several places to munge things on Windows. The things that used that were changed to parse the ByteString using either path separator. Also some code that had read from files to a String lazily was changed to read a minimal strict ByteString.	2019-01-14 15:37:08 -04:00
Joey Hess	0acbbf208f	use fileKey here This doesn't change behavior in any way worth mentioning, but it's the right thing to do.	2019-01-14 13:22:33 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	2e9f128dea	moved module and relicensed	2018-10-29 23:13:36 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	2da2ae0919	fix migration bug and make fsck warn * migrate: Fix bug in migration between eg SHA256 and SHA256E, that caused the extension to be included in SHA256 keys, and omitted from SHA256E keys. (Bug introduced in version 6.20170214) * migrate: Check for above bug when migrating from SHA256 to SHA256 (and same for SHA1 to SHA1 etc), and remove the extension that should not be in the SHA256 key. * fsck: Detect and warn when keys need an upgrade, either to fix up from the above migrate bug, or to add missing size information (a long ago transition), or because of a few other past key related bugs. This commit was sponsored by Henrik Riomar on Patreon.	2018-05-23 14:07:51 -04:00
Joey Hess	6583448bab	add --json-error-messages (not yet implemented) Added --json-error-messages option, which includes error messages in the json output, rather than outputting them to stderr. The actual rediretion of errors is not implemented yet, this is only the docs and option plumbing. This commit was supported by the NSF-funded DataLad project.	2018-02-19 14:32:15 -04:00
Joey Hess	42ba888875	optimise for case where there are no required contents Avoid reading location log in this case.	2018-02-08 14:16:00 -04:00
Joey Hess	7f5c6a28a6	fsck: Warn when required content is not present in the repository that requires it. This commit was sponsored by Jack Hill on Patreon.	2018-02-08 14:08:41 -04:00
Joey Hess	24df95f0f6	Fix several places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account. git grep writeFile finds some more that might also be problems, but for now I've concentrated on .git/annex/ log files. There are certianly cases where writeFile is not a problem too. This commit was sponsored by mo on Patreon.	2018-01-02 17:25:25 -04:00
Joey Hess	fc845e6530	more lambda-case conversion	2017-12-05 15:00:50 -04:00
Joey Hess	4781ca297b	showStart variant for when there's no worktree file Clean up some uses of showStart with "" for the file, or in some cases, a non-filename description string. That would generate bad json, although none of the commands doing that supported --json. Using "" for the file resulted in output like "foo rest"; now the extra space is eliminated. This commit was sponsored by Fernando Jimenez on Patreon.	2017-11-28 15:14:16 -04:00
Joey Hess	85ed38a574	Avoid repeated checking that files passed on the command line exist. git annex add, git annex lock etc make multiple seek passes, and each seek pass checked that files existed. That was unncessary redundant work. Fixed by adding a new WorkTreeItem type, make seek actions use it, and check that the files exist when constructing it. This commit was supported by the NSF-funded DataLad project.	2017-10-16 14:10:20 -04:00
Joey Hess	81a861326d	fsck: Support --json. One use case is to get a list of files that fsck fails on, in order to eg, drop them from a remote. This commit was sponsored by Nick Daly on Patreon.	2017-06-26 13:40:57 -04:00
Joey Hess	5ee6912cf3	support parsing options like --to=here Reworked remote name parsing to allow things like that. Command.Move uses it for --to=here, although there's not yet an implementation of that option. This commit was sponsored by Ignacio on Patreon.	2017-05-31 16:49:28 -04:00
Joey Hess	55b178a6ba	minor cleanup	2017-03-10 15:03:33 -04:00
Joey Hess	71a05b0d25	use ActionItem rather than String This changes fsck -A warnings to include the name of the key, which is a bit redundant in one case, but was missing in another case.	2017-03-10 14:13:10 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	f90e2d0893	fix fsck bug introduced in `301aff34c4` Got two Maybe FilePaths crossed. Test suite caught it. Slightly improved types to avoid this mistake.	2017-03-10 12:11:00 -04:00
Joey Hess	301aff34c4	fsck -q: When a file has bad content, include the name of the file in the warning message. This commit was sponsored by Alexander Thompson on Patreon.	2017-03-08 15:15:20 -04:00
Joey Hess	942e0174b3	make fsck check annex.securehashesonly, and new tip for working around SHA1 collisions with git-annex This commit was sponsored by andrea rota.	2017-02-27 13:55:15 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	2577f1c0a2	fsck --all --from was checking the content of files in the local repository, rather than on the special remote. Straight up forgot to handle this case! This commit was sponsored by Fernando Jimenez on Patreon.	2016-11-16 15:33:57 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	1a0e2c9901	get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/	2016-08-03 12:37:12 -04:00
Joey Hess	d13194b230	--branch, stage 2 Show branch:file that is being operated on. I had to make ActionItem a type and not a type class because withKeyOptions' passed two different types of values when using the type class, and I could not get the type checker to accept that.	2016-07-20 15:23:43 -04:00
Joey Hess	5642daa651	fsck: Fix a reversion in direct mode fsck of a file that is present when the location log thinks it is not. Reversion introduced in version 5.20151208.	2016-07-12 13:41:03 -04:00
Joey Hess	ae65aecb0b	fsck: When a key is not previously known in the location log, record something so that reinject --known will work.	2016-05-10 13:20:45 -04:00
Joey Hess	9169234c34	fix overindent	2016-05-10 13:08:24 -04:00
Joey Hess	bd516af734	fsck: Warn when core.sharedRepository is set and an annex object file's write bit is not set and cannot be set due to the file being owned by a different user. Made all Annex.Perms file mode changing functions ignore errors when core.sharedRepository is set, because the file might be owned by someone else. I don't fancy getting bug reports about crashes due to set modes in this configuration, which is a very foot-shooty configuration in the first place. The fsck warning is necessary because old repos kept files mode 444, which doesn't allow locking them, and so if the mode remains 444 due to the file being owned by someone else, the user should be told about it.	2016-04-14 15:36:53 -04:00
Joey Hess	b7c8bf5274	Preserve execute bits of unlocked files in v6 mode. When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.	2016-04-14 14:47:08 -04:00
Joey Hess	ec198fec83	fsck: When the only copy of a file is in a dead repository, mention the repository.	2016-02-19 15:12:11 -04:00
Joey Hess	885e54df0a	fsck: Populate unlocked files in v6 repositories whose content is present in annex/objects but didn't reach the work tree. This also handles fixing up after `cf260d9a15`	2016-02-14 17:27:50 -04:00
Joey Hess	675321264f	fsck: Detect and fix missing associated file mappings in v6 repositories. This also handles fixing up after the bad data written by `cf260d9a15`.	2016-02-14 17:09:54 -04:00
Joey Hess	74bbdfa888	files with only 1 linkCount may still be unlocked When on crippled filesystem, or without annex.thin set.	2016-02-14 17:04:09 -04:00
Joey Hess	5b51db7645	clean up	2016-02-14 16:52:43 -04:00

1 2 3 4 5

249 commits