git-annex

Author	SHA1	Message	Date
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	6f35b576d7	encourage use of import from directory special remote rather than legacy interface	2019-11-19 13:30:27 -04:00
Joey Hess	0be23bae2f	refactor Better to not have a single function module, and better to have a more specific type than Bool. This commit was sponsored by Jack Hill on Patreon	2019-11-11 19:10:52 -04:00
Joey Hess	3b34d123ed	Added annex.allowsign option. This commit was sponsored by Ilya Shlyakhter on Patreon.	2019-11-11 16:28:56 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions should that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unncessary build the remote list (only when they're run with concurrency enable), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not runnign CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	082e1f1738	Don't try to import .git directories from special remotes Because git does not support storing git repositories inside a git repository.	2019-06-04 15:14:20 -04:00
Joey Hess	e06feb7316	honor preferred content when importing Importing from a special remote honors its preferred content too; unwanted files are not imported. But, some preferred content expressions can't be checked before files are imported, and trying to import with such an expression will fail. Tested this with scenarios including changing the preferred content expression and making sure merging the import didn't delete files that were no longer wanted. There was one minor inefficiency mentioned in the todo that I punted on.	2019-05-21 14:38:06 -04:00
Joey Hess	2d33122215	avoid ingest lockdown file escaping the withOtherTmp call Fixes bug that caused git-annex to fail to add a file when another git-annex process cleaned up the temp directory it was using. Solution is just to push withOtherTmp out to a higher level, so that the whole ingest process can be completed inside it. But in the assistant, that was not practical to do, since withOtherTmp runs in the Annex monad and the assistant does not. Worked around by introducing a separate temp directory that only the assistant uses for lockdown. Since only one assistant can run at a time, it's easy to clean up that directory of old cruft at startup.	2019-05-07 13:04:57 -04:00
Joey Hess	2bd0e07ed8	make merge commit on export that preserves the import history	2019-05-01 13:13:00 -04:00
Joey Hess	1503b86a14	make import tree from remote generate a merge commit This way no history is lost, neither what was exported to the remote, or the history of changes that is imported from it. No complicated correlation of two possibly very different histories is needed, just record what we know and then git merge will do a good job. Also, it notices when the remote tracking branch doesn't need to be updated, and avoids doing anything, so noop remotes are super cheap. The only catch here is that, since the commits generated for imports from the remote don't have a stable date or author/committer, each (non-noop) import generates different commits for the same imported trees. So, when the imported remote tracking branch is merged into master and then a change is imported again, there will be an extra series of commits, which will get more and more expensive each time. This seems to call for making stable commits for imports. Also that seems a good idea to make importing in several repositories have the same result.	2019-04-30 16:13:21 -04:00
Joey Hess	f95f340c73	sync: When listing contents on an import remote fails, proceed with other syncing instead of aborting Switch listContents to being a proper CommandStart, so if it throws an exception, it will be treated like any other command action that fails. downloadImport apparently does not ever throw an exception, and itself uses commandAction, so it can't be a CommandStart.	2019-04-10 17:02:56 -04:00
Joey Hess	5ab97333e4	import: Let --force overwrite symlinks, not only regular files The docs already implied this should work.	2019-03-18 16:40:15 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	c755788256	sync: import when annex-tracking-branch is configured This works, and tested syncing both gets changes from a special remote and sends changes to it, keeping it fully in sync nicely! But have not tried it with a subdir configured.	2019-03-09 13:57:49 -04:00
Joey Hess	ca1a3caaa8	refactor	2019-03-09 13:34:57 -04:00
Joey Hess	d9ee048d85	doc updates for import	2019-03-09 13:10:30 -04:00
Joey Hess	e412129523	concurrency and status messages when downloading from import	2019-03-08 12:33:44 -04:00
Joey Hess	be6085cfe5	fix option parser Alternative doesn't combine the subparsers the way I wanted. Unfortunately this new parser has suboptimal usage because everything is all jumbled together.	2019-03-06 13:10:29 -04:00
Joey Hess	aaacf431d8	handle importtree=yes config For now, it's only allowed when exporttree=yes is also set. That simplified the implementation, but could later be changed if there's a remote that makes sense to be an import but not an export. However, it may work just as well to make a remote be readonly to prevent export to it while still allowing import.	2019-03-04 16:07:35 -04:00
Joey Hess	519cadd1de	refactor RemoteTrackingBranch Not specific to Import; export will use it too.	2019-03-01 14:47:56 -04:00
Joey Hess	d28b0a8bd0	use disconnected history for import tracking branch This avoids the first merge from it deleting all files in the current branch, which was very surpring and unwanted behavior.	2019-03-01 14:33:29 -04:00
Joey Hess	45aacd888b	import downloader complete (untested) Made some api changes. listImportableContents needs to provide the size of the data, so the downloader can check disk free space. retrieveExportWithContentIdentifier is passed the filepath to write to Use temporary "CID" key during download of a ContentIdentifier from a remote, so withTmp can be used and then move the content to the real key once it's known.	2019-02-27 13:15:02 -04:00
Joey Hess	f4b773e9a1	incomplete action to download files from import	2019-02-26 15:25:28 -04:00
Joey Hess	b6e2a5e9c2	reorg	2019-02-26 14:22:08 -04:00
Joey Hess	e4e464da65	import command is updating tracking branch	2019-02-26 13:15:48 -04:00
Joey Hess	5afe4135c2	import --from option parsing	2019-02-26 12:06:19 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	6583448bab	add --json-error-messages (not yet implemented) Added --json-error-messages option, which includes error messages in the json output, rather than outputting them to stderr. The actual rediretion of errors is not implemented yet, this is only the docs and option plumbing. This commit was supported by the NSF-funded DataLad project.	2018-02-19 14:32:15 -04:00
Joey Hess	c1ece47ea0	import --reinject-duplicates This is the same as running git annex reinject --known, followed by git-annex import. The advantage to having it in one command is that it only has to hash each file once; the two commands have to hash the imported files a second time. This commit was sponsored by Shane-o on Patreon.	2017-02-09 15:41:00 -04:00
Joey Hess	f617988a29	Make import --deduplicate and --skip-duplicates only hash once, not twice import: --deduplicate and --skip-duplicates were implemented inneficiently; they unncessarily hashed each file twice. They have been improved to only hash once. The new approach is to lock down (minimally) and hash files, and then reuse that information when importing them. This was rather tricky, especially in detecting changes to files while they are being imported. The output of import changed slightly. While before it silently skipped over files with eg --skip-duplicates, now it shows each file as it starts to act on it. Since every file is hashed first thing, it would otherwise not be clear what file import is chewing on. (Actually, it wasn't clear before when any of the duplicates switches were used.) This commit was sponsored by Alexander Thompson on Patreon.	2017-02-09 15:32:22 -04:00
Joey Hess	e7e36b6e72	import: Changed how --deduplicate, --skip-duplicates, and --clean-duplicates determine if a file is a duplicate Before, only content known to be present somewhere was considered a duplicate. Now, any content that has been annexed before will be considered a duplicate, even if all annexed copies of the data have been lost. Note that --clean-duplicates and --deduplicate still check numcopies, so won't delete duplicate files unless there's an annexed copy. This makes import use the same method as reinject --known. The man page already said that duplicate meant "its content is either present in the local repository already, or git-annex knows of another repository that contains it, or it was present in the annex before but has been removed now". So, this is really only bringing the implementation into line with the man page. This commit was sponsored by Jochen Bartl on Patreon.	2017-02-07 17:41:58 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	d37fe6a547	annex.largefiles can be configured in .gitattributes too This is particulary useful for v6 repositories, since the .gitattributes configuration will apply in all clones of the repository.	2016-02-02 15:18:17 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	7b8e79c0f0	add, import: Support --json output. Include added key in output.	2016-01-19 11:56:38 -04:00
Joey Hess	f16e235983	addurl, importfeed: Changed to honor annex.largefiles settings, when the content of the url is downloaded. (Not when using --fast or --relaxed.) importfeed just calls addurl functions, so inherits this from it. Note that addurl still generates a temp file, and uses that key to download the file. It just adds it to the work tree at the end when the file is small.	2015-12-02 15:12:33 -04:00
Joey Hess	dc8099872a	import: Changed to honor annex.largefiles settings.	2015-12-02 14:49:03 -04:00
Joey Hess	53db9d0b5c	work around git check-ignore --batch bad exit status bug, and bring back import -J	2015-11-06 15:39:51 -04:00
Joey Hess	362ab39aad	import -J fails at the end, disable util it can be fixed	2015-11-05 18:48:46 -04:00
Joey Hess	7dc90f2225	import: Avoid very ugly error messages when the directory files are imported to is not a directort, but perhaps an annexed file.	2015-11-05 18:46:05 -04:00
Joey Hess	5db7d435e7	-J for add/addurl/import	2015-11-05 18:24:15 -04:00

1 2

93 commits