git-annex

Author	SHA1	Message	Date
Joey Hess	ccfa9b2dc4	make sync update --unlock-present branch	2020-11-13 15:04:34 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion. Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	c56efbbdb6	import: Check gitignores when importing trees from special remotes. It seemed best to do this, for consistency with every other way files can get into a git-annex repo. It's just a bit strange, though, that a local .gitignore file affects the pseudo-commits made for the remote that's imported from. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-30 10:41:59 -04:00
Joey Hess	658ea7ca3c	sync --no-content import from directory special remote. sync: When run without --content, import without copying from importtree=yes directory special remotes. (Other special remotes may support this later as well.) This commit was sponsored by Svenne Krap on Patreon.	2020-09-28 15:29:08 -04:00
Joey Hess	3eaaec3113	consistently use importKey when available. This avoids import with --no-content and with --content potentially generating two different trees, leading to a merge conflict when run in two different clones of a repo. And it's necessary groundwork to make git-annex sync --no-content import from special remotes that support importKey. Only the directory special remote currently supports importKey, and it generates the same key as git-annex usually does, so there is no behavior change for it. Future special remotes will need to take care when adding importKey, if it generates different keys. Added some warnings about that to comments. This commit was sponsored by Noam Kremen on Patreon.	2020-09-28 15:27:46 -04:00
Joey Hess	f624876dc2	remove zombie process in file seeking. This was the last one marked as a zombie. There might be others I don't know about, but except for the hypothetical case of a thread dying due to an async exception before it can wait on a process it started, I don't know of any. It would probably be safe to remove the reapZombies now, but let's wait and do that in its own commit in case it turns out to cause problems. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-25 11:38:42 -04:00
Joey Hess	051e16a945	remove debug print	2020-09-24 15:37:39 -04:00
Joey Hess	d89984b121	sync --all avoid unnecessary first pass. Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	b45b37b088	wait for first pass to complete before second pass. Otherwise the bloom filter may not be fully populated when the second pass starts, which could have led to incorrect behavior with --all -J, probably in very rare circumstances.	2020-09-24 14:23:25 -04:00
Joey Hess	167da965b9	remove obsolete comment	2020-09-24 14:22:56 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used). No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	f4c4b89aa3	refactor. Make all calls to git merge go through autoMergeFrom, in preparation for fine-tuning git merge's config for automatic merge conflict resolution. This commit was sponsored by Ryan Newton on Patreon.	2020-09-07 13:26:16 -04:00
Joey Hess	7bdb0cdc0d	add gitAnnexChildProcess and use it instead of the incorrect use of runsGitAnnexChildProcess. Fixes reversion in 8.20200617 that made enabling annex.pidlock cause some commands to stall, particularly those needing to autoinit. Renamed runsGitAnnexChildProcess to make clearer where it should be used. Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command.	2020-08-25 14:57:49 -04:00
Joey Hess	5d380c6c5c	when workTreeItems finds a problem with a parameter, don't go on to process it. Part of workTreeItems is trying to detect a case where git porcelain refuses to process a file, and where git ls-files silently outputs nothing. But it's hard to perfectly replicate git's behavior, and besides, git's behavior could change. So it could be that we warn, but then git ls-files does not skip over it, and so git-annex also processes it after warning about it. So, if we think we have a problem with a parameter, display the warning, and skip processing it at all. Implementing this was complicated by needing to handle the case where all command-line parameters get filtered out this way, which is different from the case where there are none, because we don't want to operate on all files in this new case.	2020-08-06 13:47:45 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching. Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	df58609804	convert sync to use seekFilteredKeys. This only speeds up sync --content from 34.75 to 33.17 seconds; location log precaching will probably be a bigger win.	2020-07-13 15:02:52 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer. This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote. Only supported by some special remotes: directory. I need to check the rest, and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports.	2020-07-03 13:41:57 -04:00
Joey Hess	96f6aa39dd	add runsGitAnnexChildProcess calls. These are all the calls to git-annex that seem capable of possibly locking the same pidlock as their parent, except possibly for some in the assistant.	2020-06-17 15:31:03 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan. Added the annex.skipunknown git config, which can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	3824645368	change to new waitForAllRunningCommandActions. waitForAllRunningCommandActions is a subset of finishCommandActions and more appropriate for what is being done here: Just a concurrency barrier.	2020-05-26 14:00:51 -04:00
Joey Hess	e04a931439	improve transfer stages for some commands. move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup actions in a separate job pool from uploads. transferStages was confusingly named; it's only useful when doing downloads, as then the verify actions can be run concurrently with other downloads. For commands that upload, there will be more concurrency from running cleanup actions in a separate job pool. As for sync, I left it using downloadStages, although that's not optimal for the part of a sync that uploads. Perhaps it should use the union of both?	2020-05-26 11:55:50 -04:00
Joey Hess	0040d2c129	sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from. Now the warning gets displayed, which is better than an arcane git error. The warning is still kind of ugly, especially when the pull later in the sync will clear up what it warns about. But this is an unusual situation not likely to happen, and if there is no remote to pull from, the warning message is needed or the sync will seem to succeed despite not merging the synced master branch. It would still be better if it could merge the synced master branch in this situation, but making an empty commit to master to do it seems wrong, and otherwise it would need a whole separate code path that would bypass git merge in favor of, say, setting master to the synced branch. That would bypass git configs like, arguably, merge.ff and, certainly, merge.verifySignatures. So I don't want to do that.	2020-05-05 14:31:37 -04:00
Joey Hess	c05c4e549e	sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes. Do not sync with a faster remote that was not specified. That old behavior was only documented in the changelog, and was certainly surprising. It also meant adding --fast made it slower.	2020-04-23 16:08:45 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem. Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unnecessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	c0cd07c36b	Ref ByteString conversion done. Test suite passes.	2020-04-07 17:41:09 -04:00
Joey Hess	87d5583a91	use programPath consistently, not readProgramFile. Improve git-annex's ability to find the path to its program, especially when it needs to run itself in another repo to upgrade it. Some parts of the code used readProgramFile, probably because I forgot that programPath exists. I noticed this when a git-annex auto-upgrade failed because it was running git-annex upgrade --autoonly, but the code to run git-annex used readProgramFile, which happened to point to an older build of git-annex.	2020-03-30 16:06:27 -04:00
Joey Hess	79a0435b77	automate remote.name.skipFetchAll. initremote, enableremote: Set remote.name.skipFetchAll when the remote cannot be fetched from by git, so git fetch --all will not try to use it.	2020-02-19 13:58:26 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework. remoteAnnexConfig will avoid bugs like `a3a674d15b`. Use the now more generic remoteConfig in a couple of places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	72959b23e5	remove mention of receive.denyNonFastforwards on push failure. That was added back in 2013 commit `2af652e1b8`, and I'm a bit unclear about the reasons. It seemed that, at the time, receive.denyNonFastforwards=true caused problems syncing; that setting is the default in a repo created by git init --shared --bare (but not without --shared), which is how the assistant created repos. But even at the time, the bug report showed an error message clearly explaining that it was a non-fast-forward push being denied. I tried it with the current version, and since git-annex sync pulls from the bare repo and merges, it pushes a fast-forward. So there's no failure to push. (There could be one if another push happened after the pull, but you'd presumably want it to fail then.) I'm not 100% sure what changed to make it not be a problem, but I know I've seen this message in many circumstances, and I can't ever recall it having anything to do with any issue that prevented a push. Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn, which showed the problem when syncing from a direct mode repo, and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment, which seems to show the problem was actually a problem pulling, I think there's a good chance that the problem actually involved direct mode.	2020-02-19 11:46:24 -04:00
Joey Hess	06f6eb7a70	--only-annex --no-content combination	2020-02-18 12:29:31 -04:00
Joey Hess	a78eb6dd58	sync --only-annex and annex.synconlyannex * Added sync --only-annex, which syncs the git-annex branch and annexed content but leaves managing the other git branches up to you. * Added annex.synconlyannex git config setting, which can also be set with git-annex config to configure sync in all clones of the repo. The use case is when the user has their own git workflow and wants to use git-annex without disrupting that, so they sync --only-annex to get the git-annex stuff in sync in addition to their usual git workflow. When annex.synconlyannex is set, --not-only-annex can be used to override it. It's not entirely clear what --only-annex --commit or --only-annex --push should do, and I left that combination undocumented because I don't know if I might want to change the current behavior, which is that such options do not override the --only-annex. My gut feeling is that there are no good reasons to use such combinations; if you want to use your own git workflow, you'll be doing your own committing and pulling and pushing. A subtle question is, how should import/export special remotes be handled? Importing updates their remote tracking branch and merges it into master. If --only-annex prevented that git branch stuff, then it would prevent exporting to the special remote in the case where it has changes that were not imported yet, because there would be an unresolved conflict. I decided that it's best to treat the fact that there's a remote tracking branch for import/export as an implementation detail in this case. The more important thing is that an import/export special remote is entirely annexed content, and so it makes a lot of sense that --only-annex will still sync with it.	2020-02-17 16:33:10 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working. Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath. Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again. A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, which I added to the todo.	2019-12-04 13:45:18 -04:00
Joey Hess	99b509572d	post-receive hook updateInstead emulation cleanup. The code is only needed because for a long time, git-annex didn't install hooks in repos on crippled filesystems. Now it does, and they work at least on FAT (where all files are executable) and Windows. It would be possible to remove this code in v8 simply by re-installing the hooks.	2019-09-11 14:41:51 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode. A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying. This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has an Annex state for the remote, with no worker pool, so the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. That's because it's reading back from, e.g., a removable disk to checksum, which will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transfer. So I removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type. Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that, while the Annex state used to be duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. The only effect of this is that commands that end up not needing to perform any actions unnecessarily build the remote list (only when they're run with concurrency enabled), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	04cc470201	run download checksum verification in separate job pool. get, move, copy, sync: When -J or annex.jobs has enabled concurrency, checksum verification uses a separate job pool from the one used for downloads, to keep bandwidth saturated. Not yet done for upload checksum verification, but that only affects remotes on local disks.	2019-06-17 14:58:02 -04:00
Joey Hess	ba2551da6f	add startingNoMessage. Fixes the last wart in the StartMessage transition. A few commands include other CommandStart actions that generate output, and do not themselves need to display a start/end message.	2019-06-12 14:11:23 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition. The hoped-for optimisation of CommandStart with -J did not materialize. In fact, not running CommandStart in parallel is slower than -J3. So, CommandStarts are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem, which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, and so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage. The goal is to be able to run CommandStart in the main thread when -J is used, rather than unnecessarily passing it off to a worker thread, which incurs overhead that is significant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does.
Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	568af1073e	filter exported tree through remote's preferred content setting. The filtering is fairly efficient as far as building the trees goes, since it reuses adjustTree. But it still needs to traverse the whole tree, and look up the keys used by every file. The tree that gets recorded to export.log is the filtered tree. This way, resuming an interrupted sync to an export uses it without needing to recalculate it. And a change to the preferred content settings of the remote will result in a different tree, so the export will be updated accordingly. The original tree is still used in the remote tracking branch. That branch represents the special remote as a git remote, and if it were a normal git remote, the tree in its head would not be affected by preferred content.	2019-05-20 11:54:55 -04:00
Joey Hess	3d6f1b7dba	Made git-annex sync --content much faster when all the remotes it's syncing with are export/import remotes. It was unnecessarily going over all files and checking preferred content against no remotes.	2019-04-10 12:42:10 -04:00
Joey Hess	37041b629d	improve messages around export/import conflicts. A conflict can be caused by either export or import when the remote supports both.	2019-04-09 13:03:59 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL. This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	28e46d947a	avoid sync --content trying to sendKey to exporttree remotes	2019-03-11 14:09:46 -04:00

290 commits
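Several commits in this history (04cc470201, 9d36c826c0, 53882ab4a7 and e04a931439) describe running checksum verification in a job pool separate from the one used for downloads, so that bandwidth stays saturated while earlier downloads are being verified. The Python sketch below illustrates that idea under simple assumptions. It is not git-annex's implementation, which is Haskell and built around WorkerStages, and `download`, `verify` and `get_all` are hypothetical stand-ins for a real transfer and checksum step.

```python
import hashlib
from concurrent.futures import Future, ThreadPoolExecutor


def download(key: str) -> bytes:
    # Hypothetical stand-in for a transfer: the "content" is derived from the key.
    return key.encode() * 1000


def verify(content: bytes, expected: str) -> bool:
    # Hypothetical stand-in for checksum verification of downloaded content.
    return hashlib.sha256(content).hexdigest() == expected


def get_all(expected, download_jobs=2, verify_jobs=2):
    """Download every key in `expected`, verifying each one in a separate pool.

    Returns a dict mapping each key to whether its checksum matched.
    """
    with ThreadPoolExecutor(max_workers=verify_jobs) as verify_pool, \
         ThreadPoolExecutor(max_workers=download_jobs) as download_pool:

        def transfer_stage(key: str) -> Future:
            content = download(key)
            # Hand verification off to the other pool; this download worker
            # is released immediately and can start the next transfer.
            return verify_pool.submit(verify, content, expected[key])

        pending = {key: download_pool.submit(transfer_stage, key)
                   for key in expected}
        # Each download future resolves to a verification future.
        return {key: fut.result().result() for key, fut in pending.items()}


if __name__ == "__main__":
    keys = ["a", "b", "c"]
    expected = {k: hashlib.sha256(download(k)).hexdigest() for k in keys}
    expected["b"] = "0" * 64  # simulate a corrupted download of "b"
    print(get_all(expected))  # prints {'a': True, 'b': False, 'c': True}
```

Because the pools are separate, a slow verification can only delay other verifications, never the next download, which is the trade-off the commits above aim for.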