git-annex

Author	SHA1	Message	Date
Joey Hess	99b509572d	post-receive hook updateInstead emulation cleanup The code is only needed because for a long time, git-annex didn't install hooks in repos on crippled filesystems. Now it does, and they work at least on FAT (where all files are executable) and Windows. It would be possible to remove this code in v8 simply by re-installing the hooks.	2019-09-11 14:41:51 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has an Annex state for the remote, with no worker pool. So the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. Because, it's reading back from eg, a removable disk to checksum. That will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transfer. So, removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unnecessarily build the remote list (only when they're run with concurrency enabled), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	04cc470201	run download checksum verification in separate job pool get, move, copy, sync: When -J or annex.jobs has enabled concurrency, checksum verification uses a separate job pool than is used for downloads, to keep bandwidth saturated. Not yet done for upload checksum verification, but that only affects remotes on local disks.	2019-06-17 14:58:02 -04:00
Joey Hess	ba2551da6f	add startingNoMessage Fixes the last wart in the StartMessage transition. A few commands include other CommandStart actions that generate output, and do not themselves need to display a start/end message.	2019-06-12 14:11:23 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not running CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unnecessarily passing it off to a worker thread, which incurs overhead that is significant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	568af1073e	filter exported tree through remote's preferred content setting The filtering is fairly efficient as far as building the trees goes, since it reuses adjustTree. But it still needs to traverse the whole tree, and look up the keys used by every file. The tree that gets recorded to export.log is the filtered tree. This way resumes of interrupted sync to an export uses it without needing to recalculate it. And, a change to the preferred content settings of the remote will result in a different tree, so the export will be updated accordingly. The original tree is still used in the remote tracking branch. That branch represents the special remote as a git remote, and if it were a normal git remote, the tree in its head would not be affected by preferred content.	2019-05-20 11:54:55 -04:00
Joey Hess	3d6f1b7dba	Made git-annex sync --content much faster when all the remotes it's syncing with are export/import remotes It was unnecessarily going over all files and checking preferred content against no remotes.	2019-04-10 12:42:10 -04:00
Joey Hess	37041b629d	improve messages around export/import conflicts A conflict can be caused by either export or import when the remote supports both.	2019-04-09 13:03:59 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	28e46d947a	avoid sync --content trying to sendKey to exporttree remotes	2019-03-11 14:09:46 -04:00
Joey Hess	057999f0fc	fix sync --content with remote.name.annex-tracking-branch=master:subdir It was exporting the whole tree not just the subdir. Now tested fully working in both directions.	2019-03-11 14:07:52 -04:00
Joey Hess	8ae0db925b	fix name of annex-tracking-branch config	2019-03-11 13:56:59 -04:00
Joey Hess	c755788256	sync: import when annex-tracking-branch is configured This works, and tested syncing both gets changes from a special remote and sends changes to it, keeping it fully in sync nicely! But have not tried it with a subdir configured.	2019-03-09 13:57:49 -04:00
Joey Hess	633021e135	--no-push and remote.name.annex-push prevent exporting trees to special remotes Users may want sync to only export, or only import and this is broadly analogous to push and pull, so it makes sense to use the same configuration for it.	2019-03-09 13:21:49 -04:00
Joey Hess	e3a704224f	fix export db locking deadlock	2019-03-07 16:06:02 -04:00
Joey Hess	18d7a1dbbb	make export and sync update special remote tracking branch The branch is only updated once the export is 100% complete. This way, if an export is started but interrupted and so the remote does not yet contain some of the files, an import will make a commit on the old branch, and so won't delete the missing files.	2019-03-01 16:35:48 -04:00
Joey Hess	4747fa923d	export: Deprecated the --tracking option. Instead, users can configure remote.<name>.annex-tracking-branch themselves.	2019-02-23 15:54:33 -04:00
Joey Hess	9cebfd7002	purify exportActions Purifying exportActions will allow introspecting and modifying it, which is needed to add progress bar display to it. Only S3 and WebDAV ran an Annex action while constructing ExportActions. There was a small performance gain from them doing that, since a resource was able to be prepared and reused for multiple actions by Command.Export. As seen in commit `809cfbbd8a` and `5d394023eb` S3 and WebDAV actually create a new handle for each access in normal, non-export use. It doesn't seem worth making export use of them marginally more efficient than normal use. It would be better to do that work upfront when constructing the remote. Or perhaps use a MVar to cache a handle. This commit was sponsored by Nick Piper on Patreon.	2019-01-30 15:11:40 -04:00
Joey Hess	ad1d422dd7	fix false positive in export conflict detection Like the earlier fixed one in Command.Export, it occurred when the same tree was exported by multiple clones. Previous fix was incomplete since several other places looked at the list of exported trees to detect when there was an export conflict. Added a single unified function to avoid missing any places it needed to be fixed. This commit was sponsored by mo on Patreon.	2019-01-30 12:36:30 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	6d381df0e6	sync --content: Fix dropping unwanted content from the local repository This fixes a bug with the numcopies counting when using sync --content. It did not always pass the local repo uuid to handleDropsFrom, and so the numcopies counting was off by one, and unwanted local content would only be dropped when there were numcopies+1 remote copies. Also, support dropping local content that has reached an exporttree remote that is not untrusted (currently only S3 remotes with versioning).	2018-12-18 13:58:12 -04:00
Joey Hess	d65df7ab21	improve messages around export conflicts When an export conflict prevents accessing a special remote, be clearer about what the problem is and how to resolve it. This commit was sponsored by Trenton Cronholm on Patreon.	2018-11-13 15:50:06 -04:00
Joey Hess	4a6ebb1034	make sync update adjusted branch to hide/unhide This completes initial support for --hide-missing, although the assistant still needs to be updated and it perhaps needs to be sped up, and maybe there needs to be a way for git-annex get to operate on missing files. Opened some more todos for those things. This commit was sponsored by Henrik Riomar.	2018-10-20 14:22:28 -04:00
Joey Hess	4a788fbb3b	sync --content now supports --hide-missing adjusted branches This relies on git ls-files --with-tree, which I'm using in a way that its man page does not document. Hm. I emailed the git list to try to get the docs improved, but at least the git test suite does test the same kind of use case I'm using here. Performance impact when not in an adjusted branch is limited to some additional MVar accesses, and a single git call to determine the name of the current branch. So very minimal. When in an adjusted branch, the performance impact is in Annex.WorkTree.lookupFile, which starts doing an equal amount of work for files that didn't exist as it already did for files that were unlocked. This commit was sponsored by Jochen Bartl on Patreon.	2018-10-19 17:51:25 -04:00
Joey Hess	8be5a7269a	refactor getCurrentBranch Both Command.Sync and Annex.Ingest had their own versions of this. The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem to need it. That is only checking to see if it's in an adjusted unlocked branch, and when in an adjusted branch, the branch does in fact exist, so the added check that Git.Branch.current does is fine. This commit was sponsored by Denis Dzyubenko on Patreon.	2018-10-19 17:29:18 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	9adee3f2fb	sync: Warn when a remote's export is not updated to the current tree because export tracking is not configured. Only display the warning when the current branch has a tree that is not the same as the tree in the export. Note that it doesn't check to see if the current tree is in incompleteExportedTreeish; it might be worth checking that and reminding the user about an incomplete export, but when export tracking is not configured, they are probably not in the right clone of the repository to resolve the incomplete export. This commit was sponsored by Ethan Aubin.	2018-09-27 15:41:18 -04:00
Joey Hess	ae11394efa	added annex.commitmessage Added annex.commitmessage config that can specify a commit message for the git-annex branch instead of the usual "update". This commit was supported by the NSF-funded DataLad project.	2018-08-02 14:06:06 -04:00
Joey Hess	fd5a392006	cache remotes via annex-speculate-present Added remote.name.annex-speculate-present config that can be used to make cache remotes. Implemented it in Remote.keyPossibilities, which is used by the get/move/copy/mirror commands, and nothing else. This way, things like whereis will not show content that's speculatively present. The assistant and sync --content were not using Remote.keyPossibilities, and were changed to use it. The efficiency hit should be small; Remote.keyPossibilities is only used before transferring a file, which is the expensive operation. And, it's only doing one lookup of the remoteList and a very cheap filter over it. Note that, git-annex still updates the location log when copying content to a remote with annex-speculate-present set. In this case, the location tracking will indicate that content is present in the remote. This may not be wanted for caches, or may not be a real problem for them. TBD. This commit was supported by the NSF-funded DataLad project.	2018-08-01 14:28:05 -04:00
Joey Hess	a5f598a6aa	remove use of remoteGitConfig Unfortunately one more use remains.. This should be just as fast as the other method. The remote's Git.Repo has already had its config read, so Annex.new's call to Git.Config.read is a noop. This commit was sponsored by andrea rota.	2018-06-05 13:15:04 -04:00
Joey Hess	67e46229a5	change Remote.repo to Remote.getRepo This is groundwork for letting a repo be instantiated the first time it's actually used, instead of at startup. The only behavior change is that some old special cases for xmpp remotes were removed. Where before git-annex silently did nothing with those no-longer supported remotes, it may now fail in some way. The additional IO action should have no performance impact as long as it's simply return. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2018-06-04 15:30:26 -04:00
Joey Hess	af8546990d	move: --safe/--unsafe and potential drop race fix move: Added --safe option, which makes move honor numcopies settings. Also --unsafe enables the default behavior, anticipating that the default may one day change. This commit was sponsored by Ethan Aubin.	2018-04-09 16:20:10 -04:00
Joey Hess	db057dcff0	fix sync bug in direct mode sync: Fix bug that prevented pulling changes into direct mode repositories that were committed to remotes using git commit rather than git-annex sync. This commit was supported by the NSF-funded DataLad project.	2018-02-26 14:10:03 -04:00
Joey Hess	25703e1413	finally really add back custom-setup stanza Fourth or fifth try at this and finally found a way to make it work. Absurd amount of busy-work forced on me by change in cabal's behavior. Split up Utility modules that need posix stuff out of ones used by Setup. Various other hacks around inability for Setup to use anything that ifdefs a use of unix. Probably lost a full day of my life to this. This is how build systems make their users hate them. Just saying.	2017-12-31 16:36:39 -04:00
Joey Hess	4781ca297b	showStart variant for when there's no worktree file Clean up some uses of showStart with "" for the file, or in some cases, a non-filename description string. That would generate bad json, although none of the commands doing that supported --json. Using "" for the file resulted in output like "foo rest"; now the extra space is eliminated. This commit was sponsored by Fernando Jimenez on Patreon.	2017-11-28 15:14:16 -04:00
Joey Hess	e1ac299ad0	better dup key with -J fix This avoids all the complication about redundant work discussed in the previous try at fixing this. At the expense of needing each command that could have the problem to be patched to simply wrap the action in onlyActionOn once the key is known. But there do not seem to be many such commands. onlyActionOn' should not be used with a CommandStart (or CommandPerform), although the types do allow it. onlyActionOn handles running the whole CommandStart chain. I couldn't immediately see a way to avoid mistaken use of onlyActionOn'. This commit was supported by the NSF-funded DataLad project.	2017-10-17 18:48:53 -04:00
Joey Hess	85ed38a574	Avoid repeated checking that files passed on the command line exist. git annex add, git annex lock etc make multiple seek passes, and each seek pass checked that files existed. That was unnecessary redundant work. Fixed by adding a new WorkTreeItem type, make seek actions use it, and check that the files exist when constructing it. This commit was supported by the NSF-funded DataLad project.	2017-10-16 14:10:20 -04:00
Joey Hess	e8c9a5c515	sync: Added --cleanup, which removes local and remote synced/ branches. Also deletes any tagged pushes that the assistant might have done, since those would also prevent resetting a branch back. This commit was sponsored by andrea rota.	2017-09-28 14:58:48 -04:00
Joey Hess	d71c65ca0a	add exporter thread to assistant This is similar to the pusher thread, but a separate thread because git pushes can be done in parallel with exports, and updating a big export should not prevent other git pushes going out in the meantime. The exportThread only runs at most every 30 seconds, since updating an export is more expensive than pushing. This may need to be tuned. Added a separate channel for export commits; the committer records a commit in that channel. Also, reconnectRemotes records a dummy commit, to make the exporter thread wake up and make sure all exports are up-to-date. So, connecting a drive with a directory special remote export will immediately update it, and getting online will automatically update S3 and WebDAV exports. The transfer queue is not involved in exports. Instead, failed exports are retried much like failed pushes. This commit was sponsored by Ewen McNeill.	2017-09-20 15:29:13 -04:00
Joey Hess	2e69efea8d	git annex sync --content to exports Assistant still todo. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2017-09-19 14:20:47 -04:00
Joey Hess	d39c120afa	add annex-ignore-command and annex-sync-command configs Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivalents of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.	2017-08-17 13:54:14 -04:00
Joey Hess	94351daba6	configuration to disable automatic merge conflict resolution * Added annex.resolvemerge configuration, which can be set to false to disable the usual automatic merge conflict resolution done by git-annex sync and the assistant. * sync: Added --no-resolvemerge option. Note that disabling merge conflict resolution is probably not a good idea in a direct mode repo or adjusted branch. Since updates to both are done outside the usual work tree, if it fails the tree is not left in a conflicted state, and it would be hard to manually resolve the conflict. Still, made annex.resolvemerge be supported in those cases for consistency. This commit was sponsored by Riku Voipio.	2017-06-01 12:51:01 -04:00
Joey Hess	db1600b2de	de-Maybe remoteGitConfig It's always set, so does not need to be a Maybe.	2017-05-11 16:05:01 -04:00
Joey Hess	29e73f76ef	Added remote.<name>.annex-push and remote.<name>.annex-pull The former can be useful to make remotes that don't get fully synced with local changes, which comes up in a lot of situations. The latter was mostly added for symmetry, but could be useful (though less likely to be). Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one place where git-annex pulls/fetches from remotes. I audited all instances of "fetch" and "pull". A few cases were left not checking this config: * Git.Repair can try to pull missing refs from a remote, and if the local repo is corrupted, that seems a reasonable thing to do even though the config would normally prevent it. * Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches as part of the setup process of a remote. The config would probably not be set then, and having the setup fail seems worse than honoring it if it is already set. I have not prevented all the code that does a "merge" from merging branches from remotes with remote.<name>.annex-pull=false. That could perhaps be done, but it would need a way to map from branch name to remote name, and the way refspecs work makes that hard to get really correct. So if the user fetches manually, the git-annex branch will get merged, for example. Another way of looking at/justifying this is that the setting is called "annex-pull", not "annex-merge". This commit was supported by the NSF-funded DataLad project.	2017-04-05 13:22:35 -04:00
Joey Hess	64f924dc93	sync --content-of=path For when you want to sync only some files' contents, not the whole working tree. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-03-20 16:00:48 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00

254 commits