git-annex

Author	SHA1	Message	Date
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	36cf61d752	simplification Whether or not there's a false index, it can't Restage here. When there's a false index, restaging would alter it and not the real index, but it fails anyway because that index is locked. When there's not a false index, the index is locked, and so restaging can't alter it.	2019-08-28 15:46:35 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not runnign CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	cb375977a6	follow-on changes from MetaData type changes Including writing and parsing the metadata log files with bytestring-builder and attoparsec.	2019-01-07 15:51:05 -04:00
Joey Hess	ca7de61454	git post-checkout and post-merge hooks * init, upgrade: Install git post-checkout and post-merge hooks that run git annex smudge --update. * precommit: Run git annex smudge --update, because the post-merge hook is not run when there is a merge conflict. So the work tree will be updated when a commit is made to resolve the merge conflict. * precommit: Run git annex smudge --update, because the post-merge hook is not run when there is a merge conflict. So the work tree will be updated when a commit is made to resolve the merge conflict. * Note that git has no hooks run after git stash or git cherry-pick, so the user will have to manually run git annex smudge --update after such commands. Nothing currently installs the hooks into v6 repos that already exist. Something will need to be done about that, either move this behavior to v7, or document that the user will need to manually fix up their v6 repos. This commit was sponsored by Eric Drechsel on Patreon.	2018-10-25 15:59:51 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	85ed38a574	Avoid repeated checking that files passed on the command line exist. git annex add, git annex lock etc make multiple seek passes, and each seek pass checked that files existed. That was unncessary redundant work. Fixed by adding a new WorkTreeItem type, make seek actions use it, and check that the files exist when constructing it. This commit was supported by the NSF-funded DataLad project.	2017-10-16 14:10:20 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	121f5d5b0c	annex.thin Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.	2015-12-27 15:59:59 -04:00
Joey Hess	d245a80518	avoid pre-commit check having to do with v5 unlocked files when in v6 mode	2015-12-15 14:09:36 -04:00
Joey Hess	a983a3a7a2	rename stuff for v5 unlocked files to indicate it's old	2015-12-15 14:08:07 -04:00
Joey Hess	751120c171	avoid pre-commit hook messing up new-style unlocked files in v6 repo	2015-12-09 15:18:54 -04:00
Joey Hess	6e5c1f8db3	convert all commands to work with optparse-applicative Still no options though.	2015-07-08 15:08:02 -04:00
Joey Hess	a2ba701056	started converting to use optparse-applicative This is a work in progress. It compiles and is able to do basic command dispatch, including git autocorrection, while using optparse-applicative for the core commandline parsing. * Many commands are temporarily disabled before conversion. * Options are not wired in yet. * cmdnorepo actions don't work yet. Also, removed the [Command] list, which was only used in one place.	2015-07-08 13:36:25 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	adc5ca70a8	pre-commit: Block partial commit of unlocked annexed file, since that left a typechange staged in index I had hoped that the git devs could change git's handling of partial commits to not use a false index file, but seems not. So, this relies on some git internals to detect that case. The test suite has a test case added to catch it if changes to git break it. This commit was sponsored by Paul Tagliamonte.	2014-11-10 15:36:24 -04:00
Joey Hess	59f88558d5	doh't use "def" for command definitions, it conflicts with Data.Default.def	2014-10-14 14:20:10 -04:00
Joey Hess	b61c6bc2ff	hlint	2014-10-09 15:46:05 -04:00
Joey Hess	d279180266	reorganize and refactor lock code Added a convenience Utility.LockFile that is not a windows/posix portability shim, but still manages to cut down on the boilerplate around locking. This commit was sponsored by Johan Herland.	2014-08-20 16:45:58 -04:00
Joey Hess	092041fab0	Ensure that all lock fds are close-on-exec, fixing various problems with them being inherited by child processes such as git commands. (With the exception of daemon pid locking.) This fixes at part of #758630. I reproduced the assistant locking eg, a removable drive's annex journal lock file and forking a long-running git-cat-file process that inherited that lock. This did not affect Windows. Considered doing a portable Utility.LockFile layer, but git-annex uses posix locks in several special ways that have no direct Windows equivilant, and it seems like it would mostly be a complication. This commit was sponsored by Protonet.	2014-08-20 11:37:02 -04:00
Joey Hess	c784ef4586	unify exception handling into Utility.Exception Removed old extensible-exceptions, only needed for very old ghc. Made webdav use Utility.Exception, to work after some changes in DAV's exception handling. Removed Annex.Exception. Mostly this was trivial, but note that tryAnnex is replaced with tryNonAsync and catchAnnex replaced with catchNonAsync. In theory that could be a behavior change, since the former caught all exceptions, and the latter don't catch async exceptions. However, in practice, nothing in the Annex monad uses async exceptions. Grepping for throwTo and killThread only find stuff in the assistant, which does not seem related. Command.Add.undo is changed to accept a SomeException, and things that use it for rollback now catch non-async exceptions, rather than only IOExceptions.	2014-08-07 22:03:29 -04:00
Joey Hess	7dc6804154	unannex, uninit: Avoid committing after every file is unannexed, for massive speedup. pre-commit hook lock added, so unannex can prevent the hook from running in a confusing state. This commit was sponsored by Fredrik Hammar	2014-03-21 14:41:05 -04:00
Joey Hess	d0fce426c4	pre-commit-annex hook script to automatically extract metadata from lots of types of files Using the extract(1) program to do the heavy lifting. Decided to make git-annex run pre-commit-annex when committing. Since git-annex pre-commit also runs it, it'll be run when git commit is run too, via the pre-commit hook. This basically gives back the pre-commit hook that git-annex took away. The implementation avoids repeatedly looking for the hook script when the assistant is running and committing repeatedly; only checks if the hook is available once. To make the script simpler, made git-annex metadata -s field?=value only set a field when it's not already got a value. This commit was sponsored by bak.	2014-03-02 20:11:58 -04:00
Joey Hess	1435c4f149	factor out new module	2014-02-22 13:35:50 -04:00
Joey Hess	39ebfa1a2e	pre-commit: Update metadata when committing changes to annexed files within a view. So the user can now switch to a view and then move files around within it to manage metadata. For example, moving a file into a new directory when in the tags=* view adds a tag to it. Implementation is fairly efficient. One diff-index, which is no more expensive than the first stage of a git commit, followed by possibly some cat-file --batch traffic to find the key (when deleting a file). Very similar to what's done in direct mode when committing. And like direct mode when updating the WC after a merge, it has to buffer the diff-tree values in order to make 2 passes over them. When not in a view, pre-commit now does one extra git symbolic-ref, which is tiny overhead. This commit was sponsored by Andrew Eskridge.	2014-02-19 14:17:58 -04:00
Joey Hess	cce69eee4d	avoid using function named that conflicts with name used in newer version of process library	2014-01-29 13:44:53 -04:00
Joey Hess	34c8af74ba	fix inversion of control in CommandSeek (no behavior changes) I've been disliking how the command seek actions were written for some time, with their inversion of control and ugly workarounds. The last straw to fix it was sync --content, which didn't fit the Annex [CommandStart] interface well at all. I have not yet made it take advantage of the changed interface though. The crucial change, and probably why I didn't do it this way from the beginning, is to make each CommandStart action be run with exceptions caught, and if it fails, increment a failure counter in annex state. So I finally remove the very first code I wrote for git-annex, which was before I had exception handling in the Annex monad, and so ran outside that monad, passing state explicitly as it ran each CommandStart action. This was a real slog from 1 to 5 am. Test suite passes. Memory usage is lower than before, sometimes by a couple of megabytes, and remains constant, even when running in a large repo, and even when repeatedly failing and incrementing the error counter. So no accidental laziness space leaks. Wall clock speed is identical, even in large repos. This commit was sponsored by an anonymous bitcoiner.	2014-01-20 04:57:36 -04:00
Joey Hess	03932212ec	Avoid using git commit in direct mode, since in some situations it will read the full contents of files in the tree. The assistant's commit code also always avoids git commit, for simplicity. Indirect mode sync still does a git commit -a to catch unstaged changes. Note that this means that direct mode sync no longer runs the pre-commit hook or any other hooks git commit might call. The git annex pre-commit hook action for direct mode is however explicitly run. (The assistant already ran git commit with hooks disabled, so no change there.)	2013-12-01 13:59:45 -04:00
Joey Hess	7206dcb376	update for DiffTree change This actually fixes a bug; if pre-commit was run in a subdir, it would pass relative files when updating the associated file maps, and so the maps wouldn't update. I don't think this bug happened in practice, due to the way pre-commit is called by the hook. It happened to chdir to the top of the work tree.	2013-10-17 14:52:12 -04:00
Joey Hess	b405295aee	hlint test suite still passes	2013-09-25 03:09:06 -04:00
Joey Hess	006cf7976f	more completely solve catKey memory leak Done using a mode witness, which ensures it's fixed everywhere. Fixing catFileKey was a bear, because git cat-file does not provide a nice way to query for the mode of a file and there is no other efficient way to do it. Oh, for libgit2.. Note that I am looking at tree objects from HEAD, rather than the index. Because I cat-file cannot show a tree object for the index. So this fix is technically incomplete. The only cases where it matters are: 1. A new large file has been directly staged in git, but not committed. 2. A file that was committed to HEAD as a symlink has been staged directly in the index. This could be fixed a lot better using libgit2.	2013-09-19 16:41:21 -04:00
Joey Hess	eb42bde19a	sync, pre-commit, indirect: Avoid unnecessarily catting non-symlink files from git, which can be so large it runs out of memory.	2013-09-19 14:48:42 -04:00
guilhem	f15fda60ed	Speed up the 'unused' command. Instead of populating the second-level Bloom filter with every key referenced in every Git reference, consider only those which differ from what's referenced in the index. Incidentaly, unlike with its old behavior, staged modifications/deletion/... will now be detected by 'unused'. Credits to joeyh for the algorithm. :-)	2013-08-25 21:02:13 -04:00
Joey Hess	cfd3b16fe1	add section metadata to all commands Not yet used .. mindless train work.	2013-03-24 18:28:21 -04:00
Joey Hess	547d7745fb	pre-commit: Update direct mode mappings. Making the pre-commit hook look at git diff-index to find changed direct mode files and update the mappings works pretty well. One case where it does not work is when a file is git annex added, and then git rmed, and then this is committed. That's a no-op commit, so the hook probably doesn't even run, and it certianly never notices that the file was deleted, so the mapping will still have the original filename in it. For this and other reasons, it's important that the mappings still be treated as possibly inconsistent. Also, the assistant now allows the pre-commit hook to run when in direct mode, so the mappings also get updated there.	2013-02-06 12:44:19 -04:00
Joey Hess	7272179979	avoid running pre-commit hook in direct mode The code that handles committing unlocked files in indirect mode did something unexpected and data lossy.	2013-01-17 14:11:01 -04:00
Joey Hess	f12202f771	optimize pre-commit in direct mode	2013-01-06 16:56:55 -04:00
Joey Hess	20fafc6a2d	avoid pre-commit in direct mode It was a no-op until my recent change that made lookupFile work in direct mode.	2013-01-05 16:06:20 -04:00
Joey Hess	60ab3d84e1	added ifM and nuked 11 lines of code no behavior changes	2012-03-14 17:43:34 -04:00
Joey Hess	cbaebf538a	rework git check-attr interface Now gitattributes are looked up, efficiently, in only the places that really need them, using the same approach used for cat-file. The old CheckAttr code seemed very fragile, in the way it streamed files through git check-attr. I actually found that `cad8824852` was still deadlocking with ghc 7.4, at the end of adding a lot of files. This should fix that problem, and avoid future ones. The best part is that this removes withAttrFilesInGit and withNumCopies, which were complicated Seek methods, as well as simplfying the types for several other Seek methods that had a Backend tupled in.	2012-02-13 23:52:21 -04:00
Joey Hess	b327227ba5	better limiting of start actions to only run whenAnnexed Mostly only refactoring, but this does remove one redundant stat of the symlink by copy.	2011-11-10 23:45:14 -04:00
Joey Hess	f97c783283	clean up check selection code This new approach allows filtering out checks from the default set that are not appropriate for a command, rather than having to list every check that is appropriate. It also reduces some boilerplate. Haskell does not define Eq for functions, so I had to go a long way around with each check having a unique id. Meh.	2011-10-29 15:19:05 -04:00
Joey Hess	b955238ec7	Fail if --from or --to is passed to commands that do not support them.	2011-10-27 18:56:54 -04:00
Joey Hess	5b74b130a3	refactored and generalized pre-command sanity checking	2011-10-27 16:31:35 -04:00
Joey Hess	5ff04bf2af	tweak	2011-09-15 16:59:52 -04:00
Joey Hess	35145202d2	remove command type definitions These were a mistake, they make the type signatures harder to read and less flexible. The CommandSeek, CommandStart, CommandPerform, and CommandCleanup types were a good idea, but composing them with the parameters expected is going too far.	2011-09-15 16:50:49 -04:00

1 2

72 commits