git-annex

Author	SHA1	Message	Date
Joey Hess	c15fa17635	optimise adjustTree when adding many TreeItems (take 2) The old code traversed the list of addtreeitems once per subdirectory in the tree, so could get quite slow. Converting to Map lookups sped it up significantly. In my test case, git-annex import used to take about 2 minutes, when calling adjustTree to add back excluded files to the imported tree. This dropped it down to 6 seconds. Of which 4 seconds are the actual enumeration of the contents of the remote, so really only 2 seconds for this. The path prefix map is a bit suboptimal memory-wise, since items get stored in the map once per subdirectory on the path to the item. It would perhaps be better to use a tree data structure. Also it's suboptimal memory-wise that it builds two maps, as well as retaining a reference to addtreeitems. I could not see a way around that though. This is a fixed version of commit `2c86651180`. It fixes a test suite reversion. Sponsored-by: Jack Hill on Patreon	2024-01-16 11:53:57 -04:00
Joey Hess	d9f36085c6	Revert "optimise adjustTree when adding many TreeItems" This reverts commit `2c86651180`. That commit caused a test failure and probably wrong trees to be imported, so revert until that is fixed.	2024-01-10 16:36:44 -04:00
Joey Hess	2c86651180	optimise adjustTree when adding many TreeItems The old code traversed the list of addtreeitems once per subdirectory in the tree, so could get quite slow. Converting to Map lookups sped it up significantly. In my test case, git-annex import used to take about 2 minutes, when calling adjustTree to add back excluded files to the imported tree. This dropped it down to 6 seconds. Of which 4 seconds are the actual enumeration of the contents of the remote, so really only 2 seconds for this. The path prefix map is a bit suboptimal memory-wise, since items get stored in the map once per subdirectory on the path to the item. It would perhaps be better to use a tree data structure. Also it's suboptimal memory-wise that it builds two maps, as well as retaining a reference to addtreeitems. I could not see a way around that though. Sponsored-by: Luke T. Shumaker on Patreon	2024-01-03 15:07:49 -04:00
Joey Hess	d06aee7ce0	make commitMigration interruption safe Fixed inversion of control issue, so the tree is recorded in streamLogFile finalizer. Sponsored-by: Leon Schuermann on Patreon	2023-12-06 16:29:58 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	7298123520	build git trees using ContentIdentifier to speed up import This gets the trees built, but it does not use them. Next step will be to remember the tree for next time an import is done, and diff between old and new trees to find the files that have changed. Added --missing to the mktree parameters. That only disables a check, so it's ok to do everywhere mktree is used. It probably also speeds up mktree to disable the check. Note that git fsck does not complain about the resulting tree objects that point to shas that are not in the repository. Even with --strict. A quick benchmark, importing 10000 files, this slowed it down from 2:04.06 to 2:04.28. So it will more than pay for itself. Sponsored-by: Luke Shumaker on Patreon	2023-05-31 12:46:54 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	1dc82f177f	use bytestring filepaths more This should be more efficient, and allocate less. Sponsored-by: Graham Spencer on Patreon	2021-10-05 15:44:02 -04:00
Joey Hess	5712a7ef93	fix incomplete pattern match warning There was not really a bug here, because the 2 lists are always the same length, but the compiler does not know that.	2021-03-30 12:59:53 -04:00
Joey Hess	4611813ef1	Fix bug importing from a special remote into a subdirectory more than one level deep Which generated unusual git trees that could confuse git merge, since they incorrectly had 2 subtrees with the same name. Root of the bug was a) not testing that at all! but also b) confusing graftdirs, which contains eg "foo/bar" with non-recursively read trees, which would contain eg "bar" when reading a subtree of "foo". It's worth noting that Annex.Import uses graftTree, but it really shouldn't have needed to. Eg, when importing into foo/bar from a remote, it's enough to generate a tree of foo/bar/x, foo/bar/y, and does not include other files that are at the top of the master branch. It uses graftTree, so it does include the other files, as well as the foo/bar tree. git merge will do the same thing for both trees. With that said, switching it away from graftTree would result in another import generating a new commit that seems to delete files that were there in a previous commit, so it probably has to keep using graftTree since it used it before. This commit was sponsored by Kevin Mueller on Patreon.	2021-03-26 16:04:36 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and then feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	ed717cf646	fix handling of subtree I don't think this actually fixes any buggy behavior in git-annex, I just noticed that using treeItemToLsTreeItem and then serializing it resulted in something starting with "160000 blob" rather than "160000 commit"	2021-03-12 13:24:19 -04:00
Joey Hess	4b57e1c0ad	allow adjusttreeitem to remove submodules	2021-03-12 13:19:23 -04:00
Joey Hess	33bcee86f1	avoid using wildcard near bug kyle fixed	2021-01-07 13:44:23 -04:00
Kyle Meyer	fd161da2c2	adjustTree: Consider submodule deletions In addition to regular file deletions, the removefiles argument passed to adjustTree may contain removed submodules. When making the new tree, filter these out in the same way that is done for regular files so that the deletion is propagated.	2021-01-07 13:43:09 -04:00
Joey Hess	6c81e0c8f1	ByteString Ref continued Several nice speed wins I think. At 340/633 files converted.	2020-04-07 13:27:11 -04:00
Kyle Meyer	376e69ec65	adjust: Propagate submodule changes back to original branch When the recorded submodule commit changes on an adjusted branch, the change is carried in the function that reverseAdjustedCommit passes for adjustTree's adjusttreeitem parameter. Update the CommitObject handling in adjustTree to consider adjusttreeitem so that a submodule change is synced back.	2020-03-26 15:16:08 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	bbdeb1a1a8	sync: Fix crash when there are submodules and an adjusted branch is checked out Reverse adjusting the branch uses treeItemToTreeContent, which was missed when adding submodule support earlier.	2019-10-23 11:52:56 -04:00
Joey Hess	97fd9da6e7	add back non-preferred files to imported tree Prevents merging the import from deleting the non-preferred files from the branch it's merged into. adjustTree previously appended the new list of items to the old, which could result in it generating a tree with multiple files with the same name. That is not good and confuses some parts of git. Gave it a function to resolve such conflicts. That allowed dealing with the problem of what happens when the import contains some files (or subtrees) with the same name as files that were filtered out of the export. The files from the import win.	2019-05-20 16:43:52 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	bab6c570b0	buildImportTrees is fully working buildImportCommit not yet tested	2019-02-22 12:41:17 -04:00
Joey Hess	1580ff3866	graftTree now works properly in all cases (That I could think of.)	2019-02-21 22:25:42 -04:00
Joey Hess	8fdea8f444	WIP Added graftTree but it's buggy. Should use graftTree in Annex.Branch.graftTreeish; it will be faster than the current implementation there. Started Annex.Import, but untested and it doesn't yet handle tree grafting.	2019-02-21 17:32:59 -04:00
Joey Hess	5483ea90ec	graft exported tree into git-annex branch So it will be available later and elsewhere, even after GC. I first thought to use git update-index to do this, but feeding it a line with a tree object seems to always cause it to generate a git subtree merge. So, fell back to using the Git.Tree interface to manipulate the trees, and not involving the git-annex branch index file at all. This commit was sponsored by Andreas Karlsson.	2017-08-31 18:06:49 -04:00
Joey Hess	a13c0ce66c	adjust: Fix behavior when used in a repository that contains submodules. Also fixed the LsFiles parser to not assume its output has a fixed width type field.	2017-02-20 13:44:55 -04:00
Joey Hess	e23028d19b	restart coprocess in raw mode Restarting a crashing git process could result in filename encoding issues when not in a unicode locale, as the restarted process's handles were not read in raw mode. Since rawMode is always used when starting a coprocess, didn't bother to parameterise it and just always enable it for simplicity. This commit was sponsored by Jake Vosloo on Patreon.	2016-11-01 14:03:59 -04:00
Joey Hess	3f25317ad5	fix tree graft-in bug When adding a tree like a/b/c/d when a/b already exists, fixes the bug that the tree that got created was a/b/a/b/c/d Just need to flatten out the top N directories of the tree that's being grafted in, so we get the c/d part. This was complicated by the Tree data type being a rose tree rather than a regular tree. This commit was sponsored by Nick Daly on Patreon.	2016-10-11 15:36:40 -04:00
Joey Hess	b82c3e0783	sync: Fix bug in adjusted branch merging that could cause recently added files to be lost when updating the adjusted branch. The modification flag was not being set when making modifications deep in a tree, so parent trees were not updated to contain the modified tree. Seems to have exposed another bug where the wrong filename gets grafted in. This commit was sponsored by Brock Spratlen on Patreon.	2016-10-10 15:00:45 -04:00
Joey Hess	066f5bcdcb	more windows path fixes Let git-style filepaths be looked up in the removeset, even though windows-style filepaths are probably being fed into it.	2016-05-04 12:42:05 -04:00
Joey Hess	2cdfe33a4c	more windows path fixes beneathSubTree can be called with both windows-style and git-style paths, so needs to normalize to windows-style.	2016-05-04 12:36:50 -04:00
Joey Hess	db9269712f	avoid hardcoded slashes; broke on windows	2016-05-03 19:09:27 -04:00
Joey Hess	56dee9af10	fix build with ghc 7.6.3	2016-04-08 16:09:00 -04:00
Joey Hess	6c023e14ef	grafting new items into existing tree	2016-03-11 19:29:43 -04:00
Joey Hess	ad04550055	refactor	2016-03-11 16:45:40 -04:00
Joey Hess	f3b9c48a09	fixme	2016-03-11 16:37:31 -04:00
Joey Hess	ba1ef156a2	fix deletion of files in adjustTree	2016-03-11 16:30:06 -04:00
Joey Hess	b9184f69a7	improve propagation of commits from adjusted branches Only reverse adjust the changes in the commit, which means that adjustments do not need to be generally cleanly reversible. For example, an adjustment can unlock all locked files, but does not need to worry about files that were originally unlocked when reversing, because it will only ever be run on files that have been changed. So, it's ok if it locks all files when reversed, or even leaves all files as-is when reversed.	2016-03-11 16:05:06 -04:00
Joey Hess	3c4ad3eeca	indent	2016-03-11 14:46:54 -04:00
Joey Hess	fed8fcb99f	allow adding new items via adjustTree	2016-03-11 14:08:06 -04:00
Joey Hess	e5dd91b189	better encapsulation	2016-02-23 22:22:22 -04:00
Joey Hess	4ea36b8c63	few strictness improvements	2016-02-23 22:03:47 -04:00
Joey Hess	85b05a29df	refactor	2016-02-23 21:56:08 -04:00
Joey Hess	e08bebf0eb	add adjustTree (low-level) interface that avoids buffering much in memory Using getTree and recordTree in my big repo takes 594 mb ram. Using adjustTree takes 73 mb.	2016-02-23 21:35:16 -04:00
Joey Hess	123f823ef7	no streaming extractTree has to parse the whole input list in order to generate a tree, so convert interface to non-streaming. Some quick memory benchmarks in a repo with 60k files don't look too bad despite not streaming. To stream, without building up a whole tree object, one way would be a new interface: adjustTree :: MonadIO m => (TreeItem -> m (Maybe TreeItem)) -> Ref -> Repo -> m Sha This would only need to buffer tree objects from the current one down to the root, in order to update trees when a TreeItem is changed. But, while it supports changing items in the tree, and removing items, it does not support adding new items, or moving items from one directory to another.	2016-02-23 20:25:31 -04:00
Joey Hess	e266a6ec78	use getSha	2016-02-23 18:30:11 -04:00
Joey Hess	fc072699b7	minor improvements	2016-02-23 17:21:42 -04:00
Joey Hess	ae76cfde7d	add mktree interface	2016-02-23 16:36:38 -04:00

50 commits