git-annex

Author	SHA1	Message	Date
Joey Hess	218e1983ad	reorg	2021-11-04 15:03:12 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	d706b49979	handle unhandled case	2021-11-04 14:36:48 -04:00
Joey Hess	b1f9dadafe	git long-running filter process implementation This module is not used yet, but the plan is to use it for smudge/clean filtering, at least as an option. In some circumstances, using this interface may perform better than the interface git-annex is currently using. Sponsored-by: Brock Spratlen on Patreon	2021-11-03 15:41:26 -04:00
Joey Hess	e9685aac5b	git pkt-line implementation This module is not used yet, but the plan is to implement the long running filter process for smudge/clean. Sponsored-by: Shae Erisson on Patreon	2021-11-03 15:30:25 -04:00
Joey Hess	b36cc0320e	avoid crashing tilde expansion on user who does not exist git does not crash when there's a remote configured for a user who does not exist, and this prevents git-annex from crashing too. Consider that a user might exist on one system but not another, and the git repo be moved between systems. So not crashing is desirable. Note that git fetch seems to mishandle a remote path like ~foo/bar when the user does not exist. While it does access ./~foo/bar, and gets as far as running git-upload-pack on the path, it then complains there is no such repo. So different parts of git seem to be doing different things in that edge case. Anyway, git-annex does not need to be bug-for-bug compatible with git. Sponsored-by: Jack Hill on Patreon	2021-10-13 09:16:36 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	1dc82f177f	use bytestring filepaths more This should be more efficient, and allocate less. Sponsored-by: Graham Spencer on Patreon	2021-10-05 15:44:02 -04:00
Joey Hess	ee31698825	remove errant print debug	2021-10-03 18:18:04 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	e47b4badb3	separate handles for cat-file and cat-file --batch-check This avoids starting one process when only the other one is needed. Eg in git-annex smudge --clean, this reduces the total number of cat-file processes that are started from 4 to 2. The only performance penalty is that when both are needed, it has to do twice as much work to maintain the two Maps. But both are very small, consisting of 1 or 2 items, so that work is negligible. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 13:16:13 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their usual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	fcd1b93a7d	whereused --historical Does not check the reflog, but otherwise works. It's possible for it to display something that is not an annexed file, if a non-annexed file somehow ends up containing something that looks like the key's name. This seems very unlikely to happen, and it would add a lot of complexity to detect it and somehow skip over that file, since the git log would need to either be run again, or not limited to 1 result and canceled once enough results have been read. Also, it kind of seems ok, if a file refers to a key, to consider that as a place the key was used, for some definition of used. So, I punted on dealing with that. May revisit later. Sponsored-by: Brock Spratlen on Patreon	2021-07-14 15:38:28 -04:00
Joey Hess	d2c48404a8	assistant: Avoid unnecessary git repository repair In a situation where git fsck gets confused about a commit that is made while it's running. Sponsored-by: Graham Spencer on Patreon	2021-06-30 18:00:16 -04:00
Joey Hess	75d29de8d4	avoid using CopyFile If .git/config symlinks are ever a thing, this will handle them better. But mostly, this is to avoid git-repair needing to include CopyFile, which would complicate it unduly.	2021-06-29 13:21:21 -04:00
Joey Hess	6b0d732746	repair: Fix reversion in version 8.20200522 that prevented fetching missing objects from remotes In commit `dfc4e641b5` git repair was changed to use remote name, not url, when fetching. But it fetches into a temporary git repo, which doesn't have remotes configured. Oops. (In my defense, that commit was made just as covid lockdown started. But testing? Urk.) Sponsored-by: Mark Reidenbach on Patreon	2021-06-29 13:15:15 -04:00
Joey Hess	199391befe	make repair interruption safe Fixed bug that interrupting git-annex repair (or assistant) while it was fixing repository corruption would lose objects that were contained in pack files. Unpack all pack files and move objects into place before deleting the pack files. The old approach moved the pack files to a temp directory before unpacking them, which was not interruption safe. Sponsored-By: Jochen Bartl on Patreon	2021-06-29 13:14:28 -04:00
Joey Hess	70dbe61fc2	remove unnecessary liftIO	2021-06-07 14:51:12 -04:00
Joey Hess	efae085272	fixed reconcileStaged crash when index is locked or in conflict Eg, when git commit runs the smudge filter. Commit `428c91606b` introduced the crash, as write-tree fails in those situations. Now it will work, and git-annex always gets up-to-date information even in those situations. It does need to do a bit more work, each time git-annex is run with the index locked. Although if the index is unmodified from the last time write-tree succeeded, that work is avoided.	2021-05-24 11:33:23 -04:00
Joey Hess	984034f335	filter-branch working aside from some edge cases Added a note to man page about what happens to information that is recorded in the private journal. Since it uses Branch.get, that information will be copied when options allow. It seemed better to allow it and document it than not allow it, since the options allow excluding repositories and so can be used to exclude private repos if desired.	2021-05-17 13:24:58 -04:00
Joey Hess	1d16654a22	convert formatLsTree to ByteString for speed	2021-05-17 10:46:24 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unnecessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	32138b8cd8	implement annex.privateremote and remote.name.private configs The slightly unusual parsing in Types.GitConfig avoids the need to look at the remote list to get configs of remotes. annexPrivateRepos combines all the configs, and will only be calculated once, so it's nice and fast. privateUUIDsKnown and regardingPrivateUUID now need to read from the annex mvar, so are not entirely free. But that overhead can be optimised away, as seen in getJournalFileStale. The other call sites didn't seem worth optimising to save a single MVar access. The feature should have imperceptible speed overhead when not being used.	2021-04-23 14:21:57 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	5712a7ef93	fix incomplete pattern match warning There was not really a bug here, because the 2 lists are always the same length, but the compiler does not know that.	2021-03-30 12:59:53 -04:00
Joey Hess	4611813ef1	Fix bug importing from a special remote into a subdirectory more than one level deep Which generated unusual git trees that could confuse git merge, since they incorrectly had 2 subtrees with the same name. Root of the bug was a) not testing that at all! but also b) confusing graftdirs, which contains eg "foo/bar" with non-recursively read trees, which would contain eg "bar" when reading a subtree of "foo". It's worth noting that Annex.Import uses graftTree, but it really shouldn't have needed to. Eg, when importing into foo/bar from a remote, it's enough to generate a tree of foo/bar/x, foo/bar/y, and does not include other files that are at the top of the master branch. It uses graftTree, so it does include the other files, as well as the foo/bar tree. git merge will do the same thing for both trees. With that said, switching it away from graftTree would result in another import generating a new commit that seems to delete files that were there in a previous commit, so it probably has to keep using graftTree since it used it before. This commit was sponsored by Kevin Mueller on Patreon.	2021-03-26 16:04:36 -04:00
Joey Hess	5d78cd9d08	Sped up git-annex init in a clone of an existing repository Seems that hasOrigin was never finding origin's git-annex branch, so a new one got created each time. And so then it later needed to merge the two branches, which is expensive. Added --no-track to git branch to avoid it displaying a message about setting up tracking branches. Of course there's no reason to make the git-annex branch a tracking branch since git-annex auto-merges it.	2021-03-23 15:23:13 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and then feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	ed717cf646	fix handling of subtree I don't think this actually fixes any buggy behavior in git-annex, I just noticed that using treeItemToLsTreeItem and then serializing it resulted in something starting with "160000 blob" rather than "160000 commit"	2021-03-12 13:24:19 -04:00
Joey Hess	4b57e1c0ad	allow adjusttreeitem to remove submodules	2021-03-12 13:19:23 -04:00
Joey Hess	e07eabbf7f	Fix support for local gcrypt repositories with a space in their URI Git.Remote.parseRemoteLocation had a hack to handle URIs that contained characters like spaces, which is something git unfortunately allows despite not being a valid URI. However, that hack looked for "//" to guess something was an URI, and these gcrypt URIs, being to a local path, don't contain that. So instead escape all illegal characters and check if the resulting thing is an URI. And that was already done by Git.Construct.fromUrl, so internally the gcrypt URI with a space looks like "gcrypt::foo%20bar" and that needs to be de-escaped when converting back from URI to local repo path. This change might also allow a few other almost-valid URIs to be handled as URIs by git-annex. None that contain "//" will change, and any behavior change should result in git-annex doing closer to a right thing than it did before, probably. This commit was sponsored by Noam Kremen on Patreon.	2021-03-09 12:49:51 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unnecessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	e3224ff77d	formatLsTree did not use a tab where git does Fixed that, and made parserLsTree accept the space as well as tab. Fixes a reversion that made import of a tree from a special remote result in a merge that deleted files that were not preferred content of that special remote.	2021-01-28 12:36:37 -04:00
Joey Hess	e7134ca1eb	avoid partial functions in Git.Url After the last commit, it was able to throw errors just due to an unparseable url. This avoids needing to worry about that, as long as the call site has already checked that it has a parseable url.	2021-01-18 15:07:23 -04:00
Joey Hess	2aa4fab62a	avoid crashing when there are remotes using unparseable urls Including the non-standard URI form that git-remote-gcrypt uses for rsync. Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port number. But git could have a remote with that, it would try to run git-remote-ook to handle it. So, git-annex has to allow for such things, rather than crashing. This commit was sponsored by Luke Shumaker on Patreon.	2021-01-18 14:59:08 -04:00
Joey Hess	5193aae385	Bug fix: Fix tilde expansion in ssh urls when the tilde is the last character in the url. Thanks, Grond for the patch.	2021-01-18 12:22:48 -04:00
Joey Hess	dc0caef297	merge from git-repair	2021-01-11 21:57:35 -04:00
Joey Hess	33bcee86f1	avoid using wildcard near bug kyle fixed	2021-01-07 13:44:23 -04:00
Kyle Meyer	fd161da2c2	adjustTree: Consider submodule deletions In addition to regular file deletions, the removefiles argument passed to adjustTree may contain removed submodules. When making the new tree, filter these out in the same way that is done for regular files so that the deletion is propagated.	2021-01-07 13:43:09 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	cd776ecb2e	avoid combining queued commands with different params I don't think this affected git-annex currently, but if the same command was queued twice with different params, one set of params was thrown away, and the files going with those were run with the other set of params.	2021-01-04 12:41:19 -04:00
Joey Hess	5d8e4a7c74	avoid borg list of archives that have been listed before This makes sync a lot faster in the common case where there's no new backup. There's still room for it to be faster. Currently the old imported tree has to be traversed, to generate the ImportableContents. Which then gets turned around to generate the new imported tree, which is identical. So, it would be possible to just return a "no new imports", or an ImportableContents that has a way to graft in a tree. The latter is probably too far to go to optimise this, unless other things need it. The former might be worth it, but it's already pretty fast, since git ls-tree is pretty fast.	2020-12-22 14:06:40 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	804808d569	squash build warnings on windows	2020-11-23 14:00:17 -04:00
Joey Hess	fcb1d67b41	fix build on windows	2020-11-20 12:53:25 -04:00
Joey Hess	ff0927bde9	converted reads from stderr to use hGetLineUntilExitOrEOF These are all unlikely to suffer from the inherited stderr fd problem, but who knows, it could happen.	2020-11-19 16:21:17 -04:00
Joey Hess	66497d39b3	convert git config reading to use hGetLineUntilExitOrEOF Much nicer than the old hack of waiting for a few seconds for stderr to be read.	2020-11-19 15:38:43 -04:00
Joey Hess	6b63278f31	init: When writing hook scripts, set all execute bits, not only the user execute bit	2020-11-17 13:31:12 -04:00
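
The last entry above (`6b63278f31`) describes setting all execute bits on hook scripts rather than only the user execute bit. git-annex itself is written in Haskell and its code is not shown in this log, so the following is only a minimal Python sketch of that idea; the function name `set_all_execute_bits` is hypothetical.

```python
import os
import stat


def set_all_execute_bits(path):
    """Add the user, group, and other execute bits to a file's mode.

    Hypothetical helper for illustration; not taken from git-annex.
    """
    # Keep only the permission bits of the current mode.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # OR in all three execute bits, leaving the other bits untouched.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

On a POSIX system, a file that starts with mode 0644 would end up with mode 0755 after this call, whereas setting only the user execute bit would give 0744.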

703 commits
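
Commit `e9685aac5b` above adds a git pkt-line implementation as groundwork for the long-running filter process. As a rough illustration of the wire format only (not the Haskell module from that commit), here is a minimal Python sketch. It assumes the framing documented by git: a 4-digit hexadecimal length that counts the 4 header bytes plus the payload, and `0000` as the flush packet. The names `encode_pkt_line` and `decode_pkt_lines` are hypothetical, and special packets other than flush are rejected instead of handled.

```python
FLUSH_PKT = b"0000"
# Assumed from git's protocol documentation: at most 65516 payload bytes per packet.
MAX_PAYLOAD = 65516


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length, then the data."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def decode_pkt_lines(stream: bytes):
    """Yield each payload from a byte string of pkt-lines; yield None for a flush packet."""
    pos = 0
    while pos < len(stream):
        header = stream[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated length header")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            # Flush packet: a bare "0000" with no payload.
            yield None
            pos += 4
            continue
        if length < 4:
            raise ValueError("special packet not handled in this sketch")
        end = pos + length
        if end > len(stream):
            raise ValueError("truncated payload")
        yield stream[pos + 4:end]
        pos = end
```

For example, the `git-filter-client` line that git sends first when starting a long-running filter encodes as `0016git-filter-client\n`, since the 18 payload bytes plus the 4 header bytes give 22, which is 0x16.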