git-annex

Author	SHA1	Message	Date
Joey Hess	13b9a288d3	scanAnnexedFiles in smudge --update This makes git checkout and git merge hooks do the work to catch up with changes that they made to the tree. Rather than doing it at some later point when the user is not thinking about that past operation. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:37:47 -04:00
Joey Hess	22185b4a4e	stop using addAssociatedFileFast Use addAssociatedFile instead, after recent optimisations it seems just as fast.	2021-06-08 09:23:28 -04:00
Joey Hess	428c91606b	include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticeable overhead. What may turn out to be a noticeable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex runs and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. 
However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.	2021-05-21 16:24:37 -04:00
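The diff-driven update described in 428c91606b can be pictured with a small sketch. This is not git-annex's code: representing a tree as a dict from path to key, and the names `reconcile`, `old_tree`, `new_tree` and `assoc`, are assumptions made for illustration only. It shows why diffing from the empty tree repopulates everything and overwrites stale entries for paths still in the index, and why diffing from the previous tree also handles deletions.

```python
def reconcile(assoc: dict, old_tree: dict, new_tree: dict) -> None:
    """Update assoc (path -> key) in place from the diff old_tree -> new_tree.

    Hypothetical model: each tree maps a file path to the annex key it
    points at. Only paths that differ between the two trees are touched.
    """
    for path in old_tree.keys() | new_tree.keys():
        old_key = old_tree.get(path)
        new_key = new_tree.get(path)
        if old_key == new_key:
            continue  # unchanged since the last run, nothing to do
        if new_key is None:
            assoc.pop(path, None)  # file was deleted from the index
        else:
            assoc[path] = new_key  # file added, or now points at another key


# First run after an upgrade: no previous tree is recorded, so diff from
# the empty tree. The stale entry for "c" is overwritten.
assoc = {"c": "KEY-stale"}
index = {"a": "KEY-1", "b": "KEY-2", "c": "KEY-3"}
reconcile(assoc, {}, index)

# Later run: "b" was deleted, so only that one path is processed.
reconcile(assoc, index, {"a": "KEY-1", "c": "KEY-3"})
```

The cost of each run is proportional to the size of the diff, which matches the message's concern about switching branches or moving large subdirectories.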
Joey Hess	05989556a2	start implementing hidden git-annex repositories This adds a separate journal, which does not currently get committed to an index, but is planned to be committed to .git/annex/index-private. Changes that are regarding a UUID that is private will get written to this journal, and so will not be published into the git-annex branch. All log writing should have been made to indicate the UUID it's regarding, though I've not verified this yet. Currently, no UUIDs are treated as private yet, a way to configure that is needed. The implementation is careful to not add any additional IO work when privateUUIDsKnown is False. It will skip looking at the private journal at all. So this should be free, or nearly so, unless the feature is used. When it is used, all branch reads will be about twice as expensive. It is very lucky -- or very prudent design -- that Annex.Branch.change and maybeChange are the only ways to change a file on the branch, and Annex.Branch.set is only internal use. That let Annex.Branch.get always yield any private information that has been recorded, without the risk that Annex.Branch.set might be called, with a non-private UUID, and end up leaking the private information into the git-annex branch. And, this relies on the way git-annex union merges the git-annex branch. When reading a file, there can be a public and a private version, and they are just concatenated together. That will be handled the same as if there were two diverged git-annex branches that got union merged.	2021-04-20 15:04:53 -04:00
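A rough sketch of why concatenating the public and private versions is safe, under stated assumptions: the log format here (`timestamp uuid value` per line) is a simplified stand-in rather than git-annex's real log format, and `read_log` and `latest_per_uuid` are hypothetical names. The point is only that a line-oriented log whose reader keeps the newest entry per UUID gives the same result for concatenated input as for a union merge of two diverged copies.

```python
def read_log(public: str, private: str = "") -> str:
    """Return what a reader sees: the public version followed by the
    private one. Assumes each version ends with a newline."""
    return public + private


def latest_per_uuid(log: str) -> dict:
    """Parse the simplified log, keeping the newest value for each UUID."""
    latest = {}
    for line in log.splitlines():
        timestamp, uuid, value = line.split(" ", 2)
        ts = float(timestamp)
        if uuid not in latest or ts >= latest[uuid][0]:
            latest[uuid] = (ts, value)
    return {uuid: value for uuid, (_, value) in latest.items()}


public = "1 uuid-A here\n2 uuid-A gone\n"
private = "3 uuid-P here\n"
state = latest_per_uuid(read_log(public, private))
```

Because the result depends only on the set of lines and not their order, the private lines never need to be written into the public file to be taken into account.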
Joey Hess	1e3b228154	speed up init This was making it git checkout master when that branch was already checked out, for apparently no good reason at all. In a 100,000 file repo, that takes about 1 second. Note, I'm not sure why it checks out the branch in the Nothing case, so I left that alone.	2021-03-23 15:43:42 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	88cef18fac	upgrade: Support an edge case upgrading a v5 direct mode repo where nothing had ever been committed to the head branch This commit was sponsored by Jack Hill on Patreon.	2020-11-24 12:31:17 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	b1eb47599a	move old direct mode stuff out of Annex.Locations	2020-11-12 12:40:35 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	b4b02e4c61	more RawFilePath conversion 412/645	2020-10-30 13:31:35 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	59263d2c6f	add import	2020-09-29 13:51:51 -04:00
Joey Hess	b2cf284d2a	upgrade: Avoid an upgrade failure of a bare repo in unusual circumstances	2020-09-29 13:45:14 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	9f6bd6cc05	add inRepoDetails planned to use for an optimisation most things using stagedDetails were not expecting to get dup files in a conflicted merge and deal with them, so converted them to use inRepoDetails.	2020-07-08 15:36:35 -04:00
Joey Hess	7347e50123	add stage number to stagedDetails parser And convert parser to attoparsec, probably faster. Before, a parse failure threw the whole --stage output line into the filename, which was certainly a bad idea, so fixed that.	2020-07-08 15:05:12 -04:00
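For orientation, `git ls-files --stage` prints one entry per file as `<mode> <object> <stage>`, a tab, then the path. The sketch below is not the attoparsec parser from 7347e50123; `StagedEntry` and `parse_stage_line` are hypothetical names, and it assumes entries have already been split apart (for example from `-z` output). It illustrates the behavior the message argues for: a line that does not parse is rejected rather than treated as a filename.

```python
from typing import NamedTuple, Optional


class StagedEntry(NamedTuple):
    mode: int   # file mode, parsed from its octal text form
    sha: str    # object name of the staged blob
    stage: int  # 0 normally, 1 to 3 for entries in a conflicted merge
    path: str


def parse_stage_line(line: str) -> Optional[StagedEntry]:
    """Parse one `git ls-files --stage` entry, or return None on failure."""
    meta, tab, path = line.partition("\t")
    if not tab:
        return None  # no tab separator: not a --stage entry
    parts = meta.split(" ")
    if len(parts) != 3:
        return None
    mode, sha, stage = parts
    try:
        return StagedEntry(int(mode, 8), sha, int(stage), path)
    except ValueError:
        return None  # mode or stage was not numeric


entry = parse_stage_line(
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tsome file.txt"
)
```

Keeping the stage number in the result lets callers decide for themselves how to treat the duplicate paths that appear during a conflicted merge.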
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	bb88a01910	upgrade: When upgrade fails due to an exception, display it. `37b42e72e7` made it catch exceptions but thought they were unlikely to be useful to display, which may be right when a git command fails, but not in the case yoh found.	2020-05-07 12:22:32 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominates over an MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other daemon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c0cd07c36b	Ref ByteString conversion done Test suite passes.	2020-04-07 17:41:09 -04:00
Joey Hess	6c81e0c8f1	ByteString Ref continued Several nice speed wins I think. At 340/633 files converted.	2020-04-07 13:27:11 -04:00
Joey Hess	4ce518998a	Fix upgrade failure when a file has been deleted from the working tree	2020-03-09 16:59:18 -04:00
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unnecessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unnecessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	879f52a116	annex.tune.branchhash1=true bugfix Fix support for repositories tuned with annex.tune.branchhash1=true, including --all not working and git-annex log not displaying anything for annexed files.	2020-02-14 15:22:48 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	02e00fd7ab	Merge branch 'master' into sqlite	2019-12-19 16:33:42 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	535b153381	building again after merge Nice, several conversions fell out.	2019-12-18 15:02:46 -04:00
Joey Hess	d5628a16b8	Merge branch 'bs' into sqlite-bs	2019-12-18 14:51:03 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	2f9a80d803	merging sqlite and bs branches Since the sqlite branch uses blobs extensively, there are some performance benefits, ByteStrings now get stored and retrieved w/o conversion in some cases like in Database.Export.	2019-12-06 15:30:45 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	1100e0d3c9	include upgrade code back in Remaining things that need to be fixed up to get this branch into a basically mergeable state: remotes, commands, and the assistant	2019-12-02 12:16:46 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
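The idea in 81d402216d can be sketched roughly as follows. This is an illustration and not git-annex's Haskell types: `Key`, `mk_key` and `deserialize_key` are hypothetical names, and only the `BACKEND-sSIZE--NAME` part of the key format is modelled (other optional fields are left out). The serialized form is stored next to the parsed fields, excluded from equality, and filled in only by the constructor functions, so it cannot drift out of date.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Key:
    backend: str
    name: str
    size: Optional[int] = None
    # Cached serialized form. It is derived data, so it takes no part in
    # equality comparisons.
    serialized: str = field(default="", compare=False, repr=False)


def mk_key(backend: str, name: str, size: Optional[int] = None) -> Key:
    """Smart constructor: builds the cached serialization once."""
    size_part = f"-s{size}" if size is not None else ""
    return Key(backend, name, size, f"{backend}{size_part}--{name}")


def deserialize_key(text: str) -> Optional[Key]:
    """Parse a serialized key, reusing the input as the cached form."""
    head, sep, name = text.partition("--")
    if not sep or not name:
        return None
    backend, *extras = head.split("-")
    if not backend:
        return None
    size = None
    for extra in extras:
        if extra.startswith("s") and extra[1:].isdigit():
            size = int(extra[1:])
        else:
            return None  # fields other than size are not modelled here
    return Key(backend, name, size, text)


key = deserialize_key("SHA256E-s1024--abc.txt")
path_component = key.serialized  # no re-serialization needed
```

A key read from disk and then used to build a path, the common case named in the message, pays for parsing once and never for serializing.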
Joey Hess	2f94b5419a	use new name for new format export dbs Delete the old export dbs on upgrade. Testing this by exporting to a directory with both exporttree=yes and importtree=yes, it refused to let an interrupted export proceed after upgrade, with "unsafe to overwrite file". An import resolved the problem.	2019-11-06 17:34:15 -04:00
Joey Hess	3b820f08f7	use new name for new format content identifier db It will be populated automatically by the next command that needs data from it, the same way it gets populated in a fresh clone. That may be a little expensive, but it's a one time cost, and no slower than in a fresh clone.	2019-11-06 16:43:52 -04:00
Joey Hess	1b5f4b67b5	use new name for new format fsck db The old db is cleaned up when a new incremental fsck is started. The incremental fsck won't pick up where the old one left off, but I consider this a minor enough thing that it can just be documented and won't be a problem.	2019-11-06 16:27:25 -04:00
Joey Hess	dc9295017f	v8 upgrade of keys db Renamed the database to .git/annex/keysdb; the old .git/annex/keys gets deleted during the upgrade. It is possible that an old git-annex process is running during the upgrade. If so, it will be able to continue using the old keys db until the upgrade is complete, and then will presumably fail in some ugly way. Or perhaps the upgrade will be unable to delete the open files on some systems, and so fail with an ugly error message. It's also possible for multiple processes to be running the upgrade concurrently. That should be fine; they will both write the same information into the keys db. Other databases still need to be upgraded.	2019-11-06 16:16:00 -04:00
Joey Hess	e2d4c133f5	init: fix data loss bug Fix bug that lost modifications to unlocked files when init is re-run in an already initialized repo. In retrospect needing scanUnlockedFiles False in the direct mode upgrade path was a good hint that it was unsafe when used with True. However, this bug did not affect upgrade from v5. In such an upgrade, an unlocked file that is modified is left as-is. The only place scanUnlockedFiles True did overwrite modified unlocked files is during a git-annex init of a repo that was already initialized by git-annex. (I also tried a scenario where the repo had not been initialized by git-annex yet, but was cloned from a v7 repo with an unlocked file, and the pointer file replaced with some other content, and the data loss did not occur in that situation.) Since the fixed scanUnlockedFiles avoids overwriting non-pointer files, it should be safe to run in any situation, so there's no need any longer for the parameter.	2019-11-05 12:41:15 -04:00
Joey Hess	1558e03014	Refuse to upgrade direct mode repositories when git is older than 2.22 That git fixed a memory leak that could cause an OOM during the upgrade. Most git-annex builds have a new enough git already. OSX git was upgraded with brew. Linux i386ancient build's git was too old. Upgrading it to a fixed git didn't work (due to the newer git not working with the old ssh, https://bugs.chromium.org/p/git/issues/detail?id=7 ) Choices to deal with that were: * Somehow make direct mode upgrade work with the old git, avoiding its OOM problem. One way would be to switch the repo to indirect mode first, and so upgrade to a repo with locked files. Not good when the filesystem does not support symlinks. * backport the OOM fix from git 2.22 (And do what about the version number so git-annex knows it's fixed?) * backport openssh (and possibly more stuff) * move the i386ancient build to at least Debian stretch (still backporting git) But this will make it no longer work with some of the ancient kernels it targets. Of those, backporting the OOM fix seemed the best approach. Put "oomfix" in the git version number to indicate it. 
I have not automated building the git backport, so here's the patch I used:

diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c	2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
 	if (start_async(&async))
 		return 0;	/* error was already reported */
 
-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		error("read from external filter %s failed", cmd);
 		ret = 0;
 	}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN	2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix
 
 LF='
 '
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure	2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
 # Identity of this package.
 PACKAGE_NAME='git'
 PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
 PACKAGE_BUGREPORT='git@vger.kernel.org'
 PACKAGE_URL=''
 
diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version	2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
	2019-08-29 15:24:41 -04:00


186 commits