git-annex

Author	SHA1	Message	Date
Joey Hess	ea3cb7d277	fix a case where file tracked by git unexpectedly becomes annex pointer file smudge: When annex.largefiles=anything, files that were already stored in git, and have not been modified could sometimes be converted to being stored in the annex. Changes in 7.20191024 made this more of a problem. This case is now detected and prevented.	2019-12-27 15:08:03 -04:00
Joey Hess	2b821eb225	Merge branch 'master' into sqlite	2019-12-26 15:15:42 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	8e9e809d9b	when annex.largefiles parse fails, say where the config came from	2019-12-20 13:07:10 -04:00
Joey Hess	4acbb40112	git-annex config annex.largefiles annex.largefiles can be configured by git-annex config, to more easily set a default that will also be used by clones, without needing to shoehorn the expression into the gitattributes file. The git config and gitattributes override that. Whenever something is added to git-annex config, we have to consider what happens if a user puts a purposfully bad value in there. Or, if a new git-annex adds some new value that an old git-annex can't parse. In this case, a global annex.largefiles that can't be parsed currently makes an error be thrown. That might not be ideal, but the gitattribute behaves the same, and is almost equally repo-global. Performance notes: git-annex add and addurl construct a matcher once and uses it for every file, so the added time penalty for reading the global config log is minor. If the gitattributes annex.largefiles were deprecated, git-annex add would get around 2% faster (excluding hashing), because looking that up for each file is not fast. So this new way of setting it is progress toward speeding up add. git-annex smudge does need to load the log every time. As well as checking the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids both overheads.	2019-12-20 13:01:41 -04:00
Joey Hess	02e00fd7ab	Merge branch 'master' into sqlite	2019-12-19 16:33:42 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	d5628a16b8	Merge branch 'bs' into sqlite-bs	2019-12-18 14:51:03 -04:00
Joey Hess	322c542b5c	fix ByteString conversion on windows the encode' and decode' functions on Windows should not apply the filesystem encoding, which does not work there. Instead, convert to and from UTF-8. Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem encoding, so won't work as expected on windows.	2019-12-18 13:32:56 -04:00
Joey Hess	3d38ec9585	fix fileJournal My ByteString rewrite oversimplified it, resulting in any _ in a journal file turning into a / in the git-annex branch, which was often the wrong filename, or sometimes (//) an invalid filename that git refused to add.	2019-12-18 11:29:34 -04:00
Joey Hess	cee0d738fc	match also / path separator on windows	2019-12-11 17:08:08 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	2f9a80d803	merging sqlite and bs branches Since the sqlite branch uses blobs extensively, there are some performance benefits, ByteStrings now get stored and retrieved w/o conversion in some cases like in Database.Export.	2019-12-06 15:30:45 -04:00
Joey Hess	5f391179f1	use RawFilePath getFileStatus for speed Only done on those calls to getFileStatus that had a RawFilePath, not a FilePath. The others would probably be just as fast if converted to use it with toRawFilePath, but I'm not 100% sure. Note that genInodeCache' uses fromRawFilePath, but that value only gets used on Windows, so on unix the thunk will never be evaluated.	2019-12-06 14:44:42 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	6535aea49a	optimisation This was already optimised before, but profiling found that delEntry was around 1.5% of the total runtime of git-annex whereis. It was being called once per environment variable per file processed. Fixed by better caching. Since withIndexFile is almost always run with the same .git/annex/index file, it can cache the modified environment, rather than re-modifying it each time called.	2019-12-04 14:27:11 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	c756006374	fix hacked up AutoMerge module to work again	2019-12-02 10:51:43 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	ddf6973d22	minor optimisation avoid repeated scan of the same bytestring	2019-11-22 19:13:05 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	d4661959de	Merge branch 'master' into sqlite	2019-11-21 17:26:50 -04:00
Joey Hess	740e0ddbfe	avoid running scanUnlockedFiles in bare repo It's not necessary. And if the bare repo somehow has a pointer file in it with the same name as a file in HEAD, that file would be populated, which would be surprising since the file is not really under git's control.	2019-11-21 14:31:12 -04:00
Joey Hess	5877de5e80	git-lfs: remember urls, and autoenable remotes using known urls * git-lfs: The url provided to initremote/enableremote will now be stored in the git-annex branch, allowing enableremote to be used without an url. initremote --sameas can be used to add additional urls. * git-lfs: When there's a git remote with an url that's known to be used for git-lfs, automatically enable the special remote.	2019-11-18 16:09:09 -04:00
Joey Hess	667d38a8f1	Fix a crash (STM deadlock) when -J is used with multiple files that point to the same key See the comment for a trace of the deadlock. Added a new StartStage. New worker threads begin in the StartStage. Once a thread is ready to do work, it moves away from the StartStage, and no thread will ever transition back to it. A thread that blocks waiting on another thread that is processing the same key will block while in the StartStage. That other thread will never switch back to the StartStage, and so the deadlock is avoided.	2019-11-14 13:51:09 -04:00
Joey Hess	890330f0fe	make --json-error-messages capture url download errors Convert Utility.Url to return Either String so the error message can be displated in the annex monad and so captured. (When curl is used, its errors are still not caught.)	2019-11-12 13:52:38 -04:00
Joey Hess	99536e3a0b	remove one more warningIO Had to generalize Git.Queue so it can run an Annex action, yipes. Only remaining warningIO are in the legacy chunk code.	2019-11-12 10:45:52 -04:00
Joey Hess	0be23bae2f	refactor Better to not have a single function module, and better to have a more specific type than Bool. This commit was sponsored by Jack Hill on Patreon	2019-11-11 19:10:52 -04:00
Joey Hess	3b34d123ed	Added annex.allowsign option. This commit was sponsored by Ilya Shlyakhter on Patreon.	2019-11-11 16:28:56 -04:00
Joey Hess	3553867b66	v7 to v8 auto-upgrade bump version to 8 and update NEWS about it	2019-11-07 13:24:16 -04:00
Joey Hess	2f94b5419a	use new name for new format export dbs Delete the old export dbs on upgrade. Testing this an exporting to a directory with both exporttree=yes and importtree=yes, it refused to let an interrupted export proceed after upgrade, with "unsafe to overwrite file". An import resolved the problem.	2019-11-06 17:34:15 -04:00
Joey Hess	3b820f08f7	use new name for new format content identifier db It will be populated automatically by the next command that needs data from it, the same way it gets populated in a fresh clone. That may be a little expensive, but it's a one time cost, and no slower than in a fresh clone.	2019-11-06 16:43:52 -04:00
Joey Hess	1b5f4b67b5	use new name for new format fsck db The old db is cleaned up when a new incremental fsck is started. The incremental fsck won't pick up where the old one left off, but I consider this a minor enough thing that it can just be documented and won't be a problem.	2019-11-06 16:27:25 -04:00
Joey Hess	dc9295017f	v8 upgrade of keys db Renamed the database to .git/annex/keysdb; the old .git/annex/keys gets deleted during the upgrade. It is possible that an old git-annex process is running during the upgrade. If so, it will be able to continue using the old keys db until the upgrade is complete, and then will presumably fail in some ugly way. Or perhaps the upgrade will be unable to delete the open files on some systems, and so fail with an ugly error message. It's also possible for multiple processes to be running the upgrade concurrently. That should be fine; they will both write the same information into the keys db. Other databases still need to be upgraded.	2019-11-06 16:16:00 -04:00
Joey Hess	6147130e86	Merge branch 'master' into sqlite	2019-11-05 12:59:28 -04:00
Joey Hess	e2d4c133f5	init: fix data loss bug Fix bug that lost modifications to unlocked files when init is re-ran in an already initialized repo. In retrospect needing scanUnlockedFiles False in the direct mode upgrade path was a good hint that it was unsafe when used with True. However, this bug did not affect upgrade from v5. In such an upgrade, an unlocked file that is modified is left as-is. The only place scanUnlockedFiles True did overwrite modified unlocked files is during an git-annex init of a repo that was already initialized by git-annex. (I also tried a scenario where the repo had not been initialized by git-annex yet, but was cloned from a v7 repo with an unlocked file, and the pointer file replaced with some other content, and the data loss did not occur in that situation.) Since the fixed scanUnlockedFiles avoids overwriting non-pointer files, it should be safe to run in any situation, so there's no need any longer for the parameter.	2019-11-05 12:41:15 -04:00
Joey Hess	61c9b0945d	bump version, though there is no upgrade path yet I just don't want this branch to accidentially run in my production repos yet.	2019-10-29 17:06:35 -04:00
Joey Hess	c35a9047d3	improve data types for sqlite This is a non-backwards compatable change, so not suitable for merging w/o a annex.version bump and transition code. Not yet tested. This improves performance of git-annex benchmark --databases across the board by 10-25%, since eg Key roundtrips as a ByteString. (serializeKey' produces a lazy ByteString, so there is still a copy involved in converting it to a strict ByteString. It may be faster to switch to using bytestring-strict-builder.) FilePath and Key are both stored as blobs. This avoids mojibake in some situations. It would be possible to use varchar instead, if persistent could avoid converting that to Text, but it seems there is no good way to do so. See doc/todo/sqlite_database_improvements.mdwn Eliminated some ugly artifacts of using Read/Show serialization; constructors and quoted strings are no longer stored in sqlite. Renamed SRef to SSha to reflect that it is only ever a git sha, not a ref name. Since it is limited to the characters in a sha, it is not affected by mojibake, so still uses String.	2019-10-29 17:05:36 -04:00
Joey Hess	e98f230c95	remove unused function	2019-10-23 12:01:34 -04:00
Joey Hess	5db79339a1	init: Fix a failure when used in a submodule on a crippled filesystem. When the submodule's parent repo has an adjusted unlocked branch, it gets cloned by git, but git checks out master. git annex init then fails because it wants to enter the adjusted branch, but: adjusted branch adjusted/master(unlocked) already exists. Aborting because that branch may have changes that have not yet reached master Note that init actually then exits 0, leaving master checked out. This could also happen, absent submodules, if the parent repo has an adjusted unlocked branch, but it is not checked out. In the more common case where that branch is checked out, the clone uses the same branch, so no problem. The choices to fix this: * Init could delete the existing adjusted branch, and re-adjust. But then running init inside an adjusted branch on a crippled filesystem would lose any changes that have not been synced back to master. * Init could sync any changes back to master, but that would be very surprising behavior for it. * Init could simply check out the existing adjusted branch. If the branch is diverged from master, well, sync will sort that out later. This mirrors the behavior of cloning a repo that has an adjusted branch checked out that has not yet been synced back to master. Picked this choice.	2019-10-21 11:41:15 -04:00
Joey Hess	ce48eb797c	make DropDead transition minimize remote.log for dead sameas remotes All that needs to be retained in remote.log is the sameas-uuid. The rest of the config is eliminated. This doesn't save enough space to bother with, but it prevents anything sensitive in the config of the dead sameas remote from lingering around. Note that minimizesameasdead does not update the VectorClock when changing the log line. That's normally a no-no, but in this case, it makes each DropDead result in the exact same file contents, and vector clocks are not needed because the transition breaks the history chain.	2019-10-15 11:39:25 -04:00
Joey Hess	4306dfbe68	remove empty log files in transition forget --drop-dead: Remove several classes of git-annex log files when they become empty, further reducing the size of the git-annex branch. Noticed while testing sameas uuid removal, but it could happen other times too. An empty log file is always treated by git-annex the same as no file being present, and when the files are per-key, it can be a sizable space saving to exclude them from the tree.	2019-10-14 16:04:15 -04:00
Joey Hess	5e9a2cc37f	forget state of sameas remotes during DropDead transitions It would have been a lot less round-about to just make git annex dead also add the uuids of sameas remotes to the trust.log as dead. But, that would fail in the case where there's an unmerged other clone that has a sameas remote that the current repo does not know about. Then it would not get marked as dead. Handling it at transition time avoids that scenario. Note that the generation of trustmap' in dropDead should only happen once, due to the partial application.	2019-10-14 15:47:42 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different peices of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generate in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	37f725a9f7	Merge branch 'master' into sameas	2019-10-11 15:56:00 -04:00
Joey Hess	debafcba2b	autoenable sameas remotes	2019-10-11 15:52:40 -04:00
Joey Hess	ec778888d2	got enableremote working for sameas Also the assistant can enable sameas remotes, should work, but not tested.	2019-10-11 15:11:08 -04:00
Joey Hess	1b9c1d1737	fix sameas inherited key removal	2019-10-11 13:18:27 -04:00
Joey Hess	8d7dc76dff	fix bad paste of field name	2019-10-11 13:05:25 -04:00
Joey Hess	91eed85fd4	add sameas inherited configs to newConfig This makes initremote --sameas work with encryption inherited.	2019-10-11 13:05:20 -04:00
Joey Hess	0dd5691951	update notes	2019-10-10 16:12:59 -04:00
Joey Hess	df5b0ffab3	inherit other fields I think this is all that need to be inherited.	2019-10-10 16:11:21 -04:00
Joey Hess	c3975ff3b4	sameas RemoteConfig inheritance I found a way to avoid inheritance complicating anything outside of Logs.Remote. It seems fine to require all inherited values to be inherited and not set in the sameas remote's config. Since inherited values will be used for stuff like encryption and perhaps chunking, which control the actual content stored on the remote, it seems likely that there will not be any reason to need them to vary between two remotes that access the same underlying data store. The newer version of containers is free; the minimum ghc version is bundled with a newer version than that.	2019-10-10 15:58:22 -04:00
Joey Hess	59908586f4	rename RemoteConfigKey to RemoteConfigField And some associated renames. I was going to have some values named fooKeyKey otherwise..	2019-10-10 15:44:05 -04:00
Joey Hess	92ff30df70	set annex-config-uuid when RemoteConfig contains a sameas-uuid Initremote sets that, so after both initremote and enableremote, the git config will be set. Any remote that does not use Annex.SpecialRemote won't set annex-config-uuid. But that's only Remote.Git, which doesn't use RemoteConfig anyway.	2019-10-10 12:58:59 -04:00
Joey Hess	97b499a4dc	use sameas-name and sameas-uuid for sameas remotes initremote --sameas=remotename sets sameas-name and sameas-uuid Using sameas-name rather than name prevents old git-annex initremote from enabling a sameas remote by name, since it would not handle it correctly.	2019-10-10 12:32:05 -04:00
Joey Hess	53da7f1cf8	update uninit to handle all the v7 stuff * uninit: Remove several git hooks that git-annex init sets up. * uninit: Remove the smudge and clean filters that git-annex init sets up.	2019-10-08 14:34:00 -04:00
Joey Hess	1113caa53e	preserve unlocked file mtime when dropping When dropping an unlocked file, preserve its mtime, which avoids git status unncessarily running the clean filter on the file. If the index file has close to the same mtime as a work tree file, git will not trust the index to be up-to-date, and re-runs the clean filter unncessarily. Preserving the mtime when depopulating a pointer file avoids git status doing a little (or maybe a lot) of unncessary work. There are other places that the mtime could be preserved, including other places where pointer files are written perhaps, but also populatePointerFile. But, I don't know of cases where those lead to git status doing unncessary work, so I just fixed the one I'm aware of for now.	2019-10-08 14:01:12 -04:00
Joey Hess	3066bdb1fb	fix annex.largefiles largerthan/smallerthan bug Fix bug in handling of annex.largefiles that use largerthan/smallerthan. When adding a modified file, it incorrectly used the file size of the old version of the file, not the current size. That was the only largefiles limit that didn't directly look at the file on disk already. Added a new type to keep straight the two different ways such a limit can be matched. I kind of wanted to extend MatchingFile or FileInfo to indicate that the matcher is supposed to operate on files from disk or annex, but it turned out to be too complex to implement it that way. This also changes the LimitAnnexFiles case when lookupFileKey does not find a key. It used to fall back to statting the file, now it always returns False. I doubt the old code could really get to that point, but if it somehow does, it's better for preferred content matching to be consistent.	2019-09-30 17:15:08 -04:00
Joey Hess	9f27d03945	fix a typo that didn't matter so far	2019-09-27 14:08:16 -04:00
Joey Hess	fda1bdd679	Added --mimetype and --mimeencoding file matching options. Already had these for largefiles matching, but I forgot to add them as command-line options.	2019-09-19 12:09:59 -04:00
Joey Hess	53fd746705	avoid some build warnings on windows	2019-09-12 14:11:19 -04:00
Joey Hess	fef3cd055d	Removed support for git versions older than 2.1 debian oldoldstable has 2.1, and that's what i386ancient uses. It would be better to require git 2.2, which is needed to use adjusted branches, but can't do that w/o losing support for some old linux kernels or a complicated git backport.	2019-09-11 16:14:43 -04:00
Joey Hess	061231621e	Merge branch 'master' into v7-default	2019-09-10 16:06:43 -04:00
Joey Hess	94c75d2bd9	init: Fix a reversion that broke initialization on systems that need to use pid locking This brings back .git/annex/misctmp, but only for init. If an init is interrupted while probing using that temp directory, the files it left will get deleted 1 week later by a subsequent git-annex run.	2019-09-10 13:37:07 -04:00
Joey Hess	f845195354	Added annex.autoupgraderepository configuration Can be set to false to prevent any automatic repository upgrades. Also, removed direct mode specific upgrade code in Annex.Init, and made needsUpgrade always include the name/path of the repo, so if there's a problem it's clear what repo has the problem. And, made needsUpgrade catch any exceptions that might occur during the upgrade, so it can display a more useful error message than just the exception.	2019-09-01 13:42:26 -04:00
Joey Hess	8f0af15020	one missed thing for automatic v5 -> v7 upgrades	2019-08-30 17:35:10 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	d6e1f09ed2	init: Catch more exceptions when testing locking.	2019-08-29 12:19:07 -04:00
Joey Hess	4e30b06ffb	remove unused import	2019-08-28 15:38:29 -04:00
Joey Hess	e804f48f82	remove a few more isDirect tests	2019-08-28 11:53:10 -04:00
Joey Hess	863db16e53	remove unused	2019-08-27 16:13:49 -04:00
Joey Hess	9b1331881c	reorg remaining direct mode code Only used for upgrading, so put it under there.	2019-08-27 14:05:38 -04:00
Joey Hess	e395ba2cb0	remove unused code	2019-08-27 13:57:17 -04:00
Joey Hess	da6f4d8887	remove direct mode support from Annex.Content No longer used. The only possible user of it would be code in Upgrade.V5, so I verified that the parts of Annex.Content it used were not used to manipulate direct mode files.	2019-08-27 13:14:06 -04:00
Joey Hess	770b8ff926	clearer message when direct mode upgrade fails When a remote is being upgraded, the message looked as if the local repo was where the problem was. So include the path of the repo.	2019-08-27 12:23:34 -04:00
Joey Hess	586db7f06d	Avoid making a commit when upgrading from direct mode to v7 Three reasons: * Committing as part of an upgrade is very unusual and unexpected. * The commit was failing with a weird error message when done during an automatic upgrade. * Let me remove more of that sweet^Whorrible direct mode code.	2019-08-26 16:35:44 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	20741b1eb4	Automatically convert direct mode repositories to v7 with adjusted unlocked branches * Automatically convert direct mode repositories to v7 with adjusted unlocked branches and set annex.thin. * init: When run on a crippled filesystem with --version=5, will error out, since version 7 is needed for adjusted unlocked branch. * direct: This command always errors out as direct mode is no longer supported. * indirect: This command has become a deprecated noop. * proxy: This command is deprecated because it was only needed in direct mode. (But it continues to work.) Also removed mentions of direct mode throughough the documentation. I have not removed all the direct mode code yet.	2019-08-26 15:05:25 -04:00
Joey Hess	f6fb4b8cdb	avoid side message when doing automatic upgrade to v7 An automatic upgrade is supposed to be silent.	2019-08-26 13:54:52 -04:00
Joey Hess	5877a15d7b	fix hard links when upgrading from direct mode When upgrading a direct mode repo to v7 with adjusted unlocked branches, fix a bug that prevented annex.thin from taking effect for the files in working tree. The hard links used to be ok, but commit `8e22114735` accidentially broke them. It repopulates the worktree file, which is already a hard link, and when it's creating the new file, the link count is already 2, and so it doesn't make a hard link then.	2019-08-26 13:54:39 -04:00
Joey Hess	1e02360283	remove only case	2019-08-26 13:28:28 -04:00
Joey Hess	2fd27c6df5	assistant: When creating a new repository use v7 adjusted branches with annex.thin Rather than direct mode, which this is a small step on the path to removing. Init on a crippled filesystem already used v7 adjusted branches, and like that, this doesn't pose any interoperability issues with old versions of git-annex that clone the same repo, because files are only unlocked on the adjusted branch.	2019-08-26 12:54:14 -04:00
Joey Hess	b599e8e6ac	move module only used by assistant	2019-08-26 12:32:45 -04:00
Joey Hess	bb16a26109	use headExists Turns out that `7be690f326` broke the test suite on the i386ancient builder. There, git show-ref --verify HEAD fails with "'HEAD' - not a valid ref". Apparently git 2.1.4 didn't support that. headExists works there and does the same thing.	2019-08-19 11:12:19 -04:00
Joey Hess	f845636e30	correct license to AGPL This code was already AGPL, except for the bit split out to Utility/MD5.hs in commit `426053cb6c`. That commit accidentially updated the license of this file from AGPL to GPL. Thanks to Sean Whitton for spotting this.	2019-08-17 14:08:07 -04:00
Joey Hess	e4a8366162	fix edge case failure in prop_view_roundtrips "./" made it fail, because that gets eliminated	2019-08-16 11:35:32 -04:00
Joey Hess	dc672863c3	init: Install working hook scripts when run on a crippled filesystem and on Windows	2019-08-13 15:14:17 -04:00
Joey Hess	868942e19b	fix unused module import warnings when building on windows	2019-08-08 12:18:53 -04:00
Joey Hess	8ba4de2d9c	remove unused import	2019-07-30 12:16:41 -04:00
Joey Hess	5080a7be1e	fix build	2019-07-29 12:41:45 -04:00
Joey Hess	426053cb6c	Corrected some license statements In `40ecf58d4b` I changed the license of code I wrote from GPL to AGPL. But, two files containing code I wrote combined with code by others were updated to say their license is AGPL, while in fact part of it was (the code I wrote) but part remained under the original license (the code written by others). Remote/Ddar.hs is now changed entirely back to GPL 3. Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code not written by me, and corrected its license statement to GPL-2, which is the actual version of the GPL included with the code in its original distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/	2019-07-28 14:27:33 -04:00
Joey Hess	4c5a489f3e	avoid build warning when built w/o magic-mime	2019-07-22 11:03:26 -04:00
Joey Hess	7fd650355e	merge from http-client-restricted I made some improvements to its API after splitting it out of git-annex, so merge those back in. This is groundwork for removing the embedded copy of it and depending on it. Also moved the managerResponseTimeout disabling to Annex.Url as it's git-annex specific. This commit was sponsored by Ethan Aubin on Patreon.	2019-07-17 16:48:50 -04:00
Joey Hess	7be690f326	check headRef not Branch.current Support running v7 upgrade in a repo where there is no branch checked out, but HEAD is set directly to some other ref. This commit was sponsored by Jack Hill on Patreon.	2019-07-16 12:36:29 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	26c54d6ea3	make metered more generic Allow it to be used when the Key is not known.	2019-06-25 12:33:36 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	84e729fda5	fix init default description reversion init: Fix a reversion in the last release that prevented automatically generating and setting a description for the repository. Seemed best to factor out uuidDescMapRaw that does not have the default mempty descrition behavior. I don't much like that behavior, but I know things depend on it. One thing in particular is `git annex info` which lists the uuids and descriptions; if the current repo has been initialized in some way that means it does not have a description, it would not show up w/o that. (Not only repos created due to this bug might lack that. For example a repo that was marked dead and had --drop-dead delete its git-annex branch info, and then came back from the dead would similarly not be in the uuid.log. Also there have been other versions of git-annex that didn't set a default description; for years there was no default description.)	2019-06-20 20:30:24 -04:00
Joey Hess	ba433bdc85	refactor	2019-06-19 20:19:38 -04:00
Joey Hess	26f0f8b20f	optimisation Avoid an unncessary STM transaction. This will happen when the worker pool is not completely full of the new stage, which is the common case. In the uncommon case, this adds only a tiny bit of overhead for the extra traversal of the worker pool. And the thread is going to block for some time anyway.	2019-06-19 20:13:19 -04:00
Joey Hess	37d505dd6b	avoid STM deadlock When all worker threads are running and enteringStage is called, it waits for an idle slot. If all off the other threads then call it in turn, a deadlock occurrs. This is the same problem I didn't actually fix in `5a9842d7ed`. Fixed by doing two separate STM transactions, the first replaces its active thread with an idle thread, and the second waits for another idle thread. That guarantees there will eventually be an idle thread to find. The changes to WorkerPool were necessary because it can't add an idle thread containing the Annex state and go on to run an action using that same state, so I had to remove the Annex state from IdleWorker.	2019-06-19 18:15:25 -04:00
Joey Hess	9671248fff	speed up enteringStage in non-concurrent mode Avoid a STM transaction. Also got rid of UnallocatedWorkerPool.	2019-06-19 15:47:54 -04:00
Joey Hess	05a908c3c9	fix oops	2019-06-19 14:52:44 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has a Annex state for the remote, with no worker pool. So the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. Because, it's reading back from eg, a removable disk to checksum. That will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transer. So, removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions should that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unncessary build the remote list (only when they're run with concurrency enable), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not runnign CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	659640e224	separate queue for cleanup actions When running multiple concurrent actions, the cleanup phase is run in a separate queue than the main action queue. This can make some commands faster, because less time is spent on bookkeeping in between each file transfer. But as far as I can see, nothing will be sped up much by this yet, because all the existing cleanup actions are very light-weight. This is just groundwork for deferring checksum verification to cleanup time. This change does mean that if the user expects -J2 will mean that they see no more than 2 jobs running at a time, they may be surprised to see 4 in some cases (if the cleanup actions are slow enough to notice). It might also make sense to enable background cleanup without the -J, for at least one cleanup action. Indeed, that's the behavior that -J1 has now. At some point in the future, it make make sense to make the behavior with no -J the same as -J1. The only reason it's not currently is that git-annex can build w/o concurrent-output, and also any bugs in concurrent-output (such as perhaps misbehaving on non-VT100 compatible terminals) are avoided by default by only using it when -J is used.	2019-06-05 17:54:35 -04:00
Joey Hess	c04b2af3e1	improved WorkerPool abstraction No behavior changes.	2019-06-05 14:26:48 -04:00
Joey Hess	082e1f1738	Don't try to import .git directories from special remotes Because git does not support storing git repositories inside a git repository.	2019-06-04 15:14:20 -04:00
Joey Hess	67c06f5121	add back support for ftp urls Add back support for ftp urls, which was disabled as part of the fix for security hole CVE-2018-10857 (except for configurations which enabled curl and bypassed public IP address restrictions). Now it will work if allowed by annex.security.allowed-ip-addresses.	2019-05-30 14:51:34 -04:00
Joey Hess	1871295765	rename annex.security.allowed-http-addresses Renamed annex.security.allowed-http-addresses to annex.security.allowed-ip-addresses because it is not really specific to the http protocol, also limiting eg, git-annex's use of ftp and via youtube-dl, several other protocols. The old name for the config will still work. If both old and new name are set, the new name will win.	2019-05-30 12:43:40 -04:00
Joey Hess	a14f6ce758	fix repo description setting bugs * init: When the repository already has a description, don't change it. * describe: When run with no description parameter it used to set the description to "", now it will error out.	2019-05-23 12:51:01 -04:00
Joey Hess	16a2bed710	avoid build warning on Windows about unused import	2019-05-23 12:15:33 -04:00
Joey Hess	e06feb7316	honor preferred content when importing Importing from a special remote honors its preferred content too; unwanted files are not imported. But, some preferred content expressions can't be checked before files are imported, and trying to import with such an expression will fail. Tested this with scenarios including changing the preferred content expression and making sure merging the import didn't delete files that were no longer wanted. There was one minor inefficiency mentioned in the todo that I punted on.	2019-05-21 14:38:06 -04:00
Joey Hess	0bd39c1315	remove a TODO I checked yesterday	2019-05-21 12:54:39 -04:00
Joey Hess	3b9a19171a	Merge branch 'master' into preferred	2019-05-21 11:34:45 -04:00
Joey Hess	5e1221ad53	Improve shape of commit tree when importing from unversioned special remotes Make the import have the previous import as a parent, so eg `git log --stat` displays a useful diff. Also a minor optimisation, only calculate the depth of the imported history once.	2019-05-21 11:32:54 -04:00
Joey Hess	97fd9da6e7	add back non-preferred files to imported tree Prevents merging the import from deleting the non-preferred files from the branch it's merged into. adjustTree previously appended the new list of items to the old, which could result in it generating a tree with multiple files with the same name. That is not good and confuses some parts of git. Gave it a function to resolve such conflicts. That allowed dealing with the problem of what happens when the import contains some files (or subtrees) with the same name as files that were filtered out of the export. The files from the import win.	2019-05-20 16:43:52 -04:00
Joey Hess	568af1073e	filter exported tree through remote's preferred content setting The filtering is fairly efficient as far as building the trees goes, since it reuses adjustTree. But it still needs to traverse the whole tree, and look up the keys used by every file. The tree that gets recorded to export.log is the filtered tree. This way resumes of interrupted sync to an export uses it without needing to recalculate it. And, a change to the preferred content settings of the remote will result in a different tree, so the export will be updated accordingly. The original tree is still used in the remote tracking branch. That branch represents the special remote as a git remote, and if it were a normal git remote, the tree in its head would not be affected by preferred content.	2019-05-20 11:54:55 -04:00
Joey Hess	354c0eb57f	support standard and groupwanted in keyless mode Only when the preferred content expression includes them will a parse failure due to them needing keys result in the preferred content expression not parsing in keyless mode.	2019-05-14 14:59:03 -04:00
Joey Hess	9411a7c93c	matching preferred content before key is known This will let import try to match preferred content expressions before downloading the content and generating its key. If an expression needs a key, it preferredContentParser with preferredContentKeylessTokens will fail to parse it. standard and groupwanted are not in preferredContentKeylessTokens because they may refer to an expression that refers to a key. That needs further work to support them.	2019-05-14 14:28:23 -04:00
Joey Hess	aa7710982b	avoid list lookup by parseToken Minor optimisation to parsing of a preferred content expression.	2019-05-14 13:11:29 -04:00
Joey Hess	c1957b6aeb	whitespace	2019-05-14 13:01:50 -04:00
Joey Hess	5cc0ee70c0	factor out MatchFiles Annex This makes parseToken more general	2019-05-14 12:44:50 -04:00
Joey Hess	82186ca58f	annex.jobs=cpus etc Added the ability to run one job per CPU (core), by setting annex.jobs=cpus, or using option --jobs=cpus or -Jcpus. Built with future expansion in mind, including not defaulting matching on Concurrency so more constructors can later be added, and using "cpu" instead of "0".	2019-05-10 13:27:08 -04:00
Joey Hess	2d33122215	avoid ingest lockdown file escaping the withOtherTmp call Fixes bug that caused git-annex to fail to add a file when another git-annex process cleaned up the temp directory it was using. Solution is just to push withOtherTmp out to a higher level, so that the whole ingest process can be completed inside it. But in the assistant, that was not practical to do, since withOtherTmp runs in the Annex monad and the assistant does not. Worked around by introducing a separate temp directory that only the assistant uses for lockdown. Since only one assistant can run at a time, it's easy to clean up that directory of old cruft at startup.	2019-05-07 13:04:57 -04:00
Joey Hess	2a41712ef1	avoid stageJournal escaping withOtherTmp This is only done for correctness sake; I don't see any way that it would have caused a problem here. The jlog file escaped withOtherTmp so another process could swoop in and delete it, but the file is only used as a buffer for a list of filenames, and its handle gets rewound and they're read back out, which will still work even if it's already been deleted. The only reason I didn't just pre-delete the file and keep the handle open is I'm not sure that works on all OS's (eg Windows). If there was a problem that this fixed it might involve an OS that doesn't support deleting an open file or something like that.	2019-05-07 11:57:12 -04:00
Joey Hess	b03e65d260	Improved locking when multiple git-annex processes are writing to the .git/index file	2019-05-06 15:15:12 -04:00
Joey Hess	c5e0f9b3a5	fix setting imported tree `bf7ecd6892` went too far and broke importing, the old tree was used on the remote tracking branch and not the newly imported tree. Test suite noticed the problem luckily.	2019-05-06 14:38:02 -04:00
Joey Hess	bf7ecd6892	fix export subtree reversion Fix reversion in last release that caused wrong tree to be written to remote tracking branch after an export of a subtree. The invariant "commitsha should have the treesha as its tree" was not met due to a bug. Guarantee it's met by catting the commitsha to find its actual tree. A little bit slower, but this is not run often.	2019-05-06 13:57:13 -04:00
Joey Hess	96dfba7b53	fix build w/o MagicMime more	2019-05-03 11:30:20 -04:00
Joey Hess	740c9f7da8	fix build w/o MagicMime	2019-05-03 11:20:25 -04:00
Joey Hess	ab36f2f535	fix windows build	2019-05-03 10:58:34 -04:00
Joey Hess	ec697721e4	simplify and a bit faster using Eq this way	2019-05-01 15:34:07 -04:00
Joey Hess	700a3f2787	Merge branch 'master' into import-from-s3	2019-05-01 14:30:52 -04:00
Joey Hess	a32f31235a	reuse old imported commits This avoids proliferation of different import commits for the same trees, and makes the resulting git history nice.	2019-05-01 14:20:26 -04:00
Joey Hess	2bd0e07ed8	make merge commit on export that preserves the import history	2019-05-01 13:13:00 -04:00
Joey Hess	d1c283b691	comments	2019-05-01 12:37:54 -04:00
Joey Hess	1503b86a14	make import tree from remote generate a merge commit This way no history is lost, neither what was exported to the remote, or the history of changes that is imported from it. No complicated correlation of two possibly very different histories is needed, just record what we know and then git merge will do a good job. Also, it notices when the remote tracking branch doesn't need to be updated, and avoids doing anything, so noop remotes are super cheap. The only catch here is that, since the commits generated for imports from the remote don't have a stable date or author/committer, each (non-noop) import generates different commits for the same imported trees. So, when the imported remote tracking branch is merged into master and then a change is imported again, there will be an extra series of commits, which will get more and more expensive each time. This seems to call for making stable commits for imports. Also that seems a good idea to make importing in several repositories have the same result.	2019-04-30 16:13:21 -04:00
Joey Hess	b69d11ec42	wip	2019-04-30 14:00:27 -04:00
Joey Hess	28b4310abe	typo	2019-04-30 12:22:13 -04:00
Joey Hess	9dd764e6f7	Added mimeencoding= term to annex.largefiles expressions. * Added mimeencoding= term to annex.largefiles expressions. This is probably mostly useful to match non-text files with eg "mimeencoding=binary" * git-annex matchexpression: Added --mimeencoding option.	2019-04-30 12:17:22 -04:00
Joey Hess	18cf21d3ed	wip	2019-04-26 10:17:02 -04:00

1 2 3 4 5 ...

1466 commits