git-annex

Author	SHA1	Message	Date
Joey Hess	7b2d236556	importfeed: stream metadata for 5% speedup On top of the 10% speedup from streaming url logs.	2020-07-14 14:35:26 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	a6afa62a60	improve wording	2020-07-13 17:57:55 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	b4d0f6dfc2	slower but sequential filtering of large files from pointer files There should still be a speedup seeking over pointer files, just not as large as the one seeking over symlinks.	2020-07-10 15:21:58 -04:00
Joey Hess	de3d7d044d	make catObjectStream support newline and carriage return in filenames Turns out the %(rest) trick was not needed. Instead, just maintain a list of files we've asked for, and each cat-file response is for the next file in the list. This actually benchmarks 25% faster than before! Very surprising, but it must be due to needing to shove less data through the pipe, and parse less.	2020-07-08 13:49:03 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certainly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	d66fc1a464	Revert "async exception safety for coprocesses" This reverts commit `7013798df5`.	2020-07-06 15:11:28 -04:00
Joey Hess	dfa1c21b8a	comment and update changelog with benchmark results	2020-07-06 13:39:42 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	85cd79ea01	no importKey for android yet adb shell has sha256sum sha1sum and some others, so they could be used. They're provided by toybox, so seem about as likely to keep working as find and stat, which it already depends on. Or to not add a dep, could use stat the same as getExportContentIdentifier to get a mtime, and make a WORM key. But do I really want this to default to WORM? Unsure what's the best path, so punting for now.	2020-07-03 14:02:50 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurrence of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	ec0f8a6e74	Fix reversion that broke passing git configs with -c Reverting commit `c8fec6ab0`	2020-07-02 12:42:13 -04:00
Joey Hess	8a797358b7	changelog wording	2020-06-26 14:27:42 -04:00
Joey Hess	8b22e0bf37	lockContent for tahoe Trivial since git-annex cannot remove, but do an active checkKey verification anyway, in case the data was lost somehow. This commit was sponsored by Ryan Newton on Patreon.	2020-06-26 14:23:21 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certainly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	4229713e63	importfeed: Added some additional --template variables for date and time This commit was sponsored by Ethan Aubin.	2020-06-24 14:24:50 -04:00
Joey Hess	b651d3ede0	test: Fix some test cases that assumed git's default branch name git is making that configurable, and configuring it globally would break the test suite in a few places. No other part of git-annex assumes any branch name. Renamed a few placeholders to make that clearer. This commit was sponsored by Jake Vosloo on Patreon.	2020-06-23 16:40:51 -04:00
Joey Hess	7757c0e900	Honor annex.largefiles when importing a tree from a special remote. This commit was sponsored by Martin D on Patreon.	2020-06-23 16:07:18 -04:00
Joey Hess	5098236c6b	testremote: Fix over-allocation of resources and bad caching Including starting up a large number of external special remote processes. (Regression introduced in version 8.20200501)	2020-06-22 14:25:49 -04:00
Joey Hess	104b3a9c6a	Build with the http-client-restricted library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-http-client-restricted-dev This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:31:31 -04:00
Joey Hess	01eb863a14	Build with the git-lfs library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-git-lfs-dev. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:21:25 -04:00
Joey Hess	d5451afc8f	fix deadlock Fix a deadlock that could occur after git-annex got an unlocked file, causing the command to hang indefinitely. Known to happen on vfat filesystems, possibly others. Note that a deadlock is still theoretically possible, if anything smudge --clean does causes it to run the git queue for some other reason. Apparently that doesn't happen, but will need to keep an eye on it.	2020-06-18 12:56:29 -04:00
Joey Hess	48a88d822d	releasing package git-annex version 8.20200617	2020-06-17 15:59:34 -04:00
Joey Hess	82448bdf39	fix an annex.pidlock issue That made eg git-annex get of an unlocked file hang until the annex.pidlocktimeout and then fail. This fix should be fully thread safe no matter what else git-annex is doing. Only using runsGitAnnexChildProcess in the one place it's known to be a problem. Could audit for all places where git-annex runs itself as a child and add it to all of them, later.	2020-06-17 15:30:59 -04:00
Joey Hess	9583b267f5	confirmed fix	2020-06-17 12:12:41 -04:00
Joey Hess	ad81feb053	fix implicit embedcreds regression Fix bug that made creds not be stored in git when a special remote was initialized with gpg encryption, but without an explicit embedcreds=yes. (Yet another regression introduced in version 7.20200202.7. 5th so far.)	2020-06-16 18:00:19 -04:00
Joey Hess	a1d4c8e4ec	external: SETCREDS include creds in externalConfigChanges This makes the creds get saved, since only things recorded there will be saved. IIRC, unparsedRemoteConfig was not originally available when I implemented this; now that it is things get a bit simpler. More could probably be simplified, is externalConfigChanges needed at all? This does not entirely fix the bugs though, because creds are only embedded when embedcreds=yes, but not when encryption=pubkey is used without embedcreds=yes.	2020-06-16 17:24:24 -04:00
Joey Hess	4773713cc9	analysis of regression and fix related less serious regression	2020-06-16 15:16:36 -04:00
Joey Hess	c4f2c56f5e	checkpresentkey: fix behavior to match documentation checkpresentkey: When no remote is specified, try all remotes, not only ones that the location log says contain the key. This is what the documentation has always said it did. Still try the logged remotes first, because they are far more likely to have the key.	2020-06-16 13:54:26 -04:00
Joey Hess	a76b1ba3d6	local git remote autoinit improvements * Improve display of problems auto-initializing or upgrading local git remotes. * When a local git remote cannot be initialized because it has no git-annex branch or a .noannex file, avoid displaying a message about it.	2020-06-16 13:24:00 -04:00
Joey Hess	41952204ce	S3: The REDUCED_REDUNDANCY storage class is no longer cheaper So stop documenting it, and stop offering it as a choice in the assistant. Removed the code that parses it into S3.ReducedRedundancy, because S3.OtherStorageClass with the value will work just the same and avoids a special case for a deprecated thing.	2020-06-16 12:04:29 -04:00
Joey Hess	8a7c615a8f	import: Avoid using some strange names for temporary keys The ContentIdentifier can contain almost anything, so could have characters that are not fit for the filesystem, or might be longer than a key usually is, or contain a newline, or .... genKeyName deals with those problems. This should not present a back-compat issue, because this is a temporary key used while downloading the imported file, before the real key for it can be generated.	2020-06-11 16:07:36 -04:00
Joey Hess	6b0cb2d732	defer cleaning keys db of old data Avoid creating the keys database during init when there are no unlocked files, to prevent init failing when sqlite does not work in the filesystem.	2020-06-11 15:40:13 -04:00
Joey Hess	a49d300545	async exception safety for external special remote processes Since an external process can be in the middle of some operation when an async exception is received, it has to be shut down then. Using cleanupProcess will close its IO handles and send it a SIGTERM. If a special remote chooses to catch SIGTERM, it's fine for it to do some cleanup then, but until it finishes, git-annex will be blocked waiting for it. If a special remote blocked SIGTERM, it would cause a hang. Mentioned in docs. Also, in passing, fixed a FD leak, it was not closing the error handle when shutting down the external. In practice that didn't matter before because it was only run when git-annex was itself shutting down, but now that it can run on exception, it would have been a problem.	2020-06-09 12:22:14 -04:00
Joey Hess	1dd770b1af	fix file descriptor leak when importing from a directory special remote that is configured with exporttree=yes	2020-06-05 15:34:43 -04:00
Joey Hess	2bff3b7c49	init: When annex.pidlock is set, skip lock probing.	2020-06-05 11:12:16 -04:00
Joey Hess	1d41ae5d2a	init warning on stalled lock probe init: If lock probing stalls for a long time (eg a broken NFS server), display a message to let the user know what's taking so long.	2020-06-05 11:06:19 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	484a74f073	auto-init autoenable=yes Try to enable special remotes configured with autoenable=yes when git-annex auto-initialization happens in a new clone of an existing repo. Previously, git-annex init had to be explicitly run to enable them. That was a bit of a wart of a special case for users to need to keep in mind. Special remotes cannot display anything when autoenabled this way, to avoid interfering with the output of git-annex query commands. Any error messages will be hidden, and if it fails, nothing is displayed. The user will realize the remote isn't enabled when they try to use it, and can run git-annex init manually then to try the autoenable again and see what failed. That seems like a reasonable approach, and it's less complicated than communicating something across a pipe in order to display it as a side message. Other reason not to do that is that, if the first command the user runs is one like git-annex find that has machine readable output, any message about autoenable failing would need to not be displayed anyway. So better to not display a failure message ever, for consistency. (Had to split out Remote.List.Util to avoid an import cycle.)	2020-05-27 12:40:35 -04:00
Joey Hess	864ba4ecaa	disable buggy concurrency in Command.Export Fix a crash or potentially not all files being exported when sync -J --content is used with an export remote. Crash as described in fixed bug report. waitForAllRunningCommandActions inserted in several points where all the commandActions started before need to have finished before moving on to the next stage of the export. A race across those points could have maybe resulted in not all files being exported, or a wrong tree being exported. For example, changeExport starting up an action like a rename of A to B. Then, with that action still running, fillExport uploading a new A, before the rename occurred. That race seems unlikely to have happened. There are some other ones that this also fixes.	2020-05-26 13:54:08 -04:00
Joey Hess	e04a931439	improve transfer stages for some commands move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup actions in separate job pool from uploads. transferStages was confusingly named, it's only useful when doing downloads as then the verify actions can be run concurrently with other downloads. For commands that upload, there will be more concurrency from running cleanup actions in a separate job pool. As for sync, I left it using downloadStages although that's not optimal for the part of a sync that uploads. Perhaps it should use the union of both?	2020-05-26 11:55:50 -04:00
Joey Hess	0bcecb67f5	export: Let concurrent transfers be done with -J or annex.jobs Tested working, although I did find this bug in my testing, which also afflicts sync -J to an export remote.	2020-05-26 11:44:07 -04:00
Joey Hess	f7fe71602c	import: Added --json-progress Already supported --json, but not that. Also checked all other commands that only support --json, and the only other one that does transfers is fsck (--from), which it did not seem worth adding --json-progress to really.	2020-05-26 11:27:47 -04:00
Joey Hess	5b8524e1e6	addurl: Make --preserve-filename also apply when eg a torrent contains multiple files Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath here.	2020-05-26 10:45:57 -04:00
Joey Hess	fc9833f68d	export: Added options for json output Just worked, no need to do anything except add the options.	2020-05-26 10:31:10 -04:00
Joey Hess	01513da127	releasing package git-annex version 8.20200522	2020-05-22 12:07:59 -04:00
Joey Hess	27459c6e3f	Support building with tasty-1.3 This commit was sponsored by Ethan Aubin.	2020-05-21 15:26:44 -04:00
Joey Hess	e63dcbf36c	fix embedcreds=yes reversion Fix bug that made enableremote of S3 and webdav remotes, that have embedcreds=yes, fail to set up the embedded creds, so accessing the remotes failed. (Regression introduced in version 7.20200202.7 when reworking all the remote configs to be parsed.) Root problem is that parseEncryptionConfig excludes all other config keys except encryption ones, so it is then unable to find the credPairRemoteField. And since that field is not required to be present, it proceeds as if it's not, rather than failing in any visible way. This causes it to not find any creds, and so it does not cache them. When the S3 remote tries to make a S3 connection, it finds no creds, so assumes it's being used in no-creds mode, and tries to find a public url. With no public url available, it fails, but the failure doesn't say a lack of creds is the problem. Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full set of configs of the remote can be parsed. A bit annoying to need to parse the remote config before the full config (as returned by setRemoteCredPair) is available, but this avoids the problem. I assume webdav also had the problem by inspection, but didn't try to reproduce it with it. Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted String, but that does not seem right. Now that it runs that code, it crashed saying it had just a String. Remotes that have already been enableremoted, and so lack the cached creds file will work after this fix, because getRemoteCredPair will extract the creds from the remote config, writing the missing file. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-21 14:35:30 -04:00

964 commits