git-annex

Author	SHA1	Message	Date
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	6a61c7ff45	Fix crash of enableremote when the special remote has embedcreds=yes The crash occurred because writeCreds got called twice, and writeFileProtected neglected to close its file handle, so the file was open for write when written the second time. It seems unnecessary and suboptimal that writeCreds gets called twice. One call is from getRemoteCredPair and the other from setRemoteCredPair'. What happens is that in the enableremote case, code that also runs at initremote does unnecessary work. Might be possible to improve that, but I've gone for the simple fix. Sponsored-by: k0ld on Patreon	2023-10-20 13:19:12 -04:00
Joey Hess	c268dc5878	only stage regular files from the journal git-annex only writes regular files there, but other things may drop junk like empty .DAV directories around the tree. And trying to hash such things can have weird and hard to understand effects. So it seems best to do a small amount of work in statting the journal file to make sure it's a regular file. Sponsored-by: Jack Hill on Patreon	2023-10-10 13:22:02 -04:00
Joey Hess	b9240d2c5d	releasing package git-annex version 10.20230926	2023-09-26 13:29:49 -04:00
Joey Hess	41f4d0bda9	enableremote: Avoid overwriting existing git remote when passed the uuid of a special remote that was earlier initialized with the same name	2023-09-22 13:29:48 -04:00
Joey Hess	54da44d42a	Support being built with crypton rather than cryptonite crypton is a fork of cryptonite, and cryptonite's github repo has been archived. Some deps are already using crypton so it's clearly the way forward. Added a build flag without a default, so cabal configure will select on its own which to use. stack files pin to cryptonite for now. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-09-21 12:43:42 -04:00
Joey Hess	a18e40bdd7	lookupkey: Added --ref option Sponsored-by: Joshua Antonishen on Patreon	2023-09-12 12:49:11 -04:00
Joey Hess	7be8950138	propigateAdjustedCommits in seekExportContent push: When on an adjusted branch, propagate changes to parent branch before updating export remotes. This is a somewhat redundant call to propigateAdjustedCommits, since it also gets called at pushLocal time. That other one needs to come after importing from importtree remotes though, and seekExportContent has to come earlier, so I don't see a way to avoid doing it twice. Note that git-annex sync also manages to avoid the problem, it's only git-annex push that had the bug. Sponsored-by: Leon Schuermann on Patreon	2023-09-11 14:54:26 -04:00
Joey Hess	29ae536637	adb send to final filename not tmp file Avoids some problems with unusual characters in exporttree filenames that confuse adb shell commands. In particular, with a filename that contains \351, adb push sends the file to the correct filename in /sdcard. And running find on the android device roundtrips the filename. But, running mv on that filename on the android device fails with "bad <filename>: No such file or directory". Interestingly, ls on android works, and rm fails. adb push to the final name avoids this problem. But what about atomicity? Well, I tried an adb push and interrupted it part way through. The file was present while the push was running, but was removed once the push got interrupted. I also tried yanking the cable while adb push was running, and the partially received file was also deleted then. That avoids most problems. An import that runs at the same time as an export will see the partially sent file. But that is unlikely to be done, and if it did happen, it would notice that the imported file had changed in the meantime and discard it. Note that, since rm on the android device fails on these filenames, exporting a tree where the file is deleted is going to fail to remove it. I don't see what I can do about that, so long as android is using an rm that has issues with filename encodings. This was tested on a phone where find, ls, and rm all come from Toybox 0.8.6. Sponsored-by: unqueued on Patreon	2023-09-11 13:13:05 -04:00
Joey Hess	baf8e4f6ed	Override safe.bareRepository for git remotes Fix using git remotes that are bare when git is configured with safe.bareRepository = explicit Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:56:26 -04:00
Joey Hess	cbfd214993	set safe.directory when getting config for git-annex-shell or git remotes Fix more breakage caused by git's fix for CVE-2022-24765, this time involving a remote (either local or ssh) that is a repository not owned by the current user. Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:40:50 -04:00
Joey Hess	32cb2bd3fa	Fix linker optimisation in linux standalone tarballs Was only symlinking when there is a usr/ directory, but with usr/ merge, there are none. Sponsored-by: Dartmouth College's Datalad project	2023-09-07 12:59:27 -04:00
Joey Hess	50300a47fe	Removed the vendored git-lfs and the GitLfs build flag AFAICS all git-annex builds are using the git-lfs library not the vendored copy. Debian stable now includes a new enough haskell-git-lfs package as well. Last time this was tried it did not.	2023-08-28 13:12:31 -04:00
Joey Hess	a0a42e7ec1	releasing package git-annex version 10.20230828	2023-08-28 13:04:06 -04:00
Joey Hess	43e2a66a31	wording	2023-08-28 12:13:58 -04:00
Joey Hess	cf8b30c914	oldkeys: New command that lists the keys used by old versions of a file The tricky thing about this turned out to be handling renames and reverts. For that, it has to make two passes over the git log, and to avoid buffering a possibly huge amount of logs in memory (ie the whole git log of an entire repository!), runs git log twice. (It might be possible to speed this up by asking git log to show a diff, and so avoid needing to use catKey.) Sponsored-By: Brock Spratlen on Patreon	2023-08-22 14:51:06 -04:00
Joey Hess	379d58b499	diffdriver: Added --get option Removed the dontCheck repoExists, because running it in a repo that has not been initialized yet would update location log with nouuid. And I guess it's ok for it to only support running in git-annex repos.	2023-08-22 11:58:53 -04:00
Joey Hess	724ceeb1a9	avoid unnecessary use of curl when conduit will do Avoid using curl when annex.security.allowed-ip-addresses is set but neither annex.web-options nor annex.security.allowed-url-schemes is set to a value that needs curl. Bug introduced in `840bd50390` Sponsored-By: Brock Spratlen on Patreon	2023-08-22 10:25:53 -04:00
Joey Hess	7aac60769a	implement Unavailable for gcrypt Sponsored-by: Brett Eisenberg on Patreon	2023-08-16 15:54:54 -04:00
Joey Hess	977403d338	implement Unavailable for borg bup ddar directory rsync Only gcrypt remains to add support for. (Well, possibly also adb?) Sponsored-by: Luke T. Shumaker on Patreon	2023-08-16 15:48:09 -04:00
Joey Hess	67c99a4db7	info: Added available to the info displayed for a remote Sponsored-by: Kevin Mueller on Patreon	2023-08-16 14:52:58 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavailable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	75275ed41f	update Last commit also removed curl from linux standalone tarball. Which may or may not have been a mistake.. I'm inclined to go ahead and simplify it.	2023-08-15 14:22:30 -04:00
Joey Hess	571a516ed2	Stop bundling curl in the OSX dmg New curl binary links to libldap with a @loader_path that prevents using the binary when the dmg is used elsewhere. See https://github.com/datalad/git-annex/issues/170 git-annex doesn't use curl by default anyway, so it doesn't really need to be included in the dmg.	2023-08-15 14:21:53 -04:00
Joey Hess	10b5f79e2d	fix empty tree import when directory does not exist Fix behavior when importing a tree from a directory remote when the directory does not exist. An empty tree was imported, rather than the import failing. Merging that tree would delete every file in the branch, if those files had been exported to the directory before. The problem was that dirContentsRecursive returned [] when the directory did not exist. Better for it to throw an exception. But in commit `74f0d67aa3` back in 2012, I made it never throw exceptions, because exceptions thrown inside unsafeInterleaveIO become untrappable when the list is being traversed. So, changed it to list the contents of the directory before entering unsafeInterleaveIO. So exceptions are thrown for the directory. But still not if it's unable to list the contents of a subdirectory. That's less of a problem, because the subdirectory does exist (or if not, it got removed after being listed, and it's ok to not include it in the list). A subdirectory that has permissions that don't allow listing it will have its contents omitted from the list still. (Might be better to have it return a type that includes indications of errors listing contents of subdirectories?) The rest of the changes are making callers of dirContentsRecursive use emptyWhenDoesNotExist when they relied on the behavior of it not throwing an exception when the directory does not exist. Note that it's possible some callers of dirContentsRecursive that used to ignore permissions problems listing a directory will now start throwing exceptions on them. The fix to the directory special remote consisted of not making its call in listImportableContentsM use emptyWhenDoesNotExist. So it will throw an exception as desired. Sponsored-by: Joshua Antonishen on Patreon	2023-08-15 12:57:41 -04:00
Joey Hess	adda6c1088	Add git-annex remote refs that are not newer to the merged refs list Significant startup speed increase by avoiding repeatedly checking if some remote git-annex branch refs need to be merged when it is not newer. One way this could happen is when there are 2 remotes that are themselves connected. The git-annex branch on the first remote gets updated. Then the second remote pulls from the first, and merges in its git-annex branch. Then the local repo pulls from the second remote, and merges its git-annex branch. At this point, a pull from the first remote will get a git-annex branch that is not newer, but is not on the merged refs list. In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds. There were 5 to 10 such remote refs out of 18 remotes. Sponsored-by: Graham Spencer on Patreon	2023-08-09 13:31:36 -04:00
Joey Hess	3efad7f5f4	info: Added --dead-repositories option I considered a more wide-ranging config option to make other commands also show dead repositories. But it would be difficult to implement that because Remote.keyLocations is used to get locations, filtering out dead repos, and commands like get then try to use those locations. So a config setting would make dead repos sometimes be acted on by commands. Sponsored-by: unqueued on Patreon	2023-08-09 12:43:48 -04:00
Joey Hess	6d83bcff0f	Fix behavior of onlyingroup Sponsored-by: k0ld on Patreon	2023-08-07 13:05:11 -04:00
Joey Hess	d19139a10d	releasing package git-annex version 10.20230802	2023-08-02 16:09:14 -04:00
Joey Hess	6da6449fff	stack.yaml: Update to build with ghc-9.6.2 and aws-0.24 This enables some new features that need the new aws. Use http-client-restricted-0.1.0 because it uses the crypton side of the cryptonite/crypton fork, which seems to be needed for ghc-9.6.2. Dependency on connection removed because of the cryptonite/crypton fork. This avoids needing a build flag. It was only used to throw a typed exception in Utility.Url, which nothing depended on. Used a fork of bloomfilter because it's not being maintained and no longer builds as of this ghc version. (I have been trying to contact its maintainer about it, and emailed him today suggesting I take over the package.) Sponsored-by: Brock Spratlen on Patreon	2023-08-01 18:53:26 -04:00
Joey Hess	68c9b08faf	fix build with unix-2.8.0 Changed the parameters to openFd. So needed to add a small wrapper library to keep supporting older versions as well.	2023-08-01 18:41:27 -04:00
Joey Hess	fb640bc2f4	support building with unix-compat 0.7 It removed System.PosixCompat.User.	2023-08-01 15:17:43 -04:00
Joey Hess	393275c105	Setup.hs: Stop installing man pages, desktop files, and the git-annex-shell and git-remote-tor-annex symlinks Anything still relying on that, eg via cabal v1-install will need to change to using make install-home. Which was added back in 2019 in `6491b62614` because cabal new-build (now the default) already didn't use Setup in a way that let its installation of those things work. Notably this means Setup does not need to depend on unix-compat, which is useful because in 0.7 it removed System.PosixCompat.User, which Setup needed to determine where to install the desktop files. See https://github.com/haskell-pkg-janitors/unix-compat/issues/3	2023-08-01 15:08:56 -04:00
Joey Hess	fa92383993	onlyingroup * Support "onlyingroup=" in preferred content expressions. * Support --onlyingroup= matching option. Sponsored-by: Jack Hill on Patreon	2023-07-31 14:43:58 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	2807ab0a09	gcrypt: Remove empty hash directories when dropping content As was recently done with the directory special remote. Note that the top directory passed to removeDirGeneric was changed to avoid deleting .git/annex or .git/annex/objects if they ended up empty. Sponsored-by: Brett Eisenberg on Patreon	2023-07-21 16:04:11 -04:00
Joey Hess	3b34266e9e	typo	2023-07-21 15:36:01 -04:00
Joey Hess	b15366494a	directory: Remove empty hash directories when dropping content Failure to remove is not treated as a problem, and no permissions modifications are done, to avoid unexpected states. Sponsored-by: Luke Shumaker on Patreon	2023-07-21 14:57:29 -04:00
Joey Hess	7f38355860	dropunused: Support --jobs Sponsored-by: Kevin Mueller on Patreon	2023-07-21 14:03:34 -04:00
Joey Hess	33ba537728	deal with Amazon S3 breaking change for public=yes * S3: Amazon S3 buckets created after April 2023 do not support ACLs, so public=yes cannot be used with them. Existing buckets configured with public=yes will keep working. * S3: Allow setting publicurl=yes without public=yes, to support buckets that are configured with a Bucket Policy that allows public access. Sponsored-by: Joshua Antonishen on Patreon	2023-07-21 13:59:07 -04:00
Joey Hess	7fc6503812	fix waiting for all started feed downloads with -J importfeed bug fix: When -J was used with multiple feeds, some feeds did not get their items downloaded. In my case, I had added a feed to the end of the list, and no items from it were ever downloaded. Sponsored-by: Leon Schuermann on Patreon	2023-07-11 22:08:35 -04:00
Joey Hess	e82823d448	nub list of files yt-dlp when resumed was observed having written the same filename twice into the file list. Perhaps once by the first download and once by the resumed one?	2023-07-09 14:18:25 -04:00
Joey Hess	51b24aac91	importfeed: Add feedurl to the metadata (And allow it to be used in the --template although that seems unlikely to be very useful.) My use case for this is that one of the podcast feeds I subscribe to is sometimes leaking episodes of some other podcast. The other podcast is also very close to spam, so this may be a form of intentional spamming. I have not been able to catch the podcast feed containing those episodes, so I don't know which one is at fault. So putting this in the metadata will let me eventually catch it.	2023-07-06 00:11:38 -04:00
Joey Hess	adb09117f1	propigateAdjustedCommits: avoid overwriting diverged original branch Bug fix: Re-running git-annex adjust or sync when in an adjusted branch would overwrite the original branch, losing any commits that had been made to it since the adjusted branch was created. When git-annex adjust is run in this situation, it will display a warning about the diverged branches. When git-annex sync is run in this situation, mergeToAdjustedBranch will merge the changes from the original branch to the adjusted branch. So it does not need to display the divergence warning. Note that for some reason, I'm needing to run sync twice for that to happen. The first run does not do the merge and the second does. I'm unsure why and so am not fully done with this bug. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-07-05 17:09:49 -04:00
Joey Hess	a05bc6a314	Fix breakage when git is configured with safe.bareRepository = explicit Running git config --list inside .git then fails, so better to only do that when --git-dir was specified explicitly. Otherwise, when the repository is not bare, run the command inside the working tree. Also make init detect when the uuid it just set cannot be read and fail with an error, in case git changes something that breaks this later. I still don't actually understand why git-annex add/assist -J2 was affected but -J1 was not. But I did show that it was skipping writing to the location log, because the uuid was NoUUID. Sponsored-by: Graham Spencer on Patreon	2023-07-05 14:43:14 -04:00
Joey Hess	3c1d18cb3b	assist: With --jobs, parallelize transferring content to/from remotes Command.Add.seek starts concurrency with CommandStages. And for Command.Sync, it needs TransferStages. So, to get both types of concurrency for the two different parts, it either needs to change the type of concurrency in between, or just call startConcurrency once for each. It seems safe enough to call startConcurrency twice, because it does shut down concurrency (mostly) at the end, and eg the old Annex.workers get emptied. Sponsored-by: unqueued on Patreon	2023-07-05 12:47:30 -04:00
Joey Hess	e1fc9e204e	added git-annex satisfy This ended up having an interface like sync, rather than like get/copy/drop. That let it be implemented in terms of sync, which took a lot less code. Also, it lets it handle many of the edge cases that sync does, such as getting files that are not visible in a --hide-missing branch, and sending files to exporttree remotes. As well as being easier to implement, `git-annex satisfy myremote` makes sense as it satisfies the preferred content settings of the remote. `git-annex satisfy somefile` does not form a sentence that makes sense. So while -C can be a little bit annoying, it still makes sense to have this syntax. Note that, while I initially thought this would also satisfy numcopies, it does not. Arguably it ought to. But, sync does not send files in order to satisfy numcopies, it only sends files to satisfy preferred content. And it's important that this transfer the same files as sync does, because it will probably be used in a workflow where the user sometimes syncs and sometimes satisfies, and does not expect satisfy to do things that sync would not do. (Also opened a new bug that also affects sync et al., not only this command.) Sponsored-by: Nicholas Golder-Manning on Patreon	2023-06-29 15:34:53 -04:00
Joey Hess	1b9958f4fd	document git-annex satisfy	2023-06-29 14:15:01 -04:00
Joey Hess	d5c6197791	diffdriver: Added --text option for easy diffing of the contents of annexed text files This was already possible, but it was rather hard to come up with the complex shell command needed. Note that the diff output starts with "diff a/... b/...". I left off the "--git" because it's not a git format diff.	2023-06-28 15:27:16 -04:00
Joey Hess	fbd4dbaafe	fix some typos Anarcat fixed these in the news file, so transferred it over	2023-06-28 13:15:06 -04:00
Joey Hess	d98aa35b3b	reinject: Added --guesskeys option Sponsored-by: Noam Kremen on Patreon	2023-06-26 14:05:31 -04:00
Joey Hess	a8779f4c2a	prep release	2023-06-26 10:41:36 -04:00
Joey Hess	928b2a4839	create journal directory in withJournalHandle Fixes a crash by git-annex repair when .git/annex/journal/ does not exist. Normally the journal directory is created before withJournalHandle gets run, but git-annex repair can be run in a situation where it does not exist.	2023-06-21 15:23:59 -04:00
Joey Hess	bad444342e	reorder and condense	2023-06-21 13:48:31 -04:00
Joey Hess	3cec932bb5	changelog	2023-06-21 12:51:33 -04:00
Joey Hess	a861d56428	httpalso: Support being used with special remotes that use chunking. Sponsored-by: k0ld on Patreon	2023-06-20 13:35:28 -04:00
Joey Hess	958c2fa6d2	Improve resuming interrupted download when using yt-dlp or youtube-dl Fixes a failure like this: curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume. That happens because the whole web page has already been downloaded previously, and kept, so now addurl tries to download it, and curl asks the server to resume from the last byte. And youtube.com can't, for whatever stupid reason. So, delete the temp file after determining that youtube-dl can be used.	2023-06-19 15:01:47 -04:00
Joey Hess	a36a81dea3	Improve resuming interrupted download when using yt-dlp Sometimes resuming an interrupted download will fail to resume and download more files with different names. That resulted in the workdir having multiple files at the end, which causes git-annex to give up because it does not know what was downloaded. To fix this, use a yt-dlp feature, which appends to a file the name of each file after it's finished downloading it. So the presence of other cruft in the workdir will not confuse git-annex.	2023-06-19 14:39:08 -04:00
Joey Hess	217a6abb19	assistant: Fix a crash when a small file is deleted immediately after being created git add will fail if the file got deleted in the meantime. And since it was queued, there was a window until the queue flushed where a deletion of the file would cause a crash. Instead, reuse Command.Add.addFile, which sha1 hashes the file itself immediately, and then queues the index update. Ignore exceptions that will happen if the file got deleted already. Sponsored-by: k0ld on Patreon	2023-06-19 12:44:56 -04:00
Joey Hess	114a2d7504	Fix display when run with -J1 Commit `b6642dde8a` broke it by enabling non-concurrent display mode while leaving concurrency set in the config and having already started concurrency earlier. (I don't actually know if that commit was a good idea.) Sponsored-By: Brett Eisenberg on Patreon	2023-06-15 10:07:54 -04:00
Joey Hess	64738ea157	config: Added the --show-origin and --for-file options * config: Added the --show-origin and --for-file options. * config: Support annex.numcopies and annex.mincopies. There is a little bit of redundancy here with other code elsewhere that combines the various configs and selects which to use. But really only for the special case of annex.numcopies, which is a git config that does not override the annex branch setting and for annex.mincopies, which does not have a git config but does have gitattributes settings as well as the annex branch setting. That seems small enough, and unlikely enough to grow into a mess that it was worth supporting annex.numcopies and annex.mincopies in git-annex config --show-origin. Because these settings are a prime thing that someone might get confused about and want to know where they were configured. And, it followed that git-annex config might as well support those two for --set and --get as well. While this is redundant with the specialized commands, it's only a little code and it makes it more consistent. Note that --set does not have as nice output as numcopies/mincopies commands in some special cases like setting to 0 or a negative number. It does avoid setting to a bad value thanks to the smart constructors (eg configuredNumCopies). As for other git-annex branch configurations that are not set by git-annex config, things like trust and wanted that are specific to a repository don't map to a git config name, so don't really fit into git-annex config. And they are only configured in the git-annex branch with no local override (at least so far), so --show-origin would not be useful for them. Sponsored-by: Dartmouth College's DANDI project	2023-06-12 16:24:31 -04:00
Joey Hess	38153ad340	assistant: Add dotfiles to git by default, unless annex.dotfiles is configured The same as git-annex add does. Sponsored-by: Luke Shumaker on Patreon	2023-06-12 13:25:04 -04:00
Joey Hess	c33c226abd	fixed	2023-06-09 16:13:52 -04:00
Joey Hess	a0ab425c95	add ContentIndentifiersCidRemoteKeyIndex Optimise database to further speed up importing large trees from special remotes. See comment for details of why the other index didn't help cid queries. It would probably be better to manually create an index on only cid, rather than adding a second uniqueness constraint that is a larger index. But persistent does not support creating indexes, and an attempt to manually add it to the migration failed. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-06-09 15:12:33 -04:00
Joey Hess	6821ba8dab	sync: use log to track adjusted branch needs updating Speeds up sync in an adjusted branch by avoiding re-adjusting the branch unnecessarily, particularly when it is adjusted with --hide-missing or --unlock-present. When there are a lot of files, that was the majority of the time of a --no-content sync. Uses a log file, which is updated when content presence changes. This adds a little bit of overhead to every file get/drop when on such an adjusted branch. The overhead is minimal for get of any size of file, but might be noticeable for drop in some cases. It seems like a reasonable trade-off. It would be possible to update the log file only at the end, but then it would not happen if the command is interrupted. When not in an adjusted branch, there should be no additional overhead. (getCurrentBranch is an MVar read, and it avoids the MVar read of getGitConfig.) Note that this does not deal with situations such as: git checkout master, git-annex get, git checkout adjusted branch, git-annex sync. The sync won't know that the adjusted branch needs to be updated. Dealing with that would add overhead to operation in non-adjusted branches, which I don't like. Also, there are other situations like having two adjusted branches that both need to be updated like this, and switching between them and sync not updating. This does mean a behavior change to sync, since it did previously deal with those situations. But, the documentation did not say that it did. The man pages only talk about sync updating the adjusted branch after it transfers content. I did consider making sync keep track of content it transferred (and dropped) and only update the adjusted branch then, not to catch up to other changes made previously. That would perform better. But it seemed rather hard to implement, and also it would have problems with races with a concurrent get/drop, which this implementation avoids. And it seemed pretty likely someone had gotten used to get/drop followed by sync updating the branch. It seems much less likely someone is switching branches, doing get/drop, and then switching back and expecting sync to update the branch. Re-running git-annex adjust still does a full re-adjusting of the branch, for anyone who needs that. Sponsored-by: Leon Schuermann on Patreon	2023-06-08 14:35:41 -04:00
Joey Hess	3c15e0f7a0	cache negative lookups of global numcopies and mincopies Speeds up eg git-annex sync --content by up to 50%. When it does not need to transfer or drop anything, it now noops a lot more quickly. I didn't see anything else in sync --content noop loop that could really be sped up. It has to cat git objects to keys, stat object files, etc. Sponsored-by: unqueued on Patreon	2023-06-06 14:43:25 -04:00
Joey Hess	cfad0def18	wrap	2023-06-05 15:15:20 -04:00
Joey Hess	fe1b2dfb4b	speed up very first tree import by 25% Reading from the cidsdb is responsible for about 25% of the runtime of an import. Since the cidmap is used to store the same information in ram, the cidsdb is not written to during an import any longer. And so, if it started off empty (and updateFromLog wasn't needed), those reads can just be skipped. This is kind of a cheesy optimisation, since after any import from any special remote, the database will no longer be empty, so it's a single use optimisation. But it's probably not uncommon to start by importing a lot of files, and it can save a lot of time then. Sponsored-by: Brock Spratlen on Patreon	2023-06-02 13:30:30 -04:00
Joey Hess	40017089f2	use importChanges optimisation Large speed up to importing trees from special remotes that contain a lot of files, by only processing changed files. Benchmarks: Importing from a special remote that has 10000 files, that have all been imported before, and 1 new file sped up from 26.06 to 2.59 seconds. An import with no change and 10000 unchanged files sped up from 24.3 to 1.99 seconds. Going up to 20000 files, an import with no changes sped up from 125.95 to 3.84 seconds. Sponsored-by: k0ld on Patreon	2023-06-01 13:47:00 -04:00
Joey Hess	f6aa097a39	avoid import writing to cidsdb initially Speed up importing trees from special remotes somewhat by avoiding redundant writes to sqlite database. Before, import would write to both the git-annex branch and also to the sqlite database. But then the next time it was run, needsUpdateFromLog would see the branch had changed, so run updateFromLog, which would make the same writes to the sqlite database a second time. Now import writes only to the git-annex branch. The next time it's run, needsUpdateFromLog sees that the branch has changed and so calls updateFromLog, which updates the sqlite database. Why defer the write to the sqlite database like this? It seems that it could write to the database as it goes, and at the end call recordAnnexBranchTree to indicate that the information in the git-annex branch has all been written to the cidsdb. That would avoid the second import doing extra work. But, there could be other processes running at the same time, and one of them may update the git-annex branch, eg merging a remote git-annex branch into it. Any cids logs on that merged git-annex branch would not be reflected in the cidsdb yet. If the import then called recordAnnexBranchTree, the cidsdb would never get updated with that merged information. I don't think there's a good way to prevent, or to detect that situation. So, it can't call recordAnnexBranchTree at the end. So it might as well wait until the next run and do updateFromLog then. It could instead do updateFromLog at the end, but it's going to check needsUpdateFromLog at the beginning anyway. Note that the database writes were queued, so there is already a cidmap that is used to remember changes that the current process has made. So, omitting database writes can't change the behavior of the current process. Also note that thirdpartypopulatedimport uses recordcidkeyindb, which reflects what it already did. That code path does not use the cidmap, but does not need to query it either. 
It might be possible to make that code path also only update the git-annex branch and not the db, but I haven't checked. Sponsored-by: Noam Kremen on Patreon	2023-05-30 17:05:28 -04:00
Joey Hess	5070087a63	repair: Fix handling of git ref names on Windows Sponsored-by: Kevin Mueller on Patreon	2023-05-30 16:09:13 -04:00
Joey Hess	f2db6da938	default to yt-dlp and fix progress parsing bugs I noticed git-annex was using a lot of CPU when downloading from youtube, and was not displaying progress. Turns out that yt-dlp (and I think also youtube-dl) sometimes only knows an estimated size, not the actual size, and displays the progress output slightly differently for that. That broke the parser. And, the parser was feeding chunks that failed to parse back as a remainder, which caused it to try to re-parse the entire output each time, so it got slower and slower. Using --progress-template like this should avoid parsing problems as well as future proof against output changes. But it will work with only yt-dlp. So, this seemed like the right time to deprecate youtube-dl, and default to yt-dlp when available. git-annex will still use youtube-dl if that's all that's available. However, since the progress parser for youtube-dl was buggy, and I don't want to maintain two different progress parsers (especially since youtube-dl is no longer in debian unstable having been replaced by yt-dlp), made git-annex no longer try to parse youtube-dl's progress. Also, updated docs for yt-dlp being default. It did not seem worth renaming annex.youtube-dl-options and annex.youtube-dl-command. Note that yt-dlp does not seem to document the fields available in the progress template. I found them by reading the source and looking at the templates it uses internally. Also note that the use of "i" (rather than "s") in progressTemplate makes it display floats rounded to integers; particularly the estimated total size can be a float. That also does not seem to be documented but I assume is a python thing? Sponsored-by: Joshua Antonishen on Patreon	2023-05-27 13:04:53 -04:00
Joey Hess	0f89d221bd	version: Avoid error message when entire output is not read Sponsored-by: Dartmouth College's Datalad project	2023-05-19 15:00:57 -04:00
Joey Hess	c4ad9b1446	Fix bug in -z handling of trailing NUL in input The obvious way to fix this would be to adapt lines to split on null. However, it's actually nontrivial to rewrite lines. In particular it has a weird implementation to avoid a space leak. See: https://gitlab.haskell.org/ghc/ghc/-/issues/4334 Also, while that is a small amount of code, it's covered by a rather complex copyright and I'd have to include that copyright in git-annex. So, I opted to filter out the trailing empty string instead. Sponsored-by: Dartmouth College's Datalad project	2023-05-19 14:34:02 -04:00
Joey Hess	e955912ad0	git-annex assist assist: New command, which is the same as git-annex sync but with new files added and content transferred by default. (Also this fixes another reversion in git-annex sync: --commit, --no-commit, and --message were not enabled, oops.) See added comment for why git-annex assist does commit staged changes elsewhere in the work tree, but only adds files under the cwd. Note that it does not support --no-commit, --no-push, --no-pull like sync does. My thinking is, why should it? If you want that level of control, use git commit, git annex push, git annex pull. Sync only got those options because pull and push were not split out. Sponsored-by: k0ld on Patreon	2023-05-18 14:37:43 -04:00
Joey Hess	f93a7fce1d	sync: Started transition to --content being enabled by default When used without --content or --no-content, warn about the upcoming transition, and suggest using one of the options, or setting annex.synccontent. Sponsored-by: Brett Eisenberg on Patreon	2023-05-17 13:23:42 -04:00
Joey Hess	40731ff9fd	sync: Added -g as a short option for --no-content I anticipate that if sync is transitioned to syncing content by default, people will want a short option. And in repositories where annex.synccontent = true, they already would. And pull and push sync content by default, so a short option is useful with them too. Mnemonic: -g makes only git data be synced Also, -a makes only annex data be synced. Would have preferred -c, which would complement -C, but it was already taken to set git configs. Sponsored-by: Noam Kremen on Patreon	2023-05-17 12:34:26 -04:00
Joey Hess	5df89d58c7	git-annex pull and push Split out two new commands, git-annex pull and git-annex push. Those plus a git commit are equivalent to git-annex sync. In a sense, git-annex sync conflates 3 things, and it would have been better to have push and pull from the beginning and not sync. Although note that git-annex sync --content is faster than a pull followed by a push, because it only has to walk the tree once, look at preferred content once, etc. So there is some value in git-annex sync in speed, as well as user convenience. And it would be hard to split out pull and push from sync, as far as the implementation goes. The implementation inside sync was easy, just adjust SyncOptions so it does the right thing. Note that the new commands default to syncing content, unless annex.synccontent is explicitly set to false. I'd like sync to also do that, but that's a hard transition to make. As a start to that transition, I added a note to git-annex-sync.mdwn that it may start to do so in a future version of git-annex. But a real transition would necessarily involve displaying warnings when sync is used without --content, and time. Sponsored-by: Kevin Mueller on Patreon	2023-05-16 16:51:07 -04:00
Joey Hess	2e984c51b6	sync --no-pull and --no-push affect download and upload of content The man page is somewhat vague about this, but I do think it was a bug that these options didn't already behave that way. The options are documented to disable imports and exports, which are the same operations just with a special remote that uses trees. The real motivation for this is that I'm adding git-annex pull and git-annex push, and I want these options to turn off the equivalent of those commands. And git-annex pull will certainly download and push upload. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-05-16 16:25:23 -04:00
Joey Hess	212442dd9b	pullOption should be pushOption in seekExportContent sync: Fix bug that made --no-pull, rather than --no-push prevent exporting trees to special remotes. Sponsored-by: Joshua Antonishen on Patreon	2023-05-16 15:55:24 -04:00
Joey Hess	271f3b1ab4	uninit: Support --json and --json-error-messages Had to convert uninit to do everything that can error out inside a CommandStart. This was harder than feels nice. (Also, in passing, converted CommandCheck to use a data type, not a weird number that it was not clear how it managed to be unique.) Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-11 13:43:02 -04:00
Joey Hess	02cfef1f91	uninit: Avoid buffering the names of all annexed files in memory Oops, using the same list twice does prevent streaming in constant memory. Sponsored-by: unqueued on Patreon	2023-05-11 13:25:55 -04:00
Joey Hess	de84abb210	configremote: Support --json and --json-error-messages Seems unlikely to be too useful, but who knows. Moved the checkSafeConfig call to happen after an action is started, so it will be captured by --json-error-messages Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-10 14:21:42 -04:00
Joey Hess	a242eabc7a	enableremote: Support --json and --json-error-messages Seems unlikely to be too useful, but who knows. Was trivial anyway. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-10 14:09:27 -04:00
Joey Hess	b3cc8dbacb	initremote: Support --json and --json-error-messages Including special --whatelse handling. Otherwise, it seems unlikely to be too useful, but who knows. Refactored code to call starting before displaying error messages. This makes the error messages be captured by --json-error-messages Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-10 14:03:40 -04:00
Joey Hess	8d8e044458	upgrade: Support --json and --json-error-messages and --json-progress Seems unlikely to be very useful, but trivial. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-10 12:54:48 -04:00
Joey Hess	c98fb0b637	merge: Support --json and --json-error-messages and --json-progress Seems unlikely to be very useful, but trivial. And, this completes the story that git-annex sync does not need json, since every sub-operation is available in a command that does support json. (Well, except for committing, but that's not a git-annex command.) Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-10 12:34:19 -04:00
Joey Hess	7919349cee	importfeed: Support --json and --json-error-messages and --json-progress Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-09 16:51:16 -04:00
Joey Hess	04ee6c4c6b	importfeed: Support -J (and work toward supporting --json) Both -J and --json needed importfeed to be refactored to use commandAction. That was difficult, because of the interrelated nature of downloading feeds and then downloading files from feeds, both of which needed to use commandAction. And then checking for problems in feeds has to come after these actions, which may be run as background jobs. As for --json support, it's most of the way there, but still has some warts, so I didn't enable jsonOptions yet. The warts include: - An initial empty json record is displayed by getCache. - Input is not populated, should be feed url - feedProblem at end will not be captured by --json-error-messages (see FIXME) Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-09 16:13:56 -04:00
Joey Hess	a71c831949	renameremote: Support --json and --json-error-messages Seems unlikely to be useful, but it works so Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-08 16:25:40 -04:00
Joey Hess	3d8f93dc0a	reinject: Support --json and --json-error-messages Also fix support for operating on multiple pairs of files and keys. Moved notAnnexed to inside starting, so error message will get into the json. Cannot include the key in the starting as it's not known yet, so instead add it to the json later. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-08 15:43:37 -04:00
Joey Hess	91b9915b09	reinit: Support --json and --json-error-messages Basically same concerns as init. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-08 15:07:40 -04:00
Joey Hess	f09a248fe2	init: Support --json and --json-error-messages Dunno how useful this will be, since about all that's accessible from the json is whether it succeeded or failed, and the error messages which were already on stderr. Note that, when autoenabling a special remote, it would be possible for one to stop and prompt or output not using Messages and so not output as part of the json. I don't think that happens, but I'm not 100% sure something doesn't manage to break it. Of course, the same could be the case for commands that transfer objects. Using Annex.Init.autoEnableSpecialRemotes in --json mode would avoid the problem, but I've chosen to wait until I know it's needed to use it. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-08 14:58:08 -04:00
Joey Hess	c208442292	unused: Support --json and --json-error-messages Generalized AddJSONActionItemField to allow it to add several fields. Not entirely happy with that, since the names of the fields have to be carefully chosen to not conflict with other json fields. And fields added that way can't be parsed back in FromJSON, except for the "fields" field that is special cased for metadata. Still, I couldn't see another way to do it. Also, omit file:null from the json output. Which does affect other commands, eg git-annex whereis --all --json. Hopefully that won't break something that expects a null file. If it did, that could be reverted, but it would be ugly to have file:null in the unused --json Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-08 14:39:57 -04:00
Joey Hess	365dbc89dc	expire, trust et al, dead, describe: Support --json and --json-error-messages For expire, the normal output is unchanged, but the --json output includes the uuid in machine parseable form. Which could be very useful for this somewhat obscure command. That needed ActionItemUUID to be implemented, which seemed like a lot of work, but then --- I had been going to skip implementing them for trust, untrust, dead, semitrust, and describe, but putting the uuid in the json is useful information, it tells what uuid git-annex picked given the input. It was not hard to support these once ActionItemUUID was implemented. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-05 15:33:30 -04:00
Joey Hess	1a9af823bc	addunused, dropunused: Support --json and --json-error-messages This also changes addunused to display the names of the files that it adds. That seems like a general usability improvement, and not displaying the input number does not seem likely to be a problem to a user, since the filename is based on the key. Displaying the filename was necessary to get it and the key included in the json. dropunused does not include the key in the json. It would be possible to add, but would need more changes. And I doubt that dropunused --json would be used in a situation where a program cared which keys were dropped. Note that drop --unused does have the key in its json, so such a program could just use it. Or could just dropkey --batch with the specific keys it wants to drop if it cares about specific keys. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-05 14:01:40 -04:00
Joey Hess	1d4bd2dcb8	migrate, undo: Support --json and --json-error-messages Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-04 16:34:35 -04:00
Joey Hess	38fc5d3fc7	rekey, setpresentkey: Support --json and --json-error-messages Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-04 16:03:54 -04:00
Joey Hess	f20c8b087e	fix: Support --json and --json-error-messages And triaged out some commands that don't need to support these options. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-04 14:28:21 -04:00

1687 commits