git-annex

Author	SHA1	Message	Date
Joey Hess	38b9ebc5fd	newtype MapLog Noticed that Semigroup instance of Map is not suitable to use for MapLog. For example, it behaved like this: ghci> parseTrustLog "foo 1 timestamp=10\nfoo 2 timestamp=11" <> parseTrustLog "foo X timestamp=12" fromList [(UUID "foo",LogEntry {changed = VectorClock 11s, value = SemiTrusted})] Which was wrong, it lost the newer DeadTrusted value. Luckily, nothing used that Semigroup when operating on a MapLog. And this provides a safe instance. Sponsored-by: Graham Spencer on Patreon	2023-11-13 14:37:22 -04:00
Joey Hess	5d8b8a8ad0	git-annex log --totalsizes Note that dead repositories are not yet handled so their sizes show as nonzero after they are marked dead. Sponsored-By: unqueued on Patreon	2023-11-13 13:15:36 -04:00
Joey Hess	dc02236c85	git-annex log --sizes CSV format so it can be fed into a program to graph it. Note that dead repositories are not yet handled so their sizes show as nonzero after they are marked dead. Sponsored-By: k0ld on Patreon	2023-11-13 13:07:22 -04:00
Joey Hess	6203b8afba	the last line is for the current time	2023-11-10 17:37:55 -04:00
Joey Hess	574514545c	git-annex log --sizesof This can take a lot of memory. I decided to violate the usual rule in git-annex that it operate in constant memory no matter how many annexed objects. In this case, it would be hard to be fast without using a big map of the location logs. The main difficulty here is that there can be many git-annex branches and it needs to display a consistent view at a point in time, which means merging information from multiple git-annex branches. I have not checked if there are any laziness leaks in this code. It takes 1 gb to run in my big repo, which is around what I estimated before writing it. 2 options that are documented are not yet implemented. Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the next change after 12:59 is then. Then it waits until after 2:10 to display the next change. It ought to wait until after 2:00. Sponsored-by: Brock Spratlen on Patreon	2023-11-10 17:26:10 -04:00
Joey Hess	561c036664	split out generic git log parser Sponsored-By: Jack Hill on Patreon	2023-11-10 15:40:03 -04:00
Joey Hess	11cc9f1933	info: Added calculation of combined annex size of all repositories Factored out overLocationLogs from CmdLine.Seek, which can calculate this pretty fast even in a large repo. In my big repo, the time to run git-annex info went up from 1.33s to 8.5s. Note that the "backend usage" stats are for annexed files in the working tree only, not all annexed files. This new data source would let that be changed, but that would be a confusing behavior change. And I cannot retitle it either, out of fear something uses the current title (eg parsing the json). Also note that, while time says "402108maxresident" in my big repo now, up from "54092maxresident", top shows the RES constant at 64mb, and it was 48mb before. So I don't think there is a memory leak. I tried using deepseq to force full evaluation of addKeyCopies and memory use didn't change, which also says no memory leak. And indeed, not even calling addKeyCopies resulted in the same memory use. Probably the increased memory usage is buffering the stream of data from git in overLocationLogs. Sponsored-by: Brett Eisenberg on Patreon	2023-11-08 13:35:11 -04:00
Joey Hess	8768966d97	improve comments	2023-11-08 12:06:03 -04:00
Joey Hess	f8d35d9480	lookupkey: Sped up --batch When the file is relative, it does not need to be passed through git ls-files to normalize it. Sponsored-by: Kevin Mueller on Patreon	2023-10-30 14:59:09 -04:00
Joey Hess	d9fd205cbb	push RawFilePath down into Annex.ReplaceFile Minor optimisation, but a win in every case, except for a couple where it's a wash. Note that replaceFile still takes a FilePath, because it needs to operate on Chars to truncate unicode filenames properly.	2023-10-26 13:36:49 -04:00
Joey Hess	c873586e14	eliminate s2w8 and w82s Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode characters. Luckily, genUUIDInNameSpace is only ever used on ASCII strings as far as I can determine. In particular, git-remote-gcrypt's gcrypt-id is an ASCII string.	2023-10-26 13:12:57 -04:00
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	41f4d0bda9	enableremote: Avoid overwriting existing git remote when passed the uuid of a specialremote that was earlier initialized with the same name	2023-09-22 13:29:48 -04:00
Joey Hess	ef7c867238	fix some build warnings from ghc 9.4.6 It now notices that a RepoLocation may not be Local, in which case pattern matching on Local wouldn't do.	2023-09-21 13:40:22 -04:00
Joey Hess	a147a31baa	fix some build warnings from ghc 9.4.6 For some reason it doesn't notice that req must be a Req, because the toplevel function matched on that.	2023-09-21 13:38:36 -04:00
Joey Hess	a18e40bdd7	lookupkey: Added --ref option Sponsored-by: Joshua Antonishen on Patreon	2023-09-12 12:49:11 -04:00
Joey Hess	7be8950138	propigateAdjustedCommits in seekExportContent push: When on an adjusted branch, propagate changes to parent branch before updating export remotes. This is a somewhat redundant call to propigateAdjustedCommits, since it also gets called at pushLocal time. That other one needs to come after importing from importtree remotes though, and seekExportContent has to come earlier, so I don't see a way to avoid doing it twice. Note that git-annex sync also manages to avoid the problem, it's only git-annex push that had the bug. Sponsored-by: Leon Schuermann on Patreon	2023-09-11 14:54:26 -04:00
Joey Hess	aeaadb8eb8	improve warning message when unable to update export A misleading message was displayed in several cases. If the user has run eg: git config remote.push-win-remote.annex-tracking-branch 'adjusted/main(unlocked)' That is not supported, and now it will tell them it's not a valid configuration. A user reported doing that, but I don't know if it's a common point of confusion. If it is a common problem, a better message would be possible, or it could convert back from the adjusted branch to the actual branch. Sponsored-by: Graham Spencer on Patreon	2023-09-11 14:21:36 -04:00
Joey Hess	49b97b0675	oldkeys: check associated files by default and add --unchecked Removed the prior code that checked for keys used by current versions of the files being acted on. It is redundant with the associated files check (so long as the associated files database is always up-to-date, which reconcileStaged should accomplish). Sponsored-by: Luke T. Shumaker on Patreon	2023-08-23 13:46:41 -04:00
Joey Hess	5489c2cdd6	oldkeys --revision-range Sponsored-by: Brett Eisenberg on Patreon	2023-08-22 15:00:29 -04:00
Joey Hess	cf8b30c914	oldkeys: New command that lists the keys used by old versions of a file The tricky thing about this turned out to be handling renames and reverts. For that, it has to make two passes over the git log, and to avoid buffering a possibly huge amount of logs in memory (ie the whole git log of an entire repository!), runs git log twice. (It might be possible to speed this up by asking git log to show a diff, and so avoid needing to use catKey.) Sponsored-By: Brock Spratlen on Patreon	2023-08-22 14:51:06 -04:00
Joey Hess	379d58b499	diffdriver: Added --get option Removed the dontCheck repoExists, because running it in a repo that has not been initialized yet would update location log with nouuid. And I guess it's ok for it to only support running in git-annex repos.	2023-08-22 11:58:53 -04:00
Joey Hess	67c99a4db7	info: Added available to the info displayed for a remote Sponsored-by: Kevin Mueller on Patreon	2023-08-16 14:52:58 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavailable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	7f7c95b771	move comment	2023-08-16 13:19:17 -04:00
Joey Hess	10b5f79e2d	fix empty tree import when directory does not exist Fix behavior when importing a tree from a directory remote when the directory does not exist. An empty tree was imported, rather than the import failing. Merging that tree would delete every file in the branch, if those files had been exported to the directory before. The problem was that dirContentsRecursive returned [] when the directory did not exist. Better for it to throw an exception. But in commit `74f0d67aa3` back in 2012, I made it never throw exceptions, because exceptions thrown inside unsafeInterleaveIO become untrappable when the list is being traversed. So, changed it to list the contents of the directory before entering unsafeInterleaveIO. So exceptions are thrown for the directory. But still not if it's unable to list the contents of a subdirectory. That's less of a problem, because the subdirectory does exist (or if not, it got removed after being listed, and it's ok to not include it in the list). A subdirectory that has permissions that don't allow listing it will have its contents omitted from the list still. (Might be better to have it return a type that includes indications of errors listing contents of subdirectories?) The rest of the changes are making callers of dirContentsRecursive use emptyWhenDoesNotExist when they relied on the behavior of it not throwing an exception when the directory does not exist. Note that it's possible some callers of dirContentsRecursive that used to ignore permissions problems listing a directory will now start throwing exceptions on them. The fix to the directory special remote consisted of not making its call in listImportableContentsM use emptyWhenDoesNotExist. So it will throw an exception as desired. Sponsored-by: Joshua Antonishen on Patreon	2023-08-15 12:57:41 -04:00
Joey Hess	d467c70ef7	change sync content transition plan and fine tune warning Only display warning when git-annex sync (without --content or --no-content) is used with repositories that have preferred content configured. Sponsored-by: Leon Schuermann on Patreon	2023-08-14 13:51:35 -04:00
Joey Hess	be028f10e5	split out Utility.Url.Parse This is mostly for git-repair which can't include all of Utility.Url without adding many dependencies that are not really necessary.	2023-08-14 12:28:10 -04:00
Joey Hess	3efad7f5f4	info: Added --dead-repositories option I considered a more wide-ranging config option to make other commands also show dead repositories. But it would be difficult to implement that because Remote.keyLocations is used to get locations, filtering out dead repos, and commands like get then try to use those locations. So a config setting would make dead repos sometimes be acted on by commands. Sponsored-by: unqueued on Patreon	2023-08-09 12:43:48 -04:00
Joey Hess	68c9b08faf	fix build with unix-2.8.0 Changed the parameters to openFd. So needed to add a small wrapper library to keep supporting older versions as well.	2023-08-01 18:41:27 -04:00
Joey Hess	aa5e333cb7	fix whitespace Thanks to a compile warning from new ghc	2023-08-01 18:36:54 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	7f38355860	dropunused: Support --jobs Sponsored-by: Kevin Mueller on Patreon	2023-07-21 14:03:34 -04:00
Joey Hess	7fc6503812	fix waiting for all started feed downloads with -J importfeed bug fix: When -J was used with multiple feeds, some feeds did not get their items downloaded. In my case, I had added a feed to the end of the list, and no items from it were ever downloaded. Sponsored-by: Leon Schuermann on Patreon	2023-07-11 22:08:35 -04:00
Joey Hess	240bae38f6	sync: When in an adjusted branch, merge changes from the original branch This causes changes to the original branch to get merged with a single sync. Before, it took 2 syncs; the first happened to update the synced/ branch, and the second merged changes from the synced/ branch into the adjusted branch. Using mergeToAdjustedBranch when tomerge == origbranch is probably overkill, but it does work fine. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-07-06 12:42:24 -04:00
Joey Hess	51b24aac91	importfeed: Add feedurl to the metadata (And allow it to be used in the --template although that seems unlikely to be very useful.) My use case for this is that one of the podcast feeds I subscribe to is sometimes leaking episodes of some other podcast. The other podcast is also very close to spam, so this may be a form of intentional spamming. I have not been able to catch the podcast feed containing those episodes, so I don't know which one is at fault. So putting this in the metadata will let me eventually catch it.	2023-07-06 00:11:38 -04:00
Joey Hess	3d810726af	diffdriver --text support options for diff Sponsored-by: KDM on Patreon	2023-07-05 15:43:29 -04:00
Joey Hess	3c1d18cb3b	assist: With --jobs, parallelize transferring content to/from remotes Command.Add.seek starts concurrency with CommandStages. And for Command.Sync, it needs TransferStages. So, to get both types of concurrency for the two different parts, it either needs to change the type of concurrency in between, or just call startConcurrency once for each. It seems safe enough to call startConcurrency twice, because it does shut down concurrency (mostly) at the end, and eg the old Annex.workers get emptied. Sponsored-by: unqueued on Patreon	2023-07-05 12:47:30 -04:00
Joey Hess	e1fc9e204e	added git-annex satisfy This ended up having an interface like sync, rather than like get/copy/drop. That let it be implemented in terms of sync, which took a lot less code. Also, it lets it handle many of the edge cases that sync does, such as getting files that are not visible in a --hide-missing branch, and sending files to exporttree remotes. As well as being easier to implement, `git-annex satisfy myremote` makes sense as it satisfies the preferred content settings of the remote. `git-annex satisfy somefile` does not form a sentence that makes sense. So while -C can be a little bit annoying, it still makes sense to have this syntax. Note that, while I initially thought this would also satisfy numcopies, it does not. Arguably it ought to. But, sync does not send files in order to satisfy numcopies, it only sends files to satisfy preferred content. And it's important that this transfer the same files as sync does, because it will probably be used in a workflow where the user sometimes syncs and sometimes satisfies, and does not expect satisfy to do things that sync would not do. (Also opened a new bug that also affects sync et al, not only this command.) Sponsored-by: Nicholas Golder-Manning on Patreon	2023-06-29 15:34:53 -04:00
Joey Hess	d5c6197791	diffdriver: Added --text option for easy diffing of the contents of annexed text files This was already possible, but it was rather hard to come up with the complex shell command needed. Note that the diff output starts with "diff a/... b/...". I left off the "--git" because it's not a git format diff.	2023-06-28 15:27:16 -04:00
Joey Hess	549d390d03	display drop from remote more consistently With eg copy --to remote This is particularly an improvement in sync --content output, which mixes the two, so it's nice to have consistent display.	2023-06-27 19:01:33 -04:00
Joey Hess	d98aa35b3b	reinject: Added --guesskeys option Sponsored-by: Noam Kremen on Patreon	2023-06-26 14:05:31 -04:00
Joey Hess	39f3d783fe	consolidate	2023-06-20 15:10:11 -04:00
Joey Hess	72715845a1	display destination file before youtube-dl download Rather than after it, which can leave one wondering what file it's downloading. youtubeDl should not ever return Right Nothing in normal operation, because it's already asked youtube-dl if it supports the url. So it would have to succeed at that, then not download any file, but also exit successfully, in order for the new error message to display. Also display the name of yt-dlp when using it.	2023-06-20 14:55:25 -04:00
Joey Hess	958c2fa6d2	Improve resuming interrupted download when using yt-dlp or youtube-dl Fixes a failure like this: curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume. That happens because the whole web page has already been downloaded previously, and kept, so now addurl tries to download it, and curl asks the server to resume from the last byte. And youtube.com can't, for whatever stupid reason. So, delete the temp file after determining that youtube-dl can be used.	2023-06-19 15:01:47 -04:00
Joey Hess	1f09b709fc	skip sending individual files to export remotes That will fail, and it already exports whole trees. `f6dd34ca81` made it sync content with import remotes, and if an import remote is also an export remote, that caused this new failure mode. Sponsored-by: Brock Spratlen on Patreon	2023-06-19 11:24:32 -04:00
Joey Hess	64738ea157	config: Added the --show-origin and --for-file options * config: Added the --show-origin and --for-file options. * config: Support annex.numcopies and annex.mincopies. There is a little bit of redundancy here with other code elsewhere that combines the various configs and selects which to use. But really only for the special case of annex.numcopies, which is a git config that does not override the annex branch setting and for annex.mincopies, which does not have a git config but does have gitattributes settings as well as the annex branch setting. That seems small enough, and unlikely enough to grow into a mess that it was worth supporting annex.numcopies and annex.mincopies in git-annex config --show-origin. Because these settings are a prime thing that someone might get confused about and want to know where they were configured. And, it followed that git-annex config might as well support those two for --set and --get as well. While this is redundant with the specialized commands, it's only a little code and it makes it more consistent. Note that --set does not have as nice output as numcopies/mincopies commands in some special cases like setting to 0 or a negative number. It does avoid setting to a bad value thanks to the smart constructors (eg configuredNumCopies). As for other git-annex branch configurations that are not set by git-annex config, things like trust and wanted that are specific to a repository don't map to a git config name, so don't really fit into git-annex config. And they are only configured in the git-annex branch with no local override (at least so far), so --show-origin would not be useful for them. Sponsored-by: Dartmouth College's DANDI project	2023-06-12 16:24:31 -04:00
Joey Hess	f6dd34ca81	sync content with import remotes This didn't used to be needed because importKeys would import all content and so doing another pass was redundant. But since `40017089f2` it uses importChanges, so only new files are imported. If a file that was already imported before was dropped, that would prevent sync --content from getting its content again. Sponsored-by: Jack Hill on Patreon	2023-06-01 18:52:19 -04:00
Joey Hess	40017089f2	use importChanges optimisation Large speed up to importing trees from special remotes that contain a lot of files, by only processing changed files. Benchmarks: Importing from a special remote that has 10000 files, that have all been imported before, and 1 new file sped up from 26.06 to 2.59 seconds. An import with no change and 10000 unchanged files sped up from 24.3 to 1.99 seconds. Going up to 20000 files, an import with no changes sped up from 125.95 to 3.84 seconds. Sponsored-by: k0ld on Patreon	2023-06-01 13:47:00 -04:00
Joey Hess	c6acf574c7	implement importChanges optimisation (not used yet) For simplicity, I've not tried to make it handle History yet, so when there is a history, a full import will still be done. Probably the right way to handle history is to first diff from the current tree to the last imported tree. Then, diff from the current tree to each of the historical trees, and recurse through the history diffing from child tree to parent tree. I don't think that will need a record of the previously imported historical trees, and so Logs.Import doesn't store them. Although I did leave room for future expansion in that log just in case. Next step will be to change importTree to importChanges and modify recordImportTree et al to handle it, by using adjustTree. Sponsored-by: Brett Eisenberg on Patreon	2023-05-31 16:01:34 -04:00
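
Commit 38b9ebc5fd notes that the stock Semigroup instance of Map loses the newer value, because Map's `<>` is a left-biased union. The following is a minimal sketch of the idea of wrapping the map in a newtype with a safer instance. The types here are simplified, hypothetical stand-ins (an `Int` clock, a polymorphic value), not git-annex's actual definitions.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for the types named in the commit.
newtype VectorClock = VectorClock Int
  deriving (Eq, Ord, Show)

data LogEntry v = LogEntry
  { changed :: VectorClock
  , value :: v
  } deriving (Eq, Show)

-- Wrapping the Map in a newtype lets it carry its own Semigroup instance
-- instead of inheriting Map's left-biased union.
newtype MapLog k v = MapLog (M.Map k (LogEntry v))
  deriving (Eq, Show)

-- Keep whichever entry has the later clock; on a tie, keep the left one.
newer :: LogEntry v -> LogEntry v -> LogEntry v
newer a b
  | changed b > changed a = b
  | otherwise = a

instance Ord k => Semigroup (MapLog k v) where
  MapLog a <> MapLog b = MapLog (M.unionWith newer a b)

instance Ord k => Monoid (MapLog k v) where
  mempty = MapLog M.empty
```

With this instance, combining a log whose entry for a key is at clock 11 with one whose entry is at clock 12 keeps the clock 12 entry regardless of argument order.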

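Commit 10b5f79e2d describes listing the top-level directory before entering unsafeInterleaveIO, so that a missing directory raises a catchable exception while subdirectories are still traversed lazily. A rough sketch of that shape follows. The function names carry a `Sketch` suffix because they are hypothetical simplifications, not the code in git-annex, and details such as returning only non-directory paths are assumptions.

```haskell
import Control.Exception (IOException, try)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))
import System.IO.Error (catchIOError, isDoesNotExistError)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical simplification: lazily list the non-directory paths under top.
dirContentsRecursiveSketch :: FilePath -> IO [FilePath]
dirContentsRecursiveSketch top = do
  -- Eager step: if top does not exist or cannot be listed, the exception
  -- is thrown here, where the caller can still catch it.
  entries <- listDirectory top
  walk (map (top </>) entries)
  where
    walk [] = pure []
    -- Lazy step: each element is produced only when the list is traversed.
    walk (p : ps) = unsafeInterleaveIO $ do
      isdir <- doesDirectoryExist p
      if isdir
        then do
          children <- listOrEmpty p
          walk (map (p </>) children ++ ps)
        else (p :) <$> walk ps

    -- Failures listing a subdirectory are swallowed, so its contents
    -- are simply omitted from the result.
    listOrEmpty :: FilePath -> IO [FilePath]
    listOrEmpty d = either ignore id <$> try (listDirectory d)

    ignore :: IOException -> [FilePath]
    ignore _ = []

-- Hypothetical helper for callers that want the old behavior of getting
-- an empty list when the directory does not exist.
emptyWhenDoesNotExistSketch :: IO [a] -> IO [a]
emptyWhenDoesNotExistSketch a = a `catchIOError` handler
  where
    handler e
      | isDoesNotExistError e = pure []
      | otherwise = ioError e
```

A caller that must fail on a missing directory calls `dirContentsRecursiveSketch` directly; one that relied on the old behavior wraps the call in `emptyWhenDoesNotExistSketch`.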

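Commit 574514545c mentions a small bug with `--when=1h`: after displaying at 1:10, the next display waits until after 2:10 rather than after 2:00. The arithmetic can be illustrated as below. This is only one possible way to express the intended schedule, with hypothetical function names and times given as plain seconds; it is not taken from git-annex.

```haskell
-- All times are seconds since midnight; names are hypothetical.

-- Drifting schedule (the behavior described as a bug): the next display
-- is due one full interval after the last displayed change.
nextDueDrifting :: Int -> Int -> Int
nextDueDrifting interval displayedAt = displayedAt + interval

-- Aligned schedule (the behavior the message says it ought to have): the
-- next display is due at the first interval boundary after the last
-- displayed change, counting boundaries from the start time.
nextDueAligned :: Int -> Int -> Int -> Int
nextDueAligned start interval displayedAt =
  start + ((displayedAt - start) `div` interval + 1) * interval
```

Starting at 12:00 (43200) with a 3600 second interval and a change displayed at 13:10 (47400), the drifting version yields 51000 (14:10) while the aligned version yields 50400 (14:00).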
2752 commits