git-annex

Author	SHA1	Message	Date
jgoerzen	5dcbf7d41e	Added a comment	2023-06-02 03:25:27 +00:00
Joey Hess	f6dd34ca81	sync content with import remotes This didn't used to be needed because importKeys would import all content and so doing another pass was redundant. But since `40017089f2` it uses importChanges, so only new files are imported. If a file that was already imported before was dropped, that would prevent sync --content from getting its content again. Sponsored-by: Jack Hill on Patreon	2023-06-01 18:52:19 -04:00
Joey Hess	92e4ed3cc0	retitle	2023-06-01 18:44:11 -04:00
Joey Hess	7178db5e06	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-01 18:43:29 -04:00
Joey Hess	2e92cef13f	comment	2023-06-01 18:43:17 -04:00
jgoerzen	53eeca40ae	Added a comment	2023-06-01 21:26:23 +00:00
Joey Hess	f1fe13c79c	devblog	2023-06-01 15:07:03 -04:00
Joey Hess	594110a6af	comment	2023-06-01 14:21:55 -04:00
Joey Hess	40017089f2	use importChanges optimisation Large speed up to importing trees from special remotes that contain a lot of files, by only processing changed files. Benchmarks: Importing from a special remote that has 10000 files, that have all been imported before, and 1 new file sped up from 26.06 to 2.59 seconds. An import with no change and 10000 unchanged files sped up from 24.3 to 1.99 seconds. Going up to 20000 files, an import with no changes sped up from 125.95 to 3.84 seconds. Sponsored-by: k0ld on Patreon	2023-06-01 13:47:00 -04:00
Joey Hess	029b08f54b	Merge branch 'master' of ssh://git-annex.branchable.com	2023-05-31 16:34:03 -04:00
Joey Hess	c6acf574c7	implement importChanges optimisation (not used yet) For simplicity, I've not tried to make it handle History yet, so when there is a history, a full import will still be done. Probably the right way to handle history is to first diff from the current tree to the last imported tree. Then, diff from the current tree to each of the historical trees, and recurse through the history diffing from child tree to parent tree. I don't think that will need a record of the previously imported historical trees, and so Logs.Import doesn't store them. Although I did leave room for future expansion in that log just in case. Next step will be to change importTree to importChanges and modify recordImportTree et al to handle it, by using adjustTree. Sponsored-by: Brett Eisenberg on Patreon	2023-05-31 16:01:34 -04:00
Joey Hess	7298123520	build git trees using ContentIdentifier to speed up import This gets the trees built, but it does not use them. Next step will be to remember the tree for next time an import is done, and diff between old and new trees to find the files that have changed. Added --missing to the mktree parameters. That only disables a check, so it's ok to do everywhere mktree is used. It probably also speeds up mktree to disable the check. Note that git fsck does not complain about the resulting tree objects that point to shas that are not in the repository. Even with --strict. A quick benchmark, importing 10000 files, this slowed it down from 2:04.06 to 2:04.28. So it will more than pay for itself. Sponsored-by: Luke Shumaker on Patreon	2023-05-31 12:46:54 -04:00
Joey Hess	51319f8558	update	2023-05-30 17:19:23 -04:00
Joey Hess	f6aa097a39	avoid import writing to cidsdb initially Speed up importing trees from special remotes somewhat by avoiding redundant writes to sqlite database. Before, import would write to both the git-annex branch and also to the sqlite database. But then the next time it was run, needsUpdateFromLog would see the branch had changed, so run updateFromLog, which would make the same writes to the sqlite database a second time. Now import writes only to the git-annex branch. The next time it's run, needsUpdateFromLog sees that the branch has changed and so calls updateFromLog, which updates the sqlite database. Why defer the write to the sqlite database like this? It seems that it could write to the database as it goes, and at the end call recordAnnexBranchTree to indicate that the information in the git-annex branch has all been written to the cidsdb. That would avoid the second import doing extra work. But, there could be other processes running at the same time, and one of them may update the git-annex branch, eg merging a remote git-annex branch into it. Any cids logs on that merged git-annex branch would not be reflected in the cidsdb yet. If the import then called recordAnnexBranchTree, the cidsdb would never get updated with that merged information. I don't think there's a good way to prevent, or to detect that situation. So, it can't call recordAnnexBranchTree at the end. So it might as well wait until the next run and do updateFromLog then. It could instead do updateFromLog at the end, but it's going to check needsUpdateFromLog at the beginning anyway. Note that the database writes were queued, so there is already a cidmap that is used to remember changes that the current process has made. So, omitting database writes can't change the behavior of the current process. Also note that thirdpartypopulatedimport uses recordcidkeyindb, which reflects what it already did. That code path does not use the cidmap, but does not need to query it either. It might be possible to make that code path also only update the git-annex branch and not the db, but I haven't checked. Sponsored-by: Noam Kremen on Patreon	2023-05-30 17:05:28 -04:00
jgoerzen	f47e7abd57	Added a comment	2023-05-30 20:58:21 +00:00
Joey Hess	c1e415887a	improve test descriptions	2023-05-30 16:11:29 -04:00
Joey Hess	5070087a63	repair: Fix handling of git ref names on Windows Sponsored-by: Kevin Mueller on Patreon	2023-05-30 16:09:13 -04:00
Joey Hess	9ca81ed02a	update	2023-05-30 15:49:52 -04:00
Joey Hess	aaeae746f0	comment and a neat idea	2023-05-30 15:42:34 -04:00
Joey Hess	f9baf11e11	tab indentation	2023-05-30 15:42:11 -04:00
Joey Hess	5da7f703b0	comment	2023-05-30 14:30:39 -04:00
jgoerzen	e1fa970010		2023-05-30 12:23:28 +00:00
jgoerzen	4547a467b1		2023-05-30 00:37:10 +00:00
jgoerzen	da99a12f21		2023-05-30 00:35:54 +00:00
Mowgli	5fe8ae8f87	Added a comment: Use locales for that purpose	2023-05-29 22:42:13 +00:00
Daniel Höxtermann	afad119273	Add borg2annex to related_software	2023-05-28 07:12:15 +02:00
Joey Hess	595adac6ea	Merge branch 'master' of ssh://git-annex.branchable.com	2023-05-27 13:09:48 -04:00
Joey Hess	f2db6da938	default to yt-dlp and fix progress parsing bugs I noticed git-annex was using a lot of CPU when downloading from youtube, and was not displaying progress. Turns out that yt-dlp (and I think also youtube-dl) sometimes only knows an estimated size, not the actual size, and displays the progress output slightly differently for that. That broke the parser. And, the parser was feeding chunks that failed to parse back as a remainder, which caused it to try to re-parse the entire output each time, so it got slower and slower. Using --progress-template like this should avoid parsing problems as well as future proof against output changes. But it will work with only yt-dlp. So, this seemed like the right time to deprecate youtube-dl, and default to yt-dlp when available. git-annex will still use youtube-dl if that's all that's available. However, since the progress parser for youtube-dl was buggy, and I don't want to maintain two different progress parsers (especially since youtube-dl is no longer in debian unstable having been replaced by yt-dlp), made git-annex no longer try to parse youtube-dl's progress. Also, updated docs for yt-dlp being default. It did not seem worth renaming annex.youtube-dl-options and annex.youtube-dl-command. Note that yt-dlp does not seem to document the fields available in the progress template. I found them by reading the source and looking at the templates it uses internally. Also note that the use of "i" (rather than "s") in progressTemplate makes it display floats rounded to integers; particularly the estimated total size can be a float. That also does not seem to be documented but I assume is a python thing? Sponsored-by: Joshua Antonishen on Patreon	2023-05-27 13:04:53 -04:00
matthew.cieslak@100d765b497d71318a302445df55bbab4b78f4d5	8dfc5dc16e		2023-05-25 13:44:52 +00:00
Joey Hess	f1cdb79ca4	assist: honor gitignore Sponsored-by: Graham Spencer on Patreon	2023-05-24 14:04:09 -04:00
nobodyinperson	0b9b85f009		2023-05-24 14:59:40 +00:00
yarikoptic	250194b7d1	Added a comment	2023-05-23 16:12:42 +00:00
Joey Hess	c64436518f	comment	2023-05-23 12:00:01 -04:00
Joey Hess	03437364b9	document -m	2023-05-23 11:46:54 -04:00
Joey Hess	b46126cd87	comment	2023-05-23 11:45:17 -04:00
Mowgli	b7788c718b		2023-05-23 13:10:45 +00:00
nobodyinperson	87e6c56a21		2023-05-22 11:22:42 +00:00
nobodyinperson	2510bdb799	Added a comment	2023-05-20 06:03:30 +00:00
yarikoptic	2616f7f0d3	Added a comment	2023-05-19 19:22:04 +00:00
yarikoptic	1fdef31769	Added a comment	2023-05-19 19:06:01 +00:00
Joey Hess	0f89d221bd	version: Avoid error message when entire output is not read Sponsored-by: Dartmouth College's Datalad project	2023-05-19 15:00:57 -04:00
Joey Hess	39f33a9988	Merge branch 'master' of ssh://git-annex.branchable.com	2023-05-19 14:54:09 -04:00
Joey Hess	5029fba7f4	comment	2023-05-19 14:53:18 -04:00
yarikoptic	b76a44511b	Added a comment	2023-05-19 18:49:48 +00:00
yarikoptic	b22d49b7f1	Added a comment	2023-05-19 18:49:29 +00:00
yarikoptic	b8a03643e5	Added a comment	2023-05-19 18:47:49 +00:00
Joey Hess	9ed59dab5b	assist: operate on all files in working tree by default Consistency with sync and internal consistency is more important than consistency with the assistant, which is not itself consistent about what it does when run in a subdirectory. Note that with -C, it will still commit staged changes to files outside the directory. Like sync does. Presumably if the user is manually staging things, then running this command, they intend to build up a commit. Sponsored-by: unqueued on Patreon	2023-05-19 14:47:05 -04:00
Joey Hess	c4ad9b1446	Fix bug in -z handling of trailing NUL in input The obvious way to fix this would be to adapt lines to split on null. However, it's actually nontrivial to rewrite lines. In particular it has a weird implementation to avoid a space leak. See: https://gitlab.haskell.org/ghc/ghc/-/issues/4334 Also, while that is a small amount of code, it's covered by a rather complex copyright and I'd have to include that copyright in git-annex. So, I opted to filter out the trailing empty string instead. Sponsored-by: Dartmouth College's Datalad project	2023-05-19 14:34:02 -04:00
Joey Hess	0184421a4d	comment	2023-05-19 13:53:21 -04:00
yarikoptic	ea7a904c0d	question about annotating availability in the snapshot	2023-05-19 14:36:46 +00:00
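
The -z fix in c4ad9b1446 describes filtering out the trailing empty string rather than rewriting the line splitter. Below is a minimal sketch of that idea in Python, not git-annex's actual Haskell code; the function name parse_nul_separated is hypothetical and only illustrates the approach as the commit message describes it.

```python
def parse_nul_separated(data: str) -> list[str]:
    """Split -z style input on NUL, ignoring the empty item after a trailing NUL."""
    # A plain split treats NUL as a separator, so "foo\0bar\0" yields
    # ["foo", "bar", ""], with a spurious empty final item.
    items = data.split("\0")
    # Drop only that one trailing empty string; empty items in the
    # middle of the input are left alone.
    if items and items[-1] == "":
        items.pop()
    return items
```

With this approach the splitter itself stays untouched, and input with or without a final NUL produces the same list of items.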


43986 commits