Commit graph

33024 commits

Author SHA1 Message Date
Joey Hess
51319f8558
update 2023-05-30 17:19:23 -04:00
Joey Hess
f6aa097a39
avoid import writing to cidsdb initially
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.

Before, import would write to both the git-annex branch and also to the
sqlite database. But then the next time it was run, needsUpdateFromLog
would see the branch had changed, so run updateFromLog, which would make
the same writes to the sqlite database a second time.

Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.

Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.

But, there could be other processes running at the same time, and one of
them may update the git-annex branch, eg merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.

I don't think there's a good way to prevent, or to detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.

Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.

Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.

Sponsored-by: Noam Kremen on Patreon
2023-05-30 17:05:28 -04:00
Joey Hess
5070087a63
repair: Fix handling of git ref names on Windows
Sponsored-by: Kevin Mueller on Patreon
2023-05-30 16:09:13 -04:00
Joey Hess
9ca81ed02a
update 2023-05-30 15:49:52 -04:00
Joey Hess
aaeae746f0
comment and a neat idea 2023-05-30 15:42:34 -04:00
Joey Hess
5da7f703b0
comment 2023-05-30 14:30:39 -04:00
jgoerzen
e1fa970010 2023-05-30 12:23:28 +00:00
jgoerzen
4547a467b1 2023-05-30 00:37:10 +00:00
jgoerzen
da99a12f21 2023-05-30 00:35:54 +00:00
Mowgli
5fe8ae8f87 Added a comment: Use locales for that porpose 2023-05-29 22:42:13 +00:00
Daniel Höxtermann
afad119273 Add borg2annex to related_software 2023-05-28 07:12:15 +02:00
Joey Hess
595adac6ea
Merge branch 'master' of ssh://git-annex.branchable.com 2023-05-27 13:09:48 -04:00
Joey Hess
f2db6da938
default to yt-dlp and fix progress parsing bugs
I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.

Using --progress-template like this should avoid parsing problems as well
as future proof against output changes. But it will work with only yt-dlp.

So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.

git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in debian unstable having been replaced by
yt-dlp), made git-annex no longer try to parse youtube-dl's progress.

Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.

Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented but I assume is a python thing?

Sponsored-by: Joshua Antonishen on Patreon
2023-05-27 13:04:53 -04:00
matthew.cieslak@100d765b497d71318a302445df55bbab4b78f4d5
8dfc5dc16e 2023-05-25 13:44:52 +00:00
Joey Hess
f1cdb79ca4
assist: honor gitignore
Sponsored-by: Graham Spencer on Patreon
2023-05-24 14:04:09 -04:00
nobodyinperson
0b9b85f009 2023-05-24 14:59:40 +00:00
yarikoptic
250194b7d1 Added a comment 2023-05-23 16:12:42 +00:00
Joey Hess
c64436518f
comment 2023-05-23 12:00:01 -04:00
Joey Hess
03437364b9
document -m 2023-05-23 11:46:54 -04:00
Joey Hess
b46126cd87
comment 2023-05-23 11:45:17 -04:00
Mowgli
b7788c718b 2023-05-23 13:10:45 +00:00
nobodyinperson
87e6c56a21 2023-05-22 11:22:42 +00:00
nobodyinperson
2510bdb799 Added a comment 2023-05-20 06:03:30 +00:00
yarikoptic
2616f7f0d3 Added a comment 2023-05-19 19:22:04 +00:00
yarikoptic
1fdef31769 Added a comment 2023-05-19 19:06:01 +00:00
Joey Hess
0f89d221bd
version: Avoid error message when entire output is not read
Sponsored-by: Dartmouth College's Datalad project
2023-05-19 15:00:57 -04:00
Joey Hess
39f33a9988
Merge branch 'master' of ssh://git-annex.branchable.com 2023-05-19 14:54:09 -04:00
Joey Hess
5029fba7f4
comment 2023-05-19 14:53:18 -04:00
yarikoptic
b76a44511b Added a comment 2023-05-19 18:49:48 +00:00
yarikoptic
b22d49b7f1 Added a comment 2023-05-19 18:49:29 +00:00
yarikoptic
b8a03643e5 Added a comment 2023-05-19 18:47:49 +00:00
Joey Hess
9ed59dab5b
assist: operate on all files in working tree by default
Consistency with sync and internal consistency is more important than
consistency with the assistant, which is not itself consistent about
what it does when run in a subdirectory.

Note that with -C, it will still commit staged changes to files outside
the directory. Like sync does. Presumably if the user is manually
staging things, then running this command, they intend to build up a
commit.

Sponsored-by: unqueued on Patreon
2023-05-19 14:47:05 -04:00
Joey Hess
c4ad9b1446
Fix bug in -z handling of trailing NUL in input
The obvious way to fix this would be to adapt lines to split on null.

However, it's actually nontrivial to rewrite lines. In particular it has a
weird implementation to avoid a space leak. See:
https://gitlab.haskell.org/ghc/ghc/-/issues/4334

Also, while that is a small amount of code, it's covered by a rather
complex copyright and I'd have to include that copyright in git-annex.

So, I opted to filter out the trailing empty string instead.

Sponsored-by: Dartmouth College's Datalad project
2023-05-19 14:34:02 -04:00
Joey Hess
0184421a4d
comment 2023-05-19 13:53:21 -04:00
yarikoptic
ea7a904c0d question about annotating availability in the snapshot 2023-05-19 14:36:46 +00:00
yarikoptic
d73467ebf7 dropkey -z not working 2023-05-19 13:53:46 +00:00
nobodyinperson
f8df46537d Added a comment: 👍 git annex assist 2023-05-19 05:54:44 +00:00
Joey Hess
e955912ad0
git-annex assist
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.

(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)

See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.

Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.

Sponsored-by: k0ld on Patreon
2023-05-18 14:37:43 -04:00
nobodyinperson
84563b7da4 Added a comment 2023-05-18 08:40:37 +00:00
Joey Hess
8987cc214e
update 2023-05-17 13:41:33 -04:00
Joey Hess
502e7affb9
Merge branch 'master' of ssh://git-annex.branchable.com 2023-05-17 13:33:47 -04:00
Joey Hess
5cc89b7444
comment 2023-05-17 13:33:42 -04:00
Joey Hess
c52d7338a2
improve docs
The man pages for these were not really clear that they add new files.
2023-05-17 13:32:47 -04:00
Joey Hess
f93a7fce1d
sync: Started transition to --content being enabled by default
When used without --content or --no-content, warn about the upcoming
transition, and suggest using one of the options, or setting
annex.synccontent.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-17 13:23:42 -04:00
Joey Hess
af6b73a7e6
push: Support --cleanup
This option is not specific to sync, so it seemed it should be in either
pull or push as well as sync. Since it does modify the remote, it seems
better to have it in push; the modification of the local repo pulls in
the direction of pull, but not hard enough.

Maybe it would be better to have it in both?

Sponsored-by: Luke Shumaker on Patreon
2023-05-17 12:51:49 -04:00
Joey Hess
40731ff9fd
sync: Added -g as a short option for --no-content
I anticipate that if sync is transitioned to syncing content by default,
people will want a short option. And in repositories where
annex.synccontent = true, they already would. And pull and push sync
content by default, so a short option is useful with them too.

Mnemonic: -g makes only git data be synced
Also, -a makes only annex data be synced.

Would have preferred -c, which would complement -C, but it
was already taken to set git configs.

Sponsored-by: Noam Kremen on Patreon
2023-05-17 12:34:26 -04:00
nobodyinperson
62b2a88dac Added a comment 2023-05-17 10:41:18 +00:00
Joey Hess
5df89d58c7
git-annex pull and push
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.

In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.

And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.

Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.

Sponsored-by: Kevin Mueller on Patreon
2023-05-16 16:51:07 -04:00
Joey Hess
705b7e2d36
comment 2023-05-16 13:08:43 -04:00
nobodyinperson
5c6b299ef6 Added a comment 2023-05-15 21:13:34 +00:00