From d8b5c8479fcbeac01472de331edaeb35afb09b29 Mon Sep 17 00:00:00 2001
From: bjornw
Date: Tue, 2 Nov 2021 23:10:14 +0000
Subject: [PATCH 1/4]

---
 doc/forum/Dropping_checksum_from_URL_key.mdwn | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 doc/forum/Dropping_checksum_from_URL_key.mdwn

diff --git a/doc/forum/Dropping_checksum_from_URL_key.mdwn b/doc/forum/Dropping_checksum_from_URL_key.mdwn
new file mode 100644
index 0000000000..fffbd2dd7a
--- /dev/null
+++ b/doc/forum/Dropping_checksum_from_URL_key.mdwn
@@ -0,0 +1,9 @@
+I accidentally ran git annex importfeed without using the "--relaxed" option. This means that I know have a great many files in the annex with keys that look like this:
+
+"URL-s108794401--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0"
+
+The problem is that these keys seem to encode a size ("s108794401"), and when I attempt to fetch them from their URLs, git annex detects that the size has changed. I'd like to convert all such files to instead use a key without a size. For example:
+
+"URL--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0"
+
+What's the best way to do this? I tried "git annex migrate ... --backend URL" to no avail.

From c853992e2fbc75e85279d392d91c550a5fe5c389 Mon Sep 17 00:00:00 2001
From: bjornw
Date: Tue, 2 Nov 2021 23:12:33 +0000
Subject: [PATCH 2/4]

---
 doc/forum/Dropping_checksum_from_URL_key.mdwn | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/forum/Dropping_checksum_from_URL_key.mdwn b/doc/forum/Dropping_checksum_from_URL_key.mdwn
index fffbd2dd7a..d0ae8d7c4b 100644
--- a/doc/forum/Dropping_checksum_from_URL_key.mdwn
+++ b/doc/forum/Dropping_checksum_from_URL_key.mdwn
@@ -1,9 +1,9 @@
 I accidentally ran git annex importfeed without using the "--relaxed" option. This means that I know have a great many files in the annex with keys that look like this:
 
-"URL-s108794401--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0"
+> URL-s108794401--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0
 
 The problem is that these keys seem to encode a size ("s108794401"), and when I attempt to fetch them from their URLs, git annex detects that the size has changed. I'd like to convert all such files to instead use a key without a size. For example:
 
-"URL--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0"
+> URL--https://media.blubrry.com/thedi-7c04aebc2e18898889af95c74ab3edf0
 
 What's the best way to do this? I tried "git annex migrate ... --backend URL" to no avail.

From bf1408f7bf5a6cc0c45e3aea865bba17e449ba4d Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 3 Nov 2021 15:44:05 -0400
Subject: [PATCH 3/4] long-running-smudge branch started

---
 doc/todo/git_smudge_clean_interface_suboptiomal.mdwn | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/todo/git_smudge_clean_interface_suboptiomal.mdwn b/doc/todo/git_smudge_clean_interface_suboptiomal.mdwn
index e4e20d7612..fe6d94d51e 100644
--- a/doc/todo/git_smudge_clean_interface_suboptiomal.mdwn
+++ b/doc/todo/git_smudge_clean_interface_suboptiomal.mdwn
@@ -135,6 +135,8 @@ The best fix would be to improve git's smudge/clean interface:
 > > probably make git-annex twice as slow for large files, although
 > > it would speed up git add of many small files. git-annex add
 > > could be used to work around any speed impact.
+> > (The long-running-smudge branch has some preliminary work to doing
+> > this.)
 > >
 > > Or git could be extended
 > > with a capability in the protocol that lets the clean filter read the

From efe0554f229e4d6c475da2fa36c9f01668b5008e Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 3 Nov 2021 16:06:32 -0400
Subject: [PATCH 4/4] devblog

---
 ...day_641__an_alternative_smudge_filter.mdwn | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 doc/devblog/day_641__an_alternative_smudge_filter.mdwn

diff --git a/doc/devblog/day_641__an_alternative_smudge_filter.mdwn b/doc/devblog/day_641__an_alternative_smudge_filter.mdwn
new file mode 100644
index 0000000000..74e0c8f508
--- /dev/null
+++ b/doc/devblog/day_641__an_alternative_smudge_filter.mdwn
@@ -0,0 +1,26 @@
+Would you rather that `git checkout` got a lot faster at checking out a lot
+of files, and `git add` got a lot faster at adding a lot of small files, if
+the tradeoff was that `git add` and `git commit -a` got slower at adding
+large files to the annex than they are now?
+
+Being able to make that choice is what I'm working on now. Of course,
+we'd rather it were all fast, but due to
+[[todo/git_smudge_clean_interface_suboptiomal]], that is not possible
+without improvements to git. But I seem to have a plan that will
+work around enough of the problems to let that choice be made.
+
+Today I've been laying the groundwork, by implementing git's
+pkt-line interface, and the long-running filter process protocol.
+Next step will be to add support for that in `git-annex smudge`,
+so that users who want to can enable it with:
+
+    git config filter.annex.process "git-annex smudge --process"
+
+I can imagine that becoming enabled by default at some point in v9, if most
+users prefer it over the current method. Which would still be available
+by unsetting the config.
+
+----
+
+Today's work was sponsored by Mark Reidenbach
+[on Patreon](https://patreon.com/joeyh)
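
For readers of the devblog entry in patch 4/4 who have not met git's pkt-line framing: the following is a minimal Python sketch of it, not the git-annex implementation (which is written in Haskell). The function names `encode_pkt_line` and `decode_pkt_lines` are hypothetical, chosen for illustration. The sketch assumes the framing described in git's protocol documentation: each packet starts with a four-digit hexadecimal length that counts the four header bytes themselves, and `0000` is a flush packet that ends a list of packets.

```python
# Illustrative sketch of git's pkt-line framing; names are hypothetical.
FLUSH_PKT = b"0000"   # flush packet: terminates a list of packets
MAX_PAYLOAD = 65516   # 65520-byte packet limit minus the 4-byte header


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its 4-hex-digit length (header included)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def decode_pkt_lines(data: bytes):
    """Yield each payload in data; yield None for a flush packet."""
    pos = 0
    while pos < len(data):
        header = data[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated pkt-line header")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            yield None            # flush packet
            pos += 4
            continue
        if length < 4:
            # 0001..0003 are reserved or special; this sketch rejects them.
            raise ValueError("unsupported pkt-line length")
        end = pos + length
        if end > len(data):
            raise ValueError("truncated pkt-line payload")
        yield data[pos + 4:end]
        pos = end


# The opening message git sends to a long-running filter process.
welcome = (encode_pkt_line(b"git-filter-client\n")
           + encode_pkt_line(b"version=2\n")
           + FLUSH_PKT)
```

Decoding `welcome` with `list(decode_pkt_lines(welcome))` gives back the two payloads followed by `None` for the flush. A real filter would read packets incrementally from stdin rather than from a complete byte string, and would answer with `git-filter-server` and `version=2` before negotiating capabilities.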