From b370e6b0ade431ef9ea59112319cf157ab453591 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 23 Dec 2020 13:41:02 -0400
Subject: [PATCH] bug

---
 ...que_contentidentifier_which_gets_lost.mdwn | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 doc/bugs/borg_uses_non_unique_contentidentifier_which_gets_lost.mdwn

diff --git a/doc/bugs/borg_uses_non_unique_contentidentifier_which_gets_lost.mdwn b/doc/bugs/borg_uses_non_unique_contentidentifier_which_gets_lost.mdwn
new file mode 100644
index 0000000000..1138c908a5
--- /dev/null
+++ b/doc/bugs/borg_uses_non_unique_contentidentifier_which_gets_lost.mdwn
@@ -0,0 +1,27 @@
+borg uses a non-unique ContentIdentifier ("") for everything.
+I think this is why it eventually gets lost from the sqlite database,
+preventing retrieval of content from the remote.
+
+Repositories affected by this problem can be fixed up by just:
+`rm -rf .git/annex/cidsdb`
+
+The ContentIdentifiers table has a
+"ContentIndentifiersCidRemoteIndex cid remote", and that's not just an
+index, it's a uniqueness constraint.
+
+And that makes sense generally; the point of a ContentIdentifier is that
+wherever a remote uses it, it identifies the same content.
+
+I think sqlite probably lets things be added
+that violate the constraint at first, but then on later writes it removes
+the "non-unique" row. Which in this case associates the same cid with
+a different key.
+
+I'm thinking this was a mistaken optimisation. getContentIdentifierKeys
+is supposed to return a [Key] for a ContentIdentifier; there can be more
+than one and it contains code that assumes it will get back all of them.
+And if a remote uses a hash for generating ContentIdentifiers, two different
+Keys can have the same content in edge cases.
+
+So, need to upgrade the database, removing this constraint from it.
+--[[Joey]]
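To make the effect of a uniqueness constraint on (cid, remote) concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not git-annex's schema or code: the table `content_identifiers`, its column names, the index name `cid_remote_index`, the remote name and the keys are all made up for illustration. The `INSERT OR REPLACE` write is likewise an assumption, chosen to show one way an older row can disappear; it is not a claim about how git-annex actually writes to the database.

```python
import sqlite3


def _open_db(unique):
    """Create an in-memory table with a (cid, remote) index (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_identifiers (remote TEXT, cid TEXT, annex_key TEXT)"
    )
    index_kind = "UNIQUE INDEX" if unique else "INDEX"
    db.execute(
        f"CREATE {index_kind} cid_remote_index ON content_identifiers (cid, remote)"
    )
    return db


def keys_for_cid(unique):
    """Record two keys under the same empty cid and return the keys that survive."""
    db = _open_db(unique)
    for key in ("key-A", "key-B"):
        # Assumption: OR REPLACE resolves a uniqueness conflict by deleting
        # the row that was already there before inserting the new one.
        db.execute(
            "INSERT OR REPLACE INTO content_identifiers (remote, cid, annex_key) "
            "VALUES (?, ?, ?)",
            ("borg-remote", "", key),
        )
    rows = db.execute(
        "SELECT annex_key FROM content_identifiers "
        "WHERE cid = ? AND remote = ? ORDER BY annex_key",
        ("", "borg-remote"),
    ).fetchall()
    db.close()
    return [row[0] for row in rows]


def plain_insert_is_rejected():
    """Return True if a plain INSERT of a second key for the same cid fails."""
    db = _open_db(unique=True)
    insert = (
        "INSERT INTO content_identifiers (remote, cid, annex_key) VALUES (?, ?, ?)"
    )
    db.execute(insert, ("borg-remote", "", "key-A"))
    try:
        db.execute(insert, ("borg-remote", "", "key-B"))
    except sqlite3.IntegrityError:
        return True
    finally:
        db.close()
    return False
```

In this sketch, `keys_for_cid(unique=True)` leaves only the last key written for the empty cid, and a plain `INSERT` of a second key is rejected outright. With an ordinary index, `keys_for_cid(unique=False)` returns both keys, which is what a lookup that is supposed to return a [Key] needs.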