almost have a plan

Joey Hess 2021-05-13 14:09:06 -04:00
parent 10af498be1
commit 13a8706cda
GPG key ID: DB12DB0FF05F8F38
3 changed files with 103 additions and 0 deletions


@@ -0,0 +1,46 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-05-13T16:10:39Z"
content="""
Hmm, it seems possible that two repos could use the same uuid for a
remote, but have different configurations for it. Eg, an internal use repo
that might even embed creds for the remote, and a public use repo that
relies on public http urls to download from the remote.

So there would then be 3 things that need to be able to be specified:

* keys to copy
* uuids whose per-key information should be copied (or ones to skip)
* uuids whose non-per-key information should be copied (or ones to skip)
  (remote description, special remote config, trust, group, preferred
  content, etc)

Might as well add, for completeness:

* whether to copy global config settings, or not (numcopies, mincopies,
  git-annex-config, group-preferred-content, difference.log)

Could get more granular than this, eg only copying some metadata fields and
not others, or description but not trust log, but I'd want to see a use
case. A line has to be drawn somewhere or it just gets ridiculous, and the
user might as well pull up [[internals]] and git-filter-branch and
post-process the tree generated by this command.

So a UI for these 3 or 4 things:

    git-annex copy-branch --keys-from=path
        --include-key-information-for=repo
        --exclude-key-information-for=repo
        --include-config-for=repo
        --exclude-config-for=repo
        --include-global-config
        --exclude-global-config

Eg:

    git-annex copy-branch --keys-from=.
        --exclude-key-information-for=privateremote
        --exclude-config-for=privateremote
        --include-global-config
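
To make the 3 or 4 axes concrete, here is a rough Python sketch of the filter such options could describe. It is only an illustration, not git-annex code: the names `UUIDFilter` and `CopyBranchSpec` are hypothetical, the uuid is made up, and the rules that exclude wins over include and that an empty include set means "everything else" are assumptions the discussion above does not settle.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UUIDFilter:
    # Hypothetical include/exclude pair for one axis.
    # Assumption: an empty include set means "all uuids not excluded".
    include: Set[str] = field(default_factory=set)
    exclude: Set[str] = field(default_factory=set)

    def wanted(self, uuid: str) -> bool:
        # Assumption: an explicit exclude always wins over an include.
        if uuid in self.exclude:
            return False
        return not self.include or uuid in self.include


@dataclass
class CopyBranchSpec:
    keys_from: str                                             # --keys-from=path
    key_info: UUIDFilter = field(default_factory=UUIDFilter)   # per-key information
    config: UUIDFilter = field(default_factory=UUIDFilter)     # non-per-key information
    global_config: bool = False                                # --include-global-config


# The example invocation, with a made-up uuid standing in for privateremote.
PRIVATE = "00000000-0000-0000-0000-000000000001"
spec = CopyBranchSpec(
    keys_from=".",
    key_info=UUIDFilter(exclude={PRIVATE}),
    config=UUIDFilter(exclude={PRIVATE}),
    global_config=True,
)
```

With a spec like this, the per-key logs and the config logs can each be filtered by calling `wanted` on the uuid of every log line, independently of each other.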
"""]]


@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-05-13T16:29:41Z"
content="""
The other axis is, I guess, should it include past commits to the git-annex
branch, or only the current data? I'm inclined toward only the current
data. The only thing that really uses past data is `git-annex log`, and
it's just not worth the added time expense. And also `git annex forget`
already throws away the past data.

There is the added wart of exported treeishes being grafted into the
git-annex branch (to avoid them being lost in GC in some edge cases).
It would need to do what `git annex forget` was recently fixed to do, and
include those grafts when throwing away the rest of the history.
(See [[!commit 8e7dc958d20861a91562918e24e071f70d34cf5b]])
"""]]


@@ -0,0 +1,40 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2021-05-13T16:55:36Z"
content="""
The filtering of uuids from logs this command needs is very closely
related to how the git-annex branch is filtered when dropping dead uuids
and keys.

Annex.Branch.Transitions.dropDead could almost be used as-is, just
providing it a trustmap that has the excluded uuids marked as dead.
But, it does not currently modify the trustLog, which makes sense for
transitions, but for this the trust log needs to include only the desired
uuids.

And, providing a trustmap does have the problem that,
if a uuid is mentioned in the branch without being in uuid.log,
it would not be in the trustmap, and so it would not be excluded. One way
for that to happen is, well, using this command to copy only per-key info
for a remote, but not config for a remote. Hmm. Using a filtering
function, rather than a trustmap, would avoid this problem. But,
dropDead does some processing to handle sameas-uuid pointing to a dead
uuid, including a special case involving remoteLog.
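
The trustmap problem can be shown with a toy model in Python. This is not how dropDead is written; it assumes, as described above, that the trustmap only ever contains uuids that appear in uuid.log, and it uses a simplified `timestamp status uuid` line format for a per-key log. All function names here are hypothetical.

```python
DEAD = "dead"


def trustmap_from_uuid_log(uuid_log_lines, excluded):
    # Assumption: the map is built only over uuids that uuid.log mentions,
    # so an excluded uuid missing from uuid.log never gets marked dead.
    known = {line.split()[0] for line in uuid_log_lines if line.strip()}
    return {u: DEAD for u in known if u in excluded}


def filter_by_trustmap(log_lines, trustmap):
    # Drop lines whose uuid (third field) is marked dead in the map.
    return [l for l in log_lines if trustmap.get(l.split()[2]) != DEAD]


def filter_by_predicate(log_lines, wanted):
    # Drop lines whose uuid is rejected by the filtering function.
    return [l for l in log_lines if wanted(l.split()[2])]


# uuid B has per-key info on the branch but no uuid.log entry, as would
# happen if its per-key info was copied earlier without its config.
uuid_log = ["A description-of-A"]
location_log = ["1620922239s 1 A", "1620922240s 1 B"]
excluded = {"B"}

leaky = filter_by_trustmap(location_log, trustmap_from_uuid_log(uuid_log, excluded))
clean = filter_by_predicate(location_log, lambda u: u not in excluded)
```

Here `leaky` still contains the line for B, while `clean` does not. The sketch leaves out the sameas-uuid and remoteLog handling entirely.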

Implementation plan:

* Address above problems with dropDead, somehow, so it can be reused.
* Add a function (in Logs) from a key to all possible git-annex branch log
  files for that key.
* For each key seeked, run that function, query the branch to see which
  log files exist, and pass through dropDead to filter and populate
  the temporary index. This way, the command does not need to buffer
  the whole set of keys in memory.
* Get a list of all non-key logs
  `(topLevelNewUUIDBasedLogs++topLevelOldUUIDBasedLogs++otherLogs)`,
  and pass them all through dropDead as well.
* Refactor regraftexports from Annex.Branch, and call it after
  constructing the filtered index.
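
The key-to-log-files step and the streaming loop from the plan could look roughly like this in Python. Everything here is a sketch under assumptions: the extension list and the md5-based two-level directory prefix are a reading of the branch layout that should be checked against [[internals]], key name escaping is not modeled, a plain dict stands in for querying the branch, and `filter_log` stands in for whatever dropDead becomes.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Assumed per-key log extensions; verify against [[internals]] before relying on them.
KEY_LOG_EXTENSIONS = [
    ".log", ".log.web", ".log.rmt", ".log.met", ".log.rmet", ".log.cnk", ".log.cid",
]


def key_log_files(key: str) -> List[str]:
    # Assumption: logs live under a two-level directory taken from the
    # first six hex digits of the md5 of the key.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    prefix = digest[:3] + "/" + digest[3:6] + "/"
    return [prefix + key + ext for ext in KEY_LOG_EXTENSIONS]


def filtered_entries(
    keys: Iterable[str],
    branch: Dict[str, str],
    filter_log: Callable[[str, str], str],
) -> Iterator[Tuple[str, str]]:
    # A generator, so keys are handled one at a time and never buffered.
    for key in keys:
        for path in key_log_files(key):
            content = branch.get(path)
            if content is None:
                continue  # this log file does not exist for this key
            filtered = filter_log(path, content)
            if filtered:
                yield path, filtered  # caller adds this to the temporary index
```

The non-key logs would go through the same `filter_log` in a separate pass, and the export grafts would be added once the filtered index is complete.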
"""]]