plan for these

Joey Hess 2021-05-21 13:50:26 -04:00
parent f39b7c3663
commit 1d9bad51d2
GPG key ID: DB12DB0FF05F8F38
2 changed files with 57 additions and 0 deletions

@@ -0,0 +1,47 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2021-05-21T17:05:48Z"
content="""
I think Ilya is on the right track with recording the last treeish scanned
for keys and diffing to the current treeish. This seems worth implementing.

So, plan:

* When dropping a file, after checking that it's not preferred content,
  or when needed for <https://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/>
* When the stat of the index differs from the last recorded stat
* Generate a tree from the index with `git write-tree`, and when it
  differs from the last recorded such tree
* `git diff-index --cached` with the last recorded tree, parse and cat
  the links, and update the keys database associated files with the keys
  it found. Note that it can also remove associated files when files get
  deleted, which is currently not done.
* Then, look up in the keys database the associated files of the key, and
  also check if the preferred content matches any of those other files.
  So in the example above, it won't drop `archive/bar` because `foo` is
  still preferred content.

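
The last two steps of the plan could be sketched roughly as below. This is only an illustration in Python, not how git-annex implements it: the dict stands in for the keys database, `key_of_blob` stands in for catting a link to find its key, `wants` stands in for preferred content matching, and all the function names are made up. It assumes the raw, NUL-separated output of `git diff-index --cached --raw -z` with rename detection off.

```python
from typing import Callable, Dict, Iterator, Optional, Set, Tuple

NULL_SHA = "0" * 40  # git's placeholder when one side of a diff has no blob

# Hypothetical in-memory stand-in for the keys database: key -> associated files.
AssociatedFiles = Dict[str, Set[str]]


def parse_raw_diff(raw: str) -> Iterator[Tuple[str, str, str]]:
    # Turn `git diff-index --cached --raw -z <tree>` output into
    # (old_sha, new_sha, path) triples. Each record is a ":"-prefixed
    # metadata field followed by one path, both NUL-terminated.
    fields = raw.split("\0")
    for i in range(0, len(fields) - 1, 2):
        meta, path = fields[i], fields[i + 1]
        _old_mode, _new_mode, old_sha, new_sha, _status = meta[1:].split(" ")
        yield old_sha, new_sha, path


def update_associated_files(
    db: AssociatedFiles,
    changes: Iterator[Tuple[str, str, str]],
    key_of_blob: Callable[[str], Optional[str]],
) -> None:
    # key_of_blob is an assumed helper: it returns the key a blob links to,
    # or None when the blob is not an annexed file.
    for old_sha, new_sha, path in changes:
        old_key = key_of_blob(old_sha) if old_sha != NULL_SHA else None
        new_key = key_of_blob(new_sha) if new_sha != NULL_SHA else None
        if old_key is not None:
            # The path no longer points at the old key (deleted or changed).
            db.get(old_key, set()).discard(path)
        if new_key is not None:
            db.setdefault(new_key, set()).add(path)


def may_drop(
    path: str, key: str, db: AssociatedFiles, wants: Callable[[str], bool]
) -> bool:
    # Drop only when neither the file itself nor any other associated file
    # of the same key is preferred content.
    if wants(path):
        return False
    others = db.get(key, set()) - {path}
    return not any(wants(other) for other in others)


# Made-up key and blob ids; the two links are different blobs with one key.
KEY = "SHA256E-s3--EXAMPLE"
blobs = {"a" * 40: KEY, "b" * 40: KEY}
raw = (
    ":000000 120000 " + NULL_SHA + " " + "a" * 40 + " A\0archive/bar\0"
    ":000000 120000 " + NULL_SHA + " " + "b" * 40 + " A\0foo\0"
)
db: AssociatedFiles = {}
update_associated_files(db, parse_raw_diff(raw), blobs.get)

# Assumed preferred content: everything outside archive/ is wanted.
wants = lambda p: not p.startswith("archive/")
print(may_drop("archive/bar", KEY, db, wants))  # False, foo still wants it
```

Once `foo` is deleted and that is staged, the next diff removes it from the key's associated files, and the same check lets `archive/bar` be dropped.
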
This would not notice if a file was moved or deleted but that was not
staged in the index yet. I think that's ok -- If the file is moved and not
staged yet, it's not been added to git, so it's acceptable for git-annex to
not try to keep content around for it (git-annex unused would not either
in similar situations). If the file is deleted, git-annex might not drop
the content yet because it's still present in the index, but the same
happens if you delete a file and then `git annex drop .`, because it skips
over the deleted file whether or not it's staged in the index.

It seems sufficient to only do this when dropping a file, not in other
commands. That means the associated files database remains incomplete
for other commands, not including recent changes to locked files.
Since this is only a problem for dropping, and the associated files
database is not needed for locked files generally, that seems ok, although
it would certainly be a nice simplification if the database was always
kept up-to-date. Since the update could be a pretty expensive operation,
probably best to keep it for the drops that really need it.

Note that the `git write-tree` would mean that using git-annex could
create these tree objects, which never end up getting used. `git gc` would
eventually delete them, but there might be a situation where doing this
a lot bloats the repo.
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-05-21T17:29:09Z"
content="""
I have a plan over in [[bugs/indeterminite_preferred_content_state_for_duplicated_file]]
and will be working on it over there.

Linked worktrees seem out of scope for this though.
"""]]