in= was problimatic in two ways. First, it referred to a remote by name,
but preferred content expressions can be evaluated elsewhere, where that
remote doesn't exist, or a different remote has the same name. This name
lookup code could error out at runtime. Secondly, in= seemed pretty useless.
in=here did not cause content to be gotten, but it did let present content
be dropped.
present is more useful, although "not present" is unstable and should be
avoided.
This cache prevented noticing changes made by another process.
The case I just ran into involved the assistant dropping a file, which
cached its presence info. Then the same file was downloaded again,
but the assistant didn't know its presence info had changed.
I don't see a way to keep this cache. Will instead rely on the OS level
file cache, for files in the journal. May need to add more higher-level
caching of info that it's ok to have a potentially stale copy of,
although much of git-annex already does so.
This can result in the file being dropped, or being downloaded, or even
being dropped from some other repo.
It's even possible to create a file in a directory where content is not
wanted, which will make the assistant immediately send it elsewhere, and
then drop it.
This was complicated quite a bit by needing to check numcopies. I optimised
that, so it only looks up numcopies once per file, no matter how many
remotes it checks to drop from. Although it did just occur to me that
it might be better to first check if it wants to drop content, and only
then check numcopies..