f2713a3bb9
Checking .gitattributes adds a full minute to a git annex find looking for files that don't have enough copies: 2:25 increases to 3:27. I feel this is too much of a slowdown to justify making it the default. So, I exposed two versions of the preferred content expression, a slow one and a fast but approximate one. I'm using the approximate one in the default preferred content expressions to avoid slowing down the assistant.

The assistant and git annex sync --content do not try to proactively
download content that is not otherwise wanted in order to get numcopies
satisfied. (Unlike get --auto, which does take numcopies into account.)

Should these automated systems try to proactively satisfy numcopies? I
don't feel they should. It could lead to surprising results. For example,
a transfer repository, which is of limited size, could start being filled
up with lots of content that all clients have, just because numcopies was
set to a larger number than the total number of clients. Another example:
a source repository on eg an Android phone should never have content in it
that was not created on that device.

However, it would make sense for some specific
types of repositories to proactively get content to satisfy numcopies.
Currently some types of repositories use "or (not copies=semitrusted+:1)",
to ensure that if the only copy of a file is on a dead repository, they
will try to get that file before the repo goes away. This is done
by client, backup, and archive repositories. The same set would
probably make sense for proactively satisfying numcopies.

So, a new type of preferred content expression is called for, such as
"numcopiesneeded=1", which indicates that at least 1 more copy
is needed to satisfy numcopies.

(Note that it should only count semitrusted and higher trust
level repos as satisfying numcopies.)

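The proposed check can be sketched roughly as follows. This is only an illustrative Python model, not git-annex's implementation: the function names, the numeric trust ranking, and the assumption that a repo with no recorded trust level counts as semitrusted are all mine.

```python
# Hypothetical model of "numcopiesneeded=N"; every name here is illustrative.
TRUST_RANK = {"dead": 0, "untrusted": 1, "semitrusted": 2, "trusted": 3}


def copies_counted(locations, trust):
    """Count copies held by repos at semitrusted level or above.

    Repos missing from `trust` are assumed to be semitrusted.
    """
    threshold = TRUST_RANK["semitrusted"]
    return sum(
        1 for repo in locations
        if TRUST_RANK[trust.get(repo, "semitrusted")] >= threshold
    )


def numcopiesneeded(n, numcopies, locations, trust):
    """True when at least n more copies are needed to satisfy numcopies."""
    return numcopies - copies_counted(locations, trust) >= n


trust = {"laptop": "semitrusted", "olddisk": "dead", "usb": "untrusted"}

# Only the laptop's copy counts, so one more copy is needed.
print(numcopiesneeded(1, 2, ["laptop", "olddisk"], trust))  # True
# Two countable copies already exist, so nothing more is needed.
print(numcopiesneeded(1, 2, ["laptop", "server"], trust))   # False
```

With this reading, a copy that exists only on a dead or untrusted repo does not reduce the number of copies still needed.
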
But, preferred content expressions can only operate on info stored in the
git repo, or they will fail to be stable. Ie, repo A needs to be able to
calculate whether a file is preferred content for repo B and get the same
result as when repo B calculates that.

numcopies is currently configured in 3 places:

* .git/config `annex.numcopies` (global, stored only locally)
* .gitattributes `annex.numcopies` (per file, stored in git repo)
* --numcopies (not relevant)

So, need to add a global numcopies setting that is stored in the git repo.
That could either be a file in the git-annex branch, or just
`* annex.numcopies=2` in the toplevel .gitattributes. Note that the
assistant needs to be able to query and set it, which I think argues
against using .gitattributes for it. Also arguing against that is that the
.git/config numcopies value applies even to objects with no file in the
work tree, which gitattributes settings do not.

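To illustrate why a locally stored setting breaks stability, here is a small Python sketch. The two functions and the example values are hypothetical; the point is only that an answer computed from each evaluator's own .git/config can differ between repos, while an answer computed from a value every clone can see cannot.

```python
# Hypothetical illustration; not git-annex code.

def wanted_using_local_config(evaluator_numcopies, copies):
    """Unstable: the result depends on which repo does the evaluating."""
    return evaluator_numcopies - copies >= 1


def wanted_using_shared_setting(shared_numcopies, copies):
    """Stable: every repo reads the same value from the git repo."""
    return shared_numcopies - copies >= 1


local_config = {"A": 1, "B": 3}  # each repo's own .git/config value
shared = 2                       # value stored where every clone sees it
copies = 1                       # current number of copies of the file

answers_local = {
    repo: wanted_using_local_config(value, copies)
    for repo, value in local_config.items()
}
answers_shared = {
    repo: wanted_using_shared_setting(shared, copies)
    for repo in local_config
}

print(answers_local)   # {'A': False, 'B': True}
print(answers_shared)  # {'A': True, 'B': True}
```

Repos A and B disagree in the first case and agree in the second, which is the property the preferred content expression needs.
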
Conclusion:

* Add to the git-annex branch a numcopies file that holds the global
  numcopies default if present. **done**
* Modify the assistant to use it when configuring numcopies. **done**
* To deprecate .git/config's annex.numcopies, only make it take effect
  when there is no numcopies file in the git-annex branch. **done**
* Add "numcopiesneeded=N" preferred content expression using the git-annex
  branch numcopies setting, overridden by any .gitattributes numcopies setting
  for a particular file. It should ignore the other ways to specify
  numcopies, particularly git config annex.numcopies. **done**
* Make the repo groups that currently end with "or (not copies=semitrusted+:1)"
  instead end with "or numcopiesneeded=1". **done**
* See if "numcopiesneeded=N" can check .gitattributes without getting
  a lot slower. If not, perhaps add a "numcopiesneededaccurate=N" that
  checks it. **done**

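The precedence described in the list above might be modelled like this. It is a sketch under stated assumptions, not the actual code: the function names and the `check_gitattributes` flag are invented for illustration, and the fallback of 1 when nothing is configured is my assumption about the default.

```python
# Hypothetical model of the numcopies lookup order; names are illustrative.
DEFAULT_NUMCOPIES = 1  # assumed fallback when nothing is configured


def global_numcopies(branch_value, git_config_value):
    """Global default: the git-annex branch file wins, and the deprecated
    .git/config setting only takes effect when the branch has no value."""
    if branch_value is not None:
        return branch_value
    if git_config_value is not None:
        return git_config_value
    return DEFAULT_NUMCOPIES


def numcopies_for_expression(branch_value, gitattributes_value=None,
                             check_gitattributes=True):
    """Value used by the preferred content expression.

    .git/config is never consulted, so all repos compute the same answer.
    With check_gitattributes=False this models the fast but approximate
    variant, which skips the per-file .gitattributes lookup.
    """
    if check_gitattributes and gitattributes_value is not None:
        return gitattributes_value
    if branch_value is not None:
        return branch_value
    return DEFAULT_NUMCOPIES


print(global_numcopies(2, 3))                                         # 2
print(global_numcopies(None, 3))                                      # 3
print(numcopies_for_expression(2, 3))                                 # 3
print(numcopies_for_expression(2, 3, check_gitattributes=False))      # 2
```

The last two lines show where the approximate variant can differ from the accurate one: only for files whose .gitattributes numcopies differs from the global setting.
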
[[done]]

## Stability analysis

If a remote prefers eg, "blah or numcopiesneeded=1", and
file $foo does not match blah, but needs more copies, then the
expression will match.

So, git-annex will get $foo, adding a copy. Which means that the
numcopiesneeded=1 will no longer match, so the file is no longer wanted
now that it has been downloaded.

Now there are two cases for what can happen:

* git-annex tries to drop $foo, but fails because it cannot find enough
  other copies
* git-annex copies $foo to some other remote that wants it, and then
  manages to drop $foo from the local repo.

This seems ok. Files flow through repos and they act like transfer
repos when there are not enough copies.

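The two cases can be walked through with a toy simulation. This is a simplified model of the reasoning above, not of git-annex's real get and drop logic: the function names are hypothetical, every repo is assumed to be semitrusted, and a drop is allowed only when the remaining repos still hold at least numcopies copies.

```python
# Toy model of the stability analysis; names and rules are illustrative.

def numcopies_needed(numcopies, holders):
    """How many more copies are needed (all holders assumed semitrusted)."""
    return max(0, numcopies - len(holders))


def sync_round(repo, holders, numcopies, wanting_remotes=()):
    """One get/copy/drop round for `repo`, whose preferred content is
    "blah or numcopiesneeded=1" and where the file does not match blah.

    Mutates `holders` and returns the list of actions taken.
    """
    actions = []
    if repo not in holders:
        if numcopies_needed(numcopies, holders) >= 1:
            holders.add(repo)
            actions.append("get")
        return actions
    if numcopies_needed(numcopies, holders) >= 1:
        return actions  # still wanted, nothing to do
    # No longer wanted: hand the file to remotes that want it, then try to drop.
    for remote in wanting_remotes:
        if remote not in holders:
            holders.add(remote)
            actions.append(f"copy to {remote}")
    if len(holders - {repo}) >= numcopies:
        holders.remove(repo)
        actions.append("drop")
    else:
        actions.append("drop failed")
    return actions


holders = {"origin"}
print(sync_round("here", holders, 2))              # ['get']
print(sync_round("here", holders, 2))              # ['drop failed']
print(sync_round("here", holders, 2, ["backup"]))  # ['copy to backup', 'drop']
print(sorted(holders))                             # ['backup', 'origin']
```

In the second round the drop fails because only one other copy exists. Once a remote that wants the file has received it, the drop succeeds and the file has flowed through the repo.
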
--[[Joey]]