* p2phttp: Allow unauthenticated users to lock content by default.
* p2phttp: Added --unauth-nolocking option to prevent unauthenticated
users from locking content.
The rationalle for this is that locking is not really a write operation, so
makes sense to allow in a repository that only allows read-only access. Not
supporting locking in that situation will prevent the user from dropping
content from a special remote they control in cases where the other copy of
the content is on the p2phttp server.
Also, when p2phttp is configured to also allow authenticated access,
lockcontent was resulting in a password prompt for users who had no way to
authenticate. And there is no good way to distinguish between the two types
of users client side.
--unauth-nolocking anticipates that this might be abused, and seems better
than disabling unauthenticated access entirely if a server is being
attacked. It may be that rate limiting locking by IP address or similar
would be an effective measure in such a situation. Or just limiting the
number of locks by anonymous users that can be live at any one time. Since
the impact of such an DOS attempt is limited to preventing dropping content
from the server, it seems not a very appealing target anyway.
p2phttp: Support serving unauthenticated users while requesting
authentication for operations that need it. Eg, --unauth-readonly can be
combined with --authenv.
Drop locking currently needs authentication so it will prompt for that.
That still needs to be addressed somehow.
Probably a good idea for freezing, but especially I hope this fixes a
problem with git-annex sim run that caused it to sometimes crash in
removeDirectoryRecursive with directory not empty, presumably because a
thread was writing there at the same time.
Had to add Read instances to Key and NumCopies and some other similar
types. I only expect to use those in serializing a sim. Of course, this
risks that implementation changes break reading old data. For a sim,
that would not be a big problem.
The generator doesn't emit the best possible connect commands,
but it does output something valid. Eg, an input like:
connect A <-> B <-> C <-> D
becomes:
connect A <-> B <-> C
connect C <-> D
Also:
connect A -> B <- C
becomes:
connect A -> B
connect C -> B
Which could be improved.
Also disconnect commands are not prettified at all, but probably there's
no reason to.
Have most of the sim command handler, but to keep it pure while implementing
the rest will need some refactoring.
It seems likely that running the simulation itself will not be able to be
entirely pure. Preferred content evaluation runs in Annex after all.
Note that the somewhat awkward randomWords is because the i386ancient
build depends on a version of random too old to support generating a
random ByteString on its own.
copy and get do check preferred content, so need to prepareLiveUpdate.
move and mirror do not, but copy is implemented using move, so move also
needed to have a LiveUpdate plumbed through.
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.
So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.
There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.
Not currently in a compilable state.
A new repo that has no location log info yet, but has an entry in
uuid.log has 0 size, so make RepoSize aware of that.
Note that a new repo that does not yet appear in uuid.log will still not
be displayed.
When a remote is added but not synced with yet, it has no uuid.log
entry. If git-annex maxsize is used to configure that remote, it needs
to appear in the maxsize table, and the change to Command.MaxSize takes
care of that.
updateRepoSize is only called on the UUID of a repository, not any
cluster it might be a node of. But overLocationLogs and overLocationLogsJournal
were inclusing cluster UUIDs. So it was inconsistent.
Currently I don't see any reason to calculate RepoSize for a cluster.
It's not even clear what it should mean, the total size of all nodes, or
the amount of information stored in the cluster in total?
This will be used to prime the RepoSizes database, which will always
contain values that correpond to information in the git-annex branch, so
without anything from journal files.
Factored out overJournalFileContents which will later be used to
update Annex.reposizes to include information from journal files.
This will be partitcularly important to support private UUIDs which only
ever get to journal files and not to the branch.
5afbea25e7 changed it to ignore journal
files that did not correspond to a key in the git-annex branch. However,
when there is a private journal, that can happen.
Neither behavior is fully correct, so keep the old incorrect behavior
rather than introducing a new differently incorrect behavior.
I plan to eventually make git-annex info use Annex.reposizes instead of
calculating it itself, and once Annex.reposizes handles this all
correctly, this will be a moot problem.
git-annex info was displaying a message that didn't make sense in
context.
In calcRepoSizes, it seems better to return the information from the
git-annex branch, rather than giving up. Especially since balanced
preferred content uses it, and we can't just give up evaluating a
preferred content expression if git-annex is to be usable in such a
readonly repo.
Commit 6d7ecd9e5d nobly wanted git-annex
to behave the same with such unmerged branches as it does when it can
merge them. But for the purposes of preferred content, it seems to me
there's a sense that such an unmerged branch is the same as a remote we
have not pulled from. The balanced preferred content will either way
operate under outdated information, and so make not the best choices.
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
This allows git-annex post-receive, on the first push to
the remote to see that it is able to get a key from it in
order to upload it back.
Also avoided actively checking if the source remote contains a key.
The location log is good enough. If the location log is wrong,
the export of that file will fail with an informative message.