Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.
So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.
There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.
Not currently in a compilable state.
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
This is not perfect because it does not handle versioned special
remotes, which should not be untrustworthy, but now are when proxied.
The implementation turned out to be easy, because the exporttree field
is a default field, so is available in RemoteConfig even for git
remotes.
This avoids needing to re-upload the file again to get it to the
annexobjects location, which git-annex sync was doing when it was
preferred content.
If the file is not preferred content, sync will drop it from the
annexobjects location.
If the file has been deleted from the tree, it will remain in the
annexobjects location until an unused/dropunused pass is done.
The file in the annexobjects location may have been renamed from a
previously exported file that got deleted in a subsequent export.
Or it may be renamed to annexobjects temporarily before being renamed to
another name (to handle eg pairwise renames).
But, an exported file is not guaranteed to contain the content of the
key that the local repository last exported there. Another tree could
have been exported from elsewhere in the meantime.
So, files in annexobjects do not necessarily have the content of their
key. And so have to be strongly verified when retrieving. The same as
is done when retrieving exported files.
Removing the key from the annexobjects location when it's in the
exported tree would leave it in the exported tree, and so succeeding
would update the location log incorrectly. But this also can't remove it
from the exported tree, because that would cause import tree to see a
file got deleted. So, refuse to remove in this situation.
It would be possible to remove from the annexobjects location and then
fail. Then if a key somehow got stored in both the annexobjects location
and the exported tree location(s), the duplicate would be resolved. Not
doing this because first, I don't know how that situation could happen,
and second, it seems wrong for a failed remove to have a side-effect
like that.
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
This allows lockContentShared to lock content for eg, 10 minutes and
if the process then gets terminated before it can unlock, the content
will remain locked for that amount of time.
The Windows implementation is not yet tested.
In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio
or remotedaemon is serving the P2P protocol, and is asked to
LOCKCONTENT, and that process gets killed, the content will not be
subject to deletion. This is not a perfect solution to
doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most
of the way there, without needing any P2P protocol changes.
This is only done in v10 and higher repositories (or on Windows). It
might be possible to backport it to v8 or earlier, but it would
complicate locking even further, and without a separate lock file, might
be hard. I think that by the time this fix reaches a given user, they
will probably have been running git-annex 10.x long enough that their v8
repositories will have upgraded to v10 after the 1 year wait. And it's
not as if git-annex hasn't already been subject to this problem (though
I have not heard of any data loss caused by it) for 6 years already, so
waiting another fraction of a year on top of however long it takes this
fix to reach users is unlikely to be a problem.
Wanted to also list a cluster's nodes when showing info for the cluster,
but that's hard because it needs getting the name of the proxying
remote, which is some prefix of the cluster's name, but if the names
contain dashes there's no good way to know which prefix it is.
This allows an error message from a proxied special remote to be
displayed to the client.
In the case where removal from several nodes of a cluster fails,
there can be several errors. What to do? I decided to only show
the first error to the user. Probably in this case the user is not in a
position to do anything about an error message, so best keep it simple.
If the problem with the first node is fixed, they'll see the error from
the next node.
That error is now rethrown on the client, so it will be displayed.
For example:
$ git-annex fsck x --fast --from AMS-dir
fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed
No protocol version check is needed. Because in order to talk to a
proxied special remote, the client has to be running the upcoming
git-annex release. Which has this fix in it.
This will allow having an internal thread speaking P2P protocol,
which will be needed to support proxying to external special remotes.
No serialization is done on the internal P2P protocol of course.
When a ByteString is being exchanged, it may or may not be exactly
the length indicated by DATA. While that has to be carefully managed
for the serialized P2P protocol, here it would require buffering the
whole lazy bytestring in memory to check its length when sending,
so it's better to do length checks on the receiving side.
This makes eg git-annex get default to using the cluster rather than an
arbitrary node, which is better UI.
The actual cost of accessing a proxied node vs using the cluster is
basically the same. But using the cluster allows smarter load-balancing
to be done on the cluster.
When --debugfilter or annex.debugfilter is set, avoid propigating debug
output from git-annex-shell, since it cannot be filtered.
It would be possible to pass --debugfilter on to git-annex-shell,
but it only started accepting that option in 2022. So it would break
interop with older versions.
Walking a tightrope between security and convenience here, because
git-annex-shell needs to only proxy for things when there has been
an explicit, local action to configure them.
In this case, the user has to have run `git-annex extendcluster`,
which now sets annex-cluster-gateway on the remote.
Note that any repositories that the gateway is recorded to
proxy for will be proxied onward. This is not limited to cluster nodes,
because checking the node log would not add any security; someone could
add any uuid to it. The gateway of course then does its own
checking to determine if it will allow proxying for the remote.
Avoid `git-annex sync --content` etc from operating on cluster nodes by default
since syncing with a cluster implicitly syncs with its nodes. This avoids a
lot of unncessary work when a cluster has a lot of nodes just in checking
if each node's preferred content is satisfied. And it avoids content
being sent to nodes individually, so instead syncing with clusters always
fanout uploads to nodes.
The downside is that there are situations where a cluster's preferred content
settings can be met, but those of its nodes are not. Or where a node does not
contain a key, but the cluster does, and there are not enough copies of the key
yet, so it would be desirable the send it there. I think that's an acceptable
tradeoff. These kind of situations are ones where the cluster itself should
probably be responsible for copying content to the node. Which it can do much
less expensively than a client can. Part of the balanced preferred content
design that I will be working on in a couple of months involves rebalancing
clusters, so I expect to revisit this.
The use of annex-sync config does allow running git-annex sync with a specific
node, or nodes, and it will sync with it. And it's also possible to set
annex-sync git configs to make it sync with a node by default. (Although that
will require setting up an explicit git remote for the node rather than relying
on the proxied remote.)
Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster
due to a cycle. And the Annex.Startup load of clusters happens
too late for Remote.Git to use that. This does mean one redundant load
of the cluster log, though only when there is a proxy.
This makes git-annex sync and similar not treat proxied remotes as git
syncable remotes.
Also, display in git-annex info remote when the remote is proxied.
Loading the remote list a second time was removing all proxied remotes.
That happened because setting up the proxied remote added some config
fields to the in-memory git config, and on the second load, it saw those
configs and decided not to overwrite them with the proxy.
Now on the second load, that still happens. But now, the proxied
git configs are used to generate a remote same as if those configs were
all set. The reason that didn't happen before was twofold,
the gitremotes cache was not dropped, and the remote's url field was not
set correctly.
The problem with the remote's url field is that while it was marked as
proxy inherited, all other proxy inherited fields are annex- configs.
And the code to inherit didn't work for the url field.
Now it all works, but git-annex sync is left running git push/pull on
the proxied remote, which doesn't work. That still needs to be fixed.
Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.
Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.
Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
This is slightly belt and suspenders, but nothing guarantees that the
peer avoids including its uuid in the SUCCESS-PLUS list as it's supposed
to. And while it probably doesn't matter if the location log is updated
redundantly, let's not find out.