On push, first try to drop all outManifest keys listed in the current
manifest file, which resumes from an interrupted push that didn't
get a chance to delete those keys.
The new manifest gets its outManifest populated with the keys that were
in the old manifest, plus any of the keys that were unable to be
dropped.
Note that it would be possible for uploadManifest to skip dropping old
keys at all. The old keys would get dropped on the next push. But it
seems better to delete stuff immediately rather than waiting. And the
extra work is limited to push and typically is small.
A remote where dropKey always fails will result in an outManifest that
grows longer and longer. It would be possible to check if the remote
has appendonly = True and avoid populating the outManifest. Of course,
an appendonly remote will grow with every git push anyway. And currently
only Remote.GitLFS sets that, which can't be used as a git-remote-annex
remote anyway.
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.
That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.
And it prevents an overwrite deleting of other stuff that was
already in the remote.log.
Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
This turns out to only be necessary is edge cases. Most of the
time, git-annex unused --from remote doesn't see git-remote-annex keys
at all, because it does not record a location log for them.
On the other hand, git-annex unused does find them, since it does not
rely on the location log. And that's good because they're a local cache
that the user should be able to drop.
If, however, the user ran git-annex unused and then git-annex move
--unused --to remote, the keys would have a location log for that
remote. Then git-annex unused --from remote would see them, and would
consider them unused. Even when they are present on the special remote
they belong to. And that risks losing data if they drop the keys from
the special remote, but didn't expect it would delete git branches they
had pushed to it.
So, make git-annex unused --from skip git-remote-annex keys whose uuid
is the same as the remote.
I hope to support importtree=yes eventually, but it does not currently
work.
Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.
Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
Updating the remote list needs the config to be written to the git-annex
branch, which was not done for good reasons. While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.
Sponsored-by: k0ld on Patreon
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.
Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.
Untested!
Sponsored-By: Brock Spratlen on Patreon
Otherwise, it can be confusing to clone from a wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.
Sponsored-by: Jack Hill on Patreon
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.
Sponsored-by: Luke T. Shumaker on Patreon
fsck --fast was intended to disable checksumming, but checksumming is done
after transfers too. Due to the check being in the non-incremental path,
it would only affect non-incremental checksumming during a transfer,
and I'm not 100% sure that it was a problem.
Also, when using an external backend that does checksumming, fsck --fast
didn't disable it and now does.
Before commit c565340adc,
it was statting the file in order to get its size, which was needed to
use an external hasher. In that commit since it no longer needed the
stat, I made it check doesPathExist since the old code implicitly
checked if the file existed. But that is not necessary, this should only
be ever called on files that exist.
Update its todo with remaining items.
Add changelog entry.
Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.
Sponsored-by: Kevin Mueller on Patreon
Such as annex::?type=foo&...
I accidentially left out the uuid when creating one,
and the result is it appears to clone an empty repository.
So let's guard against that mistake.
Full pushing will probably work, but is untested.
Incremental pushing is not implemented yet.
While a fairly straightforward port of the shell prototype, the details
of exactly how to get the objects to the remote were tricky. And the
prototype did not consider how to deal with partial failures and
interruptions.
I've taken considerable care to make sure it always leaves things in a
consistent state when interrupted or when it loses access to a remote in
the middle of a push.
Sponsored-by: Leon Schuermann on Patreon
It did not seem possible to avoid creating a git-annex branch while
git-remote-annex is running. Special remotes can even store their own
state in it. So instead, if it didn't exist before git-remote-annex
created it, it deletes it at the end.
This does possibly allow a race condition, where git-annex init and
perhaps other git-annex writing commands are run, that writes to the
git-annex branch, at the same time a git-remote-annex process is being
run by git fetch/push with a full annex:: url. Those writes would be
lost. If the repository has already been initialized before
git-remote-annex, that race won't happen. So it's pretty unlikely.
Sponsored-by: Graham Spencer on Patreon
Also support using annex:: urls that specify the whole special remote
config.
Both of these cases need a special remote to be initialized enough to
use it, which means writing to .git/config but not to the git-annex
branch. When cloning, the remote is left set up in .git/config,
so further use of it, by git-annex or git-remote-annex will work. When
using git with an annex:: url, a temporary remote is written to
.git/config, but then removed at the end.
While that's a little bit ugly, the fact is that the Remote interface
expects that it's ok to set git configs of the remote that is being
initialized. And it's nowhere near as ugly as the alternative of making
a temporary git repository and initializing the special remote in there.
Cloning from a repository that does not contain a git-annex branch and
then later running git-annex init is currently broken, although I've
gotten most of the way there to supporting it.
See cleanupInitialization FIXME.
Special shout out to git clone for running gitremote-helpers with
GIT_DIR set, but not in the git repository and with GIT_WORK_TREE not
set. Resulting in needing the fixupRepo hack.
Sponsored-by: unqueued on Patreon
Tested using a manually populated directory special remote.
Pushing is still to be done. So is fetching from special remotes
configured via the annex:: url.
Sponsored-by: Brock Spratlen on Patreon