Only display warning when git-annex sync (without --content or
--no-content) is used with repositories that have preferred content
configured.
Sponsored-by: Leon Schuermann on Patreon
Failure to remove is not treated as a problem, and no permissions
modifications are done, to avoid unexpected states.
Sponsored-by: Luke Shumaker on Patreon
* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
so public=yes cannot be used with them. Existing buckets configured
with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
buckets that are configured with a Bucket Policy that allows public
access.
Sponsored-by: Joshua Antonishen on Patreon
This causes changes to the original branch to get merged with a single
sync. Before, it took 2 syncs; the first happened to update the synced/
branch, and the second merged changes from the synced/ branch into the
ajusted branch.
Using mergeToAdjustedBranch when tomerge == origbranch is probably
overkill, but it does work fine.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Bug fix: Re-running git-annex adjust or sync when in an adjusted branch
would overwrite the original branch, losing any commits that had been made
to it since the adjusted branch was created.
When git-annex adjust is run in this situation, it will display a warning
about the diverged branches.
When git-annex sync is run in this situation, mergeToAdjustedBranch
will merge the changes from the original branch to the adjusted branch.
So it does not need to display the divergence warning.
Note that for some reason, I'm needing to run sync twice for that to
happen. The first run does not do the merge and the second does. I'm unsure
why and so am not fully done with this bug.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Running git config --list inside .git then fails, so better to only
do that when --git-dir was specified explicitly. Otherwise, when the
repository is not bare, run the command inside the working tree.
Also make init detect when the uuid it just set cannot be read and fail
with an error, in case git changes something that breaks this later.
I still don't actually understand why git-annex add/assist -J2 was
affected but -J1 was not. But I did show that it was skipping writing to
the location log, because the uuid was NoUUID.
Sponsored-by: Graham Spencer on Patreon
Command.Add.seek starts concurrency with CommandStages. And for
Command.Sync, it needs TransferStages. So, to get both types of concurrency
for the two different parts, it either needs to change the type of
concurrency in between, or just call startConcurrency once for each.
It seems safe enough to call startConcurrency twice, because it does shut
down concurrency (mostly) at the end, and eg the old Annex.workers get
emptied.
Sponsored-by: unqueued on Patreon
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.
As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.
Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.
(Also opened a new bug that also affects sync et all, not only this command.)
Sponsored-by: Nicholas Golder-Manning on Patreon
Fixes a crash by git-annex repair when .git/annex/journal/ does not exist.
Normally the journal directory is created before withJournalHandle gets
run, but git-annex repair can be run in a situation where it does not
exist.
git add will fail if the file got deleted in the meantime. And since it was
queued, there was a window until the queue flushed where a deletion of the
file would cause a crash.
Instead, reuse Command.Add.addFile, which sha1 hashes the file itself
immediately, and then queues the index update. Ignore exceptions that will
happen if the file got deleted already.
Sponsored-by: k0ld on Patreon
Commit b6642dde8a broke it by enabling
non-concurrent display mode while leaving concurrency set in the config
and having already started concurrency earlier.
(I don't actually know if that commit was a good idea.)
Sponsored-By: Brett Eisenberg on Patreon
Optimise database to further speed up importing large trees from special
remotes.
See comment for details of why the other index didn't help cid queries.
It would probably be better to manually create an index on only cid, rather
than adding a second uniqueness constraint that is a larger index. But
persitent does not support creating indexes, and an attempt to manually add
it to the migration failed.
Sponsored-by: Nicholas Golder-Manning on Patreon
This avoids bottlenecking on git check-ignore in a particular situation.
Also, there may have been a correctness issue with it not having updated it.
When the exportdb is already up-to-date, this is not expensive. And the
exportdb is updated elsewhere, so usually it is up-to-date.
Sponsored-by: Joshua Antonishen on Patreon
Speeds up eg git-annex sync --content by up to 50%. When it does not need
to transfer or drop anything, it now noops a lot more quickly.
I didn't see anything else in sync --content noop loop that could really
be sped up. It has to cat git objects to keys, stat object files, etc.
Sponsored-by: unqueued on Patreon
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.
Before, import would write to both the git-annex branch and also to the
sqlite database. But then the next time it was run, needsUpdateFromLog
would see the branch had changed, so run updateFromLog, which would make
the same writes to the sqlite database a second time.
Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.
Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.
But, there could be other processes running at the same time, and one of
them may update the git-annex branch, eg merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.
I don't think there's a good way to prevent, or to detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.
Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.
Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.
Sponsored-by: Noam Kremen on Patreon
The obvious way to fix this would be to adapt lines to split on null.
However, it's actually nontrivial to rewrite lines. In particular it has a
weird implementation to avoid a space leak. See:
https://gitlab.haskell.org/ghc/ghc/-/issues/4334
Also, while that is a small amount of code, it's covered by a rather
complex copyright and I'd have to include that copyright in git-annex.
So, I opted to filter out the trailing empty string instead.
Sponsored-by: Dartmouth College's Datalad project
POSIX character classes allowed in globs was a surprise, but just
happened to fall out of the implementation in a way that seems
to behave correctly.
mdwn2man has to be tweaked to render the example properly.
The line I modified is the one that strips ikiwiki wikilinks out of the
man page.
Sponsored-by: Graham Spencer on Patreon
This makes annexFileMode be just an application of setAnnexPerm',
which avoids having 2 functions that do different versions of the same
thing.
Fixes some buggy behavior for some combinations of core.sharedRepository
and umask.
Sponsored-by: Jack Hill on Patreon
New command, currently limited to changing autoenable= setting of a special remote.
It will probably never be used for more than that given the limitations on
it.
Sponsored-by: Brock Spratlen on Patreon
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.
The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=
Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.
Sponsored-by: Jack Hill on Patreon
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.
registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.
This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.
But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs neededing to
work around the above bug. External remote program that support
exporttree=yes are recommended to be updated to send VERSION 2.
Sponsored-by: Kevin Mueller on Patreon
Remote.Directory makes a temp file, then calls this, and since the temp
file exists, it prevented probing if CoW works.
Note that deleting the empty file does mean there's a small window for a
race. If another process is also exporting to the remote, that could let it
make the same temp file. However, the temp filename actually has the
processes's pid in it, which avoids that being a problem.
This may have been a reversion caused by commits around
63d508e885, but I haven't gone back and
tested to be sure. The directory special remote had supposedly supported
CoW for this going back to about half a year before that.
Sponsored-by: Graham Spencer on Patreon
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.
Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.
Sponsored-by: unqueued on Patreon
I don't know of scenarios where that can happen (besides the bug
fixed by the parent commit), but there probably are some.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Avoid failure to update adjusted branch --unlock-present after git-annex
drop when annex.adjustedbranchrefresh=1
At higher values, it did flush the queue, which ran restagePointerFiles.
But at 1, adjustedBranchRefreshFull gets added to the queue, and while
restagePointerFiles is also in the queue, it runs after that.
Sponsored-by: Brock Spratlen on Patreon
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.
(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)
Sponsored-by: Jack Hill on Patreon