Fix a crash opening sqlite databases when run in a non-unicode locale,
with a remote that uses a non-unicode filepath. In that situation
converting to Text fails.
The fix needs git-annex to be built with persistent-sqlite 2.13.3.
Building against older versions still works, but that version is used when
building with stack.
Database.RawFilePath is a lot of code copied from persistent-sqlite and
lightly modified, since only 1 function in persistent-sqlite was made to
support RawFilePath. This is a bit of a pain, and I hope that
persistent-sqlite will eventually switch to using OsPath, allowing this
module to be removed from git-annex.
Sponsored-by: k0ld on Patreon
Avoid a problem with temp file names ending in "." on certian filesystems
that have problems with such filenames.
relatedTemplate is quite an ugly hack really; since it doesn't know the max
filename length of the filesystem it can only assume that the filename is
max allowed length. When given the input "lh.aparc.DKTatlas.annot", it
wants to reserve 20 characters for tempfile so it truncates to "lh.". That
ending period is apparently a problem on some filesystem (FAT eats it, but
does not throw EINVAL; ntfs does not seem bothered by it, I don't know what
FUSE filesystem the bug reporter was really using).
Sponsored-by: Brett Eisenberg on Patreon
Presumably git merge sometimes needs to verifiy if a worktree file is
modified, and so will then run git-annex filter-process which would try to
take the pid lock. And for whatever reason, git-annex sync already had the
pidlock held. I have not replicated that, but it does make enough sense to
deploy the workaround.
Like I said back in commit 7bdb0cdc0d,
Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
Sponsored-by: KDM on Patreon
Implementation was simple because it's equivilant to
--from=foo --to remote for each other remote, followed by
--to remote when there's a local copy.
(Or, in the edge case of --from-anywhere --to=here,
it's the same as --to=here.)
Note that, when the local repo does not have a copy,
fromToPerform gets it from a remote, sends it to the destination,
and drops the local copy. Another call to that for a second remote
will notice that the dest now has a copy, and simply drop from the
second remote, avoiding a second transfer.
Also note that, when numcopies doesn't allow dropping it from
everywhere, it will drop it from the cheapest remotes first
(maybe not ideal) up to more expensive remotes, and finally from the local
repo. So the local repo will generally end up holding a copy. Maybe not
ideal in all cases either, but it seems no worse to do that than to end up
with a copy undropped from a remote.
And I'm not entirely happy with the output, eg:
copy bigfile (from r3...) ok
copy bigfile ok
That makes sense if you think of the second line as being
the same as what is output by `git-annex copy bigfile --to bar`,
but it's less clear in this context. Maybe add "(from here...)"?
Also the --json output doesn't have a machine-readable field for
the "from" uuid, and maybe it should?
Sponsored-by: Dartmouth College's DANDI project
Eg when the destination is logged as containing a file, skip
actively checking that it does contain it.
Note that --fast does not prevent other verifications of content
location that are done in a copy --from --to. Perhaps it could, but this
change will already avoid the real unnecessary work of operating on
files that are already in the remote.
And avoiding other verifications
might cause it to fail if the location log thinks that --to does not
contain the content but does. Such complications with `git-annex copy
--to remote --fast` led to commit d006586cd0
which added a note that gets displayed when that fails, mentioning it
might be due to --fast being enabled.
copy --from --to is already complicated enough without needing to worry
about such edge cases, so continuing to doing some verification of
content location after the initial --fast filtering seems ok.
Sponsored-by: Dartmouth College's DANDI project
This allows getting rid of the ugly and error prone handling of
"bag of bytes" String in Remote.Helper.Encryptable.
Avoiding breakage like that dealt with by commit
9862d64bf9
And allows converting Utility.Gpg to use ByteString for IO, which is
a welcome change.
Tested the new git-annex interoperability with old, using all 3
encryption= types.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.
It seems unncessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unncessary work. Might be possible to improve that, but
I've gone for the simple fix.
Sponsored-by: k0ld on Patreon
git-annex only writes regular files there, but other things may drop junk
like empty .DAV directories around the tree. And trying to hash such things
can have weird and hard to understand effects. So it seems best to do a
small amount of work in statting the journal file to make sure it's a
regular file.
Sponsored-by: Jack Hill on Patreon
push: When on an adjusted branch, propagate changes to parent branch before
updating export remotes.
This is a somewhat redundant call to propigateAdjustedCommits, since it
also gets called at pushLocal time. That other one needs to come after
importing from importtree remotes though, and seekExportContent has to come
earlier, so I don't see a way to avoid doing it twice.
Note that git-annex sync also manages to avoid the problem, it's only
git-annex push that had the bug.
Sponsored-by: Leon Schuermann on Patreon
Fix more breakage caused by git's fix for CVE-2022-24765, this time
involving a remote (either local or ssh) that is a repository not owned by
the current user.
Sponsored-by: Dartmouth College's DANDI project
Only display warning when git-annex sync (without --content or
--no-content) is used with repositories that have preferred content
configured.
Sponsored-by: Leon Schuermann on Patreon
Failure to remove is not treated as a problem, and no permissions
modifications are done, to avoid unexpected states.
Sponsored-by: Luke Shumaker on Patreon
* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
so public=yes cannot be used with them. Existing buckets configured
with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
buckets that are configured with a Bucket Policy that allows public
access.
Sponsored-by: Joshua Antonishen on Patreon
This causes changes to the original branch to get merged with a single
sync. Before, it took 2 syncs; the first happened to update the synced/
branch, and the second merged changes from the synced/ branch into the
ajusted branch.
Using mergeToAdjustedBranch when tomerge == origbranch is probably
overkill, but it does work fine.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Bug fix: Re-running git-annex adjust or sync when in an adjusted branch
would overwrite the original branch, losing any commits that had been made
to it since the adjusted branch was created.
When git-annex adjust is run in this situation, it will display a warning
about the diverged branches.
When git-annex sync is run in this situation, mergeToAdjustedBranch
will merge the changes from the original branch to the adjusted branch.
So it does not need to display the divergence warning.
Note that for some reason, I'm needing to run sync twice for that to
happen. The first run does not do the merge and the second does. I'm unsure
why and so am not fully done with this bug.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Running git config --list inside .git then fails, so better to only
do that when --git-dir was specified explicitly. Otherwise, when the
repository is not bare, run the command inside the working tree.
Also make init detect when the uuid it just set cannot be read and fail
with an error, in case git changes something that breaks this later.
I still don't actually understand why git-annex add/assist -J2 was
affected but -J1 was not. But I did show that it was skipping writing to
the location log, because the uuid was NoUUID.
Sponsored-by: Graham Spencer on Patreon
Command.Add.seek starts concurrency with CommandStages. And for
Command.Sync, it needs TransferStages. So, to get both types of concurrency
for the two different parts, it either needs to change the type of
concurrency in between, or just call startConcurrency once for each.
It seems safe enough to call startConcurrency twice, because it does shut
down concurrency (mostly) at the end, and eg the old Annex.workers get
emptied.
Sponsored-by: unqueued on Patreon
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.
As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.
Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.
(Also opened a new bug that also affects sync et all, not only this command.)
Sponsored-by: Nicholas Golder-Manning on Patreon
Fixes a crash by git-annex repair when .git/annex/journal/ does not exist.
Normally the journal directory is created before withJournalHandle gets
run, but git-annex repair can be run in a situation where it does not
exist.
git add will fail if the file got deleted in the meantime. And since it was
queued, there was a window until the queue flushed where a deletion of the
file would cause a crash.
Instead, reuse Command.Add.addFile, which sha1 hashes the file itself
immediately, and then queues the index update. Ignore exceptions that will
happen if the file got deleted already.
Sponsored-by: k0ld on Patreon
Commit b6642dde8a broke it by enabling
non-concurrent display mode while leaving concurrency set in the config
and having already started concurrency earlier.
(I don't actually know if that commit was a good idea.)
Sponsored-By: Brett Eisenberg on Patreon
Optimise database to further speed up importing large trees from special
remotes.
See comment for details of why the other index didn't help cid queries.
It would probably be better to manually create an index on only cid, rather
than adding a second uniqueness constraint that is a larger index. But
persitent does not support creating indexes, and an attempt to manually add
it to the migration failed.
Sponsored-by: Nicholas Golder-Manning on Patreon
This avoids bottlenecking on git check-ignore in a particular situation.
Also, there may have been a correctness issue with it not having updated it.
When the exportdb is already up-to-date, this is not expensive. And the
exportdb is updated elsewhere, so usually it is up-to-date.
Sponsored-by: Joshua Antonishen on Patreon
Speeds up eg git-annex sync --content by up to 50%. When it does not need
to transfer or drop anything, it now noops a lot more quickly.
I didn't see anything else in sync --content noop loop that could really
be sped up. It has to cat git objects to keys, stat object files, etc.
Sponsored-by: unqueued on Patreon
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.
Before, import would write to both the git-annex branch and also to the
sqlite database. But then the next time it was run, needsUpdateFromLog
would see the branch had changed, so run updateFromLog, which would make
the same writes to the sqlite database a second time.
Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.
Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.
But, there could be other processes running at the same time, and one of
them may update the git-annex branch, eg merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.
I don't think there's a good way to prevent, or to detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.
Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.
Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.
Sponsored-by: Noam Kremen on Patreon