Prevent listProxied from listing anything when the proxy remote's
url is a local directory. Proxying does not work in that situation,
because the proxied remotes have the same url, and so git-annex-shell
is not run when accessing them; instead, the proxy remote is accessed
directly.
I don't think there is any good way to support this. Even if the instantiated
git repos for the proxied remotes somehow used an url that caused it to use
git-annex-shell to access them, planned features like `git-annex copy --to
proxy` accepting a key and sending it on to nodes behind the proxy would not
work, since git-annex-shell is not used to access the proxy.
So it would need to use something to access the proxy that causes
git-annex-shell to be run and speaks P2P protocol over it. And we have that.
It's an ssh connection to localhost. Of course, it would be possible to
take ssh out of that mix, and swap in something that does not have
encryption overhead and authentication complications, but otherwise
behaves the same as ssh. And if the user wants to do that, GIT_SSH
does exist.
This just happened to work correctly. Rather surprisingly. It turns out
that openP2PSshConnection actually also supports local git remotes,
by just running git-annex-shell with the path to the remote.
Renamed "P2PSsh" to "P2PShell" to make this clear.
getRepoUUID looks at that, and was seeing the annex.uuid of the proxy.
Which caused it to unnecessarily set the git config. Probably also would
have led to other problems.
connRepo is only used when relaying git upload-pack and receive-pack.
That's only supposed to be used when git-annex-remotedaemon is serving
git-remote-tor-annex connections over tor. But, it was always set, and
so could possibly have been used in other places.
Fixed by making connRepo optional in the P2P protocol interface.
In Command.EnableTor, it's not needed, because it only speaks the
protocol in order to check that it's able to connect back to itself via
the hidden service. So changed that to pass Nothing rather than the git
repo.
In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio,
so is making the requests, so will never need connRepo.
In git-annex-shell p2pstdio, it was accepting git upload-pack and
receive-pack requests over the P2P protocol, even though nothing sent
them. This is arguably a security hole, particularly if the user has
set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent
git push/pull via git-annex-shell.
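To sketch the shape of that change (illustrative names, not the exact
P2P protocol interface types): the repo used for relaying git
upload-pack and receive-pack becomes optional, and serving those
requests is refused when it is absent.

    -- Sketch only; the real P2P protocol interface differs in detail.
    newtype Repo = Repo FilePath

    data P2PConnection = P2PConnection
        { connRepo :: Maybe Repo -- Nothing when git relaying is not needed
        }

    serveGitRelay :: P2PConnection -> IO ()
    serveGitRelay conn = case connRepo conn of
        Just (Repo d) -> putStrLn ("relaying to git repo at " ++ d)
        -- eg, what git-annex-shell p2pstdio does now
        Nothing -> putStrLn "refusing git upload-pack/receive-pack request"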
Got the right git config settings inherited now.
Note that the url config is not passed on to git, so it won't be able
to access the proxied remote. That would need some kind of
git-remote-annex but for proxied remotes anyway. Unsure yet if that will
be needed.
Untested, but this should be close to working. The proxied remotes have
the same url but a different uuid. When talking to current
git-annex-shell, it will fail due to a uuid mismatch. Once it supports
proxies, it will know that the presented uuid is for a remote that it
proxies for.
The check for any git config settings for a remote with the same name as
the proxied remote is there for several reasons. One is security:
Writing a name to the proxy log should not cause changes to
how an existing, configured git remote operates in a different clone of
the repo.
It's possible that the user has been using a proxied remote, and decides
to set a git config for it. We can't tell the difference between that
scenario and an evil remote trying to eg, intercept a file upload
by replacing their remote with a proxied remote.
Also, if the user sets some git config, does it override the config
inherited from the proxy remote? Seems a difficult question. Luckily,
the above means we don't need to think through it.
This does mean though, that in order for a user to change the config of
a proxy remote, they have to manually set its annex-uuid and url, as
well as the config they want to change. They may also have to set any of
the inherited configs that they were relying on.
When exporttree=yes is also set. Probably it would also be possible to
support ones with only importtree=yes, by enabling exporttree=yes for
the remote only when using git-remote-annex, but let's keep this
simple... I'm not sure what gets recorded in .git/annex/ state
differently in the two cases that might cause a problem when doing that.
Note that the full annex:: urls generated and displayed for such a
remote omit the importtree=yes. Which is ok; cloning from such an url
uses it as an exporttree=yes remote, but the git-annex branch doesn't get
written by this program, so once the real config is available from the
git-annex branch, it will still function as an importtree=yes remote.
Check explicitly for an annex:: url, not just any url. While no built-in
special remotes set an url, except ones that can be synced with, it
seems possible that some external special remote sets an url for its own
use, but did not expect it to be used by git-annex sync et al.
The assistant also syncs with them.
I hope to support importtree=yes eventually, but it does not currently
work.
Added remote.<name>.allow-encrypted-gitrepo, which needs to be set to
allow using it with encrypted git repos.
Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".
This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.
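A sketch of the parameterization (illustrative, simplified from
Remote.External): the only difference is which command is run to speak
the external special remote protocol.

    data ExternalProgram
        = ExternalType String -- runs git-annex-remote-<type>
        | RcloneGitAnnex      -- runs "rclone gitannex"

    programCommand :: ExternalProgram -> (String, [String])
    programCommand (ExternalType t) = ("git-annex-remote-" ++ t, [])
    programCommand RcloneGitAnnex = ("rclone", ["gitannex"])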
Sponsored-by: Luke T. Shumaker on Patreon
When a file in the export is renamed, and the remote's renameExport
returned Nothing, renaming to the temp file would first say it was
renaming, and appear to succeed, but actually what it did was delete the
file. Then renaming from the temp file would not do anything, since the
temp file is not present on the remote. This appeared as if a file got
renamed to a temp file and left there.
Note that exporttree=yes importree=yes remotes have their usual
renameExport replaced with one that returns Nothing. (For reasons
explained in Remote.Helper.ExportImport.) So this happened
even with remotes that support renameExport.
Fix by letting renameExport = Nothing when it's not supported at all.
This avoids displaying the rename.
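A sketch of the idea (simplified from the real ExportActions types): an
unsupported rename is Nothing at the field level, rather than an action
that returns Nothing at runtime, so callers can skip the rename, and its
display, entirely.

    data ExportActions m = ExportActions
        { renameExport :: Maybe (FilePath -> FilePath -> m Bool)
        }

    renameOrFallback :: Monad m => ExportActions m -> FilePath -> FilePath -> m Bool
    renameOrFallback ea src dest = case renameExport ea of
        Nothing -> pure False -- caller deletes and re-stores; no rename shown
        Just go -> go src dest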
Sponsored-by: Graham Spencer on Patreon
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.
But, Command.ExamineKey calls fastMigrate and the key may or may not
exist, and it's not wanting to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.
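Roughly (with stand-in types; the real signature involves git-annex's
Backend and Key types), the caller now tells fastMigrate whether the
content is present:

    type Key = String

    fastMigrate :: Key -> Bool -> IO (Maybe Key)
    fastMigrate key contentIsInAnnex
        | contentIsInAnnex = pure (Just ("VURL--" ++ key)) -- stand-in conversion
        | otherwise = pure Nothing -- content not present, so cannot hash it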
Factored generateEquivilantKey out of Remote.Web.
Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.
Sponsored-by: Brock Spratlen on Patreon
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety so VURL could use
it.
Sponsored-by: Kevin Mueller on Patreon
When downloading a VURL from the web, make sure that the equivalent key
log is populated.
Unfortunately, this does not hash the content while it's being
downloaded from the web. There is not currently an interface in Backend
for incremental hash generation, only for incremental verification of an
existing hash. So this might add a noticeable delay, and it has to show
a "(checksum...)" message. This could stand to be improved.
But, that separate hashing step only has to happen on the first download
of new content from the web. Once the hash is known, the VURL key can have
its hash verified incrementally while downloading except when the
content in the web has changed. (Doesn't happen yet because
verifyKeyContentIncrementally is not implemented yet for VURL keys.)
Note that the equivalent key log file is formatted as a presence log.
This adds a tiny bit of overhead (eg "1 ") per line over just listing the
urls. The reason I chose to use that format is it seems possible that
there will need to be a way to remove an equivalent key at some point in
the future. I don't know why that would be necessary, but it seemed wise
to allow for the possibility.
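For illustration, a line of the log might look like this (hypothetical
timestamp, and <hash> standing in for a real hash), with the leading
"1 " being the status field of the presence log format:

    1718000000.000001s 1 SHA256E-s1048576--<hash>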
Downloads of VURL keys from other special remotes that claim urls,
like bittorrent for example, do not populate the equivalent key log.
So for now, no checksum verification will be done for those.
Sponsored-by: Nicholas Golder-Manning on Patreon
Not yet implemented is recording hashes on download from web and
verifying hashes.
addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.
It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.
Sponsored-by: Joshua Antonishen on Patreon
external: Monitor file size when getting content from external special
remotes and use that to update the progress meter, in case the external
special remote program does not report progress.
This relies on 703a70cafa to prevent ever
running the meter backwards.
Sponsored-by: Dartmouth College's DANDI project
Improve annex.stalldetection to handle remotes that update progress less
frequently than the configured time period.
In particular, this makes remotes that don't report progress but are
chunked work when transferring a single chunk takes longer than the
specified time period.
Any remotes that just have very low update granularity would also be
handled by this.
The change to Remote.Helper.Chunked avoids an extra progress update when
resuming an interrupted upload. In that case, the code saw first Nothing
and then Just the already transferred number of bytes, which defeated this
new heuristic. This change will mean that, when resuming an interrupted
upload to a chunked remote that does not do its own progress reporting, the
progress display does not start out displaying the amount sent so far,
until after the first chunk is sent. This behavior change does not seem
like a major problem.
About the scalefudgefactor, it seems reasonable to expect subsequent chunks
to take no more than 1.5 times as long as the first chunk to transfer.
Could set it to 1, but then any chunk taking a little longer would be
treated as a stall. 2 also seems a likely value. Even 10 might be fine?
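A sketch of the resulting heuristic (illustrative, not the exact code):
if the first progress update took longer than the configured period,
widen the stall-detection period proportionally.

    scalefudgefactor :: Double
    scalefudgefactor = 1.5 -- later chunks allowed 1.5x the first chunk's time

    stallPeriod :: Double -> Double -> Double
    stallPeriod configuredperiod timetofirstupdate =
        max configuredperiod (timetofirstupdate * scalefudgefactor)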
Sponsored-by: Dartmouth College's DANDI project
Improve disk free space checking when transferring unsized keys to
local git remotes. Since the size of the object file is known, can
check that instead.
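The fallback is roughly this (illustrative; the real code uses
git-annex's Key and Annex types):

    import System.Directory (getFileSize)

    -- When the key carries no size, use the local object file's size.
    sizeForFreeSpaceCheck :: Maybe Integer -> FilePath -> IO Integer
    sizeForFreeSpaceCheck (Just keysize) _ = pure keysize
    sizeForFreeSpaceCheck Nothing objectfile = getFileSize objectfile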
Getting unsized keys from local git remotes does not check the actual
object size. It would be harder to handle that direction because the size
check is run locally, before anything involving the remote is done. So it
doesn't know the size of the file on the remote.
Also, transferring unsized keys to other remotes, including ssh and p2p
remotes, doesn't do disk free checking for unsized keys. That would need
a change in protocol.
(It does seem like it would be possible to implement the same thing for
directory special remotes though.)
In some sense, it might be better to not ever do disk free checking for
unsized keys, than to do it only sometimes. A user might notice this
direction working and consider it a bug that the other direction does not.
On the other hand, disk reserve checking is not implemented for most
special remotes at all, and yet it is implemented for a few, which is also
inconsistent, but best effort. And so doing this best effort seems to make
some sense. Fundamentally, if the user wants the size to always be checked,
they should not use unsized keys.
Sponsored-by: Brock Spratlen on Patreon
This works well, and it interoperates with gpg in my testing (although some
SOP commands might choose to use a profile that does not, so caveat emptor).
Note that for creating the Cipher, gpg --gen-random is still used. SOP
does not have an equivalent, and as long as the user has gpg around,
which seems likely, it doesn't matter that it uses gpg here, it's not being
used for encryption. That seemed better than implementing a second way
to get high quality entropy, at least for now.
The need for the sop command to run in an empty directory has each call
to encrypt and decrypt creating a new temporary directory. That is some
unnecessary overhead, though probably swamped by the overhead of running
the sop command. This could be improved in the future by passing an
already empty directory to them, or a sufficiently empty directory
(.git/annex/tmp would probably suffice).
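The current shape of a call is roughly this sketch (runSopIn is a
hypothetical name, and "sop" stands in for whichever SOP command is
configured; the real code goes through git-annex's process utilities):

    import System.IO.Temp (withSystemTempDirectory)
    import System.Process (proc, readCreateProcess, cwd)

    -- Run a sop subcommand in a freshly created empty directory.
    runSopIn :: [String] -> String -> IO String
    runSopIn args input =
        withSystemTempDirectory "sop" $ \emptydir ->
            readCreateProcess ((proc "sop" args) { cwd = Just emptydir }) input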
Sponsored-by: Brett Eisenberg on Patreon
This allows getting rid of the ugly and error prone handling of
"bag of bytes" String in Remote.Helper.Encryptable.
Avoiding breakage like that dealt with by commit
9862d64bf9
And allows converting Utility.Gpg to use ByteString for IO, which is
a welcome change.
Tested the new git-annex interoperability with old, using all 3
encryption= types.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
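The problem, concretely: with a definition along these lines, code
points above 255 wrap modulo 256 when converted to Word8.

    import Data.Word (Word8)

    s2w8 :: String -> [Word8]
    s2w8 = map (fromIntegral . fromEnum)

    -- ghci> s2w8 "…"    -- U+2026 is 8230
    -- [38]              -- 8230 mod 256 = 38, ie '&': the character is mangled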
Note the use of fromString and toString from Data.ByteString.UTF8 dated
back to commit 9b93278e8a. Back then it
was using the dataenc package for base64, which operated on Word8 and
String. But with the switch to sandi, it uses ByteString, and indeed
fromB64' and toB64' were already using ByteString without that
complication. So I think there is no risk of such an encoding related
breakage.
I also tested the case that 9b93278e8a
fixed:
    git-annex metadata -s foo='a …' x
    git-annex metadata x
    metadata x
    foo=a …
In Remote.Helper.Encryptable, it was avoiding using Utility.Base64
because of that UTF8 conversion. Since that's no longer done, it can
just use it now.
Avoids some problems with unusual characters in exporttree filenames
that confuse adb shell commands.
In particular, with a filename that contains \351, adb push sends the file
to the correct filename in /sdcard. And running find on the android device
roundtrips the filename. But, running mv on that filename on the android
device fails with "bad <filename>: No such file or directory".
Interestingly, ls on android works, and rm fails.
adb push to the final name avoids this problem. But what about
atomicity? Well, I tried an adb push and interrupted it part way through.
The file was present while the push was running, but was removed once the
push got interrupted. I also tried yanking the cable while adb push was
running, and the partially received file was also deleted then. That avoids
most problems.
An import that runs at the same time as an export will see the partially
sent file. But that is unlikely to be done, and if it did happen, it would
notice that the imported file had changed in the meantime and discard it.
Note that, since rm on the android device fails on these filenames,
exporting a tree where the file is deleted is going to fail to remove it. I
don't see what I can do about that, so long as android is using an rm that
has issues with filename encodings.
This was tested on a phone where find, ls, and rm all come from Toybox 0.8.6.
Sponsored-by: unqueued on Patreon
Fix more breakage caused by git's fix for CVE-2022-24765, this time
involving a remote (either local or ssh) that is a repository not owned by
the current user.
Sponsored-by: Dartmouth College's DANDI project
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.
Debian stable now includes a new enough haskell-git-lfs package as well.
Last time this was tried it did not.
This is groundwork for making special remotes like borg be skipped by
sync when on an offline drive.
Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension
to the external special remote protocol. The extension is needed because
old git-annex, if it sees that response, will display a warning
message. (It does continue as if the remote is globally available, which
is acceptable, and the warning is only displayed at initremote due to
remote.name.annex-availability caching, but still it seemed best to make
this a protocol extension.)
The remote.name.annex-availability git config is no longer used, and is
documented as such. It was only used by external special
remotes to cache the availability, to avoid needing to start the
external process every time. Now that availability is queried as an
Annex action, the external is only started by sync (and the assistant),
when they actually check availability.
Sponsored-by: Nicholas Golder-Manning on Patreon
Fix behavior when importing a tree from a directory remote when the
directory does not exist. An empty tree was imported, rather than the
import failing. Merging that tree would delete every file in the
branch, if those files had been exported to the directory before.
The problem was that dirContentsRecursive returned [] when the directory
did not exist. Better for it to throw an exception. But in commit
74f0d67aa3 back in 2012, I made it never
throw exceptions, because exceptions thrown inside unsafeInterleaveIO become
untrappable when the list is being traversed.
So, changed it to list the contents of the directory before entering
unsafeInterleaveIO. So exceptions are thrown for the directory. But still
not if it's unable to list the contents of a subdirectory. That's less of a
problem, because the subdirectory does exist (or if not, it got removed
after being listed, and it's ok to not include it in the list). A
subdirectory that has permissions that don't allow listing it will have its
contents omitted from the list still.
(Might be better to have it return a type that includes indications of
errors listing contents of subdirectories?)
The rest of the changes are making callers of dirContentsRecursive
use emptyWhenDoesNotExist when they relied on the behavior of it not
throwing an exception when the directory does not exist. Note that
it's possible some callers of dirContentsRecursive that used to ignore
permissions problems listing a directory will now start throwing exceptions
on them.
The fix to the directory special remote consisted of not making its
call in listImportableContentsM use emptyWhenDoesNotExist. So it will
throw an exception as desired.
Sponsored-by: Joshua Antonishen on Patreon
As was recently done with the directory special remote.
Note that the top directory passed to removeDirGeneric was changed to
avoid deleting .git/annex or .git/annex/objects if they ended up empty.
Sponsored-by: Brett Eisenberg on Patreon
Failure to remove is not treated as a problem, and no permissions
modifications are done, to avoid unexpected states.
Sponsored-by: Luke Shumaker on Patreon
* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
so public=yes cannot be used with them. Existing buckets configured
with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
buckets that are configured with a Bucket Policy that allows public
access.
Sponsored-by: Joshua Antonishen on Patreon
Reading from the cidsdb is responsible for about 25% of the runtime of
an import. Since the cidmap is used to store the same information in
ram, the cidsdb is not written to during an import any longer. And so,
if it started off empty (and updateFromLog wasn't needed), those reads
can just be skipped.
This is kind of a cheesy optimisation, since after any import from any
special remote, the database will no longer be empty, so it's a single
use optimisation. But it's probably not uncommon to start by importing a
lot of files, and it can save a lot of time then.
Sponsored-by: Brock Spratlen on Patreon
This makes annexFileMode be just an application of setAnnexPerm',
which avoids having 2 functions that do different versions of the same
thing.
Fixes some buggy behavior for some combinations of core.sharedRepository
and umask.
Sponsored-by: Jack Hill on Patreon
Propagate Annex.force into the remote's Annex state.
Fixes this problem:
    joey@darkstar:~/tmp/xxxx>git-annex copy mmm --to origin --force
    copy mmm (to origin...)
    not enough free space, need 908.72 MB more (use --force to override this check or adjust annex.diskreserve)
    failed to send content to remote
    failed
This does beg the question of whether anything else should be propagated.
Some things like Annex.forcenumcopies certainly not; using --numcopies
overrides the number of copies the current repo wants, not all of them.
Sponsored-by: Graham Spencer on Patreon
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.
The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=.
Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.
Sponsored-by: Jack Hill on Patreon
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brock Spratlen on Patreon
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.
Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.
When json is being output, no quoting is done, since json gets its own
quoting.
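A simplified sketch of the pattern (the real type has more constructors
and quotes per the terminal's capabilities):

    import Data.Char (isControl)

    data StringContainingQuotedPath
        = UnquotedString String
        | QuotedPath FilePath
        | Concat StringContainingQuotedPath StringContainingQuotedPath

    instance Semigroup StringContainingQuotedPath where
        (<>) = Concat

    -- Control characters are filtered even out of the unquoted parts.
    render :: StringContainingQuotedPath -> String
    render (UnquotedString s) = filter (not . isControl) s
    render (QuotedPath p) = show (filter (not . isControl) p) -- quoted
    render (Concat a b) = render a ++ render b

So a warning is built up like
UnquotedString "unable to send " <> QuotedPath file, and only the
filepath part gets quoted for display.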
This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brett Eisenberg on Patreon
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)
error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.
Changed a lot of existing error to giveup when it is not strictly an
internal error.
Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.
Sponsored-by: Noam Kremen on Patreon
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs needing to
work around the above bug. External remote programs that support
exporttree=yes are recommended to be updated to send VERSION 2.
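Since the version is the first line an external special remote program
sends, opting in is just a matter of starting the conversation with:

    VERSION 2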
Sponsored-by: Kevin Mueller on Patreon
Fix bug that caused broken protocol to be used with external remotes that
use exporttree=yes. In some cases this could result in the wrong content
being exported to, or retrieved from the remote.
Sponsored-by: Nicholas Golder-Manning on Patreon
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.
Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.
Sponsored-by: unqueued on Patreon
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.
Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.
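A sketch of the workaround (the real code only applies the fixup when
built for Windows; the CPP is omitted here):

    import Data.Char (isAlpha)
    import Network.URI (URI, parseURI, uriPath)

    parseURIPortable :: String -> Maybe URI
    parseURIPortable s = fixup <$> parseURI s
      where
        -- Strip the leading slash from paths like "/c:/path".
        fixup u = case uriPath u of
            ('/':p@(c:':':_)) | isAlpha c -> u { uriPath = p }
            _ -> u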
An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.
It's also worth noting that System.FilePath.normalise "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.
(Note that there is no good reason for the user to use such an url.
Absolute urls are valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)
Sponsored-by: Jack Hill on Patreon
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.
Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.
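The conversion is along these lines (a sketch; the real function also
has to ensure the path is absolute, since the native namespace does not
support relative paths):

    import Data.List (isPrefixOf)

    convertToWindowsNativeNamespace :: FilePath -> FilePath
    convertToWindowsNativeNamespace p
        | "\\\\?\\" `isPrefixOf` p = p                       -- already native
        | "\\\\" `isPrefixOf` p = "\\\\?\\UNC\\" ++ drop 2 p -- UNC share path
        | otherwise = "\\\\?\\" ++ p                         -- eg C:\path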
Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equivalents from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.
The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.
Sponsored-by: Dartmouth College's Datalad project
path to a bare repo when git config is not allowed to list the configs
due to the CVE-2022-24765 fix.
That resulted in a confusing error message, and prevented the nice
message that explains how to mark the repo as safe to use.
Made isBare a tristate so that the case where core.bare is not returned can
be handled.
The handling in updateLocation is to check if the directory
contains config and objects and if so assume it's bare.
Note that if that heuristic is somehow wrong, it would construct a repo
that thinks it's bare but is not. That could cause follow-on problems,
but since git-annex then checks checkRepoConfigInaccessible, and skips
using the repo anyway, a wrong guess should not be a problem.
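The heuristic amounts to this sketch (guessIsBare is a hypothetical name
for the check done in updateLocation):

    import System.Directory (doesDirectoryExist, doesFileExist)
    import System.FilePath ((</>))

    guessIsBare :: FilePath -> IO Bool
    guessIsBare d = do
        hasconfig <- doesFileExist (d </> "config")
        hasobjects <- doesDirectoryExist (d </> "objects")
        return (hasconfig && hasobjects)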
Sponsored-by: Luke Shumaker on Patreon
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.
datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.
Sponsored-By: Brett Eisenberg on Patreon
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.
In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.
Sponsored-by: Dartmouth College's DANDI project
When a web special remote does not have urlinclude/urlexclude
configured, make it respect the configuration of other web special
remotes and avoid using urls that match the config of another.
Note that the other web special remote does not have to be enabled.
That seems ok, it would have been extra work to check for only ones that
are enabled.
The implementation does mean that the web special remote re-parses
its own config once at startup, as well as re-parsing the configs of any
other web special remotes. This should be a very small slowdown
unless there are lots of web special remotes.
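The rule is roughly this sketch (illustrative types; matching is really
done on the parsed urlinclude/urlexclude configs):

    data WebConfig = WebConfig
        { urlinclude :: Maybe (String -> Bool)
        , urlexclude :: Maybe (String -> Bool)
        }

    usableUrl :: WebConfig -> [WebConfig] -> String -> Bool
    usableUrl self others url = case urlinclude self of
        Just include -> include url && not (excluded self url)
        -- Unconfigured: avoid urls that another web special remote's
        -- urlinclude matches.
        Nothing -> not (excluded self url)
            && not (any (\o -> maybe False ($ url) (urlinclude o)) others)
      where
        excluded c u = maybe False ($ u) (urlexclude c)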
Sponsored-by: Dartmouth College's DANDI project
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.
When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.
Sponsored-by: Dartmouth College's DANDI project
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.
Debian stable does have a too old haskell-git-lfs package to be able to
build git-annex from source, but there is not currently a backport of a
recent git-annex to Debian stable. And if they update the backport at some
point, they should be able to backport the library too.
Sponsored-by: Svenne Krap on Patreon
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.
The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.
As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.
Sponsored-by: Dartmouth College's Datalad project
aws-0.23 has been released.
When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.
Sponsored-by: Dartmouth College's DANDI project
The flush was only done in Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)
Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex switched to using ResourceT..).
The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.
In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses of onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.
In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.
The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.
After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?
Sponsored-by: Dartmouth College's Datalad project
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.
getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.
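Assuming the aws package's GetBucket interface, the change amounts to
setting the listing prefix (sketch):

    import qualified Aws.S3 as S3
    import qualified Data.Text as T

    -- Ask the server to only list keys under the fileprefix.
    listWithPrefix :: T.Text -> Maybe T.Text -> S3.GetBucket
    listWithPrefix bucket mfileprefix =
        (S3.getBucket bucket) { S3.gbPrefix = mfileprefix }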
Sponsored-by: Dartmouth College's DANDI project
This can be used, for example, with importtree=yes to import from a public
bucket.
This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281
The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.
Sponsored-by: Dartmouth College's DANDI project
Combined with commit 0ffc59d341, this
fixes the case where there are duplicate files on the special remote,
and the first gets modified/deleted, while the second is still present.
directory, adb: Fixed a bug when importtree=yes, and multiple files in the
special remote have the same content, that caused it to refuse to get a
file from the special remote, incorrectly complaining that it had changed,
due to only accepting the inode+mtime of one file (that was since modified
or deleted) and not accepting the inode+mtime of other duplicate files.
Sponsored-by: Max Thoursie on Patreon
This partly fixes an issue where there are duplicate files in the
special remote, and the first file gets swapped with another duplicate,
or deleted. The swap case is fixed by this, the deleted case will need
other changes.
This makes retrieveExportWithContentIdentifier take a list of allowed
ContentIdentifier, same as storeExportWithContentIdentifier,
removeExportWithContentIdentifier, and
checkPresentExportWithContentIdentifier.
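With stand-in types, the acceptance check becomes a membership test
against all recorded content identifiers for the content, rather than a
comparison with a single one:

    type ContentIdentifier = String

    -- True when the file on the remote still has content we imported,
    -- even if some other duplicate of it was modified or deleted.
    acceptable :: [ContentIdentifier] -> ContentIdentifier -> Bool
    acceptable allowedcids cid = cid `elem` allowedcids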
Of the special remotes that support importtree, borg is a special case
and does not use content identifiers, S3 I assume can't get mixed up
like this, directory certainly has the problem, and adb also appears to
have had the problem.
Sponsored-by: Graham Spencer on Patreon
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)
When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.
After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptable, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.
Sponsored-by: Erik Bjäreholt on Patreon
autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git
remote has not been probed yet, that will do an http get that will prompt
for a password. And then the parent process will prompt again
for a password when getting annexed files from the remote.
So the solution is for autoEnableSpecialRemotes to run remoteList before
the subprocess, which will probe for the uuid for the git remote in the
same process that will later be used to get annexed files.
But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git,
so Annex.Init cannot import Remote.List. Had to pass remoteList into
functions in Annex.Init to get around this dependency loop.
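The shape of the workaround, with stand-in types:

    type Annex = IO      -- stand-in for the real Annex monad
    type Remote = String -- stand-in for the real Remote type

    -- Remote.List's remoteList is passed in as a parameter, breaking the
    -- Annex.Init -> Remote.List -> Remote.Git -> Annex.Init import cycle.
    autoEnableSpecialRemotes :: Annex [Remote] -> Annex ()
    autoEnableSpecialRemotes remotelist = do
        _ <- remotelist -- probe git remote uuids (and any password prompt)
                        -- in the process that will later access the remotes
        runInitSubprocess
      where
        runInitSubprocess = putStrLn "running git-annex subprocess"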