And follow-on changes.
Note that relatedTemplate was changed to operate on a RawFilePath, and
so when it counts the length, it is now the number of bytes, not the
number of code points. This will just make it truncate to a shorter
string in some cases; the truncation is still unicode-aware.
When not building with the OsPath flag, toOsPath . fromRawFilePath and
fromRawFilePath . fromOsPath do extra conversions back and forth between
String and ByteString. That overhead could be avoided, but that's the
non-optimised build mode, so I didn't bother.
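For comparison, when the flag is enabled, the conversions can avoid
String entirely. A minimal sketch of the idea (POSIX-only, and not the
exact Utility.OsPath code): RawFilePath is a ByteString and OsString
wraps a ShortByteString, so converting is just a buffer copy, with no
decoding:

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Short as SB
    import System.OsString.Internal.Types (OsString(..), PosixString(..))

    type RawFilePath = B.ByteString

    -- copy the bytes into a ShortByteString; no encoding involved
    toOsPath :: RawFilePath -> OsString
    toOsPath = OsString . PosixString . SB.toShort

    -- and copy them back out again
    fromOsPath :: OsString -> RawFilePath
    fromOsPath (OsString (PosixString s)) = SB.fromShort s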
Sponsored-by: unqueued
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings, which often were in turn converted to
RawFilePaths.
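To illustrate the copy being talked about, a simplified sketch
(POSIX-only; the real dirContents returns full paths and has a
different type):

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Short as SB
    import System.Directory.OsPath (listDirectory)
    import System.OsPath (OsPath)
    import System.OsString.Internal.Types (OsString(..), PosixString(..))

    -- the fromShort is the ByteString copy mentioned above
    dirContents :: OsPath -> IO [B.ByteString]
    dirContents dir = map unwrap <$> listDirectory dir
      where
        unwrap (OsString (PosixString s)) = SB.fromShort s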
Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.
Note that Utility.OsString.toOsString on windows avoids only an encoding
check that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
Previously, when the git config could not be read from an ssh remote,
it would try to git fetch from it to determine if the remote was
otherwise accessible. That was unnecessary work, since exit status 255
indicates a connection problem.
As well as avoiding the extra work of the fetch, this also improves
things when an ssh remote cannot be connected to due to a problem with
the git-annex ssh control socket. In that situation, ssh will also exit 255.
Before, the git fetch was tried in that situation, and would succeed, since
it does not use the git-annex ssh control socket. git-annex would conclude
that git-annex-shell was not installed on the remote, which could be wrong.
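The check itself is simple; a minimal sketch (the helper name is made
up, but ssh's manpage documents that it exits 255 on error, and
otherwise exits with the remote command's exit status):

    import System.Exit (ExitCode(..))

    -- exit 255 means ssh itself failed (eg, could not connect),
    -- so probing further with git fetch is pointless
    sshConnectionError :: ExitCode -> Bool
    sshConnectionError (ExitFailure 255) = True
    sshConnectionError _ = False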
I suppose it also used to be possible for the user to need to enter an
ssh password on each connection to the remote. If they entered the wrong
password for the git-annex-shell call, but then the right password for
the git fetch, it would also incorrectly set annex-ignore, and that
situation is also now fixed.
* Removed the i386ancient standalone tarball build for linux, which
was increasingly unable to support new git-annex features.
* Removed support for building with ghc older than 9.0.2,
and with older versions of haskell libraries than are in current Debian
stable.
* stack.yaml: Update to lts-23.2.
Note that i386ancient was targeting linux 2.6.32, which has been EOL for
over 9 years now. Any old system still using such a kernel is certainly highly
insecure. And I suspect i386ancient had its own insecurities due to haskell
libraries and C libraries not having been updated.
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server, without needing to configure
remote.<name>.annexUrl in each repository.
While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/" for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.
That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.
Anyway, there are other use cases for this that don't involve annex+http.
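For example (hostnames and paths here are illustrative), following the
same semantics as pushInsteadOf:

    [url "ssh://annex.example.com/git/"]
        annexInsteadOf = https://example.com/git/

With that config, a remote whose url is "https://example.com/git/foo"
would be accessed by git-annex at "ssh://annex.example.com/git/foo",
while git itself keeps using the https url.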
Logically, this should make it need a lot less memory when files have
been changed many times. In my tests, it didn't seem to change memory
use at all. I'm unsure why, since it is working. It's possible the
Response is not getting garbage collected due to pinning. But as far as
I can see, all
parts of it that are retained get copied in a way that won't keep the
whole thing pinned in memory.
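For reference, the usual way to break ByteString pinning is to copy any
retained slice, so the large parent buffer can be collected. A generic
sketch, not the actual code:

    import qualified Data.ByteString as B

    -- a slice shares its parent's pinned buffer, keeping all of it
    -- alive; B.copy moves the slice into a fresh, smaller buffer
    retainSlice :: B.ByteString -> B.ByteString
    retainSlice = B.copy . B.take 16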
Fix infinite loop and memory blowup when importing from an unversioned S3
bucket that is large enough to need pagination.
I don't think there will actually ever be a Marker element, since a
delimiter is not set.
Probably this code path was never tested with pagination! Also the aws
library's lack of any docs made it easy to mess up.
Versioned buckets seem to not have the same problem. The API docs for
ListObjectVersions say that NextKeyMarker will always be provided when
paginating.
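A hedged sketch of the resulting loop; listPage and objectKey stand in
for the aws library's GetBucket call and object type, which have
different names and types:

    import Data.Text (Text)

    data Object = Object { objectKey :: Text }

    -- one ListObjects request; returns the page and whether it was
    -- truncated (hypothetical stand-in)
    listPage :: Maybe Text -> IO ([Object], Bool)
    listPage = undefined

    listAll :: Maybe Text -> IO [Object]
    listAll marker = do
        (objs, truncated) <- listPage marker
        if truncated && not (null objs)
            -- with no delimiter set there will be no NextMarker, and
            -- re-using the request's marker would loop forever, so
            -- continue from the key of the last object listed
            then (objs ++) <$> listAll (Just (objectKey (last objs)))
            else return objs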
Changed the protocol docs because servant parses "true" and "false" for
booleans in query parameters, not "1" and "0".
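For example, a minimal sketch of such a query parameter (the route here
is illustrative, not the actual p2phttp API definition):

    {-# LANGUAGE DataKinds, TypeOperators #-}
    import Servant

    -- servant's FromHttpApiData instance for Bool accepts
    -- "?datapresent=true", and rejects "?datapresent=1"
    type PutAPI = "put" :> QueryParam "datapresent" Bool
                        :> Post '[JSON] NoContent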
clientPut with datapresent=True is not used by git-annex, and I don't
anticipate it being used in git-annex, except for testing.
I've tested this by making clientPut be called with datapresent=True;
git-annex copy to a remote then succeeds once the object file has first
been manually copied to the remote. That would be a good test for the
test suite, but running the http server means exposing it to at least
localhost, and it would fail if a real http server was already running
on that port.
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
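Based on how the protocol's EXTENSIONS exchange works, the negotiation
would look something like this (the exact extension lists shown are
only illustrative):

    git-annex -> remote:  EXTENSIONS INFO ASYNC
    remote -> git-annex:  EXTENSIONS ORDERED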
The problem was that when the proxy requests a key be retrieved to its
own temp file, fileRetriever was retrieving it to the key's temp
location, and then moving it at the end, which broke streaming.
So, plumb through the path where the key is being retrieved to.
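A sketch of the shape of the change; the real types in git-annex are
different, this only shows the idea:

    data Key -- abstract here

    -- before, the retriever chose the key's temp file and the result
    -- was moved into place afterwards, which broke streaming; now the
    -- destination path is plumbed through to it
    type FileRetriever = FilePath -> Key -> IO ()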
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.
So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.
There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double-checked.
Not currently in a compilable state.
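A rough sketch of the call pattern described above; everything except
prepareLiveUpdate is a hypothetical stand-in, since the real signatures
are still in flux:

    import Control.Monad (when)

    data Key
    data LiveUpdate

    prepareLiveUpdate :: Key -> IO LiveUpdate
    prepareLiveUpdate = undefined

    checkPreferredContent :: LiveUpdate -> Key -> IO Bool
    checkPreferredContent = undefined

    removeContent :: Key -> IO ()
    removeContent = undefined

    updateLocationLog :: LiveUpdate -> Key -> IO ()
    updateLocationLog = undefined

    -- the live update is plumbed through both the preferred content
    -- check and the location log update
    dropKey :: Key -> IO ()
    dropKey key = do
        lu <- prepareLiveUpdate key
        ok <- checkPreferredContent lu key
        when ok $ do
            removeContent key
            updateLocationLog lu key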
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
This is not perfect because it does not handle versioned special
remotes, which should not be untrustworthy, but now are when proxied.
The implementation turned out to be easy, because the exporttree field
is a default field, so is available in RemoteConfig even for git
remotes.
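A simplified sketch of the check; git-annex's real RemoteConfig is not
a plain Map, but since exporttree is a default field, the idea is the
same even for git remotes:

    import qualified Data.Map as M

    type RemoteConfig = M.Map String String -- simplified stand-in

    -- a proxied remote with exporttree=yes has to be treated as
    -- untrustworthy, since exported content can be overwritten
    untrustworthyWhenProxied :: RemoteConfig -> Bool
    untrustworthyWhenProxied c = M.lookup "exporttree" c == Just "yes"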
This avoids needing to re-upload the file again to get it to the
annexobjects location, which git-annex sync was doing when it was
preferred content.
If the file is not preferred content, sync will drop it from the
annexobjects location.
If the file has been deleted from the tree, it will remain in the
annexobjects location until an unused/dropunused pass is done.
The file in the annexobjects location may have been renamed from a
previously exported file that got deleted in a subsequent export.
Or it may be renamed to annexobjects temporarily before being renamed to
another name (to handle eg pairwise renames).
But, an exported file is not guaranteed to contain the content of the
key that the local repository last exported there. Another tree could
have been exported from elsewhere in the meantime.
So, files in annexobjects do not necessarily have the content of their
key, and so have to be strongly verified when retrieving, the same as
is done when retrieving exported files.
Removing the key from the annexobjects location when it's also in the
exported tree would leave it in the exported tree, and so a successful
removal would update the location log incorrectly. But removal also
can't remove it from the exported tree, because that would cause import
tree to see a file got deleted. So, refuse to remove in this situation.
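In rough pseudologic (the helper names are hypothetical):

    data Key

    keyInExportedTree :: Key -> IO Bool
    keyInExportedTree = undefined -- hypothetical helper

    removeFromAnnexObjects :: Key -> IO Bool
    removeFromAnnexObjects = undefined -- hypothetical helper

    -- refuse when the key is also used by the exported tree, since a
    -- successful removal would update the location log incorrectly
    -- while content remained in the export
    removeKey :: Key -> IO Bool
    removeKey key = do
        inexport <- keyInExportedTree key
        if inexport
            then return False
            else removeFromAnnexObjects key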
It would be possible to remove from the annexobjects location and then
fail. Then if a key somehow got stored in both the annexobjects location
and the exported tree location(s), the duplicate would be resolved. Not
doing this because first, I don't know how that situation could happen,
and second, it seems wrong for a failed remove to have a side-effect
like that.