directory: When cp supports reflinks, use it when getting content from a
directory special remote.
Not yet for imports from directory though, and not for store.
Note that, when the content is chunked, using cp --reflink would not speed it
up, and when reflink was not supported, it would unnecessarily write the chunk
to a file before reading it back in. So, using a fileRetriever only in the
NoChunks case is necessary to keep chunking fast.
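A minimal sketch of that retrieval path (not git-annex's actual copy code;
the helper name is made up): try a reflink copy first and fall back to an
ordinary copy when cp fails.

    import System.Process (rawSystem)
    import System.Exit (ExitCode(..))
    import System.Directory (copyFile)

    copyReflinkOrFallback :: FilePath -> FilePath -> IO ()
    copyReflinkOrFallback src dst = do
        -- --reflink=always makes cp fail instead of silently copying
        -- the data, so the fallback path is taken explicitly.
        r <- rawSystem "cp" ["--reflink=always", "--", src, dst]
        case r of
            ExitSuccess -> return ()
            ExitFailure _ -> copyFile src dst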
fileCopier is told not to verify, because the special remote interface
does not yet support verification in passing. AFAICS, fileCopier can
never return False when not verifying, so the added giveup should never
actually happen.
Had to add to AnnexRead an indication of whether debugging is enabled.
Could have just made setupConsole not install a debug output action that
outputs, and have enableDebug be what installs that, but then in the
common case where there is no debug selector, and so all debug output is
selected, it would run the debug output action every time, which entails
an IORef access. Which would make fastDebug too slow..
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debugging about running processes.
The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.
Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
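A minimal sketch of the idea (illustrative types, not the real Annex ones):
the enabled flag is a plain field in AnnexRead, so the common no-debugging
case costs only a Bool check, with no IORef access.

    import Control.Monad (when)

    type DebugSelector = String

    data AnnexRead = AnnexRead
        { debugEnabled :: Bool
        , debugOutput :: DebugSelector -> String -> IO ()
        }

    -- Cheap check of a plain Bool; only when debugging is enabled does
    -- the (possibly IORef-backed) output action run at all.
    fastDebug :: AnnexRead -> DebugSelector -> String -> IO ()
    fastDebug rd sel msg = when (debugEnabled rd) $ debugOutput rd sel msg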
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticeable.
This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.
The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
Note that a key with no size field that is hard linked will
result in listImportableContents reporting a file size of 0,
rather than the actual size of the file. One result is that
the progress meter when getting the file will seem to get stuck
at 100%. Another is that the remote's preferred content expression,
if it tries to match against file size, will treat it as an empty file.
I don't see a way to improve the latter behavior, and the former behavior
is a minor enough problem.
This commit was sponsored by Jake Vosloo on Patreon.
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.
The directory special remote also mangles them that way. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so a S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.
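A simplified sketch of the round-trip involved (not the full escaping scheme
keyFile actually uses): parsing a borg-listed file back to a key must apply
the inverse of the mangling.

    -- Illustrative escaping only; the real keyFile escapes more chars.
    mangle :: String -> String
    mangle = concatMap esc
      where
        esc '/' = "%"       -- problem char replaced
        esc '%' = "&s"      -- the replacement char itself escaped
        esc '&' = "&a"      -- and the escape char escaped
        esc c   = [c]

    unmangle :: String -> String
    unmangle ('%':rest) = '/' : unmangle rest
    unmangle ('&':'s':rest) = '%' : unmangle rest
    unmangle ('&':'a':rest) = '&' : unmangle rest
    unmangle (c:rest) = c : unmangle rest
    unmangle [] = []

    -- Invariant relied on when parsing files from a borg backup:
    -- unmangle (mangle s) == s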
This commit was sponsored by Graham Spencer on Patreon.
New error message:
Remote foo not usable by git-annex; setting annex-ignore
http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1
If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.
This commit was sponsored by Mark Reidenbach on Patreon.
Not yet used, but allows getting the size of items in the tree fairly
cheaply.
I noticed that CmdLine.Seek uses ls-tree and then feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outweigh the benefit.
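For reference, a rough sketch of what consuming ls-tree --long for sizes
could look like (deliberately simplistic parsing, field layout per
git-ls-tree(1)):

    import System.Process (readProcess)
    import Text.Read (readMaybe)

    treeItemSizes :: String -> IO [(FilePath, Maybe Integer)]
    treeItemSizes treeish = do
        out <- readProcess "git" ["ls-tree", "--long", "-z", treeish] ""
        return (map parse (filter (not . null) (splitOn '\0' out)))
      where
        -- "<mode> <type> <object> <size>\t<file>"; size is "-" for
        -- anything that is not a blob.
        parse l = case break (== '\t') l of
            (meta, '\t':file) -> case words meta of
                [_, _, _, sz] -> (file, readMaybe sz)
                _ -> (file, Nothing)
            _ -> (l, Nothing)

    splitOn :: Char -> String -> [String]
    splitOn c = foldr f [[]]
      where
        f x acc@(a:rest)
            | x == c = [] : acc
            | otherwise = (x : a) : rest
        f x [] = [[x]]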
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.
Then this 4shared server has buggy handling of renames across directories.
While that was already worked around for the temp files used when storing
exports, by putting them in the same directory as the final filename, it also
affected renameExport when the file moves between directories.
I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to destination and
then still fail.
To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
When autoenabling special remotes of type S3, webdav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
This may work better with some webdav servers that get confused by
cross-collection renames. I don't know, let's find out.
The only real downside of doing this is that the temp files are not all
in the top-level collection, in case an interrupted run leaves one
behind. But that does not seem especially significant.
Which access a remote using rsync over ssh, and which git pushes to much
more efficiently than ssh urls.
There was some old partial support for rsync URIs from 2013, but it seemed
incomplete, and did not use rsync over ssh. Weird.
I'm not sure if there's any remaining benefit to using the non-rsync url
forms with gcrypt, now that this is implemented? Updated docs to encourage
using the rsync urls.
This commit was sponsored by Svenne Krap on Patreon.
Avoiding using a callback simplifies this and should make it easier to
implement incremental checksumming, which will need to happen partly in
writeRetrievedContent and partly in retrieveChunks.
This benchmarks only slightly faster than the old git-annex. Eg, for a 1
gb file, 14.56s vs 15.57s. (On a ram disk; there would certainly be
more of an effect if the file was written to disk and didn't stay in
cache.)
Commenting out the updateIncremental calls makes the same run in 6.31s.
It may be that overhead in the implementation, other than the actual
checksumming, is slowing it down. Eg, MVar access.
(I also tried using 10x larger chunks, which did not change the speed.)
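For context, a minimal sketch (not the actual implementation) of incremental
checksumming with cryptonite, showing where the per-chunk update cost comes
in; an MVar-guarded context would add locking on top of this.

    import qualified Crypto.Hash as H
    import qualified Data.ByteString as B
    import System.IO (Handle)

    writeAndHash :: Handle -> [B.ByteString] -> IO (H.Digest H.SHA256)
    writeAndHash h = go (H.hashInit :: H.Context H.SHA256)
      where
        go ctx [] = return (H.hashFinalize ctx)
        go ctx (c:cs) = do
            B.hPut h c
            -- the equivalent of an updateIncremental call per chunk
            go (H.hashUpdate ctx c) cs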
This is groundwork for calculating checksums while copying, rather than
in a separate pass, but that's not done yet. For now, avoid using rsync
(and cp on Windows), and instead read and write the file ourselves, with
resume handling.
Benchmarking vs old git-annex that used rsync, this is faster,
at least once the file size is larger than a couple of MB.
Changing to the P2P protocol broke this, because preseedTmp copies
the local copy of the object to the temp file, and then the P2P transfer
sees the right length file and uses it as-is.
When git-annex-shell is too old and rsync is used, it did verify the
content, and when the local repo does not have the object it did verify the
content.
This could perhaps have caused a hard link to be made when the content
of the object was modified. I don't think that actually happened,
because the annexed file would have to be unlocked, with annex.thin, for
the object to get modified, and in that case, a hard link is not made.
However, to be sure, run the check.
Note that it seemed best to run the check only once, although the
current implementation is fast and safe to run repeatedly.
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.
Not tested at all yet, but I imagine it will work!
Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
See my comment in the next commit for some details about why
Verified needs a hash with preimage resistance. As far as tahoe goes,
it's fully cryptographically secure.
I think that bup could also return Verified. However, the Retriever
interface does not currently support that.
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.
Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.
A few things that used fromAbsPath unnecessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.
Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.
Note that the progress meter display is not taken down when displaying
the message, so it will display like this:
0% 8 B 0 B/s
Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
0% 10 B 0 B/s
Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
and it does help show the state it was at when it stalled.
Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.
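A rough sketch of that structure (hypothetical names, simplified from the
real code): the watcher only reads a TVar and sleeps, so uninterruptibleCancel
is safe, while the real work runs in its own async that is joined at the end.

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.Async (async, wait, uninterruptibleCancel)
    import Control.Concurrent.STM
    import Control.Exception (finally)

    withStallWatcher :: TVar Integer -> IO a -> IO a
    withStallWatcher bytesvar action = do
        worker <- async action
        watcher <- async (watch bytesvar)
        wait worker `finally` uninterruptibleCancel watcher
      where
        watch v = do
            _ <- atomically (readTVar v)  -- STM only
            threadDelay 1000000           -- and sleeps
            watch v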
A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.
This commit was sponsored by Kevin Mueller on Patreon.
Don't accept the cid of the temp file that the content has just been
written to as something we will accept if another file has that same
content. There's no reason to, and on FAT, due to mtime resolution,
the test suite hit just such a case.
This fixes a reversion from 73df633a62
which removed inode from the ContentIdentifier.
Directory special remotes with importtree=yes now avoid unnecessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.
A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.
I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.
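Sketch of the zeroing, with an illustrative record rather than the real
InodeCache type: the serialization shape is unchanged, only the inode field
is forced to 0.

    data InodeCache = InodeCache
        { inode :: Integer
        , size :: Integer
        , mtime :: Integer
        } deriving (Eq, Show)

    zeroInode :: InodeCache -> InodeCache
    zeroInode c = c { inode = 0 }

    -- Two caches for the same file on a remounted FAT filesystem differ
    -- only by inode, so they compare equal once zeroed.
    sameContent :: InodeCache -> InodeCache -> Bool
    sameContent a b = zeroInode a == zeroInode b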
There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.
This commit was sponsored by Brett Eisenberg on Patreon.
After the last commit, it was able to throw errors just due to an
unparseable url. This avoids needing to worry about that, as long
as the call site has already checked that it has a parseable url.
Including the non-standard URI form that git-remote-gcrypt uses for rsync.
Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
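A small sketch of the tolerant handling (parseURI is from network-uri; the
wrapper type here is just for illustration):

    import Network.URI (URI, parseURI)

    data RemoteUrl = ParsedUrl URI | UnparseableUrl String

    remoteUrl :: String -> RemoteUrl
    remoteUrl s = maybe (UnparseableUrl s) ParsedUrl (parseURI s)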
This commit was sponsored by Luke Shumaker on Patreon.
This code I'm reverting works. But it has a problem: The export db and
log and the ContentIdentifier db and log still list the content as being
stored in the remote. So when I ran borg create again and stored the
content in borg again in a new archive, git-annex sync noticed that, but
since it didn't update the tree for the old archives, it then thought
the content that had been removed from them was still in them, and so
git-annex get failed in an ugly way:
Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched.
[2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1
user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1)
It does not seem worth it to update the git tree for the export when dropping
content, that would make drop of many files very expensive in git tree objects
created. So, let's not support this I suppose..
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?
Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
This makes sync a lot faster in the common case where there's no new
backup.
There's still room for it to be faster. Currently the old imported tree
has to be traversed, to generate the ImportableContents. Which then
gets turned around to generate the new imported tree, which is
identical. So, it would be possible to just return a "no new imports",
or an ImportableContents that has a way to graft in a tree. The latter
is probably too far to go to optimise this, unless other things need it.
The former might be worth it, but it's already pretty fast, since git
ls-tree is pretty fast.
It's unusual to use a ContentIdentifier that is not semi-unique
for different contents. Note that in importKeys, it checks if a content
identifier is one that's known before, to avoid downloading the same
content twice. But that's done in a code path not used for borg repos,
because they are thirdpartypopulated.
Still some issues to deal with, see TODO and XXX.
Here's what gets logged, for each key:
cid log:
1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg==
The "!Mj" are base64 encoded borg archive names, since mine were
dates and contained some characters not allowed in cid logs unescaped.
Those are archives that each contained the key. This list will grow as
more borg backups are done and learned about.
tree generated:
120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da
120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893
120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2
This commit was sponsored by Denis Dzyubenko on Patreon.
May actually work now.
Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.
So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, the tree is recorded in the export log, and gets grafted
into the git-annex branch.
importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.
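Illustrative only (types heavily simplified from the real interface): an
importKey that returns Nothing for files that are not annex objects, so they
are skipped when building the tree.

    import Data.List (isPrefixOf)
    import System.FilePath (takeFileName)

    type ImportLocation = FilePath
    type Key = String

    importKey :: ImportLocation -> Maybe Key
    importKey loc
        -- crude stand-in for parsing the file name as a key
        | any (`isPrefixOf` name) ["SHA256E-", "SHA256-"] = Just name
        | otherwise = Nothing
      where
        name = takeFileName loc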
It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.
Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.
(Untested and unused as of yet.)
This commit was sponsored by Jochen Bartl on Patreon.
I think this could cause unnecessary changes to the git-annex branch,
and retrieveExportWithContentIdentifier is now also used for getting
content from importtree=yes remotes, so it would happen more frequently;
let's avoid it.
This is better than using the equivalent actions for export remotes,
especially for getting content, since the ContentIdentifier checking
means we can be sure (enough) that the content is valid to not force
verification of content. Which allows getting keys of types that cannot
be verified.
Also, reorganized the internals of adjustExportImport which was becoming
very hard to follow. Now it's clear what each method does in each case.
Ah, it seemed too easy before when I was implementing importtree only,
and it was because all the key-based actions needed to be handled too.
Mostly copied from isexport, and this works. It does seem that
an import remote could use retrieveExportWithContentIdentifier
rather than retrieveExport, and checkPresentExportWithContentIdentifier
rather than checkPresentExport, which would both be more accurate.
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
Done on unix; could not quite implement it on windows.
The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.
Also, I don't know what the equivalent of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
All callers adjusted to update it themselves.
In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.
This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
There was no particular reason not to support this, other than maybe a lack
of a use case. One use case would of course be a remote that you want to
avoid overwriting content on. A new use case is the idea of importing from
backups, eg borg, where exporting is not necessarily supported at all.
This commit was sponsored by Brock Spratlen on Patreon.
It's not concurrent-output safe, and doesn't support
--json-error-messages.
Using Annex.makeRunner is a bit scary, because what if it's run in a
different thread from an active annex action? Normally the same Annex
state is not used concurrently in several threads, and it's not designed
to be fully concurrency safe. (Annex.Concurrent exists to deal with
that.) I think it will be ok in these simple cases though. Eg,
when buffering a warning message to json, Annex.changeState is used,
and it modifies the MVar in a concurrency safe way.
The only warningIO remaining is not a problem.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any resource that did leak would not be a problem.
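The shutdown ordering that implies, as a sketch (simplified; the real code
deals with more handles): cancel the thread that is blocked reading before
closing the handle, since hClose would otherwise block on the handle's MVar.

    import Control.Concurrent.Async (async, cancel)
    import System.IO (Handle, hClose)

    withReceiveLoop :: Handle -> (Handle -> IO ()) -> IO a -> IO a
    withReceiveLoop h receiveloop use = do
        reader <- async (receiveloop h)
        r <- use
        cancel reader  -- stop the reader first
        hClose h       -- now hClose cannot hang waiting for it
        return r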
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding (removeLink)
This commit was sponsored by Ethan Aubin.
It looks to me like the old code would have already dealt with the case
of ssh starting a ssh daemon that inherits stderr and keeps it open.
The ender thread closed the handle, which would unblock the other thread
and let it exit. Using hGetLineUntilExitOrEOF makes this more explicit
that it's dealt with and simplifies the code.
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.
Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:
* If two files point to one file, then eg, `git annex get foo` will
update the branch to unlock foo, but will not unlock bar, because it
does not know about it. Might be fixable by making `git annex get
bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info
Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
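For illustration, the kind of newtype this refers to might look like the
following (the real filtering criteria differ; this just shows the shape of
an Arbitrary instance that rules out values the properties cannot handle):

    import Test.QuickCheck

    newtype TestableString = TestableString { fromTestableString :: String }
        deriving (Show)

    instance Arbitrary TestableString where
        -- eg, exclude NUL, which cannot appear in filenames
        arbitrary = TestableString <$> arbitrary `suchThat` all (/= '\NUL')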
Lots of nice wins from this in avoiding unnecessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
removeFile changed to removeLink, because AFAICS it should be fine to
remove non-file things here. In particular, it's fine to remove a
symlink, since we're about to write a symlink. (removeLink does not
remove directories, so file, symlink, and unix socket are the only
possibilities.)
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.
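The helper's behavior, roughly (a sketch; the actual signature in git-annex
may differ): run the given removal action and swallow only does-not-exist
errors, so removeWhenExistsWith removeLink path is safe whether or not the
path exists.

    import Control.Exception (throwIO, try)
    import System.IO.Error (isDoesNotExistError)

    removeWhenExistsWith :: (a -> IO ()) -> a -> IO ()
    removeWhenExistsWith remover f = do
        r <- try (remover f)
        case r of
            Right () -> return ()
            Left e
                | isDoesNotExistError e -> return ()  -- already gone
                | otherwise -> throwIO e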
This commit was sponsored by Graham Spencer on Patreon.
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrieves
content without first checking if it's present, AFAICR. And really this will
only be needed when using fsck. Putting it here, rather than in fsck,
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
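Sketched with hypothetical names: the candidate chunk sizes are whatever the
chunk log recorded, plus the remote's configured chunk size as a speculative
fallback, tried in turn.

    chunkSizesToTry :: [Integer] -> Maybe Integer -> [Integer]
    chunkSizesToTry recorded configured =
        recorded ++ filter (`notElem` recorded) (maybe [] (: []) configured)

    -- The caller tries each candidate until one succeeds.
    firstWorking :: (Integer -> IO Bool) -> [Integer] -> IO Bool
    firstWorking _ [] = return False
    firstWorking check (sz:rest) = do
        ok <- check sz
        if ok then return True else firstWorking check rest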
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"
Fixed two cases where the Haskell98 do indentation flexibility let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.
The other place that I think I was concerned about, is GHC manual
section 19.1.1.3. Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.
This commit was sponsored by Denis Dzyubenko on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Among other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.
My thinking is, ResourceT may allocate a resource and then free it,
and an unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.
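The defensive pattern that amounts to, as a sketch (hypothetical example, not
the actual S3 code): fully force whatever is returned out of a runResourceT
block before the block exits, so no thunk can still reference a freed
resource.

    import Control.DeepSeq (force)
    import Control.Exception (evaluate)
    import Control.Monad.IO.Class (liftIO)
    import Control.Monad.Trans.Resource (runResourceT)

    getSmallResult :: IO [String]
    getSmallResult = runResourceT $ do
        -- stands in for values read via a managed resource
        let values = ["example", "metadata"]
        liftIO (evaluate (force values))  -- forced before release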
This commit was sponsored by Graham Spencer on Patreon.
So these special remotes are always supported.
IIRC these build flags were added because the dep chains were a bit too
long, or perhaps because the libraries were not available in Debian stable,
or something like that. That was long ago, those reasons no longer apply,
and users get confused when builtin special remotes are not available, so
it seems best to remove the build flags now.
If this does cause a problem it can be reverted of course..
This commit was sponsored by Jochen Bartl on Patreon.
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.
Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.
Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
Also tested what happens if the other special remote has importtree=yes
and exporttree=yes, and in that case, download via httpalso works too,
without needing to implement any importtree methods here.
It might be possible to make it automatically set exporttree=yes if the
--sameas does. Didn't try, will probably be layering issues.
Or perhaps it should be inherited by sameas like some
other configs? But then, wouldn't it also make sense to inherit
importtree=yes? But as shown here, it's not needed by this kind of
remote.
"http" was too generic and easy to confuse with web. The new name makes
clear it's used in addition to some other remote. And other protocols
can use the same naming scheme.
I'm sure this used to work, but somewhere along the line something or
things (getCost and getAvailability I think, probably others)
started catching the exception and not displaying it. So, show warnings.
TVars were not updated atomically, which was ok when each thread got its
own External that was the only thing using these TVars. But, with the
async extension, several External instances can share the same var, so
it needs to be a TMVar to avoid read/write conflicts.
In particular, this makes PREPARE only be sent once.
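A minimal sketch of the difference (illustrative, not the real External
type): with a shared TMVar, take/put brackets the read-modify-write, so only
one thread can decide that PREPARE still needs sending.

    import Control.Concurrent.STM

    data PrepareState = Unprepared | Prepared

    ensurePrepared :: TMVar PrepareState -> IO () -> IO ()
    ensurePrepared var sendPrepare = do
        st <- atomically (takeTMVar var)  -- other threads block here
        case st of
            Prepared -> atomically (putTMVar var Prepared)
            Unprepared -> do
                sendPrepare               -- runs at most once
                atomically (putTMVar var Prepared)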
Moving jobid generation to the git-annex side lets it be simplified a
lot.
Note that it will also be possible to generate one jobid per connection,
rather than a new job per request. That will make overflow not an issue,
and will avoid some work, and will simplify some of the code.
Receive loop looks right. Still need the send loop.
And, a complication is that some messages git-annex
sends need to be wrapped in REPLY_ASYNC, while others
do not. So will probably need to split externalSend
into two.
Idea is for ASYNC extension, it will instead contain methods that communicate
with the thread that handles all communication with the external process.
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.
This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unnecessary
inAnnex check for every file.
This commit was sponsored by Denis Dzyubenko on Patreon.
adb shell has sha256sum, sha1sum, and some others, so they could be used.
They're provided by toybox, so seem about as likely to keep
working as find and stat, which it already depends on.
Or, to not add a dep, could use stat the same way getExportContentIdentifier
does to get an mtime, and make a WORM key. But do I really want this to
default to WORM?
Unsure what's the best path, so punting for now.
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.
git-annex sync --no-content does not yet use this to do imports
Trivial since git-annex cannot remove, but do an active checkKey verification
anyway, in case the data was lost somehow.
This commit was sponsored by Ryan Newton on Patreon.
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certainly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
Otherwise use the vendored copy as before.
The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.
Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-git-lfs-dev.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Fix bug that made creds not be stored in git when a special remote was
initialized with gpg encryption, but without an explicit embedcreds=yes.
(Yet another regression introduced in version 7.20200202.7. 5th so far.)
This makes the creds get saved, since only things recorded there will be
saved.
IIRC, unparsedRemoteConfig was not originally available when I
implemented this; now that it is things get a bit simpler.
More could probably be simplified; is externalConfigChanges needed at
all?
This does not entirely fix the bugs though, because creds are only
embedded when embedcreds=yes, but not when encryption=pubkey is used
without embedcreds=yes.
* Improve display of problems auto-initializing or upgrading local git
remotes.
* When a local git remote cannot be initialized because it has no
git-annex branch or a .noannex file, avoid displaying a message about it.
So stop documenting it, and stop offering it as a choice in the assistant.
Removed the code that parses it into S3.ReducedRedundancy, because
S3.OtherStorageClass with the value will work just the same and avoids a
special case for this deprecated storage class.
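So the mapping becomes just this (assuming the aws package's
S3.OtherStorageClass constructor takes a Text, as the removed special case
suggests):

    import qualified Aws.S3 as S3
    import qualified Data.Text as T

    -- any configured storageclass value, including the deprecated
    -- REDUCED_REDUNDANCY, is passed through as-is
    storageClassFor :: String -> S3.StorageClass
    storageClassFor = S3.OtherStorageClass . T.pack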
Some recent changes to use mask missed that async exceptions can still
be thrown inside it. The goal is to make sure a block of cleanup code
runs entirely, w/o being interrupted by an async exception, so use
uninterruptibleMask.
Also, converted a few to bracket, which is nicer.
Since an external process can be in the middle of some operation when an
async exception is received, it has to be shut down then. Using
cleanupProcess will close its IO handles and send it a SIGTERM.
If a special remote chooses to catch SIGTERM, it's fine for it to do some
cleanup then, but until it finishes, git-annex will be blocked waiting
for it. If a special remote blocked SIGTERM, it would cause a hang.
Mentioned in docs.
Also, in passing, fixed a FD leak, it was not closing the error handle
when shutting down the external. In practice that didn't matter before because
it was only run when git-annex was itself shutting down, but now that it
can run on exception, it would have been a problem.
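The shutdown can be sketched like this (simplified; the real code also
handles the normal shutdown path and masking): cleanupProcess from
System.Process closes all the handles, including stderr, and sends SIGTERM.

    import Control.Exception (onException)
    import System.IO (Handle)
    import System.Process

    withExternal
        :: CreateProcess
        -> ((Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle) -> IO a)
        -> IO a
    withExternal cp use = do
        p <- createProcess cp
        -- on an async exception mid-operation, shut the process down
        use p `onException` cleanupProcess p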
Fixes reversion in recent conversions; the old code relied on the GC
apparently, but the new code explicitly waits on the process, so it must
close the stdin handle first or the command will never exit.
It was not the wrong handle. The handle was not being closed, so bup
kept running.
Before 2670890b17, the code was:
withHandle StdinHandle createProcessSuccess cmd feeder
The stdin handle was not closed by the feeder.
Testing this:
withHandle StdinHandle createProcessSuccess (proc "cat" []) (\h -> hPutStrLn h "hi")
There's a rather long pause, a couple seconds, before it completes, but
it does complete. With hClose h, it immediately completes. This must be
the GC noticing that h is out of scope and closing it.
It seems likely that the old code worked only by that accident.
So, other similar changes made in that and nearby commits may also
have this problem, and need to explicitly close handles that were
somehow implicitly closed before.
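For comparison, the explicit-close version of the test above completes
immediately rather than waiting for the GC:

    withHandle StdinHandle createProcessSuccess (proc "cat" [])
        (\h -> hPutStrLn h "hi" >> hClose h)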