2014-08-03 19:35:23 +00:00
|
|
|
{- helpers for special remotes
|
2011-03-30 18:00:54 +00:00
|
|
|
-
|
incremental hashing for fileRetriever
It uses tailVerify to hash the file while it's being written.
This can sometimes avoid a separate checksum step. However,
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the separate checksum still happens.
Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.
The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 19:43:29 +00:00
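The idea of hashing content as it arrives, rather than in a separate pass, can be sketched as below. This is a minimal illustration only, not git-annex's tailVerify: `HashState`, `initState`, `updateState` and `hashChunks` are hypothetical names, and FNV-1a is used purely as a small stand-in for a real cryptographic hash. It shows that feeding chunks one at a time gives the same result as hashing the whole content at once.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical stand-in for a real incremental hash context.
newtype HashState = HashState Word64
  deriving (Eq, Show)

-- FNV-1a 64-bit offset basis (not a cryptographic hash; illustration only).
initState :: HashState
initState = HashState 14695981039346656037

-- Feed one chunk of content into the running state.
updateState :: HashState -> S.ByteString -> HashState
updateState (HashState h) = HashState . S.foldl' step h
  where
    -- XOR in the byte, then multiply by the FNV prime (wraps modulo 2^64).
    step acc w = (acc `xor` fromIntegral w) * 1099511628211

-- Hash a lazy ByteString by folding over its chunks as they become available.
hashChunks :: L.ByteString -> HashState
hashChunks = foldl' updateState initState . L.toChunks

main :: IO ()
main = do
  let chunked = L.fromChunks [S.pack [1, 2, 3], S.pack [4, 5]]
      oneShot = updateState initState (S.pack [1, 2, 3, 4, 5])
  -- The chunk-by-chunk result matches hashing the whole content in one go.
  print (hashChunks chunked == oneShot)
```

The failure mode described in the commit message corresponds to the reader never seeing some chunks: the running state is then incomplete, and a full checksum pass is still needed.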
|
|
|
- Copyright 2011-2021 Joey Hess <id@joeyh.name>
|
2011-03-30 18:00:54 +00:00
|
|
|
-
|
2019-03-13 19:48:14 +00:00
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
2011-03-30 18:00:54 +00:00
|
|
|
-}
|
|
|
|
|
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).
Not done when using chunking, yet.
This was complicated by Retriever needing to become polymorphic, which in turn
meant RankNTypes is needed, along with some other code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.
Unfortunately, directory uses fileRetriever (except when chunked),
so it is not among the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.
Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.
Sponsored-by: Dartmouth College's DANDI project
2021-08-11 17:43:30 +00:00
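Why a polymorphic retriever needs RankNTypes can be seen in a much reduced sketch. The types here (`Content`, `Retriever`, `Remote`, `demoRemote`) are simplified, hypothetical stand-ins and not the real definitions from Types.StoreRetrieve; keys are plain strings and everything runs in IO rather than Annex.

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical, simplified content source.
newtype Content = ByteContent L.ByteString

-- The retriever is polymorphic in the callback's result type, so the
-- caller decides what is returned (for example, a verification result).
type Retriever = forall a. String -> (Content -> IO a) -> IO a

-- Storing a polymorphic function in a record field is what requires RankNTypes.
data Remote = Remote { retrieve :: Retriever }

-- A toy remote that makes up content for any key.
demoRemote :: Remote
demoRemote = Remote
  { retrieve = \key callback ->
      callback (ByteContent (L.pack ("content of " ++ key)))
  }

main :: IO ()
main = do
  -- The same retriever is used at two different result types.
  size <- retrieve demoRemote "k1" (\(ByteContent b) -> pure (L.length b))
  text <- retrieve demoRemote "k1" (\(ByteContent b) -> pure (L.unpack b))
  print size
  putStrLn text
```

Because the result type is chosen by each caller, code that fixes it too early can hit the "rigid, skolem type" errors mentioned above.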
|
|
|
{-# LANGUAGE RankNTypes #-}
|
2019-11-27 20:54:11 +00:00
|
|
|
{-# LANGUAGE OverloadedStrings #-}
|
|
|
|
|
2014-08-03 19:35:23 +00:00
|
|
|
module Remote.Helper.Special (
|
|
|
|
findSpecialRemotes,
|
|
|
|
gitConfigSpecialRemote,
|
2018-09-25 19:32:50 +00:00
|
|
|
mkRetrievalVerifiableKeysSecure,
|
2014-08-03 19:35:23 +00:00
|
|
|
Storer,
|
|
|
|
Retriever,
|
run Preparer to get Remover and CheckPresent actions
This will allow special remotes to eg, open a http connection and reuse it,
while checking if chunks are present, or removing chunks.
S3 and WebDAV both need this to support chunks with reasonable speed.
Note that a special remote might want to cache a http connection across
multiple requests. A simple case of this is that CheckPresent is typically
called before Store or Remove. A remote using this interface can certainly
use a Preparer that eg, uses an MVar to cache a http connection.
However, it's up to the remote to then deal with things like stale or
stalled http connections when eg, doing a series of downloads from a remote
and other places. There could be long delays between calls to a remote,
which could lead to eg, http connection stalls; the machine might even
move to a new network, etc.
It might be nice to improve this interface later to allow
the simple case without needing to handle the full complex case.
One way to do it would be to have a `Transaction SpecialRemote cache`,
where SpecialRemote contains methods for Storer, Retriever, Remover, and
CheckPresent, that all expect to be passed a `cache`.
2014-08-06 18:28:36 +00:00
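Caching a connection in an MVar, as suggested in the message above, might look roughly like this. It is a sketch under stated assumptions: `Conn`, `openConn` and `getConn` are hypothetical names, and the "connection" is a dummy value rather than a real http connection. It does not deal with stale or stalled connections, which the message leaves up to the remote.

```haskell
import Control.Concurrent.MVar

-- Hypothetical connection handle; a real remote would hold an http
-- manager or similar here.
newtype Conn = Conn Int
  deriving Show

-- Hypothetical: opening a connection is assumed to be the expensive step.
openConn :: IO Conn
openConn = do
  putStrLn "opening connection"
  pure (Conn 1)

-- Return the cached connection, opening one only on first use.
getConn :: MVar (Maybe Conn) -> IO Conn
getConn cache = modifyMVar cache $ \cached -> case cached of
  Just c -> pure (cached, c)
  Nothing -> do
    c <- openConn
    pure (Just c, c)

main :: IO ()
main = do
  cache <- newMVar Nothing
  c1 <- getConn cache -- eg, used while checking presence
  c2 <- getConn cache -- eg, used while storing; reuses the cached connection
  print (c1, c2)
```

modifyMVar holds the lock while the connection is opened, so concurrent callers do not open duplicate connections.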
|
|
|
Remover,
|
|
|
|
CheckPresent,
|
2014-08-03 19:35:23 +00:00
|
|
|
ContentSource,
|
|
|
|
fileStorer,
|
|
|
|
byteStorer,
|
|
|
|
fileRetriever,
|
2021-08-16 20:22:00 +00:00
|
|
|
fileRetriever',
|
2014-08-03 19:35:23 +00:00
|
|
|
byteRetriever,
|
|
|
|
storeKeyDummy,
|
2020-05-13 21:05:56 +00:00
|
|
|
retrieveKeyFileDummy,
|
2014-08-06 18:28:36 +00:00
|
|
|
removeKeyDummy,
|
|
|
|
checkPresentDummy,
|
2014-08-03 19:35:23 +00:00
|
|
|
SpecialRemoteCfg(..),
|
|
|
|
specialRemoteCfg,
|
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.
The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.
configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parser, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.
Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.
Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.
initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 19:30:14 +00:00
|
|
|
specialRemoteConfigParsers,
|
2020-01-14 16:35:08 +00:00
|
|
|
specialRemoteType,
|
2014-08-03 19:35:23 +00:00
|
|
|
specialRemote,
|
|
|
|
specialRemote',
|
2019-10-10 17:08:17 +00:00
|
|
|
lookupName,
|
2014-08-03 19:35:23 +00:00
|
|
|
module X
|
|
|
|
) where
|
2011-03-30 18:00:54 +00:00
|
|
|
|
2016-01-20 20:36:33 +00:00
|
|
|
import Annex.Common
|
2019-10-10 16:48:26 +00:00
|
|
|
import Annex.SpecialRemote.Config
|
2014-08-03 19:35:23 +00:00
|
|
|
import Types.StoreRetrieve
|
2011-06-02 01:56:04 +00:00
|
|
|
import Types.Remote
|
2021-08-13 19:43:29 +00:00
|
|
|
import Annex.Verify
|
2019-10-10 16:37:47 +00:00
|
|
|
import Annex.UUID
|
2024-10-15 18:29:06 +00:00
|
|
|
import Annex.Perms
|
2015-08-17 15:21:13 +00:00
|
|
|
import Config
|
2014-08-03 19:35:23 +00:00
|
|
|
import Config.Cost
|
|
|
|
import Utility.Metered
|
|
|
|
import Remote.Helper.Chunked as X
|
convert WebDAV to new special remote interface, adding new-style chunking support
Reusing http connection when operating on chunks is not done yet,
I had to submit some patches to DAV to support that. However, this is no
slower than old-style chunking was.
Note that it's a fileRetriever and a fileStorer, despite DAV using
bytestrings that would allow streaming. As a result, upload/download of
encrypted files is made a bit more expensive, since it spools them to temp
files. This was needed to get the progress meters to work.
There are probably ways to avoid that. But it turns out that the current
DAV interface buffers the whole file content in memory, and I have
sent in a patch to DAV to improve its interfaces. Using the new interfaces,
it's certainly going to need to be a fileStorer, in order to read the file
size from the file (getting the size of a bytestring would destroy
laziness). It should be possible to use the new interface to make it be a
byteRetriever, so I'll change that when I get to it.
This commit was sponsored by Andreas Olsson.
2014-08-06 20:55:32 +00:00
|
|
|
import Remote.Helper.Encryptable as X
|
2014-08-03 19:35:23 +00:00
|
|
|
import Annex.Content
|
2015-04-03 19:33:28 +00:00
|
|
|
import Messages.Progress
|
2011-06-30 17:16:57 +00:00
|
|
|
import qualified Git
|
2011-12-13 19:05:07 +00:00
|
|
|
import qualified Git.Construct
|
2019-12-02 14:57:09 +00:00
|
|
|
import Git.Types
|
2011-03-30 18:00:54 +00:00
|
|
|
|
2019-11-27 20:54:11 +00:00
|
|
|
import qualified Data.ByteString as S
|
2014-08-03 19:35:23 +00:00
|
|
|
import qualified Data.ByteString.Lazy as L
|
|
|
|
import qualified Data.Map as M
|
|
|
|
|
2011-03-30 18:00:54 +00:00
|
|
|
{- Special remotes don't have a configured url, so Git.Repo does not
|
|
|
|
- automatically generate remotes for them. This looks for a different
|
|
|
|
- configuration key instead.
|
|
|
|
-}
|
|
|
|
findSpecialRemotes :: String -> Annex [Git.Repo]
|
|
|
|
findSpecialRemotes s = do
|
2011-12-14 19:30:14 +00:00
|
|
|
m <- fromRepo Git.config
|
2021-04-23 17:28:23 +00:00
|
|
|
liftIO $ catMaybes <$> mapM construct (remotepairs m)
|
2012-11-11 04:51:07 +00:00
|
|
|
where
|
|
|
|
remotepairs = M.toList . M.filterWithKey match
|
2019-11-27 20:54:11 +00:00
|
|
|
construct (k,_) = Git.Construct.remoteNamedFromKey k
|
|
|
|
(pure Git.Construct.fromUnknown)
|
2019-12-02 14:57:09 +00:00
|
|
|
match (ConfigKey k) _ =
|
|
|
|
"remote." `S.isPrefixOf` k
|
2021-08-11 00:45:02 +00:00
|
|
|
&& (".annex-" <> encodeBS s) `S.isSuffixOf` k
|
2011-03-30 18:00:54 +00:00
|
|
|
|
|
|
|
{- Sets up configuration for a special remote in .git/config. -}
|
2018-03-27 16:41:57 +00:00
|
|
|
gitConfigSpecialRemote :: UUID -> RemoteConfig -> [(String, String)] -> Annex ()
|
|
|
|
gitConfigSpecialRemote u c cfgs = do
|
|
|
|
forM_ cfgs $ \(k, v) ->
|
2021-08-11 00:45:02 +00:00
|
|
|
setConfig (remoteAnnexConfig c (encodeBS k)) v
|
2020-02-19 17:45:11 +00:00
|
|
|
storeUUIDIn (remoteAnnexConfig c "uuid") u
|
2014-08-03 19:35:23 +00:00
|
|
|
|
2018-09-25 19:32:50 +00:00
|
|
|
-- RetrievalVerifiableKeysSecure unless overridden by git config.
|
|
|
|
--
|
|
|
|
-- Only looks at the RemoteGitConfig; the GitConfig's setting is
|
|
|
|
-- checked at the same place the RetrievalSecurityPolicy is checked.
|
|
|
|
mkRetrievalVerifiableKeysSecure :: RemoteGitConfig -> RetrievalSecurityPolicy
|
|
|
|
mkRetrievalVerifiableKeysSecure gc
|
|
|
|
| remoteAnnexAllowUnverifiedDownloads gc = RetrievalAllKeysSecure
|
|
|
|
| otherwise = RetrievalVerifiableKeysSecure
|
|
|
|
|
2014-08-03 19:35:23 +00:00
|
|
|
-- A Storer that expects to be provided with a file containing
|
|
|
|
-- the content of the key to store.
|
2020-05-13 18:03:00 +00:00
|
|
|
fileStorer :: (Key -> FilePath -> MeterUpdate -> Annex ()) -> Storer
|
2014-08-03 19:35:23 +00:00
|
|
|
fileStorer a k (FileContent f) m = a k f m
|
|
|
|
fileStorer a k (ByteContent b) m = withTmp k $ \f -> do
|
2020-10-29 18:20:57 +00:00
|
|
|
let f' = fromRawFilePath f
|
|
|
|
liftIO $ L.writeFile f' b
|
|
|
|
a k f' m
|
2014-08-03 19:35:23 +00:00
|
|
|
|
|
|
|
-- A Storer that expects to be provided with a L.ByteString of
|
|
|
|
-- the content to store.
|
2020-05-13 18:03:00 +00:00
|
|
|
byteStorer :: (Key -> L.ByteString -> MeterUpdate -> Annex ()) -> Storer
|
2014-08-03 19:35:23 +00:00
|
|
|
byteStorer a k c m = withBytes c $ \b -> a k b m
|
|
|
|
|
2021-08-13 19:43:29 +00:00
|
|
|
-- A Retriever that generates a lazy ByteString containing the Key's
|
|
|
|
-- content, and passes it to a callback action which will fully consume it
|
|
|
|
-- before returning.
|
2024-10-15 18:29:06 +00:00
|
|
|
byteRetriever :: (Key -> (L.ByteString -> Annex a) -> Annex a) -> Key -> MeterUpdate -> RawFilePath -> Maybe IncrementalVerifier -> (ContentSource -> Annex a) -> Annex a
|
|
|
|
byteRetriever a k _m _dest _miv callback = a k (callback . ByteContent)
|
2021-08-13 19:43:29 +00:00
|
|
|
|
2024-10-15 18:29:06 +00:00
|
|
|
-- A Retriever that writes the content of a Key to a file.
|
2021-08-13 19:43:29 +00:00
|
|
|
-- The action is responsible for updating the progress meter as it
|
|
|
|
-- retrieves data. The incremental verifier is updated in the background as
|
2021-08-16 18:50:21 +00:00
|
|
|
-- the action writes to the file, but may not be updated with the entire
|
|
|
|
-- content of the file.
|
2021-08-16 20:22:00 +00:00
|
|
|
fileRetriever :: (RawFilePath -> Key -> MeterUpdate -> Annex ()) -> Retriever
|
2022-05-09 17:18:47 +00:00
|
|
|
fileRetriever a = fileRetriever' $ \f k m miv ->
|
2021-08-16 20:22:00 +00:00
|
|
|
let retrieve = a f k m
|
2022-05-09 17:18:47 +00:00
|
|
|
in tailVerify miv f retrieve
|
2021-08-16 20:22:00 +00:00
|
|
|
|
2024-10-15 18:29:06 +00:00
|
|
|
{- A Retriever that writes the content of a Key to a file.
|
2021-08-16 20:22:00 +00:00
|
|
|
- The action is responsible for updating the progress meter and the
|
|
|
|
- incremental verifier as it retrieves data.
|
|
|
|
-}
|
|
|
|
fileRetriever' :: (RawFilePath -> Key -> MeterUpdate -> Maybe IncrementalVerifier -> Annex ()) -> Retriever
|
2024-10-15 18:29:06 +00:00
|
|
|
fileRetriever' a k m dest miv callback = do
|
|
|
|
createAnnexDirectory (parentDir dest)
|
|
|
|
a dest k m miv
|
|
|
|
pruneTmpWorkDirBefore dest (callback . FileContent . fromRawFilePath)
|
2014-08-03 19:35:23 +00:00
|
|
|
|
|
|
|
{- The base Remote that is provided to specialRemote needs to have
|
2014-08-06 20:55:32 +00:00
|
|
|
- storeKey, retrieveKeyFile, removeKey, and checkPresent methods,
|
2014-08-06 18:28:36 +00:00
|
|
|
- but they are never actually used (since specialRemote replaces them).
|
2014-08-03 19:35:23 +00:00
|
|
|
- Here are some dummy ones.
|
|
|
|
-}
|
2024-07-01 14:42:27 +00:00
|
|
|
storeKeyDummy :: Key -> AssociatedFile -> Maybe FilePath -> MeterUpdate -> Annex ()
|
|
|
|
storeKeyDummy _ _ _ _ = error "missing storeKey implementation"
|
2021-08-17 16:41:36 +00:00
|
|
|
retrieveKeyFileDummy :: Key -> AssociatedFile -> FilePath -> MeterUpdate -> VerifyConfig -> Annex Verification
|
|
|
|
retrieveKeyFileDummy _ _ _ _ _ = error "missing retrieveKeyFile implementation"
|
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
2024-07-04 16:23:46 +00:00
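The "closest expiry time" rule from the message above can be illustrated with a small sketch. The names `Seconds`, `closestExpiry` and `proofExpired` are hypothetical, plain integer seconds stand in for POSIXTime, and treating a proof as expired once the current time reaches the expiry is an assumption rather than something the commit states.

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for POSIXTime, in whole seconds.
type Seconds = Integer

-- Pick the earliest expiry among the locked copies that have one.
-- Copies without an expiry (Nothing) do not constrain the proof.
closestExpiry :: [Maybe Seconds] -> Maybe Seconds
closestExpiry expiries = case catMaybes expiries of
  [] -> Nothing
  ts -> Just (minimum ts)

-- A proof with no expiry never expires; otherwise it is treated as
-- expired once the current time reaches the expiry (an assumption).
proofExpired :: Seconds -> Maybe Seconds -> Bool
proofExpired now = maybe False (now >=)

main :: IO ()
main = do
  let expiry = closestExpiry [Just 600, Nothing, Just 300]
  print expiry
  print (proofExpired 200 expiry)
  print (proofExpired 400 expiry)
```

As the message notes, using the earliest expiry is conservative: the proof may be considered expired even though another locked copy is still valid.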
|
|
|
removeKeyDummy :: Maybe SafeDropProof -> Key -> Annex ()
|
|
|
|
removeKeyDummy _ _ = error "missing removeKey implementation"
|
2014-08-06 18:28:36 +00:00
|
|
|
checkPresentDummy :: Key -> Annex Bool
|
|
|
|
checkPresentDummy _ = error "missing checkPresent implementation"
|
|
|
|
|
|
|
|
type RemoteModifier
|
2020-01-13 16:35:39 +00:00
|
|
|
= ParsedRemoteConfig
|
2020-05-13 15:50:31 +00:00
|
|
|
-> Storer
|
|
|
|
-> Retriever
|
|
|
|
-> Remover
|
|
|
|
-> CheckPresent
|
2014-08-06 18:28:36 +00:00
|
|
|
-> Remote
|
|
|
|
-> Remote
|
2014-08-03 19:35:23 +00:00
|
|
|
|
|
|
|
data SpecialRemoteCfg = SpecialRemoteCfg
|
|
|
|
{ chunkConfig :: ChunkConfig
|
|
|
|
, displayProgress :: Bool
|
|
|
|
}
|
|
|
|
|
2020-01-13 16:35:39 +00:00
|
|
|
specialRemoteCfg :: ParsedRemoteConfig -> SpecialRemoteCfg
|
2014-08-03 19:35:23 +00:00
|
|
|
specialRemoteCfg c = SpecialRemoteCfg (getChunkConfig c) True
|
|
|
|
|
2020-01-14 16:35:08 +00:00
|
|
|
-- Modifies a base RemoteType to support chunking and encryption configs.
|
|
|
|
specialRemoteType :: RemoteType -> RemoteType
|
|
|
|
specialRemoteType r = r
|
2020-01-17 19:30:14 +00:00
|
|
|
{ configParser = \c -> addRemoteConfigParser specialRemoteConfigParsers
|
|
|
|
<$> configParser r c
|
2020-01-14 16:35:08 +00:00
|
|
|
}
|
|
|
|
|
2020-01-14 17:18:15 +00:00
|
|
|
specialRemoteConfigParsers :: [RemoteConfigFieldParser]
|
|
|
|
specialRemoteConfigParsers = chunkConfigParsers ++ encryptionConfigParsers
|
2020-01-14 16:35:08 +00:00
|
|
|
|
2014-08-03 19:35:23 +00:00
|
|
|
-- Modifies a base Remote to support both chunking and encryption,
|
|
|
|
-- which special remotes typically should support.
|
2019-01-31 17:34:12 +00:00
|
|
|
--
|
|
|
|
-- Handles progress displays when displayProgress is set.
|
2014-08-03 19:35:23 +00:00
|
|
|
specialRemote :: RemoteModifier
|
|
|
|
specialRemote c = specialRemote' (specialRemoteCfg c) c
|
|
|
|
|
|
|
|
specialRemote' :: SpecialRemoteCfg -> RemoteModifier
|
2020-05-13 15:50:31 +00:00
|
|
|
specialRemote' cfg c storer retriever remover checkpresent baser = encr
|
2014-08-03 19:35:23 +00:00
|
|
|
where
|
|
|
|
encr = baser
|
2024-07-01 14:42:27 +00:00
|
|
|
{ storeKey = \k _af o p -> cip >>= storeKeyGen k o p
|
2021-08-17 16:41:36 +00:00
|
|
|
, retrieveKeyFile = \k _f d p vc -> cip >>= retrieveKeyFileGen k d p vc
|
2020-05-13 21:05:56 +00:00
|
|
|
, retrieveKeyFileCheap = case retrieveKeyFileCheap baser of
|
|
|
|
Nothing -> Nothing
|
|
|
|
Just a
|
|
|
|
-- retrieval of encrypted keys is never cheap
|
|
|
|
| isencrypted -> Nothing
|
|
|
|
| otherwise -> Just $ \k f d -> a k f d
|
2018-06-21 15:35:27 +00:00
|
|
|
-- When encryption is used, the remote could provide
|
|
|
|
-- some other content encrypted by the user, and trick
|
|
|
|
-- git-annex into decrypting it, leaking the decryption
|
|
|
|
-- into the git-annex repository. Verifiable keys
|
|
|
|
-- are the main protection against this attack.
|
|
|
|
, retrievalSecurityPolicy = if isencrypted
|
2018-09-25 19:32:50 +00:00
|
|
|
then mkRetrievalVerifiableKeysSecure (gitconfig baser)
|
2018-06-21 15:35:27 +00:00
|
|
|
else retrievalSecurityPolicy baser
|
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal, it may be that the proof
expires based on one LockedCopy but another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem. Which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
an ssh connection closes during LOCKCONTENT.
2024-07-04 16:23:46 +00:00
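The "closest expiry time" rule described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions, not git-annex's actual code: `LockedCopyInfo`, `closestExpiry`, and `proofExpired` are hypothetical names, and a proof's expiry is modeled simply as a `Maybe POSIXTime`.

```haskell
import Data.Maybe (mapMaybe)
import Data.Time.Clock.POSIX (POSIXTime)

-- Hypothetical stand-in for a locked copy. Nothing means the lock
-- carries no expiry time.
newtype LockedCopyInfo = LockedCopyInfo { lockExpiry :: Maybe POSIXTime }

-- With several locked copies, the proof is treated as expiring when
-- the earliest of their expiry times is reached.
closestExpiry :: [LockedCopyInfo] -> Maybe POSIXTime
closestExpiry copies = case mapMaybe lockExpiry copies of
    [] -> Nothing
    ts -> Just (minimum ts)

-- A proof without an expiry time never expires; otherwise it has
-- expired once the current time is past the expiry time.
proofExpired :: POSIXTime -> Maybe POSIXTime -> Bool
proofExpired now = maybe False (now >)
```

Taking the minimum is the conservative choice the message calls "not optimal": one lock expiring invalidates the proof even if another lock is still held, and the user can re-run the drop.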
|
|
|
, removeKey = \proof k -> cip >>= removeKeyGen proof k
|
2014-08-06 17:45:19 +00:00
|
|
|
, checkPresent = \k -> cip >>= checkPresentGen k
|
2015-08-19 18:13:19 +00:00
|
|
|
, cost = if isencrypted
|
|
|
|
then cost baser + encryptedRemoteCostAdj
|
|
|
|
else cost baser
|
2014-10-21 18:36:09 +00:00
|
|
|
, getInfo = do
|
|
|
|
l <- getInfo baser
|
|
|
|
return $ l ++
|
|
|
|
[ ("encryption", describeEncryption c)
|
|
|
|
, ("chunking", describeChunkConfig (chunkConfig cfg))
|
|
|
|
]
|
2015-08-19 18:13:19 +00:00
|
|
|
, whereisKey = if noChunks (chunkConfig cfg) && not isencrypted
|
|
|
|
then whereisKey baser
|
|
|
|
else Nothing
|
2019-01-31 17:34:12 +00:00
|
|
|
, exportActions = (exportActions baser)
|
2024-01-19 19:14:26 +00:00
|
|
|
{ storeExport = \f k l p -> displayprogress uploadbwlimit p k (Just f) $
|
2019-01-31 17:34:12 +00:00
|
|
|
storeExport (exportActions baser) f k l
|
2024-01-19 19:14:26 +00:00
|
|
|
, retrieveExport = \k l f p -> displayprogress downloadbwlimit p k Nothing $
|
2019-01-31 17:34:12 +00:00
|
|
|
retrieveExport (exportActions baser) k l f
|
|
|
|
}
|
2014-08-03 19:35:23 +00:00
|
|
|
}
|
2016-05-23 21:27:15 +00:00
|
|
|
cip = cipherKey c (gitconfig baser)
|
2020-01-13 16:35:39 +00:00
|
|
|
isencrypted = isEncrypted c
|
2014-08-03 19:35:23 +00:00
|
|
|
|
|
|
|
-- chunk, then encrypt, then feed to the storer
|
2024-07-01 14:42:27 +00:00
|
|
|
storeKeyGen k o p enc = sendAnnex k o rollback $ \src _sz ->
|
2024-01-19 19:14:26 +00:00
|
|
|
displayprogress uploadbwlimit p k (Just src) $ \p' ->
|
2020-05-13 15:50:31 +00:00
|
|
|
storeChunks (uuid baser) chunkconfig enck k src p'
|
2021-02-16 19:46:14 +00:00
|
|
|
enc encr storer checkpresent
|
2014-08-03 19:35:23 +00:00
|
|
|
where
|
toward SafeDropProof expiry checking
2024-07-04 16:23:46 +00:00
|
|
|
rollback = void $ removeKey encr Nothing k
|
2016-04-27 16:54:43 +00:00
|
|
|
enck = maybe id snd enc
|
2014-08-03 19:35:23 +00:00
|
|
|
|
2015-04-27 21:40:21 +00:00
|
|
|
-- call retriever to get chunks; decrypt them; stream to dest file
|
2021-08-17 16:41:36 +00:00
|
|
|
retrieveKeyFileGen k dest p vc enc =
|
2024-01-19 19:14:26 +00:00
|
|
|
displayprogress downloadbwlimit p k Nothing $ \p' ->
|
2021-08-17 16:41:36 +00:00
|
|
|
retrieveChunks retriever (uuid baser) vc
|
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).
Not done when using chunking, yet.
Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.
Unfortunately, directory uses fileRetriever (except when chunked),
so it is not among the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.
Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.
Sponsored-by: Dartmouth College's DANDI project
2021-08-11 17:43:30 +00:00
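The idea of verifying content while it is being retrieved, rather than in a separate checksum pass, can be sketched as below. This is a toy illustration and not git-annex's implementation: `Checksum`, `update`, and `retrieveVerifying` are hypothetical names, and a 32-bit FNV-1a checksum stands in for the cryptographic hash a real key would use.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word32)

-- Running state of a 32-bit FNV-1a checksum (a stand-in for a real hash).
newtype Checksum = Checksum Word32 deriving (Eq, Show)

-- FNV-1a offset basis.
initial :: Checksum
initial = Checksum 2166136261

-- Feed one chunk of bytes into the running checksum.
update :: Checksum -> B.ByteString -> Checksum
update (Checksum h) = Checksum . B.foldl' step h
  where
    step acc byte = (acc `xor` fromIntegral byte) * 16777619

-- Consume chunks in arrival order. Each chunk is "written" (collected
-- here) and hashed in the same pass, so no second read is needed.
retrieveVerifying :: [B.ByteString] -> (B.ByteString, Checksum)
retrieveVerifying = finish . foldl' step ([], initial)
  where
    step (written, cs) chunk = (chunk : written, update cs chunk)
    finish (written, cs) = (B.concat (reverse written), cs)
```

Because the checksum is updated byte by byte, the result after the last chunk is the same as hashing the complete content in one go, which is what makes the separate verification pass unnecessary.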
|
|
|
chunkconfig enck k dest p' enc encr
|
2020-05-13 15:50:31 +00:00
|
|
|
where
|
2014-08-03 19:35:23 +00:00
|
|
|
enck = maybe id snd enc
|
|
|
|
|
toward SafeDropProof expiry checking
2024-07-04 16:23:46 +00:00
|
|
|
removeKeyGen proof k enc =
|
|
|
|
removeChunks remover (uuid baser) chunkconfig enck proof k
|
2014-08-03 19:35:23 +00:00
|
|
|
where
|
|
|
|
enck = maybe id snd enc
|
|
|
|
|
2020-05-13 15:50:31 +00:00
|
|
|
checkPresentGen k enc =
|
|
|
|
checkPresentChunks checkpresent (uuid baser) chunkconfig enck k
|
2014-08-03 19:35:23 +00:00
|
|
|
where
|
|
|
|
enck = maybe id snd enc
|
|
|
|
|
|
|
|
chunkconfig = chunkConfig cfg
|
|
|
|
|
2024-01-19 19:14:26 +00:00
|
|
|
downloadbwlimit = remoteAnnexBwLimitDownload (gitconfig baser)
|
|
|
|
<|> remoteAnnexBwLimit (gitconfig baser)
|
|
|
|
uploadbwlimit = remoteAnnexBwLimitUpload (gitconfig baser)
|
|
|
|
<|> remoteAnnexBwLimit (gitconfig baser)
|
|
|
|
|
|
|
|
displayprogress bwlimit p k srcfile a
|
bwlimit
Added annex.bwlimit and remote.name.annex-bwlimit config that works for git
remotes and many but not all special remotes.
This nearly works, at least for a git remote on the same disk. With it set
to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with
occasional spikes to 160 kb/s. So it needs to delay just a bit longer...
I'm unsure why.
However, at the beginning a lot of data flows before it determines the
right bandwidth limit. A granularity of less than 1s would probably improve
that.
And, I don't know yet if it makes sense to have it be 100kb/1s rather than
100kb/s. Is there a situation where the user would want a larger
granularity? Does granularity need to be configurable at all? I only used that
format for the config really in order to reuse an existing parser.
This can't be supported for external special remotes, or for ones that
themselves shell out to an external command. (Well, it could, but it
would involve pausing and resuming the child process tree, which seems
very hard to implement and very strange besides.) There could also be some
built-in special remotes that it still doesn't work for, due to them not
having a progress meter whose display blocks the bandwidth-using thread.
But I don't think there are actually any that run a separate thread for
downloads than the thread that displays the progress meter.
Sponsored-by: Graham Spencer on Patreon
2021-09-21 20:58:02 +00:00
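A limit written as "100kb/1s" is an amount of data per time window. The sketch below shows one way such a limit could be turned into a delay, plus a per-direction setting falling back to a general one via `<|>`. It is an illustration under assumptions, not the code git-annex uses: `BwLimit`, `effectiveLimit`, and `delayNeeded` are hypothetical names.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical representation of a limit such as "100kb/1s".
data BwLimit = BwLimit
    { limitBytes :: Integer  -- bytes allowed per window
    , limitWindow :: Double  -- window length in seconds
    } deriving (Eq, Show)

-- Prefer the direction-specific limit; fall back to the general one.
effectiveLimit :: Maybe BwLimit -> Maybe BwLimit -> Maybe BwLimit
effectiveLimit specific general = specific <|> general

-- Seconds to pause so that the average rate since the transfer began
-- does not exceed the limit. Never negative.
delayNeeded :: BwLimit -> Integer -> Double -> Double
delayNeeded (BwLimit b w) sent elapsed = max 0 (minimumTime - elapsed)
  where
    -- Shortest time in which `sent` bytes may be transferred.
    minimumTime = fromIntegral sent / fromIntegral b * w
```

For example, with a limit of 100000 bytes per second, having sent 200000 bytes after 1.5 seconds calls for a further 0.5 second pause. Averaging over the whole transfer is a simplification; a smaller window, as the message suggests, would react faster at the start.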
|
|
|
| displayProgress cfg = do
|
|
|
|
metered (Just p) (KeySizer k (pure (fmap toRawFilePath srcfile))) bwlimit (const a)
|
2014-08-03 19:35:23 +00:00
|
|
|
| otherwise = a p
|
|
|
|
|
|
|
|
withBytes :: ContentSource -> (L.ByteString -> Annex a) -> Annex a
|
|
|
|
withBytes (ByteContent b) a = a b
|
|
|
|
withBytes (FileContent f) a = a =<< liftIO (L.readFile f)
|