2011-06-02 01:56:04 +00:00
|
|
|
{- git-annex remotes types
|
2011-03-27 21:12:32 +00:00
|
|
|
-
|
2011-12-31 08:14:33 +00:00
|
|
|
- Most things should not need this, using Types instead
|
2011-03-27 19:56:43 +00:00
|
|
|
-
|
toward SafeDropProof expiry checking
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal: the proof may
expire based on one LockedCopy while another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem, which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
2024-07-04 16:23:46 +00:00
|
|
|
- Copyright 2011-2024 Joey Hess <id@joeyh.name>
|
2011-03-27 19:56:43 +00:00
|
|
|
-
|
2019-03-13 19:48:14 +00:00
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
2011-03-27 19:56:43 +00:00
|
|
|
-}
|
|
|
|
|
2015-10-08 19:01:38 +00:00
|
|
|
{-# LANGUAGE RankNTypes #-}
|
|
|
|
|
2014-01-13 18:41:10 +00:00
|
|
|
module Types.Remote
|
2020-01-13 16:35:39 +00:00
|
|
|
( module Types.RemoteConfig
|
2014-01-13 18:41:10 +00:00
|
|
|
, RemoteTypeA(..)
|
|
|
|
, RemoteA(..)
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
pieces of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generated in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
, RemoteStateHandle
|
2017-02-07 18:35:58 +00:00
|
|
|
, SetupStage(..)
|
2014-01-13 18:41:10 +00:00
|
|
|
, Availability(..)
|
2021-08-17 16:41:36 +00:00
|
|
|
, VerifyConfigA(..)
|
other 80% of avoiding verification when hard linking to objects in shared repo
In c6632ee5c8e66c26ef18317f56ae02bae1e7e280, it actually only handled
uploading objects to a shared repository. Avoiding verification when
downloading objects from a shared repository was a lot harder.
On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.
As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.
It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
2015-10-02 17:56:42 +00:00
|
|
|
, Verification(..)
|
|
|
|
, unVerified
|
2018-06-21 15:35:27 +00:00
|
|
|
, RetrievalSecurityPolicy(..)
|
2017-09-07 17:45:31 +00:00
|
|
|
, isExportSupported
|
2019-03-04 20:02:56 +00:00
|
|
|
, isImportSupported
|
2017-09-01 17:02:07 +00:00
|
|
|
, ExportActions(..)
|
2019-02-20 19:34:33 +00:00
|
|
|
, ImportActions(..)
|
2019-02-27 17:42:34 +00:00
|
|
|
, ByteSize
|
2024-07-04 16:23:46 +00:00
|
|
|
, SafeDropProof
|
2014-01-13 18:41:10 +00:00
|
|
|
)
|
|
|
|
where
|
2011-03-27 19:56:43 +00:00
|
|
|
|
2017-09-15 20:30:49 +00:00
|
|
|
import Data.Ord
|
2011-03-27 19:56:43 +00:00
|
|
|
|
2011-06-30 17:16:57 +00:00
|
|
|
import qualified Git
|
2011-06-02 01:56:04 +00:00
|
|
|
import Types.Key
|
2011-11-07 18:46:01 +00:00
|
|
|
import Types.UUID
|
2013-01-01 17:52:47 +00:00
|
|
|
import Types.GitConfig
|
2014-01-13 18:41:10 +00:00
|
|
|
import Types.Availability
|
2014-02-11 18:06:50 +00:00
|
|
|
import Types.Creds
|
2019-10-14 16:33:27 +00:00
|
|
|
import Types.RemoteState
|
2014-12-11 19:32:42 +00:00
|
|
|
import Types.UrlContents
|
2015-10-09 16:36:04 +00:00
|
|
|
import Types.NumCopies
|
2017-09-15 20:34:45 +00:00
|
|
|
import Types.Export
|
2019-02-21 17:38:27 +00:00
|
|
|
import Types.Import
|
2020-01-13 16:35:39 +00:00
|
|
|
import Types.RemoteConfig
|
2021-08-18 17:19:02 +00:00
|
|
|
import Utility.Hash (IncrementalVerifier)
|
2013-03-13 20:16:01 +00:00
|
|
|
import Config.Cost
|
2013-03-28 21:03:04 +00:00
|
|
|
import Utility.Metered
|
2017-09-07 17:45:31 +00:00
|
|
|
import Git.Types (RemoteName)
|
2013-10-11 20:03:18 +00:00
|
|
|
import Utility.SafeCommand
|
2014-12-08 17:40:15 +00:00
|
|
|
import Utility.Url
|
2019-02-27 17:15:02 +00:00
|
|
|
import Utility.DataUnits
|
2011-03-27 19:56:43 +00:00
|
|
|
|
2021-03-17 13:41:12 +00:00
|
|
|
data SetupStage = Init | Enable RemoteConfig | AutoEnable RemoteConfig
|
2017-02-07 18:35:58 +00:00
|
|
|
|
2011-03-29 03:51:07 +00:00
|
|
|
{- There are different types of remotes. -}
|
2017-09-07 17:45:31 +00:00
|
|
|
data RemoteTypeA a = RemoteType
|
2011-03-29 03:51:07 +00:00
|
|
|
-- human visible type name
|
2017-09-07 17:45:31 +00:00
|
|
|
{ typename :: String
|
2011-03-29 21:57:20 +00:00
|
|
|
-- enumerates remotes of this type
|
2015-08-05 17:49:54 +00:00
|
|
|
-- The Bool is True if automatic initialization of remotes is desired
|
2017-09-07 17:45:31 +00:00
|
|
|
, enumerate :: Bool -> a [Git.Repo]
|
2020-01-14 16:35:08 +00:00
|
|
|
-- generates a remote of this type
|
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 21:20:56 +00:00
|
|
|
, generate :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> a (Maybe (RemoteA a))
|
2020-01-13 16:35:39 +00:00
|
|
|
-- parse configs of remotes of this type
|
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.
The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.
configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parser; it does not do the parsing
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.
Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.
Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.
initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 19:30:14 +00:00
|
|
|
, configParser :: RemoteConfig -> a RemoteConfigParser
|
2017-02-07 18:35:58 +00:00
|
|
|
-- initializes or enables a remote
|
2017-09-07 17:45:31 +00:00
|
|
|
, setup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> a (RemoteConfig, UUID)
|
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.
So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, the tree is recorded in the export log, and gets grafted
into the git-annex branch.
importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.
It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.
Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.
(Untested and unused as of yet.)
This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 18:52:57 +00:00
|
|
|
-- check if a remote of this type is able to support export
|
2020-01-13 16:35:39 +00:00
|
|
|
, exportSupported :: ParsedRemoteConfig -> RemoteGitConfig -> a Bool
|
2020-12-18 18:52:57 +00:00
|
|
|
-- check if a remote of this type is able to support import
|
2020-01-13 16:35:39 +00:00
|
|
|
, importSupported :: ParsedRemoteConfig -> RemoteGitConfig -> a Bool
|
2020-12-18 18:52:57 +00:00
|
|
|
-- is a remote of this type not a usual key/value store,
|
|
|
|
-- or export/import of a tree of files, but instead a collection
|
|
|
|
-- of files, populated by something outside git-annex, some of
|
|
|
|
-- which may be annex objects?
|
|
|
|
, thirdPartyPopulated :: Bool
|
2017-09-07 17:45:31 +00:00
|
|
|
}
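The record above is parameterized over `a`, the type constructor its actions run in (`enumerate`, `generate`, `setup` and so on all return `a ...`). The header comment points most callers at `Types`, which suggests the concrete instantiation lives there; that is an inference, not something this file states. A minimal sketch of the same pattern, using a hypothetical and much-reduced `ToyRemoteType` (none of these names exist in git-annex):

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Hypothetical, much-reduced stand-in for RemoteTypeA.
data ToyRemoteType m = ToyRemoteType
    { toyTypename :: String
    -- The Bool plays the role of the "automatic initialization" flag.
    , toyEnumerate :: Bool -> m [String]
    , toyThirdPartyPopulated :: Bool
    }

-- Instantiated in IO, as an application would with its own monad.
ioType :: ToyRemoteType IO
ioType = ToyRemoteType
    { toyTypename = "directory"
    , toyEnumerate = \autoinit -> do
        putStrLn ("enumerating, autoinit=" ++ show autoinit)
        return ["repo1", "repo2"]
    , toyThirdPartyPopulated = False
    }

-- The same record shape instantiated purely, which can be handy in tests.
pureType :: ToyRemoteType Identity
pureType = ToyRemoteType
    { toyTypename = "borg"
    , toyEnumerate = \_ -> return ["backup"]
    , toyThirdPartyPopulated = True
    }

main :: IO ()
main = do
    repos <- toyEnumerate ioType True
    print repos
    print (runIdentity (toyEnumerate pureType False))
```

Leaving the type constructor abstract means the record definition does not need to import the monad it will eventually be used with.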
|
2011-03-29 03:51:07 +00:00
|
|
|
|
2011-12-31 08:11:39 +00:00
|
|
|
instance Eq (RemoteTypeA a) where
|
2011-12-31 07:27:37 +00:00
|
|
|
x == y = typename x == typename y
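The instance above compares remote types by `typename` alone. A record whose fields are functions cannot derive `Eq`, so comparing on the human-visible name is one workable notion of identity (this reading of the motivation is an assumption). A small sketch with a hypothetical `NamedType`:

```haskell
-- Hypothetical simplified type; the function field rules out "deriving Eq".
data NamedType = NamedType
    { ntName :: String
    , ntAction :: Int -> Int
    }

-- Equality looks only at the name, mirroring the instance above.
instance Eq NamedType where
    x == y = ntName x == ntName y

main :: IO ()
main = do
    let a = NamedType "git" (+ 1)
        b = NamedType "git" (* 2)
        c = NamedType "S3" (+ 1)
    print (a == b) -- True: same name, different functions
    print (a == c) -- False: different names
```

The trade-off is that two values with the same name but different behaviour compare equal, so names need to be unique across remote types for this to be sound.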
|
|
|
|
|
2011-03-29 03:51:07 +00:00
|
|
|
{- An individual remote. -}
|
2017-09-01 17:02:07 +00:00
|
|
|
data RemoteA a = Remote
|
2011-03-27 19:56:43 +00:00
|
|
|
-- each Remote has a unique uuid
|
2017-09-01 17:02:07 +00:00
|
|
|
{ uuid :: UUID
|
2011-03-27 19:56:43 +00:00
|
|
|
-- each Remote has a human visible name
|
2017-09-01 17:02:07 +00:00
|
|
|
, name :: RemoteName
|
2011-03-27 19:56:43 +00:00
|
|
|
-- Remotes have a use cost; higher is more expensive
|
2017-09-01 17:02:07 +00:00
|
|
|
, cost :: Cost
|
2014-07-26 17:25:06 +00:00
|
|
|
-- Transfers a key's contents from disk to the remote.
|
resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundary.
But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!
This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.
Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.
Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.
In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.
However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certainly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.
This commit was sponsored by Georg Bauer.
2014-07-28 18:18:08 +00:00
|
|
|
-- The key should not appear to be present on the remote until
|
|
|
|
-- all of its contents have been transferred.
|
2020-05-13 18:03:00 +00:00
|
|
|
-- Throws exception on failure.
|
2024-07-01 14:42:27 +00:00
|
|
|
, storeKey :: Key -> AssociatedFile -> Maybe FilePath -> MeterUpdate -> a ()
|
2013-04-11 21:15:45 +00:00
|
|
|
-- Retrieves a key's contents to a file.
|
2015-10-02 17:56:42 +00:00
|
|
|
-- (The MeterUpdate does not need to be used if it writes
|
|
|
|
-- sequentially to the file.)
|
2020-05-13 21:05:56 +00:00
|
|
|
-- Throws exception on failure.
|
2021-08-17 16:41:36 +00:00
|
|
|
, retrieveKeyFile :: Key -> AssociatedFile -> FilePath -> MeterUpdate -> VerifyConfigA a -> a Verification
|
2024-10-15 19:35:09 +00:00
|
|
|
{- Will retrieveKeyFile write to the file in order? -}
|
|
|
|
, retrieveKeyFileInOrder :: a Bool
|
2015-04-18 17:07:57 +00:00
|
|
|
-- Retrieves a key's contents to a tmp file, if it can be done cheaply.
|
|
|
|
-- It's ok to create a symlink or hardlink.
|
2020-05-13 21:05:56 +00:00
|
|
|
-- Throws exception on failure.
|
|
|
|
, retrieveKeyFileCheap :: Maybe (Key -> AssociatedFile -> FilePath -> a ())
|
2018-06-21 15:35:27 +00:00
|
|
|
-- Security policy for retrieving keys from this remote.
|
|
|
|
, retrievalSecurityPolicy :: RetrievalSecurityPolicy
|
2020-05-14 18:08:09 +00:00
|
|
|
-- Removes a key's contents (succeeds even if the contents are not present)
|
|
|
|
-- Can throw exception if unable to access remote, or if remote
|
2024-07-04 16:23:46 +00:00
|
|
|
-- refuses to remove the content, or if the proof is expired.
|
|
|
|
--
|
|
|
|
-- The proof is verified not to have expired shortly
|
|
|
|
-- before calling this. But, if the remote's lockContent returns
|
|
|
|
-- LockedCopy, the proof's expiry should be checked on the remote,
|
|
|
|
-- so that a delay in communicating with the remote does not
|
|
|
|
-- cause the removal to happen after the proof expires.
|
|
|
|
, removeKey :: Maybe SafeDropProof -> Key -> a ()
|
2015-10-08 19:01:38 +00:00
|
|
|
-- Uses locking to prevent removal of a key's contents,
|
2015-10-09 17:07:03 +00:00
|
|
|
-- thus producing a VerifiedCopy, which is passed to the callback.
|
|
|
|
-- If unable to lock, does not run the callback, and throws an
|
2020-05-14 18:08:09 +00:00
|
|
|
-- exception.
|
2015-10-08 19:01:38 +00:00
|
|
|
-- This is optional; remotes do not have to support locking.
|
2017-09-01 17:02:07 +00:00
|
|
|
, lockContent :: forall r. Maybe (Key -> (VerifiedCopy -> a r) -> a r)
|
2014-08-06 17:45:19 +00:00
|
|
|
-- Checks if a key is present in the remote.
|
|
|
|
-- Throws an exception if the remote cannot be accessed.
|
2017-09-01 17:02:07 +00:00
|
|
|
, checkPresent :: Key -> a Bool
|
2014-08-06 17:45:19 +00:00
|
|
|
-- Some remotes can checkPresent without an expensive network
|
2011-03-27 19:56:43 +00:00
|
|
|
-- operation.
|
2017-09-01 17:02:07 +00:00
|
|
|
, checkPresentCheap :: Bool
|
2020-12-18 18:52:57 +00:00
|
|
|
-- Some remotes support export.
|
2019-01-30 18:55:28 +00:00
|
|
|
, exportActions :: ExportActions a
|
2020-12-18 18:52:57 +00:00
|
|
|
-- Some remotes support import.
|
2019-02-20 19:34:33 +00:00
|
|
|
, importActions :: ImportActions a
|
2012-02-14 07:49:48 +00:00
|
|
|
-- Some remotes can provide additional details for whereis.
|
2017-09-01 17:02:07 +00:00
|
|
|
, whereisKey :: Maybe (Key -> a [String])
|
2013-10-11 20:03:18 +00:00
|
|
|
-- Some remotes can run a fsck operation on the remote,
|
|
|
|
-- without transferring all the data to the local repo.
|
|
|
|
-- The parameters are passed to the fsck command on the remote.
|
2017-09-01 17:02:07 +00:00
|
|
|
, remoteFsck :: Maybe ([CommandParam] -> a (IO Bool))
|
2013-10-27 19:38:59 +00:00
|
|
|
-- Runs an action to repair the remote's git repository.
|
2017-09-01 17:02:07 +00:00
|
|
|
, repairRepo :: Maybe (a Bool -> a (IO Bool))
|
2012-11-30 04:55:59 +00:00
|
|
|
-- a Remote has a persistent configuration store
|
2020-01-13 16:35:39 +00:00
|
|
|
, config :: ParsedRemoteConfig
|
2018-06-04 18:31:55 +00:00
|
|
|
-- Get the git repo for the Remote.
|
|
|
|
, getRepo :: a Git.Repo
|
2013-01-01 17:52:47 +00:00
|
|
|
-- a Remote's configuration from git
|
2017-09-01 17:02:07 +00:00
|
|
|
, gitconfig :: RemoteGitConfig
|
2023-03-14 02:39:16 +00:00
|
|
|
-- a Remote can be associated with a specific local filesystem path
|
2017-09-01 17:02:07 +00:00
|
|
|
, localpath :: Maybe FilePath
|
2012-08-26 19:39:02 +00:00
|
|
|
-- a Remote can be known to be readonly
|
2017-09-01 17:02:07 +00:00
|
|
|
, readonly :: Bool
|
2018-08-30 15:12:18 +00:00
|
|
|
-- a Remote can allow writes but not have a way to delete content
|
2020-12-28 18:37:15 +00:00
|
|
|
-- from it.
|
2018-08-30 15:12:18 +00:00
|
|
|
, appendonly :: Bool
|
2020-12-28 19:08:53 +00:00
|
|
|
-- Set if a remote cannot be trusted to continue to contain the
|
|
|
|
-- contents of files stored there. Notably, most export/import
|
|
|
|
-- remotes are untrustworthy because they are not key/value stores.
|
|
|
|
-- Since this prevents the user from adjusting a remote's trust
|
|
|
|
-- level, it's often better not to set it and instead let the user
|
|
|
|
-- decide.
|
|
|
|
, untrustworthy :: Bool
|
2013-03-15 23:16:13 +00:00
|
|
|
-- a Remote can be globally available. (Ie, "in the cloud".)
|
2023-08-16 18:31:31 +00:00
|
|
|
-- Some Remotes can mark themselves unavailable.
|
|
|
|
, availability :: a Availability
|
2011-12-31 07:27:37 +00:00
|
|
|
-- the type of the remote
|
2017-09-01 17:02:07 +00:00
|
|
|
, remotetype :: RemoteTypeA a
|
2014-08-10 18:52:58 +00:00
|
|
|
-- For testing, makes a version of this remote that is not
|
|
|
|
-- available for use. All its actions should fail.
|
2017-09-01 17:02:07 +00:00
|
|
|
, mkUnavailable :: a (Maybe (RemoteA a))
|
2014-10-21 18:36:09 +00:00
|
|
|
-- Information about the remote, for git annex info to display.
|
2017-09-01 17:02:07 +00:00
|
|
|
, getInfo :: a [(String, String)]
|
2020-05-21 15:58:57 +00:00
|
|
|
-- Some remotes can download from an url (or uri). This asks the
|
|
|
|
-- remote if it can handle a particular url. The actual download
|
|
|
|
-- will be done using retrieveKeyFile, and the remote can look up
|
|
|
|
-- the url to download for a key using Logs.Web.getUrls.
|
2017-09-01 17:02:07 +00:00
|
|
|
, claimUrl :: Maybe (URLString -> a Bool)
|
2014-12-11 19:32:42 +00:00
|
|
|
-- Checks that the url is accessible, and gets information about
|
|
|
|
-- its contents, without downloading the full content.
|
2014-12-08 23:14:24 +00:00
|
|
|
-- Throws an exception if the url is inaccessible.
|
2017-09-01 17:02:07 +00:00
|
|
|
, checkUrl :: Maybe (URLString -> a UrlContents)
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
pieces of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generated in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
, remoteStateHandle :: RemoteStateHandle
|
2017-09-01 17:02:07 +00:00
|
|
|
}
|
2011-03-27 19:56:43 +00:00
|
|
|
|
2020-03-02 19:50:40 +00:00
|
|
|
instance RemoteNameable (RemoteA a) where
|
|
|
|
	getRemoteName = name
|
|
|
|
|
2011-12-31 08:11:39 +00:00
|
|
|
instance Show (RemoteA a) where
|
2011-03-30 19:15:46 +00:00
|
|
|
	show remote = "Remote { name =\"" ++ name remote ++ "\" }"
|
2011-03-27 19:56:43 +00:00
|
|
|
|
|
|
|
-- two remotes are the same if they have the same uuid
|
2011-12-31 08:11:39 +00:00
|
|
|
instance Eq (RemoteA a) where
|
2011-03-27 20:17:56 +00:00
|
|
|
	x == y = uuid x == uuid y
|
2011-03-27 19:56:43 +00:00
|
|
|
|
2018-08-03 17:06:06 +00:00
|
|
|
-- Order by cost since that is the important order of remotes
|
|
|
|
-- when deciding which to use. But since remotes often have the same cost
|
|
|
|
-- and Ord must be total, do a secondary ordering by uuid.
|
2011-12-31 08:11:39 +00:00
|
|
|
instance Ord (RemoteA a) where
|
2018-08-03 17:06:06 +00:00
|
|
|
	compare a b
|
|
|
|
		| cost a == cost b = comparing uuid a b
|
|
|
|
		| otherwise = comparing cost a b
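As an illustration of the ordering rule in this instance, here is a minimal sketch that uses a hypothetical `MiniRemote` record in place of `RemoteA`. The type and its fields (`miniName`, `miniUuid`, `miniCost`) and the helper `cheapestFirst` are assumptions made for this example and are not part of git-annex.

```haskell
import Data.List (sort)
import Data.Ord (comparing)

-- Hypothetical, simplified stand-in for RemoteA; only the fields that
-- matter for equality and ordering are kept.
data MiniRemote = MiniRemote
    { miniName :: String
    , miniUuid :: String
    , miniCost :: Int
    } deriving (Show)

-- Two remotes are the same if they have the same uuid.
instance Eq MiniRemote where
    x == y = miniUuid x == miniUuid y

-- Order by cost first; fall back to the uuid so the order is total.
instance Ord MiniRemote where
    compare a b
        | miniCost a == miniCost b = comparing miniUuid a b
        | otherwise = comparing miniCost a b

-- Names of the remotes, cheapest first, with cost ties broken by uuid.
cheapestFirst :: [MiniRemote] -> [String]
cheapestFirst = map miniName . sort
```

Sorting a list of such remotes yields the cheapest one first, and remotes with equal cost come out in uuid order, so the result is deterministic.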
|
other 80% of avoiding verification when hard linking to objects in shared repo
In c6632ee5c8e66c26ef18317f56ae02bae1e7e280, it actually only handled
uploading objects to a shared repository. Avoiding verification when
downloading objects from a shared repository was a lot harder.
On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.
As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.
It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
2015-10-02 17:56:42 +00:00
|
|
|
|
2015-10-08 21:58:32 +00:00
|
|
|
instance ToUUID (RemoteA a) where
|
|
|
|
	toUUID = uuid
|
|
|
|
|
2021-08-17 16:41:36 +00:00
|
|
|
data VerifyConfigA a
|
|
|
|
= AlwaysVerify
|
|
|
|
| NoVerify
|
|
|
|
| RemoteVerify (RemoteA a)
|
|
|
|
| DefaultVerify
|
|
|
|
|
2018-03-13 18:18:30 +00:00
|
|
|
data Verification
|
|
|
|
= UnVerified
|
2018-03-13 18:50:49 +00:00
|
|
|
-- ^ Content was not verified during transfer, but is probably
|
2018-03-13 18:18:30 +00:00
|
|
|
-- ok, so if verification is disabled, don't verify it
|
|
|
|
| Verified
|
2018-03-13 18:50:49 +00:00
|
|
|
-- ^ Content was verified during transfer, so don't verify it
|
2021-02-09 17:42:16 +00:00
|
|
|
-- again. The verification does not need to use a
|
|
|
|
-- cryptographically secure hash, but the hash does need to
|
|
|
|
-- have preimage resistance.
|
2021-08-16 18:50:21 +00:00
|
|
|
| IncompleteVerify IncrementalVerifier
|
|
|
|
-- ^ Content was partially verified during transfer, but
|
|
|
|
-- the verification is not complete.
|
2022-05-09 16:25:04 +00:00
|
|
|
| MustVerify
|
|
|
|
-- ^ Content likely to have been altered during transfer,
|
|
|
|
-- verify even if verification is normally disabled
|
|
|
|
| MustFinishIncompleteVerify IncrementalVerifier
|
|
|
|
-- ^ Content likely to have been altered during transfer,
|
|
|
|
-- finish verification even if verification is normally disabled.
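The comments on these constructors imply a decision rule for callers. The following is a rough sketch of that rule, not git-annex's actual logic: the `Verification` type here is a simplified stand-in that leaves out the two constructors carrying an `IncrementalVerifier`, and `needsVerification` is a hypothetical helper.

```haskell
-- Simplified stand-in for Verification; the constructors that carry an
-- IncrementalVerifier are left out to keep the example self-contained.
data Verification = UnVerified | Verified | MustVerify
    deriving (Show, Eq)

-- Hypothetical helper: decides whether a separate verification pass is
-- needed after a transfer. The Bool says whether verification is enabled.
needsVerification :: Bool -> Verification -> Bool
needsVerification _ Verified = False           -- already verified in transit
needsVerification _ MustVerify = True          -- verify even when disabled
needsVerification enabled UnVerified = enabled -- follow the configuration
```

Only `UnVerified` defers to the configuration; the other two outcomes override it in one direction or the other.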
|
2015-10-02 17:56:42 +00:00
|
|
|
|
2020-05-13 21:05:56 +00:00
|
|
|
unVerified :: Monad m => m a -> m (a, Verification)
|
2015-10-02 17:56:42 +00:00
|
|
|
unVerified a = do
|
|
|
|
	ok <- a
|
|
|
|
	return (ok, UnVerified)
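A small usage sketch of `unVerified`, assuming a cut-down `Verification` type and a hypothetical transfer action `copyBytes` that runs in the `Maybe` monad. Neither `copyBytes` nor `taggedCopy` exists in git-annex; they only show how the helper tags a result.

```haskell
-- Cut-down stand-in for Verification, enough for this example.
data Verification = UnVerified | Verified
    deriving (Show, Eq)

-- Same definition as in the source: run the action and tag its result
-- as not having been verified.
unVerified :: Monad m => m a -> m (a, Verification)
unVerified a = do
    ok <- a
    return (ok, UnVerified)

-- Hypothetical transfer action that reports the number of bytes copied.
copyBytes :: Maybe Int
copyBytes = Just 1024

-- The same action, with its result tagged as UnVerified.
taggedCopy :: Maybe (Int, Verification)
taggedCopy = unVerified copyBytes
```

Because `unVerified` only needs a `Monad`, a failing action (here `Nothing`) stays a failure and no tag is produced.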
|
add API for exporting
Implemented so far for the directory special remote.
Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certainly do not, since bup would
need to be used to extract the export; same story for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.
Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes uses non-key/value
storage. Any common parts can be factored out.
Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.
Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.
This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-08-29 17:00:41 +00:00
|
|
|
|
2018-06-21 15:35:27 +00:00
|
|
|
-- Security policy indicating what keys can be safely retrieved from a
|
|
|
|
-- remote.
|
|
|
|
data RetrievalSecurityPolicy
|
|
|
|
= RetrievalVerifiableKeysSecure
|
|
|
|
-- ^ Transfer of keys whose content can be verified
|
|
|
|
-- with a hash check is secure; transfer of unverifiable keys is
|
|
|
|
-- not secure and should not be allowed.
|
|
|
|
--
|
|
|
|
-- This is used eg, when HTTP to a remote could be redirected to a
|
|
|
|
-- local private web server or even a file:// url, causing private
|
|
|
|
-- data from it that is not the intended content of a key to make
|
|
|
|
-- its way into the git-annex repository.
|
|
|
|
--
|
|
|
|
-- It's also used when content is stored encrypted on a remote,
|
|
|
|
-- which could replace it with a different encrypted file, and
|
|
|
|
-- trick git-annex into decrypting it and leaking the decrypted data
|
|
|
|
-- into the git-annex repository.
|
|
|
|
--
|
|
|
|
-- It's not (currently) used when the remote could alter the
|
|
|
|
-- content stored on it, because git-annex does not provide
|
|
|
|
-- strong guarantees about the content of keys that cannot be
|
|
|
|
-- verified with a hash check.
|
|
|
|
-- (But annex.securehashesonly does provide such guarantees.)
|
|
|
|
| RetrievalAllKeysSecure
|
|
|
|
-- ^ Any key can be securely retrieved.
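One way to read the two policies is as a gate on retrieval. The sketch below assumes a hypothetical helper `retrievalAllowed` whose `Bool` argument says whether the key's content can be verified with a hash check; git-annex's real check may take other settings into account.

```haskell
-- Local copy of the policy type so the example stands alone.
data RetrievalSecurityPolicy
    = RetrievalVerifiableKeysSecure
    | RetrievalAllKeysSecure
    deriving (Show, Eq)

-- Hypothetical check. The Bool says whether the key's content can be
-- verified with a hash check.
retrievalAllowed :: RetrievalSecurityPolicy -> Bool -> Bool
retrievalAllowed RetrievalAllKeysSecure _ = True
retrievalAllowed RetrievalVerifiableKeysSecure verifiable = verifiable
```

Under `RetrievalVerifiableKeysSecure`, an unverifiable key is refused, which matches the comment that such transfers are not secure and should not be allowed.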
|
|
|
|
|
2017-09-07 17:45:31 +00:00
|
|
|
isExportSupported :: RemoteA a -> a Bool
|
|
|
|
isExportSupported r = exportSupported (remotetype r) (config r) (gitconfig r)
|
|
|
|
|
2019-03-04 20:02:56 +00:00
|
|
|
isImportSupported :: RemoteA a -> a Bool
|
|
|
|
isImportSupported r = importSupported (remotetype r) (config r) (gitconfig r)
|
|
|
|
|
2017-09-07 17:45:31 +00:00
|
|
|
data ExportActions a = ExportActions
|
2017-09-01 17:02:07 +00:00
|
|
|
-- Exports content to an ExportLocation.
|
|
|
|
-- The exported file should not appear to be present on the remote
|
|
|
|
-- until all of its contents have been transferred.
|
2020-05-15 16:17:15 +00:00
|
|
|
-- Throws exception on failure.
|
|
|
|
{ storeExport :: FilePath -> Key -> ExportLocation -> MeterUpdate -> a ()
|
2017-09-01 17:02:07 +00:00
|
|
|
-- Retrieves exported content to a file.
|
|
|
|
-- (The MeterUpdate does not need to be used if it writes
|
|
|
|
-- sequentially to the file.)
|
2020-05-15 16:51:09 +00:00
|
|
|
-- Throws exception on failure.
|
2022-05-09 16:25:04 +00:00
|
|
|
, retrieveExport :: Key -> ExportLocation -> FilePath -> MeterUpdate -> a Verification
|
2017-09-01 17:02:07 +00:00
|
|
|
-- Removes an exported file (succeeds if the contents are not present)
|
2020-05-15 18:11:59 +00:00
|
|
|
-- Can throw exception if unable to access remote, or if remote
|
|
|
|
-- refuses to remove the content.
|
|
|
|
, removeExport :: Key -> ExportLocation -> a ()
|
2017-09-15 17:15:47 +00:00
|
|
|
-- Removes an exported directory. Typically the directory will be
|
2019-06-04 18:40:07 +00:00
|
|
|
-- empty, but it could possibly contain files or other directories,
|
2019-06-05 01:47:29 +00:00
|
|
|
-- and it's ok to delete those (but not required to).
|
|
|
|
-- If the remote does not use directories, or automatically cleans
|
|
|
|
-- up empty directories, this can be Nothing.
|
2020-05-15 18:32:45 +00:00
|
|
|
--
|
2019-06-05 01:47:29 +00:00
|
|
|
-- Should not fail if the directory was already removed.
|
2020-05-15 18:32:45 +00:00
|
|
|
--
|
|
|
|
-- Throws exception if unable to contact the remote, or perhaps if
|
|
|
|
-- the remote refuses to let the directory be removed.
|
|
|
|
, removeExportDirectory :: Maybe (ExportDirectory -> a ())
|
2017-09-01 17:02:07 +00:00
|
|
|
-- Checks if anything is exported to the remote at the specified
|
2020-12-28 18:37:15 +00:00
|
|
|
-- ExportLocation. It may check the size or other characteristics
|
|
|
|
-- of the Key, but does not need to guarantee that the content on
|
|
|
|
-- the remote is the same as the Key's content.
|
2017-09-01 17:02:07 +00:00
|
|
|
-- Throws an exception if the remote cannot be accessed.
|
|
|
|
, checkPresentExport :: Key -> ExportLocation -> a Bool
|
|
|
|
-- Renames an already exported file.
|
2020-05-15 19:05:52 +00:00
|
|
|
--
|
webdav: deal with buggy webdav servers in renameExport
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.
Then this 4shared server has buggy handling of renames across directories.
While that was already worked around when storing exports, with the temp
files now being in the same directory as the final filename, it also affected
renameExport when the file moves between directories.
I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to destination and
then still fail.
To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
2021-03-22 17:08:18 +00:00
|
|
|
-- If the remote does not support the requested rename,
|
|
|
|
-- it can return Nothing. It's ok if the remote deletes
|
|
|
|
-- the file in such a situation too; it will be re-exported to
|
|
|
|
-- recover.
|
2020-05-15 19:05:52 +00:00
|
|
|
--
|
|
|
|
-- Throws an exception if the remote cannot be accessed, or
|
|
|
|
-- the file doesn't exist or cannot be renamed.
|
2024-03-09 17:37:51 +00:00
|
|
|
, renameExport :: Maybe (Key -> ExportLocation -> ExportLocation -> a (Maybe ()))
|
2017-09-01 17:02:07 +00:00
|
|
|
}
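The two layers of `Maybe` in `renameExport` can be handled as in the following sketch. It assumes a hypothetical, much smaller record `MiniExport` with string locations. The names `miniStore`, `miniRename`, `Outcome` and `renameOrReExport` are invented for this example, and the fallback shown (store the content again at the destination) is a simplification of the recovery the comments describe.

```haskell
type Location = String

-- Hypothetical, cut-down counterpart of ExportActions.
data MiniExport m = MiniExport
    { miniStore :: Location -> m ()
    -- Nothing: the remote has no rename support at all.
    -- An action returning Nothing: this particular rename is unsupported.
    , miniRename :: Maybe (Location -> Location -> m (Maybe ()))
    }

data Outcome = Renamed | ReExported
    deriving (Show, Eq)

-- Try to rename; when renaming is unavailable or declined, fall back to
-- exporting the content again at the destination.
renameOrReExport :: Monad m => MiniExport m -> Location -> Location -> m Outcome
renameOrReExport r src dest = case miniRename r of
    Nothing -> reExport
    Just rename -> do
        res <- rename src dest
        case res of
            Just () -> return Renamed
            Nothing -> reExport
  where
    reExport = do
        miniStore r dest
        return ReExported
```

Keeping "cannot rename" as a value rather than an exception lets the caller tell it apart from the remote being inaccessible, which is reported by throwing.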
|
2019-02-20 19:34:33 +00:00
|
|
|
|
|
|
|
data ImportActions a = ImportActions
|
|
|
|
-- Finds the current set of files that are stored in the remote,
|
2019-02-27 17:15:02 +00:00
|
|
|
-- along with their content identifiers and size.
|
2019-02-20 19:34:33 +00:00
|
|
|
--
|
|
|
|
-- May also find old versions of files that are still stored in the
|
2019-02-21 17:38:27 +00:00
|
|
|
-- remote.
|
2020-12-22 18:20:11 +00:00
|
|
|
--
|
|
|
|
-- Throws exception on failure to access the remote.
|
2020-12-22 18:35:02 +00:00
|
|
|
-- May return Nothing when the remote is unchanged since last time.
|
2021-10-06 21:05:32 +00:00
|
|
|
{ listImportableContents :: a (Maybe (ImportableContentsChunkable a (ContentIdentifier, ByteSize)))
|
2020-12-17 16:29:44 +00:00
|
|
|
-- Generates a Key (of any type) for the file stored on the
|
|
|
|
-- remote at the ImportLocation. Does not download the file
|
|
|
|
-- from the remote.
|
2020-07-03 17:41:57 +00:00
|
|
|
--
|
|
|
|
-- May update the progress meter if it needs to perform an
|
|
|
|
-- expensive operation, such as hashing a local file.
|
|
|
|
--
|
|
|
|
-- Ensures that the key corresponds to the ContentIdentifier,
|
|
|
|
-- bearing in mind that the file on the remote may have changed
|
|
|
|
-- since the ContentIdentifier was generated.
|
|
|
|
--
|
2020-12-30 17:21:40 +00:00
|
|
|
-- When it returns Nothing, the file at the ImportLocation
|
2021-10-06 21:05:32 +00:00
|
|
|
-- will not be included in the imported tree.
|
2020-12-30 17:21:40 +00:00
|
|
|
--
|
2020-12-18 18:52:57 +00:00
|
|
|
-- When the remote is thirdPartyPopulated, this should check if the
|
|
|
|
-- file stored on the remote is the content of an annex object,
|
2020-12-30 17:21:40 +00:00
|
|
|
-- and return its Key, or Nothing if it is not.
|
2020-12-18 18:52:57 +00:00
|
|
|
--
|
|
|
|
-- Throws exception on failure to access the remote.
|
2020-12-18 20:52:49 +00:00
|
|
|
, importKey :: Maybe (ImportLocation -> ContentIdentifier -> ByteSize -> MeterUpdate -> a (Maybe Key))
|
2019-02-20 19:34:33 +00:00
|
|
|
-- Retrieves a file from the remote. Ensures that the file
|
change retrieveExportWithContentIdentifier to take a list of ContentIdentifier
This partly fixes an issue where there are duplicate files in the
special remote, and the first file gets swapped with another duplicate,
or deleted. The swap case is fixed by this, the deleted case will need
other changes.
This makes retrieveExportWithContentIdentifier take a list of allowed
ContentIdentifier, same as storeExportWithContentIdentifier,
removeExportWithContentIdentifier, and
checkPresentExportWithContentIdentifier.
Of the special remotes that support importtree, borg is a special case
and does not use content identifiers, S3 I assume can't get mixed up
like this, directory certainly has the problem, and adb also appears to
have had the problem.
Sponsored-by: Graham Spencer on Patreon
2022-09-20 17:15:31 +00:00
|
|
|
-- it retrieves has one of the requested ContentIdentifiers.
|
2019-02-20 19:34:33 +00:00
|
|
|
--
|
|
|
|
-- This has to be used rather than retrieveExport
|
|
|
|
-- when a special remote supports imports, since files on such a
|
|
|
|
-- special remote can be changed at any time.
|
2020-05-15 16:51:09 +00:00
|
|
|
--
|
|
|
|
-- Throws exception on failure.
|
2019-02-20 19:34:33 +00:00
|
|
|
, retrieveExportWithContentIdentifier
|
|
|
|
:: ExportLocation
|
2022-09-20 17:15:31 +00:00
|
|
|
-> [ContentIdentifier]
|
2020-05-11 06:40:13 +00:00
|
|
|
-- file to write content to
|
2019-02-27 17:15:02 +00:00
|
|
|
-> FilePath
|
2022-05-09 19:38:21 +00:00
|
|
|
-- Either the key, or when it's not yet known, a callback
|
|
|
|
-- that generates a key from the downloaded content.
|
|
|
|
-> Either Key (a Key)
|
2019-02-20 19:34:33 +00:00
|
|
|
-> MeterUpdate
|
2022-05-09 19:38:21 +00:00
|
|
|
-> a (Key, Verification)
|
2019-02-20 19:34:33 +00:00
|
|
|
-- Exports content to an ExportLocation, and returns the
|
|
|
|
-- ContentIdentifier corresponding to the content it stored.
|
|
|
|
--
|
2019-03-05 18:20:14 +00:00
|
|
|
-- This is used rather than storeExport when a special remote
|
2019-02-20 19:34:33 +00:00
|
|
|
-- supports imports, since files on such a special remote can be
|
|
|
|
-- changed at any time.
|
|
|
|
--
|
|
|
|
-- Since other things can modify the same file on the special
|
|
|
|
-- remote, this must take care to not overwrite such modifications,
|
2019-03-04 18:46:25 +00:00
|
|
|
-- and only overwrite a file that has one of the ContentIdentifiers
|
|
|
|
-- passed to it, unless listImportableContents can recover an overwritten file.
|
2019-02-20 19:34:33 +00:00
|
|
|
--
|
|
|
|
-- Also, since there can be concurrent writers, the implementation
|
|
|
|
-- needs to make sure that the ContentIdentifier it returns
|
|
|
|
-- corresponds to what it wrote, not to what some other writer
|
|
|
|
-- wrote.
|
2020-05-15 16:17:15 +00:00
|
|
|
--
|
|
|
|
-- Throws exception on failure.
|
2019-02-20 19:34:33 +00:00
|
|
|
, storeExportWithContentIdentifier
|
|
|
|
:: FilePath
|
|
|
|
-> Key
|
|
|
|
-> ExportLocation
|
2020-05-11 06:40:13 +00:00
|
|
|
-- old content that it's safe to overwrite
|
2019-02-20 19:34:33 +00:00
|
|
|
-> [ContentIdentifier]
|
|
|
|
-> MeterUpdate
|
2020-05-15 16:17:15 +00:00
|
|
|
-> a ContentIdentifier
|
2019-03-05 18:20:14 +00:00
|
|
|
-- This is used rather than removeExport when a special remote
|
|
|
|
-- supports imports.
|
|
|
|
--
|
|
|
|
-- It should only remove a file from the remote when it has one
|
|
|
|
-- of the ContentIdentifiers passed to it, unless listImportableContents
|
|
|
|
-- can recover an overwritten file.
|
|
|
|
--
|
|
|
|
-- It needs to handle races similar to storeExportWithContentIdentifier.
|
2020-05-15 18:11:59 +00:00
|
|
|
--
|
|
|
|
-- Throws an exception when unable to remove.
|
2019-03-05 18:20:14 +00:00
|
|
|
, removeExportWithContentIdentifier
|
|
|
|
:: Key
|
|
|
|
-> ExportLocation
|
|
|
|
-> [ContentIdentifier]
|
2020-05-15 18:11:59 +00:00
|
|
|
-> a ()
|
2019-03-05 18:20:14 +00:00
|
|
|
-- Removes a directory from the export, but only when it's empty.
|
|
|
|
-- Used instead of removeExportDirectory when a special remote
|
|
|
|
-- supports imports.
|
|
|
|
--
|
|
|
|
-- If the directory is not empty, it should succeed without removing it.
|
2020-05-15 18:32:45 +00:00
|
|
|
--
|
|
|
|
-- Throws exception if unable to contact the remote, or perhaps if
|
|
|
|
-- the remote refuses to let the directory be removed.
|
|
|
|
, removeExportDirectoryWhenEmpty :: Maybe (ExportDirectory -> a ())
|
2019-03-05 20:02:33 +00:00
|
|
|
-- Checks if the specified ContentIdentifier is exported to the
|
|
|
|
-- remote at the specified ExportLocation.
|
|
|
|
-- Throws an exception if the remote cannot be accessed.
|
|
|
|
, checkPresentExportWithContentIdentifier
|
|
|
|
:: Key
|
|
|
|
-> ExportLocation
|
|
|
|
-> [ContentIdentifier]
|
|
|
|
-> a Bool
|
2019-02-20 19:34:33 +00:00
|
|
|
}
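Several of the actions above take a list of allowed ContentIdentifiers and must only touch a file that matches one of them. A minimal sketch of such a guard follows, with a hypothetical string-based `ContentIdentifier` and a hypothetical helper `safeToOverwrite`. Treating an empty location as safe to write is an assumption of this example, and the guard by itself does not address the races with concurrent writers that the comments mention.

```haskell
-- Hypothetical stand-in; the real ContentIdentifier is defined elsewhere.
newtype ContentIdentifier = ContentIdentifier String
    deriving (Show, Eq)

-- Hypothetical guard: the first argument is the identifier of the file
-- currently at the location, or Nothing when no file is there.
safeToOverwrite :: Maybe ContentIdentifier -> [ContentIdentifier] -> Bool
safeToOverwrite Nothing _ = True
safeToOverwrite (Just current) allowed = current `elem` allowed
```

A file whose identifier is not in the list was presumably modified by something else, so the guard refuses to overwrite it.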
|
2021-03-22 17:08:18 +00:00
|
|
|
|