2019-08-01 19:11:45 +00:00
|
|
|
{- Using git-lfs as a remote.
|
|
|
|
-
|
2020-01-14 16:35:08 +00:00
|
|
|
- Copyright 2019-2020 Joey Hess <id@joeyh.name>
|
2019-08-01 19:11:45 +00:00
|
|
|
-
|
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
|
|
|
-}
|
|
|
|
|
2019-11-27 20:54:11 +00:00
|
|
|
{-# LANGUAGE OverloadedStrings #-}
|
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).
Not done when using chunking, yet.
Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.
Unfortunately, directory uses fileRetriever (except when chunked),
so it is not amoung the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.
Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.
Sponsored-by: Dartmouth College's DANDI project
2021-08-11 17:43:30 +00:00
|
|
|
{-# LANGUAGE RankNTypes #-}
|
2023-01-04 21:33:29 +00:00
|
|
|
{-# LANGUAGE CPP #-}
|
2019-11-27 20:54:11 +00:00
|
|
|
|
2019-11-18 20:09:09 +00:00
|
|
|
module Remote.GitLFS (remote, gen, configKnownUrl) where
|
2019-08-01 19:11:45 +00:00
|
|
|
|
|
|
|
import Annex.Common
|
|
|
|
import Types.Remote
|
|
|
|
import Annex.Url
|
2019-08-02 17:56:55 +00:00
|
|
|
import Types.Key
|
2019-08-01 19:11:45 +00:00
|
|
|
import Types.Creds
|
2020-01-10 18:10:20 +00:00
|
|
|
import Types.ProposedAccepted
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
import Types.NumCopies
|
2019-08-05 15:29:32 +00:00
|
|
|
import qualified Annex
|
2019-08-01 19:11:45 +00:00
|
|
|
import qualified Git
|
2019-08-02 16:38:14 +00:00
|
|
|
import qualified Git.Types as Git
|
|
|
|
import qualified Git.Url
|
2019-11-18 20:09:09 +00:00
|
|
|
import qualified Git.Remote
|
2019-08-05 17:24:21 +00:00
|
|
|
import qualified Git.GCrypt
|
2019-09-24 18:46:20 +00:00
|
|
|
import qualified Git.Credential as Git
|
2019-08-01 19:11:45 +00:00
|
|
|
import Config
|
|
|
|
import Config.Cost
|
2020-01-14 16:35:08 +00:00
|
|
|
import Annex.SpecialRemote.Config
|
2019-08-01 19:11:45 +00:00
|
|
|
import Remote.Helper.Special
|
|
|
|
import Remote.Helper.ExportImport
|
|
|
|
import Remote.Helper.Git
|
2019-08-02 17:56:55 +00:00
|
|
|
import Remote.Helper.Http
|
2019-08-05 17:24:21 +00:00
|
|
|
import qualified Remote.GCrypt
|
2019-08-01 19:11:45 +00:00
|
|
|
import Annex.Ssh
|
|
|
|
import Annex.UUID
|
2019-08-04 16:32:36 +00:00
|
|
|
import Crypto
|
2019-08-02 17:56:55 +00:00
|
|
|
import Backend.Hash
|
|
|
|
import Utility.Hash
|
2019-08-04 16:32:36 +00:00
|
|
|
import Utility.SshHost
|
2020-01-22 20:13:48 +00:00
|
|
|
import Utility.Url
|
2019-11-18 20:09:09 +00:00
|
|
|
import Logs.Remote
|
2019-08-05 15:05:59 +00:00
|
|
|
import Logs.RemoteState
|
2019-11-18 20:09:09 +00:00
|
|
|
import qualified Git.Config
|
2019-08-01 19:11:45 +00:00
|
|
|
|
2023-01-04 21:33:29 +00:00
|
|
|
#ifdef WITH_GIT_LFS
|
2020-06-22 15:21:25 +00:00
|
|
|
import qualified Network.GitLFS as LFS
|
2023-01-04 21:33:29 +00:00
|
|
|
#else
|
|
|
|
import qualified Utility.GitLFS as LFS
|
|
|
|
#endif
|
|
|
|
|
2019-08-01 19:11:45 +00:00
|
|
|
import Control.Concurrent.STM
|
2019-08-02 16:38:14 +00:00
|
|
|
import Data.String
|
2019-08-03 16:21:28 +00:00
|
|
|
import Network.HTTP.Types
|
2019-08-04 16:32:36 +00:00
|
|
|
import Network.HTTP.Client hiding (port)
|
2019-08-01 19:11:45 +00:00
|
|
|
import qualified Data.Map as M
|
2019-08-02 17:56:55 +00:00
|
|
|
import qualified Data.ByteString.Lazy as L
|
2021-10-06 00:20:08 +00:00
|
|
|
import qualified Data.ByteString.Short as S (fromShort)
|
2019-08-02 17:56:55 +00:00
|
|
|
import qualified Data.Text as T
|
|
|
|
import qualified Data.Text.Encoding as E
|
2019-09-24 17:01:51 +00:00
|
|
|
import qualified Control.Concurrent.MSemN as MSemN
|
2019-08-01 19:11:45 +00:00
|
|
|
|
|
|
|
remote :: RemoteType
|
2020-01-14 16:35:08 +00:00
|
|
|
remote = specialRemoteType $ RemoteType
|
2019-08-01 19:11:45 +00:00
|
|
|
{ typename = "git-lfs"
|
|
|
|
-- Remote.Git takes care of enumerating git-lfs remotes too,
|
|
|
|
-- and will call our gen on them.
|
|
|
|
, enumerate = const (return [])
|
|
|
|
, generate = gen
|
2020-01-14 17:18:15 +00:00
|
|
|
, configParser = mkRemoteConfigParser
|
2020-01-20 19:20:04 +00:00
|
|
|
[ optionalStringParser urlField
|
|
|
|
(FieldDesc "url of git-lfs repository")
|
|
|
|
]
|
2019-08-01 19:11:45 +00:00
|
|
|
, setup = mySetup
|
|
|
|
, exportSupported = exportUnsupported
|
|
|
|
, importSupported = importUnsupported
|
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.
So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that the tree is recorded in the export log, and gets grafted
into the git-annex branch.
importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.
It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.
Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.
(Untested and unused as of yet.)
This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 18:52:57 +00:00
|
|
|
, thirdPartyPopulated = False
|
2019-08-01 19:11:45 +00:00
|
|
|
}
|
|
|
|
|
2020-01-14 16:35:08 +00:00
|
|
|
urlField :: RemoteConfigField
|
|
|
|
urlField = Accepted "url"
|
|
|
|
|
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 21:20:56 +00:00
|
|
|
gen :: Git.Repo -> UUID -> RemoteConfig -> RemoteGitConfig -> RemoteStateHandle -> Annex (Maybe Remote)
|
|
|
|
gen r u rc gc rs = do
|
|
|
|
c <- parsedRemoteConfig remote rc
|
2023-03-14 02:39:16 +00:00
|
|
|
-- If the repo uses gcrypt, get the underlying repo without the
|
2019-08-05 17:24:21 +00:00
|
|
|
-- gcrypt url, to do LFS endpoint discovery on.
|
|
|
|
r' <- if Git.GCrypt.isEncrypted r
|
|
|
|
then do
|
|
|
|
g <- Annex.gitRepo
|
|
|
|
liftIO $ Git.GCrypt.encryptedRemote g r
|
|
|
|
else pure r
|
2019-09-24 17:01:51 +00:00
|
|
|
sem <- liftIO $ MSemN.new 1
|
|
|
|
h <- liftIO $ newTVarIO $ LFSHandle Nothing Nothing sem r' gc
|
2023-01-12 17:42:28 +00:00
|
|
|
cst <- remoteCost gc c expensiveRemoteCost
|
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 21:20:56 +00:00
|
|
|
let specialcfg = (specialRemoteCfg c)
|
|
|
|
-- chunking would not improve git-lfs
|
|
|
|
{ chunkConfig = NoChunks
|
|
|
|
}
|
2019-08-01 19:11:45 +00:00
|
|
|
return $ Just $ specialRemote' specialcfg c
|
2020-05-13 15:50:31 +00:00
|
|
|
(store rs h)
|
|
|
|
(retrieve rs h)
|
|
|
|
(remove h)
|
|
|
|
(checkKey rs h)
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
(this c cst h)
|
2019-08-01 19:11:45 +00:00
|
|
|
where
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
this c cst h = Remote
|
2019-08-01 19:11:45 +00:00
|
|
|
{ uuid = u
|
|
|
|
, cost = cst
|
|
|
|
, name = Git.repoDescribe r
|
|
|
|
, storeKey = storeKeyDummy
|
2020-05-13 21:05:56 +00:00
|
|
|
, retrieveKeyFile = retrieveKeyFileDummy
|
|
|
|
, retrieveKeyFileCheap = Nothing
|
2019-08-01 19:11:45 +00:00
|
|
|
-- content stored on git-lfs is hashed with SHA256
|
|
|
|
-- no matter what git-annex key it's for, and the hash
|
|
|
|
-- is checked on download
|
|
|
|
, retrievalSecurityPolicy = RetrievalAllKeysSecure
|
|
|
|
, removeKey = removeKeyDummy
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
, lockContent = Just $ lockKey (this c cst h) rs h
|
2019-08-01 19:11:45 +00:00
|
|
|
, checkPresent = checkPresentDummy
|
|
|
|
, checkPresentCheap = False
|
|
|
|
, exportActions = exportUnsupported
|
|
|
|
, importActions = importUnsupported
|
|
|
|
, whereisKey = Nothing
|
|
|
|
, remoteFsck = Nothing
|
|
|
|
, repairRepo = Nothing
|
|
|
|
, config = c
|
|
|
|
, getRepo = return r
|
|
|
|
, gitconfig = gc
|
|
|
|
, localpath = Nothing
|
|
|
|
, remotetype = remote
|
|
|
|
, availability = GloballyAvailable
|
|
|
|
, readonly = False
|
|
|
|
-- content cannot be removed from a git-lfs repo
|
|
|
|
, appendonly = True
|
2020-12-28 19:08:53 +00:00
|
|
|
, untrustworthy = False
|
2019-08-01 19:11:45 +00:00
|
|
|
, mkUnavailable = return Nothing
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
, getInfo = gitRepoInfo (this c cst h)
|
2019-08-01 19:11:45 +00:00
|
|
|
, claimUrl = Nothing
|
|
|
|
, checkUrl = Nothing
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
, remoteStateHandle = rs
|
2019-08-01 19:11:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
mySetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID)
|
2020-05-07 19:59:29 +00:00
|
|
|
mySetup ss mu _ c gc = do
|
2019-08-01 19:11:45 +00:00
|
|
|
u <- maybe (liftIO genUUID) return mu
|
|
|
|
|
2019-08-05 17:24:21 +00:00
|
|
|
(c', _encsetup) <- encryptionSetup c gc
|
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.
The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.
configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parter, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.
Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.
Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.
initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 19:30:14 +00:00
|
|
|
pc <- either giveup return . parseRemoteConfig c' =<< configParser remote c'
|
2020-05-07 19:59:29 +00:00
|
|
|
let failinitunlessforced msg = case ss of
|
2022-06-28 19:28:14 +00:00
|
|
|
Init -> unlessM (Annex.getRead Annex.force) (giveup msg)
|
2020-05-07 19:59:29 +00:00
|
|
|
Enable _ -> noop
|
2021-03-17 13:41:12 +00:00
|
|
|
AutoEnable _ -> noop
|
2020-01-14 16:35:08 +00:00
|
|
|
case (isEncrypted pc, Git.GCrypt.urlPrefix `isPrefixOf` url) of
|
2019-08-05 17:24:21 +00:00
|
|
|
(False, False) -> noop
|
2020-01-14 16:35:08 +00:00
|
|
|
(True, True) -> Remote.GCrypt.setGcryptEncryption pc remotename
|
2020-05-07 19:59:29 +00:00
|
|
|
(True, False) -> failinitunlessforced $ unwords $
|
|
|
|
[ "Encryption is enabled for this remote,"
|
|
|
|
, "but only the files that git-annex stores on"
|
|
|
|
, "it would be encrypted; "
|
|
|
|
, "anything that git push sends to it would"
|
|
|
|
, "not be encrypted. Recommend prefixing the"
|
|
|
|
, "url with \"gcrypt::\" to also encrypt"
|
|
|
|
, "git pushes."
|
|
|
|
, "(Use --force if you want to use this"
|
|
|
|
, "likely insecure configuration.)"
|
|
|
|
]
|
|
|
|
(False, True) -> failinitunlessforced $ unwords
|
|
|
|
[ "You used a \"gcrypt::\" url for this remote,"
|
|
|
|
, "but encryption=none prevents git-annex"
|
|
|
|
, "from encrypting files it stores there."
|
|
|
|
, "(Use --force if you want to use this"
|
|
|
|
, "likely insecure configuration.)"
|
|
|
|
]
|
2019-08-01 19:11:45 +00:00
|
|
|
|
2019-11-18 20:09:09 +00:00
|
|
|
-- Set up remote.name.url to point to the repo,
|
2019-08-01 19:11:45 +00:00
|
|
|
-- (so it's also usable by git as a non-special remote),
|
2019-11-18 20:09:09 +00:00
|
|
|
-- and set remote.name.annex-git-lfs = true
|
|
|
|
gitConfigSpecialRemote u c' [("git-lfs", "true")]
|
2020-03-02 19:50:40 +00:00
|
|
|
setConfig (remoteConfig c "url") url
|
2019-11-18 20:09:09 +00:00
|
|
|
return (c', u)
|
2019-08-05 17:24:21 +00:00
|
|
|
where
|
2020-01-10 18:10:20 +00:00
|
|
|
url = maybe (giveup "Specify url=") fromProposedAccepted
|
2020-01-14 16:35:08 +00:00
|
|
|
(M.lookup urlField c)
|
2019-10-10 17:08:17 +00:00
|
|
|
remotename = fromJust (lookupName c)
|
2019-08-01 19:11:45 +00:00
|
|
|
|
2019-11-18 20:09:09 +00:00
|
|
|
{- Check if a remote's url is one known to belong to a git-lfs repository.
|
|
|
|
- If so, set the necessary configuration to enable using the remote
|
|
|
|
- with git-lfs. -}
|
|
|
|
configKnownUrl :: Git.Repo -> Annex (Maybe Git.Repo)
|
|
|
|
configKnownUrl r
|
|
|
|
| Git.repoIsUrl r = do
|
2020-09-22 17:52:26 +00:00
|
|
|
m <- remoteConfigMap
|
2019-11-18 20:09:09 +00:00
|
|
|
g <- Annex.gitRepo
|
2020-09-22 17:52:26 +00:00
|
|
|
case Annex.SpecialRemote.Config.findByRemoteConfig (match g) m of
|
2019-11-18 20:09:09 +00:00
|
|
|
((u, _, mcu):[]) -> Just <$> go u mcu
|
|
|
|
_ -> return Nothing
|
|
|
|
| otherwise = return Nothing
|
|
|
|
where
|
|
|
|
match g c = fromMaybe False $ do
|
2020-01-10 18:10:20 +00:00
|
|
|
t <- fromProposedAccepted
|
|
|
|
<$> M.lookup Annex.SpecialRemote.Config.typeField c
|
|
|
|
u <- fromProposedAccepted
|
2020-01-14 16:35:08 +00:00
|
|
|
<$> M.lookup urlField c
|
2023-03-23 19:19:04 +00:00
|
|
|
let u' = Git.Remote.parseRemoteLocation u False g
|
2019-11-18 20:09:09 +00:00
|
|
|
return $ Git.Remote.RemoteUrl (Git.repoLocation r) == u'
|
|
|
|
&& t == typename remote
|
|
|
|
go u mcu = do
|
|
|
|
r' <- set "uuid" (fromUUID u) =<< set "git-lfs" "true" r
|
|
|
|
case mcu of
|
|
|
|
Just (Annex.SpecialRemote.Config.ConfigFrom cu) ->
|
|
|
|
set "config-uuid" (fromUUID cu) r'
|
|
|
|
Nothing -> return r'
|
|
|
|
set k v r' = do
|
2020-02-19 17:45:11 +00:00
|
|
|
let k' = remoteAnnexConfig r' k
|
2019-12-02 14:57:09 +00:00
|
|
|
setConfig k' v
|
2021-08-11 00:45:02 +00:00
|
|
|
return $ Git.Config.store' k' (Git.ConfigValue (encodeBS v)) r'
|
2019-11-18 20:09:09 +00:00
|
|
|
|
2019-08-02 16:38:14 +00:00
|
|
|
data LFSHandle = LFSHandle
|
|
|
|
{ downloadEndpoint :: Maybe LFS.Endpoint
|
|
|
|
, uploadEndpoint :: Maybe LFS.Endpoint
|
2019-09-24 17:01:51 +00:00
|
|
|
, getEndPointLock :: MSemN.MSemN Int
|
2019-08-02 16:38:14 +00:00
|
|
|
, remoteRepo :: Git.Repo
|
|
|
|
, remoteGitConfig :: RemoteGitConfig
|
|
|
|
}
|
2019-08-01 19:11:45 +00:00
|
|
|
|
2019-09-24 17:01:51 +00:00
|
|
|
-- Only let one thread at a time do endpoint discovery.
|
|
|
|
withEndPointLock :: LFSHandle -> Annex a -> Annex a
|
|
|
|
withEndPointLock h = bracket_
|
|
|
|
(liftIO $ MSemN.wait l 1)
|
|
|
|
(liftIO $ MSemN.signal l 1)
|
|
|
|
where
|
|
|
|
l = getEndPointLock h
|
|
|
|
|
2019-08-02 16:38:14 +00:00
|
|
|
discoverLFSEndpoint :: LFS.TransferRequestOperation -> LFSHandle -> Annex (Maybe LFS.Endpoint)
|
|
|
|
discoverLFSEndpoint tro h
|
|
|
|
| Git.repoIsSsh r = gossh
|
|
|
|
| Git.repoIsHttp r = gohttp
|
2019-09-24 18:46:20 +00:00
|
|
|
| otherwise = unsupportedurischeme
|
2019-08-02 16:38:14 +00:00
|
|
|
where
|
|
|
|
r = remoteRepo h
|
|
|
|
lfsrepouri = case Git.location r of
|
|
|
|
Git.Url u -> u
|
|
|
|
_ -> giveup $ "unsupported git-lfs remote location " ++ Git.repoLocation r
|
2019-09-24 18:46:20 +00:00
|
|
|
|
|
|
|
unsupportedurischeme = do
|
|
|
|
warning "git-lfs endpoint has unsupported URI scheme"
|
|
|
|
return Nothing
|
|
|
|
|
2019-08-02 16:38:14 +00:00
|
|
|
gossh = case mkSshHost <$> Git.Url.hostuser r of
|
|
|
|
Nothing -> do
|
|
|
|
warning "Unable to parse ssh url for git-lfs remote."
|
|
|
|
return Nothing
|
|
|
|
Just (Left err) -> do
|
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.
Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.
When json is being output, no quoting is done, since json gets its own
quoting.
This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 18:47:32 +00:00
|
|
|
warning (UnquotedString err)
|
2019-08-02 16:38:14 +00:00
|
|
|
return Nothing
|
|
|
|
Just (Right hostuser) -> do
|
|
|
|
let port = Git.Url.port r
|
2021-01-18 19:07:23 +00:00
|
|
|
let p = fromMaybe (error "unknown path")
|
|
|
|
(Git.Url.path r)
|
2019-08-02 16:38:14 +00:00
|
|
|
-- Remove leading /~/ from path. That is added when
|
|
|
|
-- converting a scp-style repository location with
|
|
|
|
-- a relative path into an url, and is legal
|
|
|
|
-- according to git-clone(1), but github does not
|
|
|
|
-- support it.
|
2021-01-18 19:07:23 +00:00
|
|
|
let remotepath = if "/~/" `isPrefixOf` p
|
|
|
|
then drop 3 p
|
|
|
|
else p
|
2019-08-02 16:38:14 +00:00
|
|
|
let ps = LFS.sshDiscoverEndpointCommand remotepath tro
|
|
|
|
-- Note that no shellEscape is done here, because
|
|
|
|
-- at least github's git-lfs implementation does
|
|
|
|
-- not allow for shell quoting.
|
|
|
|
let remotecmd = unwords ps
|
|
|
|
(sshcommand, sshparams) <- sshCommand NoConsumeStdin (hostuser, port) (remoteGitConfig h) remotecmd
|
|
|
|
liftIO (tryIO (readProcess sshcommand (toCommand sshparams))) >>= \case
|
|
|
|
Left err -> do
|
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.
Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.
When json is being output, no quoting is done, since json gets its own
quoting.
This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 18:47:32 +00:00
|
|
|
warning $ UnquotedString $ "ssh connection to git-lfs remote failed: " ++ show err
|
2019-08-02 16:38:14 +00:00
|
|
|
return Nothing
|
|
|
|
Right resp -> case LFS.parseSshDiscoverEndpointResponse (fromString resp) of
|
|
|
|
Nothing -> do
|
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.
Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.
When json is being output, no quoting is done, since json gets its own
quoting.
This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.
Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 18:47:32 +00:00
|
|
|
warning "unexpected response from git-lfs remote when doing ssh endpoint discovery"
|
2019-08-02 16:38:14 +00:00
|
|
|
return Nothing
|
|
|
|
Just endpoint -> return (Just endpoint)
|
2019-09-24 18:46:20 +00:00
|
|
|
|
|
|
|
-- The endpoint may or may not need http basic authentication,
|
|
|
|
-- which involves using git-credential to prompt for the password.
|
|
|
|
--
|
|
|
|
-- To determine if it does, make a download or upload request to
|
|
|
|
-- it, not including any objects in the request, and see if
|
|
|
|
-- the server requests authentication.
|
|
|
|
gohttp = case LFS.guessEndpoint lfsrepouri of
|
|
|
|
Nothing -> unsupportedurischeme
|
2019-09-24 21:59:49 +00:00
|
|
|
Just endpoint -> do
|
|
|
|
let testreq = LFS.startTransferRequest endpoint transfernothing
|
|
|
|
flip catchNonAsync (const (returnendpoint endpoint)) $ do
|
|
|
|
resp <- makeSmallAPIRequest testreq
|
|
|
|
if needauth (responseStatus resp)
|
|
|
|
then do
|
2019-09-24 22:06:10 +00:00
|
|
|
cred <- prompt $ inRepo $ Git.getUrlCredential (show lfsrepouri)
|
2020-01-22 20:13:48 +00:00
|
|
|
let endpoint' = addbasicauth (Git.credentialBasicAuth cred) endpoint
|
2019-09-24 21:59:49 +00:00
|
|
|
let testreq' = LFS.startTransferRequest endpoint' transfernothing
|
|
|
|
flip catchNonAsync (const (returnendpoint endpoint')) $ do
|
2020-10-19 15:14:01 +00:00
|
|
|
resp' <- makeSmallAPIRequest testreq'
|
|
|
|
inRepo $ if needauth (responseStatus resp')
|
|
|
|
then Git.rejectUrlCredential cred
|
|
|
|
else Git.approveUrlCredential cred
|
|
|
|
returnendpoint endpoint'
|
2019-09-24 21:59:49 +00:00
|
|
|
else returnendpoint endpoint
|
2019-09-24 18:46:20 +00:00
|
|
|
where
|
|
|
|
transfernothing = LFS.TransferRequest
|
|
|
|
{ LFS.req_operation = tro
|
|
|
|
, LFS.req_transfers = [LFS.Basic]
|
|
|
|
, LFS.req_ref = Nothing
|
|
|
|
, LFS.req_objects = []
|
|
|
|
}
|
2019-09-24 21:59:49 +00:00
|
|
|
returnendpoint = return . Just
|
2019-09-24 18:46:20 +00:00
|
|
|
|
|
|
|
needauth status = status == unauthorized401
|
|
|
|
|
2020-01-22 20:13:48 +00:00
|
|
|
addbasicauth (Just ba) endpoint =
|
|
|
|
LFS.modifyEndpointRequest endpoint $
|
|
|
|
applyBasicAuth' ba
|
|
|
|
addbasicauth Nothing endpoint = endpoint
|
2019-08-02 16:38:14 +00:00
|
|
|
|
|
|
|
-- The endpoint is cached for later use.
|
|
|
|
getLFSEndpoint :: LFS.TransferRequestOperation -> TVar LFSHandle -> Annex (Maybe LFS.Endpoint)
|
|
|
|
getLFSEndpoint tro hv = do
|
|
|
|
h <- liftIO $ atomically $ readTVar hv
|
|
|
|
case f h of
|
|
|
|
Just endpoint -> return (Just endpoint)
|
2019-09-24 17:01:51 +00:00
|
|
|
Nothing -> withEndPointLock h $ discoverLFSEndpoint tro h >>= \case
|
2019-08-02 16:38:14 +00:00
|
|
|
Just endpoint -> do
|
|
|
|
liftIO $ atomically $ writeTVar hv $
|
|
|
|
case tro of
|
|
|
|
LFS.RequestDownload ->
|
|
|
|
h { downloadEndpoint = Just endpoint }
|
|
|
|
LFS.RequestUpload ->
|
|
|
|
h { uploadEndpoint = Just endpoint }
|
|
|
|
return (Just endpoint)
|
|
|
|
Nothing -> return Nothing
|
|
|
|
where
|
|
|
|
f = case tro of
|
|
|
|
LFS.RequestDownload -> downloadEndpoint
|
|
|
|
LFS.RequestUpload -> uploadEndpoint
|
|
|
|
|
2019-08-03 16:51:16 +00:00
|
|
|
-- Make an API request that is expected to have a small response body.
|
|
|
|
-- Not for use in downloading an object.
|
|
|
|
makeSmallAPIRequest :: Request -> Annex (Response L.ByteString)
|
|
|
|
makeSmallAPIRequest req = do
|
2019-08-03 15:30:06 +00:00
|
|
|
uo <- getUrlOptions
|
|
|
|
let req' = applyRequest uo req
|
2021-04-06 19:41:24 +00:00
|
|
|
fastDebug "Remote.GitLFS" (show req')
|
2019-08-03 15:30:06 +00:00
|
|
|
resp <- liftIO $ httpLbs req' (httpManager uo)
|
|
|
|
-- Only debug the http status code, not the json
|
|
|
|
-- which may include an authentication token.
|
2021-04-06 19:41:24 +00:00
|
|
|
fastDebug "Remote.GitLFS" (show $ responseStatus resp)
|
2019-08-03 15:30:06 +00:00
|
|
|
return resp
|
|
|
|
|
2019-08-02 17:56:55 +00:00
|
|
|
sendTransferRequest
|
|
|
|
:: LFS.IsTransferResponseOperation op
|
|
|
|
=> LFS.TransferRequest
|
|
|
|
-> LFS.Endpoint
|
|
|
|
-> Annex (Either String (LFS.TransferResponse op))
|
2019-09-24 21:59:49 +00:00
|
|
|
sendTransferRequest req endpoint = do
|
|
|
|
let httpreq = LFS.startTransferRequest endpoint req
|
|
|
|
httpresp <- makeSmallAPIRequest $ setRequestCheckStatus httpreq
|
|
|
|
return $ case LFS.parseTransferResponse (responseBody httpresp) of
|
|
|
|
LFS.ParsedTransferResponse resp -> Right resp
|
|
|
|
LFS.ParsedTransferResponseError tro -> Left $
|
|
|
|
T.unpack $ LFS.resperr_message tro
|
|
|
|
LFS.ParseFailed err -> Left err
|
2019-08-02 17:56:55 +00:00
|
|
|
|
|
|
|
extractKeySha256 :: Key -> Maybe LFS.SHA256
|
2019-11-22 20:24:04 +00:00
|
|
|
extractKeySha256 k = case fromKey keyVariety k of
|
2019-08-02 17:56:55 +00:00
|
|
|
SHA2Key (HashSize 256) (HasExt hasext)
|
|
|
|
| hasext -> eitherToMaybe $ E.decodeUtf8' (keyHash k)
|
2021-10-06 00:20:08 +00:00
|
|
|
| otherwise -> eitherToMaybe $ E.decodeUtf8' $ S.fromShort (fromKey keyName k)
|
2019-08-02 17:56:55 +00:00
|
|
|
_ -> Nothing
|
|
|
|
|
|
|
|
-- The size of an encrypted key is the size of the input data, but we need
|
|
|
|
-- the actual object size.
|
|
|
|
extractKeySize :: Key -> Maybe Integer
|
|
|
|
extractKeySize k
|
|
|
|
| isEncKey k = Nothing
|
2019-11-22 20:24:04 +00:00
|
|
|
| otherwise = fromKey keySize k
|
2019-08-02 17:56:55 +00:00
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
mkUploadRequest :: RemoteStateHandle -> Key -> FilePath -> Annex (LFS.TransferRequest, LFS.SHA256, Integer)
|
|
|
|
mkUploadRequest rs k content = case (extractKeySha256 k, extractKeySize k) of
|
2019-08-05 15:05:59 +00:00
|
|
|
(Just sha256, Just size) ->
|
|
|
|
ret sha256 size
|
|
|
|
(_, Just size) -> do
|
|
|
|
sha256 <- calcsha256
|
|
|
|
remembersha256 sha256
|
|
|
|
ret sha256 size
|
|
|
|
_ -> do
|
|
|
|
sha256 <- calcsha256
|
2020-11-05 15:26:34 +00:00
|
|
|
size <- liftIO $ getFileSize (toRawFilePath content)
|
2019-08-05 15:05:59 +00:00
|
|
|
rememberboth sha256 size
|
|
|
|
ret sha256 size
|
2019-08-03 16:21:28 +00:00
|
|
|
where
|
2019-08-05 15:05:59 +00:00
|
|
|
calcsha256 = liftIO $ T.pack . show . sha2_256 <$> L.readFile content
|
|
|
|
ret sha256 size = do
|
2019-08-03 16:21:28 +00:00
|
|
|
let obj = LFS.TransferRequestObject
|
|
|
|
{ LFS.req_oid = sha256
|
2019-08-05 15:05:59 +00:00
|
|
|
, LFS.req_size = size
|
2019-08-03 16:21:28 +00:00
|
|
|
}
|
2019-08-03 20:23:47 +00:00
|
|
|
let req = LFS.TransferRequest
|
2019-08-05 15:05:59 +00:00
|
|
|
{ LFS.req_operation = LFS.RequestUpload
|
2019-08-03 16:21:28 +00:00
|
|
|
, LFS.req_transfers = [LFS.Basic]
|
|
|
|
, LFS.req_ref = Nothing
|
|
|
|
, LFS.req_objects = [obj]
|
|
|
|
}
|
2019-08-05 15:05:59 +00:00
|
|
|
return (req, sha256, size)
|
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
remembersha256 sha256 = setRemoteState rs k (T.unpack sha256)
|
|
|
|
rememberboth sha256 size = setRemoteState rs k $
|
2019-08-05 15:05:59 +00:00
|
|
|
show size ++ " " ++ T.unpack sha256
|
2019-08-03 16:21:28 +00:00
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
mkDownloadRequest :: RemoteStateHandle -> Key -> Annex (Maybe (LFS.TransferRequest, LFS.SHA256, Integer))
|
|
|
|
mkDownloadRequest rs k = case (extractKeySha256 k, extractKeySize k) of
|
2019-08-05 15:05:59 +00:00
|
|
|
(Just sha256, Just size) -> ret sha256 size
|
|
|
|
(_, Just size) ->
|
|
|
|
remembersha256 >>= \case
|
|
|
|
Just sha256 -> ret sha256 size
|
|
|
|
Nothing -> return Nothing
|
|
|
|
_ -> do
|
|
|
|
rememberboth >>= \case
|
|
|
|
Just (sha256, size) -> ret sha256 size
|
|
|
|
Nothing -> return Nothing
|
|
|
|
where
|
|
|
|
ret sha256 size = do
|
2019-08-02 17:56:55 +00:00
|
|
|
let obj = LFS.TransferRequestObject
|
|
|
|
{ LFS.req_oid = sha256
|
|
|
|
, LFS.req_size = size
|
|
|
|
}
|
|
|
|
let req = LFS.TransferRequest
|
2019-08-05 15:05:59 +00:00
|
|
|
{ LFS.req_operation = LFS.RequestDownload
|
2019-08-02 17:56:55 +00:00
|
|
|
, LFS.req_transfers = [LFS.Basic]
|
|
|
|
, LFS.req_ref = Nothing
|
|
|
|
, LFS.req_objects = [obj]
|
|
|
|
}
|
2019-08-05 15:05:59 +00:00
|
|
|
return $ Just (req, sha256, size)
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
remembersha256 = fmap T.pack <$> getRemoteState rs k
|
|
|
|
rememberboth = maybe Nothing parse <$> getRemoteState rs k
|
2019-08-05 15:05:59 +00:00
|
|
|
where
|
|
|
|
parse s = case words s of
|
|
|
|
[ssize, ssha256] -> do
|
|
|
|
size <- readish ssize
|
|
|
|
return (T.pack ssha256, size)
|
|
|
|
_ -> Nothing
|
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
store :: RemoteStateHandle -> TVar LFSHandle -> Storer
|
|
|
|
store rs h = fileStorer $ \k src p -> getLFSEndpoint LFS.RequestUpload h >>= \case
|
2020-05-13 18:03:00 +00:00
|
|
|
Nothing -> giveup "unable to connect to git-lfs endpoint"
|
|
|
|
Just endpoint -> do
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
(req, sha256, size) <- mkUploadRequest rs k src
|
2019-08-02 17:56:55 +00:00
|
|
|
sendTransferRequest req endpoint >>= \case
|
|
|
|
Right resp -> do
|
2021-11-10 17:51:11 +00:00
|
|
|
let body (LFS.ServerSupportsChunks ssc) =
|
|
|
|
if ssc
|
|
|
|
then httpBodyStorerChunked src p
|
|
|
|
else RequestBodyIO $
|
|
|
|
httpBodyStorer src p
|
2019-08-02 17:56:55 +00:00
|
|
|
forM_ (LFS.objects resp) $
|
|
|
|
send body sha256 size
|
2020-05-13 18:03:00 +00:00
|
|
|
Left err -> giveup err
|
2019-08-02 17:56:55 +00:00
|
|
|
where
|
|
|
|
send body sha256 size tro
|
2019-08-03 20:23:47 +00:00
|
|
|
| LFS.resp_oid tro /= sha256 || LFS.resp_size tro /= size =
|
|
|
|
giveup "git-lfs server requested other object than the one we asked to send"
|
2019-08-02 17:56:55 +00:00
|
|
|
| otherwise = case LFS.resp_error tro of
|
|
|
|
Just err -> giveup $
|
|
|
|
T.unpack $ LFS.respobjerr_message err
|
|
|
|
Nothing -> case LFS.resp_actions tro of
|
|
|
|
Nothing -> noop
|
|
|
|
Just op -> case LFS.uploadOperationRequests op body sha256 size of
|
|
|
|
Nothing -> giveup "unable to parse git-lfs server upload url"
|
|
|
|
Just [] -> noop -- server already has it
|
2019-08-03 15:30:06 +00:00
|
|
|
Just reqs -> forM_ reqs $
|
2019-08-03 16:51:16 +00:00
|
|
|
makeSmallAPIRequest . setRequestCheckStatus
|
2019-08-02 16:38:14 +00:00
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
retrieve :: RemoteStateHandle -> TVar LFSHandle -> Retriever
|
2021-08-18 19:13:14 +00:00
|
|
|
retrieve rs h = fileRetriever' $ \dest k p iv -> getLFSEndpoint LFS.RequestDownload h >>= \case
|
2019-08-03 16:51:16 +00:00
|
|
|
Nothing -> giveup "unable to connect to git-lfs endpoint"
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
Just endpoint -> mkDownloadRequest rs k >>= \case
|
2019-08-03 16:51:16 +00:00
|
|
|
Nothing -> giveup "unable to download this object from git-lfs"
|
2019-08-03 20:23:47 +00:00
|
|
|
Just (req, sha256, size) -> sendTransferRequest req endpoint >>= \case
|
2019-08-03 16:51:16 +00:00
|
|
|
Left err -> giveup (show err)
|
|
|
|
Right resp -> case LFS.objects resp of
|
|
|
|
[] -> giveup "git-lfs server did not provide a way to download this object"
|
2019-08-03 20:23:47 +00:00
|
|
|
(tro:_)
|
|
|
|
| LFS.resp_oid tro /= sha256 || LFS.resp_size tro /= size ->
|
|
|
|
giveup "git-lfs server replied with other object than the one we requested"
|
2021-08-18 19:13:14 +00:00
|
|
|
| otherwise -> go dest p iv tro
|
2019-08-03 16:51:16 +00:00
|
|
|
where
|
2021-08-18 19:13:14 +00:00
|
|
|
go dest p iv tro = case LFS.resp_error tro of
|
2019-08-03 16:51:16 +00:00
|
|
|
Just err -> giveup $ T.unpack $ LFS.respobjerr_message err
|
|
|
|
Nothing -> case LFS.resp_actions tro of
|
|
|
|
Nothing -> giveup "git-lfs server did not provide a way to download this object"
|
|
|
|
Just op -> case LFS.downloadOperationRequest op of
|
|
|
|
Nothing -> giveup "unable to parse git-lfs server download url"
|
2019-08-04 16:32:36 +00:00
|
|
|
Just req -> do
|
|
|
|
uo <- getUrlOptions
|
2021-08-18 19:13:14 +00:00
|
|
|
liftIO $ downloadConduit p iv req (fromRawFilePath dest) uo
|
2019-08-02 16:38:14 +00:00
|
|
|
|
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.
In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.
Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.
Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.
So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.
This commit was sponsored by Ethan Aubin.
2020-06-26 17:46:42 +00:00
|
|
|
-- Since git-lfs does not support removing content, nothing needs to be
|
|
|
|
-- done to lock content in the remote, except for checking that the content
|
|
|
|
-- is actually present.
|
|
|
|
lockKey :: Remote -> RemoteStateHandle -> TVar LFSHandle -> Key -> (VerifiedCopy -> Annex a) -> Annex a
|
|
|
|
lockKey r rs h key callback =
|
|
|
|
ifM (checkKey rs h key)
|
|
|
|
( withVerifiedCopy LockedCopy (uuid r) (return True) callback
|
|
|
|
, giveup $ "content seems to be missing from " ++ name r
|
|
|
|
)
|
|
|
|
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
checkKey :: RemoteStateHandle -> TVar LFSHandle -> CheckPresent
|
|
|
|
checkKey rs h key = getLFSEndpoint LFS.RequestDownload h >>= \case
|
2019-08-02 16:38:14 +00:00
|
|
|
Nothing -> giveup "unable to connect to git-lfs endpoint"
|
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 16:33:27 +00:00
|
|
|
Just endpoint -> mkDownloadRequest rs key >>= \case
|
2019-08-03 16:21:28 +00:00
|
|
|
-- Unable to find enough information to request the key
|
|
|
|
-- from git-lfs, so it's not present there.
|
|
|
|
Nothing -> return False
|
2019-09-24 21:59:49 +00:00
|
|
|
Just (req, sha256, size) -> go sha256 size
|
|
|
|
=<< makeSmallAPIRequest (LFS.startTransferRequest endpoint req)
|
2019-08-03 16:21:28 +00:00
|
|
|
where
|
2019-08-03 20:23:47 +00:00
|
|
|
go sha256 size httpresp
|
|
|
|
| responseStatus httpresp == status200 = go' sha256 size $
|
|
|
|
LFS.parseTransferResponse (responseBody httpresp)
|
|
|
|
| otherwise = giveup $
|
|
|
|
"git-lfs server refused request: " ++ show (responseStatus httpresp)
|
2019-08-03 16:21:28 +00:00
|
|
|
|
2019-08-03 20:23:47 +00:00
|
|
|
go' :: LFS.SHA256 -> Integer -> LFS.ParsedTransferResponse LFS.DownloadOperation -> Annex Bool
|
|
|
|
go' _ _ (LFS.ParseFailed err) =
|
2019-08-03 16:21:28 +00:00
|
|
|
giveup $ "unable to parse response from git-lfs server: " ++ err
|
|
|
|
-- If the server responds with a json error message,
|
|
|
|
-- the content is presumably not present.
|
2019-08-03 20:23:47 +00:00
|
|
|
go' _ _ (LFS.ParsedTransferResponseError _) = return False
|
2019-08-03 16:21:28 +00:00
|
|
|
-- If the server responds with at least one download operation,
|
|
|
|
-- we will assume the content is present. We could also try to HEAD
|
|
|
|
-- that download, but there's no guarantee HEAD is supported, and
|
|
|
|
-- at most that would detect breakage where the server is confused
|
|
|
|
-- about what objects it has.
|
2019-08-03 20:23:47 +00:00
|
|
|
go' sha256 size (LFS.ParsedTransferResponse resp) =
|
|
|
|
case LFS.objects resp of
|
|
|
|
[] -> return False
|
|
|
|
(tro:_)
|
2019-08-04 16:43:16 +00:00
|
|
|
| isNothing (LFS.resp_actions tro) -> return False
|
|
|
|
| isJust (LFS.resp_error tro) -> return False
|
2019-08-03 20:23:47 +00:00
|
|
|
| LFS.resp_oid tro /= sha256 || LFS.resp_size tro /= size ->
|
|
|
|
giveup "git-lfs server replied with other object than the one we requested"
|
|
|
|
| otherwise -> return True
|
2019-08-01 19:11:45 +00:00
|
|
|
|
2019-08-02 16:38:14 +00:00
|
|
|
remove :: TVar LFSHandle -> Remover
|
2020-05-14 18:08:09 +00:00
|
|
|
remove _h _key = giveup "git-lfs does not support removing content"
|