{- git-annex "WORM" backend -- Write Once, Read Many
 -
 - Copyright 2010 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

module Backend.WORM (backends) where

import Annex.Common
import Types.Key
import Types.Backend
import Types.KeySource
{- Commit note, 2013-10-05 19:01:49 +00:00:
 -
 - Better sanitization of problem characters when generating URL and WORM keys.
 -
 - FAT has a lot of characters it does not allow in filenames, like ? and *.
 - It's probably the worst offender, but other filesystems also have
 - limitations.
 -
 - In 2011, I made keyFile escape : to handle FAT, but missed the other
 - characters. It also turns out that when I did that, I was also living
 - dangerously; any existing keys that contained a : had their object
 - location change. Oops.
 -
 - So, adding new characters to escape to keyFile is out. Well, it would be
 - possible to make keyFile behave differently on a per-filesystem basis, but
 - this would be a real nightmare to get right. Consider that an rsync special
 - remote uses keyFile to determine the filenames to use, and we don't know
 - the underlying filesystem on the rsync server.
 -
 - Instead, I have gone for a solution that is backwards compatible and
 - simple. Its only downside is that already generated URL and WORM keys
 - might not be able to be stored on FAT or some other filesystem that
 - dislikes a character used in the key. (In this case, the user can just
 - migrate the problem keys to a checksumming backend. If this became a big
 - problem, fsck could be made to detect these and suggest a migration.)
 -
 - Going forward, new keys that are created will escape all characters that
 - are likely to cause problems. And if some filesystem comes along that's
 - even worse than FAT (seems unlikely, but here it is 2013, and people are
 - still using FAT!), additional characters can be added to the set that are
 - escaped without difficulty.
 -
 - (Also, made WORM limit the part of the filename that is embedded in the key,
 - to deal with filesystem filename length limits. This could have already
 - been a problem, but is more likely now, since the escaping of the filename
 - can make it longer.)
 -
 - This commit was sponsored by Ian Downes
 -}
import Backend.Utilities
import Git.FilePath
import Utility.Metered

import qualified Data.ByteString.Char8 as S8
import qualified Utility.RawFilePath as R

backends :: [Backend]
backends = [backend]

backend :: Backend
backend = Backend
    { backendVariety = WORMKey
    , genKey = Just keyValue
    , verifyKeyContent = Nothing
    , verifyKeyContentIncrementally = Nothing
    , canUpgradeKey = Just needsUpgrade
    {- Commit note, 2020-07-07 17:46:45 +00:00:
     -
     - Sped up the --all option by 2x to 16x by using git cat-file --buffer.
     -
     - This assumes that no location log files will have a newline or carriage
     - return in their name. catObjectStream skips any such files due to
     - cat-file not supporting them.
     -
     - Keys have been prevented from containing newlines since 2011,
     - commit 480495beb4a3422f006ee529df807a20cc944727. If some old repo
     - had a key with a newline in it, --all will just skip processing that key.
     - Other things, like .git/annex/unused files, certainly assume no newlines
     - in keys too, and AFAICR, such keys never actually worked.
     -
     - Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys
     - generated before that point could perhaps contain a CR. (URL probably not,
     - HTTP probably doesn't support a URL with a raw CR in it.) So, added
     - a warning in fsck about such keys. Although, fsck --all will naturally
     - skip them, so won't be able to warn about them. Not entirely
     - satisfactory, but I'll bet there are not really any such keys in
     - existence.
     -
     - Thanks to Lukey for finding this optimisation.
     -}
    , fastMigrate = Just removeProblemChars
    , isStableKey = const True
    , isCryptographicallySecure = const False
    }

{- The key includes the file size, modification time, and the
 - original filename relative to the top of the git repository.
 -}
keyValue :: KeySource -> MeterUpdate -> Annex Key
keyValue source _ = do
    let f = contentLocation source
    stat <- liftIO $ R.getFileStatus f
    sz <- liftIO $ getFileSize' f stat
    relf <- fromRawFilePath . getTopFilePath
        <$> inRepo (toTopFilePath $ keyFilename source)
    return $ mkKey $ \k -> k
        { keyName = genKeyName relf
        , keyVariety = WORMKey
        , keySize = Just sz
        , keyMtime = Just $ modificationTime stat
        }
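
{- Illustrative sketch, not part of git-annex: how the fields that keyValue
 - fills in could appear in a serialized key, assuming the usual git-annex
 - layout VARIETY-sSIZE-mMTIME--NAME. sketchFormatKey and its plain String
 - and Integer arguments are hypothetical stand-ins for the real Key type
 - and its serializer.
 -}
sketchFormatKey :: String -> Maybe Integer -> Maybe Integer -> String -> String
sketchFormatKey variety size mtime name =
    variety ++ field 's' size ++ field 'm' mtime ++ "--" ++ name
  where
    -- A field is only written when its value is known.
    field c = maybe "" (\n -> '-' : c : show n)
-- Under these assumptions,
-- sketchFormatKey "WORM" (Just 3) (Just 1282057862) "foo"
-- yields "WORM-s3-m1282057862--foo".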

{- Old WORM keys could contain spaces and carriage returns,
 - and can be upgraded to remove them. -}
needsUpgrade :: Key -> Bool
needsUpgrade key = any (`S8.elem` fromKey keyName key) [' ', '\r']

removeProblemChars :: Key -> Backend -> AssociatedFile -> Annex (Maybe Key)
removeProblemChars oldkey newbackend _
    | migratable = return $ Just $ alterKey oldkey $ \d -> d
        { keyName = encodeBS $ reSanitizeKeyName $ decodeBS $ keyName d }
    | otherwise = return Nothing
  where
    migratable = oldvariety == newvariety
    oldvariety = fromKey keyVariety oldkey
    newvariety = backendVariety newbackend
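
{- Illustrative sketch, not part of git-annex: the general idea behind
 - escaping problem characters in a key name, which is what
 - removeProblemChars relies on reSanitizeKeyName for. The set of characters
 - kept as-is and the escape form (a comma followed by the character's
 - decimal code) are assumptions made for this example and may differ from
 - the real rules, which are not shown in this file. sketchEscapeName is a
 - hypothetical name.
 -}
sketchEscapeName :: String -> String
sketchEscapeName = concatMap escape
  where
    escape c
        | c `elem` keep = [c]
        -- The escape character itself is doubled so it stays distinguishable.
        | c == ',' = ",,"
        -- Anything else, such as ' ', '\r', '?' or '*', becomes ",<code>".
        | otherwise = ',' : show (fromEnum c)
    keep = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ "._-/"
-- Under these assumptions, sketchEscapeName "a b?.txt" yields "a,32b,63.txt".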