Add and use numcopiesneeded preferred content expression.

* Add numcopiesneeded preferred content expression.
* Client, transfer, incremental backup, and archive repositories
  now want to get content that does not yet have enough copies.
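
The second item comes down to a change in the suffix that the standard groups append to their preferred content expressions. A minimal sketch of that change follows. The names `lastResortOld` and `lastResortNew` are illustrative only (the source has a single `lastResort`), and `PreferredContentExpression` is assumed here to be a plain `String`:

```haskell
-- Assumption: in git-annex this is a type synonym for String.
type PreferredContentExpression = String

-- Before this commit: also want content that has no copy on any
-- semitrusted or better repository.
lastResortOld :: String -> PreferredContentExpression
lastResortOld s = "(" ++ s ++ ") or (not copies=semitrusted+:1)"

-- After this commit: also want content that still needs at least
-- one more copy to satisfy numcopies.
lastResortNew :: String -> PreferredContentExpression
lastResortNew s = "(" ++ s ++ ") or numcopiesneeded=1"

main :: IO ()
main = do
    putStrLn (lastResortOld "include=*")
    putStrLn (lastResortNew "include=*")
```

The base expression passed in (`include=*` here) is just a placeholder; each group supplies its own.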

This means the assistant will make copies of files that don't yet
meet the configured numcopies, even to places that would not normally want
the file.

For example, if numcopies is 4, and there are 2 client repos and
2 transfer repos, and 2 removable backup drives, the file will be sent
to both transfer repos in order to make 4 copies. Once a removable drive
gets a copy of the file, it will be dropped from one transfer repo or the
other (but not both).
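
The arithmetic behind this example can be sketched as below. This is a simplified, pure model of the check in `limitNumCopiesNeeded`, not the real code: repositories are plain strings, trust levels are ignored, and the repository names are made up. It also assumes that, when a repository considers dropping a file, it is passed in `notpresent` so that its own copy is not counted.

```haskell
-- Hypothetical stand-in for repository UUIDs.
type Repo = String

-- True when at least `needed` more copies are required to reach
-- `numcopies`, not counting copies on repositories in `notpresent`.
numCopiesNeeded :: Int -> [Repo] -> [Repo] -> Int -> Bool
numCopiesNeeded numcopies locations notpresent needed =
    numcopies - length present >= needed
  where
    present = filter (`notElem` notpresent) locations

main :: IO ()
main = do
    let clients = ["client1", "client2"]
        transfers = ["transfer1", "transfer2"]
    -- Only the clients have the file: 4 - 2 = 2, so transfer repos want it.
    print (numCopiesNeeded 4 clients [] 1)
    -- Both transfer repos have it too: 4 - 4 = 0, nothing more is needed.
    print (numCopiesNeeded 4 (clients ++ transfers) [] 1)
    -- A drive gets a copy; transfer1 considers dropping: 4 - 4 = 0, may drop.
    print (numCopiesNeeded 4 (clients ++ transfers ++ ["drive1"]) ["transfer1"] 1)
    -- transfer1 has dropped; transfer2 considers dropping: 4 - 3 = 1, keeps it.
    print (numCopiesNeeded 4 (clients ++ ["transfer2", "drive1"]) ["transfer2"] 1)
```

Under these assumptions the four checks print `True`, `False`, `False`, `True`, which matches the "one transfer repo or the other (but not both)" outcome.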

Another example: numcopies is 3, and there is a client that has a removable
backup drive and two small archive repos. Normally once one of the small
archives has a file, it will not be put into the other one. But, to satisfy
numcopies, the assistant will duplicate it into the other small archive
too, if the backup repo is not available to receive the file.

I notice that these examples are fairly unlikely setups .. the old behavior
was not too bad, but it's nice to finally have it really correct.

.. Almost. I have skipped checking the annex.numcopies .gitattributes
out of fear it will be too slow.

This commit was sponsored by Florian Schlegel.
Joey Hess 2014-01-20 17:34:58 -04:00
parent 5ddbd24a1c
commit 3159da2693
8 changed files with 52 additions and 9 deletions


@ -70,6 +70,7 @@ parseToken checkpresent checkpreferreddir groupmap t
[ ("include", limitInclude)
, ("exclude", limitExclude)
, ("copies", limitCopies)
, ("numcopiesneeded", limitNumCopiesNeeded)
, ("inbackend", limitInBackend)
, ("largerthan", limitSize (>))
, ("smallerthan", limitSize (<))


@ -41,6 +41,8 @@ options = Option.common ++
"match files present in a remote"
, Option ['C'] ["copies"] (ReqArg Limit.addCopies paramNumber)
"skip files with fewer copies"
, Option [] ["numcopiesneeded"] (ReqArg Limit.addNumCopiesNeeded paramNumber)
"match files that need more copies"
, Option ['B'] ["inbackend"] (ReqArg Limit.addInBackend paramName)
"match files using a key-value backend"
, Option [] ["inallgroup"] (ReqArg Limit.addInAllGroup paramGroup)


@ -1,6 +1,6 @@
{- user-specified limits on files to act on
-
- Copyright 2011-2013 Joey Hess <joey@kitenet.net>
- Copyright 2011-2014 Joey Hess <joey@kitenet.net>
-
- Licensed under the GNU GPL version 3 or higher.
-}
@ -23,6 +23,7 @@ import qualified Backend
import Annex.Content
import Annex.UUID
import Logs.Trust
import Logs.NumCopies
import Types.TrustLevel
import Types.Key
import Types.Group
@ -177,6 +178,30 @@ limitCopies want = case split ":" want of
| "+" `isSuffixOf` s = (>=) <$> readTrustLevel (beginning s)
| otherwise = (==) <$> readTrustLevel s
{- Adds a limit to match files that need more copies made.
-
- Does not look at annex.numcopies .gitattributes, because that
- would require querying git check-attr every time a preferred content
- expression is checked, which would probably be quite slow.
-}
addNumCopiesNeeded :: String -> Annex ()
addNumCopiesNeeded = addLimit . limitNumCopiesNeeded

limitNumCopiesNeeded :: MkLimit
limitNumCopiesNeeded want = case readish want of
    Just needed -> Right $ \notpresent -> checkKey $
        handle needed notpresent
    Nothing -> Left "bad value for numcopiesneeded"
  where
    handle needed notpresent key = do
        gv <- getGlobalNumCopies
        case gv of
            Nothing -> return False
            Just numcopies -> do
                us <- filter (`S.notMember` notpresent)
                    <$> (trustExclude UnTrusted =<< Remote.keyLocations key)
                return $ numcopies - length us >= needed

{- Adds a limit to skip files not believed to be present in all
- repositories in the specified group. -}
addInAllGroup :: String -> Annex ()


@ -93,6 +93,6 @@ notArchived :: String
notArchived = "not (copies=archive:1 or copies=smallarchive:1)"
{- Most repositories want any content that is only on untrusted
- or dead repositories. -}
- or dead repositories, or that otherwise does not have enough copies. -}
lastResort :: String -> PreferredContentExpression
lastResort s = "(" ++ s ++ ") or (not copies=semitrusted+:1)"
lastResort s = "(" ++ s ++ ") or numcopiesneeded=1"

debian/changelog

@ -14,6 +14,9 @@ git-annex (5.20140118) UNRELEASED; urgency=medium
command is used to set the global number of copies, any annex.numcopies
git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.
* Add numcopiesneeded preferred content expression.
* Client, transfer, incremental backup, and archive repositories
now want to get content that does not yet have enough copies.
-- Joey Hess <joeyh@debian.org> Sat, 18 Jan 2014 11:54:17 -0400


@ -1020,6 +1020,15 @@ file contents are present at either of two repositories.
copies, on remotes in the specified group. For example,
`--copies=archive:2`
* `--numcopiesneeded=number`
Matches only files that git-annex believes need the specified number or
more additional copies to be made in order to satisfy their numcopies
setting, as configured by the global numcopies setting of the repository.
Note that for various reasons, including speed, this does not look
at the annex.numcopies .gitattributes settings of files.
* `--inbackend=name`
Matches only files whose content is stored using the specified key-value


@ -113,7 +113,7 @@ any repository that can will back it up.)
All content is preferred, unless it's for a file in a "archive" directory,
which has reached an archive repository.
`((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) or (not copies=semitrusted+:1)`
`((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) or numcopiesneeded=1`
### transfer
@ -147,20 +147,20 @@ All content is preferred.
Only prefers content that's not already backed up to another backup
or incremental backup repository.
`(include=* and (not copies=backup:1) and (not copies=incrementalbackup:1)) or (not copies=semitrusted+:1)`
`(include=* and (not copies=backup:1) and (not copies=incrementalbackup:1)) or numcopiesneeded=1`
### small archive
Only prefers content that's located in an "archive" directory, and
only if it's not already been archived somewhere else.
`((include=*/archive/* or include=archive/*) and not (copies=archive:1 or copies=smallarchive:1)) or (not copies=semitrusted+:1)`
`((include=*/archive/* or include=archive/*) and not (copies=archive:1 or copies=smallarchive:1)) or numcopiesneeded=1`
### full archive
All content is preferred, unless it's already been archived somewhere else.
`(not (copies=archive:1 or copies=smallarchive:1)) or (not copies=semitrusted+:1)`
`(not (copies=archive:1 or copies=smallarchive:1)) or numcopiesneeded=1`
Note that if you want to archive multiple copies (not a bad idea!),
you should instead configure all your archive repositories with a


@ -54,9 +54,12 @@ Conclusion:
* Add "numcopiesneeded=N" preferred content expression using the git-annex
branch numcopies setting, overridden by any .gitattributes numcopies setting
for a particular file. It should ignore the other ways to specify
numcopies.
numcopies, particularly git config annex.numcopies. **done**
* Make the repo groups that currently end with "or (not copies=semitrusted+:1)"
to instead end with "or numcopiesneeded=1"
to instead end with "or numcopiesneeded=1" **done**
* See if "numcopiesneeded=N" can check .gitattributes without getting
a lot slower. If not, perhaps add a "numcopiesneededaccurate=N" that
checks it.
## Stability analysis