import: Changed how --deduplicate, --skip-duplicates, and --clean-duplicates determine if a file is a duplicate

Before, only content known to be present somewhere was considered a
duplicate. Now, any content that has been annexed before will be considered
a duplicate, even if all annexed copies of the data have been lost.
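
The difference between the two rules can be sketched with a toy model. This is
not git-annex's implementation: the types, the association-list "location log",
and the names isDuplicateOld, isDuplicateNew, and exampleLog are all
hypothetical, chosen only to illustrate the behaviour described above.

```haskell
-- Toy stand-ins; these are assumptions, not git-annex's real types.
type Key = String
type RepoName = String

-- For each key that has ever been annexed, the repositories currently
-- recorded as holding it. An empty list models content whose copies
-- have all been dropped or lost.
type LocationLog = [(Key, [RepoName])]

-- Old rule: a duplicate only if some repository is still recorded
-- as containing the content.
isDuplicateOld :: LocationLog -> Key -> Bool
isDuplicateOld locs k = maybe False (not . null) (lookup k locs)

-- New rule: a duplicate if the key has ever been recorded at all,
-- whether or not any copy remains.
isDuplicateNew :: LocationLog -> Key -> Bool
isDuplicateNew locs k = maybe False (const True) (lookup k locs)

exampleLog :: LocationLog
exampleLog =
  [ ("present", ["here", "backup"])
  , ("lost", [])
  ]
```

In this model the two rules agree on "present" and on a key that was never
annexed; they differ only on "lost", which the new rule treats as a duplicate.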

Note that --clean-duplicates and --deduplicate still check numcopies,
so they won't delete a duplicate file unless an annexed copy still exists.
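
As a rough illustration of that safeguard, here is a simplified decision
function. It is only a sketch under stated assumptions: Outcome and
cleanDuplicate are hypothetical names, and how git-annex actually counts and
verifies copies is not modelled here.

```haskell
-- Hypothetical outcome type; not part of git-annex.
data Outcome = DeleteSource | KeepSource
  deriving (Eq, Show)

-- Simplified model: a file in the import source is deleted only when
-- it is a known duplicate AND enough annexed copies are verified to
-- exist. Otherwise the source file is left alone.
cleanDuplicate
  :: Int   -- numcopies setting (assumed to be at least 1)
  -> Int   -- number of annexed copies that could be verified
  -> Bool  -- is the content a known duplicate?
  -> Outcome
cleanDuplicate numcopies verifiedCopies known
  | known && verifiedCopies >= numcopies = DeleteSource
  | otherwise = KeepSource
```

Under this model, a duplicate whose annexed copies have all been lost
(verifiedCopies is 0) is recognised as a duplicate but is still kept.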

This makes import use the same method as reinject --known.

The man page already said that duplicate meant "its content is either
present in the local repository already, or git-annex knows of another
repository that contains it, or it was present in the annex before but has
been removed now". So, this is really only bringing the implementation into
line with the man page.

This commit was sponsored by Jochen Bartl on Patreon.
Joey Hess 2017-02-07 17:35:51 -04:00
parent a8e64d4148
commit e7e36b6e72
GPG key ID: C910D9222512E3C7
3 changed files with 13 additions and 3 deletions

@@ -34,6 +34,15 @@ git-annex (6.20170102) UNRELEASED; urgency=medium
     UUID for the new special remote, instead of generating a UUID.
     This can be useful in some situations, eg when the same data can be
     accessed via two different special remote backends.
+  * import: Changed how --deduplicate, --skip-duplicates, and
+    --clean-duplicates determine if a file is a duplicate.
+    Before, only content known to be present somewhere was considered
+    a duplicate. Now, any content that has been annexed before will be
+    considered a duplicate, even if all annexed copies of the data have
+    been lost.
+    Note that --clean-duplicates and --deduplicate still check
+    numcopies, so won't delete duplicate files unless there's an annexed
+    copy.
 
  -- Joey Hess <id@joeyh.name> Fri, 06 Jan 2017 15:22:06 -0400

@@ -18,6 +18,7 @@ import Types.KeySource
 import Annex.CheckIgnore
 import Annex.NumCopies
 import Annex.FileMatcher
+import Logs.Location
 
 cmd :: Command
 cmd = withGlobalOptions (jobsOption : jsonOption : fileMatchingOptions) $ notBareRepo $
@@ -136,7 +137,7 @@ start largematcher mode (srcfile, destfile) =
     let ks = KeySource srcfile srcfile Nothing
     v <- genKey ks backend
     case v of
-        Just (k, _) -> ifM (not . null <$> keyLocations k)
+        Just (k, _) -> ifM (isKnownKey k)
             ( return (maybe Nothing (\a -> Just (a k)) dupa)
             , return notdupa
             )

@@ -366,8 +366,8 @@ test_import = intmpclonerepo $ Utility.Tmp.withTmpDir "importtest" $ \importdir
     git_annex "drop" ["--force", imported1, imported2, imported5] @? "drop failed"
     annexed_notpresent_imported imported2
     (toimportdup, importfdup, importeddup) <- mktoimport importdir "importdup"
-    git_annex "import" ["--clean-duplicates", toimportdup]
-        @? "import of missing duplicate with --clean-duplicates failed"
+    not <$> git_annex "import" ["--clean-duplicates", toimportdup]
+        @? "import of missing duplicate with --clean-duplicates failed to fail"
     checkdoesnotexist importeddup
     checkexists importfdup
   where