improve import duplicate docs

Joey Hess 2015-03-26 11:44:20 -04:00
parent b4fb5eac3b
commit e1b8853174
2 changed files with 43 additions and 11 deletions

@ -13,11 +13,18 @@ the annex. Individual files to import can be specified.
If a directory is specified, the entire directory is imported.
git annex import /media/camera/DCIM/*
-By default, importing two files with the same contents from two different
-locations will result in both files being added to the repository.
-(With all checksumming backends, including the default SHA256E,
-only one copy of the data will be stored.)
+When importing files, there's a possibility of importing a duplicate
+of a file that is already known to git-annex -- its content is either
+present in the local repository already, or git-annex knows of another
+repository that contains it.
+By default, importing a duplicate of a known file will result in
+a new filename being added to the repository, so the duplicate file
+is present in the repository twice. (With all checksumming backends,
+including the default SHA256E, only one copy of the data will be stored.)
+Several options can be used to adjust handling of duplicate files.
# OPTIONS
@ -32,19 +39,18 @@ only one copy of the data will be stored.)
* `--deduplicate`
-Only import files whose content has not been seen before by git-annex.
-Duplicate files will be deleted from the import location.
+Only import files that are not duplicates;
+duplicate files will be deleted from the import location.
* `--skip-duplicates`
-Only import files whose content has not been seen before by git-annex,
-but avoid deleting duplicate files.
+Only import files that are not duplicates; and avoid deleting
+duplicate files from the import location.
* `--clean-duplicates`
Does not import any files, but any files found in the import location
-that are duplicates of content in the annex are deleted.
+that are duplicates are deleted.
* file matching options
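
The duplicate-handling options described in the hunk above can be read as a small decision table. The Python sketch below only models the documented behaviour; it is not how git-annex implements it. The names `Action` and `plan_import` are hypothetical and introduced purely for illustration, and the sketch says nothing about what happens to the source of a file that is actually imported.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """Outcome for one file found in the import location (illustrative model)."""
    imported: bool              # a filename is added to the repository
    deleted_as_duplicate: bool  # the file is deleted because it is a duplicate


def plan_import(option: Optional[str], is_duplicate: bool) -> Action:
    """Model the documented effect of each duplicate-handling option.

    `option` is None for the default behaviour, or one of the three
    option names from the documentation.
    """
    if option is None:
        # Default: duplicates are imported too, under a new filename.
        return Action(imported=True, deleted_as_duplicate=False)
    if option == "--deduplicate":
        # Import only non-duplicates; delete duplicates from the import location.
        return Action(imported=not is_duplicate, deleted_as_duplicate=is_duplicate)
    if option == "--skip-duplicates":
        # Import only non-duplicates; leave duplicates where they are.
        return Action(imported=not is_duplicate, deleted_as_duplicate=False)
    if option == "--clean-duplicates":
        # Import nothing; only delete duplicates from the import location.
        return Action(imported=False, deleted_as_duplicate=is_duplicate)
    raise ValueError(f"unknown option: {option!r}")
```

For example, `plan_import("--skip-duplicates", is_duplicate=True)` yields an action that neither imports nor deletes the file, which is the only difference from `--deduplicate`.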

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-03-26T15:28:45Z"
content="""
Well, you've found an edge case here.
It behaves as documented as long as the file being imported is located in some
repository known to git-annex. The file content does not have to be present in
the local repository for it to behave as documented.
In your case, the file being imported has a symlink in the git repo, but
git-annex knows about 0 annexed copies of the file, so it's treated as
if it's a new file and not a duplicate.
Since import is working at the key level, there's not a good way to look up
that there are some symlinks in the git repo even though the content is
gone. And even if there was, I think I'd be uncomfortable with it deleting
the file as "duplicate" when its content is not available in any known
repository. The only behavior improvement might be to import the content
but not make a redundant symlink in this case.
I think it's best to change the documentation. I've added a new
paragraph that more exactly and clearly explains what duplicate files
are for the purposes of importing.
"""]]
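
The edge case in the comment above comes down to how "duplicate" is decided. The following Python sketch is a rough model of that rule as the comment describes it, not git-annex's actual code. `KeyInfo`, `known_copies`, `symlink_in_git`, and `is_duplicate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeyInfo:
    """What is known about one key at import time (illustrative model)."""
    known_copies: int             # repositories known to hold the content
    symlink_in_git: bool = False  # some file in the git repo points at this key


def is_duplicate(info: KeyInfo) -> bool:
    """A file counts as a duplicate only if its content is known to exist
    in at least one repository (local or not).

    A leftover symlink in the git repo is deliberately ignored, because the
    import works at the key level and the content may no longer exist anywhere.
    """
    return info.known_copies > 0


# Content present only in some other known repository: still a duplicate.
remote_only = KeyInfo(known_copies=1)

# The reported edge case: a symlink exists, but zero known copies remain,
# so the file is treated as new rather than as a duplicate.
dangling = KeyInfo(known_copies=0, symlink_in_git=True)
```

Under this model `is_duplicate(remote_only)` is true and `is_duplicate(dangling)` is false, which matches the comment's explanation of why the file was imported instead of being deleted as a duplicate.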