git-annex/Annex
Joey Hess f617988a29
Make import --deduplicate and --skip-duplicates only hash once, not twice
import: --deduplicate and --skip-duplicates were implemented inefficiently;
they unnecessarily hashed each file twice. They have been improved to only
hash once.

The new approach is to lock down (minimally) and hash files, and then
reuse that information when importing them.
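The hash-once approach described above can be sketched in Haskell, the language git-annex is written in. This is a minimal, hypothetical model, not git-annex's code. The names `Key`, `hashContent`, `Mode`, `Action` and `plan` are invented for illustration. FNV-1a stands in for a real backend such as SHA256. The sketch also assumes that a file imported earlier in the same run counts as already known. Each file is hashed exactly once. The resulting key settles the duplicate check and travels with the `Import` action, so ingest need not hash again.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', mapAccumL)
import Data.Word (Word64)
import qualified Data.Set as S

-- Hypothetical stand-in for a content key. git-annex derives real keys
-- from backends such as SHA256; 64-bit FNV-1a is used here only to stay
-- within base.
newtype Key = Key Word64 deriving (Eq, Ord, Show)

hashContent :: String -> Key
hashContent = Key . foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

data Mode = Deduplicate | SkipDuplicates deriving (Eq, Show)

-- What the hypothetical importer decides to do with one source file.
data Action
  = Import FilePath Key -- ingest, reusing the key computed up front
  | Skip FilePath       -- duplicate left in place (--skip-duplicates)
  | Delete FilePath     -- duplicate source removed (--deduplicate)
  deriving (Eq, Show)

-- Hash each (file, content) pair exactly once. The same key answers
-- "is this a duplicate?" and, if not, is carried into the Import action.
-- Assumption: keys imported earlier in this run are added to the known set.
plan :: Mode -> S.Set Key -> [(FilePath, String)] -> [Action]
plan mode known0 = snd . mapAccumL step known0
  where
    step known (file, content)
      | key `S.member` known = (known, duplicate file)
      | otherwise = (S.insert key known, Import file key)
      where
        key = hashContent content
    duplicate = case mode of
      Deduplicate -> Delete
      SkipDuplicates -> Skip
```

Suppose the only known key is for the content `old`, and the files are a=`new`, b=`old`, c=`new`. Then `plan Deduplicate` imports `a` and marks `b` and `c` for deletion, while `plan SkipDuplicates` imports `a` and leaves `b` and `c` in place.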

This was rather tricky, especially in detecting changes to files while
they are being imported.
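One way to notice a file changing between hashing and import is to record a cheap status fingerprint when the file is locked down. That fingerprint is re-checked before the cached key is trusted. The sketch below assumes this approach. `Fingerprint`, `LockedDown` and `ingestCached` are hypothetical names. A size-plus-mtime fingerprint is a simplification; a real implementation would likely record more, such as the inode number.

```haskell
-- Hypothetical snapshot of what a minimal lockdown might record:
-- just enough file status to notice a later modification.
data Fingerprint = Fingerprint
  { fpSize :: Integer
  , fpMtime :: Integer
  } deriving (Eq, Show)

-- Key computed when the file was first hashed (hypothetical String key).
type Key = String

-- Information gathered once, up front, and reused at import time.
data LockedDown = LockedDown
  { ldFile :: FilePath
  , ldFingerprint :: Fingerprint
  , ldKey :: Key
  } deriving (Show)

-- At import time, re-read the file's status. The cached key is trusted
-- only if the status is unchanged. Otherwise the file was modified after
-- it was hashed, and the cached key may no longer match its content.
ingestCached :: (FilePath -> IO Fingerprint) -> LockedDown -> IO (Either String Key)
ingestCached stat ld = do
  now <- stat (ldFile ld)
  pure $
    if now == ldFingerprint ld
      then Right (ldKey ld)
      else Left (ldFile ld ++ " changed while being imported")
```

The stat function is passed in so the check can be exercised without touching the filesystem. In real use it could wrap `System.Directory.getFileSize` and `getModificationTime`, with the result converted to the `Integer` fields.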

The output of import changed slightly. Before, it silently skipped over
files with e.g. --skip-duplicates; now it shows each file as it starts
to act on it. Since every file is hashed first thing, it would otherwise
not be clear which file import is chewing on. (Actually, it wasn't clear
before either, when any of the duplicate switches were used.)

This commit was sponsored by Alexander Thompson on Patreon.
2017-02-09 15:32:22 -04:00
Branch Unneeded constraint 2016-01-28 12:34:07 -04:00
Content Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
LockPool clarify 2016-03-01 16:22:47 -04:00
MetaData update my email address and homepage url 2015-01-21 12:50:09 -04:00
View remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Action.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
AdjustedBranch.hs Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. 2016-11-15 21:29:54 -04:00
AutoMerge.hs Fix bad automatic merge conflict resolution between an annexed file and a directory with the same name when in an adjusted branch. 2016-06-07 12:53:35 -04:00
BloomFilter.hs Another redundant constraint 2016-01-28 12:34:07 -04:00
Branch.hs config: New command for storing configuration in the git-annex branch. 2017-01-30 16:46:38 -04:00
BranchState.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
CatFile.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
ChangedRefs.hs make tor hidden service work when directory watching is not available 2016-12-09 16:40:47 -04:00
CheckAttr.hs annex.largefiles can be configured in .gitattributes too 2016-02-02 15:18:17 -04:00
CheckIgnore.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Common.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Concurrent.hs Sped up git-annex add in direct mode and v6 by using git hash-object --batch. 2016-03-14 15:58:46 -04:00
Content.hs Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. 2016-11-15 21:29:54 -04:00
Difference.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Direct.hs Some optimisations to string splitting code. 2017-01-31 19:06:22 -04:00
DirHashes.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
Drop.hs --branch, stage 2 2016-07-20 15:23:43 -04:00
Environment.hs also avoid crashing in most circumstances if unable to determine the username 2016-06-08 15:04:15 -04:00
FileMatcher.hs Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. 2016-11-15 21:29:54 -04:00
Fixup.hs avoid warnings about not exported System.Directory.isSymbolicLink 2016-04-28 15:18:11 -04:00
GitOverlay.hs Optimisations to git-annex branch query and setting, avoiding repeated copies of the environment. 2016-09-29 13:36:48 -04:00
HashObject.hs Sped up git-annex add in direct mode and v6 by using git hash-object --batch. 2016-03-14 15:58:46 -04:00
Hook.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Ingest.hs Make import --deduplicate and --skip-duplicates only hash once, not twice 2017-02-09 15:32:22 -04:00
Init.hs Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. 2016-11-15 21:29:54 -04:00
InodeSentinal.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Journal.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
Link.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
Locations.hs Revert ServerAliveInterval 2016-12-13 12:12:38 -04:00
LockFile.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
LockPool.hs pid locking configuration and abstraction layer for git-annex 2015-11-12 17:50:34 -04:00
MakeRepo.hs Use git-annex init --version=6 to get v6 for now 2015-12-15 17:17:13 -04:00
MetaData.hs Added metadata --batch option, which allows getting, setting, deleting, and modifying metadata for multiple files/keys. 2016-07-27 10:46:25 -04:00
Notification.hs plumb associated files through P2P protocol for updating transfer logs 2016-12-02 16:42:54 -04:00
NumCopies.hs handle SomeAsyncException same as AsyncException 2016-06-20 10:31:47 -04:00
Path.hs Fix bug introduced in the last release that broke git-annex sync when git-annex was installed from the standalone tarball. 2015-03-27 12:55:18 -04:00
Perms.hs fsck: Warn when core.sharedRepository is set and an annex object file's write bit is not set and cannot be set due to the file being owned by a different user. 2016-04-14 15:36:53 -04:00
Queue.hs withAltRepo needs a separate queue of changes 2016-06-03 13:57:00 -04:00
Quvi.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
ReplaceFile.hs Windows: Fix an over-long temp directory name. 2016-05-06 12:49:41 -04:00
SpecialRemote.hs add SetupStage parameter to RemoteType.setup 2017-02-07 14:55:58 -04:00
Ssh.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
TaggedPush.hs Some optimisations to string splitting code. 2017-01-31 19:06:22 -04:00
Transfer.hs update progress logs in remotedaemon send/receive 2016-12-08 19:56:02 -04:00
Url.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
UUID.hs When built with uuid-1.3.12, generate more random UUIDs than before 2016-07-27 07:46:08 -04:00
VariantFile.hs Always use filesystem encoding for all file and handle reads and writes. 2016-12-24 14:46:31 -04:00
Version.hs Support using v3 repositories without upgrading them to v5. 2016-10-05 16:53:09 -04:00
View.hs Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. 2016-11-15 21:29:54 -04:00
Wanted.hs remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
WorkTree.hs upgrade: Handle upgrade to v6 when the repository already contains v6 unlocked files whose content is already present. 2016-10-17 15:19:47 -04:00