Commit graph

40496 commits

Author SHA1 Message Date
Joey Hess
c380687aa3
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-08 11:13:09 -04:00
Joey Hess
7f742589f9
claw back annexed file scan speedup
Following commit c941ab6f5b, this avoids
the second, redundant scan when annex.thin is not set.

The benchmark now runs in 35.5 seconds, down from 40 seconds.

Note that the inode cache of the annex object has to be passed to
addInodeCaches now, because it might not already be in the inode caches,
unlike previously.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 11:09:15 -04:00
Joey Hess
ec1f2f246b
improve comment
remove obsolete part about a commit preventing it seeing changes
2021-06-08 10:43:48 -04:00
yarikoptic
62758ffb9f Added a comment: slow down is OSX specific 2021-06-08 14:28:18 +00:00
Joey Hess
d12120739d
comment 2021-06-08 10:19:04 -04:00
Joey Hess
2125367f3f
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-08 09:42:57 -04:00
Joey Hess
c941ab6f5b
avoid double work in git-annex init, second try
reconcileStaged populates the db, so scanAnnexedFiles does not need to
do it again. It still makes a pass over the HEAD tree, but populating
the db was most of the expensive part.

Benchmarking with 100,000 files, git-annex init now takes 40 seconds,
vs 37 seconds with the old, buggy version of this fix. It should be
possible to win those 3 precious seconds per 100k files back, in the
case when annex.thin is not set, with improvements to reconcileStaged
that avoid needing this second pass.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 09:36:53 -04:00
Joey Hess
22185b4a4e
stop using addAssociatedFileFast
Use addAssociatedFile instead; after recent optimisations it seems just
as fast.
2021-06-08 09:23:28 -04:00
Joey Hess
2cb7b7b336
Revert "avoid double work in git-annex init"
This reverts commit 0f10f208a7.

The implementation of this turns out to be unsafe; it can lead to a keys
db deadlock. scanAnnexedFiles injects a call to inAnnex into
reconcileStaged, but inAnnex sometimes needs to read from the keys db,
which will try to re-open it when it's in the process of being opened.
The exclusive lock of gitAnnexKeysDbLock will then deadlock.

This needs to be done in some other way...
2021-06-08 09:11:24 -04:00
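The deadlock described in the revert above (2cb7b7b336) can be illustrated with a small Python sketch. It is not git-annex's Haskell code: `KeysDb`, `open`, `read` and the `during_open` hook are hypothetical names, and a non-blocking `threading.Lock` stands in for the exclusive gitAnnexKeysDbLock so that the sketch reports the problem instead of hanging.

```python
import threading


class KeysDb:
    """Hypothetical stand-in for a database guarded by an exclusive lock."""

    def __init__(self):
        self._lock = threading.Lock()  # exclusive and non-reentrant
        self.opened = False

    def open(self, during_open=None):
        # A blocking acquire here would hang forever on re-entry;
        # the non-blocking form turns the deadlock into an error.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("would deadlock: db is already being opened")
        try:
            if during_open is not None:
                # Work injected into the open step (the role played by
                # the inAnnex call in the commit message).
                during_open(self)
            self.opened = True
        finally:
            self._lock.release()

    def read(self):
        # Reading from a db that is not open yet tries to open it.
        if not self.opened:
            self.open()
        return "row"
```

Calling `KeysDb().open(during_open=lambda db: db.read())` raises the `RuntimeError`, because the injected read tries to re-open the database while the outer open still holds the lock. Opening without the hook and reading afterwards works as expected.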
Joey Hess
c831a562f5
faster associated file replacement with upsert
Rather than first deleting and then inserting, upsert lets the key
associated with a file be updated in place.

Benchmarked with 100,000 files, and an empty keys database, running
reconcileStaged. It improved from 47 seconds to 34 seconds.
So this got reconcileStaged to be as fast as scanAssociatedFiles,
or faster -- scanAssociatedFiles benchmarks at 37 seconds.

(Also checked for other users of deleteWhere that could be sped up by
upsert. There are a couple, but they are not in performance critical
code paths, eg recordExportTreeCurrent is only run once per tree
export.)

I would have liked to rename FileKeyIndex to FileKeyUnique since it is
being used as a uniqueness constraint now, not just to get an index.
But, that gets converted into part of the SQL schema, and the name
is used by the upsert, so it can't be changed.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 07:53:36 -04:00
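The difference between delete-then-insert and upsert in commit c831a562f5 can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table name `associated`, its columns, and the uniqueness constraint on `file` alone are a simplified, hypothetical schema, not the one git-annex generates, and the helper names are invented for the example. SQLite 3.24 or newer is assumed for the `ON CONFLICT ... DO UPDATE` syntax.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema: one key per file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE associated (key TEXT NOT NULL, file TEXT NOT NULL UNIQUE)"
    )
    return conn


def replace_delete_insert(conn, file, key):
    # Old approach: two statements per file.
    conn.execute("DELETE FROM associated WHERE file = ?", (file,))
    conn.execute("INSERT INTO associated (key, file) VALUES (?, ?)", (key, file))


def replace_upsert(conn, file, key):
    # Upsert: insert, or update the existing row in place when the
    # uniqueness constraint on file is hit.
    conn.execute(
        "INSERT INTO associated (key, file) VALUES (?, ?) "
        "ON CONFLICT(file) DO UPDATE SET key = excluded.key",
        (key, file),
    )
```

Both helpers leave the table in the same state; the upsert simply gets there with a single statement per file, which is the kind of saving the benchmark in the commit message measures.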
yarikoptic
57b567ac87 Added a comment 2021-06-07 21:39:05 +00:00
yarikoptic
2ffb9cc01b Added a comment: clarification 2021-06-07 21:20:35 +00:00
Joey Hess
e9a8b48a52
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-07 17:02:15 -04:00
Joey Hess
2467de4f9b
todo 2021-06-07 16:58:35 -04:00
Joey Hess
0f10f208a7
avoid double work in git-annex init
reconcileStaged was doing a redundant scan to scanAnnexedFiles.

It would probably make sense to move the body of scanAnnexedFiles
into reconcileStaged; the separation does not really serve any purpose.

Sponsored-by: Dartmouth College's Datalad project
2021-06-07 16:50:14 -04:00
Joey Hess
6ceb31a30a
optimise reconcileStaged with git cat-file streaming
Commit 428c91606b made it need to do more
work in situations like switching between very different branches.

Compare with seekFilteredKeys which has a similar optimisation. Might be
possible to factor out the common part from these?

Sponsored-by: Dartmouth College's Datalad project
2021-06-07 15:26:48 -04:00
Joey Hess
70dbe61fc2
remove unnecessary liftIO 2021-06-07 14:51:12 -04:00
Ilya_Shlyakhter
bdf3c06401 Added a comment: deferring the scan 2021-06-07 17:41:45 +00:00
Joey Hess
570e93abfd
comment 2021-06-07 13:28:36 -04:00
Joey Hess
1c35cacf8e
fix link 2021-06-07 13:06:16 -04:00
Joey Hess
b960ebf1b3
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-07 12:59:21 -04:00
Joey Hess
0101363eb8
correctly update keys db in merge conflict
This is quite a subtle edge case, see the bug report for full details.

The second git diff is needed only when there's a merge conflict.
It would be possible to speed it up marginally by using
--diff-filter=Unmerged, but probably not enough to bother with.

Sponsored-by: Graham Spencer on Patreon
2021-06-07 12:52:36 -04:00
Ilya_Shlyakhter
4d581ad6b4 Added a comment: deferring the keys-to-files scan 2021-06-07 16:11:01 +00:00
Joey Hess
da24034331
comment 2021-06-07 11:53:25 -04:00
Joey Hess
a0bba3afad
comment 2021-06-07 11:49:28 -04:00
Joey Hess
3aabaa1a2b
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-07 11:47:13 -04:00
Joey Hess
254199edc9
comment 2021-06-07 11:46:47 -04:00
Ilya_Shlyakhter
5359f8bc14 added suggestion to match keys by file extension in the key 2021-06-07 15:08:51 +00:00
Ilya_Shlyakhter
def7e001c6 Added a comment: keeping connected files together 2021-06-07 14:45:35 +00:00
jwodder
8822bf0803 2021-06-07 14:32:25 +00:00
Ilya_Shlyakhter
58a07c6a42 removed 2021-06-07 14:31:52 +00:00
Ilya_Shlyakhter
da98ede56a Added a comment: specifying preferred content by metadata 2021-06-07 14:26:57 +00:00
Ilya_Shlyakhter
3d8cd26497 Added a comment: specifying preferred content by metadata 2021-06-07 14:26:27 +00:00
Atemu
a706708d17 Added a comment 2021-06-06 20:47:31 +00:00
Lukey
cf9a93e901 Added a comment 2021-06-06 18:01:08 +00:00
Atemu
550f67d8b4 2021-06-06 17:18:14 +00:00
Atemu
234d235a03 rename bugs/delayadd_doesn__39__t_work.mdwn to bugs/delayadd_doesn__39__t_work_with_smallfiles.mdwn 2021-06-06 16:50:40 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421
fe518e13d5 Added a comment: using import tree and export tree 2021-06-06 14:43:40 +00:00
falsifian
f58d686ccf Add bug report 2021-06-05 20:31:39 +00:00
yarikoptic
717000f4f8 Added a comment 2021-06-05 13:50:43 +00:00
yarikoptic
c5e62c1968 Initial report on performance regression 2021-06-05 13:23:13 +00:00
alt
8054fef09a Added a comment 2021-06-05 13:07:48 +00:00
lucas.gautheron@09f1983993dfb0907d02ba268b3ca672f1dc3eea
b38dc11a37 Added a comment 2021-06-05 10:10:57 +00:00
Ilya_Shlyakhter
d39dfed2a7 Added a comment: "why all these wild ideas are being thrown out there" 2021-06-04 22:15:33 +00:00
Joey Hess
a2c9360905
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-04 16:45:02 -04:00
Joey Hess
8a13bbedd6
--size-limit exit 101
Sponsored-by: Mark Reidenbach on Patreon
2021-06-04 16:43:47 -04:00
Atemu
ee5f30ee6b 2021-06-04 20:26:27 +00:00
Joey Hess
771a122c9e
add --size-limit option
When this option is not used, there should be effectively no added
overhead, thanks to the optimisation in
b3cd0cc6ba.

When an action fails on a file, the size of the file still counts toward
the size limit. This was necessary to support concurrency, but also
generally seems like the right choice.

Most commands that operate on annexed files support the option.
export and import do not, and I don't know if it would make sense for
export to. Why would you want an incomplete export? sync doesn't, and
while it would be easy to make it support it for transferring files,
it's not clear if dropping files should also take the size limit into
account. Commands like add that don't operate on annexed files don't
support the option either.

Exiting 101 not yet implemented.

Sponsored-by: Denis Dzyubenko on Patreon
2021-06-04 16:16:53 -04:00
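The accounting rule from commit 771a122c9e, where a failed action still counts toward the size limit, can be sketched as follows. This is a hypothetical Python model, not git-annex's implementation: `run_with_size_limit` and its arguments are invented names, and skipping a file that does not fit (rather than stopping outright) is an assumption of the sketch.

```python
def run_with_size_limit(files, size_limit, action):
    """Run action(name) on (name, size) pairs within a total size budget.

    Returns (succeeded, failed, skipped, remaining).
    """
    remaining = size_limit
    succeeded, failed, skipped = [], [], []
    for name, size in files:
        if size > remaining:
            # Assumption: a file that does not fit is skipped.
            skipped.append(name)
            continue
        # The size is charged before the outcome is known, so a
        # failure below does not give the budget back.
        remaining -= size
        try:
            action(name)
            succeeded.append(name)
        except OSError:
            failed.append(name)
    return succeeded, failed, skipped, remaining
```

With files of 40, 50 and 30 units and a limit of 100, a failure on the first file still uses up its 40 units, so the second file fits and the third is skipped. Charging the size up front is also what would let concurrent jobs reserve their share before any of them finishes, though that reading of the concurrency remark is an assumption.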
Joey Hess
b3cd0cc6ba
minor optimisation
Avoid a second mvar access.

Sponsored-by: Jochen Bartl on Patreon
2021-06-04 14:57:19 -04:00
Joey Hess
4b6cb2b917
comment 2021-06-04 14:00:58 -04:00