Commit graph

109 commits

Author SHA1 Message Date
Joey Hess
b5a6dfc779
close smudge, open transition tracking item 2018-10-31 08:31:07 -04:00
Joey Hess
361740a112
close bug report, open todo item 2018-10-26 13:03:18 -04:00
Joey Hess
3db20b39f2
update 2018-10-26 12:28:43 -04:00
Joey Hess
dd5611b737
devblog 2018-10-25 16:51:52 -04:00
Joey Hess
6bd67d4d46
cherry-pick too 2018-10-22 16:24:14 -04:00
Joey Hess
fc26fd059b
ugh 2018-10-22 12:39:36 -04:00
Joey Hess
8a7fb2e2d8
plans 2018-10-22 06:16:01 -04:00
Joey Hess
fc7fe2b19d
more thoughts 2018-10-17 13:26:54 -04:00
Joey Hess
b2bafdb2fc
v6: Fix database inconsistency
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.

Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.

Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-16 13:51:37 -04:00
Joey Hess
6498643caf
bug report 2018-10-16 11:41:12 -04:00
Joey Hess
cd3f231d21
retitle 2018-09-12 14:21:39 -04:00
Joey Hess
6adc0d2b3f
bug triage 2018-08-27 15:10:05 -04:00
Joey Hess
98fd7ec6c9
recover from race between git mv+commit and git-annex get
Last of the known v6 races.

This also makes git add of a pointer file populate it when its content
is present in the annex. Which makes sense to do, I think.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 16:01:50 -04:00
Joey Hess
50fa17aee6
v6: recover from race between git mv and git-annex get/drop
Update pointer file next time reconcileStaged is run to recover from the
race.

Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So, normally by the time the git
annex get/drop command finishes, the race has already been dealt with.
It may be that, in some case, that won't happen and the race will be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.

This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But, the case where the file was
supposed to be gotten but is not populated is not handled yet.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 15:56:43 -04:00
Joey Hess
e9b2674281
plan 2018-08-22 13:58:32 -04:00
Joey Hess
38a934cf07
correction 2018-08-22 13:34:15 -04:00
Joey Hess
18ecf41917
avoid running reconcileStaged when the index has not changed
This commit was supported by the NSF-funded DataLad project.
2018-08-22 13:04:12 -04:00
Joey Hess
5e56d9b620
v6: Update associated files database when git has staged changes to pointer files
This commit was supported by the NSF-funded DataLad project.
2018-08-21 17:02:20 -04:00
Joey Hess
b8cd5fde17
idea 2018-08-20 16:13:46 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
ec91b6e4b2
plan to fix race 2018-08-17 11:18:53 -04:00
Joey Hess
5799d325f0
update todo categories 2018-08-16 16:36:47 -04:00
Joey Hess
82a239675f
narrow the race where a file gets modified before update-index
Check just before running update-index if the worktree file's content is
still the same, don't update it when it's been modified. This narrows
the race window a lot, from possibly minutes or hours, to seconds or
less.

(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
w/o any other race.)

This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queue is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.

When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 15:56:43 -04:00
Joey Hess
82cfcfc838
better index file refresh method
Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.

This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.

Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.

One new todo about a race that causes a modification to be accidentially
staged.

Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:16:24 -04:00
Joey Hess
4c5a9965c1
remove invalid todo item
I tested it, and it's ok. I think I was adding it under a filename that
produced a different key.
2018-08-15 13:34:48 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
66a4483dfa
response 2018-08-14 11:02:55 -04:00
Joey Hess
d8a8f2df70
full plan 2018-08-13 17:51:02 -04:00
Joey Hess
86df0d6e1b
even better idea 2018-08-13 17:43:16 -04:00
Joey Hess
df5823cea0
update 2018-08-13 17:29:33 -04:00
Joey Hess
a96972015d
massive v6 add speed/memory improvement
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.

This commit was supported by the NSF-funded DataLad project.
2018-08-09 18:17:46 -04:00
Joey Hess
d180ae039c
fix now-dead gmane links
gmane's disk crashed, I found one thread in another archive, but could
not find my whole patch set in any archive (perhaps some of the messages
were too long), so pulled it out of my personal mail archives.

This commit was supported by the NSF-funded DataLad project.
2017-12-26 12:21:44 -04:00
Joey Hess
99a1e6efe2
close as dup 2017-06-09 13:43:53 -04:00
Joey Hess
6f502c859b
todo 2016-12-12 17:21:32 -04:00
Joey Hess
be2832655b
update 2016-07-12 11:52:12 -04:00
Joey Hess
51751f68f7
link to patch 2016-06-16 16:35:58 -04:00
Joey Hess
36cf163321
found a bad memory use in git 2016-05-12 17:26:33 -04:00
Joey Hess
5cee93c4e5
link to my post "proposal for extending smudge/clean filters with raw file access" 2016-05-12 14:26:58 -04:00
Joey Hess
b19511822a
update 2016-04-13 13:39:16 -04:00
Joey Hess
7815f227d2
update 2016-04-12 14:31:37 -04:00
Joey Hess
f0ddc0a75c
comment 2016-04-09 13:22:27 -04:00
Joey Hess
973d66bb67
update 2016-04-08 15:31:53 -04:00
Joey Hess
8a69298bf2
init: Automatically enter the adjusted unlocked branch when in a v6 repo on a filesystem not supporting symlinks. 2016-03-29 13:54:42 -04:00
Joey Hess
84d657312e
comment 2016-02-12 14:36:00 -04:00
Joey Hess
4b1afda6c5
pointer 2016-02-09 12:36:22 -04:00
fiatjaf
4dffddb343 small formatting fix. 2016-01-19 19:40:39 +00:00
Joey Hess
ecd0684bfc
avoid hard linking object from other repository when annex.thin is set
This is simpler and less expensive than checking if the src file has a
link count >= 2, and also is unlocked.
2016-01-13 14:19:31 -04:00
Joey Hess
f9c5aa84e0
add database benchmark
The benchmark shows that the database access is quite fast indeed!
And, it scales linearly to the number of keys, with one exception,
getAssociatedKey.

Based on this benchmark, I don't think I need worry about optimising
for cases where all files are locked and the database is mostly empty.
In those cases, database access will be misses, and according to this
benchmark, should add only 50 milliseconds to runtime.

(NB: There may be some overhead to getting the database opened and locking
the handle that this benchmark doesn't see.)

joey@darkstar:~/src/git-annex>./git-annex benchmark
setting up database with 1000
setting up database with 10000
benchmarking keys database/getAssociatedFiles from 1000 (hit)
time                 62.77 μs   (62.70 μs .. 62.85 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 62.81 μs   (62.76 μs .. 62.88 μs)
std dev              201.6 ns   (157.5 ns .. 259.5 ns)

benchmarking keys database/getAssociatedFiles from 1000 (miss)
time                 50.02 μs   (49.97 μs .. 50.07 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.09 μs   (50.04 μs .. 50.17 μs)
std dev              206.7 ns   (133.8 ns .. 295.3 ns)

benchmarking keys database/getAssociatedKey from 1000 (hit)
time                 211.2 μs   (210.5 μs .. 212.3 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 211.0 μs   (210.7 μs .. 212.0 μs)
std dev              1.685 μs   (334.4 ns .. 3.517 μs)

benchmarking keys database/getAssociatedKey from 1000 (miss)
time                 173.5 μs   (172.7 μs .. 174.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 173.7 μs   (173.0 μs .. 175.5 μs)
std dev              3.833 μs   (1.858 μs .. 6.617 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (hit)
time                 64.01 μs   (63.84 μs .. 64.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 64.85 μs   (64.34 μs .. 66.02 μs)
std dev              2.433 μs   (547.6 ns .. 4.652 μs)
variance introduced by outliers: 40% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (miss)
time                 50.33 μs   (50.28 μs .. 50.39 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.32 μs   (50.26 μs .. 50.38 μs)
std dev              202.7 ns   (167.6 ns .. 252.0 ns)

benchmarking keys database/getAssociatedKey from 10000 (hit)
time                 1.142 ms   (1.139 ms .. 1.146 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.142 ms   (1.140 ms .. 1.144 ms)
std dev              7.142 μs   (4.994 μs .. 10.98 μs)

benchmarking keys database/getAssociatedKey from 10000 (miss)
time                 1.094 ms   (1.092 ms .. 1.096 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.095 ms   (1.095 ms .. 1.097 ms)
std dev              4.277 μs   (2.591 μs .. 7.228 μs)
2016-01-12 13:07:03 -04:00
Joey Hess
55ad30d1d9
update 2016-01-08 16:30:31 -04:00
Joey Hess
c96fb11a96
devblog 2016-01-07 18:03:06 -04:00