That could cause git-annex to get confused about whether a locked file's
content was present when the object file got touched.
Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell whether it still has the right
content.
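A minimal sketch of that check, assuming the key carries a SHA256 checksum
to compare against (sha256File and stillHasContent are illustrative helpers,
not git-annex's actual isUnmodified):

    import qualified Data.ByteString.Lazy as L
    import Crypto.Hash (Digest, SHA256, hashlazy)

    -- Checksum the whole file; this is the extra work annex.thin can require.
    sha256File :: FilePath -> IO (Digest SHA256)
    sha256File f = hashlazy <$> L.readFile f

    -- True when the worktree file still matches the checksum expected for its key.
    stillHasContent :: FilePath -> Digest SHA256 -> IO Bool
    stillHasContent f expected = (== expected) <$> sha256File f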
Had to suppress output when inAnnex calls isUnmodified, since otherwise
"(checksum...)" would be printed in places it ought not to be;
eg, "git annex get" could turn out not to need to get anything, and
would then display only that.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Last of the known v6 races.
This also makes git add of a pointer file populate it when its content
is present in the annex, which makes sense to do, I think.
This commit was supported by the NSF-funded DataLad project.
Update the pointer file the next time reconcileStaged is run, to recover
from the race.
Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So, normally by the time the git
annex get/drop command finishes, the race has already been dealt with.
It may be that, in some cases, that won't happen and the race will be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.
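A rough conceptual model of that recovery pass (the type and decision
function below are illustrative stand-ins, not the real reconcileStaged
code):

    data WorktreeState
        = BarePointer     -- the file contains only the pointer, no content
        | PopulatedClean  -- the file contains the key's content, unmodified
        | Modified        -- the user changed the file; it must be left alone
        deriving (Eq, Show)

    -- What the recovery pass should do for one staged pointer file, given
    -- whether the key's content is currently present in the annex.
    recoveryAction :: Bool -> WorktreeState -> String
    recoveryAction contentPresent state = case (contentPresent, state) of
        (True,  BarePointer)    -> "populate the worktree file from the annex"
        (False, PopulatedClean) -> "replace the worktree file with the pointer"
        _                       -> "leave the file alone"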
This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But the case where the file was
supposed to be gotten but is not populated is not handled yet.
This commit was supported by the NSF-funded DataLad project.
Check just before running update-index whether the worktree file's content
is still the same, and don't update it when it has been modified. This
narrows the race window a lot, from possibly minutes or hours to seconds
or less.
(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
without any other race.)
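A minimal sketch of that pattern, assuming POSIX rename semantics (the
fixed temporary name here is only for illustration; the real replaceFile
uses a securely created temporary location):

    import System.FilePath (takeDirectory, (</>))
    import System.Directory (renameFile)
    import System.Posix.Files (FileStatus, getFileStatus)

    -- Write the new content to a temporary file beside the destination, stat
    -- it while nothing else can be touching it, then rename it into place.
    replaceFileAtomically :: FilePath -> (FilePath -> IO ()) -> IO FileStatus
    replaceFileAtomically dest writer = do
        let tmp = takeDirectory dest </> ".git-annex-replace.tmp"
        writer tmp                 -- populate the temporary file
        st <- getFileStatus tmp    -- inode information gathered without a race
        renameFile tmp dest        -- the worktree update happens atomically
        return st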
This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queuesize is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.
When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.
This commit was supported by the NSF-funded DataLad project.
Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.
This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.
Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.
One new todo about a race that causes a modification to be accidentally
staged.
Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.
This commit was supported by the NSF-funded DataLad project.
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.
What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.
So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.
This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.
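A sketch of the queue-and-flush idea (this is not git-annex's actual queue
code; it just illustrates how annex.queuesize bounds how many pending
updates build up before a single git invocation handles them all):

    import Control.Monad (when)
    import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')

    data Queue = Queue
        { queueLimit :: Int              -- the role annex.queuesize plays
        , queueItems :: IORef [FilePath]
        }

    newQueue :: Int -> IO Queue
    newQueue limit = Queue limit <$> newIORef []

    -- Remember a file whose index entry needs updating; flush once the queue
    -- reaches its configured size.
    queueIndexUpdate :: Queue -> FilePath -> IO ()
    queueIndexUpdate q f = do
        modifyIORef' (queueItems q) (f :)
        pending <- readIORef (queueItems q)
        when (length pending >= queueLimit q) (flushQueue q)

    -- Handle everything queued so far with one git run, then empty the queue.
    flushQueue :: Queue -> IO ()
    flushQueue q = do
        pending <- readIORef (queueItems q)
        writeIORef (queueItems q) []
        -- the real code feeds these paths to a single git update-index process
        mapM_ putStrLn (reverse pending)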
This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.
This commit was supported by the NSF-funded DataLad project.
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.
When built with an older git, it still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.
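A sketch of that stdin handling, assuming a build-time constant records
which git the binary was built against (builtWithGit25 and pointerForFile
are illustrative stand-ins, not the real clean filter):

    import qualified Data.ByteString as B

    -- Assumption: set at build time from the git version built against.
    builtWithGit25 :: Bool
    builtWithGit25 = True

    cleanFilter :: FilePath -> IO ()
    cleanFilter worktreeFile = do
        if builtWithGit25
            then return ()          -- git 2.5+: leave the content unread on stdin
            else do
                _ <- B.getContents  -- older git: read and discard the content
                return ()
        pointer <- pointerForFile worktreeFile
        putStr pointer              -- hand git the pointer instead of the content

    -- Illustrative stand-in for generating the annex pointer for a file.
    pointerForFile :: FilePath -> IO String
    pointerForFile _ = return "/annex/objects/...\n"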
This commit was supported by the NSF-funded DataLad project.
gmane's disk crashed; I found one thread in another archive, but could
not find my whole patch set in any archive (perhaps some of the messages
were too long), so I pulled it out of my personal mail archives.
This commit was supported by the NSF-funded DataLad project.
The benchmark shows that the database access is quite fast indeed!
And it stays fast as the number of keys grows, with one exception:
getAssociatedKey, whose runtime grows roughly linearly with the number of
keys.
Based on this benchmark, I don't think I need to worry about optimising
for cases where all files are locked and the database is mostly empty.
In those cases, database access will be misses, and according to this
benchmark, each such miss should add only about 50 microseconds to the
runtime.
(NB: There may be some overhead to getting the database opened and locking
the handle that this benchmark doesn't see.)
joey@darkstar:~/src/git-annex>./git-annex benchmark
setting up database with 1000
setting up database with 10000
benchmarking keys database/getAssociatedFiles from 1000 (hit)
time                 62.77 μs   (62.70 μs .. 62.85 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 62.81 μs   (62.76 μs .. 62.88 μs)
std dev              201.6 ns   (157.5 ns .. 259.5 ns)

benchmarking keys database/getAssociatedFiles from 1000 (miss)
time                 50.02 μs   (49.97 μs .. 50.07 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.09 μs   (50.04 μs .. 50.17 μs)
std dev              206.7 ns   (133.8 ns .. 295.3 ns)

benchmarking keys database/getAssociatedKey from 1000 (hit)
time                 211.2 μs   (210.5 μs .. 212.3 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 211.0 μs   (210.7 μs .. 212.0 μs)
std dev              1.685 μs   (334.4 ns .. 3.517 μs)

benchmarking keys database/getAssociatedKey from 1000 (miss)
time                 173.5 μs   (172.7 μs .. 174.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 173.7 μs   (173.0 μs .. 175.5 μs)
std dev              3.833 μs   (1.858 μs .. 6.617 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (hit)
time                 64.01 μs   (63.84 μs .. 64.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 64.85 μs   (64.34 μs .. 66.02 μs)
std dev              2.433 μs   (547.6 ns .. 4.652 μs)
variance introduced by outliers: 40% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (miss)
time                 50.33 μs   (50.28 μs .. 50.39 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.32 μs   (50.26 μs .. 50.38 μs)
std dev              202.7 ns   (167.6 ns .. 252.0 ns)

benchmarking keys database/getAssociatedKey from 10000 (hit)
time                 1.142 ms   (1.139 ms .. 1.146 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.142 ms   (1.140 ms .. 1.144 ms)
std dev              7.142 μs   (4.994 μs .. 10.98 μs)

benchmarking keys database/getAssociatedKey from 10000 (miss)
time                 1.094 ms   (1.092 ms .. 1.096 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.095 ms   (1.095 ms .. 1.097 ms)
std dev              4.277 μs   (2.591 μs .. 7.228 μs)
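For reference, a sketch of how hit/miss benchmarks like these can be
structured with criterion (the in-memory map below is just a stand-in for
the real sqlite keys database and its getAssociatedFiles/getAssociatedKey
queries):

    import Criterion.Main (defaultMain, bgroup, bench, whnfIO)
    import qualified Data.Map.Strict as M

    -- Stand-in lookup; the real benchmark queries the keys database.
    lookupIn :: M.Map String [String] -> String -> IO [String]
    lookupIn m k = return (M.findWithDefault [] k m)

    main :: IO ()
    main = do
        let db n = M.fromList [ (show i, ["file" ++ show i]) | i <- [1 .. n :: Int] ]
            d1k  = db 1000
            d10k = db 10000
        defaultMain
            [ bgroup "keys database"
                [ bench "lookup from 1000 (hit)"   (whnfIO (lookupIn d1k "500"))
                , bench "lookup from 1000 (miss)"  (whnfIO (lookupIn d1k "no such key"))
                , bench "lookup from 10000 (hit)"  (whnfIO (lookupIn d10k "5000"))
                , bench "lookup from 10000 (miss)" (whnfIO (lookupIn d10k "no such key"))
                ]
            ]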