git-annex

Author	SHA1	Message	Date
Joey Hess	6fcccbba19	clean up imports needed by old versions of ghc Now that ghc 9.0.2 is the oldest supported version. Eg cruft from https://web.archive.org/web/20190424185034/https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid Sponsored-by: Jack Hill	2025-09-23 13:55:13 -04:00
Joey Hess	4fd71c125e	Improve performance when used with a local git remote that has a large working tree git write-tree was being run once per file git-annex acts on when eg, getting files, which is slow when the remote repository has a large tree. onLocal calls quiesce after each action, and quiesce closes the keys db since [[!commit ba7ecbc6a9c]]. Which has a relevant comment about performance. I have not addressed that, the keys db still gets closed and reopened after each file. Turns out that, since git write-tree was run by each call to reconcileStaged, the .git/annex/keysdb.cache value was never the same as the git index's inode. Because git write-tree updates the index's mtime even when no changes have been made. And so, when the database got closed and reopened, reconcileStaged would see a changed index, and run git write-tree again. Over and over. I considered writing the index's new inodecache after write-tree to the keysdb.cache, but that would be vulnerable to a race, if the index was changed just after write-tree. The fix was to stop using keysdb.cache at all. When the database is closed and later reopened by the same process, avoid re-doing reconcileStaged. Now .git/annex/keysdb.cache is no longer used. It could be removed, but the time overhead of removing it would be more than the space overhead of keeping it. Deferred removal to the v11 upgrade. Sponsored-by: unqueued	2025-09-10 12:08:11 -04:00
Joey Hess	6fbd337e34	avoid unnecessary keys db writes; doubled speed! When running eg git-annex get, for each file it has to read from and write to the keys database. But it's reading exclusively from one table, and writing to a different table. So, it is not necessary to flush the write to the database before reading. This avoids writing the database once per file, instead it will buffer 1000 changes before writing. Benchmarking getting 1000 small files from a local origin, git-annex get now takes 13.62s, down from 22.41s! git-annex drop now takes 9.07s, down from 18.63s! Wowowowowowowow! (It would perhaps have been better if there were separate databases for the two tables. At least it would have avoided this complexity. Ah well, this is better than splitting the table in an annex.version upgrade.) Sponsored-by: Dartmouth College's Datalad project	2022-10-12 15:33:16 -04:00
Joey Hess	09edb07ac5	add debugLocks around database operations to track down a blocked indefinitely on MVar that seems to occur after sqlite throws ErrorBusy but that I have not been able to reproduce when I made commits synthetically throw ErrorBusy. Sponsored-by: Dartmouth College's Datalad project	2022-06-03 14:16:28 -04:00
Joey Hess	9f94d2894e	remove unused code	2021-07-30 18:01:36 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e34046de38	slightly more efficient checking of versionUsesKeysDatabase It's a mvar lookup either way, but I think this way will be slightly more efficient. And it reduces the number of places where it's checked to 1.	2016-07-19 14:02:49 -04:00
Joey Hess	5f0b551c0c	assistant: Fix race in v6 mode that caused downloaded file content to sometimes not replace pointer files. The keys database handle needs to be closed after merging, because the smudge filter, in another process, updates the database. Old cached info can be read for a while from the open database handle; closing it ensures that the info written by the smudge filter is available. This is pretty horribly ad-hoc, and it's especially nasty that the transferrer closes the database every time.	2016-05-16 14:49:12 -04:00
Joey Hess	9df13e73ae	if keys database cannot be opened due to permissions, ignore This lets readonly repos be used. If a repo is readonly, we can ignore the keys database, because nothing that we can do will change the state of the repo anyway.	2016-02-12 14:16:35 -04:00
Joey Hess	bcdc6db2c3	fix build with pre-AMP ghc	2015-12-28 17:21:26 -04:00
Joey Hess	4224fae71f	optimise read and write for Keys database (untested) Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes. Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty. Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unnecessarily. Design notes for this commit: - separate reads from writes - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then) - writes are queued - queue is flushed periodically - immediate queue flush before any read - auto-flush queue when database handle is garbage collected - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.) - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results - writes open the handle if it was not open previously	2015-12-23 19:18:52 -04:00

11 commits
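
The write-queue behaviour described in commits 4224fae71f and 6fbd337e34 (queue writes, flush before a read, and later skip that flush when the read touches a different table than the queued writes) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not git-annex's Haskell implementation: the class `QueuedKeysDb`, the table names `associated` and `content`, and the rule "flush only when a queued write targets the table being read" are all hypothetical and chosen for illustration. The threshold of 1000 buffered changes is the only figure taken from the commit message.

```python
class QueuedKeysDb:
    """Toy model of a write queue in front of a two-table store.

    Hypothetical illustration only; names and flush rule are assumptions.
    """

    def __init__(self, flush_threshold=1000):
        # Committed state; the table names are made up for this sketch.
        self.tables = {"associated": {}, "content": {}}
        self.queue = []  # pending (table, key, value) writes
        self.flush_threshold = flush_threshold
        self.flush_count = 0  # number of flushes that wrote something

    def write(self, table, key, value):
        # Queue the write; flush once the buffer reaches the threshold.
        self.queue.append((table, key, value))
        if len(self.queue) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Apply all queued writes in order; an empty queue is a no-op.
        if not self.queue:
            return
        for table, key, value in self.queue:
            self.tables[table][key] = value
        self.queue.clear()
        self.flush_count += 1

    def read(self, table, key):
        # Assumed rule: flush only if a queued write targets this table,
        # so reads from one table do not force out writes to another.
        if any(t == table for t, _, _ in self.queue):
            self.flush()
        return self.tables[table].get(key)


def simulate(n_files, flush_threshold=1000):
    """Read one table and write the other once per file; count flushes."""
    db = QueuedKeysDb(flush_threshold)
    for i in range(n_files):
        db.read("associated", f"key{i}")
        db.write("content", f"key{i}", f"inode{i}")
    db.flush()  # final flush, as on exit from the enclosing action
    return db.flush_count
```

In this model, processing 1000 files costs a single flush rather than one per file, which is the shape of the saving the 2022 commit describes; the timings quoted there come from the real program and are not reproduced by this sketch.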