git-annex

Author	SHA1	Message	Date
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Joey Hess	8a3beabf35	use RawFilePath for opening sqlite databases Fix a crash opening sqlite databases when run in a non-unicode locale, with a remote that uses a non-unicode filepath. In that situation converting to Text fails. The fix needs git-annex to be built with persistent-sqlite 2.13.3. Building against older versions still works, but that version is used when building with stack. Database.RawFilePath is a lot of code copied from persistent-sqlite and lightly modified, since only 1 function in persistent-sqlite was made to support RawFilePath. This is a bit of a pain, and I hope that persistent-sqlite will eventually switch to using OsPath, allowing this module to be removed from git-annex. Sponsored-by: k0ld on Patreon	2023-12-26 18:31:52 -04:00
Joey Hess	6472da265b	silence build warning about ~	2023-09-15 07:56:10 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	a3f433eac8	improve error message when commitDb' fails due to disk full or IO error There's still a 60 second delay in this situation because it retries, in case the failure was due to something recoverable like another process. Sponsored-by: unqueued on Patreon	2023-04-19 12:43:30 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	cc36c8516a	Sped up sqlite inserts 2x when built with persistent 2.14.5.0 https://github.com/yesodweb/persistent/issues/1457 Sponsored-by: Dartmouth College's DANDI project	2023-03-31 14:38:25 -04:00
Joey Hess	195508fc65	Improve error message when unable to read a sqlite database due to permissions problem Old message was: sqlite query crashed: thread blocked indefinitely in an MVar operation New message is eg: sqlite worker thread crashed: SQLite3 returned ErrorCan'tOpen while attempting to perform open ".git/annex/keysdb/db". The worker thread used to throw an exception. But before that exception was seen by anything waiting on the worker thread to finish, the takeMVar in queryDb would have crashed with BlockedIndefinitelyOnMVar. Sponsored-by: k0ld on Patreon	2023-02-23 15:28:22 -04:00
Joey Hess	cde2e61105	improve sqlite retrying behavior Avoid hanging when a suspended git-annex process is keeping a sqlite database locked. Sponsored-by: Dartmouth College's Datalad project	2022-10-18 15:47:20 -04:00
Joey Hess	3149a1e2fe	More robust handling of ErrorBusy when writing to sqlite databases While ErrorBusy and other exceptions were caught and the write retried for up to 10 seconds, it was still possible for git-annex to eventually give up and error out without writing to the database. Now it will retry as long as necessary. This does mean that, if one git-annex process is suspended just as sqlite has locked the database for writing, another git-annex that tries to write to it might get stuck retrying forever. But, that could already happen when opening the sqlite database, which retries forever on ErrorBusy. This is an area where git-annex is known to not behave well, there's a todo about the general case of it. Sponsored-by: Dartmouth College's Datalad project	2022-10-17 15:56:19 -04:00
Joey Hess	0d762acf7e	update comment, probably not a sqlite bug Sqlite's page documenting WAL mode changed in Oct 2016 to mention ways that queries could fail with SQLITE_BUSY. http://web.archive.org/web/20161009044054/http://www.sqlite.org:80/wal.html Probably not coincidentally, I emailed sqlite-users about such a situation in Feb 2015. https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg90580.html No one ever replied to me, but at least now I understand why it does that. Since it's documented now, it's no longer a bug.	2022-10-17 15:09:47 -04:00
Joey Hess	b801812660	init: probe if sqlite works Help the user get annex.dbdir configured when their filesystem is not one that sqlite works on. The change in Database.Handle makes an error from sqlite not be ignored besides being displayed, which it was before. I can't see any reason git-annex would want to ignore these errors. I chose to use the fsck database rather than the keys database because opening the keys database populates it, and see commit `b3c4579c79`. The placement of the call to checkSqliteWorks inside checkInitializeAllowed avoids annex.uuid getting set before it's called. Sponsored-by: Dartmouth College's Datalad project	2022-08-17 13:12:26 -04:00
Joey Hess	5da1a78508	add debugging around commits to sqlite dbs	2022-06-06 12:36:55 -04:00
Joey Hess	09edb07ac5	add debugLocks around database operations to track down a blocked indefinitely on MVar that seems to occur after sqlite throws ErrorBusy but that I have not been able to reproduce when I made commits synthetically throw ErrorBusy. Sponsored-by: Dartmouth College's Datalad project	2022-06-03 14:16:28 -04:00
Joey Hess	d0ef8303cf	avoid using a second db connection for writes This is a potentially breaking change in a very delicate area. However, examining the code path for writes, I don't see any benefit to opening a second db connection for them. If the write throws an exception, commitDb will retry it with a new db connection. A potential benefit to not opening a second db connection, beyond using less resources, is it just might avoid problems in WSL with sqlite that I have hypothesized are caused by multiple db connections. Commit `5f9eff3f32` explains why it needs to shut down the db connection to force the database to be updated on disk: When closeDb does not get called, garbage collection of DbHandle may not give the workerThread time to cleanly shut down before git-annex exits, resulting in a recently written change not reaching disk.	2021-10-20 12:32:46 -04:00
Joey Hess	f5b642318d	eliminate single/multi writer distinction After commit `f4bdecc4ec`, there is no longer any distinction between SingleWriter and MultiWriter's handling of read after write. Databases that were SingleWriter still have lock files that are used to prevent multiple writers. This does make writing to such databases a bit more expensive, because the MultiWriter code path that is now used opens a second db connection in order to write to them.	2021-10-20 12:26:30 -04:00
Joey Hess	c47794991c	improve with continuation no behavior change	2021-10-20 12:13:49 -04:00
Joey Hess	f4bdecc4ec	improve sqlite MultiWriter handling of read after write This removes a messy caveat that was easy to forget and caused at least one bug. The price paid is that, after a write to a MultiWriter db, it has to close the db connection that it had been using to read, and open a new connection. So it might be a little bit slower. But, writes are usually batched together, so there's often only a single write, and so there should not be much of a slowdown. Notice that SingleWriter already closed the db connection after a write, so paid the same overhead. This is the second try at fixing a bug: git-annex get when run as the first git-annex command in a new repo did not populate all unlocked files. (Reversion in version 8.20210621) Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-19 15:13:29 -04:00
Joey Hess	8868a3a4c7	Fix build with persistent-2.12.0.1 persistent stopped using askLogFunc, and the thing to use is askLoggerIO from monad-logger. Bumped the dep to the first version that contained that. Note that the i386ancient build uses a newer monad-logger than 0.3.10, so the new versioned dep should not break it, and presumably nothing else either. This commit was sponsored by Noam Kremen on Patreon.	2021-04-01 12:21:02 -04:00
Joey Hess	2c8cf06e75	more RawFilePath conversion Converted file mode setting to it, and follow-on changes. Compiles up through 369/646. This commit was sponsored by Ethan Aubin.	2020-11-05 18:45:37 -04:00
Joey Hess	098afe144e	display sqlite error message when it crashes	2019-10-24 11:50:55 -04:00
Joey Hess	904b175707	Fix build with persistent-2.10. Added an additional constraint that persistent needs. This also builds with persistent-2.9.2 without needing any cpp.	2019-10-17 11:58:31 -04:00
Joey Hess	9628ae2e67	Close sqlite databases more robustly. Had a report of close throwing ErrorBusy on CIFS. Retrying up to 16 seconds is a balance between hopefully waiting long enough for the problem to clear up and waiting so long that git-annex seems to hang. The new dependency is free; persistent depends on unliftio-core.	2019-09-26 12:25:21 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	0db653f12c	remove unused BangPatterns	2018-11-09 13:09:26 -04:00
Joey Hess	1428568554	retry when sqlite throws ErrorIO I suspect this may be due to SQLITE_IOERR_SHORT_READ, but have not verified. I was able to reproduce it on Linux after running the test suite in a loop for 1-3 hours until it failed. The WAL mode entry change in `3963c5fcf5` may have hidden the problem I was seeing; I have not seen an ErrorIO since then.	2018-10-30 18:06:38 -04:00
Joey Hess	3963c5fcf5	better approach to enabling WAL mode The old approach opened the database an extra time to enable WAL mode, but more recent persistent-sqlite has a better API to enable it.	2018-10-30 13:47:38 -04:00
Joey Hess	a89db2c604	link to bug report blob delete from tree	2018-10-30 12:07:38 -04:00
Joey Hess	718915e9fc	improve comments	2018-10-30 11:52:05 -04:00
Joey Hess	5f9eff3f32	fix bug that prevented db being written to disk in SingleWriter mode The bug occurred when closeDb was not called, and garbage collection of the DbHandle didn't give the workerThread time to shut down. Fixed by exiting the runSqlite action when a commit is made. (MultiWriter mode already forked off a runSqlite action, so avoided the problem.) This commit was sponsored by Brock Spratlen on Patreon.	2017-09-18 19:42:20 -04:00
Joey Hess	6ab14710fc	fix consistency bug reading from export database The export database has writes made to it and then expects to read back the same data immediately. But, the way that Database.Handle does writes, in order to support multiple writers, makes that not work, due to caching issues. This resulted in export re-uploading files it had already successfully renamed into place. Fixed by allowing databases to be opened in MultiWriter or SingleWriter mode. The export database only needs to support a single writer; it does not make sense for multiple exports to run at the same time to the same special remote. All other databases still use MultiWriter mode. And by inspection, nothing else in git-annex seems to be relying on being able to immediately query for changes that were just written to the database. This commit was supported by the NSF-funded DataLad project.	2017-09-06 17:19:07 -04:00
Joey Hess	3b22ad9f47	Work around sqlite's incorrect handling of umask when creating databases. Refactored some common code into initDb. This only deals with the problem when creating new databases. If a repo got bad permissions into it, it's up to the user to deal with it. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-02-13 17:39:16 -04:00
Joey Hess	23d71423e1	work around ghc segfault hSetEncoding of a closed handle segfaults. https://ghc.haskell.org/trac/ghc/ticket/7161 `8484c0c197` introduced the crash. In particular, stdin may get closed (by eg, getContents) and then trying to set its encoding will crash. We didn't need to adjust stdin's encoding anyway, but only stderr, to work around https://github.com/yesodweb/persistent/issues/474 Thanks to Mesar Hameed for assistance related to reproducing this bug.	2016-12-30 18:14:19 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	4224fae71f	optimise read and write for Keys database (untested) Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes. Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty. Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unnecessarily. Design notes for this commit: - separate reads from writes - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then) - writes are queued - queue is flushed periodically - immediate queue flush before any read - auto-flush queue when database handle is garbage collected - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.) - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results - writes open the handle if it was not open previously	2015-12-23 19:18:52 -04:00
Joey Hess	d43ac8056b	auto-close database connections when MVar is GCed	2015-12-23 16:11:36 -04:00
Joey Hess	6d38f54db4	split out Database.Queue from Database.Handle Fsck can use the queue for efficiency since it is write-heavy, and only reads a value before writing it. But, the queue is not suited to the Keys database.	2015-12-23 14:59:58 -04:00
Joey Hess	622da992f8	reorder database shutdown to be concurrency safe If a DbHandle is in use by another thread, it could be queueing changes while shutdown is running. So, wait for the worker to finish before flushing the queue, so that any last-minute writes are included. Before this fix, they would be silently dropped. Of course, if the other thread continues to try to use a DbHandle once it's closed, it will block forever as the worker is no longer reading from the jobs MVar. So, that would crash with "thread blocked indefinitely in an MVar operation".	2015-12-16 13:52:43 -04:00
Joey Hess	05b598a057	stash DbHandle in Annex state	2015-12-09 14:55:47 -04:00
Joey Hess	5072c62932	avoid ugly error about MVar if the sqlite worker thread crashes	2015-10-12 13:00:22 -04:00
Joey Hess	bc4129cc77	fsck: Commit incremental fsck database after every 1000 files fscked, or every 5 minutes, whichever comes first. Previously, commits were made every 1000 files fscked. Also, improve docs	2015-07-31 16:42:15 -04:00
Joey Hess	e143d5e7d1	avoid closing db handle when reconnecting to do a write	2015-02-22 14:21:39 -04:00
Joey Hess	bf80a16c2e	complete work around for sqlite SELECT ErrorBusy on new connection bug	2015-02-22 14:08:26 -04:00
Joey Hess	b541a5e38b	WIP	2015-02-18 17:46:58 -04:00
Joey Hess	80683871ee	deal with rare SELECT ErrorBusy failures I think they might be a sqlite bug. In discussions with sqlite devs.	2015-02-18 16:56:52 -04:00
Joey Hess	af254615b2	use WAL mode to ensure read from db always works, even when it's being written to Also, moved the database to a subdir, as there are multiple files. This seems to work well with concurrent fscks, although they still do redundant work due to the commit granularity. Occasionally two writes will conflict, and one is then deferred and happens later. Except, with 3 concurrent fscks, I got failures: git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked) Argh!!!	2015-02-18 15:54:24 -04:00
Joey Hess	17cb219231	more robust handling of deferred commits Still not robust enough. I have 3 fscks running concurrently, and am seeing: ("commit deferred",user error (SQLite3 returned ErrorBusy while attempting to perform step.)) and git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)	2015-02-18 14:11:27 -04:00
Joey Hess	a3370ac459	allow for concurrent incremental fsck processes again (sorta) Sqlite doesn't support multiple concurrent writers at all. One of them will fail to write. It's not even possible to have two processes building up separate transactions at the same time. Before using sqlite, incremental fsck could work perfectly well with multiple fsck processes running concurrently. I'd like to keep that working. My partial solution, so far, is to make git-annex buffer writes, and every so often send them all to sqlite at once, in a transaction. So most of the time, nothing is writing to the database. (And if it gets unlucky and a write fails due to a collision with another writer, it can just wait and retry the write later.) This lets multiple processes write to the database successfully. But, for the purposes of concurrent, incremental fsck, it's not ideal. Each process doesn't immediately learn of files that another process has checked. So they'll tend to do redundant work. Only way I can see to improve this is to use some other mechanism for short-term IPC between the fsck processes. Not yet done. ---- Also, make addDb check if an item is in the database already, and not try to re-add it. That fixes an intermittent crash with "SQLite3 returned ErrorConstraint while attempting to perform step." I am not 100% sure why; it only started happening when I moved write buffering into the queue. It seemed to generally happen on the same file each time, so could just be due to multiple files having the same key. However, I doubt my sound repo has many duplicate keys, and I suspect something else is going on. ---- Updated benchmark, with the 1000 item queue: 6m33.808s	2015-02-17 16:56:12 -04:00
Joey Hess	ea76d04e15	show error when sqlite crashes worker thread Better than "blocked indefinitely in MVar"..	2015-02-17 13:03:57 -04:00
Joey Hess	99a1287f4f	avoid fromIntegral overhead	2015-02-16 17:22:00 -04:00

53 commits