git-annex

Author	SHA1	Message	Date
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	f7146c153b	fix restaging of transferred files after stalldetection kicks in Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:55:40 -04:00
Joey Hess	6a3bd283b8	add restage log When pointer files need to be restaged, they're first written to the log, and then when the restage operation runs, it reads the log. This way, if the git-annex process is interrupted before it can do the restaging, a later git-annex process can do it. Currently, this lets a git-annex get/drop command be interrupted and then re-run, and as long as it gets/drops additional files, it will clean up after the interrupted command. But more changes are needed to make it easier to restage after an interrupted process. Kept using the git queue to run the restage action, even though the list of files that it builds up for that action is not actually used by the action. This could perhaps be simplified to make restaging a cleanup action that gets registered, rather than using the git queue for it. But I wasn't sure if that would cause visible behavior changes, when eg dropping a large number of files, currently the git queue flushes periodically, and so it restages incrementally, rather than all at the end. In restagePointerFiles, it reads the restage log twice, once to get the number of files and size, and a second time to process it. This seemed better than reading the whole file into memory, since potentially a huge number of files could be in there. Probably the OS will cache the file in memory and there will not be much performance impact. It might be better to keep running tallies in another file though. But updating that atomically with the log seems hard. Also note that it's possible for calcRestageLog to see a different file than streamRestageLog does. More files may be added to the log in between. That is ok, it will only cause the filterprocessfaster heuristic to operate with slightly out of date information, so it may make the wrong choice for the files that got added and be a little slower than ideal. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:47:24 -04:00
Joey Hess	8718125ae4	refactor the restage runner Sponsored-by: Dartmouth College's DANDI project	2022-09-23 13:12:17 -04:00
Joey Hess	6e3c9bea2e	drain transferrer read handle when shutting it down Fixes updating git index file after getting an unlocked file when annex.stalldetection is set. The transferrer may want to send additional protocol messages when it's shut down. Closing the read handle prevented it from doing that, and caused it to crash rather than cleanly shutting down. Draining the handle without processing the protocol seemed ok to do, because anything it outputs is going to be some side message displayed at shutdown. Displaying those once per transferrer process that is running seems unnecessary. Sponsored-by: Dartmouth College's DANDI project	2022-09-22 14:39:39 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	d2c842e9a1	don't force use of conduit in withUrlOptionsPromptingCreds Use curl for downloads from git remotes when annex.url-options and other git configs are set. If the url needs a password, curl will fail, and git credential will not be used to prompt for it. But the user can set --netrc in url-options and put the password in the netrc file. This also means that url-options settings like -4 will take effect. That was the case before commit `1883f7ef8f` forced conduit to be used.	2022-09-09 16:07:32 -04:00
Joey Hess	c62fe5e9a8	avoid redundant prompt for http password in git-annex get that does autoinit autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git remote has not been probed yet, that will do a http get that will prompt for a password. And then the parent process will subsequently prompt for a password when getting annexed files from the remote. So the solution is for autoEnableSpecialRemotes to run remoteList before the subprocess, which will probe for the uuid for the git remote in the same process that will later be used to get annexed files. But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git, so Annex.Init cannot import Remote.List. Had to pass remoteList into functions in Annex.Init to get around this dependency loop.	2022-09-09 14:43:43 -04:00
Joey Hess	9621beabc4	cache credentials in memory when doing http basic auth to a git remote When accessing a git remote over http needs a git credential prompt for a password, cache it for the lifetime of the git-annex process, rather than repeatedly prompting. The git-lfs special remote already caches the credential when discovering the endpoint. And presumably commands like git pull do as well, since they may download multiple urls from a remote. The TMVar CredentialCache is read, so two concurrent calls to getBasicAuthFromCredential will both prompt for a credential. There would already be two concurrent password prompts in such a case, and existing uses of `prompt` probably avoid it. Anyway, it's no worse than before.	2022-09-09 14:20:32 -04:00
Joey Hess	d4fd966396	avoid dup check of guardSafeToUseRepo Speeds up init slightly, and reduces the number of syscalls by the dynamic linker. Sponsored-by: Dartmouth College's Datalad project	2022-08-29 13:52:58 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	b801812660	init: probe if sqlite works Help the user get annex.dbdir configured when their filesystem is not one that sqlite works on. The change in Database.Handle makes an error from sqlite not be ignored besides being displayed, which it was before. I can't see any reason git-annex would want to ignore these errors. I chose to use the fsck database rather than the keys database because opening the keys database populates it, and see commit `b3c4579c79`. The placement of the call to checkSqliteWorks inside checkInitializeAllowed avoids annex.uuid getting set before it's called. Sponsored-by: Dartmouth College's Datalad project	2022-08-17 13:12:26 -04:00
Joey Hess	840bd50390	make it easier to use curl for unusual url schemes Use curl when annex.security.allowed-url-schemes includes an url scheme not supported by git-annex internally, as long as annex.security.allowed-ip-addresses is configured to allow using curl. Sponsored-by: Luke Shumaker on Patreon	2022-08-15 12:22:13 -04:00
Joey Hess	4cfe17a9e8	use a subdirectory of annex.dbdir This allows annex.dbdir to be set globally or always set to the same value when needed. Each repository uses a subdirectory of it. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 13:18:15 -04:00
Joey Hess	a335c1e46e	annex.dbdir fully working Completes work started in `e60766543f` I've verified that all the sqlite databases get stored in annex.dbdir and are created successfully. If annex.dbdir does not exist, it will be created; its parent directory must already exist though. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 13:06:58 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	a23fd7349f	work around git segfault Work around bug in git 2.37 that causes a segfault when core.untrackedCache is set, and broke git-annex init. Depending on when git gets fixed and how widely the buggy versions are used, this could be reverted quite soon, or need to linger for a long time. It only makes git-annex init a tiny bit slower in a new repo. Sponsored-by: Max Thoursie on Patreon	2022-08-04 14:20:57 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	d905232842	use ResourcePool for hash-object handles Avoid starting an unnecessary number of git hash-object processes when concurrency is enabled. Sponsored-by: Dartmouth College's DANDI project	2022-07-25 17:32:39 -04:00
Joey Hess	63cef2ae0b	v8 repositories automatically upgrade to v9 (And v9 later on to v10.) When v9/v10 were added, making v8 automatically upgrade was deferred "for a few months" to prevent interoperability problems if users also have an old version of git-annex. Of course that could still be the case, but there has been a good amount of time and this can't be put off forever. Allow setting annex.autoupgraderepository to false to avoid this upgrade. Previously, that only prevented upgrades from no longer supported git-annex versions, but v8 is still supported, and users may want to keep on v8 to interoperate with an old git-annex version. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-07-25 16:20:04 -04:00
Joey Hess	cbe12b9bc3	force fully strict read of journal file again I was thinking that discardIncompleteAppend would make it strict, since it looks at the end of the bytestring. But, it's applied lazily.. This probably fixes windows, which was failing: git-annex.exe: .git\annex\journal\trust.log: DeleteFile "\\\\?\\C:\\Users\\runneradmin\\.t\\5\\tmprepo22\\.git\\annex\\journal\\trust.log": permission denied (The process cannot access the file because it is being used by another process.)	2022-07-22 11:36:21 -04:00
Joey Hess	4e88137a28	prevent appends except when annex.alwayscompact=false I would like for a new repo version to enable appends, but to do so safely would need a v11 followed by a 1 year delay followed by a v12 that does it. Since a similar v9 and v10 transition is currently happening, and is less than 6 months along in most repos, it does not feel wise to stack up another year-long transition behind that. What if I need to hurry up a new repo version for some other change? Added todo so I remember to make this change at some time when a v11 and probably v12 repo version do make sense. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 13:23:55 -04:00
Joey Hess	d275874e6c	handling of interrupted appends An append that is interrupted and writes part of a line is now dealt with by subsequent reads and appends. This also handles a read that happens at the same time as an append to the file. Old versions of git-annex will still see a partially written line, and could get confused. Since appends are currently done for url logs and location logs, the confusion is limited to a substring of the actual url or UUID of the remote being read. This will not affect writes, since the journal file is locked when reading in preparation for writing. However, the bad data can be output by git-annex and used by other things, or could cause surprising behavior by git-annex. Including eg, downloading the content of the wrong url. So, something needs to be done to prevent old versions of git-annex from running in a repository where this appending is being done.. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 12:40:49 -04:00
Joey Hess	6f1fd3abdd	no locking of journal on read after all Finally have a final design, and it turns out not to need locking on read.	2022-07-20 10:57:28 -04:00
Joey Hess	d0860b7f0e	fix build After `28b0aaea54`	2022-07-18 16:44:32 -04:00
Joey Hess	28b0aaea54	re-add lock journal before reading journal files This reverts commit `2e6e9876e3`. This is gonna be needed after all.. The append will only be atomic if the journal is locked, because the file being appended will have to be moved out of the way to avoid an old version of git-annex seeing an incomplete write to it. When git-annex finds that the file is not in the journal, and checks the append location, locking will be needed to avoid a race causing it to miss it in the append location too due to it being moved back to the journal.	2022-07-18 16:40:25 -04:00
Joey Hess	36f0bdcd57	add annex.alwayscompact Added annex.alwayscompact setting which can be unset to speed up writes to the git-annex branch in some cases. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 16:39:19 -04:00
Joey Hess	ccff639651	Merge branch 'master' into append	2022-07-18 14:17:15 -04:00
Joey Hess	de18d92de6	efficient but unsafe journal file append This is only for checking performance, it's not safe. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 14:17:12 -04:00
Joey Hess	1c40b927aa	minor optimisation Avoid re-writing the file when the journal directory did not exist.	2022-07-18 13:50:35 -04:00
Joey Hess	2e6e9876e3	Revert "lock journal before reading journal files" This reverts commit `47358a6f95`. This added overhead, and will not be needed, because appends are going to have to be made atomic for other reasons than avoiding incomplete reads of data being appended. In particular, when git-annex is interrupted in the middle of an append, it must not leave the file with a partially written line. So appending has to somehow be made fully atomic.	2022-07-18 13:38:12 -04:00
Joey Hess	ce455223df	split out appending to journal from writing, high level only Currently this is not an improvement, but it allows for optimising appendJournalFile later. With an optimised appendJournalFile, this will greatly speed up access patterns like git-annex addurl of a lot of urls to the same key, where the log file can grow rather large. Appending rather than re-writing the journal file for each line can save a lot of disk writes. It still has to read the current journal or branch file, to check if it can append to it, and so when the journal file does not exist yet, it can write the old content from the branch to it. Probably the re-reads are better cached by the filesystem than repeated writes. (If the re-reads turn out to keep performance bad, they could be eliminated, at the cost of not being able to compact the log when replacing old information in it. That could be enabled by a switch.) While the immediate need is to affect addurl writes, it was implemented at the level of presence logs, so will also perhaps speed up location logs. The only added overhead is the call to isNewInfo, which only needs to compare ByteStrings. Helping to balance that out, it avoids compactLog when it's able to append. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 13:22:50 -04:00
Joey Hess	47358a6f95	lock journal before reading journal files This is not currently necessary; journal files are updated atomically. However, for faster appends to large journal files, locking on read will be needed, because appends are not atomic. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 14:43:29 -04:00
Joey Hess	a2b1f369d1	disable journalIgnorable in enableInteractiveBranchAccess Fix a reversion that prevented --batch commands (and the assistant) from noticing data written to the journal by other commands. I have not identified which commit broke this for sure, but probably it was `aeca7c2207` --batch commands that wrote to the journal avoided the problem since journalIgnorable sets unset on write. It's a little bit surprising that nobody noticed that query --batch commands did not see data written by other commands. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 13:48:41 -04:00
Joey Hess	91abd872d3	complete a comment	2022-07-15 12:59:59 -04:00
Joey Hess	ad467791c1	optimise journal writes to not mkdir journal directory when it already exists Sponsored-by: Dartmouth College's DANDI project	2022-07-14 12:29:39 -04:00
Joey Hess	1b680d330b	revert accidental change	2022-07-13 15:17:08 -04:00
Joey Hess	68e9b7f987	comment	2022-07-13 13:44:43 -04:00
Joey Hess	f58fb6a79a	fix build when dbus is enabled Broken in commit `8040ecf9b8`	2022-07-05 13:06:45 -04:00
Joey Hess	8040ecf9b8	final readonly values moves to AnnexRead At this point I've checked all AnnexState values and these were all that remained that could move. Pity that Annex.repo can't move, but it gets modified sometimes.. A couple of AnnexState values are set by options and could be AnnexRead, but happen to use Annex when being set. Sponsored-by: Max Thoursie on Patreon	2022-06-28 16:04:58 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	d00e23cac9	RawFilePath optimisations	2022-06-22 16:20:08 -04:00
Joey Hess	224a57f9ed	RawFilePath optimisation	2022-06-22 16:11:03 -04:00
Joey Hess	95a04920cf	remove objectDir'	2022-06-22 16:08:49 -04:00
Joey Hess	f80ec74128	RawFilePath optimisation	2022-06-22 16:08:26 -04:00
Joey Hess	78a3d44ea0	get rid of racy addLink The remaining callers all did not rely on it checking gitignore, so were easy to convert. They were susceptible to the same overwrite race as add and fix, although less likely to have it and a narrower window than add's race. Command.Rekey in passing got an unnecessary call to removeFile deleted. addSymlink handles deleting any existing worktree file.	2022-06-14 14:47:15 -04:00
Joey Hess	7ace804d8e	avoid writing same symlink twice in a row Oddly, the second write did not cause it to lose the mtime inherited from the file being added, although the mtime was not provided to that write but only to the first. I don't quite know why that worked before!	2022-06-14 14:30:12 -04:00
Joey Hess	5ef79125ad	fix overwrite race with git-annex add of annex symlink In the unlikely case where git-annex add is run on an annex symlink that is not already added, and while it's processing it, the annex symlink is overwritten with something else, avoid git-annex overwriting that with the symlink again. Sponsored-by: Jack Hill on Patreon	2022-06-14 14:00:13 -04:00

1888 commits
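
The interrupted-append handling described in commits d275874e6c and cbe12b9bc3 can be illustrated with a minimal sketch. This is not git-annex's code (git-annex is written in Haskell). The function name `discard_incomplete_append` is only loosely modelled on the `discardIncompleteAppend` mentioned in the log, `append_line` is a hypothetical helper, and the rule that a complete journal line always ends in a newline is an assumption made for illustration.

```python
def discard_incomplete_append(data: bytes) -> bytes:
    """Return data with any trailing partial line removed.

    Assumption (not confirmed by the log): every complete line ends
    with a newline, so bytes after the last newline are the remains
    of an interrupted append.
    """
    if not data or data.endswith(b"\n"):
        return data
    # rfind returns -1 when there is no newline at all, in which case
    # the slice below is empty and the whole content is discarded.
    last_newline = data.rfind(b"\n")
    return data[: last_newline + 1]


def append_line(existing: bytes, line: bytes) -> bytes:
    """Hypothetical helper: append one line after repairing the content.

    A later append first drops any partial line left behind by an
    interrupted writer, then adds the new, newline-terminated line.
    """
    return discard_incomplete_append(existing) + line.rstrip(b"\n") + b"\n"
```

The sketch works on bytes in memory rather than on a file, which avoids the lazy-read issue that cbe12b9bc3 describes: the whole content is already available before the end of it is inspected.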