git-annex

Author	SHA1	Message	Date
Joey Hess	d9fd205cbb	push RawFilePath down into Annex.ReplaceFile Minor optimisation, but a win in every case, except for a couple where it's a wash. Note that replaceFile still takes a FilePath, because it needs to operate on Chars to truncate unicode filenames properly.	2023-10-26 13:36:49 -04:00
Joey Hess	aff37fc208	avoid annexFileMode special case This makes annexFileMode be just an application of setAnnexPerm', which avoids having 2 functions that do different versions of the same thing. Fixes some buggy behavior for some combinations of core.sharedRepository and umask. Sponsored-by: Jack Hill on Patreon	2023-04-27 15:58:37 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cb6cb61ca1	avoid build warning on windows	2023-03-27 12:20:35 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equivalents from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	acc3f6211f	finishing up move --from --to Lock the local content for drop after getting it from src, to prevent another process from using the local content as a copy and dropping it from src, which would prevent dropping the local content after sending it to dest. Support resuming an interrupted move that downloaded the content from src, leaving the local content populated. In this case, the location log has not been updated to say the content is present locally, so we can assume that it's resuming and go ahead and drop the local content after sending it to dest. Note that if a `git-annex get` is being run at the same time as a `git-annex move --from --to`, it may get a file just before the move processes it. So the location log has not been updated yet, and the move thinks it's resuming. Resulting in local copy being dropped after it's sent to the dest. This race is something we'll just have to live with, it seems. I also gave up on the idea of checking if the location log had been updated by a `git-annex get` that is run at the same time. That wouldn't work, because the location log is precached in the seek stage, so reading it again after sending the content to dest would not notice changes made to it, unless the cache were invalidated, which would slow it down a lot. That idea anyway was subject to races where it would not detect the concurrent `git-annex get`. So concurrent `git-annex get` will have results that may be surprising. To make that less surprising, updated the documentation of this feature to be explicit that it downloads content to the local repository temporarily. Sponsored-by: Dartmouth College's DANDI project	2023-01-23 17:43:48 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done in Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex switched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses of onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	d00e23cac9	RawFilePath optimisations	2022-06-22 16:20:08 -04:00
Joey Hess	f80ec74128	RawFilePath optimisation	2022-06-22 16:08:26 -04:00
Joey Hess	478ed28f98	revert windows-specific locking changes that broke tests This reverts windows-specific parts of `5a98f2d509` There were no code paths in common between windows and unix, so this will return Windows to the old behavior. The problem that the commit talks about has to do with multiple different locations where git-annex can store annex object files, but that is not too relevant to Windows anyway, because on windows the filesystem is always treated as crippled and/or symlinks are not supported, so it will only use one object location. It would need to be using a repo populated in another OS to have the other object location in use probably. Then a drop and get could possibly lead to a dangling lock file. And, I was not able to actually reproduce that situation happening before making that commit, even when I forced a race. So making these changes on windows was just begging trouble.. I suspect that the change that caused the reversion is in Annex/Content/Presence.hs. It checks if the content file exists, and then called modifyContentDirWhenExists, which seems like it would not fail, but if something deleted the content file at that point, that call would fail. Which would result in an exception being thrown, which should not normally happen from a call to inAnnexSafe. That was a windows-specific change; the unix side did not have an equivalent change. Sponsored-by: Dartmouth College's Datalad project	2022-05-23 13:21:26 -04:00
Joey Hess	aa414d97c9	make fsck normalize object locations The purpose of this is to fix situations where the annex object file is stored in a directory structure other than where annex symlinks point to. But it will also move object files from the hashdirmixed back to hashdirlower if the repo configuration makes that the normal location. It would have been more work to avoid that than to let it do it. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:38:06 -04:00
Joey Hess	6b5029db29	fix hardcoding of number of hash directories It can be changed to 1 via a tuning, rather than the 2 this assumed. So it would have tried to rmdir .git/annex/objects in that case, which would not hurt anything, but is not what it is supposed to do. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:08:42 -04:00
Joey Hess	5a98f2d509	avoid creating content directory when locking content If the content directory does not exist, then it does not make sense to lock the content file, as it also does not exist, and so it's ok for the lock operation to fail. This avoids potential races where the content file exists but is then deleted/renamed, while another process sees that it exists and goes to lock it, resulting in a dangling lock file in an otherwise empty object directory. Also renamed modifyContent to modifyContentDir since it is not only necessarily used for modifying content files, but also other files in the content directory. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 12:34:56 -04:00
Joey Hess	51c528980c	avoid accidentally thawing git-annex symlink It did nothing, since at this point the link is dangling. But when there is a thaw hook, it would probably not be happy to be asked to run on a symlink, or might do something unexpected. Sponsored-by: Dartmouth College's Datalad project	2022-02-24 14:21:23 -04:00
Joey Hess	f4b046252a	Run annex.thawcontent-command before deleting an object file In case annex.freezecontent-command did something that would prevent deletion. Sponsored-by: Dartmouth College's Datalad project	2022-02-24 14:11:02 -04:00
Joey Hess	ce1b3a9699	info: Allow using matching options in more situations File matching options like --include will be rejected in situations where there is no filename to match against. (Or where there is a filename but it's not relative to the cwd, or otherwise seemed too bothersome to match against.) The addition of listKeys' was necessary to avoid using more memory in the common case of "git-annex info". Adding a filterM would have caused the list to buffer in memory and not stream. This is an ugly hack, but listKeys had previously run Annex operations inside unsafeInterleaveIO (for direct mode). And matching against a matcher should hopefully not change any Annex state. This does allow for eg `git-annex info somefile --include=*.ext` although why someone would want to do that I don't really know. But it seems to make sense to allow it. But, consider: `git-annex info ./somefile --include=somefile` This does not match, so will not display info about somefile. If the user really wants to, they can `--include=./somefile`. Using matching options like --copies or --in=remote seems likely to be slower than git-annex find with those options, because unlike such commands, info does not have optimised streaming through the matcher. Note that `git-annex info remote` is not the same as `git-annex info --in remote`. The former shows info about all files in the remote. The latter shows local keys that are also in that remote. The output should make that clear, but this still seems like a point where users could get confused. Sponsored-by: Jochen Bartl on Patreon	2022-02-21 14:46:07 -04:00
Joey Hess	76e365769e	fix crash after drop in v10 After cleaning up the lock file, the content directory is gone, so freezing it failed. Sponsored-by: Dartmouth College's Datalad project	2022-01-20 14:03:27 -04:00
Joey Hess	cea6f6db92	v10 upgrade locking The v10 upgrade should almost be safe now. What remains to be done is notice when the v10 upgrade has occurred, while holding the shared lock, and switch to using v10 lock files. Sponsored-by: Dartmouth College's Datalad project	2022-01-20 11:33:14 -04:00
Joey Hess	538d02d397	delete content lock file safely after shared lock Upgrade the shared lock to an exclusive lock, and then delete the lock file. If there is another process still holding the shared lock, the first process will fail taking the exclusive lock, and not delete the lock file; then the other process will later delete it. Note that, in the time period where the exclusive lock is held, other attempts to lock the content in place would fail. This is unlikely to be a problem since it's a short period. Other attempts to lock the content for removal would also fail in that time period, but that's no different than a removal failing because content is locked to prevent removal. Sponsored-by: Dartmouth College's Datalad project	2022-01-13 14:54:57 -04:00
Joey Hess	86e5ffe34a	clean empty object directories after deleting content lock file When dropping content, this was already done after deleting the content file, but the lock file prevents deleting the directories. So, try the deletion again. This does mean there's a small added overhead of a failed rmdir(). Sponsored-by: Dartmouth College's Datalad project	2022-01-13 14:22:37 -04:00
Joey Hess	a3b6b3499b	delete content lock file safely on drop, keep after shared lock This seems to be the best that can be done to avoid forever accumulating the new content lock files, while being fully safe. This is fixing code paths that have lingered unused since direct mode! And direct mode seems to have been buggy in this area, since the content lock file was deleted on unlock. But with a shared lock, there could be another process that also had the lock file locked, and deleting it invalidates that lock. So, the lock file cannot be deleted after a shared lock. At least, not without taking an exclusive lock first.. which I have not pursued yet but may. After an exclusive lock, the lock file can be deleted. But there is still a potential race, where the exclusive lock is held, and another process gets the file open, just as the exclusive lock is dropped and the lock file is deleted. That other process would be left with a file handle it can take a shared lock of, but with no effect since the file is deleted. Annex.Transfer also deletes lock files, and deals with this same problem by using checkSaneLock, which is how I've dealt with it here. Sponsored-by: Dartmouth College's Datalad project	2022-01-13 13:58:58 -04:00
Joey Hess	3936599885	move code from Command.Fsck Sponsored-by: Dartmouth College's Datalad project	2022-01-13 13:24:50 -04:00
Joey Hess	3c042606c2	use separate lock from content file in v9 Windows has always used a separate lock file, but on unix, the content file itself was locked, and in v9 that changes to also use a separate lock file. This needs to be tested more. Eg, what happens after dropping a file; does the content lock file get deleted too, or linger around? Sponsored-by: Dartmouth College's Datalad project	2022-01-11 17:03:14 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certainly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	21c0d5be6e	comment	2022-01-07 12:27:19 -04:00
Joey Hess	8034f2e9bb	factor out IncrementalHasher from IncrementalVerifier	2021-11-09 12:33:22 -04:00
Joey Hess	669037862a	avoid redundant freezeContent call This opens the potential for the object file to be in place but git-annex is interrupted before it can freeze it. git-annex fsck already fixes that situation, which can also occur when lockContentForRemoval thaws content. Also improve comment to not be Windows-specific.	2021-10-27 14:18:10 -04:00
Reiko Asakura	0db7297f00	Call freezeContent after move into annex This change better supports Windows ACL management using annex.freezecontent-command and annex.thawcontent-command and matches the behaviour of adding an unlocked file. By calling freezeContent after the file has moved into the annex, the file's delete permission can be denied. If the file's delete permission is denied before moving into the annex, the file cannot be moved or deleted. If the file's delete permission is not denied after moving into the annex, it will likely inherit a grant for the delete permission which allows it to be deleted irrespective of the permissions of the parent directory.	2021-10-27 14:05:57 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	88b63a43fa	distinguish between incremental verification failing and not being done Sponsored-by: Dartmouth College's DANDI project	2021-08-18 14:38:02 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	6111958440	fix test suite `14683da9eb` caused a test suite failure. When the content of a key is not present, a LinkAnnexFailed is returned, but replaceFile then tried to move the file into place, and since it was not written, that crashed. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-08-02 13:59:23 -04:00
Joey Hess	817ccbbc47	split verifyKeyContent This avoids it calling enteringStage VerifyStage when it's used in places that only fall back to verification rarely, and which might be called while in TransferStage and be going to perform a transfer after the verification.	2021-07-29 13:58:40 -04:00
Joey Hess	067a9c70c7	simplify code	2021-07-29 12:28:13 -04:00
Joey Hess	3e0b210039	remove unnecessary debugs Keeping the ones in Annex.InodeSentinal	2021-07-29 12:19:37 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a consequence.	2021-07-27 14:07:23 -04:00
Joey Hess	14683da9eb	fix potential race in updating inode cache Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was already safe, but others use it directly on the work tree file, which was race-prone. Eg, if the work tree file was first removed, then linkFromAnnex called to populate it, the user could have re-written it in the interim. This came to light during an audit of all calls of addInodeCaches, looking for such races. All the other uses of it seem ok. Sponsored-by: Brett Eisenberg on Patreon	2021-07-27 13:08:08 -04:00
Joey Hess	e4b2a067e0	fix potential race in updating inode cache In Annex.Content, the object file was statted after pointer files were populated. But if annex.thin is set, once the pointer files are populated, the object file can potentially be modified via the hard link. So, it was possible, though seemingly very unlikely, for the inode of the modified object file to be cached. Command.Fix and Command.Fsck had similar problems, statting the work tree files after they were in place. Changed them to stat the temp file that gets moved into place. This does rely on .git/annex being on the same filesystem. If it's not, the cached inode will not be the same as the one that the temp file gets moved to. Result will be that git-annex will later need to do an expensive verification of the content of the worktree files. Note that the cross-filesystem move of the temp file already is a larger amount of extra work, so this seems acceptable. Sponsored-by: Luke Shumaker on Patreon	2021-07-27 12:29:10 -04:00
Joey Hess	3b5a3e168d	check if object is modified before starting to send it Fix bug that caused some transfers to incorrectly fail with "content changed while it was being sent", when the content was not changed. While I don't know how to reproduce the problem that several people reported, it is presumably due to the inode cache somehow being stale. So check isUnmodified', and if it's not modified, include the file's current inode cache in the set to accept, when checking for modification after the transfer. That seems like the right thing to do for another reason: The failure says the file changed while it was being sent, but if the object file was changed before the transfer started, that's wrong. So it needs to check before allowing the transfer at all if the file is modified. (Other calls to sameInodeCache or elemInodeCaches, when operating on inode caches from the database, could also be problematic if the inode cache is somehow getting stale. This does not address such problems.) Sponsored-by: Dartmouth College's Datalad project	2021-07-26 17:33:49 -04:00
Joey Hess	f195f3b541	more inode cache debugging	2021-07-26 12:57:35 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	e147ae07f4	remove supportUnlocked check that is not worth its overhead moveAnnex only gets to that check if the object file was not present before. So in the case where dup files are being added repeatedly, it will only run the first time, and so there's no significant speedup from doing it; all it avoids is a single sqlite lookup. Since MVar accesses do have overhead, it's better to optimise for the common case, where unlocked files are supported. removeAnnex is less clear cut, but I think mostly is skipped running on keys when the object has already been dropped, so similar reasoning applies.	2021-06-15 09:28:56 -04:00
Joey Hess	014dc63a55	avoid sometimes expensive operations when annex.supportunlocked = false This will mostly just avoid a DB lookup, so things get marginally faster. But in cases where there are many files using the same key, it can be a more significant speedup. Added overhead is one MVar lookup per call, which should be small enough, since this happens after transferring or ingesting a file, which is always a lot more work than that. It would be nice, though, to move getGitConfig to AnnexRead, which there is an open todo about.	2021-06-14 12:40:41 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	4b739fc460	Fix build on Windows Thanks to bug reporter for the patch.	2020-11-19 12:33:00 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	af6af35228	split out Annex.Content.Presence This will let a module that Annex.Content imports use inAnnex. Unsure yet if I will need that, but this split still seems to make sense, and Annex.Content was way too long so splitting it is good.	2020-11-16 11:24:57 -04:00

328 commits