Implemented by giving Git.Queue a FlushAction, which can be accumulated
along with another action on files, and runs only once that other action
has run.
This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.
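
Roughly, the shape of the idea is something like this (a simplified
sketch with made-up names, not the actual Git.Queue code):

    import Data.IORef

    -- Hypothetical, simplified queue: files for a main action, plus an
    -- optional FlushAction that must run after that action.
    data Queue = Queue
        { queuedFiles :: IORef [FilePath]
        , flushAction :: IORef (Maybe (IO ()))
        }

    newQueue :: IO Queue
    newQueue = Queue <$> newIORef [] <*> newIORef Nothing

    -- Accumulate files for the main action.
    addFiles :: Queue -> [FilePath] -> IO ()
    addFiles q fs = modifyIORef' (queuedFiles q) (fs ++)

    -- Accumulate a FlushAction; in this sketch a later one replaces an
    -- earlier one, so it still runs only once per flush.
    addFlushAction :: Queue -> IO () -> IO ()
    addFlushAction q a = writeIORef (flushAction q) (Just a)

    -- Run the main action over the queued files first, and only then run
    -- the FlushAction, so the two cannot conflict.
    flushQueue :: ([FilePath] -> IO ()) -> Queue -> IO ()
    flushQueue mainaction q = do
        fs <- readIORef (queuedFiles q)
        mainaction (reverse fs)
        writeIORef (queuedFiles q) []
        readIORef (flushAction q) >>= maybe (return ()) id
        writeIORef (flushAction q) Nothing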
In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git-annex status
speeds up.
When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.
Sponsored-by: Jochen Bartl on Patreon
autoUpgradeableVersions used latestVersion (10), but it did not make
sense that asking for old version 6 got version 10, while asking for
version 8 got version 8. So use defaultVersion (8) instead.
Sponsored-by: Dartmouth College's Datalad project
The problem is that withContentLockFile, in a v8 repo, has to take a shared
lock of `.git/annex/content.lck`. But, in a readonly repository, if that
file does not yet exist, it cannot lock it. And while it will sometimes
work to `chmod +r .git/annex`, the repository might be readonly due to
being owned by another user, or due to being mounted readonly.
So, it seems that the only solution is to use some other file than
`.git/annex/content.lck` as the lock file. The inode sentinel file
was almost the only option that should always exist. (And if it somehow
does not exist, creating an empty one for locking will be ok.)
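
Something along these lines (illustrative only; a hypothetical helper,
not the real withContentLockFile):

    import Control.Exception (IOException, try)
    import System.Directory (doesFileExist)
    import System.IO (IOMode(WriteMode), hClose, openFile)

    -- Pick the file to take the shared content lock on in a v8 repository.
    -- The inode sentinel file should already exist even when .git/annex is
    -- not writable; if it is somehow missing, try to create an empty one,
    -- ignoring failure in a readonly repository.
    contentLockFileV8 :: FilePath -> IO FilePath
    contentLockFileV8 sentinelfile = do
        exists <- doesFileExist sentinelfile
        if exists
            then return sentinelfile
            else do
                _ <- try (openFile sentinelfile WriteMode >>= hClose)
                    :: IO (Either IOException ())
                return sentinelfile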
Wow, what a hack!
Sponsored-by: Dartmouth College's Datalad project
This has tradeoffs, but is generally a win, and users for whom it makes
git add slow down unacceptably can just disable it again.
It needed to happen in an upgrade, since there are git-annex versions
that do not support it, and using such an old version with a v8
repository with filter.annex.process set will cause bad behavior.
By enabling it in v9, it's guaranteed that any git-annex version that
can use the repository does support it. Although, this is not a perfect
protection against problems, since an old git-annex version, if it's
used with a v9 repository, will cause git add to try to run
git-annex filter-process, which will fail. But at least, the user is
unlikely to have an old git-annex in path if they are using a v9
repository, since it won't work in that repository.
Sponsored-by: Dartmouth College's Datalad project
Capstone of the v10 upgrade process.
Tested with a git-annex drop in a v8 repo that had a local v8 remote.
Upgrading the repo to v10 (with --force) immediately caused it to notice
and switch over to v10 locking. Upgrading the remote also caused it to
switch over when operating on the remote.
The InodeCache makes this fairly efficient: just an added stat call per
lock of an object file. After the v10 upgrade, there is no more
overhead.
Sponsored-by: Dartmouth College's Datalad project
Since it's easy to keep supporting v8, using it for a while (eg a few
months) will give users time to upgrade git-annex installations, before
it upgrades their repository to v9.
This commit should be reverted once ready to start upgrading
repositories by default.
Sponsored-by: Dartmouth College's Datalad project
The v10 upgrade should almost be safe now. What remains to be done is
notice when the v10 upgrade has occurred, while holding the shared lock,
and switch to using v10 lock files.
Sponsored-by: Dartmouth College's Datalad project
The upgrade from v9 uses this to avoid an automatic upgrade until 1 year
after the v9 update. It can also be used in future such situations.
Sponsored-by: Dartmouth College's Datalad project
The v10 upgrade will run 1 year after the upgrade to v9, to give time for any v8
processes to die. Until that point, the v10 upgrade will be tried by
every process but deferred, so added support for deferring upgrades.
The upgrade prevention lock file that will be used by v10 is not yet
implemented, so it does not yet defer.
Sponsored-by: Dartmouth College's Datalad project
Upgrade the shared lock to an exclusive lock, and then delete the
lock file. If there is another process still holding the shared lock,
the first process will fail taking the exclusive lock, and not delete
the lock file; then the other process will later delete it.
Note that, in the time period where the exclusive lock is held, other
attempts to lock the content in place would fail. This is unlikely to be
a problem since it's a short period.
Other attempts to lock the content for removal would also fail in that
time period, but that's no different than a removal failing because
content is locked to prevent removal.
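
The cleanup protocol is roughly this (a sketch using GHC's portable
Handle locks rather than git-annex's own lock modules; names are
illustrative):

    import Control.Exception (IOException, try)
    import GHC.IO.Handle.Lock (LockMode(ExclusiveLock), hTryLock)
    import System.Directory (removeFile)
    import System.IO (Handle, hClose)

    -- Called when done with a shared lock of the content lock file;
    -- h is a Handle open on lockfile, currently holding the shared lock.
    cleanupContentLockFile :: FilePath -> Handle -> IO ()
    cleanupContentLockFile lockfile h = do
        gotexclusive <- hTryLock h ExclusiveLock
        if gotexclusive
            then do
                -- No other process holds a shared lock, so deleting the
                -- file cannot invalidate anyone else's lock.
                _ <- try (removeFile lockfile) :: IO (Either IOException ())
                hClose h
            else
                -- Another process still holds a shared lock; leave the
                -- file for that process to delete later.
                hClose h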
Sponsored-by: Dartmouth College's Datalad project
When dropping content, this was already done after deleting the content
file, but the lock file prevents deleting the directories. So, try the
deletion again.
This does mean there's a small added overhead of a failed rmdir().
Sponsored-by: Dartmouth College's Datalad project
This seems to be the best that can be done to avoid forever accumulating
the new content lock files, while being fully safe.
This is fixing code paths that have lingered unused since direct mode!
And direct mode seems to have been buggy in this area, since the content
lock file was deleted on unlock. But with a shared lock, there could be
another process that also had the lock file locked, and deleting it
invalidates that lock.
So, the lock file cannot be deleted after a shared lock. At least, not
without taking an exclusive lock first.. which I have not pursued yet but may.
After an exclusive lock, the lock file can be deleted. But there is
still a potential race, where the exclusive lock is held, and another
process gets the file open, just as the exclusive lock is dropped and
the lock file is deleted. That other process would be left with a file
handle it can take a shared lock of, but with no effect since the file
is deleted. Annex.Transfer also deletes lock files, and deals with this
same problem by using checkSaneLock, which is how I've dealt with it
here.
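
The idea behind that check is roughly this (a sketch using the unix
package, not git-annex's actual checkSaneLock): after locking an
already-open file descriptor, compare it against what is currently at
the lock file's path.

    import Control.Exception (IOException, try)
    import System.Posix.Files (FileStatus, deviceID, fileID, getFdStatus, getFileStatus)
    import System.Posix.Types (Fd)

    -- Returns False when the lock file was deleted (or replaced) after it
    -- was opened, in which case the lock just taken excludes nobody.
    checkSaneLock :: FilePath -> Fd -> IO Bool
    checkSaneLock lockfile fd = do
        fdstat <- getFdStatus fd
        v <- try (getFileStatus lockfile) :: IO (Either IOException FileStatus)
        return $ case v of
            Left _ -> False
            Right pathstat ->
                fileID fdstat == fileID pathstat
                    && deviceID fdstat == deviceID pathstat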
Sponsored-by: Dartmouth College's Datalad project
Now the content lock files are used in v9. However, I am not yet certain
they are correct. In particular, lockContentUsing deletes
the content lock file on unlock. But what if there's a shared lock
by another process? That seems like it would discard that lock too!
(Windows seems like it would not have the same problem, because as the
comment in there says, "Can't delete a locked file on Windows".
So if another process has a shared lock, removing it presumably fails.)
Sponsored-by: Dartmouth College's Datalad project
Seems to work ok. Unsure yet about the actual locking changes being
correct.
This is not the end of the story with upgrades, because it is unsafe for
this upgrade as implemented to run in a repository where an old
git-annex process is already running. The old process would use the old
locking method, and not notice files locked by the new, and this could
result in data loss. This problem will need to be dealt with before this
branch is suitable for merging.
Sponsored-by: Dartmouth College's Datalad project
Windows has always used a separate lock file, but on unix, the content
file itself was locked, and in v9 that changes to also use a separate
lock file.
This needs to be tested more. Eg, what happens after dropping a file;
does the content lock file get deleted too, or linger around?
Sponsored-by: Dartmouth College's Datalad project
v9 will not need to write to annex content files in order to lock them,
so freezeContent removes the write bit in a shared repository, the same
as in any other repository.
checkContentWritePerm makes sure that the write perm is not set, which
will let git-annex fsck fix up the permissions. Upgrading to v9
will need to fix the permissions as well, but it seems likely there will
be situations where the user that git-annex is running the upgrade as
cannot fix them, so it will have to leave the write bit set. In such a
case, git-annex fsck can fix it later.
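
The permission handling amounts to something like this (a simplified
sketch with hypothetical helper names, not the real freezeContent and
checkContentWritePerm):

    import Data.Bits (complement)
    import System.Posix.Files
    import System.Posix.Types (FileMode)

    allWriteModes :: FileMode
    allWriteModes =
        ownerWriteMode `unionFileModes` groupWriteMode `unionFileModes` otherWriteMode

    -- Remove all write bits from an annex content file, now done the same
    -- way in a shared repository as in any other.
    freezeContentFile :: FilePath -> IO ()
    freezeContentFile f = do
        st <- getFileStatus f
        setFileMode f (fileMode st `intersectFileModes` complement allWriteModes)

    -- Check that no write bit is left set; when this returns False,
    -- git-annex fsck can fix the permissions later.
    contentWritePermOk :: FilePath -> IO Bool
    contentWritePermOk f = do
        st <- getFileStatus f
        return (fileMode st `intersectFileModes` allWriteModes == nullFileMode)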
Sponsored-by: Dartmouth College's Datalad project
This is the start of v9, but it's currently identical to v8, and v8 is
not automatically upgraded to it. Running git-annex upgrade will upgrade
to v9 with this change.
Sponsored-by: Dartmouth College's Datalad project
Recover from corrupted content being received from a git remote due eg to a
wire error, by deleting the temporary file when it fails to verify. This
prevents a retry from failing again.
Reversion introduced in version 8.20210903, when incremental verification
was added.
Only the git remote seems to be affected, although it is certainly
possible that other remotes could later have the same issue. This only
affects things passed to getViaTmp that return (False, UnVerified) due to
verification failing. As far as getViaTmp can tell, that could just as well
mean that the transfer failed in a way that would resume, so it cannot
delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally,
while other remotes do not, which is why only the git remote seems affected.
A better fix perhaps would be to improve the types of the callback
passed to getViaTmp, so that some other value could be used to indicate
the state where the transfer succeeded but verification failed.
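
That improvement would look something like this (purely a sketch of the
suggested type change, not the current getViaTmp interface):

    import System.Directory (removeFile)

    -- Hypothetical callback result, distinguishing a failed transfer
    -- (possibly resumable) from one that completed but failed to verify.
    data TransferOutcome
        = TransferFailed        -- may be resumable; keep the temp file
        | VerificationFailed    -- content is corrupt; delete the temp file
        | TransferVerified      -- completed and verified

    finishViaTmp :: FilePath -> TransferOutcome -> IO Bool
    finishViaTmp tmpfile VerificationFailed = do
        -- Deleting the temp file prevents a retry from failing again by
        -- resuming from the corrupt data.
        removeFile tmpfile
        return False
    finishViaTmp _ TransferFailed = return False
    finishViaTmp _ TransferVerified = return True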
Sponsored-by: Boyd Stephen Smith Jr.
Before it would pick one at random, though preferring ones that were not
dead over dead ones.
Now, if one is dead and the other is not, it will use the non-dead one. But
if neither is dead, or both are dead, it will error out, suggesting the user
clarify which one they want to enable.
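
The selection logic amounts to something like this (hypothetical types,
just to illustrate the rule):

    data RemoteInfo = RemoteInfo
        { remoteUUID :: String
        , remoteDead :: Bool
        }

    chooseToEnable :: RemoteInfo -> RemoteInfo -> Either String RemoteInfo
    chooseToEnable a b = case (remoteDead a, remoteDead b) of
        (True, False) -> Right b   -- only b is not dead
        (False, True) -> Right a   -- only a is not dead
        _ -> Left "this name matches multiple remotes; specify which one to enable"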
Sponsored-by: Luke Shumaker on Patreon
Capstone to this feature. Any transitions that have been performed on an
unmerged remote ref but not on the local git-annex branch, or vice-versa,
have to be applied on the fly when reading files.
Sponsored-by: Dartmouth College's Datalad project
It would be difficult to make Annex.Branch.files query the unmerged
git-annex branches. It might be possible, similar to what was discussed
in 7f6b2ca49c, but again I decided to have it not do anything in that
situation to start with, before adding such a complicated thing.
git-annex info uses it when getting info about a repository. The choices
were to make that fail with an error, or display the info it can, and
change the output slightly for the bits of info it cannot access. While
that is a behavior change, and I want to avoid any behavior changes due
to unmerged git-annex branches in a read-only repo, displaying a message
that is not a number seems unlikely to break anything that was consuming
a number, any worse than throwing an exception would. Probably.
Also git-annex unused --from origin is made to throw an error, but
it would fail later anyway when trying to write to the unused log files.
Sponsored-by: Dartmouth College's Datalad project
This makes --all error out in that situation. Which is better than
ignoring information from the branches.
To really handle the branches right, overBranchFileContents would need
to both query all the branches and union merge file contents
(or perhaps not provide any file content), as well as diffing between
branches to find files that are only present in the unmerged branches.
And also, it would need to handle transitions..
Sponsored-by: Dartmouth College's Datalad project
The way precaching works, it can't merge in information from those
branches efficiently, so just disable it and fall back to
Annex.Branch.get in order to get the correct information.
Sponsored-by: Dartmouth College's Datalad project
Improved support for using git-annex in a read-only repository: git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
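
The shape of that in-memory merge is roughly this (illustrative only;
the real code lives in Annex.Branch and also has to handle the journal,
caching, log compaction, and transitions):

    import Control.Exception (SomeException, try)
    import Data.List (nub)
    import System.Process (readProcess)

    -- Read one log file from a ref; an absent file contributes nothing.
    catFileFromRef :: String -> FilePath -> IO [String]
    catFileFromRef ref file = do
        v <- try (readProcess "git" ["cat-file", "blob", ref ++ ":" ++ file] "")
                :: IO (Either SomeException String)
        return $ either (const []) lines v

    -- Union the lines of a log file across the local git-annex branch and
    -- any unmerged remote git-annex branches.
    readLogMerged :: [String] -> FilePath -> IO [String]
    readLogMerged refs file = nub . concat <$> mapM (`catFileFromRef` file) refs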
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
So that eg, addurl of several large files that take time to download will
update the index for each file, rather than deferring the index updates to
the end.
In cases like an add of many smallish files, where a new file is being
added every few seconds, the queue will still build up a lot of changes
which are flushed at once, for best performance. Since the default queue
size is 10240, often it only gets flushed once at the end, same as
before. (Notice that updateQueue updates _lastchanged when adding a new
item to the queue without flushing it; that is necessary to avoid it
flushing the queue every 5 minutes in this case.)
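
The flush heuristic amounts to something like this (hypothetical names,
not the real queue code):

    import Data.Time.Clock.POSIX (POSIXTime)

    -- Minimal queue state; the real queue also carries the accumulated
    -- git commands and FlushActions.
    data QueueState = QueueState
        { queueSize   :: Int
        , queueLimit  :: Int        -- eg 10240 by default
        , lastChanged :: POSIXTime
        }

    flushInterval :: POSIXTime
    flushInterval = 300  -- 5 minutes

    -- Flush when full, or when the last change was queued over 5 minutes ago.
    shouldFlush :: POSIXTime -> QueueState -> Bool
    shouldFlush now q =
        queueSize q >= queueLimit q
            || (queueSize q > 0 && now - lastChanged q >= flushInterval)

    -- Adding an item bumps lastChanged even when not flushing, so a steady
    -- stream of quick adds does not get flushed every 5 minutes.
    addItem :: POSIXTime -> QueueState -> QueueState
    addItem now q = q { queueSize = queueSize q + 1, lastChanged = now }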
But, when it takes more than 5 minutes to add a file, the overhead of
updating the index immediately is probably small, so do it after each
file. This avoids git-annex potentially taking a very very long time
indeed to stage newly added files, which can be annoying to the user who
would like to get on with doing something with the files it's already
added, eg using git mv to rename them to a better name.
This is only likely to cause a problem if it takes, say, 30 seconds to
update the index; doing an extra 30 seconds of work after every 5-minute
file add would be less optimal. Normally, updating the index takes
significantly less time than that. On a SSD with 100k files it takes
less than 1 second, and the index write time is bound by disk read and
write so is not too much worse on a hard drive. So I hope this will not
impact users, although if it does turn out to, the time limit could be
made configurable.
A perhaps better way to do it would be to have a background worker
thread that wakes up every 60 seconds or so and flushes the queue.
That is made somewhat difficult because the queue can contain Annex
actions and so this would add a new source of concurrency issues.
So I'm trying to avoid that approach if possible.
Sponsored-by: Erik Bjäreholt on Patreon
Was "failed to generate a key" when key generation did not fail
(it never does anymore) but the actual problem was it failed to stat
the source file, perhaps due to it being deleted while the key was being
generated.
A user reported this, in a comment I followed up on in
262400fe04, although I don't know
what they did to trigger the error message.
This fixes a FD leak when annex.pidlock is set and -J is used. Also, it
fixes bugs where the pid lock file got deleted because one thread was
done with it, while another thread was still holding it open.
The LockPool now has two distinct types of resources: one is
per-LockHandle and is used for file Handles, which get closed when the
associated LockHandle is closed. The other is per lock file, and gets
closed when no more LockHandles use that lock file, including other
shared locks of the same file.
That latter kind is used for the pid lock file, so it's opened by the
first thread to use a lock, and closed when the last thread closes a lock.
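
The per-lock-file side is essentially reference counting, roughly like
this (illustrative types, not the real LockPool):

    import Control.Concurrent.STM
    import qualified Data.Map.Strict as M

    -- A per-lock-file resource (eg the open pid lock file) that is closed
    -- only when the last LockHandle using it goes away.
    newtype SharedResources = SharedResources
        (TVar (M.Map FilePath (Int, IO ())))  -- user count, close action

    -- Note another user of the lock file's shared resource; the close
    -- action registered by the first user is the one that is kept.
    retainResource :: SharedResources -> FilePath -> IO () -> IO ()
    retainResource (SharedResources tv) f close = atomically $
        modifyTVar' tv $ M.insertWith (\_ (n, c) -> (n + 1, c)) f (1, close)

    -- Drop a user; run the close action only when no users remain.
    releaseResource :: SharedResources -> FilePath -> IO ()
    releaseResource (SharedResources tv) f = do
        mclose <- atomically $ do
            m <- readTVar tv
            case M.lookup f m of
                Just (1, close) -> do
                    writeTVar tv (M.delete f m)
                    return (Just close)
                Just (n, close) -> do
                    writeTVar tv (M.insert f (n - 1, close) m)
                    return Nothing
                Nothing -> return Nothing
        maybe (return ()) id mclose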
In practice, this means that eg git-annex get of several files opens and
closes the pidlock file a few times per file, while with -J5 it will open
the pidlock file, process a number of files until all the threads happen
to finish together (at which point the pidlock file gets closed), and
then that repeats. So in either case, another process still gets a chance
to take the pidlock.
registerPostRelease has a rather intricate dance: there are fine-grained
STM locks, an STM lock of the pidfile itself, and the actual pidlock file
on disk, all of which it resolves in stages.
Sponsored-by: Dartmouth College's Datalad project
This locking has been missing from the beginning of annex.pidlock.
It used to be possible, when two threads are doing conflicting things,
for both to run at the same time despite using locking. Seems likely
that nothing actually had a problem, but it was possible, and this
eliminates that possible source of failure.
Sponsored-by: Dartmouth College's Datalad project
This version of git -- or its new default "ort" resolver -- handles such
a conflict by staging two files, one with the original name and the other
named file~ref. Use unmergedSiblingFile when the latter is detected.
(It doesn't do that when the conflict is between a directory and a file
or symlink though, so see previous commit for how that case is handled.)
The sibling file has to be deleted separately, because cleanConflictCruft
may not delete it -- that only handles files that are annex links,
but the sibling file may be the non-annexed file side of the conflict.
The grafting code had assumed that, when the other side of a conflict
is a symlink, the file in the work tree will contain the non-annexed
content that we want it to contain. But that is not the case with the new
git; the file may be the annex link and needs to be replaced with the
content, while the annex link will be written as a -variant file.
(The weird doesDirectoryExist check in the grafting code turns out to still be
needed, test suite failed when I tried to remove it.)
Test suite passes with new git with ort resolver default. Have not tried it
with old git or other defaults.
Sponsored-by: Noam Kremen on Patreon
Bugfix: When -J was enabled, getting files leaked an ever-growing number of
git cat-file processes.
(Since commit dd39e9e255)
The leak happened when mergeState called stopNonConcurrentSafeCoProcesses.
While stopNonConcurrentSafeCoProcesses usually manages to stop everything,
there was a race condition where cat-file processes were leaked, because
catFileStop modifies Annex.catfilehandles in a non-concurrency-safe way,
and could clobber modifications made in between. That should have been ok,
since originally catFileStop was only used at shutdown.
Note the comment on catFileStop saying it should only be used when nothing
else is using the handles. It would be possible to make catFileStop
race-safe, but it should just not be used in a situation where a race is
possible. So I didn't bother.
Instead, the fix is just not to stop any processes in mergeState. Because
in order for mergeState to be called, dupState must have been run, and it
enables concurrency mode, stops any non-concurrent processes, and so all
processes that are running are concurrency safe. So there is no need to
stop them when merging state. Indeed, stopping them would be extra work,
even if there was not this bug.
Sponsored-by: Dartmouth College's Datalad project
When non-concurrent git coprocesses have been started, setConcurrency
used to not stop them, and so could leak processes when enabling
concurrency, eg when forkState is called.
I do not think that ever actually happened, given where setConcurrency
is called. And it probably would only leak one of each process, since it
never downgrades from concurrent to non-concurrent.
Based on my earlier benchmark, I have a rough cost model for how
expensive it is for git-annex smudge to be run on a file, vs
how expensive it is for a gigabyte of a file's content to be read and
piped through to filter-process.
So, using that cost model, it can decide if using filter-process will
be more or less expensive than running the smudge filter on the files to
be restaged.
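
The decision boils down to something like this (the constants here are
made up; the real numbers come from the benchmark):

    costOfSmudgePerFile :: Double
    costOfSmudgePerFile = 0.005      -- hypothetical seconds per smudge run

    costOfFilterProcessPerGB :: Double
    costOfFilterProcessPerGB = 1.0   -- hypothetical seconds per gigabyte piped

    -- True when restaging via filter-process is estimated to be cheaper
    -- than running the smudge filter once per file.
    useFilterProcess
        :: Int      -- number of files to restage
        -> Integer  -- total size of those files, in bytes
        -> Bool
    useFilterProcess numfiles totalsize =
        smudgecost > filterprocesscost
      where
        smudgecost = fromIntegral numfiles * costOfSmudgePerFile
        filterprocesscost =
            fromIntegral totalsize / 1e9 * costOfFilterProcessPerGB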
It turned out to be *really* annoying to temporarily disable
filter-process. I did find a way, but urk, this is horrible. Notice
that, if it's interrupted with it disabled, it will remain disabled
until the next time restagePointerFile runs. Which could be some time
later. If the user runs `git add` or `git checkout` on a lot of small
files before that, they will see slower than expected performance.
(This commit also deletes where I wrote down the benchmark results
earlier.)
Sponsored-by: Noam Kremen on Patreon
This reverts commit afe327ac49.
Unfortunately, disabling it by setting it to "" does not work, git
then ignores filter.annex.smudge/clean, and does not pass files through
git-annex at all.
I don't think there is a way to temporarily disable this git config
from the git command line. Which seems like a bug in git.
So, it may be more expensive than anticipated to enable
filter.annex.process, since git checkout etc will pipe all annexed files
being checked out through it.
This means git will run git-annex smudge --clean once per file that is
restaged, which can be slow. But probably *not* as slow as git feeding
all the content of annexed files you've gotten through a pipe to
git-annex filter-process.
The only time this is probably not ideal is after a drop of a bunch of
files, when filter-process would be faster.