git-annex

Author	SHA1	Message	Date
Joey Hess	5afbea25e7	avoid counting size of keys that are in the journal twice In calcRepoSizes and also git-annex info, when a key was in the journal, it was passed to the callback twice, so the calculated size was wrong.	2024-08-13 13:23:39 -04:00
Joey Hess	467d80101a	improve handling of unmerged git-annex branches in readonly repo git-annex info was displaying a message that didn't make sense in context. In calcRepoSizes, it seems better to return the information from the git-annex branch, rather than giving up. Especially since balanced preferred content uses it, and we can't just give up evaluating a preferred content expression if git-annex is to be usable in such a readonly repo. Commit `6d7ecd9e5d` nobly wanted git-annex to behave the same with such unmerged branches as it does when it can merge them. But for the purposes of preferred content, it seems to me there's a sense that such an unmerged branch is the same as a remote we have not pulled from. The balanced preferred content will either way operate under outdated information, and so make not the best choices.	2024-08-13 13:13:12 -04:00
Joey Hess	bd3d327d8a	smarter BranchState cache invalidation Only invalidate a just-written file in the cache, not the whole cache. This will avoid the possible performance impact of cache invalidation mentioned in commit `770aac97a7`	2024-07-28 12:33:32 -04:00
Joey Hess	25a6ab6f11	Avoid grafting in export tree objects that are missing They could be missing due to an interrupted git-annex at just the wrong time during a prior graft, after which the tree objects got garbage collected. Or they could be missing because of manual messing with the git-annex branch, eg resetting it to back before the graft commit. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:51:50 -04:00
Joey Hess	b32c4c2e98	atomic git-annex branch update when regrafting in transition Fix a bug where interrupting git-annex while it is updating the git-annex branch could lead to git fsck complaining about missing tree objects. Interrupting git-annex while regraftexports is running in a transition that is forgetting git-annex branch history would leave the repository with a git-annex branch that did not contain the tree shas listed in export.log. That lets those trees be garbage collected. A subsequent run of the same transition then regrafts the trees listed in export.log into the git-annex branch. But those trees have been lost. Note that both sides of `if neednewlocalbranch` are atomic now. I had thought only the True side needed to be, but I do think there may be cases where the False side needs to be as well. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:34:10 -04:00
Joey Hess	adcebbae47	clean up git-remote-annex git-annex branch handling Implemented alternateJournal, which git-remote-annex uses to avoid any writes to the git-annex branch while setting up a special remote from an annex:: url. That prevents the remote.log from being overwritten with the special remote configuration from the url, which might not be 100% the same as the existing special remote configuration. And it prevents an overwrite deleting of other stuff that was already in the remote.log. Also, when the branch was created by git-remote-annex, only delete it at the end if nothing else has been written to it by another command. This fixes the race condition described in `797f27ab05`, where git-remote-annex set up the branch and git-annex init and other commands were run at the same time and their writes to the branch were lost.	2024-05-15 17:33:38 -04:00
Joey Hess	2c73845d90	multiple -m second try Test suite passes this time. When committing the adjusted branch, use the old method to make a message that old git-annex can consume. Also made the code accept the new message, so that eventually commitTreeExactMessage can be removed. Sponsored-by: Kevin Mueller on Patreon	2024-04-09 12:56:47 -04:00
Joey Hess	a8dd85ea5a	Revert "multiple -m" This reverts commit `cee12f6a2f`. This commit broke git-annex init run in a repo that was cloned from a repo with an adjusted branch checked out. The problem is that findAdjustingCommit was not able to identify the commit that created the adjusted branch. It seems that there is an extra "\n" at the end of the commit message that it does not expect. Since backwards compatibility needs to be maintained, cannot just make findAdjustingCommit accept it with the "\n". Will have to instead have one commitTree variant that uses the old method, and use it for adjusted branch committing.	2024-04-02 17:29:07 -04:00
Joey Hess	cee12f6a2f	multiple -m sync, assist, import: Allow -m option to be specified multiple times, to provide additional paragraphs for the commit message. The option parser didn't allow multiple -m before, so there is no risk of behavior change breaking something that was for some reason using multiple -m already. Pass through to git commands, so that the method used to assemble the paragraphs is whatever git does. Which might conceivably change in the future. Note that git commit-tree has supported -m since git 1.7.7. commitTree was probably not using it since it predates that version. Since the configure script prevents building git-annex with git older than 2.1, there is no risk that it's not supported now. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-03-27 15:58:27 -04:00
Joey Hess	a69871491f	avoid build warning on windows Since append was only exported by Annex.Common on unix, excluding it from import caused a build warning on windows.	2024-03-26 13:16:33 -04:00
Joey Hess	68e99513f0	added annex.commitmessage-command config Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:35:22 -04:00
Joey Hess	2114253eaf	update comment The segfault seems to be fixed with git 2.43, I'm not sure what the affected range was.	2024-01-20 11:25:22 -04:00
Joey Hess	f1ce15036f	started migrate --update This is most of the way there, but not quite working. The layout of migrate.tree/ needs to be changed to follow this approach. git log will list all the files in tree order, so the new layout needs to alternate old and new keys. Can that be done? git may not document tree order, or may not preserve it here. Alternatively, change to using git log --format=raw and extract the tree header from that, then use git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new That will be a little more expensive, but only when there are lots of migrations. Sponsored-by: Joshua Antonishen on Patreon	2023-12-07 15:50:52 -04:00
Joey Hess	be6b56df4c	remove unused import	2023-11-01 13:14:39 -04:00
Joey Hess	eb42935e58	Windows: Fix CRLF handling in some log files In particular, the mergedrefs file was written with CR added to each line, but read without CRLF handling. This resulted in each update of the file adding CR to each line in it, growing the number of lines, while also preventing the optimisation from working, so it remerged unnecessarily. writeFile and readFile do NewlineMode translation on Windows. But the ByteString conversion prevented that from happening any longer. I've audited for other cases of this, and found three more (.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All of those also only prevent optimisations from working. Some other files are currently both read and written with ByteString, but old git-annex may have written them with NewlineMode translation. Other files are at risk for breakage later if the reader gets converted to ByteString. This is a minimal fix, but should be enough, as long as I remember to use fileLines when splitting a ByteString into lines. This leaves files written using ByteString without CR added, but that's ok because old git-annex has no difficulty reading such files. When the mergedrefs file has gotten lines that end with "\r\r\r\n", this will eventually clean it up. Each update will remove a single trailing CR. Note that S8.lines is still used in eg Command.Unused, where it is parsing git show-ref, and similar in Git/*. git commands don't include CR in their output so that's ok. Sponsored-by: Joshua Antonishen on Patreon	2023-10-30 14:23:23 -04:00
Joey Hess	0da1d40cd4	Improve memory use of --all when using annex.private This does not improve Annex.Branch.files at all, since it still uses ++ to combine the lists, so forcing all but the last one. But when there are a lot of files in the private journal, it does avoid --all (or a bare repo) from buffering the filenames in memory. See commit `653b719472` for prior discussion of this buffering. Sponsored-by: Graham Spencer on Patreon	2023-10-24 13:20:55 -04:00
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	c268dc5878	only stage regular files from the journal git-annex only writes regular files there, but other things may drop junk like empty .DAV directories around the tree. And trying to hash such things can have weird and hard to understand effects. So it seems best to do a small amount of work in statting the journal file to make sure it's a regular file. Sponsored-by: Jack Hill on Patreon	2023-10-10 13:22:02 -04:00
Joey Hess	adda6c1088	Add git-annex remote refs that are not newer to the merged refs list Significant startup speed increase by avoiding repeatedly checking if some remote git-annex branch refs need to be merged when it is not newer. One way this could happen is when there are 2 remotes that are themselves connected. The git-annex branch on the first remote gets updated. Then the second remote pulls from the first, and merges in its git-annex branch. Then the local repo pulls from the second remote, and merges its git-annex branch. At this point, a pull from the first remote will get a git-annex branch that is not newer, but is not on the merged refs list. In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds. There were 5 to 10 such remote refs out of 18 remotes. Sponsored-by: Graham Spencer on Patreon	2023-08-09 13:31:36 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	f09e299156	rawfilepath conversion	2023-02-27 15:06:32 -04:00
Joey Hess	a23fd7349f	work around git segfault Work around bug in git 2.37 that causes a segfault when core.untrackedCache is set, and broke git-annex init. Depending on when git gets fixed and how widely the buggy versions are used, this could be reverted quite soon, or need to linger for a long time. It only makes git-annex init a tiny bit slower in a new repo. Sponsored-by: Max Thoursie on Patreon	2022-08-04 14:20:57 -04:00
Joey Hess	d905232842	use ResourcePool for hash-object handles Avoid starting an unnecessary number of git hash-object processes when concurrency is enabled. Sponsored-by: Dartmouth College's DANDI project	2022-07-25 17:32:39 -04:00
Joey Hess	4e88137a28	prevent appends except when annex.alwayscompact=false I would like for a new repo version to enable appends, but to do so safely would need a v11 followed by a 1 year delay followed by a v12 that does it. Since a similar v9 and v10 transition is currently happening, and is less than 6 months along in most repos, it does not feel wise to stack up another year-long transition behind that. What if I need to hurry up a new repo version for some other change? Added todo so I remember to make this change at some time when a v11 and probably v12 repo version do make sense. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 13:23:55 -04:00
Joey Hess	6f1fd3abdd	no locking of journal on read after all Finally have a final design, and it turns out not to need locking on read.	2022-07-20 10:57:28 -04:00
Joey Hess	d0860b7f0e	fix build After `28b0aaea54`	2022-07-18 16:44:32 -04:00
Joey Hess	28b0aaea54	re-add lock journal before reading journal files This reverts commit `2e6e9876e3`. This is gonna be needed after all.. The append will only be atomic if the journal is locked, because the file being appended will have to be moved out of the way to avoid an old version of git-annex seeing an incomplete write to it. When git-annex finds that the file is not in the journal, and checks the append location, locking will be needed to avoid a race causing it to miss it in the append location too due to it being moved back to the journal.	2022-07-18 16:40:25 -04:00
Joey Hess	36f0bdcd57	add annex.alwayscompact Added annex.alwayscompact setting which can be unset to speed up writes to the git-annex branch in some cases. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 16:39:19 -04:00
Joey Hess	de18d92de6	efficient but unsafe journal file append This is only for checking performance, it's not safe. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 14:17:12 -04:00
Joey Hess	2e6e9876e3	Revert "lock journal before reading journal files" This reverts commit `47358a6f95`. This added overhead, and will not be needed, because appends are going to have to be made atomic for other reasons than avoiding incomplete reads of data being appended. In particular, when git-annex is interrupted in the middle of an append, it must not leave the file with a partially written line. So appending has to somehow be made fully atomic.	2022-07-18 13:38:12 -04:00
Joey Hess	ce455223df	split out appending to journal from writing, high level only Currently this is not an improvement, but it allows for optimising appendJournalFile later. With an optimised appendJournalFile, this will greatly speed up access patterns like git-annex addurl of a lot of urls to the same key, where the log file can grow rather large. Appending rather than re-writing the journal file for each line can save a lot of disk writes. It still has to read the current journal or branch file, to check if it can append to it, and so when the journal file does not exist yet, it can write the old content from the branch to it. Probably the re-reads are better cached by the filesystem than repeated writes. (If the re-reads turn out to keep performance bad, they could be eliminated, at the cost of not being able to compact the log when replacing old information in it. That could be enabled by a switch.) While the immediate need is to affect addurl writes, it was implemented at the level of presence logs, so will also perhaps speed up location logs. The only added overhead is the call to isNewInfo, which only needs to compare ByteStrings. Helping to balance that out, it avoids compactLog when it's able to append. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 13:22:50 -04:00
Joey Hess	47358a6f95	lock journal before reading journal files This is not currently necessary; journal files are updated atomically. However, for faster appends to large journal files, locking on read will be needed, because appends are not atomic. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 14:43:29 -04:00
Joey Hess	1b680d330b	revert accidental change	2022-07-13 15:17:08 -04:00
Joey Hess	68e9b7f987	comment	2022-07-13 13:44:43 -04:00
Joey Hess	b1d719f9d2	handle transitions with read-only unmerged git-annex branches Capstone to this feature. Any transitions that have been performed on an unmerged remote ref but not on the local git-annex branch, or vice-versa have to be applied on the fly when reading files. Sponsored-by: Dartmouth College's Datalad project	2021-12-28 13:23:32 -04:00
Joey Hess	720baf820e	refactoring	2021-12-28 12:15:51 -04:00
Joey Hess	23a485498f	handle Annex.Branch.files with read-only unmerged git-annex branches It would be difficult to make Annex.Branch.files query the unmerged git-annex branches. Might be possible, similar to what was discussed in `7f6b2ca49c` but again I decided to make it not do anything in that situation to start with before adding such a complicated thing. git-annex info uses it when getting info about a repository. The choices were to make that fail with an error, or display the info it can, and change the output slightly for the bits of info it cannot access. While that is a behavior change, and I want to avoid any behavior changes due to unmerged git-annex branches in a read-only repo, displaying a message that is not a number seems unlikely to break anything that was consuming a number, any worse than throwing an exception would. Probably. Also git-annex unused --from origin is made to throw an error, but it would fail later anyway when trying to write to the unused log files. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 15:28:31 -04:00
Joey Hess	7f6b2ca49c	handle overBranchFileContents with read-only unmerged git-annex branches This makes --all error out in that situation. Which is better than ignoring information from the branches. To really handle the branches right, overBranchFileContents would need to both query all the branches and union merge file contents (or perhaps not provide any file content), as well as diffing between branches to find files that are only present in the unmerged branches. And also, it would need to handle transitions.. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:30:51 -04:00
Joey Hess	d9d0fe5fa4	disable precaching git-annex branch when there are unmerged branches in a read-only repo The way precaching works, it can't merge in information from those branches efficiently, so just disable it and fall back to Annex.Branch.get in order to get the correct information. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:08:50 -04:00
Joey Hess	1e09cf661e	remove git-annex branch ref from unmerged refs list It's queried separately so it was causing extra work to include it.	2021-12-27 13:33:27 -04:00
Joey Hess	6d7ecd9e5d	merge git-annex branch in memory in read-only repository Improved support for using git-annex in a read-only repository, git-annex branch information from remotes that cannot be merged into the git-annex branch will now not crash it, but will be merged in memory. To avoid this making git-annex behave one way in a read-only repository, and another way when it can write, it's important that Annex.Branch.get return the same thing (modulo log file compaction) in both cases. This manages that mostly. There are some exceptions: - When there is a transition in one of the remote git-annex branches that has not yet been applied to the local or other git-annex branches. Transitions are not handled. - `git-annex log` runs git log on the git-annex branch, and so it will not be able to show information coming from the other, not yet merged branches. - Annex.Branch.files only looks at files in the git-annex branch and not unmerged branches. This affects git-annex info output. - Annex.Branch.hs.overBranchFileContents ditto. Affects --all and also importfeed (but importfeed cannot work in a read-only repo anyway). - CmdLine.Seek.seekFilteredKeys when precaching location logs. Note use of Annex.Branch.fullname - Database.ContentIdentifier.needsUpdateFromLog and updateFromLog These warts make this not suitable to be merged yet. This readonly code path is more expensive, since it has to query several branches. The value does get cached, but still large queries will be slower in a read-only repository when there are unmerged git-annex branches. When annex.merge-annex-branches=false, updateTo skips doing anything, and so the read-only repository code does not get triggered. So a user who is bothered by the extra work can set that. Other writes to the repository can still result in permissions errors. This includes the initial creation of the git-annex branch, and of course any writes to the git-annex branch. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 13:21:15 -04:00
Joey Hess	5a9e6b1fd4	when private journal file exists, still read from git-annex branch Fix bug that caused stale git-annex branch information to be read when annex.private or remote.name.annex-private is set. The private journal file should not prevent reading more current information from the git-annex branch, but used to. Note that overBranchFileContents has to do additional work now: when there's a private journal file, it reads from the branch redundantly and more slowly. Sponsored-by: Jack Hill on Patreon	2021-10-26 13:43:50 -04:00
Joey Hess	e853ef3095	decorate openTempFile errors with the template name This is to track down what file in .git/annex/ is being written to via a temp file when the repository is read-only. Sponsored-by: Dartmouth College's Datalad project	2021-08-30 13:05:02 -04:00
Joey Hess	8b6dad11a2	add createMessage init: When annex.commitmessage is set, use that message for the commit that creates the git-annex branch. This will be used by filter-branch too, and it seems to make sense to let annex.commitmessage affect it.	2021-05-17 13:07:47 -04:00
Joey Hess	4ff8a1ae2b	refactoring filterBranch should be reusable for copy-branch command. Changed LogVariety to differentiate between LocationLog and UrlLog; only location logs contain uuids and need to be filtered by uuid, while url logs do not. This does not change current behavior, but it will let filterBranch be reused without filtering url logs incorrectly.	2021-05-13 14:43:25 -04:00
Joey Hess	32138b8cd8	implement annex.privateremote and remote.name.private configs The slightly unusual parsing in Types.GitConfig avoids the need to look at the remote list to get configs of remotes. annexPrivateRepos combines all the configs, and will only be calculated once, so it's nice and fast. privateUUIDsKnown and regardingPrivateUUID now need to read from the annex mvar, so are not entirely free. But that overhead can be optimised away, as seen in getJournalFileStale. The other call sites didn't seem worth optimising to save a single MVar access. The feature should have imperceptible speed overhead when not being used.	2021-04-23 14:21:57 -04:00
Joey Hess	d5a05655b4	Merge branch 'master' into hiddenannex	2021-04-23 13:06:33 -04:00
Joey Hess	657d55c401	convert withKnownUrls to use overBranchFileContents This only partly fixes importfeed to see journalled files, since it separately cats metadata directly from the branch. Held off on a changelog for a bug fix until that's dealt with.	2021-04-23 11:32:25 -04:00


249 commits