git-annex

Author	SHA1	Message	Date
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that is Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads the the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	a4c9d4424c	remove Logs.Presence imports When imported along with Logs.Location, it can be an unused import and it won't warn, due to reexports. The point if this is really to show that Logs.Presence is not widely used, outside Logs/	2024-06-14 17:27:34 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Yaroslav Halchenko	6674c3b055	A few more of typo fixes/skip as detected with bleeding edge codespell	2024-05-01 20:06:08 -04:00
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Joey Hess	f04d9574d6	fix transfer lock file for Download to not include uuid While redundant concurrent transfers were already prevented in most cases, it failed to prevent the case where two different repositories were sending the same content to the same repository. By removing the uuid from the transfer lock file for Download transfers, one repository sending content will block the other one from also sending the same content. In order to interoperate with old git-annex, the old lock file is still locked, as well as locking the new one. That added a lot of extra code and work, and the plan is to eventually stop locking the old lock file, at some point in time when an old git-annex process is unlikely to be running at the same time. Note that in the case of 2 repositories both doing eg `git-annex copy foo --to origin` the output is not that great: copy b (to origin...) transfer already in progress, or unable to take transfer lock git-annex: transfer already in progress, or unable to take transfer lock 97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe)) Transfer failed Perhaps that output could be cleaned up? Anyway, it's a lot better than letting the redundant transfer happen and then failing with an obscure error about a temp file, which is what it did before. And it seems users don't often try to do this, since nobody ever reported this bug to me before. (The "97%" there is actually how far along the other transfer is.) Sponsored-by: Joshua Antonishen on Patreon	2024-03-25 14:47:46 -04:00
Joey Hess	3475b09c3e	pre-commit: Avoid committing the git-annex branch Except when a commit is made in a view, which changes metadata. Make the assistant commit the git-annex branch after git commit of working tree changes. This allows using the annex.commitmessage-command in the assistant to generate a commit message for the git-annex branch that relies on state gathered during the commit of the working tree. Eg, it might reuse the commit message. Note that, when not using the assistant, a git-annex add still commits the git-annex branch, so such a annex.commitmessage-command set up would not work then. But if someone is using the assistant and wants programmatic control over commit messages, this is useful. Someone not using the assistant can get the same result by using annex.alwayscommit=false during the git-annex add, and git-annex merge after they git commit. pre-commit was never really intended to commit the git-annex branch (except after recording changed metadata), but the assistant did sort of rely on it. It does later commit the git-annex branch before pushing to remotes, but I didn't want to risk building up lots of uncommitted changes to it if that didn't happen frequently. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:42:11 -04:00
Joey Hess	fa9197560d	move commitStaged out of Command.Sync which no longer uses it It's trivial enough that it it's not worth factoring it out to somewhere in common with Command.Undo and the assistant. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-07 16:19:28 -04:00
Joey Hess	8e9ee31621	webapp: Added --port option, and annex.port config The getSocket comment that mentioned using ":port" in the hostname seems to have been incorrect or be out of date. After all, the bug report came when the user first tried doing that, and it didn't work. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-01-25 14:08:36 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	de6a297d36	assistant: When generating a gpg secret key, avoid hardcoding the key algorithm and size This aims to future-proof gpg key generation. OpenPGP is in flux with a conflict over standards ongoing. It seems not unlikely that different systems will have different gpg commands that support different algorithms. This also simplifies the code by using the --quick-gen-key interface rather than the experimental batch interface. It seems less likely that --quick-gen-key will break than an experimental interface (whose documentation I can no longer find). --quick-gen-key is supported since gpg 2.1.0 (2014). Sponsored-by: Graham Spencer on Patreon	2024-01-09 15:31:53 -04:00
Joey Hess	1654572bc1	fix --from overriding annex-ignore Make git-annex get/copy/move --from foo override configuration of remote.foo.annex-ignore, as documented. This already worked for remotes supporting hasKeyCheap. For others though, git-annex copy --from foo would silently not do anything, while git-annex copy --to foo would use the annex-ignored remote. Also improved the annex-ignore docs, to reflect that `git-annex get` without --from will skip using annex-ignored remotes, for example. Sponsored-by: Dartmouth College's DANDI project	2023-11-30 15:12:07 -04:00
Joey Hess	e4ef688b98	RawFilePath conversion May slightly speed up startup.	2023-10-26 13:58:49 -04:00
Joey Hess	d9fd205cbb	push RawFilePath down into Annex.ReplaceFile Minor optimisation, but a win in every case, except for a couple where it's a wash. Note that replaceFile still takes a FilePath, because it needs to operate on Chars to truncate unicode filenames properly.	2023-10-26 13:36:49 -04:00
Joey Hess	c873586e14	eliminate s2w8 and w82s Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode characters. Luckily, genUUIDInNameSpace is only ever used on ASCII strings as far as I can determine. In particular, git-remote-gcrypt's gcrypt-id is an ASCII string.	2023-10-26 13:12:57 -04:00
Joey Hess	83056e7b53	convert checkAvailable to use availability rather than localpath Every remote that sets localpath also implements an availability that reutrns Unavailable when a local directory is not available. This makes external remotes, and others that get support for availability Unavailable to be used by checkAvailable. (Which is only used by the assistant.) Had to keep localpath though, since other parts of the assistant use it to eg, sync with a remote when a removable drive is plugged in. Sponsored-by: Jack Hill on Patreon	2023-08-16 15:57:30 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavilable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE reponse and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used any more, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	10b5f79e2d	fix empty tree import when directory does not exist Fix behavior when importing a tree from a directory remote when the directory does not exist. An empty tree was imported, rather than the import failing. Merging that tree would delete every file in the branch, if those files had been exported to the directory before. The problem was that dirContentsRecursive returned [] when the directory did not exist. Better for it to throw an exception. But in commit `74f0d67aa3` back in 2012, I made it never theow exceptions, because exceptions throw inside unsafeInterleaveIO become untrappable when the list is being traversed. So, changed it to list the contents of the directory before entering unsafeInterleaveIO. So exceptions are thrown for the directory. But still not if it's unable to list the contents of a subdirectory. That's less of a problem, because the subdirectory does exist (or if not, it got removed after being listed, and it's ok to not include it in the list). A subdirectory that has permissions that don't allow listing it will have its contents omitted from the list still. (Might be better to have it return a type that includes indications of errors listing contents of subdirectories?) The rest of the changes are making callers of dirContentsRecursive use emptyWhenDoesNotExist when they relied on the behavior of it not throwing an exception when the directory does not exist. Note that it's possible some callers of dirContentsRecursive that used to ignore permissions problems listing a directory will now start throwing exceptions on them. The fix to the directory special remote consisted of not making its call in listImportableContentsM use emptyWhenDoesNotExist. So it will throw an exception as desired. Sponsored-by: Joshua Antonishen on Patreon	2023-08-15 12:57:41 -04:00
Joey Hess	be028f10e5	split out Utility.Url.Parse This is mostly for git-repair which can't include all of Utility.Url without adding many dependencies that are not really necessary.	2023-08-14 12:28:10 -04:00
Joey Hess	68c9b08faf	fix build with unix-2.8.0 Changed the parameters to openFd. So needed to add a small wrapper library to keep supporting older versions as well.	2023-08-01 18:41:27 -04:00
Joey Hess	4ef16f53ed	enable TypeOperators	2023-08-01 18:39:01 -04:00
Joey Hess	240bae38f6	sync: When in an adjusted branch, merge changes from the original branch This causes changes to the original branch to get merged with a single sync. Before, it took 2 syncs; the first happened to update the synced/ branch, and the second merged changes from the synced/ branch into the ajusted branch. Using mergeToAdjustedBranch when tomerge == origbranch is probably overkill, but it does work fine. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-07-06 12:42:24 -04:00
Joey Hess	217a6abb19	assistant: Fix a crash when a small file is deleted immediately after being created git add will fail if the file got deleted in the meantime. And since it was queued, there was a window until the queue flushed where a deletion of the file would cause a crash. Instead, reuse Command.Add.addFile, which sha1 hashes the file itself immediately, and then queues the index update. Ignore exceptions that will happen if the file got deleted already. Sponsored-by: k0ld on Patreon	2023-06-19 12:44:56 -04:00
Joey Hess	38153ad340	assistant: Add dotfiles to git by default, unless annex.dotfiles is configured Tthe same as git-annex add does. Sponsored-by: Luke Shumaker on Patreon	2023-06-12 13:25:04 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	a0e6fa18eb	eliminate showStart showStartOther These were not handling control characters and are redundant. Sponsored-by: Jack Hill on Patreon	2023-04-10 16:28:58 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	d689a5b338	git style filename quoting controlled by core.quotePath This is by no means complete, but escaping filenames in actionItemDesc does cover most commands. Note that for ActionItemBranchFilePath, the value is branch:file, and I choose to only quote the file part (if necessary). I considered quoting the whole thing. But, branch names cannot contain control characters, and while they can contain unicode, git coes not quote unicode when displaying branch names. So, it would be surprising for git-annex to quote unicode in a branch name. The find command is the most obvious command that still needs to be dealt with. There are probably other places that filenames also get displayed, eg embedded in error messages. Some other commands use ActionItemOther with a filename, I think that ActionItemOther should either be pre-sanitized, or should explicitly not be used for filenames, so that needs more work. When --json is used, unicode does not get escaped, but control characters were already escaped in json. (Key escaping may turn out to be needed, but I'm ignoring that for now.) Sponsored-by: unqueued on Patreon	2023-04-08 14:52:26 -04:00
Joey Hess	cd076cd085	Windows: Support urls like "file:///c:/path" That is a legal url, but parseUrl parses it to "/c:/path" which is not a valid path on Windows. So as a workaround, use parseURIPortable everywhere, which removes the leading slash when run on windows. Note that if an url is parsed like this and then serialized back to a string, it will be different from the input. Which could potentially be a problem, but is probably not in practice. An alternative way to do it would be to have an uriPathPortable that fixes up the path after parsing. But it would be harder to make sure that is used everywhere, since uriPath is also used when constructing an URI. It's also worth noting that System.FilePath.normalize "/c:/path" yields "c:/path". The reason I didn't use it is that it also may change "/" to "\" in the path and I wanted to keep the url changes minimal. Also noticed that convertToWindowsNativeNamespace handles "/c:/path" the same as "c:/path". Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-03-27 13:38:02 -04:00
Joey Hess	a0badc5069	sync: Fix parsing of gcrypt::rsync:// urls that use a relative path Such an url is not valid; parseURI will fail on it. But git-annex doesn't actually need to parse the url, because all it needs to do to support syncing with it is know that it's not a local path, and use git pull and push. (Note that there is no good reason for the user to use such an url. An absolute url is valid and I patched git-remote-gcrypt to support them years ago. Still, users gonna do anything that tools allow, and git-remote-gcrypt still supports them.) Sponsored-by: Jack Hill on Patreon	2023-03-23 15:20:00 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Yaroslav Halchenko	100f5aabb6	Typo: recurrance -> recurrence	2023-03-17 15:14:54 -04:00
Joey Hess	f3cee94148	fix build on OSX This is built when building Setuo, and after `54ad1b4cfb`, such modules cannot import Utility.Directory, because it depends on unix, which is not in setup-depends. This module only needs System.Directory, so import Utility.SystemDirectory instead. Sponsored-by: Dartmouth College's Datalad project	2023-03-03 13:20:49 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equvilants from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	f09e299156	rawfilepath conversion	2023-02-27 15:06:32 -04:00
Joey Hess	672258c8f4	Revert "revert recent bug fix temporarily for release" This reverts commit `16f1e24665`.	2023-02-14 14:11:23 -04:00
Joey Hess	16f1e24665	revert recent bug fix temporarily for release Decided this bug is not severe enough to delay the release until tomorrow, so this will be re-applied after the release.	2023-02-14 14:06:29 -04:00
Joey Hess	c1ef4a7481	Avoid Git.Config.updateLocation adding "/.git" to the end of the repo path to a bare repo when git config is not allowed to list the configs due to the CVE-2022-24765 fix. That resulted in a confusing error message, and prevented the nice message that explains how to mark the repo as safe to use. Made isBare a tristate so that the case where core.bare is not returned can be handled. The handling in updateLocation is to check if the directory contains config and objects and if so assume it's bare. Note that if that heuristic is somehow wrong, it would construct a repo that thinks it's bare but is not. That could cause follow-on problems, but since git-annex then checks checkRepoConfigInaccessible, and skips using the repo anyway, a wrong guess should not be a problem. Sponsored-by: Luke Shumaker on Patreon	2023-02-14 14:00:36 -04:00
Joey Hess	2b014f1a8b	don't frontload reconcileStaged in git-annex init init: Avoid scanning for annexed files, which can be lengthy in a large repository. Instead that scan is done on demand. This lets git-annex init be run and some query commands be used in a repository without waiting. Note that autoinit already behaved this way, so while this will mean some commands like git-annex get/unlock/add will do the scan the first time run, that is not really a significant behavior change. And, it's really better to have a consistent behavior. The reason for the inconsistency was a strange bug discussed in `b3c4579c79`. Avoiding reconcileStaged in init will keep avoiding whatever that was. Sponsored-by: Dartmouth College's DANDI project	2022-11-18 13:58:47 -04:00
Joey Hess	14f7a386f0	Make git-annex enable-tor work when using the linux standalone build Clean the standalone environment before running the su command to run "sh". Otherwise, PATH leaked through, causing it to run git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set, which caused that script to not work: [2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"] /home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found Changed programPath to not use GIT_ANNEX_PROGRAMPATH, but instead run the scripts at the top of GIT_ANNEX_DIR. That works both when the standalone environment is set up, and when it's not. Sponsored-by: Kevin Mueller on Patreon	2022-10-26 15:45:08 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	2d65c4ff1d	avoid unix-compat's rename On Windows, that does not support long paths https://github.com/jacobstanley/unix-compat/issues/56 Instead, use System.Directory.renamePath, which does support long paths. Sponsored-by: Dartmouth College's Datalad project	2022-07-12 14:55:02 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	4174ee33a4	avoid needing StarIsType extension Got a deprecation warning suggesting this change. Data.Kind is available since base-4.9.0.0 and git-annex depends on a newer base.	2022-06-28 15:17:41 -04:00
Joey Hess	f7c7a5ca7e	remove redundant patern match	2022-06-28 15:12:32 -04:00
Joey Hess	ff6b36c706	assistant prompt pushing of manual commits to remotes assistant: When annex.autocommit is set, notice commits that the user makes manually, and push them out to remotes promptly. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-03-31 13:02:16 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00

1 2 3 4 5 ...

1,721 commits