git-annex

Author	SHA1	Message	Date
Joey Hess	2ffe077cc2	git-remote-annex: brought back max-git-bundles config An incremental push that gets converted to a full push due to this config results in the inManifest having just one bundle in it, and the outManifest listing every other bundle. So it actually takes up more space on the special remote. But, it speeds up clone and fetch to not have to download a long series of bundles for incremental pushes.	2024-05-28 13:28:19 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	cb59ec3efc	avoid duplicates building up in outManifest Happened exponentially since commit `1a3c60cc8e`	2024-05-24 15:10:56 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Joey Hess	3e7324bbcb	only delete bundles on pushEmpty This avoids some apparently otherwise unsolvable problems involving races that resulted in the manifest listing bundles that were deleted. Removed the annex-max-git-bundles config because it can't actually result in deleting old bundles. It would still be possible to have a config that controls how often to do a full push, which would avoid needing to download too many bundles on clone, as well as needing to checkpresent too many bundles in verifyManifest. But it would need a different name and description.	2024-05-21 11:13:27 -04:00
Joey Hess	adcebbae47	clean up git-remote-annex git-annex branch handling Implemented alternateJournal, which git-remote-annex uses to avoid any writes to the git-annex branch while setting up a special remote from an annex:: url. That prevents the remote.log from being overwritten with the special remote configuration from the url, which might not be 100% the same as the existing special remote configuration. And it prevents an overwrite deleting of other stuff that was already in the remote.log. Also, when the branch was created by git-remote-annex, only delete it at the end if nothing else has been written to it by another command. This fixes the race condition described in `797f27ab05`, where git-remote-annex set up the branch and git-annex init and other commands were run at the same time and their writes to the branch were lost.	2024-05-15 17:33:38 -04:00
Joey Hess	24af51e66d	git-annex unused --from remote skips its git-remote-annex keys This turns out to only be necessary in edge cases. Most of the time, git-annex unused --from remote doesn't see git-remote-annex keys at all, because it does not record a location log for them. On the other hand, git-annex unused does find them, since it does not rely on the location log. And that's good because they're a local cache that the user should be able to drop. If, however, the user ran git-annex unused and then git-annex move --unused --to remote, the keys would have a location log for that remote. Then git-annex unused --from remote would see them, and would consider them unused. Even when they are present on the special remote they belong to. And that risks losing data if they drop the keys from the special remote, but didn't expect it would delete git branches they had pushed to it. So, make git-annex unused --from skip git-remote-annex keys whose uuid is the same as the remote.	2024-05-14 15:17:40 -04:00
Joey Hess	0bf72ef103	max-git-bundles config for git-remote-annex	2024-05-14 14:23:40 -04:00
Joey Hess	6f1039900d	prevent using git-remote-annex with unsuitable special remote configs I hope to support importtree=yes eventually, but it does not currently work. Added remote.<name>.allow-encrypted-gitrepo that needs to be set to allow using it with encrypted git repos. Note that even encryption=pubkey uses a cipher stored in the git repo to encrypt the keys stored in the remote. While it would be possible to not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using encryption=pubkey, it doesn't currently work, and that would be a complication that I doubt is worth it.	2024-05-14 13:52:20 -04:00
Joey Hess	34eae54ff9	git-remote-annex support exporttree=yes remotes Put the annex objects in .git/annex/objects/ inside the export remote. This way, when importing from the remote, they will be filtered out. Note that, when importtree=yes, content identifiers are used, and this means that pushing to a remote updates the git-annex branch. Urk. Will need to try to prevent that later, but I already had a todo about that for other reasons. Untested! Sponsored-By: Brock Spratlen on Patreon	2024-05-13 11:48:00 -04:00
Joey Hess	424afe46d7	fix incremental push to preserve existing bundle keys in manifest Also broke Manifest out to its own type with a smart constructor. Sponsored-by: mycroft on Patreon	2024-05-13 09:47:05 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	c7731cdbd9	add Backend.GitRemoteAnnex Making GITBUNDLE be in the backend list allows those keys to be hashed to verify, both when git-remote-annex downloads them, and by other transfers and by git fsck. GITMANIFEST is not in the backend list, because those keys will never be stored in .git/annex/objects and can't be verified in any case. This does mean that git-annex version will include GITBUNDLE in the list of backends. Also documented these in backends.mdwn Sponsored-by: Kevin Mueller on Patreon	2024-05-07 13:54:08 -04:00
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Joey Hess	c410b2bb73	annex.maxextensions configuration Controls how many filename extensions to preserve. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-04-18 14:23:38 -04:00
Joey Hess	7cef5e8f35	export tree: avoid confusing output about renaming files When a file in the export is renamed, and the remote's renameExport returned Nothing, renaming to the temp file would first say it was renaming, and appear to succeed, but actually what it did was delete the file. Then renaming from the temp file would not do anything, since the temp file is not present on the remote. This appeared as if a file got renamed to a temp file and left there. Note that exporttree=yes importtree=yes remotes have their usual renameExport replaced with one that returns Nothing. (For reasons explained in Remote.Helper.ExportImport.) So this happened even with remotes that support renameExport. Fix by letting renameExport = Nothing when it's not supported at all. This avoids displaying the rename. Sponsored-by: Graham Spencer on Patreon	2024-03-09 13:50:26 -04:00
Joey Hess	e7652b0997	implement URL to VURL migration This needs the content to be present in order to hash it. But it's not possible for a module used by Backend.URL to call inAnnex because that would entail a dependency loop. So instead, rely on the fact that Command.Migrate calls inAnnex before performing a migration. But, Command.ExamineKey calls fastMigrate and the key may or may not exist, and it's not wanting to actually perform a migration in any case. To handle that, had to add an additional value to fastMigrate to indicate whether the content is inAnnex. Factored generateEquivilantKey out of Remote.Web. Note that migrateFromURLToVURL hardcodes use of the SHA256E backend. It would have been difficult not to, given all the dependency loop issues. But --backend and annex.backend are used to tell git-annex migrate to use VURL in any case, so there's no config knob that the user could expect to configure that. Sponsored-by: Brock Spratlen on Patreon	2024-03-01 16:42:02 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety so VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	e7b7ea78af	lift isCryptographicallySecure to Annex Needed for VURL backend. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:14:13 -04:00
Joey Hess	55bf01b788	add equivalent key log for VURL keys When downloading a VURL from the web, make sure that the equivalent key log is populated. Unfortunately, this does not hash the content while it's being downloaded from the web. There is not an interface in Backend currently for incremental hash generation, only for incremental verification of an existing hash. So this might add a noticeable delay, and it has to show a "(checksum...)" message. This could stand to be improved. But, that separate hashing step only has to happen on the first download of new content from the web. Once the hash is known, the VURL key can have its hash verified incrementally while downloading except when the content in the web has changed. (Doesn't happen yet because verifyKeyContentIncrementally is not implemented yet for VURL keys.) Note that the equivalent key log file is formatted as a presence log. This adds a tiny bit of overhead (eg "1 ") per line over just listing the urls. The reason I chose to use that format is it seems possible that there will need to be a way to remove an equivalent key at some point in the future. I don't know why that would be necessary, but it seemed wise to allow for the possibility. Downloads of VURL keys from other special remotes that claim urls, like bittorrent for example, do not populate the equivalent key log. So for now, no checksum verification will be done for those. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:01:49 -04:00
Joey Hess	0f7143d226	support VURL backend Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon	2024-02-29 13:48:51 -04:00
Joey Hess	68e99513f0	added annex.commitmessage-command config Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:35:22 -04:00
Joey Hess	8e9ee31621	webapp: Added --port option, and annex.port config The getSocket comment that mentioned using ":port" in the hostname seems to have been incorrect or be out of date. After all, the bug report came when the user first tried doing that, and it didn't work. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-01-25 14:08:36 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	7e69063a29	support annex.shared-sop-command for encryption=shared This works well, and it interoperates with gpg in my testing (although some SOP commands might choose to use a profile that does not, so caveat emptor). Note that for creating the Cipher, gpg --gen-random is still used. SOP does not have an equivalent, and as long as the user has gpg around, which seems likely, it doesn't matter that it uses gpg here, it's not being used for encryption. That seemed better than implementing a second way to get high quality entropy, at least for now. The need for the sop command to run in an empty directory has each call to encrypt and decrypt creating a new temporary directory. That is some unnecessary overhead, though probably swamped by the overhead of running the sop command. This could be improved in the future by passing an already empty directory to them, or a sufficiently empty directory (.git/annex/tmp would probably suffice). Sponsored-by: Brett Eisenberg on Patreon	2024-01-12 13:31:18 -04:00
Joey Hess	dd3e779020	more groundwork for StatelessOpenPGP no behavior changes	2024-01-12 13:11:36 -04:00
Joey Hess	d98f02a5b0	test annex.shared-sop-command Test a specified Stateless OpenPGP command with eg: git-annex test --test-git-config annex.shared-sop-command=sqop Also documented that config and another one, but so far only the test suite uses the configs, have not yet implemented using it for actual symmetric encryption. Sponsored-by: Joshua Antonishen on Patreon	2024-01-10 16:30:38 -04:00
Joey Hess	257f01729c	distributed migration for pull and sync --content pull, sync: When operating on content, automatically hard link objects that have been migrated. Added annex.syncmigrations config that can be set to false to prevent pull and sync from migrating object content. I think that true is a good default for this config, because it avoids users having to re-download migrated content or learning about migration. But, some users will surely not like it, whether because it does take some time (especially for the first git-annex branch scan when there is a long history), or because they want to deal with it manually, or because their filesystem doesn't support hard links and they don't want it to copy objects. Sponsored-by: k0ld on Patreon	2023-12-08 14:18:18 -04:00
Joey Hess	c41ca6c832	convert StorableCipher to ByteString This allows getting rid of the ugly and error prone handling of "bag of bytes" String in Remote.Helper.Encryptable. Avoiding breakage like that dealt with by commit `9862d64bf9` And allows converting Utility.Gpg to use ByteString for IO, which is a welcome change. Tested the new git-annex interoperability with old, using all 3 encryption= types. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-11-01 14:39:49 -04:00
Joey Hess	3742263c99	simplify base64 to only use ByteString Note the use of fromString and toString from Data.ByteString.UTF8 dated back to commit `9b93278e8a`. Back then it was using the dataenc package for base64, which operated on Word8 and String. But with the switch to sandi, it uses ByteString, and indeed fromB64' and toB64' were already using ByteString without that complication. So I think there is no risk of such an encoding related breakage. I also tested the case that `9b93278e8a` fixed: git-annex metadata -s foo='a …' x git-annex metadata x metadata x foo=a … In Remote.Helper.Encryptable, it was avoiding using Utility.Base64 because of that UTF8 conversion. Since that's no longer done, it can just use it now.	2023-10-26 13:10:05 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavailable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	c6acf574c7	implement importChanges optimisation (not used yet) For simplicity, I've not tried to make it handle History yet, so when there is a history, a full import will still be done. Probably the right way to handle history is to first diff from the current tree to the last imported tree. Then, diff from the current tree to each of the historical trees, and recurse through the history diffing from child tree to parent tree. I don't think that will need a record of the previously imported historical trees, and so Logs.Import doesn't store them. Although I did leave room for future expansion in that log just in case. Next step will be to change importTree to importChanges and modify recordImportTree et al to handle it, by using adjustTree. Sponsored-by: Brett Eisenberg on Patreon	2023-05-31 16:01:34 -04:00
Joey Hess	5df89d58c7	git-annex pull and push Split out two new commands, git-annex pull and git-annex push. Those plus a git commit are equivalent to git-annex sync. In a sense, git-annex sync conflates 3 things, and it would have been better to have push and pull from the beginning and not sync. Although note that git-annex sync --content is faster than a pull followed by a push, because it only has to walk the tree once, look at preferred content once, etc. So there is some value in git-annex sync in speed, as well as user convenience. And it would be hard to split out pull and push from sync, as far as the implementation goes. The implementation inside sync was easy, just adjust SyncOptions so it does the right thing. Note that the new commands default to syncing content, unless annex.synccontent is explicitly set to false. I'd like sync to also do that, but that's a hard transition to make. As a start to that transition, I added a note to git-annex-sync.mdwn that it may start to do so in a future version of git-annex. But a real transition would necessarily involve displaying warnings when sync is used without --content, and time. Sponsored-by: Kevin Mueller on Patreon	2023-05-16 16:51:07 -04:00
Joey Hess	271f3b1ab4	uninit: Support --json and --json-error-messages Had to convert uninit to do everything that can error out inside a CommandStart. This was harder than feels nice. (Also, in passing, converted CommandCheck to use a data type, not a weird number that it was not clear how it managed to be unique.) Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-11 13:43:02 -04:00
Joey Hess	365dbc89dc	expire, trust et al, dead, describe: Support --json and --json-error-messages For expire, the normal output is unchanged, but the --json output includes the uuid in machine parseable form. Which could be very useful for this somewhat obscure command. That needed ActionItemUUID to be implemented, which seemed like a lot of work, but then --- I had been going to skip implementing them for trust, untrust, dead, semitrust, and describe, but putting the uuid in the json is useful information, it tells what uuid git-annex picked given the input. It was not hard to support these once ActionItemUUID was implemented. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-05 15:33:30 -04:00
Joey Hess	4881bc5a53	rename errorid to message-id	2023-04-26 12:53:30 -04:00
Joey Hess	be36e208c2	json object for FileNotFound When a nonexistent file is passed to a command and --json-error-messages is enabled, output a JSON object indicating the problem. (But git ls-files --error-unmatch still displays errors about such files in some situations.) I don't like the duplication of the name of the command introduced by this, but I can't see a great way around it. One way would be to pass the Command instead. When json is not enabled, the stderr is unchanged. This is necessary because some commands like find have custom output. So displaying "find foo not found" would be wrong. So had to complicate things with toplevelFileProblem having different output with and without json. When not using --json-error-messages but still using --json, it displays the error to stderr, but does display a json object without the error. It does have an errorid though. Unsure how useful that behavior is. Sponsored-by: Dartmouth College's Datalad project	2023-04-25 19:26:20 -04:00
Joey Hess	91ba0cc7fd	Revert "--json-exceptions" This reverts commit `a325524454`. Turns out this was predicated on an incorrect belief that json output didn't already sometimes lack the "key" field. Since json output already can when `giveup` was used, it seems unnecessary to add a whole new option for this.	2023-04-25 17:37:34 -04:00
Joey Hess	a325524454	--json-exceptions Added a --json-exceptions option, which makes some exceptions be output in json. The distinction is that --json-error-messages is for messages relating to a particular ActionItem, while --json-exceptions is for messages that are not, eg ones for a file that does not exist. It's unfortunate that we need two switches with such a fine distinction between them, but I'm worried about maintaining backwards compatibility in the json output, to avoid breaking anything that parses it, and this was the way to make sure I didn't. toplevelWarning is generally used for the latter kind of message. And the other calls to toplevelWarning could be converted to showException. The only possible gotcha is that if toplevelWarning is ever called after starting acting on a file, it will add to the --json-error-messages of the json displayed for that file and converting to showException would be a behavior change. That seems unlikely, but I didn't convert everything to avoid needing to satisfy myself it was not a concern. Sponsored-by: Dartmouth College's Datalad project	2023-04-25 17:05:33 -04:00
Joey Hess	fe5e586b72	rename Git.Filename to Git.Quote	2023-04-12 17:22:03 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	2ba1559a8e	git style quoting for ActionItemOther Added StringContainingQuotedPath, which is used for ActionItemOther. In the process, checked every ActionItemOther for those containing filenames, and made them use quoting. Sponsored-by: Graham Spencer on Patreon	2023-04-08 16:30:01 -04:00
Joey Hess	d689a5b338	git style filename quoting controlled by core.quotePath This is by no means complete, but escaping filenames in actionItemDesc does cover most commands. Note that for ActionItemBranchFilePath, the value is branch:file, and I choose to only quote the file part (if necessary). I considered quoting the whole thing. But, branch names cannot contain control characters, and while they can contain unicode, git does not quote unicode when displaying branch names. So, it would be surprising for git-annex to quote unicode in a branch name. The find command is the most obvious command that still needs to be dealt with. There are probably other places that filenames also get displayed, eg embedded in error messages. Some other commands use ActionItemOther with a filename, I think that ActionItemOther should either be pre-sanitized, or should explicitly not be used for filenames, so that needs more work. When --json is used, unicode does not get escaped, but control characters were already escaped in json. (Key escaping may turn out to be needed, but I'm ignoring that for now.) Sponsored-by: unqueued on Patreon	2023-04-08 14:52:26 -04:00
Joey Hess	d9b6be7782	convert encode_c to ByteString This turns out to be possible after all, because the old one decomposed a unicode Char to multiple Word8s and encoded those. It should be faster in some places, particularly in Git.Filename.encodeAlways. The old version encoded all unicode by default as well as ascii control characters and also '"'. The new one only encodes ascii control characters by default. That old behavior was visible in Utility.Format.format, which did escape '"' when used in eg git-annex find --format='${escaped_file}\n' So made sure to keep that working the same. Although the man page only says it will escape "unusual" characters, so it might be able to be changed. Git.Filename.encodeAlways also needs to escape '"' ; that was the original reason that was escaped. Types.Transferrer I judge is ok to not escape '"', because the escaped value is sent in a line-based protocol, which is decoded at the other end by decode_c. So old git-annex and new will be fine whether that is escaped or not, the result will be the same. Note that when asked to escape a double quote, it is escaped to \" rather than to \042. That's the same behavior as git has. It's perhaps somehow more of a special case than it needs to be. Sponsored-by: k0ld on Patreon	2023-04-07 17:10:49 -04:00
Joey Hess	371d4f8183	decode_c converted to ByteString This speeds up a few things, notably CmdLine.Seek using Git.Filename which uses decode_c and this avoids a conversion to String and back, and probably the ByteString implementation of decode_c is also faster for simple cases at least than the string version. encode_c cannot be converted to ByteString (or if it did, it would have to convert right back to String in order to handle unicode). Sponsored-by: Brock Spratlen on Patreon	2023-04-07 14:44:19 -04:00
Joey Hess	d4cb7afeed	remove unused Key parameter from isCryptographicallySecure This will allow using isCryptographicallySecure on a Backend, before a Key has been generated. Sponsored-by: Lawrence Brogan on Patreon	2023-03-27 14:34:00 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Yaroslav Halchenko	0ae5ff797f	Typo: sansative -> sensitive	2023-03-17 15:14:50 -04:00

752 commits