git-annex

Author	SHA1	Message	Date
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	58301e40d2	sync with special remotes with an annex:: url Check explicitly for an annex:: url, not just any url. While no built-in special remotes set an url, except ones that can be synced with, it seems possible that some external special remote sets an url for its own use, but did not expect it to be used by git-annex sync et al. The assistant also syncs with them.	2024-05-24 14:57:29 -04:00
Joey Hess	22bf23782f	initremote, enableremote: Added --with-url to enable using git-remote-annex Also sets remote.name.fetch to a typical value, same as git remote add does.	2024-05-24 14:29:36 -04:00
Joey Hess	434a88c368	Merge branch 'git-remote-annex'	2024-05-15 17:57:50 -04:00
Joey Hess	768cdee461	testremote: Really fsck downloaded objects `8844372c23` exposed a bug in testremote, it was passing the serialized key, not the object file, to be checksummed.	2024-05-15 17:57:27 -04:00
Joey Hess	468de43d66	Merge branch 'master' into git-remote-annex	2024-05-15 17:49:12 -04:00
Joey Hess	24af51e66d	git-annex unused --from remote skips its git-remote-annex keys This turns out to only be necessary in edge cases. Most of the time, git-annex unused --from remote doesn't see git-remote-annex keys at all, because it does not record a location log for them. On the other hand, git-annex unused does find them, since it does not rely on the location log. And that's good because they're a local cache that the user should be able to drop. If, however, the user ran git-annex unused and then git-annex move --unused --to remote, the keys would have a location log for that remote. Then git-annex unused --from remote would see them, and would consider them unused. Even when they are present on the special remote they belong to. And that risks losing data if they drop the keys from the special remote, but didn't expect it would delete git branches they had pushed to it. So, make git-annex unused --from skip git-remote-annex keys whose uuid is the same as the remote.	2024-05-14 15:17:40 -04:00
Joey Hess	0281f7f23e	Avoid the --fast option preventing checksumming in some cases it was not supposed to fsck --fast was intended to disable checksumming, but checksumming is done after transfers too. Due to the check being in the non-incremental path, it would only affect non-incremental checksumming during a transfer, and I'm not 100% sure that it was a problem. Also, when using an external backend that does checksumming, fsck --fast didn't disable it and now does.	2024-05-12 21:36:48 -04:00
Joey Hess	05684bdd6c	fsck: Fix recent reversion that made it say it was checksumming files whose content is not present Did not track down the commit that caused the problem, but git-annex version 10.20240431 didn't behave that way.	2024-05-12 21:23:27 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	483887591d	working toward git-remote-annex using a special remote Not quite there yet. Also, changed the format of GITBUNDLE keys to use only one '-' after the UUID. A sha256 does not contain that character, so can just split at the last one. Amusingly, the sha256 will probably not actually be verified. A git bundle contains its own checksums that git uses to verify it. And if someone wanted to replace the content of a GITBUNDLE object, they could just edit the manifest to use a new one whose sha256 does verify. Sponsored-by: Nicholas Golder-Manning	2024-05-06 16:28:04 -04:00
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Yaroslav Halchenko	d20ecff73c	Fix one ambiguous typo	2024-05-01 15:46:15 -04:00
Joey Hess	2c73845d90	multiple -m second try Test suite passes this time. When committing the adjusted branch, use the old method to make a message that old git-annex can consume. Also made the code accept the new message, so that eventually commitTreeExactMessage can be removed. Sponsored-by: Kevin Mueller on Patreon	2024-04-09 12:56:47 -04:00
Joey Hess	a8dd85ea5a	Revert "multiple -m" This reverts commit `cee12f6a2f`. This commit broke git-annex init run in a repo that was cloned from a repo with an adjusted branch checked out. The problem is that findAdjustingCommit was not able to identify the commit that created the adjusted branch. It seems that there is an extra "\n" at the end of the commit message that it does not expect. Since backwards compatibility needs to be maintained, cannot just make findAdjustingCommit accept it with the "\n". Will have to instead have one commitTree variant that uses the old method, and use it for adjusted branch committing.	2024-04-02 17:29:07 -04:00
Joey Hess	cee12f6a2f	multiple -m sync, assist, import: Allow -m option to be specified multiple times, to provide additional paragraphs for the commit message. The option parser didn't allow multiple -m before, so there is no risk of behavior change breaking something that was for some reason using multiple -m already. Pass through to git commands, so that the method used to assemble the paragraphs is whatever git does. Which might conceivably change in the future. Note that git commit-tree has supported -m since git 1.7.7. commitTree was probably not using it since it predates that version. Since the configure script prevents building git-annex with git older than 2.1, there is no risk that it's not supported now. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-03-27 15:58:27 -04:00
Joey Hess	e23721f579	fix build warning A recent change made plumbing the backend through fsck unnecessary. Left fsck checking backend and skipping operating on key when it could not find one. Not checking the backend would be a behavior change. For example the command git-annex fsck --key FOO--bar does nothing since FOO is not a known backend. If this were removed it would instead go on and fsck it and warn that no copies exist of the key. That behavior change seems like it would be fine, but I also have no reason to make it.	2024-03-26 14:13:59 -04:00
Joey Hess	9bf4f2eb16	fix reversion in unexport when unable to rename Bug introduced in commit `7cef5e8f35`	2024-03-15 16:14:44 -04:00
Joey Hess	dd4c4bcd7a	fix build warning A recent change made plumbing the backend through fsck unnecessary. Left fsck checking backend and skipping operating on key when it could not find one, although I'm not sure if that's necessary to support eg, keys with unknown backend.	2024-03-09 13:50:30 -04:00
Joey Hess	7cef5e8f35	export tree: avoid confusing output about renaming files When a file in the export is renamed, and the remote's renameExport returned Nothing, renaming to the temp file would first say it was renaming, and appear to succeed, but actually what it did was delete the file. Then renaming from the temp file would not do anything, since the temp file is not present on the remote. This appeared as if a file got renamed to a temp file and left there. Note that exporttree=yes importtree=yes remotes have their usual renameExport replaced with one that returns Nothing. (For reasons explained in Remote.Helper.ExportImport.) So this happened even with remotes that support renameExport. Fix by letting renameExport = Nothing when it's not supported at all. This avoids displaying the rename. Sponsored-by: Graham Spencer on Patreon	2024-03-09 13:50:26 -04:00
Joey Hess	016d1bee88	add reregisterurl command What this can currently be used for is only to change an url from being used by a special remote to being used by the web remote. This could have been a --move-from option to registerurl. But, that would have complicated its option and --batch processing, and also would have complicated unregisterurl, which is implemented on top of Command.Registerurl. So, a separate command was actually less complicated to implement. The generic description of the command is because I want to make this command a catch-all for other url updating kind of things, if there are ever any more. Also because it was hard to come up with a good name for the specific action. I considered `git-annex moveurl`, but that seems to indicate data is perhaps actually being moved, and seems to sit at the same level as addurl and rmurl, and this command is at the plumbing level of registerurl and unregisterurl. Sponsored-by: Dartmouth College's DANDI project	2024-03-05 15:06:14 -04:00
Joey Hess	e7652b0997	implement URL to VURL migration This needs the content to be present in order to hash it. But it's not possible for a module used by Backend.URL to call inAnnex because that would entail a dependency loop. So instead, rely on the fact that Command.Migrate calls inAnnex before performing a migration. But, Command.ExamineKey calls fastMigrate and the key may or may not exist, and it's not wanting to actually perform a migration in any case. To handle that, had to add an additional value to fastMigrate to indicate whether the content is inAnnex. Factored generateEquivilantKey out of Remote.Web. Note that migrateFromURLToVURL hardcodes use of the SHA256E backend. It would have been difficult not to, given all the dependency loop issues. But --backend and annex.backend are used to tell git-annex migrate to use VURL in any case, so there's no config knob that the user could expect to configure that. Sponsored-by: Brock Spratlen on Patreon	2024-03-01 16:42:02 -04:00
Joey Hess	9c988ee607	handle multiple VURL checksums in one pass git-annex fsck and some other commands that verify the content of a key were using the non-incremental verification interface. But for VURL urls, that interface is inefficient because when there are multiple equivalent keys, it has to separately read and checksum for each key in turn until one matches. It's more efficient for those to use the incremental interface, since the file can be read a single time. There's no real downside to using the incremental interface when available. Note that more speedup could be had for VURL, if it was able to calculate the checksum a single time and then compare with the equivalent keys' checksums, when the equivalent keys use the same type of checksum. Sponsored-by: k0ld on Patreon	2024-03-01 14:41:10 -04:00
Joey Hess	0ac8962b1b	fix comment typo	2024-03-01 14:12:21 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety so VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	0f7143d226	support VURL backend Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon	2024-02-29 13:48:51 -04:00
Joey Hess	3475b09c3e	pre-commit: Avoid committing the git-annex branch Except when a commit is made in a view, which changes metadata. Make the assistant commit the git-annex branch after git commit of working tree changes. This allows using the annex.commitmessage-command in the assistant to generate a commit message for the git-annex branch that relies on state gathered during the commit of the working tree. Eg, it might reuse the commit message. Note that, when not using the assistant, a git-annex add still commits the git-annex branch, so such an annex.commitmessage-command set up would not work then. But if someone is using the assistant and wants programmatic control over commit messages, this is useful. Someone not using the assistant can get the same result by using annex.alwayscommit=false during the git-annex add, and git-annex merge after they git commit. pre-commit was never really intended to commit the git-annex branch (except after recording changed metadata), but the assistant did sort of rely on it. It does later commit the git-annex branch before pushing to remotes, but I didn't want to risk building up lots of uncommitted changes to it if that didn't happen frequently. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:42:11 -04:00
Joey Hess	fa9197560d	move commitStaged out of Command.Sync which no longer uses it It's trivial enough that it's not worth factoring it out to somewhere in common with Command.Undo and the assistant. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-07 16:19:28 -04:00
Joey Hess	21123ba368	assistant, undo: When committing, let the usual git commit hooks run Was doing a Git.Branch.commit for historical reasons to do with direct mode, which no longer apply. Note that the preCommitAnnexHook is no longer called in commitStaged because git-annex installs a pre-commit hook that runs the pre-commit-annex hook. And git commit will run the pre-commit hook. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-07 16:15:35 -04:00
Joey Hess	6b38d0c427	addurl, importfeed: Added --raw-except option --raw-except=web allows using yt-dlp but not any other special remotes. Currently this option can only be used once, trying to use it repeatedly will make option parsing fail. Perhaps it ought to support being used more than once, but it seemed like an unlikely use case to need that. Note that getParsed is called repeatedly when the option is used with several urls. While implementing DeferredParseClass would avoid that inefficiency, it didn't seem worth the added boilerplate since getParsed only calls byNameWithUUID which does minimal work. Sponsored-by: Dartmouth College's DANDI project	2024-02-05 15:16:25 -04:00
Joey Hess	2f3fe4d904	fix importfeed --force skip behavior reversion importfeed --force: Don't treat it as a failure when an already downloaded file exists. (Fixes a behavior change introduced in 10.20230626.) `04ee6c4c6b` caused the reversion. Inside a CommandPerform, stop causes it to fail. Before that commit, it was inside a CommandStart, where stop causes it to skip.	2024-02-02 15:57:07 -04:00
Joey Hess	0c64cd30c2	compare urls irrespective of downloader importfeed --force: Avoid creating duplicates of existing already downloaded files when yt-dlp or a special remote was used.	2024-02-02 15:50:56 -04:00
Joey Hess	90db97d9a2	importfeed: Added --scrape option Which uses yt-dlp to screen scrape the equivalent of an RSS feed. Note that youtubedlscraped is a speed optimisation. Since yt-dlp found the urls, we know it can download them. That avoids calling youtubeDlSupported on each url, which makes --fast a lot faster. Almost all the same metadata fields and file formatting fields are populated, when yt-dlp is able to get the data. Note that yt-dlp has some additional useful metadata that could be exposed. But, much of it is specific to particular websites, and it would be hard to document on the git-annex importfeed man page. Sponsored-by: unqueued on Patreon	2024-01-30 15:37:29 -04:00
Joey Hess	d7949f8202	move Feed and Item out of ToDownload This is groundwork for producing ToDownload in other ways, that may not be entirely isomorphic with feeds. Eg by using yt-dlp.	2024-01-30 14:11:26 -04:00
Joey Hess	8e9ee31621	webapp: Added --port option, and annex.port config The getSocket comment that mentioned using ":port" in the hostname seems to have been incorrect or be out of date. After all, the bug report came when the user first tried doing that, and it didn't work. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-01-25 14:08:36 -04:00
Joey Hess	e765d3e24c	import: --message/-m option	2024-01-18 12:41:44 -04:00
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes, doesn't do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	1e775ab83b	remove slightly incorrect comments	2023-12-29 13:23:27 -04:00
Joey Hess	a4a5ec6366	info: Added "annex sizes of repositories" table to the overall display Thanks to previous work in `11cc9f1933`, this is almost entirely free, it only needs to do some additional map lookups and math. The strictness annotations keep the memory use from blowing up. Sponsored-by: unqueued on Patreon	2023-12-29 12:09:30 -04:00
Joey Hess	64db927d73	optimisation	2023-12-29 10:51:05 -04:00
Joey Hess	6d789c9c81	sync, push: Avoid trying to send individual files to special remotes configured with importtree=yes exporttree=no That will always fail. It already skipped doing this when exporttree=yes.	2023-12-26 15:56:58 -04:00
Joey Hess	86dbe9a825	migrate: support adding size back to URL keys migrate: Support adding size to URL keys that were added with --relaxed, by running eg: git-annex migrate --backend=URL foo Since url keys cannot be generated, that used to fail. Make it notice that the backend is not changed, and just get the size of the content. Sponsored-by: Brock Spratlen on Patreon	2023-12-08 16:22:14 -04:00
Joey Hess	257f01729c	distributed migration for pull and sync --content pull, sync: When operating on content, automatically hard link objects that have been migrated. Added annex.syncmigrations config that can be set to false to prevent pull and sync from migrating object content. I think that true is a good default for this config, because it avoids users having to re-download migrated content or learning about migration. But, some users will surely not like it, whether because it does take some time (especially for the first git-annex branch scan when there is a long history), or because they want to deal with it manually, or because their filesystem doesn't support hard links and they don't want it to copy objects. Sponsored-by: k0ld on Patreon	2023-12-08 14:18:18 -04:00
Joey Hess	4ed71b34de	migrate --apply And avoid migrate --update/--apply migrating when the new key was already present in the repository, and got dropped. Luckily, the location log allows distinguishing from the new key never having been present! That is mostly useful for --apply because otherwise dropped files would keep coming back until the old objects were reaped as unused. But it seemed to make sense to also do it for --update, for consistency in edge cases if nothing else. One case where --update can use it is when one branch got migrated earlier, and we dropped the file, and now another branch has migrated the same file. Sponsored-by: Jack Hill on Patreon	2023-12-08 13:23:46 -04:00
Joey Hess	51b974d9f0	skip distributed migration to insecure key when annex.securehashesonly is set This only avoids extra work and a warning message. It seems likely that in such a situation, the user does not want migrations to insecure hashes, and so best to ignore them as much as possible. If the user merges a branch that switches annexed files to an insecure hash, they will notice that the file contents are unavailable, and git-annex get will tell them the problem then. So it does not seem useful to have migrate --update also complain about it.	2023-12-08 12:41:50 -04:00
Joey Hess	30c2728d65	always verify content in distributed migration doc/todo/distributed_migration.mdwn discusses security of distributed migration, and this was identified as necessary to do.	2023-12-07 20:05:42 -04:00
Joey Hess	62ce56c4ea	display filenames in migrate --update Have to go to a lot of bother to find them, but I think it's worth it for usability. Sponsored-by: Luke T. Shumaker on Patreon	2023-12-07 18:00:09 -04:00
Joey Hess	abea01d9e0	migrate --update fully working Could use some more testing. When the old key is not present, Command.ReKey.linkKey' will return False, so this handles that case ok. But, I do wonder if distributed migration may need to deal with the old key getting copied into the repository later. In that situation, re-running migrate --update won't link it to the new key. It may be that some users will need that. They can delete .git/annex/migrate.log and run it again, but that is not a good user interface. Maybe either have a way to re-run all distributed migrations, or record migrations in a database and scan the db to find migrations to do in a future run? Sponsored-by: Kevin Mueller on Patreon	2023-12-07 17:27:51 -04:00
Joey Hess	7c7c9912c1	migrate --update gets keys The git log is outputting the diff, but this only looks at the new files. When we have a new file, we can get the old filename by just replacing "new" with "old". And then use branchFileRef to refer to it allows catting the old key. While this does have to skip past the old files in the diff, it's still faster than calling git diff separately. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-12-07 17:25:56 -04:00
Joey Hess	f1ce15036f	started migrate --update This is most of the way there, but not quite working. The layout of migrate.tree/ needs to be changed to follow this approach. git log will list all the files in tree order, so the new layout needs to alternate old and new keys. Can that be done? git may not document tree order, or may not preserve it here. Alternatively, change to using git log --format=raw and extract the tree header from that, then use git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new That will be a little more expensive, but only when there are lots of migrations. Sponsored-by: Joshua Antonishen on Patreon	2023-12-07 15:50:52 -04:00
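
Commit 9c988ee607 in the table above describes verifying a file against several equivalent keys while reading it only once. The following is a minimal Python sketch of that idea, not git-annex's implementation (which is written in Haskell). The function name `matches_any_key` and the `(algorithm, hexdigest)` pair representation are assumptions made for illustration. It keeps one hasher per algorithm, so keys that share a checksum type are also only hashed once.

```python
import hashlib


def matches_any_key(path, expected, chunk_size=65536):
    """Return True if the file at `path` matches any expected checksum.

    `expected` is an iterable of (algorithm, hexdigest) pairs; this
    representation is hypothetical and only serves the illustration.
    The file is read a single time, feeding every hasher per chunk.
    """
    expected = list(expected)
    # One hasher per distinct algorithm, shared by all keys using it.
    hashers = {alg: hashlib.new(alg) for alg, _ in expected}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # Compare the finished digests against each candidate in turn.
    return any(hashers[alg].hexdigest() == digest for alg, digest in expected)
```

A caller would pass the checksums of all equivalent keys at once, for example `matches_any_key("file", [("sha256", digest_a), ("sha256", digest_b)])`, rather than re-reading the file for each candidate.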

2815 commits