This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.
Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
I hope to support importtree=yes eventually, but it does not currently
work.
Added remote.<name>.allow-encrypted-gitrepo, which needs to be set to
allow using it with encrypted git repos.
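To opt in, something like this (the remote name is only an example):
git config remote.myremote.allow-encrypted-gitrepo true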
Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
And document remote.<name>.git-remote-annex-max-bundles which will
configure it.
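Eg (the value shown is only illustrative):
git config remote.myremote.git-remote-annex-max-bundles 100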
datalad-annex uses a similar url format, but with some enhancements.
See https://github.com/datalad/datalad-next/blob/main/datalad_next/gitremotes/datalad_annex.py
I added the UUID to the URL, because it is needed in order to pick out which
manifest file to use. The design allows for a single key/value store to have
several special remotes all stored in it, and so the manifest includes
the UUID in its name.
While datalad-annex allows datalad-annex::<url>?, and allows referencing
pieces of the url in the parameters, needing the UUID prevents
git-remote-annex from supporting that syntax. And anyway, it is a
complication and I want to keep things simple for now.
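So a clone url looks roughly like this (the parameters depend on the
special remote's type; this is only a sketch):
git clone "annex::<uuid>?type=directory&encryption=none&directory=/mnt/foo/"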
Sponsored-by: unqueued on Patreon
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".
This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.
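Setup looks something like this; the remote name is arbitrary, and the
rclone-specific parameter names here are from memory, so may not be exact:
git annex initremote myrclone type=rclone rcloneremotename=mycloud rcloneprefix=git-annex encryption=none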
Sponsored-by: Luke T. Shumaker on Patreon
Currently, this can only be used to change an url from being used by a
special remote to being used by the web remote.
This could have been a --move-from option to registerurl. But, that would
have complicated its option parsing and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.
The generic description of the command is because I want to make this
command a catch-all for other url-updating operations, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.
Sponsored-by: Dartmouth College's DANDI project
The getSocket comment that mentioned using ":port"
in the hostname seems to have been incorrect or out of date.
After all, the bug report came when the user first tried doing that,
and it didn't work.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
Refactored to allow offline experimentation, and ended up changing the
allowedvariation (aka fudge factor) to 3. 10 seems too high, and 1.5 too low.
Scale earlier, so even if the first chunk takes less than the configured
time period, allowance is made that later chunks might transfer slower.
Decided to use the same allowedvariation to decide when to start
scaling.
Smoothed the scaling out.
Some examples:
ghci> upscale (BwRate 10 (Duration 60)) 25
BwRate 13 (Duration {durationSeconds = 75})
-- A small scaling upwards after 1/3rd the time. Not noticeable.
ghci> upscale (BwRate 10 (Duration 60)) 60
BwRate 30 (Duration {durationSeconds = 180})
-- At the configured time, 3x scaling.
ghci> upscale (BwRate 10 (Duration 60)) 120
BwRate 60 (Duration {durationSeconds = 360})
-- A typical upscaling, here a 1 minute duration became 6 minutes
-- due to the first chunk taking 2 minutes to transfer.
ghci> upscale (BwRate 10 (Duration 60)) 600
BwRate 300 (Duration {durationSeconds = 1800})
-- Here the first chunk took 10 minutes to transfer, so it will
-- take 30 minutes to detect a stall.
Sponsored-by: Dartmouth College's DANDI project
Improve annex.stalldetection to handle remotes that update progress less
frequently than the configured time period.
In particular, this makes remotes that don't report progress but are
chunked work when transferring a single chunk takes longer than the
specified time period.
Any remotes that just have very low update granularity would also be
handled by this.
The change to Remote.Helper.Chunked avoids an extra progress update when
resuming an interrupted upload. In that case, the code saw first Nothing
and then Just the already transferred number of bytes, which defeated this
new heuristic. This change will mean that, when resuming an interrupted
upload to a chunked remote that does not do its own progress reporting, the
progress display does not start out displaying the amount sent so far,
until after the first chunk is sent. This behavior change does not seem
like a major problem.
About the scalefudgefactor, it seems reasonable to expect subsequent chunks
to take no more than 1.5 times as long as the first chunk to transfer.
Could set it to 1, but then any chunk taking a little longer would be
treated as a stall. 2 also seems a likely value. Even 10 might be fine?
Sponsored-by: Dartmouth College's DANDI project
Test a specified Stateless OpenPGP command with eg:
git-annex test --test-git-config annex.shared-sop-command=sqop
Also documented that config and another one, but so far only the test suite
uses the configs; using them for actual symmetric encryption has not yet
been implemented.
Sponsored-by: Joshua Antonishen on Patreon
pull, sync: When operating on content, automatically hard link objects
that have been migrated.
Added annex.syncmigrations config that can be set to false to prevent
pull and sync from migrating object content.
I think that true is a good default for this config, because it avoids
users having to re-download migrated content or learning about migration.
But, some users will surely not like it, whether because it does take some
time (especially for the first git-annex branch scan when there is a long
history), or because they want to deal with it manually, or because their
filesystem doesn't support hard links and they don't want it to copy
objects.
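Users who don't want that can disable it, eg:
git config annex.syncmigrations false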
Sponsored-by: k0ld on Patreon
Make git-annex get/copy/move --from foo override configuration of
remote.foo.annex-ignore, as documented.
This already worked for remotes supporting hasKeyCheap. For others though,
git-annex copy --from foo would silently not do anything, while
git-annex copy --to foo would use the annex-ignored remote.
Also improved the annex-ignore docs, to reflect that `git-annex get`
without --from will skip using annex-ignored remotes, for example.
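So this now transfers from the remote as documented, even when
remote.foo.annex-ignore is set:
git annex copy --from foo .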
Sponsored-by: Dartmouth College's DANDI project
git-annex get with no parameters and annex.skipunknown = false
in a directory with no files tracked by git results in the same
failure as with a "." parameter.
It may be that git ls-files --error-unmatch changed behavior? Or this
was just wrong.
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log. To avoid buffering a
possibly huge amount of log in memory (ie, the whole git log of an entire
repository!), it runs git log twice rather than caching the first pass.
(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)
Sponsored-By: Brock Spratlen on Patreon
Avoid using curl when annex.security.allowed-ip-addresses is set but
neither annex.web-options nor annex.security.allowed-url-schemes is set to
a value that needs curl.
Bug introduced in 840bd50390
Sponsored-By: Brock Spratlen on Patreon
This is groundwork for making special remotes like borg be skipped by
sync when on an offline drive.
Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension
to the external special remote protocol. The extension is needed because
old git-annex, if it sees that response, will display a warning
message. (It does continue as if the remote is globally available, which
is acceptable, and the warning is only displayed at initremote due to
remote.name.annex-availability caching, but still it seemed best to make
this a protocol extension.)
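Roughly, the protocol exchange with a remote that is currently offline
looks like this (paraphrased, not an exact transcript):
git-annex: EXTENSIONS INFO ASYNC UNAVAILABLERESPONSE
remote: EXTENSIONS UNAVAILABLERESPONSE
git-annex: GETAVAILABILITY
remote: AVAILABILITY UNAVAILABLE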
The remote.name.annex-availability git config is no longer used, and is
documented as such. It was only used by external special
remotes to cache the availability, to avoid needing to start the
external process every time. Now that availability is queried as an
Annex action, the external is only started by sync (and the assistant),
when they actually check availability.
Sponsored-by: Nicholas Golder-Manning on Patreon
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.
As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.
Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.
(Also opened a new bug that also affects sync et al, not only this command.)
Sponsored-by: Nicholas Golder-Manning on Patreon
This was already possible, but it was rather hard to come up with the
complex shell command needed.
Note that the diff output starts with "diff a/... b/...".
I left off the "--git" because it's not a git format diff.
I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.
Using --progress-template like this should avoid parsing problems as well
as future-proof against output changes. But it will only work with yt-dlp.
So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.
git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in debian unstable, having been replaced by
yt-dlp), I made git-annex no longer try to parse youtube-dl's progress.
Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.
Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented but I assume is a python thing?
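For illustration, this is roughly the kind of yt-dlp invocation involved
(not the exact template git-annex uses; the progress field names are
assumptions):
yt-dlp --progress-template 'download:%(progress.downloaded_bytes)i %(progress.total_bytes_estimate)i' "$url"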
Sponsored-by: Joshua Antonishen on Patreon
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.
(Also this fixes another reversion in git-annex sync:
--commit, --no-commit, and --message were not enabled, oops.)
See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.
Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.
Sponsored-by: k0ld on Patreon
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivalent to git-annex sync.
In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.
And it would have been hard to split pull and push out of sync, as far as
the implementation goes. Implementing them inside sync was easy, just a
matter of adjusting SyncOptions so it does the right thing.
Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.
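Anyone who prefers that the new commands not transfer content by default
can, eg:
git config annex.synccontent false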
Sponsored-by: Kevin Mueller on Patreon
New command, currently limited to changing the autoenable= setting of a
special remote. It will probably never be used for more than that, given
the limitations on it.
Sponsored-by: Brock Spratlen on Patreon
view: Support annex.maxextensionlength when generating filenames for the
view branch.
Note that refining an existing view will reuse the extension length that was
configured when initially constructing the view. This is necessarily the case
because it reuses the filenames.
Also, view files used to have all extensions at the end, no matter how
many there were. Since annex.maxextensionlength's documentation says it is
limited to 2 extensions, I made view filenames consistent with that.
Sponsored-by: k0ld on Patreon
As far as I can see, git-annex status was added to support direct mode, and
like other things added for that, it ought to be deprecated.
Behavior is similar to git status --short, though not identical in a few
cases eg renamed files.
I think datalad does not use this command, although it might have in the
past. Could not find any use of it in the current datalad code.
A deprecation warning at runtime would be the next step, probably will wait
and do that for all the deprecated commands together (except findref).
* view: New field?=glob and ?tag syntax that includes a directory "_"
in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
"_" directory in a view.
When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.
annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.
annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.
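For example (the field name is only an example), this puts files that lack
an author field into the "_" directory, or into whatever directory
annex.viewunsetdirectory is configured to:
git annex view "author?=*"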
Sponsored-By: Jack Hill on Patreon
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analogous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.
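For example (the remote name is only an example):
git annex findkeys
git annex findkeys --in myremote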
Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.
A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing, that --all sometimes reads location logs unnecessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.
Also, cleaned up comments on git-annex-find man page asking for --all
option.
Sponsored-by: Dartmouth College's DANDI project
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.
When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.
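For example (remote names are arbitrary, and the exact parameters may
vary):
git annex initremote myweb type=web
git annex initremote web2 --sameas=web type=web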
Sponsored-by: Dartmouth College's DANDI project
Debian is going to drop youtube-dl, which is not active upstream, and yt-dlp
is the replacement. This will make it be used if youtube-dl gets removed.
If an old version of youtube-dl remains installed, git-annex will still use
it. That might not be desirable, but changing git-annex to use yt-dlp in
preference to youtube-dl when both are installed risks breaking when
the user has annex.youtube-dl-options set to something that is supported
by youtube-dl, but not by yt-dlp.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
I noticed that, using just the man pages, there is no real description
of what backends are, or what ones are available. Except for some
examples.
Added a git-annex-backends man page, that is just a stub, but at least
describes what they basically are, tells how to find the supported
ones, and links to the backends web page.
Sponsored-by: Brett Eisenberg on Patreon
This is much easier and less failure-prone than having the user run
git update-index --refresh themselves.
Sponsored-by: Dartmouth College's DANDI project
It's hard to know what's a good default for this. But 1 mb seems way too
small, because it's very easy for a git pull or some similar operation
that we don't think of as using much space to use up 1 mb of space.
Most people would want to free up some space if a filesystem only had 100
mb free. But on a small VPS, it's probably not uncommon to have only 1 gb
free. So 1 gb is too large for annex.diskreserve.
While old 1 gb USB keys are around, it's unlikely that anyone is
relying on them to shuttle annex data around; it would be worth anyone's
time to upgrade to a 32 gb or larger cheap modern USB key ($5).
Sponsored-by: Kevin Mueller on Patreon
Use curl for downloads from git remotes when annex.url-options and other
git configs are set.
If the url needs a password, curl will fail, and git credential will not be
used to prompt for it. But the user can set --netrc in url-options and
put the password in the netrc file.
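Eg, to make curl read credentials from the netrc file:
git config annex.url-options "--netrc"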
This also means that url-options settings like -4 will take effect.
That was the case before commit 1883f7ef8f
forced conduit to be used.
Use curl when annex.security.allowed-url-schemes includes an url scheme not
supported by git-annex internally, as long as
annex.security.allowed-ip-addresses is configured to allow using curl.
Sponsored-by: Luke Shumaker on Patreon
This allows annex.dbdir to be set globally or always set to the same
value when needed. Each repository uses a subdirectory of it.
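Eg (the path is only an example):
git config --global annex.dbdir /var/cache/annex-dbs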
Sponsored-by: Dartmouth College's Datalad project
Completes work started in e60766543f
I've verified that all the sqlite databases get stored in annex.dbdir
and are created successfully. If annex.dbdir does not exist, it will be
created; its parent directory must already exist though.
Sponsored-by: Dartmouth College's Datalad project