git-annex

Author	SHA1	Message	Date
Joey Hess	86e9e88530	add git-remote-p2p-annex Added git-remote-p2p-annex, which allows git pull and push to P2P networks provided by external commands. This is a refactor of git-remote-tor-annex, and should just work. Except possibly for quirks with the address parsing. I've checked that the address parsing basically works. One thing I don't understand is why git-remote-tor-annex removes "/*" from the end of the address. The git history does not provide any hints. So I didn't make git-remote-p2p-annex do the same. Maybe that is needed in some situation? But, a P2P address could contain "/", so removing it would be a problem. I can't see anything in gitremote-helpers(7) about why the url might get such a thing added to the end of it. My guess is that is not needed for tor either (but does no harm there since onion addresses never contain "/"). At this point, the implementation of generic P2P transports needs only remotedaemon support.	2025-07-30 15:25:56 -04:00
Joey Hess	d364e434c8	Add --url option and url= preferred content expression To match content that is recorded as present in an url. Note that, this cannot ask remotes to provide an url using whereisKey, like whereis does. Because preferred content expressions need to match the same from multiple perspectives, and the remote would not always be available. That's why the docs say "recorded as present", but still this may be surprising to some who see an url in whereis output and are surprised they cannot match on it. The use of getDownloader is to strip the downloader prefix from urls like "yt:". Note that, when OtherDownloader is used, this strips the ":" prefix, and allows matching on those urls too.	2025-07-21 12:13:40 -04:00
Joey Hess	6a9e923c74	fix handling of linked worktrees on filesystems w/o symlinks Fix bug in handling of linked worktrees on filesystems not supporting symlinks, that caused annexed file content to be stored in the wrong location inside the git directory, and also caused pointer files to not get populated. This parameterizes functions in Annex.Locations with a GitLocationMaker. The uses of standardGitLocationMaker are in cases where the path returned by a function should not change when in a linked worktree. For example, gitAnnexLink uses standardGitLocationMaker because symlink targets should always be to ".git/annex/objects" paths, even when in a linked worktree. Hopefully I have gotten all uses of standardGitLocationMaker right. This also assumes that all path construction to the annex directory is done via the functions in Annex.Locations, and there is no other, ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around since the beginning, and has been used consistently. I think. --- In fixupUnusualRepos, when symlinks are supported, the .git file is replaced with a symlink to the linked worktree git directory. And in that directory, an "annex" symlink points to the main annex directory. In that case, it's not necessary to set mainWorkTreePath. It would be ok to set it, but not setting it in that case allows an optimisation of avoiding reading the "commondir" file. The change to make fixupUnusualRepos set mainWorkTreePath when the repository is not initialized yet is done in case the initialization itself writes to the annex directory. If that were the case, without setting mainWorkTreePath, the annex symlink would not be set up yet, and so it might have created the annex directory in the wrong place. Currently that didn't happen, but now that mainWorkTreePath is available, using it here avoids any such later problem. --- This commit does not deal with the mess of a worktree that has experienced this bug before. In particular, if `git-annex get` were run in such a worktree, it would have stored the object files in the linked worktree's git directory, rather than in the main git directory. Such misplaced object files need to be dealt with; the plan is to make git-annex fsck notice and fix them. A worktree that has experienced this bug before will contain unpopulated pointer files. Those may eventually get fixed up in regular usage of git-annex, but git-annex fsck will also fix them up. --- Finally, this has me pondering if all of git-annex's state files should really be stored in one common place across all linked worktrees. Should perhaps state files that are specific to the worktree be stored per-worktree? That has not been the case when using git-annex on filesystems supporting symlinks, but it has been the case on filesystems not supporting symlinks. Perhaps this leads to some other buggy behavior in some cases. Or perhaps to extra work being done. For example, the keys database has an associated files table. Which depends on the worktree. But reconcileStaged updates that table, so when git-annex is used first in one worktree and then in another one, reconcileStaged will update the table to reflect the current worktree. Which is extra work each time a different worktree is used. But also, what if two git-annex processes are running at the same time, in separate worktrees? Probably this needs more thought and investigation. So there is a risk that this commit exposes such buggy behavior in a situation where it didn't happen before, due to the filesystem not supporting symlinks. But, given how much this bug crippled using linked worktrees in such a situation, I doubt that many people have been doing that.	2025-07-14 13:20:39 -04:00
Joey Hess	bff089a392	prevent initialization with bad freeze/thaw hook configured When annex.freezecontent-command or annex.thawcontent-command is configured but fails, prevent initialization. This allows the user to fix their configuration and avoid crippled filesystem detection entering an adjusted unlocked branch unexpectedly, when they had been relying on the hooks working around their filesystems's infelicities. In the case of git-remote-annex, a failure of these hooks is taken to mean the filesystem may be crippled, so it deletes the bundles objects and avoids initialization. That might mean extra work, but only in this edge case where the hook is misconfigured. And it keeps the command working for cloning even despite the misconfiguration. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2025-05-27 12:54:31 -04:00
Joey Hess	e81fd72018	Added remote.name.annex-web-options config Which is a per-remote version of the annex.web-options config. Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a special remote does not use curl, there was no need to do that and I used Nothing instead. In the case of the addurl and importfeed commands, it seemed best to say that running these commands is not using the web special remote per se, so the config is not used for those commands.	2025-04-01 10:17:38 -04:00
Joey Hess	bcfd554a0f	findcomputed: New command, displays information about computed files.	2025-03-18 12:55:48 -04:00
Joey Hess	3bec89a3c3	started git-annex recompute The perform action of this still needs work to do the right thing. In particular, it currently behaves as if --others was always set. And, it duplicates a lot of code from addcomputed.	2025-02-26 11:54:09 -04:00
Joey Hess	a154e91513	add git-annex addcomputed Working pretty well. Mostly. But: * Does not yet support inputs that are non-annexed files checked into git * --fast is currently broken (will need something like VURL keys) * --unreproducible still uses a checksumming backend, so drop and get again will likely fail (needs probably to use an URL key or something like one) The compute special remote seems to work pretty well too. Eg, getting from it works, and dropping content that is present in it works.	2025-02-25 15:50:08 -04:00
Joey Hess	4f1eea9061	remove unused adjustedBranchRefresh associated file parameter	2025-02-21 14:51:02 -04:00
Joey Hess	2d224e0d28	more OsPath conversion (658/749) At this point the test suite builds, and mostly the assistant is left. Sponsored-by: unqueued	2025-02-08 15:27:44 -04:00
Joey Hess	5eef09a3cc	more OsPath conversion (650/749) Sponsored-by: Nicholas Golder-Manning	2025-02-07 17:03:31 -04:00
Joey Hess	0d2b805806	more OsPath conversion (520/749) Sponsored-by: mycroft	2025-02-05 15:07:59 -04:00
Joey Hess	6e27b0d4d1	convert from readFileStrict This removes that function, using file-io readFile' instead. Had to deal with newline conversion, which readFileStrict does on Windows. In a few cases, that was pretty ugly to deal with. Sponsored-by: Kevin Mueller	2025-01-22 16:20:36 -04:00
Joey Hess	9b79f0f43d	use file-io for readFile/writeFile/appendFile on ByteStrings These are all straightforward, and easy small performance wins. Sponsored-by: Nicholas Golder-Manning	2025-01-22 14:30:25 -04:00
Joey Hess	793ddecd4b	use openTempFile from file-io And follow-on changes. Note that relatedTemplate was changed to operate on a RawFilePath, and so when it counts the length, it is now the number of bytes, not the number of code points. This will just make it truncate shorter strings in some cases, the truncation is still unicode aware. When not building with the OsPath flag, toOsPath . fromRawFilePath and fromRawFilePath . fromOsPath do extra conversions back and forth between String and ByteString. That overhead could be avoided, but that's the non-optimised build mode, so didn't bother. Sponsored-by: unqueued	2025-01-22 11:41:43 -04:00
Joey Hess	1ceece3108	RawFilePath conversion of System.Directory By using System.Directory.OsPath, which takes and returns OsString, which is a ShortByteString. So, things like dirContents currently have the overhead of copying that to a ByteString, but that should be less than the overhead of using Strings which often in turn were converted to RawFilePaths. Added Utility.OsString and the OsString build flag. That flag is turned on in the stack.yaml, and will be turned on automatically by cabal when built with new enough libraries. The stack.yaml change is a bit ugly, and that could be reverted for now if it causes any problems. Note that Utility.OsString.toOsString on windows is avoiding only a check of encoding that is documented as being unlikely to fail. I don't think it can fail in git-annex; if it could, git-annex didn't contain such an encoding check before, so at worst that should be a wash.	2025-01-20 19:17:33 -04:00
Joey Hess	5d2aaafa6c	git-remote-annex enableremote to support readonly webdav * Allow enableremote of an existing webdav special remote that has read-only access. * git-remote-annex: Use enableremote rather than initremote.	2025-01-07 15:57:38 -04:00
Joey Hess	8663c72f1e	git-remote-annex: Fix buggy behavior when annex.stalldetection is configured Make programPath never return "git-remote-annex" or other known multi-call program names, which are not git-annex and won't behave like it. If the git-annex binary gets installed under some entirely other name, it will still return it. This change exposed that readProgramFile actually could crash, which happened before only if getExecutablePath was not absolute and there was no ~/.config/git-annex/program. So fixed that to catch exception.	2024-11-25 12:14:52 -04:00
Joey Hess	a2e6d99deb	show remote name when failing to help debug strange git behavior on some daily builds	2024-11-21 15:55:32 -04:00
Joey Hess	757f93203a	Merge branch 'p2phttp-multi'	2024-11-21 15:16:06 -04:00
Joey Hess	3c18398d5a	p2phttp support --jobs with --directory --jobs is usually an Annex option setter, but --directory runs in IO, so would not have that available. So instead moved the option parser into the command's Options.	2024-11-21 14:15:14 -04:00
Joey Hess	31a38f8468	git-remote-annex: Require git version 2.31 or newer Since old ones had a buggy git bundle command. In particular, git 2.30.2 has a git bundle that supports --stdin, but does not read from it, and so fails to create a bundle. While not using --stdin would perhaps work, it limits the number of revs that get included in the bundle to the command line length limit. But the real kicker is that at the same time --stdin got fixed, a bug also got fixed that made git bundle skip including refs when they had the same sha as other refs it included. Which would lead to data loss. So best to avoid that buggy thing.	2024-11-20 15:00:17 -04:00
Joey Hess	df29f29e0d	git-remote-annex: Fix cloning from a special remote on a crippled filesystem Not initializing and so deleting the bundles only causes a little more work on the first git fetch.	2024-11-19 12:43:51 -04:00
Joey Hess	700be6c38f	git-remote-annex: Fix a reversion Introduced in version 10.20241031 that broke cloning from a special remote retrieveKeyFile changed to use createAnnexDirectory, which means that the path passed to it needs to be under .git git-remote-annex is probably the only thing in git-annex where that was not the case. And there's no real reason it cannot be the case with it either. Just use withOtherTmp.	2024-11-11 12:42:35 -04:00
Joey Hess	b932acf4ad	started Annex.Sim Have most of the sim command handler, but to keep it pure while implementing the rest will need some refactoring. It seems likely that running the simulation itself will not be able to be entirely pure. Preferred content evaluation runs in Annex after all. Note that the somewhat awkward randomWords is because the i386ancient build depends on a version of random too old to support generating a random ByteString on its own.	2024-09-04 15:15:36 -04:00
Joey Hess	8239824d92	consistently omit clusters when calculating RepoSizes updateRepoSize is only called on the UUID of a repository, not any cluster it might be a node of. But overLocationLogs and overLocationLogsJournal were inclusing cluster UUIDs. So it was inconsistent. Currently I don't see any reason to calculate RepoSize for a cluster. It's not even clear what it should mean, the total size of all nodes, or the amount of information stored in the cluster in total?	2024-08-17 11:24:14 -04:00
Joey Hess	8ac2685b33	calcBranchRepoSizes without journal files This will be used to prime the RepoSizes database, which will always contain values that correpond to information in the git-annex branch, so without anything from journal files. Factored out overJournalFileContents which will later be used to update Annex.reposizes to include information from journal files. This will be partitcularly important to support private UUIDs which only ever get to journal files and not to the branch.	2024-08-14 03:19:30 -04:00
Joey Hess	5afbea25e7	avoid counting size of keys that are in the journal twice In calcRepoSizes and also git-annex info, when a key was in the journal, it was passed to the callback twice, so the calculated size was wrong.	2024-08-13 13:23:39 -04:00
Joey Hess	467d80101a	improve handling of unmerged git-annex branches in readonly repo git-annex info was displaying a message that didn't make sense in context. In calcRepoSizes, it seems better to return the information from the git-annex branch, rather than giving up. Especially since balanced preferred content uses it, and we can't just give up evaluating a preferred content expression if git-annex is to be usable in such a readonly repo. Commit `6d7ecd9e5d` nobly wanted git-annex to behave the same with such unmerged branches as it does when it can merge them. But for the purposes of preferred content, it seems to me there's a sense that such an unmerged branch is the same as a remote we have not pulled from. The balanced preferred content will either way operate under outdated information, and so make not the best choices.	2024-08-13 13:13:12 -04:00
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	3ce2e95a5f	balanced preferred content and --rebalance This all works fine. But it doesn't check repository sizes yet, and without repository size checking, once a repository gets full, there will be no other repository that will want its files. Use of sha2 seems unncessary, probably alder2 or md5 or crc would have been enough. Possibly just summing up the bytes of the key mod the number of repositories would have sufficed. But sha2 is there, and probably hardware accellerated. I doubt very much there is any security benefit to using it though. If someone wants to construct a key that will be balanced onto a given repository, sha2 is certianly not going to stop them.	2024-08-09 14:16:09 -04:00
Joey Hess	169fd414eb	git-remote-annex: use annexLocationsBare There was no good reason for it to be using annexLocationsNonBare, and exporttree=yes annexobjects=yes is going to use annexLocationsBare, so this should as well for consistency. Since all returned ExportLocations are tried when retrieving objects, this won't break backwards compatability.	2024-08-02 13:13:44 -04:00
Joey Hess	3d14e2cf58	http server support for proxies, incomplete Refactored git-annex-shell code so this can use checkCanProxy'. At this point all that remains is opening a proxy connection, and using a proxy connection.	2024-07-25 13:19:24 -04:00
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	86ce3bf1e4	started servant implementation of HTTP P2P protocol	2024-07-07 12:08:10 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	0033e6c0a6	Tab completion of many commands like info and trust now includes remotes Especially useful with proxied remotes and clusters, where the user may not be entirely familiar with the name and can learn by tab completion.	2024-06-30 12:39:18 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	f916ce4b68	allow proxying to remotes that are nodes of clusters fixes reversion in `ca08f3fcc2`	2024-06-18 17:02:23 -04:00
Joey Hess	ca08f3fcc2	only proxy to a remote when remote.name.annex-proxy is set Avoids someone writing to proxy.log and gaining access to remotes of someone else's repository that they were not intended to be able to proxy to.	2024-06-18 11:43:10 -04:00
Joey Hess	291280ced2	started on git-annex-shell cluster support Works down to P2P protocol. The question now is, how to handle protocol version negotiation for clusters? Connecting to each node to find their protocol versions and using the lowest would be too expensive with a lot of nodes. So it seems that the cluster needs to pick its own protocol version to use with the client. Then it can either negotiate that same version with the nodes when it comes time to use them, or it can translate between multiple protocol versions. That seems complicated. Thinking it would be ok to refuse to use a node if it is not able to negotiate the same protocol version with it as with the client. That will mean that sometimes need nodes to be upgraded when upgrading the cluster's proxy. But protocol versions rarely change.	2024-06-17 15:10:04 -04:00
Joey Hess	3970bbb03b	Merge branch 'master' into proxy	2024-06-17 09:29:34 -04:00
Joey Hess	af79728ac3	tab complete special remotes An oversight.. And with the work in progress proxy and cluster, there can be additional remotes that are not listed in .git/config, but are available. Making those more discoverable is another big benefit of this.	2024-06-17 09:26:03 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that is Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads the the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	570ceffe8d	broke out initcluster One benefit of this is that a typo in annex-cluster-node config won't init a new cluster. Also it gets the cluster description set and is consistent with initremote.	2024-06-14 17:23:11 -04:00
Joey Hess	bbf261487d	add git-annex updatecluster command Seems to work fine, making the right changes to the git-annex branch.	2024-06-14 15:02:01 -04:00
Joey Hess	60e63fb85b	enable proxying for git-annex-shell p2pstdio	2024-06-11 13:07:04 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00

1 2 3 4 5 ...

522 commits