git-annex

Author	SHA1	Message	Date
Joey Hess	349b1e443b	proxied importtree=yes remotes are untrustworthy Even without exporttree=yes.	2024-08-08 15:26:02 -04:00
Joey Hess	3ea835c7e8	proxied exporttree=yes versionedexport=yes remotes are not untrusted This removes versionedExport, which was only used by the S3 special remote. Instead, versionedexport=yes is a common way for remotes to indicate that they are versioned.	2024-08-08 15:24:19 -04:00
Joey Hess	5c36177e58	proxied exporttree=yes remotes are untrustworthy This is not perfect because it does not handle versioned special remotes, which should not be untrustworthy, but now are when proxied. The implementation turned out to be easy, because the exporttree field is a default field, so is available in RemoteConfig even for git remotes.	2024-08-08 14:43:53 -04:00
Joey Hess	bb9b02b723	remove unused imports	2024-08-06 14:49:20 -04:00
Joey Hess	a3d96474f2	rename to annexobjects location on unexport This avoids needing to re-upload the file again to get it to the annexobjects location, which git-annex sync was doing when it was preferred content. If the file is not preferred content, sync will drop it from the annexobjects location. If the file has been deleted from the tree, it will remain in the annexobjects location until an unused/dropunused pass is done.	2024-08-04 11:58:07 -04:00
Joey Hess	ee076b68f5	strong verification on retrieval from annexobjects location The file in the annexobjects location may have been renamed from a previously exported file that got deleted in a subsequent export. Or it may be renamed to annexobjects temporarily before being renamed to another name (to handle eg pairwise renames). But, an exported file is not guaranteed to contain the content of the key that the local repository last exported there. Another tree could have been exported from elsewhere in the meantime. So, files in annexobjects do not necessarily have the content of their key. And so have to be strongly verified when retrieving. The same as is done when retrieving exported files.	2024-08-04 11:24:21 -04:00
Joey Hess	069d90eab5	prevent removeKey from annexobjects=yes remote when the key is in the exported tree Removing the key from the annexobjects location when it's in the exported tree would leave it in the exported tree, and so succeeding would update the location log incorrectly. But this also can't remove it from the exported tree, because that would cause import tree to see a file got deleted. So, refuse to remove in this situation. It would be possible to remove from the annexobjects location and then fail. Then if a key somehow got stored in both the annexobjects location and the exported tree location(s), the duplicate would be resolved. Not doing this because first, I don't know how that situation could happen, and second, it seems wrong for a failed remove to have a side-effect like that.	2024-08-02 16:45:52 -04:00
Joey Hess	28b29f63dc	initial support for annexobjects=yes Works but some commands may need changes to support special remotes configured this way.	2024-08-02 14:07:45 -04:00
Joey Hess	6af44b9de6	p2phttp remotes are not readonly That prevented testremote from working when remote.name.url = http://..	2024-07-29 10:54:14 -04:00
Joey Hess	cd89f91aa5	remove uuid from annex+http urls Not needed it turns out.	2024-07-28 20:29:42 -04:00
Joey Hess	bc9cc79e85	set remote's annexUrl automatically When the remote repository's git config file has annex.url set to an annex+http url.	2024-07-28 20:13:41 -04:00
Joey Hess	bdde6d829c	fix http proxying for a local git remote with a relative path git-annex-shell expects an absolute path	2024-07-28 13:35:51 -04:00
Joey Hess	0bdeafc2c4	use annex+http for accessing proxies Doesn't work yet on the http server side, which is throwing 502 bad gateway.	2024-07-25 12:00:57 -04:00
Joey Hess	ba0ecbf47e	less indent	2024-07-25 10:12:59 -04:00
Joey Hess	b13c2407af	p2phttp drop supports checking proof timestamps At this point the p2phttp implementation is fully complete!	2024-07-25 10:11:09 -04:00
Joey Hess	515c42e1e3	testremote passes on p2phttp remote	2024-07-24 14:42:24 -04:00
Joey Hess	97836aafba	Remote.Git lockContent works with annex+http urls	2024-07-24 13:42:57 -04:00
Joey Hess	9fa9678585	Remote.Git removeKey works with annex+http urls Does not yet handle drop proof lock timestamp checking.	2024-07-24 12:33:26 -04:00
Joey Hess	cfdb80cd05	progress meter for p2phttp storeKey	2024-07-24 12:14:56 -04:00
Joey Hess	b3915b88ba	Remote.Git storeKey works with annex+http urls Does not yet update progress meter.	2024-07-24 12:05:10 -04:00
Joey Hess	5b1ac1a313	more generic clientGet	2024-07-24 11:10:19 -04:00
Joey Hess	10f2c23fd7	fix slowloris timeout in hashing resume of download of large file Hash the data that is already present in the file before connecting to the http server.	2024-07-24 11:03:59 -04:00
Joey Hess	7bd616e169	Remote.Git retrieveKeyFile works with annex+http urls This includes a bugfix to serveGet, it hung at the end.	2024-07-24 10:28:44 -04:00
Joey Hess	ad945896c9	avoid needing ifdefs when using P2P.Http.Client	2024-07-24 08:33:59 -04:00
Joey Hess	b0eed55d4f	factor out http server and client into own modules To avoid a cycle when Remote.Git uses the client.	2024-07-23 14:12:38 -04:00
Joey Hess	6bbc4565e6	started wiring p2phttp into Remote.Git but we have a cycle, ugh	2024-07-23 13:53:10 -04:00
Joey Hess	5c39652235	starting support for remote.name.annexUrl set to annex+http In this case, Remote.Git should not use that url for all access to the repository. It will only be used for annex operations, which isn't done yet.	2024-07-23 09:12:21 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	d2b27ca136	add content retention files This allows lockContentShared to lock content for eg, 10 minutes and if the process then gets terminated before it can unlock, the content will remain locked for that amount of time. The Windows implementation is not yet tested. In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio or remotedaemon is serving the P2P protocol, and is asked to LOCKCONTENT, and that process gets killed, the content will not be subject to deletion. This is not a perfect solution to doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most of the way there, without needing any P2P protocol changes. This is only done in v10 and higher repositories (or on Windows). It might be possible to backport it to v8 or earlier, but it would complicate locking even further, and without a separate lock file, might be hard. I think that by the time this fix reaches a given user, they will probably have been running git-annex 10.x long enough that their v8 repositories will have upgraded to v10 after the 1 year wait. And it's not as if git-annex hasn't already been subject to this problem (though I have not heard of any data loss caused by it) for 6 years already, so waiting another fraction of a year on top of however long it takes this fix to reach users is unlikely to be a problem.	2024-07-03 14:58:39 -04:00
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	f833a28844	Merge branch 'master' into proxy-specialremotes	2024-06-30 11:16:20 -04:00
Joey Hess	3d646703ee	list proxied remotes and cluster gateways in git-annex info Wanted to also list a cluster's nodes when showing info for the cluster, but that's hard because it needs getting the name of the proxying remote, which is some prefix of the cluster's name, but if the names contain dashes there's no good way to know which prefix it is.	2024-06-30 11:14:13 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed. Because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release. Which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	c3a785204e	support a P2PConnection that uses TMVars rather than Handles This will allow having an internal thread speaking P2P protocol, which will be needed to support proxying to external special remotes. No serialization is done on the internal P2P protocol of course. When a ByteString is being exchanged, it may or may not be exactly the length indicated by DATA. While that has to be carefully managed for the serialized P2P protocol, here it would require buffering the whole lazy bytestring in memory to check its length when sending, so it's better to do length checks on the receiving side.	2024-06-28 11:22:29 -04:00
Joey Hess	20ef1262df	give proxied cluster nodes a higher cost than the cluster gateway This makes eg git-annex get default to using the cluster rather than an arbitrary node, which is better UI. The actual cost of accessing a proxied node vs using the cluster is basically the same. But using the cluster allows smarter load-balancing to be done on the cluster.	2024-06-27 15:21:03 -04:00
Joey Hess	0ef4183b00	Merge branch 'master' into proxy	2024-06-27 12:41:57 -04:00
Joey Hess	19137ae780	avoid unfiltered debugging from git-annex-shell When --debugfilter or annex.debugfilter is set, avoid propagating debug output from git-annex-shell, since it cannot be filtered. It would be possible to pass --debugfilter on to git-annex-shell, but it only started accepting that option in 2022. So it would break interop with older versions.	2024-06-27 12:37:25 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unnecessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fanout uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kind of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
Joey Hess	b8016eeb65	add annex-proxied This makes git-annex sync and similar not treat proxied remotes as git syncable remotes. Also, display in git-annex info remote when the remote is proxied.	2024-06-24 10:16:59 -04:00
Joey Hess	0c111fc96a	fix git-annex sync --content with proxied remotes Loading the remote list a second time was removing all proxied remotes. That happened because setting up the proxied remote added some config fields to the in-memory git config, and on the second load, it saw those configs and decided not to overwrite them with the proxy. Now on the second load, that still happens. But now, the proxied git configs are used to generate a remote same as if those configs were all set. The reason that didn't happen before was twofold, the gitremotes cache was not dropped, and the remote's url field was not set correctly. The problem with the remote's url field is that while it was marked as proxy inherited, all other proxy inherited fields are annex- configs. And the code to inherit didn't work for the url field. Now it all works, but git-annex sync is left running git push/pull on the proxied remote, which doesn't work. That still needs to be fixed.	2024-06-24 09:45:51 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% sure the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen in a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	a6a04b7e5e	avoid storing SUCCESS-PLUS uuid when it is the remote uuid This is slightly belt and suspenders, but nothing guarantees that the peer avoids including its uuid in the SUCCESS-PLUS list as it's supposed to. And while it probably doesn't matter if the location log is updated redundantly, let's not find out.	2024-06-23 08:21:11 -04:00
Joey Hess	6eac3112e5	be quiet when reading cluster and proxy information at startup I had a transfer of 3 files fail like this: git-annex: transferrer protocol error: "(recording state in git...)" The remote had stalldetection enabled, although I didn't see it stall. So git-annex transferrer would have been started up. I guess that one of these new git-annex branch reads, that happens early, caused that message due to perhaps an uncommitted git-annex branch change. Since the transferrer speaks a protocol over stdout, it needs to be prevented from outputting other messages to stdout. Interestingly, startupAnnex is run after prepRunCommand, so if a command requests quiet output it would already be quiet. But the transferrer does not, instead it calls Annex.setOutput SerializedOutput in its start action.	2024-06-18 21:31:32 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to additional repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	a4c9d4424c	remove Logs.Presence imports When imported along with Logs.Location, it can be an unused import and it won't warn, due to reexports. The point of this is really to show that Logs.Presence is not widely used, outside Logs/	2024-06-14 17:27:34 -04:00
Joey Hess	e224b99f36	whitespace	2024-06-12 13:24:25 -04:00


1620 commits