git-annex

Author	SHA1	Message	Date
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	f452bd448a	REMOVE-BEFORE and GETTIMESTAMP proxying For clusters, the timestamps have to be translated, since each node can have its own idea about what time it is. To translate a timestamp, the proxy remembers what time it asked the node for a timestamp in GETTIMESTAMP, and applies the delta as an offset in REMOVE-BEFORE. This does mean that a remove from a cluster has to call GETTIMESTAMP on every node before dropping from nodes. Not very efficient. Although currently it tries to drop from every single node anyway, which is also not very efficient. I thought about caching the GETTIMESTAMP from the nodes on the first call. That would improve efficiency. But, since monotonic clocks on !Linux don't advance when the computer is suspended, consider what might happen if one node was suspended for a while, then came back. Its monotonic timestamp would end up behind where the proxying expects it to be. Would that result in removing when it shouldn't, or refusing to remove when it should? Have not thought it through. Either way, a cluster behaving strangely for an extended period of time because one of its nodes was briefly asleep doesn't seem like good behavior.	2024-07-04 15:09:34 -04:00
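The timestamp translation described in f452bd448a can be sketched roughly as follows. This is a minimal illustration in Python, not git-annex's Haskell code; `NodeClockSample` and `translate_deadline` are hypothetical names, and it assumes the client's REMOVE-BEFORE deadline is expressed on the proxy's clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClockSample:
    """What the proxy remembers after asking one node for GETTIMESTAMP (hypothetical type)."""
    proxy_time: float  # proxy's own clock reading when it asked the node
    node_time: float   # timestamp the node answered with


def translate_deadline(sample: NodeClockSample, proxy_deadline: float) -> float:
    """Shift a deadline from the proxy's clock onto the node's clock."""
    # The delta between the two clocks at the moment of the GETTIMESTAMP call
    # is applied as an offset to the REMOVE-BEFORE deadline.
    offset = sample.node_time - sample.proxy_time
    return proxy_deadline + offset


# Two nodes whose clocks disagree with the proxy in different directions.
samples = {
    "node1": NodeClockSample(proxy_time=1000.0, node_time=50.0),
    "node2": NodeClockSample(proxy_time=1000.5, node_time=90000.5),
}
deadline_on_proxy = 1300.0
translated = {
    name: translate_deadline(sample, deadline_on_proxy)
    for name, sample in samples.items()
}
```

Each node gets its own sample, which is why a cluster remove has to call GETTIMESTAMP on every node first. Caching the samples would reuse an offset that goes stale if a node's clock stops while suspended.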
Joey Hess	99b7a0cfe9	use REMOVE-BEFORE in P2P protocol Only clusters still need to be fixed to close this todo.	2024-07-04 13:47:38 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
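A minimal sketch of the "closest expiry time" rule from 1243af4a18, written in Python for illustration only. `proof_expiry` and `proof_expired` are hypothetical helpers, not functions from git-annex, and expiry times are assumed to be POSIX timestamps in seconds.

```python
from typing import Optional, Sequence


def proof_expiry(locked_copy_expiries: Sequence[float]) -> Optional[float]:
    """Return the earliest expiry among the locked copies, or None if there are none."""
    # Conservative choice: the proof is treated as expiring when the first
    # locked copy does, even if other locked copies are still held.
    return min(locked_copy_expiries, default=None)


def proof_expired(expiry: Optional[float], now: float) -> bool:
    """A proof with no expiry never expires; otherwise compare against the clock."""
    return expiry is not None and now >= expiry


expiry = proof_expiry([1720110000.0, 1720109700.0, 1720110300.0])
```

As the commit message notes, a wall-clock comparison like this reports a spurious expiry if the clock jumps forward, and misses a real one if the clock is turned back far enough.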
Joey Hess	d2b27ca136	add content retention files This allows lockContentShared to lock content for eg, 10 minutes and if the process then gets terminated before it can unlock, the content will remain locked for that amount of time. The Windows implementation is not yet tested. In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio or remotedaemon is serving the P2P protocol, and is asked to LOCKCONTENT, and that process gets killed, the content will not be subject to deletion. This is not a perfect solution to doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most of the way there, without needing any P2P protocol changes. This is only done in v10 and higher repositories (or on Windows). It might be possible to backport it to v8 or earlier, but it would complicate locking even further, and without a separate lock file, might be hard. I think that by the time this fix reaches a given user, they will probably have been running git-annex 10.x long enough that their v8 repositories will have upgraded to v10 after the 1 year wait. And it's not as if git-annex hasn't already been subject to this problem (though I have not heard of any data loss caused by it) for 6 years already, so waiting another fraction of a year on top of however long it takes this fix to reach users is unlikely to be a problem.	2024-07-03 14:58:39 -04:00
Joey Hess	2f2cc38c28	fix build on old ghc getStdRandom used to be an IO action	2024-07-02 12:27:14 -04:00
Joey Hess	fa5e7463eb	fix display when proxied GET yields ERROR The error message is not displayed to the user, but this mirrors the behavior when a regular get from a special remote fails. At least now there is not a protocol error.	2024-07-01 11:19:02 -04:00
Joey Hess	dce3848ad8	avoid populating proxy's object file when storing on special remote Now that storeKey can have a different object file passed to it, this complication is not needed. This avoids a lot of strange situations, and will also be needed if streaming is eventually supported.	2024-07-01 10:53:49 -04:00
Joey Hess	8b5fc94d50	add optional object file location to storeKey This will be used by the next commit to simplify the proxy.	2024-07-01 10:42:27 -04:00
Joey Hess	711a5166e2	PUT to proxied special remote working Still needs some work. The reason that the waitv is necessary is because without it, runNet loops back around and reads the next protocol message. But it's not finished reading the whole bytestring yet, and so it reads some part of it.	2024-06-28 17:10:58 -04:00
Joey Hess	2e5af38f86	GET from proxied special remote Working, but lots of room for improvement... Without streaming, so there is a delay before download begins as the file is retrieved from the special remote. And when resuming it retrieves the whole file from the special remote again. Also, if the special remote throws an exception, currently it shows as "protocol error".	2024-06-28 15:44:48 -04:00
Joey Hess	158d7bc933	fix handling of ERROR in response to REMOVE This allows an error message from a proxied special remote to be displayed to the client. In the case where removal from several nodes of a cluster fails, there can be several errors. What to do? I decided to only show the first error to the user. Probably in this case the user is not in a position to do anything about an error message, so best keep it simple. If the problem with the first node is fixed, they'll see the error from the next node.	2024-06-28 14:10:25 -04:00
Joey Hess	a6ea057f6b	fix handling of ERROR in response to CHECKPRESENT That error is now rethrown on the client, so it will be displayed. For example: $ git-annex fsck x --fast --from AMS-dir fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed No protocol version check is needed. Because in order to talk to a proxied special remote, the client has to be running the upcoming git-annex release. Which has this fix in it.	2024-06-28 13:46:27 -04:00
Joey Hess	d3c75c003a	proxying special remotes This is early, but already working for CHECKPRESENT. However, when the special remote throws an exception on checkPresent, this happens: [2024-06-28 13:22:18.520884287] (P2P.IO) [ThreadId 4] P2P > ERROR directory /home/joey/tmp/bench2/dir is not accessible [2024-06-28 13:22:18.521053135] (P2P.IO) [ThreadId 4] P2P < ERROR expected SUCCESS or FAILURE git-annex: client error: expected SUCCESS or FAILURE (fixing location log) p2pstdio: 1 failed Based on the location log, x was expected to be present, but its content is missing. failed	2024-06-28 13:31:19 -04:00
Joey Hess	62750f0102	shut down RemoteSides cleanly Before it just exited without actually shutting down the RemoteSides, when the client hung up.	2024-06-28 13:19:57 -04:00
Joey Hess	cf59d7f92c	GET and CHECKPRESENT among lowest cost cluster nodes Before it was using a node that might have had a higher cost. Also threw in a random selection from among the low cost nodes. Of course this is a poor excuse for load balancing, but it's better than nothing. Most of the time...	2024-06-27 14:36:55 -04:00
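The node selection in cf59d7f92c amounts to "keep only the cheapest nodes, then pick one at random". A rough Python sketch, where `pick_node` and the `(name, cost)` pairs are illustrative assumptions rather than git-annex's actual types:

```python
import random
from typing import Optional, Sequence, Tuple


def pick_node(
    nodes: Sequence[Tuple[str, float]],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a random node among those that share the lowest cost."""
    if not nodes:
        return None
    rng = rng or random.Random()
    lowest = min(cost for _, cost in nodes)
    # Only nodes tied for the lowest cost are candidates.
    candidates = [name for name, cost in nodes if cost == lowest]
    return rng.choice(candidates)


chosen = pick_node([("node1", 200.0), ("node2", 100.0), ("node3", 100.0)])
```

Random choice spreads load only on average, which matches the commit's own caveat that this is a poor excuse for load balancing.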
Joey Hess	dabd05e547	remove a TODO marker I have a todo item for this outside the code	2024-06-27 13:36:04 -04:00
Joey Hess	3dad9446ce	distributed cluster cycle prevention Added BYPASS to P2P protocol, and use it to avoid cycling between cluster gateways. Distributed clusters are working well now!	2024-06-27 12:20:22 -04:00
Joey Hess	effaf51b1f	avoid loop between cluster gateways The VIA extension is still needed to avoid some extra work and ugly messages, but this is enough that it actually works. This filters out the RemoteSides that are a proxied connection via a remote gateway to the cluster. The VIA extension will not filter those out, but will send VIA to them on connect, which will cause the ones that are accessed via the listed gateways to be filtered out.	2024-06-26 15:29:59 -04:00
Joey Hess	4172109c8d	support multi-gateway clusters VIA extension still needed otherwise a copy to a cluster can loop forever.	2024-06-26 15:07:03 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	1bfe7f8a53	honor preferred content settings of cluster nodes Except when no nodes want a file, it has to be stored somewhere, so store it on all. Which is not really desirable, but neither is having to pick one. ProtoAssociatedFile deserialization is rather broken, and this could possibly affect preferred content expressions that match on filenames. The inability to roundtrip whitespace like tabs and newlines through is not a problem because preferred content expressions can't be written that match on whitespace such as a tab. For example: joey@darkstar:~/tmp/bench/z>git-annex wanted origin-node2 'exclude=CTRL-VTab' wanted origin-node2 git-annex: Parse error: Parse failure: near "*" But, the filtering of control characters could perhaps be a problem. I think that filtering is now obsolete, git-annex has comprehensive filtering of control characters when displaying filenames, that happens at a higher level. However, I don't want to risk a security hole so am leaving in that filtering in ProtoAssociatedFile deserialization for now.	2024-06-25 11:43:09 -04:00
Joey Hess	a23b0abf28	PUT to cluster send to all nodes rather than none If the location log says all nodes contain content, pass in all nodes, rather than none. The location log can be wrong. While it's good to avoid unnecessary connections to nodes that already contain a key, it would be bad to refuse to accept an upload at all when the location log is wrong. Also, passing in no nodes leaves the proxy in an untenable state. It can't proxy to no nodes. So it closes the connection. Passing in all nodes means it has to do the work to connect to all of them, and see that they say they already have the content, and then it can tell the client that.	2024-06-25 10:32:34 -04:00
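The fallback in a23b0abf28 can be illustrated with a small Python sketch. `put_targets` is a hypothetical helper, and the location log is modelled simply as a set of node names believed to hold the key.

```python
from typing import List, Sequence, Set


def put_targets(all_nodes: Sequence[str], believed_present: Set[str]) -> List[str]:
    """Choose which cluster nodes a PUT should be proxied to."""
    # Normally skip nodes that the location log says already hold the key.
    missing = [node for node in all_nodes if node not in believed_present]
    # If the log claims every node has it, the log may be wrong, and the proxy
    # cannot proxy to zero nodes, so fall back to all of them.
    return missing if missing else list(all_nodes)


nodes = ["node1", "node2", "node3"]
partial = put_targets(nodes, {"node2"})
fallback = put_targets(nodes, {"node1", "node2", "node3"})
```

In the fallback case each node gets to report that it already has the content, and the proxy can relay that to the client instead of closing the connection.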
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unnecessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fans out uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kinds of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
Joey Hess	5b332a87be	dropping from clusters Dropping from a cluster drops from every node of the cluster. Including nodes that the cluster does not think have the content. This is different from GET and CHECKPRESENT, which do trust the cluster's location log. The difference is that removing from a cluster should make 100% sure the content is gone from every node. So doing extra work is ok. Compare with CHECKPRESENT where checking every node could make it very expensive, and the worst that can happen with a false negative is extra work being done. Extended the P2P protocol with FAILURE-PLUS to handle the case where a drop from one node succeeds, but a drop from another node fails. In that case the entire cluster drop has failed. Note that SUCCESS-PLUS is returned when dropping from a proxied remote that is not a cluster, when the protocol version supports it. This is because P2P.Proxy does not know when it's proxying for a single node cluster vs for a remote that is not a cluster.	2024-06-23 09:43:40 -04:00
Joey Hess	7bbd822a17	avoid using cluster nodes in drop proof when dropping from cluster This is obviously necessary in order for dropping from a cluster to be able to drop from all nodes. It also avoids violating numcopies when a cluster node is a special remote. If it were used in the drop proof, nothing would prevent the cluster from dropping from it.	2024-06-23 06:20:11 -04:00
Joey Hess	6eac3112e5	be quiet when reading cluster and proxy information at startup I had a transfer of 3 files fail like this: git-annex: transferrer protocol error: "(recording state in git...)" The remote had stalldetection enabled, although I didn't see it stall. So git-annex transferrer would have been started up. I guess that one of these new git-annex branch reads, that happens early, caused that message due to perhaps an uncommitted git-annex branch change. Since the transferrer speaks a protocol over stdout, it needs to be prevented from outputting other messages to stdout. Interestingly, startupAnnex is run after prepRunCommand, so if a command requests quiet output it would already be quiet. But the transferrer does not, instead it calls Annex.setOutput SerializedOutput in its start action.	2024-06-18 21:31:32 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete, when a PUT stores to additional repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	fb0fd78485	only use a remote as a node when git configuration is set Avoids someone writing to cluster.log and nominating remotes of someone else's repository as a cluster.	2024-06-18 11:37:38 -04:00
Joey Hess	f049156a03	checkpresent support for clusters This assumes that the proxy for a cluster has up-to-date location logs. If it didn't, it might proxy the checkpresent to a node that no longer has the content, while some other node still does, and so it would incorrectly appear that the cluster no longer contains the content. Since cluster UUIDs are not stored to location logs, git-annex fsck --fast when claiming to fix a location log when that occurred would not cause any problems. And presumably the location tracking would later get sorted out. At least usually, changes to the content of nodes go via the proxy, and it will update its location logs, so they will be accurate. However, if there were multiple proxies to the same cluster, or nodes were accessed directly (or via proxy to the node and not the cluster), the proxy's location log could certainly be wrong. (The location log access for GET has the same issues.)	2024-06-18 11:16:16 -04:00
Joey Hess	88d9a02f7c	initial, working support for getting from clusters Currently tends to put all the load on a single node, which will need to be improved.	2024-06-18 11:01:10 -04:00
Joey Hess	d34326ab76	factor out Annex.Proxy	2024-06-18 10:51:37 -04:00
Joey Hess	f0d6114286	refactor cluster code into own module	2024-06-18 10:36:04 -04:00
Joey Hess	64afbb0b93	don't count clusters as copies, continued Handled limitCopies, as well as everything using fromNumCopies and fromMinCopies. This should be everything, probably. Note that, git-annex info displays a count of repositories, which still includes cluster. I think that's ok. It would be possible to filter out clusters there, but to the user they're pretty much just another repository. The numcopies displayed by eg `git-annex info .` does not include clusters.	2024-06-16 15:14:53 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	36c6d8da69	don't count clusters as copies Since the cluster UUID is inserted into the location log when the location log lists a node as containing content. Also avoid trying to lock content on cluster remotes. The cluster nodes are also proxied, so that content can be locked on individual nodes, and locking content on a cluster as a whole probably won't be implemented. And made git-annex whereis use numcopies machinery for displaying its count, so it won't count cluster UUIDs redundantly to nodes. Other commands, like git-annex info that also display numcopies information already used the numcopies machinery. There is more to be done, fromNumCopies is sometimes used to get a number that is compared with a list of UUIDs. And limitCopies doesn't use numcopies machinery.	2024-06-16 14:17:56 -04:00
Joey Hess	a2f4a8eddf	proxying GET now working Memory use is small and constant; receiveBytes returns a lazy bytestring and it does stream. Comparing speed of a get of a 500 mb file over proxy from origin-origin, vs from the same remote over a direct ssh: joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin get bigfile (from origin-origin...) ok (recording state in git...) 1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k 0inputs+984320outputs (0major+10779minor)pagefaults 0swaps joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh get bigfile (from direct-ssh...) ok 1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k 0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps So the proxy doesn't add much overhead even when run on the same machine as the client and remote. Still, piping receiveBytes into sendBytes like this does suggest that the proxy could be made to use less CPU resources by using `sendfile()`.	2024-06-11 15:09:43 -04:00
Joey Hess	09b5e53f49	set annex.uuid in proxy's Repo getRepoUUID looks at that, and was seeing the annex.uuid of the proxy. Which caused it to unnecessarily set the git config. Probably also would have led to other problems.	2024-06-11 13:40:50 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	25a6ab6f11	Avoid grafting in export tree objects that are missing They could be missing due to an interrupted git-annex at just the wrong time during a prior graft, after which the tree objects got garbage collected. Or they could be missing because of manual messing with the git-annex branch, eg resetting it to back before the graft commit. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:51:50 -04:00
Joey Hess	b32c4c2e98	atomic git-annex branch update when regrafting in transition Fix a bug where interrupting git-annex while it is updating the git-annex branch could lead to git fsck complaining about missing tree objects. Interrupting git-annex while regraftexports is running in a transition that is forgetting git-annex branch history would leave the repository with a git-annex branch that did not contain the tree shas listed in export.log. That lets those trees be garbage collected. A subsequent run of the same transition then regrafts the trees listed in export.log into the git-annex branch. But those trees have been lost. Note that both sides of `if neednewlocalbranch` are atomic now. I had thought only the True side needed to be, but I do think there may be cases where the False side needs to be as well. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:34:10 -04:00
Joey Hess	b43c835def	instantiate remotes that are behind a proxy remote Untested, but this should be close to working. The proxied remotes have the same url but a different uuid. When talking to current git-annex-shell, it will fail due to a uuid mismatch. Once it supports proxies, it will know that the presented uuid is for a remote that it proxies for. The check for any git config settings for a remote with the same name as the proxied remote is there for several reasons. One is security: Writing a name to the proxy log should not cause changes to how an existing, configured git remote operates in a different clone of the repo. It's possible that the user has been using a proxied remote, and decides to set a git config for it. We can't tell the difference between that scenario and an evil remote trying to eg, intercept a file upload by replacing their remote with a proxied remote. Also, if the user sets some git config, does it override the config inherited from the proxy remote? Seems a difficult question. Luckily, the above means we don't need to think through it. This does mean though, that in order for a user to change the config of a proxy remote, they have to manually set its annex-uuid and url, as well as the config they want to change. They may also have to set any of the inherited configs that they were relying on.	2024-06-06 17:15:32 -04:00
Joey Hess	3318d25c65	adjust unlocked execute bit handling When building an adjusted unlocked branch, make pointer files executable when the annex object file is executable. This slows down git-annex adjust --unlock/--unlock-present by needing to stat all annex object files in the tree. Probably not a significant slowdown compared to other work they do, but I have not benchmarked. I chose to leave git-annex adjust --unlock marked as stable, even though get or drop of an object file can change whether it would make the pointer file executable. Partly because making it unstable would slow down re-adjustment, and partly for symmetry with the handling of an unlocked pointer file that is executable when the content is dropped, which does not remove its execute bit.	2024-05-28 12:39:42 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	adcebbae47	clean up git-remote-annex git-annex branch handling Implemented alternateJournal, which git-remote-annex uses to avoid any writes to the git-annex branch while setting up a special remote from an annex:: url. That prevents the remote.log from being overwritten with the special remote configuration from the url, which might not be 100% the same as the existing special remote configuration. And it prevents an overwrite deleting of other stuff that was already in the remote.log. Also, when the branch was created by git-remote-annex, only delete it at the end if nothing else has been written to it by another command. This fixes the race condition described in `797f27ab05`, where git-remote-annex set up the branch and git-annex init and other commands were run at the same time and their writes to the branch were lost.	2024-05-15 17:33:38 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	59fc2005ec	git clone support for git-remote-annex Also support using annex:: urls that specify the whole special remote config. Both of these cases need a special remote to be initialized enough to use it, which means writing to .git/config but not to the git-annex branch. When cloning, the remote is left set up in .git/config, so further use of it, by git-annex or git-remote-annex will work. When using git with an annex:: url, a temporary remote is written to .git/config, but then removed at the end. While that's a little bit ugly, the fact is that the Remote interface expects that it's ok to set git configs of the remote that is being initialized. And it's nowhere near as ugly as the alternative of making a temporary git repository and initializing the special remote in there. Cloning from a repository that does not contain a git-annex branch and then later running git-annex init is currently broken, although I've gotten most of the way there to supporting it. See cleanupInitialization FIXME. Special shout out to git clone for running gitremote-helpers with GIT_DIR set, but not in the git repository and with GIT_WORK_TREE not set. Resulting in needing the fixupRepo hack. Sponsored-by: unqueued on Patreon	2024-05-08 17:07:33 -04:00
Yaroslav Halchenko	87e2ae2014	run codespell throughout fixing typos automagically === Do not change lines below === { "chain": [], "cmd": "codespell -w", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:21 -04:00
Joey Hess	c410b2bb73	annex.maxextensions configuration Controls how many filename extensions to preserve. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-04-18 14:23:38 -04:00
Joey Hess	c64a73c7ea	startExternalAddonProcess add parameters Not used yet but intended to support eg running "rclone gitannex"	2024-04-17 13:09:10 -04:00
Joey Hess	2c73845d90	multiple -m second try Test suite passes this time. When committing the adjusted branch, use the old method to make a message that old git-annex can consume. Also made the code accept the new message, so that eventually commitTreeExactMessage can be removed. Sponsored-by: Kevin Mueller on Patreon	2024-04-09 12:56:47 -04:00
Joey Hess	a8dd85ea5a	Revert "multiple -m" This reverts commit `cee12f6a2f`. This commit broke git-annex init run in a repo that was cloned from a repo with an adjusted branch checked out. The problem is that findAdjustingCommit was not able to identify the commit that created the adjusted branch. It seems that there is an extra "\n" at the end of the commit message that it does not expect. Since backwards compatibility needs to be maintained, cannot just make findAdjustingCommit accept it with the "\n". Will have to instead have one commitTree variant that uses the old method, and use it for adjusted branch committing.	2024-04-02 17:29:07 -04:00
Joey Hess	cee12f6a2f	multiple -m sync, assist, import: Allow -m option to be specified multiple times, to provide additional paragraphs for the commit message. The option parser didn't allow multiple -m before, so there is no risk of behavior change breaking something that was for some reason using multiple -m already. Pass through to git commands, so that the method used to assemble the paragraphs is whatever git does. Which might conceivably change in the future. Note that git commit-tree has supported -m since git 1.7.7. commitTree was probably not using it since it predates that version. Since the configure script prevents building git-annex with git older than 2.1, there is no risk that it's not supported now. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-03-27 15:58:27 -04:00
Joey Hess	f601e06b90	avoid build warning on windows	2024-03-26 14:07:41 -04:00
Joey Hess	a69871491f	avoid build warning on windows Since append was only exported by Annex.Common on unix, excluding it from import caused a build warning on windows.	2024-03-26 13:16:33 -04:00
Joey Hess	f04d9574d6	fix transfer lock file for Download to not include uuid While redundant concurrent transfers were already prevented in most cases, it failed to prevent the case where two different repositories were sending the same content to the same repository. By removing the uuid from the transfer lock file for Download transfers, one repository sending content will block the other one from also sending the same content. In order to interoperate with old git-annex, the old lock file is still locked, as well as locking the new one. That added a lot of extra code and work, and the plan is to eventually stop locking the old lock file, at some point in time when an old git-annex process is unlikely to be running at the same time. Note that in the case of 2 repositories both doing eg `git-annex copy foo --to origin` the output is not that great: copy b (to origin...) transfer already in progress, or unable to take transfer lock git-annex: transfer already in progress, or unable to take transfer lock 97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe)) Transfer failed Perhaps that output could be cleaned up? Anyway, it's a lot better than letting the redundant transfer happen and then failing with an obscure error about a temp file, which is what it did before. And it seems users don't often try to do this, since nobody ever reported this bug to me before. (The "97%" there is actually how far along the other transfer is.) Sponsored-by: Joshua Antonishen on Patreon	2024-03-25 14:47:46 -04:00
Joey Hess	62129f0b24	fix windows transfer lock check If the lock file was not able to be exclusively locked, don't indicate locking failed. I'm pretty sure this was a typo. It goes all the way back to `891c85cd88` where locking was first introduced on windows, and there's no indication of why it would make sense to return True here. Sponsored-by: Leon Schuermann on Patreon	2024-03-25 14:11:25 -04:00
Joey Hess	9c988ee607	handle multiple VURL checksums in one pass git-annex fsck and some other commands that verify the content of a key were using the non-incremental verification interface. But for VURL urls, that interface is inefficient because when there are multiple equivalent keys, it has to separately read and checksum for each key in turn until one matches. It's more efficient for those to use the incremental interface, since the file can be read a single time. There's no real downside to using the incremental interface when available. Note that more speedup could be had for VURL, if it was able to calculate the checksum a single time and then compare with the equivalent keys' checksums, when the equivalent keys use the same type of checksum. Sponsored-by: k0ld on Patreon	2024-03-01 14:41:10 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety so VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	e7b7ea78af	lift isCryptographicallySecure to Annex Needed for VURL backend. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:14:13 -04:00
Joey Hess	0f7143d226	support VURL backend Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon	2024-02-29 13:48:51 -04:00
Joey Hess	70cb41028e	Pass --no-warnings to yt-dlp Noticed a warning with -J2 causing git-annex progress output to get slightly messed up. Error output would also probably do that, so perhaps it should capture stderr and only display it when yt-dlp exited nonzero? This option might also make sense for youtube-dl, I don't have an installation handy anymore to check.	2024-02-19 18:35:57 -04:00
Joey Hess	68e99513f0	added annex.commitmessage-command config Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:35:22 -04:00
Joey Hess	90db97d9a2	importfeed: Added --scrape option Which uses yt-dlp to screen scrape the equivalent of an RSS feed. Note that youtubedlscraped is a speed optimisation. Since yt-dlp found the urls, we know it can download them. That avoids calling youtubeDlSupported on each url, which makes --fast a lot faster. Almost all the same metadata fields and file formatting fields are populated, when yt-dlp is able to get the data. Note that yt-dlp has some additional useful metadata that could be exposed. But, much of it is specific to particular websites, and it would be hard to document on the git-annex importfeed man page. Sponsored-by: unqueued on Patreon	2024-01-30 15:37:29 -04:00
Joey Hess	2114253eaf	update comment The segfault seems to be fixed with git 2.43, I'm not sure what the affected range was.	2024-01-20 11:25:22 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	8da85fd3a3	RawFilePath conversion Sponsored-by: Dartmouth College's DANDI project	2024-01-19 14:26:21 -04:00
Joey Hess	703a70cafa	avoid watchFileSize running backward This is groundwork for using watchFileSize for downloads from external special remotes. In Annex.Content.downloadUrl, this potentially avoids jitter in the progress meter. When downloading with conduit, the meter gets updated based on both the size of the file, and on the data flowing through conduit. If that has not yet been flushed to the file, it seems possible for the meter to run backwards when meter is updated with the file size. It's probably only a few kb of jitter, so may not be visible. Sponsored-by: Dartmouth College's DANDI project	2024-01-19 14:11:27 -04:00
Joey Hess	df35f70801	tweak stall detection scaling Refactored to allow offline experimentation, and ended up changing the allowedvariation (aka fudge factor) to 3. 10 seems too high, and 1.5 too low. Scale earlier, so even if the first chunk takes less than the configured time period, allowance is made that later chunks might transfer slower. Decided to use the same allowedvariation to decide when to start scaling. Smoothed the scaling out. Some examples: ghci> upscale (BwRate 10 (Duration 60)) 25 BwRate 13 (Duration {durationSeconds = 75}) -- A small scaling upwards after 1/3rd the time. Not noticeable. ghci> upscale (BwRate 10 (Duration 60)) 60 BwRate 30 (Duration {durationSeconds = 180}) -- At the configured time, 3x scaling. ghci> upscale (BwRate 10 (Duration 60)) 120 BwRate 60 (Duration {durationSeconds = 360}) -- A typical upscaling, here a 1 minute duration became 6 minutes -- due to the first chunk taking 2 minutes to transfer. ghci> upscale (BwRate 10 (Duration 60)) 600 BwRate 300 (Duration {durationSeconds = 1800}) -- Here the first chunk took 10 minutes to transfer, so it will -- take 30 minutes to detect a stall. Sponsored-by: Dartmouth College's DANDI project	2024-01-19 12:58:41 -04:00
Joey Hess	c2634e7df2	automatically adjust stall detection period Improve annex.stalldetection to handle remotes that update progress less frequently than the configured time period. In particular, this makes remotes that don't report progress but are chunked work when transferring a single chunk takes longer than the specified time period. Any remotes that just have very low update granularity would also be handled by this. The change to Remote.Helper.Chunked avoids an extra progress update when resuming an interrupted upload. In that case, the code saw first Nothing and then Just the already transferred number of bytes, which defeated this new heuristic. This change will mean that, when resuming an interrupted upload to a chunked remote that does not do its own progress reporting, the progress display does not start out displaying the amount sent so far, until after the first chunk is sent. This behavior change does not seem like a major problem. About the scalefudgefactor, it seems reasonable to expect subsequent chunks to take no more than 1.5 times as long as the first chunk to transfer. Could set it to 1, but then any chunk taking a little longer would be treated as a stall. 2 also seems a likely value. Even 10 might be fine? Sponsored-by: Dartmouth College's DANDI project	2024-01-18 17:12:10 -04:00
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transferring unsized keys to other remotes, including ssh remotes and p2p remotes don't do disk size checking for unsized keys. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	11b9069dc2	bump copyright year after my first commit of 2024	2024-01-02 14:10:52 -04:00
Joey Hess	a5b9c2ca69	import: Sped up import from special remote when the imported tree is unchanged I saw a nearly 2 minute speed up from this, in a repo with 56000 files some of which are preferred content of the special remote and others not. In such a case, addBackExportExcluded has to do a lot of work, which is unnecessary when the tree is unchanged. When using sync --content, preferred content checking of that many files takes about 1 minute. So this speeds up sync --content by 3x. When using git-annex import, the speed up is much larger. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-01-02 13:57:31 -04:00
Joey Hess	9a67ed0f10	importtree: support preferred content expressions needing keys When importing from a special remote, support preferred content expressions that use terms that match on keys (eg "present", "copies=1"). Such terms are ignored when importing, since the key is not known yet. When "standard" or "groupwanted" is used, the terms in those expressions also get pruned accordingly. This does allow setting preferred content to "not (copies=1)" to make a special remote into a "source" type of repository. Importing from it will import all files. Then exporting to it will drop all files from it. In the case of setting preferred content to "present", it's pruned on import, so everything gets imported from it. Then on export, it's applied, and everything in it is left on it, and no new content is exported to it. Since the old behavior on these preferred content expressions was for importtree to error out, there's no backwards compatibility to worry about. Except that sync/pull/etc will now import where before it errored out.	2023-12-18 16:27:59 -04:00
Joey Hess	eb59da9dd2	Lower precision of timestamps in git-annex branch This can reduce the size of the branch by up to 8%. My test was running git-annex add 1000 times on one file each. Lots of different high-resolution timestamps were recorded before and eliminating those, after packing, the git repo was 8% smaller. Due to the use of vector clocks, high resolution timestamps are not necessary to make clear which information is most recent when eg, a value is changed repeatedly in the same second. In such a case, the vector clock will be advanced to the next second after the last modification. For example, running git-annex numcopies 1; git-annex numcopies 2 The first will record the current second, while the next records the second after that even if it runs in the same second. As for conflicting information written to two different clones of the repository, this will make git-annex sometimes pick information that was written earlier in a second over information written later in the same second. Usually git-annex does not write conflicting information, but there are some cases where it could. Eg, storing an object on a remote can update the remote state log with some state. If two repos both store the same object, and end up storing different remote state for some reason, this can result in one that ran a tiny bit later winning. Such a situation seems unlikely to be user visible. And a small amount of clock skew could already result in such things. The only case I can think of where this might be a user visible change is if a configuration command like git-annex numcopies is being run in 2 clones of a repository on the same machine at very close to the same time. Then the user will know which they ran last, and git-annex won't. If that did become a problem, this could be dialed back to eg log milliseconds with still some space saving.	2023-12-11 15:04:06 -04:00
Joey Hess	86dbe9a825	migrate: support adding size back to URL keys migrate: Support adding size to URL keys that were added with --relaxed, by running eg: git-annex migrate --backend=URL foo Since url keys cannot be generated, that used to fail. Make it notice that the backend is not changed, and just get the size of the content. Sponsored-by: Brock Spratlen on Patreon	2023-12-08 16:22:14 -04:00
Joey Hess	b65379a107	fix missing space in warning message	2023-12-08 12:36:33 -04:00
Joey Hess	f1ce15036f	started migrate --update This is most of the way there, but not quite working. The layout of migrate.tree/ needs to be changed to follow this approach. git log will list all the files in tree order, so the new layout needs to alternate old and new keys. Can that be done? git may not document tree order, or may not preserve it here. Alternatively, change to using git log --format=raw and extract the tree header from that, then use git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new That will be a little more expensive, but only when there are lots of migrations. Sponsored-by: Joshua Antonishen on Patreon	2023-12-07 15:50:52 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	fd0b510573	improve message about 1 copy "Could only verify the existence of 0 out of 1 necessary copy" does not sound right, but neither does it with "copies". Kept the "1" rather than "only" or such since numcopies is mentioned. Sponsored-by: Brock Spratlen on Patreon	2023-12-04 11:12:54 -04:00
Joey Hess	1654572bc1	fix --from overriding annex-ignore Make git-annex get/copy/move --from foo override configuration of remote.foo.annex-ignore, as documented. This already worked for remotes supporting hasKeyCheap. For others though, git-annex copy --from foo would silently not do anything, while git-annex copy --to foo would use the annex-ignored remote. Also improved the annex-ignore docs, to reflect that `git-annex get` without --from will skip using annex-ignored remotes, for example. Sponsored-by: Dartmouth College's DANDI project	2023-11-30 15:12:07 -04:00
Joey Hess	38b9ebc5fd	newtype MapLog Noticed that Semigroup instance of Map is not suitable to use for MapLog. For example, it behaved like this: ghci> parseTrustLog "foo 1 timestamp=10\nfoo 2 timestamp=11" <> parseTrustLog "foo X timestamp=12" fromList [(UUID "foo",LogEntry {changed = VectorClock 11s, value = SemiTrusted})] Which was wrong, it lost the newer DeadTrusted value. Luckily, nothing used that Semigroup when operating on a MapLog. And this provides a safe instance. Sponsored-by: Graham Spencer on Patreon	2023-11-13 14:37:22 -04:00
Joey Hess	be6b56df4c	remove unused import	2023-11-01 13:14:39 -04:00
Joey Hess	eb42935e58	Windows: Fix CRLF handling in some log files In particular, the mergedrefs file was written with CR added to each line, but read without CRLF handling. This resulted in each update of the file adding CR to each line in it, growing the number of lines, while also preventing the optimisation from working, so it remerged unnecessarily. writeFile and readFile do NewlineMode translation on Windows. But the ByteString conversion prevented that from happening any longer. I've audited for other cases of this, and found three more (.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All of those also only prevent optimisations from working. Some other files are currently both read and written with ByteString, but old git-annex may have written them with NewlineMode translation. Other files are at risk for breakage later if the reader gets converted to ByteString. This is a minimal fix, but should be enough, as long as I remember to use fileLines when splitting a ByteString into lines. This leaves files written using ByteString without CR added, but that's ok because old git-annex has no difficulty reading such files. When the mergedrefs file has gotten lines that end with "\r\r\r\n", this will eventually clean it up. Each update will remove a single trailing CR. Note that S8.lines is still used in eg Command.Unused, where it is parsing git show-ref, and similar in Git/*. git commands don't include CR in their output so that's ok. Sponsored-by: Joshua Antonishen on Patreon	2023-10-30 14:23:23 -04:00
Joey Hess	d9fd205cbb	push RawFilePath down into Annex.ReplaceFile Minor optimisation, but a win in every case, except for a couple where it's a wash. Note that replaceFile still takes a FilePath, because it needs to operate on Chars to truncate unicode filenames properly.	2023-10-26 13:36:49 -04:00
Joey Hess	c873586e14	eliminate s2w8 and w82s Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode characters. Luckily, genUUIDInNameSpace is only ever used on ASCII strings as far as I can determine. In particular, git-remote-gcrypt's gcrypt-id is an ASCII string.	2023-10-26 13:12:57 -04:00
Joey Hess	3742263c99	simplify base64 to only use ByteString Note the use of fromString and toString from Data.ByteString.UTF8 dated back to commit `9b93278e8a`. Back then it was using the dataenc package for base64, which operated on Word8 and String. But with the switch to sandi, it uses ByteString, and indeed fromB64' and toB64' were already using ByteString without that complication. So I think there is no risk of such an encoding related breakage. I also tested the case that `9b93278e8a` fixed: git-annex metadata -s foo='a …' x git-annex metadata x metadata x foo=a … In Remote.Helper.Encryptable, it was avoiding using Utility.Base64 because of that UTF8 conversion. Since that's no longer done, it can just use it now.	2023-10-26 13:10:05 -04:00
Joey Hess	0da1d40cd4	Improve memory use of --all when using annex.private This does not improve Annex.Branch.files at all, since it still uses ++ to combine the lists, so forcing all but the last one. But when there are a lot of files in the private journal, it does avoid --all (or a bare repo) from buffering the filenames in memory. See commit `653b719472` for prior discussion of this buffering. Sponsored-by: Graham Spencer on Patreon	2023-10-24 13:20:55 -04:00
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	c268dc5878	only stage regular files from the journal git-annex only writes regular files there, but other things may drop junk like empty .DAV directories around the tree. And trying to hash such things can have weird and hard to understand effects. So it seems best to do a small amount of work in statting the journal file to make sure it's a regular file. Sponsored-by: Jack Hill on Patreon	2023-10-10 13:22:02 -04:00
Joey Hess	724ceeb1a9	avoid unnecessary use of curl when conduit will do Avoid using curl when annex.security.allowed-ip-addresses is set but neither annex.web-options nor annex.security.allowed-url-schemes is set to a value that needs curl. Bug introduced in `840bd50390` Sponsored-By: Brock Spratlen on Patreon	2023-08-22 10:25:53 -04:00
Joey Hess	10b5f79e2d	fix empty tree import when directory does not exist Fix behavior when importing a tree from a directory remote when the directory does not exist. An empty tree was imported, rather than the import failing. Merging that tree would delete every file in the branch, if those files had been exported to the directory before. The problem was that dirContentsRecursive returned [] when the directory did not exist. Better for it to throw an exception. But in commit `74f0d67aa3` back in 2012, I made it never throw exceptions, because exceptions thrown inside unsafeInterleaveIO become untrappable when the list is being traversed. So, changed it to list the contents of the directory before entering unsafeInterleaveIO. So exceptions are thrown for the directory. But still not if it's unable to list the contents of a subdirectory. That's less of a problem, because the subdirectory does exist (or if not, it got removed after being listed, and it's ok to not include it in the list). A subdirectory that has permissions that don't allow listing it will have its contents omitted from the list still. (Might be better to have it return a type that includes indications of errors listing contents of subdirectories?) The rest of the changes are making callers of dirContentsRecursive use emptyWhenDoesNotExist when they relied on the behavior of it not throwing an exception when the directory does not exist. Note that it's possible some callers of dirContentsRecursive that used to ignore permissions problems listing a directory will now start throwing exceptions on them. The fix to the directory special remote consisted of not making its call in listImportableContentsM use emptyWhenDoesNotExist. So it will throw an exception as desired. Sponsored-by: Joshua Antonishen on Patreon	2023-08-15 12:57:41 -04:00
Joey Hess	be028f10e5	split out Utility.Url.Parse This is mostly for git-repair which can't include all of Utility.Url without adding many dependencies that are not really necessary.	2023-08-14 12:28:10 -04:00
Joey Hess	adda6c1088	Add git-annex remote refs that are not newer to the merged refs list Significant startup speed increase by avoiding repeatedly checking if some remote git-annex branch refs need to be merged when it is not newer. One way this could happen is when there are 2 remotes that are themselves connected. The git-annex branch on the first remote gets updated. Then the second remote pulls from the first, and merges in its git-annex branch. Then the local repo pulls from the second remote, and merges its git-annex branch. At this point, a pull from the first remote will get a git-annex branch that is not newer, but is not on the merged refs list. In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds. There were 5 to 10 such remote refs out of 18 remotes. Sponsored-by: Graham Spencer on Patreon	2023-08-09 13:31:36 -04:00
Joey Hess	3a52b4c4c3	fix hang when built with unix-2.8 git-annex test hang when running git-annex add in an adjusted unlocked branch. I couldn't seem to reproduce the hang outside the test suite. Seems that the code added in `26a9ea12d1` was buggy, and as that commit was made without testing it, building with unix-2.8 exposed the bug. I don't fully understand the bug, which involves fdToHandle and then closing the fd, vs closing the handle. May somehow involve laziness or forcing around the S.hGet? Using hClose solved it in any case. (Also eliminated checkcontentfollowssymlinks to fix a build warning when it's not used.)	2023-08-01 20:22:28 -04:00
Joey Hess	eb8e30a2f1	fix build with unix-2.8.0 got the arguments the wrong way around when I wrote this also squelch a build warning	2023-08-01 18:27:12 -04:00
Joey Hess	fa92383993	onlyingroup * Support "onlyingroup=" in preferred content expressions. * Support --onlyingroup= matching option. Sponsored-by: Jack Hill on Patreon	2023-07-31 14:43:58 -04:00
Joey Hess	473d66132d	display explanations in --debug too When --explain is not enabled. This can be useful debugging information as well. Sponsored-by: Dartmouth College's DANDI project	2023-07-31 13:06:40 -04:00
Joey Hess	846384fc3a	--explain for numcopies checks And closed the todo as completed. Sponsored-by: Dartmouth College's DANDI project	2023-07-31 12:53:17 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	cf40e2d4b6	Revert "use existing debug machinery for explain" This reverts commit `409572c9e4`.	2023-07-25 15:53:50 -04:00
Joey Hess	409572c9e4	use existing debug machinery for explain explain is a kind of debug message, but not formatted in the same way. So it makes sense to reuse the debug machinery for it, since that is already quite optimised. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 15:47:58 -04:00
Joey Hess	e82823d448	nub list of files yt-dlp when resumed was observed having written the same filename twice into the file list. Perhaps once by the first download and once by the resumed one?	2023-07-09 14:18:25 -04:00
Joey Hess	240bae38f6	sync: When in an adjusted branch, merge changes from the original branch This causes changes to the original branch to get merged with a single sync. Before, it took 2 syncs; the first happened to update the synced/ branch, and the second merged changes from the synced/ branch into the adjusted branch. Using mergeToAdjustedBranch when tomerge == origbranch is probably overkill, but it does work fine. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-07-06 12:42:24 -04:00
Joey Hess	adb09117f1	propigateAdjustedCommits: avoid overwriting diverged original branch Bug fix: Re-running git-annex adjust or sync when in an adjusted branch would overwrite the original branch, losing any commits that had been made to it since the adjusted branch was created. When git-annex adjust is run in this situation, it will display a warning about the diverged branches. When git-annex sync is run in this situation, mergeToAdjustedBranch will merge the changes from the original branch to the adjusted branch. So it does not need to display the divergence warning. Note that for some reason, I'm needing to run sync twice for that to happen. The first run does not do the merge and the second does. I'm unsure why and so am not fully done with this bug. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-07-05 17:09:49 -04:00
Joey Hess	a05bc6a314	Fix breakage when git is configured with safe.bareRepository = explicit Running git config --list inside .git then fails, so better to only do that when --git-dir was specified explicitly. Otherwise, when the repository is not bare, run the command inside the working tree. Also make init detect when the uuid it just set cannot be read and fail with an error, in case git changes something that breaks this later. I still don't actually understand why git-annex add/assist -J2 was affected but -J1 was not. But I did show that it was skipping writing to the location log, because the uuid was NoUUID. Sponsored-by: Graham Spencer on Patreon	2023-07-05 14:43:14 -04:00
Joey Hess	928b2a4839	create journal directory in withJournalHandle Fixes a crash by git-annex repair when .git/annex/journal/ does not exist. Normally the journal directory is created before withJournalHandle gets run, but git-annex repair can be run in a situation where it does not exist.	2023-06-21 15:23:59 -04:00
Joey Hess	72715845a1	display destination file before youtube-dl download Rather than after it, which can leave one wondering what file it's downloading. youtubeDl should not ever return Right Nothing in normal operation, because it's already asked youtube-dl if it supports the url. So it would have to succeed at that, then not download any file, but also exit successfully, in order for the new error message to display. Also display the name of yt-dlp when using it.	2023-06-20 14:55:25 -04:00
Joey Hess	a861d56428	httpalso: Support being used with special remotes that use chunking. Sponsored-by: k0ld on Patreon	2023-06-20 13:35:28 -04:00
Joey Hess	a36a81dea3	Improve resuming interrupted download when using yt-dlp Sometimes resuming an interrupted download will fail to resume and download more files with different names. That resulted in the workdir having multiple files at the end, which causes git-annex to give up because it does not know what was downloaded. To fix this, use a yt-dlp feature, which appends to a file the name of each file after it's finished downloading it. So the presence of other cruft in the workdir will not confuse git-annex.	2023-06-19 14:39:08 -04:00
Joey Hess	64738ea157	config: Added the --show-origin and --for-file options * config: Added the --show-origin and --for-file options. * config: Support annex.numcopies and annex.mincopies. There is a little bit of redundancy here with other code elsewhere that combines the various configs and selects which to use. But really only for the special case of annex.numcopies, which is a git config that does not override the annex branch setting and for annex.mincopies, which does not have a git config but does have gitattributes settings as well as the annex branch setting. That seems small enough, and unlikely enough to grow into a mess that it was worth supporting annex.numcopies and annex.mincopies in git-annex config --show-origin. Because these settings are a prime thing that someone might get confused about and want to know where they were configured. And, it followed that git-annex config might as well support those two for --set and --get as well. While this is redundant with the specialized commands, it's only a little code and it makes it more consistent. Note that --set does not have as nice output as numcopies/mincopies commands in some special cases like setting to 0 or a negative number. It does avoid setting to a bad value thanks to the smart constructors (eg configuredNumCopies). As for other git-annex branch configurations that are not set by git-annex config, things like trust and wanted that are specific to a repository don't map to a git config name, so don't really fit into git-annex config. And they are only configured in the git-annex branch with no local override (at least so far), so --show-origin would not be useful for them. Sponsored-by: Dartmouth College's DANDI project	2023-06-12 16:24:31 -04:00
Joey Hess	ae98fb1b31	move unspecifiedAttr check to checkAttr It just so happens that everywhere that checks attrs other than annex.largefiles parses the value further, and failed to parse unspecifiedAttr in a way that behaved the same as if nothing was set. So this is not a bug fix or behavior change. What it does do is prevent future uses of checkAttr from needing to remember to handle checking for this edge case. Sponsored-by: Dartmouth College's DANDI project	2023-06-12 14:37:42 -04:00
Joey Hess	532b227086	update exportdb tree in getImportableContents This avoids bottlenecking on git check-ignore in a particular situation. Also, there may have been a correctness issue with it not having updated it. When the exportdb is already up-to-date, this is not expensive. And the exportdb is updated elsewhere, so usually it is up-to-date. Sponsored-by: Joshua Antonishen on Patreon	2023-06-08 18:36:24 -04:00
Joey Hess	6821ba8dab	sync: use log to track adjusted branch needs updating Speeds up sync in an adjusted branch by avoiding re-adjusting the branch unncessarily, particularly when it is adjusted with --hide-missing or --unlock-present. When there are a lot of files, that was the majority of the time of a --no-content sync. Uses a log file, which is updated when content presence changes. This adds a little bit of overhead to every file get/drop when on such an adjusted branch. The overhead is minimal for get of any size of file, but might be noticable for drop in some cases. It seems like a reasonable trade-off. It would be possible to update the log file only at the end, but then it would not happen if the command is interrupted. When not in an adjusted branch, there should be no additional overhead. (getCurrentBranch is an MVar read, and it avoids the MVar read of getGitConfig.) Note that this does not deal with situations such as: git checkout master, git-annex get, git checkout adjusted branch, git-annex sync. The sync won't know that the adjusted branch needs to be updated. Dealing with that would add overhead to operation in non-adjusted branches, which I don't like. Also, there are other situations like having two adjusted branches that both need to be updated like this, and switching between them and sync not updating. This does mean a behavior change to sync, since it did previously deal with those situations. But, the documentation did not say that it did. The man pages only talk about sync updating the adjusted branch after it transfers content. I did consider making sync keep track of content it transferred (and dropped) and only update the adjusted branch then, not to catch up to other changes made previously. That would perform better. But it seemed rather hard to implement, and also it would have problems with races with a concurrent get/drop, which this implementation avoids. 
And it seemed pretty likely someone had gotten used to get/drop followed by sync updating the branch. It seems much less likely someone is switching branches, doing get/drop, and then switching back and expecting sync to update the branch. Re-running git-annex adjust still does a full re-adjusting of the branch, for anyone who needs that. Sponsored-by: Leon Schuermann on Patreon	2023-06-08 14:35:41 -04:00
Joey Hess	637f19bebb	fix adjusted branch update breakage Introduced recently in commit `64fc34b3da`. adjustBranch changes the sha that is recorded for the current branch (eg the adjusted branch). So, have to get the original sha before calling it. Sponsored-by: Jack Hill on Patreon	2023-06-08 13:33:58 -04:00
Joey Hess	64fc34b3da	narrow window where HEAD is detached Updating an adjusted branch can take a while when there are a lot of files. HEAD was detached at the start, so if eg git-annex sync was interrupted at the wrong point, there was a possibly wide window where it would leave the repo with HEAD detached. There's still a window, just much narrower. I don't know if it's possible to close the window entirely. While git can clearly update the currently checked out branch in eg git merge, it doesn't seem to provide another way to do it. Sponsored-by: Graham Spencer on Patreon	2023-06-07 11:10:54 -04:00
Joey Hess	fe1b2dfb4b	speed up very first tree import by 25% Reading from the cidsdb is responsible for about 25% of the runtime of an import. Since the cidmap is used to store the same information in ram, the cidsdb is not written to during an import any longer. And so, if it started off empty (and updateFromLog wasn't needed), those reads can just be skipped. This is kind of a cheesy optimisation, since after any import from any special remote, the database will no longer be empty, so it's a single use optimisation. But it's probably not uncommon to start by importing a lot of files, and it can save a lot of time then. Sponsored-by: Brock Spratlen on Patreon	2023-06-02 13:30:30 -04:00
Joey Hess	40017089f2	use importChanges optimisation Large speed up to importing trees from special remotes that contain a lot of files, by only processing changed files. Benchmarks: Importing from a special remote that has 10000 files, that have all been imported before, and 1 new file sped up from 26.06 to 2.59 seconds. An import with no change and 10000 unchanged files sped up from 24.3 to 1.99 seconds. Going up to 20000 files, an import with no changes sped up from 125.95 to 3.84 seconds. Sponsored-by: k0ld on Patreon	2023-06-01 13:47:00 -04:00
Joey Hess	c6acf574c7	implement importChanges optimisation (not used yet) For simplicity, I've not tried to make it handle History yet, so when there is a history, a full import will still be done. Probably the right way to handle history is to first diff from the current tree to the last imported tree. Then, diff from the current tree to each of the historical trees, and recurse through the history diffing from child tree to parent tree. I don't think that will need a record of the previously imported historical trees, and so Logs.Import doesn't store them. Although I did leave room for future expansion in that log just in case. Next step will be to change importTree to importChanges and modify recordImportTree et al to handle it, by using adjustTree. Sponsored-by: Brett Eisenberg on Patreon	2023-05-31 16:01:34 -04:00
Joey Hess	7298123520	build git trees using ContentIdentifier to speed up import This gets the trees built, but it does not use them. Next step will be to remember the tree for next time an import is done, and diff between old and new trees to find the files that have changed. Added --missing to the mktree parameters. That only disables a check, so it's ok to do everywhere mktree is used. It probably also speeds up mktree to disable the check. Note that git fsck does not complain about the resulting tree objects that point to shas that are not in the repository. Even with --strict. A quick benchmark, importing 10000 files, this slowed it down from 2:04.06 to 2:04.28. So it will more than pay for itself. Sponsored-by: Luke Shumaker on Patreon	2023-05-31 12:46:54 -04:00
Joey Hess	f6aa097a39	avoid import writing to cidsdb initially Speed up importing trees from special remotes somewhat by avoiding redundant writes to sqlite database. Before, import would write to both the git-annex branch and also to the sqlite database. But then the next time it was run, needsUpdateFromLog would see the branch had changed, so run updateFromLog, which would make the same writes to the sqlite database a second time. Now import writes only to the git-annex branch. The next time it's run, needsUpdateFromLog sees that the branch has changed and so calls updateFromLog, which updates the sqlite database. Why defer the write to the sqlite database like this? It seems that it could write to the database as it goes, and at the end call recordAnnexBranchTree to indicate that the information in the git-annex branch has all been written to the cidsdb. That would avoid the second import doing extra work. But, there could be other processes running at the same time, and one of them may update the git-annex branch, eg merging a remote git-annex branch into it. Any cids logs on that merged git-annex branch would not be reflected in the cidsdb yet. If the import then called recordAnnexBranchTree, the cidsdb would never get updated with that merged information. I don't think there's a good way to prevent, or to detect that situation. So, it can't call recordAnnexBranchTree at the end. So it might as well wait until the next run and do updateFromLog then. It could instead do updateFromLog at the end, but it's going to check needsUpdateFromLog at the beginning anyway. Note that the database writes were queued, so there is already a cidmap that is used to remember changes that the current process has made. So, omitting database writes can't change the behavior of the current process. Also note that thirdpartypopulatedimport uses recordcidkeyindb, which reflects what it already did. That code path does not use the cidmap, but does not need to query it either. 
It might be possible to make that code path also only update the git-annex branch and not the db, but I haven't checked. Sponsored-by: Noam Kremen on Patreon	2023-05-30 17:05:28 -04:00
Joey Hess	f2db6da938	default to yt-dlp and fix progress parsing bugs I noticed git-annex was using a lot of CPU when downloading from youtube, and was not displaying progress. Turns out that yt-dlp (and I think also youtube-dl) sometimes only knows an estimated size, not the actual size, and displays the progress output slightly differently for that. That broke the parser. And, the parser was feeding chunks that failed to parse back as a remainder, which caused it to try to re-parse the entire output each time, so it got slower and slower. Using --progress-template like this should avoid parsing problems as well as future proof against output changes. But it will work with only yt-dlp. So, this seemed like the right time to deprecate youtube-dl, and default to yt-dlp when available. git-annex will still use youtube-dl if that's all that's available. However, since the progress parser for youtube-dl was buggy, and I don't want to maintain two different progress parsers (especially since youtube-dl is no longer in debian unstable having been replaced by yt-dlp), made git-annex no longer try to parse youtube-dl's progress. Also, updated docs for yt-dlp being default. It did not seem worth renaming annex.youtube-dl-options and annex.youtube-dl-command. Note that yt-dlp does not seem to document the fields available in the progress template. I found them by reading the source and looking at the templates it uses internally. Also note that the use of "i" (rather than "s") in progressTemplate makes it display floats rounded to integers; particularly the estimated total size can be a float. That also does not seem to be documented but I assume is a python thing? Sponsored-by: Joshua Antonishen on Patreon	2023-05-27 13:04:53 -04:00
Joey Hess	aff37fc208	avoid annexFileMode special case This makes annexFileMode be just an application of setAnnexPerm', which avoids having 2 functions that do different versions of the same thing. Fixes some buggy behavior for some combinations of core.sharedRepository and umask. Sponsored-by: Jack Hill on Patreon	2023-04-27 15:58:37 -04:00
Joey Hess	67f8268b3f	Support core.sharedRepository=0xxx at long last Sponsored-by: Brett Eisenberg on Patreon	2023-04-26 17:03:29 -04:00
Joey Hess	0aa98aa09b	fix perms for core.sharedRepository These two missed setting it. It rarely matters that the journal gets the right perm. But, when using annex.alwayscommit=false, someone else may come along later and want to append to the journal file. It probably never matters what the sentinel perms are, but for completeness.. Sponsored-by: Luke Shumaker on Patreon	2023-04-26 16:29:11 -04:00
Joey Hess	7af75a59be	Warn about unsupported core.sharedRepository=0xxx when set This spams the user with a lot of messages, but it seems like busywork to avoid that and only warn once, since this warning will go away when it gets implemented. Also fix parsing of the octal value. Sponsored-by: Kevin Mueller on Patreon	2023-04-26 13:25:29 -04:00
Joey Hess	9155ed1072	configremote New command, currently limited to changing autoenable= setting of a special remote. It will probably never be used for more than that given the limitations on it. Sponsored-by: Brock Spratlen on Patreon	2023-04-18 15:30:49 -04:00
Joey Hess	fe5e586b72	rename Git.Filename to Git.Quote	2023-04-12 17:22:03 -04:00
Joey Hess	a576fc3b12	fix mojibake reversion in display of utf8 When displaying a ByteString like "💕", safeOutput operates on individual bytes like "\240\159\146\149" and isControl '\146' = True, so it got truncated to just "\240". So, only treat the low control characters, and DEL, as control characters. Also split Utility.Terminal out of Utility.SafeOutput. The latter needs win32, but Utility.SafeOutput is used by Control.Exception, which is used by Setup. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-04-12 13:53:30 -04:00
Joey Hess	c50aa21d5f	init: Avoid autoenabling special remotes that have control characters in their names I'm on the fence about this. Notice that pulling from a git remote can pull branches that have escape sequences in their names. Git will display those as-is. Arguably git should try harder to avoid that. But, names of remotes are usually up to the local user, and autoenable changes that, and so it makes sense that git chooses to display control characters in names of remotes, and so autoenable needs to guard against it. Sponsored-by: Graham Spencer on Patreon	2023-04-12 12:37:12 -04:00
Joey Hess	de68e3dd4f	allow tab in controlCharacterInFilePath Seems unlikely to have a tab in a path, but it's not a control character that needs to be prevented either. Left \n \r \v and \a as other non-threatening control characters that are still obnoxious to have in a filepath because of how it causes issues with display and/or with shell scripting.	2023-04-12 12:31:16 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	063c00e4f7	git style filename quoting for giveup When the filenames are part of the git repository or other files that might have attacker-controlled names, quote them in error messages. This is fairly complete, although I didn't do the one in Utility.DirWatcher.INotify.hs because that doesn't have access to Git.Filename or Annex. But it's also quite possible I missed some. And also while scanning for these, I found giveup used with other things that could be attacker controlled to contain control characters (eg Keys). So, I'm thinking it would also be good for giveup to just filter out control characters. This commit is then not the only line of defence, but just good formatting when git-annex displays a filename in an error message. Sponsored-by: Kevin Mueller on Patreon	2023-04-10 12:56:45 -04:00
Joey Hess	da83652c76	addurl --preserve-filename: reject control characters As well as escape sequences, control characters seem unlikely to be desired when doing addurl, and likely to trip someone up. So disallow them as well. I did consider going the other way and allowing filenames with control characters and escape sequences, since git-annex is in the process of escaping display of all filenames. Might still be a better idea? Also display the illegal filename git quoted when it rejects it. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-04-10 12:18:25 -04:00
Joey Hess	2ba1559a8e	git style quoting for ActionItemOther Added StringContainingQuotedPath, which is used for ActionItemOther. In the process, checked every ActionItemOther for those containing filenames, and made them use quoting. Sponsored-by: Graham Spencer on Patreon	2023-04-08 16:30:01 -04:00
Joey Hess	ac0345aa42	improve comments	2023-04-04 15:23:39 -04:00
Joey Hess	e3f5bd4ca6	Revert "override rather than setting user.name and user.email" This reverts commit `66eb63dd82`. git-annex init is the only thing that uses ensureCommit. So overriding there will make later commits to the git-annex branch or by git-annex sync fail. It's ugly that git-annex init sets user.name and user.email, but it only does it on systems that are badly configured.	2023-04-04 15:15:02 -04:00
Joey Hess	e91bf784cd	Support user.useConfigOnly git config When it's set and git cannot determine user.name or user.email, this will result in git-annex init failing when committing to create the git-annex branch. Other git-annex commands that commit can also fail. Sponsored-by: Jack Hill on Patreon	2023-04-04 15:12:52 -04:00
Joey Hess	66eb63dd82	override rather than setting user.name and user.email Avoid setting user.name and user.email in the git config when git is unable to detect them. git-annex has good reason to want to ensure git commit succeeds when eg committing to the git-annex branch. But it's not playing nice to set these values where other commands can see them. Sponsored-by: Brett Eisenberg on Patreon	2023-04-04 14:56:44 -04:00
Joey Hess	3eb51ee929	readFileStrict to avoid laziness bug Fix laziness bug introduced in last release that breaks use of --unlock-present and --hide-missing adjusted branches. Since there is a writeFile of the same file immediately after readFile, it may still have the file open for read (or may have happened to read it already and closed it). I was not able to reproduce the problem in brief testing, but this seems obvious. Sponsored-by: Luke Shumaker on Patreon	2023-04-04 14:25:01 -04:00
Joey Hess	22091d4765	fix comment	2023-03-28 13:40:17 -04:00
Joey Hess	a5709dcc22	Copy with a reflink when exporting a tree to a directory special remote Remote.Directory makes a temp file, then calls this, and since the temp file exists, it prevented probing if CoW works. Note that deleting the empty file does mean there's a small window for a race. If another process is also exporting to the remote, that could let it make the same temp file. However, the temp filename actually has the process's pid in it, which avoids that being a problem. This may have been a reversion caused by commits around `63d508e885`, but I haven't gone back and tested to be sure. The directory special remote had supposedly supported CoW for this going back to about half a year before that. Sponsored-by: Graham Spencer on Patreon	2023-03-28 13:09:14 -04:00
Joey Hess	24ae4b291c	addurl, importfeed: Fix failure when annex.securehashesonly is set The temporary URL key used for the download, before the real key is generated, was blocked by annex.securehashesonly. Fixed by passing the Backend that will be used for the final key into runTransfer. When a Backend is provided, have preCheckSecureHashes check that, rather than the key being transferred. Sponsored-by: unqueued on Patreon	2023-03-27 15:10:46 -04:00
Joey Hess	cb6cb61ca1	avoid build warning on windows	2023-03-27 12:20:35 -04:00
Joey Hess	291ad8f6b2	avoid build warning on windows	2023-03-27 12:19:26 -04:00
Joey Hess	2b5fa091e2	annex.maxextensionlength for view view: Support annex.maxextensionlength when generating filenames for the view branch. Note that refining an existing view will reuse the extension length that was configured when initially constructing the view. This is necessarily the case because it reuses the filenames. Also view files used to have all extensions at the end, no matter how many there were. Since annex.maxextensionlength's documentation includes that it's limited to 2 extensions, I made it consistent with that. Sponsored-by: k0ld on Patreon	2023-03-24 14:01:38 -04:00
Joey Hess	038a2600f4	Avoid leaving repo with a detached head when there is a failure checking out an updated adjusted branch I don't know of scenarios where that can happen (besides the bug fixed by the parent commit), but there probably are some. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2023-03-23 16:36:43 -04:00
Joey Hess	cb4d9f7b1f	run restagePointerFiles in adjustedBranchRefreshFull Avoid failure to update adjusted branch --unlock-present after git-annex drop when annex.adjustedbranchrefresh=1 At higher values, it did flush the queue, which ran restagePointerFiles. But at 1, adjustedBranchRefreshFull gets added to the queue, and while restagePointerFiles is also in the queue, it runs after that. Sponsored-by: Brock Spratlen on Patreon	2023-03-23 16:25:45 -04:00
Joey Hess	e822df2a09	fix build warnings on windows	2023-03-21 18:41:23 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Yaroslav Halchenko	0ae5ff797f	Typo: sansative -> sensitive	2023-03-17 15:14:50 -04:00
Yaroslav Halchenko	e018ae1125	Fix ambiguous typos	2023-03-17 15:14:47 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equivalents from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	bb54c8a633	support --hide-missing adjustment of view branches I had thought this would not make sense to combine with view branches, since removing files from a view changes metadata. However, that's committing removal of files. With --hide-missing, the files get removed when git-annex updates the branch itself, so there is no conflict. It does not seem likely to be very useful, but it does work! And that's nice because it means all types of adjusted branches can be combined with view branches. Sponsored-by: Max Thoursie on Patreon	2023-02-27 15:39:58 -04:00
Joey Hess	1c4f4b449a	support --unlock-present adjustment of view branches When generating the view, check if the key is present. When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull the same as is done when syncing in other adjusted branches. This is needed because the docs for git-annex adjust --unlock-present suggest using git-annex sync to update the branch when annex.adjustedbranchrefresh is not set. Note that, with annex.adjustedbranchrefresh set, it just works! The adjusted branch gets updated in the usual way and it doesn't matter that there's a view branch underneath. And of course, re-running git-annex adjust --unlock-present also works, as suggested in the docs. Sponsored-by: Erik Bjäreholt on Patreon	2023-02-27 15:37:57 -04:00
Joey Hess	7d839176c3	support generation of unlocked views Just make pointer files rather than symlinks, easy. As for the other adjustments: --lock is the default for views --fix happens automatically in views --hide-missing probably does not make sense when combined with views, because deleting a file from a view removes metadata --unlock-present will need a bit more work	2023-02-27 15:07:36 -04:00
Joey Hess	f09e299156	rawfilepath conversion	2023-02-27 15:06:32 -04:00
Joey Hess	cc32e31161	understand adjusted view branch names An adjusted view branch has a name like "refs/heads/adjusted/views/master(author=_)(unlocked)", so it is a view branch that has been converted to an adjusted branch. Made Logs.View support such branch names. So now git-annex sync and pre-commit handle updating metadata on commit in such a branch. Much remains to be done to fully support adjusted view branches, including actually applying the adjustment when updating the view branch. Sponsored-by: Graham Spencer on Patreon	2023-02-27 14:57:58 -04:00
Joey Hess	2a966f49f2	overwrite old adjusted view branch When git-annex adjust is run in a view branch, and the adjusted branch already exists, overwrite the old adjusted branch with the new one without being forced. Usually overwriting an adjusted branch is avoided because it could lose data. But when a view branch has been adjusted, there is no data to lose in the adjusted branch, because the only changes that can be made of significance are to move files between directories. Which changes metadata on commit. And the old branch has already been committed. Sponsored-by: Lawrence Brogan on Patreon	2023-02-27 14:35:27 -04:00
Joey Hess	9b1fe37818	improve adjusted branch name parsing to support adjusted view branches An adjusted view branch has a name like "adjusted/views/master(author=_)(unlocked)" and so the adjustment starts at the last open paren, not the first open paren. Note that git-annex sync still does not do anything useful when run in such a branch, because it does not realize that it is a view branch. This is only groundwork for adjusted view branches. This also fixes adjusted branches when the basis branch name contains parens for some other reason, though that is not common in a git branch name. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2023-02-27 14:09:05 -04:00
Joey Hess	da61d564f1	fix view reversion caused by optimisation view: Fix a reversion in 10.20230214 that omitted a file from a view when the file had no metadata set, but the view only used path fields. Sponsored-by: Jack Hill on Patreon	2023-02-16 15:18:17 -04:00
Joey Hess	826b225ca8	Sped up view branch construction by 50% A benchmark in my sound repository with `git-annex view feedtitle=*` took 2:52 wall clock time before and 1:58 after. Though it still only used 130% of CPU. This is the same kind of optimisation that is in seekFilteredKeys, though that precaches location logs while this streams the metadata logs direct to parsing them. seekFilteredKeys contains more streaming, to find the annexed files, and this could be further sped up with similar streaming. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-02-13 13:29:57 -04:00
Joey Hess	bb4550c7c1	sync: Warn when the adjusted basis ref cannot be found As happens eg when the user has renamed branches. Sponsored-by: Graham Spencer on Patreon	2023-02-10 14:33:21 -04:00
Joey Hess	5f9bf51438	sync in view branch updates the view branch * sync: When run in a view branch, refresh the view branch to reflect any changes that have been made to the parent branch or metadata. This is basically working, but probably needs some more work to deal with all the edge cases of things sync does. Sponsored-by: Lawrence Brogan on Patreon	2023-02-08 15:37:28 -04:00
Joey Hess	aa0350ff49	add directory to views for files that lack specified metadata * view: New field?=glob and ?tag syntax that includes a directory "_" in the view for files that do not have the specified metadata set. * Added annex.viewunsetdirectory git config to change the name of the "_" directory in a view. When in a view using the new syntax, old git-annex will fail to parse the view log. It errors with "Not in a view.", which is not ideal. But that only affects view commands. annex.viewunsetdirectory is included in the View for a couple of reasons. One is to avoid needing to warn the user that it should not be changed when in a view, since that would confuse git-annex. Another reason is that it helped with plumbing the value through to some pure functions. annex.viewunsetdirectory is actually mangled the same as any other view directory. So if it's configured to something like "N/A", there won't be multiple levels of directories, which would also confuse git-annex. Sponsored-By: Jack Hill on Patreon	2023-02-07 16:28:46 -04:00
Joey Hess	579d9b60c1	improve concurrency of move/copy --from --to Use separate stages for download and upload. In the common case where it downloads the file from one remote and then uploads to the other, those are by far the most expensive operations, and there's a decent chance the two remotes bottleneck on different resources. Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads will be started both downloading from the src remote. They will probably finish at the same time. Then two threads will be started uploading to the dst remote. They will probably take the same time as well. Before this change, it would alternate back and forth, bottlenecking on src and dst. With this change, as soon as the two threads start uploading to dst, two more threads are able to start, downloading from src. So bandwidth to both remotes is saturated more often. Other commands that use transferStages only send in one direction at a time. So the worker threads for the other direction will sit idle, and there will be no change in their behavior. Sponsored-by: Dartmouth College's DANDI project	2023-01-24 13:59:39 -04:00
Joey Hess	acc3f6211f	finishing up move --from --to Lock the local content for drop after getting it from src, to prevent another process from using the local content as a copy and dropping it from src, which would prevent dropping the local content after sending it to dest. Support resuming an interrupted move that downloaded the content from src, leaving the local content populated. In this case, the location log has not been updated to say the content is present locally, so we can assume that it's resuming and go ahead and drop the local content after sending it to dest. Note that if a `git-annex get` is being run at the same time as a `git-annex move --from --to`, it may get a file just before the move processes it. So the location log has not been updated yet, and the move thinks it's resuming. Resulting in local copy being dropped after it's sent to the dest. This race is something we'll just have to live with, it seems. I also gave up on the idea of checking if the location log had been updated by a `git-annex get` that is run at the same time. That wouldn't work, because the location log is precached in the seek stage, so reading it again after sending the content to dest would not notice changes made to it, unless the cache were invalidated, which would slow it down a lot. That idea anyway was subject to races where it would not detect the concurrent `git-annex get`. So concurrent `git-annex get` will have results that may be surprising. To make that less surprising, updated the documentation of this feature to be explicit that it downloads content to the local repository temporarily. Sponsored-by: Dartmouth College's DANDI project	2023-01-23 17:43:48 -04:00
Joey Hess	1abd457e98	push location log updating up to callers of download Prep for move --to --from, which needs to download from a src repo without updating the location log for the local repo, before sending the content on to the dest repo. Note that callers of download' already update the log themselves. See previous commit `a422a056f2` that pushed it up to download from getViaTmpFrom. (Also removed in passing a debug print + readline that I accidentally committed last week on this branch.) Sponsored-by: Dartmouth College's DANDI project	2023-01-23 13:47:41 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	2fa7656627	switch to readMaybe to handle values with leading number followed by non-number readish ignores a trailing string after a number, but to support values like "YYYY:MM:DD" which it makes sense to compare lexicographically, require the whole string to be parsed as a number in order to enable numeric comparison. Sponsored-by: Max Thoursie on Patreon	2022-12-22 14:33:47 -04:00
Joey Hess	9d60385001	convert renameFile to moveFile to support cross-device moves Improve handling of some .git/annex/ subdirectories being on other filesystems, in the bittorrent special remote, and youtube-dl integration, and git-annex addurl. The only one of these that I've confirmed to be a problem is in the bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are on different filesystems. As well as auditing for renameFile, also audited for createLink, all of those are ok as are the other remaining renameFile calls. Also audited all code paths that use .git/annex/othertmp, and did not find any other cross-device problems. So, removing mention of othertmp needing to be on the same device. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:17:50 -04:00
Joey Hess	aa6919737c	--metadata lexicographical comparisons Change --metadata comparisons < > <= and >= to fall back to lexicographical comparisons when one or both values being compared are not numbers. Sponsored-by: Erik Bjäreholt on Patreon	2022-12-12 13:33:24 -04:00
Joey Hess	65f9e7a3c7	fix deadlock in restagePointerFiles Fix a hang that occasionally occurred during commands such as move. (A bug introduced in 10.20220927, in commit `6a3bd283b8`) The restage.log was kept locked while running a complex index refresh action. In an unusual situation, that action could need to write to the restage log, which caused a deadlock. The solution is a two-stage process. First the restage.log is moved to a work file, which is done with the lock held. Then the content of the work file is read and processed, which happens without the lock being held. This is all done in a crash-safe manner. Note that streamRestageLog may not be fully safe to run concurrently with itself. That's ok, because restagePointerFiles uses it with the index lock held, so only one can be run at a time. streamRestageLog does delete the restage.old file at the end without locking. If a calcRestageLog is run concurrently, it will either see the file content before it was deleted, or will see it's missing. Either is ok, because at most this will cause calcRestageLog to report more work remains to be done than there is. Sponsored-by: Dartmouth College's Datalad project	2022-12-08 14:36:11 -04:00
Joey Hess	43f681d4c1	Support parsing yt-dlp output to display download progress Before this fix, no progress was displayed when yt-dlp was used. Sponsored-by: Graham Spencer on Patreon	2022-11-21 15:04:36 -04:00
Joey Hess	5256be61c1	When youtube-dl is not available in PATH, use yt-dlp instead Debian is going to drop youtube-dl which is not active upstream, and yt-dlp is the replacement. This will make it be used if youtube-dl gets removed. If an old version of youtube-dl remains installed, git-annex will still use it. That might not be desirable, but changing git-annex to use yt-dlp in preference to youtube-dl when both are installed risks breaking when the user has annex.youtube-dl-options set to something that is supported by youtube-dl, but not by yt-dlp. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-11-21 14:40:33 -04:00
Joey Hess	2b014f1a8b	don't frontload reconcileStaged in git-annex init init: Avoid scanning for annexed files, which can be lengthy in a large repository. Instead that scan is done on demand. This lets git-annex init be run and some query commands be used in a repository without waiting. Note that autoinit already behaved this way, so while this will mean some commands like git-annex get/unlock/add will do the scan the first time run, that is not really a significant behavior change. And, it's really better to have a consistent behavior. The reason for the inconsistency was a strange bug discussed in `b3c4579c79`. Avoiding reconcileStaged in init will keep avoiding whatever that was. Sponsored-by: Dartmouth College's DANDI project	2022-11-18 13:58:47 -04:00
Joey Hess	14f7a386f0	Make git-annex enable-tor work when using the linux standalone build Clean the standalone environment before running the su command to run "sh". Otherwise, PATH leaked through, causing it to run git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set, which caused that script to not work: [2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"] /home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found Changed programPath to not use GIT_ANNEX_PROGRAMPATH, but instead run the scripts at the top of GIT_ANNEX_DIR. That works both when the standalone environment is set up, and when it's not. Sponsored-by: Kevin Mueller on Patreon	2022-10-26 15:45:08 -04:00
Joey Hess	731e806c96	use lookupKeyStaged in --batch code paths Make --batch mode handle unstaged annexed files consistently whether the file is unlocked or not. Before this, an unstaged locked file would have the symlink on disk examined and operated on in --batch mode, while an unstaged unlocked file would be skipped. Note that, when not in batch mode, unstaged files are skipped over too. That is actually somewhat new behavior; as late as 7.20191114 a command like `git-annex whereis .` would operate on unstaged locked files and skip over unstaged unlocked files. That changed during optimisation of CmdLine.Seek with apparently little fanfare or notice. Turns out that rmurl still behaved that way when given an unstaged file on the command line. It was changed to use lookupKeyStaged to handle its --batch mode. That also affected its non-batch mode, but since that's just catching up to the change earlier made to most other commands, I have not mentioned that in the changelog. It may be that other uses of lookupKey should also change to lookupKeyStaged. But it may also be that would slow down some things, or lead to unwanted behavior changes, so I've kept the changes minimal for now. An example of a place where the use of lookupKey is better than lookupKeyStaged is in Command.AddUrl, where it looks to see if the file already exists, and adds the url to the file when so. It does not matter there whether the file is staged or not (when it's locked). The use of lookupKey in Command.Unused likewise seems good (and faster). Sponsored-by: Nicholas Golder-Manning on Patreon	2022-10-26 14:43:06 -04:00
Joey Hess	b2ee2496ee	remove whenAnnexed and ifAnnexed In preparation for adding a new variation on lookupKey. Sponsored-by: Max Thoursie on Patreon	2022-10-26 14:06:32 -04:00
Joey Hess	6fbd337e34	avoid unnecessary keys db writes; doubled speed! When running eg git-annex get, for each file it has to read from and write to the keys database. But it's reading exclusively from one table, and writing to a different table. So, it is not necessary to flush the write to the database before reading. This avoids writing the database once per file, instead it will buffer 1000 changes before writing. Benchmarking getting 1000 small files from a local origin, git-annex get now takes 13.62s, down from 22.41s! git-annex drop now takes 9.07s, down from 18.63s! Wowowowowowowow! (It would perhaps have been better if there were separate databases for the two tables. At least it would have avoided this complexity. Ah well, this is better than splitting the table in an annex.version upgrade.) Sponsored-by: Dartmouth College's Datalad project	2022-10-12 15:33:16 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done in Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex switched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses of onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	c2ad84b423	all keys are still present on versioned remote after import of a tree When importing from versioned remotes, fix tracking of the content of deleted files. Only S3 supports versioning so far, so only it was affected. But, the draft import/export interface for external remotes also seemed to need a change, so that versionedExport could be set.	2022-10-11 13:05:40 -04:00
Joey Hess	7059322a6c	Support "inbackend" in preferred content expressions Well, actually, fix a typo that has always been in the implementation of that. "inbacked" used to work, but let's not tell users about that; they might try to use it and expect git-annex to keep supporting the typo.. Sponsored-by: Jack Hill on Patreon	2022-09-26 16:06:49 -04:00
Joey Hess	b411a1ce74	remove unnecessary do block Left by Reiko's patch	2022-09-26 13:10:25 -04:00
Reiko Asakura	1d48153bb8	Run freeze and thaw hooks on crippled filesystems The user sets these hooks deliberately so they should always be run. For example this allows hooks to be used to manage file permissions on NTFS volumes in WSL1.	2022-09-26 13:05:39 -04:00
Joey Hess	98eb5ff84f	fix windows build	2022-09-26 12:08:04 -04:00
Joey Hess	e62e4eaaf2	refactor for legibility	2022-09-23 18:53:06 -04:00
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	f7146c153b	fix restaging of transferred files after stalldetection kicks in Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:55:40 -04:00
Joey Hess	6a3bd283b8	add restage log When pointer files need to be restaged, they're first written to the log, and then when the restage operation runs, it reads the log. This way, if the git-annex process is interrupted before it can do the restaging, a later git-annex process can do it. Currently, this lets a git-annex get/drop command be interrupted and then re-run, and as long as it gets/drops additional files, it will clean up after the interrupted command. But more changes are needed to make it easier to restage after an interrupted process. Kept using the git queue to run the restage action, even though the list of files that it builds up for that action is not actually used by the action. This could perhaps be simplified to make restaging a cleanup action that gets registered, rather than using the git queue for it. But I wasn't sure if that would cause visible behavior changes, when eg dropping a large number of files, currently the git queue flushes periodically, and so it restages incrementally, rather than all at the end. In restagePointerFiles, it reads the restage log twice, once to get the number of files and size, and a second time to process it. This seemed better than reading the whole file into memory, since potentially a huge number of files could be in there. Probably the OS will cache the file in memory and there will not be much performance impact. It might be better to keep running tallies in another file though. But updating that atomically with the log seems hard. Also note that it's possible for calcRestageLog to see a different file than streamRestageLog does. More files may be added to the log in between. That is ok, it will only cause the filterprocessfaster heuristic to operate with slightly out of date information, so it may make the wrong choice for the files that got added and be a little slower than ideal. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:47:24 -04:00
Joey Hess	8718125ae4	refactor the restage runner Sponsored-by: Dartmouth College's DANDI project	2022-09-23 13:12:17 -04:00
Joey Hess	6e3c9bea2e	drain transferrer read handle when shutting it down Fixes updating git index file after getting an unlocked file when annex.stalldetection is set. The transferrer may want to send additional protocol messages when it's shut down. Closing the read handle prevented it from doing that, and caused it to crash rather than cleanly shutting down. Draining the handle without processing the protocol seemed ok to do, because anything it outputs is going to be some side message displayed at shutdown. Displaying those once per transferrer process that is running seems unnecessary. Sponsored-by: Dartmouth College's DANDI project	2022-09-22 14:39:39 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	d2c842e9a1	don't force use of conduit in withUrlOptionsPromptingCreds Use curl for downloads from git remotes when annex.url-options and other git configs are set. If the url needs a password, curl will fail, and git credential will not be used to prompt for it. But the user can set --netrc in url-options and put the password in the netrc file. This also means that url-options settings like -4 will take effect. That was the case before commit `1883f7ef8f` forced conduit to be used.	2022-09-09 16:07:32 -04:00
Joey Hess	c62fe5e9a8	avoid redundant prompt for http password in git-annex get that does autoinit autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git remote has not been probed yet, that will do a http get that will prompt for a password. And then the parent process will subsequently prompt for a password when getting annexed files from the remote. So the solution is for autoEnableSpecialRemotes to run remoteList before the subprocess, which will probe for the uuid for the git remote in the same process that will later be used to get annexed files. But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git, so Annex.Init cannot import Remote.List. Had to pass remoteList into functions in Annex.Init to get around this dependency loop.	2022-09-09 14:43:43 -04:00
Joey Hess	9621beabc4	cache credentials in memory when doing http basic auth to a git remote When accessing a git remote over http needs a git credential prompt for a password, cache it for the lifetime of the git-annex process, rather than repeatedly prompting. The git-lfs special remote already caches the credential when discovering the endpoint. And presumably commands like git pull do as well, since they may download multiple urls from a remote. The TMVar CredentialCache is read, so two concurrent calls to getBasicAuthFromCredential will both prompt for a credential. There would already be two concurrent password prompts in such a case, and existing uses of `prompt` probably avoid it. Anyway, it's no worse than before.	2022-09-09 14:20:32 -04:00
Joey Hess	d4fd966396	avoid dup check of guardSafeToUseRepo Speeds up init slightly, and reduces the number of syscalls by the dynamic linker. Sponsored-by: Dartmouth College's Datalad project	2022-08-29 13:52:58 -04:00

2228 commits
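
The comparison rule described in commits `aa6919737c` and `2fa7656627` (compare numerically only when both values parse entirely as numbers, otherwise compare lexicographically) can be illustrated with a small sketch. This is not git-annex's code, which is written in Haskell and uses `readMaybe`; it is a Python approximation, and the names `parse_number` and `metadata_less_than` are hypothetical. It also assumes values are parsed as floats, which may accept or reject different strings than the real implementation does.

```python
from typing import Optional


def parse_number(value: str) -> Optional[float]:
    """Return a number only when the WHOLE string parses as one.

    Hypothetical helper: float() stands in for Haskell's readMaybe here,
    so a value such as "2022:12:22" is rejected rather than read as 2022.
    """
    try:
        return float(value)
    except ValueError:
        return None


def metadata_less_than(a: str, b: str) -> bool:
    """Sketch of a `<` comparison with a lexicographic fallback."""
    x, y = parse_number(a), parse_number(b)
    if x is not None and y is not None:
        # Both values are numbers: compare numerically.
        return x < y
    # One or both values are not numbers: compare as strings.
    return a < b


print(metadata_less_than("9", "10"))
print(metadata_less_than("2022:12:01", "2022:12:22"))
```

With this rule, date-like values such as "YYYY:MM:DD" sort as strings, while plain numbers still compare by magnitude.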