git-annex

Author	SHA1	Message	Date
Joey Hess	12a0ca9656	assistant: Fix a race condition that could cause a pointer file to get ingested into the annex This was caused by commit `fb8ab2469d` putting an isPointerFile check in the wrong place. So if the file was not a pointer file at that point, but got replaced by one before the file got locked down, the pointer file would be ingested into the annex. The fix is simply to move the isPointerFile check to after safeToAdd locks down the file. Now if the file changes to a pointer file after the isPointerFile check, ingestion will see that it changed after lockdown, and will refuse to add it to the annex. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-07-02 12:25:30 -04:00
Joey Hess	20ebb54b6f	prep release	2024-07-01 15:13:10 -04:00
Joey Hess	0033e6c0a6	Tab completion of many commands like info and trust now includes remotes Especially useful with proxied remotes and clusters, where the user may not be entirely familiar with the name and can learn by tab completion.	2024-06-30 12:39:18 -04:00
Joey Hess	dbfff04fb6	update for clusters	2024-06-27 12:47:26 -04:00
Joey Hess	0ef4183b00	Merge branch 'master' into proxy	2024-06-27 12:41:57 -04:00
Joey Hess	19137ae780	avoid unfiltered debugging from git-annex-shell When --debugfilter or annex.debugfilter is set, avoid propagating debug output from git-annex-shell, since it cannot be filtered. It would be possible to pass --debugfilter on to git-annex-shell, but it only started accepting that option in 2022. So it would break interop with older versions.	2024-06-27 12:37:25 -04:00
Joey Hess	f18740699e	P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS is complete: when a PUT stores to more repositories than the expected one, the location log is updated with the additional UUIDs that contain the content. Started implementing PUT fanout to multiple remotes for clusters. It is untested, and I fear fencepost errors in the relative offset calculations. And it is missing proxying for the protocol after DATA.	2024-06-18 16:21:40 -04:00
Joey Hess	3970bbb03b	Merge branch 'master' into proxy	2024-06-17 09:29:34 -04:00
Joey Hess	af79728ac3	tab complete special remotes An oversight. And with the work in progress proxy and cluster, there can be additional remotes that are not listed in .git/config, but are available. Making those more discoverable is another big benefit of this.	2024-06-17 09:26:03 -04:00
Joey Hess	570ceffe8d	broke out initcluster One benefit of this is that a typo in annex-cluster-node config won't init a new cluster. Also it gets the cluster description set and is consistent with initremote.	2024-06-14 17:23:11 -04:00
Joey Hess	bbf261487d	add git-annex updatecluster command Seems to work fine, making the right changes to the git-annex branch.	2024-06-14 15:02:01 -04:00
Joey Hess	2844230dfe	add git configs for clusters	2024-06-14 12:20:17 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	b32c4c2e98	atomic git-annex branch update when regrafting in transition Fix a bug where interrupting git-annex while it is updating the git-annex branch could lead to git fsck complaining about missing tree objects. Interrupting git-annex while regraftexports is running in a transition that is forgetting git-annex branch history would leave the repository with a git-annex branch that did not contain the tree shas listed in export.log. That lets those trees be garbage collected. A subsequent run of the same transition then regrafts the trees listed in export.log into the git-annex branch. But those trees have been lost. Note that both sides of `if neednewlocalbranch` are atomic now. I had thought only the True side needed to be, but I do think there may be cases where the False side needs to be as well. Sponsored-by: Dartmouth College's OpenNeuro project	2024-06-07 16:34:10 -04:00
Joey Hess	f97f4b8bdb	Added updateproxy command and remote.name.annex-proxy configuration So far this only records proxy information on the git-annex branch.	2024-06-04 14:52:03 -04:00
Joey Hess	da2c02162c	Fix Windows build with Win32 2.13.4+ Thanks, Oleg Tolmatcev	2024-06-03 13:04:15 -04:00
Joey Hess	abbb8f6bbf	releasing package git-annex version 10.20240531	2024-05-31 12:32:34 -04:00
Joey Hess	aeedca70ca	prep release	2024-05-30 17:53:33 -04:00
Joey Hess	98762a2f96	group: Added --list option Seemed to make sense to exclude groups used only by dead repositories.	2024-05-29 13:37:35 -04:00
Joey Hess	3318d25c65	adjust unlocked execute bit handling When building an adjusted unlocked branch, make pointer files executable when the annex object file is executable. This slows down git-annex adjust --unlock/--unlock-present by needing to stat all annex object files in the tree. Probably not a significant slowdown compared to other work they do, but I have not benchmarked. I chose to leave git-annex adjust --unlock marked as stable, even though get or drop of an object file can change whether it would make the pointer file executable. Partly because making it unstable would slow down re-adjustment, and partly for symmetry with the handling of an unlocked pointer file that is executable when the content is dropped, which does not remove its execute bit.	2024-05-28 12:39:42 -04:00
Joey Hess	22bf23782f	initremote, enableremote: Added --with-url to enable using git-remote-annex Also sets remote.name.fetch to a typical value, same as git remote add does.	2024-05-24 14:29:36 -04:00
Joey Hess	434a88c368	Merge branch 'git-remote-annex'	2024-05-15 17:57:50 -04:00
Joey Hess	768cdee461	testremote: Really fsck downloaded objects `8844372c23` exposed a bug in testremote: it was passing the serialized key, not the object file, to be checksummed.	2024-05-15 17:57:27 -04:00
Joey Hess	468de43d66	Merge branch 'master' into git-remote-annex	2024-05-15 17:49:12 -04:00
Joey Hess	0281f7f23e	Avoid the --fast option preventing checksumming in some cases it was not supposed to fsck --fast was intended to disable checksumming, but checksumming is done after transfers too. Due to the check being in the non-incremental path, it would only affect non-incremental checksumming during a transfer, and I'm not 100% sure that it was a problem. Also, when using an external backend that does checksumming, fsck --fast didn't disable it and now does.	2024-05-12 21:36:48 -04:00
Joey Hess	05684bdd6c	fsck: Fix recent reversion that made it say it was checksumming files whose content is not present Did not track down the commit that caused the problem, but git-annex version 10.20240430 didn't behave that way.	2024-05-12 21:23:27 -04:00
Joey Hess	dfb09ad1ad	preparing to merge git-remote-annex Update its todo with remaining items. Add changelog entry. Simplified internals document to no longer be notes to myself, but target users who want to understand how the data is stored and might want to extract these repos manually. Sponsored-by: Kevin Mueller on Patreon	2024-05-10 15:06:15 -04:00
Joey Hess	9dea552f9b	changelog for typo fixes Since a few affected output messages.	2024-05-01 15:47:28 -04:00
Yaroslav Halchenko	9c2ab31549	Fix compatable typo (yet to add to codespell) === Do not change lines below === { "chain": [], "cmd": "git-sedi compatable compatible", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^	2024-05-01 15:46:25 -04:00
Joey Hess	d6ad5b9b50	releasing package git-annex version 10.20240430	2024-04-30 15:27:31 -04:00
Joey Hess	f3cca8a9f8	applied patch	2024-04-30 15:17:38 -04:00
Joey Hess	c410b2bb73	annex.maxextensions configuration Controls how many filename extensions to preserve. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-04-18 14:23:38 -04:00
Joey Hess	d372553540	rclone special remote Added rclone special remote, which can be used without needing to install the git-annex-remote-rclone program. This needs a new version of rclone, which supports "rclone gitannex". This is implemented as a variant of an external special remote, that runs "rclone gitannex" instead of the usual git-annex-remote- command. Parameterized Remote.External to support that. Sponsored-by: Luke T. Shumaker on Patreon	2024-04-17 15:20:37 -04:00
Joey Hess	2c73845d90	multiple -m second try Test suite passes this time. When committing the adjusted branch, use the old method to make a message that old git-annex can consume. Also made the code accept the new message, so that eventually commitTreeExactMessage can be removed. Sponsored-by: Kevin Mueller on Patreon	2024-04-09 12:56:47 -04:00
Joey Hess	a8dd85ea5a	Revert "multiple -m" This reverts commit `cee12f6a2f`. This commit broke git-annex init run in a repo that was cloned from a repo with an adjusted branch checked out. The problem is that findAdjustingCommit was not able to identify the commit that created the adjusted branch. It seems that there is an extra "\n" at the end of the commit message that it does not expect. Since backwards compatibility needs to be maintained, cannot just make findAdjustingCommit accept it with the "\n". Will have to instead have one commitTree variant that uses the old method, and use it for adjusted branch committing.	2024-04-02 17:29:07 -04:00
Joey Hess	cee12f6a2f	multiple -m sync, assist, import: Allow -m option to be specified multiple times, to provide additional paragraphs for the commit message. The option parser didn't allow multiple -m before, so there is no risk of behavior change breaking something that was for some reason using multiple -m already. Pass through to git commands, so that the method used to assemble the paragraphs is whatever git does. Which might conceivably change in the future. Note that git commit-tree has supported -m since git 1.7.7. commitTree was probably not using it since it predates that version. Since the configure script prevents building git-annex with git older than 2.1, there is no risk that it's not supported now. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-03-27 15:58:27 -04:00
Joey Hess	377e9fff18	fix typo	2024-03-27 12:45:40 -04:00
Joey Hess	7c5007279c	Windows: Fix escaping output to terminal when using old versions of MinTTY	2024-03-26 13:09:21 -04:00
Joey Hess	f04d9574d6	fix transfer lock file for Download to not include uuid While redundant concurrent transfers were already prevented in most cases, it failed to prevent the case where two different repositories were sending the same content to the same repository. By removing the uuid from the transfer lock file for Download transfers, one repository sending content will block the other one from also sending the same content. In order to interoperate with old git-annex, the old lock file is still locked, as well as locking the new one. That added a lot of extra code and work, and the plan is to eventually stop locking the old lock file, at some point in time when an old git-annex process is unlikely to be running at the same time. Note that in the case of 2 repositories both doing eg `git-annex copy foo --to origin` the output is not that great: copy b (to origin...) transfer already in progress, or unable to take transfer lock git-annex: transfer already in progress, or unable to take transfer lock 97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe)) Transfer failed Perhaps that output could be cleaned up? Anyway, it's a lot better than letting the redundant transfer happen and then failing with an obscure error about a temp file, which is what it did before. And it seems users don't often try to do this, since nobody ever reported this bug to me before. (The "97%" there is actually how far along the other transfer is.) Sponsored-by: Joshua Antonishen on Patreon	2024-03-25 14:47:46 -04:00
Joey Hess	dee249ac51	fix name of option	2024-03-22 10:54:14 -04:00
Joey Hess	016d1bee88	add reregisterurl command What this can currently be used for is only to change an url from being used by a special remote to being used by the web remote. This could have been a --move-from option to registerurl. But, that would have complicated its option and --batch processing, and also would have complicated unregisterurl, which is implemented on top of Command.Registerurl. So, a separate command was actually less complicated to implement. The generic description of the command is because I want to make this command a catch-all for other url updating kind of things, if there are ever any more. Also because it was hard to come up with a good name for the specific action. I considered `git-annex moveurl`, but that seems to indicate data is perhaps actually being moved, and seems to sit at the same level as addurl and rmurl, and this command is at the plumbing level of registerurl and unregisterurl. Sponsored-by: Dartmouth College's DANDI project	2024-03-05 15:06:14 -04:00
Joey Hess	def94fbff6	update	2024-03-01 13:48:51 -04:00
Joey Hess	0f7143d226	support VURL backend Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon	2024-02-29 13:48:51 -04:00
Joey Hess	c2d6c02c27	Added dependency on unbounded-delays And stop vendoring part of it. This is a free dependency because tasty depends on it. Sponsored-by: Leon Schuermann on Patreon	2024-02-27 13:11:59 -04:00
Joey Hess	bee3abab14	releasing package git-annex version 10.20240227	2024-02-27 13:02:17 -04:00
Joey Hess	70cb41028e	Pass --no-warnings to yt-dlp Notice a warning with -J2 causing git-annex progress output to get slightly messed up. Error output would also probably do that, so perhaps it should capture stderr and only display it when yt-dlp exited nonzero? This option might also make sense for youtube-dl, I don't have an installation handy anymore to check.	2024-02-19 18:35:57 -04:00
Joey Hess	3475b09c3e	pre-commit: Avoid committing the git-annex branch Except when a commit is made in a view, which changes metadata. Make the assistant commit the git-annex branch after git commit of working tree changes. This allows using the annex.commitmessage-command in the assistant to generate a commit message for the git-annex branch that relies on state gathered during the commit of the working tree. Eg, it might reuse the commit message. Note that, when not using the assistant, a git-annex add still commits the git-annex branch, so such an annex.commitmessage-command setup would not work then. But if someone is using the assistant and wants programmatic control over commit messages, this is useful. Someone not using the assistant can get the same result by using annex.alwayscommit=false during the git-annex add, and git-annex merge after they git commit. pre-commit was never really intended to commit the git-annex branch (except after recording changed metadata), but the assistant did sort of rely on it. It does later commit the git-annex branch before pushing to remotes, but I didn't want to risk building up lots of uncommitted changes to it if that didn't happen frequently. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:42:11 -04:00
Joey Hess	68e99513f0	added annex.commitmessage-command config Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-12 14:35:22 -04:00
Joey Hess	21123ba368	assistant, undo: When committing, let the usual git commit hooks run Was doing a Git.Branch.commit for historical reasons to do with direct mode, which no longer apply. Note that the preCommitAnnexHook is no longer called in commitStaged because git-annex installs a pre-commit hook that runs the pre-commit-annex hook. And git commit will run the pre-commit hook. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-02-07 16:15:35 -04:00
Joey Hess	4f8fcf707d	stack.yaml: Update to lts-22.9 and use crypton.	2024-02-06 11:08:12 -04:00
Joey Hess	6b38d0c427	addurl, importfeed: Added --raw-except option --raw-except=web allows using yt-dlp but not any other special remotes. Currently this option can only be used once, trying to use it repeatedly will make option parsing fail. Perhaps it ought to support being used more than once, but it seemed like an unlikely use case to need that. Note that getParsed is called repeatedly when the option is used with several urls. While implementing DeferredParseClass would avoid that inefficiency, it didn't seem worth the added boilerplate since getParsed only calls byNameWithUUID which does minimal work. Sponsored-by: Dartmouth College's DANDI project	2024-02-05 15:16:25 -04:00
Joey Hess	2f3fe4d904	fix importfeed --force skip behavior reversion importfeed --force: Don't treat it as a failure when an already downloaded file exists. (Fixes a behavior change introduced in 10.20230626.) `04ee6c4c6b` caused the reversion. Inside a CommandPerform, stop causes it to fail. Before that commit, it was inside a CommandStart, where stop causes it to skip.	2024-02-02 15:57:07 -04:00
Joey Hess	0c64cd30c2	compare urls irrespective of downloader importfeed --force: Avoid creating duplicates of existing already downloaded files when yt-dlp or a special remote was used.	2024-02-02 15:50:56 -04:00
Joey Hess	90db97d9a2	importfeed: Added --scrape option Which uses yt-dlp to screen scrape the equivalent of an RSS feed. Note that youtubedlscraped is a speed optimisation. Since yt-dlp found the urls, we know it can download them. That avoids calling youtubeDlSupported on each url, which makes --fast a lot faster. Almost all the same metadata fields and file formatting fields are populated, when yt-dlp is able to get the data. Note that yt-dlp has some additional useful metadata that could be exposed. But, much of it is specific to particular websites, and it would be hard to document on the git-annex importfeed man page. Sponsored-by: unqueued on Patreon	2024-01-30 15:37:29 -04:00
Joey Hess	d61633e183	releasing package git-annex version 10.20240129	2024-01-29 14:12:12 -04:00
Joey Hess	0b8ba37d12	improve changelog	2024-01-25 14:28:19 -04:00
Joey Hess	8e9ee31621	webapp: Added --port option, and annex.port config The getSocket comment that mentioned using ":port" in the hostname seems to have been incorrect or be out of date. After all, the bug report came when the user first tried doing that, and it didn't work. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-01-25 14:08:36 -04:00
Joey Hess	b9e147d282	Added --expected-present file matching option	2024-01-25 12:56:41 -04:00
Joey Hess	20567e605a	add directional stalldetection and bwlimit configs Sponsored-by: Dartmouth College's DANDI project	2024-01-19 15:27:53 -04:00
Joey Hess	c02df79248	use watchFileSize in Remote.External.retrieveKeyFile external: Monitor file size when getting content from external special remotes and use that to update the progress meter, in case the external special remote program does not report progress. This relies on `703a70cafa` to prevent ever running the meter backwards. Sponsored-by: Dartmouth College's DANDI project	2024-01-19 14:34:30 -04:00
Joey Hess	c2634e7df2	automatically adjust stall detection period Improve annex.stalldetection to handle remotes that update progress less frequently than the configured time period. In particular, this makes remotes that don't report progress but are chunked work when transferring a single chunk takes longer than the specified time period. Any remotes that just have very low update granularity would also be handled by this. The change to Remote.Helper.Chunked avoids an extra progress update when resuming an interrupted upload. In that case, the code saw first Nothing and then Just the already transferred number of bytes, which defeated this new heuristic. This change will mean that, when resuming an interrupted upload to a chunked remote that does not do its own progress reporting, the progress display does not start out displaying the amount sent so far, until after the first chunk is sent. This behavior change does not seem like a major problem. About the scalefudgefactor, it seems reasonable to expect subsequent chunks to take no more than 1.5 times as long as the first chunk to transfer. Could set it to 1, but then any chunk taking a little longer would be treated as a stall. 2 also seems a likely value. Even 10 might be fine? Sponsored-by: Dartmouth College's DANDI project	2024-01-18 17:12:10 -04:00
Joey Hess	e765d3e24c	import: --message/-m option	2024-01-18 12:41:44 -04:00
Joey Hess	f6cf2dec4c	disk free checking for unsized keys Improve disk free space checking when transferring unsized keys to local git remotes. Since the size of the object file is known, can check that instead. Getting unsized keys from local git remotes does not check the actual object size. It would be harder to handle that direction because the size check is run locally, before anything involving the remote is done. So it doesn't know the size of the file on the remote. Also, transfers of unsized keys to other remotes, including ssh remotes and p2p remotes, don't do disk size checking. This would need a change in protocol. (It does seem like it would be possible to implement the same thing for directory special remotes though.) In some sense, it might be better to not ever do disk free checking for unsized keys, than to do it only sometimes. A user might notice this direction working and consider it a bug that the other direction does not. On the other hand, disk reserve checking is not implemented for most special remotes at all, and yet it is implemented for a few, which is also inconsistent, but best effort. And so doing this best effort seems to make some sense. Fundamentally, if the user wants the size to always be checked, they should not use unsized keys. Sponsored-by: Brock Spratlen on Patreon	2024-01-16 14:29:10 -04:00
Joey Hess	7e69063a29	support annex.shared-sop-command for encryption=shared This works well, and it interoperates with gpg in my testing (although some SOP commands might choose to use a profile that does not so caveat emptor). Note that for creating the Cipher, gpg --gen-random is still used. SOP does not have an equivalent, and as long as the user has gpg around, which seems likely, it doesn't matter that it uses gpg here, it's not being used for encryption. That seemed better than implementing a second way to get high quality entropy, at least for now. The need for the sop command to run in an empty directory has each call to encrypt and decrypt creating a new temporary directory. That is some unnecessary overhead, though probably swamped by the overhead of running the sop command. This could be improved in the future by passing an already empty directory to them, or a sufficiently empty directory (.git/annex/tmp would probably suffice). Sponsored-by: Brett Eisenberg on Patreon	2024-01-12 13:31:18 -04:00
Joey Hess	d98f02a5b0	test annex.shared-sop-command Test a specified Stateless OpenPGP command with eg: git-annex test --test-git-config annex.shared-sop-command=sqop Also documented that config and another one, but so far only the test suite uses the configs, have not yet implemented using it for actual symmetric encryption. Sponsored-by: Joshua Antonishen on Patreon	2024-01-10 16:30:38 -04:00
Joey Hess	de6a297d36	assistant: When generating a gpg secret key, avoid hardcoding the key algorithm and size This aims to future-proof gpg key generation. OpenPGP is in flux with a conflict over standards ongoing. It seems not unlikely that different systems will have different gpg commands that support different algorithms. This also simplifies the code by using the --quick-gen-key interface rather than the experimental batch interface. It seems less likely that --quick-gen-key will break than an experimental interface (whose documentation I can no longer find). --quick-gen-key is supported since gpg 2.1.0 (2014). Sponsored-by: Graham Spencer on Patreon	2024-01-09 15:31:53 -04:00
Joey Hess	2c86651180	optimise adjustTree when adding many TreeItems The old code traversed the list of addtreeitems once per subdirectory in the tree, so could get quite slow. Converting to Map lookups sped it up significantly. In my test case, git-annex import used to take about 2 minutes, when calling adjustTree to add back excluded files to the imported tree. This dropped it down to 6 seconds. Of which 4 seconds are the actual enumeration of the contents of the remote, so really only 2 seconds for this. The path prefix map is a bit suboptimal memory-wise, since items get stored in the map once per subdirectory on the path to the item. It would perhaps be better to use a tree data structure. Also it's suboptimal memory-wise that it builds two maps, as well as retaining a reference to addtreeitems. I could not see a way around that though. Sponsored-by: Luke T. Shumaker on Patreon	2024-01-03 15:07:49 -04:00
Joey Hess	a5b9c2ca69	import: Sped up import from special remote when the imported tree is unchanged I saw a nearly 2 minute speed up from this, in a repo with 56000 files some of which are preferred content of the special remote and others not. In such a case, addBackExportExcluded has to do a lot of work, which is unnecessary when the tree is unchanged. When using sync --content, preferred content checking of that many files takes about 1 minute. So this speeds up sync --content by 3x. When using git-annex import, the speed up is much larger. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-01-02 13:57:31 -04:00
Joey Hess	a4a5ec6366	info: Added "annex sizes of repositories" table to the overall display Thanks to previous work in `11cc9f1933`, this is almost entirely free, it only needs to do some additional map lookups and math. The strictness annotations keep the memory use from blowing up. Sponsored-by: unqueued on Patreon	2023-12-29 12:09:30 -04:00
Joey Hess	f3fa9dc65f	releasing package git-annex version 10.20231227	2023-12-27 19:27:55 -04:00
Joey Hess	8a3beabf35	use RawFilePath for opening sqlite databases Fix a crash opening sqlite databases when run in a non-unicode locale, with a remote that uses a non-unicode filepath. In that situation converting to Text fails. The fix needs git-annex to be built with persistent-sqlite 2.13.3. Building against older versions still works, but that version is used when building with stack. Database.RawFilePath is a lot of code copied from persistent-sqlite and lightly modified, since only 1 function in persistent-sqlite was made to support RawFilePath. This is a bit of a pain, and I hope that persistent-sqlite will eventually switch to using OsPath, allowing this module to be removed from git-annex. Sponsored-by: k0ld on Patreon	2023-12-26 18:31:52 -04:00
Joey Hess	6d789c9c81	sync, push: Avoid trying to send individual files to special remotes configured with importtree=yes exporttree=no That will always fail. It already skipped doing this when exporttree=yes.	2023-12-26 15:56:58 -04:00
Joey Hess	aec7bed1aa	prepping for release	2023-12-26 15:40:55 -04:00
Joey Hess	9a67ed0f10	importtree: support preferred content expressions needing keys When importing from a special remote, support preferred content expressions that use terms that match on keys (eg "present", "copies=1"). Such terms are ignored when importing, since the key is not known yet. When "standard" or "groupwanted" is used, the terms in those expressions also get pruned accordingly. This does allow setting preferred content to "not (copies=1)" to make a special remote into a "source" type of repository. Importing from it will import all files. Then exporting to it will drop all files from it. In the case of setting preferred content to "present", it's pruned on import, so everything gets imported from it. Then on export, it's applied, and everything in it is left on it, and no new content is exported to it. Since the old behavior on these preferred content expressions was for importtree to error out, there's no backwards compatibility to worry about. Except that sync/pull/etc will now import where before it errored out.	2023-12-18 16:27:59 -04:00
Joey Hess	eb59da9dd2	Lower precision of timestamps in git-annex branch This can reduce the size of the branch by up to 8%. My test was running git-annex add 1000 times on one file each. Lots of different high-resolution timestamps were recorded before and eliminating those, after packing, the git repo was 8% smaller. Due to the use of vector clocks, high resolution timestamps are not necessary to make clear which information is most recent when eg, a value is changed repeatedly in the same second. In such a case, the vector clock will be advanced to the next second after the last modification. For example, running git-annex numcopies 1; git-annex numcopies 2 The first will record the current second, while the next records the second after that even if it runs in the same second. As for conflicting information written to two different clones of the repository, this will make git-annex sometimes pick information that was written earlier in a second over information written later in the same second. Usually git-annex does not write conflicting information, but there are some cases where it could. Eg, storing an object on a remote can update the remote state log with some state. If two repos both store the same object, and end up storing different remote state for some reason, this can result in one that ran a tiny bit later winning. Such a situation seems unlikely to be user visible. And a small amount of clock skew could already result in such things. The only case I can think of where this might be a user visible change is if a configuration command like git-annex numcopies is being run in 2 clones of a repository on the same machine at very close to the same time. Then the user will know which they ran last, and git-annex won't. If that did become a problem, this could be dialed back to eg log milliseconds with still some space saving.	2023-12-11 15:04:06 -04:00
Joey Hess	86dbe9a825	migrate: support adding size back to URL keys migrate: Support adding size to URL keys that were added with --relaxed, by running eg: git-annex migrate --backend=URL foo Since url keys cannot be generated, that used to fail. Make it notice that the backend is not changed, and just get the size of the content. Sponsored-by: Brock Spratlen on Patreon	2023-12-08 16:22:14 -04:00
Joey Hess	257f01729c	distributed migration for pull and sync --content pull, sync: When operating on content, automatically hard link objects that have been migrated. Added annex.syncmigrations config that can be set to false to prevent pull and sync from migrating object content. I think that true is a good default for this config, because it avoids users having to re-download migrated content or learning about migration. But, some users will surely not like it, whether because it does take some time (especially for the first git-annex branch scan when there is a long history), or because they want to deal with it manually, or because their filesystem doesn't support hard links and they don't want it to copy objects. Sponsored-by: k0ld on Patreon	2023-12-08 14:18:18 -04:00
Joey Hess	4ed71b34de	migrate --apply And avoid migrate --update/--apply migrating when the new key was already present in the repository, and got dropped. Luckily, the location log allows distinguishing from the new key never having been present! That is mostly useful for --apply because otherwise dropped files would keep coming back until the old objects were reaped as unused. But it seemed to make sense to also do it for --update, for consistency in edge cases if nothing else. One case where --update can use it is when one branch got migrated earlier, and we dropped the file, and now another branch has migrated the same file. Sponsored-by: Jack Hill on Patreon	2023-12-08 13:23:46 -04:00
Joey Hess	f1ce15036f	started migrate --update This is most of the way there, but not quite working. The layout of migrate.tree/ needs to be changed to follow this approach. git log will list all the files in tree order, so the new layout needs to alternate old and new keys. Can that be done? git may not document tree order, or may not preserve it here. Alternatively, change to using git log --format=raw and extract the tree header from that, then use git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new That will be a little more expensive, but only when there are lots of migrations. Sponsored-by: Joshua Antonishen on Patreon	2023-12-07 15:50:52 -04:00
Joey Hess	a6eb7d7339	prevent relatedTemplate from truncating a filename to end in "." Avoid a problem with temp file names ending in "." on certain filesystems that have problems with such filenames. relatedTemplate is quite an ugly hack really; since it doesn't know the max filename length of the filesystem it can only assume that the filename is max allowed length. When given the input "lh.aparc.DKTatlas.annot", it wants to reserve 20 characters for tempfile so it truncates to "lh.". That ending period is apparently a problem on some filesystems (FAT eats it, but does not throw EINVAL; ntfs does not seem bothered by it, I don't know what FUSE filesystem the bug reporter was really using). Sponsored-by: Brett Eisenberg on Patreon	2023-12-05 12:38:14 -04:00
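
The truncation problem in the commit above can be reproduced with a small sketch. It is an illustration only, not the Haskell implementation: the function name `related_template` and the `reserve` parameter are hypothetical, and the sketch assumes only what the message states, namely that 20 characters are reserved for the temp file suffix and that a result ending in "." must be avoided.

```python
def related_template(name: str, reserve: int = 20) -> str:
    """Shorten name so that `reserve` more characters can be appended.

    Assumes the name may already be at the filesystem's maximum length,
    since that maximum is not known.
    """
    if len(name) <= reserve:
        # Short names are left as they are in this sketch.
        return name
    truncated = name[: len(name) - reserve]
    # Strip trailing periods, which some filesystems handle badly.
    # An all-period name would become empty here; that case is not handled.
    return truncated.rstrip(".")


# The example from the commit message: 23 characters, so 3 are kept ("lh."),
# and the trailing period is then removed.
print(related_template("lh.aparc.DKTatlas.annot"))
```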
Joey Hess	0485dd3161	sync: Fix locking problems during merge when annex.pidlock is set Presumably git merge sometimes needs to verify if a worktree file is modified, and so will then run git-annex filter-process which would try to take the pid lock. And for whatever reason, git-annex sync already had the pidlock held. I have not replicated that, but it does make enough sense to deploy the workaround. Like I said back in commit `7bdb0cdc0d`, Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad-hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command. Sponsored-by: KDM on Patreon	2023-12-04 13:40:28 -04:00
Joey Hess	1e31bf8122	copy/move --from-anywhere --to remote Implementation was simple because it's equivalent to --from=foo --to remote for each other remote, followed by --to remote when there's a local copy. (Or, in the edge case of --from-anywhere --to=here, it's the same as --to=here.) Note that, when the local repo does not have a copy, fromToPerform gets it from a remote, sends it to the destination, and drops the local copy. Another call to that for a second remote will notice that the dest now has a copy, and simply drop from the second remote, avoiding a second transfer. Also note that, when numcopies doesn't allow dropping it from everywhere, it will drop it from the cheapest remotes first (maybe not ideal) up to more expensive remotes, and finally from the local repo. So the local repo will generally end up holding a copy. Maybe not ideal in all cases either, but it seems no worse to do that than to end up with a copy undropped from a remote. And I'm not entirely happy with the output, eg: copy bigfile (from r3...) ok copy bigfile ok That makes sense if you think of the second line as being the same as what is output by `git-annex copy bigfile --to bar`, but it's less clear in this context. Maybe add "(from here...)"? Also the --json output doesn't have a machine-readable field for the "from" uuid, and maybe it should? Sponsored-by: Dartmouth College's DANDI project	2023-11-30 16:34:30 -04:00
Joey Hess	1654572bc1	fix --from overriding annex-ignore Make git-annex get/copy/move --from foo override configuration of remote.foo.annex-ignore, as documented. This already worked for remotes supporting hasKeyCheap. For others though, git-annex copy --from foo would silently not do anything, while git-annex copy --to foo would use the annex-ignored remote. Also improved the annex-ignore docs, to reflect that `git-annex get` without --from will skip using annex-ignored remotes, for example. Sponsored-by: Dartmouth College's DANDI project	2023-11-30 15:12:07 -04:00
Joey Hess	bacd781c4f	releasing package git-annex version 10.20231129	2023-11-29 16:01:01 -04:00
Joey Hess	f3f864fc6d	findkeys: Support --largerthan and --smallerthan Sponsored-by: Brett Eisenberg on Patreon	2023-11-28 11:51:43 -04:00
Joey Hess	6e3bcbf4dd	Make git-annex copy --from --to --fast actually fast Eg when the destination is logged as containing a file, skip actively checking that it does contain it. Note that --fast does not prevent other verifications of content location that are done in a copy --from --to. Perhaps it could, but this change will already avoid the real unnecessary work of operating on files that are already in the remote. And avoiding other verifications might cause it to fail if the location log thinks that --to does not contain the content but does. Such complications with `git-annex copy --to remote --fast` led to commit `d006586cd0` which added a note that gets displayed when that fails, mentioning it might be due to --fast being enabled. copy --from --to is already complicated enough without needing to worry about such edge cases, so continuing to do some verification of content location after the initial --fast filtering seems ok. Sponsored-by: Dartmouth College's DANDI project	2023-11-17 17:37:58 -04:00
Joey Hess	7a8393ce7d	Fix bug in git-annex copy --from --to Caused it to skip files that were locally present. Sponsored-by: Dartmouth College's DANDI project	2023-11-17 16:30:20 -04:00
Joey Hess	7d67229884	git-annex log --gnuplot The gnuplot output is pretty good, but could still be improved with: * more colors (repeating colors is confusing with a lot of repos) * better positioning of the legend, making the plot wider and moving it from over top of the graph Sponsored-by: Kevin Mueller on Patreon	2023-11-14 14:56:58 -04:00
Joey Hess	0fdc1a54db	git-annex log --received modifier option Only counting received and not dropped makes this show the bandwidth of data coming into the repository, although only in a sense. Since git-annex branch updates only happen at the end of a command, and we don't know when a command started, it's only an approximation of the actual bandwidth. (A previous git-annex branch update may have happened in a different repository.) It would be possible to also add a --dropped option, but I don't know how useful that would be? Sponsored-by: Nicholas Golder-Manning on Patreon	2023-11-14 14:04:46 -04:00
Joey Hess	574514545c	git-annex log --sizesof This can take a lot of memory. I decided to violate the usual rule in git-annex that it operate in constant memory no matter how many annexed objects. In this case, it would be hard to be fast without using a big map of the location logs. The main difficulty here is that there can be many git-annex branches and it needs to display a consistent view at a point in time, which means merging information from multiple git-annex branches. I have not checked if there are any laziness leaks in this code. It takes 1 gb to run in my big repo, which is around what I estimated before writing it. 2 options that are documented are not yet implemented. Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the next change after 12:59 is then. Then it waits until after 2:10 to display the next change. It ought to wait until after 2:00. Sponsored-by: Brock Spratlen on Patreon	2023-11-10 17:26:10 -04:00
Joey Hess	11cc9f1933	info: Added calculation of combined annex size of all repositories Factored out overLocationLogs from CmdLine.Seek, which can calculate this pretty fast even in a large repo. In my big repo, the time to run git-annex info went up from 1.33s to 8.5s. Note that the "backend usage" stats are for annexed files in the working tree only, not all annexed files. This new data source would let that be changed, but that would be a confusing behavior change. And I cannot retitle it either, out of fear something uses the current title (eg parsing the json). Also note that, while time says "402108maxresident" in my big repo now, up from "54092maxresident", top shows the RES constant at 64mb, and it was 48mb before. So I don't think there is a memory leak. I tried using deepseq to force full evaluation of addKeyCopies and memory use didn't change, which also says no memory leak. And indeed, not even calling addKeyCopies resulted in the same memory use. Probably the increased memory usage is buffering the stream of data from git in overLocationLogs. Sponsored-by: Brett Eisenberg on Patreon	2023-11-08 13:35:11 -04:00
Joey Hess	4e35067325	windows hook scripts newlines without CR Windows: When git-annex init is installing hook scripts, it will avoid ending lines with CR for portability. Existing hook scripts that do have CR line endings will not be changed. While it would be possible to have git-annex init upgrade them, users would need to know to use that command to do that, and it would add complexity that does not seem warranted for the portability benefit alone. Sponsored-by: Luke T. Shumaker on Patreon	2023-11-02 13:37:04 -04:00
Joey Hess	f8d35d9480	lookupkey: Sped up --batch When the file is relative, it does not need to be passed through git ls-files to normalize it. Sponsored-by: Kevin Mueller on Patreon	2023-10-30 14:59:09 -04:00
Joey Hess	39ca30e004	Windows: Consistently avoid ending output lines with CR This matches the behavior of git on Windows, which does not end lines with CR either. Previously, git-annex used to always write lines with putStrLn, so would output CR on Windows. Then parts of it changed to use ByteString.putStrLn, which does not output CR. That left its output inconsistent, sometimes within the same command. The point of this commit is to get back to consistency. Having the same behavior as git is a nice bonus. It would be much harder to make it consistently output CR, because every place it uses ByteString.putStrLn or similar would need to be changed. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-10-30 14:43:43 -04:00
Joey Hess	eb42935e58	Windows: Fix CRLF handling in some log files In particular, the mergedrefs file was written with CR added to each line, but read without CRLF handling. This resulted in each update of the file adding CR to each line in it, growing the number of lines, while also preventing the optimisation from working, so it remerged unnecessarily. writeFile and readFile do NewlineMode translation on Windows. But the ByteString conversion prevented that from happening any longer. I've audited for other cases of this, and found three more (.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All of those also only prevent optimisations from working. Some other files are currently both read and written with ByteString, but old git-annex may have written them with NewlineMode translation. Other files are at risk for breakage later if the reader gets converted to ByteString. This is a minimal fix, but should be enough, as long as I remember to use fileLines when splitting a ByteString into lines. This leaves files written using ByteString without CR added, but that's ok because old git-annex has no difficulty reading such files. When the mergedrefs file has gotten lines that end with "\r\r\r\n", this will eventually clean it up. Each update will remove a single trailing CR. Note that S8.lines is still used in eg Command.Unused, where it is parsing git show-ref, and similar in Git/*. git commands don't include CR in their output so that's ok. Sponsored-by: Joshua Antonishen on Patreon	2023-10-30 14:23:23 -04:00
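
The line-splitting behaviour described in the commit above can be sketched as follows. This is a Python illustration, not the Haskell `fileLines` function itself; the name `file_lines` is a stand-in, and the sketch assumes only what the message states: lines are split on LF, and a single trailing CR is removed from each line on read.

```python
from typing import List


def file_lines(data: bytes) -> List[bytes]:
    """Split bytes into lines, treating both LF and CRLF as line endings."""
    lines = data.split(b"\n")
    # A final newline yields an empty last element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    # Remove at most one trailing CR per line, so a line that accumulated
    # several CRs loses one of them each time the file is read and rewritten.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


# A line damaged to end in "\r\r\r\n" loses one CR per pass.
damaged = b"refs/heads/master\r\r\r\nrefs/heads/other\n"
print(file_lines(damaged))
```

Rewriting the file from these lines without adding CR, then repeating on later updates, would gradually clean up such damaged lines, which matches the cleanup the commit message describes.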
Joey Hess	0da1d40cd4	Improve memory use of --all when using annex.private This does not improve Annex.Branch.files at all, since it still uses ++ to combine the lists, so forcing all but the last one. But when there are a lot of files in the private journal, it does avoid --all (or a bare repo) from buffering the filenames in memory. See commit `653b719472` for prior discussion of this buffering. Sponsored-by: Graham Spencer on Patreon	2023-10-24 13:20:55 -04:00
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	6a61c7ff45	Fix crash of enableremote when the special remote has embedcreds=yes The crash occurred because writeCreds got called twice, and writeFileProtected neglected to close its file handle, so the file was open for write when written the second time. It seems unnecessary and suboptimal that writeCreds gets called twice. One call is from getRemoteCredPair and the other from setRemoteCredPair'. What happens is that in the enableremote case, code that also runs at initremote does unnecessary work. Might be possible to improve that, but I've gone for the simple fix. Sponsored-by: k0ld on Patreon	2023-10-20 13:19:12 -04:00
Joey Hess	c268dc5878	only stage regular files from the journal git-annex only writes regular files there, but other things may drop junk like empty .DAV directories around the tree. And trying to hash such things can have weird and hard to understand effects. So it seems best to do a small amount of work in statting the journal file to make sure it's a regular file. Sponsored-by: Jack Hill on Patreon	2023-10-10 13:22:02 -04:00
Joey Hess	b9240d2c5d	releasing package git-annex version 10.20230926	2023-09-26 13:29:49 -04:00

1783 commits