git-annex

Author	SHA1	Message	Date
Joey Hess	04ec726d3b	S3 region= S3: Support a region= configuration useful for some non-Amazon S3 implementations. This feature needs git-annex to be built with aws-0.24. datacenter= sets both the AWS hostname and region in one setting, which is easy when using AWS, but not useful for other hosts. So kept datacenter as-is, but added this additional config. Sponsored-By: Brett Eisenberg on Patreon	2023-02-06 14:08:45 -04:00
Joey Hess	3f5d8e2211	correct obsolete comment	2023-01-31 14:42:26 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	400ce29a94	remove a debug print and fix build	2023-01-12 13:18:25 -04:00
Joey Hess	8a305e5fa3	respect urlinclude/urlexclude of other web special remotes When a web special remote does not have urlinclude/urlexclude configured, make it respect the configuration of other web special remotes and avoid using urls that match the config of another. Note that the other web special remote does not have to be enabled. That seems ok, it would have been extra work to check for only ones that are enabled. The implementation does mean that the web special remote re-parses its own config once at startup, as well as re-parsing the configs of any other web special remotes. This should be a very small slowdown unless there are lots of web special remotes. Sponsored-by: Dartmouth College's DANDI project	2023-01-10 14:58:53 -04:00
Joey Hess	6fa166e1fc	web: Add urlinclude and urlexclude configuration settings Sponsored-by: Dartmouth College's DANDI project	2023-01-09 17:16:53 -04:00
Joey Hess	8d06930c88	web special remote is no longer a singleton Allow initremote of additional special remotes with type=web, in addition to the default web special remote. When --sameas=web is used, these provide additional names for the web special remote, and may also have their own additional configuration (once there is any for the web special remote) and cost. Sponsored-by: Dartmouth College's DANDI project	2023-01-09 15:49:20 -04:00
Joey Hess	f316b7f105	Revert "Removed the vendored git-lfs and the GitLfs build flag" This reverts commit `efda811404`. Turns out that datalad is building git-annex against debian bullseye. https://github.com/datalad/git-annex/issues/149	2023-01-04 17:33:29 -04:00
Joey Hess	efda811404	Removed the vendored git-lfs and the GitLfs build flag AFAICS all git-annex builds are using the git-lfs library not the vendored copy. Debian stable does have a too old haskell-git-lfs package to be able to build git-annex from source, but there is not currently a backport of a recent git-annex to Debian stable. And if they update the backport at some point, they should be able to backport the library too. Sponsored-by: Svenne Krap on Patreon	2022-12-26 12:49:53 -04:00
Joey Hess	9d60385001	convert renameFile to moveFile to support cross-device moves Improve handling of some .git/annex/ subdirectories being on other filesystems, in the bittorrent special remote, and youtube-dl integration, and git-annex addurl. The only one of these that I've confirmed to be a problem is in the bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are on different filesystems. As well as auditing for renameFile, also audited for createLink, all of those are ok as are the other remaining renameFile calls. Also audited all code paths that use .git/annex/othertmp, and did not find any other cross-device problems. So, removing mention of othertmp needing to be on the same device. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:17:50 -04:00
Joey Hess	27444459e9	fix build warning	2022-11-09 15:33:46 -04:00
Joey Hess	c69d340ce5	remove dangling where clause	2022-11-09 13:25:05 -04:00
Joey Hess	e100993935	complete support for S3 signature=anonymous aws-0.23 has been released. When built with an older aws, initremote will error out when run with signature=anonymous. And when a remote has been initialized with that by a version of git-annex that does support it, older versions will fail when the remote is accessed, with a useful error message. Sponsored-by: Dartmouth College's DANDI project	2022-11-04 16:20:28 -04:00
Joey Hess	de1e8201a6	Merge branch 'master' into anons3	2022-11-04 15:08:29 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done in Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex switched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses of onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	b4305315b2	S3: pass fileprefix into getBucket calls S3: Speed up importing from a large bucket when fileprefix= is set by only asking for files under the prefix. getBucket still returns the files with the prefix included, so the rest of the fileprefix stripping still works unchanged. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 17:37:26 -04:00
Joey Hess	ca91c3ba91	S3: Support signature=anonymous to access a S3 bucket anonymously This can be used, for example, with importtree=yes to import from a public bucket. This needs a patch that has not yet landed in the aws library, and will need to be adjusted to support compiling with old versions of the library, so is not yet suitable for merging. See https://github.com/aristidb/aws/pull/281 The stack.yaml changes are provided to show how to build against the aws fork and will need to be reverted as well. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 17:02:45 -04:00
Joey Hess	90f9671e00	future proof AWS.Credentials generation Avoid breaking when a field is added to the constructor. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 16:33:21 -04:00
Joey Hess	0756f4453d	try retrieval from more than one export location when the first fails Combined with commit `0ffc59d341`, this fixes the case where there are duplicate files on the special remote, and the first gets modified/deleted, while the second is still present. directory, adb: Fixed a bug when importtree=yes, and multiple files in the special remote have the same content, that caused it to refuse to get a file from the special remote, incorrectly complaining that it had changed, due to only accepting the inode+mtime of one file (that was since modified or deleted) and not accepting the inode+mtime of other duplicate files. Sponsored-by: Max Thoursie on Patreon	2022-09-20 13:33:57 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	1fe9cf7043	deal with ignoreinode config setting Improve handling of directory special remotes with importtree=yes whose ignoreinode setting has been changed. (By either enableremote or by upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.) When getting a file from such a remote, accept the content that would have been accepted with the previous ignoreinode setting. After a change to ignoreinode, importing a tree from the remote will re-import and generate new content identifiers using the new config. So when ignoreinode has changed to no, the inodes will be learned, and after that point, a change in an inode will be detected as a change. Before re-importing, a change in an inode will be ignored, as it was before the ignoreinode change. This seems acceptable, because the user can re-import immediately if they urgently need to add inodes. And if not, they'll do it sometime, presumably, and the change will take effect then. Sponsored-by: Erik Bjäreholt on Patreon	2022-09-16 14:11:25 -04:00
Joey Hess	c62fe5e9a8	avoid redundant prompt for http password in git-annex get that does autoinit autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git remote has not been probed yet, that will do a http get that will prompt for a password. And then the parent process will subsequently prompt for a password when getting annexed files from the remote. So the solution is for autoEnableSpecialRemotes to run remoteList before the subprocess, which will probe for the uuid for the git remote in the same process that will later be used to get annexed files. But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git, so Annex.Init cannot import Remote.List. Had to pass remoteList into functions in Annex.Init to get around this dependency loop.	2022-09-09 14:43:43 -04:00
Joey Hess	8a4cfd4f2d	use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink Fix crash importing from a directory special remote that contains a broken symlink. The crash was in listImportableContentsM but some other places in Remote.Directory also seemed like they could have the same problem. Also audited for other places that have such a problem. Not all calls to getFileStatus are bad, in some cases it's better to crash on something unexpected. For example, `git-annex import path` when the path is a broken symlink should crash, the same as when it does not exist. Many of the getFileStatus calls are like that, particularly when they involve .git/annex/objects which should never have a broken symlink in it. Fixed a few other possible cases of the problem. Sponsored-by: Lawrence Brogan on Patreon	2022-09-05 13:46:32 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	ed39979ac8	import: Avoid following symbolic links inside directories being imported Too big a footgun. This does not prevent attackers who can write to the directory being imported from racing the check. But they can cause anything to be imported anyway, so would be limited to getting the legacy import to follow into a directory they do not write to, and move files out of it into the annex. (The directory special remote does not have that problem since it does not move files.) Sponsored-by: Jack Hill on Patreon	2022-08-19 13:31:16 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	2c1288334d	Avoid running bup join concurrently with bup split On the bup mailing list, this was hypothesized as also being a concurrency problem. Sponsored-by: Svenne Krap on Patreon	2022-08-09 10:40:45 -04:00
Joey Hess	abd417d4fe	Avoid running multiple bup split processes concurrently Since bup split is not concurrency safe. Used a lock file so that 2 git-annex processes only run one bup split between them (per bup repo). (Concurrent writes from different git-annex repository clones to the same bup repo could still have concurrency problems.) Sponsored-by: Noam Kremen on Patreon	2022-08-08 18:54:06 -04:00
Joey Hess	04247fb4d0	avoid surprising "not found" error when copying to a http remote git-annex copy --to a http remote will of course fail, as that's not supported. But git-annex copy first checks if the content is already present in the remote, and that threw a "not found". Looks to me like other remotes that use Url.checkBoth in their checkPresent do just return false when it fails. And Url.checkBoth does display errors when unusual errors occur. So I'm pretty sure removing this error message is ok. Sponsored-by: Jarkko Kniivilä on Patreon	2022-08-08 11:57:24 -04:00
Joey Hess	5bc70e2da5	When bup split fails, display its stderr It seems worth noting here that I emailed bup's author about bup split being noisy on stderr even with -q in approximately 2011. That never got fixed. Its current repo on github only accepts pull requests, not bug reports. Needing to add such complexity to deal with such a longstanding unfixed issue is not fun. Sponsored-by: Kevin Mueller on Patreon	2022-08-05 13:57:20 -04:00
Joey Hess	f94908f2a6	improve output when storing to bup bup split outputs to stderr even with -q. This was discarded when using -J, but it was still outputting when not using -J, and so was git-annex. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-08-05 12:29:33 -04:00
Joey Hess	093ad89ead	S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes It does not make sense for either; importing from an existing bucket should not write to it. And the user may not have write access at all. And exporting to a bucket should not write other files. Also this prevents the uuid file being imported after being written. Sponsored-by: Dartmouth College's DANDI project	2022-07-14 15:05:51 -04:00
Joey Hess	50c2cac7e7	adb: Added configuration setting oldandroid=true To avoid using find -printf, which was first supported in Android around 2019-2020. Probing seems too fragile, and execing stat once per file is too slow to do when there's a faster way available, which brought me to an option... Sponsored-by: Brett Eisenberg on Patreon	2022-07-13 18:00:47 -04:00
Joey Hess	2d65c4ff1d	avoid unix-compat's rename On Windows, that does not support long paths https://github.com/jacobstanley/unix-compat/issues/56 Instead, use System.Directory.renamePath, which does support long paths. Sponsored-by: Dartmouth College's Datalad project	2022-07-12 14:55:02 -04:00
Joey Hess	21c50c0f72	fix parallel copy from/to a local git repo Improve handling of parallelization with -J when copying content from/to a git remote that is a local path. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-06-29 12:40:12 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	95a04920cf	remove objectDir'	2022-06-22 16:08:49 -04:00
Joey Hess	13fc6a9b6a	fix to use 1 chunk for empty file Fix retrieval of an empty file that is stored in a special remote with chunking enabled. The speculative chunk stuff caused a reversion by adding an empty list for the empty file. Which is just wrong; the empty file is still stored on the remote, and should be retrieved like any other file. It uses 1 chunk, so `max 1` is the simple fix. Sponsored-by: Noam Kremen on Patreon	2022-06-09 14:24:56 -04:00
Joey Hess	f30532614f	fix typo	2022-06-09 13:40:05 -04:00
Joey Hess	14584e7a38	initremote type=git probe uuid rather than matching path of an existing remote to find the uuid. The main benefit of this is that locations not using ssh:// will work now, including both paths and host:/path The other benefit is that it's a simpler interface, no need to have an existing remote with the same url and some other name. Although that will still work of course. This does rely on tryGitConfigRead working when given a Git.Repo that is not a remote. Luckily, it works fine that way. Also, tryGitConfigRead will auto-init a local repo that has a git-annex branch. I did not enable auto-init of ssh repos though. The uuid discovery actually happens twice; initremote discovers it, and uses it to store the special remote config, but does not set it in the git remote it creates. So the next run of git-annex does uuid discovery again, and caches it that time. This could be improved for a tiny speedup, but I didn't want to complicate things for that in this commit. Sponsored-by: Dartmouth College's DANDI project	2022-06-09 13:16:50 -04:00
Joey Hess	54809e9eb3	fix untrustworthiness of import/export remotes Commit `36133f27c0` had a boolean flip in it, aaargh. Special remotes with importtree=yes or exporttree=yes are once again treated as untrusted, since files stored in them can be deleted or modified at any time. Sponsored-by: Kevin Mueller on Patreon	2022-05-09 15:53:23 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrieving from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	43701759a3	disable shellescape for rsync 3.2.4 rsync 3.2.4 broke backwards compatibility by preventing exposing filenames to the shell. Made the rsync and gcrypt special remotes detect this and disable shellescape. An alternative fix would have been to always set RSYNC_OLD_ARGS=1. Which would avoid the overhead of probing rsync --help for each affected remote. But that is really very fast to run, and it seemed better to switch to the modern code path rather than keeping on using the bad old code path. Sponsored-by: Tobias Ammann on Patreon	2022-05-03 12:12:41 -04:00
Joey Hess	3e2f1f73cb	add back inode to directory special remote ContentIdentifier Directory special remotes with importtree=yes have changed to once more take inodes into account. This will cause extra work when importing from a directory on a FAT filesystem that changes inodes on every mount. To avoid that extra work, set ignoreinode=yes when initializing a new directory special remote, or change the configuration of your existing remote: git-annex enableremote foo ignoreinode=yes This will mean a one-time re-import of all contents from every directory special remote due to the changed setting. `73df633a62` thought it was too unlikely that there would be modifications that the inode number was needed to notice. That was probably right; it's very unlikely that a file will get modified and end up with the same size and mtime as before. But, what was not considered is that a program like NextCloud might write two files with different content so closely together that they share the mtime. The inode is necessary to detect that situation. Sponsored-by: Max Thoursie on Patreon	2022-03-21 13:12:02 -04:00
Joey Hess	952664641a	turn off PackageImports in cabal file This makes it easier to build eg benchmarks of individual modules. May be that most of these PackageImports are not really necessary, dunno.	2022-02-25 13:16:36 -04:00
Joey Hess	a32ff6cef0	adb: Avoid find failing with "Argument list too long" The "+" argument only runs the command once, so is not safe to use. Using ";" instead would have been the simplest fix, but also the slowest. Since my phone has an xargs that supports -0, I piped find to xargs instead. Unsure how portable this will be, perhaps some androids don't have xargs -0 or find -printf to send null terminated output. The business with pipefail is necessary to make a failure of find cause the import to fail. Probably this works on all androids, but if not, it will probably just result in a failure of find being ignored. It would be possible to make ignorefinderror just disable setting pipefail, but then if some android has a shell that has pipefail enabled by default, ignorefinderror would not work, so I kept the || true approach for that. Sponsored-by: Max Thoursie on Patreon	2022-01-31 13:19:09 -04:00
Joey Hess	525473aa5a	adb: Added ignorefinderror configuration parameter On a phone with Calyxos, adb find in /sdcard complains: find: ./Android/data/com.android.providers.downloads.ui: Permission denied But otherwise works, so this option makes import and export work ok, except for that one app's data. Sponsored-by: Graham Spencer	2022-01-10 21:17:00 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certainly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	0584e096d1	comment	2022-01-03 13:53:34 -04:00
Joey Hess	19b87f7396	avoid no longer necessary piping of ssh stderr for p2pstdio This was needed when supporting old git-annex-shell that do not support p2pstdio yet, in order to cleanly fall back to the old interface without error messages being displayed. That is no longer supported, so simplify to not intercept error messages. Sponsored-by: Dartmouth College's DANDI project	2022-01-03 12:54:40 -04:00
Joey Hess	92cc28c316	remove obsolete comment	2022-01-03 12:38:29 -04:00
Joey Hess	4b19626a36	Fix build with ghc 9.0.1 Continuing along the same lines as commit `2739adc258`, it seems that while Remote -> Retriever expands to the same data type this changes it to, ghc 9.0.1 refuses to consider them equivalent. I guess it has something to do with the forall? The rest of the build all succeeds, although the stack build then crashes: Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ... Completed 233 action(s). Prelude.chr: bad argument: 2214592520 This issue seems likely to be about it: https://github.com/commercialhaskell/stack/pull/5508 I'm building with stack from debian, version 2.3.3, so a newer stack probably avoids that. Anyway, despite that stack problem, the git-annex binary is built, and works. The stack.yaml I used for this build was patched as follows: diff --git a/stack.yaml b/stack.yaml index 8dac87c15..62c4b5b9d 100644 --- a/stack.yaml +++ b/stack.yaml @@ -1,6 +1,6 @@ flags: git-annex: - production: true + production: false assistant: true pairing: true torrentparser: true @@ -14,7 +14,7 @@ flags: httpclientrestricted: true packages: - '.' -resolver: lts-18.13 +resolver: nightly-2021-09-07 extra-deps: - IfElse-0.85 - aws-0.22 Sponsored-by: Graham Spencer on Patreon	2021-12-08 15:08:02 -04:00
Joey Hess	f3326b8b5a	git-lfs gitlab interoperability fix git-lfs: Fix interoperability with gitlab's implementation of the git-lfs protocol, which requests Content-Encoding chunked. Sponsored-by: Dartmouth College's Datalad project	2021-11-10 13:51:11 -04:00
Joey Hess	8034f2e9bb	factor out IncrementalHasher from IncrementalVerifier	2021-11-09 12:33:22 -04:00
Joey Hess	29d687dce9	When retrieval from a chunked remote fails, display the error that occurred when downloading the chunk Rather than the error that occurred when trying to download the unchunked content, which is less likely to actually be stored in the remote. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-14 12:45:05 -04:00
Joey Hess	17a0fa3dbc	negotiate P2P protocol version for tor remotes This negotiation is not supported by versions of git-annex older than 6.20180312. Well, maybe really 6.20180227 or so, but using that in the changelog simplifies things since it was the version for the other changes as well. See commit `c81768d425` for the back story. As well as allowing for future protocol improvements, this will result in negotiating protocol version 1, which is an improvement over default version 0. In fact, it looks like no supported version of git-annex will use protocol version 0, since version 1 was introduced in 6.20180227. Still, removing the code for version 0 seems unnecessary. See commit `31e1adc005`. Sponsored-by: Brett Eisenberg on Patreon.	2021-10-11 15:58:51 -04:00
Joey Hess	e43aaa22be	Merge branch 'p2pflagday'	2021-10-11 15:42:52 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	2e94ba9c70	remove broken code git-annex-shell fsck has never worked, back in commit `1ffb3bb0ba` I discussed maybe adding it one day, but this code has always failed.	2021-10-11 14:59:27 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	7ccf642863	revert change that broke test_readonly commit `63d508e885` broke test_readonly. When a local git remote is readonly, tryCopyCoW run to copy a file from it failed at withOtherTmp. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:02:41 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	05a097cde8	Merge branch 'master' into bwlimit	2021-09-22 10:48:27 -04:00
Joey Hess	63d508e885	resume properly when copying a file to/from a local git remote is interrupted Probably this fixes a reversion, but I don't know what version broke it. This does use withOtherTmp for a temp file that could be quite large. Though albeit a reflink copy that will not actually take up any space as long as the file it was copied from still exists. So if the copy cow succeeds but git-annex is interrupted just before that temp file gets renamed into the usual .git/annex/tmp/ location, there is a risk that the other temp directory ends up cluttered with a larger temp file than later. It will eventually be cleaned up, and the chances of this being a problem are small, so this seems like an acceptable thing to do. Sponsored-by: Shae Erisson on Patreon	2021-09-21 17:43:35 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	2739adc258	fix build with ghc 9.0.1 I was not able to test the whole build because of a very strange Prelude.chr: bad argument: 469762054 Which I assume is a problem with this version of ghc or the way I was using stack. The stack.yaml that builds it used this patch diff --git a/stack.yaml b/stack.yaml index 790bffff2..8bcbaa0ec 100644 --- a/stack.yaml +++ b/stack.yaml @@ -1,6 +1,6 @@ flags: git-annex: - production: true + production: false assistant: true pairing: true torrentparser: true @@ -18,13 +18,15 @@ extra-deps: - IfElse-0.85 - aws-0.22 - bloomfilter-2.0.1.0 -- filepath-bytestring-1.4.2.1.6 -- git-lfs-1.1.0 -- http-client-restricted-0.0.3 +- filepath-bytestring-1.4.2.1.8 +- git-lfs-1.1.1 +- http-client-restricted-0.0.4 - network-multicast-0.3.2 - sandi-0.5 - torrent-10000.1.1 - bencode-0.6.1.1 +- base16-bytestring-0.1.1.7 +- base64-bytestring-1.0.0.3 explicit-setup-deps: git-annex: true -resolver: lts-16.27 +resolver: nightly-2021-09-07	2021-09-07 16:53:07 -04:00
Joey Hess	9f38ecac1e	borg: Avoid trying to extract xattrs, ACLS, and bsdflags when retrieving from a borg repository That broke restoring on linux from a borg backup made on OSX. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-09-03 12:10:14 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	53744e132d	incremental verification for gitlfs and httpalso And that should be all the special remotes supporting it on linux now, except for in the odd edge case here and there. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:17:10 -04:00
Joey Hess	f5e09a1dbe	incremental verification for S3 Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:07:00 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	88b63a43fa	distinguish between incremental verification failing and not being done Sponsored-by: Dartmouth College's DANDI project	2021-08-18 14:38:02 -04:00
Joey Hess	325bfda12d	refactor	2021-08-18 13:37:00 -04:00
Joey Hess	449851225a	refactor IncrementalVerifier moved to Utility.Hash, which will let Utility.Url use it later. It's perhaps not really specific to hashing, but making a separate module just for the data type seemed unnecessary. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 13:19:02 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	8613770b06	incremental verify for webdav special remote Sponsored-by: Dartmouth College's DANDI project	2021-08-16 17:29:32 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	a644f729ce	refactor fileCopier Sponsored-by: Dartmouth College's DANDI project	2021-08-16 15:56:24 -04:00
Joey Hess	d889ae0c01	move comment	2021-08-16 15:25:06 -04:00
Joey Hess	c4aba8e032	better handling of finishing up incomplete incremental verify Now it's run in VerifyStage. I thought about keeping the file handle open, and resuming reading where tailVerify left off. But that risks leaking open file handles, until the GC closes them, if the deferred verification does not get resumed. Since that could perhaps happen if there's an exception somewhere, I decided that was too unsafe. Instead, re-open the file, seek, and resume. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:52:59 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	7eb3742e4b	incremental verify for chunked remotes Simply feed each chunk in turn to the incremental verifier. When resuming an interrupted retrieve, it does not do incremental verification. That would need to read the file, up to the resume point, and feed it to the incremental verifier. That seems easy to get wrong. Also it would mean extra work done before the transfer can start. Which would complicate displaying progress, and would perhaps not appear to the user as if it was resuming from where it left off. Instead, in that situation, return UnVerified, and let the verification be done in a separate pass. Granted, Annex.CopyFile does manage all that, but it's not complicated by dealing with chunks too. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:42:49 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	a871bcfe77	simplify	2021-08-09 15:17:48 -04:00
Joey Hess	f1176f82a5	rsync special remote: Stop displaying rsync progress, and use git-annex's own progress display Reasons are same as in commit `cee14f147a`. (It was already done when using -J.) Sponsored-by: Mark Reidenbach on Patreon	2021-08-09 12:06:10 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a consequence.	2021-07-27 14:07:23 -04:00
Joey Hess	e676cd43c0	propagate debugging into remote's Annex monad This is needed to make the debugging added in `0073384850` actually be displayed when running git-annex get from a local remote.	2021-07-26 11:40:51 -04:00
Joey Hess	635e7f3e26	split annexLocations To avoid mistakes like commit `0ccbed4f6f`, be explicit about the two variants of this. Incidentally avoids a small amount of overhead in calling reverse. Sponsored-by: Shae Erisson on Patreon	2021-07-16 14:17:56 -04:00
Joey Hess	c952c485c8	Fix retrieval of content from borg repos accessed over ssh It was making the borgrepo path absolute, even when it was a ssh repository. Made BorgRepo a newtype, to guard against accidentally treating it like a FilePath. Sponsored-by: Graham Spencer on Patreon	2021-07-15 12:39:24 -04:00
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	0f73b6d03a	Avoid more than 1 gpg password prompt at the same time Which could happen occasionally before when concurrency is enabled. While not much of a problem when it did happen, better to avoid it. Also, since it seems likely the gpg-agent sometimes fails in such a situation, this makes it not happen when running a single git-annex command with concurrency enabled. This commit was sponsored by Jake Vosloo on Patreon.	2021-04-27 16:36:44 -04:00
Joey Hess	c7a3404b20	add missing whitespace	2021-04-27 15:23:56 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying "(checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00

1543 commits