git-annex

Author	SHA1	Message	Date
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	ed39979ac8	import: Avoid following symbolic links inside directories being imported Too big a footgun. This does not prevent attackers who can write to the directory being imported from racing the check. But they can cause anything to be imported anyway, so would be limited to getting the legacy import to follow into a directory they do not write to, and move files out of it into the annex. (The directory special remote does not have that problem since it does not move files.) Sponsored-by: Jack Hill on Patreon	2022-08-19 13:31:16 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	2c1288334d	Avoid running bup join concurrently with bup split On the bup mailing list, this was hypothesized as also being a concurrency problem. Sponsored-by: Svenne Krap on Patreon	2022-08-09 10:40:45 -04:00
Joey Hess	abd417d4fe	Avoid running multiple bup split processes concurrently Since bup split is not concurrency safe. Used a lock file so that 2 git-annex processes only run one bup split between them (per bup repo). (Concurrent writes from different git-annex repository clones to the same bup repo could still have concurrency problems.) Sponsored-by: Noam Kremen on Patreon	2022-08-08 18:54:06 -04:00
Joey Hess	04247fb4d0	avoid surprising "not found" error when copying to a http remote git-annex copy --to a http remote will of course fail, as that's not supported. But git-annex copy first checks if the content is already present in the remote, and that threw a "not found". Looks to me like other remotes that use Url.checkBoth in their checkPresent do just return false when it fails. And Url.checkBoth does display errors when unusual errors occur. So I'm pretty sure removing this error message is ok. Sponsored-by: Jarkko Kniivilä on Patreon	2022-08-08 11:57:24 -04:00
Joey Hess	5bc70e2da5	When bup split fails, display its stderr It seems worth noting here that I emailed bup's author about bup split being noisy on stderr even with -q in approximately 2011. That never got fixed. Its current repo on github only accepts pull requests, not bug reports. Needing to add such complexity to deal with such a longstanding unfixed issue is not fun. Sponsored-by: Kevin Mueller on Patreon	2022-08-05 13:57:20 -04:00
Joey Hess	f94908f2a6	improve output when storing to bup bup split outputs to stderr even with -q. This was discarded when using -J, but it was still outputting when not using -J, and so was git-annex. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-08-05 12:29:33 -04:00
Joey Hess	093ad89ead	S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes It does not make sense for either; importing from an existing bucket should not write to it. And the user may not have write access at all. And exporting to a bucket should not write other files. Also this prevents the uuid file being imported after being written. Sponsored-by: Dartmouth College's DANDI project	2022-07-14 15:05:51 -04:00
Joey Hess	50c2cac7e7	adb: Added configuration setting oldandroid=true To avoid using find -printf, which was first supported in Android around 2019-2020. Probing seems too fragile, and execing stat once per file is too slow to do when there's a faster way available, which brought me to an option... Sponsored-by: Brett Eisenberg on Patreon	2022-07-13 18:00:47 -04:00
Joey Hess	2d65c4ff1d	avoid unix-compat's rename On Windows, that does not support long paths https://github.com/jacobstanley/unix-compat/issues/56 Instead, use System.Directory.renamePath, which does support long paths. Sponsored-by: Dartmouth College's Datalad project	2022-07-12 14:55:02 -04:00
Joey Hess	21c50c0f72	fix parallel copy from/to a local git repo Improve handling of parallelization with -J when copying content from/to a git remote that is a local path. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-06-29 12:40:12 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certianly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	95a04920cf	remove objectDir'	2022-06-22 16:08:49 -04:00
Joey Hess	13fc6a9b6a	fix to use 1 chunk for empty file Fix retrival of an empty file that is stored in a special remote with chunking enabled. The speculative chunk stuff caused a reversion by adding an empty list for the empty file. Which is just wrong; the empty file is still stored on the remote, and should be retrieved like any other file. It uses 1 chunk, so `max 1` is the simple fix. Sponsored-by: Noam Kremen on Patreon	2022-06-09 14:24:56 -04:00
Joey Hess	f30532614f	fix typo	2022-06-09 13:40:05 -04:00
Joey Hess	14584e7a38	initremote type=git probe uuid rather than matching path of an existing remote to find the uuid. The main benefit of this is that locations not using ssh:// will work now, including both paths and host:/path The other benefit is that it's a simpler interface, no need to have an existing remote with the same url and some other name. Although that will still work of course. This does rely on tryGitConfigRead working when given a Git.Repo that is not a remote. Luckily, it works fine that way. Also, tryGitConfigRead will auto-init a local repo that has a git-annex branch. I did not enable auto-init of ssh repos though. The uuid discovery actually happens twice; initremote discovers it, and uses it to store the special remote config, but does not set it in the git remote it creates. So the next run of git-annex does uuid discovery again, and caches it that time. This could be improved for a tiny speedup, but I didn't want to complicate things for that in this commit. Sponsored-by: Dartmouth College's DANDI project	2022-06-09 13:16:50 -04:00
Joey Hess	54809e9eb3	fix untrustworthiness of import/export remotes Commit `36133f27c0` had a boolean flip in it, aaargh. Special remotes with importtree=yes or exporttree=yes are once again treated as untrusted, since files stored in them can be deleted or modified at any time. Sponsored-by: Kevin Mueller on Patreon	2022-05-09 15:53:23 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrievin from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	43701759a3	disable shellescape for rsync 3.2.4 rsync 3.2.4 broke backwards-compatability by preventing exposing filenames to the shell. Made the rsync and gcrypt special remotes detect this and disable shellescape. An alternative fix would have been to always set RSYNC_OLD_ARGS=1. Which would avoid the overhead of probing rsync --help for each affected remote. But that is really very fast to run, and it seemed better to switch to the modern code path rather than keeping on using the bad old code path. Sponsored-by: Tobias Ammann on Patreon	2022-05-03 12:12:41 -04:00
Joey Hess	3e2f1f73cb	add back inode to directory special remote ContentIdentifier Directory special remotes with importtree=yes have changed to once more take inodes into account. This will cause extra work when importing from a directory on a FAT filesystem that changes inodes on every mount. To avoid that extra work, set ignoreinodes=yes when initializing a new directory special remote, or change the configuration of your existing remote: git-annex enableremote foo ignoreinodes=yes This will mean a one-time re-import of all contents from every directory special remote due to the changed setting. `73df633a62` thought it was too unlikely that there would be modifications that the inode number was needed to notice. That was probably right; it's very unlikely that a file will get modified and end up with the same size and mtime as before. But, what was not considered is that a program like NextCloud might write two files with different content so closely together that they share the mtime. The inode is necessary to detect that situation. Sponsored-by: Max Thoursie on Patreon	2022-03-21 13:12:02 -04:00
Joey Hess	952664641a	turn of PackageImports in cabal file This makes it easier to build eg benchmarks of individual modules. May be that most of these PackageImports are not really necessary, dunno.	2022-02-25 13:16:36 -04:00
Joey Hess	a32ff6cef0	adb: Avoid find failing with "Argument list too long" The "+" argument only runs the command once, so is not safe to use. Using ";" instead would have been the simplest fix, but also the slowest. Since my phone has an xargs that supports -0, I piped find to xargs instead. Unsure how portable this will be, perhaps some android's don't have xargs -0 or find -printf to send null terminated output. The business with pipefail is necessary to make a failure of find cause the import to fail. Probably this works on all androids, but if not, it will probably just result in a failure of find being ignored. It would be possible to make ignorefinderror just disable setting pipefail, but then if some android has a shell that has pipefail enabled by default, ignorefinderror would not work, so I kept the \|\| true approach for that. Sponsored-by: Max Thoursie on Patreon	2022-01-31 13:19:09 -04:00
Joey Hess	525473aa5a	adb: Added ignorefinderror configuration parameter On a phone with Calyxos, adb find in /sdcard complains: find: ./Android/data/com.android.providers.downloads.ui: Permission denied But otherwise works, so this option makes import and export work ok, except for that one app's data. Sponsored-by: Graham Spencer	2022-01-10 21:17:00 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certianly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	0584e096d1	comment	2022-01-03 13:53:34 -04:00
Joey Hess	19b87f7396	avoid no longer necessary piping of ssh stderr for p2pstdio This was needed when supporting old git-annex-shell that do not support p2pstdio yet, in order to cleanly fall back to the old interface without error messages being displayed. That is no longer supported, so simplify to not intercept error messages. Sponsored-by: Dartmouth College's DANDI project	2022-01-03 12:54:40 -04:00
Joey Hess	92cc28c316	remove obsolete comment	2022-01-03 12:38:29 -04:00
Joey Hess	4b19626a36	Fix build with ghc 9.0.1 Continuing along the same lines as commit `2739adc258`, it seems that while Remote -> Retriever expands to the same data type this changes it to, ghc 9.0.1 refuses to consider them equiviant. I guess it has something to do with the forall? The rest of the build all succeeds, although the stack build then crashes: Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ... Completed 233 action(s). Prelude.chr: bad argument: 2214592520 This issue seems likely to be about it: https://github.com/commercialhaskell/stack/pull/5508 I'm building with stack from debian, version 2.3.3, so a newer stack probably avoids that. Anyway, despite that stack problem, the git-annex binary is built, and works. The stack.yaml I used for this build was patched as follows: diff --git a/stack.yaml b/stack.yaml index 8dac87c15..62c4b5b9d 100644 --- a/stack.yaml +++ b/stack.yaml @@ -1,6 +1,6 @@ flags: git-annex: - production: true + production: false assistant: true pairing: true torrentparser: true @@ -14,7 +14,7 @@ flags: httpclientrestricted: true packages: - '.' -resolver: lts-18.13 +resolver: nightly-2021-09-07 extra-deps: - IfElse-0.85 - aws-0.22 Sponsored-by: Graham Spencer on Patreon	2021-12-08 15:08:02 -04:00
Joey Hess	f3326b8b5a	git-lfs gitlab interoperability fix git-lfs: Fix interoperability with gitlab's implementation of the git-lfs protocol, which requests Content-Encoding chunked. Sponsored-by: Dartmouth College's Datalad project	2021-11-10 13:51:11 -04:00
Joey Hess	8034f2e9bb	factor out IncrementalHasher from IncrementalVerifier	2021-11-09 12:33:22 -04:00
Joey Hess	29d687dce9	When retrival from a chunked remote fails, display the error that occurred when downloading the chunk Rather than the error that occurred when trying to download the unchunked content, which is less likely to actually be stored in the remote. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-14 12:45:05 -04:00
Joey Hess	17a0fa3dbc	negotiate P2P protocol version for tor remotes This negotiation is not supported by versions of git-annex older than 6.20180312. Well, maybe really 6.20180227 or so, but using that in the changelog simplifies things since it was the version for the other changes as well. See commit `c81768d425` for the back story. As well as allowing for future protocol improvements, this will result in negoatiating protocol version 1, which is an improvement over default version 0. In fact, it looks like no supported version of git-annex will use protocol version 0, since version 1 was introduced in 6.20180227. Still, removing the code for version 0 seems unncessary. See commit `31e1adc005`. Sponsored-by: Brett Eisenberg on Patreon.	2021-10-11 15:58:51 -04:00
Joey Hess	e43aaa22be	Merge branch 'p2pflagday'	2021-10-11 15:42:52 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	2e94ba9c70	remove broken code git-annex-shell fsck has never worked, back in commit `1ffb3bb0ba` I discussed maybe adding it one day, but this code has always failed.	2021-10-11 14:59:27 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor innefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	7ccf642863	revert change that broke test_readonly commit `63d508e885` broke test_readonly. When a local git remote is readonly, tryCopyCoW run to copy a file from it failed at withOtherTmp. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:02:41 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate gobal config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	05a097cde8	Merge branch 'master' into bwlimit	2021-09-22 10:48:27 -04:00
Joey Hess	63d508e885	resume properly when copying a file to/from a local git remote is interrupted Probably this fixes a reversion, but I don't know what version broke it. This does use withOtherTmp for a temp file that could be quite large. Though albeit a reflink copy that will not actually take up any space as long as the file it was copied from still exists. So if the copy cow succeeds but git-annex is interrupted just before that temp file gets renamed into the usual .git/annex/tmp/ location, there is a risk that the other temp directory ends up cluttered with a larger temp file than later. It will eventually be cleaned up, and the changes of this being a problem are small, so this seems like an acceptable thing to do. Sponsored-by: Shae Erisson on Patreon	2021-09-21 17:43:35 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100ks/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granulatity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't support for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose displays blocks the bandwidth using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	2739adc258	fix build with ghc 9.0.1 I was not able to test the whole build because of a very strange Prelude.chr: bad argument: 469762054 Which I assume is a problem with this version of ghc or the way I was using stack. The stack.yaml that builds it used this patch diff --git a/stack.yaml b/stack.yaml index 790bffff2..8bcbaa0ec 100644 --- a/stack.yaml +++ b/stack.yaml @@ -1,6 +1,6 @@ flags: git-annex: - production: true + production: false assistant: true pairing: true torrentparser: true @@ -18,13 +18,15 @@ extra-deps: - IfElse-0.85 - aws-0.22 - bloomfilter-2.0.1.0 -- filepath-bytestring-1.4.2.1.6 -- git-lfs-1.1.0 -- http-client-restricted-0.0.3 +- filepath-bytestring-1.4.2.1.8 +- git-lfs-1.1.1 +- http-client-restricted-0.0.4 - network-multicast-0.3.2 - sandi-0.5 - torrent-10000.1.1 - bencode-0.6.1.1 +- base16-bytestring-0.1.1.7 +- base64-bytestring-1.0.0.3 explicit-setup-deps: git-annex: true -resolver: lts-16.27 +resolver: nightly-2021-09-07	2021-09-07 16:53:07 -04:00
Joey Hess	9f38ecac1e	borg: Avoid trying to extract xattrs, ACLS, and bsdflags when retrieving from a borg repository That broke restoring on linux from a borg backup made on OSX. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-09-03 12:10:14 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00

1 2 3 4 5 ...

1470 commits