Commit graph

1483 commits

Author SHA1 Message Date
Joey Hess
27444459e9
fix build warning 2022-11-09 15:33:46 -04:00
Joey Hess
c69d340ce5
remove dangling where clause 2022-11-09 13:25:05 -04:00
Joey Hess
e100993935
complete support for S3 signature=anonymous
aws-0.23 has been released.

When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.

Sponsored-by: Dartmouth College's DANDI project
2022-11-04 16:20:28 -04:00
Joey Hess
de1e8201a6
Merge branch 'master' into anons3 2022-11-04 15:08:29 -04:00
Joey Hess
ba7ecbc6a9
avoid flushing keys db queue after each Annex action
The flush was only done Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)

Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex sqitched to using ResourceT..).
The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.

In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses on onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.

In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.

The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.

After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 14:12:23 -04:00
Joey Hess
b4305315b2
S3: pass fileprefix into getBucket calls
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.

getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:37:26 -04:00
Joey Hess
ca91c3ba91
S3: Support signature=anonymous to access a S3 bucket anonymously
This can be used, for example, with importtree=yes to import from a public
bucket.

This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281

The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:02:45 -04:00
Joey Hess
90f9671e00
future proof AWS.Credentials generation
Avoid breaking when a field is added to the constructor.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 16:33:21 -04:00
Joey Hess
0756f4453d
try retrieval from more than one export location when the first fails
Combined with commit 0ffc59d341, this
fixes the case where there are duplicate files on the special remote,
and the first gets modified/deleted, while the second is still present.

directory, adb: Fixed a bug when importtree=yes, and multiple files in the
special remote have the same content, that caused it to refuse to get a
file from the special remote, incorrectly complaining that it had changed,
due to only accepting the inode+mtime of one file (that was since modified
or deleted) and not accepting the inode+mtime of other duplicate files.

Sponsored-by: Max Thoursie on Patreon
2022-09-20 13:33:57 -04:00
Joey Hess
0ffc59d341
change retrieveExportWithContentIdentifier to take a list of ContentIdentifier
This partly fixes an issue where there are duplicate files in the
special remote, and the first file gets swapped with another duplicate,
or deleted. The swap case is fixed by this, the deleted case will need
other changes.

This makes retrieveExportWithContentIdentifier take a list of allowed
ContentIdentifier, same as storeExportWithContentIdentifier,
removeExportWithContentIdentifier, and
checkPresentExportWithContentIdentifier.

Of the special remotes that support importtree, borg is a special case
and does not use content identifiers, S3 I assume can't get mixed up
like this, directory certainly has the problem, and adb also appears to
have had the problem.

Sponsored-by: Graham Spencer on Patreon
2022-09-20 13:19:42 -04:00
Joey Hess
1fe9cf7043
deal with ignoreinode config setting
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)

When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.

After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptble, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.

Sponsored-by: Erik Bjäreholt on Patreon
2022-09-16 14:11:25 -04:00
Joey Hess
c62fe5e9a8
avoid redundant prompt for http password in git-annex get that does autoinit
autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git
remote has not been probed yet, that will do a http get that will prompt
for a password. And then the parent process will subsequently prompt
for a password when getting annexed files from the remote.

So the solution is for autoEnableSpecialRemotes to run remoteList before
the subprocess, which will probe for the uuid for the git remote in the
same process that will later be used to get annexed files.

But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git,
so Annex.Init cannot import Remote.List. Had to pass remoteList into
functions in Annex.Init to get around this dependency loop.
2022-09-09 14:43:43 -04:00
Joey Hess
8a4cfd4f2d
use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink
Fix crash importing from a directory special remote that contains a broken
symlink.

The crash was in listImportableContentsM but some other places in
Remote.Directory also seemed like they could have the same problem.

Also audited for other places that have such a problem. Not all calls
to getFileStatus are bad, in some cases it's better to crash on something
unexpected. For example, `git-annex import path` when the path is a broken
symlink should crash, the same as when it does not exist. Many of the
getFileStatus calls are like that, particularly when they involve
.git/annex/objects which should never have a broken symlink in it.

Fixed a few other possible cases of the problem.

Sponsored-by: Lawrence Brogan on Patreon
2022-09-05 13:46:32 -04:00
Yaroslav Halchenko
0151976676
Typo fix unncessary -> unnecessary.
Detected while reading recent CHANGELOG entry but then decided to apply
to entire codebase and docs since why not?
2022-08-20 09:40:19 -04:00
Joey Hess
ed39979ac8
import: Avoid following symbolic links inside directories being imported
Too big a footgun.

This does not prevent attackers who can write to the directory being
imported from racing the check. But they can cause anything to be imported
anyway, so would be limited to getting the legacy import to follow into a
directory they do not write to, and move files out of it into the annex.
(The directory special remote does not have that problem since it does not
move files.)

Sponsored-by: Jack Hill on Patreon
2022-08-19 13:31:16 -04:00
Joey Hess
23c6e350cb
improve createDirectoryUnder to allow alternate top directories
This should not change the behavior of it, unless there are multiple top
directories, and then it should behave the same as if there was a single
top directory that was actually above the directory to be created.

Sponsored-by: Dartmouth College's Datalad project
2022-08-12 12:52:37 -04:00
Joey Hess
e60766543f
add annex.dbdir (WIP)
WIP: This is mostly complete, but there is a problem: createDirectoryUnder
throws an error when annex.dbdir is set to outside the git repo.

annex.dbdir is a workaround for filesystems where sqlite does not work,
due to eg, the filesystem not properly supporting locking.

It's intended to be set before initializing the repository. Changing it
in an existing repository can be done, but would be the same as making a
new repository and moving all the annexed objects into it. While the
databases get recreated from the git-annex branch in that situation, any
information that is in the databases but not stored in the branch gets
lost. It may be that no information ever gets stored in the databases
that cannot be reconstructed from the branch, but I have not verified
that.

Sponsored-by: Dartmouth College's Datalad project
2022-08-11 16:58:53 -04:00
Joey Hess
2c1288334d
Avoid running bup join concurrently with bup split
On the bup mailing list, this was hypothesized as also being a
concurrency problem.

Sponsored-by: Svenne Krap on Patreon
2022-08-09 10:40:45 -04:00
Joey Hess
abd417d4fe
Avoid running multiple bup split processes concurrently
Since bup split is not concurrency safe.

Used a lock file so that 2 git-annex processes only run one bup split
between them (per bup repo).

(Concurrent writes from different git-annex repository clones to the same
bup repo could still have concurrency problems.)

Sponsored-by: Noam Kremen on Patreon
2022-08-08 18:54:06 -04:00
Joey Hess
04247fb4d0
avoid surprising "not found" error when copying to a http remote
git-annex copy --to a http remote will of course fail, as that's not
supported. But git-annex copy first checks if the content is already
present in the remote, and that threw a "not found".

Looks to me like other remotes that use Url.checkBoth in their checkPresent
do just return false when it fails. And Url.checkBoth does display
errors when unusual errors occur. So I'm pretty sure removing this error
message is ok.

Sponsored-by: Jarkko Kniivilä on Patreon
2022-08-08 11:57:24 -04:00
Joey Hess
5bc70e2da5
When bup split fails, display its stderr
It seems worth noting here that I emailed bup's author about bup split
being noisy on stderr even with -q in approximately 2011. That never got
fixed. Its current repo on github only accepts pull requests, not bug
reports. Needing to add such complexity to deal with such a longstanding
unfixed issue is not fun.

Sponsored-by: Kevin Mueller on Patreon
2022-08-05 13:57:20 -04:00
Joey Hess
f94908f2a6
improve output when storing to bup
bup split outputs to stderr even with -q. This was discarded when using -J,
but it was still outputting when not using -J, and so was git-annex.

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-08-05 12:29:33 -04:00
Joey Hess
093ad89ead
S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes
It does not make sense for either; importing from an existing bucket should
not write to it. And the user may not have write access at all. And exporting to
a bucket should not write other files.

Also this prevents the uuid file being imported after being written.

Sponsored-by: Dartmouth College's DANDI project
2022-07-14 15:05:51 -04:00
Joey Hess
50c2cac7e7
adb: Added configuration setting oldandroid=true
To avoid using find -printf, which was first supported in Android around
2019-2020.

Probing seems too fragile, and execing stat once per file is too slow to do
when there's a faster way available, which brought me to an option...

Sponsored-by: Brett Eisenberg on Patreon
2022-07-13 18:00:47 -04:00
Joey Hess
2d65c4ff1d
avoid unix-compat's rename
On Windows, that does not support long paths
https://github.com/jacobstanley/unix-compat/issues/56

Instead, use System.Directory.renamePath, which does support long paths.

Sponsored-by: Dartmouth College's Datalad project
2022-07-12 14:55:02 -04:00
Joey Hess
21c50c0f72
fix parallel copy from/to a local git repo
Improve handling of parallelization with -J when copying content from/to a
git remote that is a local path.

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-06-29 12:40:12 -04:00
Joey Hess
cb9cf30c48
move several readonly values to AnnexRead
This improves performance to a small extent in several places.

Sponsored-by: Tobias Ammann on Patreon
2022-06-28 15:40:19 -04:00
Joey Hess
debcf86029
use RawFilePath version of rename
Some small wins, almost certianly swamped by the system calls, but still
worthwhile progress on the RawFilePath conversion.

Sponsored-by: Erik Bjäreholt on Patreon
2022-06-22 16:47:34 -04:00
Joey Hess
95a04920cf
remove objectDir' 2022-06-22 16:08:49 -04:00
Joey Hess
13fc6a9b6a
fix to use 1 chunk for empty file
Fix retrival of an empty file that is stored in a special remote with
chunking enabled.

The speculative chunk stuff caused a reversion by adding an empty list for
the empty file. Which is just wrong; the empty file is still stored on the
remote, and should be retrieved like any other file. It uses 1 chunk, so
`max 1` is the simple fix.

Sponsored-by: Noam Kremen on Patreon
2022-06-09 14:24:56 -04:00
Joey Hess
f30532614f
fix typo 2022-06-09 13:40:05 -04:00
Joey Hess
14584e7a38
initremote type=git probe uuid
rather than matching path of an existing remote to find the uuid.

The main benefit of this is that locations not using ssh:// will work
now, including both paths and host:/path

The other benefit is that it's a simpler interface, no need to have an
existing remote with the same url and some other name. Although that
will still work of course.

This does rely on tryGitConfigRead working when given a Git.Repo that is
not a remote. Luckily, it works fine that way.

Also, tryGitConfigRead will auto-init a local repo that has a git-annex
branch. I did not enable auto-init of ssh repos though.

The uuid discovery actually happens twice; initremote discovers it,
and uses it to store the special remote config, but does not set it in the
git remote it creates. So the next run of git-annex does uuid discovery
again, and caches it that time. This could be improved for a tiny
speedup, but I didn't want to complicate things for that in this
commit.

Sponsored-by: Dartmouth College's DANDI project
2022-06-09 13:16:50 -04:00
Joey Hess
54809e9eb3
fix untrustworthiness of import/export remotes
Commit 36133f27c0 had a boolean flip in it,
aaargh.

Special remotes with importtree=yes or exporttree=yes are once again
treated as untrusted, since files stored in them can be deleted or modified
at any time.

Sponsored-by: Kevin Mueller on Patreon
2022-05-09 15:53:23 -04:00
Joey Hess
e8a601aa24
incremental verification for retrieval from import remotes
Sponsored-by: Dartmouth College's Datalad project
2022-05-09 15:39:43 -04:00
Joey Hess
2f2701137d
incremental verification for retrieval from all export remotes
Only for export remotes so far, not export/import.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 13:49:33 -04:00
Joey Hess
90950a37e5
support incremental verification when retrieving from export/import remotes
None of the special remotes do it yet, but this lays the groundwork.

Added MustFinishIncompleteVerify so that, when an incremental verify is
started but not complete, it can be forced to finish it. Otherwise, it
would have skipped doing it when verification is disabled, but
verification must always be done when retrievin from export remotes
since files can be modified during retrieval.

Note that retrieveExportWithContentIdentifier doesn't support incremental
verification yet. And I'm not sure if it can -- it doesn't know the Key
before it downloads the content. It seems a new API call would need to
be split out of that, which is provided with the key.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 12:25:04 -04:00
Joey Hess
43701759a3
disable shellescape for rsync 3.2.4
rsync 3.2.4 broke backwards-compatability by preventing exposing filenames
to the shell. Made the rsync and gcrypt special remotes detect this and
disable shellescape.

An alternative fix would have been to always set RSYNC_OLD_ARGS=1.
Which would avoid the overhead of probing rsync --help for each affected
remote. But that is really very fast to run, and it seemed better to switch
to the modern code path rather than keeping on using the bad old code path.

Sponsored-by: Tobias Ammann on Patreon
2022-05-03 12:12:41 -04:00
Joey Hess
3e2f1f73cb
add back inode to directory special remote ContentIdentifier
Directory special remotes with importtree=yes have changed to once more
take inodes into account. This will cause extra work when importing from a
directory on a FAT filesystem that changes inodes on every mount.

To avoid that extra work, set ignoreinodes=yes when initializing a new
directory special remote, or change the configuration of your existing
remote: git-annex enableremote foo ignoreinodes=yes

This will mean a one-time re-import of all contents from every directory
special remote due to the changed setting.

73df633a62 thought
it was too unlikely that there would be modifications that the inode number
was needed to notice. That was probably right; it's very unlikely that a
file will get modified and end up with the same size and mtime as before.
But, what was not considered is that a program like NextCloud might write
two files with different content so closely together that they share the
mtime. The inode is necessary to detect that situation.

Sponsored-by: Max Thoursie on Patreon
2022-03-21 13:12:02 -04:00
Joey Hess
952664641a
turn of PackageImports in cabal file
This makes it easier to build eg benchmarks of individual modules.

May be that most of these PackageImports are not really necessary,
dunno.
2022-02-25 13:16:36 -04:00
Joey Hess
a32ff6cef0
adb: Avoid find failing with "Argument list too long"
The "+" argument only runs the command once, so is not safe to use. Using
";" instead would have been the simplest fix, but also the slowest.

Since my phone has an xargs that supports -0, I piped find to xargs
instead. Unsure how portable this will be, perhaps some android's don't
have xargs -0 or find -printf to send null terminated output.

The business with pipefail is necessary to make a failure of find cause the
import to fail. Probably this works on all androids, but if not, it will
probably just result in a failure of find being ignored. It would be
possible to make ignorefinderror just disable setting pipefail, but then
if some android has a shell that has pipefail enabled by default, ignorefinderror
would not work, so I kept the || true approach for that.

Sponsored-by: Max Thoursie on Patreon
2022-01-31 13:19:09 -04:00
Joey Hess
525473aa5a
adb: Added ignorefinderror configuration parameter
On a phone with Calyxos, adb find in /sdcard complains:

find: ./Android/data/com.android.providers.downloads.ui: Permission denied

But otherwise works, so this option makes import and export work ok, except
for that one app's data.

Sponsored-by: Graham Spencer
2022-01-10 21:17:00 -04:00
Joey Hess
e95747a149
fix handling of corrupted data received from git remote
Recover from corrupted content being received from a git remote due eg to a
wire error, by deleting the temporary file when it fails to verify. This
prevents a retry from failing again.

Reversion introduced in version 8.20210903, when incremental verification
was added.

Only the git remote seems to be affected, although it is certianly
possible that other remotes could later have the same issue. This only
affects things passed to getViaTmp that return (False, UnVerified) due to
verification failing. As far as getViaTmp can tell, that could just as well
mean that the transfer failed in a way that would resume, so it cannot
delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally,
while other remotes do not, which is why only it seems affected.

A better fix perhaps would be to improve the types of the callback
passed to getViaTmp, so that some other value could be used to indicate
the state where the transfer succeeded but verification failed.

Sponsored-by: Boyd Stephen Smith Jr.
2022-01-07 13:25:33 -04:00
Joey Hess
0584e096d1
comment 2022-01-03 13:53:34 -04:00
Joey Hess
19b87f7396
avoid no longer necessary piping of ssh stderr for p2pstdio
This was needed when supporting old git-annex-shell that do not support
p2pstdio yet, in order to cleanly fall back to the old interface without
error messages being displayed. That is no longer supported, so simplify
to not intercept error messages.

Sponsored-by: Dartmouth College's DANDI project
2022-01-03 12:54:40 -04:00
Joey Hess
92cc28c316
remove obsolete comment 2022-01-03 12:38:29 -04:00
Joey Hess
4b19626a36
Fix build with ghc 9.0.1
Continuing along the same lines as commit
2739adc258, it seems that
while Remote -> Retriever expands to the same data type this changes
it to, ghc 9.0.1 refuses to consider them equiviant. I guess it has
something to do with the forall?

The rest of the build all succeeds, although the stack build then crashes:
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ...
Completed 233 action(s).
Prelude.chr: bad argument: 2214592520
This issue seems likely to be about it:
https://github.com/commercialhaskell/stack/pull/5508
I'm building with stack from debian, version 2.3.3, so a newer stack
probably avoids that. Anyway, despite that stack problem,
the git-annex binary is built, and works.

The stack.yaml I used for this build was patched as follows:

diff --git a/stack.yaml b/stack.yaml
index 8dac87c15..62c4b5b9d 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -1,6 +1,6 @@
 flags:
   git-annex:
-    production: true
+    production: false
     assistant: true
     pairing: true
     torrentparser: true
@@ -14,7 +14,7 @@ flags:
     httpclientrestricted: true
 packages:
 - '.'
-resolver: lts-18.13
+resolver: nightly-2021-09-07
 extra-deps:
 - IfElse-0.85
 - aws-0.22

Sponsored-by: Graham Spencer on Patreon
2021-12-08 15:08:02 -04:00
Joey Hess
f3326b8b5a
git-lfs gitlab interoperability fix
git-lfs: Fix interoperability with gitlab's implementation of the git-lfs
protocol, which requests Content-Encoding chunked.

Sponsored-by: Dartmouth College's Datalad project
2021-11-10 13:51:11 -04:00
Joey Hess
8034f2e9bb
factor out IncrementalHasher from IncrementalVerifier 2021-11-09 12:33:22 -04:00
Joey Hess
29d687dce9
When retrival from a chunked remote fails, display the error that occurred when downloading the chunk
Rather than the error that occurred when trying to download the unchunked
content, which is less likely to actually be stored in the remote.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-10-14 12:45:05 -04:00
Joey Hess
17a0fa3dbc
negotiate P2P protocol version for tor remotes
This negotiation is not supported by versions of git-annex older
than 6.20180312. Well, maybe really 6.20180227 or so, but using that in
the changelog simplifies things since it was the version for the other
changes as well.

See commit c81768d425 for the back story.

As well as allowing for future protocol improvements, this will result
in negoatiating protocol version 1, which is an improvement over default
version 0.

In fact, it looks like no supported version of git-annex will use
protocol version 0, since version 1 was introduced in 6.20180227.
Still, removing the code for version 0 seems unncessary.
See commit 31e1adc005.

Sponsored-by: Brett Eisenberg on Patreon.
2021-10-11 15:58:51 -04:00