Commit graph

1572 commits

Author SHA1 Message Date
Joey Hess
2807ab0a09
gcrypt: Remove empty hash directories when dropping content
As was recently done with the directory special remote.

Note that the top directory passed to removeDirGeneric was changed to
avoid deleting .git/annex or .git/annex/objects if they ended up empty.

Sponsored-by: Brett Eisenberg on Patreon
2023-07-21 16:04:11 -04:00
Joey Hess
b15366494a
directory: Remove empty hash directories when dropping content
Failure to remove is not treated as a problem, and no permissions
modifications are done, to avoid unexpected states.

Sponsored-by: Luke Shumaker on Patreon
2023-07-21 14:57:29 -04:00
Joey Hess
33ba537728
deal with Amazon S3 breaking change for public=yes
* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
  so public=yes cannot be used with them. Existing buckets configured
  with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
  buckets that are configured with a Bucket Policy that allows public
  access.

Sponsored-by: Joshua Antonishen on Patreon
2023-07-21 13:59:07 -04:00
Joey Hess
6e059b42da
fix build warnings 2023-06-21 13:06:11 -04:00
Joey Hess
a861d56428
httpalso: Support being used with special remotes that use chunking.
Sponsored-by: k0ld on Patreon
2023-06-20 13:35:28 -04:00
Joey Hess
fe1b2dfb4b
speed up very first tree import by 25%
Reading from the cidsdb is responsible for about 25% of the runtime of
an import. Since the cidmap is used to store the same information in
ram, the cidsdb is not written to during an import any longer. And so,
if it started off empty (and updateFromLog wasn't needed), those reads
can just be skipped.

This is kind of a cheesy optimisation, since after any import from any
special remote, the database will no longer be empty, so it's a single
use optimisation. But it's probably not uncommon to start by importing a
lot of files, and it can save a lot of time then.

Sponsored-by: Brock Spratlen on Patreon
2023-06-02 13:30:30 -04:00
Joey Hess
aff37fc208
avoid annexFileMode special case
This makes annexFileMode be just an application of setAnnexPerm',
which avoids having 2 functions that do different versions of the same
thing.

Fixes some buggy behavior for some combinations of core.sharedRepository
and umask.

Sponsored-by: Jack Hill on Patreon
2023-04-27 15:58:37 -04:00
Joey Hess
d11b3bc1af
Honor --force option when operating on a local git remote
Propagate Annex.force into the remote's Annex state.

Fixes this problem:

joey@darkstar:~/tmp/xxxx>git-annex copy mmm --to origin --force
copy mmm (to origin...)
  not enough free space, need 908.72 MB more (use --force to override this check or adjust annex.diskreserve)

  failed to send content to remote
failed

Does beg the question if anything else should be propagated.
Some things like Annex.forcenumcopies certianly not; using --numcopies
overrides the number of copies the current repo wants, not all of them.

Sponsored-by: Graham Spencer on Patreon
2023-04-19 12:53:58 -04:00
Joey Hess
8728695b9c
support enableremote of git repo changing eg autoenable=
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.

The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=

Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.

Sponsored-by: Jack Hill on Patreon
2023-04-18 14:00:24 -04:00
Joey Hess
8b6c7bdbcc
filter out control characters in all other Messages
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brock Spratlen on Patreon
2023-04-11 12:58:01 -04:00
Joey Hess
3290a09a70
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.

Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.

When json is being output, no quoting is done, since json gets its own
quoting.

This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 15:55:44 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
18d326cb6f
external protocol VERSION 2
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs neededing to
work around the above bug. External remote program that support
exporttree=yes are recommended to be updated to send VERSION 2.

Sponsored-by: Kevin Mueller on Patreon
2023-03-28 17:00:08 -04:00
Joey Hess
02662f5292
fix concurrency bug causing EXPORT to be sent to the wrong external
Fix bug that caused broken protocol to be used with external remotes that
use exporttree=yes. In some cases this could result in the wrong content
being exported to, or retrieved from the remote.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-03-28 15:21:10 -04:00
Joey Hess
24ae4b291c
addurl, importfeed: Fix failure when annex.securehashesonly is set
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.

Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.

Sponsored-by: unqueued on Patreon
2023-03-27 15:10:46 -04:00
Joey Hess
cd076cd085
Windows: Support urls like "file:///c:/path"
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.

Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.

An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.

It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-03-27 13:38:02 -04:00
Joey Hess
a0badc5069
sync: Fix parsing of gcrypt::rsync:// urls that use a relative path
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.

(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)

Sponsored-by: Jack Hill on Patreon
2023-03-23 15:20:00 -04:00
Joey Hess
e822df2a09
fix build warnings on windows 2023-03-21 18:41:23 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Yaroslav Halchenko
0ae5ff797f
Typo: sansative -> sensitive 2023-03-17 15:14:50 -04:00
Yaroslav Halchenko
e018ae1125
Fix ambigous typos 2023-03-17 15:14:47 -04:00
Joey Hess
398633c12b
fix build on windows 2023-03-03 12:58:39 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
672258c8f4
Revert "revert recent bug fix temporarily for release"
This reverts commit 16f1e24665.
2023-02-14 14:11:23 -04:00
Joey Hess
16f1e24665
revert recent bug fix temporarily for release
Decided this bug is not severe enough to delay the release until
tomorrow, so this will be re-applied after the release.
2023-02-14 14:06:29 -04:00
Joey Hess
c1ef4a7481
Avoid Git.Config.updateLocation adding "/.git" to the end of the repo
path to a bare repo when git config is not allowed to list the configs
due to the CVE-2022-24765 fix.

That resulted in a confusing error message, and prevented the nice
message that explains how to mark the repo as safe to use.

Made isBare a tristate so that the case where core.bare is not returned can
be handled.

The handling in updateLocation is to check if the directory
contains config and objects and if so assume it's bare.
Note that if that heuristic is somehow wrong, it would construct a repo
that thinks it's bare but is not. That could cause follow-on problems,
but since git-annex then checks checkRepoConfigInaccessible, and skips
using the repo anyway, a wrong guess should not be a problem.

Sponsored-by: Luke Shumaker on Patreon
2023-02-14 14:00:36 -04:00
Joey Hess
e44716f277
upper case start of warning 2023-02-14 12:34:46 -04:00
Joey Hess
7c69f3a75a
add space after colon in warning message 2023-02-14 12:34:07 -04:00
Joey Hess
143de24769
fix build warning 2023-02-06 16:30:50 -04:00
Joey Hess
04ec726d3b
S3 region=
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.

datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.

Sponsored-By: Brett Eisenberg on Patreon
2023-02-06 14:08:45 -04:00
Joey Hess
3f5d8e2211
correct obsolete comment 2023-01-31 14:42:26 -04:00
Joey Hess
cfaae7e931
added an optional cost= configuration to all special remotes
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.

In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.

Sponsored-by: Dartmouth College's DANDI project
2023-01-12 13:42:28 -04:00
Joey Hess
400ce29a94
remove a debug print and fix build 2023-01-12 13:18:25 -04:00
Joey Hess
8a305e5fa3
respect urlinclude/urlexclude of other web special remotes
When a web special remote does not have urlinclude/urlexclude
configured, make it respect the configuration of other web special
remotes and avoid using urls that match the config of another.

Note that the other web special remote does not have to be enabled.
That seems ok, it would have been extra work to check for only ones that
are enabled.

The implementation does mean that the web special remote re-parses
its own config once at startup, as well as re-parsing the configs of any
other web special remotes. This should be a very small slowdown
unless there are lots of web special remotes.

Sponsored-by: Dartmouth College's DANDI project
2023-01-10 14:58:53 -04:00
Joey Hess
6fa166e1fc
web: Add urlinclude and urlexclude configuration settings
Sponsored-by: Dartmouth College's DANDI project
2023-01-09 17:16:53 -04:00
Joey Hess
8d06930c88
web special remote is no longer a singleton
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.

When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.

Sponsored-by: Dartmouth College's DANDI project
2023-01-09 15:49:20 -04:00
Joey Hess
f316b7f105
Revert "Removed the vendored git-lfs and the GitLfs build flag"
This reverts commit efda811404.

Turns out that datalad is building git-annex against debian bullseye.
https://github.com/datalad/git-annex/issues/149
2023-01-04 17:33:29 -04:00
Joey Hess
efda811404
Removed the vendored git-lfs and the GitLfs build flag
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.

Debian stable does have a too old haskell-git-lfs package to be able to
build git-annex from source, but there is not currently a backport of a
recent git-annex to Debian stable. And if they update the backport at some
point, they should be able to backport the library too.

Sponsored-by: Svenne Krap on Patreon
2022-12-26 12:49:53 -04:00
Joey Hess
9d60385001
convert renameFile to moveFile to support cross-device moves
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.

The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.

As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:17:50 -04:00
Joey Hess
27444459e9
fix build warning 2022-11-09 15:33:46 -04:00
Joey Hess
c69d340ce5
remove dangling where clause 2022-11-09 13:25:05 -04:00
Joey Hess
e100993935
complete support for S3 signature=anonymous
aws-0.23 has been released.

When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.

Sponsored-by: Dartmouth College's DANDI project
2022-11-04 16:20:28 -04:00
Joey Hess
de1e8201a6
Merge branch 'master' into anons3 2022-11-04 15:08:29 -04:00
Joey Hess
ba7ecbc6a9
avoid flushing keys db queue after each Annex action
The flush was only done Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)

Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex sqitched to using ResourceT..).
The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.

In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses on onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.

In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.

The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.

After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 14:12:23 -04:00
Joey Hess
b4305315b2
S3: pass fileprefix into getBucket calls
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.

getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:37:26 -04:00
Joey Hess
ca91c3ba91
S3: Support signature=anonymous to access a S3 bucket anonymously
This can be used, for example, with importtree=yes to import from a public
bucket.

This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281

The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:02:45 -04:00
Joey Hess
90f9671e00
future proof AWS.Credentials generation
Avoid breaking when a field is added to the constructor.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 16:33:21 -04:00
Joey Hess
0756f4453d
try retrieval from more than one export location when the first fails
Combined with commit 0ffc59d341, this
fixes the case where there are duplicate files on the special remote,
and the first gets modified/deleted, while the second is still present.

directory, adb: Fixed a bug when importtree=yes, and multiple files in the
special remote have the same content, that caused it to refuse to get a
file from the special remote, incorrectly complaining that it had changed,
due to only accepting the inode+mtime of one file (that was since modified
or deleted) and not accepting the inode+mtime of other duplicate files.

Sponsored-by: Max Thoursie on Patreon
2022-09-20 13:33:57 -04:00
Joey Hess
0ffc59d341
change retrieveExportWithContentIdentifier to take a list of ContentIdentifier
This partly fixes an issue where there are duplicate files in the
special remote, and the first file gets swapped with another duplicate,
or deleted. The swap case is fixed by this, the deleted case will need
other changes.

This makes retrieveExportWithContentIdentifier take a list of allowed
ContentIdentifier, same as storeExportWithContentIdentifier,
removeExportWithContentIdentifier, and
checkPresentExportWithContentIdentifier.

Of the special remotes that support importtree, borg is a special case
and does not use content identifiers, S3 I assume can't get mixed up
like this, directory certainly has the problem, and adb also appears to
have had the problem.

Sponsored-by: Graham Spencer on Patreon
2022-09-20 13:19:42 -04:00
Joey Hess
1fe9cf7043
deal with ignoreinode config setting
Improve handling of directory special remotes with importtree=yes whose
ignoreinode setting has been changed. (By either enableremote or by
upgrading to commit 3e2f1f73cbc5fc10475745b3c3133267bd1850a7.)

When getting a file from such a remote, accept the content that would have
been accepted with the previous ignoreinode setting.

After a change to ignoreinode, importing a tree from the remote will
re-import and generate new content identifiers using the new config. So
when ignoreinode has changed to no, the inodes will be learned, and after
that point, a change in an inode will be detected as a change. Before
re-importing, a change in an inode will be ignored, as it was before the
ignoreinode change. This seems acceptble, because the user can re-import
immediately if they urgently need to add inodes. And if not, they'll
do it sometime, presumably, and the change will take effect then.

Sponsored-by: Erik Bjäreholt on Patreon
2022-09-16 14:11:25 -04:00
Joey Hess
c62fe5e9a8
avoid redundant prompt for http password in git-annex get that does autoinit
autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git
remote has not been probed yet, that will do a http get that will prompt
for a password. And then the parent process will subsequently prompt
for a password when getting annexed files from the remote.

So the solution is for autoEnableSpecialRemotes to run remoteList before
the subprocess, which will probe for the uuid for the git remote in the
same process that will later be used to get annexed files.

But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git,
so Annex.Init cannot import Remote.List. Had to pass remoteList into
functions in Annex.Init to get around this dependency loop.
2022-09-09 14:43:43 -04:00
Joey Hess
8a4cfd4f2d
use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink
Fix crash importing from a directory special remote that contains a broken
symlink.

The crash was in listImportableContentsM but some other places in
Remote.Directory also seemed like they could have the same problem.

Also audited for other places that have such a problem. Not all calls
to getFileStatus are bad, in some cases it's better to crash on something
unexpected. For example, `git-annex import path` when the path is a broken
symlink should crash, the same as when it does not exist. Many of the
getFileStatus calls are like that, particularly when they involve
.git/annex/objects which should never have a broken symlink in it.

Fixed a few other possible cases of the problem.

Sponsored-by: Lawrence Brogan on Patreon
2022-09-05 13:46:32 -04:00
Yaroslav Halchenko
0151976676
Typo fix unncessary -> unnecessary.
Detected while reading recent CHANGELOG entry but then decided to apply
to entire codebase and docs since why not?
2022-08-20 09:40:19 -04:00
Joey Hess
ed39979ac8
import: Avoid following symbolic links inside directories being imported
Too big a footgun.

This does not prevent attackers who can write to the directory being
imported from racing the check. But they can cause anything to be imported
anyway, so would be limited to getting the legacy import to follow into a
directory they do not write to, and move files out of it into the annex.
(The directory special remote does not have that problem since it does not
move files.)

Sponsored-by: Jack Hill on Patreon
2022-08-19 13:31:16 -04:00
Joey Hess
23c6e350cb
improve createDirectoryUnder to allow alternate top directories
This should not change the behavior of it, unless there are multiple top
directories, and then it should behave the same as if there was a single
top directory that was actually above the directory to be created.

Sponsored-by: Dartmouth College's Datalad project
2022-08-12 12:52:37 -04:00
Joey Hess
e60766543f
add annex.dbdir (WIP)
WIP: This is mostly complete, but there is a problem: createDirectoryUnder
throws an error when annex.dbdir is set to outside the git repo.

annex.dbdir is a workaround for filesystems where sqlite does not work,
due to eg, the filesystem not properly supporting locking.

It's intended to be set before initializing the repository. Changing it
in an existing repository can be done, but would be the same as making a
new repository and moving all the annexed objects into it. While the
databases get recreated from the git-annex branch in that situation, any
information that is in the databases but not stored in the branch gets
lost. It may be that no information ever gets stored in the databases
that cannot be reconstructed from the branch, but I have not verified
that.

Sponsored-by: Dartmouth College's Datalad project
2022-08-11 16:58:53 -04:00
Joey Hess
2c1288334d
Avoid running bup join concurrently with bup split
On the bup mailing list, this was hypothesized as also being a
concurrency problem.

Sponsored-by: Svenne Krap on Patreon
2022-08-09 10:40:45 -04:00
Joey Hess
abd417d4fe
Avoid running multiple bup split processes concurrently
Since bup split is not concurrency safe.

Used a lock file so that 2 git-annex processes only run one bup split
between them (per bup repo).

(Concurrent writes from different git-annex repository clones to the same
bup repo could still have concurrency problems.)

Sponsored-by: Noam Kremen on Patreon
2022-08-08 18:54:06 -04:00
Joey Hess
04247fb4d0
avoid surprising "not found" error when copying to a http remote
git-annex copy --to a http remote will of course fail, as that's not
supported. But git-annex copy first checks if the content is already
present in the remote, and that threw a "not found".

Looks to me like other remotes that use Url.checkBoth in their checkPresent
do just return false when it fails. And Url.checkBoth does display
errors when unusual errors occur. So I'm pretty sure removing this error
message is ok.

Sponsored-by: Jarkko Kniivilä on Patreon
2022-08-08 11:57:24 -04:00
Joey Hess
5bc70e2da5
When bup split fails, display its stderr
It seems worth noting here that I emailed bup's author about bup split
being noisy on stderr even with -q in approximately 2011. That never got
fixed. Its current repo on github only accepts pull requests, not bug
reports. Needing to add such complexity to deal with such a longstanding
unfixed issue is not fun.

Sponsored-by: Kevin Mueller on Patreon
2022-08-05 13:57:20 -04:00
Joey Hess
f94908f2a6
improve output when storing to bup
bup split outputs to stderr even with -q. This was discarded when using -J,
but it was still outputting when not using -J, and so was git-annex.

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-08-05 12:29:33 -04:00
Joey Hess
093ad89ead
S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes
It does not make sense for either; importing from an existing bucket should
not write to it. And the user may not have write access at all. And exporting to
a bucket should not write other files.

Also this prevents the uuid file being imported after being written.

Sponsored-by: Dartmouth College's DANDI project
2022-07-14 15:05:51 -04:00
Joey Hess
50c2cac7e7
adb: Added configuration setting oldandroid=true
To avoid using find -printf, which was first supported in Android around
2019-2020.

Probing seems too fragile, and execing stat once per file is too slow to do
when there's a faster way available, which brought me to an option...

Sponsored-by: Brett Eisenberg on Patreon
2022-07-13 18:00:47 -04:00
Joey Hess
2d65c4ff1d
avoid unix-compat's rename
On Windows, that does not support long paths
https://github.com/jacobstanley/unix-compat/issues/56

Instead, use System.Directory.renamePath, which does support long paths.

Sponsored-by: Dartmouth College's Datalad project
2022-07-12 14:55:02 -04:00
Joey Hess
21c50c0f72
fix parallel copy from/to a local git repo
Improve handling of parallelization with -J when copying content from/to a
git remote that is a local path.

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-06-29 12:40:12 -04:00
Joey Hess
cb9cf30c48
move several readonly values to AnnexRead
This improves performance to a small extent in several places.

Sponsored-by: Tobias Ammann on Patreon
2022-06-28 15:40:19 -04:00
Joey Hess
debcf86029
use RawFilePath version of rename
Some small wins, almost certianly swamped by the system calls, but still
worthwhile progress on the RawFilePath conversion.

Sponsored-by: Erik Bjäreholt on Patreon
2022-06-22 16:47:34 -04:00
Joey Hess
95a04920cf
remove objectDir' 2022-06-22 16:08:49 -04:00
Joey Hess
13fc6a9b6a
fix to use 1 chunk for empty file
Fix retrival of an empty file that is stored in a special remote with
chunking enabled.

The speculative chunk stuff caused a reversion by adding an empty list for
the empty file. Which is just wrong; the empty file is still stored on the
remote, and should be retrieved like any other file. It uses 1 chunk, so
`max 1` is the simple fix.

Sponsored-by: Noam Kremen on Patreon
2022-06-09 14:24:56 -04:00
Joey Hess
f30532614f
fix typo 2022-06-09 13:40:05 -04:00
Joey Hess
14584e7a38
initremote type=git probe uuid
rather than matching path of an existing remote to find the uuid.

The main benefit of this is that locations not using ssh:// will work
now, including both paths and host:/path

The other benefit is that it's a simpler interface, no need to have an
existing remote with the same url and some other name. Although that
will still work of course.

This does rely on tryGitConfigRead working when given a Git.Repo that is
not a remote. Luckily, it works fine that way.

Also, tryGitConfigRead will auto-init a local repo that has a git-annex
branch. I did not enable auto-init of ssh repos though.

The uuid discovery actually happens twice; initremote discovers it,
and uses it to store the special remote config, but does not set it in the
git remote it creates. So the next run of git-annex does uuid discovery
again, and caches it that time. This could be improved for a tiny
speedup, but I didn't want to complicate things for that in this
commit.

Sponsored-by: Dartmouth College's DANDI project
2022-06-09 13:16:50 -04:00
Joey Hess
54809e9eb3
fix untrustworthiness of import/export remotes
Commit 36133f27c0 had a boolean flip in it,
aaargh.

Special remotes with importtree=yes or exporttree=yes are once again
treated as untrusted, since files stored in them can be deleted or modified
at any time.

Sponsored-by: Kevin Mueller on Patreon
2022-05-09 15:53:23 -04:00
Joey Hess
e8a601aa24
incremental verification for retrieval from import remotes
Sponsored-by: Dartmouth College's Datalad project
2022-05-09 15:39:43 -04:00
Joey Hess
2f2701137d
incremental verification for retrieval from all export remotes
Only for export remotes so far, not export/import.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 13:49:33 -04:00
Joey Hess
90950a37e5
support incremental verification when retrieving from export/import remotes
None of the special remotes do it yet, but this lays the groundwork.

Added MustFinishIncompleteVerify so that, when an incremental verify is
started but not complete, it can be forced to finish it. Otherwise, it
would have skipped doing it when verification is disabled, but
verification must always be done when retrievin from export remotes
since files can be modified during retrieval.

Note that retrieveExportWithContentIdentifier doesn't support incremental
verification yet. And I'm not sure if it can -- it doesn't know the Key
before it downloads the content. It seems a new API call would need to
be split out of that, which is provided with the key.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 12:25:04 -04:00
Joey Hess
43701759a3
disable shellescape for rsync 3.2.4
rsync 3.2.4 broke backwards-compatability by preventing exposing filenames
to the shell. Made the rsync and gcrypt special remotes detect this and
disable shellescape.

An alternative fix would have been to always set RSYNC_OLD_ARGS=1.
Which would avoid the overhead of probing rsync --help for each affected
remote. But that is really very fast to run, and it seemed better to switch
to the modern code path rather than keeping on using the bad old code path.

Sponsored-by: Tobias Ammann on Patreon
2022-05-03 12:12:41 -04:00
Joey Hess
3e2f1f73cb
add back inode to directory special remote ContentIdentifier
Directory special remotes with importtree=yes have changed to once more
take inodes into account. This will cause extra work when importing from a
directory on a FAT filesystem that changes inodes on every mount.

To avoid that extra work, set ignoreinodes=yes when initializing a new
directory special remote, or change the configuration of your existing
remote: git-annex enableremote foo ignoreinodes=yes

This will mean a one-time re-import of all contents from every directory
special remote due to the changed setting.

73df633a62 thought
it was too unlikely that there would be modifications that the inode number
was needed to notice. That was probably right; it's very unlikely that a
file will get modified and end up with the same size and mtime as before.
But, what was not considered is that a program like NextCloud might write
two files with different content so closely together that they share the
mtime. The inode is necessary to detect that situation.

Sponsored-by: Max Thoursie on Patreon
2022-03-21 13:12:02 -04:00
Joey Hess
952664641a
turn of PackageImports in cabal file
This makes it easier to build eg benchmarks of individual modules.

May be that most of these PackageImports are not really necessary,
dunno.
2022-02-25 13:16:36 -04:00
Joey Hess
a32ff6cef0
adb: Avoid find failing with "Argument list too long"
The "+" argument only runs the command once, so is not safe to use. Using
";" instead would have been the simplest fix, but also the slowest.

Since my phone has an xargs that supports -0, I piped find to xargs
instead. Unsure how portable this will be, perhaps some android's don't
have xargs -0 or find -printf to send null terminated output.

The business with pipefail is necessary to make a failure of find cause the
import to fail. Probably this works on all androids, but if not, it will
probably just result in a failure of find being ignored. It would be
possible to make ignorefinderror just disable setting pipefail, but then
if some android has a shell that has pipefail enabled by default, ignorefinderror
would not work, so I kept the || true approach for that.

Sponsored-by: Max Thoursie on Patreon
2022-01-31 13:19:09 -04:00
Joey Hess
525473aa5a
adb: Added ignorefinderror configuration parameter
On a phone with Calyxos, adb find in /sdcard complains:

find: ./Android/data/com.android.providers.downloads.ui: Permission denied

But otherwise works, so this option makes import and export work ok, except
for that one app's data.

Sponsored-by: Graham Spencer
2022-01-10 21:17:00 -04:00
Joey Hess
e95747a149
fix handling of corrupted data received from git remote
Recover from corrupted content being received from a git remote due eg to a
wire error, by deleting the temporary file when it fails to verify. This
prevents a retry from failing again.

Reversion introduced in version 8.20210903, when incremental verification
was added.

Only the git remote seems to be affected, although it is certianly
possible that other remotes could later have the same issue. This only
affects things passed to getViaTmp that return (False, UnVerified) due to
verification failing. As far as getViaTmp can tell, that could just as well
mean that the transfer failed in a way that would resume, so it cannot
delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally,
while other remotes do not, which is why only it seems affected.

A better fix perhaps would be to improve the types of the callback
passed to getViaTmp, so that some other value could be used to indicate
the state where the transfer succeeded but verification failed.

Sponsored-by: Boyd Stephen Smith Jr.
2022-01-07 13:25:33 -04:00
Joey Hess
0584e096d1
comment 2022-01-03 13:53:34 -04:00
Joey Hess
19b87f7396
avoid no longer necessary piping of ssh stderr for p2pstdio
This was needed when supporting old git-annex-shell that do not support
p2pstdio yet, in order to cleanly fall back to the old interface without
error messages being displayed. That is no longer supported, so simplify
to not intercept error messages.

Sponsored-by: Dartmouth College's DANDI project
2022-01-03 12:54:40 -04:00
Joey Hess
92cc28c316
remove obsolete comment 2022-01-03 12:38:29 -04:00
Joey Hess
4b19626a36
Fix build with ghc 9.0.1
Continuing along the same lines as commit
2739adc258, it seems that
while Remote -> Retriever expands to the same data type this changes
it to, ghc 9.0.1 refuses to consider them equiviant. I guess it has
something to do with the forall?

The rest of the build all succeeds, although the stack build then crashes:
Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ...
Completed 233 action(s).
Prelude.chr: bad argument: 2214592520
This issue seems likely to be about it:
https://github.com/commercialhaskell/stack/pull/5508
I'm building with stack from debian, version 2.3.3, so a newer stack
probably avoids that. Anyway, despite that stack problem,
the git-annex binary is built, and works.

The stack.yaml I used for this build was patched as follows:

diff --git a/stack.yaml b/stack.yaml
index 8dac87c15..62c4b5b9d 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -1,6 +1,6 @@
 flags:
   git-annex:
-    production: true
+    production: false
     assistant: true
     pairing: true
     torrentparser: true
@@ -14,7 +14,7 @@ flags:
     httpclientrestricted: true
 packages:
 - '.'
-resolver: lts-18.13
+resolver: nightly-2021-09-07
 extra-deps:
 - IfElse-0.85
 - aws-0.22

Sponsored-by: Graham Spencer on Patreon
2021-12-08 15:08:02 -04:00
Joey Hess
f3326b8b5a
git-lfs gitlab interoperability fix
git-lfs: Fix interoperability with gitlab's implementation of the git-lfs
protocol, which requests Content-Encoding chunked.

Sponsored-by: Dartmouth College's Datalad project
2021-11-10 13:51:11 -04:00
Joey Hess
8034f2e9bb
factor out IncrementalHasher from IncrementalVerifier 2021-11-09 12:33:22 -04:00
Joey Hess
29d687dce9
When retrival from a chunked remote fails, display the error that occurred when downloading the chunk
Rather than the error that occurred when trying to download the unchunked
content, which is less likely to actually be stored in the remote.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-10-14 12:45:05 -04:00
Joey Hess
17a0fa3dbc
negotiate P2P protocol version for tor remotes
This negotiation is not supported by versions of git-annex older
than 6.20180312. Well, maybe really 6.20180227 or so, but using that in
the changelog simplifies things since it was the version for the other
changes as well.

See commit c81768d425 for the back story.

As well as allowing for future protocol improvements, this will result
in negoatiating protocol version 1, which is an improvement over default
version 0.

In fact, it looks like no supported version of git-annex will use
protocol version 0, since version 1 was introduced in 6.20180227.
Still, removing the code for version 0 seems unncessary.
See commit 31e1adc005.

Sponsored-by: Brett Eisenberg on Patreon.
2021-10-11 15:58:51 -04:00
Joey Hess
e43aaa22be
Merge branch 'p2pflagday' 2021-10-11 15:42:52 -04:00
Joey Hess
7bdc7350a5
remove git-annex-shell compat code
* Removed support for accessing git remotes that use versions of
  git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
  support git-annex versions older than 6.20180312.
  (lockcontent, recvkey, sendkey, transferinfo, commit)

The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.

"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.

git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.

Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.

The main win here is really in Remote.Git, removing piles of hairy fallback
code.

Sponsored-by: Luke Shumaker
2021-10-11 15:36:51 -04:00
Joey Hess
2e94ba9c70
remove broken code
git-annex-shell fsck has never worked, back in
commit 1ffb3bb0ba I discussed maybe adding
it one day, but this code has always failed.
2021-10-11 14:59:27 -04:00
Joey Hess
69f8e6c7c0
ImportableContentsChunkable
This improves the borg special remote memory usage, by
letting it only load one archive's worth of filenames into memory at a
time, and building up a larger tree out of the chunks.

When a borg repository has many archives, git-annex could easily OOM
before. Now, it will use only memory proportional to the number of
annexed keys in an archive.

Minor implementation wart: Each new chunk re-opens the content
identifier database, and also a new vector clock is used for each chunk.
This is a minor innefficiency only; the use of continuations makes
it hard to avoid, although putting the database handle into a Reader
monad would be one way to fix it.

It may later be possible to extend the ImportableContentsChunkable
interface to remotes that are not third-party populated. However, that
would perhaps need an interface that does not use continuations.

The ImportableContentsChunkable interface currently does not allow
populating the top of the tree with anything other than subtrees. It
would be easy to extend it to allow putting files in that tree, but borg
doesn't need that so I left it out for now.

Sponsored-by: Noam Kremen on Patreon
2021-10-08 13:15:22 -04:00
Joey Hess
19e78816f0
convert Key to ShortByteString
This adds the overhead of a copy when serializing and deserializing keys.
I have not benchmarked much, but runtimes seem barely changed at all by that.

When a lot of keys are in memory, it improves memory use.

And, it prevents keys sometimes getting PINNED in memory and failing to GC,
which is a problem ByteString has sometimes. In particular, git-annex sync
from a borg special remote had that problem and this improved its memory
use by a large amount.

Sponsored-by: Shae Erisson on Patreon
2021-10-05 20:20:08 -04:00
Joey Hess
7ccf642863
revert change that broke test_readonly
commit 63d508e885 broke test_readonly.
When a local git remote is readonly, tryCopyCoW run to copy a file
from it failed at withOtherTmp.

Sponsored-by: Dartmouth College's Datalad project
2021-09-27 16:02:41 -04:00
Joey Hess
798b33ba3d
simplify annex.bwlimit handling
RemoteGitConfig parsing looks for annex.bwlimit when a remote
does not have a per-remote config for it, so no need for a separate
gobal config.

Sponsored-by: Svenne Krap on Patreon
2021-09-22 10:52:01 -04:00
Joey Hess
05a097cde8
Merge branch 'master' into bwlimit 2021-09-22 10:48:27 -04:00
Joey Hess
63d508e885
resume properly when copying a file to/from a local git remote is interrupted
Probably this fixes a reversion, but I don't know what version broke it.

This does use withOtherTmp for a temp file that could be quite large.
Though albeit a reflink copy that will not actually take up any space
as long as the file it was copied from still exists. So if the copy cow
succeeds but git-annex is interrupted just before that temp file gets
renamed into the usual .git/annex/tmp/ location, there is a risk that
the other temp directory ends up cluttered with a larger temp file than
later. It will eventually be cleaned up, and the changes of this being
a problem are small, so this seems like an acceptable thing to do.

Sponsored-by: Shae Erisson on Patreon
2021-09-21 17:43:35 -04:00
Joey Hess
18e00500ce
bwlimit
Added annex.bwlimit and remote.name.annex-bwlimit config that works for git
remotes and many but not all special remotes.

This nearly works, at least for a git remote on the same disk. With it set
to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with
occasional spikes to 160 kb/s. So it needs to delay just a bit longer...
I'm unsure why.

However, at the beginning a lot of data flows before it determines the
right bandwidth limit. A granularity of less than 1s would probably improve
that.

And, I don't know yet if it makes sense to have it be 100ks/1s rather than
100kb/s. Is there a situation where the user would want a larger
granularity? Does granulatity need to be configurable at all? I only used that
format for the config really in order to reuse an existing parser.

This can't support for external special remotes, or for ones that
themselves shell out to an external command. (Well, it could, but it
would involve pausing and resuming the child process tree, which seems
very hard to implement and very strange besides.) There could also be some
built-in special remotes that it still doesn't work for, due to them not
having a progress meter whose displays blocks the bandwidth using thread.
But I don't think there are actually any that run a separate thread for
downloads than the thread that displays the progress meter.

Sponsored-by: Graham Spencer on Patreon
2021-09-21 16:58:10 -04:00
Joey Hess
2739adc258
fix build with ghc 9.0.1
I was not able to test the whole build because of a very strange
Prelude.chr: bad argument: 469762054
Which I assume is a problem with this version of ghc or the way I was
using stack.

The stack.yaml that builds it used this patch

diff --git a/stack.yaml b/stack.yaml
index 790bffff2..8bcbaa0ec 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -1,6 +1,6 @@
 flags:
   git-annex:
-    production: true
+    production: false
     assistant: true
     pairing: true
     torrentparser: true
@@ -18,13 +18,15 @@ extra-deps:
 - IfElse-0.85
 - aws-0.22
 - bloomfilter-2.0.1.0
-- filepath-bytestring-1.4.2.1.6
-- git-lfs-1.1.0
-- http-client-restricted-0.0.3
+- filepath-bytestring-1.4.2.1.8
+- git-lfs-1.1.1
+- http-client-restricted-0.0.4
 - network-multicast-0.3.2
 - sandi-0.5
 - torrent-10000.1.1
 - bencode-0.6.1.1
+- base16-bytestring-0.1.1.7
+- base64-bytestring-1.0.0.3
 explicit-setup-deps:
   git-annex: true
-resolver: lts-16.27
+resolver: nightly-2021-09-07
2021-09-07 16:53:07 -04:00