Commit graph

2646 commits

Author SHA1 Message Date
Joey Hess
3e5829721f
fix build 2023-04-12 14:31:56 -04:00
Joey Hess
11790df3e6
fix build 2023-04-12 14:18:29 -04:00
Joey Hess
3346aa9659
safe output to terminal for calckey inprogress and lookupkey
These are quite low-level, but still there is no point in displaying
escape sequences that have been embedded in a key to the terminal.

I think these are the only remaining commands that didn't use safe
output, except for cases where git-annex is speaking a protocol to
itself.

Sponsored-by: Kevin Mueller on Patreon
2023-04-12 14:03:44 -04:00
Joey Hess
a576fc3b12
fix mojibake reversion in display of utf8
When displaying a ByteString like "💕", safeOutput operates on
individual bytes like "\240\159\146\149" and isControl '\146' = True,
so it got truncated to just "\240".

So, only treat the low control characters, and DEL, as control
characters.

Also split Utility.Terminal out of Utility.SafeOutput. The latter needs
win32, but Utility.SafeOutput is used by Control.Exception, which is
used by Setup.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-12 13:53:30 -04:00
Joey Hess
afa5b883dc
find, findkeys, examinekey: escape output to terminal when --format is not used
Note that filenames are not quoted, only escaped. This is to match the
output of --format with escaping.

Sponsored-by: Lawrence Brogan on Patreon
2023-04-11 15:27:07 -04:00
Joey Hess
df6f9f1ee8
filter out control characters and quote filenames
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.

This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.

A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.

Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.

Sponsored-by: k0ld on Patreon
2023-04-11 14:27:22 -04:00
Joey Hess
8b6c7bdbcc
filter out control characters in all other Messages
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brock Spratlen on Patreon
2023-04-11 12:58:01 -04:00
Joey Hess
a0e6fa18eb
eliminate showStart showStartOther
These were not handling control characters and are redundant.

Sponsored-by: Jack Hill on Patreon
2023-04-10 16:28:58 -04:00
Joey Hess
3290a09a70
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.

Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.

When json is being output, no quoting is done, since json gets its own
quoting.

This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 15:55:44 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
063c00e4f7
git style filename quoting for giveup
When the filenames are part of the git repository or other files that
might have attacker-controlled names, quote them in error messages.

This is fairly complete, although I didn't do the one in
Utility.DirWatcher.INotify.hs because that doesn't have access to
Git.Filename or Annex.

But it's also quite possible I missed some. And also while scanning for
these, I found giveup used with other things that could be attacker
controlled to contain control characters (eg Keys). So, I'm thinking
it would also be good for giveup to just filter out control characters.
This commit is then not the only line of defence, but just good
formatting when git-annex displays a filename in an error message.

Sponsored-by: Kevin Mueller on Patreon
2023-04-10 12:56:45 -04:00
Joey Hess
da83652c76
addurl --preserve-filename: reject control characters
As well as escape sequences, control characters seem unlikely to be desired when
doing addurl, and likely to trip someone up. So disallow them as well.

I did consider going the other way and allowing filenames with control characters
and escape sequences, since git-annex is in the process of escaping display
of all filenames. Might still be a better idea?

Also display the illegal filename git quoted when it rejects it.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-10 12:18:25 -04:00
Joey Hess
2ba1559a8e
git style quoting for ActionItemOther
Added StringContainingQuotedPath, which is used for ActionItemOther.

In the process, checked every ActionItemOther for those containing
filenames, and made them use quoting.

Sponsored-by: Graham Spencer on Patreon
2023-04-08 16:30:01 -04:00
Joey Hess
d689a5b338
git style filename quoting controlled by core.quotePath
This is by no means complete, but escaping filenames in actionItemDesc does
cover most commands.

Note that for ActionItemBranchFilePath, the value is branch:file, and I
choose to only quote the file part (if necessary). I considered quoting the
whole thing. But, branch names cannot contain control characters, and while
they can contain unicode, git coes not quote unicode when displaying branch
names. So, it would be surprising for git-annex to quote unicode in a
branch name.

The find command is the most obvious command that still needs to be
dealt with. There are probably other places that filenames also get
displayed, eg embedded in error messages.

Some other commands use ActionItemOther with a filename, I think that
ActionItemOther should either be pre-sanitized, or should explicitly not
be used for filenames, so that needs more work.

When --json is used, unicode does not get escaped, but control
characters were already escaped in json.

(Key escaping may turn out to be needed, but I'm ignoring that for now.)

Sponsored-by: unqueued on Patreon
2023-04-08 14:52:26 -04:00
Joey Hess
98a3ba0ea5
restore old registerurl location tracking behavior
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.

registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.

This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.

But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 17:06:44 -04:00
Joey Hess
2b940f7725
registerurl, unregisterurl: Added --remote option
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 15:54:41 -04:00
Joey Hess
24ae4b291c
addurl, importfeed: Fix failure when annex.securehashesonly is set
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.

Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.

Sponsored-by: unqueued on Patreon
2023-03-27 15:10:46 -04:00
Joey Hess
cd076cd085
Windows: Support urls like "file:///c:/path"
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.

Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.

An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.

It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-03-27 13:38:02 -04:00
Joey Hess
e900e3caf3
avoid build warning on windows 2023-03-27 12:21:40 -04:00
Joey Hess
a0badc5069
sync: Fix parsing of gcrypt::rsync:// urls that use a relative path
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.

(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)

Sponsored-by: Jack Hill on Patreon
2023-03-23 15:20:00 -04:00
Joey Hess
e822df2a09
fix build warnings on windows 2023-03-21 18:41:23 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Yaroslav Halchenko
e018ae1125
Fix ambigous typos 2023-03-17 15:14:47 -04:00
Joey Hess
f1b678face
copy --from --to location tracking update
copy: When --from and --to are combined and the content is already present
on the destination remote, update location tracking as necessary.

Sponsored-by: Dartmouth College's DANDI project
2023-03-13 14:51:09 -04:00
Joey Hess
2323af3736
importfeed: Display feed title
When importing a bunch of feeds, this makes it more clear what it's working
on. Also, I sometimes want to delete a particular feed from a list of feeds
but don't know which url belongs to the feed, and this solves that.

Control characters are filtered out just to protect against some feed
putting escape character stuff in the feed, which could be a
security problem. (Control characters also get filtered out of
importfeed filenames.)

Sponsored-by: Luke Shumaker on Patreon
2023-03-11 13:52:45 -04:00
Joey Hess
ff141c093e
include subdir when checking export branch is checked out
sync: Fix a reversion that prevented sending files to exporttree=yes
remotes when annex-tracking-branch was configured to branch:subdir
(Introduced in version 10.20230214)

Sponsored-by: Kevin Mueller on Patreon
2023-03-10 11:41:52 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
9c3c4c1712
deprecate git-annex status w/o runtime warning
As far as I can see, git-annex status was added to support direct mode, and
like other things added for that, it ought to be deprecated.

Behavior is similar to git status --short, though not identical in a few
cases eg renamed files.

I think datalad does not use this command, although it might have in the
past. Could not find any use of it in the current datalad code.

A deprecation warning at runtime would be the next step, probably will wait
and do that for all the deprecated commands together (except findref).
2023-02-28 16:34:31 -04:00
Joey Hess
80478cc145
support git-annex view in an adjusted branch
Rather than entering a view of the adjusted branch, enter an adjusted
view branch. This way, it's the same as first using git-amnnex view
followed by git-annex adjust, and everything already implemented to
support that works.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-02-27 15:48:58 -04:00
Joey Hess
1c4f4b449a
support --unlock-present adjustment of view branches
When generating the view, check if the key is present.

When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull
the same as is done when syncing in other adjusted branches. This is
needed because the docs for git-annex adjust --unlock-present suggest
using git-annex sync to update the branch when annex.adjustedbranchrefresh
is not set.

Note that, with annex.adjustedbranchrefresh set, it just works! The
adjusted branch gets updated in the usual way and it doesn't matter that
there's a view branch underneath.

And of course, re-running git-annex adjut --unlock-present also works,
as suggested in the docs.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-27 15:37:57 -04:00
Joey Hess
f09e299156
rawfilepath conversion 2023-02-27 15:06:32 -04:00
Joey Hess
cc32e31161
understand adjusted view branch names
An adjusted view branch has a name like
"refs/heads/adjusted/views/master(author=_)(unlocked)", so it is a view
branch that has been converted to an adjusted branch.

Made Logs.View support such branch names. So now git-annex sync and
pre-commit handle updating metadata on commit in such a branch.

Much remains to be done to fully support adjusted view branches,
including actually applying the adjustment when updating the view branch.

Sponsored-by: Graham Spencer on Patreon
2023-02-27 14:57:58 -04:00
Joey Hess
16d3097a08
fix reversion in info, and add test case
info: Fix reversion in last release involving handling of unsupported input
by continuing to handle any other inputs, before exiting nonzero at the
end.

Sponsored-by: Dartmouth College's Datalad project
2023-02-20 14:31:37 -04:00
Joey Hess
452b080dba
better handling of multiple repositories with the same name
Used to fail with a bad error message, indicating there was no
repository with the specified name, or something like that. Now, suggest
they use the uuid to disambiguate.

* info, enableremotemote, renameremote: Avoid a confusing message when more
  than one repository matches the user provided name.
* info: Exit nonzero when the input is not supported.

Sponsored-by: Kevin Mueller on Patreon
2023-02-13 14:31:09 -04:00
Joey Hess
e9b6efac5a
fix buggy sync to exporttree remote when annex-tracking-branch is not checked out
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.

Sponsored-by: Max Thoursie on Patreon
2023-02-10 15:49:15 -04:00
Joey Hess
c2b3e870df
finishing up sync in view branch
sync: When run in a view branch, avoid updating synced/ branches, or trying
to merge anything from remotes.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-10 15:27:42 -04:00
Joey Hess
5f9bf51438
sync in view branch updates the view branch
* sync: When run in a view branch, refresh the view branch to reflect any
    changes that have been made to the parent branch or metadata.

This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.

Sponsored-by: Lawrence Brogan on Patreon
2023-02-08 15:37:28 -04:00
Joey Hess
a11d6e0baf
avoid sync pushing view branches to remotes, and better view branch names
* sync: Avoid pushing view branches to remotes.
* Changed the name of view branches to include the parent branch.
  Existing view branches checked out using an old name will still work.

It does not seem useful for sync to push view branches around, because
the information in a view branch can entirely be derived from other
information in git. And sync doesn't push adjusted branches around either.

The better view branch names make it more in line with adjusted branch
names, but were also needed to make fromViewBranch be able to return the
original branch name.

Kept the old view branch names still working. But, when those branches
exist in a repo, sync will still try to push them as before. Avoiding
that would need more complicated and/or expensive changes to sync.

Sponsored-By: Boyd Stephen Smith Jr. on Patreon
2023-02-08 13:57:48 -04:00
Joey Hess
c209e0f643
add FIELD?=GLOB to git-annex view usage
And also to vadd usage.

Also added some other things to the usage that were omitted before to
save space.

Adding even FIELD?=GLOB made the git-annex --help list of commands grow
too wide for an 80 column display. So, removed the description of
parameters from that list of commands.

Sponsored-By: Brock Spratlen on Patreon
2023-02-07 18:09:10 -04:00
Joey Hess
aa0350ff49
add directory to views for files that lack specified metadata
* view: New field?=glob and ?tag syntax that includes a directory "_"
  in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
  "_" directory in a view.

When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.

annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.

annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.

Sponsored-By: Jack Hill on Patreon
2023-02-07 16:28:46 -04:00
Joey Hess
152be2948b
use transfer stages for copy --from
See commit e04a931439 for an explanation
of why move uses transfer stages for --from, but command stages for
--to. At the point of that commit, copy was actually already using
command stages for everything, so the commit was incorrect about
improving copy --to.

But, the same reasoning about --from applies to copy as to move; when
verification is not done incrementally, download and verification are
the main two stages. The cleanup stage for copy is even less work than
for move (it doesn't drop from the remote).

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 14:07:49 -04:00
Joey Hess
579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00
Joey Hess
77266e46dd
fix behavior of copy --from --to
Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:55:16 -04:00
Joey Hess
acc3f6211f
finishing up move --from --to
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.

Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.

Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.

I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.

So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:43:48 -04:00
Joey Hess
f5f799f17e
fully working move --from --to (not release quality)
When the destination already has a copy, it behaves the same as
drop --from really, but display it as a move and implement it
reusing the factored out code from fromPerform.

(Note that willDropMakeItWorse never returns DropAllowed in that
situation, because it's told that dest has a copy. So numcopies is
always checked.)

And when only the source and not the local repo or destination have a
copy, do the full copy from source to local, then copy from local to
dest, then drop from local, then drop from source dance.

This is complicated by fromPerform being hardcoded to assume there is a
local copy, but the local copy has already been dropped. That's why
it uses cleanupfromsrc RemoveNever to avoid the code that makes that
assumption, and finishes with a call to dropfromsrc.

And, since the location log has not yet been updated, checking numcopies
was not working, until I added UnVerifiedRemote dest to the list of
things to check.

This is not yet quite mergeable though. There are two things in the
comment above fromToPerform that are not implemented yet: Checking the
location log before dropping the local copy, and locking the temporary
local copy for drop.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 16:12:33 -04:00
Joey Hess
1abd457e98
push location log updating up to callers of download
Prep for move --to --from, which needs to download from a src repo
without updating the location log for the local repo, before sending the
content on to the dest repo.

Note that caller of download' already update the log themselves.
See previous commit a422a056f2
that pushed it up to download from getViaTmpFrom.

(Also removed in passing a debug print + readline that I accidentially
committed last week on this branch.)

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:47:41 -04:00
Joey Hess
8c349b8802
implement move --from --to when there is a local copy already
This is rather trivial, since it does not need to temporarily get the
local copy.

Added fromPerform' to handle the situation where the local copy
is dropped by another process during the copy to the dest. This avoids
ever re-downloading the local copy before dropping from the src.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:17:35 -04:00
Joey Hess
a46c385aec
move/copy: started implementing --from src --to dest
This is not in a usable state, but I have a possible plan for how to do
it.

Sponsored-by: Dartmouth College's DANDI project
2023-01-20 11:10:38 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
Joey Hess
f8bc208e89
findkeys: New command, very similar to git-annex find but operating on keys
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.

Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.

A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.

Also, cleaned up comments on git-annex-find man page asking for --all
option.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:51:57 -04:00