Commit graph

2812 commits

Author SHA1 Message Date
Joey Hess
549d390d03
display drop from remote more consistently
With eg copy --to remote

This is particularly an improvement in sync --content output, which
mixes the two, so it's nice to have consistent display.
2023-06-27 19:01:33 -04:00
Joey Hess
d98aa35b3b
reinject: Added --guesskeys option
Sponsored-by: Noam Kremen on Patreon
2023-06-26 14:05:31 -04:00
Joey Hess
39f3d783fe
consolidate 2023-06-20 15:10:11 -04:00
Joey Hess
72715845a1
display destination file before youtube-dl download
Rather than after it, which can leave one wondering what file it's
downloading.

youtubeDl should not ever return Right Nothing in normal operation,
becaause it's already asked youtube-dl if it supports the url. So it
would have to succeed at that, then not download any file, but also
exit successfully, in order for the new error message to display.

Also display the name of yt-dlp when using it.
2023-06-20 14:55:25 -04:00
Joey Hess
958c2fa6d2
Improve resuming interrupted download when using yt-dlp or youtube-dl
Fixes a failure like this:

curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.

That happens because the whole web page has already been downloaded
previously, and kept, so now addurl tries to download it, and curl asks the
server to resume from the last byte. And youtube.com can't, for whatever
stupid reason.

So, delete the temp file after determining that youtube-dl can be used.
2023-06-19 15:01:47 -04:00
Joey Hess
1f09b709fc
skip sending individual files to export remotes
That will fail, and it already exports whole trees.
f6dd34ca81 made it sync content with
import remotes, and if an import remote is also an export remote, that
caused this new failure mode.

Sponsored-by: Brock Spratlen on Patreon
2023-06-19 11:24:32 -04:00
Joey Hess
64738ea157
config: Added the --show-origin and --for-file options
* config: Added the --show-origin and --for-file options.
* config: Support annex.numcopies and annex.mincopies.

There is a little bit of redundancy here with other code elsewhere that
combines the various configs and selects which to use. But really only
for the special case of annex.numcopies, which is a git config that does
not override the annex branch setting and for annex.mincopies, which does
not have a git config but does have gitattributes settings as well as the
annex branch setting.

That seems small enough, and unlikely enough to grow into a mess that it was
worth supporting annex.numcopies and annex.mincopies in git-annex config
--show-origin. Because these settings are a prime thing that someone might
get confused about and want to know where they were configured.

And, it followed that git-annex config might as well support those two
for --set and --get as well. While this is redundant with the speclialized
commands, it's only a little code and it makes it more consistent.

Note that --set does not have as nice output as numcopies/mincopies
commands in some special cases like setting to 0 or a negative number.
It does avoid setting to a bad value thanks to the smart
constructors (eg configuredNumCopies).

As for other git-annex branch configurations that are not set by git-annex
config, things like trust and wanted that are specific to a repository
don't map to a git config name, so don't really fit into git-annex config.
And they are only configured in the git-annex branch with no local override
(at least so far), so --show-origin would not be useful for them.

Sponsored-by: Dartmouth College's DANDI project
2023-06-12 16:24:31 -04:00
Joey Hess
f6dd34ca81
sync content with import remotes
This didn't used to be needed because importKeys would import all
content and so doing another pass was redundant.

But since 40017089f2 it uses
importChanges, so only new files are imported. If a file that was
already imported before was dropped, that would prevent sync --content
from gettng its content again.

Sponsored-by: Jack Hill on Patreon
2023-06-01 18:52:19 -04:00
Joey Hess
40017089f2
use importChanges optimisation
Large speed up to importing trees from special remotes that contain a lot
of files, by only processing changed files.

Benchmarks:

Importing from a special remote that has 10000 files, that have all been
imported before, and 1 new file sped up from 26.06 to 2.59 seconds.

An import with no change and 10000 unchanged files sped up from 24.3 to
1.99 seconds.

Going up to 20000 files, an import with no changes sped up from
125.95 to 3.84 seconds.

Sponsored-by: k0ld on Patreon
2023-06-01 13:47:00 -04:00
Joey Hess
c6acf574c7
implement importChanges optimisaton (not used yet)
For simplicity, I've not tried to make it handle History yet, so when
there is a history, a full import will still be done. Probably the right
way to handle history is to first diff from the current tree to the last
imported tree. Then, diff from the current tree to each of the
historical trees, and recurse through the history diffing from child tree
to parent tree.

I don't think that will need a record of the previously imported
historical trees, and so Logs.Import doesn't store them. Although I did
leave room for future expansion in that log just in case.

Next step will be to change importTree to importChanges and modify
recordImportTree et all to handle it, by using adjustTree.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-31 16:01:34 -04:00
Joey Hess
f1cdb79ca4
assist: honor gitignore
Sponsored-by: Graham Spencer on Patreon
2023-05-24 14:04:09 -04:00
Joey Hess
0f89d221bd
version: Avoid error message when entire output is not read
Sponsored-by: Dartmouth College's Datalad project
2023-05-19 15:00:57 -04:00
Joey Hess
9ed59dab5b
assist: operate on all files in working tree by default
Consistency with sync and internal consistency is more important than
consistency with the assistant, which is not itself consistent about
what it does when run in a subdirectory.

Note that with -C, it will still commit staged changes to files outside
the directory. Like sync does. Presumably if the user is manually
staging things, then running this command, they intend to build up a
commit.

Sponsored-by: unqueued on Patreon
2023-05-19 14:47:05 -04:00
Joey Hess
908b9687cc
assist: fix bug committing just added file when -J is used
Need to wait for worker threads adding files before flushing the queue.
2023-05-18 15:02:10 -04:00
Joey Hess
fc3df2da9e
assist: fix bug commiting just added file 2023-05-18 14:56:13 -04:00
Joey Hess
9619f562f4
fix assist to commit 2023-05-18 14:50:05 -04:00
Joey Hess
9ec5d7fb05
improve --cleanup desc 2023-05-18 14:41:20 -04:00
Joey Hess
e955912ad0
git-annex assist
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.

(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)

See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.

Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.

Sponsored-by: k0ld on Patreon
2023-05-18 14:37:43 -04:00
Joey Hess
935330aaab
fix inverted logic (fixes test fail)
Sponsored-by: Jack Hill on Patreon
2023-05-18 10:48:08 -04:00
Joey Hess
f93a7fce1d
sync: Started transition to --content being enabled by default
When used without --content or --no-content, warn about the upcoming
transition, and suggest using one of the options, or setting
annex.synccontent.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-17 13:23:42 -04:00
Joey Hess
af6b73a7e6
push: Support --cleanup
This option is not specific to sync, so it seemed it should be in either
pull or push as well as sync. Since it does modify the remote, it seems
better to have it in push; the modification of the local repo pulls in
the direction of pull, but not hard enough.

Maybe it would be better to have it in both?

Sponsored-by: Luke Shumaker on Patreon
2023-05-17 12:51:49 -04:00
Joey Hess
40731ff9fd
sync: Added -g as a short option for --no-content
I anticipate that if sync is transitioned to syncing content by default,
people will want a short option. And in repositories where
annex.synccontent = true, they already would. And pull and push sync
content by default, so a short option is useful with them too.

Mnemonic: -g makes only git data be synced
Also, -a makes only annex data be synced.

Would have preferred -c, which would complement -C, but it
was already taken to set git configs.

Sponsored-by: Noam Kremen on Patreon
2023-05-17 12:34:26 -04:00
Joey Hess
5df89d58c7
git-annex pull and push
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.

In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.

And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.

Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.

Sponsored-by: Kevin Mueller on Patreon
2023-05-16 16:51:07 -04:00
Joey Hess
b1c396a695
remove unused imports 2023-05-16 16:33:02 -04:00
Joey Hess
2e984c51b6
sync --no-pull and --no-push affect download and upload of content
The man page is somewhat vague about this, but I do think it was a bug
that these options didn't alreay behave that way. The options are
documented to disable imports and exports, which is the same operations
just with a special remote that uses trees.

The real motivation for this is that I'm adding git-annex pull and
git-annex push, and I want these options to turn off the equivilant of
those commands. And git-annex pull will certianly download and push
upload.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-05-16 16:25:23 -04:00
Joey Hess
212442dd9b
pullOption should be pushOption in seekExportContent
sync: Fix bug that made --no-pull, rather than --no-push prevent exporting
trees to special remotes.

Sponsored-by: Joshua Antonishen on Patreon
2023-05-16 15:55:24 -04:00
Joey Hess
07e0d2a35b
clean up uninit output
Don't think including the location of .git/annex/objects in the json is
really useful.
2023-05-11 13:52:22 -04:00
Joey Hess
55dfa929d6
uninit: remove unncessary ExistSuccess
That was added in 2011 to prevent writing to the git-annex branch on
shutdown. But, the use of saveState causes pending git-annex branch
writes to be completed before the branch is deleted. So, an unusual exit
is not needed.
2023-05-11 13:50:20 -04:00
Joey Hess
271f3b1ab4
uninit: Support --json and --json-error-messages
Had to convert uninit to do everything that can error out inside a
CommandStart. This was harder than feels nice.

(Also, in passing, converted CommandCheck to use a data type, not a
weird number that it was not clear how it managed to be unique.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-11 13:43:02 -04:00
Joey Hess
1904cebbb3
fix typo 2023-05-11 13:26:55 -04:00
Joey Hess
02cfef1f91
uninit: Avoid buffering the names of all annexed files in memory
Oops, using the same list twice does prevent streaming in constant memory.

Sponsored-by: unqueued on Patreon
2023-05-11 13:25:55 -04:00
Joey Hess
b8d9c18e98
remove unused import 2023-05-11 13:24:34 -04:00
Joey Hess
cd5108bb47
uninit: remove undocumented suport for specifying files to act on
I think this was just copied from another command without paying
attention to what it did, because there does not seem to be any valid
reason to want to only unannex some files when running uninit.
2023-05-11 13:23:41 -04:00
Joey Hess
de84abb210
configremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows.

Moved the checkSafeConfig call to happen after an action is started, so
it will be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:21:42 -04:00
Joey Hess
a242eabc7a
enableremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows. Was trivial anyway.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:09:27 -04:00
Joey Hess
b3cc8dbacb
initremote: Support --json and --json-error-messages
Including special --whatelse handling.

Otherwise, it seems unlikely to be too useful, but who knows.

Refactored code to call starting before displaying error messages.
This makes the error messages be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:03:40 -04:00
Joey Hess
9812d9aaec
support aeson for Map
Make unused --json use it, which is better than the doubly nested lists
it was using.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 13:51:37 -04:00
Joey Hess
8d8e044458
upgrade: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:54:48 -04:00
Joey Hess
c98fb0b637
merge: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.
And, this completes the story that git-annex sync does not need json,
since every sub-operation is available in a command that does support json.
(Well, except for committing, but that's not a git-annex command.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:34:19 -04:00
Joey Hess
2fdf0ae38d
include url in json output
The input field is consistently the url of the feed, which makes sense
as that is the user input, but to differentiate multiple urls downloaded
from a feed when using --json-progress -J, need the url that is being
downloaded too.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:59:44 -04:00
Joey Hess
7919349cee
importfeed: Support --json and --json-error-messages and --json-progress
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:51:16 -04:00
Joey Hess
6b54ea69e3
importfeed: Move error to where --json-error-messages can capture it
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:27:23 -04:00
Joey Hess
04ee6c4c6b
importfeed: Support -J (and work toward supporting --json)
Both -J and --json needed importfeed to be refactored to use commandAction.

That was difficult, because of the interrelated nature of downloading feeds
and then downloading files from feeds, both of which needed to use
commandAction. And then checking for problems in feeds has to come after
these actions, which may be run as background jobs.

As for --json support, it's most of the way there, but still has some
warts, so I didn't enable jsonOptions yet. The warts include:

- An initial empty json record is displayed by getCache.
- Input is not populated, should be feed url
- feedProblem at end will not be captured by --json-error-messages
  (see FIXME)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:13:56 -04:00
Joey Hess
a71c831949
renameremote: Support --json and --json-error-messages
Seems unlikely to be useful, but it works so

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:25:40 -04:00
Joey Hess
a5d0c85ae1
factor out maybeAddJSONField
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:15:41 -04:00
Joey Hess
3d8f93dc0a
reinject: Support --json and --json-error-messages
Also fix support for operating on multiple pairs of files and keys.

Moved notAnnexed to inside starting, so error message will get into the json.

Cannot include the key in the starting as it's not known yet, so instead
add it to the json later.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:43:37 -04:00
Joey Hess
91b9915b09
reinit: Support --json and --json-error-messages
Basically same concerns as init..

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:07:40 -04:00
Joey Hess
f09a248fe2
init: Support --json and --json-error-messages
Dunno how useful this will be, since about all that's accessible from
the json is whether it succeeded or failed, and the error messages
which were already on stderr.

Note that, when autoenabling a special remote, it would be possible for
one to stop and prompt or output not using Messages and so not output as
part of the json. I don't think that happens, but I'm not 100% sure
something doesn't manage to break it. Of course, the same could be the
case for commands that transfer objects. Using Annex.Init.autoEnableSpecialRemotes
in --json mode would avoid the problem, but I've chosen to wait until I
know it's needed to use it.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:58:08 -04:00
Joey Hess
c208442292
unused: Support --json and --json-error-messages
Generalized AddJSONActionItemField to allow it to add several fields. Not entirely
happy with that, since the names of the fields have to be carefully chosen to
not conflict with other json fields. And fields added that way can't be parsed
back in FromJSON, except for the "fields" field that is special cased for metadata.
Still, I couldn't see another way to do it.

Also, omit file:null from the json output. Which does affect other commands,
eg git-annex whereis --all --json. Hopefully that won't break something that expects
a null file. If it did, that could be reverted, but it would be ugly to have
file:null in the unused --json

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:39:57 -04:00
Joey Hess
365dbc89dc
expire, trust et al, dead, describe: Support --json and --json-error-messages
For expire, the normal output is unchanged, but the --json output includes the uuid
in machine parseable form. Which could be very useful for this somewhat obscure
command. That needed ActionItemUUID to be implemented, which seemed like a lot
of work, but then ---

I had been going to skip implementing them for trust, untrust, dead, semitrust,
and describe, but putting the uuid in the json is useful information, it tells
what uuid git-annex picked given the input. It was not hard to support
these once ActionItemUUID was implemented.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-05 15:33:30 -04:00
Joey Hess
1a9af823bc
addunused, dropunused: Support --json and --json-error-messages
This also changes addunused to display the names of the files that it adds.
That seems like a general usability improvement, and not displaying the input
number does not seem likely to be a problem to a user, since the filename
is based on the key. Displaying the filename was necessary to get it and the key
included in the json.

dropunused does not include the key in the json. It would be possible to
add, but would need more changes. And I doubt that dropunused --json
would be used in a situation where a program cared which keys were
dropped. Note that drop --unused does have the key in its json, so such
a program could just use it. Or could just dropkey --batch with the
specific keys it wants to drop if it cares about specific keys.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-05 14:01:40 -04:00
Joey Hess
1d4bd2dcb8
migrate, undo: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 16:34:35 -04:00
Joey Hess
38fc5d3fc7
rekey, setpresentkey: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 16:03:54 -04:00
Joey Hess
f20c8b087e
fix: Support --json and --json-error-messages
And triaged out some commands that don't need to support these options.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 14:28:21 -04:00
Joey Hess
46c7c30140
log: Support --json and --json-error-messages
Also in passing the --all display was fixed up to not quote keys like filenames.

Note that the check added to compareChanges was needed to avoid logging when
nothing changed.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 12:36:31 -04:00
Joey Hess
f56f6140fa
remote tailing 's' from log --raw-data
log: When --raw-date is used, display only seconds from the epoch, as
documented, omitting a trailing "s" that was included in the output
before.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 11:53:38 -04:00
Joey Hess
c235488e2d
rmurl: Support --json and --json-error-messages
The json does not include an url field, but it does have an input field that is
"file url" when using --batch and ["file", "url"] when using the command line.
I chose not to change that because it would complicate batchInput.
An url field could be added if it turns out to be useful.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 11:28:27 -04:00
Joey Hess
6cbcba484c
unannex: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-03 15:56:20 -04:00
Joey Hess
1beca851ff
back compat fix for info --json on unknown item
This was changed in a0e6fa18eb in a way
that broke datalad, which expected to see a "file" field in the --json.

See 45ddd4b12f
2023-05-01 12:05:21 -04:00
Joey Hess
4d6c918eff
avoid quoting spaces in git-annex find output to terminal
That's too much quoting, the user expects the filename to be copy and
pasteable. It would be ok to slash-escape space ('\ ')
which is what gnu find does, but it doesn't seem necessary either.

${escaped_file} has always quoted spaces though, so keep on doing it
there.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-26 00:18:30 -04:00
Joey Hess
be36e208c2
json object for FileNotFound
When a nonexistant file is passed to a command and  --json-error-messages
is enabled, output a JSON object indicating the problem.

(But git ls-files --error-unmatch still displays errors about such files in
some situations.)

I don't like the duplication of the name of the command introduced by this,
but I can't see a great way around it. One way would be to pass the Command
instead.

When json is not enabled, the stderr is unchanged. This is necessary
because some commands like find have custom output. So dislaying
"find foo not found" would be wrong. So had to complicate things with
toplevelFileProblem having different output with and without json.

When not using --json-error-messages but still using --json, it displays
the error to stderr, but does display a json object without the error. It
does have an errorid though. Unsure how useful that behavior is.

Sponsored-by: Dartmouth College's Datalad project
2023-04-25 19:26:20 -04:00
Joey Hess
31e4b6dee1
catch chdir exception in --autostop
assistant --autostop: Avoid crashing when ~/.config/git-annex/autostart
lists a directory that it cannot chdir to.

Sponsored-by: k0ld on Patreon
2023-04-19 12:42:02 -04:00
Joey Hess
9155ed1072
configremote
New command, currently limited to changing autoenable= setting of a special remote.

It will probably never be used for more than that given the limitations on
it.

Sponsored-by: Brock Spratlen on Patreon
2023-04-18 15:30:49 -04:00
Joey Hess
8728695b9c
support enableremote of git repo changing eg autoenable=
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.

The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=

Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.

Sponsored-by: Jack Hill on Patreon
2023-04-18 14:00:24 -04:00
Joey Hess
160d4c9254
whereused: Fix display of branch:file when run in a subdirectory
The file needs to be relative to the top of the repository
in that case, but it was relative to the subdir.

Sponsored-by: Luke Shumaker on Patreon
2023-04-12 15:18:04 -04:00
Joey Hess
275d974120
improve display of relative path to file
When in a subdirectory, and the file is too, it used to display eg
../subdir/thefile and now will display thefile.
2023-04-12 15:11:44 -04:00
Joey Hess
3e5829721f
fix build 2023-04-12 14:31:56 -04:00
Joey Hess
11790df3e6
fix build 2023-04-12 14:18:29 -04:00
Joey Hess
3346aa9659
safe output to terminal for calckey inprogress and lookupkey
These are quite low-level, but still there is no point in displaying
escape sequences that have been embedded in a key to the terminal.

I think these are the only remaining commands that didn't use safe
output, except for cases where git-annex is speaking a protocol to
itself.

Sponsored-by: Kevin Mueller on Patreon
2023-04-12 14:03:44 -04:00
Joey Hess
a576fc3b12
fix mojibake reversion in display of utf8
When displaying a ByteString like "💕", safeOutput operates on
individual bytes like "\240\159\146\149" and isControl '\146' = True,
so it got truncated to just "\240".

So, only treat the low control characters, and DEL, as control
characters.

Also split Utility.Terminal out of Utility.SafeOutput. The latter needs
win32, but Utility.SafeOutput is used by Control.Exception, which is
used by Setup.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-12 13:53:30 -04:00
Joey Hess
afa5b883dc
find, findkeys, examinekey: escape output to terminal when --format is not used
Note that filenames are not quoted, only escaped. This is to match the
output of --format with escaping.

Sponsored-by: Lawrence Brogan on Patreon
2023-04-11 15:27:07 -04:00
Joey Hess
df6f9f1ee8
filter out control characters and quote filenames
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.

This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.

A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.

Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.

Sponsored-by: k0ld on Patreon
2023-04-11 14:27:22 -04:00
Joey Hess
8b6c7bdbcc
filter out control characters in all other Messages
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brock Spratlen on Patreon
2023-04-11 12:58:01 -04:00
Joey Hess
a0e6fa18eb
eliminate showStart showStartOther
These were not handling control characters and are redundant.

Sponsored-by: Jack Hill on Patreon
2023-04-10 16:28:58 -04:00
Joey Hess
3290a09a70
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.

Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.

When json is being output, no quoting is done, since json gets its own
quoting.

This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 15:55:44 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
063c00e4f7
git style filename quoting for giveup
When the filenames are part of the git repository or other files that
might have attacker-controlled names, quote them in error messages.

This is fairly complete, although I didn't do the one in
Utility.DirWatcher.INotify.hs because that doesn't have access to
Git.Filename or Annex.

But it's also quite possible I missed some. And also while scanning for
these, I found giveup used with other things that could be attacker
controlled to contain control characters (eg Keys). So, I'm thinking
it would also be good for giveup to just filter out control characters.
This commit is then not the only line of defence, but just good
formatting when git-annex displays a filename in an error message.

Sponsored-by: Kevin Mueller on Patreon
2023-04-10 12:56:45 -04:00
Joey Hess
da83652c76
addurl --preserve-filename: reject control characters
As well as escape sequences, control characters seem unlikely to be desired when
doing addurl, and likely to trip someone up. So disallow them as well.

I did consider going the other way and allowing filenames with control characters
and escape sequences, since git-annex is in the process of escaping display
of all filenames. Might still be a better idea?

Also display the illegal filename git quoted when it rejects it.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-10 12:18:25 -04:00
Joey Hess
2ba1559a8e
git style quoting for ActionItemOther
Added StringContainingQuotedPath, which is used for ActionItemOther.

In the process, checked every ActionItemOther for those containing
filenames, and made them use quoting.

Sponsored-by: Graham Spencer on Patreon
2023-04-08 16:30:01 -04:00
Joey Hess
d689a5b338
git style filename quoting controlled by core.quotePath
This is by no means complete, but escaping filenames in actionItemDesc does
cover most commands.

Note that for ActionItemBranchFilePath, the value is branch:file, and I
choose to only quote the file part (if necessary). I considered quoting the
whole thing. But, branch names cannot contain control characters, and while
they can contain unicode, git coes not quote unicode when displaying branch
names. So, it would be surprising for git-annex to quote unicode in a
branch name.

The find command is the most obvious command that still needs to be
dealt with. There are probably other places that filenames also get
displayed, eg embedded in error messages.

Some other commands use ActionItemOther with a filename, I think that
ActionItemOther should either be pre-sanitized, or should explicitly not
be used for filenames, so that needs more work.

When --json is used, unicode does not get escaped, but control
characters were already escaped in json.

(Key escaping may turn out to be needed, but I'm ignoring that for now.)

Sponsored-by: unqueued on Patreon
2023-04-08 14:52:26 -04:00
Joey Hess
98a3ba0ea5
restore old registerurl location tracking behavior
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.

registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.

This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.

But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 17:06:44 -04:00
Joey Hess
2b940f7725
registerurl, unregisterurl: Added --remote option
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 15:54:41 -04:00
Joey Hess
24ae4b291c
addurl, importfeed: Fix failure when annex.securehashesonly is set
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.

Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.

Sponsored-by: unqueued on Patreon
2023-03-27 15:10:46 -04:00
Joey Hess
cd076cd085
Windows: Support urls like "file:///c:/path"
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.

Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.

An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.

It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-03-27 13:38:02 -04:00
Joey Hess
e900e3caf3
avoid build warning on windows 2023-03-27 12:21:40 -04:00
Joey Hess
a0badc5069
sync: Fix parsing of gcrypt::rsync:// urls that use a relative path
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.

(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)

Sponsored-by: Jack Hill on Patreon
2023-03-23 15:20:00 -04:00
Joey Hess
e822df2a09
fix build warnings on windows 2023-03-21 18:41:23 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Yaroslav Halchenko
e018ae1125
Fix ambigous typos 2023-03-17 15:14:47 -04:00
Joey Hess
f1b678face
copy --from --to location tracking update
copy: When --from and --to are combined and the content is already present
on the destination remote, update location tracking as necessary.

Sponsored-by: Dartmouth College's DANDI project
2023-03-13 14:51:09 -04:00
Joey Hess
2323af3736
importfeed: Display feed title
When importing a bunch of feeds, this makes it more clear what it's working
on. Also, I sometimes want to delete a particular feed from a list of feeds
but don't know which url belongs to the feed, and this solves that.

Control characters are filtered out just to protect against some feed
putting escape character stuff in the feed, which could be a
security problem. (Control characters also get filtered out of
importfeed filenames.)

Sponsored-by: Luke Shumaker on Patreon
2023-03-11 13:52:45 -04:00
Joey Hess
ff141c093e
include subdir when checking export branch is checked out
sync: Fix a reversion that prevented sending files to exporttree=yes
remotes when annex-tracking-branch was configured to branch:subdir
(Introduced in version 10.20230214)

Sponsored-by: Kevin Mueller on Patreon
2023-03-10 11:41:52 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
9c3c4c1712
deprecate git-annex status w/o runtime warning
As far as I can see, git-annex status was added to support direct mode, and
like other things added for that, it ought to be deprecated.

Behavior is similar to git status --short, though not identical in a few
cases eg renamed files.

I think datalad does not use this command, although it might have in the
past. Could not find any use of it in the current datalad code.

A deprecation warning at runtime would be the next step, probably will wait
and do that for all the deprecated commands together (except findref).
2023-02-28 16:34:31 -04:00
Joey Hess
80478cc145
support git-annex view in an adjusted branch
Rather than entering a view of the adjusted branch, enter an adjusted
view branch. This way, it's the same as first using git-amnnex view
followed by git-annex adjust, and everything already implemented to
support that works.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-02-27 15:48:58 -04:00
Joey Hess
1c4f4b449a
support --unlock-present adjustment of view branches
When generating the view, check if the key is present.

When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull
the same as is done when syncing in other adjusted branches. This is
needed because the docs for git-annex adjust --unlock-present suggest
using git-annex sync to update the branch when annex.adjustedbranchrefresh
is not set.

Note that, with annex.adjustedbranchrefresh set, it just works! The
adjusted branch gets updated in the usual way and it doesn't matter that
there's a view branch underneath.

And of course, re-running git-annex adjut --unlock-present also works,
as suggested in the docs.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-27 15:37:57 -04:00
Joey Hess
f09e299156
rawfilepath conversion 2023-02-27 15:06:32 -04:00
Joey Hess
cc32e31161
understand adjusted view branch names
An adjusted view branch has a name like
"refs/heads/adjusted/views/master(author=_)(unlocked)", so it is a view
branch that has been converted to an adjusted branch.

Made Logs.View support such branch names. So now git-annex sync and
pre-commit handle updating metadata on commit in such a branch.

Much remains to be done to fully support adjusted view branches,
including actually applying the adjustment when updating the view branch.

Sponsored-by: Graham Spencer on Patreon
2023-02-27 14:57:58 -04:00
Joey Hess
16d3097a08
fix reversion in info, and add test case
info: Fix reversion in last release involving handling of unsupported input
by continuing to handle any other inputs, before exiting nonzero at the
end.

Sponsored-by: Dartmouth College's Datalad project
2023-02-20 14:31:37 -04:00
Joey Hess
452b080dba
better handling of multiple repositories with the same name
Used to fail with a bad error message, indicating there was no
repository with the specified name, or something like that. Now, suggest
they use the uuid to disambiguate.

* info, enableremotemote, renameremote: Avoid a confusing message when more
  than one repository matches the user provided name.
* info: Exit nonzero when the input is not supported.

Sponsored-by: Kevin Mueller on Patreon
2023-02-13 14:31:09 -04:00
Joey Hess
e9b6efac5a
fix buggy sync to exporttree remote when annex-tracking-branch is not checked out
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.

Sponsored-by: Max Thoursie on Patreon
2023-02-10 15:49:15 -04:00
Joey Hess
c2b3e870df
finishing up sync in view branch
sync: When run in a view branch, avoid updating synced/ branches, or trying
to merge anything from remotes.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-10 15:27:42 -04:00
Joey Hess
5f9bf51438
sync in view branch updates the view branch
* sync: When run in a view branch, refresh the view branch to reflect any
    changes that have been made to the parent branch or metadata.

This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.

Sponsored-by: Lawrence Brogan on Patreon
2023-02-08 15:37:28 -04:00
Joey Hess
a11d6e0baf
avoid sync pushing view branches to remotes, and better view branch names
* sync: Avoid pushing view branches to remotes.
* Changed the name of view branches to include the parent branch.
  Existing view branches checked out using an old name will still work.

It does not seem useful for sync to push view branches around, because
the information in a view branch can entirely be derived from other
information in git. And sync doesn't push adjusted branches around either.

The better view branch names make it more in line with adjusted branch
names, but were also needed to make fromViewBranch be able to return the
original branch name.

Kept the old view branch names still working. But, when those branches
exist in a repo, sync will still try to push them as before. Avoiding
that would need more complicated and/or expensive changes to sync.

Sponsored-By: Boyd Stephen Smith Jr. on Patreon
2023-02-08 13:57:48 -04:00
Joey Hess
c209e0f643
add FIELD?=GLOB to git-annex view usage
And also to vadd usage.

Also added some other things to the usage that were omitted before to
save space.

Adding even FIELD?=GLOB made the git-annex --help list of commands grow
too wide for an 80 column display. So, removed the description of
parameters from that list of commands.

Sponsored-By: Brock Spratlen on Patreon
2023-02-07 18:09:10 -04:00
Joey Hess
aa0350ff49
add directory to views for files that lack specified metadata
* view: New field?=glob and ?tag syntax that includes a directory "_"
  in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
  "_" directory in a view.

When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.

annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.

annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.

Sponsored-By: Jack Hill on Patreon
2023-02-07 16:28:46 -04:00
Joey Hess
152be2948b
use transfer stages for copy --from
See commit e04a931439 for an explanation
of why move uses transfer stages for --from, but command stages for
--to. At the point of that commit, copy was actually already using
command stages for everything, so the commit was incorrect about
improving copy --to.

But, the same reasoning about --from applies to copy as to move; when
verification is not done incrementally, download and verification are
the main two stages. The cleanup stage for copy is even less work than
for move (it doesn't drop from the remote).

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 14:07:49 -04:00
Joey Hess
579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00
Joey Hess
77266e46dd
fix behavior of copy --from --to
Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:55:16 -04:00
Joey Hess
acc3f6211f
finishing up move --from --to
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.

Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.

Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.

I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.

So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:43:48 -04:00
Joey Hess
f5f799f17e
fully working move --from --to (not release quality)
When the destination already has a copy, it behaves the same as
drop --from really, but display it as a move and implement it
reusing the factored out code from fromPerform.

(Note that willDropMakeItWorse never returns DropAllowed in that
situation, because it's told that dest has a copy. So numcopies is
always checked.)

And when only the source and not the local repo or destination have a
copy, do the full copy from source to local, then copy from local to
dest, then drop from local, then drop from source dance.

This is complicated by fromPerform being hardcoded to assume there is a
local copy, but the local copy has already been dropped. That's why
it uses cleanupfromsrc RemoveNever to avoid the code that makes that
assumption, and finishes with a call to dropfromsrc.

And, since the location log has not yet been updated, checking numcopies
was not working, until I added UnVerifiedRemote dest to the list of
things to check.

This is not yet quite mergeable though. There are two things in the
comment above fromToPerform that are not implemented yet: Checking the
location log before dropping the local copy, and locking the temporary
local copy for drop.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 16:12:33 -04:00
Joey Hess
1abd457e98
push location log updating up to callers of download
Prep for move --to --from, which needs to download from a src repo
without updating the location log for the local repo, before sending the
content on to the dest repo.

Note that caller of download' already update the log themselves.
See previous commit a422a056f2
that pushed it up to download from getViaTmpFrom.

(Also removed in passing a debug print + readline that I accidentially
committed last week on this branch.)

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:47:41 -04:00
Joey Hess
8c349b8802
implement move --from --to when there is a local copy already
This is rather trivial, since it does not need to temporarily get the
local copy.

Added fromPerform' to handle the situation where the local copy
is dropped by another process during the copy to the dest. This avoids
ever re-downloading the local copy before dropping from the src.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:17:35 -04:00
Joey Hess
a46c385aec
move/copy: started implementing --from src --to dest
This is not in a usable state, but I have a possible plan for how to do
it.

Sponsored-by: Dartmouth College's DANDI project
2023-01-20 11:10:38 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
Joey Hess
f8bc208e89
findkeys: New command, very similar to git-annex find but operating on keys
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.

Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.

A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.

Also, cleaned up comments on git-annex-find man page asking for --all
option.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:51:57 -04:00
Joey Hess
9d60385001
convert renameFile to moveFile to support cross-device moves
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.

The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.

As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:17:50 -04:00
Joey Hess
2b014f1a8b
don't frontload reconcileStaged in git-annex init
init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.

Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.

And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:58:47 -04:00
Joey Hess
b2cc63d5bf
export: fix multi-file delete bug
export: Fix a bug that left a file on a special remote when two files with
the same content were both deleted in the exported tree.

Case of the wrong data structure leading to the wrong result.
The DiffMap now contains all the old filenames, and all the new filenames.

Note that, when 2 files with the same content are both renamed,
it only renames the first, but deletes and re-exports the second.
Improving that is possible, but it would need to use a different temporary
filename. Anyway, that is an unusual case, and there are known to be other
unusual cases where export does not rename with maximum efficiency, IIRC.
(Or maybe this is the case that I remember?)

Sponsored-by: Dartmouth College's OpenNeuro project
2022-11-09 16:24:37 -04:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
731e806c96
use lookupKeyStaged in --batch code paths
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.

Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.

Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.

It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.

An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-10-26 14:43:06 -04:00
Joey Hess
b2ee2496ee
remove whenAnnexed and ifAnnexed
In preparation for adding a new variation on lookupKey.

Sponsored-by: Max Thoursie on Patreon
2022-10-26 14:06:32 -04:00
Joey Hess
ba7ecbc6a9
avoid flushing keys db queue after each Annex action
The flush was only done Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)

Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex sqitched to using ResourceT..).
The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.

In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses on onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.

In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.

The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.

After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 14:12:23 -04:00
Joey Hess
c2ad84b423
all keys are still present on versioned remote after import of a tree
When importing from versioned remotes, fix tracking of the content of
deleted files.

Only S3 supports versioning so far, so only it was affected.

But, the draft import/export interface for external remotes also seemed to
need a change, so that versionedExport could be set.
2022-10-11 13:05:40 -04:00
Joey Hess
4a42c69092
take lock in checkLogFile and calcLogFile
move: Fix openFile crash with -J

This does make them a bit slower, although usually the log file is not
very big, so even when it's being rewritten, they will not block for
long taking the lock. Still, little slowdowns may add up when moving a lot
file files.

A less expensive fix would be to use something lower level than openFile
that does not check if the file is already open for write by another
thread. But GHC does not seem to provide anything convenient; even mkFD
checks for a writing thread.

fullLines is no longer necessary since these functions no longer will
read the file while it's being written.

Sponsored-by: Dartmouth College's DANDI project
2022-10-07 13:19:17 -04:00
Joey Hess
44d763468a
add missing whitespace in warning message 2022-10-04 13:30:22 -04:00
Joey Hess
70d2ece381
improve usage
These commands operate on not only remotes, but any way a repository can
be specified, including "here" etc.

Sponsored-by: Graham Spencer on Patreon
2022-10-03 13:49:42 -04:00
Joey Hess
15f9fcbcb1
avoid combining multiple words provided to trust/untrust/dead
* trust, untrust, semitrust, dead: Fix behavior when provided with
  multiple repositories to operate on.
* trust, untrust, semitrust, dead: When provided with no parameters,
  do not operate on a repository that has an empty name.

The man page and usage already indicated that multiple repos could be
provided to these commands, but they actually used unwords to combine
everything into string, and found a repo matching that string. This was
especially bad when no parameters resulted in the empty string and some
repo happened to have an empty description.

This does change the behavior, and it's possible someone relied on the
current behavior to eg, trust a repo by name with the name not quoted into
a single parameter. But fixing the empty string bug and matching the
documentation are worth breaking that usage.

Note that git-annex init/reinit do still unwords multiple parameters when
provided to them. That is inconsistent behavior, but it certianly seems
possible that something does run git-annex init with an unquoted
description, and I don't think it's worth breaking that just to make it more
consistent with these other commands.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-10-03 13:48:40 -04:00
Joey Hess
ce65f11de0
enable-tor: Fix breakage caused by git's fix for CVE-2022-24765
This relies on bfa451fc4e and is a bit of an
ugly hack.

Sponsored-by: Noam Kremen on Patreon
2022-09-26 14:48:58 -04:00
Joey Hess
2478e9e03a
restage: New git-annex command, handles restaging unlocked files
This is much easier and less failure-prone than having the user run
git update-index --refresh themselves.

Sponsored-by: Dartmouth College's DANDI project
2022-09-23 16:29:59 -04:00
Joey Hess
b17e328175
avoid unncessary locking by checkLogFile
Like the comment says, this works without locking. It looks like I
originally copied another function and forgot to remove the locking.

Sponsored-by: Dartmouth College's DANDI project
2022-09-23 14:01:43 -04:00
Joey Hess
451a7ce77f
vicfg: Include mincopies configuration
Sponsored-by: k0ld on Patreon
2022-09-15 15:11:59 -04:00
Joey Hess
eefc026370
fix reversion on skipping dead keys in --all/bare
Fix a reversion that made dead keys not be skipped when operating on all
keys via --all or in a bare repo. (Introduced in version 8.20200720)

Also, improved the documentation of git-annex-dead, it does not only apply
to fsck --all.

Also, made git-annex fsck, when run on a file whose key is dead, display
that. Before, it displayed that only when run with --all, but with this
fix, it skips dead keys with --all. But it can still be run on a file that
uses a dead key, and displaying "This key is dead" explains to the user
why it does not consider missing content for it to be a problem.

Sponsored-by: k0ld on Patreon
2022-09-13 14:38:13 -04:00
Joey Hess
c62fe5e9a8
avoid redundant prompt for http password in git-annex get that does autoinit
autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git
remote has not been probed yet, that will do a http get that will prompt
for a password. And then the parent process will subsequently prompt
for a password when getting annexed files from the remote.

So the solution is for autoEnableSpecialRemotes to run remoteList before
the subprocess, which will probe for the uuid for the git remote in the
same process that will later be used to get annexed files.

But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git,
so Annex.Init cannot import Remote.List. Had to pass remoteList into
functions in Annex.Init to get around this dependency loop.
2022-09-09 14:43:43 -04:00
Joey Hess
d4fd966396
avoid dup check of guardSafeToUseRepo
Speeds up init slightly, and reduces the number of syscalls by the
dynamic linker.

Sponsored-by: Dartmouth College's Datalad project
2022-08-29 13:52:58 -04:00
Joey Hess
94029995fa
fix git-annex add regression on deleted file
Fix a regression in 10.20220624 that caused git-annex add to crash when
there was an unstaged deletion.

Sponsored-by: Dartmouth College's Datalad project
2022-08-19 12:55:49 -04:00
Joey Hess
e60766543f
add annex.dbdir (WIP)
WIP: This is mostly complete, but there is a problem: createDirectoryUnder
throws an error when annex.dbdir is set to outside the git repo.

annex.dbdir is a workaround for filesystems where sqlite does not work,
due to eg, the filesystem not properly supporting locking.

It's intended to be set before initializing the repository. Changing it
in an existing repository can be done, but would be the same as making a
new repository and moving all the annexed objects into it. While the
databases get recreated from the git-annex branch in that situation, any
information that is in the databases but not stored in the branch gets
lost. It may be that no information ever gets stored in the databases
that cannot be reconstructed from the branch, but I have not verified
that.

Sponsored-by: Dartmouth College's Datalad project
2022-08-11 16:58:53 -04:00
Joey Hess
21cfd0ea98
fix reversion
3a513cfe73 caused a reversion in addurl.
The type of addSmall changed, but the void prevented the type checker
from helping notice this. Since it now returns a CommandPerform, the
cleanup action has to be run.

Sponsored-by: Dartmouth College's Datalad project
2022-08-09 13:49:30 -04:00
Joey Hess
3a513cfe73
add --dry-run: New option
This is intended for users who want to see what it would output in order to
eg, check if a file would be added to git or the annex. It is not intended
as a way for scripts to get information.

Sponsored-by: Dartmouth College's Datalad project
2022-08-03 11:16:04 -04:00
Joey Hess
570b1aa6a1
Allow find --branch to be used in a bare repository, the same as the deprecated findref can be
This will allow later fully deprecating and removing findref.

Sponsored-by: Erik Bjäreholt on Patreon
2022-07-29 12:52:12 -04:00
Joey Hess
be19a68276
new matching options --want-get-by and --want-drop-by
Sponsored-by: Graham Spencer on Patreon
2022-07-28 13:26:03 -04:00
Joey Hess
2d65c4ff1d
avoid unix-compat's rename
On Windows, that does not support long paths
https://github.com/jacobstanley/unix-compat/issues/56

Instead, use System.Directory.renamePath, which does support long paths.

Sponsored-by: Dartmouth College's Datalad project
2022-07-12 14:55:02 -04:00
Joey Hess
201e41cffd
add: Fix reversion when adding an annex link that has been moved to another directory
Fixes commit f259be7f39

Sponsored-by: Dartmouth College's Datalad project
2022-07-05 16:22:41 -04:00
Joey Hess
149d12f188
support --backend again in addurl and importfeed
Missed these two when converting from a global option.

Sponsored-by: Dartmouth College's Datalad project
2022-07-05 15:35:43 -04:00
Joey Hess
cddcabfbb5
include filename in fsck warning
So it's available in --quiet mode. The same was already done in other
fsck warnings.

Sponsored-by: Noam Kremen on Patreon
2022-06-29 14:33:35 -04:00
Joey Hess
b223988e22
remove --backend from global options
--backend is no longer a global option, and is only accepted by commands
that actually need it.

Three commands that used to support backend but don't any longer are
watch, webapp, and assistant. It would be possible to make them support it,
but I doubt anyone used the option with these. And in the case of webapp
and assistant, the option was handled inconsistently, only taking affect
when the command is run with an existing git-annex repo, not when it
creates a new one.

Also, renamed GlobalOption etc to AnnexOption. Because there are many
options of this type that are not actually global (any more) and get
added to commands that need them.

Sponsored-by: Kevin Mueller on Patreon
2022-06-29 13:33:25 -04:00
Joey Hess
543993b068
remove redundant pattern match
brilliant spot by new ghc
2022-06-28 15:40:27 -04:00
Joey Hess
cb9cf30c48
move several readonly values to AnnexRead
This improves performance to a small extent in several places.

Sponsored-by: Tobias Ammann on Patreon
2022-06-28 15:40:19 -04:00
Joey Hess
debcf86029
use RawFilePath version of rename
Some small wins, almost certianly swamped by the system calls, but still
worthwhile progress on the RawFilePath conversion.

Sponsored-by: Erik Bjäreholt on Patreon
2022-06-22 16:47:34 -04:00
Joey Hess
d00e23cac9
RawFilePath optimisations 2022-06-22 16:20:08 -04:00