importfeed bug fix: When -J was used with multiple feeds, some feeds did
not get their items downloaded.
In my case, I had added a feed to the end of the list, and no items from it
were ever downloaded.
Sponsored-by: Leon Schuermann on Patreon
yt-dlp when resumed was observed having written the same filename twice
into the file list. Perhaps once by the first download and once by the
resumed one?
(And allow it to be used in the --template although that seems unlikely to
be very useful.)
My use case for this is that one of the podcast feeds I subscribe to is
sometimes leaking episodes of some other podcast. The other podcast is also
very close to spam, so this may be a form of intentional spamming. I have
not been able to catch the podcast feed containing those episodes, so I
don't know which one is at fault. So putting this in the metadata will let
me eventually catch it.
Bug fix: Re-running git-annex adjust or sync when in an adjusted branch
would overwrite the original branch, losing any commits that had been made
to it since the adjusted branch was created.
When git-annex adjust is run in this situation, it will display a warning
about the diverged branches.
When git-annex sync is run in this situation, mergeToAdjustedBranch
will merge the changes from the original branch to the adjusted branch.
So it does not need to display the divergence warning.
Note that for some reason, I'm needing to run sync twice for that to
happen. The first run does not do the merge and the second does. I'm unsure
why and so am not fully done with this bug.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Running git config --list inside .git then fails, so better to only
do that when --git-dir was specified explicitly. Otherwise, when the
repository is not bare, run the command inside the working tree.
Also make init detect when the uuid it just set cannot be read and fail
with an error, in case git changes something that breaks this later.
I still don't actually understand why git-annex add/assist -J2 was
affected but -J1 was not. But I did show that it was skipping writing to
the location log, because the uuid was NoUUID.
Sponsored-by: Graham Spencer on Patreon
Command.Add.seek starts concurrency with CommandStages. And for
Command.Sync, it needs TransferStages. So, to get both types of concurrency
for the two different parts, it either needs to change the type of
concurrency in between, or just call startConcurrency once for each.
It seems safe enough to call startConcurrency twice, because it does shut
down concurrency (mostly) at the end, and eg the old Annex.workers get
emptied.
Sponsored-by: unqueued on Patreon
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.
As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.
Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.
(Also opened a new bug that also affects sync et all, not only this command.)
Sponsored-by: Nicholas Golder-Manning on Patreon
This was already possible, but it was rather hard to come up with the
complex shell command needed.
Note that the diff output starts with "diff a/... b/...".
I left off the "--git" because it's not a git format diff.
Fixes a crash by git-annex repair when .git/annex/journal/ does not exist.
Normally the journal directory is created before withJournalHandle gets
run, but git-annex repair can be run in a situation where it does not
exist.
Fixes a failure like this:
curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.
That happens because the whole web page has already been downloaded
previously, and kept, so now addurl tries to download it, and curl asks the
server to resume from the last byte. And youtube.com can't, for whatever
stupid reason.
So, delete the temp file after determining that youtube-dl can be used.
Sometimes resuming an interrupted download will fail to resume and download
more files with different names. That resulted in the workdir having
multiple files at the end, which causes git-annex to give up because it
does not know what was downloaded.
To fix this, use a yt-dlp feature, which appends to a file the name of each
file after it's finished downloading it. So the presence of other cruft in
the workdir will not confuse git-annex.
git add will fail if the file got deleted in the meantime. And since it was
queued, there was a window until the queue flushed where a deletion of the
file would cause a crash.
Instead, reuse Command.Add.addFile, which sha1 hashes the file itself
immediately, and then queues the index update. Ignore exceptions that will
happen if the file got deleted already.
Sponsored-by: k0ld on Patreon
Commit b6642dde8a broke it by enabling
non-concurrent display mode while leaving concurrency set in the config
and having already started concurrency earlier.
(I don't actually know if that commit was a good idea.)
Sponsored-By: Brett Eisenberg on Patreon
* config: Added the --show-origin and --for-file options.
* config: Support annex.numcopies and annex.mincopies.
There is a little bit of redundancy here with other code elsewhere that
combines the various configs and selects which to use. But really only
for the special case of annex.numcopies, which is a git config that does
not override the annex branch setting and for annex.mincopies, which does
not have a git config but does have gitattributes settings as well as the
annex branch setting.
That seems small enough, and unlikely enough to grow into a mess that it was
worth supporting annex.numcopies and annex.mincopies in git-annex config
--show-origin. Because these settings are a prime thing that someone might
get confused about and want to know where they were configured.
And, it followed that git-annex config might as well support those two
for --set and --get as well. While this is redundant with the speclialized
commands, it's only a little code and it makes it more consistent.
Note that --set does not have as nice output as numcopies/mincopies
commands in some special cases like setting to 0 or a negative number.
It does avoid setting to a bad value thanks to the smart
constructors (eg configuredNumCopies).
As for other git-annex branch configurations that are not set by git-annex
config, things like trust and wanted that are specific to a repository
don't map to a git config name, so don't really fit into git-annex config.
And they are only configured in the git-annex branch with no local override
(at least so far), so --show-origin would not be useful for them.
Sponsored-by: Dartmouth College's DANDI project
Optimise database to further speed up importing large trees from special
remotes.
See comment for details of why the other index didn't help cid queries.
It would probably be better to manually create an index on only cid, rather
than adding a second uniqueness constraint that is a larger index. But
persitent does not support creating indexes, and an attempt to manually add
it to the migration failed.
Sponsored-by: Nicholas Golder-Manning on Patreon
Speeds up sync in an adjusted branch by avoiding re-adjusting the branch
unncessarily, particularly when it is adjusted with --hide-missing or
--unlock-present.
When there are a lot of files, that was the majority of the time of a
--no-content sync.
Uses a log file, which is updated when content presence changes. This
adds a little bit of overhead to every file get/drop when on such an
adjusted branch. The overhead is minimal for get of any size of file,
but might be noticable for drop in some cases. It seems like a reasonable
trade-off. It would be possible to update the log file only at the end, but
then it would not happen if the command is interrupted.
When not in an adjusted branch, there should be no additional overhead.
(getCurrentBranch is an MVar read, and it avoids the MVar read of
getGitConfig.)
Note that this does not deal with situations such as:
git checkout master, git-annex get, git checkout adjusted branch,
git-annex sync. The sync won't know that the adjusted branch needs to be
updated. Dealing with that would add overhead to operation in non-adjusted
branches, which I don't like. Also, there are other situations like having
two adjusted branches that both need to be updated like this, and switching
between them and sync not updating.
This does mean a behavior change to sync, since it did previously deal
with those situations. But, the documentation did not say that it did.
The man pages only talk about sync updating the adjusted branch after
it transfers content.
I did consider making sync keep track of content it transferred (and
dropped) and only update the adjusted branch then, not to catch up to other
changes made previously. That would perform better. But it seemed rather
hard to implement, and also it would have problems with races with a
concurrent get/drop, which this implementation avoids.
And it seemed pretty likely someone had gotten used to get/drop followed by
sync updating the branch. It seems much less likely someone is switching
branches, doing get/drop, and then switching back and expecting sync to update
the branch.
Re-running git-annex adjust still does a full re-adjusting of the branch,
for anyone who needs that.
Sponsored-by: Leon Schuermann on Patreon
Speeds up eg git-annex sync --content by up to 50%. When it does not need
to transfer or drop anything, it now noops a lot more quickly.
I didn't see anything else in sync --content noop loop that could really
be sped up. It has to cat git objects to keys, stat object files, etc.
Sponsored-by: unqueued on Patreon
Reading from the cidsdb is responsible for about 25% of the runtime of
an import. Since the cidmap is used to store the same information in
ram, the cidsdb is not written to during an import any longer. And so,
if it started off empty (and updateFromLog wasn't needed), those reads
can just be skipped.
This is kind of a cheesy optimisation, since after any import from any
special remote, the database will no longer be empty, so it's a single
use optimisation. But it's probably not uncommon to start by importing a
lot of files, and it can save a lot of time then.
Sponsored-by: Brock Spratlen on Patreon
Large speed up to importing trees from special remotes that contain a lot
of files, by only processing changed files.
Benchmarks:
Importing from a special remote that has 10000 files, that have all been
imported before, and 1 new file sped up from 26.06 to 2.59 seconds.
An import with no change and 10000 unchanged files sped up from 24.3 to
1.99 seconds.
Going up to 20000 files, an import with no changes sped up from
125.95 to 3.84 seconds.
Sponsored-by: k0ld on Patreon
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.
Before, import would write to both the git-annex branch and also to the
sqlite database. But then the next time it was run, needsUpdateFromLog
would see the branch had changed, so run updateFromLog, which would make
the same writes to the sqlite database a second time.
Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.
Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.
But, there could be other processes running at the same time, and one of
them may update the git-annex branch, eg merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.
I don't think there's a good way to prevent, or to detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.
Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.
Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.
Sponsored-by: Noam Kremen on Patreon
I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.
Using --progress-template like this should avoid parsing problems as well
as future proof against output changes. But it will work with only yt-dlp.
So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.
git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in debian unstable having been replaced by
yt-dlp), made git-annex no longer try to parse youtube-dl's progress.
Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.
Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented but I assume is a python thing?
Sponsored-by: Joshua Antonishen on Patreon
The obvious way to fix this would be to adapt lines to split on null.
However, it's actually nontrivial to rewrite lines. In particular it has a
weird implementation to avoid a space leak. See:
https://gitlab.haskell.org/ghc/ghc/-/issues/4334
Also, while that is a small amount of code, it's covered by a rather
complex copyright and I'd have to include that copyright in git-annex.
So, I opted to filter out the trailing empty string instead.
Sponsored-by: Dartmouth College's Datalad project
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.
(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)
See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.
Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.
Sponsored-by: k0ld on Patreon
When used without --content or --no-content, warn about the upcoming
transition, and suggest using one of the options, or setting
annex.synccontent.
Sponsored-by: Brett Eisenberg on Patreon
I anticipate that if sync is transitioned to syncing content by default,
people will want a short option. And in repositories where
annex.synccontent = true, they already would. And pull and push sync
content by default, so a short option is useful with them too.
Mnemonic: -g makes only git data be synced
Also, -a makes only annex data be synced.
Would have preferred -c, which would complement -C, but it
was already taken to set git configs.
Sponsored-by: Noam Kremen on Patreon
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.
In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.
And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.
Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.
Sponsored-by: Kevin Mueller on Patreon
The man page is somewhat vague about this, but I do think it was a bug
that these options didn't alreay behave that way. The options are
documented to disable imports and exports, which is the same operations
just with a special remote that uses trees.
The real motivation for this is that I'm adding git-annex pull and
git-annex push, and I want these options to turn off the equivilant of
those commands. And git-annex pull will certianly download and push
upload.
Sponsored-by: Nicholas Golder-Manning on Patreon
Had to convert uninit to do everything that can error out inside a
CommandStart. This was harder than feels nice.
(Also, in passing, converted CommandCheck to use a data type, not a
weird number that it was not clear how it managed to be unique.)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Seems unlikely to be too useful, but who knows.
Moved the checkSafeConfig call to happen after an action is started, so
it will be captured by --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Including special --whatelse handling.
Otherwise, it seems unlikely to be too useful, but who knows.
Refactored code to call starting before displaying error messages.
This makes the error messages be captured by --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Seems unlikely to be very useful, but trivial.
And, this completes the story that git-annex sync does not need json,
since every sub-operation is available in a command that does support json.
(Well, except for committing, but that's not a git-annex command.)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Both -J and --json needed importfeed to be refactored to use commandAction.
That was difficult, because of the interrelated nature of downloading feeds
and then downloading files from feeds, both of which needed to use
commandAction. And then checking for problems in feeds has to come after
these actions, which may be run as background jobs.
As for --json support, it's most of the way there, but still has some
warts, so I didn't enable jsonOptions yet. The warts include:
- An initial empty json record is displayed by getCache.
- Input is not populated, should be feed url
- feedProblem at end will not be captured by --json-error-messages
(see FIXME)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Also fix support for operating on multiple pairs of files and keys.
Moved notAnnexed to inside starting, so error message will get into the json.
Cannot include the key in the starting as it's not known yet, so instead
add it to the json later.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Dunno how useful this will be, since about all that's accessible from
the json is whether it succeeded or failed, and the error messages
which were already on stderr.
Note that, when autoenabling a special remote, it would be possible for
one to stop and prompt or output not using Messages and so not output as
part of the json. I don't think that happens, but I'm not 100% sure
something doesn't manage to break it. Of course, the same could be the
case for commands that transfer objects. Using Annex.Init.autoEnableSpecialRemotes
in --json mode would avoid the problem, but I've chosen to wait until I
know it's needed to use it.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Generalized AddJSONActionItemField to allow it to add several fields. Not entirely
happy with that, since the names of the fields have to be carefully chosen to
not conflict with other json fields. And fields added that way can't be parsed
back in FromJSON, except for the "fields" field that is special cased for metadata.
Still, I couldn't see another way to do it.
Also, omit file:null from the json output. Which does affect other commands,
eg git-annex whereis --all --json. Hopefully that won't break something that expects
a null file. If it did, that could be reverted, but it would be ugly to have
file:null in the unused --json
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
For expire, the normal output is unchanged, but the --json output includes the uuid
in machine parseable form. Which could be very useful for this somewhat obscure
command. That needed ActionItemUUID to be implemented, which seemed like a lot
of work, but then ---
I had been going to skip implementing them for trust, untrust, dead, semitrust,
and describe, but putting the uuid in the json is useful information, it tells
what uuid git-annex picked given the input. It was not hard to support
these once ActionItemUUID was implemented.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This also changes addunused to display the names of the files that it adds.
That seems like a general usability improvement, and not displaying the input
number does not seem likely to be a problem to a user, since the filename
is based on the key. Displaying the filename was necessary to get it and the key
included in the json.
dropunused does not include the key in the json. It would be possible to
add, but would need more changes. And I doubt that dropunused --json
would be used in a situation where a program cared which keys were
dropped. Note that drop --unused does have the key in its json, so such
a program could just use it. Or could just dropkey --batch with the
specific keys it wants to drop if it cares about specific keys.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Also in passing the --all display was fixed up to not quote keys like filenames.
Note that the check added to compareChanges was needed to avoid logging when
nothing changed.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
log: When --raw-date is used, display only seconds from the epoch, as
documented, omitting a trailing "s" that was included in the output
before.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
The json does not include an url field, but it does have an input field that is
"file url" when using --batch and ["file", "url"] when using the command line.
I chose not to change that because it would complicate batchInput.
An url field could be added if it turns out to be useful.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This makes annexFileMode be just an application of setAnnexPerm',
which avoids having 2 functions that do different versions of the same
thing.
Fixes some buggy behavior for some combinations of core.sharedRepository
and umask.
Sponsored-by: Jack Hill on Patreon
These two missed setting it.
It rarely matters that the journal gets the right perm. But, when using
annex.alwayscommit=false, someone else may come along later and want
to append to the journal file.
It probably never matters what the sentinal perms are, but for
completeness..
Sponsored-by: Luke Shumaker on Patreon
init: Bug fix: Create .git/annex/ and .git/annex/fsckdb/ directories with
permissions configured by core.sharedRepository.
The fsckfb being created happens to create .git/annex/ and it was not using
createAnnexDirectory. Probably a reversion partly, but maybe the database
directory was always created not honoring core.sharedRepository?
Sponsored-by: Noam Kremen on Patreon
This spams the user with a lot of messages, but it seems like busywork to
avoid that and only warn once, since this warning will go away when it gets
implemented.
Also fix parsing of the octal value.
Sponsored-by: Kevin Mueller on Patreon
That's too much quoting, the user expects the filename to be copy and
pasteable. It would be ok to slash-escape space ('\ ')
which is what gnu find does, but it doesn't seem necessary either.
${escaped_file} has always quoted spaces though, so keep on doing it
there.
Sponsored-by: Nicholas Golder-Manning on Patreon
When a nonexistant file is passed to a command and --json-error-messages
is enabled, output a JSON object indicating the problem.
(But git ls-files --error-unmatch still displays errors about such files in
some situations.)
I don't like the duplication of the name of the command introduced by this,
but I can't see a great way around it. One way would be to pass the Command
instead.
When json is not enabled, the stderr is unchanged. This is necessary
because some commands like find have custom output. So dislaying
"find foo not found" would be wrong. So had to complicate things with
toplevelFileProblem having different output with and without json.
When not using --json-error-messages but still using --json, it displays
the error to stderr, but does display a json object without the error. It
does have an errorid though. Unsure how useful that behavior is.
Sponsored-by: Dartmouth College's Datalad project
This reverts commit a325524454.
Turns out this was predicated on an incorrect belief that json output
didn't already sometimes lack the "key" field. Since json output already
can when `giveup` was used, it seems unncessary to add a whole new
option for this.
Added a --json-exceptions option, which makes some exceptions be output in json.
The distinction is that --json-error-messages is for messages relating
to a particular ActionItem, while --json-exceptions is for messages that
are not, eg ones for a file that does not exist.
It's unfortunate that we need two switches with such a fine distinction
between them, but I'm worried about maintaining backwards compatability
in the json output, to avoid breaking anything that parses it, and this was
the way to make sure I didn't.
toplevelWarning is generally used for the latter kind of message. And
the other calls to toplevelWarning could be converted to showException. The
only possible gotcha is that if toplevelWarning is ever called after
starting acting on a file, it will add to the --json-error-messages of the
json displayed for that file and converting to showException would be a
behavior change. That seems unlikely, but I didn't convery everything to
avoid needing to satisfy myself it was not a concern.
Sponsored-by: Dartmouth College's Datalad project
Propagate Annex.force into the remote's Annex state.
Fixes this problem:
joey@darkstar:~/tmp/xxxx>git-annex copy mmm --to origin --force
copy mmm (to origin...)
not enough free space, need 908.72 MB more (use --force to override this check or adjust annex.diskreserve)
failed to send content to remote
failed
Does beg the question if anything else should be propagated.
Some things like Annex.forcenumcopies certianly not; using --numcopies
overrides the number of copies the current repo wants, not all of them.
Sponsored-by: Graham Spencer on Patreon
New command, currently limited to changing autoenable= setting of a special remote.
It will probably never be used for more than that given the limitations on
it.
Sponsored-by: Brock Spratlen on Patreon
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.
The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=
Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.
Sponsored-by: Jack Hill on Patreon
These are quite low-level, but still there is no point in displaying
escape sequences that have been embedded in a key to the terminal.
I think these are the only remaining commands that didn't use safe
output, except for cases where git-annex is speaking a protocol to
itself.
Sponsored-by: Kevin Mueller on Patreon
I'm on the fence about this. Notice that pulling from a git remote can
pull branches that have escape sequences in their names. Git will
display those as-is. Arguably git should try harder to avoid that.
But, names of remotes are usually up to the local user, and autoenable
changes that, and so it makes sense that git chooses to display control
characters in names of remotes, and so autoenable needs to guard against
it.
Sponsored-by: Graham Spencer on Patreon
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.
This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.
A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.
Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.
Sponsored-by: k0ld on Patreon
As well as escape sequences, control characters seem unlikely to be desired when
doing addurl, and likely to trip someone up. So disallow them as well.
I did consider going the other way and allowing filenames with control characters
and escape sequences, since git-annex is in the process of escaping display
of all filenames. Might still be a better idea?
Also display the illegal filename git quoted when it rejects it.
Sponsored-by: Nicholas Golder-Manning on Patreon
This is by no means complete, but escaping filenames in actionItemDesc does
cover most commands.
Note that for ActionItemBranchFilePath, the value is branch:file, and I
choose to only quote the file part (if necessary). I considered quoting the
whole thing. But, branch names cannot contain control characters, and while
they can contain unicode, git coes not quote unicode when displaying branch
names. So, it would be surprising for git-annex to quote unicode in a
branch name.
The find command is the most obvious command that still needs to be
dealt with. There are probably other places that filenames also get
displayed, eg embedded in error messages.
Some other commands use ActionItemOther with a filename, I think that
ActionItemOther should either be pre-sanitized, or should explicitly not
be used for filenames, so that needs more work.
When --json is used, unicode does not get escaped, but control
characters were already escaped in json.
(Key escaping may turn out to be needed, but I'm ignoring that for now.)
Sponsored-by: unqueued on Patreon
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.
registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.
This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.
But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This reverts commit 66eb63dd82.
git-annex init is the only thing that uses ensureCommit. So overriding
there will make later commits to the git-annex branch or by git-annex sync
fail.
It's ugly that git-annex init sets user.name and user.email, but it only
does it on systems that are badly configured.
When it's set and git cannot determine user.name or user.email, this will
result in git-annex init failing when committing to create the git-annex
branch. Other git-annex commands that commit can also fail.
Sponsored-by: Jack Hill on Patreon
Avoid setting user.name and user.email in the git config when git is unable
to detect them.
git-annex has good reason to want to ensure git commit succeeds when eg
committing to the git-annex branch. But it's not playing nice to set these
values where other commands can see them.
Sponsored-by: Brett Eisenberg on Patreon
Fix laziness bug introduced in last release that breaks use of
--unlock-present and --hide-missing adjusted branches.
Since there is a writeFile of the same file immediately after readFile, it
may still have the file open for read (or may have happened to read it
already and closed it).
I was not able to reproduce the problem in brief testing, but this seems
obvious.
Sponsored-by: Luke Shumaker on Patreona
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs neededing to
work around the above bug. External remote program that support
exporttree=yes are recommended to be updated to send VERSION 2.
Sponsored-by: Kevin Mueller on Patreon
Fix bug that caused broken protocol to be used with external remotes that
use exporttree=yes. In some cases this could result in the wrong content
being exported to, or retrieved from the remote.
Sponsored-by: Nicholas Golder-Manning on Patreon
Remote.Directory makes a temp file, then calls this, and since the temp
file exists, it prevented probing if CoW works.
Note that deleting the empty file does mean there's a small window for a
race. If another process is also exporting to the remote, that could let it
make the same temp file. However, the temp filename actually has the
processes's pid in it, which avoids that being a problem.
This may have been a reversion caused by commits around
63d508e885, but I haven't gone back and
tested to be sure. The directory special remote had supposedly supported
CoW for this going back to about half a year before that.
Sponsored-by: Graham Spencer on Patreon
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.
Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.
Sponsored-by: unqueued on Patreon
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.
Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.
An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.
It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project