checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.
Still try the logged remotes first, because they are far more likely to
have the key.
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.
Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.
Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.
checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess calls waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.
Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.
annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.
Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.
Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.
That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.
(Had to split out Remote.List.Util to avoid an import cycle.)
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.
Crash as described in fixed bug report.
waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.
For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.
transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.
As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
Already supported --json, but not that.
Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
One way this can be used is to remove all urls for some website that went
away:
git-annex whereis --format '${file} ${url}\0' | \
grep -z whatever.com | git-annex rmurl --batch -z
Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.
A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?
However, it did let me clean up several places in the code.
This commit was sponsored by Ethan Aubin.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Graham Spencer on Patreon.
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.
This commit was sponsored by Ilya Shlyakhter on Patreon.
When storing content on remote fails, always display a reason why.
Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
Finishing work begun in 6952060665
Also, truncate filenames provided by other remotes if they're too long,
when --preserve-filename is not used. That seems to have been omitted
before by accident.
I run haddock with `cabal haddock --executables`. It fails with:
Types/Remote.hs:271:17: error: parse error on input ‘->’
Apparently haddock does not like to find haddock blocks outside of
declarations? In any case, this patch makes these type of errors go
away.
Afterwards, I see errors like these, that need to be investigated as
a next step:
haddock: internal error: internal: extractDecl
CallStack (from HasCallStack):
error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create
* addurl --preserve-filename: New option, uses server-provided filename
without any sanitization, but with some security checking.
Not yet implemented for remotes other than the web.
* addurl, importfeed: Avoid adding filenames with leading '.', instead
it will be replaced with '_'.
This might be considered a security fix, but a CVE seems unwattanted.
It was possible for addurl to create a dotfile, which could change
behavior of some program. It was also possible for a web server to say
the file name was ".git" or "foo/.git". That would not overrwrite the
.git directory, but would cause addurl to fail; of course git won't
add "foo/.git".
sanitizeFilePath is too opinionated to remain in Utility, so moved it.
The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:
isDrive will never succeed, because "c:" gets munged to "c_"
".." gets sanitized now
".git" gets sanitized now
It will never be null, because sanitizeFilePath keeps the length
the same, and splitDirectories never returns a null path.
Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
Now the warning gets displayed, which is better than an arcane git error.
The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.
Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
Todo item is done at last.
Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..
Factored out a mkTestTree, which can be used to get a TestTree,
w/o needing to first run any annex actions, which the main test suite
cannot do because it does not operate in an annex repo to start with,
and it needs to start testing before a repo is available.
aeca7c2207 exposed this problem, but it
was never a good idea to have a series of test cases, some of which depend on
prior ones, and throw away annex state after each.
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.
(Also importfeed --fast potentially.)
Do not sync with a faster remote that was not specified.
That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.
I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.
get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.
Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.
(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.
What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.
Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
Eg"core.bare" is the same as "core.bare = true".
Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.
So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.
--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.
Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.
I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
This was originally added so that unannex could prevent the hook from
running while files were in a state that the hook would interpret as
old-style unlocked and so would lock.
Now that's gone, so the only thing the hook was preventing was two
pre-commit processes running simulantaneously. But such concurrency
is normal in git-annex and should not be a problem.
Does mean that .git/hooks/pre-commit-annex might run more concurrently,
that seems the only risk of it causing any problems.
Running `git annex add --force-small` on a modified submodule fails
when the submodule path is fed to hash-object. This failure is
unlikely to be triggered by a caller passing a submodule explicitly to
`git annex add` because there's nothing useful that annex-add can do
with a submodule. A more likely scenario for hitting this failure is
that the caller passes "." or a subdirectory to `annex-add` while a
submodule underneath the specified path happens to be modified.
addSmallOverridden already routes symbolic links through addFile
rather than using the custom hash-object/update-index call. The
latter is valid only for regular files, so extend this condition so
that everything that isn't a regular file goes through addFile. Doing
so avoids the above error because submodules come in as directories.
addSmallOverridden calls getFileStatus and then checks the result with
isSymbolicLink. getFileStatus dereferences symbolic links, so
isSymbolicLink will always return false (assuming the getFileStatus
call doesn't fail on a broken link). Use getSymbolicLinkStatus
instead.
Upgrade other repos than the current one by running git-annex upgrade
inside them, which avoids problems with upgrade code making assumptions
that the cwd will be inside the repo being upgraded.
In particular, this fixes a problem where upgrading a v7 repo to v8 caused
an ugly git error message.
I actually could not find a way to make Upgrade.V7 work properly
without changing directory to the remote. Once I got git ls-files to work,
the git cat-file failed because :path can only be used in the current git
repo.