When storing content on remote fails, always display a reason why.
Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
Finishing work begun in 6952060665
Also, truncate filenames provided by other remotes if they're too long,
when --preserve-filename is not used. That seems to have been omitted
before by accident.
I run haddock with `cabal haddock --executables`. It fails with:
Types/Remote.hs:271:17: error: parse error on input ‘->’
Apparently haddock does not like to find haddock blocks outside of
declarations? In any case, this patch makes these type of errors go
away.
Afterwards, I see errors like these, that need to be investigated as
a next step:
haddock: internal error: internal: extractDecl
CallStack (from HasCallStack):
error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create
* addurl --preserve-filename: New option, uses server-provided filename
without any sanitization, but with some security checking.
Not yet implemented for remotes other than the web.
* addurl, importfeed: Avoid adding filenames with leading '.', instead
it will be replaced with '_'.
This might be considered a security fix, but a CVE seems unwattanted.
It was possible for addurl to create a dotfile, which could change
behavior of some program. It was also possible for a web server to say
the file name was ".git" or "foo/.git". That would not overrwrite the
.git directory, but would cause addurl to fail; of course git won't
add "foo/.git".
sanitizeFilePath is too opinionated to remain in Utility, so moved it.
The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:
isDrive will never succeed, because "c:" gets munged to "c_"
".." gets sanitized now
".git" gets sanitized now
It will never be null, because sanitizeFilePath keeps the length
the same, and splitDirectories never returns a null path.
Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
Now the warning gets displayed, which is better than an arcane git error.
The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.
Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
Todo item is done at last.
Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..
Factored out a mkTestTree, which can be used to get a TestTree,
w/o needing to first run any annex actions, which the main test suite
cannot do because it does not operate in an annex repo to start with,
and it needs to start testing before a repo is available.
aeca7c2207 exposed this problem, but it
was never a good idea to have a series of test cases, some of which depend on
prior ones, and throw away annex state after each.
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.
(Also importfeed --fast potentially.)
Do not sync with a faster remote that was not specified.
That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.
I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.
get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.
Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.
(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.
What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.
Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
Eg"core.bare" is the same as "core.bare = true".
Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.
So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.
--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.
Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.
I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
This was originally added so that unannex could prevent the hook from
running while files were in a state that the hook would interpret as
old-style unlocked and so would lock.
Now that's gone, so the only thing the hook was preventing was two
pre-commit processes running simulantaneously. But such concurrency
is normal in git-annex and should not be a problem.
Does mean that .git/hooks/pre-commit-annex might run more concurrently,
that seems the only risk of it causing any problems.
Running `git annex add --force-small` on a modified submodule fails
when the submodule path is fed to hash-object. This failure is
unlikely to be triggered by a caller passing a submodule explicitly to
`git annex add` because there's nothing useful that annex-add can do
with a submodule. A more likely scenario for hitting this failure is
that the caller passes "." or a subdirectory to `annex-add` while a
submodule underneath the specified path happens to be modified.
addSmallOverridden already routes symbolic links through addFile
rather than using the custom hash-object/update-index call. The
latter is valid only for regular files, so extend this condition so
that everything that isn't a regular file goes through addFile. Doing
so avoids the above error because submodules come in as directories.
addSmallOverridden calls getFileStatus and then checks the result with
isSymbolicLink. getFileStatus dereferences symbolic links, so
isSymbolicLink will always return false (assuming the getFileStatus
call doesn't fail on a broken link). Use getSymbolicLinkStatus
instead.
Upgrade other repos than the current one by running git-annex upgrade
inside them, which avoids problems with upgrade code making assumptions
that the cwd will be inside the repo being upgraded.
In particular, this fixes a problem where upgrading a v7 repo to v8 caused
an ugly git error message.
I actually could not find a way to make Upgrade.V7 work properly
without changing directory to the remote. Once I got git ls-files to work,
the git cat-file failed because :path can only be used in the current git
repo.
Remaining things needing converted are in the assistant, and Annex.Ssh.
Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
Since it was used on both worktree and .git/annex files, split into
multiple functions.
In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
git-annex config: Only allow configs be set that are ones git-annex
actually supports reading from repo-global config, to avoid confused users
trying to set other configs with this.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
* whereis: If a remote fails to report on urls where a key
is located, display a warning, rather than giving up and not displaying
any information.
* When external special remotes fail but neglect to provide an error
message, say what request failed, which is better than displaying an
empty error message to the user.
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
remoteAnnexConfig will avoid bugs like
a3a674d15b
Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
That was added back in 2013 commit 2af652e1b8
and I'm a bit unclear about the reasons.
It seemed that, at the time, receive.denyNonFastforwards=true, which is
the default in a repo created by git init --shared --bare (but not
without --shared), which the assistant did, caused problems syncing.
But even at the time the bug report showed an error message clearly
explaining that it was a non-fast-forward push being denied.
I tried it with the current version, and since git-annex sync pulls
from the bare repo and merges, it pushes a fast-forward. So there's no
failure to push. (There could be one if another push happened after the
pull, but you'd want it to fail then presumably.)
I'm not 100% sure what changed to make it not be a problem, but I know
I've seen this message in many circumstances and I can't ever recall it
having anything to do with any issue that prevented a push.
Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn,
which showed the problem when syncing from a direct mode repo,
and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment
which seems to show the problem was actually a problem pulling,
I think there's a good chance that the problem actually involved direct
mode.
* Added sync --only-annex, which syncs the git-annex branch and annexed
content but leaves managing the other git branches up to you.
* Added annex.synconlyannex git config setting, which can also be set with
git-annex config to configure sync in all clones of the repo.
Use case is then the user has their own git workflow, and wants to use
git-annex without disrupting that, so they sync --only-annex to get the
git-annex stuff in sync in addition to their usual git workflow.
When annex.synconlyannex is set, --not-only-annex can be used to override
it.
It's not entirely clear what --only-annex --commit or --only-annex
--push should do, and I left that combination not documented because I
don't know if I might want to change the current behavior, which is that
such options do not override the --only-annex. My gut feeling is that
there is no good reasons to use such combinations; if you want to use
your own git workflow, you'll be doing your own committing and pulling
and pushing.
A subtle question is, how should import/export special remotes be handled?
Importing updates their remote tracking branch and merges it into master.
If --only-annex prevented that git branch stuff, then it would prevent
exporting to the special remote, in the case where it has changes that
were not imported yet, because there would be a unresolved conflict.
I decided that it's best to treat the fact that there's a remote tracking
branch for import/export as an implementation detail in this case. The more
important thing is that an import/export special remote is entirely annexed
content, and so it makes a lot of sense that --only-annex will still sync
with it.
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
fsck --from remote: Fix a concurrency bug that could make it incorrectly
detect that content in the remote is corrupt, and remove it, resulting in
data loss.