Commit graph

910 commits

Author SHA1 Message Date
Joey Hess
cabbc91b18
addurl, importfeed: Allow '-' in filenames, as long as it's not the first character 2020-05-11 13:50:49 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
69e2e4763e
only check --force at init time, not enable time
git-lfs repos that encrypt the annexed content but not the git repo only
need --force passed to initremote, allow enableremote and autoenable of
such remotes without forcing again.

Needing --force again particularly made autoenable of such a repo not work.
And once such a repo has been set up, it seems a second --force when
enabling it elsewhere has little added value. It does tell the user about
the possibly insecure configuration, but if the git repo has already been
pushed to that remote in the clear, data has already been exposed. The goal
of that --force was not to prevent every situation where such an exposure
can happen -- anyone who sets up a public git repo and pushes to it will
expose things similarly and git-annex is not involved. Instead, the purpose
of the --force is to point out to the user that they're asking for a
configuration where encryption is inconsistently applied.
2020-05-07 15:59:29 -04:00
Joey Hess
1532d67c3e
S3: Support signature=v4
To use S3 Signature Version 4. Some S3 services seem to require v4, while
others may only support v2, which remains the default.

I'm also not sure if v4 works correctly in all cases, there is this
upstream bug report: https://github.com/aristidb/aws/issues/262
I've only tested it against the default S3 endpoint.
2020-05-07 13:18:11 -04:00
Joey Hess
bb88a01910
upgrade: When upgrade fails due to an exception, display it.
37b42e72e7 made it catch exceptions but
thought they were unlikely to be useful to display, which may be right when
a git command fails, but not in the case yoh found.
2020-05-07 12:22:32 -04:00
Joey Hess
0040d2c129
sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from
Now the warning gets displayed, which is better than an arcane git error.

The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.

Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
2020-05-05 14:31:37 -04:00
Joey Hess
dfc4e641b5
repair: Improve fetching from a remote with an url in host:path format.
User reported git@my.gitlab.foo:username/myrepo.git didn't work with
git-repair, because it rewrites it to an url
ssh://git@my.gitlab.foo/~/username/myrepo.git
and the /~/ was not something the hosting site supported.

Since git-annex still generally needs the repo url to be well, an url, did
not change the conversion code. But in this case, we're running git fetch,
so we might as well pass it the remote name rather than the url.

Did a quick audit of repoLocation uses to see if there was anything else
like this problem elsewhere, and didn't see any. But this is not the first
time this special case in git and git-annex's attempt to de-special-case
it has caused a problem..
2020-05-04 15:32:06 -04:00
Joey Hess
f9ed30de3b
avoid beware of the leopard situation
* Display a warning message when a remote uses a protocol, such as
  git://, that git-annex does not support. Silently skipping such a
  remote was confusing behavior.

  It sets annex-ignore, so the warning is only displayed once.

* Also display a warning message when a remote, without a known uuid,
  is located in a directory that does not currently exist, to avoid
  silently skipping such a remote.

  This is a bit more debatable, since git-annex get will say,
  try making repository available. And since it does not set annex-ignore,
  the warning will be displayed repeatedly. It's also an extreme edge case,
  I don't think I've ever seen it happen in real life.
2020-05-04 13:01:11 -04:00
Joey Hess
e72ab75633
releasing package git-annex version 8.20200501 2020-05-01 17:41:36 -04:00
Joey Hess
d7db481471
wip
This does not compile, and I hit a bad dead end. Wah.
2020-04-29 15:48:39 -04:00
Joey Hess
4a6d328ae9
Avoid a test suite failure when the environment does not let gpg be tested
Due to eg, too long a path to the agent socket, caused by running gpg in a
container where /run is not mounted, and/or some other gpg behavior like
unnecessarily making relative paths to its home directory absolute.
2020-04-28 15:47:23 -04:00
Joey Hess
57b89c635f
support required groupwanted
When the required content is set to "groupwanted", use whatever expression
has been set in groupwanted as the required content of the repo, similar to
how setting required content to "standard" already worked.
2020-04-28 13:31:26 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
c05c4e549e
sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes
Do not sync with a faster remote that was not specified.

That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
2020-04-23 16:08:45 -04:00
Joey Hess
2aeb79249b
external: stop storing readonly=true in remote.log
readonly=true is used to make an external special remote that does not
need the external program to be installed. It was stored in the
remote.log by default, and so every time it was specified in an
enableremote or initremote, whatever value was used became the new
default for subsequent enableremotes of that remote.

That was surprising, and I consider it to be a bug.

It does not make much sense to pass it to initremote because then how
would you populate that remote with anything? You would have to
enableremote elsewhere, and store content there. I'm assuming nobody
used it that way.

Someone might rely on passing it to enableremote once, and then that
being inherited in other clones. But that is not how it's documented to
be used. It is barely documented in git-annex at all, only in the
external special remote protocol, and the documentation there says to
"Document that this external special remote can be used in readonly
mode." (by the user of it passing readonly=true to enableremote). The
one external special remote that I know of that does document that is
<https://github.com/bgilbert/gcsannex> (the one that motivated adding
it). That one's docs do say to pass it to enableremote.

So, it seemed safe to make this behavior change. If someone was in fact
relying on one of those behaviors, all their current repos will still
work as they configured them (although they will need to deal
with the related change in 9f3c2dfeda).
In new clones, they will find enableremote fails, complaining the
external program is not in path. An easy enough problem to recover from.
2020-04-23 15:21:26 -04:00
Joey Hess
9f3c2dfeda
stop using remote.name.annex-readonly for two distinct things 2020-04-23 14:56:03 -04:00
Joey Hess
cd1676d604
fix bug involving local git remote and out of date location log
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.

I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.

get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.

Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.

(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
2020-04-21 12:36:58 -04:00
Joey Hess
04352ed9c5
check-ignore resource pool
Much like check-attr before.
2020-04-21 11:25:28 -04:00
Joey Hess
45fb7af21c
check-attr resource pool
Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simulantaneously, so there's a thundering herd..
2020-04-21 11:05:57 -04:00
Joey Hess
cee6b344b4
cat-file resource pool
Avoid running a large number of git cat-file child processes when run with
a large -J value.

This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
2020-04-20 15:19:31 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
957a87b437
fix absolute filenames fed into --batch and git-annex info 2020-04-15 16:04:05 -04:00
Joey Hess
5a62e8132d
When parsing git configs, support all the documented ways to write true and false, including "yes", "on", "1", etc.
This change does impact git-annex config
eg "git annex config --set annex.addunlocked on"
will store "on" and new git-annex will understand that value, while
old git-annex will error:
git-annex: bad annex.addunlocked configuration in git annex config:
Parse failure: near "on"
That seems acceptable.

Not special remote configs that are only documented as =true or =false
however. Having git-annex support other values for those would break
backwards compatability when used with old versions of git-annex. And
older versions ignore invalid special remote configs.. That would not
be a good combination.
2020-04-13 14:05:30 -04:00
Joey Hess
9cb69dbb76
support boolean git configs that are represented by the name of the setting with no value
Eg"core.bare" is the same as "core.bare = true".

Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
2020-04-13 13:35:22 -04:00
Joey Hess
ca9c6c5f60
Fix a potential failure to parse git config
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.

So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
2020-04-13 13:05:41 -04:00
Joey Hess
86426036a0
optimise catfile interface with ByteString and Attoparsec
Around 3% total speedup.

Profiling git annex find --not --in web, it's now bytestring end-to-end,
and there is only a little added overhead in eg accessing the Annex
state MVar (3%). The rest of the runtime is spent reading symlinks, and
in attoparsec.

This feels like the end of the optimisation road, without a major change
like caching information for faster queries.
2020-04-10 14:18:52 -04:00
Joey Hess
2caf579718
cache annex index filename for 1.5% speedup to queries 2020-04-10 13:37:04 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
541803bb52
changelog for ByteString Ref 2020-04-08 14:10:29 -04:00
Joey Hess
7ebc118776
adb: Better messages when the adb command is not installed
After a user completely ignored the display of the exception probably
because it didn't make sense..

This does make it a little bit slower since it checks adb is in path each
time before running it. Also, it might display a lot of warnings about it
not being installed.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-04-02 10:48:28 -04:00
Joey Hess
87d5583a91
use programPath consistently, not readProgramFile
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.

Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.

I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
2020-03-30 16:06:27 -04:00
Joey Hess
dbad6c5c39
releasing package git-annex version 8.20200330 2020-03-30 13:46:24 -04:00
Joey Hess
c9ac7aa338
patch applied 2020-03-26 15:18:47 -04:00
Joey Hess
173465592f
changelog for Kyle's other fix, and close bug 2020-03-26 13:18:41 -04:00
Joey Hess
c0ceb969e6
changelog for Kyle's fix 2020-03-26 13:12:49 -04:00
Joey Hess
bbe977a2b6
fsck: Fix reversion in 8.20200226 that made it incorrectly warn that hashed keys with an extension should be upgraded. 2020-03-20 13:09:16 -04:00
Joey Hess
4b92bbe8d7
webdav: Made exporttree remotes faster by caching connection to the server
Followed example of Remote.S3.
2020-03-20 12:48:43 -04:00
Joey Hess
c8fec6ab03
Fix a minor bug that caused options provided with -c to be passed multiple times to git. 2020-03-16 13:06:44 -04:00
Joey Hess
14a4a9f4cd
releasing package git-annex version 8.20200309 2020-03-09 17:08:16 -04:00
Joey Hess
4ce518998a
Fix upgrade failure when a file has been deleted from the working tree 2020-03-09 16:59:18 -04:00
Joey Hess
afe72d04ff
fix problems with upgrade of local remotes
Upgrade other repos than the current one by running git-annex upgrade
inside them, which avoids problems with upgrade code making assumptions
that the cwd will be inside the repo being upgraded.

In particular, this fixes a problem where upgrading a v7 repo to v8 caused
an ugly git error message.

I actually could not find a way to make Upgrade.V7 work properly
without changing directory to the remote. Once I got git ls-files to work,
the git cat-file failed because :path can only be used in the current git
repo.
2020-03-09 16:49:28 -04:00
Joey Hess
d930a2035c
Avoid converting .git file in a worktree or submodule to a symlink when the repository is not a git-annex repository.
This means it will still be a .git file when git-annex init runs. That's
ok, the repo probably contains no annexed objects yet, and even if it does,
git-annex init does not care if symlinks in the worktree don't point to the
objects.

I made init, at the end, run the conversion code. Not really necessary
because the next git-annex command could do it just as well. But, this
avoids commands that don't normally write to the repo needing to write to
it, which might avoid some problem or other, and seems worth avoiding
generally.
2020-03-09 14:54:14 -04:00
Joey Hess
c0a981cb0e
update comment 2020-03-09 14:31:28 -04:00
Joey Hess
8a58f67170
remove changelog for problem I've not fixed yet 2020-03-09 14:24:40 -04:00
Joey Hess
1978a24207
Fix bug that caused unlocked annexed dotfiles to be added to git by the smudge filter when annex.dotfiles was not set. 2020-03-09 14:20:02 -04:00
Joey Hess
70d24c0302
add a comment about CWD
While git ls-files can actually be used on a repo that is not in the
cwd, it works inconsistently. For example, this fails:

git --git-dir=../foo/.git --work-tree=../foo ls-files ../foo

But change some of the paths to absolute and it will succeed. That seems
like a bug in git.

OTOH, this succeeds:

git --git-dir=../foo/.git --work-tree=../foo ls-files

But, that lists paths relative to the top of the --work-tree,
rather than the usual listing them relative to the cwd. Because the cwd
is not in the repo. And so anything parsing the ls-files output of that
is likely to operate on files in the wrong location. Indeed, there is
code in Upgrade/ that has this problem!
2020-03-09 13:37:01 -04:00
Joey Hess
6a91471923
GETCONFIG name fix
Fix regression that prevented external special remotes from using GETCONFIG
to query values like "name". (Introduced in version 7.20200202.7.)
2020-03-09 12:38:04 -04:00
Joey Hess
093fde5abd
completed the createDirectoryIfMissing conversion
Remaining calls in the assistant and Annex.Ssh have been audited and are ok.
2020-03-06 12:55:03 -04:00
Joey Hess
88f721549d
Linux standalone: Use md5sum to shorten paths in .cache/git-annex/locales
md5sum is part of busybox, so is probably available unless it were compiled
out. If md5sum (or cut for that matter) is not available, it will
still use the whole path to $base, otherwise hash it.

Of course it's possible for md5sum to be available sometimes and not others
on the same system; in such an event the locales would be built twice for
the same bundle. The cleanup code will delete both sets once that
version of the bundle is upgraded.
2020-03-04 13:04:56 -04:00
Joey Hess
2099e40664
improve wording 2020-03-02 16:27:29 -04:00