Commit graph

1373 commits

Author SHA1 Message Date
Joey Hess
30ac015b79
add a formatContainsVar function
Also, the format function gets faster because it checks for "escaped_"
at gen time instead of every time format is called.
2020-05-19 15:35:00 -04:00
Joey Hess
49bf7c8403
typo 2020-05-12 13:59:15 -04:00
Joey Hess
2a8fdfc7d8
Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere
This relicates git's behavior. It adds a few stat calls for the command
line parameters, so there is some minor slowdown, but even with thousands
of parameters it will not be very noticable, and git does the same statting
in similar circumstances.

Note that this does not prevent eg "git annex add symlink"; the symlink
will be added to git as usual. And "git annex find symlink" will silently
list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that
triggers the warning.
2020-05-11 15:03:35 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
021ed4f1b9
fix some build warnings 2020-05-04 12:44:26 -04:00
Joey Hess
4a6d328ae9
Avoid a test suite failure when the environment does not let gpg be tested
Due to eg, too long a path to the agent socket, caused by running gpg in a
container where /run is not mounted, and/or some other gpg behavior like
unnecessarily making relative paths to its home directory absolute.
2020-04-28 15:47:23 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
45fb7af21c
check-attr resource pool
Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simulantaneously, so there's a thundering herd..
2020-04-21 11:05:57 -04:00
Joey Hess
cee6b344b4
cat-file resource pool
Avoid running a large number of git cat-file child processes when run with
a large -J value.

This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
2020-04-20 15:19:31 -04:00
Joey Hess
d5d8259937
ByteString Ref continued
Attoparsec parser for diff-tree.

Changed fromRef back to producing a String, to avoid needing to convert
every use of it. However, this does mean I'm going to miss some
opportunities where fromRef is used and the result converted back to a
ByteString. Would be worth revisiting that at some point maybe.
2020-04-07 11:54:27 -04:00
Joey Hess
279991604d
started converting Ref from String to ByteString
This should make code that reads shas and refs from git faster.

Does not compile yet, a lot needs to be done still.
2020-04-06 17:14:49 -04:00
Joey Hess
f6d19b18f6
remove unused imports 2020-03-30 12:11:52 -04:00
Joey Hess
cd5658d972
fix windows build 2020-03-10 13:24:37 -04:00
Joey Hess
6d58ca94d6
some easy createDirectoryUnder conversions 2020-03-05 15:20:10 -04:00
Joey Hess
ebbc5004fa
convert createAnnexDirectory to use createDirectoryUnder
It will create foo/.git/annex/, but not foo/.git/ and not foo/.

This will avoid it creating an empty path to a repo when a drive is
yanked out and the mount point goes away, for example.
2020-03-05 14:33:04 -04:00
Joey Hess
5b022eea87
implemented createDirectoryUnder 2020-03-05 14:10:34 -04:00
Joey Hess
662e5a5db9
small opt to absPath
Noticed that it gets the CWD unncessarily when the path is absolute.

I have not benchmarked this, but I guess that the small overhead of
isAbsolute is so tiny compared to the system call that it's worth
it even if most of the time relative paths are passed to absPath.
2020-03-05 13:52:30 -04:00
Joey Hess
716e573514
split up quickcheck tests for hashes and macs
So when one fais, it's clear which one is the problem.
2020-03-02 14:34:48 -04:00
Joey Hess
f6d629e483
changelog and minor style 2020-02-28 12:57:55 -04:00
Peter Simons
73cf523a4b
Fix build with ghc-8.8.x.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
2020-02-28 12:54:20 -04:00
Joey Hess
9659f1c30f
annex.security.allowed-ip-addresses ports syntax
Extended annex.security.allowed-ip-addresses to let specific ports of an IP
address to be used, while denying use of other ports.
2020-02-25 15:45:52 -04:00
Joey Hess
029c883713
Merge branch 'master' into v8 2020-02-19 14:32:11 -04:00
Joey Hess
6f90bb7738
handle git-credential prompt in -J mode
If git-credential has it cached and does not prompt, this will
unfortunately result in a brief flicker, as the displayed console
regions are hidden while running it and then re-displayed. Better than a
corrupted display.

Actually, I tried it and don't see a visible flicker, so probably only
over a slow ssh will it be apparent.
2020-01-22 16:42:15 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
b68a8d8968
use conversion functions from filepath-bytestring (again)
This reverts commit 3a04af7927.
2020-01-04 20:18:40 -04:00
Joey Hess
2cea674d1e
Merge branch 'master' into v8 2020-01-01 14:26:43 -04:00
Joey Hess
999a6f0541
windows build fix 2020-01-01 13:05:23 -04:00
Joey Hess
39c91f91a9
windows build fix 2020-01-01 12:24:31 -04:00
Joey Hess
e006acc8e3
fix quickcheck failure
prop_encode_decode_roundtrip failed on "\175" in C locale.

This may be a new problem after the switch to RawFilePath, but it
already had filtering for high chars, so changed to only test ascii
chars.
2019-12-30 13:54:46 -04:00
Joey Hess
3a04af7927
temporary revert "use conversion functions from filepath-bytestring"
This reverts commit 75c40279c1.

Debian unstable is one version too old, so this can be de-reverted in a
bit.
2019-12-27 19:29:09 -04:00
Joey Hess
2b821eb225
Merge branch 'master' into sqlite 2019-12-26 15:15:42 -04:00
Joey Hess
444d5591ee
Improve file ordering behavior when one parameter is "." and other parameters are other directories
eg, `git-annex get . ..` used to order the files strangly, because it
did not realize that when git ls-files output eg "foo", that should be
grouped with the first set of files and not the second set.

Fixed by making            dirContains "." "./foo" = True
which makes sense, because dirContains ".." "../foo" = True
2019-12-20 18:01:29 -04:00
Joey Hess
d5628a16b8
Merge branch 'bs' into sqlite-bs 2019-12-18 14:51:03 -04:00
Joey Hess
75c40279c1
use conversion functions from filepath-bytestring
Behavior should be the same, but I'd hope to eventually get rid of
most of Utility.FileSystemEncoding and this is a first step.
2019-12-18 13:42:43 -04:00
Joey Hess
322c542b5c
fix ByteString conversion on windows
the encode' and decode' functions on Windows should not apply the
filesystem encoding, which does not work there. Instead, convert to and
from UTF-8.

Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem
encoding, so won't work as expected on windows.
2019-12-18 13:32:56 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
a0168cd9a2
use RawFilePath getSymbolicLinkStatus for speed 2019-12-06 15:42:54 -04:00
Joey Hess
2f9a80d803
merging sqlite and bs branches
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
2019-12-06 15:30:45 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
360942ba12
RawFilePath will need to support Windows too
Of course, readSymbolicLink always fails on Windows, but now it's ready
for other things that don't fail there.
2019-12-06 14:17:48 -04:00
Joey Hess
f39f018ee0
fix git ls-tree parser
File mode is octal not decimal. This broke in the conversion to
attoparsec.

(I've submitted the content of Utility.Attoparsec to the attoparsec
developers.)

Test suite passes 100% now.
2019-12-06 14:05:48 -04:00
Joey Hess
37d0f73e66
reword comment 2019-11-27 16:38:18 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
1ff889e456
explict export lists
A small amount of dead code removed.

All of Utility/ done now.

This commit was sponsored by Brock Spratlen on Patreon.
2019-11-23 11:24:10 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
1d0dbdf201
squelch tab warnings 2019-11-22 12:49:41 -04:00
Joey Hess
7263aafd2b
Merge branch 'master' into sqlite 2019-11-22 12:49:35 -04:00
Joey Hess
b82ab21468
missed an export 2019-11-22 12:35:57 -04:00
Joey Hess
a9888f6151
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least 1 daylight savings time zone
changes makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?

(cherry picked from commit 09ee6b0ccb)
2019-11-21 17:28:18 -04:00