Commit graph

1587 commits

Author SHA1 Message Date
Joey Hess
a0168cd9a2
use RawFilePath getSymbolicLinkStatus for speed 2019-12-06 15:42:54 -04:00
Joey Hess
2f9a80d803
merging sqlite and bs branches
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
2019-12-06 15:30:45 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
360942ba12
RawFilePath will need to support Windows too
Of course, readSymbolicLink always fails on Windows, but now it's ready
for other things that don't fail there.
2019-12-06 14:17:48 -04:00
Joey Hess
f39f018ee0
fix git ls-tree parser
File mode is octal not decimal. This broke in the conversion to
attoparsec.

(I've submitted the content of Utility.Attoparsec to the attoparsec
developers.)

Test suite passes 100% now.
2019-12-06 14:05:48 -04:00
Joey Hess
37d0f73e66
reword comment 2019-11-27 16:38:18 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
1ff889e456
explict export lists
A small amount of dead code removed.

All of Utility/ done now.

This commit was sponsored by Brock Spratlen on Patreon.
2019-11-23 11:24:10 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
1d0dbdf201
squelch tab warnings 2019-11-22 12:49:41 -04:00
Joey Hess
7263aafd2b
Merge branch 'master' into sqlite 2019-11-22 12:49:35 -04:00
Joey Hess
b82ab21468
missed an export 2019-11-22 12:35:57 -04:00
Joey Hess
a9888f6151
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least 1 daylight savings time zone
changes makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?

(cherry picked from commit 09ee6b0ccb)
2019-11-21 17:28:18 -04:00
Joey Hess
d4661959de
Merge branch 'master' into sqlite 2019-11-21 17:26:50 -04:00
Joey Hess
8ea5f3ff99
explict export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
09ee6b0ccb
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least 1 daylight savings time zone
changes makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?
2019-11-06 14:36:49 -04:00
Joey Hess
89bdcffdfa
found a way to extract InodeCache from git index
This will allow a race-free database transition. It is somewhat hairy in
that it depends on an unspecified git output format.
2019-11-06 14:23:00 -04:00
Joey Hess
4940a135af
eliminate raw sql LIKE query 2019-10-30 15:19:52 -04:00
Joey Hess
94efc400e9
horrible impementation of isInodeKnown
The only good thing about it is it does not require a major version bump
to improve the database. That will need to happen at some point though.

Potentially very very slow in a large repository.

Ugly use of raw sql.
2019-10-23 14:37:29 -04:00
Joey Hess
eebf080b33
comment typo 2019-10-23 12:32:46 -04:00
Joey Hess
9a5d9019ba
Deal with pkexec changing to root's home directory when running a command.
Wow, that's not documented anywhere, and seems like a major gotcha in
pkexec.

Broke enable-tor.
2019-10-21 12:39:19 -04:00
Joey Hess
b90ddbc383
enable-tor: Use pkexec to run command as root when gksu and kdesu are not available.
gksu is no longer in debian, even stable

kdesu in debian is not installed in PATH any longer, though the executable
is still present under /usr/lib

pkexec is packagekit's replacement for those older commands.
2019-09-30 15:19:01 -04:00
Joey Hess
f2737a5fbe
enable-tor: Run kdesu with -c option. 2019-09-30 15:14:05 -04:00
Joey Hess
ab8a6a82e1
remove unused 2019-09-24 18:16:01 -04:00
Joey Hess
a4750fa537
move haddock block so haddock will build 2019-09-24 18:14:47 -04:00
Joey Hess
71f30d2f07
improve haddock 2019-09-24 18:10:34 -04:00
Joey Hess
bc1b9a2c0a
improved GitLFS api 2019-09-24 18:05:11 -04:00
Joey Hess
6ae0a44c64
git-lfs: Added support for http basic auth 2019-09-24 14:46:20 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
9624fe4c37
improve comment 2019-09-01 12:33:19 -04:00
Joey Hess
1558e03014
Refuse to upgrade direct mode repositories when git is older than 2.22
That git fixed a memory leak that could cause an OOM during the upgrade.

Most git-annex builds have a new enough git already.
OSX git was upgraded with brew.

Linux i386ancient build's git was too old. Upgrading it to a fixed
git didn't work (due to the newer git not working with the old ssh,
https://bugs.chromium.org/p/git/issues/detail?id=7 )

Choices to deal with that were:

* Somehow make direct mode upgrade work with the old git, avoiding its
  OOM problem. One way would be to switch the repo to indirect mode
  first, and so upgrade to a repo with locked files. Not good when
  the filesystem does not support symlinks.
* backport the OOM fix from git 2.22
  (And do what about the version number so git-annex knows it's fixed?)
* backport openssh (and possibly more stuff)
* move the i386ancient build to at least Debian stretch (still backporting git)
  But this will make it no longer work with some of the ancient kernels it
  targets.

Of those, backporting the OOM fix seemed the best approach. Put "oomfix"
in the git version number to indicate it.

I have not automated building the git backport, so here's the patch I
used:

diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c	2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
 	if (start_async(&async))
 		return 0;	/* error was already reported */

-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		error("read from external filter %s failed", cmd);
 		ret = 0;
 	}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN	2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
 #!/bin/sh

 GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix

 LF='
 '
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure	2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
 # Identity of this package.
 PACKAGE_NAME='git'
 PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
 PACKAGE_BUGREPORT='git@vger.kernel.org'
 PACKAGE_URL=''

diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version	2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
2019-08-29 15:24:41 -04:00
Joey Hess
69cefe8190
followup and display rsync exit status 2019-08-15 14:47:22 -04:00
Joey Hess
05d52f9699
fix display of http exceptions 2019-08-10 11:09:25 -04:00
Joey Hess
868942e19b
fix unused module import warnings when building on windows 2019-08-08 12:18:53 -04:00
Joey Hess
ee72fd2f7d
add exports useful if using this module to write a git-lfs server 2019-08-05 15:40:36 -04:00
Joey Hess
c527ae5887
Merge branch 'master' into git-lfs 2019-08-05 11:48:45 -04:00
Joey Hess
19defc7932
fix reversion
4af55c42bf reordered the exception
catching, preventing following ftp redirect
2019-08-04 14:32:06 -04:00
Joey Hess
4af55c42bf
factored out downloadConduit from download
useful when an API provides a Request to download
2019-08-04 12:31:54 -04:00
Joey Hess
5be0a35dae
implemented checkPresent for git-lfs 2019-08-03 12:21:28 -04:00
Joey Hess
f536a0b264
weaken comment
I'm seeing the github lfs server request an upload of an object that has
already been uploaded to it before. Probably because they offload
storage to S3 and so skipped the overhead of checking for an unncessary
upload.
2019-08-03 11:31:02 -04:00
Joey Hess
74e9e3ccf0
add to request headers, don't overwrite 2019-08-03 11:15:08 -04:00
Joey Hess
fc09a41ed1
storing objects in git-lfs is working
Still need to record the sha256 and size when they cannot be determined
by inspecting the key.
2019-08-02 13:56:55 -04:00
Joey Hess
6c1130a3bb
lfs endpoint discovery and caching in git-lfs special remote 2019-08-02 12:38:14 -04:00
Joey Hess
03a765909c
move IO code out
Let's keep this entirely pure.

git-annex has its own facilities for running a ssh command, that make it
respect various config settings, and cache connections, etc. So better
not to have the library run ssh itself.
2019-08-02 10:57:40 -04:00
Joey Hess
2533acc7a2
note about ssh hostname sanitization 2019-08-02 10:40:55 -04:00
Joey Hess
bd6c508334
finalizing lfs module
It may eventually move to its own package.
2019-08-01 14:04:56 -04:00
Joey Hess
018b5b8173
Support building with socks-0.6 and persistant-template-2.7
persistent-template now needs UndecidableInstances.

socks changed defaultSocksConf to take a SockAddr.
2019-07-30 12:50:48 -04:00
Joey Hess
426053cb6c
Corrected some license statements
In 40ecf58d4b I changed the license of code I
wrote from GPL to AGPL. But, two files containing code I wrote combined
with code by others were updated to say their license is AGPL, while in
fact part of it was (the code I wrote) but part remained under the original
license (the code written by others).

Remote/Ddar.hs is now changed entirely back to GPL 3.

Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code
not written by me, and corrected its license statement to GPL-2, which
is the actual version of the GPL included with the code in its original
distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/
2019-07-28 14:27:33 -04:00
Joey Hess
7fd650355e
merge from http-client-restricted
I made some improvements to its API after splitting it out of git-annex,
so merge those back in.

This is groundwork for removing the embedded copy of it and depending on
it.

Also moved the managerResponseTimeout disabling to Annex.Url as it's
git-annex specific.

This commit was sponsored by Ethan Aubin on Patreon.
2019-07-17 16:48:50 -04:00
Joey Hess
7234b1f9a7
small optimisation to file copying
Avoid statting file, just try to remove it.

Also a comment to explain why it tries to remove it, which was puzzling
me when I revisited this code until I saw that cp fails to overwrite a
mode 444 file, including perhaps one left by a previous interrupted cp.

This commit was sponsored by Fernando Jimenez on Patreon.
2019-07-17 14:22:21 -04:00
Joey Hess
21ff5e1e5a
CoW probing
Improved probing when CoW copies can be made between files on the same
drive. Now supports CoW between BTRFS subvolumes. And, falls back to rsync
instead of using cp when CoW won't work, eg copies between repos on the
same EXT4 filesystem.

Rather than trying cp --reflink=always for each file copied to a remote,
it's tried once and if it fails it falls back to using rsync thereafter
for the lifetime of the Remote object. That avoids overhead of calling cp
which while small, will add up over a large number of files.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-07-17 14:19:08 -04:00
Joey Hess
0c6b7e288d
Add BLAKE2BP512 and BLAKE2BP512E backends
using a blake2 variant optimised for 4-way CPUs

This had been deferred because the Debian package of cryptonite, and
possibly other builds, was broken for blake2bp, but I've confirmed #892855
is fixed.

This commit was sponsored by Brett Eisenberg on Patreon.
2019-07-05 15:30:03 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of  git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
42c386fc47
add: Display progress meter when hashing files.
* add: Display progress meter when hashing files.
* add: Support --json-progress option.
2019-06-25 13:12:47 -04:00
Joey Hess
759fd9ea68
avoid url resume from 0
When downloading an url and the destination file exists but is empty,
avoid using http range to resume, since a range "bytes=0-" is an unusual
edge case that it's best to avoid relying on working.

This is known to fix a case where importfeed downloaded a partial feed from
such a server. Since importfeed uses withTmpFile, the destination always exists
empty, so it would particularly tickle such problem servers. Resuming from 0
is otherwise possible, but unlikely.
2019-06-20 12:26:17 -04:00
Joey Hess
fe49747fc8
add missing case
and fix name shadowing warning
2019-06-04 11:24:32 -04:00
Joey Hess
6136e299a2
add back support for following http to ftp redirects
Did not test build with http-client < 0.5 and while I tried to support
it, the ifdefed parts may needs some fixes.
2019-05-30 16:04:59 -04:00
Joey Hess
67c06f5121
add back support for ftp urls
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
2019-05-30 14:51:34 -04:00
Joey Hess
aa7710982b
avoid list lookup by parseToken
Minor optimisation to parsing of a preferred content expression.
2019-05-14 13:11:29 -04:00
Joey Hess
00e9e15c70
squelch build warning with old version of quickcheck 2019-05-03 11:02:12 -04:00
Joey Hess
2b52dbe905
fix build with older QuickCheck
The NonEmpty instance was moved out of QuickCheck and into a package
with more deps than I want to drag in, so I'm providing my own instance,
but with older QuickCheck, use theirs to avoid overlapping.
2019-03-22 10:07:16 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
c0bd202147
fix failing test case
An empty list of [ContentIdenfier] serialized to the same thing
as a single ContentIdentifier "". Avoid this ambiguity by requiring the
list be non-empty.
2019-03-06 14:27:15 -04:00
Joey Hess
1ec9e1494c
use relatedTempate in viaTmp 2019-03-04 14:12:00 -04:00
Joey Hess
4603713b4e
avoid using htonl
It got removed from network-3.0.0.0 and nothing in the haskell ecosystem
currently provides it (which seems it ought to be fixed).

Tested new code on both little-endian and big-endian with:

ghci> hostAddressToTuple $ fromJust $ embeddedIpv4 (0,0,0,0,0,0xffff,0x7f00,1)
(127,0,0,1)
2019-02-19 12:17:20 -04:00
Joey Hess
f5f059e288
relocate gpg test framework temp dir to outside repo
The gitAnnexTmpOtherDir cleanup made it be deleted too early sometimes,
and so the test suite failed. Also there was a report of a similar
failure which likely had a similar cause and hopwfully this fixes that
too.
2019-01-21 14:16:00 -04:00
Joey Hess
e38b654096
Estimated time to completion display shortened from eg "1h1m1s" to "1h1m"
Because seconds accuracy over such a time is unlikely to be accurate.
Also, it was possible to get a ridiculous "1y1d1h1m1s" if stalled or
very slow.
2019-01-21 00:04:35 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
0e44985210
remove duplicate import 2019-01-14 18:26:38 -04:00
Joey Hess
e0c4ac99b5
convert serializeKey' to strict ByteString
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer to common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
2019-01-14 17:03:46 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
fc21cccf1c
slight optimisation more 2019-01-11 19:56:31 -04:00
Joey Hess
16c798b5ef
switch MetaValue to ByteString and MetaField to Text
MetaField was already limited to alphanumerics, so it makes sense to use
Text for it.

Note that technically a UUID can contain invalid UTF-8, and so
remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8
values with '?' or whatever. In practice, a UUID is usually also text,
I only kept open the possibility of it containing invalid UTF-8 to avoid
breaking parsing of strange UUIDs in git-annex branch files. So, I
decided to let this edge case slip by.

Have not updated the rest of the code base yet for this change, as the
change took 2.5 hours longer than I expected to get working properly.
2019-01-07 14:18:24 -04:00
Joey Hess
a80922a594
support for ByteStrings 2019-01-07 12:29:25 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
f574d8af10
comment typo 2019-01-03 00:22:05 -04:00
Joey Hess
3ba6e9bb96
use attoparsec parser for String parsing, 10x speedup
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.

    benchmarking parse/old
    time                 9.657 μs   (9.600 μs .. 9.732 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 9.703 μs   (9.645 μs .. 9.785 μs)
    std dev              231.6 ns   (161.5 ns .. 323.7 ns)
    variance introduced by outliers: 25% (moderately inflated)

    benchmarking parse/new
    time                 834.6 ns   (797.1 ns .. 886.9 ns)
                         0.987 R²   (0.976 R² .. 0.999 R²)
    mean                 816.4 ns   (802.7 ns .. 845.1 ns)
    std dev              62.39 ns   (37.66 ns .. 108.4 ns)
    variance introduced by outliers: 82% (severely inflated)

There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
2019-01-02 13:28:44 -04:00
Joey Hess
3c74dcd4e1
attoparsec parser for POSIXTime
(Not yet used anywhere.)

Benchmarking

{-# LANGUAGE OverloadedStrings #-}

import Criterion.Main
import Utility.TimeStamp
import Data.Attoparsec.ByteString

main = defaultMain
	[ bgroup "parse"
		[ bench "new" $ whnf (parseOnly (parserPOSIXTime <* endOfInput)) "1431286201.113452s"
		, bench "old" $ whnf parsePOSIXTime "1431286201.113452s"
		]
	]

benchmarking parse/new
time                 643.6 ns   (640.2 ns .. 646.7 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 645.3 ns   (642.1 ns .. 650.9 ns)
std dev              14.59 ns   (9.194 ns .. 22.07 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking parse/old
time                 9.657 μs   (9.600 μs .. 9.732 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.703 μs   (9.645 μs .. 9.785 μs)
std dev              231.6 ns   (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)

So old took 9703 ns to parse, and new 643 ns.
2019-01-02 12:48:53 -04:00
Joey Hess
ba2c0663f9
comments 2019-01-01 22:48:14 -04:00
Joey Hess
ec1b9da72f
avoid abusing from/toRawFilePath for non-FilePaths 2019-01-01 22:44:04 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
1b44426805
avoid conflicting definitions of Template type
When both modules are imported and then re-exported.
2018-12-30 15:03:31 -04:00
Joey Hess
5480b3a9af
fix bogus ghc 8.6.3 build warning
ghc warned that the guards did not cover all values of h, but they
clearly do, and when rewritten as a case statement the warning goes
away.

Probably a ghc bug, but I kind of prefer the case statement over the
guards anyway.
2018-12-30 14:43:27 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
850d19d038
add dropFromEnd 2018-11-23 11:24:05 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
Joey Hess
ff9bd9620e
Fix resume of download of url when the whole file content is already actually downloaded
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
Joey Hess
051dfcb3be
Revert "fix comment"
This reverts commit bac7d34e71.

The comment was right; ARG_MAX is the total length of all arguments.
2018-11-06 17:26:20 -04:00
Joey Hess
bac7d34e71
fix comment 2018-11-06 11:42:31 -04:00
Joey Hess
5ad5d45d4c
make Arbitrary POSIXTime include decimal half the time 2018-10-31 16:27:55 -04:00
Joey Hess
2ca408dc33
Increase minimum QuickCheck version. 2018-10-31 15:53:22 -04:00
Joey Hess
f00b329e0c
remove unused import 2018-10-30 13:38:29 -04:00
Joey Hess
86df2a08fe
fix windows build 2018-10-30 11:09:45 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
48af284872
fix parse of negative posix time
Should never happen, but..
2018-10-29 23:40:34 -04:00
Joey Hess
a8ad577d1d
fix parsing of timestamp w/o trailing 's'
Luckily, this did not affect any git-annex log files, since they all
include the trailing 's' for backwards compatability reasons.

But, if I later want to drop that, this is the first commit where
git-annex can be trusted to parse that right.

The misparse caused it to be off by up to 10 seconds.
2018-10-29 23:36:47 -04:00
Joey Hess
3d1b22dc8e
factor out another function 2018-10-29 23:33:56 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
Joey Hess
94b7968f1f
forgot to remove this when dropping support for old ghc 2018-10-29 22:01:06 -04:00
Joey Hess
595fb98473
add small delay to avoid problems on systems with low-resolution mtime
I've seen intermittent failures of the test suite with v6 for a long time,
it seems to have possibly gotten worse with the changes around v7. Or just
being unlucky; all tests failed today.

Seen on amd64 and i386 builders, repeatedly but intermittently:

	unused: FAIL (4.86s)
	Test.hs:928:
	git diff did not show changes to unlocked file

And I think other such failures, all involving v7/v6 mode tests.

I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.

The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.

Indeed, notice the identical timestamps:

	builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.894942036 +0000
	Change: 2018-10-29 22:14:10.894942036 +0000
	 Birth: -
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.898942036 +0000
	Change: 2018-10-29 22:14:10.898942036 +0000
	 Birth: -

I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.

So, I hope that this small delay will avoid the problem.

Update: I didn't, indeed I then reproduced the same failure on my
laptop, so it must be due to something else. But keeping this change anyway
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
2018-10-29 19:31:26 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
45e09ea7f3
debug the full adjusted Request
So that the user-agent etc are included in the debug.
2018-10-04 13:45:27 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
502c5a4917
remove support for old http-client version
git-annex already bumped to a newer version for the http security fix.

This commit was sponsored by mo on Patreon.
2018-10-03 12:00:07 -04:00
Joey Hess
c88e8c8249
unify error display 2018-10-03 11:56:52 -04:00
Joey Hess
26a02cb386
display error when an invalid url is downloaded
download is documented as displaying an error when download fails, but
it didn't when the url was not valid at all. That leads to confusing
behavior.

Also, display the url with --debug
2018-09-25 13:38:20 -04:00
Joey Hess
cc82f81227
More FreeBSD build fixes.
Untested, on FreeBSD but enough to fix the listed build errors.

Seems that System.Posix.Files must have used to export this stuff and it
was split.

This commit was sponsored by Peter on Patreon.
2018-09-24 11:25:56 -04:00
Joey Hess
ceee7758a5
fix \ escaping 2018-09-22 11:33:08 -04:00
Joey Hess
d2c351f547
update windows NUL for ghc 8.6.1
This should also work with older ghc, since the path is a windows device
namespace path.
2018-09-22 11:31:55 -04:00
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
Yaroslav Halchenko
b976eb5353
BF(minor): missing space after "Unsupported url scheme" msg before the scheme 2018-09-18 18:19:20 -04:00
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
b18fb1e343
clean P2P protocol shutdown on EOF
Avoids "git-annex-shell: <stdin>: hGetChar: end of file"
being displayed by the test suite, due to the way it
runs git-annex-shell without using ssh.

git-annex-shell over ssh was not affected because git-annex hangs up the
ssh connection and so never sees the error message that git-annnex-shell
probably did emit.

This commit was sponsored by Ryan Newton on Patreon.
2018-09-13 10:46:37 -04:00
Joey Hess
872640549b
comment typo 2018-09-05 13:57:06 -04:00
Joey Hess
f4788f3853
clarify comment
haskell-mountpoints contains android specific code, but it's not used
when git-annex was built for linux and is running on android.
2018-09-05 11:22:27 -04:00
Joey Hess
55f8d90dee
remove Utlity.SRV, no longer used 2018-09-05 11:15:33 -04:00
Joey Hess
f54c72d2e1
Fix build on FreeBSD
This must have been broken for years..

This commit was sponsored by Jack Hill on Patreon.
2018-08-29 12:09:03 -04:00
Joey Hess
c565340adc
stop using external hash programs, since cryptonite is faster
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.

Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..

This commit was sponsored by Henrik Riomar on Patreon.
2018-08-28 18:10:58 -04:00
Joey Hess
6a445dc086
support conditionally excluding queued files
Switched code to use a for loop to avoid a filterM that would have
doubled the memory used.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:38:37 -04:00
Joey Hess
218c76b789
avoid unused imports warning on non-linux 2018-08-07 15:06:33 -04:00
Joey Hess
e1ab01f94d
Fix reversion in display of http 404 errors.
Switch to using http-client for large file downloads caused the reversion;
the code for displaying a 404 response was instead displaying the raw html
document, which is not useful.

This commit was sponsored by Ryan Newton on Patreon.
2018-07-31 12:15:26 -04:00
Joey Hess
50609da787
fix User-Agent reversion
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.

This commit was sponsored by mo on Patreon.
2018-07-16 11:56:47 -04:00
Joey Hess
ac228fa723
don't import all of System.Posix.Files
This avoid a build problem when different versions of posix and
posixcompat are used. Does not normally happen as cabal prevents that,
but this is sometimes used with ghc --make which can get into that
situation.
2018-07-10 12:04:49 -04:00
Joey Hess
3dd7f450c1
fix p2p --pair
p2p --pair: Fix interception of the magic-wormhole pairing code, which
since 0.8.2 it has sent to stderr rather than stdout.

This is highly annoying because I had asked the magic wormhole developers
for a machine-readable way to get the data, and instead they changed how
the data was output, and didn't even mention this in my issue, or in the
changelog.

Seems this needs to be tested periodically to make sure it's still working.

This commit was sponsored by Ethan Aubin.
2018-07-04 15:14:03 -04:00
Joey Hess
3976b89116
fix license date
I wrote this this year
2018-06-22 10:25:53 -04:00
Joey Hess
22f49f216e
get android building the security fix
Had to update http-client and network, with follow-on dep changes.

This commit was sponsored by Brock Spratlen on Patreon.
2018-06-21 10:23:04 -04:00
Joey Hess
923578ad78
improve error message
This commit was sponsored by Jack Hill on Patreon.
2018-06-19 14:21:41 -04:00
Joey Hess
47cd8001bc
call base ManagerSetting's exception wrapper
This commit was sponsored by Henrik Riomar on Patreon.
2018-06-19 14:17:05 -04:00
Joey Hess
fc79f68404
support building on debian stable
Specifically, http-client-0.4.31

This commit was supported by the NSF-funded DataLad project.
2018-06-19 11:25:10 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're no worse than http certianly. And, the backport of these
security fixes has to deal with wget, which supports http https and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
cc08135e65
prevent using local http proxies per annex.security.allowed-http-addresses
A local http proxy would bypass the security configuration. So,
the security configuration has to be applied when choosing whether to
use the proxy.

While http rebinding attacks against the dns lookup of the proxy IP
address seem very unlikely, this implementation does prevent them, since
it resolves the IP address once, checks it, and then reconfigures
http-client's proxy using the resolved address.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-06-18 13:32:20 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
43bf219a3c
added makeAddressMatcher
Would be nice to add CIDR notation to this, but this is the minimal
thing needed for the security fix.

This commit was sponsored by Ewen McNeill on Patreon.
2018-06-17 13:29:15 -04:00
Joey Hess
014a3fef34
added isPrivateAddress and isLoopbackAddress
For use in a security boundary enforcement.

Based on https://en.wikipedia.org/wiki/Reserved_IP_addresses

Including supporting IPv4 addresses embedded in IPv6 addresses. Because
while RFC6052 3.1 says "Address translators MUST NOT translate packets
in which an address is composed of the Well-Known Prefix and a non-
global IPv4 address; they MUST drop these packets", I don't want to
trust that implementations get that right when enforcing a security
boundary.

This commit was sponsored by John Pellman on Patreon.
2018-06-17 13:28:25 -04:00
Joey Hess
40e8358284
add Utility.HttpManagerRestricted
This is a clean way to add IP address restrictions to http-client, and
any library using it.
See https://github.com/snoyberg/http-client/issues/354#issuecomment-397830259

Some code from http-client and http-client-tls was copied in and
modified. Credited its author accordingly, and used the same MIT license.

The restrictions don't apply to http proxies. If using http proxies is a
problem, http-client already has a way to disable them.
SOCKS support is not included. As far as I can tell, http-client-tls
does not support SOCKS by default, and so git-annex never has.

The additional dependencies are free; git-annex already transitively
depended on them via http-conduit.

This commit was sponsored by Eric Drechsel on Patreon.
2018-06-16 18:44:13 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
caaedb2993
fix http-client gzip decompression bug
Prevent haskell http-client from decompressing gzip files, so downloads of
such files works the same as it used to with wget and curl.

Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-05-21 15:10:25 -04:00
Joey Hess
6a63920732
fix build 2018-05-21 11:00:23 -04:00
Joey Hess
5204e1dd9d
Workaround for bug in an old version of cryptonite that broke https downloads, by using curl for downloads when git-annex is built with it.
This commit was supported by the NSF-funded DataLad project.
2018-05-20 14:12:37 -04:00
Joey Hess
86958fda5d
fix build with old http-client 2018-05-10 00:22:23 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00
Joey Hess
7dc28dc705
Support building with hinotify-0.3.10.
Kept backwards compat with old versions via a shim.

This commit was sponsored by mo on Patreon.
2018-05-08 14:43:06 -04:00
Joey Hess
2948f6d916
avoid uname -o on !linux and catch any exception from it
Fix bug in last release that prevented the webapp opening on non-Linux systems.

This commit was sponsored by Jake Vosloo on Patreon.
2018-05-08 14:06:19 -04:00
Joey Hess
a8c91ce69a
add streamDirectoryContents 2018-04-26 13:38:36 -04:00
Joey Hess
f5df6244f3
deal with getMounts crashing on android 2018-04-25 17:42:27 -04:00
Joey Hess
096bb0aa21
improve comment 2018-04-25 16:57:57 -04:00
Joey Hess
6e862f47dd
deal with uname -o newline 2018-04-25 16:53:17 -04:00
Joey Hess
9807e5bead
fix webapp opening in termux
Open real url not html shim since android and file:// urls is a nasty
kettle of fish.

This commit was sponsored by John Pellman on Patreon.
2018-04-25 14:38:42 -04:00
Joey Hess
3c6e60dc69
fix build with old http-conduit 2018-04-24 21:23:40 -04:00
Joey Hess
526243d6f5
catch exceptions from getEffectiveUserID
This fixes a crash when using the linux_standalone build in termux on
android.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-04-24 20:10:10 -04:00
Joey Hess
aebf9e6dd5
Fix build with yesod 1.6.
Also avoid some depreaction warnings.
2018-04-22 13:56:35 -04:00
Joey Hess
256d8f07e8
avoid insertWith' depreaction warning
Switch to Data.Map.Strict everywhere that used it.

There are still lots of lazy maps in git-annex. I think switching these
is safe. The risk is that there might be a map that is used in a way
that relies on the values not being evaluated to WHNF, and switching to
strict might result in bad performance or memory use. So, I have not
switched everything.
2018-04-22 13:28:31 -04:00
Joey Hess
558a0a9328
deal with conduit 1.3 change
I don't know if this will build with older conduit, it may need an
ifdef.
2018-04-22 13:14:55 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
e5a404ebe2
fix build with old version of http-client 2018-04-09 13:04:23 -04:00
Joey Hess
c8f2d302dc
run curl when configured to do it at runtime, even if not available at build time 2018-04-06 21:17:36 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
36e6b8abbf
Fix resuming a download when using curl.
Noticed a bug; when using curl a workaround for its empty file behavior
overwrote the file content, so it never resumed and always started over.
2018-04-06 16:09:53 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
bebf541aa7
Fix calculation of estimated completion for progress meter.
Was estimating transfer of whole file, not remaining part of it.
2018-03-19 23:26:41 -04:00
Joey Hess
2c05bc9dfd
fix build with old base
Old base (used on android still) lacks tryReadMVar
2018-03-16 12:06:45 -04:00
Joey Hess
d2af6baaeb
fixed processTranscript hang problem
The pipe's FDs got inherited by ssh and it did something that kept them
open even once it exited. Probably involving passing them on to the ssh
mux daemon.

Set close on exec, and all is well.

Kept Annex.Ssh not using processTranscript even though it no longer
hangs when it does use it, just because processTranscript is overkill
there.

This commit was supported by the NSF-funded DataLad project.
2018-03-15 16:14:22 -04:00
Joey Hess
d6700721c0
simplify with async
This is much clearer to follow.

I've tested this, and it still has the problem described in
doc/bugs/occasional_hang_with_p2pstdio.mdwn

Which I think indicates that problem is not with my code, but something
else. ghc runtime? Something crazy ssh does in this situation? Unsure..
2018-03-15 15:34:25 -04:00
Joey Hess
7d83502329
add comments explaining puzzling code 2018-03-15 14:47:54 -04:00
Joey Hess
521d4ede1e
fix build with cryptonite-0.20
Some blake hash varieties were not yet available in that version.
Rather than tracking exact details of what cryptonite supported when,
disable blake unless using a current cryptonite.
2018-03-15 11:16:00 -04:00
Joey Hess
ba44ca80e6
Include amount of data transferred in progress display. 2018-03-14 13:39:14 -04:00
Joey Hess
050ada746f
Added backends for the BLAKE2 family of hashes.
There are a lot of different variants and sizes, I suppose we might as well
export all the common ones.

Bump dep to cryptonite to 0.16, earlier versions lacked BLAKE2 support.
Even android has 0.16 or newer.

On Debian, Blake2bp_512 is buggy, so I have omitted it for now.
http://bugs.debian.org/892855

This commit was sponsored by andrea rota.
2018-03-13 16:23:42 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
3dd43df9c2
Better ssh connection warmup when using -J for concurrency.
Avoids ugly messages when forced ssh command is not git-annex-shell.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-07 17:30:14 -04:00
Joey Hess
84e4874ae0
windows build fix 2018-01-05 15:09:10 -04:00
Joey Hess
e4f00e891d
fix windows build 2018-01-04 14:23:11 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
70344d25c0
type signature works for both old and new versions of ifdef 2017-12-11 12:49:23 -04:00
Joey Hess
c6e4bc0a22
fix regression in addurl --file caused by youtube-dl support
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.

This commit was supported by the NSF-funded DataLad project.
2017-12-06 13:22:31 -04:00
Joey Hess
ed701667aa
fix gpg subkey support typo
initremote, enableremote: Really support gpg subkeys suffixed with an
exclamation mark, which forces gpg to use a specific subkey. (Previous try
had a bug.)

This commit was sponsored by Jake Vosloo on Patreon.
2017-12-05 13:58:53 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
3febb79c8f
wip 2017-11-28 17:17:40 -04:00
Joey Hess
57b4c5bdff
add Utility.HtmlDetect
This will be used in youtube-dl integration, to tell when a html page has
been downloaded by addurl, in which case it is worth running youtube-dl
to see if it can extract media from it.

tagsoup is an almost free dependency, because yesod depends on it.
So, this only really adds a dep when git-annex is built without the
webapp.

I'd like this to as closely as possible match how browsers decide if a
page is html or not. Unfortunately, that is fairly heuristic, in order
to support malformed html. And, we don't want to falsely detect
something as html just because it has something that looks like a html
tag embedded somewhere in it. Probably any major video hosting site is
going to be serving html documents that at least start with a <html>
tag, so requiring that or a DOCTYPE should be good enough.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-11-28 13:03:11 -04:00
Joey Hess
ed9d5da2d5
Fix build with dns-3.0.
This commit was sponsored by Henrik Riomar on Patreon.
2017-11-24 10:49:31 -04:00
Joey Hess
1b6cbb63e9
still can't express custom-setup deps
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.

To avoid this problem, Utility.Env would need to be moved into
unix-compat..
2017-11-14 14:59:51 -04:00
Joey Hess
8d68112be5
split out setEnv to avoid adding dep
Windows needs the setenv package in custom-setup, but I don't want to
pull it in on unix, which would probably break some builds and need more
work. Instead, split out setEnv to a separate module.

Quite likely, unix-compat will get a portable environment layer, and
then both modules can be removed from here.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-14 14:28:49 -04:00
Joey Hess
07c4be500d
clean up build warnings on Windows 2017-11-14 14:14:10 -04:00
Joey Hess
8dd84b87f9
use unix-compat 0.5 on windows
Re-applying 3ec579f5e1
2017-11-14 14:00:24 -04:00
Joey Hess
1bd956bed4
Revert "Revert "remove dep on Win32-extras""
This reverts commit d18bc52caf.
2017-11-13 12:55:23 -04:00
Joey Hess
5f55082d10
Revert "use unix-compat 0.5 on windows"
This reverts commit 3ec579f5e1.

Too early for this; needs newer Win32 version. Le sigh.
2017-11-09 15:14:00 -04:00
Joey Hess
d18bc52caf
Revert "remove dep on Win32-extras"
This reverts commit 8b5480c66a.

Yeah, too early for that too
2017-11-09 15:09:14 -04:00
Joey Hess
8b5480c66a
remove dep on Win32-extras
Win32 now has its own getCurrentProcessId.
2017-11-09 14:03:06 -04:00
Joey Hess
3ec579f5e1
use unix-compat 0.5 on windows
That version has my patches for the problems that Utility.PosixFiles
was working around, so am able to get rid of that module now.

This will later allow bringing back the custom-setup stanza in the cabal
file. It will need to depend on unix-compat 0.5 on all OS's, which I'm
not ready to do yet.

This commit was sponsored by Nick Daly on Patreon.
2017-11-09 12:47:05 -04:00
Joey Hess
c3449ff91d
finish fix for gitAnnexLink on windows
dropDrive needed since if splitPath splits out the drives, they would
appear different.
2017-10-26 12:01:16 -04:00
Joey Hess
584dbfb892
terminateProcessId renamed
win32 upstream suggested a better name
2017-10-25 19:46:28 -04:00
Joey Hess
0ae2ac282e
fix gitAnnexLink to not be absolute on Windows
Windows: Fix reversion that caused the path used to link to annexed
content include the drive letter and full path, rather than being
relative. (`git annex fix` will fix up after this problem).

I've not identified the commit that brought the reversion (probably it
happened this spring when I was removing MisingH and last touched
Utility.Path). Likely commit 18b9a4b8024115db67ae309fdaf54e1553037529?

The problem is that relPathDirToFile got called two paths that had the
slashes different ways around. Since takeDrive includes the first slash,
this made two paths on the same drive seem different and it bailed.

(ifdefs around this to avoid doing extra work on non-windows)

This commit was sponsored by Jack Hill on Patreon.
2017-10-25 19:36:29 -04:00
Joey Hess
833b3f06cd
build for windows with forked win32 package that has terminateProcessId
Get ugly reversion out of CHANGELOG.

Also, relocated the windows stack.yaml to top, and updated windows build
instructions.

This commit was sponsored by Henrik Riomar on Patreon.
2017-10-25 14:45:23 -04:00
Joey Hess
78b5e759a5
fix 2017-10-24 13:20:26 -04:00
Joey Hess
3e839ab327
temporary hack to get windows build working
Code for terminating processes on Windows is not linking anymore;
made a warning be displayed instead. This breaks restarting the
assistant and git annex assistant --stop.

I hope to see the code added to the Win32 library, where it should fit
better and should avoid whatever problem is making the linker not like it
when included in git-annex. I opened an issue requesting its addition,
here: https://github.com/haskell/win32/issues/91

This commit was sponsored by Thomas Hochstein on Patreon.
2017-10-24 13:16:40 -04:00
Joey Hess
901807cf75
Revert "try to avoid TerminateProcess link error on windows"
This reverts commit 839ec7e26c.

Neither way is working.. The other way failed:

.stack-work\dist\5f9bc736\build\git-annex\git-annex-tmp\Assistant.o:fake:(.text+0x6bb3): undefined reference to `terminatepid'

Seems that winprocess.c is not getting linked in.
2017-10-24 13:05:24 -04:00
Joey Hess
24ba7c4296
add winprocess.h 2017-10-24 12:48:04 -04:00
Joey Hess
839ec7e26c
try to avoid TerminateProcess link error on windows
Building with stack, it failed:

`_TerminateProcess' referenced in section `.text' of .stack-work\dist\5f9bc736\build\git-annex\git-annex-tmp\Utility\WinProcess.o: defined in discarded section `.text' of C:/Users/jenkins/AppData/Local/Programs/stack/i386-windows/ghc-8.0.2/mingw/bin/../lib/gcc/i686-w64-mingw32/5.2.0/../../../../i686-w64-mingw32/lib/../lib/libkernel32.a(dacgs01154.o)

This is a reversion of 86e638567a,
to try the other way to implement it, which will hopefully avoid the problem.
2017-10-24 12:33:44 -04:00
Joey Hess
92c7e67022
temporarily import from win32-extras 2017-10-24 12:06:41 -04:00
Joey Hess
93d5951f11
remove redundant pattern match 2017-09-24 16:17:58 -04:00
Joey Hess
01068d8280
fix build with old http-client 2017-09-13 15:35:42 -04:00
Joey Hess
2ca1d3cc01
deal with box.com horrible infinite redirect behavior
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.

It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.

Deal with such problems by assuming such behavior means the file is not
present.

Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:13:42 -04:00
Joey Hess
bb08b1abd2
make storeExport atomic
This avoids needing to deal with the complexity of partially transferred
files in the export. We'd not be able to resume uploading to such a file
anyway, so just avoid them.

The implementation in Remote.Directory is not completely ideal, because
it could leave the temp file hanging around in the export directory.
This only happens if it's killed with -9, or there's a power failure;
normally viaTmp cleans up after itself, even when interrupted. I could
not see a better way to do it though, since the export directory might
be the root of a filesystem.

Also some design thoughts on resuming, which depend on storeExport being
atomic.

This commit was sponsored by Fernando Jimenez on Partreon.
2017-08-31 14:24:32 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions 2017-08-17 11:00:48 -04:00
Joey Hess
266bf43632
make import work with Win32 instead of Win32-extras 2017-08-16 17:51:29 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request. 2017-08-15 13:56:12 -04:00
Joey Hess
8526cd7c92
test: Avoid most situations involving failure to delete test directories
By forking a worker process and only deleting the test directory once it exits.

This way, if a test leaves files open, they'll get closed when the worker
exits, so avoiding failure to delete open files on Windows, and failure to
delete directories due to NFS lock files.

If a test leaves a git worker process running, the closed pipes should
cause the worker to exit too, also avoiding the problem there. The 10
second sleep ought to give plenty of time for such worker processes to
exit, although this is of course a race.

Finally, even if test directory fails to be deleted still,
it won't appear as if the last test in the test suite failed; the error
will be displayed at the very end.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 16:29:47 -04:00
Joey Hess
da8e84efe9
fix failing quickcheck properties
QuickCheck 2.10 found a counterexample eg "\929184" broke the property.

As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.

So, making Git.FileName's roundtrip test only chars < 256 seems fine.

Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.

Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.

Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.

This commit was sponsored by Henrik Riomar on Patreon.
2017-06-17 16:48:00 -04:00
Joey Hess
75cecbbe3f
Fix build with QuickCheck 2.10.
QuickCheck added an Arbitrary instance for CTime aka EpochTime. However,
while git-annex's instance disallowed times before the epoch, QuickCheck's
does not. So, rather than using its instance, convert from an Integer.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-06-17 13:04:48 -04:00
Joey Hess
1426f7ff3a
disable closingTracked on OSX
Don't trust OSX FSEvents's eventFlagItemModified to be called when the last
writer of a file closes it; apparently that sometimes does not happen,
which prevented files from being quickly added.

This commit was sponsored by John Peloquin on Patreon.
2017-06-09 14:18:58 -04:00
Joey Hess
19a6227e6e
remove temp file in failure case 2017-06-06 14:23:33 -04:00
Joey Hess
ed639c140d
Fix bug that prevented transfer locks from working when run on SMB or other filesystem that does not support fcntl locks and hard links.
This commit was sponsored by Ethan Aubin.
2017-06-06 14:22:03 -04:00
Joey Hess
7db37ddde0
Fix transfer log file locking problem when running concurrent transfers.
orElse is great, but was not the right thing to use here because
waitTakeLock could retry for other reasons than the lock being held,
which made tryTakeLock fail when it shouldn't.

Instead, move the code to tryTakeLock and implement waitTakeLock using
tryTakeLock and retry.

(Also, in runTransfer, when checkSaneLock fails, dropLock to avoid leaking a
lock handle.)

This commit was supported by the NSF-funded DataLad project.
2017-05-25 17:40:23 -04:00
Joey Hess
9bddc6d5ca
Improve progress display when watching file size, in cases where a transfer does not resume.
This commit was supported by the NSF-funded DataLad project.
2017-05-25 14:30:18 -04:00
Joey Hess
77ba430b38
tighten forced subkey matching
Someone might have a name or email address ending in a bang..
2017-05-24 14:54:54 -04:00
Joey Hess
35465b6062
initremote, enableremote: Support gpg subkeys suffixed with an exclamation mark, which forces gpg to use a specific subkey.
This commit was sponsored by Peter Hogg on Patreon.
2017-05-24 14:08:02 -04:00
Joey Hess
7ec72e3874
optimisation
Avoids N^2 list traversal.
2017-05-16 11:33:53 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
6dd806f1ad
stop using MissingH for MD5
Cryptonite is faster and allocates less, and I want to get rid of
MissingH use.

Note that the new dependency on memory is free; it's a dependency of
cryptonite.

This commit was supported by the NSF-funded DataLad project.
2017-05-15 21:36:03 -04:00
Joey Hess
18b9a4b802
remove absNormPathUnix again
Moving toward dropping MissingH dep.

I think I've addressed the problem identified earlier in
09a66f702d. On Windows,
absPathFrom "/tmp/repo/xxx" "y/bar" would be "/tmp/repo/xxx\\y/bar",
which then confuses relPathDirToFile. Fixed by converting to unix (git)
style paths.

Also, relPathDirToFile was splitting only on \\ on windows and not /
which broke the example in 09a66f702d of
relPathDirToFile (absPathFrom "/tmp/repo/xxx" "y/bar") "/tmp/repo/.git/annex/objects/xxx"

Now, on windows, that will yield "..\\..\\..\\.git/annex/objects/xxx"
which once converted to unix style paths is what we want.
2017-05-15 21:35:35 -04:00
Joey Hess
c3970f6c1a
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting.
This commit was supported by the NSF-funded DataLad project.
2017-03-30 19:35:30 -04:00
Joey Hess
d1ecdd04b2
Windows: Fix bug in shell script shebang lookup code that caused a "delayed read on closed handle" error.
The bug was that withFile closes the handle afterwards, but the content
of the file was not read due to laziness. Using readFile avoids it.

This commit was sponsored by Nick Daly on Patreon.
2017-03-13 16:20:52 -04:00
Joey Hess
1c4e5f65fc
Drop support for building with old versions of directory, feed, and http-types. 2017-03-10 15:57:41 -04:00
Joey Hess
ca49a84ba5
Drop support for building with old versions of dns and http-conduit. 2017-03-10 15:49:14 -04:00
Joey Hess
2ffd74c684
relicense Utility/GPG.hs BSD as the rest of Utility is
The COPYRIGHT had Utility/DirWatcher* listed as GPL, but they were
actually BSD licensed.

No idea why I put the GPL on Utility/GPG.hs file originally.
I wrote all of it, except for guilhem's small changes to it in
00fc21bfec, which seem too small to be
independently copyrightable. I'm relicencing it BSD.
2017-03-10 15:08:21 -04:00
Joey Hess
5358fb992a
Windows: Improve handling of shebang in external special remote program, searching for the program in the PATH.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.

This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.

Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.

This commit was sponsored by Shane-o on Patreon.
2017-03-08 15:59:00 -04:00
Joey Hess
40327cab6e
Removed support for building with the old cryptohash library.
Building with that library made git-annex not support SHA3; it's time for
that to always be supported in case SHA2 dominoes.
2017-02-24 20:56:26 -04:00
Joey Hess
7a0d6d81a0
make curl show http errors to stderr
* Run curl with -S, so HTTP errors are displayed, even when
  it's otherwise silent.
* When downloading in --json or --quiet mode, use curl in preference
  to wget, since curl is able to display only errors to stderr, unlike
  wget.

This does mean that downloadQuiet is only silent on stdout, not necessarily
on stderr, which affects a couple other calls of it. For example,
downloading the .git/config of a http remote may show an error message now,
perhaps with slightly suboptimal formatting due to other output.
2017-02-20 16:09:32 -04:00
Joey Hess
8dd3635acf
improve layout 2017-02-20 15:44:14 -04:00
Joey Hess
4a397b5313
Run wget with -nv instead of -q, so it will display HTTP errors.
This adds one extra line of output when a download is successful,
after the progress bar. I don't much like that, but wget does not provide a
way to show HTTP errors without it.
2017-02-20 15:25:02 -04:00
Joey Hess
113b10cdc9
simpler more generic processTranscript'
This allows using functions that generate CreateProcess and passing the
result to processTranscript', which is more flexible, and also simpler
than the old interface.

This commit was sponsored by Riku Voipio.
2017-02-15 16:02:10 -04:00
Joey Hess
3b22ad9f47
Work around sqlite's incorrect handling of umask when creating databases.
Refactored some common code into initDb.

This only deals with the problem when creating new databases. If a repo
got bad permissions into it, it's up to the user to deal with it.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-02-13 17:39:16 -04:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
5e6ced7d0f
Improve pid locking code to work on filesystems that don't support hard links.
Probing for hard link support in the pid locking code is redundant since
git-annex init already probes that. But, it didn't seem worth threading
that data through; the pid locking code runs at most once per git-annex
process, and only on unusual filesystems. Optimising a single hard link
and unlink isn't worth it.

This commit was sponsored by Francois Marier on Patreon.
2017-02-10 15:22:28 -04:00
Joey Hess
3fe9d99f24
wormhole pairing appid flag day 2021-12-31
Wormhole pairing will start to provide an appid to wormhole on 2021-12-31.
An appid can't be provided now because Debian stable is going to ship a
older version of git-annex that does not provide an appid. Assumption is
that by 2021-12-31, this version of git-annex will be shipped in a Debian
stable release. If that turns out to not be the case, this change will need
to be cherry-picked into the git-annex in Debian stable, or its wormhole
pairing will break.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-02-03 15:06:40 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
23d71423e1
work around ghc segfault
hSetEncoding of a closed handle segfaults.
https://ghc.haskell.org/trac/ghc/ticket/7161

8484c0c197 introduced the crash.
In particular, stdin may get closed (by eg, getContents) and then trying
to set its encoding will crash. We didn't need to adjust stdin's
encoding anyway, but only stderr, to work around
https://github.com/yesodweb/persistent/issues/474

Thanks to Mesar Hameed for assistance related to reproducing this bug.
2016-12-30 18:14:19 -04:00
Joey Hess
1c744b9512
more windows build fix 2016-12-30 16:39:51 -04:00
Joey Hess
dfbd303d66
fix windows build 2016-12-30 11:37:24 -04:00
Joey Hess
cf6c5d5ca9
fix windows build 2016-12-30 11:37:06 -04:00
Joey Hess
3ec1478767
fix build with old ghc 2016-12-30 11:10:20 -04:00
Joey Hess
d785a565ef
make this build under windows 2016-12-30 11:04:00 -04:00
Joey Hess
e92f2d1080
improve description of password prompting
Since the user does not know whether it will run su or sudo, indicate
whether the password prompt will be for root or the user's password,
when possible.

I assume that programs like gksu that can prompt for either depending on
system setup will make clear in their prompt what they're asking for.
2016-12-28 16:07:49 -04:00
Joey Hess
10e4d93212
Support all common locations of the torrc file. 2016-12-28 15:12:31 -04:00
Joey Hess
924fdea53f
fix windows build 2016-12-28 15:00:44 -04:00
Joey Hess
93f7d114db
Merge branch 'no-xmpp' 2016-12-28 12:26:16 -04:00
Joey Hess
9dabe85bb5
whitespace 2016-12-28 00:17:36 -04:00
Joey Hess
41d956e0a0
avoid leaving MVar empty
Something might want to observe the code multiple times.
2016-12-27 16:26:26 -04:00
Joey Hess
9e0aae036b
webapp: check that tor and magic wormhole are installed 2016-12-24 17:08:03 -04:00
Joey Hess
25881f3413
cleanup 2016-12-24 15:14:46 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
c89a9e6ca5
really fix su command 2016-12-24 13:23:52 -04:00
Joey Hess
0e04b22491
fix su params 2016-12-24 13:08:07 -04:00
Joey Hess
405fbd25e1
include tor-annex in hidden service directory names
To make it easier to manage/delete them etc.

Backwards compatablity is preserved for existing tor configs.
2016-12-21 14:39:32 -04:00
Joey Hess
f48b9775d8
cleanup 2016-12-20 17:46:30 -04:00
Joey Hess
f7ca2b92fb
enable-tor: No longer needs to be run as root.
When run by not root, su's to root automatically.

This commit was sponsored by Brock Spratlen on Patreon.
2016-12-20 17:40:36 -04:00
Joey Hess
944a6503b9
relocate tor socket out of /etc
weasel explained that apparmor limits on what files tor can read do not
apply to sockets (because they're not files). And apparently the
problems I was seeing with hidden services not being accessible had to
do with onion address propigation and not the location of the socket
file.

remotedaemon looks up the HiddenServicePort in torrc, so if it was
previously configured with the socket in /etc, that will still work.

This commit was sponsored by Denis Dzyubenko on Patreon.
2016-12-20 16:24:46 -04:00
Joey Hess
e312ec3750
Fix build with directory-1.3.
See https://github.com/haskell/directory/issues/66
2016-12-20 15:23:59 -04:00
Joey Hess
249ddb5953
typo 2016-12-18 17:16:53 -04:00
Joey Hess
7f2e7fa271
check if wormhole is installed 2016-12-18 17:11:13 -04:00
Joey Hess
ccde0932a5
p2p --pair with magic wormhole (untested)
It builds. I have not tried to run it yet. :)

This commit was sponsored by Jake Vosloo on Patreon.
2016-12-18 16:51:41 -04:00
Joey Hess
b2b6296f9d
make sure False is returned on error 2016-12-17 18:31:19 -04:00
Joey Hess
def2019602
improve types 2016-12-17 18:29:59 -04:00
Joey Hess
7cddfca799
document a minor problem 2016-12-17 17:36:55 -04:00
Joey Hess
399d0f1929
use PYTHONUNBUFFERED to force python to use sane stdout buffering
Works around https://github.com/warner/magic-wormhole/issues/108

See http://stackoverflow.com/questions/107705/disable-output-buffering
for the gory details. Why a scripting language would chose a default
stdout buffering that differs between terminal and piped output, and
tends to introduce this kind of bug, I don't know.
2016-12-17 17:28:08 -04:00
Joey Hess
fe6f36d9f3
magic wormhole module
This interacts with it using stdio, which is surprisingly hard.

sendFile does not currently work, due to
https://github.com/warner/magic-wormhole/issues/108

Parsing the output to find the magic code is done as robustly as
possible, and should continue to work unless wormhole radically changes
the format of its codes. Presumably it will never output something that
looks like a wormhole code before the actual wormhole code; that would
also break this. It would be better if there was a way to make
wormhole not mix the code with other output, as requested in
https://github.com/warner/magic-wormhole/issues/104

Only exchange of files/directories is supported. To exchange messages,
https://github.com/warner/magic-wormhole/issues/99 would need to be resolved.
I don't need message exchange however.
2016-12-17 16:58:05 -04:00
Joey Hess
59fead6da3
Pass annex.web-options to wget and curl after other options, so that eg --no-show-progress can be set by the user to disable the default --show-progress. 2016-12-13 11:56:23 -04:00
Alper Nebi Yasak
93a22a1c97
Remove http-conduit (<2.2.0) constraint
Since https://github.com/aristidb/aws/issues/206 is resolved, this
constraint is no longer necessary. However, http-conduit (>=2.2.0)
requires http-client (>=0.5.0) which introduces some breaking changes.
This commit also implements those changes depending on the version.
Fixes: https://git-annex.branchable.com/bugs/Build_with_aws_head_fails/

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
2016-12-10 10:45:52 -04:00
Joey Hess
15be5c04a6
git-annex-shell, remotedaemon, git remote: Fix some memory DOS attacks.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.

This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.

The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.

(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
2016-12-09 13:34:32 -04:00
Joey Hess
2ad06ded7e
force sofar calculation
This could avoid a memory leak. It would only happen when
the meter didn't look at sofar.
2016-12-08 16:28:07 -04:00
Joey Hess
ad5ef51040
more p2p progress meters
Display progress meter on send and receive from remote.

Added a new hGetMetered that can read an exact number of bytes (or
less), updating a meter as it goes.

This commit was sponsored by Andreas on Patreon.
2016-12-07 14:25:01 -04:00
Joey Hess
83ea1cec86
update progress meter when sending to p2p remote
This commit was sponsored by Thom May on Patreon.
2016-12-07 13:37:35 -04:00
Joey Hess
53bf5cf8e6
cleanup 2016-11-29 17:52:46 -04:00
Joey Hess
38425fdc39
finish git-annex enable-tor
Make it stash the address away for git-annex p2p to use later, rather
than outputting it. And, look up the UUID itself.
2016-11-29 17:30:27 -04:00
Joey Hess
53b6d9057c
move tor hidden service socket to /etc, temporarily violating the FHS
On Debian, apparmor prevents tor from reading from most locations. And,
it silently fails if it is prevented from reading the hidden service
socket. I filed #846275 about this; violating the FHS is the least bad of a
bad set of choices until that bug is fixed.
2016-11-29 15:29:47 -04:00
Joey Hess
af4d919793
unified AuthToken type between webapp and tor 2016-11-22 14:18:34 -04:00
Joey Hess
6b992f672c
pull/push over tor working now
Still a couple bugs:

* Closing the connection to the server leaves git upload-pack /
  receive-pack running, which could be used to DOS.

* Sometimes the data is transferred, but it fails at the end, sometimes
  with:

  git-remote-tor-annex: <socket: 10>: commitBuffer: resource vanished (Broken pipe)

  Must be a race condition around shutdown.
2016-11-21 19:24:55 -04:00
Joey Hess
070fb9e624
Added git-remote-tor-annex, which allows git pull and push to the tor hidden service.
Almost working, but there's a bug in the relaying.

Also, made tor hidden service setup pick a random port, to make it harder
to port scan.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2016-11-21 17:27:38 -04:00
Joey Hess
74691ddf0e
remotedaemon: serve tor hidden service 2016-11-20 15:48:12 -04:00
Joey Hess
0eaad7ca3a
extend p2p protocol to support gitremote-helpers connect
A bit tricky since Proto doesn't support threads. Rather than adding
threading support to it, ended up using a callback that waits for both
data on a Handle, and incoming messages at the same time.

This commit was sponsored by Denis Dzyubenko on Patreon.
2016-11-19 22:39:36 -04:00
Joey Hess
65e903397c
implementation of peer-to-peer protocol
For use with tor hidden services, and perhaps other transports later.

Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.

Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.

This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2016-11-17 18:30:50 -04:00
Joey Hess
95916b2ecf
Merge branch 'master' into tor 2016-11-17 12:56:27 -04:00
Joey Hess
2493c2c5a4
allow Utility.Exception to still be used when not building with cabal 2016-11-15 22:01:55 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
57d33f7923
use socket for tor hidden service
This avoids needing to bind to the right port before something else
does.

The socket is in /var/run/user/$uid/ which ought to be writable by only
that uid. At least it is on linux systems using systemd.

For Windows, may need to revisit this and use ports or something.

The first version of tor to support sockets for hidden services
was 0.2.6.3. That is not in Debian stable, but is available in
backports.

This commit was sponsored by andrea rota.
2016-11-14 16:47:56 -04:00
Joey Hess
07ad19f421
git-annex enable-tor command
Tor unfortunately does not come out of the box configured to let hidden
services register themselves on the fly via the ControlPort.

And, changing the config to enable the ControlPort and a particular type
of auth for it may break something already using the ControlPort, or
lessen the security of the system.

So, this leaves only one option to us: Add a hidden service to the
torrc. git-annex enable-tor does so, and picks an unused high port for
tor to listen on for connections to the hidden service.

It's up to the caller to somehow pick a local port to listen on
that won't be used by something else. That may be difficult to do..

This commit was sponsored by Jochen Bartl on Patreon.
2016-11-14 13:48:35 -04:00
Joey Hess
4643470537
webapp: Explicitly avoid checking for auth in static subsite requests.
Yesod didn't used to do auth checks for that, but this may have changed.
I don't have a way to reproduce the reported problem yet, but this change
certianly won't hurt anything.

This commit was sponsored by Thom May on Patreon.
2016-11-10 13:48:54 -04:00
Joey Hess
e23028d19b
restart coprocess in raw mode
Restarting a crashing git process could result in filename encoding issues
when not in a unicode locale, as the restarted processes's handles were not
read in raw mode.

Since rawMode is always used when starting a coprocess, didn't bother
to parameterise it and just always enable it for simplicity.

This commit was sponsored by Jake Vosloo on Patreon.
2016-11-01 14:03:59 -04:00
Joey Hess
6e4fee1faf
test: Deal with gpg-agent behavior change that broke the test suite.
gpg-agent started deleting its socket file on shutdown, and this tickled an
ugly behavior in removeDirectoryRecursive,
https://github.com/haskell/directory/issues/60

Running removeDirectoryRecursive again on exception avoids the problem.
2016-10-18 16:56:38 -04:00
Joey Hess
fd03ff2b81
use System.Directory not Utility.Directory
This module does not use isSymbolicLink so avoid depending on extra
Utility.* stuff, to make it more easily reused elsewhere.
2016-09-22 11:34:55 -04:00
Joey Hess
4f7a50f27d
avoid needing PartialPrelude 2016-09-22 11:29:53 -04:00