Commit graph

1666 commits

Author SHA1 Message Date
Joey Hess
c59a51a065
discard any exception thrown while trying to kill worker threads
Since there's a race here, and since Kyle saw an exception leak out,
which I have not been able to reproduce that. See my comment for what
I think might be going on.

Note that, I used tryNonAsync, because it seems a later tryNonAsync
caught the exception. I don't actually understand how it did, as I
understand exception classification, it's the data type, not the way it
was thrown. One possibility is that the async exception may have been wrapped
in some other, non-async exception, and Show displayed it the same way.
2020-08-10 16:24:51 -04:00
Joey Hess
a6af887a19
couple more exports needed 2020-08-10 14:18:00 -04:00
Joey Hess
6661eeda96
one more export needed 2020-08-10 13:46:20 -04:00
Joey Hess
3fcf478c19
fix build with old versions of process 2020-08-10 13:21:40 -04:00
Joey Hess
9994c5882c
further change to support dlist-1.0
Its tail no longer yields a DList.
2020-08-05 10:37:14 -04:00
Joey Hess
c4ec52b9ae
Slightly sped up the linux standalone bundle
Reduce the number of directories listed in libdirs, which makes the linker
check a lot less dead ends looking for directories.

Eliminated some directories that didn't really contain shared libraries,
or only contained the linker.

That left only 2, one in lib and one in usr/lib, so consolidate those two.

Doing it this way, rather than just consolidating all libs that might exist
into a single directory means that, if there are optimised versions of some
libs, eg in lib/subarch/foo.so, and lib/subarch2/foo.so, they don't get
moved around in a way that would make the linker pick the wrong one.
2020-07-31 14:42:03 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
aa492bc659
Fix a hang when using git-annex with an old openssh 7.2p2
This does mean a 2 second delay after transfers when using that ssh, but
it's an old and apparently quite weirdly broken version of ssh.
2020-07-22 11:04:33 -04:00
Joey Hess
ac56a5c2a0
Fix a lock file descriptor leak that could occur when running commands like git-annex add with -J
Bug was introduced as part of a different FD leak fix in version 6.20160318.
2020-07-21 15:30:47 -04:00
Joey Hess
798fdad660
fix build with dlist-1.0
That removed the list function. This new implementation appears to
actually be more efficient anyway, since it avoids toList.
2020-07-21 12:58:51 -04:00
Joey Hess
2234a1d64a
document 2020-07-19 21:31:06 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
f63a7aa0e7
fix headTList to drop the head item 2020-07-10 13:02:32 -04:00
Joey Hess
de3d7d044d
make catObjectStream support newline and carriage return in filenames
Turns out the %(rest) trick was not needed. Instead, just maintain a
list of files we've asked for, and each cat-file response is for the
next file in the list.

This actually benchmarks 25% faster than before! Very surprising, but it
must be due to needing to shove less data through the pipe, and parse
less.
2020-07-08 13:49:03 -04:00
Joey Hess
d66fc1a464
Revert "async exception safety for coprocesses"
This reverts commit 7013798df5.
2020-07-06 15:11:28 -04:00
Joey Hess
38c0057cc6
fix build on windows 2020-07-01 16:53:50 -04:00
Joey Hess
83df96d1f5
fix build on windows 2020-06-30 11:01:28 -04:00
Joey Hess
104b3a9c6a
Build with the http-client-restricted library when available
Otherwise use the vendored copy as before.

The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.

Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-http-client-restricted-dev

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:31:31 -04:00
Joey Hess
01eb863a14
Build with the git-lfs library when available
Otherwise use the vendored copy as before.

The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.

Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-git-lfs-dev.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:21:25 -04:00
Joey Hess
aa1ad0b7ca
remove redundant imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:05:34 -04:00
Joey Hess
6ef62cb3c7
fix unused import warning
Network.HTTP.Client exports makeConnection since 0.5.3.

Debian stable has a newer version than 0.5.3, so bumping the
min version seems better than adding an ifdef.
2020-06-22 10:55:37 -04:00
Joey Hess
82448bdf39
fix a annex.pidlock issue
That made eg git-annex get of an unlocked file hang until the
annex.pidlocktimeout and then fail.

This fix should be fully thread safe no matter what else git-annex is
doing.

Only using runsGitAnnexChildProcess in the one place it's known to be a
problem. Could audit for all places where git-annex runs itself as a child
and add it to all of them, later.
2020-06-17 15:30:59 -04:00
Joey Hess
9fb549b3f1
fix strictness issue
Recent changes to Utility.Gpg exposed a strictness bug in how Creds
uses it.
2020-06-16 17:09:34 -04:00
Joey Hess
c8ff3e082e
add debug logging wrapper for withCreateProcess 2020-06-11 16:43:24 -04:00
Joey Hess
24ff5e2b29
use uninterruptibleMask
Some recent changes to use mask missed that async exceptions can still
be thrown inside it. The goal is to make sure a block of cleanup code
runs entirely, w/o being interrupted by an async exception, so use
uninterruptibleMask.

Also, converted a few to bracket, which is nicer.
2020-06-09 15:02:56 -04:00
Joey Hess
7013798df5
async exception safety for coprocesses
Tested the forcerestart code path and it works.

The hairy part is, what if an async exception is caught when it's in
restart?

If it's in the part that stops the old process, the old process
is left in the handle. The next attempt to use the CoProcessHandle
will then throw an IO exception, which will result in restart getting
run again. So I think this will work, but have not actually tested it.

The use of withMVarMasked lets it start the new process and fill the
mvar with it, even if there's an async exception at that point.

Note that exceptions are masked while running forcerestart, so
do not need to worry about an async exception being thrown while it's
recovering from an async exception.
2020-06-09 13:44:23 -04:00
Joey Hess
0210e81d83
async exception safety for openFd
Audited for openFile and openFd, and this fixes all the ones I found
where an async exception could prevent the file getting closed.

Except for the lock pool, which is a whole other can of worms.
2020-06-05 15:48:00 -04:00
Joey Hess
660d8d3a87
simpler way to do this
Remove old code that can be trivially implemented using async in a much
nicer way (that is async exception safe).

I've audited all forkOS calls (except for ones in the assistant),
and this was the last remaining one that is not async exception safe.
The rest look ok to me.
2020-06-05 14:18:06 -04:00
Joey Hess
074260f036
async exception safety
Use async/cancel so helper threads are not left running.

Bracket createPipe to ensure the handles get closed.
2020-06-05 14:08:46 -04:00
Joey Hess
a0d09f1d9e
bracket createPipe for async exception safety 2020-06-05 13:58:04 -04:00
Joey Hess
b7619414bf
support building with process-1.6.3 again 2020-06-05 11:40:18 -04:00
Joey Hess
b329cf32ec
fix reversion
Accidentially lost the handle setup in processTranscript.
2020-06-05 10:40:41 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
e4993b4456
async exception safety
Convert to withCreateProcess (missed this one a couple commits ago)
and also make sure that the child thread gets canceled on exception.
2020-06-04 12:57:22 -04:00
Joey Hess
bd3074643b
remove unused createBackgroundProcess 2020-06-04 12:48:42 -04:00
Joey Hess
20557cf0ef
stop exporting createProcessChecked
Yay, that had an ugly comment associated with it.
2020-06-04 12:46:55 -04:00
Joey Hess
438dbe3b66
convert to withCreateProcess for async exception safety
This handles all sites where checkSuccessProcess/ignoreFailureProcess
is used, except for one: Git.Command.pipeReadLazy
That one will be significantly more work to convert to bracketing.

(Also skipped Command.Assistant.autoStart, but it does not need to
shut down the processes it started on exception because they are
git-annex assistant daemons..)

forceSuccessProcess is done, except for createProcessSuccess.
All call sites of createProcessSuccess will need to be converted
to bracketing.

(process pools still todo also)
2020-06-04 12:44:09 -04:00
Joey Hess
92f775eba0
convert to withCreateProcess for async exception safety
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.

Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.

checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess calls waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
2020-06-03 15:48:09 -04:00
Joey Hess
31d53587d5
generalize withNullHandle to MonadIO 2020-06-03 15:18:48 -04:00
Joey Hess
1f2e2d15e8
async exception safety
Convert to withCreateProcess and concurrently, both of which
handle cleaning up when there's an async exception thrown to the thread
running this.
2020-06-03 13:19:28 -04:00
Joey Hess
94986fb228
convert to withCreateProcess
Makes it stop the command if the consumer gets killed.

Also, it seems that the old version expected bracketOnError to return
the False from the error handler, but it does not, it would have thrown
the exception and ignored the False. That's fixed, it will now return
False when there is an exception.
2020-06-03 13:15:01 -04:00
Joey Hess
53263efe4b
simplify
This was a pre-withCreateProcess attempt at doing the same thing, so can
just call boolSystem now that it uses withCreateProcess.

There's a slight behavior change, since it used to wait, after an async
exception, for the command to finish, before re-throwing the exception.
Now, it rethrows the exception right away. I don't think that impact any
of the users of this.
2020-06-03 13:01:18 -04:00
Joey Hess
e1fc4f7594
make safeCommand stop the process if the thread gets killed
And a comment on a todo item that this commit is perhaps the start of
solving.
2020-06-03 12:52:11 -04:00
Joey Hess
30ac015b79
add a formatContainsVar function
Also, the format function gets faster because it checks for "escaped_"
at gen time instead of every time format is called.
2020-05-19 15:35:00 -04:00
Joey Hess
49bf7c8403
typo 2020-05-12 13:59:15 -04:00
Joey Hess
2a8fdfc7d8
Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere
This relicates git's behavior. It adds a few stat calls for the command
line parameters, so there is some minor slowdown, but even with thousands
of parameters it will not be very noticable, and git does the same statting
in similar circumstances.

Note that this does not prevent eg "git annex add symlink"; the symlink
will be added to git as usual. And "git annex find symlink" will silently
list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that
triggers the warning.
2020-05-11 15:03:35 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
021ed4f1b9
fix some build warnings 2020-05-04 12:44:26 -04:00
Joey Hess
4a6d328ae9
Avoid a test suite failure when the environment does not let gpg be tested
Due to eg, too long a path to the agent socket, caused by running gpg in a
container where /run is not mounted, and/or some other gpg behavior like
unnecessarily making relative paths to its home directory absolute.
2020-04-28 15:47:23 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
45fb7af21c
check-attr resource pool
Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simulantaneously, so there's a thundering herd..
2020-04-21 11:05:57 -04:00
Joey Hess
cee6b344b4
cat-file resource pool
Avoid running a large number of git cat-file child processes when run with
a large -J value.

This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
2020-04-20 15:19:31 -04:00
Joey Hess
d5d8259937
ByteString Ref continued
Attoparsec parser for diff-tree.

Changed fromRef back to producing a String, to avoid needing to convert
every use of it. However, this does mean I'm going to miss some
opportunities where fromRef is used and the result converted back to a
ByteString. Would be worth revisiting that at some point maybe.
2020-04-07 11:54:27 -04:00
Joey Hess
279991604d
started converting Ref from String to ByteString
This should make code that reads shas and refs from git faster.

Does not compile yet, a lot needs to be done still.
2020-04-06 17:14:49 -04:00
Joey Hess
f6d19b18f6
remove unused imports 2020-03-30 12:11:52 -04:00
Joey Hess
cd5658d972
fix windows build 2020-03-10 13:24:37 -04:00
Joey Hess
6d58ca94d6
some easy createDirectoryUnder conversions 2020-03-05 15:20:10 -04:00
Joey Hess
ebbc5004fa
convert createAnnexDirectory to use createDirectoryUnder
It will create foo/.git/annex/, but not foo/.git/ and not foo/.

This will avoid it creating an empty path to a repo when a drive is
yanked out and the mount point goes away, for example.
2020-03-05 14:33:04 -04:00
Joey Hess
5b022eea87
implemented createDirectoryUnder 2020-03-05 14:10:34 -04:00
Joey Hess
662e5a5db9
small opt to absPath
Noticed that it gets the CWD unncessarily when the path is absolute.

I have not benchmarked this, but I guess that the small overhead of
isAbsolute is so tiny compared to the system call that it's worth
it even if most of the time relative paths are passed to absPath.
2020-03-05 13:52:30 -04:00
Joey Hess
716e573514
split up quickcheck tests for hashes and macs
So when one fais, it's clear which one is the problem.
2020-03-02 14:34:48 -04:00
Joey Hess
f6d629e483
changelog and minor style 2020-02-28 12:57:55 -04:00
Peter Simons
73cf523a4b
Fix build with ghc-8.8.x.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
2020-02-28 12:54:20 -04:00
Joey Hess
9659f1c30f
annex.security.allowed-ip-addresses ports syntax
Extended annex.security.allowed-ip-addresses to let specific ports of an IP
address to be used, while denying use of other ports.
2020-02-25 15:45:52 -04:00
Joey Hess
029c883713
Merge branch 'master' into v8 2020-02-19 14:32:11 -04:00
Joey Hess
6f90bb7738
handle git-credential prompt in -J mode
If git-credential has it cached and does not prompt, this will
unfortunately result in a brief flicker, as the displayed console
regions are hidden while running it and then re-displayed. Better than a
corrupted display.

Actually, I tried it and don't see a visible flicker, so probably only
over a slow ssh will it be apparent.
2020-01-22 16:42:15 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
b68a8d8968
use conversion functions from filepath-bytestring (again)
This reverts commit 3a04af7927.
2020-01-04 20:18:40 -04:00
Joey Hess
2cea674d1e
Merge branch 'master' into v8 2020-01-01 14:26:43 -04:00
Joey Hess
999a6f0541
windows build fix 2020-01-01 13:05:23 -04:00
Joey Hess
39c91f91a9
windows build fix 2020-01-01 12:24:31 -04:00
Joey Hess
e006acc8e3
fix quickcheck failure
prop_encode_decode_roundtrip failed on "\175" in C locale.

This may be a new problem after the switch to RawFilePath, but it
already had filtering for high chars, so changed to only test ascii
chars.
2019-12-30 13:54:46 -04:00
Joey Hess
3a04af7927
temporary revert "use conversion functions from filepath-bytestring"
This reverts commit 75c40279c1.

Debian unstable is one version too old, so this can be de-reverted in a
bit.
2019-12-27 19:29:09 -04:00
Joey Hess
2b821eb225
Merge branch 'master' into sqlite 2019-12-26 15:15:42 -04:00
Joey Hess
444d5591ee
Improve file ordering behavior when one parameter is "." and other parameters are other directories
eg, `git-annex get . ..` used to order the files strangly, because it
did not realize that when git ls-files output eg "foo", that should be
grouped with the first set of files and not the second set.

Fixed by making            dirContains "." "./foo" = True
which makes sense, because dirContains ".." "../foo" = True
2019-12-20 18:01:29 -04:00
Joey Hess
d5628a16b8
Merge branch 'bs' into sqlite-bs 2019-12-18 14:51:03 -04:00
Joey Hess
75c40279c1
use conversion functions from filepath-bytestring
Behavior should be the same, but I'd hope to eventually get rid of
most of Utility.FileSystemEncoding and this is a first step.
2019-12-18 13:42:43 -04:00
Joey Hess
322c542b5c
fix ByteString conversion on windows
the encode' and decode' functions on Windows should not apply the
filesystem encoding, which does not work there. Instead, convert to and
from UTF-8.

Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem
encoding, so won't work as expected on windows.
2019-12-18 13:32:56 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
a0168cd9a2
use RawFilePath getSymbolicLinkStatus for speed 2019-12-06 15:42:54 -04:00
Joey Hess
2f9a80d803
merging sqlite and bs branches
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
2019-12-06 15:30:45 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
360942ba12
RawFilePath will need to support Windows too
Of course, readSymbolicLink always fails on Windows, but now it's ready
for other things that don't fail there.
2019-12-06 14:17:48 -04:00
Joey Hess
f39f018ee0
fix git ls-tree parser
File mode is octal not decimal. This broke in the conversion to
attoparsec.

(I've submitted the content of Utility.Attoparsec to the attoparsec
developers.)

Test suite passes 100% now.
2019-12-06 14:05:48 -04:00
Joey Hess
37d0f73e66
reword comment 2019-11-27 16:38:18 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
1ff889e456
explict export lists
A small amount of dead code removed.

All of Utility/ done now.

This commit was sponsored by Brock Spratlen on Patreon.
2019-11-23 11:24:10 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
1d0dbdf201
squelch tab warnings 2019-11-22 12:49:41 -04:00
Joey Hess
7263aafd2b
Merge branch 'master' into sqlite 2019-11-22 12:49:35 -04:00
Joey Hess
b82ab21468
missed an export 2019-11-22 12:35:57 -04:00
Joey Hess
a9888f6151
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least 1 daylight savings time zone
changes makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?

(cherry picked from commit 09ee6b0ccb)
2019-11-21 17:28:18 -04:00
Joey Hess
d4661959de
Merge branch 'master' into sqlite 2019-11-21 17:26:50 -04:00
Joey Hess
8ea5f3ff99
explict export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
09ee6b0ccb
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least 1 daylight savings time zone
changes makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?
2019-11-06 14:36:49 -04:00
Joey Hess
89bdcffdfa
found a way to extract InodeCache from git index
This will allow a race-free database transition. It is somewhat hairy in
that it depends on an unspecified git output format.
2019-11-06 14:23:00 -04:00
Joey Hess
4940a135af
eliminate raw sql LIKE query 2019-10-30 15:19:52 -04:00
Joey Hess
94efc400e9
horrible impementation of isInodeKnown
The only good thing about it is it does not require a major version bump
to improve the database. That will need to happen at some point though.

Potentially very very slow in a large repository.

Ugly use of raw sql.
2019-10-23 14:37:29 -04:00
Joey Hess
eebf080b33
comment typo 2019-10-23 12:32:46 -04:00
Joey Hess
9a5d9019ba
Deal with pkexec changing to root's home directory when running a command.
Wow, that's not documented anywhere, and seems like a major gotcha in
pkexec.

Broke enable-tor.
2019-10-21 12:39:19 -04:00
Joey Hess
b90ddbc383
enable-tor: Use pkexec to run command as root when gksu and kdesu are not available.
gksu is no longer in debian, even stable

kdesu in debian is not installed in PATH any longer, though the executable
is still present under /usr/lib

pkexec is packagekit's replacement for those older commands.
2019-09-30 15:19:01 -04:00
Joey Hess
f2737a5fbe
enable-tor: Run kdesu with -c option. 2019-09-30 15:14:05 -04:00
Joey Hess
ab8a6a82e1
remove unused 2019-09-24 18:16:01 -04:00
Joey Hess
a4750fa537
move haddock block so haddock will build 2019-09-24 18:14:47 -04:00
Joey Hess
71f30d2f07
improve haddock 2019-09-24 18:10:34 -04:00
Joey Hess
bc1b9a2c0a
improved GitLFS api 2019-09-24 18:05:11 -04:00
Joey Hess
6ae0a44c64
git-lfs: Added support for http basic auth 2019-09-24 14:46:20 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
9624fe4c37
improve comment 2019-09-01 12:33:19 -04:00
Joey Hess
1558e03014
Refuse to upgrade direct mode repositories when git is older than 2.22
That git fixed a memory leak that could cause an OOM during the upgrade.

Most git-annex builds have a new enough git already.
OSX git was upgraded with brew.

Linux i386ancient build's git was too old. Upgrading it to a fixed
git didn't work (due to the newer git not working with the old ssh,
https://bugs.chromium.org/p/git/issues/detail?id=7 )

Choices to deal with that were:

* Somehow make direct mode upgrade work with the old git, avoiding its
  OOM problem. One way would be to switch the repo to indirect mode
  first, and so upgrade to a repo with locked files. Not good when
  the filesystem does not support symlinks.
* backport the OOM fix from git 2.22
  (And do what about the version number so git-annex knows it's fixed?)
* backport openssh (and possibly more stuff)
* move the i386ancient build to at least Debian stretch (still backporting git)
  But this will make it no longer work with some of the ancient kernels it
  targets.

Of those, backporting the OOM fix seemed the best approach. Put "oomfix"
in the git version number to indicate it.

I have not automated building the git backport, so here's the patch I
used:

diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c	2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
 	if (start_async(&async))
 		return 0;	/* error was already reported */

-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		error("read from external filter %s failed", cmd);
 		ret = 0;
 	}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN	2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
 #!/bin/sh

 GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix

 LF='
 '
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure	2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
 # Identity of this package.
 PACKAGE_NAME='git'
 PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
 PACKAGE_BUGREPORT='git@vger.kernel.org'
 PACKAGE_URL=''

diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version	2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
2019-08-29 15:24:41 -04:00
Joey Hess
69cefe8190
followup and display rsync exit status 2019-08-15 14:47:22 -04:00
Joey Hess
05d52f9699
fix display of http exceptions 2019-08-10 11:09:25 -04:00
Joey Hess
868942e19b
fix unused module import warnings when building on windows 2019-08-08 12:18:53 -04:00
Joey Hess
ee72fd2f7d
add exports useful if using this module to write a git-lfs server 2019-08-05 15:40:36 -04:00
Joey Hess
c527ae5887
Merge branch 'master' into git-lfs 2019-08-05 11:48:45 -04:00
Joey Hess
19defc7932
fix reversion
4af55c42bf reordered the exception
catching, preventing following ftp redirect
2019-08-04 14:32:06 -04:00
Joey Hess
4af55c42bf
factored out downloadConduit from download
useful when an API provides a Request to download
2019-08-04 12:31:54 -04:00
Joey Hess
5be0a35dae
implemented checkPresent for git-lfs 2019-08-03 12:21:28 -04:00
Joey Hess
f536a0b264
weaken comment
I'm seeing the github lfs server request an upload of an object that has
already been uploaded to it before. Probably because they offload
storage to S3 and so skipped the overhead of checking for an unncessary
upload.
2019-08-03 11:31:02 -04:00
Joey Hess
74e9e3ccf0
add to request headers, don't overwrite 2019-08-03 11:15:08 -04:00
Joey Hess
fc09a41ed1
storing objects in git-lfs is working
Still need to record the sha256 and size when they cannot be determined
by inspecting the key.
2019-08-02 13:56:55 -04:00
Joey Hess
6c1130a3bb
lfs endpoint discovery and caching in git-lfs special remote 2019-08-02 12:38:14 -04:00
Joey Hess
03a765909c
move IO code out
Let's keep this entirely pure.

git-annex has its own facilities for running a ssh command, that make it
respect various config settings, and cache connections, etc. So better
not to have the library run ssh itself.
2019-08-02 10:57:40 -04:00
Joey Hess
2533acc7a2
note about ssh hostname sanitization 2019-08-02 10:40:55 -04:00
Joey Hess
bd6c508334
finalizing lfs module
It may eventually move to its own package.
2019-08-01 14:04:56 -04:00
Joey Hess
018b5b8173
Support building with socks-0.6 and persistant-template-2.7
persistent-template now needs UndecidableInstances.

socks changed defaultSocksConf to take a SockAddr.
2019-07-30 12:50:48 -04:00
Joey Hess
426053cb6c
Corrected some license statements
In 40ecf58d4b I changed the license of code I
wrote from GPL to AGPL. But, two files containing code I wrote combined
with code by others were updated to say their license is AGPL, while in
fact part of it was (the code I wrote) but part remained under the original
license (the code written by others).

Remote/Ddar.hs is now changed entirely back to GPL 3.

Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code
not written by me, and corrected its license statement to GPL-2, which
is the actual version of the GPL included with the code in its original
distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/
2019-07-28 14:27:33 -04:00
Joey Hess
7fd650355e
merge from http-client-restricted
I made some improvements to its API after splitting it out of git-annex,
so merge those back in.

This is groundwork for removing the embedded copy of it and depending on
it.

Also moved the managerResponseTimeout disabling to Annex.Url as it's
git-annex specific.

This commit was sponsored by Ethan Aubin on Patreon.
2019-07-17 16:48:50 -04:00
Joey Hess
7234b1f9a7
small optimisation to file copying
Avoid statting file, just try to remove it.

Also a comment to explain why it tries to remove it, which was puzzling
me when I revisited this code until I saw that cp fails to overwrite a
mode 444 file, including perhaps one left by a previous interrupted cp.

This commit was sponsored by Fernando Jimenez on Patreon.
2019-07-17 14:22:21 -04:00
Joey Hess
21ff5e1e5a
CoW probing
Improved probing when CoW copies can be made between files on the same
drive. Now supports CoW between BTRFS subvolumes. And, falls back to rsync
instead of using cp when CoW won't work, eg copies between repos on the
same EXT4 filesystem.

Rather than trying cp --reflink=always for each file copied to a remote,
it's tried once and if it fails it falls back to using rsync thereafter
for the lifetime of the Remote object. That avoids overhead of calling cp
which while small, will add up over a large number of files.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-07-17 14:19:08 -04:00
Joey Hess
0c6b7e288d
Add BLAKE2BP512 and BLAKE2BP512E backends
using a blake2 variant optimised for 4-way CPUs

This had been deferred because the Debian package of cryptonite, and
possibly other builds, was broken for blake2bp, but I've confirmed #892855
is fixed.

This commit was sponsored by Brett Eisenberg on Patreon.
2019-07-05 15:30:03 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of  git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
42c386fc47
add: Display progress meter when hashing files.
* add: Display progress meter when hashing files.
* add: Support --json-progress option.
2019-06-25 13:12:47 -04:00
Joey Hess
759fd9ea68
avoid url resume from 0
When downloading an url and the destination file exists but is empty,
avoid using http range to resume, since a range "bytes=0-" is an unusual
edge case that it's best to avoid relying on working.

This is known to fix a case where importfeed downloaded a partial feed from
such a server. Since importfeed uses withTmpFile, the destination always exists
empty, so it would particularly tickle such problem servers. Resuming from 0
is otherwise possible, but unlikely.
2019-06-20 12:26:17 -04:00
Joey Hess
fe49747fc8
add missing case
and fix name shadowing warning
2019-06-04 11:24:32 -04:00
Joey Hess
6136e299a2
add back support for following http to ftp redirects
Did not test build with http-client < 0.5 and while I tried to support
it, the ifdefed parts may needs some fixes.
2019-05-30 16:04:59 -04:00
Joey Hess
67c06f5121
add back support for ftp urls
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
2019-05-30 14:51:34 -04:00
Joey Hess
aa7710982b
avoid list lookup by parseToken
Minor optimisation to parsing of a preferred content expression.
2019-05-14 13:11:29 -04:00
Joey Hess
00e9e15c70
squelch build warning with old version of quickcheck 2019-05-03 11:02:12 -04:00
Joey Hess
2b52dbe905
fix build with older QuickCheck
The NonEmpty instance was moved out of QuickCheck and into a package
with more deps than I want to drag in, so I'm providing my own instance,
but with older QuickCheck, use theirs to avoid overlapping.
2019-03-22 10:07:16 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
c0bd202147
fix failing test case
An empty list of [ContentIdenfier] serialized to the same thing
as a single ContentIdentifier "". Avoid this ambiguity by requiring the
list be non-empty.
2019-03-06 14:27:15 -04:00
Joey Hess
1ec9e1494c
use relatedTempate in viaTmp 2019-03-04 14:12:00 -04:00
Joey Hess
4603713b4e
avoid using htonl
It got removed from network-3.0.0.0 and nothing in the haskell ecosystem
currently provides it (which seems it ought to be fixed).

Tested new code on both little-endian and big-endian with:

ghci> hostAddressToTuple $ fromJust $ embeddedIpv4 (0,0,0,0,0,0xffff,0x7f00,1)
(127,0,0,1)
2019-02-19 12:17:20 -04:00
Joey Hess
f5f059e288
relocate gpg test framework temp dir to outside repo
The gitAnnexTmpOtherDir cleanup made it be deleted too early sometimes,
and so the test suite failed. Also there was a report of a similar
failure which likely had a similar cause and hopwfully this fixes that
too.
2019-01-21 14:16:00 -04:00
Joey Hess
e38b654096
Estimated time to completion display shortened from eg "1h1m1s" to "1h1m"
Because seconds accuracy over such a time is unlikely to be accurate.
Also, it was possible to get a ridiculous "1y1d1h1m1s" if stalled or
very slow.
2019-01-21 00:04:35 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
0e44985210
remove duplicate import 2019-01-14 18:26:38 -04:00
Joey Hess
e0c4ac99b5
convert serializeKey' to strict ByteString
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer to common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
2019-01-14 17:03:46 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
fc21cccf1c
slight optimisation more 2019-01-11 19:56:31 -04:00
Joey Hess
16c798b5ef
switch MetaValue to ByteString and MetaField to Text
MetaField was already limited to alphanumerics, so it makes sense to use
Text for it.

Note that technically a UUID can contain invalid UTF-8, and so
remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8
values with '?' or whatever. In practice, a UUID is usually also text,
I only kept open the possibility of it containing invalid UTF-8 to avoid
breaking parsing of strange UUIDs in git-annex branch files. So, I
decided to let this edge case slip by.

Have not updated the rest of the code base yet for this change, as the
change took 2.5 hours longer than I expected to get working properly.
2019-01-07 14:18:24 -04:00
Joey Hess
a80922a594
support for ByteStrings 2019-01-07 12:29:25 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
f574d8af10
comment typo 2019-01-03 00:22:05 -04:00
Joey Hess
3ba6e9bb96
use attoparsec parser for String parsing, 10x speedup
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.

    benchmarking parse/old
    time                 9.657 μs   (9.600 μs .. 9.732 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 9.703 μs   (9.645 μs .. 9.785 μs)
    std dev              231.6 ns   (161.5 ns .. 323.7 ns)
    variance introduced by outliers: 25% (moderately inflated)

    benchmarking parse/new
    time                 834.6 ns   (797.1 ns .. 886.9 ns)
                         0.987 R²   (0.976 R² .. 0.999 R²)
    mean                 816.4 ns   (802.7 ns .. 845.1 ns)
    std dev              62.39 ns   (37.66 ns .. 108.4 ns)
    variance introduced by outliers: 82% (severely inflated)

There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
2019-01-02 13:28:44 -04:00
Joey Hess
3c74dcd4e1
attoparsec parser for POSIXTime
(Not yet used anywhere.)

Benchmarking

{-# LANGUAGE OverloadedStrings #-}

import Criterion.Main
import Utility.TimeStamp
import Data.Attoparsec.ByteString

main = defaultMain
	[ bgroup "parse"
		[ bench "new" $ whnf (parseOnly (parserPOSIXTime <* endOfInput)) "1431286201.113452s"
		, bench "old" $ whnf parsePOSIXTime "1431286201.113452s"
		]
	]

benchmarking parse/new
time                 643.6 ns   (640.2 ns .. 646.7 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 645.3 ns   (642.1 ns .. 650.9 ns)
std dev              14.59 ns   (9.194 ns .. 22.07 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking parse/old
time                 9.657 μs   (9.600 μs .. 9.732 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.703 μs   (9.645 μs .. 9.785 μs)
std dev              231.6 ns   (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)

So old took 9703 ns to parse, and new 643 ns.
2019-01-02 12:48:53 -04:00
Joey Hess
ba2c0663f9
comments 2019-01-01 22:48:14 -04:00
Joey Hess
ec1b9da72f
avoid abusing from/toRawFilePath for non-FilePaths 2019-01-01 22:44:04 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
1b44426805
avoid conflicting definitions of Template type
When both modules are imported and then re-exported.
2018-12-30 15:03:31 -04:00
Joey Hess
5480b3a9af
fix bogus ghc 8.6.3 build warning
ghc warned that the guards did not cover all values of h, but they
clearly do, and when rewritten as a case statement the warning goes
away.

Probably a ghc bug, but I kind of prefer the case statement over the
guards anyway.
2018-12-30 14:43:27 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
850d19d038
add dropFromEnd 2018-11-23 11:24:05 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
Joey Hess
ff9bd9620e
Fix resume of download of url when the whole file content is already actually downloaded
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
Joey Hess
051dfcb3be
Revert "fix comment"
This reverts commit bac7d34e71.

The comment was right; ARG_MAX is the total length of all arguments.
2018-11-06 17:26:20 -04:00
Joey Hess
bac7d34e71
fix comment 2018-11-06 11:42:31 -04:00
Joey Hess
5ad5d45d4c
make Arbitrary POSIXTime include decimal half the time 2018-10-31 16:27:55 -04:00
Joey Hess
2ca408dc33
Increase minimum QuickCheck version. 2018-10-31 15:53:22 -04:00
Joey Hess
f00b329e0c
remove unused import 2018-10-30 13:38:29 -04:00
Joey Hess
86df2a08fe
fix windows build 2018-10-30 11:09:45 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
48af284872
fix parse of negative posix time
Should never happen, but..
2018-10-29 23:40:34 -04:00
Joey Hess
a8ad577d1d
fix parsing of timestamp w/o trailing 's'
Luckily, this did not affect any git-annex log files, since they all
include the trailing 's' for backwards compatability reasons.

But, if I later want to drop that, this is the first commit where
git-annex can be trusted to parse that right.

The misparse caused it to be off by up to 10 seconds.
2018-10-29 23:36:47 -04:00
Joey Hess
3d1b22dc8e
factor out another function 2018-10-29 23:33:56 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
Joey Hess
94b7968f1f
forgot to remove this when dropping support for old ghc 2018-10-29 22:01:06 -04:00
Joey Hess
595fb98473
add small delay to avoid problems on systems with low-resolution mtime
I've seen intermittent failures of the test suite with v6 for a long time,
it seems to have possibly gotten worse with the changes around v7. Or just
being unlucky; all tests failed today.

Seen on amd64 and i386 builders, repeatedly but intermittently:

	unused: FAIL (4.86s)
	Test.hs:928:
	git diff did not show changes to unlocked file

And I think other such failures, all involving v7/v6 mode tests.

I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.

The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.

Indeed, notice the identical timestamps:

	builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.894942036 +0000
	Change: 2018-10-29 22:14:10.894942036 +0000
	 Birth: -
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.898942036 +0000
	Change: 2018-10-29 22:14:10.898942036 +0000
	 Birth: -

I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.

So, I hope that this small delay will avoid the problem.

Update: I didn't, indeed I then reproduced the same failure on my
laptop, so it must be due to something else. But keeping this change anyway
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
2018-10-29 19:31:26 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
45e09ea7f3
debug the full adjusted Request
So that the user-agent etc are included in the debug.
2018-10-04 13:45:27 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
502c5a4917
remove support for old http-client version
git-annex already bumped to a newer version for the http security fix.

This commit was sponsored by mo on Patreon.
2018-10-03 12:00:07 -04:00
Joey Hess
c88e8c8249
unify error display 2018-10-03 11:56:52 -04:00
Joey Hess
26a02cb386
display error when an invalid url is downloaded
download is documented as displaying an error when download fails, but
it didn't when the url was not valid at all. That leads to confusing
behavior.

Also, display the url with --debug
2018-09-25 13:38:20 -04:00
Joey Hess
cc82f81227
More FreeBSD build fixes.
Untested, on FreeBSD but enough to fix the listed build errors.

Seems that System.Posix.Files must have used to export this stuff and it
was split.

This commit was sponsored by Peter on Patreon.
2018-09-24 11:25:56 -04:00
Joey Hess
ceee7758a5
fix \ escaping 2018-09-22 11:33:08 -04:00
Joey Hess
d2c351f547
update windows NUL for ghc 8.6.1
This should also work with older ghc, since the path is a windows device
namespace path.
2018-09-22 11:31:55 -04:00
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
Yaroslav Halchenko
b976eb5353
BF(minor): missing space after "Unsupported url scheme" msg before the scheme 2018-09-18 18:19:20 -04:00
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
b18fb1e343
clean P2P protocol shutdown on EOF
Avoids "git-annex-shell: <stdin>: hGetChar: end of file"
being displayed by the test suite, due to the way it
runs git-annex-shell without using ssh.

git-annex-shell over ssh was not affected because git-annex hangs up the
ssh connection and so never sees the error message that git-annnex-shell
probably did emit.

This commit was sponsored by Ryan Newton on Patreon.
2018-09-13 10:46:37 -04:00
Joey Hess
872640549b
comment typo 2018-09-05 13:57:06 -04:00
Joey Hess
f4788f3853
clarify comment
haskell-mountpoints contains android specific code, but it's not used
when git-annex was built for linux and is running on android.
2018-09-05 11:22:27 -04:00
Joey Hess
55f8d90dee
remove Utlity.SRV, no longer used 2018-09-05 11:15:33 -04:00
Joey Hess
f54c72d2e1
Fix build on FreeBSD
This must have been broken for years..

This commit was sponsored by Jack Hill on Patreon.
2018-08-29 12:09:03 -04:00
Joey Hess
c565340adc
stop using external hash programs, since cryptonite is faster
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.

Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..

This commit was sponsored by Henrik Riomar on Patreon.
2018-08-28 18:10:58 -04:00
Joey Hess
6a445dc086
support conditionally excluding queued files
Switched code to use a for loop to avoid a filterM that would have
doubled the memory used.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:38:37 -04:00
Joey Hess
218c76b789
avoid unused imports warning on non-linux 2018-08-07 15:06:33 -04:00
Joey Hess
e1ab01f94d
Fix reversion in display of http 404 errors.
Switch to using http-client for large file downloads caused the reversion;
the code for displaying a 404 response was instead displaying the raw html
document, which is not useful.

This commit was sponsored by Ryan Newton on Patreon.
2018-07-31 12:15:26 -04:00
Joey Hess
50609da787
fix User-Agent reversion
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.

This commit was sponsored by mo on Patreon.
2018-07-16 11:56:47 -04:00
Joey Hess
ac228fa723
don't import all of System.Posix.Files
This avoid a build problem when different versions of posix and
posixcompat are used. Does not normally happen as cabal prevents that,
but this is sometimes used with ghc --make which can get into that
situation.
2018-07-10 12:04:49 -04:00
Joey Hess
3dd7f450c1
fix p2p --pair
p2p --pair: Fix interception of the magic-wormhole pairing code, which
since 0.8.2 it has sent to stderr rather than stdout.

This is highly annoying because I had asked the magic wormhole developers
for a machine-readable way to get the data, and instead they changed how
the data was output, and didn't even mention this in my issue, or in the
changelog.

Seems this needs to be tested periodically to make sure it's still working.

This commit was sponsored by Ethan Aubin.
2018-07-04 15:14:03 -04:00
Joey Hess
3976b89116
fix license date
I wrote this this year
2018-06-22 10:25:53 -04:00
Joey Hess
22f49f216e
get android building the security fix
Had to update http-client and network, with follow-on dep changes.

This commit was sponsored by Brock Spratlen on Patreon.
2018-06-21 10:23:04 -04:00
Joey Hess
923578ad78
improve error message
This commit was sponsored by Jack Hill on Patreon.
2018-06-19 14:21:41 -04:00
Joey Hess
47cd8001bc
call base ManagerSetting's exception wrapper
This commit was sponsored by Henrik Riomar on Patreon.
2018-06-19 14:17:05 -04:00
Joey Hess
fc79f68404
support building on debian stable
Specifically, http-client-0.4.31

This commit was supported by the NSF-funded DataLad project.
2018-06-19 11:25:10 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're no worse than http certianly. And, the backport of these
security fixes has to deal with wget, which supports http https and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
cc08135e65
prevent using local http proxies per annex.security.allowed-http-addresses
A local http proxy would bypass the security configuration. So,
the security configuration has to be applied when choosing whether to
use the proxy.

While http rebinding attacks against the dns lookup of the proxy IP
address seem very unlikely, this implementation does prevent them, since
it resolves the IP address once, checks it, and then reconfigures
http-client's proxy using the resolved address.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-06-18 13:32:20 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
43bf219a3c
added makeAddressMatcher
Would be nice to add CIDR notation to this, but this is the minimal
thing needed for the security fix.

This commit was sponsored by Ewen McNeill on Patreon.
2018-06-17 13:29:15 -04:00
Joey Hess
014a3fef34
added isPrivateAddress and isLoopbackAddress
For use in a security boundary enforcement.

Based on https://en.wikipedia.org/wiki/Reserved_IP_addresses

Including supporting IPv4 addresses embedded in IPv6 addresses. Because
while RFC6052 3.1 says "Address translators MUST NOT translate packets
in which an address is composed of the Well-Known Prefix and a non-
global IPv4 address; they MUST drop these packets", I don't want to
trust that implementations get that right when enforcing a security
boundary.

This commit was sponsored by John Pellman on Patreon.
2018-06-17 13:28:25 -04:00
Joey Hess
40e8358284
add Utility.HttpManagerRestricted
This is a clean way to add IP address restrictions to http-client, and
any library using it.
See https://github.com/snoyberg/http-client/issues/354#issuecomment-397830259

Some code from http-client and http-client-tls was copied in and
modified. Credited its author accordingly, and used the same MIT license.

The restrictions don't apply to http proxies. If using http proxies is a
problem, http-client already has a way to disable them.
SOCKS support is not included. As far as I can tell, http-client-tls
does not support SOCKS by default, and so git-annex never has.

The additional dependencies are free; git-annex already transitively
depended on them via http-conduit.

This commit was sponsored by Eric Drechsel on Patreon.
2018-06-16 18:44:13 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
caaedb2993
fix http-client gzip decompression bug
Prevent haskell http-client from decompressing gzip files, so downloads of
such files works the same as it used to with wget and curl.

Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-05-21 15:10:25 -04:00
Joey Hess
6a63920732
fix build 2018-05-21 11:00:23 -04:00
Joey Hess
5204e1dd9d
Workaround for bug in an old version of cryptonite that broke https downloads, by using curl for downloads when git-annex is built with it.
This commit was supported by the NSF-funded DataLad project.
2018-05-20 14:12:37 -04:00
Joey Hess
86958fda5d
fix build with old http-client 2018-05-10 00:22:23 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00
Joey Hess
7dc28dc705
Support building with hinotify-0.3.10.
Kept backwards compat with old versions via a shim.

This commit was sponsored by mo on Patreon.
2018-05-08 14:43:06 -04:00
Joey Hess
2948f6d916
avoid uname -o on !linux and catch any exception from it
Fix bug in last release that prevented the webapp opening on non-Linux systems.

This commit was sponsored by Jake Vosloo on Patreon.
2018-05-08 14:06:19 -04:00
Joey Hess
a8c91ce69a
add streamDirectoryContents 2018-04-26 13:38:36 -04:00
Joey Hess
f5df6244f3
deal with getMounts crashing on android 2018-04-25 17:42:27 -04:00
Joey Hess
096bb0aa21
improve comment 2018-04-25 16:57:57 -04:00
Joey Hess
6e862f47dd
deal with uname -o newline 2018-04-25 16:53:17 -04:00
Joey Hess
9807e5bead
fix webapp opening in termux
Open real url not html shim since android and file:// urls is a nasty
kettle of fish.

This commit was sponsored by John Pellman on Patreon.
2018-04-25 14:38:42 -04:00
Joey Hess
3c6e60dc69
fix build with old http-conduit 2018-04-24 21:23:40 -04:00
Joey Hess
526243d6f5
catch exceptions from getEffectiveUserID
This fixes a crash when using the linux_standalone build in termux on
android.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-04-24 20:10:10 -04:00
Joey Hess
aebf9e6dd5
Fix build with yesod 1.6.
Also avoid some depreaction warnings.
2018-04-22 13:56:35 -04:00
Joey Hess
256d8f07e8
avoid insertWith' depreaction warning
Switch to Data.Map.Strict everywhere that used it.

There are still lots of lazy maps in git-annex. I think switching these
is safe. The risk is that there might be a map that is used in a way
that relies on the values not being evaluated to WHNF, and switching to
strict might result in bad performance or memory use. So, I have not
switched everything.
2018-04-22 13:28:31 -04:00
Joey Hess
558a0a9328
deal with conduit 1.3 change
I don't know if this will build with older conduit, it may need an
ifdef.
2018-04-22 13:14:55 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
e5a404ebe2
fix build with old version of http-client 2018-04-09 13:04:23 -04:00
Joey Hess
c8f2d302dc
run curl when configured to do it at runtime, even if not available at build time 2018-04-06 21:17:36 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
36e6b8abbf
Fix resuming a download when using curl.
Noticed a bug; when using curl a workaround for its empty file behavior
overwrote the file content, so it never resumed and always started over.
2018-04-06 16:09:53 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
bebf541aa7
Fix calculation of estimated completion for progress meter.
Was estimating transfer of whole file, not remaining part of it.
2018-03-19 23:26:41 -04:00
Joey Hess
2c05bc9dfd
fix build with old base
Old base (used on android still) lacks tryReadMVar
2018-03-16 12:06:45 -04:00
Joey Hess
d2af6baaeb
fixed processTranscript hang problem
The pipe's FDs got inherited by ssh and it did something that kept them
open even once it exited. Probably involving passing them on to the ssh
mux daemon.

Set close on exec, and all is well.

Kept Annex.Ssh not using processTranscript even though it no longer
hangs when it does use it, just because processTranscript is overkill
there.

This commit was supported by the NSF-funded DataLad project.
2018-03-15 16:14:22 -04:00
Joey Hess
d6700721c0
simplify with async
This is much clearer to follow.

I've tested this, and it still has the problem described in
doc/bugs/occasional_hang_with_p2pstdio.mdwn

Which I think indicates that problem is not with my code, but something
else. ghc runtime? Something crazy ssh does in this situation? Unsure..
2018-03-15 15:34:25 -04:00
Joey Hess
7d83502329
add comments explaining puzzling code 2018-03-15 14:47:54 -04:00
Joey Hess
521d4ede1e
fix build with cryptonite-0.20
Some blake hash varieties were not yet available in that version.
Rather than tracking exact details of what cryptonite supported when,
disable blake unless using a current cryptonite.
2018-03-15 11:16:00 -04:00
Joey Hess
ba44ca80e6
Include amount of data transferred in progress display. 2018-03-14 13:39:14 -04:00
Joey Hess
050ada746f
Added backends for the BLAKE2 family of hashes.
There are a lot of different variants and sizes, I suppose we might as well
export all the common ones.

Bump dep to cryptonite to 0.16, earlier versions lacked BLAKE2 support.
Even android has 0.16 or newer.

On Debian, Blake2bp_512 is buggy, so I have omitted it for now.
http://bugs.debian.org/892855

This commit was sponsored by andrea rota.
2018-03-13 16:23:42 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
3dd43df9c2
Better ssh connection warmup when using -J for concurrency.
Avoids ugly messages when forced ssh command is not git-annex-shell.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-07 17:30:14 -04:00
Joey Hess
84e4874ae0
windows build fix 2018-01-05 15:09:10 -04:00
Joey Hess
e4f00e891d
fix windows build 2018-01-04 14:23:11 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
70344d25c0
type signature works for both old and new versions of ifdef 2017-12-11 12:49:23 -04:00
Joey Hess
c6e4bc0a22
fix regression in addurl --file caused by youtube-dl support
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.

This commit was supported by the NSF-funded DataLad project.
2017-12-06 13:22:31 -04:00
Joey Hess
ed701667aa
fix gpg subkey support typo
initremote, enableremote: Really support gpg subkeys suffixed with an
exclamation mark, which forces gpg to use a specific subkey. (Previous try
had a bug.)

This commit was sponsored by Jake Vosloo on Patreon.
2017-12-05 13:58:53 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
3febb79c8f
wip 2017-11-28 17:17:40 -04:00
Joey Hess
57b4c5bdff
add Utility.HtmlDetect
This will be used in youtube-dl integration, to tell when a html page has
been downloaded by addurl, in which case it is worth running youtube-dl
to see if it can extract media from it.

tagsoup is an almost free dependency, because yesod depends on it.
So, this only really adds a dep when git-annex is built without the
webapp.

I'd like this to as closely as possible match how browsers decide if a
page is html or not. Unfortunately, that is fairly heuristic, in order
to support malformed html. And, we don't want to falsely detect
something as html just because it has something that looks like a html
tag embedded somewhere in it. Probably any major video hosting site is
going to be serving html documents that at least start with a <html>
tag, so requiring that or a DOCTYPE should be good enough.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-11-28 13:03:11 -04:00
Joey Hess
ed9d5da2d5
Fix build with dns-3.0.
This commit was sponsored by Henrik Riomar on Patreon.
2017-11-24 10:49:31 -04:00
Joey Hess
1b6cbb63e9
still can't express custom-setup deps
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.

To avoid this problem, Utility.Env would need to be moved into
unix-compat..
2017-11-14 14:59:51 -04:00
Joey Hess
8d68112be5
split out setEnv to avoid adding dep
Windows needs the setenv package in custom-setup, but I don't want to
pull it in on unix, which would probably break some builds and need more
work. Instead, split out setEnv to a separate module.

Quite likely, unix-compat will get a portable environment layer, and
then both modules can be removed from here.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-14 14:28:49 -04:00
Joey Hess
07c4be500d
clean up build warnings on Windows 2017-11-14 14:14:10 -04:00
Joey Hess
8dd84b87f9
use unix-compat 0.5 on windows
Re-applying 3ec579f5e1
2017-11-14 14:00:24 -04:00
Joey Hess
1bd956bed4
Revert "Revert "remove dep on Win32-extras""
This reverts commit d18bc52caf.
2017-11-13 12:55:23 -04:00
Joey Hess
5f55082d10
Revert "use unix-compat 0.5 on windows"
This reverts commit 3ec579f5e1.

Too early for this; needs newer Win32 version. Le sigh.
2017-11-09 15:14:00 -04:00
Joey Hess
d18bc52caf
Revert "remove dep on Win32-extras"
This reverts commit 8b5480c66a.

Yeah, too early for that too
2017-11-09 15:09:14 -04:00
Joey Hess
8b5480c66a
remove dep on Win32-extras
Win32 now has its own getCurrentProcessId.
2017-11-09 14:03:06 -04:00
Joey Hess
3ec579f5e1
use unix-compat 0.5 on windows
That version has my patches for the problems that Utility.PosixFiles
was working around, so am able to get rid of that module now.

This will later allow bringing back the custom-setup stanza in the cabal
file. It will need to depend on unix-compat 0.5 on all OS's, which I'm
not ready to do yet.

This commit was sponsored by Nick Daly on Patreon.
2017-11-09 12:47:05 -04:00
Joey Hess
c3449ff91d
finish fix for gitAnnexLink on windows
dropDrive needed since if splitPath splits out the drives, they would
appear different.
2017-10-26 12:01:16 -04:00
Joey Hess
584dbfb892
terminateProcessId renamed
win32 upstream suggested a better name
2017-10-25 19:46:28 -04:00
Joey Hess
0ae2ac282e
fix gitAnnexLink to not be absolute on Windows
Windows: Fix reversion that caused the path used to link to annexed
content include the drive letter and full path, rather than being
relative. (`git annex fix` will fix up after this problem).

I've not identified the commit that brought the reversion (probably it
happened this spring when I was removing MisingH and last touched
Utility.Path). Likely commit 18b9a4b8024115db67ae309fdaf54e1553037529?

The problem is that relPathDirToFile got called two paths that had the
slashes different ways around. Since takeDrive includes the first slash,
this made two paths on the same drive seem different and it bailed.

(ifdefs around this to avoid doing extra work on non-windows)

This commit was sponsored by Jack Hill on Patreon.
2017-10-25 19:36:29 -04:00
Joey Hess
833b3f06cd
build for windows with forked win32 package that has terminateProcessId
Get ugly reversion out of CHANGELOG.

Also, relocated the windows stack.yaml to top, and updated windows build
instructions.

This commit was sponsored by Henrik Riomar on Patreon.
2017-10-25 14:45:23 -04:00
Joey Hess
78b5e759a5
fix 2017-10-24 13:20:26 -04:00
Joey Hess
3e839ab327
temporary hack to get windows build working
Code for terminating processes on Windows is not linking anymore;
made a warning be displayed instead. This breaks restarting the
assistant and git annex assistant --stop.

I hope to see the code added to the Win32 library, where it should fit
better and should avoid whatever problem is making the linker not like it
when included in git-annex. I opened an issue requesting its addition,
here: https://github.com/haskell/win32/issues/91

This commit was sponsored by Thomas Hochstein on Patreon.
2017-10-24 13:16:40 -04:00
Joey Hess
901807cf75
Revert "try to avoid TerminateProcess link error on windows"
This reverts commit 839ec7e26c.

Neither way is working.. The other way failed:

.stack-work\dist\5f9bc736\build\git-annex\git-annex-tmp\Assistant.o:fake:(.text+0x6bb3): undefined reference to `terminatepid'

Seems that winprocess.c is not getting linked in.
2017-10-24 13:05:24 -04:00
Joey Hess
24ba7c4296
add winprocess.h 2017-10-24 12:48:04 -04:00
Joey Hess
839ec7e26c
try to avoid TerminateProcess link error on windows
Building with stack, it failed:

`_TerminateProcess' referenced in section `.text' of .stack-work\dist\5f9bc736\build\git-annex\git-annex-tmp\Utility\WinProcess.o: defined in discarded section `.text' of C:/Users/jenkins/AppData/Local/Programs/stack/i386-windows/ghc-8.0.2/mingw/bin/../lib/gcc/i686-w64-mingw32/5.2.0/../../../../i686-w64-mingw32/lib/../lib/libkernel32.a(dacgs01154.o)

This is a reversion of 86e638567a,
to try the other way to implement it, which will hopefully avoid the problem.
2017-10-24 12:33:44 -04:00
Joey Hess
92c7e67022
temporarily import from win32-extras 2017-10-24 12:06:41 -04:00
Joey Hess
93d5951f11
remove redundant pattern match 2017-09-24 16:17:58 -04:00
Joey Hess
01068d8280
fix build with old http-client 2017-09-13 15:35:42 -04:00
Joey Hess
2ca1d3cc01
deal with box.com horrible infinite redirect behavior
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.

It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.

Deal with such problems by assuming such behavior means the file is not
present.

Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:13:42 -04:00
Joey Hess
bb08b1abd2
make storeExport atomic
This avoids needing to deal with the complexity of partially transferred
files in the export. We'd not be able to resume uploading to such a file
anyway, so just avoid them.

The implementation in Remote.Directory is not completely ideal, because
it could leave the temp file hanging around in the export directory.
This only happens if it's killed with -9, or there's a power failure;
normally viaTmp cleans up after itself, even when interrupted. I could
not see a better way to do it though, since the export directory might
be the root of a filesystem.

Also some design thoughts on resuming, which depend on storeExport being
atomic.

This commit was sponsored by Fernando Jimenez on Partreon.
2017-08-31 14:24:32 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions 2017-08-17 11:00:48 -04:00
Joey Hess
266bf43632
make import work with Win32 instead of Win32-extras 2017-08-16 17:51:29 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request. 2017-08-15 13:56:12 -04:00
Joey Hess
8526cd7c92
test: Avoid most situations involving failure to delete test directories
By forking a worker process and only deleting the test directory once it exits.

This way, if a test leaves files open, they'll get closed when the worker
exits, so avoiding failure to delete open files on Windows, and failure to
delete directories due to NFS lock files.

If a test leaves a git worker process running, the closed pipes should
cause the worker to exit too, also avoiding the problem there. The 10
second sleep ought to give plenty of time for such worker processes to
exit, although this is of course a race.

Finally, even if test directory fails to be deleted still,
it won't appear as if the last test in the test suite failed; the error
will be displayed at the very end.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 16:29:47 -04:00
Joey Hess
da8e84efe9
fix failing quickcheck properties
QuickCheck 2.10 found a counterexample eg "\929184" broke the property.

As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.

So, making Git.FileName's roundtrip test only chars < 256 seems fine.

Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.

Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.

Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.

This commit was sponsored by Henrik Riomar on Patreon.
2017-06-17 16:48:00 -04:00
Joey Hess
75cecbbe3f
Fix build with QuickCheck 2.10.
QuickCheck added an Arbitrary instance for CTime aka EpochTime. However,
while git-annex's instance disallowed times before the epoch, QuickCheck's
does not. So, rather than using its instance, convert from an Integer.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-06-17 13:04:48 -04:00
Joey Hess
1426f7ff3a
disable closingTracked on OSX
Don't trust OSX FSEvents's eventFlagItemModified to be called when the last
writer of a file closes it; apparently that sometimes does not happen,
which prevented files from being quickly added.

This commit was sponsored by John Peloquin on Patreon.
2017-06-09 14:18:58 -04:00
Joey Hess
19a6227e6e
remove temp file in failure case 2017-06-06 14:23:33 -04:00
Joey Hess
ed639c140d
Fix bug that prevented transfer locks from working when run on SMB or other filesystem that does not support fcntl locks and hard links.
This commit was sponsored by Ethan Aubin.
2017-06-06 14:22:03 -04:00