Commit graph

280 commits

Author SHA1 Message Date
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
bf179f64d1
add regression test 2019-11-14 14:02:09 -04:00
Joey Hess
3f0eef4baa
v7 for all repositories
* Default to v7 for new repositories.
* Automatically upgrade v5 repositories to v7.
2019-08-30 14:09:14 -04:00
Joey Hess
adb89ee71b
update test suite for removal of direct mode
Removed that pass and all the complications of checking direct mode's
edge cases.
2019-08-26 15:07:10 -04:00
Joey Hess
007892739d
avoid running adjusted branch tests when git is too old 2019-08-15 13:57:12 -04:00
Joey Hess
b36229905f
avoid redundant test pass on crippled filesystem
v7 unlocked uses an adjusted branch on crippled filesystem, so is nearly
identical to the previous test pass.
2019-08-13 15:11:49 -04:00
Joey Hess
5798d063b0
make test_export_import work on adjusted branch 2019-08-09 14:00:22 -04:00
Joey Hess
b87ea12b6b
git-annex merge branch
* merge: When run with a branch parameter, merges from that branch.
  This is especially useful when using an adjusted branch, because
  it applies the same adjustment to the branch before merging it.
2019-08-09 13:21:15 -04:00
Joey Hess
b90ee6dc52
test: Add pass using adjusted unlocked branch
On second thought, the extra time running the test suite is worth it.
It will be gained back once we finally get rid of direct mode.

There are two failing tests, same two that have been failing on windows
(though the failure does not look identical). So this should also spare me
the Windows VM while fixing.
2019-08-09 11:34:10 -04:00
Joey Hess
fbc270a3f0
disable 2 failing tests on windows
I have not tracked down why these fail on windows, but they
mostly test git-annex-shell anyway, and windows rarely acts as a
ssh server.
2019-08-08 14:58:40 -04:00
Joey Hess
57b24b2510
avoid pushing the special remote to origin
The sync is only to sync up the adjusted branch, not other info.
Since many tests use their own special remote named "foo",
the push broke later tests.
2019-08-08 14:33:46 -04:00
Joey Hess
65f34ffb4c
fix windows build 2019-08-08 13:41:56 -04:00
Joey Hess
9e230cd448
work around adjusted unlocked branch problem in test suite 2019-08-08 13:28:04 -04:00
Joey Hess
aac0e187c5
don't test rsync special remote on windows
git-annex no longer ships with rsync on windows so this will generally
fail
2019-08-08 12:37:25 -04:00
Joey Hess
6c7bbe2c5a
test case for bf7ecd6892 2019-05-06 14:24:42 -04:00
Joey Hess
b6a3d0ae10
fix test suite when git is too old to understand --allow-unrelated-histories 2019-03-22 13:49:22 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
8e9713b769
add export+import test case 2019-03-06 16:49:33 -04:00
Joey Hess
936aee6a60
quickcheck property for parsing of content identifier logs 2019-02-21 13:17:43 -04:00
Joey Hess
f5f059e288
relocate gpg test framework temp dir to outside repo
The gitAnnexTmpOtherDir cleanup made it be deleted too early sometimes,
and so the test suite failed. Also there was a report of a similar
failure which likely had a similar cause and hopwfully this fixes that
too.
2019-01-21 14:16:00 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
2019-01-10 16:34:20 -04:00
Joey Hess
bfc9039ead
convert git-annex branch access to ByteStrings and Builders
Most of the individual logs are not converted yet, only presense logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
2019-01-03 13:21:48 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
f9ec330cbf
avoid a FAT fail 2018-12-19 13:45:28 -04:00
Joey Hess
5759e93444
honor init --version=5 on crippled filesystem
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.

Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
2018-12-19 13:17:04 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
83109affd1
remove leftovers from removed TestSuite build flag
Test suite is always built, so this can be simplified.
2018-11-19 12:39:16 -04:00
Joey Hess
76a25fdcf0
Fix test suite failure when git-annex test is not run inside a git repository
Not the first time this kind of test suite breakage has happened..
It would be good to avoid somehow it looking up from .t and finding a git
repo. But just running the test suite from time to time outside of
git-annex would also let me notice these before the distribution packagers
do.

This commit was sponsored by mo on Patreon.
2018-11-05 13:31:49 -04:00
Joey Hess
89ac32616f
don't move keys db out from under sqlite
Instead remove enough data from it that this regression test tests what
it needs to.

Moving the database was the last thing that made the test suite
unstable, including sometimes hanging completely. It seems that, despite
closeDb being called before the move, sqlite was not quite done with it,
and somehow this causes other sqlite handles to become unstable. Not
good.

With this change, the test suite has successfully run 100+ times without
any issues.
2018-10-31 08:04:16 -04:00
Joey Hess
cc1087de42
test: display error messages from git-annex on unexpected failures
.. but not on expected failures
2018-10-30 10:49:39 -04:00
Joey Hess
595fb98473
add small delay to avoid problems on systems with low-resolution mtime
I've seen intermittent failures of the test suite with v6 for a long time,
it seems to have possibly gotten worse with the changes around v7. Or just
being unlucky; all tests failed today.

Seen on amd64 and i386 builders, repeatedly but intermittently:

	unused: FAIL (4.86s)
	Test.hs:928:
	git diff did not show changes to unlocked file

And I think other such failures, all involving v7/v6 mode tests.

I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.

The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.

Indeed, notice the identical timestamps:

	builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.894942036 +0000
	Change: 2018-10-29 22:14:10.894942036 +0000
	 Birth: -
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.898942036 +0000
	Change: 2018-10-29 22:14:10.898942036 +0000
	 Birth: -

I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.

So, I hope that this small delay will avoid the problem.

Update: I didn't, indeed I then reproduced the same failure on my
laptop, so it must be due to something else. But keeping this change anyway
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
2018-10-29 19:31:26 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
94aa0e2f64
fix strange test failure
It was trying to git annex adjust when in a direct mode repo, and that
of course fails. What I don't understand though, is how the test suite
managed to work before, when it was clearly checking the wrong thing.
Since the right way to fix it was obvious, I have not bisected.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 16:51:09 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
6063b3df3f
Dial back optimisation when building on arm
Prevent ghc and llc from running out of memory when optimising some
files.

Sean Whitton reported that doing this only in Test.hs was insufficient,
the build still OOMed by the time it got to Test.hs. He had earlier found
the build worked when these options are applied globally.

See https://ghc.haskell.org/trac/ghc/ticket/14821 for why it needs -O1;
once that's fixed it may suffice to use "GHC-Options: -O2 -optlo-O2",
although it may also be that the -O1 prevents ghc from using/leaking
as much memory.

os(arm) should match armel, armhf, armeb, and arm.
It probably also matches arm64, somewhat unfortunately since arm64
systems probably tend to have more memory. See list of arches in
https://hackage.haskell.org/package/Cabal-1.22.2.0/docs/src/Distribution-System.html

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-04 19:48:07 -04:00
Joey Hess
8ccfbd14d0
Split Test.hs and avoid optimising it much, to need less memory to compile.
The ghc options were found by Sean Whitton; the debian arm autobuilders
need those to build w/o OOM, and it seems to involve llvm using too much
memory to optimize Test.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-02-18 11:48:48 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
79857d7e9f
Removed the testsuite build flag
Test suite is always included.

Building with this flag disabled has actually been broken for some time,
since Command.TestRemote uses tasty. Fewer build flags are better, so good
time to drop it.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-12-20 12:25:03 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
3cc94c1667
.noannex file
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.

This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.

This commit was supported by the NSF-funded DataLad project.
2017-12-13 14:34:32 -04:00
Joey Hess
429132f496
try again to avoid directory removal issues on NFS
af6068525a seems to not have worked;
though the keys database should not have any files open after closeDb,
NFS seems to be creating some files where while the directory is being
removed, which causes the removal to fail.

So instead, try renaming the directory out of the way.

This commit was supported by the NSF-funded DataLad project.
2017-12-05 14:25:51 -04:00