Commit graph

224 commits

Author SHA1 Message Date
Joey Hess
c3afb3434d
remove recently added cache from KeyVariety
Adding that field broke the Read/Show serialization back-compat,
and also the Eq and Ord instances were not blinded to it, which broke
git annex fsck and probably more.

I think that the new approach used in formatKeyVariety will be nearly
as fast, but have not benchmarked it.
2019-01-16 16:33:08 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
4ecba916a1
annex.maxextensionlength
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-09-24 12:10:18 -04:00
Joey Hess
c565340adc
stop using external hash programs, since cryptonite is faster
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.

Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..

This commit was sponsored by Henrik Riomar on Patreon.
2018-08-28 18:10:58 -04:00
Joey Hess
2da2ae0919
fix migration bug and make fsck warn
* migrate: Fix bug in migration between eg SHA256 and SHA256E,
  that caused the extension to be included in SHA256 keys,
  and omitted from SHA256E keys.
  (Bug introduced in version 6.20170214)
* migrate: Check for above bug when migrating from SHA256 to SHA256
  (and same for SHA1 to SHA1 etc), and remove the extension that should
  not be in the SHA256 key.
* fsck: Detect and warn when keys need an upgrade, either to fix up
  from the above migrate bug, or to add missing size information
  (a long ago transition), or because of a few other past key related
  bugs.

This commit was sponsored by Henrik Riomar on Patreon.
2018-05-23 14:07:51 -04:00
Joey Hess
521d4ede1e
fix build with cryptonite-0.20
Some blake hash varieties were not yet available in that version.
Rather than tracking exact details of what cryptonite supported when,
disable blake unless using a current cryptonite.
2018-03-15 11:16:00 -04:00
Joey Hess
050ada746f
Added backends for the BLAKE2 family of hashes.
There are a lot of different variants and sizes, I suppose we might as well
export all the common ones.

Bump dep to cryptonite to 0.16, earlier versions lacked BLAKE2 support.
Even android has 0.16 or newer.

On Debian, Blake2bp_512 is buggy, so I have omitted it for now.
http://bugs.debian.org/892855

This commit was sponsored by andrea rota.
2018-03-13 16:23:42 -04:00
Joey Hess
07e253b1fb
Improve SHA*E extension extraction code
Do not treat parts of the filename that contain punctuation or other
non-alphanumeric characters as extensions. Before, such characters were
filtered out.

Note that in 45308ec78b "foo.ba__________r"
was munged to ".bar" and so incorrectly treated as an extension. That was
fixed by changing the filter order, but not allowing punctuation seems a
better fix.

This assumes that extensions containing punctuation are rare. "_" seems the
most likely character; I used it in ikiwiki "._comment" files. But I can't
recall seeing it anywhere else. It certianly seems that no commonly used
extensions contain punctuation. If git-annex doesn't treat "._comment"
as an extension, it's not likely to break software that expects to see that
extension like some software expects to see .epub or .mp3.

This commit was sponsored by Jack Hill on Patreon.
2018-03-05 11:25:01 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
fc845e6530
more lambda-case conversion 2017-12-05 15:00:50 -04:00
Joey Hess
96c055eda2
migrate: WORM keys containing spaces will be migrated to not contain spaces anymore
To work around the problem that the external special remote protocol does
not support keys containing spaces.

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-08-17 15:09:38 -04:00
Joey Hess
6dd806f1ad
stop using MissingH for MD5
Cryptonite is faster and allocates less, and I want to get rid of
MissingH use.

Note that the new dependency on memory is free; it's a dependency of
cryptonite.

This commit was supported by the NSF-funded DataLad project.
2017-05-15 21:36:03 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
40327cab6e
Removed support for building with the old cryptohash library.
Building with that library made git-annex not support SHA3; it's time for
that to always be supported in case SHA2 dominoes.
2017-02-24 20:56:26 -04:00
Joey Hess
9c4650358c
add KeyVariety type
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.

This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.

Benchmarks ran in my big repo:

old git-annex info:

real	0m3.338s
user	0m3.124s
sys	0m0.244s

new git-annex info:

real	0m3.216s
user	0m3.024s
sys	0m0.220s

new git-annex find:

real	0m7.138s
user	0m6.924s
sys	0m0.252s

old git-annex find:

real	0m7.433s
user	0m7.240s
sys	0m0.232s

Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.

This commit was supported by the NSF-funded DataLad project.
2017-02-24 15:16:56 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
45308ec78b
Improve SHA*E extension extraction code.
Filter out over-long "extensions" before stripping out non-alphanumerics
from them, so that eg "foo.ba__________r" is not considered a .bar
extension.
2016-05-27 13:14:51 -04:00
Joey Hess
d2fa4a6873
rename function 2016-05-27 13:10:23 -04:00
Joey Hess
b946ca44c3
Support --metadata field<number, --metadata field>number etc to match ranges of numeric values.
Similarly (well, for free), support preferred content expressions like
metadata=field<number and metadata=field>number
2016-02-27 10:55:02 -04:00
Joey Hess
ac8af8da07
better forcing of hash 2016-02-26 16:36:24 -04:00
Joey Hess
e3a73e5bb7
try again at forcing file read while hashing 2016-02-26 14:04:10 -04:00
Joey Hess
d1f87e8c8e
test revert "force hash to finish with file before returning"
This reverts commit 7482853ddd.

This seems to have caused a memory leak.
2016-02-26 13:30:38 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
7482853ddd
force hash to finish with file before returning
Fixes a minor fd leak, never more than 1 in normal use,
which broke the test suite when I tried to write to
a file that was still open for a previous hashing.
2016-01-06 22:09:36 -04:00
Joey Hess
a0fcb8ec93
generalize catchHardwareFault to catchIOErrorType 2015-12-06 16:26:38 -04:00
Joey Hess
fa9333e99f
use action, not sideAction
sideAction is for things not generally related to the current action being
performed. And, it adds a newline after the side action. This was not the
right thing to use for stuff like "checksum", where doing a checksum is
part of the git annex get process, and indeed we want it to display
"(checksum...) ok"
2015-10-11 13:29:44 -04:00
Joey Hess
cad3349001 rename fsckKey to verifyKeyContent
No behavior changes.
2015-10-01 13:29:17 -04:00
Joey Hess
0ec9bc2200 Added support for SHA3 hashed keys (in 8 varieties), when git-annex is built using the cryptonite library.
While cryptohash has SHA3 support, it has not been updated for the final
version of the spec. Note that cryptonite has not been ported to all arches
that cryptohash builds on yet.
2015-08-06 15:02:25 -04:00
Joey Hess
8990d4cc68 fsck: When checksumming a file fails due to a hardware fault, the file is now moved to the bad directory, and the fsck proceeds. Before, the fsck immediately failed. 2015-05-27 16:40:03 -04:00
Joey Hess
30960c0465 refactor 2015-05-27 16:00:44 -04:00
Joey Hess
b256f861ca if external hash command fails for any reason, fall back to internal hashing
This way, if a system's sha1sum etc is broken, it will be tried if
git-annex was built to use it, but at least it will fall back to using
internal hashing when it fails.

A side benefit of this is that hashFile consistently throws an IOError if
the file is unable to be read. In particular, if the disk is failing with
IO errors, and external hash command is used, it used to throw a user error
with the error message from externalSHA. Now, the external hash command
will fail, that message will be printed as a warning, and it'll fall back
to the internal hash command. If the disk IO error is not intermittent, it
will re-occur, and so an IOError will be thrown.

Of course, this can mean it reads a file twice, but only in edge cases.
2015-05-27 15:58:32 -04:00
Joey Hess
77c43a388e fromkey, registerurl: Allow urls to be specified instead of keys, and generate URL keys.
This is especially useful because the caller doesn't need to generate valid
url keys, which involves some escaping of characters, and may involve
taking a md5sum of the url if it's too long.
2015-05-22 22:41:36 -04:00
Joey Hess
8eb01bc894 Added MD5 and MD5E backends. 2015-02-04 13:47:54 -04:00
Joey Hess
95c1593098 Remove support for building without cryptohash.
This will prevent backporting to wheezy, but it's time to simplify the
code.
2015-02-04 13:41:26 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
4f657aa14e add getFileSize, which can get the real size of a large file on Windows
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.

Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.

This commit was sponsored by Christian Dietrich.
2015-01-20 17:09:24 -04:00
Joey Hess
c0f2b992ed Generate shorter keys for WORM and URL, avoiding keys that are longer than used for SHA256, so as to not break on systems like Windows that have very small maximum path length limits. 2015-01-06 17:58:57 -04:00
Joey Hess
73928c2274 Avoid re-checksumming when migrating from hash to hashE backend. Closes: #774494 2015-01-04 12:33:10 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
9711d529c8 WORM backend: Switched to include the relative path to the file inside the repository, rather than just the file's base name. Note that if you're relying on such things to keep files separate with WORM, you should really be using a better backend. 2014-09-11 14:50:18 -04:00
Joey Hess
f0df660570 WORM backend: When adding a file in a subdirectory, avoid including the subdirectory in the key name. 2014-08-12 14:38:53 -04:00
Joey Hess
9720ee9e56 testremote: New command to test uploads/downloads to a remote.
This only performs some basic tests so far; no testing of chunking or
resuming. Also, the existing encryption type of the remote is used; it
would be good later to derive an encrypted and a non-encrypted version of
the remote and test them both.

This commit was sponsored by Joseph Liu.
2014-08-01 15:10:01 -04:00
Joey Hess
13bbb61a51 add key stability checking interface
Needed for resuming from chunks.

Url keys are considered not stable. I considered treating url keys with a
known size as stable, but just don't feel that is enough information.
2014-07-27 12:33:46 -04:00
Joey Hess
d751591ac8 add chunk metadata to Key
Added new fields for chunk number, and chunk size. These will not appear
in normal keys ever, but will be used for chunked data stored on special
remotes.

This commit was sponsored by Jouni K Seppanen.
2014-07-24 13:36:23 -04:00
Joey Hess
9d71903c2f migrate: Avoid re-checksumming when migrating from hashE to hash backend. 2014-07-10 17:06:04 -04:00