Commit graph

285 commits

Author SHA1 Message Date
Joey Hess
5f4ad2a5de
refactor 2024-05-21 09:51:19 -04:00
Joey Hess
3a38520aac
avoid interrupted push leaving remote without a manifest
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.

It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, it is possible for the
main one being dropped to expose the backup one, which has a different
push recorded.
2024-05-20 15:41:09 -04:00
Joey Hess
adb564a48e
avoid using dropEnd
That needs bytestring-0.11.1.0 and several builds don't have that.
2024-05-16 09:52:34 -04:00
Joey Hess
468de43d66
Merge branch 'master' into git-remote-annex 2024-05-15 17:49:12 -04:00
Joey Hess
24af51e66d
git-annex unused --from remote skips its git-remote-annex keys
This turns out to only be necessary is edge cases. Most of the
time, git-annex unused --from remote doesn't see git-remote-annex keys
at all, because it does not record a location log for them.

On the other hand, git-annex unused does find them, since it does not
rely on the location log. And that's good because they're a local cache
that the user should be able to drop.

If, however, the user ran git-annex unused and then git-annex move
--unused --to remote, the keys would have a location log for that
remote. Then git-annex unused --from remote would see them, and would
consider them unused. Even when they are present on the special remote
they belong to. And that risks losing data if they drop the keys from
the special remote, but didn't expect it would delete git branches they
had pushed to it.

So, make git-annex unused --from skip git-remote-annex keys whose uuid
is the same as the remote.
2024-05-14 15:17:40 -04:00
Joey Hess
8844372c23
remove doesPathExist check in checkKeyCheckSum
Before commit c565340adc,
it was statting the file in order to get its size, which was needed to
use an external hasher. In that commit since it no longer needed the
stat, I made it check doesPathExist since the old code implicitly
checked if the file existed. But that is not necessary, this should only
be ever called on files that exist.
2024-05-12 21:31:58 -04:00
Joey Hess
ff5193c6ad
Merge branch 'master' into git-remote-annex 2024-05-10 14:20:36 -04:00
Joey Hess
c7731cdbd9
add Backend.GitRemoteAnnex
Making GITBUNDLE be in the backend list allows those keys to be
hashed to verify, both when git-remote-annex downloads them, and by other
transfers and by git fsck.

GITMANIFEST is not in the backend list, because those keys will never be
stored in .git/annex/objects and can't be verified in any case.

This does mean that git-annex version will include GITBUNDLE in the list
of backends.

Also documented these in backends.mdwn

Sponsored-by: Kevin Mueller on Patreon
2024-05-07 13:54:08 -04:00
Yaroslav Halchenko
87e2ae2014
run codespell throughout fixing typos automagically
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2024-05-01 15:46:21 -04:00
Joey Hess
c410b2bb73
annex.maxextensions configuration
Controls how many filename extensions to preserve.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-04-18 14:23:38 -04:00
Joey Hess
c64a73c7ea
startExternalAddonProcess add parameters
Not used yet but intended to support eg running "rclone gitannex"
2024-04-17 13:09:10 -04:00
Joey Hess
b4811b74b1
implement VURL to URL migration
Just for completeness, someone might want to downgrade I guess?
2024-03-01 17:01:27 -04:00
Joey Hess
e7652b0997
implement URL to VURL migration
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.

But, Command.ExamineKey calls fastMigrate and the key may or may not
exist, and it's not wanting to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.

Factored generateEquivilantKey out of Remote.Web.

Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.

Sponsored-by: Brock Spratlen on Patreon
2024-03-01 16:42:02 -04:00
Joey Hess
4046f17ca0
incremental verification for VURL
Sponsored-by: Brett Eisenberg on Patreon
2024-03-01 13:33:29 -04:00
Joey Hess
c72df19784
verifyKeyContent for VURL
VURL is now fully working, though needs more testing.

Still need to implement verifyKeyContentIncrementally but it works
without it.

Sponsored-by: Luke T. Shumaker on Patreon
2024-02-29 17:44:21 -04:00
Joey Hess
cc17ac423b
implement isCryptographicallySecureKey for VURL
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety to VURL could use
it.

Sponsored-by: Kevin Mueller on Patreon
2024-02-29 17:26:35 -04:00
Joey Hess
e7b7ea78af
lift isCryptographicallySecure to Annex
Needed for VURL backend.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-02-29 16:14:13 -04:00
Joey Hess
55bf01b788
add equivilant key log for VURL keys
When downloading a VURL from the web, make sure that the equivilant key
log is populated.

Unfortunately, this does not hash the content while it's being
downloaded from the web. There is not an interface in Backend currently
for incrementally hash generation, only for incremental verification of an
existing hash. So this might add a noticiable delay, and it has to show
a "(checksum...") message. This could stand to be improved.

But, that separate hashing step only has to happen on the first download
of new content from the web. Once the hash is known, the VURL key can have
its hash verified incrementally while downloading except when the
content in the web has changed. (Doesn't happen yet because
verifyKeyContentIncrementally is not implemented yet for VURL keys.)

Note that the equivilant key log file is formatted as a presence log.
This adds a tiny bit of overhead (eg "1 ") per line over just listing the
urls. The reason I chose to use that format is it seems possible that
there will need to be a way to remove an equivilant key at some point in
the future. I don't know why that would be necessary, but it seemed wise
to allow for the possibility.

Downloads of VURL keys from other special remotes that claim urls,
like bittorrent for example, does not popilate the equivilant key log.
So for now, no checksum verification will be done for those.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-02-29 16:01:49 -04:00
Joey Hess
0f7143d226
support VURL backend
Not yet implemented is recording hashes on download from web and
verifying hashes.

addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.

It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.

Sponsored-by: Joshua Antonishen on Patreon
2024-02-29 13:48:51 -04:00
Joey Hess
8b6c7bdbcc
filter out control characters in all other Messages
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brock Spratlen on Patreon
2023-04-11 12:58:01 -04:00
Joey Hess
3290a09a70
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.

Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.

When json is being output, no quoting is done, since json gets its own
quoting.

This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 15:55:44 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
d4cb7afeed
remove unused Key parameter from isCryptographicallySecure
This will allow using isCryptographicallySecure on a Backend, before a
Key has been generated.

Sponsored-by: Lawrence Brogan on Patreon
2023-03-27 14:34:00 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
cb9cf30c48
move several readonly values to AnnexRead
This improves performance to a small extent in several places.

Sponsored-by: Tobias Ammann on Patreon
2022-06-28 15:40:19 -04:00
Joey Hess
19e78816f0
convert Key to ShortByteString
This adds the overhead of a copy when serializing and deserializing keys.
I have not benchmarked much, but runtimes seem barely changed at all by that.

When a lot of keys are in memory, it improves memory use.

And, it prevents keys sometimes getting PINNED in memory and failing to GC,
which is a problem ByteString has sometimes. In particular, git-annex sync
from a borg special remote had that problem and this improved its memory
use by a large amount.

Sponsored-by: Shae Erisson on Patreon
2021-10-05 20:20:08 -04:00
Joey Hess
449851225a
refactor
IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.

It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 13:19:02 -04:00
Joey Hess
c4aba8e032
better handling of finishing up incomplete incremental verify
Now it's run in VerifyStage.

I thought about keeping the file handle open, and resuming reading where
tailVerify left off. But that risks leaking open file handles, until the
GC closes them, if the deferred verification does not get resumed. Since
that could perhaps happen if there's an exception somewhere, I decided
that was too unsafe.

Instead, re-open the file, seek, and resume.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 14:52:59 -04:00
Joey Hess
dadbb510f6
incremental hashing for fileRetriever
It uses tailVerify to hash the file while it's being written.

This is able to sometimes avoid a separate checksum step. Although
if the file gets written quickly enough, tailVerify may not see it
get created before the write finishes, and the checksum still happens.

Testing with the directory special remote, incremental checksumming did
not happen. But then I disabled the copy CoW probing, and it did work.
What's going on with that is the CoW probe creates an empty file on
failure, then deletes it, and then the file is created again. tailVerify
will open the first, empty file, and so fails to read the content that
gets written to the file that replaces it.

The directory special remote really ought to be able to avoid needing to
use tailVerify, and while other special remotes could do things that
cause similar problems, they probably don't. And if they do, it just
means the checksum doesn't get done incrementally.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 15:43:29 -04:00
Joey Hess
e07625df8a
convert tailVerify to not finalize the verification
Added failIncremental so it can force failure to verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 13:39:02 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
899983058f
add: When adding a dotfile, avoid treating its name as an extension. 2021-08-03 12:22:58 -04:00
Joey Hess
13c090b37a
use fastDebug everywhere it can be used
None of these are likely to yeild a noticable speedup though.
2021-04-06 15:41:24 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
dc9376feeb
optimisation
IORef rather than MVar sped up benchmark mentioned in last commit to
13.0s.

This makes me wonder if changing the interface to not need the IORef
either would improve speed further.
2021-02-10 16:39:41 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
ed684f651e
add incremental hashing interface to Backend
As yet unused.

Backend.External could perhaps implement it too, although that would
involve sending chunks of data to it via a pipe or something, so likely
to be slow.
2021-02-09 15:00:51 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
5f4228dc2b
types for async protocol extension
renamed AsyncMessage to ExceptionalMessage to make way for this new
extension.
2020-08-12 12:04:12 -04:00
Joey Hess
ea63d1dfe3
E variant of external backend keys 2020-07-29 17:12:22 -04:00
Joey Hess
b6fa4cb42f
external backends genKey and verifyKeyContent implemented
Only key translation for HasExt remains..
2020-07-29 16:36:32 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
d010ab04be
sped up the --all option by 2x to 16x by using git cat-file --buffer
This assumes that no location log files will have a newline or carriage
return in their name. catObjectStream skips any such files due to
cat-file not supporting them.

Keys have been prevented from containing newlines since 2011,
commit 480495beb4. If some old repo
had a key with a newline in it, --all will just skip processing that key.
Other things, like .git/annex/unused files certianly assume no newlines in
keys too, and AFAICR, such keys never actually worked.

Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys
generated before that point could perhaps contain a CR. (URL probably not,
http probably doesn't support an URL with a raw CR in it.) So, added
a warning in fsck about such keys. Although, fsck --all will naturally
skip them, so won't be able to warn about them. Not entirely
satisfactory, but I'll bet there are not really any such keys in
existence.

Thanks to Lukey for finding this optimisation.
2020-07-07 13:54:04 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
bbe977a2b6
fsck: Fix reversion in 8.20200226 that made it incorrectly warn that hashed keys with an extension should be upgraded. 2020-03-20 13:09:16 -04:00
Joey Hess
c31e1be781
convert KeySource to RawFilePath 2020-02-21 10:04:44 -04:00
Joey Hess
e468fbc518
simplify 2020-02-20 18:22:31 -04:00