Commit graph

1043 commits

Author SHA1 Message Date
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
2e069eb9f6
use putBucket to future-proof
New fields can be added to PutBucket in the future.
2018-12-31 13:09:20 -04:00
Joey Hess
3fdc6fdfa9
remove unused import 2018-12-30 15:18:49 -04:00
Joey Hess
a26514d67e
Fix doubled progress display when downloading an url when -J is used.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.

Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.

When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.

This commit was sponsored by John Pellman on Patreon.
2018-12-30 12:29:49 -04:00
Joey Hess
3f587d447a
fix webdav reversion
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)

This commit was sponsored by mo on patreon.
2018-12-10 12:49:51 -04:00
Joey Hess
4579dd6201
S3: Improve diagnostics when a remote is configured with exporttree and versioning, but no S3 version id has been recorded for a key.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.

When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.

In both cases, display a hopefully useful warning.

This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.

Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-12-06 13:44:37 -04:00
Joey Hess
1308a76bf1
deMaybe credPairRemoteKey
It's always Just
2018-12-04 13:37:43 -04:00
Joey Hess
a25fef36ad
fix json for exportedtrees in conflict
Repeating the same json field with multiple values tends to not be
supported well by json parsers, so list the trees separated by spaces.
2018-12-03 14:43:59 -04:00
Joey Hess
b8f9dea27d
add exportedtree to info
info: When used with an exporttree remote, includes an "exportedtree" info,
which is the tree last exported to the remote. During an export conflict,
multiple values will be listed.

This commit was sponsored by John Pellman on Patreon.
2018-12-03 14:36:00 -04:00
Robert Schütz
32017f8082
replace TORRENT by WITH_TORRENTPARSER 2018-11-27 12:29:25 -04:00
Joey Hess
370757087d
catch lockContentForRemoval exception
removeKey should not throw exceptions, so catch exception there

In Assistant.Unused, keep trying to drop other keys if one drop fails
2018-11-15 15:39:57 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
c472c268c4
webapp: Fixed a crash when adding a git remote.
Reversion introduced in 2b66492d6e which added a new cache that needs to be
cleared.
2018-10-29 16:01:08 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00
Joey Hess
fcca7adaff
instrument P2P --debug with connection and thread info
For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 15:52:11 -04:00
Joey Hess
c24e255de1
Fix concurrency bug that occurred on the first download from an exporttree remote
Block other threads while the export database is being constructed (or
updated) by the first thread to try to access it.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 12:59:10 -04:00
Joey Hess
a9dd087074
centralized "yes"/"no" parsing
This commit was sponsored by Jack Hill on Patreon.
2018-10-10 11:14:27 -04:00
Joey Hess
6f0d8870df
Fix crash when exporttree is set to a bad value.
Made it impossible to recover from setting a bad value since enableremote
to change it would crash.

This commit was sponsored by Henrik Riomar on Patreon.
2018-10-10 10:44:54 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
6134431254
clean P2P protocol shutdown on EOF try 2
Same goal as b18fb1e343 but without
breaking backwards compatability. Just return IO exceptions when running
the P2P protocol, so that git-annex-shell can detect eof and avoid the
ugly message.

This commit was sponsored by Ethan Aubin.
2018-09-25 16:49:59 -04:00
Joey Hess
bc31b93c77
remote.name.annex-security-allow-unverified-downloads
Added remote.name.annex-security-allow-unverified-downloads, a per-remote
setting for annex.security.allow-unverified-downloads.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-25 15:34:47 -04:00
Joey Hess
773084c49b
S3: Fix url construction bug
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.

As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.

This commit was sponsored by Eric Drechsel on Patreon.
2018-09-14 12:25:23 -04:00
Joey Hess
677038199c
fix build with older aws
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.

This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.

This commit was sponsored by Nick Piper on Patreon.
2018-09-13 09:58:39 -04:00
Joey Hess
445ea66732
simplify 2018-09-06 16:07:16 -04:00
Joey Hess
b7daf2685f
support public versioned S3 access
Makes git annex whereis display the versionId urls.

And, when a s3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-09-06 14:31:41 -04:00
Joey Hess
7407a80c27
S3: Support AWS_SESSION_TOKEN
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-09-05 15:53:57 -04:00
Joey Hess
53d839d543
more efficient encoding 2018-08-31 13:49:08 -04:00
Joey Hess
b3d42283ad
use per-remote metadata storage for S3 version ID
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 13:27:29 -04:00
Joey Hess
6b75b9c448
turn on appendonly when versioning is enabled 2018-08-31 10:53:07 -04:00
Joey Hess
19dcff2b71
use S3 version ID for retrieval
Have to store the S3 object along with the version ID, so retrieval can
use the same object.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 15:37:08 -04:00
Joey Hess
794e9a7a44
store S3 version IDs
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.

S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But.. The way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.

Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 14:30:56 -04:00
Joey Hess
0ff5a41311
S3 versioning=yes config
Not yet used.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 13:45:28 -04:00
Joey Hess
358178fbfb
don't untrust appendonly exports
Make exporttree=yes remotes that are appendonly not be untrusted, and not force
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply.

Note that all the remote operations on keys are left as usual for
appendonly export remotes, except for storing content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:48:04 -04:00
Joey Hess
8b39db20b5
export appendonly support
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:18:20 -04:00
Joey Hess
02630b39ee
add Remote.readonly
Does nothing yet.

Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:12:18 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
2884637cab
S3: Support credential-less download from remotes configured with public=yes exporttree=yes.
This commit was supported by the NSF-funded DataLad project.
2018-07-31 16:32:43 -04:00
Joey Hess
9f3a346f25
fix nested exception bug
Fix reversion introduced in version 6.20180316 that caused git-annex to
stop processing files when unable to contact a ssh remote.

The bug was not in any of the changed lines, but this one in inAnnex:

P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key

cantCheck throws an exception, but that parameter to runProto expects a
value, which it returns. So, inAnnex is returning a Bool containing an
exception. This defeats the usual checks for checkPresent throwing an
exception, crashing git-annex.

Fixed by making runProto take an `Annex a` instead of an `a`, so
passing cantCheck to it doesn't nest exceptions.

This commit was sponsored by andrea rota.
2018-07-03 13:10:43 -04:00
Joey Hess
dc6cb6aa5f
Merge branch 'later' 2018-06-25 21:59:20 -04:00
Joey Hess
a5228ac765
Support configuring remote.web.annex-cost and remote.bittorrent.annex-cost
Seems that has never worked before due to oversight.
2018-06-24 17:31:22 -04:00
Joey Hess
05ecee0db4
set ddar to RetrievalAllKeysSecure
Based on information from Robie Basak.
2018-06-21 16:38:47 -04:00