Commit graph

502 commits

Author SHA1 Message Date
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
773084c49b
S3: Fix url construction bug
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.

As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.

This commit was sponsored by Eric Drechsel on Patreon.
2018-09-14 12:25:23 -04:00
Joey Hess
547d01fd0e
releasing package git-annex version 6.20180913 2018-09-13 15:50:50 -04:00
Joey Hess
677038199c
fix build with older aws
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.

This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.

This commit was sponsored by Nick Piper on Patreon.
2018-09-13 09:58:39 -04:00
Joey Hess
2743224658
change v6 git-annex add of staged unmodified unlocked file
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.

Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.

The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.

This commit was sponsored by Jake Vosloo on Patreon.
2018-09-12 14:00:05 -04:00
Joey Hess
942b466293
wording 2018-09-11 16:03:58 -04:00
Joey Hess
fdbdf64d87
fix reversions due to undocumented and buggy git behavior
* Don't use GIT_PREFIX when GIT_WORK_TREE=. because it seems git
  does not intend GIT_WORK_TREE to be relative to GIT_PREFIX in that
  case, despite GIT_WORK_TREE=.. being relative to GIT_PREFIX.
* Don't use GIT_PREFIX to fix up a relative GIT_DIR, because
  git 2.11 sets GIT_PREFIX set to a path it's not relative to.
  and apparently GIT_DIR is never relative to GIT_PREFIX.

Commit e50ed4ba48 led us down this path
by working around a git bug by relying on the barely documented GIT_PREFIX.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-09-11 15:54:21 -04:00
Joey Hess
7407a80c27
S3: Support AWS_SESSION_TOKEN
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-09-05 15:53:57 -04:00
Joey Hess
b600ad71ce
make linkToAnnex freezeContent the object file
v6: Fix annex object file permissions when git-annex add is run on a
modified unlocked file, and in some related cases.

If a hard link is made, don't freeze it; annex.thin
uses writable object files.

Also: For some reason, linkToAnnex used to thawContent src. I can see no
reason why it needed to do that, so I eliminated that.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-05 15:27:22 -04:00
Joey Hess
69907e397f
revert a few problem areas of git-annex.cabal patch 2018-09-05 11:47:00 -04:00
Joey Hess
69d4c8dce6
devblog 2018-08-30 15:52:44 -04:00
Joey Hess
f54c72d2e1
Fix build on FreeBSD
This must have been broken for years..

This commit was sponsored by Jack Hill on Patreon.
2018-08-29 12:09:03 -04:00
Joey Hess
c565340adc
stop using external hash programs, since cryptonite is faster
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.

Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..

This commit was sponsored by Henrik Riomar on Patreon.
2018-08-28 18:10:58 -04:00
Joey Hess
759a87ad70
fix git command queue to be concurrency safe
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.

Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.

Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-08-28 13:16:33 -04:00
Joey Hess
10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00
Joey Hess
50fa17aee6
v6: recover from race between git mv and git-annex get/drop
Update pointer file next time reconcileStaged is run to recover from the
race.

Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So, normally by the time the git
annex get/drop command finishes, the race has already been dealt with.
It may be that, in some case, that won't happen and the race will be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.

This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But, the case where the file was
supposed to be gotten but is not populated is not handled yet.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 15:56:43 -04:00
Joey Hess
5e56d9b620
v6: Update associated files database when git has staged changes to pointer files
This commit was supported by the NSF-funded DataLad project.
2018-08-21 17:02:20 -04:00
Joey Hess
fa44bca8b3
linux standalone: When LOCPATH is already set, use it instead of the bundled locales.
It can be set to an empty string to use the system locales too. Of course
whether that will work depends on the amount of divergence.

This commit was supported by the NSF-funded DataLad project.
2018-08-20 12:20:54 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
a96972015d
massive v6 add speed/memory improvement
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.

This commit was supported by the NSF-funded DataLad project.
2018-08-09 18:17:46 -04:00
Joey Hess
12460fcea6
make --batch honor matching options
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.

Affected commands: find, add, whereis, drop, copy, move, get

In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?

Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.

git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-08-08 12:07:06 -04:00
Joey Hess
947599aad4
releasing package git-annex version 6.20180807 2018-08-07 16:22:16 -04:00
Joey Hess
2503cd63d0
prep release 2018-08-06 20:30:38 -04:00
Joey Hess
4c918437ab
Fix git-annex branch data loss that could occur after git-annex forget --drop-dead
Added getStaged, to get the versions of git-annex branch files staged in its
index, and use during transitions so the result of merging sibling branches
is used.

The catFileStop in performTransitionsLocked is absolutely necessary,
without that the bug still occurred, because git cat-file was already
running and was looking at the old index file.

Note that getLocal still has cat-file look at the git-annex branch, not the
index. It might be faster if it looked at the index, but probably only
marginally so, and I've not benchmarked it to see if it's faster at all. I
didn't want to change unrelated behavior as part of this bug fix. And as
the need for catFileStop shows, using the index file has added
complications.

Anyway, it still seems fine for getLocal to look at the git-annex branch,
because normally the index file is updated just before the git-annex branch
is committed, and so they'll contain the same information. It's only during
a transition that the two diverge.

This commit was sponsored by Paul Walmsley in honor of Mark Phillips.
2018-08-06 17:36:30 -04:00
Joey Hess
38ddd6072d
addurl: Include filename in --json-progress output when known. 2018-08-06 12:53:44 -04:00
Joey Hess
1a02fc1159
Fix wrong sorting of remotes when using -J
It was sorting by uuid, rather than cost!

Avoid future bugs of this kind by changing the Ord to primarily compare
by cost, with uuid only used when the cost is the same.

This commit was supported by the NSF-funded DataLad project.
2018-08-03 13:10:50 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
6e6c9cc6d3
Added --accessedwithin matching option.
Useful for dropping old objects from cache repositories.

But also, quite a genrally useful thing to have..

Rather than imitiating find's -atime and other options, all of which are
pretty horrible to use, I made this match files accessed within a time
period, using the same duration format used by git-annex schedule and
--limit-time

In passing, changed the --limit-time option parser to parse the
duration, instead of having it later throw an error.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 15:34:03 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
2884637cab
S3: Support credential-less download from remotes configured with public=yes exporttree=yes.
This commit was supported by the NSF-funded DataLad project.
2018-07-31 16:32:43 -04:00
Joey Hess
e1ab01f94d
Fix reversion in display of http 404 errors.
Switch to using http-client for large file downloads caused the reversion;
the code for displaying a 404 response was instead displaying the raw html
document, which is not useful.

This commit was sponsored by Ryan Newton on Patreon.
2018-07-31 12:15:26 -04:00
Joey Hess
e8ff5d8c66
releasing package git-annex version 6.20180719 2018-07-19 13:53:59 -04:00
Joey Hess
d986b57134
reorder 2018-07-18 14:48:06 -04:00
Joey Hess
22ff136230
prep for release tomorrow 2018-07-18 14:45:44 -04:00
Joey Hess
081f8e57c6
Support working trees set up by git-worktree.
Support working trees set up by git-worktree, by setting up some symlinks
such that git-annex links work right.

Also improved support for repositories created with --separate-git-dir.
At least recent git makes a .git file for those (older may have used a
symlink?), so that also needs to be converted to a symlink.

This commit was sponsored by Nick Piper on Patreon.
2018-07-18 14:27:26 -04:00
Joey Hess
e50ed4ba48
work around git bug
Work around git bug that runs smudge/clean filters at the top of the
repository while passing them a relative GIT_WORK_TREE that may point
outside of the repository, by using GIT_PREFIX to get back to the
subdirectory where a relative GIT_WORK_TREE is valid.

git devs have been informed of the bug and may fix it, which could conveivably
break this fix, but as it is, this works back to git 1.7.6.

This commit was sponsored by Jochen Bartl on Patreon.
2018-07-17 14:27:39 -04:00
Joey Hess
50609da787
fix User-Agent reversion
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.

This commit was sponsored by mo on Patreon.
2018-07-16 11:56:47 -04:00
Joey Hess
cc2cb46857
unused --from: Allow specifiying a repository by uuid or description.
This commit was sponsored by Jake Vosloo on Patreon.
2018-07-11 16:01:35 -04:00
Joey Hess
25ec8ec4c6
update re writable HOME with standalone bundle 2018-07-10 14:22:37 -04:00
Joey Hess
e802323071
deal with the persistent locpath issue
linux standalone: Generate locale files in ~/.cache/git-annex/locales/ so
they're available even when the standalone tarball is installed in a
directory owned by root.

This avoids a full-on reference counting cleanup hell, by letting old
locale caches linger as long as the standalone bundle directory associated
with them is still around. Old ones get cleaned up.

In the case where the directory has a new bundle unpacked over top of it,
the old locale cache is invalidated and rebuilt. Of course, running
programs using that may get confused, but this was already the case, and
unpacking over top of a bundle is probably not a good idea anyhow.

To support that, added a buildid file, which only needs to be unique across
builds of git-annex with different libc versions. sha1sum of git-annex
seems good enough for that.

Removed debian/patches/standalone-no-LOCPATH as it's no longer
necessary.

This commit was supported by the NSF-funded DataLad project.
2018-07-10 12:13:19 -04:00
Joey Hess
3dd7f450c1
fix p2p --pair
p2p --pair: Fix interception of the magic-wormhole pairing code, which
since 0.8.2 it has sent to stderr rather than stdout.

This is highly annoying because I had asked the magic wormhole developers
for a machine-readable way to get the data, and instead they changed how
the data was output, and didn't even mention this in my issue, or in the
changelog.

Seems this needs to be tested periodically to make sure it's still working.

This commit was sponsored by Ethan Aubin.
2018-07-04 15:14:03 -04:00
Joey Hess
9f3a346f25
fix nested exception bug
Fix reversion introduced in version 6.20180316 that caused git-annex to
stop processing files when unable to contact a ssh remote.

The bug was not in any of the changed lines, but this one in inAnnex:

P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key

cantCheck throws an exception, but that parameter to runProto expects a
value, which it returns. So, inAnnex is returning a Bool containing an
exception. This defeats the usual checks for checkPresent throwing an
exception, crashing git-annex.

Fixed by making runProto take an `Annex a` instead of an `a`, so
passing cantCheck to it doesn't nest exceptions.

This commit was sponsored by andrea rota.
2018-07-03 13:10:43 -04:00
Joey Hess
14557a3ff6
git-annex.cabal: Fix network version.
Needed for hostAddressToTuple.

Which means the build flag for the network-uri split is no longer needed.
2018-07-01 13:07:24 -04:00
Joey Hess
a63bbd868b
make addurl of media url fail when youtube-dl is disabled
addurl: When security configuration prevents downloads with youtube-dl,
still check if the url is one that it supports, and fail downloading it,
instead of downloading the raw web page.
2018-06-28 13:01:18 -04:00
Joey Hess
dc6cb6aa5f
Merge branch 'later' 2018-06-25 21:59:20 -04:00
Joey Hess
3160cadba3 git-annex version 6.20180626
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKKUAw1IH6rcvbA8l2xLbD/BfjzgFAlstCaQACgkQ2xLbD/Bf
 jzh5nxAAn7D9soTI0ex6AVDDo2CjOyTTDVrIcl2h5XizfuUD3ev5P0TR3BZmzpAb
 MI6uaZ8kxqZ/eGAsBTyH9PsV7QVYIdht9t89ytP4xWyTQiOgjyJeA6PnJl4zVK9z
 Y8Of3mlylaz+97+sndljpsvy/KHENrHI7HHd+qxAu7wKysJxG6fJB7CjremkjaCI
 zAwg3mIy72ZKyuR/8hL9puJN9fdfw1ulkzQR+he007e/HkurPCwgRAOYW/Aa2tpY
 Oigdb9a6/0nl/VnOS8ZyHrSPRrhLH9c4IBmsdC1Xt5NDVmID/sWgD9uPF9dsHSMF
 OM25QdSlJ5cSNg+/XCpmmhC9MjgKkuVNpZ/fWBaHFs6KYgGhtZcAayQdz5AmMS2N
 HTPWB1IxZiV5TQHQpLbdH/q3RfNtRq1G1tc24zpd/zdhzijeTM6D8n4No6LXNq8X
 7U0qcrp9TdLOpBCTf6Jrg/7qFaXddHoEW1e3KrsOmB0hlYHuNxfY4bs0+ROeXGOT
 00koezcbF8kEI0ekoDvJjtVqaUq+608YjJZ5v7dE0vbtTj0KGbl5EHwC9atUluCX
 MHyTDY89uq68g4HIDytL001ZLvE3EUGJc4jh3+OMDzuZSKB5uwJIIky+qIaQu34K
 QJrZuyAIY0sVFV6LUX9nwqTW6Nnx/bB+kZ6k0+gx+Lpf7pUpE+o=
 =kex4
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKKUAw1IH6rcvbA8l2xLbD/BfjzgFAlsxnX4ACgkQ2xLbD/Bf
 jzjK1xAAnJ58ZxLyTYlCZRcKiR81UHS/Mk6+SDAjRIRbT0SsY+6gSP55XKjrcuOb
 Jatp+6cNNSgk2lBpn37mq+rYIqboFh9moDRK7JSh1mDHCVtIwdARGblFRfuwaWPi
 xHnu+Pj43+SP7OF+8qP8/kDM+js3iMS+0gvBBz8pQN/yJDROXii6u0eONOd7vbER
 iRY9QpJdj5lp3hjaWfXt5iJC0re0eOAY4eUSHPsFIASysShnn33dFPOZ2hbhRKjR
 unQHUVIUE+ehmW3w9qIqn+9v2kca7laGK11cvzYRpmu/9rrvpf+RF1h42S8822dP
 CKHvxDkBGbyqTA+F9/6zpU1i9/ARgHFDpScRcdq7ZJi9FbWabKDklHCsgxwrkdXb
 +FXgb7N5Sa4+eVDNUf4rxldtLPX53nrtZ3IqrGiCWApCvbysNyP5kE0nix02l9z2
 xzY2vlpicx7TOMoO9mZesSFNgRzuFAbbya/zDJrz+xfgSRYXRYg58yTpmhpTFvSI
 h3Fw6+MYvehvRdAweLtoQt2p/UV2MAWrTpNzFoqgf2OCQOiH97ACDHn8Yki9rnQi
 NuMsqv9WOYQs4SaygDZMKemgAxftf3uaXiBW0RzHHwwWnDjHhqsEioOvOhNNyZbz
 U3OjKrH1JZlkNHlIBQD4BsWGLlIct66ZTU3k2OxPEp+mpEG/Xi4=
 =p+cW
 -----END PGP SIGNATURE-----

Merge tag '6.20180626' - previously embargoed security release
2018-06-25 21:56:43 -04:00
Joey Hess
6091b7b9db
info: Display uuid and description when a repository is identified by uuid, and for "here". 2018-06-24 17:38:18 -04:00
Joey Hess
a5228ac765
Support configuring remote.web.annex-cost and remote.bittorrent.annex-cost
Seems that has never worked before due to oversight.
2018-06-24 17:31:22 -04:00
Joey Hess
57dc30a029
finalize release 2018-06-22 10:37:01 -04:00
Joey Hess
dab55715da
add link to advistory 2018-06-22 10:27:22 -04:00