Commit graph

1052 commits

Author SHA1 Message Date
Joey Hess
d1961e4498
back out incorrect IO interleaving change
Fix regression in last release that crashes when using --all or running
git-annex in a bare repository. May have also affected git-annex unused and
git-annex info.

Reversed the order of the (++) in Annex.Branch.files so --all will stream
lazily still when there are not a bunch of uncommitted journal files.
Added a todo to maybe improve this later.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-05-08 13:54:42 -04:00
Joey Hess
bea0ad220a
avoid --all buffering list of all keys
In Annex.Branch.branch, the (++) was killing laziness.
Rewrote so it streams lazily.

filterM also kills laziness, so made loggedKeys use a Unchecked type,
and check if the key is dead in the seek loop.

Note that loggedKeysFor still buffers, so git-annex info <remote> and
git-annex unused --from remote still use more memory than necessary.

Also removed some unused functions from Annex.Journal.
2018-04-26 16:00:20 -04:00
Joey Hess
f56594af9e
finish fixing inverted Ord for TrustLevel
Flipped all comparisons. When a TrustLevel list was wanted from Trusted
downwards, used Down to compare it in that order.

This commit was sponsored by mo on Patreon.
2018-04-13 15:17:54 -04:00
Joey Hess
a0e4b9678b
fix inverted Ord for TrustLevel (intermediate commit)
This commit removes the Ord and Enum instances, commenting out all code
that depends on them, to make sure that all code effected by the
inversion fix has been identified.

(Assuming no ifdefs involve TrustLevel.)

The next commit will fix up all the identified code.
2018-04-13 14:50:14 -04:00
Joey Hess
af8546990d
move: --safe/--unsafe and potential drop race fix
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.

This commit was sponsored by Ethan Aubin.
2018-04-09 16:20:10 -04:00
Joey Hess
de552bd469
fix thinko for the second time
It's almost like this part of haskell syntax is not very good..
2018-04-09 13:10:44 -04:00
Joey Hess
61aa56465b
fix pattern match 2018-04-06 23:11:20 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2ec07bc29f
Avoid running annex.http-headers-command more than once. 2018-04-04 15:15:08 -04:00
Joey Hess
ef389722ae
don't copy old date metadata when adding new version of a file
When adding a new version of a file, and annex.genmetadata is enabled,
don't copy the data metadata from the old version of the file, instead use
the mtime of the file. Rationalle being that the user has requested to
generate metadata and so would expect to get the new mtime into metadata.

Also, avoid warning about copying metadata when all the old metadata is
date metadata. Which was rather the harder part.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-04-04 13:58:16 -04:00
Joey Hess
9ec1d6b077
add units 2018-03-29 13:31:53 -04:00
Joey Hess
961fa377d9
Also do forward retrying in cases where no exception is thrown, but the transfer failed.
I think this used to be the case, but it was accidentially lost way back in
commit 3887432c54. Normally, transfers do not
throw exceptions, so probably forward retrying was rarely done due to that
oversight.

This also affects the new annex.retry etc configuration. If a transfer
fails, without making any progress, eg because the file is not present on
the remote or the remote is not accessible, it will now retry when
configuration calls for it. In some cases such a retry is not desirable,
for example the remote could be accessible and not have a copy of the file
that the local repo thinks it has. I see no way to distinguish such cases
from cases where a retry should really be done. So, it'll be up to the user
to configure it to work for them.
2018-03-29 13:22:49 -04:00
Joey Hess
46d4316954
implement annex.retry et al
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.

This commit was supported by the NSF-funded DataLad project.
2018-03-29 13:04:07 -04:00
Joey Hess
ac6f58d642
fix ssh warmup hang
Fix race condition in ssh warmup that caused git-annex to get stuck and
never process some while when run with high levels of concurrency.

So far, I've isolated the problem to processTranscript, which hangs
reading output from ssh in this situation. I don't yet understand why
processTranscript behaves that way.

Since here we don't care about the ssh output, and only want to /dev/null
it, changed to not use processTranscript, avoiding its problem.

This commit was supported by the NSF-funded DataLad project.
2018-03-15 15:04:15 -04:00
Joey Hess
ed81762c86
avoid compiler warning
add type sig so it's clear createtfile returns unit
2018-03-15 13:21:32 -04:00
Joey Hess
10d3b7fc62
Fix reversion introduced in 6.20171214 that caused concurrent transfers to incorrectly fail with "transfer already in progress".
Avoid creating transfer info file before transfer lock is created and
locked.

The wrong order for one thing caused transfer info to be overwritten
when a transfer was already in progress.

But worse, it caused checkTransfer to see the transfer info,
and so lock the transfer lock in order to verify the transfer was not in
progress. Which in a concurrent situation, prevented the transferrer
from locking the transfer lock, so it failed with "transfer already in
progress".

Note that the transferinfo command does not lock the transfer lock
before creating the transfer info. But, that's only run after
recvkey is running, and recvkey does lock the transfer lock, so that
seems more or less ok. (Other than being a super complicated legacy mess
that the P2P code has mostly obsoleted now.)

This commit was supported by the NSF-funded DataLad project.
2018-03-14 18:55:34 -04:00
Joey Hess
4015c5679a
force verification when resuming download
When resuming a download and not using a rolling checksummer like rsync,
the partial file we start with might contain garbage, in the case where a
file changed as it was being downloaded. So, disabling verification on
resumes risked a bad object being put into the annex.

Even downloads with rsync are currently affected. It didn't seem worth the
added complexity to special case those to prevent verification, especially
since git-annex is using rsync less often now.

This commit was sponsored by Brock Spratlen on Patreon.
2018-03-13 14:50:49 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
3dd43df9c2
Better ssh connection warmup when using -J for concurrency.
Avoids ugly messages when forced ssh command is not git-annex-shell.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-07 17:30:14 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
09e73a3ab6
annex.merge-annex-branches
Added annex.merge-annex-branches config setting which can be used to
disable automatic merge of git-annex branches.

I wonder if git-annex merge/sync/assistant should disable this
setting? Not sure yet, so have not done so. May be that users will not set
it in git config, but pass it via -c to commands that need it.

Checking the config setting adds a very small overhead, but it's
only checked once per command so should be insignificant.

This commit was supported by the NSF-funded DataLad project.
2018-02-22 14:25:32 -04:00
Joey Hess
2e25185a9c
Remove temporary code added in 6.20160619 to prime the mergedrefs log.
Repositories that are upgraded from before that version to this
one will not break, but will just not see the benefit of the mergedrefs log
speeding things up, until one new ref gets merged in.
2018-02-22 12:31:27 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
faf03ee4da
more core.sharedRepository perm fixes
Fix more places where files in .git/annex/ were written with modes that
did not take the core.sharedRepository config into account.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-01-04 14:46:58 -04:00
Joey Hess
24df95f0f6
Fix several places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account.
git grep writeFile finds some more that might also be problems, but
for now I've concentrated on .git/annex/ log files. There are certianly
cases where writeFile is not a problem too.

This commit was sponsored by mo on Patreon.
2018-01-02 17:25:25 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
2bfdd690e2
addurl: Fix encoding of filename queried from youtube-dl when in --fast mode.
And also now in non-fast mode, since it was just changed to query for the
filename separately.

And avoid processTranscript which mixed up stdout and stderr and could have
led to weirdness if there were warnings that didn't get suppressed.
2017-12-31 15:19:01 -04:00
Joey Hess
fcdd9ce788
repeated addurl behavior reversion fix
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-12-31 14:55:51 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
664a4d1873
remove ifdef for older base than git-annex.cabal allows 2017-12-14 13:45:50 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
f824a1f41d
reorg 2017-12-14 11:26:59 -04:00
Joey Hess
3cc94c1667
.noannex file
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.

This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.

This commit was supported by the NSF-funded DataLad project.
2017-12-13 14:34:32 -04:00
Joey Hess
cc6f5d6e49
avoid trying youtube-dl for ftp and file url schemes
This commit was sponsored by John Peloquin on Patreon.
2017-12-11 12:46:34 -04:00
Joey Hess
8990afaef0
fix regression in addurl --fast caused by youtube-dl support
Similar to c6e4bc0a22 but another code
path. As well as using youtube-dl unecessarily, it used the filename it
comes up with, which while nice for youtube videos, is not right for
other files.

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

This commit was supported by the NSF-funded DataLad project.
2017-12-08 14:51:17 -04:00
Joey Hess
c6e4bc0a22
fix regression in addurl --file caused by youtube-dl support
Now youtubeDlCheck downloads the beginning of the url's content and
checks if it's html, only when it is does it pass it off the youtube-dl
to check if it supports it.

This means more work is done for urls that youtube-dl does support,
but is probably more efficient for other urls, since it only downloads
the first chunk of content, while youtube-dl probably downloads more.

As well as the reported bug, this also fixes behavior when an url
was added with youtube-dl, but the url content has now changed from
a html page to something else. Remote.Web.checkKey used to wrongly
succeed in that situation, since youtube-dl said sure it can download
that something else.

This commit was supported by the NSF-funded DataLad project.
2017-12-06 13:22:31 -04:00
Joey Hess
7c98f14391
avoid build warning when built w/o dbus 2017-12-06 10:54:48 -04:00
Joey Hess
fc845e6530
more lambda-case conversion 2017-12-05 15:00:50 -04:00
Joey Hess
639a6df58a
fix windows build 2017-12-05 13:11:03 -04:00
Joey Hess
1228fe8c86
honor annex.diskreserve when running youtube-dl
This commit was sponsored by André Pereira on Patreon.
2017-11-30 16:14:36 -04:00
Joey Hess
bbedc1c265
check youtube-dl for --fast and --relaxed when adding new file
The filename comes from youtube-dl also.

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-11-30 14:57:20 -04:00
Joey Hess
2528e3ddb0
rethought --relaxed change
Better to make it not be surprising and slow, than surprising and fast.
--raw can be used when it needs to be really fast.

Implemented adding a youtube-dl supported url to an existing file.

This commit was sponsored by andrea rota.
2017-11-30 14:13:20 -04:00
Joey Hess
8a0038ec23
avoid warning when youtube-dl is not installed
If a user does not have it installed, don't warn on every imported item
about it.
2017-11-30 13:43:55 -04:00
Joey Hess
22a9389bc7
fix build 2017-11-30 13:21:19 -04:00
Joey Hess
31b4d7c6d0
pass git config options to youtube-dl --simulate
Decided not to --ignore-config by default. It the user has something in
their youtube-dl config files that breaks git-annex they can configure
it to use that option.
2017-11-29 20:07:03 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
99bebdface
youtube-dl working
Including resuming and cleanup of incomplete downloads.

Still todo: --fast, --relaxed, importfeed, disk reserve checking,
quvi code cleanup.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-11-29 16:40:32 -04:00
Joey Hess
4e7e1fcff4
add gitAnnexTmpWorkDir and withTmpWorkDir
Needed to run youtube-dl in, but could also be useful for other stuff.

The tricky part of this was making the workdir be cleaned up whenever the
tmp object file is cleaned up.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-29 13:53:39 -04:00