Commit graph

161 commits

Author SHA1 Message Date
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
894716512d
add a UUIDDesc type containing a ByteString
Groundwork for handling uuid.log using ByteString
2019-01-01 16:17:54 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
cf6545198b
a few last changes post xmpp removal 2016-12-28 12:29:21 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
870873bdaa
Removed dependency on json library; all JSON is now handled by aeson.
I've eyeballed all --json commands, and the only difference should be
that some fields are re-ordered.
2016-07-26 19:15:34 -04:00
Joey Hess
5c92b8e034
unused import 2016-07-26 14:52:07 -04:00
Joey Hess
a030d0a8b7
allow using Aeson for streaming JSON output
Keeping Text.JSON use for now, because it seems a better fit for most of
the commands, which don't use very structured JSON objects, but just output
whatever fields suites them. But this lets Aeson be used when a more
structured data type is available to serialize to JSON.
2016-07-26 13:30:07 -04:00
Joey Hess
91df4c6b53
Pass the various gnupg-options configs to gpg in several cases where they were not before.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.

RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
2016-05-23 17:03:20 -04:00
Gabor Greif
108f829543
Remove redundant Show constraint 2016-01-28 12:34:07 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
b26ce646e4
whereis --json: Urls are now listed inside the remote that claims them, rather than all together at the end. 2016-01-15 14:16:48 -04:00
Joey Hess
640dba43b6
enableremote: List uuids and descriptions of remotes that can be enabled, and accept either the uuid or the description in leu if the name. 2015-10-26 14:55:40 -04:00
Joey Hess
8c54069980
add a hint about syncing when unable to determine remote uuid
syncing gets the uuid in passing..
2015-10-15 15:23:48 -04:00
Joey Hess
45e1a7c361
verify local copy of content with locking 2015-10-09 14:57:32 -04:00
Joey Hess
c5b8484c2e Simplify setup process for a ssh remote.
Now it suffices to run git remote add, followed by git-annex sync. Now the
remote is automatically initialized for use by git-annex, where before the
git-annex branch had to manually be pushed before using git-annex sync.
Note that this involved changes to git-annex-shell, so if the remote is
using an old version, the manual push is still needed.

Implementation required git-annex-shell be changed, so configlist can
autoinit a repository even when no git-annex branch has been pushed yet.
Unfortunate because we'll have to wait for it to get deployed to servers
before being able to rely on this change in the documentation.

Did consider making git-annex sync push the git-annex branch to repos that
didn't have a uuid, but this seemed difficult to do without complicating it
in messy ways.

It would be cleaner to split a command out from configlist to handle
the initialization. But this is difficult without sacrificing backwards
compatability, for users of old git-annex versions which would not use the
new command.
2015-08-05 13:49:58 -04:00
Joey Hess
67f7f1b1cb info: Added json output for "backend usage", "numcopies stats", "repositories containing these files", and "transfers in progress". 2015-06-16 13:50:28 -04:00
Joey Hess
607eed0de2 improve messages 2015-04-30 14:10:28 -04:00
Joey Hess
b14ec45aa8 info dir: Added information about repositories that contain files in the specified directory.
This is a nearly free feature; it piggybacks on the location log lookups
done for the numcopies stats. So, the only extra overhead is updating
the map of repository sizes.

However, I had to switch to Data.Map.Strict, which needs containers 0.5.
If backporting to wheezy, will probably need to revert this commit.
2015-04-12 12:54:41 -04:00
Joey Hess
2b79e6fe08 a few hlints 2015-04-11 00:10:34 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
23d2519be1 Fix wording of message displayed when unable to get a file that is available in untrusted repositories. 2015-01-16 13:29:54 -04:00
Joey Hess
3abb775855 remove unnecessary qualification 2015-01-16 13:14:44 -04:00
Joey Hess
aba3e11776 sync: Now supports remote groups, the same way git remote update does. 2014-12-29 13:42:58 -04:00
Joey Hess
7ae16bb6f7 Revert "let url claims optionally include a suggested filename"
This reverts commit 85df9c30e9.

Putting filename in the claim was a bad idea.
2014-12-11 14:09:57 -04:00
Joey Hess
85df9c30e9 let url claims optionally include a suggested filename 2014-12-11 12:47:57 -04:00
Joey Hess
30bf112185 Urls can now be claimed by remotes. This will allow creating, for example, a external special remote that handles magnet: and *.torrent urls. 2014-12-08 19:15:07 -04:00
Joey Hess
b61c6bc2ff hlint 2014-10-09 15:46:05 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
b4cf22a388 pushed checkPresent exception handling out of Remote implementations
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:

1. They can all be caught in one place (Remote.hasKey), and we know
   every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
   Remotes to be able to be implemented without needing to worry about
   catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
   Bool, which all the other relevant remote methods already did.
   I do not see a good way to generalize that type; my previous attempts
   failed miserably.
2014-08-06 13:45:19 -04:00
Joey Hess
fed509fb3e assistant: When there are multiple remotes giving different ways to access the same repository, honor remote cost settings and use the cheapest available.
Note that TransferInfo does not always contain the Remote, although
any transfer added to the TransferQueue does have a Remote in its
TransferInfo. The transferkeys command still accepts a UUID, which is
useful to handle upgrades, where an old assistant version runs the new
transferkeys.

This commit was sponsored by Kalle Svensson.
2014-05-19 16:19:33 -04:00
Joey Hess
f80f0e12fe Simplified repository description line format. The remote name, if any, is always in square brackets after the description. 2014-04-26 21:28:49 -03:00
Joey Hess
fb73792f72 make git-remote-daemon ssh transport robust
* Remote system might be available, and connection get lost. Should
  reconnect, but needs to avoid bad behavior (ie, constant reconnect
  attempts.) Use exponential backoff.

* Detect if old system had a too old git-annex-shell, and show the user
  a nice message in the webapp. Required parsing error messages, so perhaps
  this code shoudl be removed once enough time has passed..

* Switch the protocol to using remote URI's, rather than remote names.
  Names change. Also avoids issues with serialization of names containing
  whitespace.

This is nearly ready for merge into master now. I'd still like to make the ssh
transport smarter about reusing ssh connection caching during git pull.

This commit was sponsored by Jim Paris.
2014-04-09 14:14:55 -04:00
Joey Hess
005e8744c1 Commands that allow specifying which repository to act on using the repository's description will now fail when multiple repositories match, rather than picking a repository at random.
So will --in=

Note that since limitIn is not used by preferred content expressions,
it's ok for it to throw an exception.
2014-03-13 15:36:58 -04:00
Joey Hess
40cec65ace more hlint 2014-02-11 10:48:52 -04:00
Joey Hess
bbef0cddfd improve sync with xmpp and annex-ignore
* sync --content: Honor annex-ignore configuration.
* sync: Don't try to sync with xmpp remotes, which are only currently
  supported when using the assistant.
2014-02-01 10:33:55 -04:00
Joey Hess
b6ba0bd556 sync --content: New option that makes the content of annexed files be transferred.
Similar to the assistant, this honors any configured preferred content
expressions.

I am not entirely happpy with the implementation. It would be nicer if
the seek function returned a list of actions which included the individual
file gets and copies and drops, rather than the current list of calls to
syncContent. This would allow getting rid of the somewhat reundant display
of "sync file [ok|failed]" after the get/put display.

But, do that, withFilesInGit would need to somehow be able to construct
such a mixed action list. And it would be less efficient than the current
implementation, which is able to reuse several values between eg get and
drop.

Note that currently this does not try to satisfy numcopies when
getting/putting files (numcopies are of course checked when dropping
files!) This makes it like the assistant, and unlike get --auto
and copy --auto, which do duplicate files when numcopies is not yet
satisfied. I don't know if this is the right decision; it only seemed to
make sense to have this parallel the assistant as far as possible to start
with, since I know the assistant works.

This commit was sponsored by Øyvind Andersen Holm.
2014-01-19 17:49:54 -04:00
Joey Hess
46cf00ffd8 Fix bug that caused bad information to be written to the git-annex branch when running describe or other commands with a remote that has no uuid.
Still need to fix crash caused by the bad info.
2013-11-09 13:37:30 -04:00
Joey Hess
958312885f webapp: Improve UI around remote that have no annex.uuid set, either because setup of them is incomplete, or because the remote git repository is not a git-annex repository.
Complicated by such repositories potentially being repos that should have
an annex.uuid, but it failed to be gotten, perhaps due to the past ssh repo
setup bugs. This is handled now by an Upgrade Repository button.
2013-11-07 18:02:00 -04:00
Joey Hess
a7821c0581 automatically launch git repository repair
Added a RemoteChecker thread, that waits for problems to be reported with
remotes, and checks if their git repository is in need of repair.

Currently, only failures to sync with the remote cause a problem to be
reported. This seems enough, but we'll see.

Plugging in a removable drive with a repository on it that is corrupted
does automatically repair the repository, as long as the corruption causes
git push or git pull to fail. Some types of corruption do not, eg
missing/corrupt objects for blobs that git push doesn't need to look at.

So, this is not really a replacement for scheduled git repository fscking.
But it does make the assistant more robust.

This commit is sponsored by Fernando Jimenez.
2013-10-27 16:42:13 -04:00
Joey Hess
1ffb3bb0ba add remote fsck interface
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.

Cronner user this when available, and when not falls back to
fsck --fast --from remote

git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.

This commit was sponsored by Damien Diederen.
2013-10-11 16:03:18 -04:00