Ignore annex.numcopies set to 0 in gitattributes or git config, or by
git-annex numcopies or by --numcopies, since that configuration would make
git-annex easily lose data. Same for mincopies.
This is a continuation of the work to make data only be able to be lost
when --force is used. It earlier led to the --trust option being disabled,
and similar reasoning applies here.
Most numcopies configs had docs that strongly discouraged setting it to 0
anyway. And I can't imagine a use case for setting to 0. Not that there
might not be one, but it's just so far from the intended use case of
git-annex, of managing and storing your data, that it does not seem like
it makes sense to cater to such a hypothetical use case, where any
git-annex drop can lose your data at any time.
Using a smart constructor makes sure every place avoids 0. Note that this
does mean that NumCopies is for the configured desired values, and not the
actual existing number of copies, which of course can be 0. The name
configuredNumCopies is used to make that clear.
Sponsored-by: Brock Spratlen on Patreon
When this option is not used, there should be effectively no added
overhead, thanks to the optimisation in
b3cd0cc6ba.
When an action fails on a file, the size of the file still counts toward
the size limit. This was necessary to support concurrency, but also
generally seems like the right choice.
Most commands that operate on annexed files support the option.
export and import do not, and I don't know if it would make sense for
export to.. Why would you want an incomplete export? sync doesn't, and
while it would be easy to make it support it for transferring files,
it's not clear if dropping files should also take the size limit into
account. Commands like add that don't operate on annexed files don't
support the option either.
Exiting 101 not yet implemented.
Sponsored-by: Denis Dzyubenko on Patreon
The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.
Note that, on windows, normalise "./foo/bar" = "foo\\bar"
which a glob like "foo/*" won't match. So the glob is matched a second
time, on the toInternalGitPath, so allowing the user to provide a glob
with the slashes in either direction. However, this still won't support
some wacky edge cases like the user providing a glob of "foo/bar\\*"
Sponsored-by: Dartmouth College's Datalad project
If it's passed a ConfigKey such as annex.version, avoid returning
an empty remote name and return Nothing instead. Also, foo.bar.baz is
not treated as a remote named "bar".
Most of the changes here involve global option parsing: GlobalSetter
changed so it can both run an Annex action to set state, but can also
change the AnnexRead value, which is immutable once the Annex monad is
running.
That allowed a debugselector value to be added to AnnexRead, seeded
from the git config. The --debugfilter option's GlobalSetter then updates
the AnnexRead.
This improved GlobalSetter can later be used to move more stuff to
AnnexRead. Things that don't involve a git config will be easier to
move, and probably a *lot* of things can be moved eventually.
fastDebug, while implemented, is not used anywhere yet. But it should be
fast..
Since that can lead to data loss, which should never be enabled by an
option other than --force.
I suppose that using --trust was in some situation, safer than --force,
because it doesn't entirely disable checking for data loss, but only
disables checking involving data that is on the specified repository.
But it seems better to be able to say that data loss only happens with
--force.
This commit was sponsored by Graham Spencer on Patreon.
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.
Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.
This commit was sponsored by Denis Dzyubenko on Patreon.
It got broken in several ways by the streaming seeking optimisations
around version 8.20201007.
Moved time limit checking out of the matcher, which was a hack in the
first place. So everywhere that uses Limit.getMatcher needs to check
time limit. Well, almost everywhere. Command.Info uses it, but it does
not make sense to time limit getting info. And Command.MultiCast uses it
just to build up a list of files that then get passed to a command, so
it would never have hit the timeout in a useful way.
This implementation is a little more expensive when at time limit than
necessary, since it continues seeking only to discard everything after the
time limit. I did try making it close the file handles to force a faster
shutdown, but that didn't work and hung. Could certianly be improved
somehow, but seeking is probably not the expensive bit when a time limit
is hit, so this seems acceptable for now.
Always run Git.Config.store, so when the git config gets reloaded,
the override gets re-added to it, and changeGitRepo then calls extractGitConfig
on it and sees the annex.* settings from the override.
Remove any prior occurance of -c v and add it to the end. This way,
-c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2
Note that, if git had some multiline config that got built up by
multiple -c's, this would not work still. But it never worked because
before the bug got fixed in the first place, the -c value was repeated
many times, so the multivalue thing would have been wrong. I don't think
-c can be used with multiline configs anyway, though git-config does
talk about them?
Avoid running a large number of git cat-file child processes when run with
a large -J value.
This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.
So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
The parser and looking up config keys in the map should both be faster
due to using ByteString.
I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.
Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
Added the ability to run one job per CPU (core), by setting annex.jobs=cpus,
or using option --jobs=cpus or -Jcpus.
Built with future expansion in mind, including not defaulting matching on
Concurrency so more constructors can later be added, and using "cpu"
instead of "0".
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
No deprecation warning at run time, just one on the man page.
One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo but not without --branch
but I don't currently have a way to do that.
Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.
This commit was sponsored by Denis Dzyubenko on Patreon.
* findref: Support file matching options: --include, --exclude,
--want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
--exclude, --want-get, --want-drop to filenames from the branch.
Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.
This commit was sponsored by Jake Vosloo on Patreon.
Added annex.jobs setting, which is like using the -J option.
Of course, -J overrides annex.jobs.
This commit was sponsored by Trenton Cronholm on Patreon.
Useful for dropping old objects from cache repositories.
But also, quite a genrally useful thing to have..
Rather than imitiating find's -atime and other options, all of which are
pretty horrible to use, I made this match files accessed within a time
period, using the same duration format used by git-annex schedule and
--limit-time
In passing, changed the --limit-time option parser to parse the
duration, instead of having it later throw an error.
This commit was supported by the NSF-funded DataLad project.
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.
The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.
This commit was supported by the NSF-funded DataLad project.
Fix behavior of --json-progress followed by --json, in which
the latter option disabled the former.
This commit was supported by the NSF-funded DataLad project.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
optparse-applicative-0.14.0.0 adds support for these, so have the
Makefile install their scripts when built with it.
CmdLine/GitAnnex/Options.hs now uses action "file" in cmdParams,
which affects the bash and zsh completions, letting them complete
filenames for subcommands that use that. This is not needed for
bash, since bash-completion.bash enables -o bashdefault, which
lets it complete filenames too. But it does not seem to break the bash
completions. It is needed for zsh; the zsh completion otherwise
does not complete filenames. The fish completion will always complete
filenames no matter what. Messy.
This commit was sponsored by Denis Dzyubenko on Patreon.
Reworked remote name parsing to allow things like that. Command.Move
uses it for --to=here, although there's not yet an implementation of
that option.
This commit was sponsored by Ignacio on Patreon.
Added --securehash option to match files using a secure hash function, and
corresponding securehash preferred content expression.
This commit was sponsored by Ethan Aubin.
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.
This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.
Benchmarks ran in my big repo:
old git-annex info:
real 0m3.338s
user 0m3.124s
sys 0m0.244s
new git-annex info:
real 0m3.216s
user 0m3.024s
sys 0m0.220s
new git-annex find:
real 0m7.138s
user 0m6.924s
sys 0m0.252s
old git-annex find:
real 0m7.433s
user 0m7.240s
sys 0m0.232s
Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.
This commit was supported by the NSF-funded DataLad project.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.