This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.
Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.
This commit was sponsored by Denis Dzyubenko on Patreon.
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.
(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.
Over the course of 2 grueling days.
withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.
Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.
This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unncessary
inAnnex check for every file.
This commit was sponsored by Denis Dzyubenko on Patreon.
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.
So, add back otherwise redundant inAnnex check.
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.
Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.
Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.
This commit was sponsored by Ryan Newton on Patreon.
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.
Git.Repo also changed to use RawFilePath for the path to the repo.
This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
The hoped for optimisation of CommandStart with -J did not materialize.
In fact, not runnign CommandStart in parallel is slower than -J3.
So, CommandStart are still run in parallel.
(The actual bad performance I've been seeing with -J in my big repo
has to do with building the remoteList.)
But, this is still progress toward making -J faster, because it gets rid
of the onlyActionOn roadblock in the way of making CommandCleanup jobs
run separate from CommandPerform jobs.
Added OnlyActionOn constructor for ActionItem which fixes the
onlyActionOn breakage in the last commit.
Made CustomOutput include an ActionItem, so even things using it can
specify OnlyActionOn.
In Command.Move and Command.Sync, there were CommandStarts that used
includeCommandAction, so output messages, which is no longer allowed.
Fixed by using startingCustomOutput, but that's still not quite right,
since it prevents message display for the includeCommandAction run
inside it too.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.
Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.
Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.
git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.
This commit was supported by the NSF-funded DataLad project.
Show branch:file that is being operated on.
I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
The type checker should have noticed this, but the changes to mapM
that make it accept any Traversable hid the fact that it was not being
passed a list at all. Thus, what should have returned an empty list most
of the time instead returned [""] which was treated as the name of the
associated file, with disasterout consequences.
When I have time, I should add a test case checking what sync --content
drops. I should also consider replacing mapM with one re-specialized to
lists.
Fixes several bugs with updates of pointer files. When eg, running
git annex drop --from localremote
it was updating the pointer file in the local repository, not the remote.
Also, fixes drop ../foo when run in a subdir, and probably lots of other
problems. Test suite drops from ~30 to 11 failures now.
TopFilePath is used to force thinking about what the filepath is relative
to.
The data stored in the sqlite db is still just a plain string, and
TopFilePath is a newtype, so there's no overhead involved in using it in
DataBase.Keys.
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].
Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
This works, and seems fairly robust. Clean get of 20 files at -J3. At -J10,
there are some messages about ssh multiplexing, probably due to a race
spinning up the ssh connection cacher. But, it manages to get all the files
ok regardless.
The progress bars are a scrambled mess though, due to bugs in
ascii-progress, which I've already filed. Particularly this one:
https://github.com/yamadapc/haskell-ascii-progress/issues/8
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.
As part of this, preferred content checks pass the Key around.
Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.
I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
* numcopies: New command, sets global numcopies value that is seen by all
clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
command is used to set the global number of copies, any annex.numcopies
git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.
This global numcopies setting is needed to let preferred content
expressions operate on numcopies.
It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)
Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.
This commit was sponsored by Nicolas Pouillard.
I've been disliking how the command seek actions were written for some
time, with their inversion of control and ugly workarounds.
The last straw to fix it was sync --content, which didn't fit the
Annex [CommandStart] interface well at all. I have not yet made it take
advantage of the changed interface though.
The crucial change, and probably why I didn't do it this way from the
beginning, is to make each CommandStart action be run with exceptions
caught, and if it fails, increment a failure counter in annex state.
So I finally remove the very first code I wrote for git-annex, which
was before I had exception handling in the Annex monad, and so ran outside
that monad, passing state explicitly as it ran each CommandStart action.
This was a real slog from 1 to 5 am.
Test suite passes.
Memory usage is lower than before, sometimes by a couple of megabytes, and
remains constant, even when running in a large repo, and even when
repeatedly failing and incrementing the error counter. So no accidental
laziness space leaks.
Wall clock speed is identical, even in large repos.
This commit was sponsored by an anonymous bitcoiner.
Similar to the assistant, this honors any configured preferred content
expressions.
I am not entirely happpy with the implementation. It would be nicer if
the seek function returned a list of actions which included the individual
file gets and copies and drops, rather than the current list of calls to
syncContent. This would allow getting rid of the somewhat reundant display
of "sync file [ok|failed]" after the get/put display.
But, do that, withFilesInGit would need to somehow be able to construct
such a mixed action list. And it would be less efficient than the current
implementation, which is able to reuse several values between eg get and
drop.
Note that currently this does not try to satisfy numcopies when
getting/putting files (numcopies are of course checked when dropping
files!) This makes it like the assistant, and unlike get --auto
and copy --auto, which do duplicate files when numcopies is not yet
satisfied. I don't know if this is the right decision; it only seemed to
make sense to have this parallel the assistant as far as possible to start
with, since I know the assistant works.
This commit was sponsored by Øyvind Andersen Holm.