This can happen when ingesting a new file in either locked or unlocked
mode, when some unlocked files in the repo use the same key, and the
content was not locally available before.
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].
Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
Most of the time, there will be no discreprancy between programPath and
readProgramFile.
But, the programFile might have been written by an old version of git-annex
that is still installed, while a newer one is currently running. In this
case, we want to run the same one that's currently running.
This is especially important for things like the GIT_SSH=git-annex used for
ssh connection caching.
The only code that still uses readProgramFile directly is the upgrade code,
which needs to know where the standalone git-annex was installed, in order to
upgrade it.
Note that while the assistant detects changes made to remote names, I left
the commit message fixed rather than calculating it after every commit. It
doesn't seem worth the CPU to do the latter.
* init: Repository tuning parameters can now be passed when initializing a
repository for the first time. For details, see
http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
that has been tuned in incompatable ways.
This is necessary for interop between inode caches created on unix and
windows. Which is more important than supporting inodecaches for large keys
with the wrong size, which are broken anyway.
There should be no slowdown from this change, except on Windows.
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
Reverts 965e106f24
Unfortunately, this caused breakage on Windows, and possibly elsewhere,
because parentDir and takeDirectory do not behave the same when there is a
trailing directory separator.
parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /
This commit was sponsored by Audric SCHILTKNECHT.
The new yesod needs the ViewPatterns extension.
Also, a TH splice in Assistant/Threads/WebApp.hs failed to work without
OverLoadedStrings.
This commit was sponsored by Brock Spratlen.
Found these with:
git grep "^ " $(find -type f -name \*.hs) |grep -v ': where'
Unfortunately there is some inline hamlet that cannot use tabs for
indentation.
Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.
The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.
See Debian bug #753720.
Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.
Also disables commit.gpgsign in the test suite.
This commit was sponsored by Antoine Boegli.
Yes, this means that git annex webapp on windows execs git-annex, which
execs itself to set env, and the execs itself again to redirect logs.
This is disgusting. This is Windows(TM).
When in direct mode, update the master branch after committing to the
annex/direct/master branch. Also, update the synced/master branch.
This fixes a topology A->B where both A and B are in direct mode and
running the assistant, and a change is made to B. Before this fix, A pulled
the changes from B, but since they were only on the annex/direct/master
branch, it did not merge them.
Note that I considered making the assistant merge the
remotes/B/annex/direct/master, but decided to keep it simple and only merge
the sync branches as before.
On Windows, changing the time zone causes the apparent mtime of files to
change. This confuses git-annex, which natually thinks this means the files
have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>).
Work around this stupidity, by using the inode sentinal file to detect if
the timezone has changed, and calculate a TSDelta, which will be applied
when generating InodeCaches.
This should add no overhead at all on unix. Indeed, I sped up a few
things slightly in the refactoring.
Seems to basically work! But it has a big known problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.
This commit was sponsored by Vincent Demeester.
This version of wai changed the type of Middleware, so I cannot seem
to liftIO inside it. So, got rid of a lot of not really needed
complexity to use System.Log.Logger's logging stuff, and just use
the standard wai stdout logger when debug logging is enabled.
Format may change some, and it logs http to stdout instead of stderr
now. Doesn't matter for the webapp since both go to the same log anyway.
Only fsck and reinject and the test suite used the Backend, and they can
look it up as needed from the Key. This simplifies the code and also speeds
it up.
There is a small behavior change here. Before, all commands would warn when
acting on an annexed file with an unknown backend. Now, only fsck and
reinject show that warning.
This is a better approach to finding both when NM has lost a network
connection, and when a new network connection is made by NM.
Tested with network-manager 0.9.8.8.
This commit was sponsored by Cedric Staub.
* Remote system might be available, and connection get lost. Should
reconnect, but needs to avoid bad behavior (ie, constant reconnect
attempts.) Use exponential backoff.
* Detect if old system had a too old git-annex-shell, and show the user
a nice message in the webapp. Required parsing error messages, so perhaps
this code shoudl be removed once enough time has passed..
* Switch the protocol to using remote URI's, rather than remote names.
Names change. Also avoids issues with serialization of names containing
whitespace.
This is nearly ready for merge into master now. I'd still like to make the ssh
transport smarter about reusing ssh connection caching during git pull.
This commit was sponsored by Jim Paris.
a ssh remote, and pulls.
XMPP is no longer needed in this configuration!
Requires the remote server have git-annex-shell with notifychanges support.
(untested)
This commit was sponsored by Geog Wechslberger.
This includes checking when dropping files that any required content
configuration is satisfied. However, it does not yet include an active
check on the required content; the location log is trusted when checking
the required content expression.
This is a new feature, it was not handled before, since it's a bit of an
edge case. However, it can be handled exactly the same as a file/dir
conflict, just leave the non-annexed item alone.
While implementing this, the core resolveMerge' function got a lot simpler
and clearer. Note especially that where before there was an asymetric call to
stagefromdirectmergedir, now graftin is called symmetrically in both cases.
And, in order to add that `graftin us`, the current branch needed to be
known (if there is no current branch, there cannot be a merge conflict).
This led to some cleanups of how autoMergeFrom behaved when there is no
current branch.
This commit was sponsored by Philippe Gauthier.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).
.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.
I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.
git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)
Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
* sync --content: Honor annex-ignore configuration.
* sync: Don't try to sync with xmpp remotes, which are only currently
supported when using the assistant.
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.
As part of this, preferred content checks pass the Key around.
Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.
I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
Make sanity checker run git annex unused daily, and queue up transfers
of unused files to any remotes that will have them. The transfer retrying
code works for us here, so eg when a backup disk remote is plugged in,
any transfers to it are done. Once the unused files reach a remote,
they'll be removed locally as unwanted.
If the setup does not cause unused files to go to a remote, they'll pile
up, and the sanity checker detects this using some heuristics that are
pretty good -- 1000 unused files, or 10% of disk used by unused files,
or more disk wasted by unused files than is left free. Once it detects
this, it pops up an alert in the webapp, with a button to take action.
TODO: Webapp UI to configure this, and also the ability to launch an
immediate cleanup of all unused files.
This commit was sponsored by Simon Michael.
* numcopies: New command, sets global numcopies value that is seen by all
clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
command is used to set the global number of copies, any annex.numcopies
git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.
This global numcopies setting is needed to let preferred content
expressions operate on numcopies.
It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)
Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.
This commit was sponsored by Nicolas Pouillard.
Similar to the assistant, this honors any configured preferred content
expressions.
I am not entirely happpy with the implementation. It would be nicer if
the seek function returned a list of actions which included the individual
file gets and copies and drops, rather than the current list of calls to
syncContent. This would allow getting rid of the somewhat reundant display
of "sync file [ok|failed]" after the get/put display.
But, do that, withFilesInGit would need to somehow be able to construct
such a mixed action list. And it would be less efficient than the current
implementation, which is able to reuse several values between eg get and
drop.
Note that currently this does not try to satisfy numcopies when
getting/putting files (numcopies are of course checked when dropping
files!) This makes it like the assistant, and unlike get --auto
and copy --auto, which do duplicate files when numcopies is not yet
satisfied. I don't know if this is the right decision; it only seemed to
make sense to have this parallel the assistant as far as possible to start
with, since I know the assistant works.
This commit was sponsored by Øyvind Andersen Holm.
Batch detection is heuristic, so can sometimes fail. I observed one such
failure while starting up in a repository with 87000 files. After the first
several batches of ~5000 files, it fell out of batch mode, and never
re-entered it, and so made many more commits of a few files at a time
than necessary.
So, let's always use batch mode when in the startup scan. This avoids the
heuristic there, at least.
There is clearly also room to improve the heuristic. Possibly 10 files is
too high a bar to be found during a commit, on a system that can commit
quickly.
Fixes a test case I received where a corrupted repo was repaired, but the
git-annex branch was not. The root of the problem was that the
MissingObject returned by the repair code was not necessarily a complete
set of all objects that might have been deleted during the repair.
So, stop trying to return that at all, and instead make the index file
checking code explicitly verify that each object the index uses is present.