* sync: Use the ssh-options git config when doing git pull and push.
* remotedaemon: Use the ssh-options git config.
Note that the rename env var means that if a new git-annex calls an old one
for git-annex ssh, or a new calls an old, nothing much will go wrong;
just ssh caching won't happen.
Note that while the assistant detects changes made to remote names, I left
the commit message fixed rather than calculating it after every commit. It
doesn't seem worth the CPU to do the latter.
hGetSomeString reads one byte at a time, so unicode bytes are not composed.
The problem comes when outputting that to the console with hPut; that
tried to apply the handle's encoding, and so we get mojibake.
Instead, use ByteStrings, and only convert it to a string for parsing, not
for display.
Note that there are a couple of other things that use hGetSomeString,
which I've left as-is for now.
--deduplicate, --skip-duplicates, and --clean-duplicates still checksum the
file twice, the first time to determine if it's a duplicate. This cannot be
easily merged with the checksumming done to add the file, since the file
needs to be locked down before that second checksum is taken.
I hope this doesn't impact speed much -- it does have to pull out a value
from Annex state every time it accesses the branch now.
The test case I dropped has never caught any problems that I can remember,
and would have been rather difficult to convert.
* init: Repository tuning parameters can now be passed when initializing a
repository for the first time. For details, see
http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
that has been tuned in incompatable ways.
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
* info: Can now display info about a given uuid.
* Added to remote/uuid info: Count of the number of keys present
on the remote, and their size. This is rather expensive to calculate,
so comes last and --fast will disable it.
* Git remote info now includes the date of the last sync with the remote.
This allows the git repository to be moved while git-annex is running in
it, with fewer problems.
On Windows, this avoids some of the problems with the absurdly small
MAX_PATH of 260 bytes. In particular, git-annex repositories should
work in deeper/longer directory structures than before. See
http://git-annex.branchable.com/bugs/__34__git-annex:_direct:_1_failed__34___on_Windows/
There are several possible ways this change could break git-annex:
1. If it changes its working directory while it's running, that would
be Bad News. Good news everyone! git-annex never does so. It would also
break thread safety, so all such things were stomped out long ago.
2. parentDir "." -> "" which is not a valid path. I had to fix one
instace of this, and I should probably wipe all calls to parentDir out
of the git-annex code base; it was never a good idea.
3. Things like relPathDirToFile require absolute input paths,
and code assumes that the git repo path is absolute and passes it to it
as-is. In the case of relPathDirToFile, I converted it to not make
this assumption.
Currently, the test suite has 16 failures.
It's ok to probe every time for git-branch remove because that's
run quite rarely. For git-checkattr, it's run only once, when
starting the --batch mode, and so again the overhead is pretty minimal.
This leaves 2 places where the build version is still used.
git merge might be interactive or fail if one skews, and --no-gpg-sign
might not be pased, or might be passed to a git that doesn't understand it
if the other skews. It seems a little expensive to check the git version
each time these are used.
This doesn't seem likely to cause many problems, at least compared with
check-attr hanging on skew.
In the case where a remote of the bare repo has a fetch = configuration,
refs/remotes/origin/master will exist, and so the merge code path tried to
run in the bare repo.
Getting rid of build warning
warning: 'statfs64' is deprecated: first deprecated in OS X 10.6
[-Wdeprecated-declarations]
10.6 is much older than the oldest git-annex OSX port, so won't break
anything.
More aggressive rsync params fixup for windows. Param may contain a url, or
a file path, so check if it looks like a local file path and if so, fix it
up.
On windows only, rsyncUrlIsPath will treat c:foo as a path, rather than as
a rsyncurl starting with a host "c".
addurl behavior change: When downloading an url ending in .torrent,
it will download files from bittorrent, instead of the old behavior
of adding the torrent file to the repository.
Added Recommends on aria2 and bittornado | bittorrent.
This commit was sponsored by Asbjørn Sloth Tønnesen.
Now that deps are sorted out in hackage, cabal is unlikely to try to
install a too old AWS, so I don't think this flag is worth the bother of
being completely correct with the dependency versioning.
This avoids me needing to enable to flag on the autobuilders..
This allows bypassing the direct mode guard in a safe way to do all sorts
of things including git revert, git mv, git checkout ...
This commit was sponsored by the WikiMedia Foundation.
I had hoped that the git devs could change git's handling of partial
commits to not use a false index file, but seems not.
So, this relies on some git internals to detect that case. The test suite
has a test case added to catch it if changes to git break it.
This commit was sponsored by Paul Tagliamonte.
This is intended to let the user easily tell if a remote's creds are
coming from info embedded in the repository, or instead from the
environment, or perhaps are locally stored in a creds file.
This commit was sponsored by Frédéric Schütz.
Didn't know that this library existed!
This includes making git-annex not re-exec itself on start on windows, and
making the test suite on Windows run tests without forking.
This is not a complete fix. For one, git remote will happily go add a
remote that has the same name as an existing special remote. For another,
enableremote will enable a special remote over top of an existing git
remote. And, also, the webapp might.
Added a Default instance for TrustLevel, and was able to use that to clear
up several other parts of the code too.
This commit was sponsored by Stephan Schulz
The new yesod needs the ViewPatterns extension.
Also, a TH splice in Assistant/Threads/WebApp.hs failed to work without
OverLoadedStrings.
This commit was sponsored by Brock Spratlen.
See 2f3c3aa01f for backstory about how a repo
could be in this state.
When decryption fails, the repo must be using non-encrypted creds. Note
that creds are encrypted/decrypted using the encryption cipher which is
stored in the repo, so the decryption cannot fail due to missing gpg keys
etc. (For !shared encryptiom, the cipher is iteself encrypted using some
gpg key(s), and the decryption of the cipher happens earlier, so not
affected by this change.
Print a warning message for !shared repos, and continue on using the
cipher. Wrote a page explaining what users hit by this bug should do.
This commit was sponsored by Samuel Tardieu.
encryptionSetup must be called before setRemoteCredPair. Otherwise,
the RemoteConfig doesn't have the cipher in it, and so no cipher is used to
encrypt the embedded creds.
This is a security fix for non-shared encryption methods!
For encryption=shared, there's no security problem, just an
inconsistentency in whether the embedded creds are encrypted.
This is very important to get right, so used some types to help ensure that
setRemoteCredPair is only run after encryptionSetup. Note that the external
special remote bypasses the type safety, since creds can be set after the
initial remote config, if the external special remote program requests it.
Also note that IA remotes never use encryption, so encryptionSetup is not
run for them at all, and again the type safety is bypassed.
This leaves two open questions:
1. What to do about S3 and glacier remotes that were set up
using encryption=pubkey/hybrid with embedcreds?
Such a git repo has a security hole embedded in it, and this needs to be
communicated to the user. Is the changelog enough?
2. enableremote won't work in such a repo, because git-annex will
try to decrypt the embedded creds, which are not encrypted, so fails.
This needs to be dealt with, especially for ecryption=shared repos,
which are not really broken, just inconsistently configured.
Noticing that problem for encryption=shared is what led to commit
fbdeeeed5f, which tried to
fix the problem by not decrypting the embedded creds.
This commit was sponsored by Josh Taylor.
This reverts commit fbdeeeed5f.
I can find no basis for that commit and think that I made it in error.
setRemoteCredPair always encrypts using the cipher from remoteCipher,
even when the cipher is shared.
* New annex.hardlink setting. Closes: #758593
* init: Automatically detect when a repository was cloned with --shared,
and set annex.hardlink=true, as well as marking the repository as
untrusted.
Had to reorganize Logs.Trust a bit to avoid a cycle between it and
Annex.Init.
It seems that all other uses of <div .col-sm-9> occur outside of
<div .content-box>. This one occurred inside it, when xmpp pairing.
This was introduced in the bootstrap 3 conversion.
This avoids cp -a overriding the default mode acls that the user might have
set in a git repository.
With GNU cp, this behavior change should not be a breaking change, because
git-anex also uses rsync sometimes in the same situation, and has only ever
preserved timestamps when using rsync.
Systems without GNU cp will no longer use cp -a, but instead just cp.
So, timestamps will no longer be preserved. Preserving timestamps when
copying between repos is not guaranteed anyway.
Closes: #729757
Old behavior was to take the first fuzzy match. Now, it checks the globa
git config, and runs the normal fuzzy handling, including failing to run a
semi-random command by default.
This does mean that eg, copying multiple files to a local remote will
become slightly slower, since it now restarts git-cat-file after each copy.
Should not be significant slowdown.
The reason git-cat-file is run on the remote at all is to update its
location log. In order to add an item to it, it needs to get the current
content of the log. Finding a way to avoid needing to do that would be a
good path to avoiding this slowdown if it does become a problem somehow.
This commit was sponsored by Evan Deaubl.
(With the exception of daemon pid locking.)
This fixes at part of #758630. I reproduced the assistant locking eg, a
removable drive's annex journal lock file and forking a long-running
git-cat-file process that inherited that lock.
This did not affect Windows.
Considered doing a portable Utility.LockFile layer, but git-annex uses
posix locks in several special ways that have no direct Windows equivilant,
and it seems like it would mostly be a complication.
This commit was sponsored by Protonet.
Note that this means getopt parsing is done even when not in a git
repository, even though currently cmdnorepo is not passed the results of
it. I'd like to move to cmdnorepo not doing its own ad-hoc option parsing,
so this is really a good thing. (But as long as eg, getOptionFlag needs an
Annex monad, it cannot be used in cmdnorepo handling.)
There is a potential for problems if any cmdnorepo branch of a command
handles options that are not in its regular getopt, but that would be a bug
anyway.
This is needed only because of the new MonadMask needed for bracket
in the new version. Ifdefing it everywhere is not practical, since the
Setup.hs uses it.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
Since encryption=shared, the encryption key is stored in the git repo, so
there is no point at all in encrypting the creds, also stored in the git
repo with that key. So `initremote` doesn't. The creds are simply stored
base-64 encoded.
However, it then tried to always decrypt creds when encryption was used..
replaceFileOr was broken and ran the rollback action always.
Luckily, for replaceFile, the rollback action was safe to run, since it
just nuked a temp file that had already been moved into place.
However, when `git annex direct` used replaeFileOr, its rollback printed a
scary message:
/home/joey/tmp/rrrr/.git/annex/misctmp/tmp32268: rename: does not exist (No such file or directory)
There was actually no bad result though.
Implemented the Retriever.
Unfortunately, it is a fileRetriever and not a byteRetriever.
It should be possible to convert this to a byteRetiever, but I got stuck:
The conduit sink needs to process individual chunks, but a byteRetriever
needs to pass a single L.ByteString to its callback for processing. I
looked into using unsafeInerlaveIO to build up the bytestring lazily,
but the sink is already operating under conduit's inversion of control,
and does not run directly in IO anyway.
On the plus side, no more memory leak..
Currently, initremote works, but not the other operations. They should be
fairly easy to add from this base.
Also, https://github.com/aristidb/aws/issues/119 blocks internet archive
support.
Note that since http-conduit is used, this also adds https support to S3.
Although git-annex encrypts everything anyway, so that may not be extremely
useful. It is not enabled by default, because existing S3 special remotes
have port=80 in their config. Setting port=443 will enable it.
This commit was sponsored by Daniel Brockman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
This speeds up the webdav special remote somewhat, since it often now
groups actions together in a single http connection when eg, storing a
file.
Legacy chunks are still supported, but have not been sped up.
This depends on a as-yet unreleased version of DAV.
This commit was sponsored by Thomas Hochstein.
Reusing http connection when operating on chunks is not done yet,
I had to submit some patches to DAV to support that. However, this is no
slower than old-style chunking was.
Note that it's a fileRetriever and a fileStorer, despite DAV using
bytestrings that would allow streaming. As a result, upload/download of
encrypted files is made a bit more expensive, since it spools them to temp
files. This was needed to get the progress meters to work.
There are probably ways to avoid that.. But it turns out that the current
DAV interface buffers the whole file content in memory, and I have
sent in a patch to DAV to improve its interfaces. Using the new interfaces,
it's certainly going to need to be a fileStorer, in order to read the file
size from the file (getting the size of a bytestring would destroy
laziness). It should be possible to use the new interface to make it be a
byteRetriever, so I'll change that when I get to it.
This commit was sponsored by Andreas Olsson.
When files are stored using rsync, they have their write bit removed;
so does the directory they're put in. The local repo code did not turn
these bits back on, so failed to remove.
bup already splits files and does rolling deltas, so there is no reason to
use chunking here.
The new API made it easier to add progress support for storeKey, so that's
done. Unfortunately, bup-split still outputs its own progress with -q,
so a little ugly, but not too bad.
Made dropping remove the branch for an object, for two reasons:
1. The new API calls removeKey to roll back a storeKey when the content
changed unexpectedly.
2. So that testremote will be happy.
Also, fixed a bug that caused a crash when removing the branch for an
object in rollback.
This only performs some basic tests so far; no testing of chunking or
resuming. Also, the existing encryption type of the remote is used; it
would be good later to derive an encrypted and a non-encrypted version of
the remote and test them both.
This commit was sponsored by Joseph Liu.
For example, I had a copy to a remote that was failing for an unknown
reason. This let me see the exception was createDirectory: permission
denied; the underlying problem being a permissions issue.
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.
But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!
This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.
(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)
This commit was sponsored by Thomas Djärv.
The repair code assumed that if fsck found no broken objects, after
removing bad objects and possibly pulling replacements from remote, all was
well.. but this is not really true. Removing bad objects could leave some
branches broken. fsck doesn't report any missing objects in this case,
and its messages about broken branches are ignored by the fsck output
parser.
To deal with this, added a separate scan of all refs to find broken ones
and remove them when --forced. This will also let anyone who ran into this
bug run repair again to fix up the incomplete repair done before.
This commit was sponsored by Aaron Whitehouse.
Based on the example from the tip, but modified to cd into the repo before
running git-annex, since konqueror does not. Also, at least on my system,
the directory is ~/.kde, not ~/.kde4. (konqueror 4.12.4)
This commit was sponsored by Jürgen Peters.
This is a security/usability tradeoff. To avoid exposing the gpg key ids
who can decrypt the repository, users can unset
gcrypt-publish-participants.
The gcrypt-publish-participants option is available in my fork of
git-remote-gcrypt.
This commit was sponsored by Christopher Kernahan.
Catch an exception when ensureInitialized is run in a non-initted
repository. In this case, just read the git config, so that the Git.Repo
object is not LocalUnknown, which is what is used to represent remotes
on eg, drives that are not connected.
The assistant already got this right, and like with the assistant, this
causes an implicit git-annex init of the local remote on the second sync,
once the git-annex branch has been pushed to it.
See this comment for more analysis:
http://git-annex.branchable.com/todo/Recovering_from_a_bad_sync/#comment-64e469a2c1969829ee149cbb41b1c138
This commit was sponsored by jscit.
I think this is a git behavior change, but have not checked to be sure.
Conflict cruft used to look like $foo~HEAD, but now just $foo is left
behind as conflict cruft.
With test case.
Running `git annex direct` would cause loss of data, because the object
was moved to a temp file, which it then tried to replace the work tree file
with, and on failure, the temp file got deleted. Now it's instead moved
back into the annex object location.
Minor because normally only 1 FD is leaked per git-annex run. However,
the test suite leaks a few hundred FDs, and this broke it on the Debian
autobuilders, which seem to have a tigher than usual ulimit.
The leak was introduced by the lazy getDirectoryContents' that was
introduced in e6330988dd in order to scale to
millions of journal files -- if the lazy list was never fully consumed, the
directory handle did not get closed.
Instead, pull in openDirectory/readDirectory/closeDirectory code that I
already developed and submitted in a patch to the haskell directory library
earlier. Using this in journalDirty avoids the place that the lazy list
caused a problem. And using it in stageJournal eliminates the need for
getDirectoryContents'.
The getJournalFiles* functions are switched back to using the regular
strict getDirectoryContents. I'm not sure if those always consume the whole
list, so this avoids any leak. And the things that call those are things
like git annex unused, which also look at every file committed to the
git-annex branch, so would need more work to scale to insane numbers of
files anyway.
When one side is an annexed symlink, and the other side is a non-annexed symlink.
In this case, git-merge does not replace the annexed symlink in the work
tree with the non-annexed symlink, which is different from it's handling of
conflicts between annexed symlinks and regular files or directories.
So, while git-annex generated the correct merge commit, the work tree
didn't get updated to reflect it.
See comments on bug for additional analysis.
Did not add this to the test suite yet; just unloaded a truckload of firewood
and am feeling lazy.
This commit was sponsored by Adam Spiers.
Eg after git-annex add has run on 2 million files in one go.
Slightly unhappy with the neeed to use a temp file here, but I cannot see
any other alternative (see comments on the bug report).
This commit was sponsored by Hamish Coleman.
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.
The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.
See Debian bug #753720.
Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.
Also disables commit.gpgsign in the test suite.
This commit was sponsored by Antoine Boegli.
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.
Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.
Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.
(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)
The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.
This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
I had thought that this was already done, but apparently not. There may
have been a reversion around version 5.20140606. Anna's laptop had its
desktop menu file etc having that version despite having upgraded git-annex
to a newer version. However, I could not find any commits that removed a
call to ensureInstalled.
The bug caused the size of the queue to be miscalculted; it was doubled
each time an item was added. Commands run after approx 140 items rather
than the intended 10240!
Yes, this means that git annex webapp on windows execs git-annex, which
execs itself to set env, and the execs itself again to redirect logs.
This is disgusting. This is Windows(TM).
Using the crazy but apparently best approach of a VB Script that runs
git-annex, in a hidden DOS window.
Note that currently the git-annex messages are not directed to daemon.log.
Would probably need another layer of script. Also problimatic since the
repository may not exist yet.
When in direct mode, update the master branch after committing to the
annex/direct/master branch. Also, update the synced/master branch.
This fixes a topology A->B where both A and B are in direct mode and
running the assistant, and a change is made to B. Before this fix, A pulled
the changes from B, but since they were only on the annex/direct/master
branch, it did not merge them.
Note that I considered making the assistant merge the
remotes/B/annex/direct/master, but decided to keep it simple and only merge
the sync branches as before.
Rather than calculating the TSDelta once, and caching it, this now
reads the inode sential file's InodeCache file once, and then each time a
new InodeCache is generated, looks at the sentinal file to get the current
delta.
This way, if the time zone changes while git-annex is running, it will
adapt.
This adds some inneffiency, but only on Windows, and only 1 stat per new
file added. The worst innefficiency is that `git annex status` and
`git annex sync` will now (on Windows) stat the inode sentinal file once per
file in the repo.
It would be more efficient to use getCurrentTimeZone, rather than needing
to stat the sentinal file. This should be easy to do, once the time
package gets my bugfix patch.
This commit was sponsored by Jürgen Lüters.
Deal with FAT's low resolution timestamps, which in combination with
Linux's caching of higher res timestamps while a FAT is mounted, caused
direct mode repositories on FAT to seem to have modified files after they
were unmounted and remounted.
This commit was sponsored by Fabrice Rossi.
This version of wai changed the type of Middleware, so I cannot seem
to liftIO inside it. So, got rid of a lot of not really needed
complexity to use System.Log.Logger's logging stuff, and just use
the standard wai stdout logger when debug logging is enabled.
Format may change some, and it logs http to stdout instead of stderr
now. Doesn't matter for the webapp since both go to the same log anyway.
It was possible for a interrupted sync or merge in direct mode to
leave the work tree out of sync with the last recorded commit.
This would result in the next commit seeing files missing from the work
tree, and committing their removal.
Now, a direct mode merge happens not only in a throwaway work tree, but using
a temporary index file, and without any commits or index changes
being made until the real work tree has been updated. If the merge is
interrupted, the work tree may have some updated files, but worst case a
commit will redundantly commit changes that come from the merge.
This commit was sponsored by Tony Cantor.
This avoids a collision if different ssh ports are used on the same host
for some reason.
Note that it's ok to change the format of the mangled hostname; unmangling
only extracts the hostname from it, and once ssh is configured for a
mangled hostname, that config is not changed.
This avoids a potential slowdown when using lots of views.
I think that it makes sense for unused to ignore (local) view branches,
since these are by definition supposed to be views of an existing branch,
so looking at the branch should be sufficient (and if the view is out of
date and has files that have since been deleted from the branch, the user's
intent is not to preserve those from unused reaping).
Avoid stomping on existing group and preferred content settings
when enabling or combining with an already existing remote.
Two level fix. First, use defaultStandardGroup rather than
setStandardGroup, so if there is an existing configuration in the git-annex
branch, it's not overwritten.
To handle pre-existing ssh remotes (including gcrypt), a second level is
needed, because before syncing with the remote, it's configuration won't be
available locally. (And syncing could take a long time.) So, in this case,
keep track of whether the remote is being created or enabled, and only set
configs when creating it.
This commit was sponsored by Anders Lannerback.