This avoids cp -a overriding the default mode acls that the user might have
set in a git repository.
With GNU cp, this behavior change should not be a breaking change, because
git-anex also uses rsync sometimes in the same situation, and has only ever
preserved timestamps when using rsync.
Systems without GNU cp will no longer use cp -a, but instead just cp.
So, timestamps will no longer be preserved. Preserving timestamps when
copying between repos is not guaranteed anyway.
Closes: #729757
Note that while before checkTransfer this called getLock with WriteLock,
getLockStatus's use of ReadLock will also notice any exclusive locks.
Since transfer info files are only locked exclusively, never shared,
there is no behavior change.
Also, fixes checkLocked to actually return Just False when the file
exists, but is not locked.
Also fixes a test suite failures introduced in recent commits, where
inAnnexSafe failed in indirect mode, since it tried to open the lock file
ReadWrite. This is why the new checkLocked opens it ReadOnly.
This commit was sponsored by Chad Horohoe.
Added a convenience Utility.LockFile that is not a windows/posix
portability shim, but still manages to cut down on the boilerplate around
locking.
This commit was sponsored by Johan Herland.
(With the exception of daemon pid locking.)
This fixes at part of #758630. I reproduced the assistant locking eg, a
removable drive's annex journal lock file and forking a long-running
git-cat-file process that inherited that lock.
This did not affect Windows.
Considered doing a portable Utility.LockFile layer, but git-annex uses
posix locks in several special ways that have no direct Windows equivilant,
and it seems like it would mostly be a complication.
This commit was sponsored by Protonet.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
FileID type changed, needs Arbitrary instance.
On the plus side, getFileStatus on Windows now actually gets file id's,
not always 0, so direct mode is safer there now.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
This reaping of any processes came to cause me problems when redoing the
rsync special remote -- a gpg process that was running gets waited on and
the place that then checks its return code fails.
I cannot reproduce any zombies when using the rsync special remote.
But I still can when using a normal git remote, accessed over ssh.
There is 1 zombie per file downloaded without this horrible hack enabled.
So, move the hack to only be used in that case.
This only performs some basic tests so far; no testing of chunking or
resuming. Also, the existing encryption type of the remote is used; it
would be good later to derive an encrypted and a non-encrypted version of
the remote and test them both.
This commit was sponsored by Joseph Liu.
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.
There should be no behavior changes in this commit.
This commit was sponsored by Michael Barabanov.
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.
But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!
This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.
(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)
This commit was sponsored by Thomas Djärv.
Not yet used by any special remotes, but should not be too hard to add it
to most of them.
storeChunks is the hairy bit! It's loosely based on
Remote.Directory.storeLegacyChunked. The object is read in using a lazy
bytestring, which is streamed though, creating chunks as needed, without
ever buffering more than 1 chunk in memory.
Getting the progress meter update to work right was also fun, since
progress meter values are absolute. Finessed by constructing an offset
meter.
This commit was sponsored by Richard Collins.
Configure crashed on systems with that process and without eg, sha256sum.
The rest of the code in configure looks to work ok, since it uses sh -c to
probe for commands, and sh is always in path so it works.
Dunno about all the rest of git-annex. Not a huge amount of external
program use, other than git, so perhaps this won't be a large pain.
Note that boolSystem can throw an exception now if the program doesn't
exist. Could easily be changed back to False.
Minor because normally only 1 FD is leaked per git-annex run. However,
the test suite leaks a few hundred FDs, and this broke it on the Debian
autobuilders, which seem to have a tigher than usual ulimit.
The leak was introduced by the lazy getDirectoryContents' that was
introduced in e6330988dd in order to scale to
millions of journal files -- if the lazy list was never fully consumed, the
directory handle did not get closed.
Instead, pull in openDirectory/readDirectory/closeDirectory code that I
already developed and submitted in a patch to the haskell directory library
earlier. Using this in journalDirty avoids the place that the lazy list
caused a problem. And using it in stageJournal eliminates the need for
getDirectoryContents'.
The getJournalFiles* functions are switched back to using the regular
strict getDirectoryContents. I'm not sure if those always consume the whole
list, so this avoids any leak. And the things that call those are things
like git annex unused, which also look at every file committed to the
git-annex branch, so would need more work to scale to insane numbers of
files anyway.
Yes, this means that git annex webapp on windows execs git-annex, which
execs itself to set env, and the execs itself again to redirect logs.
This is disgusting. This is Windows(TM).
Rather than calculating the TSDelta once, and caching it, this now
reads the inode sential file's InodeCache file once, and then each time a
new InodeCache is generated, looks at the sentinal file to get the current
delta.
This way, if the time zone changes while git-annex is running, it will
adapt.
This adds some inneffiency, but only on Windows, and only 1 stat per new
file added. The worst innefficiency is that `git annex status` and
`git annex sync` will now (on Windows) stat the inode sentinal file once per
file in the repo.
It would be more efficient to use getCurrentTimeZone, rather than needing
to stat the sentinal file. This should be easy to do, once the time
package gets my bugfix patch.
This commit was sponsored by Jürgen Lüters.
On Windows, changing the time zone causes the apparent mtime of files to
change. This confuses git-annex, which natually thinks this means the files
have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>).
Work around this stupidity, by using the inode sentinal file to detect if
the timezone has changed, and calculate a TSDelta, which will be applied
when generating InodeCaches.
This should add no overhead at all on unix. Indeed, I sped up a few
things slightly in the refactoring.
Seems to basically work! But it has a big known problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.
This commit was sponsored by Vincent Demeester.
Deal with FAT's low resolution timestamps, which in combination with
Linux's caching of higher res timestamps while a FAT is mounted, caused
direct mode repositories on FAT to seem to have modified files after they
were unmounted and remounted.
This commit was sponsored by Fabrice Rossi.
This version of wai changed the type of Middleware, so I cannot seem
to liftIO inside it. So, got rid of a lot of not really needed
complexity to use System.Log.Logger's logging stuff, and just use
the standard wai stdout logger when debug logging is enabled.
Format may change some, and it logs http to stdout instead of stderr
now. Doesn't matter for the webapp since both go to the same log anyway.
This avoids ssh prompting for passwords on stdin, ever.
It may also change other behavior of other programs, as there is no
controlling terminal now. However, setsid was already done when running the
assistant in daemon mode, so any behavior changes should not be really new.