Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.
Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.
This commit was sponsored by Trenton Cronholm on Patreon.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
git grep writeFile finds some more that might also be problems, but
for now I've concentrated on .git/annex/ log files. There are certianly
cases where writeFile is not a problem too.
This commit was sponsored by mo on Patreon.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
And also now in non-fast mode, since it was just changed to query for the
filename separately.
And avoid processTranscript which mixed up stdout and stderr and could have
led to weirdness if there were warnings that didn't get suppressed.
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.
This commit was sponsored by Anthony DeRobertis on Patreon.
This reverts commit 51228c2306.
No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.
With cabal, System.Posix.Process and System.Posix.Env are both missing.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.
This commit was sponsored by Fernando Jimenez on Patreon.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Test suite is always included.
Building with this flag disabled has actually been broken for some time,
since Command.TestRemote uses tasty. Fewer build flags are better, so good
time to drop it.
This commit was sponsored by Thomas Hochstein on Patreon.
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.
This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.
This commit was supported by the NSF-funded DataLad project.
lookupkey: Support being given an absolute filename to a file within the
current git repository.
This commit was supported by the NSF-funded DataLad project.
initremote, enableremote: Really support gpg subkeys suffixed with an
exclamation mark, which forces gpg to use a specific subkey. (Previous try
had a bug.)
This commit was sponsored by Jake Vosloo on Patreon.
Better to make it not be surprising and slow, than surprising and fast.
--raw can be used when it needs to be really fast.
Implemented adding a youtube-dl supported url to an existing file.
This commit was sponsored by andrea rota.
Fully working, including --fast/--relaxed.
Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.
importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.
Remove old quvi modules.
This commit was sponsored by Øyvind Andersen Holm.
As it was getting too expensive to patch out use of the "new" syscalls
We could revisit this if someone has hardware with an older kernel
that's still being maintained, but I've verified that the Synology
NAS that had used a too old kernel version has been updated to 2.6.32.
Was trying to rmdir the file, which had already been deleted, and when that
failed, it skipped trying to delete the parent directories.
Noticed the bug through testremote, but it can't itself detect such
problems as there is no enumeration in the API.
This commit was sponsored by Brock Spratlen on Patreon.
As long as the class of remotes supports exporting, it's tested whether
or not the remote is configured with exporttree=yes.
Also, made testremote of a remote configured with exporttree=yes
disable that configuration for testing non-export storage.
This commit was supported by the NSF-funded DataLad project.
When there are multiple urls for a file, still treat it as being present
in the web when some urls don't work, as long as at least one url does
work.
This is consistent with the other web methods handling of multiple urls.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Actual problem is the keyName was set to "Ref \"sha\"", which led to
this follow-on failure since it contained a space.
The bad data would also get into the export database when exporting to a
non-external special remote. Looking briefly at that, I don't think the bad
data will lead to anything more than a re-upload of the file content
now that the problem has been fixed.
This commit was sponsored by Peter Hogg on Patreon.
Seems I forgot to fully test that feature when documenting it.
git rev-parse needs a colon after a branch to de-reference the tree
it points to, rather than the commit. But that had it adding an extra
colon when the user specified "branch:subdir". So, check if there is a
colon before adding one.
This commit was sponsored by Francois Marier on Patreon.
Windows: Fix reversion that caused the path used to link to annexed
content include the drive letter and full path, rather than being
relative. (`git annex fix` will fix up after this problem).
I've not identified the commit that brought the reversion (probably it
happened this spring when I was removing MisingH and last touched
Utility.Path). Likely commit 18b9a4b8024115db67ae309fdaf54e1553037529?
The problem is that relPathDirToFile got called two paths that had the
slashes different ways around. Since takeDrive includes the first slash,
this made two paths on the same drive seem different and it bailed.
(ifdefs around this to avoid doing extra work on non-windows)
This commit was sponsored by Jack Hill on Patreon.
Get ugly reversion out of CHANGELOG.
Also, relocated the windows stack.yaml to top, and updated windows build
instructions.
This commit was sponsored by Henrik Riomar on Patreon.
wget was broken even in the previous old release of the windows bundle,
this is not new breakage. msys-idn-11.dll and probably more would be needed
to use it. git for windows includes msys-idn2-0.dll instead.
Code for terminating processes on Windows is not linking anymore;
made a warning be displayed instead. This breaks restarting the
assistant and git annex assistant --stop.
I hope to see the code added to the Win32 library, where it should fit
better and should avoid whatever problem is making the linker not like it
when included in git-annex. I opened an issue requesting its addition,
here: https://github.com/haskell/win32/issues/91
This commit was sponsored by Thomas Hochstein on Patreon.
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.
onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.
This commit was supported by the NSF-funded DataLad project.
After a false start, I found a fairly non-intrusive way to deal with it.
Although it only handles transfers -- there may be issues with eg
concurrent dropping of the same key, or other operations.
There is no added overhead when -J is not used, other than an added
inAnnex check. When -J is used, it has to maintain and check a small
Set, which should be negligible overhead.
It could output some message saying that the transfer is being done by
another thread. Or it could even display the same progress info for both
files that are being downloaded since they have the same content. But I
opted to keep it simple, since this is rather an edge case, so it just
doesn't say anything about the transfer of the file until the other
thread finishes.
Since the deferred transfer action still runs, actions that do more than
transfer content will still get a chance to do their other work. (An
example of something that needs to do such other work is P2P.Annex,
where the download always needs to receive the content from the peer.)
And, if the first thread fails to complete a transfer, the second thread
can resume it.
But, this unfortunately means that there's a risk of redundant work
being done to transfer a key that just got transferred.
That's not ideal, but should never cause breakage; the same
thing can occur when running two separate git-annex processes.
The get/move/copy/mirror --from commands had extra inAnnex checks added,
inside the download actions. Without those checks, the first thread
downloaded the content, and then the second thread woke up and
downloaded the same content redundantly.
move/copy/mirror --to is left doing redundant uploads for now. It
would need a second checkPresent of the remote inside the upload
to avoid them, which would be expensive. A better way to avoid
redundant work needs to be found..
This commit was supported by the NSF-funded DataLad project.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.
Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.
This commit was supported by the NSF-funded DataLad project.
Before, there was a window where interrupting an add could result in the
file being moved into the annex, with no symlink yet created.
This commit was supported by the NSF-funded DataLad project.
when storing files there, since that collection is created by initremote.
(This seems to work around some brokenness of the box.com webdav server
which was entering a redirect loop.)
Note that the fix makes locationParent return Nothing instead of "."
when there's no parent directory between the path and the top of the webdav
repo.
This commit was sponsored by André Pereira on Patreon.
In my git-annex repos, I found some stale transfer info files
without lock files.
Pass a mode to tryLockExclusive, so it will create the lock file if
not present, and so not fail to clean up such transfer info files.
Normally, transfer info files are accompanied by a lock file.
But, when alwaysRunTransfer is used, the locking can fail
and it will still write the transfer info file. Perhaps there are other
cases too? Note that mkProgressUpdater's meter
writes to the transfer info file too, and it might be possible for
that meter to fire after runTransfer has cleaned up.
This commit was sponsored by andrea rota.
Fix process and file descriptor leak that was exposed when git-annex was
built with ghc 8.2.1. Apparently ghc has changed its behavior of GC
of open file handles that are pipes to running processes. That
broke git-annex test on OSX due to running out of FDs.
Audited for all uses of Annex.new and made stopCoProcesses be called
once it's done with the state. Fixed several places that might have
leaked in other situations than running the test suite.
This commit was sponsored by Ewen McNeill.
Using annexeval to run probeCrippledFileSystem' caused Git.CurrentRepo.get
to be run. Fixed easily since probeCrippledFileSystem' had no need to use
the Annex monad.
This commit was sponsored by Ethan Aubin.
When the external special remote program crashed, a newline
could be output, which messed up the expected output for --batch mode.
Avoid checking EXPORTSUPPORTED for special remotes that are
not configured to use exports. The datalad special remote apparently is/was
buggy and crashed on EXPORTSUPPORTED. Anyway, there's no need to send
it when the configuration doesn't need it.
This commit was supported by the NSF-funded DataLad project.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.
This commit was sponsored by andrea rota.
Motivation is to remove all metadata when it gets copied from a previous
version of the file, and that is not deisrable.
This commit was supported by the NSF-funded DataLad project.
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.
The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.
Added a separate channel for export commits; the committer records a
commit in that channel.
Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.
The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.
This commit was sponsored by Ewen McNeill.
Done to avoid a "tmp" directory appearing in webdav exports.
Also affects non-export webdav remotes, so interrupted uploads using the
old path will not overwrite it. However, PUT is quite likely to be
implemented atomically on web servers anyway, so I doubt this will cause
problems.
inDAVLocation does not url-escape, and so exporting a filename with spaces
to box.com at least resulted in a error 400.
It might also have affected storing keys on a webdav remote, if the key
contained a space or other problem character. Pretty unlikely.
I emailed Clint about the inDAVLocation gotcha, but seems best to fix it
here.
This commit was supported by the NSF-funded DataLad project.
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.
It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.
Deal with such problems by assuming such behavior means the file is not
present.
Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.
This commit was supported by the NSF-funded DataLad project.
This basically works, but there's a bug when renaming a file that leaves
a .git-annex-temp-content-key file in the webdav store, that never gets
cleaned up.
Also, exporting files with spaces to box.com seems to fail; perhaps it
does not support it?
This commit was supported by the NSF-funded DataLad project.
In a test, I uploaded a pdf, and several files were derived from it.
After removing the pdf, the derived files went away after approximatly
half an hour. This window does not seem worth warning about every time.
Documented it in the tip.
Removal works, only derives are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.
Also tested exporting filenames containing unicode, spaces, underscores.
All worked, despite the IA's faq saying it doesn't.
This commit was sponsored by Trenton Cronholm on Patreon.
It opens a http connection per file exported, but then so does git
annex copy --to s3.
Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.
This commit was supported by the NSF-funded DataLad project.
https://github.com/haskell/cabal/issues/4655
This means that when a module is conditionally imported via ifdef
depending on the OS or build flags, the cabal file has to mirror the
same logic there to only list the module then.
Since there are lots of OS's and lots of combinations of build flags
here, it's rather difficult to know if the cabal file has been completelty
correctly updated to match the source code.
So I am very unhappy with needing to update things in two places. I've
only tested this on linux with most build flags enables; this will
probably need significant time and testing to catch every cabal file
tweak that this change to Cabal requires. And it will be a continual
source of compile failures going forward when the code is modified and
the cabal file not also updated.
DRY DRY DRY, I repeat myself, but: DRY! Sigh..
(Also, had to remove all Build.* that are standalone programs from the
Other-Modules list, because since cabal passes those modules to ghc when
building git-annex, it complains that they use module Main. Those
modules are only used when building with the Makefile anyway, so this
change shouldn't break anything.)
This commit was sponsored by Thomas Hochstein on Patreon.
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.
No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.
Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.
SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.
Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.
This commit was sponsored by Jochen Bartl on Patreon.
Fix the external special remotes git-annex-remote-ipfs,
git-annex-remote-torrent and the example.sh template to correctly support
filenames with spaces.
This commit was sponsored by John Peloquin on Patreon.
External special remotes will refuse to operate on keys with spaces in
their names. That has never worked correctly due to the design of the
external special remote protocol. Display an error message suggesting
migration.
Not super happy with this, but it's a pragmatic solution. Better than
complicating the external special remote interface and all external special
remotes.
Note that I only made it use SafeKey in Request, not Response. git-annex
does not construct a Response, so that would not add any safety. And
presumably, if git-annex avoids feeding any such keys to an external
special remote, it will never have a reason to make a Response using such a
key. If it did, it would result in a protocol error anyway.
There's still a Serializeable instance for Key; it's used by P2P.Protocol.
There, the Key is always in the final position, so it's ok if it contains
spaces.
Note that the protocol documentation has been fixed to say that the File
may contain spaces. One way that can happen, even though the Key can't,
is when using direct mode, and the work tree filename contains spaces.
When sending such a file to the external special remote the worktree
filename is used.
This commit was sponsored by Thom May on Patreon.
To work around the problem that the external special remote protocol does
not support keys containing spaces.
This commit was sponsored by Denis Dzyubenko on Patreon.
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.
For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.
Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.
This commit was sponsored by Trenton Cronholm on Patreon.
By forking a worker process and only deleting the test directory once it exits.
This way, if a test leaves files open, they'll get closed when the worker
exits, so avoiding failure to delete open files on Windows, and failure to
delete directories due to NFS lock files.
If a test leaves a git worker process running, the closed pipes should
cause the worker to exit too, also avoiding the problem there. The 10
second sleep ought to give plenty of time for such worker processes to
exit, although this is of course a race.
Finally, even if test directory fails to be deleted still,
it won't appear as if the last test in the test suite failed; the error
will be displayed at the very end.
This commit was supported by the NSF-funded DataLad project.
Should fix this:
lock (v6 --force): FAIL
Exception: .git/annex/keys: removeDirectoryRecursive: unsatisfied constraints (Directory not empty)
Verified that the test case still catches the regression it's meant to.
This commit was supported by the NSF-funded DataLad project.
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.
Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).
There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.
A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)
So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.
Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.
This commit was supported by the NSF-funded DataLad project.
QuickCheck added an Arbitrary instance for CTime aka EpochTime. However,
while git-annex's instance disallowed times before the epoch, QuickCheck's
does not. So, rather than using its instance, convert from an Integer.
This commit was sponsored by Thomas Hochstein on Patreon.
Don't trust OSX FSEvents's eventFlagItemModified to be called when the last
writer of a file closes it; apparently that sometimes does not happen,
which prevented files from being quickly added.
This commit was sponsored by John Peloquin on Patreon.