Commit graph

1329 commits

Author SHA1 Message Date
Joey Hess
a14001785e
fix --branch combined with --unlocked or --locked
Since it's using git ls-tree anyway, can just look at the file modes to see
if they're unlocked or are symlinks.
2021-03-02 13:47:27 -04:00
Joey Hess
25e4ab7e81
Prevent combinations of options such as --all with --include
Previously such nonsensical combinations always treated the matching option
as if it didn't match.

For now, made find --branch refuse matching options that need a
filename, because one is not provided to them in a way they'll use.
There's an open bug report to support it, but making it error out is
better than the old behavior of not finding what it was asked to.

Also, made --mimetype combined with eg --all work, by looking at the
object file when operating on keys.
2021-03-01 16:25:23 -04:00
Joey Hess
eb594c710e
unregisterurl: New command
Implemented by generalizing registerurl. Without the implicit batch mode
of registerurl since that is only a backwards compatability thing
(see commit 1d1054faa6).
2021-03-01 14:28:24 -04:00
Joey Hess
97ae474585
registerurl: Allow it to be used in a bare repository. 2021-03-01 14:03:03 -04:00
Joey Hess
a8b627d82b
uninit: Fix a small bug that left a lock file in .git/annex
unannex using git queue caused the queue lock to be taken after uninit had
cleaned out .git/annex. Flush the queue earlier to avoid.
2021-03-01 13:05:47 -04:00
Joey Hess
a942ed4bb9
Windows: Correct the path to the html help file for 64 bit build. 2021-02-24 13:19:42 -04:00
Joey Hess
d670346b22
releasing package git-annex version 8.20210223 2021-02-23 14:40:45 -04:00
Joey Hess
530e96b80e
fix unannex data overwrite bug
unannex, uninit: When an annexed file is modified, don't overwrite the
modified version with an older version from the annex

This commit was sponsored by Mark Reidenbach on Patreon.
2021-02-22 13:35:00 -04:00
Joey Hess
62d5a73bdd
unannex, uninit: Avoid running git rm once per annexed file, for a large speedup. 2021-02-22 12:56:11 -04:00
Joey Hess
cddf2343b2
wording 2021-02-22 12:51:52 -04:00
Joey Hess
f44d4704c6
incremental checksum for local remotes
This benchmarks only slightly faster than the old git-annex. Eg, for a 1
gb file, 14.56s vs 15.57s. (On a ram disk; there would certianly be
more of an effect if the file was written to disk and didn't stay in
cache.)

Commenting out the updateIncremental calls make the same run in 6.31s.
May be that overhead in the implementation, other than the actual
checksumming, is slowing it down. Eg, MVar access.

(I also tried using 10x larger chunks, which did not change the speed.)
2021-02-10 16:05:24 -04:00
Joey Hess
e24ddb8946
Bugfix: fsck --from a ssh remote did not actually check that the content on the remote is not corrupted
Changing to the P2P protocol broke this, because preseedTmp copies
the local copy of the object to the temp file, and then the P2P transfer
sees the right length file and uses it as-is.

When git-annex-shell is too old and rsync is used, it did verify the
content, and when the local repo does not have the object it did verify the
content.
2021-02-10 13:29:12 -04:00
Joey Hess
4b63e932f3
incremental checksum on upload to ssh or p2p 2021-02-10 12:41:05 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
fa3d71d924
Tahoe: Avoid verifying hash after download, since tahoe does sufficient verification itself
See my comment in the next commit for some details about why
Verified needs a hash with preimage resistance. As far as tahoe goes,
it's fully cryptographically secure.

I think that bup could also return Verified. However, the Retriever
interface does not currenly support that.
2021-02-09 13:42:16 -04:00
Joey Hess
bb8db94655
reorder 2021-02-08 17:54:29 -04:00
Joey Hess
3a66cd715f
avoid making absolute git remote path relative
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.

Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.

A few things that used fromAbsPath unncessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.

Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
2021-02-08 13:18:01 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
135757d64a
automatic stall detection
annex.stalldetection can now be set to "true" to make git-annex do
automatic stall detection when it detects a remote is updating its transfer
progress consistently enough.

This commit was sponsored by Luke Shumaker on Patreon.
2021-02-03 13:33:57 -04:00
Joey Hess
aec2cf0abe
addon commands
Seems only fair, that, like git runs git-annex, git-annex runs
git-annex-foo.

Implementation relies on O.forwardOptions, so that any options are passed
through to the addon program. Note that this includes options before the
subcommand, eg: git-annex -cx=y foo

Unfortunately, git-annex eats the --help/-h options.
This is because it uses O.hsubparser, which injects that option into each
subcommand. Seems like this should be possible to avoid somehow, to let
commands display their own --help, instead of the dummy one git-annex
displays.

The two step searching mirrors how git works, it makes finding
git-annex-foo fast when "git annex foo" is run, but will also support fuzzy
matching, once findAllAddonCommands gets implemented.

This commit was sponsored by Dr. Land Raider on Patreon.
2021-02-02 16:32:49 -04:00
Joey Hess
58216ef39d
Include libkqueue.h file needed to build the assistant on BSDs
I suspect this is a bug in cabal sdist, because with
Includes: Utility/libkqueue.h
the file is not included, but putting it in extra-files does
get it into the tarball.
2021-02-01 12:00:56 -04:00
Joey Hess
41bf440729
Fix build on openbsd. Thanks, James Cook for the patch. 2021-02-01 11:56:17 -04:00
Joey Hess
8d4eb2d34e
get: Improve output when failing to get a file fails
showTriedRemotes lists the remotes it tried to access. So there's
no need to list those again in "Try making some of these remotes
available".
2021-01-29 15:11:19 -04:00
Joey Hess
c35fa6975b
fix handling of implicit and before parens
Fix an oddity in matching options and preferred content expressions such as
"foo (bar or baz)", which was incorrectly handled as if it were "(foo or
bar) and baz)" rather than the intended "foo and (bar or baz)"

Seemed like a change to consume should be able to handle this case
better, but I was having trouble writing it that way, so instead added
a separate pass that inserts the implicit ands explicitly. Also added
several test cases to make sure versions with and without explicit ands
generate the same.
2021-01-28 13:51:07 -04:00
Joey Hess
6f78497572
When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.

Complicated by git-annex addurl --fast which adds the file whose content
is not present, so it needs to stay unlocked when on such a branch.

This commit was sponsored by Brock Spratlen on Patreon.
2021-01-28 12:47:46 -04:00
Joey Hess
e3224ff77d
formatLsTree did not use a tab where git does
Fixed that, and made parserLsTree accept the space as well as tab.

Fixes a reversion that made import of a tree from a special remote result in
a merge that deleted files that were not preferred content of that special
remote.
2021-01-28 12:36:37 -04:00
Joey Hess
a82aca67b8
releasing package git-annex version 8.20210127 2021-01-27 11:13:25 -04:00
Joey Hess
b372d962ae
Added GETGITREMOTENAME to extenal special remote protocol 2021-01-26 12:42:47 -04:00
Joey Hess
03b0b61018
wording 2021-01-25 17:40:08 -04:00
Joey Hess
47338bf270
support modifying and running git add on an unlocked file that used an URL key
Avoids the smudge --clean filter failing because URL keys do not support
genKey. Instead the modified content will be added using the default
backend.

This commit was sponsored by Jochen Bartl on Patreon.
2021-01-25 17:37:16 -04:00
Joey Hess
34a535ebea
adjust: Fix some bad behavior when unlocked files use URL keys.
This avoids the smudge --clean filter failing on the URL keys.

git checkout runs the post-checkout hook, which runs smudge --update.
That populates all the pointer files, but it neglected to store their inode
caches in the keys db. With that done, and the keys db flushed before
smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap
check can tell the file is not modified, so will not try to re-ingest it,
which does not work with URL keys because they do not support genKey.

It also seems possible that the isUnmodifiedCheap was also failing for
non-URL keys, which would cause them to be re-ingested, leading to a lot of
extra work. I have not verified that, but don't see why it wouldn't have
happened. So this probably also speeds up checking out adjusted branches.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2021-01-25 17:25:42 -04:00
Joey Hess
5c7e6629cf
Fix a bug in view filename generation when a metadata value ended with "/"
Or ":" or "\" on Windows, eg "c:" again.
2021-01-22 14:05:14 -04:00
Joey Hess
95cd49abdb
fix a bug that prevented git-annex init from working in a submodule
This is probably a reversion, but not sure what caused it. By the time
Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has
at least sometimes already done the necessary fixups. (Eg, one run
indirectly by a git command.) But since the Repo is cached, it doesn't
realize and does them again. So, avoid crashing when git config --unset
fails.

This commit was sponsored by Jack Hill on Patreon.
2021-01-21 15:33:15 -04:00
Joey Hess
73df633a62
omit inode from ContentIdentifier for directory special remote
Directory special remotes with importtree=yes now avoid unncessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.

A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.

I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.

There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.

This commit was sponsored by Brett Eisenberg on Patreon.
2021-01-19 13:15:07 -04:00
Joey Hess
2aa4fab62a
avoid crashing when there are remotes using unparseable urls
Including the non-standard URI form that git-remote-gcrypt uses for rsync.

Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.

This commit was sponsored by Luke Shumaker on Patreon.
2021-01-18 14:59:08 -04:00
Joey Hess
5193aae385
Bug fix: Fix tilde expansion in ssh urls when the tilde is the last character in the url. Thanks, Grond for the patch. 2021-01-18 12:22:48 -04:00
Joey Hess
6a30d04ece
Bug fix: export with -J could fail when two files had the same content.
Exporting is done inside a call to writeLockDbWhile which guarantees there
is only one process uploading to a given ExportLocation.
2021-01-13 14:50:48 -04:00
Joey Hess
5e39b7eb8d
Windows: Work around win32 length limits when dealing with lock files 2021-01-13 14:38:35 -04:00
Joey Hess
1e65d1b9af
merged fix from kyle 2021-01-07 13:47:36 -04:00
Joey Hess
c8b1fa67b4
Behavior change: --trust-glacier option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

This commit was sponsored by Jake Vosloo on Patreon.
2021-01-07 10:37:43 -04:00
Joey Hess
2bf34fc17f
Behavior change: --trust option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

I suppose that using --trust was in some situation, safer than --force,
because it doesn't entirely disable checking for data loss, but only
disables checking involving data that is on the specified repository.
But it seems better to be able to say that data loss only happens with
--force.

This commit was sponsored by Graham Spencer on Patreon.
2021-01-07 10:34:57 -04:00
Joey Hess
6a0030a110
Behavior change: git-annex trust now needs --force
Since unconsidered use of trusted repositories can lead to data loss.

Trusted has always been this way, but it used to be acceptable for
git-annex to be set up so that data could be lost without using --force,
and most or all other ways that can happen have already been eliminated.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-01-07 10:09:39 -04:00
Joey Hess
715c6013d4
wording 2021-01-06 14:26:39 -04:00
Joey Hess
cc89699457
mincopies
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.

Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.

This commit was sponsored by Denis Dzyubenko on Patreon.
2021-01-06 14:15:19 -04:00
Joey Hess
428d228ee5
docs for requirednumcopies
Not implemented yet.
2021-01-05 14:22:44 -04:00
Joey Hess
a3a19518d8
fix --time-limit
It got broken in several ways by the streaming seeking optimisations
around version 8.20201007.

Moved time limit checking out of the matcher, which was a hack in the
first place. So everywhere that uses Limit.getMatcher needs to check
time limit. Well, almost everywhere. Command.Info uses it, but it does
not make sense to time limit getting info. And Command.MultiCast uses it
just to build up a list of files that then get passed to a command, so
it would never have hit the timeout in a useful way.

This implementation is a little more expensive when at time limit than
necessary, since it continues seeking only to discard everything after the
time limit. I did try making it close the file handles to force a faster
shutdown, but that didn't work and hung. Could certianly be improved
somehow, but seeking is probably not the expensive bit when a time limit
is hit, so this seems acceptable for now.
2021-01-04 15:57:11 -04:00
Joey Hess
5ce61c6b2a
add: Significantly speed up adding lots of non-large files to git
* add: Significantly speed up adding lots of non-large files to git,
  by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
  so any other smudge filters than the annex one that may be enabled will
  be used.
2021-01-04 13:12:28 -04:00
Joey Hess
7d843e909d
releasing package git-annex version 8.20201129 2020-12-29 13:51:40 -04:00
Joey Hess
7916fc98a3
graft in imported tree to avoid gc
Fix a bug that could prevent getting files from an importtree=yes remote,
because the imported tree was allowed to be garbage collected.
2020-12-23 14:27:38 -04:00
Joey Hess
cd4c68924b
merged borg
Still a couple related todos, but it's basically usable now.
2020-12-22 16:22:44 -04:00
Joey Hess
6b13574827
Windows: include= and exclude= containing '/' will also match filenames that are written using '\'
And vice-versa, but it's better to use '/' for portability.

Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
2020-12-15 12:39:34 -04:00
Joey Hess
3519f1ab7f
reorg 2020-12-15 12:12:03 -04:00
Joey Hess
017ce1b811
clarify 2020-12-15 12:09:27 -04:00
Joey Hess
6c890d62f6
initremote: Prevent enabling encryption with exporttree=yes/importtree=yes
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
2020-12-15 12:08:08 -04:00
Joey Hess
ed68a2166d
importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support
This is common in some feeds, which might mix some items with enclosures,
with others that link to posts or whatever. Before this, it would try to
use youtube-dl and fail, or if youtube-dl was not allowed, it would
incorrectly complain that an url was supported by youtube-dl.
2020-12-15 01:13:21 -04:00
Joey Hess
16315b7812
typo 2020-12-14 21:35:31 -04:00
Joey Hess
01527b21d8
add key to FileInfo
MatchingKey is not the thing to use when matching on actual worktreee
files.

Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
2020-12-14 17:42:02 -04:00
Joey Hess
75acf5f440
improve some edge cases around partial initialization
* Guard against running in a repo where annex.uuid is set but
  annex.version is set, or vice-versa.
* Avoid autoinit when a repo does not have annex.version or annex.uuid
  set, but has a git-annex objects directory, suggesting it was used
  by git-annex before.
2020-12-14 13:17:43 -04:00
Joey Hess
6a11b6fab8
Support special remotes that are configured with importtree=yes but without exporttree=yes
There was no particular reason not to support this, other than maybe a lack
of a use case. One use case would of course be a remote that you want to
avoid overwriting content on. A new use case is the idea of importing from
backups, eg borg, where exporting is not necessarily supported at all.

This commit was sponsored by Brock Spratlen on Patreon.
2020-12-10 13:17:40 -04:00
Joey Hess
41f2c308ff
stall detection is working
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.

This commit was sponsored by Luke Shumaker on Patreon.
2020-12-08 15:22:18 -04:00
Joey Hess
0540e987b3
improve p2p protocol handling of requested object not available
Avoid spurious "verification of content failed" message when downloading
content from a ssh or tor remote fails due to the remote no longer having a
copy of the content.

The P2P protocol already handled this case by sending DATA 0, followed by
VALID. But VALID was not really right, because the data is not the
requested data. So, send DATA 0, followed by INVALID. Old versions of
git-annex handle INVALID the same as VALID in this case. Now new versions
avoid displaying an incorrect message.

It would be better for the P2P protocol to have a different way to indicate
this, like perhaps sending INVALID without DATA. But that would be a
breaking change and need a new protocol verison. Since INVALID already is
part of the protocol and already needs to be handled, using it for this
special case too seems ok, and avoids the complication of another protocol
version.

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-01 16:05:55 -04:00
Joey Hess
92136284b1
avoid hGetMetered 0 closing the handle
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.

I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-01 15:39:22 -04:00
Joey Hess
7776677a5f
Fix hang on shutdown of external special remote using ASYNC protocol extension.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.

Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.

The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-30 13:04:02 -04:00
Joey Hess
dad8442572
releasing package git-annex version 8.20201127 2020-11-27 12:57:02 -04:00
Joey Hess
ff4354c6e4
Made the test suite significantly less noisy
Only displaying git-annex and git command output when something went wrong.

A few could still leak stderr. These include the couple of calls
to readProcess, which reads stdin but lets stderr through. But they don't
leak any usually, so probably only would when failing anyway.

Currently, there is no excess output at all!

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-24 14:15:40 -04:00
Joey Hess
88cef18fac
upgrade: Support an edge case upgrading a v5 direct mode repo where nothing had ever been committed to the head branch
This commit was sponsored by Jack Hill on Patreon.
2020-11-24 12:31:17 -04:00
Joey Hess
04dca96710
changelog 2020-11-19 14:46:54 -04:00
Joey Hess
b90b9b936d
don't rely on exception for http 416
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.

What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.

This commit was sponsored by Luke Shumaker on Patreon.
2020-11-19 14:44:42 -04:00
Joey Hess
b3c88da181
fix windows assistant upgrade glitch
Prevent windows assistant from trying (and failing) to upgrade itself,
which has never been supported on windows.

The new windows build is made with UPGRADE_LOCATION set, which enabled this
code path that had never run on windows before, and doesn't work. I don't
want to try to support self-upgrade on windows, or generally on other OS's
than the ones where its working, so added a check for that. This way the
build can keep setting UPGRADE_LOCATION and if some later git-annex does
learn how to upgrade itself on some OS, it won't need changing the build
setup.
2020-11-19 12:50:25 -04:00
Joey Hess
4b739fc460
Fix build on Windows
Thanks to bug reporter for the patch.
2020-11-19 12:33:00 -04:00
Joey Hess
043eee0cb5
update 2020-11-18 15:16:49 -04:00
Joey Hess
6b63278f31
init: When writing hook scripts, set all execute bits, not only the user execute bit 2020-11-17 13:31:12 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
26cf26caca
Merge branch 'master' into symlink-missing 2020-11-16 10:03:12 -04:00
Joey Hess
5a8d01f63e
examinekey: Added a "file" format variable
For consistency with find, and for easier scripting.
2020-11-16 09:59:11 -04:00
Joey Hess
864af53a2d
releasing package git-annex version 8.20201116 2020-11-16 09:38:29 -04:00
Joey Hess
e66b7d2e1b
rename to --unlock-present and better reverse adjusting
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
2020-11-13 14:56:43 -04:00
Joey Hess
3899e216af
Merge branch 'master' into symlink-missing 2020-11-13 14:19:45 -04:00
Joey Hess
a30030c4a6
move: Fix a regression in the last release that made move --to not honor numcopies settings
This commit was sponsored by Svenne Krap on Patreon.
2020-11-13 14:19:32 -04:00
Joey Hess
c8e49c5ef5
git-annex adjust --lock-missing
Like --hide-missing the branch does not get updated when content
availability changes.

Seems to basically work, but sync does not update it yet.

Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
2020-11-13 13:39:44 -04:00
Joey Hess
7566aa6bc5
examinekey: Added --migrate-to-backend
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E

While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
2020-11-12 14:09:14 -04:00
Joey Hess
12e32d1dee
examinekey: Added two new format variables: objectpath and objectpointer 2020-11-12 13:02:31 -04:00
Joey Hess
92b7b1964d
add warning on add of annex link
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.

This commit was sponsored by Jake Vosloo on Patreon.
2020-11-10 12:10:51 -04:00
Joey Hess
d032b0885d
use MatchingKey when a Key is known
This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.

The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.

As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.

This commit was sponsored by Graham Spencer on Patreon.
2020-11-09 15:17:22 -04:00
Joey Hess
2dabd4cc2d
releasing package git-annex version 8.20201103 2020-11-03 11:53:11 -04:00
Joey Hess
9252f86b2e
view: Fix a reversion in 8.20200522 that broke entering or changing views.
Commit 2dc7b5186a messed up indentation.

This commit was sponsored by Noam Kremen on Patreon.
2020-11-02 14:47:08 -04:00
Joey Hess
7245a9ed53
Improve shutdown process for external special remotes and external backends
Make sure to relay any remaining stderr from the process after it has
shut down, rather than closing stderr just before shutdown. This avoids
a situation where the process is still running and tries to write to
stderr, getting a SIGPIPE. And, it ensures that no stderr output is
lost.

This may fix a problem encountered by datalad on windows, where it hangs
during the external special remote shutdown.

Before commit a49d300545, it closed stdin
and stdout, but left stderr open, and never killed the stderr waiter
thread, which presumably exited on its own. For async exception
safety, do need to at make sure that thread gets waited on, as that
commit does, but it introduced this problem.

Note that, the process's stdout is closed before waiting on it. It's too
late for anything it writes to stdout to be processed, and since we're
not going to consume any such writes, this avoids the process getting
blocked writing to stdout due to us not reading what it's buffered. This
does mean that if the process writes to stdout too late, it will get a
SIGPIPE. (This was already the case before the above-mentioned commit.)
In practice, I think only the protocol's ERROR is allowed to be
sent at a point where this could happen.
2020-11-02 12:56:35 -04:00
Joey Hess
64e7bac810
view: Avoid using ':' from metadata when generating a view
Because it's a special character on Windows ("c:").

Use same technique already used for '/' and '\'.

I didn't record how I generated their encoded forms before, so am sure
there was a better way, but the way I did it now is to look at

	ghci> encodeFilePath "∕"
	"\226\136\149"

And then the difference from that to "\56546\56456\56469"
is adding 56320 to each, to get up to the escaped code plane.

See comment for why I think handling ':' is ok, but that other illegal
windows filenames won't. Note that, this should be enough to make the
test suite always work. Other windows illegal filenames will fail at
checkout time when it tries to put the illegal filename on the
filesystem.
2020-10-26 15:38:08 -04:00
Joey Hess
5e458e8ac6
changelog 2020-10-26 13:43:40 -04:00
Joey Hess
f3070d2d7d
Windows build changed to one done by the datalad-extensions project using Github actions
This is a cleaner build than on Jenkins because the whole environment setup
is handled by the CI config, at least up to the point of "get a random bag
of Windows bytes".

Also, the Jenkins autobuilder has been intermittently failing for a long
time, not due to any problem with git-annex but just a failure to clean up
directories.

Also, this build runs the test suite, and it is (mostly) passing. Test
suite always failed in the jenkins environment.

Also, this build includes libmagic.

Here is the build workflow used by github actions:
https://github.com/datalad/datalad-extensions/blob/master/.github/workflows/build-git-annex-windows.yaml
The libmagic build has its own workflow:
https://github.com/datalad/file-windows/blob/master/.github/workflows/build.yml

(Also cleaned up some windows build cruft I don't use anymore.)

There is no build-version file to link to. I've opened a todo requesting
one: https://github.com/datalad/datalad-extensions/issues/55
2020-10-26 13:17:23 -04:00
Joey Hess
8fda1ef0fa
document version for --force-large/--force-small
also fix the wrong name in the changelog
2020-10-26 11:34:30 -04:00
Joey Hess
a108b00b33
testremote: Display exceptions when tests fail, to aid debugging 2020-10-23 15:41:57 -04:00
Joey Hess
681313dfd4
deal with .git pointer file in Git.CurrentRepo
This fixes the bug.

Note, it's only done when GIT_DIR is set. When it's not set,
Git.Construct already handled it. This is why it was only noticed with this
git submodule command.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-10-23 14:56:12 -04:00
Joey Hess
0736383e98
Fix bug that prevented linux standalone bundle from working on a fresh install
Bug was introduced in version 8.20201007, lost a necessary mkdir.

This commit was sponsored by Noam Kremen on Patreon.
2020-10-23 12:19:40 -04:00
Joey Hess
dad4be97c2
speculatively use remote's configured chunk size as a fallback
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.

This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.

Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
2020-10-22 13:11:06 -04:00
Joey Hess
0133b7e5a8
move: Improve resuming a move that was interrupted after the object was transferred
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.

Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.

An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.

This commit was sponsored by Jake Vosloo on Patreon.
2020-10-21 10:31:56 -04:00
Joey Hess
7036d0a4c1
add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan
This commit was sponsored by Jochen Bartl on Patreon.
2020-10-19 15:36:18 -04:00
Joey Hess
9a5cd96f0d
Fix a memory leak introduced in the last release
The problem was this line:

	cleanup = and <$> sequence (map snd v)

That caused all of v to be held onto until the end, when the cleanup action
was run.

I could not seem to find a bang pattern that avoided the leak, so I
resorted to a IORef, rather clunky, but not a performance problem because
it will only be written once per git ls-files, so typically just 1 time.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-10-13 16:31:01 -04:00
Joey Hess
d54dd0ef9c
Fix build on Windows with network-3
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.

It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
2020-10-08 10:50:39 -04:00
Joey Hess
cf33be21ac
releasing package git-annex version 8.20201007 2020-10-07 14:10:56 -04:00
Joey Hess
20f86e43f7
Fix a build failure on Windows. 2020-10-07 12:04:54 -04:00
Joey Hess
e0ca1236ee
runshell: Update files atomically when preparing to run git-annex
This does not make it entirely idempotent, but it's a start.
2020-10-05 13:38:34 -04:00
Joey Hess
cd9a60bc7d
runshell: Fix a edge case where rm errors were sent to stdout, which could confuse things parsing git-annex output. 2020-10-05 12:44:40 -04:00
Joey Hess
5555697ae6
Enable building with git-annex benchmark by default
Only turning it off when the criterion library is not installed.

Not enabled for osx or i386ancient yet since that will need some
invesitgation to update their respective stack.yaml files.
2020-10-02 13:57:10 -04:00
Joey Hess
37426920d8
Fix build with Benchmark build flag
Broke a while ago during optimisation work, and not noticed since the flag
is disabled by default.

This commit was sponsored by Brock Spratlen on Patreon.
2020-10-02 13:30:24 -04:00
Joey Hess
c56efbbdb6
import: Check gitignores when importing trees from special remotes
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-30 10:41:59 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
084b502c7a
httpalso: Support being used with special remotes that do not have encryption= in their config. 2020-09-29 13:56:27 -04:00
Joey Hess
b2cf284d2a
upgrade: Avoid an upgrade failure of a bare repo in unusual circumstances 2020-09-29 13:45:14 -04:00
Joey Hess
1610d94776
addurl: Avoid a redundant git ignores check for speed
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
2020-09-29 13:00:41 -04:00
Joey Hess
658ea7ca3c
sync --no-content import from directory special remote
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)

This commit was sponsored by Svenne Krap on Patreon.
2020-09-28 15:29:08 -04:00
Joey Hess
15c1ee16d9
import --no-content: Check annex.largefiles
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.

If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.

This commit was sponsored by Jake Vosloo on Patreon.
2020-09-28 13:28:57 -04:00
Joey Hess
ace02f41b0
seek: defer matcher check until more info is known
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.

Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.

For --in here, the speedup was less, 5-10% or so.

(both warm cache)

This commit was sponsored by Jack Hill on Patreon.
2020-09-24 17:59:12 -04:00
Joey Hess
d89984b121
sync --all avoid unncessary first pass
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.

Thanks to Lukey for identifying the optimisation.

This commit was sponsored by Brock Spratlen on Patreon.
2020-09-24 15:12:09 -04:00
Joey Hess
68f9766544
Improve --debug output to show pid of processes that are started and stopped
getPid returns Nothing if the process has already been stopped, and in that
case, the pid will not be displayed. I think that would only happen if
waitForProcess or similar gets called more than once on the same process
handle though.

getPid on unix has an overhead of only a MVar read. On Windows it needs to
make a syscall, so will be probably more expensive. While the added expense
happens even when debug logging is disabled, it should be small enough
compared with the overhead of starting a process that it's not a problem.

(It does occur to me that a debugM that took an IO String could only run it
when debugging is really enabled, which would improve performance. It does
not seem possible to use the current hslogger interface to do that though;
it does not expose the information that would be needed.)
2020-09-24 12:39:57 -04:00
Joey Hess
6a5e0cbfc7
Improve the "Try making some of these repositories available" message
With some hints for the user for what to do.

Took care to avoid changing the json output. It would have been ok to add
the new separated lists to it, in addition to the old list, but I didn't
do that because I didn't see much point.
2020-09-22 14:10:30 -04:00
Joey Hess
d0b06c17c0
Added --no-check-gitignore option for finer grained control than using --force.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.

(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)

addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.

In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
2020-09-18 13:19:13 -04:00
Joey Hess
922621301a
Serialize use of C magic library, which is not thread safe.
This fixes failures uploading to S3 when using -J.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-17 17:27:42 -04:00
Joey Hess
83df401d93
Merge branch 'batchasync' into master 2020-09-16 13:02:58 -04:00
Joey Hess
877ef84a1b
support --batch -J
--batch combined with -J now runs batch requests concurrently for many
commands. Before, the combination was accepted, but did not enable
concurrency. Since the output of batch requests can be in any order, --json
with the new "input" field is recommended to be used, to determine which
batch request each response corresponds to.

If --json is not used, batch mode still runs concurrently, using the usual
concurrent-output. That will not be very useful for most batch mode users,
probably, but who knows.

If a program was using --batch -J before, and was parsing non-json output,
this could break it. But, it was relying on git-annex not supporting
concurrency despite it being enabled, so it should have expected concurrent
output. So, I think that's ok.

annex.jobs does not enable concurrency in --batch mode, because that would
confuse programs that use --batch but don't expect concurrency.
2020-09-16 12:10:37 -04:00
Joey Hess
fcf5d11c63
add "input" field to json output
The use case of this field is mostly to support -J combined with --json.
When that is implemented, a user will be able to look at the field to
determine which of the requests they have sent it corresponds to.

The field typically has a single value in its list, but in some cases
mutliple values (eg 2 command-line params) are combined together and the
list will have more.

Note that json parsing was already non-strict, so old git-annex metadata
--json --batch can be fed json produced by the new git-annex and will
not stumble over the new field.
2020-09-15 16:22:44 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
5844a54869
aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets.
Remove placeholder version number I used when implementing the feature in
aws.

This commit was sponsored by Ethan Aubin.
2020-09-14 18:37:49 -04:00
Joey Hess
1a785d05c0
releasing package git-annex version 8.20200908 2020-09-08 14:20:47 -04:00
Joey Hess
dcaa1c1cc9
reorder 2020-09-08 12:54:17 -04:00
Joey Hess
6ea511beb4
Removed the S3 and WebDAV build flags
So these special remotes are always supported.

IIRC these build flags were added because the dep chains were a bit too
long, or perhaps because the libraries were not available in Debian stable,
or something like that. That was long ago, those reasons no longer apply,
and users get confused when builtin special remotes are not available, so
it seems best to remove the build flags now.

If this does cause a problem it can be reverted of course..

This commit was sponsored by Jochen Bartl on Patreon.
2020-09-08 12:42:59 -04:00
Joey Hess
62372ee052
resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge
This commit was sponsored by Jake Vosloo on Patreon.
2020-09-07 16:50:27 -04:00
Joey Hess
d120c73302
sync, assistant: When merge.directoryRenames is not set, default it it to "false"
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".

This is not done when automatic merge conflict resolution is disabled.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-07 13:50:58 -04:00
Joey Hess
69053a93a2
resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.

My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.

That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-07 12:25:57 -04:00
Joey Hess
e36bae74da
Exposed annex.forward-retry git config
One reason is, 5 is an arbitrary number so ought to be configurable.

The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
2020-09-04 15:16:40 -04:00
Joey Hess
2bb933eb60
import: Retry downloads that fail
Also, using the transfer machinery for this makes eg, git-annex info show
in-progress imports, and makes --notify-start/finish work.
2020-09-04 13:54:05 -04:00
Joey Hess
46eb48d7c0
Retry transfers to exporttree=yes remotes same as for other remotes
The comment about noRetry is not well-justified, because transfers to many
remotes cannot be resumed, but retries are still allowed for those.
2020-09-04 13:24:08 -04:00
Joey Hess
1d244bafbd
Limit retrying of failed transfers when forward progress is being made to 5
To avoid some unusual edge cases where too much retrying could result in
far more data transfer than makes sense.
2020-09-04 12:46:37 -04:00
Joey Hess
6e9a4f50f3
make viaTmp honor umask
Fixed several cases where files were created without file mode bits that
the umask would usually set. This included exports to the directory special
remote, torrent files used by the bittorrent special remote, hooks written
by git-annex init, and some log files in .git/annex/

Audited all calls, looking for ones that didn't want the umask bits to be
set. All such turned out to already set the specific restrictive file mode
they wanted.
2020-09-02 14:54:07 -04:00
Joey Hess
8656afd3e1
rename http special remote to httpalso
"http" was too generic and easy to confuse with web. The new name makes
clear it's used in addition to some other remote. And other protocols
can use the same naming scheme.
2020-09-02 10:41:53 -04:00
Joey Hess
571ec900ac
Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https.
With automatic layout learning!
2020-09-01 15:16:35 -04:00
Joey Hess
41ebed3941
Support git remotes where .git is a file, not a directory
Eg when --separate-git-dir was used, and core.symlinks=false.

This commit was sponsored by Brock Spratlen on Patreon.
2020-08-28 15:08:14 -04:00
Joey Hess
cde3e5eb0c
test: Stop gpg-agent daemons that are started for the test framework's gpg key
They normally shutdown when the GNUPGHOME directory is deleted, but on
NFS they keep the directory from being deleted. And also, this avoids
a number of them piling up while the test suite is running.
2020-08-28 14:28:42 -04:00
Joey Hess
b68f214312
Display a message when git-annex has to wait for a pid lock file held by another process 2020-08-26 13:05:34 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
6b0532e532
wording 2020-08-25 14:47:17 -04:00
Joey Hess
2ca1ff62dc
addurl --file youtube-dl reversion fix
addurl: Fix reversion in 7.20190322 that made --file not be honored when
youtube-dl was used to download media.

8758f9c561 was on the right track, but missed that | otherwise prevented
the code it added from being used.

Also, refactored out a common function.

This commit was sponsored by Graham Spencer on Patreon.
2020-08-25 12:56:45 -04:00
Joey Hess
27329f0bb1
stack.yaml: Updated to lts-16.10
Needs stack version 2.3 to build, which has only recently made it into
debian unstable.

This commit was sponsored by Jake Vosloo on Patreon.
2020-08-24 14:11:37 -04:00
Joey Hess
f241a3cd3d
Display warning when external special remote does not start up properly, or is not usable
I'm sure this used to work, but somewhere along the line something or
things (getCost and getAvailability I think, probably others)
started catching the exception and not displaying it. So, show warnings.
2020-08-14 15:38:31 -04:00
Joey Hess
05b2b46a82
async extension done 2020-08-14 15:24:34 -04:00
Joey Hess
020e588262
reorder 2020-08-10 16:18:35 -04:00
Joey Hess
bcbdada8bf
fixed 2020-08-10 13:12:55 -04:00
Joey Hess
506ffea5e6
stop symlink check once the top of the working tree is reached
Avoid complaining that a file with "is beyond a symbolic link" when the
filepath is absolute and the symlink in question is not actually inside the
git repository.

This assumes that inodes remain stable while the command is running.
I think they always will, the filesystems where they are unstable change
them across mounts. (If inodes were not stable, it would just complain about
symlinks in the path that are not inside the working tree.)

(On windows, I don't want to assume anything about inodes, they could be
random numbers for all I know. But if they were, this would still be ok, as
long as windows doesn't have symlinks that are detected by isSymbolicLink.
Which seems a fair bet.)
2020-08-06 20:14:30 -04:00
Joey Hess
283d2f85d1
importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_'
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.

Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
2020-08-05 11:35:00 -04:00
Joey Hess
c4ec52b9ae
Slightly sped up the linux standalone bundle
Reduce the number of directories listed in libdirs, which makes the linker
check a lot less dead ends looking for directories.

Eliminated some directories that didn't really contain shared libraries,
or only contained the linker.

That left only 2, one in lib and one in usr/lib, so consolidate those two.

Doing it this way, rather than just consolidating all libs that might exist
into a single directory means that, if there are optimised versions of some
libs, eg in lib/subarch/foo.so, and lib/subarch2/foo.so, they don't get
moved around in a way that would make the linker pick the wrong one.
2020-07-31 14:42:03 -04:00
Joey Hess
049807dbba
external backends implemented 2020-07-29 17:24:34 -04:00
Joey Hess
00c5f04f20
Deal with unusual IFS settings in the shell scripts for linux standalone and OSX app.
Thanks, Yaroslav Halchenko
2020-07-24 14:46:50 -04:00
Joey Hess
79187a6eaf
Revert "Unset IFS in shell scripts in the linux standalone build and OSX app."
This reverts commit 24125e8dc4.

yoh has a better patch I see
2020-07-24 14:33:13 -04:00
Joey Hess
24125e8dc4
Unset IFS in shell scripts in the linux standalone build and OSX app. 2020-07-24 14:31:11 -04:00
Joey Hess
c5ea2e9d12
better benchmark for move/copy speedup 2020-07-24 13:34:12 -04:00
Joey Hess
18f1fb5841
drop performance improvements
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.

Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.

Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.

This commit was sponsored by Ryan Newton on Patreon.
2020-07-24 13:27:46 -04:00
Joey Hess
d732ef1a89
move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x. 2020-07-24 12:56:02 -04:00
Joey Hess
00865cdae8
Fix a bug in find --branch in the previous version
inAnnex check was lost for that code path. To avoid more such mistakes,
made withKeyOptions check it when the AnnexedFileSeeker specifies.
2020-07-24 12:05:28 -04:00
Joey Hess
cb74cefde7
Fix a hang when using git-annex with an old openssh 7.2p2
Which had some weird inheriting of ssh FDs by sshd.

Bug was introduced in git-annex version 7.20200202.7.
2020-07-21 16:14:25 -04:00
Joey Hess
ac56a5c2a0
Fix a lock file descriptor leak that could occur when running commands like git-annex add with -J
Bug was introduced as part of a different FD leak fix in version 6.20160318.
2020-07-21 15:30:47 -04:00
Joey Hess
798fdad660
fix build with dlist-1.0
That removed the list function. This new implementation appears to
actually be more efficient anyway, since it avoids toList.
2020-07-21 12:58:51 -04:00
Joey Hess
1ccb6699a1
guidance on size and mtime fields 2020-07-20 19:56:47 -04:00
Joey Hess
abd56fb019
Fix a bug in find --batch in the previous version. 2020-07-20 19:50:53 -04:00
Joey Hess
af901d1366
releasing package git-annex version 8.20200720 2020-07-20 14:41:12 -04:00
Joey Hess
889603336a
fix reversion in skipping deleted files
And add a test case for that.

This certianly loses some of the 2x performance improvement in file
seeking that seekFilteredKeys led to, because now it has to stat the
worktree files again. Without benchmarking, I expect there will still be
a sizable improvement, and also the git-annex branch precaching that
seekFilteredKeys can do will still be a win of its approach.

Also worth noting that lookupKey, when the file DNE, check if it's in an
adjusted branch with hidden files, and if so, finds the key for the
file anyway. That was intended to make git-annex sync --content be able
to process those files, but a side effect was that, when a file was
deleted but the deletion not yet staged, git-annex commands used to
still list it. That was actually a bug. This commit fixes that bug too.
(git-annex sync --content on such a branch does not use seekFilteredKeys
so was not affected by the reversion or by this behavior change)

This commit was sponsored by Jake Vosloo on Patreon.
2020-07-19 21:25:01 -04:00
Joey Hess
7b2d236556
importfeed: stream metadata for 5% speedup
On top of the 10% speedup from streaming url logs.
2020-07-14 14:35:26 -04:00
Joey Hess
535cdc8d48
importfeed: Made checking known urls step around 10% faster.
This was a bit disappointing, I was hoping for a 2x speedup. But, I think
the metadata lookup is wasting a lot of time and also needs to be made to
stream.

The changes to catObjectStreamLsTree were benchmarked to not also speed
up --all around 3% more. Seems I managed to make it polymorphic after all.
2020-07-14 12:47:51 -04:00
Joey Hess
a6afa62a60
improve wording 2020-07-13 17:57:55 -04:00
Joey Hess
75aab72d23
mostly done with location log precaching
Some nice wins.
2020-07-13 17:04:02 -04:00
Joey Hess
b4d0f6dfc2
slower but sequential filtering of large files from pointer files
There should still be a speedup seeking over pointer files, just not as
large as the one seeking over symlinks.
2020-07-10 15:21:58 -04:00
Joey Hess
de3d7d044d
make catObjectStream support newline and carriage return in filenames
Turns out the %(rest) trick was not needed. Instead, just maintain a
list of files we've asked for, and each cat-file response is for the
next file in the list.

This actually benchmarks 25% faster than before! Very surprising, but it
must be due to needing to shove less data through the pipe, and parse
less.
2020-07-08 13:49:03 -04:00
Joey Hess
d010ab04be
sped up the --all option by 2x to 16x by using git cat-file --buffer
This assumes that no location log files will have a newline or carriage
return in their name. catObjectStream skips any such files due to
cat-file not supporting them.

Keys have been prevented from containing newlines since 2011,
commit 480495beb4. If some old repo
had a key with a newline in it, --all will just skip processing that key.
Other things, like .git/annex/unused files certianly assume no newlines in
keys too, and AFAICR, such keys never actually worked.

Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys
generated before that point could perhaps contain a CR. (URL probably not,
http probably doesn't support an URL with a raw CR in it.) So, added
a warning in fsck about such keys. Although, fsck --all will naturally
skip them, so won't be able to warn about them. Not entirely
satisfactory, but I'll bet there are not really any such keys in
existence.

Thanks to Lukey for finding this optimisation.
2020-07-07 13:54:04 -04:00
Joey Hess
d66fc1a464
Revert "async exception safety for coprocesses"
This reverts commit 7013798df5.
2020-07-06 15:11:28 -04:00
Joey Hess
dfa1c21b8a
comment
and update changelog with benchmark results
2020-07-06 13:39:42 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
85cd79ea01
no importKey for android yet
adb shell has sha256sum sha1sum and some others, so they could be used.
They're provided by toybox, so seem about as likely to keep
working as find and stat, which it already depends on.

Or to not add a dep, could use stat the same as getExportContentIdentifier
to get a mtime, and make a WORM key. But do I really want this to
default to WORM?

Unsure what's the best path, so punting for now.
2020-07-03 14:02:50 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
f912f8e5fd
refix bug in a better way
Always run Git.Config.store, so when the git config gets reloaded,
the override gets re-added to it, and changeGitRepo then calls extractGitConfig
on it and sees the annex.* settings from the override.

Remove any prior occurance of -c v and add it to the end. This way,
-c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2

Note that, if git had some multiline config that got built up by
multiple -c's, this would not work still. But it never worked because
before the bug got fixed in the first place, the -c value was repeated
many times, so the multivalue thing would have been wrong. I don't think
-c can be used with multiline configs anyway, though git-config does
talk about them?
2020-07-02 13:32:33 -04:00
Joey Hess
ec0f8a6e74
Fix reversion that broke passing git configs with -c
Reverting commit c8fec6ab0
2020-07-02 12:42:13 -04:00
Joey Hess
8a797358b7
changelog wording 2020-06-26 14:27:42 -04:00
Joey Hess
8b22e0bf37
lockContent for tahoe
Trivial since git-annex cannot remove, but do an active checkKey verification
anyway, in case the data was lost somehow.

This commit was sponsored by Ryan Newton on Patreon.
2020-06-26 14:23:21 -04:00
Joey Hess
3175015d1b
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.

In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.

Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.

Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.

So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.

This commit was sponsored by Ethan Aubin.
2020-06-26 13:46:42 -04:00
Joey Hess
4229713e63
importfeed: Added some additional --template variables for date and time
This commit was sponsored by Ethan Aubin.
2020-06-24 14:24:50 -04:00
Joey Hess
b651d3ede0
test: Fix some test cases that assumed git's default branch name
git is making that configurable, and configuring it globally would break
the test suite in a few places.

No other part of git-annex assumes any branch name. Renamed a few
placeholders to make that clearer.

This commit was sponsored by Jake Vosloo on Patreon.
2020-06-23 16:40:51 -04:00
Joey Hess
7757c0e900
Honor annex.largefiles when importing a tree from a special remote.
This commit was sponsored by Martin D on Patreon.
2020-06-23 16:07:18 -04:00
Joey Hess
5098236c6b
testremote: Fix over-allocation of resources and bad caching
Including starting up a large number of external special remote processes.
(Regression introduced in version 8.20200501)
2020-06-22 14:25:49 -04:00
Joey Hess
104b3a9c6a
Build with the http-client-restricted library when available
Otherwise use the vendored copy as before.

The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.

Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-http-client-restricted-dev

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:31:31 -04:00
Joey Hess
01eb863a14
Build with the git-lfs library when available
Otherwise use the vendored copy as before.

The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.

Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-git-lfs-dev.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:21:25 -04:00
Joey Hess
d5451afc8f
fix deadlock
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.

Known to happen on vfat filesystems, possibly others.

Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.

Apparently that doesn't happen, but will need to keep an eye on it.
2020-06-18 12:56:29 -04:00
Joey Hess
48a88d822d
releasing package git-annex version 8.20200617 2020-06-17 15:59:34 -04:00
Joey Hess
82448bdf39
fix a annex.pidlock issue
That made eg git-annex get of an unlocked file hang until the
annex.pidlocktimeout and then fail.

This fix should be fully thread safe no matter what else git-annex is
doing.

Only using runsGitAnnexChildProcess in the one place it's known to be a
problem. Could audit for all places where git-annex runs itself as a child
and add it to all of them, later.
2020-06-17 15:30:59 -04:00
Joey Hess
9583b267f5
confirmed fix 2020-06-17 12:12:41 -04:00
Joey Hess
ad81feb053
fix implicit embedcreds regression
Fix bug that made creds not be stored in git when a special remote was
initialized with gpg encryption, but without an explicit embedcreds=yes.

(Yet nother regression introduced in version 7.20200202.7. 5th so far.)
2020-06-16 18:00:19 -04:00
Joey Hess
a1d4c8e4ec
external: SETCREDS include creds in externalConfigChanges
This makes the creds get saved, since only things recorded there will be
saved.

IIRC, unparsedRemoteConfig was not originally available when I
implemented this; now that it is things get a bit simpler.

More could probably be simplified, is externalConfigChanges needed at
all?

This does not entirely fix the bugs though, because creds are only
embedded when embedcreds=yes, but not when encryption=pubkey is used
without embedcreds=yes.
2020-06-16 17:24:24 -04:00
Joey Hess
4773713cc9
analysis of regression and fix related less serious regression 2020-06-16 15:16:36 -04:00
Joey Hess
c4f2c56f5e
checkpresentkey: fix behavior to match documentation
checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.

Still try the logged remotes first, because they are far more likely to
have the key.
2020-06-16 13:54:26 -04:00
Joey Hess
a76b1ba3d6
local git remote autoinit improvements
* Improve display of problems auto-initializing or upgrading local git
  remotes.
* When a local git remote cannot be initialized because it has no
  git-annex branch or a .noannex file, avoid displaying a message about it.
2020-06-16 13:24:00 -04:00
Joey Hess
41952204ce
S3: The REDUCED_REDUNDANCY storage class is no longer cheaper
So stop documenting it, and stop offering it as a choice in the assistant.

Removed the code that parses it into S3.ReducedRedundancy, because
S3.OtherStorageClass with the value will work just the same and avoids a
special case for a deprecated this.
2020-06-16 12:04:29 -04:00
Joey Hess
8a7c615a8f
import: Avoid using some strange names for temporary keys
The ContentIdentifier can contain almost anything, so could have characters
that are not fit for the filesystem, or might be longer than a key usually
is, or contain a newline, or .... genKeyName deals with those problems.

This should not present a back-compat issue, because this is a temporary
key used while downloading the imported file, before the real key for it
can be generated.
2020-06-11 16:07:36 -04:00
Joey Hess
6b0cb2d732
defer cleaning keys db of old data
Avoid creating the keys database during init when there are no unlocked
files, to prevent init failing when sqlite does not work in the filesystem.
2020-06-11 15:40:13 -04:00
Joey Hess
a49d300545
async exception safety for external special remote processes
Since an external process can be in the middle of some operation when an
async exception is received, it has to be shut down then. Using
cleanupProcess will close its IO handles and send it a SIGTERM.

If a special remote choses to catch SIGTERM, it's fine for it to do some
cleanup then, but until it finishes, git-annex will be blocked waiting
for it. If a special remote blocked SIGTERM, it would cause a hang.
Mentioned in docs.

Also, in passing, fixed a FD leak, it was not closing the error handle
when shutting down the external. In practice that didn't matter before because
it was only run when git-annex was itself shutting down, but now that it
can run on exception, it would have been a problem.
2020-06-09 12:22:14 -04:00
Joey Hess
1dd770b1af
fix file descriptor leak
when importing from a directory special remote that is configured with
exporttree=yes
2020-06-05 15:34:43 -04:00
Joey Hess
2bff3b7c49
init: When annex.pidlock is set, skip lock probing. 2020-06-05 11:12:16 -04:00
Joey Hess
1d41ae5d2a
init warning on stalled lock probe
init: If lock probing stalls for a long time (eg a broken NFS server),
display a message to let the user know what's taking so long.
2020-06-05 11:06:19 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
484a74f073
auto-init autoenable=yes
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.

Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.

Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.

That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.

(Had to split out Remote.List.Util to avoid an import cycle.)
2020-05-27 12:40:35 -04:00
Joey Hess
864ba4ecaa
disable buggy concurrency in Command.Export
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.

Crash as described in fixed bug report.

waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.

For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
2020-05-26 13:54:08 -04:00
Joey Hess
e04a931439
improve transfer stages for some commands
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.

transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.

As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
2020-05-26 11:55:50 -04:00
Joey Hess
0bcecb67f5
export: Let concurrent transfers be done with -J or annex.jobs
Tested working, although I did find this bug in my testing, which also
afflicts sync -J to an export remote.
2020-05-26 11:44:07 -04:00
Joey Hess
f7fe71602c
import: Added --json-progress
Already supported --json, but not that.

Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
2020-05-26 11:27:47 -04:00
Joey Hess
5b8524e1e6
addurl: Make --preserve-filename also apply when eg a torrent contains multiple files
Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath
here.
2020-05-26 10:45:57 -04:00
Joey Hess
fc9833f68d
export: Added options for json output
Just worked, no need to do anything except add the options.
2020-05-26 10:31:10 -04:00
Joey Hess
01513da127
releasing package git-annex version 8.20200522 2020-05-22 12:07:59 -04:00
Joey Hess
27459c6e3f
Support building with tasty-1.3
This commit was sponsored by Ethan Aubin.
2020-05-21 15:26:44 -04:00
Joey Hess
e63dcbf36c
fix embedcreds=yes reversion
Fix bug that made enableremote of S3 and webdav remotes, that have
embedcreds=yes, fail to set up the embedded creds, so accessing the remotes
failed.

(Regression introduced in version 7.20200202.7 in when reworking all the
remote configs to be parsed.)

Root problem is that parseEncryptionConfig excludes all other config keys
except encryption ones, so it is then unable to find the
credPairRemoteField. And since that field is not required to be
present, it proceeds as if it's not, rather than failing in any visible
way.

This causes it to not find any creds, and so it does not cache
them. When when the S3 remote tries to make a S3 connection, it finds no
creds, so assumes it's being used in no-creds mode, and tries to find a
public url. With no public url available, it fails, but the failure doesn't
say a lack of creds is the problem.

Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full
set of configs of the remote can be parsed. A bit annoying to need to
parse the remote config before the full config (as returned by
setRemoteCredPair) is available, but this avoids the problem.

I assume webdav also had the problem by inspection, but didn't try to
reproduce it with it.

Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted
String, but that does not seem right. Now that it runs that code, it
crashed saying it had just a String.

Remotes that have already been enableremoted, and so lack the cached creds
file will work after this fix, because getRemoteCredPair will extract
the creds from the remote config, writing the missing file.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-21 14:35:30 -04:00
Joey Hess
d7c7245438
whereis: Added --format option.
One way this can be used is to remove all urls for some website that went
away:

git-annex whereis --format '${file} ${url}\0' | \
	grep -z whatever.com | git-annex rmurl --batch -z

Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
2020-05-19 16:20:56 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
2a8fdfc7d8
Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere
This relicates git's behavior. It adds a few stat calls for the command
line parameters, so there is some minor slowdown, but even with thousands
of parameters it will not be very noticable, and git does the same statting
in similar circumstances.

Note that this does not prevent eg "git annex add symlink"; the symlink
will be added to git as usual. And "git annex find symlink" will silently
list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that
triggers the warning.
2020-05-11 15:03:35 -04:00
Joey Hess
cabbc91b18
addurl, importfeed: Allow '-' in filenames, as long as it's not the first character 2020-05-11 13:50:49 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
69e2e4763e
only check --force at init time, not enable time
git-lfs repos that encrypt the annexed content but not the git repo only
need --force passed to initremote, allow enableremote and autoenable of
such remotes without forcing again.

Needing --force again particularly made autoenable of such a repo not work.
And once such a repo has been set up, it seems a second --force when
enabling it elsewhere has little added value. It does tell the user about
the possibly insecure configuration, but if the git repo has already been
pushed to that remote in the clear, data has already been exposed. The goal
of that --force was not to prevent every situation where such an exposure
can happen -- anyone who sets up a public git repo and pushes to it will
expose things similarly and git-annex is not involved. Instead, the purpose
of the --force is to point out to the user that they're asking for a
configuration where encryption is inconsistently applied.
2020-05-07 15:59:29 -04:00
Joey Hess
1532d67c3e
S3: Support signature=v4
To use S3 Signature Version 4. Some S3 services seem to require v4, while
others may only support v2, which remains the default.

I'm also not sure if v4 works correctly in all cases, there is this
upstream bug report: https://github.com/aristidb/aws/issues/262
I've only tested it against the default S3 endpoint.
2020-05-07 13:18:11 -04:00
Joey Hess
bb88a01910
upgrade: When upgrade fails due to an exception, display it.
37b42e72e7 made it catch exceptions but
thought they were unlikely to be useful to display, which may be right when
a git command fails, but not in the case yoh found.
2020-05-07 12:22:32 -04:00
Joey Hess
0040d2c129
sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from
Now the warning gets displayed, which is better than an arcane git error.

The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.

Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
2020-05-05 14:31:37 -04:00
Joey Hess
dfc4e641b5
repair: Improve fetching from a remote with an url in host:path format.
User reported git@my.gitlab.foo:username/myrepo.git didn't work with
git-repair, because it rewrites it to an url
ssh://git@my.gitlab.foo/~/username/myrepo.git
and the /~/ was not something the hosting site supported.

Since git-annex still generally needs the repo url to be well, an url, did
not change the conversion code. But in this case, we're running git fetch,
so we might as well pass it the remote name rather than the url.

Did a quick audit of repoLocation uses to see if there was anything else
like this problem elsewhere, and didn't see any. But this is not the first
time this special case in git and git-annex's attempt to de-special-case
it has caused a problem..
2020-05-04 15:32:06 -04:00
Joey Hess
f9ed30de3b
avoid beware of the leopard situation
* Display a warning message when a remote uses a protocol, such as
  git://, that git-annex does not support. Silently skipping such a
  remote was confusing behavior.

  It sets annex-ignore, so the warning is only displayed once.

* Also display a warning message when a remote, without a known uuid,
  is located in a directory that does not currently exist, to avoid
  silently skipping such a remote.

  This is a bit more debatable, since git-annex get will say,
  try making repository available. And since it does not set annex-ignore,
  the warning will be displayed repeatedly. It's also an extreme edge case,
  I don't think I've ever seen it happen in real life.
2020-05-04 13:01:11 -04:00
Joey Hess
e72ab75633
releasing package git-annex version 8.20200501 2020-05-01 17:41:36 -04:00
Joey Hess
d7db481471
wip
This does not compile, and I hit a bad dead end. Wah.
2020-04-29 15:48:39 -04:00
Joey Hess
4a6d328ae9
Avoid a test suite failure when the environment does not let gpg be tested
Due to eg, too long a path to the agent socket, caused by running gpg in a
container where /run is not mounted, and/or some other gpg behavior like
unnecessarily making relative paths to its home directory absolute.
2020-04-28 15:47:23 -04:00
Joey Hess
57b89c635f
support required groupwanted
When the required content is set to "groupwanted", use whatever expression
has been set in groupwanted as the required content of the repo, similar to
how setting required content to "standard" already worked.
2020-04-28 13:31:26 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
c05c4e549e
sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes
Do not sync with a faster remote that was not specified.

That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
2020-04-23 16:08:45 -04:00
Joey Hess
2aeb79249b
external: stop storing readonly=true in remote.log
readonly=true is used to make an external special remote that does not
need the external program to be installed. It was stored in the
remote.log by default, and so every time it was specified in an
enableremote or initremote, whatever value was used became the new
default for subsequent enableremotes of that remote.

That was surprising, and I consider it to be a bug.

It does not make much sense to pass it to initremote because then how
would you populate that remote with anything? You would have to
enableremote elsewhere, and store content there. I'm assuming nobody
used it that way.

Someone might rely on passing it to enableremote once, and then that
being inherited in other clones. But that is not how it's documented to
be used. It is barely documented in git-annex at all, only in the
external special remote protocol, and the documentation there says to
"Document that this external special remote can be used in readonly
mode." (by the user of it passing readonly=true to enableremote). The
one external special remote that I know of that does document that is
<https://github.com/bgilbert/gcsannex> (the one that motivated adding
it). That one's docs do say to pass it to enableremote.

So, it seemed safe to make this behavior change. If someone was in fact
relying on one of those behaviors, all their current repos will still
work as they configured them (although they will need to deal
with the related change in 9f3c2dfeda).
In new clones, they will find enableremote fails, complaining the
external program is not in path. An easy enough problem to recover from.
2020-04-23 15:21:26 -04:00
Joey Hess
9f3c2dfeda
stop using remote.name.annex-readonly for two distinct things 2020-04-23 14:56:03 -04:00
Joey Hess
cd1676d604
fix bug involving local git remote and out of date location log
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.

I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.

get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.

Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.

(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
2020-04-21 12:36:58 -04:00
Joey Hess
04352ed9c5
check-ignore resource pool
Much like check-attr before.
2020-04-21 11:25:28 -04:00
Joey Hess
45fb7af21c
check-attr resource pool
Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simulantaneously, so there's a thundering herd..
2020-04-21 11:05:57 -04:00
Joey Hess
cee6b344b4
cat-file resource pool
Avoid running a large number of git cat-file child processes when run with
a large -J value.

This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
2020-04-20 15:19:31 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
957a87b437
fix absolute filenames fed into --batch and git-annex info 2020-04-15 16:04:05 -04:00
Joey Hess
5a62e8132d
When parsing git configs, support all the documented ways to write true and false, including "yes", "on", "1", etc.
This change does impact git-annex config
eg "git annex config --set annex.addunlocked on"
will store "on" and new git-annex will understand that value, while
old git-annex will error:
git-annex: bad annex.addunlocked configuration in git annex config:
Parse failure: near "on"
That seems acceptable.

Not special remote configs that are only documented as =true or =false
however. Having git-annex support other values for those would break
backwards compatability when used with old versions of git-annex. And
older versions ignore invalid special remote configs.. That would not
be a good combination.
2020-04-13 14:05:30 -04:00
Joey Hess
9cb69dbb76
support boolean git configs that are represented by the name of the setting with no value
Eg"core.bare" is the same as "core.bare = true".

Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
2020-04-13 13:35:22 -04:00
Joey Hess
ca9c6c5f60
Fix a potential failure to parse git config
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.

So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
2020-04-13 13:05:41 -04:00
Joey Hess
86426036a0
optimise catfile interface with ByteString and Attoparsec
Around 3% total speedup.

Profiling git annex find --not --in web, it's now bytestring end-to-end,
and there is only a little added overhead in eg accessing the Annex
state MVar (3%). The rest of the runtime is spent reading symlinks, and
in attoparsec.

This feels like the end of the optimisation road, without a major change
like caching information for faster queries.
2020-04-10 14:18:52 -04:00
Joey Hess
2caf579718
cache annex index filename for 1.5% speedup to queries 2020-04-10 13:37:04 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
541803bb52
changelog for ByteString Ref 2020-04-08 14:10:29 -04:00
Joey Hess
7ebc118776
adb: Better messages when the adb command is not installed
After a user completely ignored the display of the exception probably
because it didn't make sense..

This does make it a little bit slower since it checks adb is in path each
time before running it. Also, it might display a lot of warnings about it
not being installed.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-04-02 10:48:28 -04:00
Joey Hess
87d5583a91
use programPath consistently, not readProgramFile
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.

Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.

I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
2020-03-30 16:06:27 -04:00