Commit graph

2593 commits

Author SHA1 Message Date
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
Joey Hess
57b03630b3
support thirdPartyPopulated
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
2020-12-21 13:49:47 -04:00
Joey Hess
771b6c64f0
Merge branch 'master' into borg 2020-12-18 16:05:09 -04:00
Joey Hess
e0062c4f93
build fix 2020-12-18 16:04:56 -04:00
Joey Hess
909318dcee
Merge branch 'master' into borg 2020-12-18 15:27:24 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
f62aee0525
fix handling of importtree-only remotes
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.

(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
2020-12-18 15:13:30 -04:00
Joey Hess
53fd1564b1
improve synopsis 2020-12-17 12:51:49 -04:00
Joey Hess
2abda21123
update 2020-12-15 16:35:06 -04:00
Joey Hess
f29d49d478
check Remote.hasKeyCheap again
In cd1676d604, it stopped using that to avoid surprising behavior
when the location log and remote content were out of sync.
But, it seems that may have changed some behavior users relied on as
well, and also Remote.hasKeyCheap should be faster than checking then
location log.

So, try Remote.hasKeyCheap first, and only if it does not have the key,
fall back to checking the location log. If the location log still thinks
it's present, go ahead and try to get it, so the user will see a failure
rather than silently skipping a file what whereis says is on the remote.

This does make slightly slower the case where the remote does not have
the key, and location log and Remote.hasKeyCheap agree, since it now
checks both. But only 1 stat slower.
2020-12-15 14:44:00 -04:00
Joey Hess
00526a6739
pass along -c options to child git-annex processes 2020-12-15 10:49:29 -04:00
Joey Hess
ed68a2166d
importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support
This is common in some feeds, which might mix some items with enclosures,
with others that link to posts or whatever. Before this, it would try to
use youtube-dl and fail, or if youtube-dl was not allowed, it would
incorrectly complain that an url was supported by youtube-dl.
2020-12-15 01:13:21 -04:00
Joey Hess
01527b21d8
add key to FileInfo
MatchingKey is not the thing to use when matching on actual worktreee
files.

Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
2020-12-14 17:42:02 -04:00
Joey Hess
4a8723246d
avoid transferrer committing the git-annex branch on shutdown
The parent is will do it when it shuts down, and having both of them
trying to do it at the same time seems like something good to avoid.
2020-12-11 16:16:07 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
cedad7b37d
refactor 2020-12-10 16:33:52 -04:00
Joey Hess
04c12aa6df
custom protocol for transferrer
Rather than using Read/Show, which would force me to preserve data types
into the future.

I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.

So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.

(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)

Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)

This commit was sponsored by Jack Hill on Patreon.
2020-12-09 16:13:59 -04:00
Joey Hess
004a4f5fb1
factor out Types.Transferrer 2020-12-09 13:28:49 -04:00
Joey Hess
677003a6df
rename helper
More consistent name with TransferrerPool
2020-12-09 13:24:24 -04:00
Joey Hess
05c0543e8e
move new interface to git-annex transfer
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-12-09 12:33:56 -04:00
Joey Hess
fcc9e01556
finally using transferkeys
Seems to work! Even progress bars. Have not tested prompting or various
error message displays yet.

transferkeys had to be made to operate in different modes for the
Assistant and Annex monads. A bit ugly, but it did relegate that
really ugly Database.Keys.closeDb in transferkeys to only the assistant
code path.

This commit was sponsored by Noam Kremen.
2020-12-07 16:18:26 -04:00
Joey Hess
4c47568876
refactoring
This is groundwork for using git-annex transferkeys to run transfers,
in order to allow stalled transfers to be interrupted and retried.

The new upload and download are closer to what git-annex transferkeys
does, so the plan is to make them use it.

Then things that were left using upload' and download' won't recover
from stalls. Notably, that includes import and export. But
at least get/move/copy will be able to. (Also the assistant hopefully,
but not yet.)

This commit was sponsored by Jake Vosloo on Patreon.
2020-12-07 14:49:17 -04:00
Joey Hess
438d5be1f7
support prompt in message serialization
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.

At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.

This commit was sponsored by Ethan Aubin.
2020-12-04 14:54:09 -04:00
Joey Hess
7a9b618d5d
fix problem with last commit and assistant
liftAnnex blocks all others calls, so avoid using it with a long-duration
call to readResponse.
2020-12-04 12:20:04 -04:00
Joey Hess
cad147cbbf
new protocol for transferkeys, with message serialization
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.

The new protocol is simple read/show and looks like this:

TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True

Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.

emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.

This commit was sponsored by Ethan Aubin
2020-12-03 16:21:20 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
631c8d3e5b
avoid redundant adjusted branch update in sync
sync still does update it if the config would otherwise not, since it
already did.
2020-11-16 15:13:48 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
26cf26caca
Merge branch 'master' into symlink-missing 2020-11-16 10:03:12 -04:00
Joey Hess
5a8d01f63e
examinekey: Added a "file" format variable
For consistency with find, and for easier scripting.
2020-11-16 09:59:11 -04:00
Joey Hess
ccfa9b2dc4
make sync update --unlock-present branch 2020-11-13 15:04:34 -04:00
Joey Hess
e66b7d2e1b
rename to --unlock-present and better reverse adjusting
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
2020-11-13 14:56:43 -04:00
Joey Hess
3899e216af
Merge branch 'master' into symlink-missing 2020-11-13 14:19:45 -04:00
Joey Hess
a30030c4a6
move: Fix a regression in the last release that made move --to not honor numcopies settings
This commit was sponsored by Svenne Krap on Patreon.
2020-11-13 14:19:32 -04:00
Joey Hess
c8e49c5ef5
git-annex adjust --lock-missing
Like --hide-missing the branch does not get updated when content
availability changes.

Seems to basically work, but sync does not update it yet.

Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
2020-11-13 13:39:44 -04:00
Joey Hess
7566aa6bc5
examinekey: Added --migrate-to-backend
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E

While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
2020-11-12 14:09:14 -04:00
Joey Hess
12e32d1dee
examinekey: Added two new format variables: objectpath and objectpointer 2020-11-12 13:02:31 -04:00
Joey Hess
92b7b1964d
add warning on add of annex link
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.

This commit was sponsored by Jake Vosloo on Patreon.
2020-11-10 12:10:51 -04:00
Joey Hess
e81bb05b25
add debug in two unusual situations 2020-11-09 17:52:06 -04:00
Joey Hess
1db49497e0
finished this stage of the RawFilePath conversion
This commit was sponsored by Denis Dzyubenko on Patreon.
2020-11-06 14:10:58 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
5a1e73617d
finished this stage of the RawFilePath conversion
Finally compiles again, and test suite passes.

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-04 14:20:37 -04:00
Joey Hess
4bcb4030a5
more RawFilePath conversion
580/645

This commit was sponsored by Jack Hill on Patreon.
2020-11-03 18:34:27 -04:00
Joey Hess
eb42cd4d46
more RawFilePath conversion
535/645

This commit was sponsored by Brett Eisenberg on Patreon.
2020-11-03 10:11:04 -04:00
Joey Hess
55400a03d3
more RawFilePath conversion
This commit was sponsored by Luke Shumaker on Patreon.
2020-11-02 16:31:28 -04:00
Joey Hess
87f91ce563
more RawFilePath conversion
451/645
2020-10-30 15:55:59 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
a108b00b33
testremote: Display exceptions when tests fail, to aid debugging 2020-10-23 15:41:57 -04:00
Joey Hess
0133b7e5a8
move: Improve resuming a move that was interrupted after the object was transferred
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.

Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.

An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.

This commit was sponsored by Jake Vosloo on Patreon.
2020-10-21 10:31:56 -04:00
Joey Hess
7036d0a4c1
add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan
This commit was sponsored by Jochen Bartl on Patreon.
2020-10-19 15:36:18 -04:00
Joey Hess
2dd38b6403
switch to Haskell2010
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But it actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"

Fixed two cases where the Haskell98 do indentation flexability let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.

The other place that I think I was concerned about, is GHC manual
section 19.1.1.3. Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-10-19 11:26:16 -04:00
Joey Hess
c56efbbdb6
import: Check gitignores when importing trees from special remotes
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-30 10:41:59 -04:00
Joey Hess
0033e08193
avoid a second traversal of the ImportableContents
Do all filtering in one pass.
2020-09-30 10:10:03 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
1610d94776
addurl: Avoid a redundant git ignores check for speed
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
2020-09-29 13:00:41 -04:00
Joey Hess
658ea7ca3c
sync --no-content import from directory special remote
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)

This commit was sponsored by Svenne Krap on Patreon.
2020-09-28 15:29:08 -04:00
Joey Hess
3eaaec3113
consistently use importKey when available
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.

Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.

Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.

This commit was sponsored by Noam Kremen on Patreon.
2020-09-28 15:27:46 -04:00
Joey Hess
8b74f01a26
split ProvidedInfo and UserProvidedInfo
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.

This is more groundwork for matching largefiles while importing,
without downloading content.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-28 12:12:38 -04:00
Joey Hess
00dbe35fbc
allow matching on files whose content is not present
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.

The real point of this is to not need to provide a dummy content file
when matching.

This commit was sponsored by Martin D on Patreon.
2020-09-28 11:17:46 -04:00
Joey Hess
f624876dc2
remove zombie process in file seeking
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.

It would probably be safe to remove the reapZombies now, but let's wait
and so that in its own commit in case it turns out to cause problems.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-25 11:38:42 -04:00
Joey Hess
ca454c47f2
explicitly wait for a git process
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.

This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.

Only remaining zombies are in CmdLine.Seek
2020-09-25 11:03:12 -04:00
Joey Hess
051e16a945
remove debug print 2020-09-24 15:37:39 -04:00
Joey Hess
d89984b121
sync --all avoid unncessary first pass
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.

Thanks to Lukey for identifying the optimisation.

This commit was sponsored by Brock Spratlen on Patreon.
2020-09-24 15:12:09 -04:00
Joey Hess
b45b37b088
wait for first pass to complete before second pass
Otherwise the bloom filter may not be fully populated when the second
pass starts, which could have led to incorrect behavior with --all -J,
probably in very rare circumstances.
2020-09-24 14:23:25 -04:00
Joey Hess
167da965b9
remove obsolete comment 2020-09-24 14:22:56 -04:00
Joey Hess
c1b4d76e6b
make MatchFiles introspectable
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/

Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.

This commit was sponsored by Svenne Krap on Patreon.
2020-09-24 14:01:53 -04:00
Joey Hess
5cfcf1f05f
cache remote.log
Unlikely to speed up any of the existing uses much, but I want to use it
in a message that might be displayed many times.
2020-09-22 13:52:26 -04:00
Joey Hess
3457b526ef
make git-annex add --no-check-gitignore not skip ignored files, same as with --force 2020-09-18 13:33:35 -04:00
Joey Hess
d0b06c17c0
Added --no-check-gitignore option for finer grained control than using --force.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.

(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)

addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.

In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
2020-09-18 13:19:13 -04:00
Joey Hess
fcf5d11c63
add "input" field to json output
The use case of this field is mostly to support -J combined with --json.
When that is implemented, a user will be able to look at the field to
determine which of the requests they have sent it corresponds to.

The field typically has a single value in its list, but in some cases
mutliple values (eg 2 command-line params) are combined together and the
list will have more.

Note that json parsing was already non-strict, so old git-annex metadata
--json --batch can be fed json produced by the new git-annex and will
not stumble over the new field.
2020-09-15 16:22:44 -04:00
Joey Hess
2a3c2b1843
use Branch.name instead of hard coding the branch name
Makes much more clear why ActionItemOther is being passed "git-annex".
2020-09-15 15:47:22 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
f4c4b89aa3
refactor
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.

This commit was sponsored by Ryan Newton on Patreon.
2020-09-07 13:26:16 -04:00
Joey Hess
46eb48d7c0
Retry transfers to exporttree=yes remotes same as for other remotes
The comment about noRetry is not well-justified, because transfers to many
remotes cannot be resumed, but retries are still allowed for those.
2020-09-04 13:24:08 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
2ca1ff62dc
addurl --file youtube-dl reversion fix
addurl: Fix reversion in 7.20190322 that made --file not be honored when
youtube-dl was used to download media.

8758f9c561 was on the right track, but missed that | otherwise prevented
the code it added from being used.

Also, refactored out a common function.

This commit was sponsored by Graham Spencer on Patreon.
2020-08-25 12:56:45 -04:00
Joey Hess
4c58433c48
avoid using MonadFail in ParseDuration
There's no instance for Either String, so that makes it not as useful as
it could be, so instead just return an Either String.
2020-08-15 15:53:35 -04:00
Joey Hess
5d380c6c5c
when workTreeItems finds a problem with a parameter, don't go on to process it
Part of workTreeItems is trying detect a case
where git porcelain refuses to process a file, and where
git ls-files silently outputs nothing. But, it's hard to perfectly
replicate git's behavior, and besides, git's behavior could change.
So it could be that we warn, but then git ls-files does not skip over
it, and so git-annex also processes it after warning about it.

So, if we think we have a problem with a parameter, display the warning,
and skip processing it at all.

Implementing this was complicated by needing to handle the case where
all command-line parameters get filtered out this way. Which is
different than the case where there are none, because we don't want to
operate on all files in this new case..
2020-08-06 13:47:45 -04:00
Joey Hess
283d2f85d1
importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_'
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.

Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
2020-08-05 11:35:00 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
2a45b5ae9a
avoid failure to lock content of removed file causing drop etc to fail
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.

This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unncessary
inAnnex check for every file.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-07-25 11:59:33 -04:00
Joey Hess
c30fd24d91
add back inAnnex check after seeking
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.

So, add back otherwise redundant inAnnex check.
2020-07-25 11:18:50 -04:00
Joey Hess
18f1fb5841
drop performance improvements
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.

Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.

Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.

This commit was sponsored by Ryan Newton on Patreon.
2020-07-24 13:27:46 -04:00
Joey Hess
a01aa214be
enable location log precaching for mirror
It will be some perf increase, but the command is not much used so I
have not bothered to benchmark it.
2020-07-24 13:19:24 -04:00
Joey Hess
d732ef1a89
move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x. 2020-07-24 12:56:02 -04:00
Joey Hess
00865cdae8
Fix a bug in find --branch in the previous version
inAnnex check was lost for that code path. To avoid more such mistakes,
made withKeyOptions check it when the AnnexedFileSeeker specifies.
2020-07-24 12:05:28 -04:00
Joey Hess
2d771a7d32
add back inAnnex check for keys options
Lost in recent commit.
2020-07-24 11:49:15 -04:00
Joey Hess
4685612f43
small git-annex get speedup
Remove an redundant inAnnex check. The checkContentPresent handles that,
and after the last commit also does in batch mode.
2020-07-22 14:29:30 -04:00
Joey Hess
1be92381ec
unify batch mode with non-batch by using AnnexedFileSeeker 2020-07-22 14:23:28 -04:00
Joey Hess
abd56fb019
Fix a bug in find --batch in the previous version. 2020-07-20 19:50:53 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
a7156b875c
fix fsck reversion
75aab72d23 made fsck skip files whose
content is not present, but it should complain if there are not enough
copies.
2020-07-15 11:21:43 -04:00
Joey Hess
9c23f99d45
add back missing check that content is present
Lost in 75aab72d23 and some related
commits. unannex skips files whose content is not present.
2020-07-15 11:15:28 -04:00
Joey Hess
377866d884
remove unused import 2020-07-14 14:37:40 -04:00
Joey Hess
7b2d236556
importfeed: stream metadata for 5% speedup
On top of the 10% speedup from streaming url logs.
2020-07-14 14:35:26 -04:00
Joey Hess
75aab72d23
mostly done with location log precaching
Some nice wins.
2020-07-13 17:04:02 -04:00
Joey Hess
df58609804
convert sync to use seekFilteredKeys
This only speeds up sync --content from 34.75 to 33.17 seconds;
location log precaching will probably be a bigger win.
2020-07-13 15:02:52 -04:00
Joey Hess
88a7fb5cbb
convert all applicable commands to new 2x faster annexed file seeking
This removes all calls to inAnnex, except for some involving --batch.
It may be that the batch code could get a similar speedup, but I don't
know if people habitually pass a huge number of files through --batch
that git-annex does not need to do anything to process, so I skipped it
for now.

A few calls to ifAnnexed remain, and might be worth doing more to
convert. In particular, Command.Sync has one that would probably speed
it up by a good amount.

(also removed some dead code from Command.Lock)
2020-07-10 15:45:38 -04:00
Joey Hess
7a42a47902
renaming 2020-07-10 14:17:35 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
4229713e63
importfeed: Added some additional --template variables for date and time
This commit was sponsored by Ethan Aubin.
2020-06-24 14:24:50 -04:00
Joey Hess
7757c0e900
Honor annex.largefiles when importing a tree from a special remote.
This commit was sponsored by Martin D on Patreon.
2020-06-23 16:07:18 -04:00
Joey Hess
5098236c6b
testremote: Fix over-allocation of resources and bad caching
Including starting up a large number of external special remote processes.
(Regression introduced in version 8.20200501)
2020-06-22 14:25:49 -04:00
Joey Hess
aa1ad0b7ca
remove redundant imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:05:34 -04:00
Joey Hess
d5451afc8f
fix deadlock
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.

Known to happen on vfat filesystems, possibly others.

Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.

Apparently that doesn't happen, but will need to keep an eye on it.
2020-06-18 12:56:29 -04:00
Joey Hess
96f6aa39dd
add runsGitAnnexChildProcess calls
This is all the calls to git-annex that seem capable of possibly locking
the same pidlock as their parent. Except possibly for some in the
assistant.
2020-06-17 15:31:03 -04:00
Joey Hess
c4f2c56f5e
checkpresentkey: fix behavior to match documentation
checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.

Still try the logged remotes first, because they are far more likely to
have the key.
2020-06-16 13:54:26 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
92f775eba0
convert to withCreateProcess for async exception safety
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.

Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.

checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess calls waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
2020-06-03 15:48:09 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
484a74f073
auto-init autoenable=yes
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.

Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.

Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.

That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.

(Had to split out Remote.List.Util to avoid an import cycle.)
2020-05-27 12:40:35 -04:00
Joey Hess
3824645368
change to new waitForAllRunningCommandActions
waitForAllRunningCommandActions is a subset of finishCommandActions
and more appropriate for what is being done here: Just a concurrency
barrier.
2020-05-26 14:00:51 -04:00
Joey Hess
864ba4ecaa
disable buggy concurrency in Command.Export
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.

Crash as described in fixed bug report.

waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.

For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
2020-05-26 13:54:08 -04:00
Joey Hess
e04a931439
improve transfer stages for some commands
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.

transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.

As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
2020-05-26 11:55:50 -04:00
Joey Hess
0d82a88742
drop: use commandStages, not transferStages
I cannot find any rationalle for why this was changed before.
drop certianly does not do any transfers, so commandStages will perform
better.
2020-05-26 11:47:54 -04:00
Joey Hess
0bcecb67f5
export: Let concurrent transfers be done with -J or annex.jobs
Tested working, although I did find this bug in my testing, which also
afflicts sync -J to an export remote.
2020-05-26 11:44:07 -04:00
Joey Hess
f7fe71602c
import: Added --json-progress
Already supported --json, but not that.

Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
2020-05-26 11:27:47 -04:00
Joey Hess
5b8524e1e6
addurl: Make --preserve-filename also apply when eg a torrent contains multiple files
Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath
here.
2020-05-26 10:45:57 -04:00
Joey Hess
fc9833f68d
export: Added options for json output
Just worked, no need to do anything except add the options.
2020-05-26 10:31:10 -04:00
Joey Hess
d7c7245438
whereis: Added --format option.
One way this can be used is to remove all urls for some website that went
away:

git-annex whereis --format '${file} ${url}\0' | \
	grep -z whatever.com | git-annex rmurl --batch -z

Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
2020-05-19 16:20:56 -04:00
Joey Hess
6361074174
convert renameExport to throw exception
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.

A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?

However, it did let me clean up several places in the code.

This commit was sponsored by Ethan Aubin.
2020-05-15 15:08:09 -04:00
Joey Hess
037440ef36
convert removeExportDirectory to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 14:43:18 -04:00
Joey Hess
cdbfaae706
change removeExport to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Graham Spencer on Patreon.
2020-05-15 14:15:14 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
4814b444dd
make storeExport throw exceptions 2020-05-15 12:20:02 -04:00
Joey Hess
dc7dc1e179
refactor 2020-05-14 14:21:58 -04:00
Joey Hess
4be94c67c7
make removeKey throw exceptions 2020-05-14 14:11:05 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
39d7e6dd2a
addurl --preserve-filename for other remotes
Finishing work begun in 6952060665

Also, truncate filenames provided by other remotes if they're too long,
when --preserve-filename is not used. That seems to have been omitted
before by accident.
2020-05-11 14:33:27 -04:00
Joey Hess
5f5170b22b
remove SafeFilePath
Move sanitizeFilePath call to where fromSafeFilePath had been.
2020-05-11 14:04:56 -04:00
Thomas Koch
8a0480daf3 Fix haddock parse error
I run haddock with `cabal haddock --executables`. It fails with:

    Types/Remote.hs:271:17: error: parse error on input ‘->’

Apparently haddock does not like to find haddock blocks outside of
declarations? In any case, this patch makes these type of errors go
away.

Afterwards, I see errors like these, that need to be investigated as
a next step:

haddock: internal error: internal: extractDecl
CallStack (from HasCallStack):
  error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create
2020-05-11 08:40:13 +02:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
0040d2c129
sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from
Now the warning gets displayed, which is better than an arcane git error.

The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.

Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
2020-05-05 14:31:37 -04:00
Joey Hess
9fa940569c
added remote variants
Todo item is done at last.

Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..
2020-04-30 13:52:03 -04:00
Joey Hess
fc1ae62ef1
added export remote tests 2020-04-30 13:13:08 -04:00
Joey Hess
735d2e90df
testremote in test is working
Not yet testing export, or remote variants, but it already adds several
hundred test cases, so big win.
2020-04-30 12:59:20 -04:00
Joey Hess
d7db481471
wip
This does not compile, and I hit a bad dead end. Wah.
2020-04-29 15:48:39 -04:00
Joey Hess
20f954c3b2
groundwork for adding testremote to git-annex test
Factored out a mkTestTree, which can be used to get a TestTree,
w/o needing to first run any annex actions, which the main test suite
cannot do because it does not operate in an annex repo to start with,
and it needs to start testing before a repo is available.
2020-04-29 13:16:43 -04:00
Joey Hess
fa98025de0
fix testremote to not throw away annex state
aeca7c2207 exposed this problem, but it
was never a good idea to have a series of test cases, some of which depend on
prior ones, and throw away annex state after each.
2020-04-28 17:19:07 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
c05c4e549e
sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes
Do not sync with a faster remote that was not specified.

That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
2020-04-23 16:08:45 -04:00
Joey Hess
cd1676d604
fix bug involving local git remote and out of date location log
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.

I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.

get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.

Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.

(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
2020-04-21 12:36:58 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
957a87b437
fix absolute filenames fed into --batch and git-annex info 2020-04-15 16:04:05 -04:00
Joey Hess
f85ca7dc80
fix all remaining -Wincomplete-uni-patterns warnings
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
2020-04-15 13:55:08 -04:00
Joey Hess
9cb69dbb76
support boolean git configs that are represented by the name of the setting with no value
Eg"core.bare" is the same as "core.bare = true".

Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
2020-04-13 13:35:22 -04:00
Joey Hess
ca9c6c5f60
Fix a potential failure to parse git config
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.

So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
2020-04-13 13:05:41 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
c0cd07c36b
Ref ByteString conversion done
Test suite passes.
2020-04-07 17:41:09 -04:00
Joey Hess
435c722904
remove unused import 2020-03-30 16:07:10 -04:00
Joey Hess
87d5583a91
use programPath consistently, not readProgramFile
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.

Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.

I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
2020-03-30 16:06:27 -04:00
Joey Hess
0e4d80d5c1
remove pre-commit hook
This was originally added so that unannex could prevent the hook from
running while files were in a state that the hook would interpret as
old-style unlocked and so would lock.

Now that's gone, so the only thing the hook was preventing was two
pre-commit processes running simulantaneously. But such concurrency
is normal in git-annex and should not be a problem.

Does mean that .git/hooks/pre-commit-annex might run more concurrently,
that seems the only risk of it causing any problems.
2020-03-30 11:54:04 -04:00
Kyle Meyer
39131b55ca
add --force-small: Send all non-regular files through addFile
Running `git annex add --force-small` on a modified submodule fails
when the submodule path is fed to hash-object.  This failure is
unlikely to be triggered by a caller passing a submodule explicitly to
`git annex add` because there's nothing useful that annex-add can do
with a submodule.  A more likely scenario for hitting this failure is
that the caller passes "." or a subdirectory to `annex-add` while a
submodule underneath the specified path happens to be modified.

addSmallOverridden already routes symbolic links through addFile
rather than using the custom hash-object/update-index call.  The
latter is valid only for regular files, so extend this condition so
that everything that isn't a regular file goes through addFile.  Doing
so avoids the above error because submodules come in as directories.
2020-03-26 13:14:16 -04:00
Kyle Meyer
339aebc6ad
add --force-small: Don't dereference link when checking file status
addSmallOverridden calls getFileStatus and then checks the result with
isSymbolicLink.  getFileStatus dereferences symbolic links, so
isSymbolicLink will always return false (assuming the getFileStatus
call doesn't fail on a broken link).  Use getSymbolicLinkStatus
instead.
2020-03-26 13:11:27 -04:00
Joey Hess
afe72d04ff
fix problems with upgrade of local remotes
Upgrade other repos than the current one by running git-annex upgrade
inside them, which avoids problems with upgrade code making assumptions
that the cwd will be inside the repo being upgraded.

In particular, this fixes a problem where upgrading a v7 repo to v8 caused
an ugly git error message.

I actually could not find a way to make Upgrade.V7 work properly
without changing directory to the remote. Once I got git ls-files to work,
the git cat-file failed because :path can only be used in the current git
repo.
2020-03-09 16:49:28 -04:00
Joey Hess
1978a24207
Fix bug that caused unlocked annexed dotfiles to be added to git by the smudge filter when annex.dotfiles was not set. 2020-03-09 14:20:02 -04:00
Joey Hess
01acb5212e
fix build 2020-03-09 12:31:14 -04:00
Joey Hess
7f992ef59c
mostly finished with createDirectoryUnder conversion
Remaining things needing converted are in the assistant, and Annex.Ssh.

Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
2020-03-06 11:57:15 -04:00
Joey Hess
eaa49ab53d
convert replaceFile to createDirectoryUnder
Since it was used on both worktree and .git/annex files, split into
multiple functions.

In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
2020-03-06 11:31:01 -04:00
Joey Hess
ccd8c43dc8
git-annex config: guard against non-repo-global configs
git-annex config: Only allow configs be set that are ones git-annex
actually supports reading from repo-global config, to avoid confused users
trying to set other configs with this.
2020-03-02 15:54:18 -04:00
Joey Hess
f6d629e483
changelog and minor style 2020-02-28 12:57:55 -04:00
Peter Simons
73cf523a4b
Fix build with ghc-8.8.x.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
2020-02-28 12:54:20 -04:00
Joey Hess
2366e7fb84
catch whereisKey exception and provide error messages when external programs neglect to
* whereis: If a remote fails to report on urls where a key
  is located, display a warning, rather than giving up and not displaying
  any information.
* When external special remotes fail but neglect to provide an error
  message, say what request failed, which is better than displaying an
  empty error message to the user.
2020-02-27 14:09:18 -04:00
Joey Hess
81e3faf810
Merge branch 'v7' 2020-02-26 18:15:18 -04:00
Joey Hess
d37975357d
Bugfix: export --tracking (a deprecated option) set annex-annex-tracking-branch, instead of annex-tracking-branch.
(cherry picked from commit a3a674d15b)
2020-02-26 18:08:04 -04:00
Joey Hess
8af6d2c3c5
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.

Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)

To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.

And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.

(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 18:05:36 -04:00
Joey Hess
c31e1be781
convert KeySource to RawFilePath 2020-02-21 10:04:44 -04:00
Joey Hess
029c883713
Merge branch 'master' into v8 2020-02-19 14:32:11 -04:00
Joey Hess
79a0435b77
automate remote.name.skipFetchAll
initremote, enableremote: Set remote.name.skipFetchAll when the remote
cannot be fetched from by git, so git fetch --all will not try to use it.
2020-02-19 13:58:26 -04:00
Joey Hess
69f2d1dd43
remoteConfig rework
remoteAnnexConfig will avoid bugs like
a3a674d15b

Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
2020-02-19 13:45:11 -04:00
Joey Hess
a3a674d15b
Bugfix: export --tracking (a deprecated option) set annex-annex-tracking-branch, instead of annex-tracking-branch. 2020-02-19 13:34:24 -04:00
Joey Hess
72959b23e5
remove mention of receive.denyNonFastforwards on push failure
That was added back in 2013 commit 2af652e1b8
and I'm a bit unclear about the reasons.

It seemed that, at the time, receive.denyNonFastforwards=true, which is
the default in a repo created by git init --shared --bare (but not
without --shared), which the assistant did, caused problems syncing.
But even at the time the bug report showed an error message clearly
explaining that it was a non-fast-forward push being denied.

I tried it with the current version, and since git-annex sync pulls
from the bare repo and merges, it pushes a fast-forward. So there's no
failure to push. (There could be one if another push happened after the
pull, but you'd want it to fail then presumably.)

I'm not 100% sure what changed to make it not be a problem, but I know
I've seen this message in many circumstances and I can't ever recall it
having anything to do with any issue that prevented a push.

Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn,
which showed the problem when syncing from a direct mode repo,
and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment
which seems to show the problem was actually a problem pulling,
I think there's a good chance that the problem actually involved direct
mode.
2020-02-19 11:46:24 -04:00
Joey Hess
06f6eb7a70
--only-annex --no-content combination 2020-02-18 12:29:31 -04:00
Joey Hess
a78eb6dd58
sync --only-annex and annex.synconlyannex
* Added sync --only-annex, which syncs the git-annex branch and annexed
  content but leaves managing the other git branches up to you.
* Added annex.synconlyannex git config setting, which can also be set with
  git-annex config to configure sync in all clones of the repo.

Use case is then the user has their own git workflow, and wants to use
git-annex without disrupting that, so they sync --only-annex to get the
git-annex stuff in sync in addition to their usual git workflow.

When annex.synconlyannex is set, --not-only-annex can be used to override
it.

It's not entirely clear what --only-annex --commit or --only-annex
--push should do, and I left that combination not documented because I
don't know if I might want to change the current behavior, which is that
such options do not override the --only-annex. My gut feeling is that
there is no good reasons to use such combinations; if you want to use
your own git workflow, you'll be doing your own committing and pulling
and pushing.

A subtle question is, how should import/export special remotes be handled?
Importing updates their remote tracking branch and merges it into master.
If --only-annex prevented that git branch stuff, then it would prevent
exporting to the special remote, in the case where it has changes that
were not imported yet, because there would be a unresolved conflict.

I decided that it's best to treat the fact that there's a remote tracking
branch for import/export as an implementation detail in this case. The more
important thing is that an import/export special remote is entirely annexed
content, and so it makes a lot of sense that --only-annex will still sync
with it.
2020-02-17 16:33:10 -04:00
Joey Hess
879f52a116
annex.tune.branchhash1=true bugfix
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
2020-02-14 15:22:48 -04:00
Joey Hess
352963690a
fsck --from remote -J concurrency bug
fsck --from remote: Fix a concurrency bug that could make it incorrectly
detect that content in the remote is corrupt, and remove it, resulting in
data loss.
2020-02-14 14:52:15 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
5c6bf1be97
--whatelse is a better name than --describe-other-params
The use case is basically the user having forgotten, so --help would be
best, but it would be quite hard to include this in --help, since it may
even have to spin up an external special remote program.

I also considered --umm but typoed it the first time I tried it as
--uum, and while memorable, it's too cutesy. --whatelse is good because
it explicitly asks, what other params, besides the ones I've given?
2020-01-20 17:04:45 -04:00
Joey Hess
2be4122bfc
include passthrough params in --describe-other-params 2020-01-20 16:53:27 -04:00
Joey Hess
aa949bbb7d
initremote --describe-other-params
Does not yet include descriptions from external special remote programs.
2020-01-20 16:05:51 -04:00
Joey Hess
987076690c
started on --list-params-for 2020-01-15 14:09:30 -04:00
Joey Hess
6a982e38eb
a few more field functions 2020-01-15 12:57:56 -04:00
Joey Hess
2edf0506a5
a few forgotten remote config fields
preferreddir can be used with any special remote, so its parser needs to
be included in the commonFieldParsers.

initremote with uuid= changed to delete that field, so it does not
need to be included in commonFieldParsers. Note that, existing remotes
initialized before this change will have the field in remote.log.
This will not cause problems parsing, because the value will be
Accepted.

Grepping for 'Accepted "' found these, and I'm pretty sure this is all of
them.
2020-01-15 11:22:36 -04:00
Joey Hess
963239da5c
separate RemoteConfig parsing basically working
Many special remotes are not updated yet and are commented out.
2020-01-14 12:35:08 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
6db4aee7df
use --no-abbrev instead of --abbrev=40
This avoids hardcoding the sha size, so when git uses sha256, it will
output the full sha256 and not a truncation to 40 characters.

I reviewed git's history, and while there have been some
bugs with commands not supporting --no-abbrev (eg git diff --no-index
--no-abbrev was broken in git 2.1), none of the commands git-annex
uses will be impacted by those old bugs.
2020-01-07 12:29:37 -04:00
Joey Hess
5e4deb3620
support sha256 git repos
Git will eventually switch to sha2 and there will not be one single
shaSize anymore, but two (40 and 64).

Changed all parsers for git plumbing output to support both sizes of
shas.

One potential problem this does not deal with is, if somewhere in
git-annex it reads two shas from different sources, and compares them
to see if they're the same sha, it would fail if they're sha1 and sha256
of the same value. I don't know if that will really be a concern.
2020-01-07 12:22:19 -04:00
Joey Hess
2de3dddfd2
reinject --known: Fix bug that prevented it from working in a bare repo.
ifAnnexed in a bare repo passes to git cat-file :./filename , which it
refuses to do since the repo is bare.

Note that, reinject somefile someannexedfile in a bare repo silently does
nothing, because someannexedfile is never actually an annexed worktree
file, because the repo is bare.
2020-01-06 14:22:22 -04:00
Joey Hess
2cea674d1e
Merge branch 'master' into v8 2020-01-01 14:26:43 -04:00
Joey Hess
503788238c
add --force-annex/--force-git
options make it easier to override annex.largefiles configuration
(and potentially safer as it avoids bugs like the smudge bug fixed
in the last release)

Deleted some old comments that were posted to the man page discussing such
options.

Updated docs that used -c annex.largefiles to use the options.

Note that addSmallOverridden was needed to avoid the clean filter running
on the file. It would be possible to make addFile also update the index
directly, rather than going via git add. However, it was not necessary,
and I want to avoid breaking on some edge case, particularly if the code in
addSmallOverridden has some oversight.

Also, when annex.addunlocked is set and annex.largefiles does not match a file,
git annex add --force-large works, but git status will then show the file
as added, with a unstaged modification. The unstaged modification adds the
file to git. This is identical behavior to using -c annex.largefiles=nothing
when annex.addunlocked is set. This does not prevent committing what was
intended to be added. I have not gotten to the bottom of why git thinks
the file is modified and runs it through the clean filter in this case.
2020-01-01 14:03:06 -04:00
Joey Hess
ea3cb7d277
fix a case where file tracked by git unexpectedly becomes annex pointer file
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
2019-12-27 15:08:03 -04:00
Joey Hess
3cd3757236
annex.dotfiles
The git add behavior changes could be avoided if it turns out to be
really annoying, but then it would need to behave the old way when
annex.dotfiles=false and the new way when annex.dotfiles=true. I'd
rather not have the config option result in such divergent behavior as
`git annex add .` skipping a dotfile (old) vs adding to annex (new).

Note that the assistant always adds dotfiles to the annex.
This is surprising, but not new behavior. Might be worth making it also
honor annex.dotfiles, but I wonder if perhaps some user somewhere uses
it and keeps large files in a directory that happens to begin with a
dot. Since dotfiles and dotdirs are a unix culture thing, and the
assistant users may not be part of that culture, it seems best to keep
its current behavior for now.
2019-12-26 16:33:39 -04:00
Joey Hess
de14a7bab5
didn't mean to commit this incomplete workaround
though I suppose it's nice to have it in the history..
2019-12-26 15:07:50 -04:00
Joey Hess
293f95c2d6
analysis 2019-12-26 15:05:36 -04:00
Joey Hess
37467a008f
annex.addunlocked expressions
* annex.addunlocked can be set to an expression with the same format used by
  annex.largefiles, in case you want to default to unlocking some files but
  not others.
* annex.addunlocked can be configured by git-annex config.

Added a git-annex-matching-expression man page, broken out from
tips/largefiles.

A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.

Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
2019-12-20 15:56:25 -04:00
Joey Hess
5591622731
git-annex-config --set/--unset: No longer change the local git config setting
e53070c1f quietly made it set the local git config too, but that was never
documented anywhere, and it had surprising results. If I set
annex.largefiles globally in a repo, I would expect to be able to change it
in another repo, and the original repo would get the change and use it,
rather than being stuck on the old value set there.

And, if I have a local annex.largefiles and set a different global default,
I'd be surprised to have my local setting overwritten.

annex.securehashesonly does need to be set locally, since it's a security
feature and the global is only a default until it gets set locally. So
special cased.
2019-12-20 13:17:28 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
7d9dff5b05
Merge branch 'master' into bs
and update changelog
2019-12-18 15:13:30 -04:00
Joey Hess
7fd5376334
inprogress: Support --key 2019-12-18 14:14:16 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
a7004375ec
avoid deprecation warning 2019-12-06 15:47:56 -04:00
Joey Hess
a0168cd9a2
use RawFilePath getSymbolicLinkStatus for speed 2019-12-06 15:42:54 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
0e9d699ef3
use R.readSymbolicLink
This will be faster once gitAnnexLink is converted to a RawFilePath.
2019-12-06 14:20:18 -04:00
Joey Hess
3266ad3ff7
everything is building again
However, the test suite fails some quickchecks, so this branch is not
yet in a mergeable state.
2019-12-05 15:10:23 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
3c7fd09ec8
get many more commands building again
about half are building now
2019-12-05 11:40:10 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
f3047d7186
include git-annex-shell back in
Also pushed ConfigKey down into the Git modules, which is the bulk of
the changes.
2019-12-02 11:51:52 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
25ba8156bc
improve benchmark --databases
* benchmark: Changed --databases to take a parameter specifiying the size
  of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
  benchmark to really add new values, not overwriting old values.
2019-11-21 17:25:20 -04:00
Joey Hess
6f35b576d7
encourage use of import from directory special remote rather than legacy interface 2019-11-19 13:30:27 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
0be23bae2f
refactor
Better to not have a single function module, and better to have a more
specific type than Bool.

This commit was sponsored by Jack Hill on Patreon
2019-11-11 19:10:52 -04:00
Joey Hess
3b34d123ed
Added annex.allowsign option.
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-11-11 16:28:56 -04:00
Joey Hess
25f912de5b
benchmark: Add --databases to benchmark sqlite databases
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
2019-10-29 16:59:27 -04:00
Joey Hess
4a3f3a2cb5
make git add only annex when configured by annex.largefiles 2019-10-24 14:17:29 -04:00
Joey Hess
168f91efec
avoid warning over name 2019-10-24 11:46:40 -04:00
Joey Hess
bd197be3ad
annex.gitaddtoannex configuration
Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)

Currently left at true; some users are encouraging it be set to false.
2019-10-23 15:29:46 -04:00
Joey Hess
ec08b66bda
shouldAnnex: check isInodeKnown
Renamed unlocked files are now detected, and will always be
annexed, unless annex.largefiles disallows it.

This allows for git add's behavior to later be changed to otherwise
not annex files (whether by default or as a config option), without
worrying about the rename case.

This is not a major behavior change; annexing is still the default. But
there is one case where the behavior is changed, I think for the better:

	touch f
	git -c annex.largefiles=nothing add f
	git add bigfile
	git commit -m ...
	mv bigfile f
	git add f

Before, git-annex would see that f was previously not annexed,
and so the renamed bigfile content gets added to git. Now, it notices
that the inode is the one that bigfile used, and so it annexes it.

This potentially slows down git add a lot in some repositories because
of the poor performance of isInodeKnown when there are a lot of unlocked
files. Configuring annex.largefiles avoids the speed hit.
2019-10-23 14:49:45 -04:00
Joey Hess
3d4aab38ce
remove obsolete comment 2019-10-21 13:51:38 -04:00
Joey Hess
668b878995
remove recently added and unncessary cwd parameter
I later made Utility.Su change back to the cwd, so this parameter is not
needed.
2019-10-21 13:48:52 -04:00
Joey Hess
9a5d9019ba
Deal with pkexec changing to root's home directory when running a command.
Wow, that's not documented anywhere, and seems like a major gotcha in
pkexec.

Broke enable-tor.
2019-10-21 12:39:19 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
debafcba2b
autoenable sameas remotes 2019-10-11 15:52:40 -04:00
Joey Hess
ec778888d2
got enableremote working for sameas
Also the assistant can enable sameas remotes, should work, but not
tested.
2019-10-11 15:11:08 -04:00
Joey Hess
35d7ffe128
initremote --sameas fully working
And using sameas remotes is working.

Moved annex-config-uuid setting out of Remote.Helper.Special.
EnableRemote will also have to set it.
2019-10-11 14:19:10 -04:00
Joey Hess
91eed85fd4
add sameas inherited configs to newConfig
This makes initremote --sameas work with encryption inherited.
2019-10-11 13:05:20 -04:00
Joey Hess
59908586f4
rename RemoteConfigKey to RemoteConfigField
And some associated renames.
I was going to have some values named fooKeyKey otherwise..
2019-10-10 15:44:05 -04:00
Joey Hess
d1130ea04a
get rid of hardcoded "name" lookups
Support "sameas-name" being set instead.

In RenameRemote, rename which ever of the two is set.
2019-10-10 13:25:10 -04:00
Joey Hess
97b499a4dc
use sameas-name and sameas-uuid for sameas remotes
initremote --sameas=remotename sets sameas-name and sameas-uuid

Using sameas-name rather than name prevents old git-annex initremote
from enabling a sameas remote by name, since it would not handle it
correctly.
2019-10-10 12:32:05 -04:00
Joey Hess
61b384d2b7
add --sameas option, not yet used 2019-10-01 12:36:25 -04:00
Joey Hess
2b55a2b882
remotedaemon: Don't list --stop in help since it's not supported.
Also, move out of plumbing section. When using tor, the remotedaemon is
part of the user's workflow, as it runs the tor hidden service.
2019-09-30 14:40:46 -04:00
Joey Hess
090898a138
adjust --lock: This enters an adjusted branch where files are locked.
Straightforward, except for the issue of how to reverse LockAdjustment.

With --unlock, a commit that modifies/adds unlocked files gets reverse
adjusted to use locked files. That's fairly reasonable, I think.

But reversing --lock by unlocking all modified files feels wrong. Maybe
that's just because repositories typically seem to still have mostly
locked files in them (unless one is in an adjusted unlocked branch of
course!)

It may be that eventually how to reverse both will need to be configurable,
I don't know.
2019-09-27 14:23:25 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
99b509572d
post-receive hook updateInstead emulation cleanup
The code is only needed because for a long time, git-annex didn't
install hooks in repos on crippled filesystems. Now it does, and they
work at least on FAT (where all files are executable) and Windows.

It would be possible to remove this code in v8 simply by re-installing
the hooks.
2019-09-11 14:41:51 -04:00
Joey Hess
061231621e
Merge branch 'master' into v7-default 2019-09-10 16:06:43 -04:00
Joey Hess
0af7ebdc2a
info: Display trust level when getting info on a uuid, same as on a remote. 2019-09-01 16:48:46 -04:00
Joey Hess
f845195354
Added annex.autoupgraderepository configuration
Can be set to false to prevent any automatic repository upgrades.

Also, removed direct mode specific upgrade code in Annex.Init, and made
needsUpgrade always include the name/path of the repo, so if
there's a problem it's clear what repo has the problem.

And, made needsUpgrade catch any exceptions that might occur during the
upgrade, so it can display a more useful error message than just the
exception.
2019-09-01 13:42:26 -04:00
Joey Hess
3f0eef4baa
v7 for all repositories
* Default to v7 for new repositories.
* Automatically upgrade v5 repositories to v7.
2019-08-30 14:09:14 -04:00
Joey Hess
4f59ac05b6
info: remove "repository mode"
info: Removed the "repository mode" from its output (including the --json
output) since with the removal of direct mode, there is no repository mode.
2019-08-29 14:12:22 -04:00