Commit graph

2195 commits

Author SHA1 Message Date
Joey Hess
ea3cb7d277
fix a case where file tracked by git unexpectedly becomes annex pointer file
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
2019-12-27 15:08:03 -04:00
Joey Hess
de14a7bab5
didn't mean to commit this incomplete workaround
though I suppose it's nice to have it in the history..
2019-12-26 15:07:50 -04:00
Joey Hess
293f95c2d6
analysis 2019-12-26 15:05:36 -04:00
Joey Hess
37467a008f
annex.addunlocked expressions
* annex.addunlocked can be set to an expression with the same format used by
  annex.largefiles, in case you want to default to unlocking some files but
  not others.
* annex.addunlocked can be configured by git-annex config.

Added a git-annex-matching-expression man page, broken out from
tips/largefiles.

A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.

Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
2019-12-20 15:56:25 -04:00
Joey Hess
5591622731
git-annex-config --set/--unset: No longer change the local git config setting
e53070c1f quietly made it set the local git config too, but that was never
documented anywhere, and it had surprising results. If I set
annex.largefiles globally in a repo, I would expect to be able to change it
in another repo, and the original repo would get the change and use it,
rather than being stuck on the old value set there.

And, if I have a local annex.largefiles and set a different global default,
I'd be surprised to have my local setting overwritten.

annex.securehashesonly does need to be set locally, since it's a security
feature and the global is only a default until it gets set locally. So
special cased.
2019-12-20 13:17:28 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
7d9dff5b05
Merge branch 'master' into bs
and update changelog
2019-12-18 15:13:30 -04:00
Joey Hess
7fd5376334
inprogress: Support --key 2019-12-18 14:14:16 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
a7004375ec
avoid deprecation warning 2019-12-06 15:47:56 -04:00
Joey Hess
a0168cd9a2
use RawFilePath getSymbolicLinkStatus for speed 2019-12-06 15:42:54 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
0e9d699ef3
use R.readSymbolicLink
This will be faster once gitAnnexLink is converted to a RawFilePath.
2019-12-06 14:20:18 -04:00
Joey Hess
3266ad3ff7
everything is building again
However, the test suite fails some quickchecks, so this branch is not
yet in a mergeable state.
2019-12-05 15:10:23 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
3c7fd09ec8
get many more commands building again
about half are building now
2019-12-05 11:40:10 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
f3047d7186
include git-annex-shell back in
Also pushed ConfigKey down into the Git modules, which is the bulk of
the changes.
2019-12-02 11:51:52 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
25ba8156bc
improve benchmark --databases
* benchmark: Changed --databases to take a parameter specifiying the size
  of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
  benchmark to really add new values, not overwriting old values.
2019-11-21 17:25:20 -04:00
Joey Hess
6f35b576d7
encourage use of import from directory special remote rather than legacy interface 2019-11-19 13:30:27 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
0be23bae2f
refactor
Better to not have a single function module, and better to have a more
specific type than Bool.

This commit was sponsored by Jack Hill on Patreon
2019-11-11 19:10:52 -04:00
Joey Hess
3b34d123ed
Added annex.allowsign option.
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-11-11 16:28:56 -04:00
Joey Hess
25f912de5b
benchmark: Add --databases to benchmark sqlite databases
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
2019-10-29 16:59:27 -04:00
Joey Hess
4a3f3a2cb5
make git add only annex when configured by annex.largefiles 2019-10-24 14:17:29 -04:00
Joey Hess
168f91efec
avoid warning over name 2019-10-24 11:46:40 -04:00
Joey Hess
bd197be3ad
annex.gitaddtoannex configuration
Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)

Currently left at true; some users are encouraging it be set to false.
2019-10-23 15:29:46 -04:00
Joey Hess
ec08b66bda
shouldAnnex: check isInodeKnown
Renamed unlocked files are now detected, and will always be
annexed, unless annex.largefiles disallows it.

This allows for git add's behavior to later be changed to otherwise
not annex files (whether by default or as a config option), without
worrying about the rename case.

This is not a major behavior change; annexing is still the default. But
there is one case where the behavior is changed, I think for the better:

	touch f
	git -c annex.largefiles=nothing add f
	git add bigfile
	git commit -m ...
	mv bigfile f
	git add f

Before, git-annex would see that f was previously not annexed,
and so the renamed bigfile content gets added to git. Now, it notices
that the inode is the one that bigfile used, and so it annexes it.

This potentially slows down git add a lot in some repositories because
of the poor performance of isInodeKnown when there are a lot of unlocked
files. Configuring annex.largefiles avoids the speed hit.
2019-10-23 14:49:45 -04:00
Joey Hess
3d4aab38ce
remove obsolete comment 2019-10-21 13:51:38 -04:00
Joey Hess
668b878995
remove recently added and unncessary cwd parameter
I later made Utility.Su change back to the cwd, so this parameter is not
needed.
2019-10-21 13:48:52 -04:00
Joey Hess
9a5d9019ba
Deal with pkexec changing to root's home directory when running a command.
Wow, that's not documented anywhere, and seems like a major gotcha in
pkexec.

Broke enable-tor.
2019-10-21 12:39:19 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
debafcba2b
autoenable sameas remotes 2019-10-11 15:52:40 -04:00
Joey Hess
ec778888d2
got enableremote working for sameas
Also the assistant can enable sameas remotes, should work, but not
tested.
2019-10-11 15:11:08 -04:00
Joey Hess
35d7ffe128
initremote --sameas fully working
And using sameas remotes is working.

Moved annex-config-uuid setting out of Remote.Helper.Special.
EnableRemote will also have to set it.
2019-10-11 14:19:10 -04:00
Joey Hess
91eed85fd4
add sameas inherited configs to newConfig
This makes initremote --sameas work with encryption inherited.
2019-10-11 13:05:20 -04:00
Joey Hess
59908586f4
rename RemoteConfigKey to RemoteConfigField
And some associated renames.
I was going to have some values named fooKeyKey otherwise..
2019-10-10 15:44:05 -04:00
Joey Hess
d1130ea04a
get rid of hardcoded "name" lookups
Support "sameas-name" being set instead.

In RenameRemote, rename which ever of the two is set.
2019-10-10 13:25:10 -04:00
Joey Hess
97b499a4dc
use sameas-name and sameas-uuid for sameas remotes
initremote --sameas=remotename sets sameas-name and sameas-uuid

Using sameas-name rather than name prevents old git-annex initremote
from enabling a sameas remote by name, since it would not handle it
correctly.
2019-10-10 12:32:05 -04:00
Joey Hess
61b384d2b7
add --sameas option, not yet used 2019-10-01 12:36:25 -04:00
Joey Hess
2b55a2b882
remotedaemon: Don't list --stop in help since it's not supported.
Also, move out of plumbing section. When using tor, the remotedaemon is
part of the user's workflow, as it runs the tor hidden service.
2019-09-30 14:40:46 -04:00
Joey Hess
090898a138
adjust --lock: This enters an adjusted branch where files are locked.
Straightforward, except for the issue of how to reverse LockAdjustment.

With --unlock, a commit that modifies/adds unlocked files gets reverse
adjusted to use locked files. That's fairly reasonable, I think.

But reversing --lock by unlocking all modified files feels wrong. Maybe
that's just because repositories typically seem to still have mostly
locked files in them (unless one is in an adjusted unlocked branch of
course!)

It may be that eventually how to reverse both will need to be configurable,
I don't know.
2019-09-27 14:23:25 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
99b509572d
post-receive hook updateInstead emulation cleanup
The code is only needed because for a long time, git-annex didn't
install hooks in repos on crippled filesystems. Now it does, and they
work at least on FAT (where all files are executable) and Windows.

It would be possible to remove this code in v8 simply by re-installing
the hooks.
2019-09-11 14:41:51 -04:00
Joey Hess
061231621e
Merge branch 'master' into v7-default 2019-09-10 16:06:43 -04:00
Joey Hess
0af7ebdc2a
info: Display trust level when getting info on a uuid, same as on a remote. 2019-09-01 16:48:46 -04:00
Joey Hess
f845195354
Added annex.autoupgraderepository configuration
Can be set to false to prevent any automatic repository upgrades.

Also, removed direct mode specific upgrade code in Annex.Init, and made
needsUpgrade always include the name/path of the repo, so if
there's a problem it's clear what repo has the problem.

And, made needsUpgrade catch any exceptions that might occur during the
upgrade, so it can display a more useful error message than just the
exception.
2019-09-01 13:42:26 -04:00