Commit graph

36 commits

Author SHA1 Message Date
Joey Hess
9483b10469
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.

Especially when the --all optimisation in the previous commit
pre-cached the location log.

This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.

The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.

But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.

Clearly there could be some further benchmarking and tuning here.
2020-07-07 14:18:55 -04:00
Joey Hess
98e2e3cb9c
optimise logfile to key parsing
Using bytestring-filepath
2020-07-07 13:03:33 -04:00
Joey Hess
879f52a116
annex.tune.branchhash1=true bugfix
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
2020-02-14 15:22:48 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
12e4906657
refactor
locationLogFileKey had an out of date list of toplevel log files to skip
over, and was only not broken because the other toplevel log files don't
look like keys. Fixed that too.
2019-03-06 17:54:29 -04:00
Joey Hess
d65a78ff5b
Fix cleanup of git-annex:export.log after git-annex forget --drop-dead
This log, unlike all other current top-level logs, is a new format log.

I have not checked what throwing it at the old log parser did, but it seems
likely it ignored unparsable lines, and so perhaps deleted all lines from
the log.
2019-02-22 21:34:31 -04:00
Joey Hess
9887a378fe
renamings to make clean when old-format logs are being used 2019-02-21 13:43:44 -04:00
Joey Hess
e8bfc3640b
storing ContentIdentifier in the git-annex branch 2019-02-20 15:40:07 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
0a7c5a9982
dropdead per-remote metadata
Had to refactor pure code into separate modules so it is accessible
inside Annex.Branch.Transitions.

This commit was sponsored by Peter on Patreon.
2018-09-05 13:52:46 -04:00
Joey Hess
5c99f6247e
per-remote metadata storage
Actually very straightforward reuse of the metadata log file code.
Although I had to add a todo item as git-annex forget won't clean up
dead remote's metadata yet.

This would be worth adding to the external special remote interface
sometime. Have not opened a todo though, guess I'll wait until something
needs it.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 12:23:22 -04:00
Joey Hess
978885247e
implement export.log and resolve export conflicts
Incremental export updates work now too.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-08-31 15:47:23 -04:00
Joey Hess
c3970f6c1a
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting.
This commit was supported by the NSF-funded DataLad project.
2017-03-30 19:35:30 -04:00
Joey Hess
339464e847
config: New command for storing configuration in the git-annex branch.
Any config names can be set using this; git-annex commands will only look
at specific ones that make sense and are worth the overhead of querying the
branch.

This might also be useful for storing whatever other config-type stuff the
user might want to shove into the git-annex branch.

This commit was sponsored by Jochen Bartl on Patreon.
2017-01-30 16:46:38 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
9445556c97 rethought distributed fsck; instead add activity.log and expire command
This is much more space efficient!
2015-04-05 12:50:02 -04:00
Joey Hess
b0575c621f implement annex.tune.branchhash1
I hope this doesn't impact speed much -- it does have to pull out a value
from Annex state every time it accesses the branch now.

The test case I dropped has never caught any problems that I can remember,
and would have been rather difficult to convert.
2015-01-28 17:17:26 -04:00
Joey Hess
e8c376e0ad import Data.Default in Common 2015-01-28 16:11:28 -04:00
Joey Hess
0fd5f257d0 groundwork for parameterizing hash depth 2015-01-28 15:55:17 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
b61c6bc2ff hlint 2014-10-09 15:46:05 -04:00
Joey Hess
9fd95d9025 indent with tabs not spaces
Found these with:
git grep "^  " $(find -type  f -name \*.hs) |grep -v ':  where'

Unfortunately there is some inline hamlet that cannot use tabs for
indentation.

Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
2014-10-09 15:09:26 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
e2c44bf656 implement chunk logs
Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.

This commit was sponsored by Frank Thomas.
2014-07-24 16:23:36 -04:00
Joey Hess
fe19e15040 reorg matcher types; no non-type code changes 2014-03-29 14:43:34 -04:00
Joey Hess
431d805a96 factored out a generic MapLog from uuid-based logs
UUIDBased is just a MapLog with a UUID for the field.
2014-03-15 13:45:25 -04:00
Joey Hess
9f7e76130e add metadata command to get/set metadata
Adds metadata log, and command.

Note that unsetting field values seems to currently be broken.
And in general this has had all of 2 minutes worth of testing.

This commit was sponsored by Julien Lefrique.
2014-02-12 21:30:33 -04:00
Joey Hess
40cec65ace more hlint 2014-02-11 10:48:52 -04:00
Joey Hess
d66535f065 global numcopies setting
* numcopies: New command, sets global numcopies value that is seen by all
  clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
  command is used to set the global number of copies, any annex.numcopies
  git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.

This global numcopies setting is needed to let preferred content
expressions operate on numcopies.

It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)

Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.

This commit was sponsored by Nicolas Pouillard.
2014-01-20 16:47:56 -04:00
Joey Hess
3e68c1c2fd add remote state logs
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.

GETSTATE and SETSTATE are added to the external special remote protocol.

Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.

The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.

This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.

This commit was sponsored by Daniel Hofer.
2014-01-03 16:35:57 -04:00
Joey Hess
29ca49dad4 add a log file for scheduled activities 2013-10-07 16:06:34 -04:00
Joey Hess
62beaa1a86 refactor git-annex branch log filename code into central location
Having one module that knows about all the filenames used on the branch
allows working back from an arbitrary filename to enough information about
it to implement dropping dead remotes and doing other log file compacting
as part of a forget transition.
2013-08-29 19:13:00 -04:00