Commit graph

43 commits

Author SHA1 Message Date
Joey Hess
7b2d236556
importfeed: stream metadata for 5% speedup
On top of the 10% speedup from streaming url logs.
2020-07-14 14:35:26 -04:00
Joey Hess
535cdc8d48
importfeed: Made checking known urls step around 10% faster.
This was a bit disappointing, I was hoping for a 2x speedup. But, I think
the metadata lookup is wasting a lot of time and also needs to be made to
stream.

The changes to catObjectStreamLsTree were benchmarked to not also speed
up --all around 3% more. Seems I managed to make it polymorphic after all.
2020-07-14 12:47:51 -04:00
Joey Hess
9f6bd6cc05
add inRepoDetails
planned to use for an optimisation

most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
2020-07-08 15:36:35 -04:00
Joey Hess
7347e50123
add stage number to stagedDetails parser
And convert parser to attoparsec, probably faster.

Before, a parse failure threw the whole --stage output line in to the
filename, which was certianly a bad idea, so fixed that.
2020-07-08 15:05:12 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominiates over a MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other damon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
bfc9039ead
convert git-annex branch access to ByteStrings and Builders
Most of the individual logs are not converted yet, only presense logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
2019-01-03 13:21:48 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
3febb79c8f
wip 2017-11-28 17:17:40 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
1f3358512a
refactor 2016-01-19 15:55:32 -04:00
Joey Hess
656fc1c881 fsck: Added --distributed and --expire options, for distributed fsck. 2015-04-01 17:53:16 -04:00
Joey Hess
9e25cbde20 importfeed: Avoid downloading a redundant item from a feed whose guid has been downloaded before, even when the url has changed.
To support this, always store itemid in metadata; before this was only done
when annex.genmetadata was set.
2015-03-31 13:30:13 -04:00
Joey Hess
f0195b2a43 Fix GETURLS in external special remote protocol to strip downloader prefix from logged url info before checking for the specified prefix.
This doesn't change what GETURLS returns, but only whether it matches
any prefix that the external special remote asked for.
2015-03-27 18:49:03 -04:00
Joey Hess
6045406deb Added SETURIPRESENT and SETURIMISSING to external special remote protocol
Useful for things like ipfs that don't use regular urls.

An external special remote can add a regular url to a key, and then
git-annex get will download it from the web. But for ipfs, we want to
instead tell git-annex that the uri uses OtherDownloader. Before this
change, the external special remote protocol lacked a way to do that.
2015-03-05 13:50:15 -04:00
Joey Hess
b0575c621f implement annex.tune.branchhash1
I hope this doesn't impact speed much -- it does have to pull out a value
from Annex state every time it accesses the branch now.

The test case I dropped has never caught any problems that I can remember,
and would have been rather difficult to convert.
2015-01-28 17:17:26 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
589a048a7d fix addurl behavior when location and url logs are inconsistent
The url log could have an url for a key, while the location log thinks it's
not present in the web. In this case, addurl --file url would not do
anything. Fixed it to re-add the web as a location.

I don't know how this situation could arise, but I saw it in the wild in
the conference_proceedings repo, affecting key
URL-s17806003--http://mirror.linux.org.au/pub/linux.conf.au/2014/Wednesday/53-Building_Effective_Alliances_around_the_Trans-Pacific_Partnershi-c0505b631127ccc67e38e637344d988e
Investigating the presence log, it looked like that key
was originally listed as present in the web, then in commit
56abf9e9f3e691ed9d83513037d4019313321ca3 someone else's git-annex
set it and some other things to not present in the web. It would be
interesting to know what that user did, but I doubt I'll be able to find
out. All I can tell from this investigation is that the inconsistency was
not introduced when originally addurl-ing the url.
2014-12-29 14:22:47 -04:00
Joey Hess
7e422269a6 move dummy uuids to Annex.UUID 2014-12-17 13:57:52 -04:00
Joey Hess
30bf112185 Urls can now be claimed by remotes. This will allow creating, for example, a external special remote that handles magnet: and *.torrent urls. 2014-12-08 19:15:07 -04:00
Joey Hess
cb6e16947d add stub claimUrl 2014-12-08 13:40:15 -04:00
Joey Hess
8093008ef4 External special remote protocol now includes commands for setting and getting the urls associated with a key. 2014-12-08 13:32:46 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
eb42bde19a sync, pre-commit, indirect: Avoid unnecessarily catting non-symlink files from git, which can be so large it runs out of memory. 2013-09-19 14:48:42 -04:00
Joey Hess
62beaa1a86 refactor git-annex branch log filename code into central location
Having one module that knows about all the filenames used on the branch
allows working back from an arbitrary filename to enough information about
it to implement dropping dead remotes and doing other log file compacting
as part of a forget transition.
2013-08-29 19:13:00 -04:00
Joey Hess
824241b6fb better cases 2013-08-22 23:44:13 -04:00
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
6be815a30c rmurl: New command, removes one of the recorded urls for a file. 2013-04-22 17:18:53 -04:00
Joey Hess
ea5d7292e6 dropping from web 2012-11-29 17:01:07 -04:00
Joey Hess
2172cc586e where indenting 2012-11-11 00:51:07 -04:00
Joey Hess
94fcd0cf59 add routes to pause/start/cancel transfers
This commit includes a paydown on technical debt incurred two years ago,
when I didn't know that it was bad to make custom Read and Show instances
for types. As the routes need Read and Show for Transfer, which includes a
Key, and deriving my own Read instance of key was not practical,
I had to finally clean that up.

So the compact Key read and show functions are now file2key and key2file,
and Read and Show are now derived instances.

Changed all code that used the old instances, compiler checked.
(There were a few places, particularly in Command.Unused, and the test
suite where the Show instance continue to be used for legitimate
comparisons; ie show key_x == show key_y (though really in a bloom filter))
2012-08-08 16:20:24 -04:00
Joey Hess
0cbbf0da79 warning 2012-02-18 11:54:47 -04:00
Joey Hess
0fada43808 avoid unnecessary log changes when re-adding the same url 2012-02-17 23:58:56 -04:00
Joey Hess
5bf07b3b5c Store web special remote url info in a more efficient location.
storing it in remotes/web/xx/yy/foo.log meant lots of extra directory
objects in git. Now I use xx/yy/foo.log.web, which is just as unique, but
more efficient since foo.log is there anyway.

Of course, it still looks in the old location too.
2012-02-17 23:15:29 -04:00
Joey Hess
56b8194470 cleanup 2011-11-09 01:33:20 -04:00
Joey Hess
63a292324d add a UUID type
Should have done this a long time ago.
2011-11-07 15:59:16 -04:00
Joey Hess
ee9af605bc break out non-log stuff to separate module 2011-10-15 17:47:03 -04:00
Joey Hess
ec169f84b1 migrate: Copy url logs for keys when migrating. 2011-10-15 16:36:56 -04:00
Joey Hess
b4015064e1 break web log handling into a separate module 2011-10-15 16:25:51 -04:00