git-annex

Author	SHA1	Message	Date
Joey Hess	55bf01b788	add equivilant key log for VURL keys When downloading a VURL from the web, make sure that the equivilant key log is populated. Unfortunately, this does not hash the content while it's being downloaded from the web. There is not an interface in Backend currently for incrementally hash generation, only for incremental verification of an existing hash. So this might add a noticiable delay, and it has to show a "(checksum...") message. This could stand to be improved. But, that separate hashing step only has to happen on the first download of new content from the web. Once the hash is known, the VURL key can have its hash verified incrementally while downloading except when the content in the web has changed. (Doesn't happen yet because verifyKeyContentIncrementally is not implemented yet for VURL keys.) Note that the equivilant key log file is formatted as a presence log. This adds a tiny bit of overhead (eg "1 ") per line over just listing the urls. The reason I chose to use that format is it seems possible that there will need to be a way to remove an equivilant key at some point in the future. I don't know why that would be necessary, but it seemed wise to allow for the possibility. Downloads of VURL keys from other special remotes that claim urls, like bittorrent for example, does not popilate the equivilant key log. So for now, no checksum verification will be done for those. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:01:49 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	984034f335	filter-branch working aside from some edge cases Added a note to man page about what happens to information that is recorded in the private journal. Since it uses Branch.get, that information will be copied when options allow. It seemed better to allow it and document it than not allow it, since the options allow excluding repositories and so can be used to exclude private repos if desired.	2021-05-17 13:24:58 -04:00
Joey Hess	40ade7a515	add some functions listing log files Not used yet, will be used by copy-branch to generate the list of files to copy.	2021-05-13 14:57:38 -04:00
Joey Hess	4ff8a1ae2b	refactoring filterBranch should be reusable for copy-branch command. Changed LogVariety to differentiate between LocationLog and UrlLog; only location logs contain uuids and need to be filtered by uuid, while url logs do not. This does not change current behavior, but it will let filterBranch be reused without filtering url logs incorrectly.	2021-05-13 14:43:25 -04:00
Joey Hess	10af498be1	add configLog to otherLogs This oversight didn't cause any problems because the only place that uses the information is dropDead, which handles Nothing the same as Just OtherLog.	2021-05-13 12:48:56 -04:00
Joey Hess	8e7dc958d2	forget: Preserve currently exported trees Avoiding problems with exporttree remotes in some unusual circumstances. This commit was sponsored by Brett Eisenberg on Patreon.	2021-04-13 15:00:23 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted too, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	98e2e3cb9c	optimise logfile to key parsing Using bytestring-filepath	2020-07-07 13:03:33 -04:00
Joey Hess	879f52a116	annex.tune.branchhash1=true bugfix Fix support for repositories tuned with annex.tune.branchhash1=true, including --all not working and git-annex log not displaying anything for annexed files.	2020-02-14 15:22:48 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	12e4906657	refactor locationLogFileKey had an out of date list of toplevel log files to skip over, and was only not broken because the other toplevel log files don't look like keys. Fixed that too.	2019-03-06 17:54:29 -04:00
Joey Hess	d65a78ff5b	Fix cleanup of git-annex:export.log after git-annex forget --drop-dead This log, unlike all other current top-level logs, is a new format log. I have not checked what throwing it at the old log parser did, but it seems likely it ignored unparsable lines, and so perhaps deleted all lines from the log.	2019-02-22 21:34:31 -04:00
Joey Hess	9887a378fe	renamings to make clean when old-format logs are being used	2019-02-21 13:43:44 -04:00
Joey Hess	e8bfc3640b	storing ContentIdentifier in the git-annex branch	2019-02-20 15:40:07 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	0a7c5a9982	dropdead per-remote metadata Had to refactor pure code into separate modules so it is accessible inside Annex.Branch.Transitions. This commit was sponsored by Peter on Patreon.	2018-09-05 13:52:46 -04:00
Joey Hess	5c99f6247e	per-remote metadata storage Actually very straightforward reuse of the metadata log file code. Although I had to add a todo item as git-annex forget won't clean up dead remote's metadata yet. This would be worth adding to the external special remote interface sometime. Have not opened a todo though, guess I'll wait until something needs it. This commit was supported by the NSF-funded DataLad project.	2018-08-31 12:23:22 -04:00
Joey Hess	978885247e	implement export.log and resolve export conflicts Incremental export updates work now too. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-08-31 15:47:23 -04:00
Joey Hess	c3970f6c1a	multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project.	2017-03-30 19:35:30 -04:00
Joey Hess	339464e847	config: New command for storing configuration in the git-annex branch. Any config names can be set using this; git-annex commands will only look at specific ones that make sense and are worth the overhead of querying the branch. This might also be useful for storing whatever other config-type stuff the user might want to shove into the git-annex branch. This commit was sponsored by Jochen Bartl on Patreon.	2017-01-30 16:46:38 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	9445556c97	rethought distributed fsck; instead add activity.log and expire command This is much more space efficient!	2015-04-05 12:50:02 -04:00
Joey Hess	b0575c621f	implement annex.tune.branchhash1 I hope this doesn't impact speed much -- it does have to pull out a value from Annex state every time it accesses the branch now. The test case I dropped has never caught any problems that I can remember, and would have been rather difficult to convert.	2015-01-28 17:17:26 -04:00
Joey Hess	e8c376e0ad	import Data.Default in Common	2015-01-28 16:11:28 -04:00
Joey Hess	0fd5f257d0	groundwork for parameterizing hash depth	2015-01-28 15:55:17 -04:00
Joey Hess	70736d2b41	Repository tuning parameters can now be passed when initializing a repository for the first time. * init: Repository tuning parameters can now be passed when initializing a repository for the first time. For details, see http://git-annex.branchable.com/tuning/ * merge: Refuse to merge changes from a git-annex branch of a repo that has been tuned in incompatable ways.	2015-01-27 17:38:06 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	b61c6bc2ff	hlint	2014-10-09 15:46:05 -04:00
Joey Hess	9fd95d9025	indent with tabs not spaces Found these with: git grep "^ " $(find -type f -name \*.hs) \|grep -v ': where' Unfortunately there is some inline hamlet that cannot use tabs for indentation. Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm leaving it as-is.	2014-10-09 15:09:26 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	e2c44bf656	implement chunk logs Slightly tricky as they are not normal UUIDBased logs, but are instead maps from (uuid, chunksize) to chunkcount. This commit was sponsored by Frank Thomas.	2014-07-24 16:23:36 -04:00
Joey Hess	fe19e15040	reorg matcher types; no non-type code changes	2014-03-29 14:43:34 -04:00
Joey Hess	431d805a96	factored out a generic MapLog from uuid-based logs UUIDBased is just a MapLog with a UUID for the field.	2014-03-15 13:45:25 -04:00
Joey Hess	9f7e76130e	add metadata command to get/set metadata Adds metadata log, and command. Note that unsetting field values seems to currently be broken. And in general this has had all of 2 minutes worth of testing. This commit was sponsored by Julien Lefrique.	2014-02-12 21:30:33 -04:00
Joey Hess	40cec65ace	more hlint	2014-02-11 10:48:52 -04:00
Joey Hess	d66535f065	global numcopies setting * numcopies: New command, sets global numcopies value that is seen by all clones of a repository. * The annex.numcopies git config setting is deprecated. Once the numcopies command is used to set the global number of copies, any annex.numcopies git configs will be ignored. * assistant: Make the prefs page set the global numcopies. This global numcopies setting is needed to let preferred content expressions operate on numcopies. It's also convenient, because typically if you want git-annex to preserve N copies of files in a repo, you want it to do that no matter which repo it's running in. Making it global avoids needing to warn the user about gotchas involving inconsistent annex.numcopies settings. (See changes to doc/numcopies.mdwn.) Added a new variety of git-annex branch log file, that holds only 1 value. Will probably be useful for other stuff later. This commit was sponsored by Nicolas Pouillard.	2014-01-20 16:47:56 -04:00
Joey Hess	3e68c1c2fd	add remote state logs This allows a remote to store a piece of arbitrary state associated with a key. This is needed to support Tahoe, where the file-cap is calculated from the data stored in it, and used to retrieve a key later. Glacier also would be much improved by using this. GETSTATE and SETSTATE are added to the external special remote protocol. Note that the state is left as-is even when a key is removed from a remote. It's up to the remote to decide when it wants to clear the state. The remote state log, $KEY.log.rmt, is a UUID-based log. However, rather than using the old UUID-based log format, I created a new variant of that format. The new varient is more space efficient (since it lacks the "timestamp=" hack, and easier to parse (and the parser doesn't mess with whitespace in the value), and avoids compatability cruft in the old one. This seemed worth cleaning up for these new files, since there could be a lot of them, while before UUID-based logs were only used for a few log files at the top of the git-annex branch. The transition code has also been updated to handle these new UUID-based logs. This commit was sponsored by Daniel Hofer.	2014-01-03 16:35:57 -04:00
Joey Hess	29ca49dad4	add a log file for scheduled activities	2013-10-07 16:06:34 -04:00
Joey Hess	62beaa1a86	refactor git-annex branch log filename code into central location Having one module that knows about all the filenames used on the branch allows working back from an arbitrary filename to enough information about it to implement dropping dead remotes and doing other log file compacting as part of a forget transition.	2013-08-29 19:13:00 -04:00

44 commits