git-annex

Author	SHA1	Message	Date
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	aa56d433d5	implement cluster.log Not used yet. (Or tested.) I did consider making the log start with the uuid of the node, followed by the cluster uuid (or uuids). That would perhaps mean a smaller write to the git-annex branch when adding a node, but overall the log file would be larger, and it will be read and cached near to startup on most git-annex runs.	2024-06-13 16:00:58 -04:00
Joey Hess	f97f4b8bdb	Added updateproxy command and remote.name.annex-proxy configuration So far this only records proxy information on the git-annex branch.	2024-06-04 14:52:03 -04:00
Joey Hess	dfb09ad1ad	preparing to merge git-remote-annex Update its todo with remaining items. Add changelog entry. Simplified internals document to no longer be notes to myself, but target users who want to understand how the data is stored and might want to extract these repos manually. Sponsored-by: Kevin Mueller on Patreon	2024-05-10 15:06:15 -04:00
Joey Hess	55bf01b788	add equivalent key log for VURL keys When downloading a VURL from the web, make sure that the equivalent key log is populated. Unfortunately, this does not hash the content while it's being downloaded from the web. There is not an interface in Backend currently for incremental hash generation, only for incremental verification of an existing hash. So this might add a noticeable delay, and it has to show a "(checksum...)" message. This could stand to be improved. But, that separate hashing step only has to happen on the first download of new content from the web. Once the hash is known, the VURL key can have its hash verified incrementally while downloading except when the content in the web has changed. (Doesn't happen yet because verifyKeyContentIncrementally is not implemented yet for VURL keys.) Note that the equivalent key log file is formatted as a presence log. This adds a tiny bit of overhead (eg "1 ") per line over just listing the urls. The reason I chose to use that format is it seems possible that there will need to be a way to remove an equivalent key at some point in the future. I don't know why that would be necessary, but it seemed wise to allow for the possibility. Downloads of VURL keys from other special remotes that claim urls, like bittorrent for example, do not populate the equivalent key log. So for now, no checksum verification will be done for those. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:01:49 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	9d60385001	convert renameFile to moveFile to support cross-device moves Improve handling of some .git/annex/ subdirectories being on other filesystems, in the bittorrent special remote, and youtube-dl integration, and git-annex addurl. The only one of these that I've confirmed to be a problem is in the bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are on different filesystems. As well as auditing for renameFile, also audited for createLink; all of those are ok, as are the other remaining renameFile calls. Also audited all code paths that use .git/annex/othertmp, and did not find any other cross-device problems. So, removing mention of othertmp needing to be on the same device. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:17:50 -04:00
Joey Hess	67245ae00f	fully specify the pointer file format This format is designed to detect accidental appends, while having some room for future expansion. Detect when an unlocked file whose content is not present has gotten some other content appended to it, and avoid treating it as a pointer file, so that appended content will not be checked into git, but will be annexed like any other file. Dropped the max size of a pointer file down to 32kb; it was around 80 kb, but without any good reason, and certainly there are no valid pointer files anywhere that are larger than 8kb, because what a pointer file with additional data even looks like has only just been specified. I assume 32kb will be good enough for anyone. ;-) Really though, it needs to be some smallish number, because that much of a file in git gets read into memory when, eg, catting pointer files. And since we have no use cases for the extra lines of a pointer file yet, except possibly to add some human-visible explanation that it is a git-annex pointer file, 32k seems as reasonable an arbitrary number as anything. Increasing it would be possible, eg to 64k, as long as users of such jumbo pointer files didn't mind upgrading all their git-annex installations to one that supports the new larger size. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 14:20:31 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	1ccd6a4600	generalize docs so they will also work when git uses SHA256	2020-01-07 16:10:57 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	20741b1eb4	Automatically convert direct mode repositories to v7 with adjusted unlocked branches * Automatically convert direct mode repositories to v7 with adjusted unlocked branches and set annex.thin. * init: When run on a crippled filesystem with --version=5, will error out, since version 7 is needed for adjusted unlocked branch. * direct: This command always errors out as direct mode is no longer supported. * indirect: This command has become a deprecated noop. * proxy: This command is deprecated because it was only needed in direct mode. (But it continues to work.) Also removed mentions of direct mode throughout the documentation. I have not removed all the direct mode code yet.	2019-08-26 15:05:25 -04:00
Joey Hess	56137ce0d2	use colon not space to delimit content identifier list InodeCache serializes to a value with spaces, and it seems likely other things will too, and I want to avoid unnecessary base64 of content identifiers when possible.	2019-02-21 13:45:16 -04:00
Joey Hess	e8bfc3640b	storing ContentIdentifier in the git-annex branch	2019-02-20 15:40:07 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Ilya_Shlyakhter	7a9c2f3ac3	added anchor for git-annex branch	2018-09-25 16:54:00 +00:00
Joey Hess	5c99f6247e	per-remote metadata storage Actually very straightforward reuse of the metadata log file code. Although I had to add a todo item as git-annex forget won't clean up dead remote's metadata yet. This would be worth adding to the external special remote interface sometime. Have not opened a todo though, guess I'll wait until something needs it. This commit was supported by the NSF-funded DataLad project.	2018-08-31 12:23:22 -04:00
Joey Hess	c8ed941a26	change export.log format to support multiple export remotes This breaks backwards compatibility, but only with unreleased versions of git-annex, which I think is acceptable. This commit was supported by the NSF-funded DataLad project.	2017-09-12 17:45:52 -04:00
Joey Hess	0fa948b402	record incomplete exports in export.log Not yet used, but essential for resuming cleanly. Note that, in normal operation, only one commit is made to export.log during an export; the incomplete version only gets to the journal and is then overwritten. This commit was supported by the NSF-funded DataLad project.	2017-09-06 13:45:03 -04:00
Joey Hess	978885247e	implement export.log and resolve export conflicts Incremental export updates work now too. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-08-31 15:47:23 -04:00
Joey Hess	74aa4c503b	devblog	2017-08-29 17:26:42 -04:00
Joey Hess	d6c0b25147	mention autoenable=true	2017-05-24 13:37:06 -04:00
Joey Hess	c3970f6c1a	multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project.	2017-03-30 19:35:30 -04:00
Joey Hess	339464e847	config: New command for storing configuration in the git-annex branch. Any config names can be set using this; git-annex commands will only look at specific ones that make sense and are worth the overhead of querying the branch. This might also be useful for storing whatever other config-type stuff the user might want to shove into the git-annex branch. This commit was sponsored by Jochen Bartl on Patreon.	2017-01-30 16:46:38 -04:00
Joey Hess	a46158240b	doc improvements	2015-12-27 16:06:11 -04:00
Joey Hess	b0eb6493f7	note on deleting files	2015-06-09 16:33:25 -04:00
Joey Hess	53ede1a10e	parse X in location log file as indicating a dead key A dead key is both not present at the location that thinks it has a copy, and also is assumed to probably not be present anywhere else. Although there may be lurking disconnected repos that somehow still have a copy. Surprisingly few changes needed for this! This is because the presence log code only really concerns itself with keys that are present, and dead keys are not present. Note that both the location and web log can be parsed as having a dead key. I don't see any value to having keys listed as dead in the web log, but since it doesn't change any behavior, there was no point in not parsing it.	2015-06-09 13:28:30 -04:00
Joey Hess	9445556c97	rethought distributed fsck; instead add activity.log and expire command This is much more space efficient!	2015-04-05 12:50:02 -04:00
Joey Hess	daec4b007a	splitting up the man page Common command man pages all split out and often expanded. A few sections split out into their own pages. Still need to do all the other commands..	2015-03-23 15:36:10 -04:00
Joey Hess	ba3825441c	rework Differences data type Eliminated complexity and future proofed. The most important change is that all functions over Difference are now total; any Difference that can be expressed should be handled. Avoids the need for sanity checking of inputs, and version skew with the future. Also, the difference.log now serializes a [Difference], not a Differences. This saves space and keeps it simpler. Note that [Difference] might contain conflicting differences (eg, [Version5, Version6]). In this case, one of them needs to consistently win over the others, probably based on Ord.	2015-01-28 13:50:02 -04:00
Joey Hess	70736d2b41	Repository tuning parameters can now be passed when initializing a repository for the first time. * init: Repository tuning parameters can now be passed when initializing a repository for the first time. For details, see http://git-annex.branchable.com/tuning/ * merge: Refuse to merge changes from a git-annex branch of a repo that has been tuned in incompatible ways.	2015-01-27 17:38:06 -04:00
Joey Hess	30bf112185	Urls can now be claimed by remotes. This will allow creating, for example, an external special remote that handles magnet: and *.torrent urls.	2014-12-08 19:15:07 -04:00
Yaroslav Halchenko	0efe9825d0	DOC: minor typos and rewording in few docs	2014-12-04 22:28:07 -05:00
Joey Hess	a4810a4757	better organization and a few wording tweaks	2014-10-31 11:27:05 -04:00
Joey Hess	e2c44bf656	implement chunk logs Slightly tricky as they are not normal UUIDBased logs, but are instead maps from (uuid, chunksize) to chunkcount. This commit was sponsored by Frank Thomas.	2014-07-24 16:23:36 -04:00
Joey Hess	4bbc629cb0	document new chunk logfiles	2014-07-24 13:28:54 -04:00
Joey Hess	d00d06135c	update for required content	2014-03-29 14:39:10 -04:00
Joey Hess	431d805a96	factored out a generic MapLog from uuid-based logs UUIDBased is just a MapLog with a UUID for the field.	2014-03-15 13:45:25 -04:00
Joey Hess	b01628f1d1	document more .git/annex/ contents	2014-02-26 17:04:03 -04:00
Joey Hess	9f7e76130e	add metadata command to get/set metadata Adds metadata log, and command. Note that unsetting field values seems to currently be broken. And in general this has had all of 2 minutes worth of testing. This commit was sponsored by Julien Lefrique.	2014-02-12 21:30:33 -04:00
Joey Hess	d66535f065	global numcopies setting * numcopies: New command, sets global numcopies value that is seen by all clones of a repository. * The annex.numcopies git config setting is deprecated. Once the numcopies command is used to set the global number of copies, any annex.numcopies git configs will be ignored. * assistant: Make the prefs page set the global numcopies. This global numcopies setting is needed to let preferred content expressions operate on numcopies. It's also convenient, because typically if you want git-annex to preserve N copies of files in a repo, you want it to do that no matter which repo it's running in. Making it global avoids needing to warn the user about gotchas involving inconsistent annex.numcopies settings. (See changes to doc/numcopies.mdwn.) Added a new variety of git-annex branch log file, that holds only 1 value. Will probably be useful for other stuff later. This commit was sponsored by Nicolas Pouillard.	2014-01-20 16:47:56 -04:00
Joey Hess	3e68c1c2fd	add remote state logs This allows a remote to store a piece of arbitrary state associated with a key. This is needed to support Tahoe, where the file-cap is calculated from the data stored in it, and used to retrieve a key later. Glacier also would be much improved by using this. GETSTATE and SETSTATE are added to the external special remote protocol. Note that the state is left as-is even when a key is removed from a remote. It's up to the remote to decide when it wants to clear the state. The remote state log, $KEY.log.rmt, is a UUID-based log. However, rather than using the old UUID-based log format, I created a new variant of that format. The new variant is more space efficient (since it lacks the "timestamp=" hack), easier to parse (and the parser doesn't mess with whitespace in the value), and avoids compatibility cruft in the old one. This seemed worth cleaning up for these new files, since there could be a lot of them, while before UUID-based logs were only used for a few log files at the top of the git-annex branch. The transition code has also been updated to handle these new UUID-based logs. This commit was sponsored by Daniel Hofer.	2014-01-03 16:35:57 -04:00
Joey Hess	1b3e2d8eb1	document schedule.log and transitions.log	2013-12-17 20:13:40 -04:00
Joey Hess	5cf8a2ffcd	fix link	2013-11-22 16:19:46 -04:00
Joey Hess	70f3f22d8c	some thoughts for madduck	2013-11-04 14:34:34 -04:00
https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U	bc5c2e0ee3		2013-07-15 09:44:10 +00:00
Joey Hess	e0f3d1a3ba	document directory hashes	2013-03-31 20:13:49 -04:00
Joey Hess	b20c3a6252	document the encryption cipher	2013-03-03 20:47:36 -04:00
Joey Hess	221584ec7f	document direct mode files	2012-12-25 14:25:47 -04:00
Joey Hess	08eedfef5d	document the key format	2012-11-30 16:01:29 -04:00

72 commits