git-annex

Author	SHA1	Message	Date
Joey Hess	d37fe6a547	annex.largefiles can be configured in .gitattributes too This is particularly useful for v6 repositories, since the .gitattributes configuration will apply in all clones of the repository.	2016-02-02 15:18:17 -04:00
Joey Hess	e8fc2ff27c	add "nothing" to preferred content DSL Same as "not anything"; will be particularly useful in annex.largefiles gitattributes.	2016-02-02 14:42:13 -04:00
Gabor Greif	daf8aa76fe	Unneeded constraint	2016-01-28 12:34:07 -04:00
Gabor Greif	50e4ec36c7	Another redundant constraint	2016-01-28 12:34:07 -04:00
Joey Hess	710d44a16e	add the known associated file to the list of others	2016-01-26 14:48:19 -04:00
Joey Hess	039e83ed5d	Fix nasty reversion in the last release that broke sync --content's handling of many preferred content expressions. The type checker should have noticed this, but the changes to mapM that make it accept any Traversable hid the fact that it was not being passed a list at all. Thus, what should have returned an empty list most of the time instead returned [""] which was treated as the name of the associated file, with disastrous consequences. When I have time, I should add a test case checking what sync --content drops. I should also consider replacing mapM with one re-specialized to lists.	2016-01-26 14:28:43 -04:00
Joey Hess	23ff58cd4f	optimise getUUID This avoids a Map lookup each time it's called, instead the GitConfig field lazily looks it up once and then caches.	2016-01-20 16:55:06 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	b52cf5697b	immediate queue flushing when annex.queuesize=1 Previously, it only flushed when the queue got larger than 1. Also, make the queue auto-flush when items are added, rather than needing to be flushed as a separate step. This simplifies the code and makes it more efficient too, as it avoids needing to read the queue out of the state to check if it should be flushed.	2016-01-13 14:55:01 -04:00
Joey Hess	bafcbe95c3	fix one more test failure with v6 unlocked file merge conflict resolution	2016-01-08 15:23:15 -04:00
Joey Hess	51bc32e21e	better fix for slash in view metadata The homoglyphs are back, just encoded such that it doesn't crash in LANG=C However, I noticed a bug in the old escaping; [pseudoSlash] was escaped the same as ['/','/']. Fixed by using '%' to escape pseudoSlash. Which requires doubling '%' to escape it, but that's already done in the escaping of worktree filenames in a view, so is probably ok.	2016-01-08 13:55:35 -04:00
Joey Hess	42619e2231	view: Avoid using cute unicode homoglyphs for '/' and '\' and instead use ugly escaping, as the unicode method doesn't work on non-unicode supporting systems.	2016-01-08 12:45:32 -04:00
Joey Hess	4b819bee2b	avoid confusing git with a modified ctime in clean filter Linking the file to the tmp dir was not necessary in the clean filter, and it caused the ctime to change, which caused git to think the file was changed. This caused git status to get slow as it kept re-cleaning unchanged files.	2016-01-07 17:48:04 -04:00
Joey Hess	3b960d1422	migrate and rekey v6 unlocked file support	2016-01-07 15:14:15 -04:00
Joey Hess	0b59fb423e	migrate: Copy over metadata to new key.	2016-01-07 14:21:12 -04:00
Joey Hess	b3d60ca285	use TopFilePath for associated files Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.	2016-01-05 17:22:19 -04:00
Joey Hess	f36f24197a	scan for unlocked files on init/upgrade of v6 repo	2016-01-01 15:09:42 -04:00
Joey Hess	a2c056df65	convert isPointerFile from Annex to IO	2016-01-01 13:22:38 -04:00
Joey Hess	829ae91009	fix failing git-annex unused test case in v6 WorkTree.lookupFile was finding a key for a file that's deleted from the work tree, which is different than the v5 behavior (though perhaps the same as the direct mode behavior). Fix by checking that the work tree file exists before catting its key. Hopefully this won't slow down much, probably the catKey is much more expensive. I can't see any way to optimise this, except perhaps to make Command.Unused check if work tree files exist before/after calling lookupFile. But, it seems better to make lookupFile really only find keys for worktree files; that's what it's intended to do.	2015-12-30 14:23:31 -04:00
Joey Hess	5057fffccd	flush queue before cleaning cruft Else, queued file stages won't have reached the index, and it won't find everything. This evidently fixes a reversion in my work today, although I don't see how I broke it. It didn't use to flush the queue first, before, and worked somehow. Test suite for v5 is back to 100% green now.	2015-12-29 17:35:57 -04:00
Joey Hess	f3be28eedc	test suite noticed a direct mode reversion	2015-12-29 17:12:57 -04:00
Joey Hess	10ecc43790	rename	2015-12-29 17:02:14 -04:00
Joey Hess	996ae9b172	don't disable smudge filter while merging The smudge filter does need to be run, because if the key is in the local annex already (due to renaming, or a copy of a file added, or a new file added and its content has already arrived), git merge smudges the file and this should provide its content. This does probably mean that in merge conflict resolution, git smudges the existing file, re-copying all its content to it, and then the file is deleted. So, not efficient.	2015-12-29 16:36:21 -04:00
Joey Hess	24bbaa2346	avoid renaming file when auto-resolving conflict in annex pointer This is a behavior change for merge conflicts between locked files that both pointed to the same key, in different ways. Before, the conflict was resolved, but the file was renamed to .variant. This was unnecessary, because there was only one variant. Of course, this also handles conflicts between unlocked and locked, or even two unlocked files with different pointer contents.	2015-12-29 16:35:34 -04:00
Joey Hess	2e9341a47d	fix inode cache consistency bug when a merge unlocks a present file Since the file was present and locked, its annex object was not in the inode cache. So, despite not needing to update the annex object when the clean filter is run on the content by git merge, it does need to record the inode cache of the annex object. Otherwise, the annex object will be assumed to be bad, since its inode is not cached.	2015-12-29 16:26:27 -04:00
Joey Hess	b6b34f4916	automatic conflict resolution for v6 unlocked files Several tricky parts: * When the conflict is just between the same key being locked and unlocked, the unlocked version wins, and the file is not renamed in this case. * Need to update associated file map when conflict resolution renames an unlocked file. * git merge runs the smudge filter on the conflicting file, and actually overwrites the file with the same content it had before, and so invalidates its inode cache. This makes it difficult to know when it's safe to remove such files as conflict cruft, without going so far as to compare their entire contents. Dealt with this by preventing the smudge filter from populating the file when a merge is run. However, that also prevents the smudge filter being run for non-conflicting files, so eg moving a file won't put its new content into place. * Ideally, if a merge or a merge conflict resolution renames an unlocked file, the file in the work tree can just be moved, rather than copying the content to a new worktree file. This is attempted to be done in merge conflict resolution, but due to git merge's behavior of running smudge filters, what actually seems to happen is the old worktree file with the content is deleted and rewritten as a pointer file, so doesn't get reused. So, this is probably not as efficient as it optimally could be. If that becomes a problem, could look into running the merge in a separate worktree and updating the real worktree more efficiently, similarly to the direct mode merge. However, the direct mode merge had a lot of bugs, and I'd rather not use that more error-prone method unless really needed.	2015-12-29 15:41:09 -04:00
Joey Hess	645833774d	fix windows build	2015-12-28 12:44:04 -04:00
Joey Hess	121f5d5b0c	annex.thin Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.	2015-12-27 15:59:59 -04:00
Joey Hess	54f87ef95f	get associated files from Keys database	2015-12-26 15:09:53 -04:00
Joey Hess	7593917147	cleanup	2015-12-26 15:09:47 -04:00
Joey Hess	289a3592c3	support v6 unlocked files This optimisation was not necessary, and didn't work for v6 unlocked files. Typically only a small number of files will be changed by a commit, so just catKey them all.	2015-12-26 15:04:26 -04:00
Joey Hess	60c36ef6ba	make views work with v6 unlocked files Have to only use the view index in one place; lookupFile was failing for unlocked files because it was run using the view index, which was empty.	2015-12-26 14:52:58 -04:00
Joey Hess	49fca49991	remove dead code	2015-12-26 14:45:07 -04:00
Joey Hess	f324ad24c1	improve comment	2015-12-26 13:47:36 -04:00
Joey Hess	0c03629173	clean up cruft in assistant fast rename code path	2015-12-22 18:03:47 -04:00
Joey Hess	d8a8c77a8f	move cleanOldKey into ingest	2015-12-22 16:55:49 -04:00
Joey Hess	cfaac52b88	populate unlocked files with newly available content when ingesting This can happen when ingesting a new file in either locked or unlocked mode, when some unlocked files in the repo use the same key, and the content was not locally available before.	2015-12-22 16:22:28 -04:00
Joey Hess	4f60234690	finish v6 support for assistant Seems to basically work now!	2015-12-22 15:23:27 -04:00
Joey Hess	4392140946	make linkAnnex detect when the file changes as it's being copied/linked in This fixes a race where the modified file ended up in annex/objects, and the InodeCache stored in the database was for the modified version, so git-annex didn't know it had gotten modified. The race could occur when the smudge filter was running; now it gets the InodeCache before generating the Key, which avoids the race.	2015-12-22 15:20:03 -04:00
Joey Hess	8e9608d7f0	refactoring no behavior changes	2015-12-22 13:42:58 -04:00
Joey Hess	ca2c977704	wip v6 support for assistant Files are not yet added to v6 repos in unlocked mode.	2015-12-21 18:41:15 -04:00
Joey Hess	35f6a78b66	fix reversion in v5 git-annex add of unlocked file In v5, lookupFile is supposed to only look at symlinks on disk (except when in direct mode). Note that v6 also has a bug when a locked file's symlink is deleted and is replaced with a new file. It sees that a link is staged and gets that key.	2015-12-16 14:27:12 -04:00
Joey Hess	38a23928e9	temporarily remove cached keys database connection The problem is that shutdown is not always called, particularly in the test suite. So, a database connection would be opened, possibly some changes queued, and then not shut down. One way this can happen is when using Annex.eval or Annex.run with a new state. A better fix might be to make both of them call Keys.shutdown (and be sure to do it even if the annex action threw an error). Complication: Sometimes they're run reusing an existing state, so shutting down a database connection could cause problems for other users of that same state. I think this would need a MVar holding the database handle, so it could be emptied once shut down, and another user of the database connection could then start up a new one if it got shut down. But, what if 2 threads were concurrently using the same database handle and one shut it down while the other was writing to it? Urgh. Might have to go that route eventually to get the database access to run fast enough. For now, a quick fix to get the test suite happier, at the expense of speed.	2015-12-16 14:05:26 -04:00
Joey Hess	7d0e79b9e1	Use git-annex init --version=6 to get v6 for now Not ready to make it default because of the direct mode upgrade needing to all happen at once.	2015-12-15 17:17:13 -04:00
Joey Hess	f9d077186a	implemented upgrade of direct mode repo to v6	2015-12-15 16:00:26 -04:00
Joey Hess	cdd27b8920	reorg	2015-12-15 15:34:28 -04:00
Joey Hess	2bc920e266	update inode cache to cover file even when nothing needs to be done to linkAnnex This covers the case where multiple files have the same content and are added with git add. Previously only the one that was linked to the annex got its inode cached; now both are.	2015-12-15 13:02:33 -04:00
Joey Hess	1dad3af3fc	checked getKeysPresent; it's ok for v6 unlocked files When a v6 unlocked file is removed from the work tree, unused doesn't show it. When it gets removed from the index, unused does show it. This is the same as a locked file.	2015-12-11 16:12:42 -04:00
Joey Hess	7790e059b2	finish v6 git-annex lock This was a doozy!	2015-12-11 15:28:34 -04:00
Joey Hess	50e83b606c	only make 1 hardlink max between pointer file and annex object If multiple files point to the same annex object, the user may want to modify them independently, so don't use a hard link. Also, check diskreserve when copying.	2015-12-11 14:00:21 -04:00
Joey Hess	c608a752a5	Merge branch 'master' into smudge	2015-12-11 13:50:31 -04:00
Joey Hess	abd66c7089	fsck: Failed to honor annex.diskreserve when checking a remote.	2015-12-11 13:50:27 -04:00
Joey Hess	c910b4e255	wip	2015-12-11 10:42:18 -04:00
Joey Hess	9dffd3d255	add generalized linkAnnex'	2015-12-10 16:08:19 -04:00
Joey Hess	06a8256bf6	always format pointer file with a trailing newline Before the smudge filter added a trailing newline, but other things that wrote formatPointer to a file did not. also some new pointer staging code to use later	2015-12-10 16:06:58 -04:00
Joey Hess	f80a3d8cd0	check InodeCache in inAnnex et al This avoids querying the database when the content file doesn't exist (or otherwise fails the provided check). However, it does add overhead of querying the database, and will certainly impact performance.	2015-12-10 14:51:04 -04:00
Joey Hess	2b8f6b8b2f	check inode cache in prepSendAnnex This does mean one query of the database every time an object is sent. May impact performance.	2015-12-10 14:50:52 -04:00
Joey Hess	3b2a7f216d	move	2015-12-10 14:20:38 -04:00
Joey Hess	3719d1b390	make clear when code is using deprecated direct mode files	2015-12-09 19:43:15 -04:00
Joey Hess	aa88851ec1	reorder	2015-12-09 19:38:37 -04:00
Joey Hess	ce73a96e4e	use InodeCache when dropping a key to see if a pointer file can be safely reset The Keys database can hold multiple inode caches for a given key. One for the annex object, and one for each pointer file, which may not be hard linked to it. Inode caches for a key are recorded when its content is added to the annex, but only if it has known pointer files. This is to avoid the overhead of maintaining the database when not needed. When the smudge filter outputs a file's content, the inode cache is not updated, because git's smudge interface doesn't let us write the file. So, dropping will fall back to doing an expensive verification then. Ideally, git's interface would be improved, and then the inode cache could be updated then too.	2015-12-09 17:54:54 -04:00
Joey Hess	5e8c628d2e	add inode cache to the db Renamed the db to keys, since it is various info about a Keys. Dropping a key will update its pointer files, as long as their content can be verified to be unmodified. This falls back to checksum verification, but I want it to use an InodeCache of the key, for speed. But, I have not made anything populate that cache yet.	2015-12-09 17:00:37 -04:00
Joey Hess	3311c48631	move InodeSentinal from direct mode code to its own module Will be used outside of direct mode for v6 unlocked files, and is already used outside of direct mode when adding files to annex.	2015-12-09 15:52:11 -04:00
Joey Hess	8a818088a3	link/copy pointer files to object content when it's added	2015-12-09 15:27:29 -04:00
Joey Hess	751120c171	avoid pre-commit hook messing up new-style unlocked files in v6 repo	2015-12-09 15:18:54 -04:00
Joey Hess	78a6b8ce05	refactor and improve pointer file handling code	2015-12-09 14:27:43 -04:00
Joey Hess	712c9fc590	require "annex/objects/" before key in pointer files This removes ambiguity, because while someone might have "WORM--foo" in a file that's not intended to be a git-annex pointer file, "annex/objects/WORM--foo" is less likely. Also, `664cc987e8` had a caveat about symlink targets being parsed as pointer files, and now the same parser is used for both. I did not include any hash directories before the key in the pointer file, as they're not needed. However, if they were included, the parser would still work ok.	2015-12-07 15:45:08 -04:00
Joey Hess	664cc987e8	support pointer files Backend.lookupFile is changed to always fall back to catKey when operating on a file that's not a symlink. catKey is changed to understand pointer files, as well as annex symlinks. Before, catKey needed a file mode witness, to be sure it was looking at a symlink. That was complicated stuff. Now, it doesn't actually care if a file in git is a symlink or not; in either case asking git for the content of the file will get the pointer to the key. This does mean that git-annex will treat a link foo -> WORM--bar as a git-annex file, and also treats a regular file containing annex/objects/WORM--bar as a git-annex file. Calling catKey could make git-annex commands need to do more work than before. This would especially be the case if a repo contained many regular files, and only a few annexed files, as now git-annex will need to ask git about the contents of the regular files.	2015-12-07 15:35:36 -04:00
Joey Hess	62a2fba1cd	Merge branch 'master' into smudge	2015-12-07 12:29:34 -04:00
Joey Hess	2936153fc4	fix temp filename Was not putting it inside the temp dir, but next to it! This was just wrong, and it led to a longer filename than desired being used, leading to some bug reports.	2015-12-06 16:54:01 -04:00
Joey Hess	6e71094e7d	avoid too long temp dir template The filename might be at or close to the filename length limit, so using it as the template for the temp dir would then fail.	2015-12-06 16:42:40 -04:00
Joey Hess	e7f75b079d	don't let git-annex direct be run in a v6 repo	2015-12-04 16:33:09 -04:00
Joey Hess	ccc49861ca	add v6; keep v5 working for now and manual upgrade Since all places where a repo is used in direct mode need to have git-annex upgraded before the repo can safely be converted to v6, the upgrade needs to be manual for now. I suppose that at some point I'll want to drop all the direct mode support code. At that point, will stop supporting v5, and will need to auto-upgrade any remaining v5 repos. If possible, I'd like to carry the direct mode support for say, a year or so, to give people plenty of time to upgrade and avoid disruption.	2015-12-04 16:14:48 -04:00
Joey Hess	34ead644d9	auto-configure filter.annex.smudge and clean on init	2015-12-04 16:14:11 -04:00
Joey Hess	983c1894eb	avoid unnecessary reading of git-annex branch data when matching on annex.largefiles This makes git annex clean not look at the git-annex branch at all, and so speeds it up by 50% or more.	2015-12-04 15:06:41 -04:00
Joey Hess	99b2a524a0	clean filter should update location log when adding new content to annex	2015-12-04 14:20:32 -04:00
Joey Hess	2c6454a2e2	basic clean filter working	2015-12-04 13:39:14 -04:00
Joey Hess	0d432dd1a4	annex object file mode for core.sharedRepository When core.sharedRepository is set, annex object files are not made mode 444, since that prevents a user other than the file owner from locking them. Instead, a mode such as 664 is used in this case.	2015-11-18 15:45:32 -04:00
Joey Hess	3449c0e8ec	avoid spawning file size polling thread when not in -J mode	2015-11-16 21:21:58 -04:00
Joey Hess	e97fce35a6	Display progress meter in -J mode when downloading from the web. Including in addurl, and get --from web, but also in S3 and External special remotes when a web url is known for content in those remotes.	2015-11-16 21:00:54 -04:00
Joey Hess	262c37c16e	add missing checkSaneLock wrapper for pidlocks	2015-11-16 15:35:41 -04:00
Joey Hess	bb86eebfbd	init: Automatically enable annex.pidlock when necessary.	2015-11-13 13:35:29 -04:00
Joey Hess	aaf1ef268d	convert from Utility.LockPool to Annex.LockPool everywhere	2015-11-12 18:13:37 -04:00
Joey Hess	aa4192aea6	pid locking configuration and abstraction layer for git-annex (not actually used anywhere yet)	2015-11-12 17:50:34 -04:00
Joey Hess	7c741302cc	assistant: Pass ssh-options through 3 more git pull/push calls that were missed before. It was used for regular pull, but not for regular push, tagged push, or the fallback fetching.	2015-11-10 16:52:30 -04:00
Joey Hess	7938b87864	add: Fix error recovery rollback to not move the ingested file content out of the annex back to the file, because other files may point to that same content. Instead, copy the ingested file content out to recover. That was not a data loss, but it came close!	2015-11-06 15:28:20 -04:00
Joey Hess	51e60259e1	fix replaceFile makeAnnexLink race replaceFile created a temp file, which was guaranteed to not overlap with another temp file. However, makeAnnexLink then deleted that file, in preparation for making the symlink in its place. This caused a race, since some other replaceFile could create a temp file, using the same name! I was able to reproduce the race easily running git-annex add -J10 in a directory with 100 files (all with different contents). Some files would get ingested into the annex, but their annex links would fail to be added. There could be other situations where this same problem could occur. Perhaps when the assistant is adding a file, if the user manually also ran git-annex add. Perhaps in cases not involving adding a file. The new replaceFile makes a temporary directory, which is guaranteed to be unique, and doesn't make a temp file in there. makeAnnexLink can thus create the symlink without problem and the race is avoided. Audited all calls to replaceFile to make sure that the old behavior of providing an empty temp file was not relied on. The general problem of asking for a temp file and deleting it as part of the process of using it could reach beyond replaceFile. Did some quick audits and didn't find other cases of it. Probably only symlink creation stuff would tend to make that mistake, mostly.	2015-11-06 15:08:19 -04:00
Joey Hess	31472161e4	merge git command queue when joining with concurrent thread	2015-11-05 18:21:48 -04:00
Joey Hess	a4dd8503b8	add regions to concurrent output still no progress displays when getting files etc, but a big improvement	2015-11-04 14:52:07 -04:00
Joey Hess	640dba43b6	enableremote: List uuids and descriptions of remotes that can be enabled, and accept either the uuid or the description in lieu of the name.	2015-10-26 14:55:40 -04:00
Joey Hess	806819be57	Avoid displaying network transport warning when a ssh remote does not yet have an annex.uuid set. Instead, only display transport error if the configlist output doesn't include an annex.uuid line, even an empty one. A recent change made git-annex init try to get all the remote uuids, and so the transport error would be displayed by it. It was also displayed when eg, copying files to a remote that had no uuid yet.	2015-10-15 15:36:54 -04:00
Joey Hess	3879f6e6be	do tmp dir cleanup in error case too	2015-10-15 14:27:14 -04:00
Joey Hess	27eaa6f410	avoid making post-merge-conflict-resolution commit when no conflicts were resolved sync, merge, assistant: When git merge failed for a reason other than a conflicted merge, such as a crippled filesystem not allowing particular characters in filenames, git-annex would make a merge commit that could omit such files or otherwise be bad. Fixed by aborting the whole merge process when git merge fails for any reason other than a merge conflict.	2015-10-15 14:22:46 -04:00
Joey Hess	9e90c033d3	Changed drop ordering when using git annex sync --content or the assistant, to drop from remotes first and from the local repo last. This works better with the behavior changes to drop in many cases.	2015-10-14 12:33:02 -04:00
Joey Hess	1ff7610118	fix windows build	2015-10-12 15:48:59 -04:00
Joey Hess	f9adb905fc	Avoid unnecessary write to the location log when a file is unlocked and then added back with unchanged content. Implemented with no additional overhead of compares etc. This is safe to do for presence logs because of their locality of change; a given repo's presence logs are only ever changed in that repo, or in a repo that has just been actively changing the content of that repo. So, we don't need to worry about a split-brain situation where there'd be disagreement about the location of a key in a repo. And so, it's ok to not update the timestamp when that's the only change that would be made due to logging presence info.	2015-10-12 14:46:47 -04:00
Joey Hess	fa9333e99f	use action, not sideAction sideAction is for things not generally related to the current action being performed. And, it adds a newline after the side action. This was not the right thing to use for stuff like "checksum", where doing a checksum is part of the git annex get process, and indeed we want it to display "(checksum...) ok"	2015-10-11 13:29:44 -04:00
Joey Hess	3b89d5a20c	implement lockContent for ssh remotes	2015-10-09 16:55:41 -04:00
Joey Hess	e392ec112f	also generate a drop safety proof for move --from remote	2015-10-09 16:16:03 -04:00
Joey Hess	6a72045707	fix local dropping to not require extra locking of copies, but only that the local copy be locked for removal	2015-10-09 15:48:02 -04:00
Joey Hess	1043880432	improve message when drop failed due to no locked copy	2015-10-09 15:14:25 -04:00
Joey Hess	b021321aae	rename constructor	2015-10-09 15:01:33 -04:00
Joey Hess	45e1a7c361	verify local copy of content with locking	2015-10-09 14:57:32 -04:00
Joey Hess	4c6095b6f5	content locking during drop working for local git remotes Only ssh remotes lack locking now	2015-10-09 13:12:58 -04:00
Joey Hess	ceb5819538	finish and use lockContent interface	2015-10-09 12:36:04 -04:00
Joey Hess	cf79dffa4c	improve drop proof code	2015-10-09 11:09:46 -04:00
Joey Hess	f57ac29be1	refactor	2015-10-09 10:30:22 -04:00
Joey Hess	7f5958eec2	TrustedCopy is good enough to allow dropping By definition, a trusted repository is trusted to always have its location tracking log accurate. Thus, it should never be in a position where content is being dropped from it concurrently, as that would result in the location tracking log not being accurate.	2015-10-08 18:34:48 -04:00
Joey Hess	e4a33967a1	try harder to verify until at least one VerifiedCopyLock is obtained This avoids a failure where eg, we start with RecentlyVerifiedCopies for all remotes, and so didn't do any active verification, which is required. Also, dedup the list of VerifiedCopies when checking if we have enough, in case 2 copies of a UUID slip in.	2015-10-08 18:20:36 -04:00
Joey Hess	b17f5da6c9	require 1 locked copy while dropping from local or a remote See doc/bugs/concurrent_drop--from_presence_checking_failures.mdwn for discussion about why 1 locked copy is all we can require, and how this fixes concurrent dropping bugs. Note that, since nothing generates a VerifiedCopyLock yet, this commit breaks dropping temporarily.	2015-10-08 18:11:39 -04:00
Joey Hess	c75c79864d	support invalidating existing VerifiedCopys	2015-10-08 17:58:32 -04:00
Joey Hess	90f7c4b6a2	add VerifiedCopy data type There should be no behavior changes in this commit, it just adds a more expressive data type and adjusts code that had been passing around a [UUID] or sometimes a Maybe Remote to instead use [VerifiedCopy]. Although, since some functions were taking two different [UUID] lists, there's some potential for me to have gotten it horribly wrong.	2015-10-08 16:55:11 -04:00
Joey Hess	beedf1da25	unused import	2015-10-08 14:59:34 -04:00
Joey Hess	9cb9dab69b	I think this comment is stale/confusing; remove	2015-10-08 14:51:44 -04:00
Joey Hess	4d50958ed7	add lockContentShared Also, rename lockContent to lockContentExclusive inAnnexSafe should perhaps be eliminated, and instead use `lockContentShared inAnnex`. However, I'm waiting on that, as there are only 2 call sites for inAnnexSafe and it's fiddly.	2015-10-08 14:29:35 -04:00
Joey Hess	2def1d0a23	other 80% of avoiding verification when hard linking to objects in shared repo In `c6632ee5c8`, it actually only handled uploading objects to a shared repository. To avoid verification when downloading objects from a shared repository, was a lot harder. On the plus side, if the process of downloading a file from a remote is able to verify its content on the side, the remote can indicate this now, and avoid the extra post-download verification. As of yet, I don't have any remotes (except Git) using this ability. Some more work would be needed to support it in special remotes. It would make sense for tahoe to implicitly verify things downloaded from it; as long as you trust your tahoe server (which typically runs locally), there's cryptographic integrity. OTOH, despite bup being based on shas, a bup repo under an attacker's control could have the git ref used for an object changed, and so a bup repo shouldn't implicitly verify. Indeed, tahoe seems unique in being trustworthy enough to implicitly verify.	2015-10-02 14:35:12 -04:00
Joey Hess	7c7fe895f9	disabling verification also disables size verification It's not expensive to do size verification, but let's be consistent and turn it off too.	2015-10-02 12:38:02 -04:00
Joey Hess	c6632ee5c8	avoid verification when hard linking to objects in shared repository Such a repository is implicitly trusted, so there's no point.	2015-10-02 12:36:03 -04:00
Joey Hess	2fb3722ce9	Do verification of checksums of annex objects downloaded from remotes. * When annex objects are received into git repositories, their checksums are verified then too. * To get the old, faster, behavior of not verifying checksums, set annex.verify=false, or remote.<name>.annex-verify=false. * setkey, rekey: These commands also now verify that the provided file matches the key, unless annex.verify=false. * reinject: Already verified content; this can now be disabled by setting annex.verify=false. recvkey and reinject already did verification, so removed now duplicate code from them. fsck still does its own verification, which is ok since it does not use getViaTmp, so verification doesn't happen twice when using fsck --from.	2015-10-01 15:56:39 -04:00
Joey Hess	b72d3fbeba	rename function	2015-10-01 14:18:57 -04:00
Joey Hess	807ba6a903	refactor	2015-10-01 14:07:06 -04:00
Joey Hess	dc2f1f09b7	Improve robustness of direct mode merge, avoiding a crash if the index file is missing. I couldn't find a good way to make an empty index file (zero byte file won't do), so I punted and just don't make index.lock when there's no index yet. This means some other git process could race and write an index file at the same time as the merge is ongoing, in theory. Only happens in new repos though.	2015-09-22 13:00:18 -04:00
Joey Hess	b88739f0d0	avoid auto-enabling a remote that's already enabled	2015-09-14 15:34:15 -04:00
Joey Hess	c919489c3e	avoid autoenable of dead special remotes	2015-09-14 15:28:14 -04:00
Joey Hess	9cfb96c53d	Special remotes configured with autoenable=true will be automatically enabled when git-annex init is run.	2015-09-14 14:49:48 -04:00
Joey Hess	97962591d6	init: Fix reversion in detection of repo made with git clone --shared	2015-09-09 13:56:37 -04:00
Joey Hess	c242e248e8	Fix reversion in init when ran as root, introduced in version 5.20150731.	2015-08-19 12:36:17 -04:00
Joey Hess	0f5d6c09ac	importfeed --relaxed: Avoid hitting the urls of items in the feed.	2015-08-19 12:24:55 -04:00
Joey Hess	23e9d3bb77	Fix setting/setting/viewing metadata that contains unicode or other special characters, when in a non-unicode locale. Oh boy, not again. So, another place that the filesystem encoding needs to be applied. Yay. In passing, I changed decodeBS so if a NUL is embedded in the input, the resulting FilePath doesn't get truncated at that NUL. This was needed to make prop_b64_roundtrips pass, and on reviewing the callers of decodeBS, I didn't see any where this wouldn't make sense. When a FilePath is used to operate on the filesystem, it'll get truncated at a NUL anyway, whereas if a String is being used for something else, it might conceivably have a NUL in it, and we wouldn't want it to get truncated when going through decodeBS. (NB: There may be a speed impact from this change.)	2015-08-11 18:40:59 -04:00
Joey Hess	f7d7995172	clean	2015-08-04 17:07:45 -04:00
Joey Hess	3c971c414e	sshopts is never going to be null; the concat of it may be	2015-08-04 16:53:38 -04:00
Joey Hess	a6374b7a3d	typo	2015-08-04 15:44:46 -04:00
Joey Hess	f041a65c33	Windows: Fix bug that caused git-annex sync to fail due to missing environment variable. I think that the problem was caused by Windows not having a concept of an env var that is set, but to the empty string. So, GIT_ANNEX_SSHOPTION got set to "" and was not seen as set at all. Easy fix, which also makes git-annex sync a little faster, is to not set GIT_SSH when GIT_ANNEX_SSHOPTION has no options. Might as well let git use ssh per usual in this case, no need to run git-annex as the proxy ssh command.	2015-08-04 15:27:48 -04:00
Joey Hess	6c15cdfcb8	proxy: Fix proxy git commit of non-annexed files in direct mode. * proxy: Fix proxy git commit of non-annexed files in direct mode. * proxy: If a non-proxied git command, such as git revert would normally fail because of unstaged files in the work tree, make the proxied command fail the same way.	2015-08-04 14:01:59 -04:00
Joey Hess	ea765ec022	windows build warning fixes	2015-08-03 15:54:29 -04:00
Joey Hess	9dfe03dbcd	Improve shutdown due to --time-limit, especially for fsck * Perform a clean shutdown when --time-limit is reached. This includes running queued git commands, and cleanup actions normally run when a command is finished. * fsck: Commit incremental fsck database when --time-limit is reached. Previously, some of the last files fscked did not make it into the database when using --time-limit. Note that this changes Annex.addCleanup hooks, to run after --time-limit expires. Fsck was using such a hook to clean up after a --incremental-schedule, and that shouldn't run when --time-limit expires. So, instead, moved that cleanup code to be run by cleanupIncremental. Resulted in some data type juggling.	2015-07-31 16:01:54 -04:00
Joey Hess	b30324fec7	init: Detect when the filesystem is crippled such that it ignores attempts to remove the write bit from a file, and enable direct mode. Seen with eg, NTFS fuse on linux.	2015-07-30 14:06:17 -04:00
Joey Hess	267f397d82	avoid calling copy when file DNE This avoids an ugly warning when running git annex fsck --from a rsync remote in a repo in direct mode.	2015-07-30 13:40:17 -04:00
Joey Hess	24800b1bf1	Only look at reflogs for relevant branches, not for git-annex branches This speeds it up quite a bit. May still be too slow in large repos.	2015-07-07 17:36:30 -04:00
Joey Hess	b11d2f5a8a	unused: --used-refspec can now be configured to look at refs in the reflog. This provides a way to not consider old versions of files to be unused after they have reached a specified age, when the old refs in the reflog expire. May be slow.	2015-07-07 17:13:50 -04:00
Joey Hess	f7dc20595e	refactor ls-tree params All in one place to avoid bugs like `174da80ddc`	2015-07-06 14:21:43 -04:00
Joey Hess	174da80ddc	bugfix: Pass --full-tree when using git ls-files to get a list of files on the git-annex branch, so it works when run in a subdirectory. This bug affected git-annex unused, and potentially also transitions running code and other things.	2015-07-06 14:09:54 -04:00
Joey Hess	adba0595bd	use bloom filter in second pass of sync --all --content This is needed because when preferred content matches on files, the second pass would otherwise want to drop all keys. Using a bloom filter avoids this, and in the case of a false positive, a key will be left undropped that preferred content would allow dropping. Chances of that happening are a mere 1 in 1 million.	2015-06-16 18:50:13 -04:00
Joey Hess	a0a8127956	instance Hashable Key for bloomfilter	2015-06-16 18:37:41 -04:00
Joey Hess	8b74aec3ea	Increased the default annex.bloomaccuracy from 1000 to 10000000 This makes git annex unused use around 48 mb more memory than it did before, but the massive increase in accuracy makes this worthwhile for all but the smallest systems. Also, I want to use the bloom filter for sync --all --content, to avoid dropping files that the preferred content doesn't want, and 1/1000 false positives would be far too many in that use case, even if it were acceptable for unused. Actual memory use numbers: 1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k 1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k 10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k Based on these numbers, 10 million seemed a better pick than 1 million.	2015-06-16 18:12:00 -04:00
Joey Hess	8c46ea22c2	Added new "anything" preferred content expression, which matches all versions of all files.	2015-06-16 17:03:34 -04:00
Joey Hess	0a998032ed	Fix bug that prevented enumerating locally present objects in repos tuned with annex.tune.objecthash1=true Need to walk 1 level of subdirs less in this case. The git-annex branch traversal code didn't have a similar bug.	2015-06-11 15:15:05 -04:00
Joey Hess	de3bd11a2c	import --clean-duplicates: Fix bug that didn't count local or trusted repo's copy of a file as one of the necessary copies to allow removing it from the import location.	2015-06-03 13:15:38 -04:00
Joey Hess	d28e8fbfd5	get --incomplete: New option to resume any interrupted downloads.	2015-06-02 14:20:38 -04:00
Joey Hess	eb33569f9d	remove Params constructor from Utility.SafeCommand This removes a bit of complexity, and should make things faster (avoids tokenizing Params string), and probably involve less garbage collection. In a few places, it was useful to use Params to avoid needing a list, but that is easily avoided. Problems noticed while doing this conversion: * Some uses of Params "oneword" which was entirely unnecessary overhead. * A few places that built up a list of parameters with ++ and then used Params to split it! Test suite passes.	2015-06-01 13:52:23 -04:00
Joey Hess	a6d54e49a0	sync, remotedaemon: Pass configured ssh-options even when annex.sshcaching is disabled.	2015-05-30 22:01:52 -04:00
Joey Hess	83b262f1b6	fix windows build	2015-05-22 13:54:54 -04:00
Joey Hess	167539a354	better memoize core.sharedrepository handling It was memoized, but that was not used consistently. Move it to Types.GitConfig so it will auto-memoize.	2015-05-19 15:04:24 -04:00
Joey Hess	b47c9fd587	honor core.sharedRepository settings in lockContent The content file may not be owned by the user running git-annex, in which case, setting the owner write bit was not enough to let lockContent act on the file. However, with some core.sharedRepository configs, the file should be writable by the user's group. So, the thing to do is to call thawContent on it.	2015-05-19 14:53:19 -04:00
Joey Hess	f4e2093760	fix inAnnexSafe result for direct file that is being dropped It was returning Just False in this situation, which differed from indirect mode behavior. I don't think this led to any actual problems; things that checked if the file being dropped was present just failed to fail, and instead reported it wasn't present, possibly incorrectly. Hmm, it's possible that this could have made git annex fsck --from remote update the location log wrongly, if a remote was in direct mode, and was in the middle of trying to drop a key, and the drop later failed.	2015-05-19 14:26:07 -04:00
Joey Hess	1312e721ed	convert lockContent to use new LockPools Also cleaned up the code, avoiding creating a lock file if we're going to open it for create later anyway. And, if there's an exception while preparing to lock the file, but not at the point of actually taking the lock, throw an exception, instead of silently not locking and pretending to succeed. And, on Windows, always use lock file, even if the repo somehow got into indirect mode (maybe with cygwin git..)	2015-05-19 14:12:23 -04:00
Joey Hess	ecb0d5c087	use lock pools throughout git-annex The one exception is in Utility.Daemon. As long as a process only daemonizes once, which seems reasonable, and as long as it avoids calling checkDaemon once it's already running as a daemon, the fcntl locking gotchas won't be a problem there. Annex.LockFile has its own separate lock pool layer, which has been renamed to LockCache. This is a persistent cache of locks that persist until closed. This is not quite done; lockContent still needs to be converted.	2015-05-19 14:09:52 -04:00
Joey Hess	7ebf234616	Stale transfer lock and info files will be cleaned up automatically when get/unused/info commands are run. Deleting lock files is tricky, tricky stuff. I think I got it right!	2015-05-12 20:11:23 -04:00
Joey Hess	7299bbb639	don't clean up transfer lock file when retrying transfer This affected callers that used forwardRetry; if the 1st attempt failed it would clean up the transfer lock before retrying.	2015-05-12 19:43:24 -04:00
Joey Hess	8c2dd7d8ee	Fix an unlikely race that could result in two transfers of the same key running at once. As discussed in bug report.	2015-05-12 19:39:28 -04:00
Joey Hess	e25ecab7dd	convert to using Utility.Lockfile for transfer lock files Should be no behavior changes, just simplified code. The only actual difference is it doesn't truncate the lock file. I think that was a holdover from when transfer info was written to the lock file.	2015-05-12 19:36:16 -04:00
Joey Hess	61ccf95004	Avoid accumulating transfer failure log files unless the assistant is being used. Only the assistant uses these, and only the assistant cleans them up, so make only git annex transferkeys write them. There is one behavior change from this. If glacier is being used, and a manual git annex get --from glacier fails because the file isn't available yet, the assistant will no longer later see that failed transfer file and retry the get. Hope no-one depended on that old behavior.	2015-05-12 15:53:38 -04:00
Joey Hess	a812d598ef	Take space that will be used by running downloads into account when checking annex.diskreserve.	2015-05-12 15:20:22 -04:00
Joey Hess	e27b97d364	Merge branch 'master' into concurrentprogress Conflicts: Command/Fsck.hs Messages.hs Remote/Directory.hs Remote/Git.hs Remote/Helper/Special.hs Types/Remote.hs debian/changelog git-annex.cabal	2015-05-12 13:23:22 -04:00
Joey Hess	64a4553e0b	rename traverse to walk since Data.Traversable is imported by default in ghc 7.10	2015-05-10 16:43:09 -04:00
Joey Hess	08308dc9b3	fix build warning with ghc 7.10	2015-05-10 15:28:13 -04:00
Joey Hess	9f3e51dd51	move nubbing into function whose algo needs a nubbed list	2015-04-30 14:11:59 -04:00
Joey Hess	38c458b407	refactor	2015-04-30 14:02:56 -04:00
Joey Hess	5948c148fb	Make repo init more robust. The setDifferences that got added to initialize turns out to make a git commit, and before ensureCommit has been used. Thus, repo init can fail when the system has a broken hostname etc. Move the ensureCommit to the very first thing to avoid this kind of breakage.	2015-04-20 14:01:41 -04:00
Joey Hess	3a078ab357	When a key's size is unknown, still check the annex.diskreserve, and avoid getting content if the disk is too full. We can't check if there's enough disk space to download the content, but we can check if there's certainly not enough!	2015-04-17 21:29:15 -04:00
Joey Hess	86a2f9dc4d	Merge branch 'master' into concurrentprogress Conflicts: debian/changelog	2015-04-14 15:35:15 -04:00
Joey Hess	2b79e6fe08	a few hlints	2015-04-11 00:10:34 -04:00
Joey Hess	9971c82ead	refactor	2015-04-10 17:53:58 -04:00
Joey Hess	8077ccbd54	get, move, copy, mirror: Concurrent downloads and uploads are now supported! This works, and seems fairly robust. Clean get of 20 files at -J3. At -J10, there are some messages about ssh multiplexing, probably due to a race spinning up the ssh connection cacher. But, it manages to get all the files ok regardless. The progress bars are a scrambled mess though, due to bugs in ascii-progress, which I've already filed. Particularly this one: https://github.com/yamadapc/haskell-ascii-progress/issues/8	2015-04-10 17:08:07 -04:00
Joey Hess	0880c8319e	simplify and make more atomic	2015-04-10 15:16:17 -04:00
Joey Hess	ce0a82f493	contentlocation: New plumbing command.	2015-04-09 15:34:47 -04:00
Joey Hess	b99b8d5d4c	followup to bug I cannot reproduce, and analysis based presumptive fix	2015-04-09 14:03:44 -04:00
Joey Hess	42e46a8701	avoid using --literal-pathspecs with git older than 1.8.1 which added it Windows is still building with an older git.	2015-04-06 13:46:11 -04:00
Joey Hess	1d57f142f1	Merge branch 'concurrentprogress'	2015-04-04 15:01:00 -04:00
Joey Hess	2343f99c85	well along the way to fully quiet --quiet Came up with a generic way to filter out progress messages while keeping errors, for commands that use stderr for both. --json mode will disable command outputs too.	2015-04-04 14:34:03 -04:00
Joey Hess	ff2eeaf054	avoid progress bar for url download with --quiet	2015-04-03 20:38:56 -04:00
Joey Hess	bd110516c0	init: Improve fifo test to detect NFS systems that support fifos but not well enough for sshcaching. ssh tries to hard link a fifo, and if not, complains: muxserver_listen: link mux listener .git/annex/ssh/SHARD1@iabak.archiveteam.org.QK8zOCbtNebI7q54 => .git/annex/ssh/SHARD1@iabak.archiveteam.org: Operation not permitted	2015-04-03 14:57:10 -04:00
Joey Hess	0a6933771d	cleanup	2015-03-30 19:55:35 -04:00
Joey Hess	15d45186cc	use --literal-pathspecs globally, as a better way to avoid globbing This might be overkill; I only know I need it in ls-files, but other git commands can also do their own globbing, it turns out, and I am pretty sure I never want them to when git-annex is using them as plumbing. Test suite still passes and it looks ok.	2015-03-30 19:44:13 -04:00
Joey Hess	5be536e523	Fix bug introduced in the last release that broke git-annex sync when git-annex was installed from the standalone tarball. This was introduced by commit `450ee53ab6`. However, the same problem could affect other calls to programPath, specifically some on the assistant. So, I fixed it at a deeper level.	2015-03-27 12:55:18 -04:00
Joey Hess	3af4691978	Improve error message when --in @date is used and there is no reflog for the git-annex branch.	2015-03-26 11:15:15 -04:00
Joey Hess	798da6cf2e	Added a post-update-annex hook, which is run after the git-annex branch is updated. Needed for git update-server-info. See https://github.com/datalad/datalad/issues/1#issuecomment-84094406	2015-03-20 14:52:58 -04:00
Joey Hess	cf903d5a3c	fixup annex link target calculation when submodules are used in filesystems not supporting symlinks	2015-03-04 16:08:41 -04:00
Joey Hess	e322826e33	Submodules are now supported by git-annex! Seems to work, but still experimental until it's been tested more. When repositories are on filesystems not supporting symlinks, the .git dir symlink trick cannot be used. Since we're going to be in direct mode anyway, the .git dir symlink is not strictly needed. However, I have not fixed the code that creates new annex symlinks to handle this case -- the committed symlinks will be wrong. git annex sync happens to currently fail in a submodule using direct mode, because there's no HEAD ref. That also needs to be dealt with to get this fully working in crippled filesystems. Leaving http://github.com/datalad/datalad/issues/44 open until these issues are dealt with.	2015-03-02 16:43:44 -04:00
Joey Hess	450ee53ab6	When re-execing git-annex, use current program location, rather than ~/.config/git-annex/program, when possible. Most of the time, there will be no discrepancy between programPath and readProgramFile. But, the programFile might have been written by an old version of git-annex that is still installed, while a newer one is currently running. In this case, we want to run the same one that's currently running. This is especially important for things like the GIT_SSH=git-annex used for ssh connection caching. The only code that still uses readProgramFile directly is the upgrade code, which needs to know where the standalone git-annex was installed, in order to upgrade it.	2015-02-28 17:23:13 -04:00
Joey Hess	b9275b65f9	make programPath return FilePath not Maybe FilePath Looking at the few current callers, it's ok to have programPath throw an exception, in the unusual case where it cannot find git-annex.	2015-02-28 16:59:52 -04:00
Joey Hess	afb3e3e472	avoid crash when starting fsck --incremental when one is already running Turns out sqlite does not like having its database deleted out from underneath it. It might suffice to empty the table, but I would rather start each fsck over with a new database, so I added a lock file, and running incremental fscks use a shared lock. This leaves one concurrency bug left; running two concurrent fsck --more will lead to: "SQLite3 returned ErrorBusy while attempting to perform step." and one or both will fail. This is a concurrent writers problem.	2015-02-17 13:30:24 -04:00
Joey Hess	15107d2c5a	propagate ssh-options everywhere ssh caching is used * sync: Use the ssh-options git config when doing git pull and push. * remotedaemon: Use the ssh-options git config. Note that the rename env var means that if a new git-annex calls an old one for git-annex ssh, or a new calls an old, nothing much will go wrong; just ssh caching won't happen.	2015-02-12 16:14:53 -04:00
Joey Hess	5be7ba7ee5	The ssh-options git config is now used by gcrypt, rsync, and ddar special remotes that use ssh as a transport.	2015-02-12 15:44:10 -04:00
Joey Hess	7fce85adac	Improve race recovery code when committing to git-annex branch.	2015-02-09 18:34:48 -04:00
Joey Hess	b94eb9b22c	relFile does not have to be relative; rename to currFile	2015-02-06 16:03:02 -04:00
Joey Hess	c8163ce29a	use a Set	2015-01-28 18:17:10 -04:00
Joey Hess	b0575c621f	implement annex.tune.branchhash1 I hope this doesn't impact speed much -- it does have to pull out a value from Annex state every time it accesses the branch now. The test case I dropped has never caught any problems that I can remember, and would have been rather difficult to convert.	2015-01-28 17:17:26 -04:00
Joey Hess	009bd050c1	implement annex.tune.objecthashlower Split out Annex.DirHashes which never really belonged in Locations.	2015-01-28 16:52:08 -04:00
Joey Hess	e8c376e0ad	import Data.Default in Common	2015-01-28 16:11:28 -04:00

923 commits