git-annex

Author	SHA1	Message	Date
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	28720c795f	limit url downloads to whitelisted schemes Security fix! Allowing any schemes, particularly file: and possibly others like scp: allowed file exfiltration by anyone who had write access to the git repository, since they could add an annexed file using such an url, or using an url that redirected to such an url, and wait for the victim to get it into their repository and send them a copy. * Added annex.security.allowed-url-schemes setting, which defaults to only allowing http and https URLs. Note especially that file:/ is no longer enabled by default. * Removed annex.web-download-command, since its interface does not allow supporting annex.security.allowed-url-schemes across redirects. If you used this setting, you may want to instead use annex.web-options to pass options to curl. With annex.web-download-command removed, nearly all url accesses in git-annex are made via Utility.Url via http-client or curl. http-client only supports http and https, so no problem there. (Disabling one and not the other is not implemented.) Used curl --proto to limit the allowed url schemes. Note that this will cause git annex fsck --from web to mark files using a disallowed url scheme as not being present in the web. That seems acceptable; fsck --from web also does that when a web server is not available. youtube-dl already disabled file: itself (probably for similar reasons). The scheme check was also added to youtube-dl urls for completeness, although that check won't catch any redirects it might follow. But youtube-dl goes off and does its own thing with other protocols anyway, so that's fine. Special remotes that support other domain-specific url schemes are not affected by this change. In the bittorrent remote, aria2c can still download magnet: links. The download of the .torrent file is otherwise now limited by annex.security.allowed-url-schemes. This does not address any external special remotes that might download an url themselves. Current thinking is all external special remotes will need to be audited for this problem, although many of them will use http libraries that only support http and not curl's menagarie. The related problem of accessing private localhost and LAN urls is not addressed by this commit. This commit was sponsored by Brett Eisenberg on Patreon.	2018-06-16 11:57:50 -04:00
Joey Hess	c34152777b	Use http-conduit for url downloads by default, annex.web-options enables curl * For url downloads, git-annex now defaults to using a http library, rather than wget or curl. But, if annex.web-options is set, it will use curl. To use the .netrc file, run: git config annex.web-options --netrc * git-annex no longer uses wget (and wget is no longer shipped with git-annex builds). Note that curl is always run in silent mode, since the new API for download has a MeterUpdate and doesn't make way for curl progress output. It might be worth writing a parser for curl's progress output to update the meter when using it, but I didn't bother with this edge case for now. This commit was supported by the NSF-funded DataLad project.	2018-04-06 17:36:20 -04:00
Joey Hess	9b98d3f630	better HTTP connection reuse Enable HTTP connection reuse across multiple files, when git-annex uses http-conduit. Before, a new Manager was created each time Utility.Url used it. Now, a single Manager gets created the first time, so connections are reused. Doesn't help when external programs are used for url download, but does speed up addurl --fast, fsck --from web, etc. Testing fsck --fast --from web with 3 files, over high-latency satellite internet, it sped up from 19.37s to 14.96s. This commit was supported by the NSF-funded DataLad project.	2018-04-04 15:39:40 -04:00
Joey Hess	4015c5679a	force verification when resuming download When resuming a download and not using a rolling checksummer like rsync, the partial file we start with might contain garbage, in the case where a file changed as it was being downloaded. So, disabling verification on resumes risked a bad object being put into the annex. Even downloads with rsync are currently affected. It didn't seem worth the added complexity to special case those to prevent verification, especially since git-annex is using rsync less often now. This commit was sponsored by Brock Spratlen on Patreon.	2018-03-13 14:50:49 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	f4103744c3	make sure that lockContentShared is always paired with an inAnnex check lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think. Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repsitory. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.) And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy. So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-07 14:23:52 -04:00
Joey Hess	639a6df58a	fix windows build	2017-12-05 13:11:03 -04:00
Joey Hess	1228fe8c86	honor annex.diskreserve when running youtube-dl This commit was sponsored by André Pereira on Patreon.	2017-11-30 16:14:36 -04:00
Joey Hess	2528e3ddb0	rethought --relaxed change Better to make it not be surprising and slow, than surprising and fast. --raw can be used when it needs to be really fast. Implemented adding a youtube-dl supported url to an existing file. This commit was sponsored by andrea rota.	2017-11-30 14:13:20 -04:00
Joey Hess	99bebdface	youtube-dl working Including resuming and cleanup of incomplete downloads. Still todo: --fast, --relaxed, importfeed, disk reserve checking, quvi code cleanup. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-11-29 16:40:32 -04:00
Joey Hess	4e7e1fcff4	add gitAnnexTmpWorkDir and withTmpWorkDir Needed to run youtube-dl in, but could also be useful for other stuff. The tricky part of this was making the workdir be cleaned up whenever the tmp object file is cleaned up. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-11-29 13:53:39 -04:00
Joey Hess	187b3e7780	enable LambdaCase and convert around 10% of places that could use it Needs ghc 7.6.1, so minimum base version increased slightly. All builds are well above this version of ghc, and debian oldstable is as well. Code that could use lambdacase can be found by running: git grep -B 1 'case ' \| less and searching in less for "<-" This commit was sponsored by andrea rota.	2017-11-15 16:59:32 -04:00
Joey Hess	8dd84b87f9	use unix-compat 0.5 on windows Re-applying `3ec579f5e1`	2017-11-14 14:00:24 -04:00
Joey Hess	5f55082d10	Revert "use unix-compat 0.5 on windows" This reverts commit `3ec579f5e1`. Too early for this; needs newer Win32 version. Le sigh.	2017-11-09 15:14:00 -04:00
Joey Hess	3ec579f5e1	use unix-compat 0.5 on windows That version has my patches for the problems that Utility.PosixFiles was working around, so am able to get rid of that module now. This will later allow bringing back the custom-setup stanza in the cabal file. It will need to depend on unix-compat 0.5 on all OS's, which I'm not ready to do yet. This commit was sponsored by Nick Daly on Patreon.	2017-11-09 12:47:05 -04:00
Joey Hess	16eb2f976c	prevent exporttree=yes on remotes that don't support exports Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.	2017-09-07 13:48:44 -04:00
Joey Hess	662f2a5ee7	git annex get from exports Straightforward enough, except for the needed belt-and-suspenders sanity checks to avoid foot shooting due to exports not being key/value stores. * Even when annex.verify=false, always verify from exports. * Only get files from exports that use a backend that supports checksum verification. * Never trust exports, even if the user says to, because then `git annex drop` would drop content if the export seemed to contain a copy. This commit was supported by the NSF-funded DataLad project.	2017-09-04 16:39:56 -04:00
Joey Hess	07f1e638ee	annex.securehashesonly Cryptographically secure hashes can be forced to be used in a repository, by setting annex.securehashesonly. This does not prevent the git repository from containing files with insecure hashes, but it does prevent the content of such files from being pulled into .git/annex/objects from another repository. We want to make sure that at no point does git-annex accept content into .git/annex/objects that is hashed with an insecure key. Here's how it was done: * .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be written to it normally * So every place that writes content must call, thawContent or modifyContent. We can audit for these, and be sure we've considered all cases. * The main functions are moveAnnex, and linkToAnnex; these were made to check annex.securehashesonly, and are the main security boundary for annex.securehashesonly. * Most other calls to modifyContent deal with other files in the KEY directory (inode cache etc). The other ones that mess with the content are: - Annex.Direct.toDirectGen, in which content already in the annex directory is moved to the direct mode file, so not relevant. - fix and lock, which don't add new content - Command.ReKey.linkKey, which manually unlocks it to make a copy. * All other calls to thawContent appear safe. Made moveAnnex return a Bool, so checked all callsites and made them deal with a failure in appropriate ways. linkToAnnex simply returns LinkAnnexFailed; all callsites already deal with it failing in appropriate ways. This commit was sponsored by Riku Voipio.	2017-02-27 13:33:59 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	d4fbc3b460	make --json-progress work for url downloads	2016-09-09 16:15:39 -04:00
Joey Hess	1a0e2c9901	get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/	2016-08-03 12:37:12 -04:00
Joey Hess	9d952fe9d1	reinject: When src file's content cannot be verified, leave it alone, instead of deleting it.	2016-04-20 13:21:56 -04:00
Joey Hess	b7c8bf5274	Preserve execute bits of unlocked files in v6 mode. When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.	2016-04-14 14:47:08 -04:00
Joey Hess	cf06dac2b8	hard links on windows * annex.thin and annex.hardlink are now supported on Windows. * unannex --fast now makes hard links on Windows.	2016-04-08 15:25:32 -04:00
Joey Hess	4b3355cf3c	refactor	2016-03-09 13:43:22 -04:00
Joey Hess	9039bdb4ea	Always try to thaw content, even when annex.crippledfilesystem is set.	2016-03-09 13:33:13 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	3b960d1422	migrate and rekey v6 unlocked file support	2016-01-07 15:14:15 -04:00
Joey Hess	b3d60ca285	use TopFilePath for associated files Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.	2016-01-05 17:22:19 -04:00
Joey Hess	a2c056df65	convert isPointerFile from Annex to IO	2016-01-01 13:22:38 -04:00
Joey Hess	2e9341a47d	fix inode cache consistency bug when a merge unlocks a present file Since the file was present and locked, its annex object was not in the inode cache. So, despite not needing to update the annex object when the clean filter is run on the content by git merge, it does need to record the inode cache of the annex object. Otherwise, the annex object will be assumed to be bad, since its inode is not cached.	2015-12-29 16:26:27 -04:00
Joey Hess	645833774d	fix windows build	2015-12-28 12:44:04 -04:00
Joey Hess	121f5d5b0c	annex.thin Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.	2015-12-27 15:59:59 -04:00
Joey Hess	cfaac52b88	populate unlocked files with newly available content when ingesting This can happen when ingesting a new file in either locked or unlocked mode, when some unlocked files in the repo use the same key, and the content was not locally available before.	2015-12-22 16:22:28 -04:00
Joey Hess	4392140946	make linkAnnex detect when the file changes as it's being copied/linked in This fixes a race where the modified file ended up in annex/objects, and the InodeCache stored in the database was for the modified version, so git-annex didn't know it had gotten modified. The race could occur when the smudge filter was running; now it gets the InodeCache before generating the Key, which avoids the race.	2015-12-22 15:20:03 -04:00
Joey Hess	f9d077186a	implemented upgrade of direct mode repo to v6	2015-12-15 16:00:26 -04:00
Joey Hess	2bc920e266	update inode cache to cover file even when nothing needs to be done to linkAnnex This covers the case where multiple files have the same content and are added with git add. Previously only the one that was linked to the annex got its inode cached; now both are.	2015-12-15 13:02:33 -04:00
Joey Hess	1dad3af3fc	checked getKeysPresent; it's ok for v6 unlocked files When a v6 unlocked files is removed from the work tree, unused doesn't show it. When it gets removed from the index, unused does show it. This is the same as a locked file.	2015-12-11 16:12:42 -04:00
Joey Hess	7790e059b2	finish v6 git-annex lock This was a doozy!	2015-12-11 15:28:34 -04:00
Joey Hess	50e83b606c	only make 1 hardlink max between pointer file and annex object If multiple files point to the same annex object, the user may want to modify them independently, so don't use a hard link. Also, check diskreserve when copying.	2015-12-11 14:00:21 -04:00
Joey Hess	c608a752a5	Merge branch 'master' into smudge	2015-12-11 13:50:31 -04:00
Joey Hess	abd66c7089	fsck: Failed to honor annex.diskreserve when checking a remote.	2015-12-11 13:50:27 -04:00
Joey Hess	c910b4e255	wip	2015-12-11 10:42:18 -04:00
Joey Hess	9dffd3d255	add generalized linkAnnex'	2015-12-10 16:08:19 -04:00
Joey Hess	f80a3d8cd0	check InodeCache in inAnnex et al This avoids querying the database when the content file doen't exist (or otherwise fails the provided check). However, it does add overhead of querying the database, and will certianly impact performance.	2015-12-10 14:51:04 -04:00
Joey Hess	2b8f6b8b2f	check inode cache in prepSendAnnex This does mean one query of the database every time an object is sent. May impact performance.	2015-12-10 14:50:52 -04:00
Joey Hess	3719d1b390	make clear when code is using deprecated direct mode files	2015-12-09 19:43:15 -04:00
Joey Hess	aa88851ec1	reorder	2015-12-09 19:38:37 -04:00
Joey Hess	ce73a96e4e	use InodeCache when dropping a key to see if a pointer file can be safely reset The Keys database can hold multiple inode caches for a given key. One for the annex object, and one for each pointer file, which may not be hard linked to it. Inode caches for a key are recorded when its content is added to the annex, but only if it has known pointer files. This is to avoid the overhead of maintaining the database when not needed. When the smudge filter outputs a file's content, the inode cache is not updated, because git's smudge interface doesn't let us write the file. So, dropping will fall back to doing an expensive verification then. Ideally, git's interface would be improved, and then the inode cache could be updated then too.	2015-12-09 17:54:54 -04:00
Joey Hess	5e8c628d2e	add inode cache to the db Renamed the db to keys, since it is various info about a Keys. Dropping a key will update its pointer files, as long as their content can be verified to be unmodified. This falls back to checksum verification, but I want it to use an InodeCache of the key, for speed. But, I have not made anything populate that cache yet.	2015-12-09 17:00:37 -04:00
Joey Hess	3311c48631	move InodeSentinal from direct mode code to its own module Will be used outside of direct mode for v6 unlocked files, and is already used outside of direct mode when adding files to annex.	2015-12-09 15:52:11 -04:00
Joey Hess	8a818088a3	link/copy pointer files to object content when it's added	2015-12-09 15:27:29 -04:00
Joey Hess	99b2a524a0	clean filter should update location log when adding new content to annex	2015-12-04 14:20:32 -04:00
Joey Hess	2c6454a2e2	basic clean filter working	2015-12-04 13:39:14 -04:00
Joey Hess	0d432dd1a4	annex object file mode for core.sharedRepository When core.sharedRepository is set, annex object files are not made mode 444, since that prevents a user other than the file owner from locking them. Instead, a mode such as 664 is used in this case.	2015-11-18 15:45:32 -04:00
Joey Hess	3449c0e8ec	avoid spawning file size polling thread when not in -J mode	2015-11-16 21:21:58 -04:00
Joey Hess	e97fce35a6	Display progress meter in -J mode when downloading from the web. Including in addurl, and get --from web, but also in S3 and External special remotes when a web url is known for content in those remotes.	2015-11-16 21:00:54 -04:00
Joey Hess	aaf1ef268d	convert from Utility.LockPool to Annex.LockPool everywhere	2015-11-12 18:13:37 -04:00
Joey Hess	7938b87864	add: Fix error recovery rollback to not move the injested file content out of the annex back to the file, because other files may point to that same content. Instead, copy the injected file content out to recover. That was not a data loss, but it came close!	2015-11-06 15:28:20 -04:00
Joey Hess	1ff7610118	fix windows build	2015-10-12 15:48:59 -04:00
Joey Hess	3b89d5a20c	implement lockContent for ssh remotes	2015-10-09 16:55:41 -04:00
Joey Hess	6a72045707	fix local dropping to not require extra locking of copies, but only that the local copy be locked for removal	2015-10-09 15:48:02 -04:00
Joey Hess	b021321aae	rename constructor	2015-10-09 15:01:33 -04:00
Joey Hess	f57ac29be1	refactor	2015-10-09 10:30:22 -04:00
Joey Hess	c75c79864d	support invalidating existing VerifiedCopys	2015-10-08 17:58:32 -04:00
Joey Hess	90f7c4b6a2	add VerifiedCopy data type There should be no behavior changes in this commit, it just adds a more expressive data type and adjusts code that had been passing around a [UUID] or sometimes a Maybe Remote to instead use [VerifiedCopy]. Although, since some functions were taking two different [UUID] lists, there's some potential for me to have gotten it horribly wrong.	2015-10-08 16:55:11 -04:00
Joey Hess	9cb9dab69b	I think this comment is stale/confusing; remove	2015-10-08 14:51:44 -04:00
Joey Hess	4d50958ed7	add lockContentShared Also, rename lockContent to lockContentExclusive inAnnexSafe should perhaps be eliminated, and instead use `lockContentShared inAnnex`. However, I'm waiting on that, as there are only 2 call sites for inAnnexSafe and it's fiddly.	2015-10-08 14:29:35 -04:00
Joey Hess	2def1d0a23	other 80% of avoding verification when hard linking to objects in shared repo In `c6632ee5c8`, it actually only handled uploading objects to a shared repository. To avoid verification when downloading objects from a shared repository, was a lot harder. On the plus side, if the process of downloading a file from a remote is able to verify its content on the side, the remote can indicate this now, and avoid the extra post-download verification. As of yet, I don't have any remotes (except Git) using this ability. Some more work would be needed to support it in special remotes. It would make sense for tahoe to implicitly verify things downloaded from it; as long as you trust your tahoe server (which typically runs locally), there's cryptographic integrity. OTOH, despite bup being based on shas, a bup repo under an attacker's control could have the git ref used for an object changed, and so a bup repo shouldn't implicitly verify. Indeed, tahoe seems unique in being trustworthy enough to implicitly verify.	2015-10-02 14:35:12 -04:00
Joey Hess	7c7fe895f9	disabling verification also disables size verification It's not expensive to do size verification, but let's be consistent and turn it off too.	2015-10-02 12:38:02 -04:00
Joey Hess	c6632ee5c8	avoid verification when hard linking to objects in shared repository Such a repository is implicitly trusted, so there's no point.	2015-10-02 12:36:03 -04:00
Joey Hess	2fb3722ce9	Do verification of checksums of annex objects downloaded from remotes. * When annex objects are received into git repositories, their checksums are verified then too. * To get the old, faster, behavior of not verifying checksums, set annex.verify=false, or remote.<name>.annex-verify=false. * setkey, rekey: These commands also now verify that the provided file matches the key, unless annex.verify=false. * reinject: Already verified content; this can now be disabled by setting annex.verify=false. recvkey and reinject already did verification, so removed now duplicate code from them. fsck still does its own verification, which is ok since it does not use getViaTmp, so verification doesn't happen twice when using fsck --from.	2015-10-01 15:56:39 -04:00
Joey Hess	b72d3fbeba	rename function	2015-10-01 14:18:57 -04:00
Joey Hess	807ba6a903	refactor	2015-10-01 14:07:06 -04:00
Joey Hess	ea765ec022	windows build warning fixes	2015-08-03 15:54:29 -04:00
Joey Hess	267f397d82	avoid calling copy when file DNE This avoids an ugly warning when running git annex fsck --from a rsync remote in a repo in direct mode.	2015-07-30 13:40:17 -04:00
Joey Hess	0a998032ed	Fix bug that prevented enumerating locally present objects in repos tuned with annex.tune.objecthash1=true Need to walk 1 level of subdirs less in this case. The git-annex branch traversal code didn't have a similar bug.	2015-06-11 15:15:05 -04:00
Joey Hess	d28e8fbfd5	get --incomplete: New option to resume any interrupted downloads.	2015-06-02 14:20:38 -04:00
Joey Hess	83b262f1b6	fix windows build	2015-05-22 13:54:54 -04:00
Joey Hess	167539a354	better memoize core.sharedrepository handling It was memoized, but that was not used consistently. Move it to Types.GitConfig so it will auto-memoize.	2015-05-19 15:04:24 -04:00
Joey Hess	b47c9fd587	honor core.sharedRepository settings in lockContent The content file may not be owned by the user running git-annex, in which case, setting the owner write bit was not enough to let lockContent act on the file. However, with some core.sharedRepository configs, the file should be writable by the user's group. So, the thing to do is to call thawContent on it.	2015-05-19 14:53:19 -04:00
Joey Hess	f4e2093760	fix inAnnexSafe result for direct file that is being dropped It was returning Just False in this situation, which differed from indirect mode behavior. I don't think this led to any actual problems; things that checked if the file being dropped was present just failed to fail, and instead reported it wasn't present, possibly incorrectly. Hmm, it's possible that this could have made git annex fsck --from remote update the location log wrongly, if a remote was in direct mode, and was in the middle of trying to drop a key, and the drop later failed.	2015-05-19 14:26:07 -04:00
Joey Hess	1312e721ed	convert lockContent to use new LockPools Also cleaned up the code, avoiding creating a lock file if we're going to open it for create later anyway. And, if there's an exception while preparing to lock the file, but not at the point of actually taking the lock, throw an exception, instead of silently not locking and pretending to succeed. And, on Windows, always use lock file, even if the repo somehow got into indirect mode (maybe with cygwin git..)	2015-05-19 14:12:23 -04:00
Joey Hess	ecb0d5c087	use lock pools throughout git-annex The one exception is in Utility.Daemon. As long as a process only daemonizes once, which seems reasonable, and as long as it avoids calling checkDaemon once it's already running as a daemon, the fcntl locking gotchas won't be a problem there. Annex.LockFile has it's own separate lock pool layer, which has been renamed to LockCache. This is a persistent cache of locks that persist until closed. This is not quite done; lockContent stil needs to be converted.	2015-05-19 14:09:52 -04:00
Joey Hess	a812d598ef	Take space that will be used by running downloads into account when checking annex.diskreserve.	2015-05-12 15:20:22 -04:00
Joey Hess	08308dc9b3	fix build warning with ghc 7.10	2015-05-10 15:28:13 -04:00
Joey Hess	3a078ab357	When a key's size is unknown, still check the annex.diskreserve, and avoid getting content if the disk is too full. We can't check if there's enough disk space to download the content, but we can check if there's certainly not enough!	2015-04-17 21:29:15 -04:00
Joey Hess	2b79e6fe08	a few hlints	2015-04-11 00:10:34 -04:00
Joey Hess	ce0a82f493	contentlocationn: New plumbing command.	2015-04-09 15:34:47 -04:00
Joey Hess	2343f99c85	well along the way to fully quiet --quiet Came up with a generic way to filter out progress messages while keeping errors, for commands that use stderr for both. --json mode will disable command outputs too.	2015-04-04 14:34:03 -04:00
Joey Hess	ff2eeaf054	avoid progress bar for url download with --quiet	2015-04-03 20:38:56 -04:00
Joey Hess	e8c376e0ad	import Data.Default in Common	2015-01-28 16:11:28 -04:00
Joey Hess	70736d2b41	Repository tuning parameters can now be passed when initializing a repository for the first time. * init: Repository tuning parameters can now be passed when initializing a repository for the first time. For details, see http://git-annex.branchable.com/tuning/ * merge: Refuse to merge changes from a git-annex branch of a repo that has been tuned in incompatable ways.	2015-01-27 17:38:06 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	4f657aa14e	add getFileSize, which can get the real size of a large file on Windows Avoid using fileSize which maxes out at just 2 gb on Windows. Instead, use hFileSize, which doesn't have a bounded size. Fixes support for files > 2 gb on Windows. Note that the InodeCache code only needs to compare a file size, so it doesn't matter it the file size wraps. So it has been left as-is. This was necessary both to avoid invalidating existing inode caches, and because the code passed FileStatus around and would have become more expensive if it called getFileSize. This commit was sponsored by Christian Dietrich.	2015-01-20 17:09:24 -04:00
Joey Hess	3bab5dfb1d	revert parentDir change Reverts `965e106f24` Unfortunately, this caused breakage on Windows, and possibly elsewhere, because parentDir and takeDirectory do not behave the same when there is a trailing directory separator.	2015-01-09 13:11:56 -04:00
Joey Hess	965e106f24	made parentDir return a Maybe FilePath; removed most uses of it parentDir is less safe than takeDirectory, especially when working with relative FilePaths. It's really only useful in loops that want to terminate at / This commit was sponsored by Audric SCHILTKNECHT.	2015-01-06 18:55:56 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	b874f84086	New annex.hardlink setting. Closes: #758593 * New annex.hardlink setting. Closes: #758593 * init: Automatically detect when a repository was cloned with --shared, and set annex.hardlink=true, as well as marking the repository as untrusted. Had to reorganize Logs.Trust a bit to avoid a cycle between it and Annex.Init.	2014-09-05 13:44:09 -04:00
Joey Hess	6eb5c3f479	Do not preserve permissions and acls when copying files from one local git repository to another. Timestamps are still preserved as long as cp --preserve=timestamps is supported. This avoids cp -a overriding the default mode acls that the user might have set in a git repository. With GNU cp, this behavior change should not be a breaking change, because git-anex also uses rsync sometimes in the same situation, and has only ever preserved timestamps when using rsync. Systems without GNU cp will no longer use cp -a, but instead just cp. So, timestamps will no longer be preserved. Preserving timestamps when copying between repos is not guaranteed anyway. Closes: #729757	2014-08-26 17:10:25 -07:00
Joey Hess	aebcc395ff	use types to enforce that removeAnnex can only be called inside lockContent This fixed one bug where it needed to be and wasn't (in Assistant.Unused). And also found one place where lockContent was used unnecessarily (by drop --from remote). A few other places like uninit probably don't really need to lockContent, but it doesn't hurt to do call it anyway. This commit was sponsored by David Wagner.	2014-08-20 20:13:47 -04:00
Joey Hess	1994771215	more lock file refactoring Also fixes a test suite failures introduced in recent commits, where inAnnexSafe failed in indirect mode, since it tried to open the lock file ReadWrite. This is why the new checkLocked opens it ReadOnly. This commit was sponsored by Chad Horohoe.	2014-08-20 18:58:14 -04:00
Joey Hess	e386e26ef2	avoid trying to create a content file in order to lock it The nice refactoring in `ec7dd0446a` highlighted a bug in lockContent -- when the content is not present, this incorrectly created an empty lock file, using the same filename as the content file. This seems like it could result in empty objects, which fsck would detect and complain about. Both drop and move --to call lockContent, as does Remote.Git.dropKey -- I think we got lucky and this bug didn't show up because both all of those only operate on files that are present. So this bug could only manifest if there was a race, and a file's content was dropped at just the wrong time, just as another process was about to drop it. (And then only if the other process's dropping failed, otherwise it'd delete the empty object file.) Hmm, move --from also called lockContent. Unnecessarily, since the content is not being removed from the local annex. In this case, the combination of the 2 bugs could result in an empty lock file being written, and then if the download of the content failed, left in the object directory as the content. This commit also optimises lockContent, avoiding an unncessary doesFileExist test and instead just catching the exception that's thrown when the file doesn't exist. This commit was sponsored by Justine Lam.	2014-08-20 17:25:30 -04:00
Joey Hess	ec7dd0446a	more lock file refactoring	2014-08-20 17:03:04 -04:00
Joey Hess	d279180266	reorganize and refactor lock code Added a convenience Utility.LockFile that is not a windows/posix portability shim, but still manages to cut down on the boilerplate around locking. This commit was sponsored by Johan Herland.	2014-08-20 16:45:58 -04:00
Joey Hess	092041fab0	Ensure that all lock fds are close-on-exec, fixing various problems with them being inherited by child processes such as git commands. (With the exception of daemon pid locking.) This fixes at part of #758630. I reproduced the assistant locking eg, a removable drive's annex journal lock file and forking a long-running git-cat-file process that inherited that lock. This did not affect Windows. Considered doing a portable Utility.LockFile layer, but git-annex uses posix locks in several special ways that have no direct Windows equivilant, and it seems like it would mostly be a complication. This commit was sponsored by Protonet.	2014-08-20 11:37:02 -04:00
Joey Hess	c784ef4586	unify exception handling into Utility.Exception Removed old extensible-exceptions, only needed for very old ghc. Made webdav use Utility.Exception, to work after some changes in DAV's exception handling. Removed Annex.Exception. Mostly this was trivial, but note that tryAnnex is replaced with tryNonAsync and catchAnnex replaced with catchNonAsync. In theory that could be a behavior change, since the former caught all exceptions, and the latter don't catch async exceptions. However, in practice, nothing in the Annex monad uses async exceptions. Grepping for throwTo and killThread only find stuff in the assistant, which does not seem related. Command.Add.undo is changed to accept a SomeException, and things that use it for rollback now catch non-async exceptions, rather than only IOExceptions.	2014-08-07 22:03:29 -04:00
Joey Hess	bc9e4697b9	better type for Retriever Putting a callback in the Retriever type allows for the callback to remove the retrieved file when it's done with it. I did not really want to make Retriever be fixed to Annex Bool, but when I tried to use Annex a, I got into some type of type mess.	2014-07-29 18:41:41 -04:00
Joey Hess	47e522979c	allow Retriever action to update the progress meter Needed for eg, Remote.External. Generally, any Retriever that stores content in a file is responsible for updating the meter, while ones that procude a lazy bytestring cannot update the meter, so are not asked to.	2014-07-29 17:18:49 -04:00
Joey Hess	d0c1a22e7c	import metadata from feeds When annex.genmetadata is set, metadata from the feed is added to files that are imported from it. Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names that are used in --template. Also added title and author, which are the item title/author if available, falling back to the feed title/author. These are more likely to be common metadata fields. (There is a small bit of dupication here, but once git gets around to packing the object, it will compress it away.) The itempubdate field is not included in the metadata as a string; instead it is used to generate year and month fields, same as is done when adding files with annex.genmetadata set. This commit was sponsored by Amitai Schlair, who cooincidentially is responsible for ikiwiki generating nice feed metadata!	2014-07-03 14:15:00 -04:00
Joey Hess	dcddacfd5c	fixed getting files from bare repos on windows	2014-06-05 15:54:06 -04:00
Joey Hess	eb86f1338f	wip	2014-06-05 15:31:23 -04:00
Joey Hess	1f99a6778f	Fix direct mode getKeysPresent false positive & also sped up direct mode unused and unannex unused: In direct mode, files that are deleted from the work tree are no longer incorrectly detected as unused. Direct mode `git annex info` slows down a bit due to more stringent checking, but not by a lot.	2014-03-07 12:43:56 -04:00
Joey Hess	a1432bce2f	Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects. This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO of temp object files is too annoying (and if you don't want to keep partially transferred objects across reboots). .git/annex/misctmp must be on the same filesystem as the git work tree, since files are moved to there in a way that will not work cross-device, as well as symlinked into there. I first wanted to put the tmp objects in .git/annex/objects/tmp, but that would pose transition problems on upgrade when partially transferred objects existed. git annex info does not currently show the size of .git/annex/misctemp, since it should stay small. It would also be ok to make something clean it out, periodically.	2014-02-26 16:52:56 -04:00
Joey Hess	003fc2b7e1	add UrlOptions sum type	2014-02-24 22:00:25 -04:00
Joey Hess	c69d6eb035	Make annex.web-options be used in several places that call curl.	2014-02-24 21:29:37 -04:00
Joey Hess	db48b8a4a3	unused: Fix to actually detect unused keys when in direct mode.	2014-02-20 13:53:49 -04:00
Joey Hess	2d480602aa	random hlint (to give the autobuilder something new to build)	2014-02-11 00:41:19 -04:00
Joey Hess	43d17632f6	remove workaround for old bug `ef24751922` described a bug moving between remotes in direct mode; I can no longer reproduce it with this strange workaround removed. Also test suite still passes. Hope the broken code just got fixed in the meantime.	2014-02-06 17:36:14 -04:00
Joey Hess	897d877472	work around absNormPath not working on Windows When making git-annex links, we want unix-style paths in the link targets.	2014-02-06 17:17:35 -04:00
Joey Hess	1669e80e85	Windows: Avoid using unix-compat's rename, which refuses to rename directories. Opened a bug about this: https://github.com/jystic/unix-compat/issues/10	2014-01-29 15:19:03 -04:00
Joey Hess	721cc0cd22	rework annexed object locking in direct mode & support Windows Seems that locking of annexed objects when they're being dropped was broken in direct mode: * When taking the lock before dropping, it created the .git/annex/objects file, as an empty file. It seems that the dropping code deleted that, but that is not right, and for all I know could in some situation cause a corrupted object to leak out. * When the lock was checked, it actually tried to open each direct mode file, and checked if it was locked. Not the same lock used above, and could also fail if some consumer of the file locked it. Fixed this, and added windows support by switching direct mode to lock a .lck file.	2014-01-28 16:43:11 -04:00
Joey Hess	b93e485ef1	added annex.secure-erase-command config option.	2014-01-24 12:58:52 -04:00
Joey Hess	78c7c54fdb	also check diskreserve for quvi downloads	2014-01-04 15:38:59 -04:00
Joey Hess	f9e7b6cf61	addurl, importfeed: Honor annex.diskreserve as long as the size of the url can be checked. This adds a http HEAD before the download is done. That was already the case when the assistant was running, and it seems worth it to avoid filling up the whole disk, like happened to my server today.	2014-01-04 15:08:06 -04:00
Joey Hess	e563c7e6f4	fsck distribution key	2013-11-23 21:58:39 -04:00
Joey Hess	5561b46416	fix windows build	2013-11-18 11:05:16 -04:00
Joey Hess	d48b00ebed	Direct mode .git/annex/objects directories are no longer left writable Because that allowed writing to symlinks of files that are not present, which followed the link and put bad content in an object location. fsck: Fix up .git/annex/object directory permissions. This commit was sponsored by an anonymous bitcoin donor.	2013-11-15 14:52:03 -04:00
Joey Hess	18f4d1b400	queue downloads of keys that fsck finds with bad content	2013-10-10 17:27:00 -04:00
Joey Hess	12f6b9693a	Send a git-annex user-agent when downloading urls. Overridable with --user-agent option. Not yet done for S3 or WebDAV due to limitations of libraries used -- nether allows a user-agent header to be specified. This commit sponsored by Michael Zehrer.	2013-09-28 14:35:21 -04:00
Joey Hess	b405295aee	hlint test suite still passes	2013-09-25 03:09:06 -04:00
Joey Hess	67fda9e669	Honor core.sharedrepository when receiving and adding files in direct mode.	2013-09-03 13:35:49 -04:00
Joey Hess	a3224ce35b	avoid more build warnings on Windows	2013-08-04 14:05:36 -04:00
Joey Hess	06db8e0bd9	squash compiler warnings on Windows	2013-08-04 13:18:05 -04:00
Joey Hess	93f2371e09	get rid of __WINDOWS__, use mingw32_HOST_OS The latter is harder for me to remember, but avoids build failures in code used by the configure program.	2013-08-02 12:27:32 -04:00
Joey Hess	3e422cb5fa	fix uninit to delete content from annex when it ended up hard linked back to the work tree	2013-07-18 13:30:12 -04:00
Joey Hess	eba9ee5bc6	remove debug print	2013-05-27 11:18:18 -04:00
Joey Hess	bf86b5ca16	improve robustness of fromDirect and replaceFile Made fromDirect check that a file in the tree has good content (and is not a broken symlink either) before copying it to another file that has the same key. Made replaceFile clean up the temp file if the action that creates it, or the file replacement action fails.	2013-05-25 15:06:02 -04:00
Joey Hess	1b616c5d37	improve handling of receiving object in direct mode when associated files are modified Before, if a direct mode repo had one or more associated files that were modifed, moving the object into it would overwrite the associated files with the pristine object. Now, modified associated files are left unchanged. To ensure that, when an object is moved into a direct mode repo, it's not thrown away, it gets stored in indirect mode.	2013-05-17 16:25:18 -04:00
Joey Hess	b8e5b9c645	test suite passes in direct mode This fixes a bug with git annex add in direct mode. If some files already existed in the tree pointing at the same key as a file that was just added, and their content was not present, add neglected to copy the content to those files. I also changed the behavior of moveAnnex slightly: When content is moved into the annex in direct mode, it does not overwrite any content already present in direct mode files. That content may be modified after all.	2013-05-17 15:59:37 -04:00
Joey Hess	25cb9a48da	fix the day's Windows permissions damage	2013-05-14 20:15:14 -04:00
Joey Hess	fee6cd4635	fix imports	2013-05-14 14:21:35 -05:00
Joey Hess	abe8d549df	fix permission damage (thanks, Windows)	2013-05-11 23:54:25 -04:00
Joey Hess	3c7e30a295	git-annex now builds on Windows (doesn't work)	2013-05-11 15:03:00 -05:00
Joey Hess	6c74a42cc6	stub out POSIX stuff	2013-05-10 16:29:59 -05:00
Joey Hess	adde00f4f3	git-annex-shell: Ensure that received files can be read. Files transferred from some Android devices may have very broken permissions as received.	2013-05-06 17:30:57 -04:00
Joey Hess	0807211a67	thaw content directory in direct mode too A content directory can be frozen in direct mode. One way this can happen is if the content is transferred before direct mode has a mapping for it, so it's stored in the content directory. So, we need to thaw the content directory before doing things with it.	2013-04-30 19:33:43 -04:00
Joey Hess	11ca4cee34	refactor	2013-04-30 19:09:36 -04:00

1 2 3 4 5 ...

335 commits