git-annex

Author	SHA1	Message	Date
Joey Hess	df2001aa88	Improve display of errors when transfers fail Transfers from or to a local git repo could fail without a reason being given, if the content failed to verify, or if the object file's stat changed while it was being copied. Now display messages in these cases. Sponsored-by: Jack Hill on Patreon	2021-06-25 13:17:04 -04:00
Joey Hess	e147ae07f4	remove supportUnlocked check that is not worth its overhead moveAnnex only gets to that check if the object file was not present before. So in the case where dup files are being added repeatedly, it will only run the first time, and so there's no significant speedup from doing it; all it avoids is a single sqlite lookup. Since MVar accesses do have overhead, it's better to optimise for the common case, where unlocked files are supported. removeAnnex is less clear cut, but I think mostly is skipped running on keys when the object has already been dropped, so similar reasoning applies.	2021-06-15 09:28:56 -04:00
Joey Hess	014dc63a55	avoid sometimes expensive operations when annex.supportunlocked = false This will mostly just avoid a DB lookup, so things get marginally faster. But in cases where there are many files using the same key, it can be a more significant speedup. Added overhead is one MVar lookup per call, which should be small enough, since this happens after transferring or ingesting a file, which is always a lot more work than that. It would be nice, though, to move getGitConfig to AnnexRead, which there is an open todo about.	2021-06-14 12:40:41 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	4b739fc460	Fix build on Windows Thanks to bug reporter for the patch.	2020-11-19 12:33:00 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	af6af35228	split out Annex.Content.Presence This will let a module that Annex.Content imports use inAnnex. Unsure yet if I will need that, but this split still seems to make sense, and Annex.Content was way too long so splitting it is good.	2020-11-16 11:24:57 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	00937c4813	when downloading same content from multiple urls, only display error if all fail	2020-09-02 11:35:07 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unncessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	6d58ca94d6	some easy createDirectoryUnder conversions	2020-03-05 15:20:10 -04:00
Joey Hess	1883f7ef8f	support git remotes that need http basic auth using git credential to get the password One thing this doesn't do is wrap the password prompting inside the prompt action. So with -J, the output can be a bit garbled.	2020-01-22 16:16:19 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	890330f0fe	make --json-error-messages capture url download errors Convert Utility.Url to return Either String so the error message can be displated in the annex monad and so captured. (When curl is used, its errors are still not caught.)	2019-11-12 13:52:38 -04:00
Joey Hess	da6f4d8887	remove direct mode support from Annex.Content No longer used. The only possible user of it would be code in Upgrade.V5, so I verified that the parts of Annex.Content it used were not used to manipulate direct mode files.	2019-08-27 13:14:06 -04:00
Joey Hess	9d36c826c0	use fine-grained WorkerStages when transferring and verifying This means that Command.Move and Command.Get don't need to manually set the stage, and is a lot cleaner conceptually. Also, this makes Command.Sync.syncFile use the worker pool better. In the scenario where it first downloads content and then uploads it to some other remotes, it will start in TransferStage, then enter VerifyStage and then go back to TransferStage for each transfer to the remotes. Before, it entered CleanupStage after the download, and stayed in it for the upload, so too many transfer jobs could run at the same time. Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent inside onLocal. That has a Annex state for the remote, with no worker pool. So the resulting calls to enteringStage won't block in there. While Remote.Git.copyToRemote does do checksum verification, I realized that should not use a verification slot in the WorkerPool to do it. Because, it's reading back from eg, a removable disk to checksum. That will contend with other writes to that disk. It's best to treat that checksum verification as just part of the transer. So, removed the todo item about that, as there's nothing needing to be done.	2019-06-19 13:24:20 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions should that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unncessary build the remote list (only when they're run with concurrency enable), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	a26514d67e	Fix doubled progress display when downloading an url when -J is used. downloadUrl uses meteredFile, which sets up one progress meter, and Remote.Web also uses metered, so two progress meters are displayed for the same download. Reversion introduced with the http-conduit switch in `c34152777b` -- I don't know why the extra call to metered was added there. When -J is not used, the extra progress meter didn't display, but an extra blank line did get output, which is also fixed. This commit was sponsored by John Pellman on Patreon.	2018-12-30 12:29:49 -04:00
Joey Hess	aa8243df4c	dropunused edge case when annex.thin caused unused object to be modified dropunused: When an unused object file has gotten modified, eg due to annex.thin being set, don't silently skip it, but display a warning and let --force drop it. This commit was sponsored by Ethan Aubin.	2018-12-04 12:20:34 -04:00
Joey Hess	370757087d	catch lockContentForRemoval exception removeKey should not throw exceptions, so catch exception there In Assistant.Unused, keep trying to drop other keys if one drop fails	2018-11-15 15:39:57 -04:00
Joey Hess	1c71f563e0	explicitly close keys db in saveState Should be redundant, but test suite is ending up with a lot of extra sqlite connections before unused keys database handles get garbage collected. While running the test suite, I often saw 2-4+ open fds to the same repo's keys database. After this change, it seems to mostly have 1, occasionally 2. And that might explain some of the strange sqlite failures in the test suite. Especially the failures of test_lock_v7_force, where the keys database gets renamed to a new directory out from under sqlite.	2018-10-30 22:19:32 -04:00
Joey Hess	5ab0f48ffb	high-res mtimes Cache high-resolution mtimes for improved detection of modified files in v7 (and direct mode). Including on Windows. With back-compat support so old low-res mtimes won't break anything, and so the new information also won't break old versions of git-annex.	2018-10-30 00:41:26 -04:00
Joey Hess	e2c894d3df	remove debug prints	2018-10-26 12:56:40 -04:00
Joey Hess	c28ca8294f	optimize smudge --clean of unmodified file Usually, git won't run clean filter when a file is unmodified. But, when git checkout runs git annex smudge --update, it populates the pointer runs git update-index, which sees the file has changed and runs git annex smudge --clean, which was checksumming the file unncessarily as it re-ingested it. With annex.thin set, this is the difference between git checkout of a branch with a 1 gb file taking 30s and 0.1s. This commit was sponsored by Brett Eisenberg on Patreon.	2018-10-25 16:46:46 -04:00
Joey Hess	b2bafdb2fc	v6: Fix database inconsistency That could cause git-annex to get confused about whether a locked file's content was present, when the object file got touched. Unfortunately this means more work sometimes when annex.thin is set, since it has to checksum the file to tell if it's still got the right content. Had to suppress output when inAnnex calls isUnmodified, otherwise "(checksum...)" would be printed in places it ought not to be, eg "git annex get" could turn out not need to get anything, and so only display that. This commit was sponsored by Ole-Morten Duesund on Patreon.	2018-10-16 13:51:37 -04:00
Joey Hess	fcff64f8bb	optimisation: avoid stat call This commit was sponsored by Paul Walmsley on Patreon.	2018-09-05 17:26:12 -04:00
Joey Hess	b600ad71ce	make linkToAnnex freezeContent the object file v6: Fix annex object file permissions when git-annex add is run on a modified unlocked file, and in some related cases. If a hard link is made, don't freeze it; annex.thin uses writable object files. Also: For some reason, linkToAnnex used to thawContent src. I can see no reason why it needed to do that, so I eliminated that. This commit was sponsored by Brock Spratlen on Patreon.	2018-09-05 15:27:22 -04:00
Joey Hess	9ff1c62a4d	fix race If a pointer file is being populated and something modifies it at the same time, there was a race there the modified file's InodeCache could get added into the keys database. Note that replaceFile normally renames the temp file into place, so the inode cache caculated for the temp file will still be good. If it has to fall back to a copy, the worktree file won't be put in the inode cache. This has the same result as if the worktree file gets touched, and will be handled the same way. Eg, when dropping, isUnmodified will do an expensive comparison and notice that the worktree file does have the same content, and so drop it. This commit was supported by the NSF-funded DataLad project.	2018-08-22 15:22:52 -04:00
Joey Hess	e094cf3377	split out modules from Annex.Content	2018-08-22 14:45:53 -04:00
Joey Hess	82a239675f	narrow the race where a file gets modified before update-index Check just before running update-index if the worktree file's content is still the same, don't update it when it's been modified. This narrows the race window a lot, from possibly minutes or hours, to seconds or less. (Use replaceFile so that the worktree update happens atomically, allowing the InodeCache of the new worktree file to itself be gathered w/o any other race.) This doesn't eliminate the race; it can still occur in the window before update-index runs. When annex.queue is large, a lot of files will be statted by the checks, and so the window may still be large enough to be a problem. When only a few files are being processed, the window is as small as it is in the race where a modification gets overwritten by git-annex when it updates the worktree. Or maybe as small as whatever race git checkout/pull/merge may have when the worktree gets modified during it. Still, I've kept a todo about this race. This commit was supported by the NSF-funded DataLad project.	2018-08-16 15:56:43 -04:00
Joey Hess	82cfcfc838	better index file refresh method Use git update-index --refresh, since it's a little bit more efficient and the user can be told to run it if a locked index prevents git-annex from running it. This also fixes the problem where an annexed file was deleted in the index and a get of another file that uses the same key caused the index update to add back the deleted file. update-index will not add back the deleted file. Documented in tips/unlocked_files.mdwn the gotcha that the index update may conflict with other operations. I can't see any way to possibly avoid that conflict. One new todo about a race that causes a modification to be accidentially staged. Note that the assistant only flushes the git command queue when it commits a modification. I have not tested the assistant with v6 unlocked files, but assume most users of the assistant won't care if the index shows a file as modified for a while. This commit was supported by the NSF-funded DataLad project.	2018-08-16 14:16:24 -04:00
Joey Hess	5e87389f40	refactor	2018-08-15 13:46:28 -04:00
Joey Hess	48e9e12961	finally fixed v6 get/drop git status After updating the worktree for an add/drop, update git's index, so git status will not show the files as modified. What actually happens is that the index update removes the inode information from the index. The next git status (or similar) run then has to do some work. It runs the clean filter. So, this depends on the clean filter being reasonably fast and on git not leaking memory when running it. Both problems were fixed in `a96972015d`, but only for git 2.5. Anyone using an older git will see very expensive git status after an add/drop. This uses the same git update-index queue as other parts of git-annex, so the actual index update is fairly efficient. Of course, updating the index does still have some overhead. The annex.queuesize config will control how often the index gets updated when working on a lot of files. This is an imperfect workaround... Added several todos about new problems this workaround causes. Still, this seems a lot better than the old behavior. This commit was supported by the NSF-funded DataLad project.	2018-08-14 16:23:58 -04:00
Joey Hess	ae11394efa	added annex.commitmessage Added annex.commitmessage config that can specify a commit message for the git-annex branch instead of the usual "update". This commit was supported by the NSF-funded DataLad project.	2018-08-02 14:06:06 -04:00
Joey Hess	2c62f8e63d	remove empty tmp workdir on failure No point in keeping an empty tmp workdir around. The associated tmp object file is retained even if empty, didn't want to deal with any possible races with something else downloading to that file at the same time this would check if it's empty. Anyhow, temp object files are normally retained, and this will get cleaned out the same way those do, by dropunused.	2018-06-28 12:58:11 -04:00
Joey Hess	eb8a8976a9	comment	2018-06-21 20:54:02 -04:00
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	28720c795f	limit url downloads to whitelisted schemes Security fix! Allowing any schemes, particularly file: and possibly others like scp: allowed file exfiltration by anyone who had write access to the git repository, since they could add an annexed file using such an url, or using an url that redirected to such an url, and wait for the victim to get it into their repository and send them a copy. * Added annex.security.allowed-url-schemes setting, which defaults to only allowing http and https URLs. Note especially that file:/ is no longer enabled by default. * Removed annex.web-download-command, since its interface does not allow supporting annex.security.allowed-url-schemes across redirects. If you used this setting, you may want to instead use annex.web-options to pass options to curl. With annex.web-download-command removed, nearly all url accesses in git-annex are made via Utility.Url via http-client or curl. http-client only supports http and https, so no problem there. (Disabling one and not the other is not implemented.) Used curl --proto to limit the allowed url schemes. Note that this will cause git annex fsck --from web to mark files using a disallowed url scheme as not being present in the web. That seems acceptable; fsck --from web also does that when a web server is not available. youtube-dl already disabled file: itself (probably for similar reasons). The scheme check was also added to youtube-dl urls for completeness, although that check won't catch any redirects it might follow. But youtube-dl goes off and does its own thing with other protocols anyway, so that's fine. Special remotes that support other domain-specific url schemes are not affected by this change. In the bittorrent remote, aria2c can still download magnet: links. The download of the .torrent file is otherwise now limited by annex.security.allowed-url-schemes. This does not address any external special remotes that might download an url themselves. Current thinking is all external special remotes will need to be audited for this problem, although many of them will use http libraries that only support http and not curl's menagarie. The related problem of accessing private localhost and LAN urls is not addressed by this commit. This commit was sponsored by Brett Eisenberg on Patreon.	2018-06-16 11:57:50 -04:00
Joey Hess	c34152777b	Use http-conduit for url downloads by default, annex.web-options enables curl * For url downloads, git-annex now defaults to using a http library, rather than wget or curl. But, if annex.web-options is set, it will use curl. To use the .netrc file, run: git config annex.web-options --netrc * git-annex no longer uses wget (and wget is no longer shipped with git-annex builds). Note that curl is always run in silent mode, since the new API for download has a MeterUpdate and doesn't make way for curl progress output. It might be worth writing a parser for curl's progress output to update the meter when using it, but I didn't bother with this edge case for now. This commit was supported by the NSF-funded DataLad project.	2018-04-06 17:36:20 -04:00
Joey Hess	9b98d3f630	better HTTP connection reuse Enable HTTP connection reuse across multiple files, when git-annex uses http-conduit. Before, a new Manager was created each time Utility.Url used it. Now, a single Manager gets created the first time, so connections are reused. Doesn't help when external programs are used for url download, but does speed up addurl --fast, fsck --from web, etc. Testing fsck --fast --from web with 3 files, over high-latency satellite internet, it sped up from 19.37s to 14.96s. This commit was supported by the NSF-funded DataLad project.	2018-04-04 15:39:40 -04:00
Joey Hess	4015c5679a	force verification when resuming download When resuming a download and not using a rolling checksummer like rsync, the partial file we start with might contain garbage, in the case where a file changed as it was being downloaded. So, disabling verification on resumes risked a bad object being put into the annex. Even downloads with rsync are currently affected. It didn't seem worth the added complexity to special case those to prevent verification, especially since git-annex is using rsync less often now. This commit was sponsored by Brock Spratlen on Patreon.	2018-03-13 14:50:49 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	f4103744c3	make sure that lockContentShared is always paired with an inAnnex check lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think. Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repsitory. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.) And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy. So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-07 14:23:52 -04:00
Joey Hess	639a6df58a	fix windows build	2017-12-05 13:11:03 -04:00
Joey Hess	1228fe8c86	honor annex.diskreserve when running youtube-dl This commit was sponsored by André Pereira on Patreon.	2017-11-30 16:14:36 -04:00
Joey Hess	2528e3ddb0	rethought --relaxed change Better to make it not be surprising and slow, than surprising and fast. --raw can be used when it needs to be really fast. Implemented adding a youtube-dl supported url to an existing file. This commit was sponsored by andrea rota.	2017-11-30 14:13:20 -04:00
Joey Hess	99bebdface	youtube-dl working Including resuming and cleanup of incomplete downloads. Still todo: --fast, --relaxed, importfeed, disk reserve checking, quvi code cleanup. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-11-29 16:40:32 -04:00
Joey Hess	4e7e1fcff4	add gitAnnexTmpWorkDir and withTmpWorkDir Needed to run youtube-dl in, but could also be useful for other stuff. The tricky part of this was making the workdir be cleaned up whenever the tmp object file is cleaned up. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-11-29 13:53:39 -04:00
Joey Hess	187b3e7780	enable LambdaCase and convert around 10% of places that could use it Needs ghc 7.6.1, so minimum base version increased slightly. All builds are well above this version of ghc, and debian oldstable is as well. Code that could use lambdacase can be found by running: git grep -B 1 'case ' \| less and searching in less for "<-" This commit was sponsored by andrea rota.	2017-11-15 16:59:32 -04:00
Joey Hess	8dd84b87f9	use unix-compat 0.5 on windows Re-applying `3ec579f5e1`	2017-11-14 14:00:24 -04:00
Joey Hess	5f55082d10	Revert "use unix-compat 0.5 on windows" This reverts commit `3ec579f5e1`. Too early for this; needs newer Win32 version. Le sigh.	2017-11-09 15:14:00 -04:00
Joey Hess	3ec579f5e1	use unix-compat 0.5 on windows That version has my patches for the problems that Utility.PosixFiles was working around, so am able to get rid of that module now. This will later allow bringing back the custom-setup stanza in the cabal file. It will need to depend on unix-compat 0.5 on all OS's, which I'm not ready to do yet. This commit was sponsored by Nick Daly on Patreon.	2017-11-09 12:47:05 -04:00
Joey Hess	16eb2f976c	prevent exporttree=yes on remotes that don't support exports Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.	2017-09-07 13:48:44 -04:00
Joey Hess	662f2a5ee7	git annex get from exports Straightforward enough, except for the needed belt-and-suspenders sanity checks to avoid foot shooting due to exports not being key/value stores. * Even when annex.verify=false, always verify from exports. * Only get files from exports that use a backend that supports checksum verification. * Never trust exports, even if the user says to, because then `git annex drop` would drop content if the export seemed to contain a copy. This commit was supported by the NSF-funded DataLad project.	2017-09-04 16:39:56 -04:00
Joey Hess	07f1e638ee	annex.securehashesonly Cryptographically secure hashes can be forced to be used in a repository, by setting annex.securehashesonly. This does not prevent the git repository from containing files with insecure hashes, but it does prevent the content of such files from being pulled into .git/annex/objects from another repository. We want to make sure that at no point does git-annex accept content into .git/annex/objects that is hashed with an insecure key. Here's how it was done: * .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be written to it normally * So every place that writes content must call, thawContent or modifyContent. We can audit for these, and be sure we've considered all cases. * The main functions are moveAnnex, and linkToAnnex; these were made to check annex.securehashesonly, and are the main security boundary for annex.securehashesonly. * Most other calls to modifyContent deal with other files in the KEY directory (inode cache etc). The other ones that mess with the content are: - Annex.Direct.toDirectGen, in which content already in the annex directory is moved to the direct mode file, so not relevant. - fix and lock, which don't add new content - Command.ReKey.linkKey, which manually unlocks it to make a copy. * All other calls to thawContent appear safe. Made moveAnnex return a Bool, so checked all callsites and made them deal with a failure in appropriate ways. linkToAnnex simply returns LinkAnnexFailed; all callsites already deal with it failing in appropriate ways. This commit was sponsored by Riku Voipio.	2017-02-27 13:33:59 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	d4fbc3b460	make --json-progress work for url downloads	2016-09-09 16:15:39 -04:00
Joey Hess	1a0e2c9901	get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/	2016-08-03 12:37:12 -04:00
Joey Hess	9d952fe9d1	reinject: When src file's content cannot be verified, leave it alone, instead of deleting it.	2016-04-20 13:21:56 -04:00
Joey Hess	b7c8bf5274	Preserve execute bits of unlocked files in v6 mode. When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.	2016-04-14 14:47:08 -04:00
Joey Hess	cf06dac2b8	hard links on windows * annex.thin and annex.hardlink are now supported on Windows. * unannex --fast now makes hard links on Windows.	2016-04-08 15:25:32 -04:00
Joey Hess	4b3355cf3c	refactor	2016-03-09 13:43:22 -04:00
Joey Hess	9039bdb4ea	Always try to thaw content, even when annex.crippledfilesystem is set.	2016-03-09 13:33:13 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	3b960d1422	migrate and rekey v6 unlocked file support	2016-01-07 15:14:15 -04:00
Joey Hess	b3d60ca285	use TopFilePath for associated files Fixes several bugs with updates of pointer files. When eg, running git annex drop --from localremote it was updating the pointer file in the local repository, not the remote. Also, fixes drop ../foo when run in a subdir, and probably lots of other problems. Test suite drops from ~30 to 11 failures now. TopFilePath is used to force thinking about what the filepath is relative to. The data stored in the sqlite db is still just a plain string, and TopFilePath is a newtype, so there's no overhead involved in using it in DataBase.Keys.	2016-01-05 17:22:19 -04:00
Joey Hess	a2c056df65	convert isPointerFile from Annex to IO	2016-01-01 13:22:38 -04:00
Joey Hess	2e9341a47d	fix inode cache consistency bug when a merge unlocks a present file Since the file was present and locked, its annex object was not in the inode cache. So, despite not needing to update the annex object when the clean filter is run on the content by git merge, it does need to record the inode cache of the annex object. Otherwise, the annex object will be assumed to be bad, since its inode is not cached.	2015-12-29 16:26:27 -04:00
Joey Hess	645833774d	fix windows build	2015-12-28 12:44:04 -04:00
Joey Hess	121f5d5b0c	annex.thin Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc. * Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version. * Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff. * fix: Adjusts unlocked files as configured by annex.thin.	2015-12-27 15:59:59 -04:00
Joey Hess	cfaac52b88	populate unlocked files with newly available content when ingesting This can happen when ingesting a new file in either locked or unlocked mode, when some unlocked files in the repo use the same key, and the content was not locally available before.	2015-12-22 16:22:28 -04:00
Joey Hess	4392140946	make linkAnnex detect when the file changes as it's being copied/linked in This fixes a race where the modified file ended up in annex/objects, and the InodeCache stored in the database was for the modified version, so git-annex didn't know it had gotten modified. The race could occur when the smudge filter was running; now it gets the InodeCache before generating the Key, which avoids the race.	2015-12-22 15:20:03 -04:00
Joey Hess	f9d077186a	implemented upgrade of direct mode repo to v6	2015-12-15 16:00:26 -04:00
Joey Hess	2bc920e266	update inode cache to cover file even when nothing needs to be done to linkAnnex This covers the case where multiple files have the same content and are added with git add. Previously only the one that was linked to the annex got its inode cached; now both are.	2015-12-15 13:02:33 -04:00
Joey Hess	1dad3af3fc	checked getKeysPresent; it's ok for v6 unlocked files When a v6 unlocked files is removed from the work tree, unused doesn't show it. When it gets removed from the index, unused does show it. This is the same as a locked file.	2015-12-11 16:12:42 -04:00
Joey Hess	7790e059b2	finish v6 git-annex lock This was a doozy!	2015-12-11 15:28:34 -04:00
Joey Hess	50e83b606c	only make 1 hardlink max between pointer file and annex object If multiple files point to the same annex object, the user may want to modify them independently, so don't use a hard link. Also, check diskreserve when copying.	2015-12-11 14:00:21 -04:00
Joey Hess	c608a752a5	Merge branch 'master' into smudge	2015-12-11 13:50:31 -04:00
Joey Hess	abd66c7089	fsck: Failed to honor annex.diskreserve when checking a remote.	2015-12-11 13:50:27 -04:00
Joey Hess	c910b4e255	wip	2015-12-11 10:42:18 -04:00
Joey Hess	9dffd3d255	add generalized linkAnnex'	2015-12-10 16:08:19 -04:00
Joey Hess	f80a3d8cd0	check InodeCache in inAnnex et al This avoids querying the database when the content file doen't exist (or otherwise fails the provided check). However, it does add overhead of querying the database, and will certianly impact performance.	2015-12-10 14:51:04 -04:00
Joey Hess	2b8f6b8b2f	check inode cache in prepSendAnnex This does mean one query of the database every time an object is sent. May impact performance.	2015-12-10 14:50:52 -04:00
Joey Hess	3719d1b390	make clear when code is using deprecated direct mode files	2015-12-09 19:43:15 -04:00
Joey Hess	aa88851ec1	reorder	2015-12-09 19:38:37 -04:00

1 2 3 4 5 ...

335 commits