git-annex

Author	SHA1	Message	Date
Joey Hess	0b307f43e1	avoid accidental Show of VectorClock Removed its Show instance.	2017-08-14 14:51:54 -04:00
Joey Hess	2cecc8d2a3	Added GIT_ANNEX_VECTOR_CLOCK environment variable Can be used to override the default timestamps used in log files in the git-annex branch. This is a dangerous environment variable; use with caution. Note that this only affects writing to the logs on the git-annex branch. It is not used for metadata in git commits (other env vars can be set for that). There are many other places where timestamps are still used, that don't get committed to git, but do touch disk. Including regular timestamps of files, and timestamps embedded in some files in .git/annex/, including the last fsck timestamp and timestamps in transfer log files. A good way to find such things in git-annex is to grep for getPOSIXTime and getCurrentTime, although some of the results are of course false positives that never hit disk (unless git-annex gets swapped out..) So this commit does NOT necessarily make git-annex comply with some HIPAA privacy regulations; it's up to the user to determine if they can use it in a way compliant with such regulations. Benchmarking: It takes 0.00114 milliseconds to call getEnv "GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log files can be written with an added overhead of only 0.114 seconds. That should be by far swamped by the actual overhead of writing the log files and making the commit containing them. This commit was supported by the NSF-funded DataLad project.	2017-08-14 14:19:58 -04:00
Joey Hess	bcf276655c	Keys marked as dead are now skipped by --all. fsck already special-cased dead keys to make --all not report errors with them, and it makes sense to also expand that to whereis. I think it makes sense for dead keys to be skipped by all uses of --all, so mistakes can be completely forgotten about and not come back to haunt us. The speed impact of testing if the key is dead is negligible for fsck and whereis, since they use the location log anyway and it gets cached. This does slow down a few commands that support --all, in particular metadata --all runs around 2x as slow. I don't think metadata --all is often used though. It might slow down copy/move/mirror --all and get --all. log --all is not affected (does not use the normal --all machinery). Dead keys will still be processed by --incomplete, --branch, --failed, and --key. Although it would be unlikely for a dead key to have an incomplete or failed transfer. It seems to make perfect sense for --branch to process keys on the branch, even if dead. (fsck's special-casing of dead keys was left in, so if one of these options causes a dead key to be fscked, there will be a nice message.) This commit was supported by the NSF-funded DataLad project.	2017-05-09 12:55:21 -04:00
Joey Hess	c3970f6c1a	multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting. This commit was supported by the NSF-funded DataLad project.	2017-03-30 19:35:30 -04:00
Joey Hess	1c4e5f65fc	Drop support for building with old versions of directory, feed, and http-types.	2017-03-10 15:57:41 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	27eca014be	fix up Read instance incompatibility caused by recent commit `9c4650358c` changed the Read instance for Key. I've checked all uses of that instance (by removing it and seeing what breaks), and they're all limited to the webapp, except one. That is GitAnnexDistribution's Read instance. So, `9c4650358c` would have broken upgrades of git-annex from downloads.kitenet.net. Once the .info files there got updated for a new release, old releases would have failed to parse them and never upgraded. To fix this, I found a way to make the .info files that contain GitAnnexDistribution values be readable by the old version of git-annex. This commit was sponsored by Ewen McNeill.	2017-02-24 18:59:12 -04:00
Joey Hess	ed56dba868	annex.autocommit can be configured via git-annex config ... to control the default behavior in all clones of a repository. This includes a new Configurable data type, so the GitConfig type indicates which values can be configured this way. The implementation should be quite efficient; the config log is only read once, and only when a Configurable value has not already been set by git-config. Indeed, it would be nice in the future to extend this, so that git-config is itself only read on demand. Some commands may not need to look at the git configuration at all. This commit was sponsored by Trenton Cronholm on Patreon.	2017-02-03 13:58:53 -04:00
Joey Hess	9eb10caa27	Some optimisations to string splitting code. Turns out that Data.List.Utils.split is slow and makes a lot of allocations. Here's a much simpler single character splitter that behaves the same (even in wacky corner cases) while running in half the time and 75% the allocations. As well as being an optimisation, this helps move toward eliminating use of missingh. (Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and allocates even more.) I have not benchmarked the effect on git-annex, but would not be surprised to see some parsing of eg, large streams from git commands run twice as fast, and possibly in less memory. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2017-01-31 19:06:22 -04:00
Joey Hess	5676d267b5	forgot to add this new source file	2017-01-30 17:36:45 -04:00
Joey Hess	26d23e38f1	vicfg: Include the numcopies configuration. Docs say vicfg can configure everything from git-annex branch, so it ought to configure numcopies. Note that commenting out existing numcopies does not unset it. This commit was sponsored by Thom May on Patreon.	2017-01-30 15:27:25 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for an error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	7cae6c746c	Optimised git-annex branch log file timestamp parsing. 10% speedup This sped up git annex find --not --in web from 6.64s to 5.69s. The optimised parser is probably more like 50% faster than the general one it replaced.	2016-09-29 14:04:53 -04:00
Joey Hess	c9082cf0e4	move Arbitrary instance to new Types.Transfer module Avoid orphan instance warning	2016-09-05 14:52:06 -04:00
Joey Hess	d17f08afdc	avoid warning about orphan Arbitrary instance	2016-09-05 14:51:07 -04:00
Joey Hess	1a0e2c9901	get, move, copy, mirror: Added --failed switch which retries failed copies/moves Note that get --from foo --failed will get things that a previous get --from bar tried and failed to get, etc. I considered making --failed only retry transfers from the same remote, but it was easier, and seems more useful, to not have the same remote requirement. Noisy due to some refactoring into Types/	2016-08-03 12:37:12 -04:00
Joey Hess	c4d011bf3e	log: Added --all option.	2016-07-17 15:15:08 -04:00
Joey Hess	176cd98293	remove \r from Arbitrary for log tests	2016-05-27 12:04:49 -04:00
Joey Hess	eba68572dc	Split lines in the git-annex branch on \r as well as \n, to deal with \r\n terminated lines written by some versions of git-annex on Windows. This fixes strange displays in some cases, including whereis showing many duplicate locations, and showing more total copies than actually exist. It's unknown if that led to data loss when eg, dropping. At the moment, it seems unlikely it could, since the UUID with \r's appended is not the same as a UUID without, and so no remote matches it. It's also unknown if \r's can leak in on windows, perhaps when merging the git-annex branch.	2016-05-27 11:45:13 -04:00
Joey Hess	823c28d2dc	nub transitionList to avoid ugly message after repeated transitions, and avoid redundant work for repeated ForgetDeadRemotes transitions	2016-05-18 12:26:38 -04:00
Joey Hess	8ab27235ea	reinject: Added new mode which can reinject known files into the annex. For example: git-annex reinject --known /mnt/backup/*	2016-04-22 13:49:32 -04:00
Joey Hess	403b56fb91	Limit annex.largefiles parsing to the subset of preferred content expressions that make sense in its context. So, not "standard" or "lackingcopies", etc.	2016-02-03 15:04:42 -04:00
Joey Hess	cdf5977053	simplify	2016-02-03 13:23:34 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	1f3358512a	refactor	2016-01-19 15:55:32 -04:00
Joey Hess	983c1894eb	avoid unnecessary reading of git-annex branch data when matching on annex.largefiles This makes git annex clean not look at the git-annex branch at all, and so speeds it up by 50% or more.	2015-12-04 15:06:41 -04:00
Joey Hess	b0626230b7	fix use of hifalutin terminology	2015-11-16 14:37:31 -04:00
Joey Hess	aaf1ef268d	convert from Utility.LockPool to Annex.LockPool everywhere	2015-11-12 18:13:37 -04:00
Joey Hess	f9adb905fc	Avoid unnecessary write to the location log when a file is unlocked and then added back with unchanged content. Implemented with no additional overhead of compares etc. This is safe to do for presence logs because of their locality of change; a given repo's presence logs are only ever changed in that repo, or in a repo that has just been actively changing the content of that repo. So, we don't need to worry about a split-brain situation where there'd be disagreement about the location of a key in a repo. And so, it's ok to not update the timestamp when that's the only change that would be made due to logging presence info.	2015-10-12 14:46:47 -04:00
Joey Hess	6fbabfcf16	oops, didn't mean to commit this debug	2015-10-06 17:28:20 -04:00
Joey Hess	ba7ecf68c0	analysis	2015-10-06 17:11:52 -04:00
Joey Hess	16947ef654	Fix bug in combination of preferred and required content settings. When one was set to the empty string and the other set to some expression, this bug caused all files to be wanted, instead of only files matching the expression. Avoid: MAny `MOr` otherexpression Which matches anything.	2015-09-15 12:50:14 -04:00
Joey Hess	6e829939e9	add test case that all standard group preferred content expressions parse	2015-06-17 13:44:19 -04:00
Joey Hess	5c960601aa	4 ns optimisation of repeated calls to hasDifference on the same Differences I want this as fast as possible, so it can be added to code paths without slowing them down. Avoid the set lookup, and rely on laziness, drops runtime from 14.37 ns to 11.03 ns according to this criterion benchmark: import Criterion.Main import qualified Types.Difference as New import qualified Types.DifferenceOld as Old main :: IO () main = defaultMain [ bgroup "hasDifference" [ bench "new" $ whnf (New.hasDifference New.OneLevelObjectHash) new , bench "old" $ whnf (Old.hasDifference Old.OneLevelObjectHash) old ] ] where s = "fromList [ObjectHashLower, OneLevelObjectHash, OneLevelBranchHash]" new = New.readDifferences s old = Old.readDifferences s A little bit of added boilerplate, but I suppose it's worth it to not need to worry about set lookup overhead. Note that adding more differences would slow down the old implementation; the new implementation will run the same speed.	2015-06-11 16:34:35 -04:00
Joey Hess	f8ab3bc449	dead --key: Can be used to mark a key as dead.	2015-06-09 14:52:05 -04:00
Joey Hess	6eefc5db65	fsck: Ignore keys that are known to be dead when running in --all mode or in a bare repo. Otherwise, still reports files with lost contents, even if the content is dead.	2015-06-09 14:08:57 -04:00
Joey Hess	53ede1a10e	parse X in location log file as indicating a dead key A dead key is both not present at the location that thinks it has a copy, and also is assumed to probably not be present anywhere else. Although there may be lurking disconnected repos that somehow still have a copy. Surprisingly few changes needed for this! This is because the presence log code only really concerns itself with keys that are present, and dead keys are not present. Note that both the location and web log can be parsed as having a dead key. I don't see any value to having keys listed as dead in the web log, but since it doesn't change any behavior, there was no point in not parsing it.	2015-06-09 13:28:30 -04:00
Joey Hess	6383d22ffa	remove back-compat code for old version of containers Already b-d on a newer version.	2015-06-06 15:23:53 -04:00
Joey Hess	87f28bb2ea	ignore failure to clean up stale transfer lock file Perhaps due to permissions problem, or perhaps a race with another process also cleaning up.	2015-05-19 23:46:42 -04:00
Joey Hess	9de5cd2966	fix crash in stale transfer lockfile cleanup code Need to differentiate between the lockfile not being locked, and it not existing.	2015-05-19 23:35:24 -04:00
Joey Hess	ecb0d5c087	use lock pools throughout git-annex The one exception is in Utility.Daemon. As long as a process only daemonizes once, which seems reasonable, and as long as it avoids calling checkDaemon once it's already running as a daemon, the fcntl locking gotchas won't be a problem there. Annex.LockFile has its own separate lock pool layer, which has been renamed to LockCache. This is a persistent cache of locks that persist until closed. This is not quite done; lockContent still needs to be converted.	2015-05-19 14:09:52 -04:00
Joey Hess	6915b71c57	lock pools to work around non-concurrency/composition safety of POSIX fcntl	2015-05-18 15:57:17 -04:00
Joey Hess	7ebf234616	Stale transfer lock and info files will be cleaned up automatically when get/unused/info commands are run. Deleting lock files is tricky, tricky stuff. I think I got it right!	2015-05-12 20:11:23 -04:00
Joey Hess	643b233860	an optimization that also fixes a reversion This is a little optimisation; avoid loading the info file for the download of the current key when checking for other downloads. The reversion it fixes is sorta strange. `a812d598ef` broke checking for transfers that were already in progress. Indeed, the transfer lock was not held after getTransfers was called. Why? I think it's magic in ghc's handling of getLock and setLock, although it's hard to tell since those functions are almost entirely undocumented as to their semantics. Something, either the RTS (or maybe it's linux?) notices that the same process has taken a lock and is now calling getLock on a FD attached to the same file. So, it drops the lock. So, this optimisation avoids that problematic behavior.	2015-05-12 18:34:49 -04:00
Joey Hess	a812d598ef	Take space that will be used by running downloads into account when checking annex.diskreserve.	2015-05-12 15:20:22 -04:00
Joey Hess	03667a162a	couple of AMP warnings I missed before	2015-05-10 16:51:03 -04:00
Joey Hess	ec267aa1ea	rejigger imports for clean build with ghc 7.10's AMP changes The explicit import Prelude after import Control.Applicative is a trick to avoid a warning.	2015-05-10 16:20:30 -04:00
Joey Hess	6c2d5b5e41	more time-1.5 fixes	2015-05-10 15:36:58 -04:00
Joey Hess	33a2264546	fix build warning with time 1.5	2015-05-10 15:28:23 -04:00
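
The line-splitting behaviour described in commit eba68572dc above (treat \r as well as \n as a line terminator, so \r\n terminated log lines do not leave a stray \r on values such as UUIDs) might be sketched roughly as follows. This is an illustration only, not git-annex's actual code: the name `splitLogLines` is hypothetical, and dropping empty lines is an assumption made here for simplicity.

```haskell
-- Hypothetical helper; not taken from the git-annex source.
-- Splits on either '\r' or '\n', so "\r\n" terminated lines
-- yield the same values as "\n" terminated ones.
splitLogLines :: String -> [String]
splitLogLines = filter (not . null) . go
  where
    go s = case break (`elem` "\r\n") s of
      (l, [])       -> [l]          -- no more separators
      (l, _ : rest) -> l : go rest  -- drop the separator, continue

-- Assumption: empty lines carry no information in a log file,
-- so the empty strings produced between '\r' and '\n' are discarded.
```

With this sketch, a value read from a \r\n terminated line compares equal to the same value read from a \n terminated line, which is the property the commit message is concerned with.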

318 commits