git-annex

Author	SHA1	Message	Date
Joey Hess	e7652b0997	implement URL to VURL migration This needs the content to be present in order to hash it. But it's not possible for a module used by Backend.URL to call inAnnex because that would entail a dependency loop. So instead, rely on the fact that Command.Migrate calls inAnnex before performing a migration. But, Command.ExamineKey calls fastMigrate and the key may or may not exist, and it's not wanting to actually perform a migration in any case. To handle that, had to add an additional value to fastMigrate to indicate whether the content is inAnnex. Factored generateEquivilantKey out of Remote.Web. Note that migrateFromURLToVURL hardcodes use of the SHA256E backend. It would have been difficult not to, given all the dependency loop issues. But --backend and annex.backend are used to tell git-annex migrate to use VURL in any case, so there's no config knob that the user could expect to configure that. Sponsored-by: Brock Spratlen on Patreon	2024-03-01 16:42:02 -04:00
Joey Hess	cc17ac423b	implement isCryptographicallySecureKey for VURL Considerable difficulty to work around an import cycle. Had to move the list of backends (except for VURL) to Backend.Variety to VURL could use it. Sponsored-by: Kevin Mueller on Patreon	2024-02-29 17:26:35 -04:00
Joey Hess	e7b7ea78af	lift isCryptographicallySecure to Annex Needed for VURL backend. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:14:13 -04:00
Joey Hess	55bf01b788	add equivilant key log for VURL keys When downloading a VURL from the web, make sure that the equivilant key log is populated. Unfortunately, this does not hash the content while it's being downloaded from the web. There is not an interface in Backend currently for incrementally hash generation, only for incremental verification of an existing hash. So this might add a noticiable delay, and it has to show a "(checksum...") message. This could stand to be improved. But, that separate hashing step only has to happen on the first download of new content from the web. Once the hash is known, the VURL key can have its hash verified incrementally while downloading except when the content in the web has changed. (Doesn't happen yet because verifyKeyContentIncrementally is not implemented yet for VURL keys.) Note that the equivilant key log file is formatted as a presence log. This adds a tiny bit of overhead (eg "1 ") per line over just listing the urls. The reason I chose to use that format is it seems possible that there will need to be a way to remove an equivilant key at some point in the future. I don't know why that would be necessary, but it seemed wise to allow for the possibility. Downloads of VURL keys from other special remotes that claim urls, like bittorrent for example, does not popilate the equivilant key log. So for now, no checksum verification will be done for those. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-02-29 16:01:49 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	24ae4b291c	addurl, importfeed: Fix failure when annex.securehashesonly is set The temporary URL key used for the download, before the real key is generated, was blocked by annex.securehashesonly. Fixed by passing the Backend that will be used for the final key into runTransfer. When a Backend is provided, have preCheckSecureHashes check that, rather than the key being transferred. Sponsored-by: unqueued on Patreon	2023-03-27 15:10:46 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	65cf6fb7fc	remove unused imports	2021-10-05 21:27:06 -04:00
Joey Hess	f2846d803c	remove old and uncessary sanity check All backends use genKeyName, which uses preSanitizeKeyName, which will escape newlines. I suspect this newline prevention was added back when git-annex was reading the output of sha1sum.. Sponsored-by: Kevin Mueller on Patreon	2021-10-05 20:25:51 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a conseqence.	2021-07-27 14:07:23 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached seralization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	fc845e6530	more lambda-case conversion	2017-12-05 15:00:50 -04:00
Joey Hess	4c1e3210fa	annex.backend is the new name for what was annex.backends It takes a single key-value backend, rather than the unncessary and confusing list. The old option still works if set. Simplified some old old code too. This commit was sponsored by Thomas Hochstein on Patreon.	2017-05-09 15:04:07 -04:00
Joey Hess	de43d1d407	convert error to giveup	2017-03-01 12:35:16 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	cdd27b8920	reorg	2015-12-15 15:34:28 -04:00
Joey Hess	78a6b8ce05	refactor and improve pointer file handling code	2015-12-09 14:27:43 -04:00
Joey Hess	664cc987e8	support pointer files Backend.lookupFile is changed to always fall back to catKey when operating on a file that's not a symlink. catKey is changed to understand pointer files, as well as annex symlinks. Before, catKey needed a file mode witness, to be sure it was looking at a symlink. That was complicated stuff. Now, it doesn't actually care if a file in git is a symlink or not; in either case asking git for the content of the file will get the pointer to the key. This does mean that git-annex will treat a link foo -> WORM--bar as a git-annex file, and also treats a regular file containing annex/objects/WORM--bar as a git-annex file. Calling catKey could make git-annex commands need to do more work than before. This would especially be the case if a repo contained many regular files, and only a few annexed files, as now git-annex will need to ask git about the contents of the regular files.	2015-12-07 15:35:36 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	89416ba2d9	only chunk stable keys The content of unstable keys can potentially be different in different repos, so eg, resuming a chunked upload started by another repo would corrupt data.	2014-07-30 10:34:39 -04:00
Joey Hess	13bbb61a51	add key stability checking interface Needed for resuming from chunks. Url keys are considered not stable. I considered treating url keys with a known size as stable, but just don't feel that is enough information.	2014-07-27 12:33:46 -04:00
Joey Hess	aad8cfe718	use map for faster backend name lookup	2014-07-27 12:24:12 -04:00
Joey Hess	e880d0d22c	replace (Key, Backend) with Key Only fsck and reinject and the test suite used the Backend, and they can look it up as needed from the Key. This simplifies the code and also speeds it up. There is a small behavior change here. Before, all commands would warn when acting on an annexed file with an unknown backend. Now, only fsck and reinject show that warning.	2014-04-17 18:03:39 -04:00
Joey Hess	a05b763b01	Added SKEIN256 and SKEIN512 backends SHA3 is still waiting for final standardization. Although this is looking less likely given https://www.cdt.org/blogs/joseph-lorenzo-hall/2409-nist-sha-3 In the meantime, cryptohash implements skein, and it's used by some of the haskell ecosystem (for yesod sessions, IIRC), so this implementation is likely to continue working. Also, I've talked with the cryprohash author and he's a reasonable guy. It makes sense to have an alternate high security hash, in case some horrible attack is found against SHA2 tomorrow, or in case SHA3 comes out and worst fears are realized. I'd also like to support using skein for HMAC. But no hurry there and a new version of cryptohash has much nicer HMAC code, so I will probably wait until I can use that version.	2013-10-01 20:34:36 -04:00
Joey Hess	8a5b397ac4	hlint	2013-04-03 03:52:41 -04:00
Joey Hess	38d61f934d	Update working tree files fully atomically This avoids commit churn by the assistant when eg, replacing a file with a symlink. But, just as importantly, it prevents the working tree being left with a deleted file if git-annex, or perhaps the whole system, crashes at the wrong time. (It also probably avoids confusing displays in file managers.)	2013-04-02 15:02:00 -04:00
Joey Hess	d7c93b8913	fully support core.symlinks=false in all relevant symlink handling code Refactored annex link code into nice clean new library. Audited and dealt with calls to createSymbolicLink. Remaining calls are all safe, because: Annex/Link.hs: ( liftIO $ createSymbolicLink linktarget file only when core.symlinks=true Assistant/WebApp/Configurators/Local.hs: createSymbolicLink link link test if symlinks can be made Command/Fix.hs: liftIO $ createSymbolicLink link file command only works in indirect mode Command/FromKey.hs: liftIO $ createSymbolicLink link file command only works in indirect mode Command/Indirect.hs: liftIO $ createSymbolicLink l f refuses to run if core.symlinks=false Init.hs: createSymbolicLink f f2 test if symlinks can be made Remote/Directory.hs: go [file] = catchBoolIO $ createSymbolicLink file f >> return True fast key linking; catches failure to make symlink and falls back to copy Remote/Git.hs: liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True ditto Upgrade/V1.hs: liftIO $ createSymbolicLink link f v1 repos could not be on a filesystem w/o symlinks Audited and dealt with calls to readSymbolicLink. Remaining calls are all safe, because: Annex/Link.hs: ( liftIO $ catchMaybeIO $ readSymbolicLink file only when core.symlinks=true Assistant/Threads/Watcher.hs: ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) code that fixes real symlinks when inotify sees them It's ok to not fix psdueo-symlinks. Assistant/Threads/Watcher.hs: mlink <- liftIO (catchMaybeIO $ readSymbolicLink file) ditto Command/Fix.hs: stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do command only works in indirect mode Upgrade/V1.hs: getsymlink = takeFileName <$> readSymbolicLink file v1 repos could not be on a filesystem w/o symlinks Audited and dealt with calls to isSymbolicLink. (Typically used with getSymbolicLinkStatus, but that is just used because getFileStatus is not as robust; it also works on pseudolinks.) Remaining calls are all safe, because: Assistant/Threads/SanityChecker.hs: \| isSymbolicLink s -> addsymlink file ms only handles staging of symlinks that were somehow not staged (might need to be updated to support pseudolinks, but this is only a belt-and-suspenders check anyway, and I've never seen the code run) Command/Add.hs: if isSymbolicLink s \|\| not (isRegularFile s) avoids adding symlinks to the annex, so not relevant Command/Indirect.hs: \| isSymbolicLink s -> void $ flip whenAnnexed f $ only allowed on systems that support symlinks Command/Indirect.hs: whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do ditto Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f used to find unlocked files, only relevant in indirect mode Utility/FSEvents.hs: \| Files.isSymbolicLink s = runhook addSymlinkHook $ Just s Utility/FSEvents.hs: \| Files.isSymbolicLink s -> Utility/INotify.hs: \| Files.isSymbolicLink s -> Utility/INotify.hs: checkfiletype Files.isSymbolicLink addSymlinkHook f Utility/Kqueue.hs: \| Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change all above are lower-level, not relevant Audited and dealt with calls to isSymLink. Remaining calls are all safe, because: Annex/Direct.hs: \| isSymLink (getmode item) = This is looking at git diff-tree objects, not files on disk Command/Unused.hs: \| isSymLink (LsTree.mode l) = do This is looking at git ls-tree, not file on disk Utility/FileMode.hs:isSymLink :: FileMode -> Bool Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode low-level Done!!	2013-02-17 16:43:14 -04:00
Joey Hess	5ea4b91fb4	start to support core.symlinks=false Utility functions to handle no symlink mode, and converted Annex.Content to use them; still many other places to convert.	2013-02-15 16:03:11 -04:00
Joey Hess	1cdf2b923d	assistant: Make expensive transfer scan work fully in direct mode. The expensive scan uses lookupFile, but in direct mode, that doesn't work for files that are present. So the scan was not finding things that are present that need to be uploaded. (It did find things not present that needed to be downloaded.) Now lookupFile also works in direct mode. Note that it still prefers symlinks on disk to info committed to git, in direct mode. This is necessary to make things like Assistant.Threads.Watcher.onAddSymlink work correctly, when given a new symlink not yet checked into git (or replacing a file checked into git).	2013-01-05 15:57:53 -04:00
Joey Hess	4008590c68	type based git config handling for remotes Still a couple of places that use git config ad-hoc, but this is most of it done.	2013-01-01 13:58:14 -04:00
Joey Hess	7f7c31df1c	type based git config handling Now there's a Config type, that's extracted from the git config at startup. Note that laziness means that individual config values are only looked up and parsed on demand, and so we get implicit memoization for all of them. So this is not only prettier and more type safe, it optimises several places that didn't have explicit memoization before. As well as getting rid of the ugly explicit memoization code. Not yet done for annex.<remote>.* configuration settings.	2012-12-29 23:10:18 -04:00
Joey Hess	e7b8cb0063	direct mode committing	2012-12-12 19:20:38 -04:00
Joey Hess	ec0bac9d73	more indentation. must stop.	2012-10-28 22:09:09 -04:00
Joey Hess	e0fdfb2e70	maintain set of files pendingAdd Kqueue needs to remember which files failed to be added due to being open, and retry them. This commit gets the data in place for such a retry thread. Broke KeySource out into its own file, and added Eq and Ord instances so it can be stored in a Set.	2012-06-20 16:31:46 -04:00
Joey Hess	d3cee987ca	separate source of content from the filename associated with the key when generating a key This already made migrate's code a lot simpler.	2012-06-05 19:51:03 -04:00
Joey Hess	6fd83851c1	Fix display of warning message when encountering a file that uses an unsupported backend.	2012-05-31 21:03:24 -04:00

1 2 3 4

155 commits