git-annex

Author	SHA1	Message	Date
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	ddf6973d22	minor optimisation avoid repeated scan of the same bytestring	2019-11-22 19:13:05 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster; whereis is 3% faster; get when all files are already present is 5% faster. Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	94c75d2bd9	init: Fix a reversion that broke initialization on systems that need to use pid locking This brings back .git/annex/misctmp, but only for init. If an init is interrupted while probing using that temp directory, the files it left will get deleted 1 week later by a subsequent git-annex run.	2019-09-10 13:37:07 -04:00
Joey Hess	97fd9da6e7	add back non-preferred files to imported tree Prevents merging the import from deleting the non-preferred files from the branch it's merged into. adjustTree previously appended the new list of items to the old, which could result in it generating a tree with multiple files with the same name. That is not good and confuses some parts of git. Gave it a function to resolve such conflicts. That allowed dealing with the problem of what happens when the import contains some files (or subtrees) with the same name as files that were filtered out of the export. The files from the import win.	2019-05-20 16:43:52 -04:00
Joey Hess	2d33122215	avoid ingest lockdown file escaping the withOtherTmp call Fixes bug that caused git-annex to fail to add a file when another git-annex process cleaned up the temp directory it was using. Solution is just to push withOtherTmp out to a higher level, so that the whole ingest process can be completed inside it. But in the assistant, that was not practical to do, since withOtherTmp runs in the Annex monad and the assistant does not. Worked around by introducing a separate temp directory that only the assistant uses for lockdown. Since only one assistant can run at a time, it's easy to clean up that directory of old cruft at startup.	2019-05-07 13:04:57 -04:00
Joey Hess	b03e65d260	Improved locking when multiple git-annex processes are writing to the .git/index file	2019-05-06 15:15:12 -04:00
Joey Hess	6babb2c73f	remove wrong uniqueness constraint from ContentIdentifier db Fix bug that caused importing from a special remote to repeatedly download unchanged files when multiple files in the remote have the same content. Unfortunately, there's really no good way to remove a uniqueness constraint from a sqlite database. The best that can be done is to make a new table and copy the data over. But that would require using persistent's migrations or raw sql, and I don't want to do either. Instead, a sledgehammer approach: Renamed .git/annex/cid to .git/annex/cids. When the new database doesn't exist, it will be populated from the git-annex branch. Nothing deletes the old database. Don't want to delete it out from under some long-running git-annex process that might be using it. It could eventually be deleted. But this is such a new feature, probably few repos have the database in any case.	2019-04-09 19:58:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e3a704224f	fix export db locking deadlock	2019-03-07 16:06:02 -04:00
Joey Hess	a818bc5e73	add Database.ContentIdentifier Does not yet have a way to update with new information from the git-annex branch, which will be needed when multiple repos are importing from the same remote.	2019-02-20 16:59:10 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached serialization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	2be6130053	better function name	2019-01-14 20:59:09 -04:00
Joey Hess	1b6319a2c8	double speed of keyFile Optimising for the common case of nothing needing to be escaped, from 5.434 μs to 1.727 μs. In the uncommon case, it only runs around 70 ns slower.	2019-01-14 20:52:54 -04:00
Joey Hess	d9a33d98cf	remove unused import	2019-01-14 18:29:10 -04:00
Joey Hess	d5bbf123fd	bugfix The first item in the list from split '&' did not start with a '&'	2019-01-14 17:42:18 -04:00
Joey Hess	e0c4ac99b5	convert serializeKey' to strict ByteString The builder produces a lazy ByteString, and L.toStrict has to copy it, but needing to use the builder is no longer the common case; the serialization will normally be cached already as a strict ByteString, and this avoids keyFile' needing to use L.toStrict . serializeKey'	2019-01-14 17:03:46 -04:00
Joey Hess	0a8d93cb8a	convert to ByteString	2019-01-14 14:02:47 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	917a2c6095	defer updating unlocked files until after smudge filter The smudge filter no longer provides git with annexed file content, to avoid a git memory leak, and because that did not honor annex.thin. git annex smudge --update has to be run after a checkout to update unlocked files in the working tree with annexed file contents. No hooks yet to run it. This commit was sponsored by Nick Piper on Patreon.	2018-10-25 15:08:20 -04:00
Joey Hess	18ecf41917	avoid running reconcileStaged when the index has not changed This commit was supported by the NSF-funded DataLad project.	2018-08-22 13:04:12 -04:00
Joey Hess	99bebdface	youtube-dl working Including resuming and cleanup of incomplete downloads. Still todo: --fast, --relaxed, importfeed, disk reserve checking, quvi code cleanup. This commit was sponsored by Anthony DeRobertis on Patreon.	2017-11-29 16:40:32 -04:00
Joey Hess	4e7e1fcff4	add gitAnnexTmpWorkDir and withTmpWorkDir Needed to run youtube-dl in, but could also be useful for other stuff. The tricky part of this was making the workdir be cleaned up whenever the tmp object file is cleaned up. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-11-29 13:53:39 -04:00
Joey Hess	f4be3c3f89	merge changes made on other repos into ExportTree Now when one repository has exported a tree, another repository can get files from the export, after syncing. There's a bug: While the database update works, somehow the database on disk does not get updated, and so the database update is run the next time, etc. Wasn't able to figure out why yet. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-09-18 19:21:41 -04:00
Joey Hess	486902389d	lock to avoid more than one export to a remote at a time This commit was sponsored by Jack Hill on Patreon.	2017-09-18 12:38:07 -04:00
Joey Hess	7eb9889bfd	track exported files in a sqlite database Went with a separate db per export remote, rather than a single export database. Mostly because there will probably not be a lot of separate export remotes, and it might be convenient to be able to delete a given remote's export database. This commit was supported by the NSF-funded DataLad project.	2017-09-04 13:53:08 -04:00
Joey Hess	96c055eda2	migrate: WORM keys containing spaces will be migrated to not contain spaces anymore To work around the problem that the external special remote protocol does not support keys containing spaces. This commit was sponsored by Denis Dzyubenko on Patreon.	2017-08-17 15:09:38 -04:00
Joey Hess	51801cff6a	Prevent spaces from being embedded in the name of new WORM keys, as handling spaces in keys would complicate things like the external special remote protocol.	2017-08-17 14:46:33 -04:00
Joey Hess	18b9a4b802	remove absNormPathUnix again Moving toward dropping MissingH dep. I think I've addressed the problem identified earlier in `09a66f702d`. On Windows, absPathFrom "/tmp/repo/xxx" "y/bar" would be "/tmp/repo/xxx\\y/bar", which then confuses relPathDirToFile. Fixed by converting to unix (git) style paths. Also, relPathDirToFile was splitting only on \\ on windows and not / which broke the example in `09a66f702d` of relPathDirToFile (absPathFrom "/tmp/repo/xxx" "y/bar") "/tmp/repo/.git/annex/objects/xxx" Now, on windows, that will yield "..\\..\\..\\.git/annex/objects/xxx" which once converted to unix style paths is what we want.	2017-05-15 21:35:35 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	ca0daa8bb8	factor non-type stuff out of Key	2017-02-24 13:42:30 -04:00
Joey Hess	48d9624a2d	Revert ServerAliveInterval Revert ServerAliveInterval change in 6.20161111, which caused problems with too many old versions of ssh and unusual ssh configurations. It should not have been needed anyway since ssh is supposed to have TCPKeepAlive enabled by default.	2016-12-13 12:12:38 -04:00
Joey Hess	0ae08947ac	Run ssh with ServerAliveInterval 60 So that stalled transfers will be noticed within about 3 minutes, even if TCPKeepAlive is disabled or doesn't work. Rather than setting with -o, use -F with another config file, so that any settings in ~/.ssh/config or /etc/ssh/ssh_config override this.	2016-10-26 16:41:34 -04:00
Joey Hess	8794dcf27b	Optimisations to time it takes git-annex to walk working tree and find files to work on. Sped up by around 18%. key2file and file2key were top cost centers according to profiling. The repeated use of replace was not efficient. This new approach is quite a lot more efficient. This commit was sponsored by Denis Dzyubenko on Patreon.	2016-09-26 16:48:57 -04:00
Joey Hess	154c939830	Speed up startup time by caching the refs that have been merged into the git-annex branch. This can speed up git-annex commands by as much as a second, depending on the number of remotes.	2016-07-17 12:24:34 -04:00
Yaroslav Halchenko	64e844e1fe	minor typo fixes throughout problematic flexibility	2016-06-02 11:22:18 -04:00
Joey Hess	d56175164b	avoid checking locations in regular repo In commit `2d00523609` I accidentally made gitAnnexLocation do more work, checking content locations, when used in a regular repo.	2016-05-16 17:19:07 -04:00
Joey Hess	eda5d9cc74	adjust: Add --fix adjustment, which is useful when the git directory is in a nonstandard place.	2016-05-16 17:18:33 -04:00
Joey Hess	2d00523609	In the unusual configuration where annex.crippledfilesystem=true but core.symlinks=true, store object contents in mixed case hash directories so that symlinks will point to them. Contents are searched for in both locations, same as before, so this does not add any overhead.	2016-05-10 15:00:22 -04:00
Joey Hess	b9e4e2ba84	new method for merging changes into adjusted branch that avoids unnecessary merge conflicts Still needs work when there are actual merge conflicts.	2016-04-06 15:36:18 -04:00
Joey Hess	f9d79d194b	Windows: Fix v6 unlocked files to actually work. Pointer files were not being treated as annex content, so "git annex get" didn't replace them with the object.	2016-02-15 16:12:18 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00

44 commits