git-annex

Author	SHA1	Message	Date
Joey Hess	0b91afb57d	avoid warning	2021-06-15 11:11:55 -04:00
Joey Hess	3af4c9a29a	fix exponential blowup when adding lots of identical files This was an old problem when the files were being added unlocked, so the changelog mentions that being fixed. However, recently it's also affected locked files. The fix for locked files is kind of stupidly simple. moveAnnex already handles populating unlocked files, and only does it when the object file was not already present. So remove the redundant populateUnlockedFiles call. (That call was added all the way back in `cfaac52b88`, and has always been unnecessary.) Sponsored-by: Dartmouth College's Datalad project	2021-06-15 09:45:55 -04:00
Joey Hess	014dc63a55	avoid sometimes expensive operations when annex.supportunlocked = false This will mostly just avoid a DB lookup, so things get marginally faster. But in cases where there are many files using the same key, it can be a more significant speedup. Added overhead is one MVar lookup per call, which should be small enough, since this happens after transferring or ingesting a file, which is always a lot more work than that. It would be nice, though, to move getGitConfig to AnnexRead, which there is an open todo about.	2021-06-14 12:40:41 -04:00
Joey Hess	673b2feaf3	rename for clarity Associated files are recorded now also for locked files, but this is only needed to populate unlocked files.	2021-06-14 10:55:24 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able to skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	87de360e98	populate new field	2020-12-15 10:37:07 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding an annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	319f2a4afc	audit all uses of SomeException to avoid catching async exceptions Except for the assistant, which I think may use them between threads? Most of the uses of SomeException were already catching only async exceptions. But I did find a few places that were accidentally catching them.	2020-06-05 15:16:57 -04:00
Joey Hess	0a9a3ed1c3	left an unhandled case in previous commit	2020-05-15 14:31:50 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	eaa49ab53d	convert replaceFile to createDirectoryUnder Since it was used on both worktree and .git/annex files, split into multiple functions. In passing, this also improves permissions of created directories in .git/annex, using createAnnexDirectory on those.	2020-03-06 11:31:01 -04:00
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	2d33122215	avoid ingest lockdown file escaping the withOtherTmp call Fixes bug that caused git-annex to fail to add a file when another git-annex process cleaned up the temp directory it was using. Solution is just to push withOtherTmp out to a higher level, so that the whole ingest process can be completed inside it. But in the assistant, that was not practical to do, since withOtherTmp runs in the Annex monad and the assistant does not. Worked around by introducing a separate temp directory that only the assistant uses for lockdown. Since only one assistant can run at a time, it's easy to clean up that directory of old cruft at startup.	2019-05-07 13:04:57 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	d5f2463702	misctmp cleanup * Switch to using .git/annex/othertmp for tmp files other than partial downloads, and make stale files left in that directory when git-annex is interrupted be cleaned up promptly by subsequent git-annex processes. * The .git/annex/misctmp directory is no longer used and git-annex will delete anything lingering in there after it's 1 week old. Also, in Annex.Ingest, made the filename it uses in the tmp dir be prefixed with "ingest-" to avoid potentially using a filename used by some other code.	2019-01-17 16:02:22 -04:00
Joey Hess	5d97898a7c	touch files with high-resolution timestamp Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older) so not really a new dep.	2018-10-29 22:25:21 -04:00
Joey Hess	daa259ec6a	remove unused import	2018-10-25 16:25:21 -04:00
Joey Hess	8be5a7269a	refactor getCurrentBranch Both Command.Sync and Annex.Ingest had their own versions of this. The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem to need it. That is only checking to see if it's in an adjusted unlocked branch, and when in an adjusted branch, the branch does in fact exist, so the added check that Git.Branch.current does is fine. This commit was sponsored by Denis Dzyubenko on Patreon.	2018-10-19 17:29:18 -04:00
Joey Hess	a6c8de84b6	improve types to allow combining some adjustments Combinations like --hide-missing --unlocked seem very useful. On the other hand, combining --fix with --unlock doesn't make sense because a file can be either unlocked or a symlink that can be fixed, but not both. Changed the serialization of HideMissingAdjustment in passing, but it has not actually been used yet so nothing will be broken. This commit was sponsored by Trenton Cronholm on Patreon.	2018-10-18 12:59:05 -04:00
Joey Hess	48e9e12961	finally fixed v6 get/drop git status After updating the worktree for an add/drop, update git's index, so git status will not show the files as modified. What actually happens is that the index update removes the inode information from the index. The next git status (or similar) run then has to do some work. It runs the clean filter. So, this depends on the clean filter being reasonably fast and on git not leaking memory when running it. Both problems were fixed in `a96972015d`, but only for git 2.5. Anyone using an older git will see very expensive git status after an add/drop. This uses the same git update-index queue as other parts of git-annex, so the actual index update is fairly efficient. Of course, updating the index does still have some overhead. The annex.queuesize config will control how often the index gets updated when working on a lot of files. This is an imperfect workaround... Added several todos about new problems this workaround causes. Still, this seems a lot better than the old behavior. This commit was supported by the NSF-funded DataLad project.	2018-08-14 16:23:58 -04:00
Joey Hess	187b3e7780	enable LambdaCase and convert around 10% of places that could use it Needs ghc 7.6.1, so minimum base version increased slightly. All builds are well above this version of ghc, and debian oldstable is as well. Code that could use lambdacase can be found by running: git grep -B 1 'case ' \| less and searching in less for "<-" This commit was sponsored by andrea rota.	2017-11-15 16:59:32 -04:00
Joey Hess	a461cf2ce6	add: Replace work tree file atomically. Before, there was a window where interrupting an add could result in the file being moved into the annex, with no symlink yet created. This commit was supported by the NSF-funded DataLad project.	2017-10-16 12:57:28 -04:00
Joey Hess	07f1e638ee	annex.securehashesonly Cryptographically secure hashes can be forced to be used in a repository, by setting annex.securehashesonly. This does not prevent the git repository from containing files with insecure hashes, but it does prevent the content of such files from being pulled into .git/annex/objects from another repository. We want to make sure that at no point does git-annex accept content into .git/annex/objects that is hashed with an insecure key. Here's how it was done: * .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be written to it normally * So every place that writes content must call, thawContent or modifyContent. We can audit for these, and be sure we've considered all cases. * The main functions are moveAnnex, and linkToAnnex; these were made to check annex.securehashesonly, and are the main security boundary for annex.securehashesonly. * Most other calls to modifyContent deal with other files in the KEY directory (inode cache etc). The other ones that mess with the content are: - Annex.Direct.toDirectGen, in which content already in the annex directory is moved to the direct mode file, so not relevant. - fix and lock, which don't add new content - Command.ReKey.linkKey, which manually unlocks it to make a copy. * All other calls to thawContent appear safe. Made moveAnnex return a Bool, so checked all callsites and made them deal with a failure in appropriate ways. linkToAnnex simply returns LinkAnnexFailed; all callsites already deal with it failing in appropriate ways. This commit was sponsored by Riku Voipio.	2017-02-27 13:33:59 -04:00
Joey Hess	f617988a29	Make import --deduplicate and --skip-duplicates only hash once, not twice import: --deduplicate and --skip-duplicates were implemented inefficiently; they unnecessarily hashed each file twice. They have been improved to only hash once. The new approach is to lock down (minimally) and hash files, and then reuse that information when importing them. This was rather tricky, especially in detecting changes to files while they are being imported. The output of import changed slightly. While before it silently skipped over files with eg --skip-duplicates, now it shows each file as it starts to act on it. Since every file is hashed first thing, it would otherwise not be clear what file import is chewing on. (Actually, it wasn't clear before when any of the duplicates switches were used.) This commit was sponsored by Alexander Thompson on Patreon.	2017-02-09 15:32:22 -04:00
Joey Hess	ee309d6941	lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020 The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's exercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I briefly audited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.	2016-10-17 13:58:43 -04:00
Joey Hess	b6b5a11601	Make git clean filter preserve the backend that was used for a file.	2016-06-09 15:17:08 -04:00
Joey Hess	b7c8bf5274	Preserve execute bits of unlocked files in v6 mode. When annex.thin is set, adding an object will add the execute bits to the work tree file, and this does mean that the annex object file ends up executable. This doesn't add any complexity that wasn't already present, because git annex add of an executable file has always ingested it so that the annex object ends up executable. But, since an annex object file can be executable or not, when populating an unlocked file from one, the executable bit is always added or removed to match the mode of the pointer file.	2016-04-14 14:47:08 -04:00
Joey Hess	42b7ccc89f	git annex add in adjusted unlocked branch Cached the current branch lookup just because it seems unnecessary overhead to run an extra git command per add to query the current branch.	2016-03-29 13:26:06 -04:00

62 commits