git-annex

Author	SHA1	Message	Date
Joey Hess	0ed1369dcd	remove unused import	2021-06-15 11:31:59 -04:00
Joey Hess	af9fdf5dba	verify associated files when checking numcopies Most of this is just refactoring. But, handleDropsFrom did not verify that associated files from the keys db were still accurate, and has now been fixed to. A minor improvement to this would be to avoid calling catKeyFile twice on the same file, when getting the numcopies and mincopies value, in the common case where the same file has the highest value for both. But, it avoids checking every associated file, so it will scale well to lots of dups already. Sponsored-by: Kevin Mueller on Patreon	2021-06-15 11:14:52 -04:00
Joey Hess	0b91afb57d	avoid warning	2021-06-15 11:11:55 -04:00
Joey Hess	77517ab506	avoid nub It's O(N^2) which could matter when there are many dup files using the same key.	2021-06-15 10:48:11 -04:00
Joey Hess	3af4c9a29a	fix exponential blowup when adding lots of identical files This was an old problem when the files were being added unlocked, so the changelog mentions that being fixed. However, recently it's also affected locked files. The fix for locked files is kind of stupidly simple. moveAnnex already handles populating unlocked files, and only does it when the object file was not already present. So remove the redundant populateUnlockedFiles call. (That call was added all the way back in `cfaac52b88`, and has always been unnecessary.) Sponsored-by: Dartmouth College's Datalad project	2021-06-15 09:45:55 -04:00
Joey Hess	e147ae07f4	remove supportUnlocked check that is not worth its overhead moveAnnex only gets to that check if the object file was not present before. So in the case where dup files are being added repeatedly, it will only run the first time, and so there's no significant speedup from doing it; all it avoids is a single sqlite lookup. Since MVar accesses do have overhead, it's better to optimise for the common case, where unlocked files are supported. removeAnnex is less clear cut, but I think mostly is skipped running on keys when the object has already been dropped, so similar reasoning applies.	2021-06-15 09:28:56 -04:00
Joey Hess	dcd2c95249	fix windows build	2021-06-14 12:43:26 -04:00
Joey Hess	014dc63a55	avoid sometimes expensive operations when annex.supportunlocked = false This will mostly just avoid a DB lookup, so things get marginally faster. But in cases where there are many files using the same key, it can be a more significant speedup. Added overhead is one MVar lookup per call, which should be small enough, since this happens after transferring or ingesting a file, which is always a lot more work than that. It would be nice, though, to move getGitConfig to AnnexRead, which there is an open todo about.	2021-06-14 12:40:41 -04:00
Joey Hess	c4f1465a81	check symlink before reading file This is faster because when multiple files are in a directory, it gets cached.	2021-06-14 11:53:51 -04:00
Joey Hess	26a9ea12d1	handle edge case of symlink to something that is not really a pointer file That seems very unlikely to happen, but still, it's possible it could. And with the recent addition of locked files to the keys db, this could be called by places that did not call it before, so it seems even more important it's correct. Adds an extra stat of the file, and is potentially racy, but both problems are fixed by the unix-2.8.0 path. I have not tested that path builds because that package is not yet released and it would be difficult to install it since it's tightly tied to a ghc version.	2021-06-14 11:35:52 -04:00
Joey Hess	673b2feaf3	rename for clarity Associated files are recorded now also for locked files, but this is only needed to populate unlocked files.	2021-06-14 10:55:24 -04:00
Joey Hess	7b6deb1109	display scanning message whenever reconcileStaged has enough files to chew on Clear visible progress bar first. Removed showSideActionAfter because it can't be used in reconcileStaged (import loop). Instead, it counts the number of files it processes and displays the message after it's seen a sufficient number to know it's taking a while. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 12:48:30 -04:00
Joey Hess	13b9a288d3	scanAnnexedFiles in smudge --update This makes git checkout and git merge hooks do the work to catch up with changes that they made to the tree. Rather than doing it at some later point when the user is not thinking about that past operation. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:37:47 -04:00
Joey Hess	7f742589f9	claw back annexed file scan speedup Following commit `c941ab6f5b`, this avoids the second, redundant scan when annex.thin is not set. The benchmark now runs in 35.5 seconds, down from 40 seconds. Note that the inode cache of the annex object has to be passed to addInodeCaches now, because it might not already be in the inode caches, unlike previously. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:09:15 -04:00
Joey Hess	c941ab6f5b	avoid double work in git-annex init, second try reconcileStaged populates the db, so scanAnnexedFiles does not need to do it again. It still makes a pass over the HEAD tree, but populating the db was most of the expensive part. Benchmarking with 100,000 files, git-annex init now takes 40 seconds, vs 37 seconds with the old, buggy version of this fix. It should be possible to win those 3 precious seconds per 100k files back, in the case when annex.thin is not set, with improvements to reconcileStaged that avoid needing this second pass. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 09:36:53 -04:00
Joey Hess	2cb7b7b336	Revert "avoid double work in git-annex init" This reverts commit `0f10f208a7`. The implementation of this turns out to be unsafe; it can lead to a keys db deadlock. scanAnnexedFiles injects a call to inAnnex into reconcileStaged, but inAnnex sometimes needs to read from the keys db, which will try to re-open it when it's in the process of being opened. The exclusive lock of gitAnnexKeysDbLock will then deadlock. This needs to be done in some other way...	2021-06-08 09:11:24 -04:00
Joey Hess	0f10f208a7	avoid double work in git-annex init reconcileStaged was doing a redundant scan to scanAnnexedFiles. It would probably make sense to move the body of scanAnnexedFiles into reconcileStaged, the separation does not really serve any purpose. Sponsored-by: Dartmouth College's Datalad project	2021-06-07 16:50:14 -04:00
Joey Hess	0434674c85	avoid displaying the scanning annexed files message when repo is not large Avoids users thinking this scan is a big deal, when it's not in the majority of repos. showSideActionAfter has some ugly caveats, since it has to display in the background of another action. I could not see a better way to do it and it works fine in this particular case. It also doesn't really belong in Annex.Concurrent, but cannot go in Messages due to an import loop. Sponsored-by: Dartmouth College's Datalad project	2021-06-04 13:16:48 -04:00
Joey Hess	0f54e5e0ae	speed up initial scanning for annexed files Streaming through git this way speeds it up by around 25%. This is similar to the optimisations of seeking annexed files. Sponsored-by: Dartmouth College's Datalad project	2021-05-31 14:29:34 -04:00
Joey Hess	aa00e171cb	annex.supportunlocked should not prevent scan for annexed files That scan used to be only for unlocked files, but no longer..	2021-05-31 10:51:39 -04:00
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	f46e4c9b7c	fix case where keys db was not initialized in time When the keys db is opened for read, and did not exist yet, it used to skip creating it, and return mempty values. But that prevents reconcileStaged from populating associated files information in time for the read. This fixes the one remaining case I know of where the fix in `a56b151f90` didn't work. Note that, when there is a permissions error, it still avoids creating the db and returns mempty for all queries. This does mean that reconcileStaged does not run and so it may want to drop files that it should not. However, presumably a permissions error on the keys database also means that the user does not have permission to delete annex objects, so they won't be able to drop the files anyway. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:46:59 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminate preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	428c91606b	include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticeable overhead. What may turn out to be a noticeable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex runs and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.	2021-05-21 16:24:37 -04:00
Joey Hess	8b6dad11a2	add createMessage init: When annex.commitmessage is set, use that message for the commit that creates the git-annex branch. This will be used by filter-branch too, and it seems to make sense to let annex.commitmessage affect it.	2021-05-17 13:07:47 -04:00
Joey Hess	1da9fe5bd8	implemented filter-branch for key info Not tested yet but should work. Noted a possible optimisation, which should probably be added, to speed it up in cases where there is no uuid filtering being done. It would need Annex.Branch to add a function like getRef that uses catFileDetails, so the sha is also returned. The difficulty would be making it support the precached file content; if it didn't it would probably not be any faster and could even be slower. So probably the precaching would need to be changed to also cache the sha.	2021-05-17 11:11:39 -04:00
Joey Hess	4ff8a1ae2b	refactoring filterBranch should be reusable for copy-branch command. Changed LogVariety to differentiate between LocationLog and UrlLog; only location logs contain uuids and need to be filtered by uuid, while url logs do not. This does not change current behavior, but it will let filterBranch be reused without filtering url logs incorrectly.	2021-05-13 14:43:25 -04:00
Joey Hess	947d2a10bc	assistant: Fix a crash on startup by avoiding using forkProcess ghc 8.8.4 seems to have changed something that broke code that has been successfully using forkProcess since 2012. Likely a change to GC internals. Since forkProcess has never had clear documentation about how to use it safely, avoid using it at all. Instead, when git-annex needs to daemonize itself, re-run the git-annex command, in a new process group and session. This commit was sponsored by Luke Shumaker on Patreon.	2021-05-12 15:08:03 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unnecessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	4edde98709	improve message Pluralize copies appropriately. This commit was sponsored by Mark Reidenbach on Patreon.	2021-04-27 13:44:08 -04:00
Joey Hess	a166d2520b	check mincopies is satisfied even when numcopies is known to be satisfied I had been assuming that numcopies would be a larger or at most equal to mincopies, so no need to check both. But users get confused and use configs that don't really make sense, so make sure to handle mincopies being larger than numcopies. Also add something to the mincopies man page to discourage this misconfiguration. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-04-27 13:37:18 -04:00
Joey Hess	32138b8cd8	implement annex.privateremote and remote.name.private configs The slightly unusual parsing in Types.GitConfig avoids the need to look at the remote list to get configs of remotes. annexPrivateRepos combines all the configs, and will only be calculated once, so it's nice and fast. privateUUIDsKnown and regardingPrivateUUID now need to read from the annex mvar, so are not entirely free. But that overhead can be optimised away, as seen in getJournalFileStale. The other call sites didn't seem worth optimising to save a single MVar access. The feature should have imperceptible speed overhead when not being used.	2021-04-23 14:21:57 -04:00
Joey Hess	d5a05655b4	Merge branch 'master' into hiddenannex	2021-04-23 13:06:33 -04:00
Joey Hess	657d55c401	convert withKnownUrls to use overBranchFileContents This only partly fixes importfeed to see journalled files, since it separately cats metadata directly from the branch. Held off on a changelog for a bug fix until that's dealt with.	2021-04-23 11:32:25 -04:00
Joey Hess	c687eae80b	got private repos really working This new TODO will need private indexes to resolve; until then the private journal has to be checked when private UUIDs are known.	2021-04-21 16:26:23 -04:00
Joey Hess	d0c5f6d2f0	optimisation Avoid trying to read private journal files when no private uuids are known.	2021-04-21 16:02:56 -04:00
Joey Hess	24eeacdba8	adapt recent bug fixes to support private journal At this point, private repos should mostly work, except for a few commands that directly read from the git-annex branch and will not see the private journal. Private index not yet implemented.	2021-04-21 16:01:13 -04:00
Joey Hess	0bb57702e1	Merge branch 'master' into hiddenannex	2021-04-21 15:45:12 -04:00
Joey Hess	653b719472	fix --all to include not yet committed files from the journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. This does mean that --all can end up processing the same key more than once, but before the optimisations that introduced this bug, it used to also behave that way. So I didn't try to fix that; it's an edge case and anyway git-annex behaves well when run on the same key repeatedly. I am not too happy with the use of a MVar to buffer the list of files in the journal. I guess it doesn't defeat lazy streaming of the list, if that list is actually generated lazily, and anyway the size of the journal is normally capped and small, so if configs are changed to make it huge and this code path fire, git-annex using enough memory to buffer it all is not a large problem.	2021-04-21 15:40:32 -04:00
Joey Hess	74acf17a31	refactoring	2021-04-21 14:29:02 -04:00
Joey Hess	6eb3c0a6b4	fix branch precaching bug by checking journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. When not using --all, precaching only gets triggered when the command actually needs location logs, and so there's no speed hit there. This is a minor speed hit for --all, because it precaches even when the location log is not actually going to be used, and so checking the journal is not necessary. It would have been possible to defer checking the journal until the cache gets used. But that would complicate the usual Branch.get code path with two different kinds of caches, and the speed hit is really minimal. A better way to speed up --all, later, would be to avoid precaching at all when the location log is not going to be used.	2021-04-21 14:02:15 -04:00
Joey Hess	05989556a2	start implementing hidden git-annex repositories This adds a separate journal, which does not currently get committed to an index, but is planned to be committed to .git/annex/index-private. Changes that are regarding a UUID that is private will get written to this journal, and so will not be published into the git-annex branch. All log writing should have been made to indicate the UUID it's regarding, though I've not verified this yet. Currently, no UUIDs are treated as private yet, a way to configure that is needed. The implementation is careful to not add any additional IO work when privateUUIDsKnown is False. It will skip looking at the private journal at all. So this should be free, or nearly so, unless the feature is used. When it is used, all branch reads will be about twice as expensive. It is very lucky -- or very prudent design -- that Annex.Branch.change and maybeChange are the only ways to change a file on the branch, and Annex.Branch.set is only internal use. That let Annex.Branch.get always yield any private information that has been recorded, without the risk that Annex.Branch.set might be called, with a non-private UUID, and end up leaking the private information into the git-annex branch. And, this relies on the way git-annex union merges the git-annex branch. When reading a file, there can be a public and a private version, and they are just concatenated together. That will be handled the same as if there were two diverged git-annex branches that got union merged.	2021-04-20 15:04:53 -04:00
Joey Hess	b2222e4639	optimisation Avoid unnecessary conversion to/from String.	2021-04-20 13:13:45 -04:00
Joey Hess	c30557594e	remove now redundant function	2021-04-20 12:42:57 -04:00
Joey Hess	e1a9b79fa6	fix hardcoded origin name in checkAdjustedClone init: Fix a crash when the repo was cloned from a repo that had an adjusted branch checked out, and the origin remote is not named "origin". The only other hardcoding of the name of origin is in: - Upgrade.V2, which can be ignored probably - Annex.Branch, which doesn't fail if it has some other name, but just doesn't set up the git-annex branch with quite as linear a history in that case.	2021-04-14 18:53:27 -04:00
Joey Hess	d18b37f769	remove part of comment that is no longer relevant	2021-04-14 18:32:15 -04:00
Joey Hess	b86206b553	directory CoW on import	2021-04-14 16:10:09 -04:00
Joey Hess	441f65c2cf	split out Annex.CopyFile Goal is to use it in Remote.Directory, but also it's nice to shrink Remote.Git.	2021-04-14 14:06:43 -04:00
Joey Hess	8e7dc958d2	forget: Preserve currently exported trees Avoiding problems with exporttree remotes in some unusual circumstances. This commit was sponsored by Brett Eisenberg on Patreon.	2021-04-13 15:00:23 -04:00
Joey Hess	bdba2c5914	fastDebug Annex.Branch reads and writes Reads of cached data are not debugged, only cache misses are, and since many commands pre-cache location log data, this avoids a slew of fastDebug calls when running commands such as git-annex get --from	2021-04-06 16:48:24 -04:00
Joey Hess	2e9d4ac754	fix fastDebug to check if debugging is actually enabled Had to add to AnnexRead an indication of whether debugging is enabled. Could have just made setupConsole not install a debug output action that outputs, and have enableDebug be what installs that, but then in the common case where there is no debug selector, and so all debug output is selected, it would run the debug output action every time, which entails an IORef access. Which would make fastDebug too slow..	2021-04-06 16:28:37 -04:00
Joey Hess	13c090b37a	use fastDebug everywhere it can be used None of these are likely to yield a noticeable speedup though.	2021-04-06 15:41:24 -04:00
Joey Hess	d16d739ce2	implement fastDebug Most of the changes here involve global option parsing: GlobalSetter changed so it can both run an Annex action to set state, but can also change the AnnexRead value, which is immutable once the Annex monad is running. That allowed a debugselector value to be added to AnnexRead, seeded from the git config. The --debugfilter option's GlobalSetter then updates the AnnexRead. This improved GlobalSetter can later be used to move more stuff to AnnexRead. Things that don't involve a git config will be easier to move, and probably a lot of things can be moved eventually. fastDebug, while implemented, is not used anywhere yet. But it should be fast..	2021-04-06 15:24:28 -04:00
Joey Hess	aaba83795b	switch from hslogger to purpose-built Utility.Debug This uses a DebugSelector, rather than debug levels, which will allow for a later option like --debug-from=Process to only see debugging about running processes. The module name that contains the thing being debugged is used as the DebugSelector (in most cases; does not need to be a hard and fast rule). Debug calls were changed to add that. hslogger did not display that first parameter to debugM, but the DebugSelector does get displayed. Also fastDebug will allow doing debugging in places that are used in tight loops, with the DebugSelector coming from the Annex Reader essentially for free. Not done yet.	2021-04-05 13:40:31 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticeable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	ced91b3fbd	Avoid excess commits to the git-annex branch when stall detection is enabled When git-annex transferrer started up, and the journal contained something, it would commit it to the git-annex branch. This caused excess commits to the branch, in cases where normally several changes would be journalled and committed together. That generated some excess git objects and was also just noisy on stdout. Since transferrer uses enableInteractiveBranchAccess, it does not need to commit journalled changes, since the optimisation that avoids checking the journal when reading from the branch is disabled for processes that call that. This commit was sponsored by Svenne Krap on Patreon.	2021-04-02 11:57:18 -04:00
Joey Hess	c75f7e1d98	improve comment	2021-04-02 10:35:15 -04:00
Joey Hess	31eb5fddf3	borg: Fix a bug that prevented importing keys of type URL and WORM Keys stored on the filesystem are mangled by keyFile to avoid problem chars. So, that mangling has to be reversed when parsing files from a borg backup back to a key. The directory special remote also so mangles them. Some other special remotes do not; eg S3 just serializes the key -- but S3 object names are not limited to filesystem valid filenames anyway, so a S3 server must not map them directly to files in any case. It seems unlikely that a borg backup of some such special remote will get broken by this change. This commit was sponsored by Graham Spencer on Patreon.	2021-03-26 12:07:00 -04:00
Joey Hess	537f9d9a11	Improved display of errors when accessing a git http remote fails. New error message: Remote foo not usable by git-annex; setting annex-ignore http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1 If git config parse fails, or the git config file is not available at the url, a better error message for that is also shown. This commit was sponsored by Mark Reidenbach on Patreon.	2021-03-24 14:19:32 -04:00
Joey Hess	fdf1ccbe3f	move comment	2021-03-24 13:57:00 -04:00
Joey Hess	5d78cd9d08	Sped up git-annex init in a clone of an existing repository Seems that hasOrigin was never finding origin's git-annex branch, so a new one got created each time. And so then it later needed to merge the two branches, which is expensive. Added --no-track to git branch to avoid it displaying a message about setting up tracking branches. Of course there's no reason to make the git-annex branch a tracking branch since git-annex auto-merges it.	2021-03-23 15:23:13 -04:00
Joey Hess	798f685077	New annex.supportunlocked config Can be set to false to avoid some expensive things needed to support unlocked files. See my comment for why this only controls what init sets up, and not other behavior. I didn't bother with making the v5 upgrade code path look at this, though it easily could, because the docs say to run git-annex init after setting it to make it take effect.	2021-03-23 14:04:34 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and then feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	5545e78a1e	Make --debug also enable debugging in child git-annex processes Especially necessary with stalldetection using child processes for transfers. This commit was sponsored by Jack Hill on Patreon.	2021-03-22 14:25:28 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, webdav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	526b9ed9d6	update comment	2021-03-16 14:53:29 -04:00
Joey Hess	8bae692486	better interface for catKey' It only needs the size, so don't require the other stuff. Should let it be used in more places, making things faster.	2021-03-16 14:52:23 -04:00
Joey Hess	cdd512cd9f	simplify	2021-03-05 14:22:04 -04:00
Joey Hess	fc61915230	use GIT keys for export of non-annexed files This solves the problem that import of such files gets confused and converts them back to annexed files. The import code already used GIT keys internally when it determined a file should not be annexed. So now when it sees a GIT key that export used, it already does the right thing. This also means that even older version of git-annex can import and will do the right thing, once a fixed version has exported. Still, there may be other complications around upgrades; still need to think it all through. Moved gitShaKey and keyGitSha from Key to Annex.Export since they're only used for export/import. Documented GIT keys in backends, since they do appear in the git-annex branch now. This commit was sponsored by Graham Spencer on Patreon.	2021-03-05 14:12:11 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able to skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	7db4e62a90	remove accidental duplicated code The code in Annex.WorkerStage and Annex.Concurrent was 100% identical.	2021-02-03 15:23:52 -04:00
Joey Hess	135757d64a	automatic stall detection annex.stalldetection can now be set to "true" to make git-annex do automatic stall detection when it detects a remote is updating its transfer progress consistently enough. This commit was sponsored by Luke Shumaker on Patreon.	2021-02-03 13:33:57 -04:00
Joey Hess	1b63132ca3	add searchPathContents And rename related functions for consistency.	2021-02-02 19:06:15 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	34a535ebea	adjust: Fix some bad behavior when unlocked files use URL keys. This avoids the smudge --clean filter failing on the URL keys. git checkout runs the post-checkout hook, which runs smudge --update. That populates all the pointer files, but it neglected to store their inode caches in the keys db. With that done, and the keys db flushed before smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap check can tell the file is not modified, so will not try to re-ingest it, which does not work with URL keys because they do not support genKey. It also seems possible that the isUnmodifiedCheap was also failing for non-URL keys, which would cause them to be re-ingested, leading to a lot of extra work. I have not verified that, but don't see why it wouldn't have happened. So this probably also speeds up checking out adjusted branches. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-01-25 17:25:42 -04:00
Joey Hess	5c7e6629cf	Fix a bug in view filename generation when a metadata value ended with "/" Or ":" or "\" on Windows, eg "c:" again.	2021-01-22 14:05:14 -04:00
Joey Hess	95cd49abdb	fix a bug that prevented git-annex init from working in a submodule This is probably a reversion, but not sure what caused it. By the time Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has at least sometimes already done the necessary fixups. (Eg, one run indirectly by a git command.) But since the Repo is cached, it doesn't realize and does them again. So, avoid crashing when git config --unset fails. This commit was sponsored by Jack Hill on Patreon.	2021-01-21 15:33:15 -04:00
Joey Hess	3847aa3c9c	change user-visible error to giveup	2021-01-21 14:13:14 -04:00
Joey Hess	7ccddd4aea	display exception as part of warnings and comment that led to this change	2021-01-19 12:27:42 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	6280af2901	generate more compact git-annex branch for imports Especially from borg, where the content identifier logs all end up being the same identical file! But also, for other imports, the location tracking logs can, in some cases, be identical files. Bonus optimisation: Avoid looking up (and parsing when set) GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to. Although the lookup does happen at startup even when no log will be written now.	2020-12-23 15:25:16 -04:00
Joey Hess	7916fc98a3	graft in imported tree to avoid gc Fix a bug that could prevent getting files from an importtree=yes remote, because the imported tree was allowed to be garbage collected.	2020-12-23 14:27:38 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	15000dee07	improve thirdpartypopulated support May actually work now. Note that, importKey now has to add the size to the key if it's supposed to have size. Remote.Directory relied on the importer adding the size, which is no longer done, so it was changed; it was the only one. This way, importKey does not need to behave differently between regular and thirdpartypopulated imports.	2020-12-21 16:19:44 -04:00
Joey Hess	57b03630b3	support thirdPartyPopulated These don't have importTree in their config, because they don't support tree import, but they do still support import, and do not support export or key/value modification.	2020-12-21 13:49:47 -04:00
Joey Hess	1c054f1cf7	started borg special remote Still need to implement 3 methods, but importKeyM looks like it will work well to find annex object files.	2020-12-18 16:56:54 -04:00
Joey Hess	909318dcee	Merge branch 'master' into borg	2020-12-18 15:27:24 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	f62aee0525	fix handling of importtree-only remotes Don't want to try to use these remotes as key/value remotes, which will surely fail. It only recently became possible for importtree to be set w/o exporttree, so before this code was ok. (cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)	2020-12-18 15:13:30 -04:00
Joey Hess	400bdb48db	update warnExportImportConflict for import-only remotes	2020-12-17 16:25:46 -04:00
Joey Hess	a4451ac391	add missing space	2020-12-17 15:58:14 -04:00
Joey Hess	26aad24fd3	simplify As the only blocking operation now is threadDelaySeconds, no need to calculate actual time and actual expected minimum size.	2020-12-17 12:09:49 -04:00
Joey Hess	6b13574827	Windows: include= and exclude= containing '/' will also match filenames that are written using '\' And vice-versa, but it's better to use '/' for portability. Notably, standardPreferredContent contains "archive/*" and that might not match if the filename ends up coming in with the slashes the other way around.	2020-12-15 12:39:34 -04:00
Joey Hess	74c1e0660b	propagate git-annex -c on to transferrer child process git -c was already propagated via environment, but need this for consistency. Also, notice it does not use gitAnnexChildProcess to run the transferrer. So nothing is done about avoiding it taking the pid lock. It's possible that the caller is already doing something that took the pid lock, and if so, the transferrer will certainly fail, since it needs to take the pid lock too. This may prevent combining annex.stalldetection with annex.pidlock, but I have not verified it's really a problem. If it was, it seems git-annex would have to take the pid lock when starting a transferrer, and hold it until shutdown, or would need to take pid lock when starting to use a transferrer, and hold it until done with a transfer and then drop it. The latter would require starting the transferrer with pid locking disabled for the child process, so assumes that the transferrer does not do anything that needs locking when not running a transfer.	2020-12-15 11:36:25 -04:00
Joey Hess	00526a6739	pass along -c options to child git-annex processes	2020-12-15 10:49:29 -04:00
Joey Hess	87de360e98	populate new field	2020-12-15 10:37:07 -04:00
Joey Hess	9de5506f19	improve readability and fix a warning	2020-12-14 17:48:30 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktree files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	75acf5f440	improve some edge cases around partial initialization * Guard against running in a repo where annex.uuid is set but annex.version is not set, or vice-versa. * Avoid autoinit when a repo does not have annex.version or annex.uuid set, but has a git-annex objects directory, suggesting it was used by git-annex before.	2020-12-14 13:17:43 -04:00
Joey Hess	19e26f091d	rename and refactor	2020-12-14 12:32:21 -04:00
Joey Hess	0d0f6d9c23	fix stall detection to actually work when fully stalled When fully stalled, the progress bar doesn't update, so waiting on a MVar would block forever. There's no need to wait anyway, just wake up after sleeping the configured period and check the current value. Luckily Viasat makes it really easy for me to notice this kind of mistake, by stalling long TCP connections frequently.	2020-12-11 18:28:46 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivalent of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00
Joey Hess	004a4f5fb1	factor out Types.Transferrer	2020-12-09 13:28:49 -04:00
Joey Hess	677003a6df	rename helper More consistent name with TransferrerPool	2020-12-09 13:24:24 -04:00
Joey Hess	a3fb1754f2	clean up transferrer pool Doing this at shutdown is not very important at all, but I do like to make sure that when git-annex allocates a resource, it later cleans it up. More importantly, stopCoProcesses is used in eg, Remote.Git in a situation where it needs to stop long-running processes like these.	2020-12-09 13:10:35 -04:00
Joey Hess	a8cdcf528e	fix build failure by avoiding refutable pattern match	2020-12-09 12:43:38 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatibility code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	41f2c308ff	stall detection is working New config annex.stalldetection, remote.name.annex-stalldetection, which can be used to deal with remotes that stall during transfers, or are sometimes too slow to want to use. This commit was sponsored by Luke Shumaker on Patreon.	2020-12-08 15:22:18 -04:00
Joey Hess	b9cfd15e90	add killTransferrer There is redundant code in the assistant that does the same thing, but that code uses a PID, not a ProcessHandle, and gets the PID from, apparently, the TransferInfo transferPid (although I can't seem to find where that gets set on non-windows).	2020-12-08 11:43:06 -04:00
Joey Hess	822a8eadf8	rename	2020-12-08 10:53:07 -04:00
Joey Hess	fcc9e01556	finally using transferkeys Seems to work! Even progress bars. Have not tested prompting or various error message displays yet. transferkeys had to be made to operate in different modes for the Assistant and Annex monads. A bit ugly, but it did relegate that really ugly Database.Keys.closeDb in transferkeys to only the assistant code path. This commit was sponsored by Noam Kremen.	2020-12-07 16:18:26 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	47016fc656	move TransferrerPool from Assistant state to Annex state This commit was sponsored by Graham Spencer on Patreon.	2020-12-07 13:21:35 -04:00
Joey Hess	72e5764a87	move TransferrerPool from assistant This old code will now be useful for git-annex beyond the assistant. git-annex won't use the CheckTransferrer part, and won't run transferkeys as a batch process, and will want withTransferrer to not shut down transferkeys processes. Still, the rest of this is a good fit for what I need now. Also removed some dead code, and simplified a little bit. This commit was sponsored by Mark Reidenbach on Patreon.	2020-12-07 12:50:48 -04:00
Joey Hess	63839532c9	remove uses of warningIO It's not concurrent-output safe, and doesn't support --json-error-messages. Using Annex.makeRunner is a bit scary, because what if it's run in a different thread from an active annex action? Normally the same Annex state is not used concurrently in several threads, and it's not designed to be fully concurrency safe. (Annex.Concurrent exists to deal with that.) I think it will be ok in these simple cases though. Eg, when buffering a warning message to json, Annex.changeState is used, and it modifies the MVar in a concurrency safe way. The only warningIO remaining is not a problem.	2020-12-02 14:57:43 -04:00
Joey Hess	e92117bfd0	fix test failure on windows "a:" failed; this test wants a relative filename so isDrive avoids it Note that on linux, isDrive "/foo" is true. This test also filters out absolute paths already, so that is ok. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-26 11:48:52 -04:00
Joey Hess	d15c2d9ed3	fix build on windows	2020-11-25 06:24:49 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	dce0781391	squash remaining build warnings on windows	2020-11-24 12:35:09 -04:00
Joey Hess	88cef18fac	upgrade: Support an edge case upgrading a v5 direct mode repo where nothing had ever been committed to the head branch This commit was sponsored by Jack Hill on Patreon.	2020-11-24 12:31:17 -04:00
Joey Hess	804808d569	squash build warnings on windows	2020-11-23 14:00:17 -04:00
Joey Hess	06a80dc790	fix build on windows	2020-11-23 13:53:12 -04:00
Joey Hess	ff0927bde9	converted reads from stderr to use hGetLineUntilExitOrEOF These are all unlikely to suffer from the inherited stderr fd problem, but who knows, it could happen.	2020-11-19 16:21:17 -04:00
Joey Hess	4b739fc460	Fix build on Windows Thanks to bug reporter for the patch.	2020-11-19 12:33:00 -04:00
Joey Hess	aafae46bcb	WIP for https://git-annex.branchable.com/bugs/Buggy_external_special_remote_stalls_after_7245a9e/	2020-11-17 17:31:08 -04:00
Joey Hess	631c8d3e5b	avoid redundant adjusted branch update in sync sync still does update it if the config would otherwise not, since it already did.	2020-11-16 15:13:48 -04:00
Joey Hess	805af01562	bug fix really inefficient but it does solve dropping	2020-11-16 14:57:51 -04:00
Joey Hess	557a6e11a6	avoid spurious blank line when updating adjusted branch git checkout run with --quiet should have no output	2020-11-16 14:41:38 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	af6af35228	split out Annex.Content.Presence This will let a module that Annex.Content imports use inAnnex. Unsure yet if I will need that, but this split still seems to make sense, and Annex.Content was way too long so splitting it is good.	2020-11-16 11:24:57 -04:00
Joey Hess	ccfa9b2dc4	make sync update --unlock-present branch	2020-11-13 15:04:34 -04:00
Joey Hess	e66b7d2e1b	rename to --unlock-present and better reverse adjusting An --unlock-present branch reverses back to a branch where all files that get modified or renamed become locked, even if they were originally unlocked. This is the same way that reversing a --unlock branch works, and the new name makes that commonality more clear.	2020-11-13 14:56:43 -04:00
Joey Hess	c8e49c5ef5	git-annex adjust --lock-missing Like --hide-missing the branch does not get updated when content availability changes. Seems to basically work, but sync does not update it yet. Also, when a file is present and so unlocked, git mv followed by git-annex sync results in the basis branch being updated to contain the file with the new name, unlocked. This seems different than what happens in an adjusted unlocked branch, where the commit propagates back locked. Probably the reverse adjustment code needs to be improved to handle this case.	2020-11-13 13:39:44 -04:00
Joey Hess	b1eb47599a	move old direct mode stuff out of Annex.Locations	2020-11-12 12:40:35 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding an annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	885974be99	add newtypes for QuickCheck to avoid LANG=C issues All properties changed to use them, except for prop_encode_c_decode_c_roundtrip, which already filtered to ascii for other reasons. A few modules had to be split out, because Setup does not build-depend on QuickCheck.	2020-11-09 20:21:18 -04:00
Joey Hess	d032b0885d	use MatchingKey when a Key is known This fixes a bug where a file that was not preferred content could be transferred to a remote. This happened when the file got deleted after the sync started running. The only time checkMatcher is run without a Key is in calls to checkFileMatcher, which are only done by add, addurl, import, and smudge --clean. Those won't be affected by this kind of race. Anything else that might be precaching and have a similar race as sync will also be fixed, but I don't know if it actually affected anything other than sync. As well as fixing a bug, this also probably makes sync and --auto faster by avoiding the redundant key lookup. This commit was sponsored by Graham Spencer on Patreon.	2020-11-09 15:17:22 -04:00
Joey Hess	907a0bcad6	avoid providing filename with NUL to quickcheck properties instance Arbitrary [Char] allows that, and it's not a legal part of a filename so can break processing them. Noticed when prop_view_roundtrips failed. The instance Arbitrary AssociatedFile avoids this problem. This commit was sponsored by Mark Reidenbach on Patreon.	2020-11-06 15:15:33 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	2c8cf06e75	more RawFilePath conversion Converted file mode setting to it, and follow-on changes. Compiles up through 369/646. This commit was sponsored by Ethan Aubin.	2020-11-05 18:45:37 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	f9fc26f05a	Merge branch 'master' into rawfilepath	2020-11-04 14:21:44 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	4bcb4030a5	more RawFilePath conversion 580/645 This commit was sponsored by Jack Hill on Patreon.	2020-11-03 18:34:27 -04:00
Joey Hess	78178d4c33	clean build warning	2020-11-03 11:36:48 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	9252f86b2e	view: Fix a reversion in 8.20200522 that broke entering or changing views. Commit `2dc7b5186a` messed up indentation. This commit was sponsored by Noam Kremen on Patreon.	2020-11-02 14:47:08 -04:00
Joey Hess	c41be0d3bd	Merge branch 'master' into rawfilepath	2020-11-02 14:35:46 -04:00
Joey Hess	7245a9ed53	Improve shutdown process for external special remotes and external backends Make sure to relay any remaining stderr from the process after it has shut down, rather than closing stderr just before shutdown. This avoids a situation where the process is still running and tries to write to stderr, getting a SIGPIPE. And, it ensures that no stderr output is lost. This may fix a problem encountered by datalad on windows, where it hangs during the external special remote shutdown. Before commit `a49d300545`, it closed stdin and stdout, but left stderr open, and never killed the stderr waiter thread, which presumably exited on its own. For async exception safety, do need to make sure that thread gets waited on, as that commit does, but it introduced this problem. Note that, the process's stdout is closed before waiting on it. It's too late for anything it writes to stdout to be processed, and since we're not going to consume any such writes, this avoids the process getting blocked writing to stdout due to us not reading what it's buffered. This does mean that if the process writes to stdout too late, it will get a SIGPIPE. (This was already the case before the above-mentioned commit.) In practice, I think only the protocol's ERROR is allowed to be sent at a point where this could happen.	2020-11-02 12:56:35 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	b4b02e4c61	more RawFilePath conversion 412/645	2020-10-30 13:31:35 -04:00
Joey Hess	ca80c3154c	more RawFilePath conversion removeFile changed to removeLink, because AFAICS it should be fine to remove non-file things here. In particular, it's fine to remove a symlink, since we're about to write a symlink. (removeLink does not remove directories, so file, symlink, and unix socket are the only possibilities.)	2020-10-30 13:07:41 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	8d66f7ba0f	more RawFilePath conversion Added a RawFilePath createDirectory and kept making stuff build. Up to 296/645 This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-28 17:25:59 -04:00
Joey Hess	b8bd2e45e3	more RawFilePath conversion Notable wins in Annex.Locations which was sometimes doing 6 conversions in a single function call. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-28 16:24:14 -04:00
Joey Hess	6c29817748	RawFilePath version of getCurrentDirectory This commit was sponsored by Jochen Bartl on Patreon	2020-10-28 16:03:45 -04:00
Joey Hess	64e7bac810	view: Avoid using ':' from metadata when generating a view Because it's a special character on Windows ("c:"). Use same technique already used for '/' and '\'. I didn't record how I generated their encoded forms before, so am sure there was a better way, but the way I did it now is to look at ghci> encodeFilePath "∕" "\226\136\149" And then the difference from that to "\56546\56456\56469" is adding 56320 to each, to get up to the escaped code plane. See comment for why I think handling ':' is ok, but that other illegal windows filenames won't. Note that, this should be enough to make the test suite always work. Other windows illegal filenames will fail at checkout time when it tries to put the illegal filename on the filesystem.	2020-10-26 15:38:08 -04:00
Joey Hess	0133b7e5a8	move: Improve resuming a move that was interrupted after the object was transferred In cases where numcopies checks prevented the resumed move from dropping the object from the source repository, it now relies on a log of recent moves to replicate the behavior of the interrupted command. Performance: Probably noticeable impact, since it has to add to the log, check the log, and remove from the log. Seems worth it to avoid this annoying edge case. The log functions are pretty well optimised to avoid unnecessary work. A performance improvement to make later would be to avoid cleanup doing anything if it's not written to the log file, and has confirmed that the log file does not contain the log line. This commit was sponsored by Jake Vosloo on Patreon.	2020-10-21 10:31:56 -04:00
Joey Hess	62d630272e	improve name	2020-10-20 15:06:55 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	c3e5417c17	don't try to remove pre-commit-annex and post-update-annex-hooks Those are not installed by git-annex but by the user, and so removal will never find the default content, and so if the user did install them, it would display a misleading message. Seems better, since the user installed them, to let the user remove them if they want to.	2020-10-19 13:13:49 -04:00
Joey Hess	20f86e43f7	Fix a build failure on Windows.	2020-10-07 12:04:54 -04:00
Joey Hess	41271e4eb4	avoid git check-ignore overhead on importing known files isKnownImportLocation does a database lookup and there's an index to make that lookup fast, so it's probably faster than talking to git check-ignore. Checking the matcher is faster still. While before the gitignore check was added it did not need to always check isknown, now it does, because it's that or the more expensive notignored. But at least we can skip notignored when a file is known, which will often be the common case: Importing from a remote that's been exported to, and/or imported from before, only new files will not be known, so only those will need to check notignored. At first, I had this: (matches <&&> (isknown <||> notignored)) <||> isknown Notice that checks isknown every time, whether it matches or not. So, it's no slower to instead do this: isknown <||> (matches <&&> notignored) That has the benefit that, when it's known, it doesn't need to run matches, which while faster than isknown, is still going to use some CPU. And it perhaps more clearly expresses the condition: Any known file is wanted, otherwise it's down to what matches and is not ignored. This commit was sponsored by Jack Hill on Patreon.	2020-09-30 11:20:44 -04:00
Joey Hess	c56efbbdb6	import: Check gitignores when importing trees from special remotes It seemed best to do this, for consistency with every other way files can get into a git-annex repo. Although it's just a bit strange that a local .gitignore file affects the pseudo-commits made for the remote that's imported from. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-30 10:41:59 -04:00
Joey Hess	0033e08193	avoid a second traversal of the ImportableContents Do all filtering in one pass.	2020-09-30 10:10:03 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	dc274a6804	fix inverted logic in recent commit	2020-09-29 12:11:50 -04:00
Joey Hess	658ea7ca3c	sync --no-content import from directory special remote sync: When run without --content, import without copying from importtree=yes directory special remotes. (Other special remotes may support this later as well.) This commit was sponsored by Svenne Krap on Patreon.	2020-09-28 15:29:08 -04:00
Joey Hess	3eaaec3113	consistently use importKey when available This avoids import with --no-content and with --content potentially generating two different trees, leading to a merge conflict when run in two different clones of a repo. And it's necessary groundwork to make git-annex sync --no-content import from special remotes that support importKey. Only the directory special remote currently supports importKey, and it generates the same key as git-annex usually does, so there is no behavior change for it. Future special remotes will need to take care when adding importKey, if it generates different keys. Added some warnings about that to comments. This commit was sponsored by Noam Kremen on Patreon.	2020-09-28 15:27:46 -04:00
Joey Hess	15c1ee16d9	import --no-content: Check annex.largefiles Import small files into git, the same as is done when importing with content. Which means, for small files, --no-content does download them. If the largefiles expression needs the file content available (due to mimetype or mimeencoding being used), the import will fail. This commit was sponsored by Jake Vosloo on Patreon.	2020-09-28 13:28:57 -04:00
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	3e577a6dd3	remove reapZombies Believed to be no longer needed as I've squashed the last ones. Note that, in Test.Framework, I can see no reason for the code to have run it twice. It does not cause running processes to exit after all, so any process that has leaked and is running and causing problems with cleanup of the directory won't be helped by running it. This commit was sponsored by Mark Reidenbach on Patreon.	2020-09-25 11:50:38 -04:00
Joey Hess	ca454c47f2	explicitly wait for a git process Eliminate a zombie that was only cleaned up by the later zombie cleanup code. This is still not ideal, it would be cleaner if it used conduit or something, and if the thread gets killed before waiting, it won't stop the process. Only remaining zombies are in CmdLine.Seek	2020-09-25 11:03:12 -04:00
Joey Hess	d81f549385	fix some compile warnings left in yesterday at least 2 could have caused a crash in some circumstances This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-25 10:55:39 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an increase for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	d89984b121	sync --all avoid unnecessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	c1b4d76e6b	make MatchFiles introspectable matchNeedsFileContent is not used yet, but shows how to add information about terminals. That one would be needed for https://git-annex.branchable.com/todo/sync_fast_import/ Note the tricky bit in Annex.FileMatcher.call where it folds over the included matcher to propagate the information. This commit was sponsored by Svenne Krap on Patreon.	2020-09-24 14:01:53 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	922621301a	Serialize use of C magic library, which is not thread safe. This fixes failures uploading to S3 when using -J. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-17 17:27:42 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	62372ee052	resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge This commit was sponsored by Jake Vosloo on Patreon.	2020-09-07 16:50:27 -04:00
Joey Hess	0e21a3221e	clean up old code withworktree is no longer doing anything useful so remove it	2020-09-07 16:16:15 -04:00
Joey Hess	03dee56546	revert change that broke test suite Opened a new bug about it. This commit was sponsored by Ethan Aubin.	2020-09-07 15:42:38 -04:00
Joey Hess	d120c73302	sync, assistant: When merge.directoryRenames is not set, default it to "false" Works better with automatic merge conflict resolution than git's usual default of "conflict". This is not done when automatic merge conflict resolution is disabled. This commit was sponsored by Mark Reidenbach on Patreon.	2020-09-07 13:50:58 -04:00
Joey Hess	f4c4b89aa3	refactor Make all calls to git merge go through autoMergeFrom, in preparation for fine-tuning git merge's config for automatic merge conflict resolution. This commit was sponsored by Ryan Newton on Patreon.	2020-09-07 13:26:16 -04:00
Joey Hess	69053a93a2	resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side This case was handled by cleanConflictCruft, but only when the annexed file's object was present. When not present, it left the annexed file with the original name, not checked into git, while adding the variant file. So, add an explicit deletion of the deleted file in this case. My specific case where this happened actually involves merge.directoryRenames=conflict. After a merge involving that, the situation was the file appears as "added by them", because that caused the file that they added to be moved into a directory we renamed. That case is the same as them adding a modified version of the file, while we deleted it. (Except for the history of the file, since it's a new file, but this doesn't look at history.) This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-07 12:25:57 -04:00
Joey Hess	a360437215	make automerge behavior when one side deleted explicit This does not actually change how the merge conflict is resolved when one side deleted the file, but it was not documented before, and I think it only worked by accident. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-07 12:01:03 -04:00
Joey Hess	e36bae74da	Exposed annex.forward-retry git config One reason is, 5 is an arbitrary number so ought to be configurable. The real reason though, is I wanted to make the man page explain when forward retry can override annex.retry, and having a config made the man page easier to write.	2020-09-04 15:16:40 -04:00
Joey Hess	2bb933eb60	import: Retry downloads that fail Also, using the transfer machinery for this makes eg, git-annex info show in-progress imports, and makes --notify-start/finish work.	2020-09-04 13:54:05 -04:00
Joey Hess	1a42b2c5a3	combine retry deciders in better way This fixes the problem that, if forwardRetry was checked for the first 5 and decided to retry, the 6th would go to configuredRetry which would see the counter was 6 and so wait retry-delay*2^5 seconds (default 32). Now, it waits for retry-delay before each retry, even when forwardRetry initiated the retry.	2020-09-04 12:48:30 -04:00
Joey Hess	1d244bafbd	Limit retrying of failed transfers when forward progress is being made to 5 To avoid some unusual edge cases where too much retrying could result in far more data transfer than makes sense.	2020-09-04 12:46:37 -04:00
Joey Hess	eed20fe3b7	fix some file modes in calls to withTmpFileIn to honor umask Also audited for other calls to openTempFile, and all are ok, except for viaTmp which will need further work. Remote.Directory fixed to set umask mode when writing to an export, although it has another one using viaTmp that's not fixed. Will make exports that are published via a http server running as another user work, for example. Remote.BitTorrent fixed to set umask mode when downloading the torrent file. Normally this does not matter as that file does not hang around after the download, but if a bittorrent download were started by one user, got interrupted and then another user ran it, this will let them access the torrent file created by the first user.	2020-09-02 14:36:08 -04:00
Joey Hess	00937c4813	when downloading same content from multiple urls, only display error if all fail	2020-09-02 11:35:07 -04:00
Joey Hess	571ec900ac	Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. With automatic layout learning!	2020-09-01 15:16:35 -04:00
Joey Hess	f95664305b	remove unused imports	2020-08-28 11:16:51 -04:00
Joey Hess	b68f214312	Display a message when git-annex has to wait for a pid lock file held by another process	2020-08-26 13:05:34 -04:00
Joey Hess	b24ba92231	refactor out Annex.PidLock	2020-08-26 12:29:13 -04:00
Joey Hess	7bdb0cdc0d	add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess Fixes reversion in 8.20200617 that made annex.pidlock being enabled result in some commands stalling, particularly those needing to autoinit. Renamed runsGitAnnexChildProcess to make clearer where it should be used. Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad-hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command.	2020-08-25 14:57:49 -04:00
Joey Hess	2b6fc17f70	fix comment format	2020-08-25 13:40:52 -04:00
Joey Hess	283d2f85d1	importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_' sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was running it on parts of the template. So eg the leading '.' in the extension got sanitized. Note the added case for sanitizeLeadingFilePathCharacter ('/':_) -- this was added because, if the template is title/episode and the title is not set, it would expand to "/episode". So this is another potential security fix.	2020-08-05 11:35:00 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	555fe669e1	refactoring in preparation for external backends	2020-07-29 12:00:27 -04:00
Joey Hess	f5e65d680b	add back inAnnex check for drop here Needed again after last commit removed it from startLocal again.	2020-07-25 18:17:33 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unnecessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	c30fd24d91	add back inAnnex check after seeking The test suite noticed this case, where two files with the same key are dropped, and the seek stage sees both have content due to the way files stream through it. But then locking the content to drop fails on the second file, because the first file has already been dropped. So, add back otherwise redundant inAnnex check.	2020-07-25 11:18:50 -04:00
Joey Hess	18f1fb5841	drop performance improvements Sped up seeking files to drop by 2x, and also some performance improvements to checking numcopies. Interestingly, the seek speedup is not due to precaching, but I think is due to calling getParsed earlier. Annex.Drop had to be changed to check inAnnex there, since it was removed from Command.Drop. All other users of Command.Drop already checked inAnnex themselves. This commit was sponsored by Ryan Newton on Patreon.	2020-07-24 13:27:46 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	2634a5ed99	avoid inflating error counter when forking and merging annex state	2020-07-19 18:31:25 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	9f6bd6cc05	add inRepoDetails planned to use for an optimisation most things using stagedDetails were not expecting to get dup files in a conflicted merge and deal with them, so converted them to use inRepoDetails.	2020-07-08 15:36:35 -04:00
Joey Hess	7347e50123	add stage number to stagedDetails parser And convert parser to attoparsec, probably faster. Before, a parse failure threw the whole --stage output line into the filename, which was certainly a bad idea, so fixed that.	2020-07-08 15:05:12 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted to, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	57cceac569	simplify interface by removing size Add size to the returned key after the fact, unless the remote happened to add it itself.	2020-07-03 14:22:22 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	b2f4b84d27	clean up some build warnings on windows	2020-07-02 11:34:18 -04:00
Joey Hess	087b7ee66a	Revert "data type that starts off using a set but converts to a bloom filter when large" This reverts commit `7e2c4ed216`. I was not able to use this in the end.. See comment in the previous commit.	2020-07-01 20:12:19 -04:00
Joey Hess	a09937580e	more windows build fixes	2020-07-01 15:22:56 -04:00
Joey Hess	7e2c4ed216	data type that starts off using a set but converts to a bloom filter when large This adds a dep on hashable, but it's a free dependency, since unordered-containers already pulled it in. Using unordered-containers for the set seems to make sense, since it hashes and bloom filter hashes too. (Though different hashes.) I dunno, never quite know if I should use unordered-containers or containers.	2020-07-01 14:06:12 -04:00
Joey Hess	d3d187c869	fix build on windows Annex.GitOverlay was using a module that needs posix to build.	2020-07-01 11:22:15 -04:00
Joey Hess	a59e95a82d	improve "unable to lock down 1 copy" message This is a fairly hard to understand situation for the user. Listing the remotes should help them understand it a bit better. This commit was sponsored by Ethan Aubin.	2020-06-26 13:00:40 -04:00
Joey Hess	b651d3ede0	test: Fix some test cases that assumed git's default branch name git is making that configurable, and configuring it globally would break the test suite in a few places. No other part of git-annex assumes any branch name. Renamed a few placeholders to make that clearer. This commit was sponsored by Jake Vosloo on Patreon.	2020-06-23 16:40:51 -04:00
Joey Hess	7757c0e900	Honor annex.largefiles when importing a tree from a special remote. This commit was sponsored by Martin D on Patreon.	2020-06-23 16:07:18 -04:00
Joey Hess	104b3a9c6a	Build with the http-client-restricted library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-http-client-restricted-dev This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:31:31 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	d5451afc8f	fix deadlock Fix a deadlock that could occur after git-annex got an unlocked file, causing the command to hang indefinitely. Known to happen on vfat filesystems, possibly others. Note that a deadlock is still theoretically possible, if anything smudge --clean does causes it to run the git queue for some other reason. Apparently that doesn't happen, but will need to keep an eye on it.	2020-06-18 12:56:29 -04:00
Joey Hess	96f6aa39dd	add runsGitAnnexChildProcess calls This is all the calls to git-annex that seem capable of possibly locking the same pidlock as their parent. Except possibly for some in the assistant.	2020-06-17 15:31:03 -04:00
Joey Hess	82448bdf39	fix an annex.pidlock issue That made eg git-annex get of an unlocked file hang until the annex.pidlocktimeout and then fail. This fix should be fully thread safe no matter what else git-annex is doing. Only using runsGitAnnexChildProcess in the one place it's known to be a problem. Could audit for all places where git-annex runs itself as a child and add it to all of them, later.	2020-06-17 15:30:59 -04:00
Joey Hess	ad81feb053	fix implicit embedcreds regression Fix bug that made creds not be stored in git when a special remote was initialized with gpg encryption, but without an explicit embedcreds=yes. (Yet another regression introduced in version 7.20200202.7. 5th so far.)	2020-06-16 18:00:19 -04:00
Joey Hess	a76b1ba3d6	local git remote autoinit improvements * Improve display of problems auto-initializing or upgrading local git remotes. * When a local git remote cannot be initialized because it has no git-annex branch or a .noannex file, avoid displaying a message about it.	2020-06-16 13:24:00 -04:00
Joey Hess	8a7c615a8f	import: Avoid using some strange names for temporary keys The ContentIdentifier can contain almost anything, so could have characters that are not fit for the filesystem, or might be longer than a key usually is, or contain a newline, or .... genKeyName deals with those problems. This should not present a back-compat issue, because this is a temporary key used while downloading the imported file, before the real key for it can be generated.	2020-06-11 16:07:36 -04:00
Joey Hess	6b0cb2d732	defer cleaning keys db of old data Avoid creating the keys database during init when there are no unlocked files, to prevent init failing when sqlite does not work in the filesystem.	2020-06-11 15:40:13 -04:00
Joey Hess	24ff5e2b29	use uninterruptibleMask Some recent changes to use mask missed that async exceptions can still be thrown inside it. The goal is to make sure a block of cleanup code runs entirely, w/o being interrupted by an async exception, so use uninterruptibleMask. Also, converted a few to bracket, which is nicer.	2020-06-09 15:02:56 -04:00
Joey Hess	0210e81d83	async exception safety for openFd Audited for openFile and openFd, and this fixes all the ones I found where an async exception could prevent the file getting closed. Except for the lock pool, which is a whole other can of worms.	2020-06-05 15:48:00 -04:00
Joey Hess	319f2a4afc	audit all uses of SomeException to avoid catching async exceptions Except for the assistant, which I think may use them between threads? Most of the uses of SomeException were already catching only async exceptions. But I did find a few places that were accidentally catching them.	2020-06-05 15:16:57 -04:00
Joey Hess	2bff3b7c49	init: When annex.pidlock is set, skip lock probing.	2020-06-05 11:12:16 -04:00
Joey Hess	1d41ae5d2a	init warning on stalled lock probe init: If lock probing stalls for a long time (eg a broken NFS server), display a message to let the user know what's taking so long.	2020-06-05 11:06:19 -04:00
Joey Hess	2670890b17	convert to withCreateProcess for async exception safety This handles all createProcessSuccess callers, and aside from process pools, the complete conversion of all process running to async exception safety should be complete now. Also, was able to remove from Utility.Process the old API that I now know was not a good idea. And proof it was bad: The code size went down, despite there being a fair bit of boilerplate for some future API to reduce.	2020-06-04 15:45:52 -04:00
Joey Hess	438dbe3b66	convert to withCreateProcess for async exception safety This handles all sites where checkSuccessProcess/ignoreFailureProcess is used, except for one: Git.Command.pipeReadLazy That one will be significantly more work to convert to bracketing. (Also skipped Command.Assistant.autoStart, but it does not need to shut down the processes it started on exception because they are git-annex assistant daemons..) forceSuccessProcess is done, except for createProcessSuccess. All call sites of createProcessSuccess will need to be converted to bracketing. (process pools still todo also)	2020-06-04 12:44:09 -04:00
Joey Hess	2dc7b5186a	convert to withCreateProcess for async exception safety	2020-06-04 12:05:25 -04:00
Joey Hess	92f775eba0	convert to withCreateProcess for async exception safety Not yet 100% done, so far I've grepped for waitForProcess and converted everything that uses that to start the process with withCreateProcess. Except for some things like P2P.IO and Assistant.TransferrerPool, and Utility.CoProcess, that manage a pool of processes. See #2 in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7 for how those will need to be dealt with. checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess call waitForProcess, so callers of them will also need to be dealt with, and have not been yet.	2020-06-03 15:48:09 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	484a74f073	auto-init autoenable=yes Try to enable special remotes configured with autoenable=yes when git-annex auto-initialization happens in a new clone of an existing repo. Previously, git-annex init had to be explicitly run to enable them. That was a bit of a wart of a special case for users to need to keep in mind. Special remotes cannot display anything when autoenabled this way, to avoid interfering with the output of git-annex query commands. Any error messages will be hidden, and if it fails, nothing is displayed. The user will realize the remote isn't enabled when they try to use it, and can run git-annex init manually then to try the autoenable again and see what failed. That seems like a reasonable approach, and it's less complicated than communicating something across a pipe in order to display it as a side message. Other reason not to do that is that, if the first command the user runs is one like git-annex find that has machine readable output, any message about autoenable failing would need to not be displayed anyway. So better to not display a failure message ever, for consistency. (Had to split out Remote.List.Util to avoid an import cycle.)	2020-05-27 12:40:35 -04:00
Joey Hess	0a9a3ed1c3	left an unhandled case in previous commit	2020-05-15 14:31:50 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	5f5170b22b	remove SafeFilePath Move sanitizeFilePath call to where fromSafeFilePath had been.	2020-05-11 14:04:56 -04:00
Joey Hess	cabbc91b18	addurl, importfeed: Allow '-' in filenames, as long as it's not the first character	2020-05-11 13:50:49 -04:00
Joey Hess	6952060665	addurl --preserve-filename and a few related changes * addurl --preserve-filename: New option, uses server-provided filename without any sanitization, but with some security checking. Not yet implemented for remotes other than the web. * addurl, importfeed: Avoid adding filenames with leading '.', instead it will be replaced with '_'. This might be considered a security fix, but a CVE seems unwarranted. It was possible for addurl to create a dotfile, which could change behavior of some program. It was also possible for a web server to say the file name was ".git" or "foo/.git". That would not overwrite the .git directory, but would cause addurl to fail; of course git won't add "foo/.git". sanitizeFilePath is too opinionated to remain in Utility, so moved it. The changes to mkSafeFilePath are because it used sanitizeFilePath. In particular: isDrive will never succeed, because "c:" gets munged to "c_"; ".." gets sanitized now; ".git" gets sanitized now. It will never be null, because sanitizeFilePath keeps the length the same, and splitDirectories never returns a null path. Also, on the off chance a web server suggests a filename of "", ignore that, rather than trying to save to such a filename, which would fail in some way.	2020-05-08 16:22:55 -04:00
Joey Hess	19b5137227	addurl --fast error message improvement addurl: When run with --fast on an url that annex.security.allowed-ip-addresses prevents accessing, display a more useful message. (Also importfeed --fast potentially.)	2020-04-27 13:48:14 -04:00
Joey Hess	04352ed9c5	check-ignore resource pool Much like check-attr before.	2020-04-21 11:25:28 -04:00
Joey Hess	45fb7af21c	check-attr resource pool Limited to min of -JN or number of CPU cores, because it will often be CPU bound, once it's read the gitignore file for a directory. In some situations it's more disk bound, but in any case it's unlikely to be the main bottleneck that -J is used to avoid. Eg, when dropping, this is used for numcopies checks, but the main bottleneck will be accessing the remotes to verify presence. So the user might decide to -J32 that, but having 32 check-attr processes would just waste however many filehandles they open, and probably worsen their performance due to CPU contention. Note that, I first tried just letting up to the -JN be started. However, even when it's no bottleneck at all, that still results in all of them being started. Why? Well, all the worker threads start up nearly simultaneously, so there's a thundering herd.	2020-04-21 11:05:57 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	a7840c0e04	improve programPath Fixes a failure mode where git-annex sync would try to run git-annex and complain that it failed to find it in ~/.config/git-annex/program or PATH, when there was a git-annex in /usr/bin/, but the original one was run from elsewhere (eg, ~/bin) and happened not to be present any longer. Now, it will fall back to using git-annex from PATH in such a case. Which might fail due to some version incompatibility, but still better than a misleading error message. Also made readProgramFile only read the file, not look for git-annex in PATH as a fallback. That fallback may have confused Assistant.Upgrade, which really wants the value from the file.	2020-04-15 16:46:34 -04:00
Joey Hess	43a9808292	disable journal read optimisation when alwayscommit=false The journal read optimisation in `aeca7c220` later got fixed in `eedd73b84` to stage and commit any files that were left in the journal by a previous git-annex run. That's necessary for the optimisation to work correctly. But it also meant that alwayscommit=false started committing the previous git-annex process's journalled changes, which defeated the purpose of the config setting entirely. So, disable the optimisation when alwayscommit=false, leaving the files in the journal and not committing them. See my comments on the bug report for why this seemed the best approach. Also fixes a problem when annex.merge-annex-branches=false and there are changes in the journal. That config indirectly prevents committing the journal. (Which seems a bit odd given its name, but it always has.) So, when there were changes in the journal, perhaps left there due to alwayscommit=false being set before, the optimisation would prevent git-annex from reading the journal files, and it would operate with out of date information.	2020-04-15 13:24:33 -04:00
Joey Hess	5a62e8132d	When parsing git configs, support all the documented ways to write true and false, including "yes", "on", "1", etc. This change does impact git-annex config eg "git annex config --set annex.addunlocked on" will store "on" and new git-annex will understand that value, while old git-annex will error: git-annex: bad annex.addunlocked configuration in git annex config: Parse failure: near "on" That seems acceptable. Not special remote configs that are only documented as =true or =false however. Having git-annex support other values for those would break backwards compatibility when used with old versions of git-annex. And older versions ignore invalid special remote configs. That would not be a good combination.	2020-04-13 14:05:30 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	eedd73b846	fix reversion caused by earlier optimisation to git-annex branch reads `aeca7c2207` was predicated on the assumption that updateTo would stage any journal files, but in one case it did not actually do so. The test suite happened to expose the bug.	2020-04-10 15:25:22 -04:00
Joey Hess	2caf579718	cache annex index filename for 1.5% speedup to queries	2020-04-10 13:37:04 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other daemon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c0cd07c36b	Ref ByteString conversion done Test suite passes.	2020-04-07 17:41:09 -04:00
Joey Hess	6c81e0c8f1	ByteString Ref continued Several nice speed wins I think. At 340/633 files converted.	2020-04-07 13:27:11 -04:00
Joey Hess	87d5583a91	use programPath consistently, not readProgramFile Improve git-annex's ability to find the path to its program, especially when it needs to run itself in another repo to upgrade it. Some parts of the code used readProgramFile, probably because I forgot that programPath exists. I noticed this when a git-annex auto-upgrade failed because it was running git-annex upgrade --autoonly, but the code to run git-annex used readProgramFile, which happened to point to an older build of git-annex.	2020-03-30 16:06:27 -04:00
Joey Hess	f6d19b18f6	remove unused imports	2020-03-30 12:11:52 -04:00
Joey Hess	0e4d80d5c1	remove pre-commit hook This was originally added so that unannex could prevent the hook from running while files were in a state that the hook would interpret as old-style unlocked and so would lock. Now that's gone, so the only thing the hook was preventing was two pre-commit processes running simultaneously. But such concurrency is normal in git-annex and should not be a problem. Does mean that .git/hooks/pre-commit-annex might run more concurrently, that seems the only risk of it causing any problems.	2020-03-30 11:54:04 -04:00
Joey Hess	2e6e8aa60a	fix windows build some more	2020-03-20 11:47:09 -04:00
Joey Hess	d930a2035c	Avoid converting .git file in a worktree or submodule to a symlink when the repository is not a git-annex repository. This means it will still be a .git file when git-annex init runs. That's ok, the repo probably contains no annexed objects yet, and even if it does, git-annex init does not care if symlinks in the worktree don't point to the objects. I made init, at the end, run the conversion code. Not really necessary because the next git-annex command could do it just as well. But, this avoids commands that don't normally write to the repo needing to write to it, which might avoid some problem or other, and seems worth avoiding generally.	2020-03-09 14:54:14 -04:00
Joey Hess	c0a981cb0e	update comment	2020-03-09 14:31:28 -04:00
Joey Hess	093fde5abd	completed the createDirectoryIfMissing conversion Remaining calls in the assistant and Annex.Ssh have been audited and are ok.	2020-03-06 12:55:03 -04:00
Joey Hess	2f204b5d37	refactor	2020-03-06 11:43:07 -04:00
Joey Hess	eaa49ab53d	convert replaceFile to createDirectoryUnder Since it was used on both worktree and .git/annex files, split into multiple functions. In passing, this also improves permissions of created directories in .git/annex, using createAnnexDirectory on those.	2020-03-06 11:31:01 -04:00
Joey Hess	6d58ca94d6	some easy createDirectoryUnder conversions	2020-03-05 15:20:10 -04:00
Joey Hess	ebbc5004fa	convert createAnnexDirectory to use createDirectoryUnder It will create foo/.git/annex/, but not foo/.git/ and not foo/. This will avoid it creating an empty path to a repo when a drive is yanked out and the mount point goes away, for example.	2020-03-05 14:33:04 -04:00
Joey Hess	ccd8c43dc8	git-annex config: guard against non-repo-global configs git-annex config: Only allow configs to be set that are ones git-annex actually supports reading from repo-global config, to avoid confused users trying to set other configs with this.	2020-03-02 15:54:18 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unnecessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unnecessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	9659f1c30f	annex.security.allowed-ip-addresses ports syntax Extended annex.security.allowed-ip-addresses to let specific ports of an IP address be used, while denying use of other ports.	2020-02-25 15:45:52 -04:00
Joey Hess	1bb32098d6	jump right to v8, don't stop part way * init --version: When the version given is one that automatically upgrades to a newer version, use the newer version instead. * Auto upgrades from older repo versions, like v5, now jump right to v8.	2020-02-24 13:21:00 -04:00
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework remoteAnnexConfig will avoid bugs like `a3a674d15b` Use now more generic remoteConfig in a couple places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	ae4177d456	fix warning	2020-02-17 15:06:28 -04:00
Joey Hess	da9945c013	silence build warning	2020-02-14 19:38:50 -04:00
Joey Hess	879f52a116	annex.tune.branchhash1=true bugfix Fix support for repositories tuned with annex.tune.branchhash1=true, including --all not working and git-annex log not displaying anything for annexed files.	2020-02-14 15:22:48 -04:00
Joey Hess	a490947068	annex.sshcaching warning improvement and allow overriding build time default * When git-annex is built with a ssh that does not support ssh connection caching, default annex.sshcaching to false, but let the user override it. * Improve warning messages further when ssh connection caching cannot be used, to clearly state why.	2020-02-14 14:21:03 -04:00
Joey Hess	5c3636037b	Display a warning when concurrency is enabled but ssh connection caching is not enabled or won't work due to a crippled filesystem A warning message is unsatisfying. But erroring out is too hard a failure, especially since it may well work fine if the user has enabled passwordless ssh. I did think about falling back to one ssh connection at a time in this case, but it would have needed a rework of every ssh call, which seems far overboard for such a niche problem. There's no single place where git-annex runs ssh, so no one place that it could block a concurrent call on a semaphore. And, even if it did fall back to one ssh connection at a time, it seems to me that doing so without warning the user about the problem just invites bug reports like "git-annex is ignoring my -J2 and only doing one download at a time". So a warning is needed, and I suppose is good enough.	2020-01-23 12:35:46 -04:00
Joey Hess	6f90bb7738	handle git-credential prompt in -J mode If git-credential has it cached and does not prompt, this will unfortunately result in a brief flicker, as the displayed console regions are hidden while running it and then re-displayed. Better than a corrupted display. Actually, I tried it and don't see a visible flicker, so probably only over a slow ssh will it be apparent.	2020-01-22 16:42:15 -04:00
Joey Hess	1883f7ef8f	support git remotes that need http basic auth using git credential to get the password One thing this doesn't do is wrap the password prompting inside the prompt action. So with -J, the output can be a bit garbled.	2020-01-22 16:16:19 -04:00
Joey Hess	2be4122bfc	include passthrough params in --describe-other-params	2020-01-20 16:53:27 -04:00
Joey Hess	aa949bbb7d	initremote --describe-other-params Does not yet include descriptions from external special remote programs.	2020-01-20 16:05:51 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	923230ea30	convert RemoteConfigFieldParser to data type	2020-01-20 13:49:30 -04:00
Joey Hess	8b9b90c74a	bugfixes getRemoteConfigPassedThrough was never returning anything, Typeable prevented the type checker from noticing a dumb mistake. parseRemoteConfig was not adding Accepted values as PassedThrough	2020-01-17 17:09:56 -04:00
Joey Hess	1d711c4378	use "param" not "field" to match man pages	2020-01-15 14:07:05 -04:00
Joey Hess	2edf0506a5	a few forgotten remote config fields preferreddir can be used with any special remote, so its parser needs to be included in the commonFieldParsers. initremote with uuid= changed to delete that field, so it does not need to be included in commonFieldParsers. Note that, existing remotes initialized before this change will have the field in remote.log. This will not cause problems parsing, because the value will be Accepted. Grepping for 'Accepted "' found these, and I'm pretty sure this is all of them.	2020-01-15 11:22:36 -04:00
Joey Hess	c4ea3ca40a	ported almost all remotes, until my brain melted external is not started yet, and S3 is part way through and not compiling yet	2020-01-14 15:41:34 -04:00
Joey Hess	c498269a88	convert configParser to Annex action and add passthrough option Needed so Remote.External can query the external program for its configs. When the external program does not support the query, the passthrough option will make all input fields be available.	2020-01-14 13:52:03 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71f78fe45d	wip separate RemoteConfig parsing Remote now contains a ParsedRemoteConfig. The parsing happens when the Remote is constructed, rather than when individual configs are used. This is more efficient, and it lets initremote/enableremote reject configs that have unknown fields or unparsable values. It also allows for improved type safety, as shown in Remote.Helper.Encryptable where things that used to match on string configs now match on data types. This is a work in progress, it does not build yet. The main risk in this conversion is forgetting to add a field to RemoteConfigParser. That will prevent using that field with initremote/enableremote, and will prevent remotes that already are set up from seeing that configuration. So will need to check carefully that every field that getRemoteConfigValue is called on has been added to RemoteConfigParser. (One such case I need to remember is that credPairRemoteField needs to be included in the RemoteConfigParser.)	2020-01-13 12:39:21 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log. This does not eliminate every place where a remote config is parsed and a default value is used if the parse fails. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	5e4deb3620	support sha256 git repos Git will eventually switch to sha2 and there will not be one single shaSize anymore, but two (40 and 64). Changed all parsers for git plumbing output to support both sizes of shas. One potential problem this does not deal with is, if somewhere in git-annex it reads two shas from different sources, and compares them to see if they're the same sha, it would fail if they're sha1 and sha256 of the same value. I don't know if that will really be a concern.	2020-01-07 12:22:19 -04:00
Joey Hess	2000e9a4b8	avoid build warning on windows	2020-01-01 14:40:35 -04:00
Joey Hess	2cea674d1e	Merge branch 'master' into v8	2020-01-01 14:26:43 -04:00
Joey Hess	ea3cb7d277	fix a case where file tracked by git unexpectedly becomes annex pointer file smudge: When annex.largefiles=anything, files that were already stored in git, and have not been modified could sometimes be converted to being stored in the annex. Changes in 7.20191024 made this more of a problem. This case is now detected and prevented.	2019-12-27 15:08:03 -04:00
Joey Hess	2b821eb225	Merge branch 'master' into sqlite	2019-12-26 15:15:42 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	8e9e809d9b	when annex.largefiles parse fails, say where the config came from	2019-12-20 13:07:10 -04:00
Joey Hess	4acbb40112	git-annex config annex.largefiles annex.largefiles can be configured by git-annex config, to more easily set a default that will also be used by clones, without needing to shoehorn the expression into the gitattributes file. The git config and gitattributes override that. Whenever something is added to git-annex config, we have to consider what happens if a user puts a purposefully bad value in there. Or, if a new git-annex adds some new value that an old git-annex can't parse. In this case, a global annex.largefiles that can't be parsed currently makes an error be thrown. That might not be ideal, but the gitattribute behaves the same, and is almost equally repo-global. Performance notes: git-annex add and addurl construct a matcher once and use it for every file, so the added time penalty for reading the global config log is minor. If the gitattributes annex.largefiles were deprecated, git-annex add would get around 2% faster (excluding hashing), because looking that up for each file is not fast. So this new way of setting it is progress toward speeding up add. git-annex smudge does need to load the log every time. As well as checking the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids both overheads.	2019-12-20 13:01:41 -04:00
Joey Hess	02e00fd7ab	Merge branch 'master' into sqlite	2019-12-19 16:33:42 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	d5628a16b8	Merge branch 'bs' into sqlite-bs	2019-12-18 14:51:03 -04:00
Joey Hess	322c542b5c	fix ByteString conversion on windows the encode' and decode' functions on Windows should not apply the filesystem encoding, which does not work there. Instead, convert to and from UTF-8. Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem encoding, so won't work as expected on windows.	2019-12-18 13:32:56 -04:00
Joey Hess	3d38ec9585	fix fileJournal My ByteString rewrite oversimplified it, resulting in any _ in a journal file turning into a / in the git-annex branch, which was often the wrong filename, or sometimes (//) an invalid filename that git refused to add.	2019-12-18 11:29:34 -04:00
Joey Hess	cee0d738fc	match also / path separator on windows	2019-12-11 17:08:08 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	2f9a80d803	merging sqlite and bs branches Since the sqlite branch uses blobs extensively, there are some performance benefits, ByteStrings now get stored and retrieved w/o conversion in some cases like in Database.Export.	2019-12-06 15:30:45 -04:00
Joey Hess	5f391179f1	use RawFilePath getFileStatus for speed Only done on those calls to getFileStatus that had a RawFilePath, not a FilePath. The others would probably be just as fast if converted to use it with toRawFilePath, but I'm not 100% sure. Note that genInodeCache' uses fromRawFilePath, but that value only gets used on Windows, so on unix the thunk will never be evaluated.	2019-12-06 14:44:42 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	6535aea49a	optimisation This was already optimised before, but profiling found that delEntry was around 1.5% of the total runtime of git-annex whereis. It was being called once per environment variable per file processed. Fixed by better caching. Since withIndexFile is almost always run with the same .git/annex/index file, it can cache the modified environment, rather than re-modifying it each time called.	2019-12-04 14:27:11 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	c756006374	fix hacked up AutoMerge module to work again	2019-12-02 10:51:43 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agony of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically (num files: old, new, speedup): 48500: 4.77, 3.73, 28%; 12500: 1.36, 1.02, 66%; 20: 0.075, 0.074, 0% (so startup time is unchanged). That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	ddf6973d22	minor optimisation avoid repeated scan of the same bytestring	2019-11-22 19:13:05 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	d4661959de	Merge branch 'master' into sqlite	2019-11-21 17:26:50 -04:00
Joey Hess	740e0ddbfe	avoid running scanUnlockedFiles in bare repo It's not necessary. And if the bare repo somehow has a pointer file in it with the same name as a file in HEAD, that file would be populated, which would be surprising since the file is not really under git's control.	2019-11-21 14:31:12 -04:00
Joey Hess	5877de5e80	git-lfs: remember urls, and autoenable remotes using known urls * git-lfs: The url provided to initremote/enableremote will now be stored in the git-annex branch, allowing enableremote to be used without an url. initremote --sameas can be used to add additional urls. * git-lfs: When there's a git remote with an url that's known to be used for git-lfs, automatically enable the special remote.	2019-11-18 16:09:09 -04:00

1988 commits