git-annex

Author	SHA1	Message	Date
Joey Hess	2d65c4ff1d	avoid unix-compat's rename On Windows, that does not support long paths https://github.com/jacobstanley/unix-compat/issues/56 Instead, use System.Directory.renamePath, which does support long paths. Sponsored-by: Dartmouth College's Datalad project	2022-07-12 14:55:02 -04:00
Joey Hess	201e41cffd	add: Fix reversion when adding an annex link that has been moved to another directory Fixes commit `f259be7f39` Sponsored-by: Dartmouth College's Datalad project	2022-07-05 16:22:41 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking effect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	f259be7f39	fix overwrite race with small file that got large When adding a small file, it does not get locked down, so can be modified after git-annex checks that it's small. The use of queued git add made the race window nice and wide too. Fixed by checking if the file has changed, and by not using git add. Instead, have to recapitulate git add's handling of things like symlinks and executable files. Sponsored-by: Jochen Bartl on Patreon	2022-06-14 16:38:56 -04:00
Joey Hess	5ef79125ad	fix overwrite race with git-annex add of annex symlink In the unlikely case where git-annex add is run on an annex symlink that is not already added, and while it's processing it, the annex symlink is overwritten with something else, avoid git-annex overwriting that with the symlink again. Sponsored-by: Jack Hill on Patreon	2022-06-14 14:00:13 -04:00
Joey Hess	dd6dec4eb1	fix add overwrite race with git-annex add to annex This is not a complete fix for all such races, only the one where a large file gets changed while adding and gets added to git rather than to the annex. addLink needs to go away, any caller of it is probably subject to the same kind of race. (Also, addLink itself fails to check gitignore when symlinks are not supported.) ingestAdd no longer checks gitignore. (It didn't check it consistently before either, since there were cases where it did not run git add!) When git-annex import calls it, it's already checked gitignore itself earlier. When git-annex add calls it, it's usually on files found by withFilesNotInGit, which handles checking ignores. There was one other case, when git-annex add --batch calls it. In that case, old git-annex behaved rather badly, it would seem to add the file, but git add would later fail, leaving the file as an unstaged annex symlink. That behavior has also been fixed. Sponsored-by: Brett Eisenberg on Patreon	2022-06-14 13:37:19 -04:00
Joey Hess	6079b0c72c	fix reversion add: Avoid unnecessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surprises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project	2022-03-21 15:54:04 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	ec12537774	defer write permissions checking in import until after copy to repo This should complete the fix started in `6329997ac4`, fixing the actual cause of the test suite failure this time. Sponsored-by: Dartmouth College's Datalad project	2021-09-02 13:45:21 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	949627b902	remove inode cache in unannex Similar to what commit `675556fd9a` did for adding a non-annexed file, this prevents the smudge clean filter recognising the inode if git add is later run on the unannexed file.	2021-05-12 11:09:38 -04:00
Joey Hess	675556fd9a	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache. git-annex add was changed, when adding a small file, to remove the inode cache for it. This is necessary to keep the recipe in doc/tips/largefiles.mdwn for converting from annex to git working. It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn which the earlier try at this change introduced.	2021-05-10 13:20:10 -04:00
Joey Hess	6481991208	export --json: Fill in the file field Like import was using ActionItemWorkTreeFile, it's ok to use it for export, even though it might not correspond with a file in the work tree. And renamed it to ActionItemTreeFile to make that clearer. Note that when an export has to rename files, it still uses ActionItemOther, so file will still be null in that case, but as no file is being transferred, that seems ok.	2021-03-12 14:11:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktree files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding an annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	e81bb05b25	add debug in two unusual situations	2020-11-09 17:52:06 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	3457b526ef	make git-annex add --no-check-gitignore not skip ignored files, same as with --force	2020-09-18 13:33:35 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Kyle Meyer	39131b55ca	add --force-small: Send all non-regular files through addFile Running `git annex add --force-small` on a modified submodule fails when the submodule path is fed to hash-object. This failure is unlikely to be triggered by a caller passing a submodule explicitly to `git annex add` because there's nothing useful that annex-add can do with a submodule. A more likely scenario for hitting this failure is that the caller passes "." or a subdirectory to `annex-add` while a submodule underneath the specified path happens to be modified. addSmallOverridden already routes symbolic links through addFile rather than using the custom hash-object/update-index call. The latter is valid only for regular files, so extend this condition so that everything that isn't a regular file goes through addFile. Doing so avoids the above error because submodules come in as directories.	2020-03-26 13:14:16 -04:00
Kyle Meyer	339aebc6ad	add --force-small: Don't dereference link when checking file status addSmallOverridden calls getFileStatus and then checks the result with isSymbolicLink. getFileStatus dereferences symbolic links, so isSymbolicLink will always return false (assuming the getFileStatus call doesn't fail on a broken link). Use getSymbolicLinkStatus instead.	2020-03-26 13:11:27 -04:00
Joey Hess	2cea674d1e	Merge branch 'master' into v8	2020-01-01 14:26:43 -04:00
Joey Hess	503788238c	add --force-annex/--force-git options make it easier to override annex.largefiles configuration (and potentially safer as it avoids bugs like the smudge bug fixed in the last release) Deleted some old comments that were posted to the man page discussing such options. Updated docs that used -c annex.largefiles to use the options. Note that addSmallOverridden was needed to avoid the clean filter running on the file. It would be possible to make addFile also update the index directly, rather than going via git add. However, it was not necessary, and I want to avoid breaking on some edge case, particularly if the code in addSmallOverridden has some oversight. Also, when annex.addunlocked is set and annex.largefiles does not match a file, git annex add --force-large works, but git status will then show the file as added, with an unstaged modification. The unstaged modification adds the file to git. This is identical behavior to using -c annex.largefiles=nothing when annex.addunlocked is set. This does not prevent committing what was intended to be added. I have not gotten to the bottom of why git thinks the file is modified and runs it through the clean filter in this case.	2020-01-01 14:03:06 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	37467a008f	annex.addunlocked expressions * annex.addunlocked can be set to an expression with the same format used by annex.largefiles, in case you want to default to unlocking some files but not others. * annex.addunlocked can be configured by git-annex config. Added a git-annex-matching-expression man page, broken out from tips/largefiles. A tricky consequence of this is that git-annex add --relaxed honors annex.addunlocked, but an expression might want to know the size or content of an url, which it's not going to download. I decided it was better not to fail, and just dummy up some plausible data in that case. Performance impact should be negligible. The global config is already loaded for annex.largefiles. The expression only has to be parsed once, and in the simple true/false case, it should not do any additional work matching it.	2019-12-20 15:56:25 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	a0168cd9a2	use RawFilePath getSymbolicLinkStatus for speed	2019-12-06 15:42:54 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	a51a479fb9	fix a couple warnings	2019-08-27 12:24:31 -04:00
Joey Hess	689d1fcc92	remove most remnants of direct mode A few remain, as needed for upgrades, and for accessing objects from remotes that are direct mode repos that have not been converted yet.	2019-08-26 16:27:48 -04:00
Joey Hess	42c386fc47	add: Display progress meter when hashing files. * add: Display progress meter when hashing files. * add: Support --json-progress option.	2019-06-25 13:12:47 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unnecessarily build the remote list (only when they're run with concurrency enabled), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unnecessarily passing it off to a worker thread, which incurs overhead that is significant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	2d33122215	avoid ingest lockdown file escaping the withOtherTmp call Fixes bug that caused git-annex to fail to add a file when another git-annex process cleaned up the temp directory it was using. Solution is just to push withOtherTmp out to a higher level, so that the whole ingest process can be completed inside it. But in the assistant, that was not practical to do, since withOtherTmp runs in the Annex monad and the assistant does not. Worked around by introducing a separate temp directory that only the assistant uses for lockdown. Since only one assistant can run at a time, it's easy to clean up that directory of old cruft at startup.	2019-05-07 13:04:57 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00


216 commits
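
Commit 339aebc6ad above describes a check that could never succeed: a status call that dereferences symbolic links was followed by a test for "is this a symbolic link". The sketch below illustrates the same pitfall in Python rather than in git-annex's own Haskell. The function names and the `demo` helper are hypothetical and for illustration only; `os.stat` stands in for the dereferencing call and `os.lstat` for the non-dereferencing one.

```python
import os
import stat
import tempfile


def is_symlink_following(path):
    # Analogue of the buggy check: os.stat follows the link, so the mode
    # describes the link's target and S_ISLNK is never true here.
    return stat.S_ISLNK(os.stat(path).st_mode)


def is_symlink_nofollow(path):
    # Analogue of the fixed check: os.lstat describes the link itself.
    return stat.S_ISLNK(os.lstat(path).st_mode)


def demo():
    # Create a regular file and a symlink to it in a scratch directory,
    # then run both checks against the symlink.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link")
        with open(target, "w") as f:
            f.write("content\n")
        os.symlink(target, link)
        return is_symlink_following(link), is_symlink_nofollow(link)


if __name__ == "__main__":
    print(demo())
```

On a platform that supports symlinks, only the `lstat`-based check reports the link. With a broken link, `os.stat` raises an error instead of returning a result, which matches the caveat in the commit message about the dereferencing call failing on a broken link.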