git-annex

Author	SHA1	Message	Date
Joey Hess	af1a45c69c	use replaceWorkTreeFile when fixing an annex symlink This does not change any behavior, but it's useful for all worktree changes to be made using this. Sponsored-by: Graham Spencer on Patreon	2022-06-22 13:41:41 -04:00
Joey Hess	f259be7f39	fix overwrite race with small file that got large When adding a small file, it does not get locked down, so can be modified after git-annex checks that it's small. The use of queued git add made the race window nice and wide too. Fixed by checking if the file has changed, and by not using git add. Instead, have to recapitulate git add's handling of things like symlinks and executable files. Sponsored-by: Jochen Bartl on Patreon	2022-06-14 16:38:56 -04:00
Joey Hess	78a3d44ea0	get rid of racy addLink The remaining callers all did not rely on it checking gitignore, so were easy to convert. They were susceptable to the same overwrite race as add and fix, although less likely to have it and a narrower window than add's race. Command.Rekey in passing got an unncessary call to removeFile deleted. addSymlink handles deleting any existing worktree file.	2022-06-14 14:47:15 -04:00
Joey Hess	64c7f60f7a	fixed overwrite race with git-annex fix Similar to git-annex add, git-annex fix queued git add, so if a file got modified before git add ran, the wrong content would be staged, perhaps a large file content. Sponsored-by: Brock Spratlen on Patreon	2022-06-14 14:19:58 -04:00
Joey Hess	5ef79125ad	fix overwrite race with git-annex add of annex symlink In the unlikely case where git-annex add is run on an annex symlink that is not already added, and while it's processing it, the annex symlink is overwritten with something else, avoid git-annex overwriting that with the symlink again. Sponsored-by: Jack Hill on Patreon	2022-06-14 14:00:13 -04:00
Joey Hess	dd6dec4eb1	fix add overwrite race with git-annex add to annex This is not a complete fix for all such races, only the one where a large file gets changed while adding and gets added to git rather than to the annex. addLink needs to go away, any caller of it is probably subject to the same kind of race. (Also, addLink itself fails to check gitignore when symlinks are not supported.) ingestAdd no longer checks gitignore. (It didn't check it consistently before either, since there were cases where it did not run git add!) When git-annex import calls it, it's already checked gitignore itself earlier. When git-annex add calls it, it's usually on files found by withFilesNotInGit, which handles checking ignores. There was one other case, when git-annex add --batch calls it. In that case, old git-annex behaved rather badly, it would seem to add the file, but git add would later fail, leaving the file as an unstaged annex symlink. That behavior has also been fixed. Sponsored-by: Brett Eisenberg on Patreon	2022-06-14 13:37:19 -04:00
Joey Hess	6d0b243d9d	avoid cleaning up move log when drop from remote fails move: Improve resuming a move that succeeded in transferring the content, but where dropping failed due to eg a network problem, in cases where numcopies checks prevented the resumed move from dropping the object from the source repository. This was earlier done for moves that got interrupted during the drop stage. Sponsored-by: Svenne Krap on Patreon	2022-06-09 15:26:25 -04:00
Joey Hess	c59ea5b1ca	info: Added --autoenable option Use cases include using git-annex init --no-autoenable and then going back and enabling the special remotes that have autoenable configured. As well as just querying to remember which ones have it enabled. It lists all special remotes that have autoenable=yes whether currently enabled or not. And it can be used with --json. I pondered making this "git-annex info autoenable", but that seemed wrong because then if the use has a directory named "autoenable", it's unclear what they are asking for. (Although "git-annex info remote" may be similarly unclear.) Making it an option does mean that it can't be provided via --batch though. Sponsored-by: Dartmouth College's Datalad project	2022-06-01 14:20:38 -04:00
Joey Hess	0d50c90794	init: Added --no-autoenable option Someone may disagree with what repositories are set to autoenable and it's good to have local overrides. See https://github.com/datalad/datalad/issues/6634 Sponsored-by: Dartmouth College's Datalad project	2022-06-01 13:27:49 -04:00
Joey Hess	aa414d97c9	make fsck normalize object locations The purpose of this is to fix situations where the annex object file is stored in a directory structure other than where annex symlinks point to. But it will also move object files from the hashdirmixed back to hashdirlower if the repo configuration makes that the normal location. It would have been more work to avoid that than to let it do it. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:38:06 -04:00
Joey Hess	5a98f2d509	avoid creating content directory when locking content If the content directory does not exist, then it does not make sense to lock the content file, as it also does not exist, and so it's ok for the lock operation to fail. This avoids potential races where the content file exists but is then deleted/renamed, while another process sees that it exists and goes to lock it, resulting in a dangling lock file in an otherwise empty object directory. Also renamed modifyContent to modifyContentDir since it is not only necessarily used for modifying content files, but also other files in the content directory. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 12:34:56 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrievin from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	8675b2b075	rename memoryUnits It's not just used for memory sizes.	2022-05-05 15:35:11 -04:00
Joey Hess	fd65de0eb9	multicast: Support uftp 5.0 by switching from aes256-cbc to aes256-gcm aes256-gcm is supported by both 4.x and 5.x, while 5.x dropped aes256-cbc. Sponsored-by: Graham Spencer on Patreon	2022-04-19 12:02:10 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	6079b0c72c	fix reversion add: Avoid unncessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surpises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project	2022-03-21 15:54:04 -04:00
Joey Hess	a314a8dfd0	add back lost packString The patch that removed it did not break anything, since the strings it's used on are all ASCII not unicode. But I like making sure to use packString everywhere just in case the code later changes in a way that needs it.	2022-03-02 18:22:38 -04:00
sternenseemann	ca596e7c54	allow building with aeson >= 2.0 In aeson 2.0, Text has been replaced by the Key type and HashMap by the KeyMap interface. Accomodating this required adding some CPP in order to still be able to compile with aeson < 2.0. The required changes were: * Prevent Key from being re-exported by Utilities.Aeson, as it clashes with git-annex's own Key type. * Fix up convertion from String/Text to Key (or Text in aeson 1.) in a couple of places Import Data.Aeson.KeyMap instead of Data.HashMap.Strict, as they are mostly API-compatible. insertWith needs to be replaced by unionWith, however, as KeyMap lacks the former function.	2022-03-02 18:01:41 -04:00
Joey Hess	952664641a	turn of PackageImports in cabal file This makes it easier to build eg benchmarks of individual modules. May be that most of these PackageImports are not really necessary, dunno.	2022-02-25 13:16:36 -04:00
Joey Hess	64ccb4734e	smudge: Warn when encountering a pointer file that has other content appended to it It will then proceed to add the file the same as if it were any other file containing possibly annexable content. Usually the file is one that was annexed before, so the new, probably corrupt content will also be added to the annex. If the file was not annexed before, the content will be added to git. It's not possible for the smudge filter to throw an error here, because git then just adds the file to git anyway. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 15:17:08 -04:00
Joey Hess	5b373a9dd2	read a consistent amount from pointer file A few places were reading the max symlink size of a pointer file, then passing tp parseLinkTargetOrPointer. Which is fine currently, but to support pointer files with lines of data after the pointer, enough has to be read that parseLinkTargetOrPointer can be assured of seeing enough of that data to know if it's correctly formatted. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 12:52:34 -04:00
Joey Hess	ce1b3a9699	info: Allow using matching options in more situations File matching options like --include will be rejected in situations where there is no filename to match against. (Or where there is a filename but it's not relative to the cwd, or otherwise seemed too bothersome to match against.) The addition of listKeys' was necessary to avoid using more memory in the common case of "git-annex info". Adding a filterM would have caused the list to buffer in memory and not stream. This is an ugly hack, but listKeys had previously run Annex operations inside unafeInterleaveIO (for direct mode). And matching against a matcher should hopefully not change any Annex state. This does allow for eg `git-annex info somefile --include=*.ext` although why someone would want to do that I don't really know. But it seems to make sense to allow it. But, consider: `git-annex info ./somefile --include=somefile` This does not match, so will not display info about somefile. If the user really wants to, they can `--include=./somefile`. Using matching options like --copies or --in=remote seems likely to be slower than git-annex find with those options, because unlike such commands, info does not have optimised streaming through the matcher. Note that `git-annex info remote` is not the same as `git-annex info --in remote`. The former shows info about all files in the remote. The latter shows local keys that are also in that remote. The output should make that clear, but this still seems like a point where users could get confused. Sponsored-by: Jochen Bartl on Patreon	2022-02-21 14:46:07 -04:00
Joey Hess	c68f52c6a2	restage pointer file after unlock This avoids a later git status or similar taking a long time to run as it runs git-annex smudge once per file. While v9 repositories do avoid that taking long when the files are small, large files can still make git status take a very long time. This does make unlock slower, because now git-annex smudge is being run once per file unlocked. However, the next commit should speed that up in many cases. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-02-18 14:55:52 -04:00
Joey Hess	0edf01d7d4	registerurl,unregisterurl: rework output and support --json * registerurl, unregisterurl: Improved output when reading from stdin to be more like other batch commands. * registerurl, unregisterurl: Added --json and --json-error-messages options. Note that this did change the --batch output in a way that could possibly break something that expected the old output to never change. I think it's acceptable to break that because there has never been a guarantee of unchanging output format except with --batch for most commands. The old output was just really weird too! One possible wart is that "git-annex registerurl" with no options now seems to just hang, since it's waiting for stdin input. Before, it said "registerurl (stdin)" which was clearer about what's happenening. But this is a deprecated mode anyway, --batch makes clear what's happening. If anything, this problem would be a reason to eventually remove the support for reading from stdin w/o --batch. Sponsored-by: Dartmouth College's Datalad project	2022-02-14 13:29:20 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	4f7b8ce09d	fix spelling of upgradeable	2022-01-19 12:14:50 -04:00
Joey Hess	3936599885	move code from Command.Fsck Sponsored-by: Dartmouth College's Datalad project	2022-01-13 13:24:50 -04:00
Joey Hess	e416635021	renameremote: Better handling of case where there are multiple special remotes with a name Instead of renaming one at random, error out and ask that a uuid be specified. Sponsored-by: Brett Eisenberg on Patreon	2022-01-05 15:24:02 -04:00
Joey Hess	58afb00f6e	enableremote: Better handling of the unusual case where multiple special remotes have been initialized with the same name Before it would pick one at random, though preferring ones that were not dead over dead ones. Now, if one is dead and the other not, it will use the non-dead one. But if both are not dead, or both dead, it will error out, suggesting the user clarify what they want to enable. Sponsored-by: Luke Shumaker on Patreon	2022-01-05 15:12:11 -04:00
Joey Hess	7e2f5edd68	avoid exporting non-annexed symlinks So that importing does not replace them with plain files. This works similarly to how the previous handling of submodules and matchers did, except that annexed symlinks still get exported as plain files of course, it's only non-annexed symlinks that it does not make sense to export. When symlinks have previously been exported, updating the export will unexport them after upgrading to this commit. Sponsored-by: Kevin Mueller on Patreon	2022-01-03 14:21:50 -04:00
Joey Hess	f8ebd0363b	complete the magic wormhole pairin appid transition Started in 2017 in commit `3fe9d99f24`. Starting tomorrow, all versions of git-annex since then will provide an appid, and so it will no longer be necessary to check the date. Sponsored-by: Nicholas Golder-Manning on Patreon	2021-12-30 12:16:22 -04:00
Joey Hess	b1d719f9d2	handle transitions with read-only unmerged git-annex branches Capstone to this feature. Any transitions that have been performed on an unmerged remote ref but not on the local git-annex branch, or vice-versa have to be applied on the fly when reading files. Sponsored-by: Dartmouth College's Datalad project	2021-12-28 13:23:32 -04:00
Joey Hess	058193adc6	prevent git-annex log with read-only unmerged git-annex branches It would display incomplete information, which would differ from the information displayed with write access. So refuse to display anything. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 15:44:15 -04:00
Joey Hess	23a485498f	handle Annex.Branch.files with read-only unmerged git-annex branches It would be difficult to make Annex.Branch.files query the unmerged git-annex branches. Might be possible, similar to what was discussed in `7f6b2ca49c` but again I decided to make it not do anything in that situation to start with before adding such a complicated thing. git-annex info uses it when getting info about a repostory. The choices were to make that fail with an error, or display the info it can, and change the output slightly for the bits of info it cannot access. While that is a behavior change, and I want to avoid any behavior changes due to unmerged git-annex branches in a read-only repo, displaying a message that is not a number seems unlikely to break anything that was consuming a number, any worse than throwing an exception would. Probably. Also git-annex unused --from origin is made to throw an error, but it would fail later anyway when trying to write to the unused log files. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 15:28:31 -04:00
Joey Hess	7f6b2ca49c	handle overBranchFileContents with read-only unmerged git-annex branches This makes --all error out in that situation. Which is better than ignoring information from the branches. To really handle the branches right, overBranchFileContents would need to both query all the branches and union merge file contents (or perhaps not provide any file content), as well as diffing between branches to find files that are only present in the unmerged branches. And also, it would need to handle transitions.. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:30:51 -04:00
Joey Hess	5ff55f622d	improve sync message in export edge case sync: Better error message when unable to export to a remote because remote.name.annex-tracking-branch is configured to a ref that does not exist. It does not suggest how to fix the problem because there are several possible solutions: Change the git config to point to something that does exist, git add some files, or put files on the special remote that will be imported and so populate the ref. I considered just silently not doing anything, which is what it does when annex-tracking-branch = master and nothing has been committed to master yet. But it seems better to be explicit about it, since this is a fairly confusing situation to find yourself in. Sponsored-By: Max Thoursie on Patreon	2021-12-23 14:45:01 -04:00
Joey Hess	567f63ba47	export: Avoid unncessarily re-exporting non-annexed files that were already exported Commit `b6e4ed9aa7` made non-annexed files be re-uploaded every time, since they're not tracked in the location log, and it made it check the location log. Don't do that for non-annexed files. Sponsored-by: Brock Spratlen on Patreon	2021-11-29 14:02:38 -04:00
Joey Hess	01a5ee6998	addurl, youtube-dl: When --check-raw prevents downloading an url, still continue with any downloads that come after it, rather than erroring out Sponsored-By: Mark Reidenbach on Patreon	2021-11-28 19:40:06 -04:00
Joey Hess	1d513540e9	Fix build with old versions of feed library	2021-11-23 16:06:51 -04:00
Joey Hess	31be0770a5	importfeed: Display url before starting youtube-dl download It was displaying a blank line before.	2021-11-17 13:23:55 -04:00
Joey Hess	86fa460ce2	better wording	2021-11-17 12:48:28 -04:00
Joey Hess	332385a117	use parseFeedFromFile to avoid mojibake As mentioned in commit `2bd778a46e`, there was mojibake when LANG=C. Looking at parseFeedFromFile, it is very particular to read the file as unicode. parseFeedString looks like it will accept any old String, but a String that was read using the filesystem encoding will not in fact have the right encoding. I think this is a bug in the feed library and will file one. Sponsored-by: Svenne Krap on Patreon	2021-11-15 15:31:02 -04:00
Joey Hess	2bd778a46e	importfeed: Fix a crash when used in a non-unicode locale See comment for analysis. At first I thought I'd need to convert all T.unpack in git-annex, but luckily not -- so long as the Text is read from a file, the filesystem encoding is applied and T.unpack is fine. It's only when using Feed that the filesystem encoding is not applied. While this fixes the crash, it does result in some mojibake, eg: itemid=http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-��-questions/ Have not tracked that down, but it must be unrelated, because I've verified that it roundtrips when using encodeUf8: joey@darkstar:~/src/git-annex>LANG=C ghci Utility/FileSystemEncoding.hs ghci> useFileSystemEncoding ghci> Just f <- Text.Feed.Import.parseFeedFromFile "/home/joey/tmp/career_tools_podcasts.xml" ghci> Just (_, x) = Text.Feed.Query.getItemId (Text.Feed.Query.feedItems f !! 0) ghci> decodeBS (Data.Text.Encoding.encodeUtf8 x) "http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-\56546\56448\56467-questions/" ghci> writeFile "foo" $ decodeBS (Data.Text.Encoding.encodeUtf8 x) Writes a file containing the ENDASH character. Sponsored-by: Jochen Bartl on Patreon	2021-11-15 15:04:21 -04:00
Joey Hess	889e771357	display error message if unable to run youtube-dl This would have made the typo of the command name that was just fixed obvious earlier, when --no-raw was used to force using it.	2021-11-13 09:07:43 -04:00
Joey Hess	51b73ea1fc	migrate: New --remove-size option While intended for converting URL keys added by addurl --fast to be as if added by addurl --relaxed, it can also be used to remove size from other types of keys. Although that is not likely to be useful for checksummed keys, I suppose it could be used for WORM or other non-checksum keys. Specifying the --remove-size option does not prevent other migrations from taking effect if there's a key upgrade to perform, or if the backend has changed. So --backend=URL needs to be used to prevent migrating an URL key to the default backend. Note that it's not possible to use git-annex migrate to convert from a non-URL key to an URL key, as URL keys cannot be generated, except by addurl. So while this can get the same effect as --relaxed would have when addurl --fast was used, when --fast was not used, it won't work, or if --backend=URL is not used will remove the size but not prevent checksum verification, which is not useful. Due to this complexity, I decided not to mention it in the git-annex addurl man page. Sponsored-by: Jochen Bartl on Patreon	2021-11-12 13:28:28 -04:00
Joey Hess	9d3ce224e3	uninit edge cases * uninit: Avoid error message when no commits have been made to the repository yet. * uninit: Avoid error message when there is no git-annex branch. Sponsored-by: Svenne Krap on Patreon	2021-11-08 16:47:00 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	07158b7cf6	shorten synopsis This is to avoid the display being too wide.	2021-11-04 14:33:07 -04:00
Joey Hess	438e5b56aa	tighter --json parsing for metadata metadata --batch --json: Reject input whose "fields" does not consist of arrays of strings. Such invalid input used to be silently ignored. Used to be that parseJSON for a JSONActionItem ran parseJSON separately for the itemAdded, and if that failed, did not propagate the error. That allowed different items with differently named fields to be parsed. But it was actually only used to parse "fields" for metadata, so that flexability is not needed. The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed only because of the wonky way Command.MetaData adds onto the started json object. Note that this line got a dummy type signature added, just because the type checker needs it to be some type. itemFields = Nothing :: Maybe Bool Since it's Nothing, it doesn't really matter what type it is, and the value gets turned into json and is then thrown away. Sponsored-by: Kevin Mueller on Patreon	2021-11-01 14:42:37 -04:00
Joey Hess	80f1354685	metadata --batch: Avoid crashing when a non-annexed file is input Turns out that CommandStart actions do not have their exceptions caught, which is why the giveup was causing a crash. Mostly these actions do not do very much work on their own, but it does seem possible there are other commands whose CommandStart also throws an exception. So, my first attempt at a fix was to catch those exceptions. But, --json-error-messages then causes a difficulty, because in order to output a json error message, an action needs to have been started; that sets up the json object that the error message will be included in a field of. While it would be possible to output an object with just an error field, this would be json output of a format that the user has no reason to expect, that happens only in an exceptional circumstance. That is something I have always wanted to avoid with the json output; while git-annex man pages don't document what the json looks like, the output has always been made to be self-describing. Eg, it includes "error-messages":[] even when there's no errors. With that ruled out, it doesn't seem a good idea to catch CommandStart exceptions and display the error to stderr when --json-error-messages is set. And so I don't know if it makes sense to catch exceptions from that at all. Maybe I'd have a different opinion if --json-error-messages did not exist though. So instead, output a blank line like other batch commands do. This also leaves open the possibility of implementing support for matching object with metadata --json, which would also want to output a blank line when the input didn't match. Sponsored-by: Dartmouth College's DANDI project	2021-11-01 13:40:43 -04:00

1 2 3 4 5 ...

2562 commits