git-annex

Author	SHA1	Message	Date
Joey Hess	189fb05ffb	Added annex.adviceNoSshCaching config. Sponsored-by: Brock Spratlen on Patreon	2021-05-27 12:37:49 -04:00
Joey Hess	b5f5475ed6	New matching options --excludesamecontent and --includesamecontent The normalisation of filenames turns out to be the tricky part here, because the associated files coming out of the keys db may look like "./foo/bar" or "../bar". For the former to match a glob like "foo/", it needs to be normalised. Note that, on windows, normalise "./foo/bar" = "foo\\bar" which a glob like "foo/" won't match. So the glob is matched a second time, on the toInternalGitPath, so allowing the user to provide a glob with the slashes in either direction. However, this still won't support some wacky edge cases like the user providing a glob of "foo/bar\\*" Sponsored-by: Dartmouth College's Datalad project	2021-05-25 13:08:18 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	7029ef1c3d	improve changelog	2021-05-25 10:08:29 -04:00
Joey Hess	5d18994736	clearer language	2021-05-24 14:54:51 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminate preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	c525d18cf7	filter-branch: New command, useful to produce a filtered version of the git-annex branch, eg when splitting a repository	2021-05-17 14:16:46 -04:00
Joey Hess	8b6dad11a2	add createMessage init: When annex.commitmessage is set, use that message for the commit that creates the git-annex branch. This will be used by filter-branch too, and it seems to make sense to let annex.commitmessage affect it.	2021-05-17 13:07:47 -04:00
Joey Hess	947d2a10bc	assistant: Fix a crash on startup by avoiding using forkProcess ghc 8.8.4 seems to have changed something that broke code that has been successfully using forkProcess since 2012. Likely a change to GC internals. Since forkProcess has never had clear documentation about how to use it safely, avoid using it at all. Instead, when git-annex needs to daemonize itself, re-run the git-annex command, in a new process group and session. This commit was sponsored by Luke Shumaker on Patreon.	2021-05-12 15:08:03 -04:00
Joey Hess	675556fd9a	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache. git-annex add was changed, when adding a small file, to remove the inode cache for it. This is necessary to keep the recipe in doc/tips/largefiles.mdwn for converting from annex to git working. It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn which the earlier try at this change introduced.	2021-05-10 13:20:10 -04:00
Joey Hess	72a8bbce12	Revert "smudge: check for known annexed inodes before checking annex.largefiles" This reverts commit `424bef6b6f`. This commit caused other buggy behavior unfortunately.	2021-05-10 12:20:13 -04:00
Joey Hess	921753ac44	reinject: Error out when run on a file that is not annexed rather than silently skipping it	2021-05-07 13:31:03 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unnecessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	424bef6b6f	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unnecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache.	2021-05-03 13:26:32 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	27e5f3cd52	releasing package git-annex version 8.20210428	2021-04-28 12:16:45 -04:00
Joey Hess	0f73b6d03a	Avoid more than 1 gpg password prompt at the same time Which could happen occasionally before when concurrency is enabled. While not much of a problem when it did happen, better to avoid it. Also, since it seems likely the gpg-agent sometimes fails in such a situation, this makes it not happen when running a single git-annex command with concurrency enabled. This commit was sponsored by Jake Vosloo on Patreon.	2021-04-27 16:36:44 -04:00
Joey Hess	a166d2520b	check mincopies is satisfied even when numcopies is known to be satisfied I had been assuming that numcopies would be a larger or at most equal to mincopies, so no need to check both. But users get confused and use configs that don't really make sense, so make sure to handle mincopies being larger than numcopies. Also add something to the mincopies man page to discourage this misconfiguration. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-04-27 13:37:18 -04:00
Joey Hess	d3e49b210a	git-annex-config: Allow setting annex.securehashesonly Which has otherwise been supported since 2019, but was missing from the list of allowed repo-global configs. Reordered the list to match the order in the git-annex-config man page, to make them easy to cross-compare. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-04-26 13:50:37 -04:00
Joey Hess	8e24fb3507	update	2021-04-26 13:12:51 -04:00
Joey Hess	2b264b3edf	initremote --private	2021-04-23 14:47:46 -04:00
Joey Hess	0547884eb2	importfeed: fix bug while also speeding up 12x! * Fix bug that could make git-annex importfeed not see recently recorded state when configured with annex.alwayscommit=false. * importfeed: Made "checking known urls" phase run 12 times faster. The massive speedup is because it no longer queries for metadata accompanying each url. Instead it processes the whole git-annex branch and checks all metadata files for feed item ids, and uses any it finds. This could result in a behavior change, in an unlikely situation: If a feed id is recorded in a key's metadata, but the url gets removed, the old code would not see that item id and would re-download it if it finds an url for it in a feed, while the new code will see the item id. I don't think the old behavior was intentional, and it may be that the new behavior is better. Not gonna worry about this.	2021-04-23 12:36:56 -04:00
Joey Hess	6eb3c0a6b4	fix branch precaching bug by checking journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. When not using --all, precaching only gets triggered when the command actually needs location logs, and so there's no speed hit there. This is a minor speed hit for --all, because it precaches even when the location log is not actually going to be used, and so checking the journal is not necessary. It would have been possible to defer checking the journal until the cache gets used. But that would complicate the usual Branch.get code path with two different kinds of caches, and the speed hit is really minimal. A better way to speed up --all, later, would be to avoid precaching at all when the location log is not going to be used.	2021-04-21 14:02:15 -04:00
Joey Hess	e1a9b79fa6	fix hardcoded origin name in checkAdjustedClone init: Fix a crash when the repo was cloned from a repo that had an adjusted branch checked out, and the origin remote is not named "origin". The only other hardcoding of the name of origin is in: - Upgrade.V2, which can be ignored probably - Annex.Branch, which doesn't fail if it has some other name, but just doesn't set up the git-annex branch with quite as linear a history in that case.	2021-04-14 18:53:27 -04:00
Joey Hess	4b048ca042	directory CoW on store Not for exports to directory yet though.	2021-04-14 15:11:00 -04:00
Joey Hess	7bb93896af	directory CoW on retrieve directory: When cp supports reflinks, use it when getting content from a directory special remote. Not yet for imports from directory though, and not for store. Note that, when it's chunked, using cp --reflink would not speed it up, and when reflink was not supported, would unnecessarily write the chunk to a file before reading it back in. So, only using a fileRetriever in the NoChunks case is necessary to keep chunking fast. fileCopier is told not to verify, because the special remote interface does not yet support verification in passing. AFAICS, fileCopier can never return False when not verifying so the added giveup should never actually happen.	2021-04-14 15:05:12 -04:00
Joey Hess	5783a8d081	fsck: avoid redundant checksum when transfer is Verified When downloading content from a remote, if the content is able to be verified during the transfer, skip checksumming it a second time. Note that in this case, the fsck output does not include "(checksum)" which it does when the checksumming is done separately from the download. This commit was sponsored by Brock Spratlen on Patreon.	2021-04-14 13:22:54 -04:00
Joey Hess	8e7dc958d2	forget: Preserve currently exported trees Avoiding problems with exporttree remotes in some unusual circumstances. This commit was sponsored by Brett Eisenberg on Patreon.	2021-04-13 15:00:23 -04:00
Joey Hess	805d325a8d	diffdriver: Support unlocked files	2021-04-08 14:32:09 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	ced91b3fbd	Avoid excess commits to the git-annex branch when stall detection is enabled When git-annex transferrer started up, and the journal contained something, it would commit it to the git-annex branch. This caused excess commits to the branch, in cases where normally several changes would be journalled and committed together. That generated some excess git objects and was also just noisy on stdout. Since transferrer uses enableInteractiveBranchAccess, it does not need to commit journalled changes, since the optimisation that avoids checking the journal when reading from the branch is disabled for processes that call that. This commit was sponsored by Svenne Krap on Patreon.	2021-04-02 11:57:18 -04:00
Joey Hess	8868a3a4c7	Fix build with persistent-2.12.0.1 persistent stopped using askLogFunc, and the thing to use is askLoggerIO from monad-logger. Bumped the dep to the first version that contained that. Note that the i386ancient build uses a newer monad-logger than 0.3.10, so the new versioned dep should not break it, and presumably nothing else either. This commit was sponsored by Noam Kremen on Patreon.	2021-04-01 12:21:02 -04:00
Joey Hess	315a81e3c6	releasing package git-annex version 8.20210330	2021-03-30 14:33:28 -04:00
Joey Hess	4611813ef1	Fix bug importing from a special remote into a subdirectory more than one level deep Which generated unusual git trees that could confuse git merge, since they incorrectly had 2 subtrees with the same name. Root of the bug was a) not testing that at all! but also b) confusing graftdirs, which contains eg "foo/bar" with non-recursively read trees, which would contain eg "bar" when reading a subtree of "foo". It's worth noting that Annex.Import uses graftTree, but it really shouldn't have needed to. Eg, when importing into foo/bar from a remote, it's enough to generate a tree of foo/bar/x, foo/bar/y, and does not include other files that are at the top of the master branch. It uses graftTree, so it does include the other files, as well as the foo/bar tree. git merge will do the same thing for both trees. With that said, switching it away from graftTree would result in another import generating a new commit that seems to delete files that were there in a previous commit, so it probably has to keep using graftTree since it used it before. This commit was sponsored by Kevin Mueller on Patreon.	2021-03-26 16:04:36 -04:00
Joey Hess	f085ae4937	borg: Support importing files that are hard linked in the borg backup Note that a key with no size field that is hard linked will result in listImportableContents reporting a file size of 0, rather than the actual size of the file. One result is that the progress meter when getting the file will seem to get stuck at 100%. Another is that the remote's preferred content expression, if it tries to match against file size, will treat it as an empty file. I don't see a way to improve the latter behavior, and the former behavior is a minor enough problem. This commit was sponsored by Jake Vosloo on Patreon.	2021-03-26 13:29:34 -04:00
Joey Hess	31eb5fddf3	borg: Fix a bug that prevented importing keys of type URL and WORM Keys stored on the filesystem are mangled by keyFile to avoid problem chars. So, that mangling has to be reversed when parsing files from a borg backup back to a key. The directory special remote also so mangles them. Some other special remotes do not; eg S3 just serializes the key -- but S3 object names are not limited to filesystem valid filenames anyway, so a S3 server must not map them directly to files in any case. It seems unlikely that a borg backup of some such special remote will get broken by this change. This commit was sponsored by Graham Spencer on Patreon.	2021-03-26 12:07:00 -04:00
Joey Hess	537f9d9a11	Improved display of errors when accessing a git http remote fails. New error message: Remote foo not usable by git-annex; setting annex-ignore http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1 If git config parse fails, or the git config file is not available at the url, a better error message for that is also shown. This commit was sponsored by Mark Reidenbach on Patreon.	2021-03-24 14:19:32 -04:00
Joey Hess	4631d1ab56	Fix build with attoparsec-0.14 It changed parseOnly in the ByteString.Lazy module to take a lazy, not strict ByteString. In all these cases though, we actually had a strict ByteString, so the most efficient fix, which also happens to avoid needing ifdefs, is to use the non-lazy module instead. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-03-24 12:11:50 -04:00
Joey Hess	5d78cd9d08	Sped up git-annex init in a clone of an existing repository Seems that hasOrigin was never finding origin's git-annex branch, so a new one got created each time. And so then it later needed to merge the two branches, which is expensive. Added --no-track to git branch to avoid it displaying a message about setting up tracking branches. Of course there's no reason to make the git-annex branch a tracking branch since git-annex auto-merges it.	2021-03-23 15:23:13 -04:00
Joey Hess	798f685077	New annex.supportunlocked config Can be set to false to avoid some expensive things needed to support unlocked files. See my comment for why this only controls what init sets up, and not other behavior. I didn't bother with making the v5 upgrade code path look at this, though it easily could, because the docs say to run git-annex init after setting it to make it take effect.	2021-03-23 14:04:34 -04:00
Joey Hess	c68ba7d893	whereis: Don't include yt: prefix when showing url to content retrieved with youtube-dl I don't think this was really intentional behavior. It may be that it was useful to include it so it could be passed to rmurl, since without it rmurl would not actually remove the url. Since that was changed earlier today, now seems like a good time to clean up the display of these urls. This commit was sponsored by Jochen Bartl on Patreon.	2021-03-22 19:56:24 -04:00
Joey Hess	637229c593	fix fsck --from --all to not fall over trying to check required content fsck: When --from is used in combination with --all or similar options, do not verify required content, which can't be checked properly when operating on keys. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-03-22 15:08:07 -04:00
Joey Hess	5545e78a1e	Make --debug also enable debugging in child git-annex processes Especially necessary with stalldetection using child processes for transfers. This commit was sponsored by Jack Hill on Patreon.	2021-03-22 14:25:28 -04:00
Joey Hess	5d75cbcdcf	webdav: deal with buggy webdav servers in renameExport box.com already had a special case, since its renaming was known buggy. In its case, renaming to the temp file succeeds, but then renaming the temp file to final destination fails. Then this 4shared server has buggy handling of renames across directories. While that was already worked around for the temp files when storing exports, which are now in the same directory as the final filename, it also affected renameExport when the file moves between directories. I'm not entirely clear what happens on the 4shared server when it fails this way. It kind of looks like it may rename the file to destination and then still fail. To handle both, when rename fails, delete both the source and the destination, and fall back to uploading the content again. In the box.com case, the temp file is the source, and deleting it makes sure the temp file gets cleaned up. In the 4shared case, the file may have been renamed to the destination and so cleaning that up avoids any interference with the re-upload to the destination.	2021-03-22 13:08:18 -04:00
Joey Hess	0af9d1dcb6	unregisterurl: remove all forms of an url, no matter what the downloader is set to unregisterurl: Fix a bug that caused an url to not be unregistered when it is claimed by a special remote other than the web. See commit `f175d4cc90` for rationale.	2021-03-22 12:17:17 -04:00
Joey Hess	f175d4cc90	rmurl: remove all forms of an url, no matter what the downloader is set to * rmurl: When youtube-dl was used for an url, it no longer needs to be prefixed with "yt:" in order to be removed. * rmurl: If an url is both used by the web and also claimed by another special remote, fix a bug that caused the url to not be removed. The youtube-dl change is a consequence of how the bug fix is implemented. But I also think it's the right thing to do. Consider that, before, git-annex addurl $url followed by git-annex rmurl $url would not remove the url in the case where youtube-dl was used. That was surprising behavior. In the unlikely case where a special remote claims an url, and it's been added using OtherDownloader, but it was also added already as a web url, it seems better for rmurl to remove both than to arbitrarily remove only one. And in the case the bug report was filed for, when an url was added as a web url, but a special remote now claims it, that should not prevent rmurl removing the web url. Calling setUrlMissing lets other callers of it behave differently. Probably the calls to it in eg, Remote.External and Remote.BitTorrent are fine, since they don't mangle the url and just remove what was provided, and the OtherDownloader form of a bittorrent url, respectively. I suspect unregisterurl needs to have a similar change made to rmurl, for similar reasons.	2021-03-22 12:09:15 -04:00
Joey Hess	9856e10d3c	call out behavior change	2021-03-22 11:34:23 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, webdav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	6481991208	export --json: Fill in the file field Like import was using ActionItemWorkTreeFile, it's ok to use it for export, even though it might not correspond with a file in the work tree. And renamed it to ActionItemTreeFile to make that clearer. Note that when an export has to rename files, it still uses ActionItemOther, so file will still be null in that case, but as no file is being transferred, that seems ok.	2021-03-12 14:11:31 -04:00
Joey Hess	1cb154f457	avoid import deleting submodule import: When the previously exported tree contained a submodule, preserve it in the imported tree so it does not get deleted. The export exclude log, which was used for non-preferred content, now also includes the submodules. Since the log format is git ls-tree output, this does not break backwards compatibility.	2021-03-12 13:31:21 -04:00

1186 commits