git-annex

Author	SHA1	Message	Date
Joey Hess	748addbe05	remove second pass in scanAnnexedFiles The pass was needed to populate files when annex.thin was set, but in commit `73e0cbbb19`, reconcileStaged started to do that. So, this second pass is not needed any longer.	2021-07-30 17:46:11 -04:00
Joey Hess	d2aead67bd	fsck: Detect and correct stale or missing inode caches for object files An easy way to see this in action is to have an unlocked file, and touch the object file. While all code that compares inode caches for object files needs to be prepared for this kind of problem and fall back to verification, having fsck notice it and correct it is cheap (as long as fsck is being run anyway) and ensures that if it happens for some unusual reason, there's a way for the user to notice that it's happening. Not that, when annex.thin is in use, the earlier call to isUnmodified (and also potentially earlier calls to inAnnex in eg, verifyLocationLog) will fix up the same problem silently. That might prevent the warning being displayed, although probably it still will be, because the Database.Keys write of the InodeCache will be queued but will not have happened yet. I can't see a way to improve this, but it's not great. Sponsored-by: Dartmouth College's Datalad project	2021-07-29 14:06:42 -04:00
Joey Hess	817ccbbc47	split verifyKeyContent This avoids it calling enteringStage VerifyStage when it's used in places that only fall back to verification rarely, and which might be called while in TransferStage and be going to perform a transfer after the verification.	2021-07-29 13:58:40 -04:00
Joey Hess	3c5280b1cf	improve comment wording	2021-07-29 13:21:23 -04:00
Joey Hess	72a13d2a5f	remove unused parameter	2021-07-29 13:12:11 -04:00
Joey Hess	14683da9eb	fix potential race in updating inode cache Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was already safe, but others use it directly on the work tree file, which was race-prone. Eg, if the work tree file was first removed, then linkFromAnnex called to populate it, the user could have re-written it in the interim. This came to light during an audit of all calls of addInodeCaches, looking for such races. All the other uses of it seem ok. Sponsored-by: Brett Eisenberg on Patreon	2021-07-27 13:08:08 -04:00
Joey Hess	e4b2a067e0	fix potential race in updating inode cache In Annex.Content, the object file was statted after pointer files were populated. But if annex.thin is set, once the pointer files are populated, the object file can potentially be modified via the hard link. So, it was possible, though seemingly very unlikely, for the inode of the modified object file to be cached. Command.Fix and Command.Fsck had similar problems, statting the work tree files after they were in place. Changed them to stat the temp file that gets moved into place. This does rely on .git/annex being on the same filesystem. If it's not, the cached inode will not be the same as the one that the temp file gets moved to. Result will be that git-annex will later need to do an expensive verification of the content of the worktree files. Note that the cross-filesystem move of the temp file already is a larger amount of extra work, so this seems acceptable. Sponsored-by: Luke Shumaker on Patreon	2021-07-27 12:29:10 -04:00
Joey Hess	4637592325	fix a place where the inode cache could potentially have gotten stale When git-annex lock repopulates the object file by copying an associated file that still has its content, it negected to update the inode cache. I was not able to actually get this code to successfully repopulate the object file; the associated file gets replaced with a dangling pointer before unlock is able to do that. (By what I'm not sure.. reconcileStaged?) Which might be itself a bug, but anyway this makes me doubtful that this was really leading to a stale inode cache. Still, in case there is some situation in which it does work, fixed it to update the inode cache.	2021-07-26 14:12:58 -04:00
Joey Hess	3d50b47ded	sync, merge: Added --allow-unrelated-histories option Which is the same as the git merge option. After last commit, this turns out to be needed in the test suite, and when doing git-annex import from special remote, followed by a git-annex merge. Sponsored-by: Svenne Krap on Patreon	2021-07-19 12:14:26 -04:00
Joey Hess	b6bea0d3f2	remove direct mode remnant of merging unrelated histories sync, merge, post-receive: Avoid merging unrelated histories, which used to be allowed only to support direct mode repositories. (However, sync does still merge unrelated histories when importing trees from special remotes, and the assistant still merges unrelated histories always.) See `556b2ded2b` for why this was added back in 2016, for direct mode. This is a behavior change, which might break something that was relying on sync merging unrelated histories, but git had a good reason to prevent it, since it's easy to foot shoot with it, and git-annex should follow suit. Sponsored-by: Noam Kremen on Patreon	2021-07-19 11:41:26 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their ususual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	274d2380c7	better key matching with a regexp Handles keys that are substrings of other keys, as well as pointer files that contain a newline after the key. Note that -S does not match regexp, while -G does by default. Docs are not clear, determined experimentally. The only other difference in changing to -G is that if a file used to contain the key and changed in some way, while still containing the key, -G will match and -S would not. So eg, annex links that git annex fix rewrites will match, and files that change lock status will match. Which is an improvement anyway. Sponsored-by: Jochen Bartl on Patreon	2021-07-14 16:31:17 -04:00
Joey Hess	7a46bb1b28	change message to suggest using whereused --historical	2021-07-14 16:08:47 -04:00
Joey Hess	d6f056eca3	have whereused also check the reflog Since the stash is part of that, it can also find stashed content. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-07-14 16:05:20 -04:00
Joey Hess	fcd1b93a7d	whereused --historical Does not check the reflog, but otherwise works. It's possible for it to display something that is not an annexed file, if a non-annexed file somehow ends up containing something that looks like the key's name. This seems very unlikely to happen, and it would add a lot of complexity to detect it and somehow skip over that file, since the git log would need to either be run again, or not limited to 1 result and canceled once enough results have been read. Also, it kind of seems ok, if a file refers to a key, to consider that as a place the key was used, for some definition of used. So, I punted on dealing with that. May revisit later. Sponsored-by: Brock Spratlen on Patreon	2021-07-14 15:38:28 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	b9db859221	addurl: Avoid crashing when used on beegfs. Sponsored-by: Dartmouth College's DANDI project	2021-07-05 13:02:40 -04:00
Joey Hess	b8e32e200e	addurl, importfeed: Added --no-raw option Forces eg, download with youtube-dl without falling back to raw download. Since youtube-dl failing due to an url not being supported is difficult to distinguish from it failing due to being blocked in some way, this can be useful to avoid the fallback of git-annex downloading the raw web page and adding that. Since --raw also prevents using special remotes, --no-raw also allows special remote downloads. Although it's always possible that some special remote may claim an url and fall back to raw download of the content, which --no-raw cannot prevent. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-06-27 11:14:51 -04:00
Joey Hess	3a14648142	dropping unused marks as dead Dropping an object with drop --unused or dropunused will mark it as dead, preventing fsck --all from complaining about it after it's been dropped from all repositories. If another repository still has a copy, it won't be treated as dead until it's also dropped from there. The drop has to use --unused, can't be --key or something else, because this indicates that the user has recently ran git-annex unused. If it checked the unused log on every drop, bad things would happen when the unused log was out of date, eg a file used to be unused but then got re-added. Marking such a file as dead could be confusing. When the user uses --unused/dropunused, they must consider the unused information to be up-to-date. The particular workflow this enables is: git annex add foo git annex unannex foo git annex unused git annex drop --unused / dropunused git annex fsck --all # no warnings The docs for git-annex unannex say to use git-annex unused and dropunused, so the user should be pointed in this direction when they want to undo an accidental add. Sponsored-by: Brock Spratlen on Patreon	2021-06-25 15:22:26 -04:00
Joey Hess	1cc7b2661e	push synced/master before synced/git-annex sync: Partly work around github behavior that first branch to be pushed to a new repository is assumed to be the head branch, by not pushing synced/git-annex first. github expects master (or whatever the name is) to be pushed first, but git-annex sync can't, because it's got to also support pushes to non-bare repos where pushing master fails, as explained in the big comment. So pushing synced/master is not entirely a fix, but at least it makes github default to a branch with the stuff the user expects in it, not a bunch of annex log files. Aside from fixing github to not make this assumption, or improving the git push protocol to include what the current HEAD is, the only other approach I can think of is to identify git push's progress messages and display those when pushing master, while filtering out error messages about non-fast-forward etc. But git doesn't provide a way to separate out or identify its progress messages. Sponsored-by: Luke Shumaker on Patreon	2021-06-21 12:32:21 -04:00
Joey Hess	d2be68907c	drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies Eg, before with a .gitattributes like: .2 annex.numcopies=2 .1 annex.numcopies=1 And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2 would succeed, leaving just 1 copy, despite foo.2 needing 2 copies. It dropped foo.1 first and then skipped foo.2 since its content was gone. Now that the keys database includes locked files, this longstanding wart can be fixed. Sponsored-by: Noam Kremen on Patreon	2021-06-15 11:38:44 -04:00
Joey Hess	d164434679	fix build	2021-06-15 11:14:43 -04:00
Joey Hess	b3712b6047	refactor	2021-06-15 10:27:33 -04:00
Joey Hess	78da00c7a6	Future proof activity log parsing When the log has an activity that is not known, eg added by a future version of git-annex, it used to be treated as no activity at all, which would make git-annex expire think it should expire the repository, despite it having some kind of recent activity. Hopefully there will be no reason to add a new activity until enough time has passed that this commit is in use everywhere. Sponsored-by: Jake Vosloo on Patreon	2021-06-14 14:18:19 -04:00
Joey Hess	13b9a288d3	scanAnnexedFiles in smudge --update This makes git checkout and git merge hooks do the work to catch up with changes that they made to the tree. Rather than doing it at some later point when the user is not thinking about that past operation. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:37:47 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminite preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	428c91606b	include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticable overhead. What may turn out to be a noticable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex run and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.	2021-05-21 16:24:37 -04:00
Joey Hess	24c7d9ba78	decided not to include export/import trees They're only needed to cover a gc edge case, and it's better someone gets caught by that edge case than that someone who does not know about them ends up with a filtered git-annex branch that contains such a tree when some of the files listed in it are ones they wanted to remove from the repository.	2021-05-17 14:12:15 -04:00
Joey Hess	2420910ab8	include info for sameas repos It's not currently possible to exclude a sameas repo using its annex-config-uuid. (Remote.nameToUUID rejects them). Since there's no real documented way to learn those, this seems ok, at least for now. Also it avoids the problem of someone excluding the parent but including the sameas, which would probably make the sameas repo not usable when using the filtered branch.	2021-05-17 14:04:14 -04:00
Joey Hess	984034f335	filter-branch working aside from some edge cases Added a note to man page about what happens to information that is recorded in the private journal. Since it uses Branch.get, that information will be copied when options allow. It seemed better to allow it and document it than not allow it, since the options allow excluding repositories and so can be used to exclude private repos if desired.	2021-05-17 13:24:58 -04:00
Joey Hess	1da9fe5bd8	implemented filter-branch for key info Not tested yet but should work. Noted a possible optimisation, which should probably be added, to speed it up in cases where there is no uuid filtering being done. It would need Annex.Branch to add a function like getRef that uses catFileDetails, so the sha is also returned. The difficulty would be making it support the precached file content; if it didn't it would probably not be any faster and could even be slower. So probably the precaching would need to be changed to also cache the sha.	2021-05-17 11:11:39 -04:00
Joey Hess	80a9944f3b	don't implicitly include all when exclude options are used This is less erorr-prone, and easier for the user to reason about; it preserves the man page's promise that only explicitly included information will be copied.	2021-05-14 14:14:46 -04:00
Joey Hess	a58c90ccf4	skeleton of filter-branch command, with option parser	2021-05-14 10:59:48 -04:00
Joey Hess	947d2a10bc	assistant: Fix a crash on startup by avoiding using forkProcess ghc 8.8.4 seems to have changed something that broke code that has been successfully using forkProcess since 2012. Likely a change to GC internals. Since forkProcess has never had clear documentation about how to use it safely, avoid using it at all. Instead, when git-annex needs to daemonize itself, re-run the git-annex command, in a new process group and session. This commit was sponsored by Luke Shumaker on Patreon.	2021-05-12 15:08:03 -04:00
Joey Hess	949627b902	remove inode cache in unannex Similar to what commit `675556fd9a` did for adding a non-annexed file, this prevents the smudge clean filter recognising the inode if git add is later run on the unannexed file.	2021-05-12 11:09:38 -04:00
Joey Hess	675556fd9a	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache. git-annex add was changed, when adding a small file, to remove the inode cache for it. This is necessary to keep the recipe in doc/tips/largefiles.mdwn for converting from annex to git working. It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn which the earlier try at this change introduced.	2021-05-10 13:20:10 -04:00
Joey Hess	72a8bbce12	Revert "smudge: check for known annexed inodes before checking annex.largefiles" This reverts commit `424bef6b6f`. This commit caused other buggy behavior unfortunately.	2021-05-10 12:20:13 -04:00
Joey Hess	921753ac44	reinject: Error out when run on a file that is not annexed rather than silently skipping it	2021-05-07 13:31:03 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unncessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	1bd44c7742	Merge remote-tracking branch 'atemu/misc-fixes'	2021-05-07 11:23:54 -04:00
Kyle Meyer	4450fe3629	fromkey: create directory for pointer files too fromkey creates leading directories for symbolic links. Do the same for pointer files.	2021-05-07 11:10:06 -04:00
Atemu	a7f0014a53	Command/Multicast: use proper hyphen GHC was complaining about it possibly being a homoglyph: Command/Multicast.hs:111:36: error: warning: treating Unicode character <U+2212> as identifier character rather than as '-' symbol [-Wunicode-homoglyph] -- using a nice prime, namely 2521−1 but the sheer size of this ^ \| 111 \| -- using a nice prime, namely 2521−1 but the sheer size of this \| ^ 1 warning generated.	2021-05-04 05:44:31 +02:00
Joey Hess	424bef6b6f	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache.	2021-05-03 13:26:32 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	2b264b3edf	initremote --private	2021-04-23 14:47:46 -04:00
Joey Hess	d5a05655b4	Merge branch 'master' into hiddenannex	2021-04-23 13:06:33 -04:00
Joey Hess	0547884eb2	importfeed: fix bug while also speeding up 12x! * Fix bug that could make git-annex importfeed not see recently recorded state when configured with annex.alwayscommit=false. * importfeed: Made "checking known urls" phase run 12 times faster. The massive speedup is because it no longer queries for metadata accompanying each url. Instead it processes the whole git-annex branch and checks all metadata files for feed item ids, and uses any it finds. This could result in a behavior change, in an unlikely situation: If a feed id is recorded in a key's metadata, but the url gets removed, the old code would not see that item id and would re-download it if it finds an url for it in a feed, while the new code will see the item id. I don't think the old behavior was intentional, and it may be that the new behavior is better. Not gonna worry about this.	2021-04-23 12:36:56 -04:00
Joey Hess	b689f17062	refactoring	2021-04-23 11:44:10 -04:00
Joey Hess	da0a696c96	Revert "reorder another test" This reverts commit `3e63f00f63`.	2021-04-23 01:01:29 -04:00

1 2 3 4 5 ...

2493 commits