addurl: Support adding the same url to multiple files at the same time when
using -J with --batch --with-files.
Implementation was easier than expected; I was able to reuse OnlyActionOn.
While it will download the url's content multiple times, that seems like
the best thing to do; see my comment for why.
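For example (url and filenames are illustrative), batch input lines are of the
form "url file", so the same url can now be fed twice with different files in
one -J run:

printf '%s\n' \
  "https://example.com/big.iso mirror1/big.iso" \
  "https://example.com/big.iso mirror2/big.iso" \
  | git annex addurl -J2 --batch --with-files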
Sponsored-by: Dartmouth College's DANDI project
* Removed support for accessing git remotes that use versions of
git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
support git-annex versions older than 6.20180312.
(lockcontent, recvkey, sendkey, transferinfo, commit)
The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.
"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.
git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.
Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.
The main win here is really in Remote.Git, removing piles of hairy fallback
code.
Sponsored-by: Luke Shumaker
This improves the borg special remote memory usage, by
letting it only load one archive's worth of filenames into memory at a
time, and building up a larger tree out of the chunks.
When a borg repository has many archives, git-annex could easily OOM
before. Now, it will use only memory proportional to the number of
annexed keys in an archive.
Minor implementation wart: Each new chunk re-opens the content
identifier database, and also a new vector clock is used for each chunk.
This is a minor inefficiency only; the use of continuations makes
it hard to avoid, although putting the database handle into a Reader
monad would be one way to fix it.
It may later be possible to extend the ImportableContentsChunkable
interface to remotes that are not third-party populated. However, that
would perhaps need an interface that does not use continuations.
The ImportableContentsChunkable interface currently does not allow
populating the top of the tree with anything other than subtrees. It
would be easy to extend it to allow putting files in that tree, but borg
doesn't need that so I left it out for now.
Sponsored-by: Noam Kremen on Patreon
This adds the overhead of a copy when serializing and deserializing keys.
I have not benchmarked much, but runtimes seem barely changed at all by that.
When a lot of keys are in memory, it improves memory use.
And, it prevents keys sometimes getting PINNED in memory and failing to GC,
which is a problem ByteString has sometimes. In particular, git-annex sync
from a borg special remote had that problem and this improved its memory
use by a large amount.
Sponsored-by: Shae Erisson on Patreon
Commit 4bf7940d6b introduced this
problem, but was otherwise doing a good thing. Problem being
that fileRef "/foo" used to return ":./foo", which was actually wrong,
but as long as there was no foo in the local repository, catKey
could operate on it without crashing. After that fix though, the relative
path would be eg "../../foo", resulting in fileRef returning
":./../../foo", which will make git cat-file crash since that's
not a valid path in the repo.
Fix is simply to make fileRef detect paths outside the repo and return
Nothing. Then catKey can be skipped. This needed several bugfixes to
dirContains as well, in previous commits.
In Command.Smudge, this led to needing to check for Nothing. That case
should actually never happen, because the fileoutsiderepo check will
detect it earlier.
Sponsored-by: Brock Spratlen on Patreon
sync --content: Avoid a redundant checksum of a file that was
incrementally verified, when used on NTFS and perhaps other filesystems.
When sync has just gotten the content, it does not need to check inAnnex a
second time. On NTFS, for some reason the write of the inode cache after
it gets the content is not immediately able to be read, and with an
empty/non-matching inode cache due to that stale data, inAnnex falls back
to hashing the whole object to determine if it's present.
Sponsored-by: Brock Spratlen on Patreon
Disabling git-annex branch update for this command is
ok, because it does not use any information from the branch,
but only logs the location when it adds a key.
Sponsored-by: Dartmouth College's Datalad project
Added annex.bwlimit and remote.name.annex-bwlimit config that works for git
remotes and many but not all special remotes.
This nearly works, at least for a git remote on the same disk. With it set
to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with
occasional spikes to 160 kb/s. So it needs to delay just a bit longer...
I'm unsure why.
However, at the beginning a lot of data flows before it determines the
right bandwidth limit. A granularity of less than 1s would probably improve
that.
And, I don't know yet if it makes sense to have it be 100kb/1s rather than
100kb/s. Is there a situation where the user would want a larger
granularity? Does granularity need to be configurable at all? I only used that
format for the config really in order to reuse an existing parser.
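For example (remote name illustrative), with the format as it currently
stands:

git config annex.bwlimit "100kb/1s"
git config remote.origin.annex-bwlimit "100kb/1s"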
This can't support external special remotes, or ones that
themselves shell out to an external command. (Well, it could, but it
would involve pausing and resuming the child process tree, which seems
very hard to implement and very strange besides.) There could also be some
built-in special remotes that it still doesn't work for, due to them not
having a progress meter whose display blocks the thread that is using the
bandwidth. But I don't think there are actually any that run downloads in
a separate thread from the one that displays the progress meter.
Sponsored-by: Graham Spencer on Patreon
This should complete the fix started in
6329997ac4, fixing the actual cause of the
test suite failure this time.
Sponsored-by: Dartmouth College's Datalad project
* When downloading urls fail, explain which urls failed for which
reasons.
* web: Avoid displaying a warning when downloading one url failed
but another url later succeeded.
Some other uses of downloadUrl use urls that are effectively for internal use,
and should not all be displayed to the user on failure. Eg, Remote.Git
tries different urls where content could be located depending on how the
remote repo is set up. Exposing those urls to the user would lead to wild
goose chases. So had to parameterize it to control whether it displays urls
or not.
A side effect of this change is that when there are some youtube urls
and some regular urls, it will try regular urls first, even if the
youtube urls are listed first. This seems like an improvement if
anything, but in any case there's no defined order of urls that it's
supposed to use.
Sponsored-by: Dartmouth College's Datalad project
And fail with an informative message.
I don't think ACLs can prevent removing the write bit, but I'm not sure,
so kept it mentioning them as a possibility.
Should git-annex lock also check if the write bits are able to be removed?
Maybe, but the case I know about with xattrs involves cp -a copying NFS
xattrs, and it's the copy of the file that is the problem. So when locking
a file, I guess it will not be the copy.
Sponsored-by: Dartmouth College's Datalad project
New --batch-keys option added to these commands: get, drop, move, copy, whereis
git-annex-matching-options had to be reworded since some of its options
can be used to match on keys, not only files.
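For example (the key is illustrative), keys are read one per line from stdin:

printf '%s\n' "SHA256E-s30--f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb" \
  | git annex get --batch-keys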
Sponsored-by: Luke Shumaker on Patreon
This was unlikely to cause any problem, but it is unsightly to mention
normally hidden refs, and it might have done a bit of unnecessary work to
check that ref.
Sponsored-by: Noam Kremen on Patreon
Except when configuration makes curl be used. It did not seem worth
trying to tail the file when curl is downloading.
But when an interrupted download is resumed, it does not read the whole
existing file to hash it. Same reason discussed in
commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long
time with no progress being displayed. And also there's an open http
request, which needs to be consumed; taking a long time to hash the file
might cause it to time out.
Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.
Sponsored-by: Dartmouth College's DANDI project
This fixes the recent reversion that annex.verify is not honored,
because retrieveChunks was passed RemoteVerify baser, but baser
did not have export/import set up.
Sponsored-by: Dartmouth College's DANDI project
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)
Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.
AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.
Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.
Sponsored-by: Shae Erisson on Patreon
* Deal with clock skew, both forwards and backwards, when logging
information to the git-annex branch.
* GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1)
rather than needing to be advanced each time a new change is made.
* Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex.
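For example (the filename is illustrative; 1 is simply a small fixed
timestamp value, as mentioned above):

GIT_ANNEX_VECTOR_CLOCK=1 git annex add somefile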
When changing a file in the git-annex branch, the vector clock to use is now
determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK
when set), and comparing it to the newest vector clock already in use in
that file. If a newer time stamp was already in use, advance it forward by
a second instead.
When the clock is set to a time in the past, this avoids logging with
an old timestamp, which would risk that log line later being ignored in favor
of "newer" line that is really not newer.
When a log entry has been made with a clock that was set far ahead in the
future, this avoids newer information being logged with an older timestamp
and so being ignored in favor of that future-timestamped information.
Once all clocks get fixed, this will result in the vector clocks being
incremented, until finally enough time has passed that time gets back ahead
of the vector clock value, and then it will return to usual operation.
(This latter situation is not ideal, but it seems the best that can be done.
The issue with it is, since all writers will be incrementing the last
vector clock they saw, there's no way to tell when one writer made a write
significantly later in time than another, so the earlier write might
arbitrarily be picked when merging. This problem is why git-annex uses
timestamps in the first place, rather than pure vector clocks.)
Advancing forward by 1 second is somewhat arbitrary. setDead
advances a timestamp by just 1 picosecond, and the vector clock could
too. But then it would interfere with setDead, which wants to be
overruled by any change. So it could use 2 picoseconds or something,
but that seems weird. It could just as well advance it forward by a
minute or whatever, but then it would be harder for real time to catch
up with the vector clock when forward clock slew had happened.
A complication is that many log files contain several different pieces of
information, and it may be best to only use vector clocks for the same piece
of information. For example, a key's location log file contains
InfoPresent/InfoMissing for each UUID, and it only looks at the vector
clocks for the UUID that is being changed, and not other UUIDs.
Although exactly where the dividing line is can be hard to determine.
Consider metadata logs, where a field "tag" can have multiple values set
at different times. Should it advance forward past the last tag?
Probably. What about when a different field is set, should it look at
the clocks of other fields? Perhaps not, but currently it does, and
this does not seem like it will cause any problems.
Another one I'm not entirely sure about is the export log, which is
keyed by (fromuuid, touuid). So if multiple repos are exporting to the
same remote, different vector clocks can be used for that remote.
It looks like that's probably ok, because it does not try to determine
what order things occurred when there was an export conflict.
Sponsored-by: Jochen Bartl on Patreon
git-annex get when run as the first git-annex command in a new repo did not
populate unlocked files. (Reversion in version 8.20210621)
I am not entirely happy with this, because I don't understand how
428c91606b caused the problem in the first
place, and I don't fully understand how skipping calling scanAnnexedFiles
during autoinit avoids the problem.
Kept the explicit call to scanAnnexedFiles during git-annex init,
so that when reconcileStaged is expensive, it can be made to run then,
rather than at some later point when the information is needed.
Sponsored-by: Brock Spratlen on Patreon
The pass was needed to populate files when annex.thin was set,
but in commit 73e0cbbb19,
reconcileStaged started to do that. So, this second pass is not needed
any longer.
An easy way to see this in action is to have an unlocked file, and touch the
object file.
While all code that compares inode caches for object files needs to be
prepared for this kind of problem and fall back to verification, having
fsck notice it and correct it is cheap (as long as fsck is being run
anyway) and ensures that if it happens for some unusual reason, there's a
way for the user to notice that it's happening.
Note that, when annex.thin is in use, the earlier call to isUnmodified
(and also potentially earlier calls to inAnnex in eg, verifyLocationLog)
will fix up the same problem silently. That might prevent the warning
being displayed, although probably it still will be, because the
Database.Keys write of the InodeCache will be queued but will not have
happened yet. I can't see a way to improve this, but it's not great.
Sponsored-by: Dartmouth College's Datalad project
This avoids it calling enteringStage VerifyStage when it's used in
places that only fall back to verification rarely, and which might be
called while in TransferStage and be going to perform a transfer after
the verification.
Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was
already safe, but others use it directly on the work tree file, which
was race-prone. Eg, if the work tree file was first removed, then
linkFromAnnex called to populate it, the user could have re-written it in
the interim.
This came to light during an audit of all calls of addInodeCaches,
looking for such races. All the other uses of it seem ok.
Sponsored-by: Brett Eisenberg on Patreon
In Annex.Content, the object file was statted after pointer files were
populated. But if annex.thin is set, once the pointer files are
populated, the object file can potentially be modified via the hard
link. So, it was possible, though seemingly very unlikely, for the inode
of the modified object file to be cached.
Command.Fix and Command.Fsck had similar problems, statting the work
tree files after they were in place. Changed them to stat the temp file
that gets moved into place. This does rely on .git/annex being on the
same filesystem. If it's not, the cached inode will not be the same as
the one that the temp file gets moved to. Result will be that git-annex
will later need to do an expensive verification of the content of the
worktree files. Note that the cross-filesystem move of the temp file
already is a larger amount of extra work, so this seems acceptable.
Sponsored-by: Luke Shumaker on Patreon
When git-annex lock repopulates the object file by copying an associated
file that still has its content, it neglected to update the inode cache.
I was not able to actually get this code to successfully repopulate the
object file; the associated file gets replaced with a dangling pointer
before unlock is able to do that. (By what I'm not sure..
reconcileStaged?) Which might be itself a bug, but
anyway this makes me doubtful that this was really leading to a stale
inode cache. Still, in case there is some situation in which it does
work, fixed it to update the inode cache.
Which is the same as the git merge option.
After last commit, this turns out to be needed in the test suite, and when
doing git-annex import from special remote, followed by a git-annex merge.
Sponsored-by: Svenne Krap on Patreon
sync, merge, post-receive: Avoid merging unrelated histories, which used to
be allowed only to support direct mode repositories.
(However, sync does still merge unrelated histories when importing trees
from special remotes, and the assistant still merges unrelated histories
always.)
See 556b2ded2b for why this was added
back in 2016, for direct mode.
This is a behavior change, which might break something that was relying
on sync merging unrelated histories, but git had a good reason to
prevent it, since it's easy to foot shoot with it, and git-annex should
follow suit.
Sponsored-by: Noam Kremen on Patreon
* sync: When --quiet is used, run git commit, push, and pull without
their usual output.
* merge: When --quiet is used, run git merge without its usual output.
This might also make --quiet work better for some other commands
that make commits, like git-annex adjust.
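For example:

git annex sync --quiet --content
git annex merge --quiet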
Sponsored-by: Kevin Mueller on Patreon
Handles keys that are substrings of other keys, as well as pointer files
that contain a newline after the key.
Note that -S does not match regexps, while -G does by default. The docs
are not clear; this was determined experimentally. The only other difference in
changing to -G is that if a file used to contain the key and changed
in some way, while still containing the key, -G will match and -S would
not. So eg, annex links that git annex fix rewrites will match, and
files that change lock status will match. Which is an improvement anyway.
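To illustrate the difference (the key value is illustrative):

key=SHA256E-s30--f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb
git log -S"$key"   # literal string; only commits that change the number of occurrences
git log -G"$key"   # regexp by default; any commit whose diff contains the key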
Sponsored-by: Jochen Bartl on Patreon
Does not check the reflog, but otherwise works.
It's possible for it to display something that is not an annexed file,
if a non-annexed file somehow ends up containing something that looks
like the key's name. This seems very unlikely to happen, and it would
add a lot of complexity to detect it and somehow skip over that file,
since the git log would need to either be run again, or not limited to 1
result and canceled once enough results have been read.
Also, it kind of seems ok, if a file refers to a key, to consider that
as a place the key was used, for some definition of used. So, I punted
on dealing with that. May revisit later.
Sponsored-by: Brock Spratlen on Patreon
Forces eg, download with youtube-dl without falling back to raw download.
Since youtube-dl failing due to an url not being supported is difficult to
distinguish from it failing due to being blocked in some way, this can be
useful to avoid the fallback of git-annex downloading the raw web page and
adding that.
Since --raw also prevents using special remotes, --no-raw also
allows special remote downloads. Although it's always possible that some
special remote may claim an url and fall back to raw download of the
content, which --no-raw cannot prevent.
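For example (url illustrative):

git annex addurl --no-raw "https://example.com/watch?v=abc123"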
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Dropping an object with drop --unused or dropunused will mark it as
dead, preventing fsck --all from complaining about it after it's been
dropped from all repositories.
If another repository still has a copy, it won't be treated as dead
until it's also dropped from there.
The drop has to use --unused, can't be --key or something else, because
this indicates that the user has recently run git-annex unused. If it
checked the unused log on every drop, bad things would happen when the
unused log was out of date, eg a file used to be unused but then got
re-added. Marking such a file as dead could be confusing. When the user
uses --unused/dropunused, they must consider the unused information to be
up-to-date.
The particular workflow this enables is:
git annex add foo
git annex unannex foo
git annex unused
git annex drop --unused / dropunused
git annex fsck --all # no warnings
The docs for git-annex unannex say to use git-annex unused and dropunused,
so the user should be pointed in this direction when they want to undo an
accidental add.
Sponsored-by: Brock Spratlen on Patreon
sync: Partly work around github behavior that first branch to be pushed to
a new repository is assumed to be the head branch, by not pushing
synced/git-annex first.
github expects master (or whatever the name is) to be pushed first, but
git-annex sync can't, because it's got to also support pushes to non-bare
repos where pushing master fails, as explained in the big comment. So
pushing synced/master is not entirely a fix, but at least it makes github
default to a branch with the stuff the user expects in it, not a bunch of
annex log files.
Aside from fixing github to not make this assumption, or improving
the git push protocol to include what the current HEAD is, the only other
approach I can think of is to identify git push's progress messages and
display those when pushing master, while filtering out error messages
about non-fast-forward etc. But git doesn't provide a way to separate out
or identify its progress messages.
Sponsored-by: Luke Shumaker on Patreon
Eg, before with a .gitattributes like:
*.2 annex.numcopies=2
*.1 annex.numcopies=1
And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2
would succeed, leaving just 1 copy, despite foo.2 needing 2 copies.
It dropped foo.1 first and then skipped foo.2 since its content was gone.
Now that the keys database includes locked files, this longstanding wart
can be fixed.
Sponsored-by: Noam Kremen on Patreon
When the log has an activity that is not known, eg added by a future
version of git-annex, it used to be treated as no activity at all,
which would make git-annex expire think it should expire the repository,
despite it having some kind of recent activity.
Hopefully there will be no reason to add a new activity until enough
time has passed that this commit is in use everywhere.
Sponsored-by: Jake Vosloo on Patreon
This makes git checkout and git merge hooks do the work to catch up with
changes that they made to the tree, rather than doing it at some later
point when the user is not thinking about that past operation.
Sponsored-by: Dartmouth College's Datalad project
When two files have the same content, and a required content expression
matches one but not the other, dropping the latter file will fail as it
would also remove the content of the required file.
This will slow down drop (w/o --auto), dropunused, mirror, and move, by one
keys db lookup per file. But I did include an optimisation to avoid a
double db lookup in the drop --auto / sync --content case. I suspect that
dropunused could also use PreferredContentChecked True, but haven't
entirely thought it through and it's rarely used with enough files for the
optimisation to matter.
Sponsored-by: Dartmouth College's Datalad project
* drop: When two files have the same content, and a preferred content
expression matches one but not the other, do not drop the file.
* sync --content, assistant: Fix an edge case where a file that is not
preferred content did not get dropped.
The sync --content edge case is that handleDropsFrom loaded associated files
and used them without verifying that the information from the database was
not stale.
It seemed best to avoid changing --want-drop's behavior, this way when
debugging a preferred content expression with it, the files matched will
still reflect the expression. So added a note to the --want-drop documentation,
to make clear it may not behave identically to git-annex drop --auto.
While it would be possible to introspect the preferred content
expression to see if it matches on filenames, and only look up the
associated files when it does, it's generally fairly rare for 2 files to
have the same content, and the database lookup is already avoided when
there's only 1 file, so I did not implement that further optimisation.
Note that there are still some situations where the associated files
database does not get locked files recorded in it, which will prevent
this fix from working.
Sponsored-by: Dartmouth College's Datalad project
Before only unlocked files were included.
The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.
reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.
On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.
reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticeable overhead. What may turn out to be a noticeable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.
Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.
Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
runs and accesses the keys db, reconcileStaged will run and update it.
There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.
However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
They're only needed to cover a gc edge case, and it's better someone
gets caught by that edge case than that someone who does not know about
them ends up with a filtered git-annex branch that contains such a tree
when some of the files listed in it are ones they wanted to *remove*
from the repository.
It's not currently possible to exclude a sameas repo using its
annex-config-uuid. (Remote.nameToUUID rejects them).
Since there's no real documented way to learn those, this seems ok, at
least for now. Also it avoids the problem of someone excluding the
parent but including the sameas, which would probably make the sameas
repo not usable when using the filtered branch.
Added a note to man page about what happens to information that is
recorded in the private journal. Since it uses Branch.get, that
information will be copied when options allow. It seemed better to allow
it and document it than not allow it, since the options allow excluding
repositories and so can be used to exclude private repos if desired.
Not tested yet but should work.
Noted a possible optimisation, which should probably be added, to
speed it up in cases where there is no uuid filtering being done.
It would need Annex.Branch to add a function like getRef that uses
catFileDetails, so the sha is also returned. The difficulty would be
making it support the precached file content; if it didn't it would
probably not be any faster and could even be slower. So probably the
precaching would need to be changed to also cache the sha.
This is less error-prone, and easier for the user to reason about; it
preserves the man page's promise that only explicitly included
information will be copied.
ghc 8.8.4 seems to have changed something that broke code that has been
successfully using forkProcess since 2012. Likely a change to GC internals.
Since forkProcess has never had clear documentation about how to
use it safely, avoid using it at all. Instead, when git-annex needs to
daemonize itself, re-run the git-annex command, in a new process group
and session.
This commit was sponsored by Luke Shumaker on Patreon.
Similar to what commit 675556fd9a did for
adding a non-annexed file, this prevents the smudge clean filter
recognising the inode if git add is later run on the unannexed file.
smudge: Fix a case where an unlocked annexed file that annex.largefiles
does not match could get its unchanged content checked into git, due to git
running the smudge filter unnecessarily.
When the file has the same inodecache as an already annexed file,
we can assume that the user is not intending to change how it's stored in
git.
Note that checkunchangedgitfile already handled the inverse case, where the
file was added to git previously. That goes further and actually sha1
hashes the new file and checks if it's the same hash in the index.
It would be possible to generate a key for the file and see if it's the
same as the old key, however that could be considerably more expensive than
sha1 of a small file is, and it is not necessary for the case I have, at
least, where the file is not modified or touched, and so its inode will
match the cache.
git-annex add was changed, when adding a small file, to remove the inode
cache for it. This is necessary to keep the recipe in
doc/tips/largefiles.mdwn for converting from annex to git working.
It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn
which the earlier try at this change introduced.
Fix behavior of several commands, including reinject, addurl, and rmurl
when given an absolute path to an unlocked file, or a relative path that
leaves and re-enters the repository.
To avoid slowing down all the cases where the paths are already ok
with an unnecessary call to getCurrentDirectory, put in an optimisation
in relPathCwdToFile. That will probably also speed up other parts of
git-annex by some small amount, but I have not benchmarked.
Note that I did not convert branchFileRef, because it seems likely that
it will be used with a file that is not provided by the user, so is already
in a sane format. This is certainly true for the way git-annex uses it,
though maybe arguable to the extent Git.Ref is a reusable library.
GHC was complaining about it possibly being a homoglyph:
Command/Multicast.hs:111:36: error:
warning: treating Unicode character <U+2212> as identifier character rather than as '-' symbol [-Wunicode-homoglyph]
-- using a nice prime, namely 2521−1 but the sheer size of this
^
|
111 | -- using a nice prime, namely 2521−1 but the sheer size of this
| ^
1 warning generated.
fromkey: Create an unlocked file when used in an adjusted branch where the
file should be unlocked, or when configured by annex.addunlocked.
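For example (key and filename illustrative):

git annex fromkey SHA256E-s30--f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb somefile

On an adjusted unlocked branch, or with annex.addunlocked set, somefile now
ends up as an unlocked pointer file rather than a symlink.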
There is some overlap with code in Annex.Ingest, however it's not quite the
same because ingesting has a temp file with the content, where here the
content, if any, is in the annex object file. So it eg, makes sense for
Annex.Ingest to copy the execute mode of the content file, but it does not make
sense for fromkey to do that.
Also changed in passing to stage the file in git directly, rather than
using git add. One consequence of that is that if the file is gitignored,
it will still get added, rather than the old behavior:
The following paths are ignored by one of your .gitignore files:
ignored
hint: Use -f if you really want to add them.
hint: Turn this message off by running
hint: "git config advice.addIgnoredFile false"
git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123)
That old behavior was a surprise to me, and so I consider it a bug, and doubt
anyone would have relied on it.
Note that, when on an --hide-missing branch, it is possible to fromkey a key
that is not present (needs --force). The annex link or pointer file still gets
written in this case. It doesn't seem to make any sense not to write it,
because then fromkey would not do anything useful in this case, and this way
the file can be committed and synced to master, and the branch re-adjusted to
hide the new missing file.
This commit was sponsored by Noam Kremen on Patreon.
* Fix bug that could make git-annex importfeed not see recently recorded
state when configured with annex.alwayscommit=false.
* importfeed: Made "checking known urls" phase run 12 times faster.
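For example (feed url illustrative), this combination now sees recently
recorded state:

git -c annex.alwayscommit=false annex importfeed "https://example.com/podcast/feed.xml"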
The massive speedup is because it no longer queries for metadata
accompanying each url. Instead it processes the whole git-annex branch and
checks all metadata files for feed item ids, and uses any it finds.
This could result in a behavior change, in an unlikely situation: If a feed
id is recorded in a key's metadata, but the url gets removed, the old code
would not see that item id and would re-download it if it finds an url for
it in a feed, while the new code will see the item id. I don't think
the old behavior was intentional, and it may be that the new behavior is
better. Not gonna worry about this.
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.
Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.
All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.
Currently, no UUIDs are treated as private yet, a way to configure that
is needed.
The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.
It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.
And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concatenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
When downloading content from a remote, if the content is able to be
verified during the transfer, skip checksumming it a second time.
Note that in this case, the fsck output does not include "(checksum)"
which it does when the checksumming is done separately from the download.
This commit was sponsored by Brock Spratlen on Patreon.
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debugging about running processes.
The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.
Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticeable.
This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.
The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
Not yet used, but allows getting the size of items in the tree fairly
cheaply.
I noticed that CmdLine.Seek uses ls-tree and then feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outweigh the benefit.
I don't think this was really intentional behavior. It may be that it was
useful to include it so it could be passed to rmurl, since without it rmurl
would not actually remove the url. Since that was changed earlier today,
now seems like a good time to clean up the display of these urls.
This commit was sponsored by Jochen Bartl on Patreon.
fsck: When --from is used in combination with --all or similar options, do
not verify required content, which can't be checked properly when operating
on keys.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
unregisterurl: Fix a bug that caused an url to not be unregistered when it
is claimed by a special remote other than the web.
See commit f175d4cc90 for rationale.
* rmurl: When youtube-dl was used for an url, it no longer needs to be
prefixed with "yt:" in order to be removed.
* rmurl: If an url is both used by the web and also claimed by another
special remote, fix a bug that caused the url to not be removed.
The youtube-dl change is a consequence of how the bug fix is implemented.
But I also think it's the right thing to do. Consider that, before,
git-annex addurl $url followed by git-annex rmurl $url would not remove the
url in the case where youtube-dl was used. That was surprising behavior.
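So now a plain addurl/rmurl round trip works even when youtube-dl handled the
download (url and filename illustrative):

git annex addurl "https://example.com/video"
git annex rmurl video.webm "https://example.com/video"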
In the unlikely case where a special remote claims an url, and it's been
added using OtherDownloader, but it was also added already as a web url,
it seems better for rmurl to remove both than to arbitrarily remove only one.
And in the case the bug report was filed for, when an url was added as a
web url, but a special remote now claims it, that should not prevent rmurl
removing the web url.
Calling setUrlMissing lets other callers of it behave differently.
Probably the calls to it in eg, Remote.External and Remote.BitTorrent are
fine, since they don't mangle the url and just remove what was provided,
and the OtherDownloader form of a bittorrent url, respectively.
I suspect unregisterurl needs a change similar to the one made to rmurl, for
similar reasons.
Just as import was using ActionItemWorkTreeFile, it's ok to use it for export,
even though it might not correspond to a file in the work tree.
And renamed it to ActionItemTreeFile to make that clearer.
Note that when an export has to rename files, it still uses
ActionItemOther, so file will still be null in that case, but as no file is
being transferred, that seems ok.
import: When the previously exported tree contained a submodule,
preserve it in the imported tree so it does not get deleted.
The export exclude log, which was used for non-preferred content,
now also includes the submodules. Since the log format is git ls-tree
output, this does not break backwards compatibility.
It's not necessary to log location of GIT keys, because these files are
not annexed files and so git-annex will never need to get them.
This corresponds to code in Annex.Import that already checked before
updating the location log when handling deleted files.
Older versions of git-annex that used SHA1 keys for non-annexed files
also unnecessarily updated the location log for them.
GIT keys still appear in the git-annex branch for content identifier
logs, so kept the documentation of them in backends.mdwn
This commit was sponsored by Jake Vosloo on Patreon.
This solves the problem that import of such files gets confused and
converts them back to annexed files.
The import code already used GIT keys internally when it determined a
file should not be annexed. So now when it sees a GIT key that export
used, it already does the right thing.
This also means that even older version of git-annex can import and will
do the right thing, once a fixed version has exported. Still, there may
be other complications around upgrades; still need to think it all
through.
Moved gitShaKey and keyGitSha from Key to Annex.Export since they're
only used for export/import.
Documented GIT keys in backends, since they do appear in the git-annex
branch now.
This commit was sponsored by Graham Spencer on Patreon.
Added LinkType to ProvidedInfo, and unified MatchingKey with
ProvidedInfo. They're both used in the same way, so there was no real
reason to keep separate.
Note that addLocked and addUnlocked still set matchNeedsFileName,
because to handle MatchingFile, they do need it. However, they
don't use it when MatchingInfo is provided. This should be ok,
the --branch case will be able to skip checking matchNeedsFileName,
since it will provide a filename in any case.
Implemented by generalizing registerurl. Without the implicit batch mode
of registerurl since that is only a backwards compatibility thing
(see commit 1d1054faa6).
unannex, uninit: When an annexed file is modified, don't overwrite the
modified version with an older version from the annex
This commit was sponsored by Mark Reidenbach on Patreon.
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.
Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.
A few things that used fromAbsPath unnecessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.
Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.
Note that the progress meter display is not taken down when displaying
the message, so it will display like this:
0% 8 B 0 B/s
Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
0% 10 B 0 B/s
Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.
Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.
A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.
This commit was sponsored by Kevin Mueller on Patreon.
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.
Complicated by git-annex addurl --fast which adds the file whose content
is not present, so it needs to stay unlocked when on such a branch.
This commit was sponsored by Brock Spratlen on Patreon.
34a535ebea broke the test suite.
Getting a file started failing in one case, because the annex object did
not have its inode cached, so was not trusted to be unmodified.
This adds something very similar to what was added to linkAnnex
in commit 2e9341a47d -- if there are not
yet any inodes cached for a key, add the inode of the annex object when
adding the inode of the unlocked file.
Feels like this should be handled in a more principled way. How
do we know the addInodeCaches call in getMoveRaceRecovery just above
this change is currently correct? It doesn't add the annex object inode
cache. Ah well, maybe sometime when I've not had my entire evening eaten
by a reversion that the test suite caught as I was cooking dinner.
Avoids the smudge --clean filter failing because URL keys do not support
genKey. Instead the modified content will be added using the default
backend.
This commit was sponsored by Jochen Bartl on Patreon.
This avoids the smudge --clean filter failing on the URL keys.
git checkout runs the post-checkout hook, which runs smudge --update.
That populates all the pointer files, but it neglected to store their inode
caches in the keys db. With that done, and the keys db flushed before
smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap
check can tell the file is not modified, so will not try to re-ingest it,
which does not work with URL keys because they do not support genKey.
It also seems possible that the isUnmodifiedCheap was also failing for
non-URL keys, which would cause them to be re-ingested, leading to a lot of
extra work. I have not verified that, but don't see why it wouldn't have
happened. So this probably also speeds up checking out adjusted branches.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Configuring chunking and encryption for a git remote has no effect, so
skip testing those variants in the TestRemote call.
It would be better if TestRemote itself could do this, but it
doesn't seem possible there. There is no way to look at a Remote and
tell if it supports chunking or encryption.
Note that, while the test suite displays output as if it's testing
exporting, it actually skips doing anything for the tests when run on
the git remote. So at least does not waste time even though the output
is not ideal.
This commit was sponsored by Noam Kremen on Patreon.
Since unconsidered use of trusted repositories can lead to data loss.
Trusted has always been this way, but it used to be acceptable for
git-annex to be set up so that data could be lost without using --force,
and most or all other ways that can happen have already been eliminated.
This commit was sponsored by Mark Reidenbach on Patreon.
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.
Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.
This commit was sponsored by Denis Dzyubenko on Patreon.
* add: Significantly speed up adding lots of non-large files to git,
by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
so any other smudge filters than the annex one that may be enabled will
be used.
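For example (filename illustrative):

git annex add --force-small config.ini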
Especially from borg, where the content identifier logs
all end up being the same identical file!
But also, for other imports, the location tracking logs can,
in some cases, be identical files.
Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
May actually work now.
Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.
So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, the tree is recorded in the export log, and gets grafted
into the git-annex branch.
importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.
It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.
Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.
(Untested and unused as of yet.)
This commit was sponsored by Jochen Bartl on Patreon.
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.
(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
In cd1676d604, it stopped using that to avoid surprising behavior
when the location log and remote content were out of sync.
But, it seems that may have changed some behavior users relied on as
well, and also Remote.hasKeyCheap should be faster than checking the
location log.
So, try Remote.hasKeyCheap first, and only if it does not have the key,
fall back to checking the location log. If the location log still thinks
it's present, go ahead and try to get it, so the user will see a failure
rather than silently skipping a file that whereis says is on the remote.
This does make slightly slower the case where the remote does not have
the key, and location log and Remote.hasKeyCheap agree, since it now
checks both. But only 1 stat slower.
This is common in some feeds, which might mix some items with enclosures,
with others that link to posts or whatever. Before this, it would try to
use youtube-dl and fail, or if youtube-dl was not allowed, it would
incorrectly complain that an url was supported by youtube-dl.
MatchingKey is not the thing to use when matching on actual worktree
files.
Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
Done on unix; could not quite implement it on windows.
The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.
Also, I don't know what the equivalent of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
All callers adjusted to update it themselves.
In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.
This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
Rather than using Read/Show, which would force me to preserve data types
into the future.
I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.
So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.
(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)
Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)
This commit was sponsored by Jack Hill on Patreon.
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.
This commit was sponsored by Brett Eisenberg on Patreon.
Seems to work! Even progress bars. Have not tested prompting or various
error message displays yet.
transferkeys had to be made to operate in different modes for the
Assistant and Annex monads. A bit ugly, but it did relegate that
really ugly Database.Keys.closeDb in transferkeys to only the assistant
code path.
This commit was sponsored by Noam Kremen.
This is groundwork for using git-annex transferkeys to run transfers,
in order to allow stalled transfers to be interrupted and retried.
The new upload and download are closer to what git-annex transferkeys
does, so the plan is to make them use it.
Then things that were left using upload' and download' won't recover
from stalls. Notably, that includes import and export. But
at least get/move/copy will be able to. (Also the assistant hopefully,
but not yet.)
This commit was sponsored by Jake Vosloo on Patreon.
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.
At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.
This commit was sponsored by Ethan Aubin.
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.
The new protocol is simple read/show and looks like this:
TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True
Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.
emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.
This commit was sponsored by Ethan Aubin.
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding (removeLink)
This commit was sponsored by Ethan Aubin.
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.
Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:
* If two files point to the same key, then eg, `git annex get foo` will
update the branch to unlock foo, but will not unlock bar, because it
does not know about it. Might be fixable by making `git annex get
bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info
Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same way that reversing a --unlock branch
works, and the new name makes that commonality more clear.
Like --hide-missing the branch does not get updated when content
availability changes.
Seems to basically work, but sync does not update it yet.
Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propagates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E
While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
Warn when adding an annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.
This commit was sponsored by Jake Vosloo on Patreon.
Lots of nice wins from this in avoiding unnecessary work, and I think
nothing got slower.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.
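For context, a plausible sketch of the removeWhenExistsWith idea (not the
exact git-annex code): run a removal action, swallow only does-not-exist
errors, and keep the path type polymorphic so it works with both FilePath
and RawFilePath removers.

import Control.Exception (throwIO, try)
import System.IO.Error (isDoesNotExistError)

removeWhenExistsWith :: (a -> IO ()) -> a -> IO ()
removeWhenExistsWith remover f = do
    r <- try (remover f)
    case r of
        Right () -> return ()
        Left e
            | isDoesNotExistError e -> return () -- already gone; fine
            | otherwise -> throwIO e -- re-throw any other error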
This commit was sponsored by Graham Spencer on Patreon.
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.
Performance: Probably noticeable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unnecessary work.
A performance improvement to make later would be to avoid cleanup doing
anything when it has not written to the log file, and has confirmed that
the log file does not contain the log line.
This commit was sponsored by Jake Vosloo on Patreon.
When I put in Haskell98 this spring, I was under the mistaken
apprehension that ghc defaulted to that. But actually its default
is a third mode, which is closer to Haskell2010 but with some differences.
The manual says "By default, GHC mainly aims to behave (mostly) like a
Haskell 2010 compiler"
Fixed two cases where the Haskell98 do indentation flexibility let
wrongly indented code build. That is one of the places where
ghc does not behave like Haskell2010 by default.
The other place I think I was concerned about is GHC manual
section 19.1.1.3, Expressions and patterns. But that only seems to
affect code using bottoms, so would only affect pure functions throwing
an error, which I don't think git-annex does in many places as it's
pretty horrid style. And it would only affect rare cases like shown in
that section. If it did happen, it would mean that the error was not
thrown before specifying Haskell98, and then was. Haskell2010 behaves
the same as Haskell98.
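For illustration, here is my reconstruction of the kind of layout involved:
with NondecreasingIndentation (enabled by GHC's default mode and implied by
Haskell98, but not by Haskell2010), a nested do block may sit at the same
indentation as the enclosing one, so code like this builds even though the
indentation is misleading.

main :: IO ()
main = do
    line <- getLine
    if null line then return () else do
    putStrLn line -- same indentation as the outer do block's statements
    main

Under strict Haskell2010 layout, the inner do block is empty and this fails
to parse.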
This commit was sponsored by Denis Dzyubenko on Patreon.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Among other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)
This commit was sponsored by Svenne Krap on Patreon.
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.
Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.
Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.
This commit was sponsored by Noam Kremen on Patreon.
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.
This is more groundwork for matching largefiles while importing,
without downloading content.
This commit was sponsored by Graham Spencer on Patreon.
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.
The real point of this is to not need to provide a dummy content file
when matching.
This commit was sponsored by Martin D on Patreon.
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.
It would probably be safe to remove the reapZombies now, but let's wait
and do that in its own commit in case it turns out to cause problems.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.
This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.
Only remaining zombies are in CmdLine.Seek
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.
Thanks to Lukey for identifying the optimisation.
This commit was sponsored by Brock Spratlen on Patreon.
Otherwise the bloom filter may not be fully populated when the second
pass starts, which could have led to incorrect behavior with --all -J,
probably in very rare circumstances.
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/
Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.
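A hypothetical sketch of that fold, using a simplified matcher tree (the
real Utility.Matcher type differs): walk the tree and report whether any
leaf operation needs the file content.

data Matcher op
    = MAny
    | MAnd (Matcher op) (Matcher op)
    | MOr (Matcher op) (Matcher op)
    | MNot (Matcher op)
    | MOp op

matchNeedsFileContent :: (op -> Bool) -> Matcher op -> Bool
matchNeedsFileContent needsContent = go
  where
    go MAny = False
    go (MAnd a b) = go a || go b
    go (MOr a b) = go a || go b
    go (MNot a) = go a
    go (MOp o) = needsContent o -- the per-terminal information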
This commit was sponsored by Svenne Krap on Patreon.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.
(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)
addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it: asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.
In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
The use case of this field is mostly to support -J combined with --json.
When that is implemented, a user will be able to look at the field to
determine which of the requests they have sent it corresponds to.
The field typically has a single value in its list, but in some cases
multiple values (eg 2 command-line params) are combined together and the
list will have more.
Note that json parsing was already non-strict, so old git-annex metadata
--json --batch can be fed json produced by the new git-annex and will
not stumble over the new field.
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.
Over the course of 2 grueling days.
withFilesNotInGit being reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.
Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.
This commit was sponsored by Ryan Newton on Patreon.
Fixes reversion in 8.20200617 that made enabling annex.pidlock result
in some commands stalling, particularly those needing to autoinit.
Renamed runsGitAnnexChildProcess to make clearer where it should be
used.
Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
addurl: Fix reversion in 7.20190322 that made --file not be honored when
youtube-dl was used to download media.
8758f9c561 was on the right track, but missed that | otherwise prevented
the code it added from being used.
Also, refactored out a common function.
This commit was sponsored by Graham Spencer on Patreon.
Part of workTreeItems is trying to detect a case
where git porcelain refuses to process a file, and where
git ls-files silently outputs nothing. But, it's hard to perfectly
replicate git's behavior, and besides, git's behavior could change.
So it could be that we warn, but then git ls-files does not skip over
it, and so git-annex also processes it after warning about it.
So, if we think we have a problem with a parameter, display the warning,
and skip processing it at all.
Implementing this was complicated by needing to handle the case where
all command-line parameters get filtered out this way. Which is
different than the case where there are none, because we don't want to
operate on all files in this new case..
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.
Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
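Roughly, the idea is (a sketch, not the exact code): neutralize a leading
character that would make the name a dotfile, look like an option, or
escape to an absolute path.

sanitizeLeadingFilePathCharacter :: FilePath -> FilePath
sanitizeLeadingFilePathCharacter [] = "file"
sanitizeLeadingFilePathCharacter ('.':rest) = '_':rest -- no dotfiles
sanitizeLeadingFilePathCharacter ('-':rest) = '_':rest -- not option-like
sanitizeLeadingFilePathCharacter ('/':rest) = '_':rest -- not absolute
sanitizeLeadingFilePathCharacter s = s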
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.
This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unnecessary
inAnnex check for every file.
This commit was sponsored by Denis Dzyubenko on Patreon.
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.
So, add back otherwise redundant inAnnex check.
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.
Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.
Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.
This commit was sponsored by Ryan Newton on Patreon.
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.
Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
This removes all calls to inAnnex, except for some involving --batch.
It may be that the batch code could get a similar speedup, but I don't
know if people habitually pass a huge number of files through --batch
that git-annex does not need to do anything to process, so I skipped it
for now.
A few calls to ifAnnexed remain, and might be worth doing more to
convert. In particular, Command.Sync has one that would probably speed
it up by a good amount.
(also removed some dead code from Command.Lock)
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!
But, see the TODO for some caveats.
The cache was removed way back in 2012,
commit 3417c55189
Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".
The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.
Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.
git-annex sync --no-content does not yet use this to do imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.
Known to happen on vfat filesystems, possibly others.
Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.
Apparently that doesn't happen, but will need to keep an eye on it.
checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.
Still try the logged remotes first, because they are far more likely to
have the key.
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.
Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.
Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.
checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess call
waitForProcess, so callers of them will also need to be dealt with, and
have not been yet.
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.
Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.
annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.
Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.
Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enabled when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.
That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.
(Had to split out Remote.List.Util to avoid an import cycle.)
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.
Crash as described in fixed bug report.
waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
exported.
For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.
transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.
As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
Already supported --json, but not that.
Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
One way this can be used is to remove all urls for some website that went
away:
git-annex whereis --format '${file} ${url}\0' | \
grep -z whatever.com | git-annex rmurl --batch -z
Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.
A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?
However, it did let me clean up several places in the code.
This commit was sponsored by Ethan Aubin.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
This commit was sponsored by Graham Spencer on Patreon.
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.
getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.
This commit was sponsored by Ilya Shlyakhter on Patreon.
When storing content on remote fails, always display a reason why.
Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
Finishing work begun in 6952060665
Also, truncate filenames provided by other remotes if they're too long,
when --preserve-filename is not used. That seems to have been omitted
before by accident.
I run haddock with `cabal haddock --executables`. It fails with:
Types/Remote.hs:271:17: error: parse error on input ‘->’
Apparently haddock does not like to find haddock blocks outside of
declarations? In any case, this patch makes these types of errors go
away.
Afterwards, I see errors like these, that need to be investigated as
a next step:
haddock: internal error: internal: extractDecl
CallStack (from HasCallStack):
error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create
* addurl --preserve-filename: New option, uses server-provided filename
without any sanitization, but with some security checking.
Not yet implemented for remotes other than the web.
* addurl, importfeed: Avoid adding filenames with leading '.', instead
it will be replaced with '_'.
This might be considered a security fix, but a CVE seems unwarranted.
It was possible for addurl to create a dotfile, which could change
behavior of some program. It was also possible for a web server to say
the file name was ".git" or "foo/.git". That would not overwrite the
.git directory, but would cause addurl to fail; of course git won't
add "foo/.git".
sanitizeFilePath is too opinionated to remain in Utility, so moved it.
The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:
isDrive will never succeed, because "c:" gets munged to "c_"
".." gets sanitized now
".git" gets sanitized now
It will never be null, because sanitizeFilePath keeps the length
the same, and splitDirectories never returns a null path.
Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
Now the warning gets displayed, which is better than an arcane git error.
The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.
Would still be better if it could merge the synced master branch in this
situation, but making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the synced branch. Which would
bypass git configs like arguably merge.ff and certainly
merge.verifySignatures. So don't want to do that.
Todo item is done at last.
Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..
Factored out a mkTestTree, which can be used to get a TestTree,
w/o needing to first run any annex actions, which the main test suite
cannot do because it does not operate in an annex repo to start with,
and it needs to start testing before a repo is available.
aeca7c2207 exposed this problem, but it
was never a good idea to have a series of test cases, some of which depend on
prior ones, and throw away annex state after each.
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.
(Also importfeed --fast potentially.)
Do not sync with a faster remote that was not specified.
That old behavior was only documented in the changelog, and was certainly
surprising. It also meant adding --fast made it slower..
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.
I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.
get without --from already displayed a "unable to access these remotes"
message, which, while a bit misleading in that the remote is likely
accessible and just doesn't contain the file, at least indicated something
went wrong.
Having get --from display a warning makes it in line with get
w/o --from, so seems certainly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoided displaying a warning
in that case, so this just brings them into line with other remotes.
(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.
What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.
Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unnecessary Maybe in the keysdbhandle, since the handle starts out
closed.
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns something
that we know will be an Object, but the type checker does not know is
kind of annoying.
Eg"core.bare" is the same as "core.bare = true".
Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.
So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
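A hedged sketch of the distinction the config parser has to track, with
illustrative names: a key with no "=" at all versus a key with an
explicit, possibly empty, value.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as S

data ConfigValue
    = ConfigValue S.ByteString -- explicit value; may be empty ("core.bare =")
    | NoConfigValue            -- bare "core.bare" line, no "=" at all

configBool :: ConfigValue -> Maybe Bool
configBool NoConfigValue = Just True -- git: bare key means true
configBool (ConfigValue v)
    | v `elem` ["true", "yes", "on", "1"] = Just True
    | v `elem` ["false", "no", "off", "0", ""] = Just False -- empty is false
    | otherwise = Nothing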
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominates over a MVar read time by
several orders of magnitude.
--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other daemon
modes bypass the optimisation, which would not help them anyway.
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.
Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.
I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
This was originally added so that unannex could prevent the hook from
running while files were in a state that the hook would interpret as
old-style unlocked and so would lock.
Now that's gone, so the only thing the hook was preventing was two
pre-commit processes running simultaneously. But such concurrency
is normal in git-annex and should not be a problem.
It does mean that .git/hooks/pre-commit-annex might run more concurrently,
but that seems the only risk of it causing any problems.
Running `git annex add --force-small` on a modified submodule fails
when the submodule path is fed to hash-object. This failure is
unlikely to be triggered by a caller passing a submodule explicitly to
`git annex add` because there's nothing useful that annex-add can do
with a submodule. A more likely scenario for hitting this failure is
that the caller passes "." or a subdirectory to `annex-add` while a
submodule underneath the specified path happens to be modified.
addSmallOverridden already routes symbolic links through addFile
rather than using the custom hash-object/update-index call. The
latter is valid only for regular files, so extend this condition so
that everything that isn't a regular file goes through addFile. Doing
so avoids the above error because submodules come in as directories.
addSmallOverridden calls getFileStatus and then checks the result with
isSymbolicLink. getFileStatus dereferences symbolic links, so
isSymbolicLink will always return false (assuming the getFileStatus
call doesn't fail on a broken link). Use getSymbolicLinkStatus
instead.
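A small illustration of the difference, assuming only the
System.PosixCompat.Files API (the helper name is made up): getFileStatus
follows symlinks, so isSymbolicLink on its result can never be True, while
getSymbolicLinkStatus stats the link itself.

import System.PosixCompat.Files
    (getSymbolicLinkStatus, isRegularFile, isSymbolicLink)

isUnusualFile :: FilePath -> IO Bool
isUnusualFile f = do
    s <- getSymbolicLinkStatus f -- does not dereference symlinks
    return (isSymbolicLink s || not (isRegularFile s))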
Upgrade other repos than the current one by running git-annex upgrade
inside them, which avoids problems with upgrade code making assumptions
that the cwd will be inside the repo being upgraded.
In particular, this fixes a problem where upgrading a v7 repo to v8 caused
an ugly git error message.
I actually could not find a way to make Upgrade.V7 work properly
without changing directory to the remote. Once I got git ls-files to work,
the git cat-file failed because :path can only be used in the current git
repo.
Remaining things needing to be converted are in the assistant, and Annex.Ssh.
Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
Since it was used on both worktree and .git/annex files, split into
multiple functions.
In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
git-annex config: Only allow configs to be set that git-annex actually
supports reading from repo-global config, to avoid confused users
trying to set other configs with this.
The 'fail' method has been moved to the 'MonadFail' class. I made the changes
so that the code still compiles with previous versions of 'base' that don't
have the new MonadFail class exported by Prelude yet.
* whereis: If a remote fails to report on urls where a key
is located, display a warning, rather than giving up and not displaying
any information.
* When external special remotes fail but neglect to provide an error
message, say what request failed, which is better than displaying an
empty error message to the user.
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
remoteAnnexConfig will avoid bugs like
a3a674d15b
Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
That was added back in 2013 commit 2af652e1b8
and I'm a bit unclear about the reasons.
It seemed that, at the time, receive.denyNonFastforwards=true caused
problems syncing. That setting is the default in a repo created by
git init --shared --bare (but not without --shared), which is what the
assistant did.
But even at the time the bug report showed an error message clearly
explaining that it was a non-fast-forward push being denied.
I tried it with the current version, and since git-annex sync pulls
from the bare repo and merges, it pushes a fast-forward. So there's no
failure to push. (There could be one if another push happened after the
pull, but you'd want it to fail then presumably.)
I'm not 100% sure what changed to make it not be a problem, but I know
I've seen this message in many circumstances and I can't ever recall it
having anything to do with any issue that prevented a push.
Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn,
which showed the problem when syncing from a direct mode repo,
and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment
which seems to show the problem was actually a problem pulling,
I think there's a good chance that the problem actually involved direct
mode.
* Added sync --only-annex, which syncs the git-annex branch and annexed
content but leaves managing the other git branches up to you.
* Added annex.synconlyannex git config setting, which can also be set with
git-annex config to configure sync in all clones of the repo.
Use case is then the user has their own git workflow, and wants to use
git-annex without disrupting that, so they sync --only-annex to get the
git-annex stuff in sync in addition to their usual git workflow.
When annex.synconlyannex is set, --not-only-annex can be used to override
it.
It's not entirely clear what --only-annex --commit or --only-annex
--push should do, and I left that combination not documented because I
don't know if I might want to change the current behavior, which is that
such options do not override the --only-annex. My gut feeling is that
there are no good reasons to use such combinations; if you want to use
your own git workflow, you'll be doing your own committing and pulling
and pushing.
A subtle question is, how should import/export special remotes be handled?
Importing updates their remote tracking branch and merges it into master.
If --only-annex prevented that git branch stuff, then it would prevent
exporting to the special remote, in the case where it has changes that
were not imported yet, because there would be an unresolved conflict.
I decided that it's best to treat the fact that there's a remote tracking
branch for import/export as an implementation detail in this case. The more
important thing is that an import/export special remote is entirely annexed
content, and so it makes a lot of sense that --only-annex will still sync
with it.
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
fsck --from remote: Fix a concurrency bug that could make it incorrectly
detect that content in the remote is corrupt, and remove it, resulting in
data loss.
using git credential to get the password
One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
The use case is basically the user having forgotten, so --help would be
best, but it would be quite hard to include this in --help, since it may
even have to spin up an external special remote program.
I also considered --umm but typoed it the first time I tried it as
--uum, and while memorable, it's too cutesy. --whatelse is good because
it explicitly asks, what other params, besides the ones I've given?
preferreddir can be used with any special remote, so its parser needs to
be included in the commonFieldParsers.
initremote with uuid= changed to delete that field, so it does not
need to be included in commonFieldParsers. Note that, existing remotes
initialized before this change will have the field in remote.log.
This will not cause problems parsing, because the value will be
Accepted.
Grepping for 'Accepted "' found these, and I'm pretty sure this is all of
them.
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log
This does not eliminate every place where a remote config is parsed and a
default value is used if the parse fails. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.
This also lays groundwork for rejecting unknown/typoed config keys.
This avoids hardcoding the sha size, so when git uses sha256, it will
output the full sha256 and not a truncation to 40 characters.
I reviewed git's history, and while there have been some
bugs with commands not supporting --no-abbrev (eg git diff --no-index
--no-abbrev was broken in git 2.1), none of the commands git-annex
uses will be impacted by those old bugs.
Git will eventually switch to sha2 and there will not be one single
shaSize anymore, but two (40 and 64).
Changed all parsers for git plumbing output to support both sizes of
shas.
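Concretely, "support both sizes" amounts to a check like this in the
parsers (a sketch, not the exact code): accept a 40-character sha1 or a
64-character sha256 hex digest rather than hardcoding 40.

import Data.Char (isHexDigit)

isSha :: String -> Bool
isSha s = (length s == 40 || length s == 64) && all isHexDigit s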
One potential problem this does not deal with is, if somewhere in
git-annex it reads two shas from different sources, and compares them
to see if they're the same sha, it would fail if they're sha1 and sha256
of the same value. I don't know if that will really be a concern.
ifAnnexed in a bare repo passes to git cat-file :./filename, which it
refuses to do since the repo is bare.
Note that, reinject somefile someannexedfile in a bare repo silently does
nothing, because someannexedfile is never actually an annexed worktree
file, because the repo is bare.
options make it easier to override annex.largefiles configuration
(and potentially safer as it avoids bugs like the smudge bug fixed
in the last release)
Deleted some old comments that were posted to the man page discussing such
options.
Updated docs that used -c annex.largefiles to use the options.
Note that addSmallOverridden was needed to avoid the clean filter running
on the file. It would be possible to make addFile also update the index
directly, rather than going via git add. However, it was not necessary,
and I want to avoid breaking on some edge case, particularly if the code in
addSmallOverridden has some oversight.
Also, when annex.addunlocked is set and annex.largefiles does not match a file,
git annex add --force-large works, but git status will then show the file
as added, with an unstaged modification. The unstaged modification adds the
file to git. This is identical behavior to using -c annex.largefiles=nothing
when annex.addunlocked is set. This does not prevent committing what was
intended to be added. I have not gotten to the bottom of why git thinks
the file is modified and runs it through the clean filter in this case.
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
The git add behavior changes could be avoided if it turns out to be
really annoying, but then it would need to behave the old way when
annex.dotfiles=false and the new way when annex.dotfiles=true. I'd
rather not have the config option result in such divergent behavior as
`git annex add .` skipping a dotfile (old) vs adding to annex (new).
Note that the assistant always adds dotfiles to the annex.
This is surprising, but not new behavior. Might be worth making it also
honor annex.dotfiles, but I wonder if perhaps some user somewhere uses
it and keeps large files in a directory that happens to begin with a
dot. Since dotfiles and dotdirs are a unix culture thing, and the
assistant users may not be part of that culture, it seems best to keep
its current behavior for now.
* annex.addunlocked can be set to an expression with the same format used by
annex.largefiles, in case you want to default to unlocking some files but
not others.
* annex.addunlocked can be configured by git-annex config.
Added a git-annex-matching-expression man page, broken out from
tips/largefiles.
A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.
Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
e53070c1f quietly made it set the local git config too, but that was never
documented anywhere, and it had surprising results. If I set
annex.largefiles globally in a repo, I would expect to be able to change it
in another repo, and the original repo would get the change and use it,
rather than being stuck on the old value set there.
And, if I have a local annex.largefiles and set a different global default,
I'd be surprised to have my local setting overwritten.
annex.securehashesonly does need to be set locally, since it's a security
feature and the global is only a default until it gets set locally. So it
is special cased.
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.
Benchmarks indicate around 30% speedup in both commands.
Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.
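A minimal sketch of what that buys, assuming filepath-bytestring's
System.FilePath.ByteString module exports RawFilePath and (</>); the helper
below is hypothetical. Path manipulation stays in ByteString form with no
String round trips.

{-# LANGUAGE OverloadedStrings #-}
import System.FilePath.ByteString (RawFilePath, (</>))

-- build a path without ever converting to String
annexObjectDir :: RawFilePath -> RawFilePath
annexObjectDir repo = repo </> ".git" </> "annex" </> "objects"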
Git.Repo also changed to use RawFilePath for the path to the repo.
This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipeline.
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.
Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
Finally builds (oh the agony of making it build), but still very
unmergeable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.
Benchmarking vs master, this git-annex find is significantly faster!
Specifically:
num files   old     new     speedup
48500       4.77    3.73    28%
12500       1.36    1.02    66%
20          0.075   0.074   0% (so startup time is unchanged)
That's without really finishing the optimization. Things still to do:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
It's likely several of those will speed up git-annex find further.
And other commands will certainly benefit even more.
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
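A hedged sketch of the cached-serialization idea; the field and function
names are illustrative, not git-annex's actual ones. The cached form lives
alongside the other fields, and the smart constructor is the only way to
build a Key, so the cache cannot get out of sync.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

data Key = Key
    { keyName :: S.ByteString
    , keySize :: Maybe Integer
    , keySerialization :: S.ByteString -- cache, maintained by mkKey
    }

serializeKeyData :: S.ByteString -> Maybe Integer -> S.ByteString
serializeKeyData name msize =
    name <> maybe "" (\sz -> S8.pack ("-s" ++ show sz)) msize

-- smart constructor: callers never build Key directly, so the cached
-- serialization always matches the other fields
mkKey :: S.ByteString -> Maybe Integer -> Key
mkKey name msize = Key name msize (serializeKeyData name msize)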
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
* benchmark: Changed --databases to take a parameter specifying the size
of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
benchmark to really add new values, not overwriting old values.
Convert Utility.Url to return Either String so the error message can be
displayed in the annex monad and so captured.
(When curl is used, its errors are still not caught.)
Rescued from commit 11d6e2e260 which removed
db benchmarks in favor of benchmarking arbitrary git-annex commands. Which
is nice and general, but microbenchmarks are useful too.
Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)
Currently left at true; some users are encouraging it be set to false.
Renamed unlocked files are now detected, and will always be
annexed, unless annex.largefiles disallows it.
This allows for git add's behavior to later be changed to otherwise
not annex files (whether by default or as a config option), without
worrying about the rename case.
This is not a major behavior change; annexing is still the default. But
there is one case where the behavior is changed, I think for the better:
touch f
git -c annex.largefiles=nothing add f
git add bigfile
git commit -m ...
mv bigfile f
git add f
Before, git-annex would see that f was previously not annexed,
and so the renamed bigfile content gets added to git. Now, it notices
that the inode is the one that bigfile used, and so it annexes it.
This potentially slows down git add a lot in some repositories because
of the poor performance of isInodeKnown when there are a lot of unlocked
files. Configuring annex.largefiles avoids the speed hit.