git-annex

Author	SHA1	Message	Date
Joey Hess	9d3ce224e3	uninit edge cases * uninit: Avoid error message when no commits have been made to the repository yet. * uninit: Avoid error message when there is no git-annex branch. Sponsored-by: Svenne Krap on Patreon	2021-11-08 16:47:00 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	07158b7cf6	shorten synopsis This is to avoid the display being too wide.	2021-11-04 14:33:07 -04:00
Joey Hess	438e5b56aa	tighter --json parsing for metadata metadata --batch --json: Reject input whose "fields" does not consist of arrays of strings. Such invalid input used to be silently ignored. Used to be that parseJSON for a JSONActionItem ran parseJSON separately for the itemAdded, and if that failed, did not propagate the error. That allowed different items with differently named fields to be parsed. But it was actually only used to parse "fields" for metadata, so that flexability is not needed. The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed only because of the wonky way Command.MetaData adds onto the started json object. Note that this line got a dummy type signature added, just because the type checker needs it to be some type. itemFields = Nothing :: Maybe Bool Since it's Nothing, it doesn't really matter what type it is, and the value gets turned into json and is then thrown away. Sponsored-by: Kevin Mueller on Patreon	2021-11-01 14:42:37 -04:00
Joey Hess	80f1354685	metadata --batch: Avoid crashing when a non-annexed file is input Turns out that CommandStart actions do not have their exceptions caught, which is why the giveup was causing a crash. Mostly these actions do not do very much work on their own, but it does seem possible there are other commands whose CommandStart also throws an exception. So, my first attempt at a fix was to catch those exceptions. But, --json-error-messages then causes a difficulty, because in order to output a json error message, an action needs to have been started; that sets up the json object that the error message will be included in a field of. While it would be possible to output an object with just an error field, this would be json output of a format that the user has no reason to expect, that happens only in an exceptional circumstance. That is something I have always wanted to avoid with the json output; while git-annex man pages don't document what the json looks like, the output has always been made to be self-describing. Eg, it includes "error-messages":[] even when there's no errors. With that ruled out, it doesn't seem a good idea to catch CommandStart exceptions and display the error to stderr when --json-error-messages is set. And so I don't know if it makes sense to catch exceptions from that at all. Maybe I'd have a different opinion if --json-error-messages did not exist though. So instead, output a blank line like other batch commands do. This also leaves open the possibility of implementing support for matching object with metadata --json, which would also want to output a blank line when the input didn't match. Sponsored-by: Dartmouth College's DANDI project	2021-11-01 13:40:43 -04:00
Joey Hess	eb95ed4863	fix addurl concurrency issue addurl: Support adding the same url to multiple files at the same time when using -J with --batch --with-files. Implementation was easier than expected, was able to reuse OnlyActionOn. While it will download the url's content multiple times, that seems like the best thing to do; see my comment for why. Sponsored-by: Dartmouth College's DANDI project	2021-10-27 16:15:41 -04:00
Joey Hess	887edeb1ad	avoid warning when built with unix-compat 0.5.3 It re-exports modificationTimeHiRes, and provides a windows version. Might be worth using that windows version eventually, but I have not tested it.	2021-10-18 16:25:28 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor innefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	b9a1cc512d	avoid uncessary call to inAnnex sync --content: Avoid a redundant checksum of a file that was incrementally verified, when used on NTFS and perhaps other filesystems. When sync has just gotten the content, it does not need to check inAnnex a second time. On NTFS, for some reason the write of the inode cache after it gets the content is not immediately able to be read, and with an empty/non-matching inode cache due to that stale data, inAnnex falls back to hashing the whole object to determine if it's present. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 12:02:35 -04:00
Joey Hess	9ea8106bb0	sped up git-annex smudge --clean by 25% Disabling git-annex branch update for this command is ok, because it does not use any information from the branch, but only logs the location when it adds a key. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 14:15:20 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100ks/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granulatity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't support for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose displays blocks the bandwidth using thread. But I don't think there are actually any that run a separate thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	ec12537774	defer write permissions checking in import until after copy to repo This should complete the fix started in `6329997ac4`, fixing the actual cause of the test suite failure this time. Sponsored-by: Dartmouth College's Datalad project	2021-09-02 13:45:21 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	a99a84f342	add: Detect when xattrs or perhaps ACLs prevent locking down a file's content And fail with an informative message. I don't think ACLs can prevent removing the write bit, but I'm not sure, so kept it mentioning them as a possibility. Should git-annex lock also check if the write bits are able to be removed? Maybe, but the case I know about with xattrs involves cp -a copying NFS xattrs, and it's the copy of the file that is the problem. So when locking a file, I guess it will not be the copy. Sponsored-by: Dartmouth College's Datalad project	2021-08-27 14:33:01 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	f9b92c81f6	unused: Skip the refs/annex/last-index ref that git-annex recently started creating This was unlikely to cause any problem, but it is unsightly to mention normally hidden refs, and it might have done a bit of unnecessary work to check that ref. Sponsored-by: Noam Kremen on Patreon	2021-08-24 12:58:14 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	1acdd18ea8	deal better with clock skew situations, using vector clocks * Deal with clock skew, both forwards and backwards, when logging information to the git-annex branch. * GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1) rather than needing to be advanced each time a new change is made. * Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex. When changing a file in the git-annex branch, the vector clock to use is now determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK when set), and comparing it to the newest vector clock already in use in that file. If a newer time stamp was already in use, advance it forward by a second instead. When the clock is set to a time in the past, this avoids logging with an old timestamp, which would risk that log line later being ignored in favor of "newer" line that is really not newer. When a log entry has been made with a clock that was set far ahead in the future, this avoids newer information being logged with an older timestamp and so being ignored in favor of that future-timestamped information. Once all clocks get fixed, this will result in the vector clocks being incremented, until finally enough time has passed that time gets back ahead of the vector clock value, and then it will return to usual operation. (This latter situation is not ideal, but it seems the best that can be done. The issue with it is, since all writers will be incrementing the last vector clock they saw, there's no way to tell when one writer made a write significantly later in time than another, so the earlier write might arbitrarily be picked when merging. This problem is why git-annex uses timestamps in the first place, rather than pure vector clocks.) Advancing forward by 1 second is somewhat arbitrary. setDead advances a timestamp by just 1 picosecond, and the vector clock could too. But then it would interfere with setDead, which wants to be overrulled by any change. So it could use 2 picoseconds or something, but that seems weird. It could just as well advance it forward by a minute or whatever, but then it would be harder for real time to catch up with the vector clock when forward clock slew had happened. A complication is that many log files contain several different peices of information, and it may be best to only use vector clocks for the same peice of information. For example, a key's location log file contains InfoPresent/InfoMissing for each UUID, and it only looks at the vector clocks for the UUID that is being changed, and not other UUIDs. Although exactly where the dividing line is can be hard to determine. Consider metadata logs, where a field "tag" can have multiple values set at different times. Should it advance forward past the last tag? Probably. What about when a different field is set, should it look at the clocks of other fields? Perhaps not, but currently it does, and this does not seems like it will cause any problems. Another one I'm not entirely sure about is the export log, which is keyed by (fromuuid, touuid). So if multiple repos are exporting to the same remote, different vector clocks can be used for that remote. It looks like that's probably ok, because it does not try to determine what order things occurred when there was an export conflict. Sponsored-by: Jochen Bartl on Patreon	2021-08-04 12:33:46 -04:00
Joey Hess	b3c4579c79	work around strange auto-init bug git-annex get when run as the first git-annex command in a new repo did not populate unlocked files. (Reversion in version 8.20210621) I am not entirely happy with this, because I don't understand how `428c91606b` caused the problem in the first place, and I don't fully understand how skipping calling scanAnnexedFiles during autoinit avoids the problem. Kept the explicit call to scanAnnexedFiles during git-annex init, so that when reconcileStaged is expensive, it can be made to run then, rather than at some later point when the information is needed. Sponsored-by: Brock Spratlen on Patreon	2021-07-30 18:36:03 -04:00
Joey Hess	748addbe05	remove second pass in scanAnnexedFiles The pass was needed to populate files when annex.thin was set, but in commit `73e0cbbb19`, reconcileStaged started to do that. So, this second pass is not needed any longer.	2021-07-30 17:46:11 -04:00
Joey Hess	d2aead67bd	fsck: Detect and correct stale or missing inode caches for object files An easy way to see this in action is to have an unlocked file, and touch the object file. While all code that compares inode caches for object files needs to be prepared for this kind of problem and fall back to verification, having fsck notice it and correct it is cheap (as long as fsck is being run anyway) and ensures that if it happens for some unusual reason, there's a way for the user to notice that it's happening. Not that, when annex.thin is in use, the earlier call to isUnmodified (and also potentially earlier calls to inAnnex in eg, verifyLocationLog) will fix up the same problem silently. That might prevent the warning being displayed, although probably it still will be, because the Database.Keys write of the InodeCache will be queued but will not have happened yet. I can't see a way to improve this, but it's not great. Sponsored-by: Dartmouth College's Datalad project	2021-07-29 14:06:42 -04:00
Joey Hess	817ccbbc47	split verifyKeyContent This avoids it calling enteringStage VerifyStage when it's used in places that only fall back to verification rarely, and which might be called while in TransferStage and be going to perform a transfer after the verification.	2021-07-29 13:58:40 -04:00
Joey Hess	3c5280b1cf	improve comment wording	2021-07-29 13:21:23 -04:00
Joey Hess	72a13d2a5f	remove unused parameter	2021-07-29 13:12:11 -04:00
Joey Hess	14683da9eb	fix potential race in updating inode cache Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was already safe, but others use it directly on the work tree file, which was race-prone. Eg, if the work tree file was first removed, then linkFromAnnex called to populate it, the user could have re-written it in the interim. This came to light during an audit of all calls of addInodeCaches, looking for such races. All the other uses of it seem ok. Sponsored-by: Brett Eisenberg on Patreon	2021-07-27 13:08:08 -04:00
Joey Hess	e4b2a067e0	fix potential race in updating inode cache In Annex.Content, the object file was statted after pointer files were populated. But if annex.thin is set, once the pointer files are populated, the object file can potentially be modified via the hard link. So, it was possible, though seemingly very unlikely, for the inode of the modified object file to be cached. Command.Fix and Command.Fsck had similar problems, statting the work tree files after they were in place. Changed them to stat the temp file that gets moved into place. This does rely on .git/annex being on the same filesystem. If it's not, the cached inode will not be the same as the one that the temp file gets moved to. Result will be that git-annex will later need to do an expensive verification of the content of the worktree files. Note that the cross-filesystem move of the temp file already is a larger amount of extra work, so this seems acceptable. Sponsored-by: Luke Shumaker on Patreon	2021-07-27 12:29:10 -04:00
Joey Hess	4637592325	fix a place where the inode cache could potentially have gotten stale When git-annex lock repopulates the object file by copying an associated file that still has its content, it negected to update the inode cache. I was not able to actually get this code to successfully repopulate the object file; the associated file gets replaced with a dangling pointer before unlock is able to do that. (By what I'm not sure.. reconcileStaged?) Which might be itself a bug, but anyway this makes me doubtful that this was really leading to a stale inode cache. Still, in case there is some situation in which it does work, fixed it to update the inode cache.	2021-07-26 14:12:58 -04:00
Joey Hess	3d50b47ded	sync, merge: Added --allow-unrelated-histories option Which is the same as the git merge option. After last commit, this turns out to be needed in the test suite, and when doing git-annex import from special remote, followed by a git-annex merge. Sponsored-by: Svenne Krap on Patreon	2021-07-19 12:14:26 -04:00
Joey Hess	b6bea0d3f2	remove direct mode remnant of merging unrelated histories sync, merge, post-receive: Avoid merging unrelated histories, which used to be allowed only to support direct mode repositories. (However, sync does still merge unrelated histories when importing trees from special remotes, and the assistant still merges unrelated histories always.) See `556b2ded2b` for why this was added back in 2016, for direct mode. This is a behavior change, which might break something that was relying on sync merging unrelated histories, but git had a good reason to prevent it, since it's easy to foot shoot with it, and git-annex should follow suit. Sponsored-by: Noam Kremen on Patreon	2021-07-19 11:41:26 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their ususual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	274d2380c7	better key matching with a regexp Handles keys that are substrings of other keys, as well as pointer files that contain a newline after the key. Note that -S does not match regexp, while -G does by default. Docs are not clear, determined experimentally. The only other difference in changing to -G is that if a file used to contain the key and changed in some way, while still containing the key, -G will match and -S would not. So eg, annex links that git annex fix rewrites will match, and files that change lock status will match. Which is an improvement anyway. Sponsored-by: Jochen Bartl on Patreon	2021-07-14 16:31:17 -04:00
Joey Hess	7a46bb1b28	change message to suggest using whereused --historical	2021-07-14 16:08:47 -04:00
Joey Hess	d6f056eca3	have whereused also check the reflog Since the stash is part of that, it can also find stashed content. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-07-14 16:05:20 -04:00
Joey Hess	fcd1b93a7d	whereused --historical Does not check the reflog, but otherwise works. It's possible for it to display something that is not an annexed file, if a non-annexed file somehow ends up containing something that looks like the key's name. This seems very unlikely to happen, and it would add a lot of complexity to detect it and somehow skip over that file, since the git log would need to either be run again, or not limited to 1 result and canceled once enough results have been read. Also, it kind of seems ok, if a file refers to a key, to consider that as a place the key was used, for some definition of used. So, I punted on dealing with that. May revisit later. Sponsored-by: Brock Spratlen on Patreon	2021-07-14 15:38:28 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	b9db859221	addurl: Avoid crashing when used on beegfs. Sponsored-by: Dartmouth College's DANDI project	2021-07-05 13:02:40 -04:00
Joey Hess	b8e32e200e	addurl, importfeed: Added --no-raw option Forces eg, download with youtube-dl without falling back to raw download. Since youtube-dl failing due to an url not being supported is difficult to distinguish from it failing due to being blocked in some way, this can be useful to avoid the fallback of git-annex downloading the raw web page and adding that. Since --raw also prevents using special remotes, --no-raw also allows special remote downloads. Although it's always possible that some special remote may claim an url and fall back to raw download of the content, which --no-raw cannot prevent. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-06-27 11:14:51 -04:00
Joey Hess	3a14648142	dropping unused marks as dead Dropping an object with drop --unused or dropunused will mark it as dead, preventing fsck --all from complaining about it after it's been dropped from all repositories. If another repository still has a copy, it won't be treated as dead until it's also dropped from there. The drop has to use --unused, can't be --key or something else, because this indicates that the user has recently ran git-annex unused. If it checked the unused log on every drop, bad things would happen when the unused log was out of date, eg a file used to be unused but then got re-added. Marking such a file as dead could be confusing. When the user uses --unused/dropunused, they must consider the unused information to be up-to-date. The particular workflow this enables is: git annex add foo git annex unannex foo git annex unused git annex drop --unused / dropunused git annex fsck --all # no warnings The docs for git-annex unannex say to use git-annex unused and dropunused, so the user should be pointed in this direction when they want to undo an accidental add. Sponsored-by: Brock Spratlen on Patreon	2021-06-25 15:22:26 -04:00
Joey Hess	1cc7b2661e	push synced/master before synced/git-annex sync: Partly work around github behavior that first branch to be pushed to a new repository is assumed to be the head branch, by not pushing synced/git-annex first. github expects master (or whatever the name is) to be pushed first, but git-annex sync can't, because it's got to also support pushes to non-bare repos where pushing master fails, as explained in the big comment. So pushing synced/master is not entirely a fix, but at least it makes github default to a branch with the stuff the user expects in it, not a bunch of annex log files. Aside from fixing github to not make this assumption, or improving the git push protocol to include what the current HEAD is, the only other approach I can think of is to identify git push's progress messages and display those when pushing master, while filtering out error messages about non-fast-forward etc. But git doesn't provide a way to separate out or identify its progress messages. Sponsored-by: Luke Shumaker on Patreon	2021-06-21 12:32:21 -04:00
Joey Hess	d2be68907c	drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies Eg, before with a .gitattributes like: .2 annex.numcopies=2 .1 annex.numcopies=1 And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2 would succeed, leaving just 1 copy, despite foo.2 needing 2 copies. It dropped foo.1 first and then skipped foo.2 since its content was gone. Now that the keys database includes locked files, this longstanding wart can be fixed. Sponsored-by: Noam Kremen on Patreon	2021-06-15 11:38:44 -04:00
Joey Hess	d164434679	fix build	2021-06-15 11:14:43 -04:00
Joey Hess	b3712b6047	refactor	2021-06-15 10:27:33 -04:00
Joey Hess	78da00c7a6	Future proof activity log parsing When the log has an activity that is not known, eg added by a future version of git-annex, it used to be treated as no activity at all, which would make git-annex expire think it should expire the repository, despite it having some kind of recent activity. Hopefully there will be no reason to add a new activity until enough time has passed that this commit is in use everywhere. Sponsored-by: Jake Vosloo on Patreon	2021-06-14 14:18:19 -04:00
Joey Hess	13b9a288d3	scanAnnexedFiles in smudge --update This makes git checkout and git merge hooks do the work to catch up with changes that they made to the tree. Rather than doing it at some later point when the user is not thinking about that past operation. Sponsored-by: Dartmouth College's Datalad project	2021-06-08 11:37:47 -04:00
Joey Hess	cedc28a783	prevent dropping required content of other file using same content When two files have the same content, and a required content expression matches one but not the other, dropping the latter file will fail as it would also remove the content of the required file. This will slow down drop (w/o --auto), dropunused, mirror, and move, by one keys db lookup per file. But I did include an optimisation to avoid a double db lookup in the drop --auto / sync --content case. I suspect that dropunused could also use PreferredContentChecked True, but haven't entirely thought it through and it's rarely used with enough files for the optimisation to matter. Sponsored-by: Dartmouth College's Datalad project	2021-05-25 11:34:06 -04:00
Joey Hess	a56b151f90	fix longstanding indeterminite preferred content for duplicated file problem * drop: When two files have the same content, and a preferred content expression matches one but not the other, do not drop the file. * sync --content, assistant: Fix an edge case where a file that is not preferred content did not get dropped. The sync --content edge case is that handleDropsFrom loaded associated files and used them without verifying that the information from the database was not stale. It seemed best to avoid changing --want-drop's behavior, this way when debugging a preferred content expression with it, the files matched will still reflect the expression. So added a note to the --want-drop documentation, to make clear it may not behave identically to git-annex drop --auto. While it would be possible to introspect the preferred content expression to see if it matches on filenames, and only look up the associated files when it does, it's generally fairly rare for 2 files to have the same content, and the database lookup is already avoided when there's only 1 file, so I did not implement that further optimisation. Note that there are still some situations where the associated files database does not get locked files recorded in it, which will prevent this fix from working. Sponsored-by: Dartmouth College's Datalad project	2021-05-24 14:07:05 -04:00
Joey Hess	428c91606b	include locked files in the keys database associated files Before only unlocked files were included. The initial scan now scans for locked as well as unlocked files. This does mean it gets a little bit slower, although I optimised it as well as I think it can be. reconcileStaged changed to diff from the current index to the tree of the previous index. This lets it handle deletions as well, removing associated files for both locked and unlocked files, which did not always happen before. On upgrade, there will be no recorded previous tree, so it will diff from the empty tree to current index, and so will fully populate the associated files, as well as removing any stale associated files that were present due to them not being removed before. reconcileStaged now does a bit more work. Most of the time, this will just be due to running more often, after some change is made to the index, and since there will be few changes since the last time, it will not be a noticable overhead. What may turn out to be a noticable slowdown is after changing to a branch, it has to go through the diff from the previous index to the new one, and if there are lots of changes, that could take a long time. Also, after adding a lot of files, or deleting a lot of files, or moving a large subdirectory, etc. Command.Lock used removeAssociatedFile, but now that's wrong because a newly locked file still needs to have its associated file tracked. Command.Rekey used removeAssociatedFile when the file was unlocked. It could remove it also when it's locked, but it is not really necessary, because it changes the index, and so the next time git-annex run and accesses the keys db, reconcileStaged will run and update it. There are probably several other places that use addAssociatedFile and don't need to any more for similar reasons. But there's no harm in keeping them, and it probably is a good idea to, if only to support mixing this with older versions of git-annex. However, mixing this and older versions does risk reconcileStaged not running, if the older version already ran it on a given index state. So it's not a good idea to mix versions. This problem could be dealt with by changing the name of the gitAnnexKeysDbIndexCache, but that would leave the old file dangling, or it would need to keep trying to remove it.	2021-05-21 16:24:37 -04:00
Joey Hess	24c7d9ba78	decided not to include export/import trees They're only needed to cover a gc edge case, and it's better someone gets caught by that edge case than that someone who does not know about them ends up with a filtered git-annex branch that contains such a tree when some of the files listed in it are ones they wanted to remove from the repository.	2021-05-17 14:12:15 -04:00
Joey Hess	2420910ab8	include info for sameas repos It's not currently possible to exclude a sameas repo using its annex-config-uuid. (Remote.nameToUUID rejects them). Since there's no real documented way to learn those, this seems ok, at least for now. Also it avoids the problem of someone excluding the parent but including the sameas, which would probably make the sameas repo not usable when using the filtered branch.	2021-05-17 14:04:14 -04:00
Joey Hess	984034f335	filter-branch working aside from some edge cases Added a note to man page about what happens to information that is recorded in the private journal. Since it uses Branch.get, that information will be copied when options allow. It seemed better to allow it and document it than not allow it, since the options allow excluding repositories and so can be used to exclude private repos if desired.	2021-05-17 13:24:58 -04:00
Joey Hess	1da9fe5bd8	implemented filter-branch for key info Not tested yet but should work. Noted a possible optimisation, which should probably be added, to speed it up in cases where there is no uuid filtering being done. It would need Annex.Branch to add a function like getRef that uses catFileDetails, so the sha is also returned. The difficulty would be making it support the precached file content; if it didn't it would probably not be any faster and could even be slower. So probably the precaching would need to be changed to also cache the sha.	2021-05-17 11:11:39 -04:00
Joey Hess	80a9944f3b	don't implicitly include all when exclude options are used This is less erorr-prone, and easier for the user to reason about; it preserves the man page's promise that only explicitly included information will be copied.	2021-05-14 14:14:46 -04:00
Joey Hess	a58c90ccf4	skeleton of filter-branch command, with option parser	2021-05-14 10:59:48 -04:00
Joey Hess	947d2a10bc	assistant: Fix a crash on startup by avoiding using forkProcess ghc 8.8.4 seems to have changed something that broke code that has been successfully using forkProcess since 2012. Likely a change to GC internals. Since forkProcess has never had clear documentation about how to use it safely, avoid using it at all. Instead, when git-annex needs to daemonize itself, re-run the git-annex command, in a new process group and session. This commit was sponsored by Luke Shumaker on Patreon.	2021-05-12 15:08:03 -04:00
Joey Hess	949627b902	remove inode cache in unannex Similar to what commit `675556fd9a` did for adding a non-annexed file, this prevents the smudge clean filter recognising the inode if git add is later run on the unannexed file.	2021-05-12 11:09:38 -04:00
Joey Hess	675556fd9a	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache. git-annex add was changed, when adding a small file, to remove the inode cache for it. This is necessary to keep the recipe in doc/tips/largefiles.mdwn for converting from annex to git working. It also avoids bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn which the earlier try at this change introduced.	2021-05-10 13:20:10 -04:00
Joey Hess	72a8bbce12	Revert "smudge: check for known annexed inodes before checking annex.largefiles" This reverts commit `424bef6b6f`. This commit caused other buggy behavior unfortunately.	2021-05-10 12:20:13 -04:00
Joey Hess	921753ac44	reinject: Error out when run on a file that is not annexed rather than silently skipping it	2021-05-07 13:31:03 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down all the cases where the paths are already ok with an unncessary call to getCurrentDirectory, put in an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	1bd44c7742	Merge remote-tracking branch 'atemu/misc-fixes'	2021-05-07 11:23:54 -04:00
Kyle Meyer	4450fe3629	fromkey: create directory for pointer files too fromkey creates leading directories for symbolic links. Do the same for pointer files.	2021-05-07 11:10:06 -04:00
Atemu	a7f0014a53	Command/Multicast: use proper hyphen GHC was complaining about it possibly being a homoglyph: Command/Multicast.hs:111:36: error: warning: treating Unicode character <U+2212> as identifier character rather than as '-' symbol [-Wunicode-homoglyph] -- using a nice prime, namely 2521−1 but the sheer size of this ^ \| 111 \| -- using a nice prime, namely 2521−1 but the sheer size of this \| ^ 1 warning generated.	2021-05-04 05:44:31 +02:00
Joey Hess	424bef6b6f	smudge: check for known annexed inodes before checking annex.largefiles smudge: Fix a case where an unlocked annexed file that annex.largefiles does not match could get its unchanged content checked into git, due to git running the smudge filter unecessarily. When the file has the same inodecache as an already annexed file, we can assume that the user is not intending to change how it's stored in git. Note that checkunchangedgitfile already handled the inverse case, where the file was added to git previously. That goes further and actually sha1 hashes the new file and checks if it's the same hash in the index. It would be possible to generate a key for the file and see if it's the same as the old key, however that could be considerably more expensive than sha1 of a small file is, and it is not necessary for the case I have, at least, where the file is not modified or touched, and so its inode will match the cache.	2021-05-03 13:26:32 -04:00
Joey Hess	4588668a12	fromkey unlocked files support fromkey: Create an unlocked file when used in an adjusted branch where the file should be unlocked, or when configured by annex.addunlocked. There is some overlap with code in Annex.Ingest, however it's not quite the same because ingesting has a temp file with the content, where here the content, if any, is in the annex object file. So it eg, makes sense for Annex.Ingest to copy the execute mode of the content file, but it does not make sense for fromkey to do that. Also changed in passing to stage the file in git directly, rather than using git add. One consequence of that is that if the file is gitignored, it will still get added, rather than the old behavior: The following paths are ignored by one of your .gitignore files: ignored hint: Use -f if you really want to add them. hint: Turn this message off by running hint: "git config advice.addIgnoredFile false" git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123) That old behavior was a surprise to me, and so I consider it a bug, and doubt anyone would have relied on it. Note that, when on an --hide-missing branch, it is possible to fromkey a key that is not present (needs --force). The annex link or pointer file still gets written in this case. It doesn't seem to make any sense not to write it, because then fromkey would not do anything useful in this case, and this way the file can be committed and synced to master, and the branch re-adjusted to hide the new missing file. This commit was sponsored by Noam Kremen on Patreon.	2021-05-03 11:26:18 -04:00
Joey Hess	2b264b3edf	initremote --private	2021-04-23 14:47:46 -04:00
Joey Hess	d5a05655b4	Merge branch 'master' into hiddenannex	2021-04-23 13:06:33 -04:00
Joey Hess	0547884eb2	importfeed: fix bug while also speeding up 12x! * Fix bug that could make git-annex importfeed not see recently recorded state when configured with annex.alwayscommit=false. * importfeed: Made "checking known urls" phase run 12 times faster. The massive speedup is because it no longer queries for metadata accompanying each url. Instead it processes the whole git-annex branch and checks all metadata files for feed item ids, and uses any it finds. This could result in a behavior change, in an unlikely situation: If a feed id is recorded in a key's metadata, but the url gets removed, the old code would not see that item id and would re-download it if it finds an url for it in a feed, while the new code will see the item id. I don't think the old behavior was intentional, and it may be that the new behavior is better. Not gonna worry about this.	2021-04-23 12:36:56 -04:00
Joey Hess	b689f17062	refactoring	2021-04-23 11:44:10 -04:00
Joey Hess	da0a696c96	Revert "reorder another test" This reverts commit `3e63f00f63`.	2021-04-23 01:01:29 -04:00
Joey Hess	3e63f00f63	reorder another test continuing to try to narrow down cause of failure on windows	2021-04-22 10:03:35 -04:00
Joey Hess	9b870e29fd	Merge branch 'master' into hiddenannex	2021-04-21 13:04:40 -04:00
Joey Hess	39d94919cd	reorder tests debugging windows failure This order will work just as well, so no need to revert this change later.	2021-04-21 13:01:41 -04:00
Joey Hess	05989556a2	start implementing hidden git-annex repositories This adds a separate journal, which does not currently get committed to an index, but is planned to be committed to .git/annex/index-private. Changes that are regarding a UUID that is private will get written to this journal, and so will not be published into the git-annex branch. All log writing should have been made to indicate the UUID it's regarding, though I've not verified this yet. Currently, no UUIDs are treated as private yet, a way to configure that is needed. The implementation is careful to not add any additional IO work when privateUUIDsKnown is False. It will skip looking at the private journal at all. So this should be free, or nearly so, unless the feature is used. When it is used, all branch reads will be about twice as expensive. It is very lucky -- or very prudent design -- that Annex.Branch.change and maybeChange are the only ways to change a file on the branch, and Annex.Branch.set is only internal use. That let Annex.Branch.get always yield any private information that has been recorded, without the risk that Annex.Branch.set might be called, with a non-private UUID, and end up leaking the private information into the git-annex branch. And, this relies on the way git-annex union merges the git-annex branch. When reading a file, there can be a public and a private version, and they are just concacenated together. That will be handled the same as if there were two diverged git-annex branches that got union merged.	2021-04-20 15:04:53 -04:00
Joey Hess	5783a8d081	fsck: avoid redundant checksum when transfer is Verified When downloading content from a remote, if the content is able to be verified during the transfer, skip checksumming it a second time. Note that in this case, the fsck output does not include "(checksum)" which it does when the checksumming is done separately from the download. This commit was sponsored by Brock Spratlen on Patreon.	2021-04-14 13:22:54 -04:00
Joey Hess	805d325a8d	diffdriver: Support unlocked files	2021-04-08 14:32:09 -04:00
Joey Hess	13c090b37a	use fastDebug everywhere it can be used None of these are likely to yeild a noticable speedup though.	2021-04-06 15:41:24 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	aaba83795b	switch from hslogger to purpose-built Utility.Debug This uses a DebugSelector, rather than debug levels, which will allow for a later option like --debug-from=Process to only see debuging about running processes. The module name that contains the thing being debugged is used as the DebugSelector (in most cases; does not need to be a hard and fast rule). Debug calls were changed to add that. hslogger did not display that first parameter to debugM, but the DebugSelector does get displayed. Also fastDebug will allow doing debugging in places that are used in tight loops, with the DebugSelector coming from the Annex Reader essentially for free. Not done yet.	2021-04-05 13:40:31 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and the feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outwweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	c68ba7d893	whereis: Don't include yt: prefix when showing url to content retrieved with youtube-dl I don't think this was really intentional behavior. It may be that it was useful to include it so it could be passed to rmurl, since without it rmurl would not actually remove the url. Since that was changed earlier today, now seems like a good time to clean up the display of these urls. This commit was sponsored by Jochen Bartl on Patreon.	2021-03-22 19:56:24 -04:00
Joey Hess	637229c593	fix fsck --from --all to not fall over trying to check required content fsck: When --from is used in combination with --all or similar options, do not verify required content, which can't be checked properly when operating on keys. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-03-22 15:08:07 -04:00
Joey Hess	0af9d1dcb6	unregisterurl: remove all forms of an url, no matter what the downloader is set to unregisterurl: Fix a bug that caused an url to not be unregistered when it is claimed by a special remote other than the web. See commit `f175d4cc90` for rationalle.	2021-03-22 12:17:17 -04:00
Joey Hess	f175d4cc90	rmurl: remove all forms of an url, no matter what the downloader is set to * rmurl: When youtube-dl was used for an url, it no longer needs to be prefixed with "yt:" in order to be removed. * rmurl: If an url is both used by the web and also claimed by another special remote, fix a bug that caused the url to to not be removed. The youtube-dl change is a consequence of how the bug fix is implemented. But I also think it's the right thing to do. Consider that, before, git-annex addurl $url followed by git-annex rmurl $url would not remove the url in the case where youtube-dl was used. That was surprising behavior. In the unlikely case where a special remote claims an url, and it's been added using OtherDownloader, but it was also added already as a web url, it seems better for rmurl to remove both than to arbitrarily remove only one. And in the case the bug report was filed for, when an url was added as a web url, but a special remote now claims it, that should not prevent rmurl removing the web url. Calling setUrlMissing lets other callers of it behave differently. Probably the calls to it in eg, Remote.External and Remote.BitTorrent are fine, since they don't mangle the url and just remove what was provided, and the OtherDownloader form of a bittorrent url, respectively. I suspect unregisterurl needs to have a similar change made to rmurl, for similar reasons.	2021-03-22 12:09:15 -04:00
Joey Hess	8bae692486	better interface for catKey' It only needs the size, so don't require the other stuff. Should let it be used in more places, making things faster.	2021-03-16 14:52:23 -04:00
Joey Hess	6481991208	export --json: Fill in the file field Like import was using ActionItemWorkTreeFile, it's ok to use it for export, even though it might not correspond with a file in the work tree. And renamed it to ActionItemTreeFile to make that clearer. Note that when an export has to rename files, it still uses ActionItemOther, so file will still be null in that case, but as no file is being transferred, that seems ok.	2021-03-12 14:11:31 -04:00
Joey Hess	1cb154f457	avoid importing deleting submodule import: When the previously exported tree contained a submodule, preserve it in the imported tree so it does not get deleted. The export exclude log, which was used for non-preferred content, now also includes the submodules. Since the log format is git ls-tree output, this does not break backwards compatibility.	2021-03-12 13:31:21 -04:00
Joey Hess	f2a425bd92	export: When a submodule is in the tree to be exported, skip it.	2021-03-12 12:29:18 -04:00
Joey Hess	4fc5dbc942	update comment	2021-03-11 12:03:36 -04:00
Joey Hess	cdd512cd9f	simplify	2021-03-05 14:22:04 -04:00
Joey Hess	1b041f5c51	avoid logging location of GIT keys It's not necessary to log location of GIT keys, because these files are not annexed files and so git-annex will never need to get them. This corresponds to code in Annex.Import that already checked before updating the location log when handling deleted files. Older versions of git-annex that used SHA1 keys for non-annexed files also unncessarily updated the location log for them. GIT keys still appear in the git-annex branch for content identifier logs, so kept the documentation of them in backends.mdwn This commit was sponsored by Jake Vosloo on Patreon.	2021-03-05 14:12:34 -04:00
Joey Hess	fc61915230	use GIT keys for export of non-annexed files This solves the problem that import of such files gets confused and converts them back to annexed files. The import code already used GIT keys internally when it determined a file should not be annexed. So now when it sees a GIT key that export used, it already does the right thing. This also means that even older version of git-annex can import and will do the right thing, once a fixed version has exported. Still, there may be other complications around upgrades; still need to think it all through. Moved gitShaKey and keyGitSha from Key to Annex.Export since they're only used for export/import. Documented GIT keys in backends, since they do appear in the git-annex branch now. This commit was sponsored by Graham Spencer on Patreon.	2021-03-05 14:12:11 -04:00
Joey Hess	a14001785e	fix --branch combined with --unlocked or --locked Since it's using git ls-tree anyway, can just look at the file modes to see if they're unlocked or are symlinks.	2021-03-02 13:47:27 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	eb594c710e	unregisterurl: New command Implemented by generalizing registerurl. Without the implicit batch mode of registerurl since that is only a backwards compatability thing (see commit `1d1054faa6`).	2021-03-01 14:28:24 -04:00
Joey Hess	97ae474585	registerurl: Allow it to be used in a bare repository.	2021-03-01 14:03:03 -04:00
Joey Hess	a8b627d82b	uninit: Fix a small bug that left a lock file in .git/annex unannex using git queue caused the queue lock to be taken after uninit had cleaned out .git/annex. Flush the queue earlier to avoid.	2021-03-01 13:05:47 -04:00
Joey Hess	530e96b80e	fix unannex data overwrite bug unannex, uninit: When an annexed file is modified, don't overwrite the modified version with an older version from the annex This commit was sponsored by Mark Reidenbach on Patreon.	2021-02-22 13:35:00 -04:00
Joey Hess	62d5a73bdd	unannex, uninit: Avoid running git rm once per annexed file, for a large speedup.	2021-02-22 12:56:11 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unncessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	1b63132ca3	add searchPathContents And rename related functions for consistency.	2021-02-02 19:06:15 -04:00
Joey Hess	8d4eb2d34e	get: Improve output when failing to get a file fails showTriedRemotes lists the remotes it tried to access. So there's no need to list those again in "Try making some of these remotes available".	2021-01-29 15:11:19 -04:00
Joey Hess	6f78497572	When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked Missed this when implementing it because of the default case catching the new constructor. So, removed that default case to make sure future types of adjusted branches don't make the same mistake. Complicated by git-annex addurl --fast which adds the file whose content is not present, so it needs to stay unlocked when on such a branch. This commit was sponsored by Brock Spratlen on Patreon.	2021-01-28 12:47:46 -04:00
Joey Hess	d4aac64282	fix breakage caused by recent commit `34a535ebea` broke the test suite. Getting a file started failing in one case, because the annex object did not have its inode cached, so was not trusted to be unmodified. This adds something very similar to what was added to linkAnnex in commit `2e9341a47d` -- if there are not yet any inodes cached for a key, add the inode of the annex object when adding the inode of the unlocked file. Feels like this should be handled in a more principled way. How do we know the addInodeCaches call in getMoveRaceRecovery just above this change is currently correct? It doesn't add the annex object inode cache. Ah well, maybe sometime when I've not had my entire evening eaten by a reversion that the test suite caught as I was cooking dinner.	2021-01-25 21:22:18 -04:00
Joey Hess	47338bf270	support modifying and running git add on an unlocked file that used an URL key Avoids the smudge --clean filter failing because URL keys do not support genKey. Instead the modified content will be added using the default backend. This commit was sponsored by Jochen Bartl on Patreon.	2021-01-25 17:37:16 -04:00
Joey Hess	34a535ebea	adjust: Fix some bad behavior when unlocked files use URL keys. This avoids the smudge --clean filter failing on the URL keys. git checkout runs the post-checkout hook, which runs smudge --update. That populates all the pointer files, but it neglected to store their inode caches in the keys db. With that done, and the keys db flushed before smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap check can tell the file is not modified, so will not try to re-ingest it, which does not work with URL keys because they do not support genKey. It also seems possible that the isUnmodifiedCheap was also failing for non-URL keys, which would cause them to be re-ingested, leading to a lot of extra work. I have not verified that, but don't see why it wouldn't have happened. So this probably also speeds up checking out adjusted branches. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2021-01-25 17:25:42 -04:00
Joey Hess	6a30d04ece	Bug fix: export with -J could fail when two files had the same content. Exporting is done inside a call to writeLockDbWhile which guarantees there is only one process uploading to a given ExportLocation.	2021-01-13 14:50:48 -04:00
Joey Hess	09b0562ec3	test: avoid unnecessary tests of variants of git remote Configuring chunking and encryption for a git remote has no effect, so skip testing those variants in the TestRemote call. It would be better if TestRemote itself could do this, but it doesn't seem possible there. There is no way to look at a Remote and tell if it supports chunking or encryption. Note that, while the test suite displays output as it it's testing exporting, it actually skips doing anything for the tests when run on the git remote. So at least does not waste time even though the output is not ideal. This commit was sponsored by Noam Kremen on Patreon.	2021-01-11 13:43:55 -04:00
Joey Hess	8db09feeba	fix format of message newlines are eaten	2021-01-11 13:14:09 -04:00
Joey Hess	6a0030a110	Behavior change: git-annex trust now needs --force Since unconsidered use of trusted repositories can lead to data loss. Trusted has always been this way, but it used to be acceptable for git-annex to be set up so that data could be lost without using --force, and most or all other ways that can happen have already been eliminated. This commit was sponsored by Mark Reidenbach on Patreon.	2021-01-07 10:09:39 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	5ce61c6b2a	add: Significantly speed up adding lots of non-large files to git * add: Significantly speed up adding lots of non-large files to git, by disabling the annex smudge filter when running git add. * add --force-small: Run git add rather than updating the index itself, so any other smudge filters than the annex one that may be enabled will be used.	2021-01-04 13:12:28 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	6280af2901	generate more compact git-annex branch for imports Especially from borg, where the content identifier logs all end up being the same identical file! But also, for other imports, the location tracking logs can, in some cases, be identical files. Bonus optimisation: Avoid looking up (and parsing when set) GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to. Although the lookup does happen at startup even when no log will be written now.	2020-12-23 15:25:16 -04:00
Joey Hess	7916fc98a3	graft in imported tree to avoid gc Fix a bug that could prevent getting files from an importtree=yes remote, because the imported tree was allowed to be garbage collected.	2020-12-23 14:27:38 -04:00
Joey Hess	1574972ba9	make sync --content get from third-party populated remotes like borg	2020-12-23 12:10:39 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	15000dee07	improve thirdpartypopulated support May actually work now. Note that, importKey now has to add the size to the key if it's supposed to have size. Remote.Directory relied on the importer adding the size, which is no longer done, so it was changed; it was the only one. This way, importKey does not need to behave differently between regular and thirdpartypopulated imports.	2020-12-21 16:19:44 -04:00
Joey Hess	57b03630b3	support thirdPartyPopulated These don't have importTree in their config, because they don't support tree import, but they do still support import, and do not support export or key/value modification.	2020-12-21 13:49:47 -04:00
Joey Hess	771b6c64f0	Merge branch 'master' into borg	2020-12-18 16:05:09 -04:00
Joey Hess	e0062c4f93	build fix	2020-12-18 16:04:56 -04:00
Joey Hess	909318dcee	Merge branch 'master' into borg	2020-12-18 15:27:24 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	f62aee0525	fix handling of importtree-only remotes Don't want to try to use these remotes as key/value remotes, which will surely fail. It only recently became possible for importtree to be set w/o exporttree, so before this code was ok. (cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)	2020-12-18 15:13:30 -04:00
Joey Hess	53fd1564b1	improve synopsis	2020-12-17 12:51:49 -04:00
Joey Hess	2abda21123	update	2020-12-15 16:35:06 -04:00
Joey Hess	f29d49d478	check Remote.hasKeyCheap again In `cd1676d604`, it stopped using that to avoid surprising behavior when the location log and remote content were out of sync. But, it seems that may have changed some behavior users relied on as well, and also Remote.hasKeyCheap should be faster than checking then location log. So, try Remote.hasKeyCheap first, and only if it does not have the key, fall back to checking the location log. If the location log still thinks it's present, go ahead and try to get it, so the user will see a failure rather than silently skipping a file what whereis says is on the remote. This does make slightly slower the case where the remote does not have the key, and location log and Remote.hasKeyCheap agree, since it now checks both. But only 1 stat slower.	2020-12-15 14:44:00 -04:00
Joey Hess	00526a6739	pass along -c options to child git-annex processes	2020-12-15 10:49:29 -04:00
Joey Hess	ed68a2166d	importfeed: Avoid using youtube-dl when a feed does not contain an enclosure, but only a link to an url which youtube-dl does not support This is common in some feeds, which might mix some items with enclosures, with others that link to posts or whatever. Before this, it would try to use youtube-dl and fail, or if youtube-dl was not allowed, it would incorrectly complain that an url was supported by youtube-dl.	2020-12-15 01:13:21 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktreee files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	4a8723246d	avoid transferrer committing the git-annex branch on shutdown The parent is will do it when it shuts down, and having both of them trying to do it at the same time seems like something good to avoid.	2020-12-11 16:16:07 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivilant of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	cedad7b37d	refactor	2020-12-10 16:33:52 -04:00
Joey Hess	04c12aa6df	custom protocol for transferrer Rather than using Read/Show, which would force me to preserve data types into the future. I considered just deriving json and sending that, but I don't much like deriving json with data types that have named constructors (like Key does) because again it locks in data type details. So instead, used SimpleProtocol, with a fairly complex and unreadable protocol. But it is as efficient as the p2p protocol at least, and as future proof. (Writing my own custom json instances would have worked but I thought of it too late and don't want to do all the work twice. The only real benefit might be that aeson could be faster.) Note that, when a new protocol request type is added later, git-annex trying to use it will cause the git-annex transferrer to display a protocol error message. That seems ok; it would only happen if a new git-annex found an old version of itself in PATH or the program file. So it's unlikely, and all it can do anyway is display an error. (The error message could perhaps be improved..) This commit was sponsored by Jack Hill on Patreon.	2020-12-09 16:13:59 -04:00
Joey Hess	004a4f5fb1	factor out Types.Transferrer	2020-12-09 13:28:49 -04:00
Joey Hess	677003a6df	rename helper More consistent name with TransferrerPool	2020-12-09 13:24:24 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatability code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	fcc9e01556	finally using transferkeys Seems to work! Even progress bars. Have not tested prompting or various error message displays yet. transferkeys had to be made to operate in different modes for the Assistant and Annex monads. A bit ugly, but it did relegate that really ugly Database.Keys.closeDb in transferkeys to only the assistant code path. This commit was sponsored by Noam Kremen.	2020-12-07 16:18:26 -04:00
Joey Hess	4c47568876	refactoring This is groundwork for using git-annex transferkeys to run transfers, in order to allow stalled transfers to be interrupted and retried. The new upload and download are closer to what git-annex transferkeys does, so the plan is to make them use it. Then things that were left using upload' and download' won't recover from stalls. Notably, that includes import and export. But at least get/move/copy will be able to. (Also the assistant hopefully, but not yet.) This commit was sponsored by Jake Vosloo on Patreon.	2020-12-07 14:49:17 -04:00
Joey Hess	438d5be1f7	support prompt in message serialization That seems to be the last thing needed for message serialization. Although it's only used in the assistant currently, so hard to tell if I forgot something. At this point, it should be possible to start using transferkeys when performing transfers, which will allow killing a transferkeys process if a transfer times out or stalls. But that's for another day. This commit was sponsored by Ethan Aubin.	2020-12-04 14:54:09 -04:00
Joey Hess	7a9b618d5d	fix problem with last commit and assistant liftAnnex blocks all others calls, so avoid using it with a long-duration call to readResponse.	2020-12-04 12:20:04 -04:00
Joey Hess	cad147cbbf	new protocol for transferkeys, with message serialization Necessarily threw out the old protocol, so if an old git-annex assistant is running, and starts a transferkeys from the new git-annex, it would fail. But, that seems unlikely; the assistant starts up transferkeys processes and then keeps them running. Still, may need to test that scenario. The new protocol is simple read/show and looks like this: TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo")) TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9})) TransferOutput (OutputMessage "(checksum...) ") TransferResult True Granted, this is not optimally fast, but it seems good enough, and is probably nearly as fast as the old protocol anyhow. emitSerializedOutput for ProgressMeter is not yet implemented. It needs to somehow start or update a progress meter. There may need to be a new message that allocates a progress meter, and then have ProgressMeter update it. This commit was sponsored by Ethan Aubin	2020-12-03 16:21:20 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	631c8d3e5b	avoid redundant adjusted branch update in sync sync still does update it if the config would otherwise not, since it already did.	2020-11-16 15:13:48 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	26cf26caca	Merge branch 'master' into symlink-missing	2020-11-16 10:03:12 -04:00
Joey Hess	5a8d01f63e	examinekey: Added a "file" format variable For consistency with find, and for easier scripting.	2020-11-16 09:59:11 -04:00
Joey Hess	ccfa9b2dc4	make sync update --unlock-present branch	2020-11-13 15:04:34 -04:00
Joey Hess	e66b7d2e1b	rename to --unlock-present and better reverse adjusting An --unlock-present branch reverses back to a branch where all files that get modified or renamed become locked, even if they were originally unlocked. This is the same that reversing a --unlock branch works, and the new name makes that commonality more clear.	2020-11-13 14:56:43 -04:00
Joey Hess	3899e216af	Merge branch 'master' into symlink-missing	2020-11-13 14:19:45 -04:00
Joey Hess	a30030c4a6	move: Fix a regression in the last release that made move --to not honor numcopies settings This commit was sponsored by Svenne Krap on Patreon.	2020-11-13 14:19:32 -04:00
Joey Hess	c8e49c5ef5	git-annex adjust --lock-missing Like --hide-missing the branch does not get updated when content availability changes. Seems to basically work, but sync does not update it yet. Also, when a file is present and so unlocked, git mv followed by git-annex sync results in the basis branch being updated to contain the file with the new name, unlocked. This seems different than what happens in an adjusted unlocked branch, where the commit propigates back locked. Probably the reverse adjustment code needs to be improved to handle this case.	2020-11-13 13:39:44 -04:00
Joey Hess	7566aa6bc5	examinekey: Added --migrate-to-backend Note that, the way the SeekInput parser is written to support batch mode, it's actually possible to do git-annex examinekey "SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E While that might be kind of useful to support multiple migrations not using batch mode, I have not documented it. It would be better to take pairs of key and file in that case.	2020-11-12 14:09:14 -04:00
Joey Hess	12e32d1dee	examinekey: Added two new format variables: objectpath and objectpointer	2020-11-12 13:02:31 -04:00
Joey Hess	92b7b1964d	add warning on add of annex link Warn when adding a annex symlink or pointer file that uses a key that is not known to the repository, to prevent confusion if the user has copied it from some other repository. This commit was sponsored by Jake Vosloo on Patreon.	2020-11-10 12:10:51 -04:00
Joey Hess	e81bb05b25	add debug in two unusual situations	2020-11-09 17:52:06 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	4bcb4030a5	more RawFilePath conversion 580/645 This commit was sponsored by Jack Hill on Patreon.	2020-11-03 18:34:27 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	55400a03d3	more RawFilePath conversion This commit was sponsored by Luke Shumaker on Patreon.	2020-11-02 16:31:28 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	a108b00b33	testremote: Display exceptions when tests fail, to aid debugging	2020-10-23 15:41:57 -04:00
Joey Hess	0133b7e5a8	move: Improve resuming a move that was interrupted after the object was transferred In cases where numcopies checks prevented the resumed move from dropping the object from the source repository, it now relies on a log of recent moves to replicate the behavior of the interrupted command. Performance: Probably noticable impact, since it has to add to the log, check the log, and remove from the log. Seems worth it to avoid this annoying edge case. The log functions are pretty well optimised to avoid unncessary work. An performance improvement to make later would be to avoid cleanup doing anything if it's not written to the log file, and has confirmed that the log file does not contain the log line. This commit was sponsored by Jake Vosloo on Patreon.	2020-10-21 10:31:56 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	2dd38b6403	switch to Haskell2010 When I put in Haskell98 this spring, I was under the mistaken apprehension that ghc defaulted to that. But it actually its default is a third mode, which is closer to Haskell2010 but with some differences. The manual says "By default, GHC mainly aims to behave (mostly) like a Haskell 2010 compiler" Fixed two cases where the Haskell98 do indentation flexability let wrongly indented code build. That is one of the places where ghc does not behave like Haskell2010 by default. The other place that I think I was concerned about, is GHC manual section 19.1.1.3. Expressions and patterns. But that only seems to affect code using bottoms, so would only affect pure functions throwing an error, which I don't think git-annex does in many places as it's pretty horrid style. And it would only affect rare cases like shown in that section. If it did happen, it would mean that the error was not thrown before specifying Haskell98, and then was. Haskell2010 behaves the same as Haskell98. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-19 11:26:16 -04:00
Joey Hess	c56efbbdb6	import: Check gitignores when importing trees from special remotes It seemed best to do this, for consistency with every other way files can get into a git-annex repo. Although it's just a bit strange that a local .gitignore file affects the pseudo-commits made for the remote that's imported from. This commit was sponsored by Brett Eisenberg on Patreon.	2020-09-30 10:41:59 -04:00
Joey Hess	0033e08193	avoid a second traversal of the ImportableContents Do all filtering in one pass.	2020-09-30 10:10:03 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Amoung other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	1610d94776	addurl: Avoid a redundant git ignores check for speed Ensure that checkCanAdd is used everywhere a file is added to git, so git add is run with -f, presumably avoiding the work it would usually do to check ignores.	2020-09-29 13:00:41 -04:00
Joey Hess	658ea7ca3c	sync --no-content import from directory special remote sync: When run without --content, import without copying from importtree=yes directory special remotes. (Other special remotes may support this later as well.) This commit was sponsored by Svenne Krap on Patreon.	2020-09-28 15:29:08 -04:00
Joey Hess	3eaaec3113	consistently use importKey when available This avoids import with --no-content and with --content potentially generating two different trees, leading to a merge conflict when run in two different clones of a repo. And it's necessary groundwork to make git-annex sync --no-content import from special remotes that support importKey. Only the directory special remote currently supports importKey, and it generates the same key as git-annex usually does, so there is no behavior change for it. Future special remotes will need to take care when adding importKey, if it generates different keys. Added some warnings about that to comments. This commit was sponsored by Noam Kremen on Patreon.	2020-09-28 15:27:46 -04:00
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	f624876dc2	remove zombie process in file seeking This was the last one marked as a zombie. There might be others I don't know about, but except for in the hypothetical case of a thread dying due to an async exception before it can wait on a process it started, I don't know of any. It would probably be safe to remove the reapZombies now, but let's wait and so that in its own commit in case it turns out to cause problems. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-25 11:38:42 -04:00
Joey Hess	ca454c47f2	explicitly wait for a git process Eliminate a zombie that was only cleaned up by the later zombie cleanup code. This is still not ideal, it would be cleaner if it used conduit or something, and if the thread gets killed before waiting, it won't stop the process. Only remaining zombies are in CmdLine.Seek	2020-09-25 11:03:12 -04:00
Joey Hess	051e16a945	remove debug print	2020-09-24 15:37:39 -04:00
Joey Hess	d89984b121	sync --all avoid unncessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	b45b37b088	wait for first pass to complete before second pass Otherwise the bloom filter may not be fully populated when the second pass starts, which could have led to incorrect behavior with --all -J, probably in very rare circumstances.	2020-09-24 14:23:25 -04:00
Joey Hess	167da965b9	remove obsolete comment	2020-09-24 14:22:56 -04:00
Joey Hess	c1b4d76e6b	make MatchFiles introspectable matchNeedsFileContent is not used yet, but shows how to add information about terminals. That one would be needed for https://git-annex.branchable.com/todo/sync_fast_import/ Note the tricky bit in Annex.FileMatcher.call where it folds over the included matcher to propagate the information. This commit was sponsored by Svenne Krap on Patreon.	2020-09-24 14:01:53 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	3457b526ef	make git-annex add --no-check-gitignore not skip ignored files, same as with --force	2020-09-18 13:33:35 -04:00
Joey Hess	d0b06c17c0	Added --no-check-gitignore option for finer grained control than using --force. add, addurl, importfeed, import: Added --no-check-gitignore option for finer grained control than using --force. (--force is used for too many different things, and at least one of these also uses it for something else. I would like to reduce --force's footprint until it only forces drops or a few other data losses. For now, --force still disables checking ignores too.) addunused: Don't check .gitignores when adding files. This is a behavior change, but I justify it by analogy with git add of a gitignored file adding it, asking to add all unused files back should add them all back, not skip some. The old behavior was surprising. In Command.Lock and Command.ReKey, CheckGitIgnore False does not change behavior, it only makes explicit what is done. Since these commands are run on annexed files, the file is already checked into git, so git add won't check ignores.	2020-09-18 13:19:13 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases mutliple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	2a3c2b1843	use Branch.name instead of hard coding the branch name Makes much more clear why ActionItemOther is being passed "git-annex".	2020-09-15 15:47:22 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	f4c4b89aa3	refactor Make all calls to git merge go through autoMergeFrom, in preparation for fine-tuning git merge's config for automatic merge conflict resolution. This commit was sponsored by Ryan Newton on Patreon.	2020-09-07 13:26:16 -04:00

... 2 3 4 5 6 ...

2667 commits